
Lecture Notes in Networks and Systems 371

Pandian Vasant
Ivan Zelinka
Gerhard-Wilhelm Weber
Editors

Intelligent Computing & Optimization

Proceedings of the 4th International Conference on Intelligent Computing and Optimization 2021 (ICO2021)
Lecture Notes in Networks and Systems

Volume 371

Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences,
Warsaw, Poland

Advisory Editors
Fernando Gomide, Department of Computer Engineering and Automation—DCA,
School of Electrical and Computer Engineering—FEEC, University of Campinas—
UNICAMP, São Paulo, Brazil
Okyay Kaynak, Department of Electrical and Electronic Engineering,
Bogazici University, Istanbul, Turkey
Derong Liu, Department of Electrical and Computer Engineering, University
of Illinois at Chicago, Chicago, USA; Institute of Automation, Chinese Academy
of Sciences, Beijing, China
Witold Pedrycz, Department of Electrical and Computer Engineering,
University of Alberta, Alberta, Canada; Systems Research Institute,
Polish Academy of Sciences, Warsaw, Poland
Marios M. Polycarpou, Department of Electrical and Computer Engineering,
KIOS Research Center for Intelligent Systems and Networks, University of Cyprus,
Nicosia, Cyprus
Imre J. Rudas, Óbuda University, Budapest, Hungary
Jun Wang, Department of Computer Science, City University of Hong Kong,
Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest
developments in Networks and Systems—quickly, informally and with high quality.
Original research reported in proceedings and post-proceedings represents the core
of LNNS.
Volumes published in LNNS embrace all aspects and subfields of, as well as new
challenges in, Networks and Systems.
The series contains proceedings and edited volumes in systems and networks,
spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor
Networks, Control Systems, Energy Systems, Automotive Systems, Biological
Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems,
Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems,
Robotics, Social Systems, Economic Systems and others. Of particular value to both
the contributors and the readership are the short publication timeframe and
the world-wide distribution and exposure which enable both a wide and rapid
dissemination of research output.
The series covers the theory, applications, and perspectives on the state of the art
and future developments relevant to systems and networks, decision making, control,
complex processes and related areas, as embedded in the fields of interdisciplinary
and applied sciences, engineering, computer science, physics, economics, social, and
life sciences, as well as the paradigms and methodologies behind them.
Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago.
All books published in the series are submitted for consideration in Web of Science.

More information about this series at https://link.springer.com/bookseries/15179


Pandian Vasant · Ivan Zelinka · Gerhard-Wilhelm Weber
Editors

Intelligent Computing & Optimization

Proceedings of the 4th International Conference on Intelligent Computing and Optimization 2021 (ICO2021)
Editors

Pandian Vasant
Faculty of Electrical & Electronic Engineering
MERLIN Research Centre, Ton Duc Thang University
Hồ Chí Minh City, Vietnam

Ivan Zelinka
Faculty of Electrical Engineering and Computer Science
VŠB TU Ostrava
Ostrava-Poruba, Czech Republic

Gerhard-Wilhelm Weber
Faculty of Engineering Management
Poznan University of Technology
Poznan, Poland

ISSN 2367-3370 ISSN 2367-3389 (electronic)


Lecture Notes in Networks and Systems
ISBN 978-3-030-93246-6 ISBN 978-3-030-93247-3 (eBook)
https://doi.org/10.1007/978-3-030-93247-3
© The Editor(s) (if applicable) and The Author(s), under exclusive license
to Springer Nature Switzerland AG 2022
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of
illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, expressed or implied, with respect to the material contained
herein or for any errors or omissions that may have been made. The publisher remains neutral with regard
to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface

The 4th edition of the popular and prestigious International Conference on Intelligent Computing and Optimization (ICO) 2021, in short ICO'2021, was held on an online platform, respecting the care for everyone necessitated by the COVID-19 pandemic. The physical conference is foreseen to be celebrated at G Hua Hin Resort & Mall in Hua Hin, Thailand, once COVID-19 has been jointly overcome. Indeed, the core objective of the international conference is to bring together, in the spirit of community, global research leaders, distinguished experts and scholars from the scientific areas of Intelligent Computing and Optimization, gathered from all over the globe, to share their knowledge and experiences of current research achievements in diverse fields, to learn from their old and new friends and to create together new research ideas and designs for collaboration. This conference creates and provides a "golden chance" for the international research community to interact and to introduce their newest research advances, results, innovative discoveries and inventions in the midst of their scientific colleagues and friends. The proceedings book of ICO'2021 is published by the renowned house of Springer Nature (Lecture Notes in Networks and Systems).
Almost 150 authors submitted their full papers for ICO’2021. They represent
more than 40 countries, such as Algeria, Austria, Bangladesh, Bulgaria, Canada,
China, Croatia, Cyprus, Ethiopia, India, Iran, Iraq, Japan, Jordan, Malaysia,
Mauritius, Mexico, Morocco, Nepal, Oman, Pakistan, Peru, Philippines, Poland,
Portugal, Russia, Slovenia, Spain, South Africa, Sweden, Taiwan, Thailand,
Turkey, Turkmenistan, Ukraine, United Arab Emirates, USA, UK, Vietnam and
others. This worldwide representation clearly demonstrates the growing interest
of the global research community in our conference series.
This edition of the conference proceedings encloses original, innovative and creative scientific work in the fields of optimization and optimal control, renewable energy and sustainability, artificial intelligence and operational research, economics and management, smart cities and rural planning, meta-heuristics and big data analytics, cyber security and blockchains, IoTs and Industry 4.0, mathematical modelling and simulation, and health care and medicine.
The Organizing Committee of ICO’2021 cordially expresses its thanks to all the

authors and co-authors and all the diligent reviewers for their precious contributions
to both the conference and this book. In fact, carefully selected and high-quality
papers have been reviewed and chosen by the International Programme Committee
in order to be published in the series Lecture Notes in Networks and Systems
of Springer Nature.
ICO’2021 presents enlightening contributions for research scholars across the
planet in the research areas of innovative computing and novel optimization
techniques, together with cutting-edge methodologies and applications. This con-
ference could not have been organized without the strong support and help from the
committee members of ICO’2021. We would like to sincerely thank Prof. Elias
Munapo (North-West University, South Africa), Prof. Rustem Popa (Dunarea de
Jos University, Romania), Professor Jose Antonio Marmolejo (Universidad
Panamericana, Mexico) and Prof. Román Rodríguez-Aguilar (Universidad
Panamericana, Mexico) for their great help and support in organizing the
conference.
We also appreciate the valuable guidance and great contribution from Dr.
J. Joshua Thomas (UOW Malaysia KDU Penang University College, Malaysia),
Prof. Gerhard-Wilhelm Weber (Poznan University of Technology, Poland; Middle
East Technical University, Turkey), Prof. Mohammad Shamsul Arefin (Chittagong
University of Engineering and Technology, Bangladesh), Prof. Mohammed
Moshiul Hoque (Chittagong University of Engineering & Technology,
Bangladesh), Prof. Ivan Zelinka (VSB-TU Ostrava, Czech Republic), Prof. Ugo
Fiore (Federico II University, Italy), Dr. Mukhdeep Singh Manshahia (Punjabi
University Patiala, India), Mr. K. C. Choo (CO2 Networks, Malaysia), Prof. Karl
Andersson (Luleå University of Technology (LTU), Sweden), Prof. Tatiana
Romanova (National Academy of Sciences of Ukraine, Ukraine), Prof. Nader
Barsoum (Curtin University of Technology, Australia), Prof. Goran Klepac
(Hrvatski Telekom, Croatia), Prof. Sansanee Auephanwiriyakul (Chiang Mai
University, Thailand), Dr. Thanh Dang Trung (Thu Dau Mot University, Vietnam),
Dr. Leo Mrsic (Algebra University College, Croatia) and Dr. Shahadat Hossain
(City University, Bangladesh).
Finally, we would like to convey our most sincere thanks to Prof. Dr. Janusz Kacprzyk, Dr. Thomas Ditzinger, Dr. Holger Schaepe and Mr. Nareshkumar Mani of SPRINGER NATURE for their wonderful help and support in publishing the ICO'2021 conference proceedings book in Lecture Notes in Networks and Systems.

December 2021 Pandian Vasant


Ivan Zelinka
Gerhard-Wilhelm Weber
Conference Committees ICO’2021

Steering Committee

Elias Munapo North-West University, South Africa


Jose Antonio Marmolejo Panamerican University, Mexico
Joshua Thomas UOW Malaysia, KDU Penang University
College, Malaysia

General Chair
Pandian Vasant MERLIN Research Centre, TDTU, Vietnam

Honorary Chairs
Gerhard W. Weber Poznan University of Technology, Poland
Rustem Popa Dunarea De Jos University, Romania
Leo Mrsic Algebra University College, Croatia
Ivan Zelinka Technical University of Ostrava, Czech Republic
Jose Antonio Marmolejo Panamerican University, Mexico
Roman Rodriguez-Aguilar Panamerican University, Mexico

TPC Chairs
Joshua Thomas KDU Penang University College, Malaysia
Jose Antonio Marmolejo Panamerican University, Mexico

Special Sessions Chairs


Mohammad Shamsul Arefin CUET, Bangladesh
Mukhdeep Singh Manshahia Punjabi University Patiala, India


Keynote Chairs and Panel Chairs


Ugo Fiore Federico II University, Italy
Mariusz Drabecki Warsaw University of Technology, Poland

Publicity and Social Media Chairs


Anirban Banik National Institute of Technology Agartala, India
Kwok Tai Chui Hong Kong Metropolitan University, Hong Kong

Workshops and Tutorials Chairs


Mohammed Moshiul Hoque CUET, Bangladesh
Leo Mrsic Algebra University College, Croatia

Posters and Demos Chairs


Roberto Alonso González-Lezcano San Pablo CEU University, Spain
Iztok Fister University of Maribor, Slovenia

Sponsorship and Exhibition Chairs


K. C. Choo CO2 Networks, Malaysia
Igor Litvinchev Nuevo Leon State University, Mexico

Publications Chairs
Rustem Popa Dunarea De Jos University, Romania
Ugo Fiore Federico II University, Italy

Webinar Coordinator
Joshua Thomas UOW Malaysia, KDU Penang University
College, Malaysia

Web Editor
K. C. Choo CO2 Networks, Malaysia

Reviewers
The volume editors of LNNS Springer Nature of ICO’2021 would like to sincerely
thank the following reviewers for their outstanding work in reviewing all the papers
for ICO’2021 conference proceedings via Easychair (https://www.icico.info/).
Aditya Singh Lovely Professional University, India
Ahed Abugabah Zayed University, United Arab Emirates
Ahmad Al Smadi Xidian University, China
Anton Abdulbasah Kamil Istanbul Gelisim University, Turkey
Azazkhan Ibrahimkhan Pathan Sardar Vallabhbhai National Institute of Technology, India
Dang Trung Thanh Thu Dau Mot University, Vietnam
Danijel Kučak Algebra University College, Croatia
Elias Munapo North-West University, South Africa
F. Hooshmand Amirkabir University of Technology, Iran
Igor Litvinchev Universidad Autónoma de Nuevo León, Mexico
Jaramporn Hassamontr King Mongkut’s University of Technology North
Bangkok, Thailand
Jean Baptiste Bernard Pea-Assounga Jiangsu University, China
Jonnel Alejandrino De La Salle University, Philippines
Jose Antonio Marmolejo Universidad Panamericana, Mexico
Leo Mrsic Algebra University College, Croatia
Mingli Song Zhejiang University, China
Mohammed Boukendil LMFE, Morocco
Morikazu Nakamura University of the Ryukyus, Japan
Mukhdeep Singh Manshahia Punjabi University Patiala, India
Prithwish Sen IIIT Guwahati, India
Roman Rodriguez-Aguilar Universidad Panamericana, Mexico
Ronnie Concepcion II De La Salle University, Philippines
Rustem Popa Dunarea de Jos University, Romania
Shahadat Hossain City University, Bangladesh
Sinan Melih Nigdeli Istanbul University, Turkey
Stefan Ivanov Technical University of Gabrovo, Bulgaria
Telmo Matos CIICESI, Portugal
Thanh Hung Bui Thu Dau Mot University, Vietnam
Ugo Fiore University of Naples Parthenope, Italy
Vedran Juričić University of Zagreb, Croatia
Contents

Sustainable Artificial Intelligence Applications


Low-Light Image Enhancement with Artificial Bee Colony Method . . . . 3
Anan Banharnsakun
Optimal State-Feedback Controller Design for Tractor Active
Suspension System via Lévy-Flight Intensified Current
Search Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
Thitipong Niyomsat, Wattanawong Romsai, Auttarat Nawikavatan,
and Deacha Puangdownreong
The Artificial Intelligence Platform with the Use of DNN to Detect
Flames: A Case of Acoustic Extinguisher . . . . . . . . . . . . . . . . . . . . . . . . 24
Stefan Ivanov and Stanko Stankov
Adaptive Harmony Search for Cost Optimization of Reinforced
Concrete Columns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Aylin Ece Kayabekir, Sinan Melih Nigdeli, and Gebrail Bekdaş
Efficient Traffic Signs Recognition Based on CNN Model
for Self-Driving Cars . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Said Gadri and Nour ElHouda Adouane
Optimisation and Prediction of Glucose Production from Oil Palm
Trunk via Simultaneous Enzymatic Hydrolysis . . . . . . . . . . . . . . . . . . . 55
Chan Mieow Kee, Wang Chan Chin, Tee Hoe Chun,
and Nurul Adela Bukhari
Synthetic Data Augmentation of Cycling Sport Training Datasets . . . . . 65
Iztok Fister, Grega Vrbančič, Vili Podgorelec, and Iztok Fister Jr.
Hybrid Pooling Based Convolutional Neural Network
for Multi-class Classification of MR Brain Tumor Images . . . . . . . . . . . 75
Gazi Jannatul Ferdous, Khaleda Akhter Sathi, and Md. Azad Hossain


Importance of Fuzzy Logic in Traffic and Transportation Engineering . . . . . . . . . . 87
Aditya Singh
A Fuzzy Based Clustering Approach to Prolong the Network
Lifetime in Wireless Sensor Networks . . . . . . . . . . . . . . . . . . . . . . . . . . 97
Enaam A. Al-Hussain and Ghaida A. Al-Suhail
Visual Expression Analysis from Face Images Using
Morphological Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
Md. Habibur Rahman, Israt Jahan, and Yeasmin Ara Akter
Detection of Invertebrate Virus Carriers Using Deep Learning
Networks to Prevent Emerging Pandemic-Prone Disease
in Tropical Regions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
Daeniel Song Tze Hai, J. Joshua Thomas, Justtina Anantha Jothi,
and Rasslenda-Rass Rasalingam
Classification and Detection of Plant Leaf Diseases Using Various
Deep Learning Techniques and Convolutional Neural Network . . . . . . . 132
Partha P. Mazumder, Monuar Hossain, and Md Hasnat Riaz

Deep Learning and Machine Learning Applications


Distributed Self-triggered Optimization for Multi-agent Systems . . . . . . 145
Komal Mehmood and Maryam Mehmood
Automatic Categorization of News Articles and Headlines Using
Multi-layer Perceptron . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
Fatima Jahara, Omar Sharif, and Mohammed Moshiul Hoque
Using Machine Learning Techniques for Estimating the Electrical
Power of a New-Style of Savonius Rotor: A Comparative Study . . . . . . 167
Youssef Kassem, Hüseyin Çamur, Gokhan Burge,
Adivhaho Frene Netshimbupfe, Elhamam A. M. Sharfi, Binnur Demir,
and Ahmed Muayad Rashid Al-Ani
Tree-Like Branching Network for Multi-class Classification . . . . . . . . . 175
Mengqi Xue, Jie Song, Li Sun, and Mingli Song
Multi-resolution Dense Residual Networks
with High-Modularization for Monocular Depth Estimation . . . . . . . . . 185
Din Yuen Chan, Chien-I Chang, Pei Hung Wu, and Chung Ching Chiang
A Decentralized Federated Learning Paradigm for Semantic
Segmentation of Geospatial Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
Yash Khasgiwala, Dion Trevor Castellino, and Sujata Deshmukh

Development of Contact Angle Prediction for Cellulosic Membrane . . . 207


Ahmad Azharuddin Azhari bin Mohd Amiruddin, Mieow Kee Chan,
and Sokchoo Ng
Feature Engineering Based Credit Card Fraud Detection for Risk
Minimization in E-Commerce . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
Md. Moinul Islam, Rony Chowdhury Ripan, Saralya Roy, and Fazle Rahat
DCNN-LSTM Based Audio Classification Combining Multiple
Feature Engineering and Data Augmentation Techniques . . . . . . . . . . . 227
Md. Moinul Islam, Monjurul Haque, Saiful Islam, Md. Zesun Ahmed Mia,
and S. M. A. Mohaiminur Rahman
Sentiment Analysis: Developing an Efficient Model Based
on Machine Learning and Deep Learning Approaches . . . . . . . . . . . . . 237
Said Gadri, Safia Chabira, Sara Ould Mehieddine,
and Khadidja Herizi
Improved Face Detection System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
Ratna Chakma, Juel Sikder, and Utpol Kanti Das
Paddy Price Prediction in the South-Western Region of Bangladesh . . . 258
Juliet Polok Sarkar, M. Raihan, Avijit Biswas, Khandkar Asif Hossain,
Keya Sarder, Nilanjana Majumder, Suriya Sultana, and Kajal Sana
Paddy Disease Prediction Using Convolutional Neural Network . . . . . . 268
Khandkar Asif Hossain, M. Raihan, Avijit Biswas, Juliet Polok Sarkar,
Suriya Sultana, Kajal Sana, Keya Sarder, and Nilanjana Majumder
Android Malware Detection System: A Machine Learning and Deep
Learning Based Multilayered Approach . . . . . . . . . . . . . . . . . . . . . . . . . 277
Md Shariar Hossain and Md Hasnat Riaz

IOTs, Big Data, Block Chain and Health Care


Blockchain as a Secure and Reliable Technology in Business
and Communication Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
Vedran Juričić, Danijel Kučak, and Goran Đambić
iMedMS: An IoT Based Intelligent Medication Monitoring System
for Elderly Healthcare . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
Khalid Ibn Zinnah Apu, Mohammed Moshiul Hoque, and Iqbal H. Sarker
Internet Banking and Bank Investment Decision: Mediating Role
of Customer Satisfaction and Employee Satisfaction . . . . . . . . . . . . . . . 314
Jean Baptiste Bernard Pea-Assounga and Mengyun Wu

Inductions of Usernames' Strengths in Reducing Invasions on Social Networking Sites (SNSs) . . . . . . . . . . 331
Md. Mahmudur Rahman, Shahadat Hossain, Mimun Barid,
and Md. Manzurul Hasan
Tomato Leaf Disease Recognition Using Depthwise Separable
Convolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
Syed Md. Minhaz Hossain, Khaleque Md. Aashiq Kamal, Anik Sen,
and Kaushik Deb
End-to-End Scene Text Recognition System for Devanagari
and Bengali Text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352
Prithwish Sen, Anindita Das, and Nilkanta Sahu
A Deep Convolutional Neural Network Based Classification Approach
for Sleep Scoring of NFLE Patients . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
Sarker Safat Mahmud, Md. Rakibul Islam Prince, Md. Shamim,
and Sarker Shahriar Mahmud
Remote Fraud and Leakage Detection System Based on LPWAN
System for Flow Notification and Advanced Visualization
in the Cloud . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370
Dario Protulipac, Goran Djambic, and Leo Mršić
An Analysis of AUGMECON2 Method on Social Distance-Based
Layout Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381
Şeyda Şimşek, Eren Özceylan, and Neşe Yalçın
An Intelligent Information System and Application
for the Diagnosis and Analysis of COVID-19 . . . . . . . . . . . . . . . . . . . . . 391
Atif Mehmood, Ahed Abugabah, Ahmad A. L. Smadi,
and Reyad Alkhawaldeh
Hand Gesture Recognition Based Human Computer Interaction
to Control Multiple Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
Sanzida Islam, Abdul Matin, and Hafsa Binte Kibria
Towards Energy Savings in Cluster-Based Routing for Wireless
Sensor Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407
Enaam A. Al-Hussain and Ghaida A. Al-Suhail
Utilization of Self-organizing Maps for Map Depiction
of Multipath Clusters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417
Jonnel Alejandrino, Emmanuel Trinidad, Ronnie Concepcion II,
Edwin Sybingco, Maria Gemel Palconit, Lawrence Materum,
and Elmer Dadios

Big Data for Smart Cities and Smart Villages: A Review . . . . . . . . . . . 427
Tajnim Jahan, Sumayea Benta Hasan, Nuren Nafisa,
Afsana Akther Chowdhury, Raihan Uddin,
and Mohammad Shamsul Arefin
A Compact Radix-Trie: A Character-Cell Compressed Trie
Data-Structure for Word-Lookup System . . . . . . . . . . . . . . . . . . . . . . . 440
Rahat Yeasin Emon and Sharmistha Chanda Tista
Digital Twins and Blockchain: Empowering the Supply Chain . . . . . . . 450
Jose Eduardo Aguilar-Ramirez, Jose Antonio Marmolejo-Saucedo,
and Roman Rodriguez-Aguilar
Detection of Malaria Disease Using Image Processing
and Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 457
Md. Maruf Hasan, Sabiha Islam, Ashim Dey, Annesha Das,
and Sharmistha Chanda Tista
Fake News Detection of COVID-19 Using Machine Learning
Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 467
Promila Ghosh, M. Raihan, Md. Mehedi Hassan, Laboni Akter,
Sadika Zaman, and Md. Abdul Awal

Sustainable Modelling, Computing and Optimization


1D HEC-RAS Modeling Using DEM Extracted River Geometry -
A Case of Purna River; Navsari City; Gujarat, India . . . . . . . . . . . . . . 479
Azazkhan Ibrahimkhan Pathan, P. G. Agnihotri, D. Kalyan,
Daryosh Frozan, Muqadar Salihi, Shabir Ahmad Zareer, D. P. Patel,
M. Arshad, and S. Joseph
A Scatter Search Algorithm for the Uncapacitated Facility
Location Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 488
Telmo Matos
An Effective Dual-RAMP Algorithm for the Capacitated Facility
Location Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 495
Telmo Matos
Comparative Study of Blood Flow Through Normal, Stenosis
Affected and Bypass Grafted Artery Using Computational
Fluid Dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 503
Anirban Banik, Tarun Kanti Bandyopadhyay, and Vladimir Panchenko
Transportation Based Approach for Solving the Generalized
Assignment Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 513
Elias Munapo

Generalized Optimization: A First Step Towards Category Theoretic Learning Theory . . . . . . . . . . 525
Dan Shiebler
Analysis of Non-linear Structural Systems via Hybrid Algorithms . . . . . 536
Sinan Melih Nigdeli, Gebrail Bekdaş, Melda Yücel, Aylin Ece Kayabekir,
and Yusuf Cengiz Toklu
Ising Model Formulation for Job-Shop Scheduling Problems Based
on Colored Timed Petri Nets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 546
Kohei Kaneshima and Morikazu Nakamura
Imbalanced Sample Generation and Evaluation for Power System
Transient Stability Using CTGAN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 555
Gengshi Han, Shunyu Liu, Kaixuan Chen, Na Yu, Zunlei Feng,
and Mingli Song
Efficient DC Algorithm for the Index-Tracking Problem . . . . . . . . . . . . 566
F. Hooshmand and S. A. MirHassani
Modelling External Debt Using VECM and GARCH Models . . . . . . . . 577
Naledi Blessing Mokoena, Johannes Tshepiso Tsoku, and Martin Chanza
Optimization of Truss Structures with Sizing of Bars by Using
Hybrid Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 592
Melda Yücel, Gebrail Bekdaş, and Sinan Melih Nigdeli
Information Extraction from Receipts Using Spectral Graph
Convolutional Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 602
Bui Thanh Hung
An Improved Shuffled Frog Leaping Algorithm with Rotating
and Position Sequencing in 2-Dimension Shapes
for Discrete Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 613
Kanchana Daoden
Lean Procurement in an ERP Cloud Base . . . . . . . . . . . . . . . . . . . . . . . 623
Adrian Chin-Hernandez, Jose Antonio Marmolejo-Saucedo,
and Jania Saucedo-Martinez
An Approximate Solution Proposal to the Vehicle Routing Problem
Through Simulation-Optimization Approach . . . . . . . . . . . . . . . . . . . . . 634
Jose Antonio Marmolejo-Saucedo and Armando Calderon Osornio
Hybrid Connectionist Models to Investigate the Effects
on Petrophysical Variables for Permeability Prediction . . . . . . . . . . . . . 647
Mohammad Islam Miah and Mohammed Adnan Noor Abir

Sustainable Environmental, Social and Economics Development


Application of Combined SWOT and AHP Analysis to Assess
the Reality and Select the Priority Factors for Social and Economic
Development (a Case Study for Soc Trang City) . . . . . . . . . . . . . . . . . . 659
Dang Trung Thanh and Nguyen Huynh Anh Tuyet
Design and Analysis of Water Distribution Network Using Epanet
2.0 and Loop 4.0 – A Case Study of Narangi Village . . . . . . . . . . . . . . . 671
Usman Mohseni, Azazkhan I. Pathan, P. G. Agnihotri, Nilesh Patidar,
Shabir Ahmad Zareer, D. Kalyan, V. Saran, Dhruvesh Patel,
and Cristina Prieto
Effect of Climate Change on Sea Level Rise with Special Reference
to Indian Coastline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 685
Dummu Kalyan, Azazkhan Ibrahimkhan Pathan, P. G. Agnihotri,
Mohammad Yasin Azimi, Daryosh Frozan, Joseph Sebastian,
Usman Mohseni, Dhruvesh Patel, and Cristina Prieto
Design and Analysis of Water Distribution Network Using
Watergems – A Case Study of Narangi Village . . . . . . . . . . . . . . . . . . . 695
Usman Mohseni, Azazkhan I. Pathan, P. G. Agnihotri, Nilesh Patidar,
Shabir Ahmad Zareer, V. Saran, and Vaishali Rana
Weight of Factors Affecting Sustainable Urban Agriculture
Development (Case Study in Thu Dau Mot Smart City) . . . . . . . . . . . . 707
Trung Thanh Dang, Quang Minh Vo, and Thanh Vu Pham
Factors Behind the World Crime Index: Some Parametric
Observations Using DBSCAN and Linear Regression . . . . . . . . . . . . . . 718
Shahadat Hossain, Md. Manzurul Hasan, Md. Mahmudur Rahman,
and Mimun Barid
Object Detection in Foggy Weather Conditions . . . . . . . . . . . . . . . . . . . 728
Prithwish Sen, Anindita Das, and Nilkanta Sahu
Analysis and Evaluation of TripAdvisor Data: A Case
of Pokhara, Nepal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 738
Tan Wenan, Deepanjal Shrestha, Bijay Gaudel, Neesha Rajkarnikar,
and Seung Ryul Jeong
Simulation of the Heat and Mass Transfer Occurring During
Convective Drying of Mango Slices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 751
Ripa Muhury, Ferdusee Akter, and Ujjwal Kumar Deb
A Literature Review on the MPPT Techniques Applied in Wind
Energy Harvesting System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 762
Tigilu Mitiku and Mukhdeep Singh Manshahia

Developing a System to Analyze Comments of Social Media and Identify Friends Category . . . . . . . . . . 773
Tasfia Hyder, Rezaul Karim, and Mohammad Shamsul Arefin
Comparison of Watershed Delineation and Drainage Network
Using ASTER and CARTOSAT DEM of Surat City, Gujarat . . . . . . . . 788
Arbaaz A. Shaikh, Azazkhan I. Pathan, Sahita I. Waikhom,
and Praveen Rathod
Numerical Investigation of Natural Convection Combined with
Surface Radiation in a Divided Cavity Containing Air and Water . . . . 801
Zouhair Charqui, Lahcen El Moutaouakil, Mohammed Boukendil,
Rachid Hidki, and Zaki Zrikem
Key Factors in the Successful Integration of the Circular Economy
Approach in the Industry of Non-durable Goods:
A Literature Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 812
Marcos Jacinto-Cruz, Román Rodríguez-Aguilar,
and Jose-Antonio Marmolejo-Saucedo
Profile of the Business Science Professional for the Industry 4.0 . . . . . . 820
Antonia Paola Salgado-Reyes and Roman Rodríguez-Aguilar
Rainfall-Runoff Simulation and Storm Water Management Model
for SVNIT Campus Using EPA SWMM 5.1 . . . . . . . . . . . . . . . . . . . . . 832
Nitin Singh Kachhawa, Prasit Girish Agnihotri,
and Azazkhan Ibrahimkhan Pathan

Emerging Smart Technology Applications


Evaluation and Customized Support of Dynamic Query Form
Through Web Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 845
B. Bazeer Ahamed and Murugan Krishnamurthy
Enhancing Student Learning Productivity with Gamification-Based
E-learning Platform: Empirical Study and Best Practices . . . . . . . . . . . 857
Danijel Kučak, Adriana Biuk, and Leo Mršić
Development of Distributed Data Acquisition System . . . . . . . . . . . . . . . 867
Bertram Losper, Vipin Balyan, and B. Groenewald
Images Within Images? A Multi-image Paradigm with Novel
Key-Value Graph Oriented Steganography . . . . . . . . . . . . . . . . . . . . . . 879
Subhrangshu Adhikary
Application of Queuing Theory to Analyse an ATM
Queuing System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 888
Kolentino N. Mpeta and Otsile R. Selaotswe

A Novel Prevention Technique Using Deep Analysis Intruder Tracing with a Bottom-Up Approach Against Flood Attacks in VoIP Systems . . . . . . . . . . 893
Sheeba Armoogum and Nawaz Mohamudally
Data Mining for Software Engineering: A Survey . . . . . . . . . . . . . . . . . 905
Maisha Maimuna, Nafiza Rahman, Razu Ahmed,
and Mohammad Shamsul Arefin
Simulation of Load Absorption and Deflection of Helical Suspension
Spring: A Case of Finite Element Method . . . . . . . . . . . . . . . . . . . . . . . 917
Rajib Karmaker, Shipan Chandra Deb Nath, and Ujjwal Kumar Deb
Prediction of Glucose Concentration Hydrolysed from Oil Palm
Trunks Using a PLSR-Based Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 927
Wan Sieng Yeo, Mieow Kee Chan, and Nurul Adela Bukhari
Ontology of Lithography-Based Processes in Additive
Manufacturing with Focus on Ceramic Materials . . . . . . . . . . . . . . . . . 938
Marc Gmeiner, Wilfried Lepuschitz, Munir Merdan,
and Maximilian Lackner
Natural Convection and Surface Radiation in an Inclined Square
Cavity with Two Heat-Generating Blocks . . . . . . . . . . . . . . . . . . . . . . . 948
Rachid Hidki, Lahcen El Moutaouakil, Mohammed Boukendil,
Zouhair Charqui, and Abdelhalim Abdelbaki
Improving the Route Selection for Geographic Routing Using
Fuzzy-Logic in VANET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 958
Israa A. Aljabry and Ghaida A. Al-Suhail
Trends and Techniques of Biomedical Text Mining: A Review . . . . . . . 968
Maliha Rashida, Fariha Iffath, Rezaul Karim,
and Mohammad Shamsul Arefin
Electric Vehicles as Distributed Micro Generation Using Smart
Grid for Decision Making: Brief Literature Review . . . . . . . . . . . . . . . . 981
Julieta Sanchez-García, Román Rodríguez-Aguilar,
and Jose Antonio Marmolejo-Saucedo
A Secured Network Layer and Information Security for Financial
Institutions: A Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 992
Md Rahat Ibne Sattar, Shrabonti Mitra, Sadia Sultana,
Umme Salma Pushpa, Dhruba Bhattacharjee, Abhijit Pathak,
and Mayeen Uddin Khandaker

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1003


Editors

Dr. Pandian Vasant


MERLIN Research Centre, TDTU, Vietnam
E-mail: eic.ijeoe@gmail.com
Pandian Vasant is Research Associate at MERLIN Research Centre, Vietnam, and
Editor in Chief of International Journal of Energy Optimization and Engineering
(IJEOE). He holds PhD in Computational Intelligence (UNEM, Costa Rica), MSc
(University Malaysia Sabah, Malaysia, Engineering Mathematics) and BSc (Hons,
Second Class Upper) in Mathematics (University of Malaya, Malaysia). His
research interests include soft computing, hybrid optimization, innovative com-
puting and applications. He has co-authored research articles in journals, conference proceedings and presentations, served as Guest Editor of special issues, and written book chapters (300 publications indexed in ResearchGate), and he was General Chair of the EAI International Conference on Computer Science and Engineering in Penang, Malaysia (2016) and Bangkok, Thailand (2018). In the years 2009 and 2015, he was awarded top reviewer and outstanding reviewer for the journal Applied Soft Computing (Elsevier). He has 30 years of working experience at universities. Currently, he is General Chair of the International Conference on Intelligent Computing and Optimization (https://www.icico.info/) and Member of AMS (USA), the NAVY Research Group (TUO, Czech Republic) and the MERLIN Research Centre (TDTU, Vietnam). Google Scholar h-index = 34; i10-index = 143.

Professor Ivan Zelinka


Technical University of Ostrava (VSB-TU), Faculty of Electrical Engineering and
Computer Science, Czech Republic
Email: zelinkaivan65@gmail.com
Ivan Zelinka is currently working at the Technical University of Ostrava (VSB-TU), Faculty of Electrical Engineering and Computer Science. He graduated successively from the Technical University in Brno (1995—MSc.), UTB in Zlin (2001—PhD) and again the Technical University in Brno (2004—Assoc. Prof.) and VSB-TU (2010—Professor). Before his academic career, he was employed as a TELECOM technician and computer specialist (HW+SW) and worked in a commercial bank (computer and LAN supervisor). During his career at UTB, he proposed and opened seven different lectures. He has also been invited to lecture at numerous universities in different EU countries and served as keynote speaker at the Global Conference on Power, Control and Optimization in Bali, Indonesia (2009), the Interdisciplinary Symposium on Complex Systems (2011) in Halkidiki, Greece, and IWCFTA 2012 in Dalian, China. The field of his expertise is mainly unconventional algorithms and cybersecurity. He is and has been the responsible supervisor of three fundamental research grants of the Czech grant agency GAČR and co-supervisor of the grant FRVŠ—laboratory of parallel computing. He has also worked on numerous grants and two EU projects, as a team member (FP5—RESTORM) and as supervisor of the Czech team (FP7—PROMOEVO), and has supervised international research (funded by the TACR agency) focused on the security of mobile devices (Czech—Vietnam). Currently, he is Professor at the Department of Computer Science and, in total, has supervised more than 40 MSc. and 25 BSc. diploma theses. He also supervises doctoral students, including students from abroad. He was awarded the Siemens Award for his PhD thesis, as well as an award from the journal Software News for his book on artificial intelligence. He is Member of the British Computer Society, Editor in Chief of the Springer book series Emergence, Complexity and Computation (http://www.springer.com/series/10624), Member of the editorial board of Saint Petersburg State University Studies in Mathematics, and Member of several international programme committees of various conferences and international journals. He is Author of journal articles as well as books in the Czech and English languages and one of three founders of the IEEE TC on big data (http://ieeesmc.org/about-smcs/history/2014-archives/44-about-smcs/history/2014/technical-committees/204-big-data-computing/). He is also Head of the research group NAVY (http://navy.cs.vsb.cz).

Professor Gerhard-Wilhelm Weber


Poznan University of Technology, Poznan, Poland
Email: gerhard-wilhelm.weber@put.poznan.pl
G.-W. Weber is Professor at Poznan University of Technology, Poznan, Poland, at
Faculty of Engineering Management, and Chair of Marketing and Economic
Engineering. His research is on OR, financial mathematics, optimization and con-
trol, neuro- and bio-sciences, data mining, education and development; he is
involved in the organization of scientific life internationally. He received his
Diploma and Doctorate in mathematics and economics/business administration, at
RWTH Aachen, and his Habilitation at TU Darmstadt. He held professorships by
proxy at the University of Cologne, and TU Chemnitz, Germany. At IAM, METU,
Ankara, Turkey, he was Professor in the programmes of Financial Mathematics and
Scientific Computing and Assistant to Director, and he has been Member of further
graduate schools, institutes, and departments of METU. Further, he has affiliations
at the universities of Siegen, Ballarat, Aveiro, North Sumatra, and Malaysia
University of Technology, and he is “Advisor to EURO Conferences”.

Professor Elias Munapo


North West University, South Africa
Email: Elias.Munapo@nwu.ac.za
Elias Munapo is Professor of Operations Research, and he holds a BSc. (Hons) in
Applied Mathematics (1997), MSc. in Operations Research (2002) and a PhD in
Applied Mathematics (2010). All these qualifications are from the National
University of Science and Technology (N.U.S.T.) in Zimbabwe. In addition, he has
a certificate in outcomes-based assessment in higher education and open distance
learning, from the University of South Africa (UNISA) and another certificate in
University Education Induction Programme from the University of KwaZulu-Natal
(UKZN). He is Professional Natural Scientist certified by the South African Council
for Natural Scientific Professions (SACNASP) in 2012. He has vast experience in
university education and has worked for five (5) institutions of higher learning. The
institutions are Zimbabwe Open University (ZOU), Chinhoyi University of
Technology (CUT), University of South Africa (UNISA), University of
KwaZulu-Natal (UKZN) and North-West University (NWU). He is currently
Professor at NWU. In addition to teaching, Professor Munapo was in charge of
research activities in the faculty and had the chance to manage over 100 doctoral
students and over 800 master's students. He has successfully supervised/co-supervised
ten doctoral students and over 20 master’s students to completion. He has published
over 100 research articles. Of these publications, one is a book, several are book
chapters and conference proceedings, and the majority are journal articles. In
addition, he has been awarded the North-West University Institutional Research
Excellence Award (IREA) thrice, is Editor of a couple of journals, has edited
several books and is Reviewer of a number of journals. He is Member of the
Operations Research Society of South Africa (was ORSSA—Executive Committee
Member in 2012 and 2013), South African Council for Natural Scientific
Professions (SACNASP) as Certified Natural Scientist, European Conference on
Operational Research (EURO) and the International Federation of Operations
Research Societies (IFORS). In addition, he is Member of the organizing committee
for ICO conference held every year.

Professor Jose Antonio Marmolejo


Panamerican University, Mexico
Email: jmarmolejo@up.edu.mx
Professor Jose Antonio Marmolejo is Professor at Panamerican University,
Mexico. His research is on operations research, large-scale optimization techniques,
computational techniques and analytical methods for planning, operations and
control of electric energy and logistic systems. He received his Doctorate in
Operations Research (Hons) at National Autonomous University of Mexico. At
present, he holds the third highest country-wide distinction granted by the Mexican
National System of Research Scientists for scientific merit (SNI Fellow, Level 1).

He is Member of the Network for Decision Support and Intelligent Optimization of


Complex and Large Scale Systems and Mexican Society for Operations Research.
He has co-authored research articles in science citation index journals, conference
proceedings, presentations and chapters.
Sustainable Artificial Intelligence
Applications
Low-Light Image Enhancement with Artificial
Bee Colony Method

Anan Banharnsakun

Computational Intelligence Research Laboratory (CIRLab), Computer Engineering Department, Faculty of Engineering at Sriracha, Kasetsart University Sriracha Campus, Chonburi 20230, Thailand
ananb@ieee.org

Abstract. Images taken in low-light environments tend to show incomplete detail because most of the information is masked in low-visibility areas, which is the major factor that deteriorates image quality. Improving the clarity of the image to reveal its complete detail remains a challenging task for researchers. Moreover, a good quality image is essential for image processing tasks in various fields such as medical imaging, remote sensing, and computer vision applications. To improve the visual quality of low-light images, an effective image enhancement technique based on optimizing gamma correction by using the artificial bee colony algorithm is proposed in this work. Quality evaluation methods for enhanced images obtained from the proposed technique are detailed, and comparisons of the proposed technique with other recent techniques are presented. Experiments show the effectiveness of the proposed technique, which can serve as an alternative method for low-light image enhancement.

Keywords: Artificial Bee Colony (ABC) · Image contrast enhancement · Gamma correction · Entropy · Grayscale image

1 Introduction

Since contrast is one of many factors that are important in determining image quality,
enhancement of image contrast is thus one of the most important processes in image
processing, which is used in many fields of science and engineering, such as medical
image analysis for therapeutic, aerial image processing, remote sensing, and computer
vision applications [1]. The good quality of an image resulting from proper contrast
helps humans to better perceive and understand the image and also makes it easier to
take advantage of the image in other automated image processing tasks. However, it is
often found that the resulting image has too low or too high contrast when the image is
captured from an under-lit or overexposed environment. Thus, enhancing image con-
trast has been an attractive and challenging task for researchers in recent times.
Over the past decades, many methods have been proposed, such as gray trans-
formation methods, histogram equalization methods, and frequency-domain methods
[2]. An image enhancement method using an adaptive sigmoid transfer function was proposed
by Srinivas and Bhandari [3] to preserve the naturalness and bright region information

effectively with minimum distortions of low light images. To improve the overall visual
quality of images, the contrast enhancement approach using texture regions-based
histogram equalization was introduced by Singh et al. [4]. Their idea is based on
suppressing the impact of pixels in non-textured areas and exploiting texture features
for the computation of histogram in the process of histogram equalization. A contrast
enhancement method based on dual-tree complex wavelet transform was presented by
Jung et al. [5]. A logarithmic function was employed in their work for global brightness
enhancement based on the nonlinear response of human vision to luminance and the
local contrast was enhanced by contrast limited adaptive histogram equalization in low-
pass sub-bands to make image structure clearer.
Although the existing algorithms can effectively enhance low-light images and
achieve good results, they all have certain disadvantages [6], such as loss of detail,
color distortion, or high computational complexity, and they cannot guarantee the
performance of a vision system in a low-light environment. Thus, developing an
effective method for low-light image enhancement still remains a challenge.
Over the past two decades, biology-inspired algorithms in previous research have
shown great potential to deal with many problems in science and engineering [7, 8].
Particularly, there are a number of techniques that use biologically inspired algorithms
to deal with problems in the image enhancement domain [9]. A contrast enhancement
method based on genetic algorithm (GA) was proposed by Hashemi et al. [10]. Their
proposed method is based on using a simple chromosome structure and genetic
operators to increase the visible details and contrast of low illumination images. To
increase the information content and enhance the details of an image, swarm
intelligence-based particle swarm optimization (PSO) was employed by Kanmani and
Narsimhan [11] in order to estimate an optimal gamma value in the gamma correction
approach. A bat algorithm for optimizing the control parameters of contrast stretching
transformation was introduced by Asokan et al. [12] to preserve the brightness levels in
the satellite image processing. However, no single technique satisfies all image enhancement needs, which continues to motivate researchers to propose new algorithms that enhance image quality more efficiently.
The Artificial Bee Colony (ABC) method proposed by Karaboga [13] is one of
many popular methods used to find optimal solutions to numerical optimization
problems [14]. The ABC method mimics the natural process of obtaining good quality
food for bees. Previous research [15–17] has shown that the ABC method can be used
to find the optimal solutions to a wide range of optimization problems with more
efficiency and effectiveness as compared to other methods. In this work, we consider image enhancement as an optimization problem and solve it using the ABC method. The contribution of this work is to show that the ABC method, a simple and efficient method inspired by bee foraging in nature, can be applied as a useful method in the image enhancement domain.
The remainder of the paper is organized as follows. The background and knowl-
edge, including image contrast and the artificial bee colony algorithm, are briefly
described in Sect. 2. Enhancement of image contrast by using ABC is proposed in
Sect. 3. Experimental settings and results are presented and discussed in Sect. 4.
Finally, Sect. 5 concludes this paper.

2 Image Contrast

Contrast [18] is the difference in luminance (darkest and lightest) or color presented in
an image that makes it possible to distinguish elements in the image within the same
field of view. An image with the proper contrast allows all the details contained in the
image to be seen. An example of a low-contrast image compared with a high-contrast image is illustrated in Fig. 1.

Fig. 1. A low contrast (left) and high contrast (right) image.

Let G be the number of gray levels, which specifies the size of the co-occurrence matrices; an example co-occurrence matrix M_{d,\theta}(i,j), obtained with spatial distance parameter d = 1 and angle \theta = 0°, can be constructed as shown in Fig. 2.
Let m_{d,\theta}(i,j) be the number of occurrences of gray values i and j being neighbors at distance d in the direction \theta; thus, the probability distribution (P_{ij}) of an entry m_{d,\theta}(i,j) in M_{d,\theta}(i,j) can be expressed by Eq. (1).

P_{ij} = \frac{m_{d,\theta}(i,j)}{\sum_{i=0}^{G-1} \sum_{j=0}^{G-1} m_{d,\theta}(i,j)}    (1)

Contrast is a variance of the gray levels, determined by measuring the intensity difference between a pixel and its neighbor over the whole image. It can be calculated by Eq. (2).

\text{Contrast} = \sum_{i=0}^{G-1} \sum_{j=0}^{G-1} (i-j)^2 \, P_{ij}    (2)

Fig. 2. Example of co-occurrence matrix construction [19]. Left: matrix representation of a grayscale image of size 6 × 6 with gray levels 0 to 7 (G = 8). Right: co-occurrence matrix M_{1,0}(i,j) of size 8 × 8.
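To make the computation concrete, the following is a minimal C++ sketch (not part of the original paper; the function name is illustrative) that builds the d = 1, \theta = 0° co-occurrence matrix of Eq. (1) and evaluates the contrast of Eq. (2) for a row-major grayscale image quantized to G levels:

#include <vector>

// Contrast of Eq. (2) computed from the d = 1, theta = 0 deg co-occurrence
// matrix of Eq. (1). 'img' holds gray values in [0, G-1], stored row-major.
double glcmContrast(const std::vector<int>& img, int rows, int cols, int G) {
    std::vector<double> M(G * G, 0.0);           // co-occurrence counts m_{1,0}(i,j)
    double total = 0.0;
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c + 1 < cols; ++c) {     // horizontal neighbor pairs
            M[img[r * cols + c] * G + img[r * cols + c + 1]] += 1.0;
            total += 1.0;
        }
    double contrast = 0.0;
    for (int i = 0; i < G; ++i)
        for (int j = 0; j < G; ++j) {
            double Pij = M[i * G + j] / total;   // probability of Eq. (1)
            contrast += (i - j) * (i - j) * Pij; // (i - j)^2 * P_ij, Eq. (2)
        }
    return contrast;
}

For the 6 × 6 example of Fig. 2, rows = cols = 6 and G = 8.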

The classical methods used to enhance the image contrast based on the histogram
equalization technique have been proposed in previous literature. However, they do not provide satisfying results for images that suffer from gamma distortion [20].
Gamma correction [21], one of the histogram modification techniques, is considered as
an appropriate method to solve the gamma distortion issue. The transformed gamma
correction (TGC) of an image is calculated by Eq. (3).

T_{GC} = I_{max} \left( \frac{I_{in}}{I_{max}} \right)^{\gamma}    (3)

where I_{in} is the actual intensity value of a pixel of the input image and I_{max} is the maximum intensity value of the input image.
The intensity value of each pixel is transformed using Eq. (3) by substituting the gamma value (\gamma). The gamma value ranges between 0 and infinity. A gamma value of 1 means that the resulting output image is the same as the input image; a gamma value less than 1.0 brightens the image, and a gamma value greater than 1.0 darkens it. However, using a fixed gamma value for different types of images produces the same change in intensity for all of them. Therefore, it is necessary to select the optimal gamma value for each image in order to obtain a better quality image. Optimizing the gamma value can thus be considered as an optimization problem.
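As an illustration of Eq. (3), the following is a minimal C++ sketch (not from the paper; the function name and the 8-bit buffer layout are assumptions) that applies gamma correction to a grayscale image:

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Transformed gamma correction of Eq. (3): TGC = Imax * (Iin / Imax)^gamma,
// where Imax is the maximum intensity of the input image.
std::vector<unsigned char> gammaCorrect(const std::vector<unsigned char>& in,
                                        double gamma) {
    unsigned char imax = *std::max_element(in.begin(), in.end());
    if (imax == 0) return in;                    // fully black image: nothing to scale
    std::vector<unsigned char> out(in.size());
    for (std::size_t k = 0; k < in.size(); ++k) {
        double t = static_cast<double>(in[k]) / imax;
        out[k] = static_cast<unsigned char>(imax * std::pow(t, gamma) + 0.5);
    }
    return out;
}

With gamma < 1.0 the output is brighter than the input and with gamma > 1.0 it is darker, matching the behavior described above.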

3 Enhancement of Image Contrast by Using ABC

In order to discover the optimal gamma value in the image enhancement process effectively, the use of the ABC method is proposed. The gamma value is considered as the parameter to be optimized in the optimization process of the proposed ABC method. In other words, the objective is to
find the optimal gamma value that maximizes the fitness function, as proposed in
Eq. (4). The applied ABC algorithm is illustrated in Fig. 3.

\arg\max_{\gamma}(\text{Fitness}) = \frac{1 + E - 0.01\,(SF - 4E)^2}{2}    (4)
where SF is spatial frequency and E is the entropy of the image.
Spatial frequency (SF) [22] is used to measure the overall activity level of an image and can be used to reflect the clarity of an image [23]. For an M × N image block I, with gray values I(i,j) at position (i,j), the SF is defined as follows:
SF = \sqrt{(RF)^2 + (CF)^2 + (MDF)^2 + (SDF)^2}    (5)

where RF, CF, MDF, SDF are the four first-order gradients along four directions
defined as:
RF = \sqrt{\frac{1}{M \cdot N} \sum_{i=1}^{M} \sum_{j=2}^{N} \left[ I(i,j) - I(i,j-1) \right]^2}    (6)

CF = \sqrt{\frac{1}{M \cdot N} \sum_{j=1}^{N} \sum_{i=2}^{M} \left[ I(i,j) - I(i-1,j) \right]^2}    (7)

MDF = \sqrt{w_d \cdot \frac{1}{M \cdot N} \sum_{i=2}^{M} \sum_{j=2}^{N} \left[ I(i,j) - I(i-1,j-1) \right]^2}    (8)

SDF = \sqrt{w_d \cdot \frac{1}{M \cdot N} \sum_{j=1}^{N-1} \sum_{i=2}^{M} \left[ I(i,j) - I(i-1,j+1) \right]^2}    (9)

where the distance weight w_d is \frac{1}{\sqrt{2}}.


The entropy (E) of the image is defined as follows:

E = -\sum_{i=0}^{255} P_i \log_2(P_i)    (10)

where P_i is the probability of occurrence of the ith intensity level of the image.
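For concreteness, the following minimal C++ sketch (not from the paper; function names are illustrative) evaluates the entropy of Eq. (10), the spatial frequency of Eqs. (5)-(9), and the fitness of Eq. (4) as reconstructed above (the exact placement of the constants in Eq. (4) is an assumption recovered from the garbled original) for an M × N 8-bit image stored row-major:

#include <cmath>
#include <vector>

// Entropy of Eq. (10): E = -sum_i P_i log2(P_i) over the 256 intensity levels.
double entropy(const std::vector<unsigned char>& img) {
    double hist[256] = {0.0};
    for (unsigned char v : img) hist[v] += 1.0;
    double E = 0.0;
    for (int i = 0; i < 256; ++i) {
        double p = hist[i] / img.size();
        if (p > 0.0) E -= p * std::log2(p);
    }
    return E;
}

// Spatial frequency of Eq. (5) built from the four gradients of Eqs. (6)-(9).
double spatialFrequency(const std::vector<unsigned char>& img, int M, int N) {
    const double wd = 1.0 / std::sqrt(2.0);      // distance weight of Eqs. (8)-(9)
    auto at = [&](int i, int j) { return static_cast<double>(img[i * N + j]); };
    double rf = 0, cf = 0, mdf = 0, sdf = 0, d = 0;
    for (int i = 0; i < M; ++i)                  // row frequency, Eq. (6)
        for (int j = 1; j < N; ++j) { d = at(i, j) - at(i, j - 1); rf += d * d; }
    for (int j = 0; j < N; ++j)                  // column frequency, Eq. (7)
        for (int i = 1; i < M; ++i) { d = at(i, j) - at(i - 1, j); cf += d * d; }
    for (int i = 1; i < M; ++i)                  // main diagonal, Eq. (8)
        for (int j = 1; j < N; ++j) { d = at(i, j) - at(i - 1, j - 1); mdf += d * d; }
    for (int i = 1; i < M; ++i)                  // secondary diagonal, Eq. (9)
        for (int j = 0; j + 1 < N; ++j) { d = at(i, j) - at(i - 1, j + 1); sdf += d * d; }
    double mn = static_cast<double>(M) * N;
    return std::sqrt((rf + cf + wd * mdf + wd * sdf) / mn);  // Eq. (5)
}

// Fitness of Eq. (4), to be maximized over the gamma value.
double fitness(const std::vector<unsigned char>& img, int M, int N) {
    double E = entropy(img);
    double SF = spatialFrequency(img, M, N);
    return (1.0 + E - 0.01 * (SF - 4.0 * E) * (SF - 4.0 * E)) / 2.0;
}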



Fig. 3. ABC algorithm flowchart for finding the optimal gamma value in image enhancement (start → initialize solutions for employed bees → update new solutions for employed bees and evaluate fitness → each onlooker bee selects a solution from the employed bees, updates it and evaluates fitness → abandoned solutions are re-generated by scout bees → repeat until the criterion is satisfied → show global solution).

As seen in Fig. 3, first, the initial solutions (gamma value) which are treated as the
food source positions are generated by randomization for the bee agents. After the food
source positions are generated, the bee agents in the artificial bee colony algorithm will
perform three major tasks including updating feasible food source positions by
employed bees, selecting feasible food source positions by onlooker bees, and avoiding
further unimproved food sources by scout bees. During the first task, each employed bee searches for a new food source position by using Eq. (11). This update is based on comparing the bee's own food source position with that of a randomly selected neighboring bee's food source position.
 
v_{ij} = x_{ij} + \phi_{ij} \left( x_{ij} - x_{kj} \right)    (11)

where v_{ij} is a new feasible solution that is modified from its previous solution value (x_{ij}) based on a comparison with a randomly selected position from its neighboring solution (x_{kj}), \phi_{ij} is a random number in [-1, 1] that is used to randomly adjust the old solution to become a new solution in the next iteration, and k \in \{1, 2, 3, \dots, SN\} with k \neq i and j \in \{1, 2, 3, \dots, D\} are randomly chosen indexes. The difference between x_{ij} and x_{kj} is a difference of position in a particular dimension.
The old food source position in an employed bee’s memory will be replaced by a
new candidate food source position if the new position has a better fitness value.
Employed bees will return to their hive and share the fitness value of their new food
sources with the onlooker bees. In the second task, each onlooker bee selects one of the
proposed food sources depending on the fitness value obtained from the employed
bees. The probability that a proposed food source will be selected can be obtained from
Eq. (12) below:

P_i = \frac{fit_i}{\sum_{i=1}^{SN} fit_i}    (12)

where fit_i is the fitness value of food source i, which is calculated by using Eq. (4).

The probability of a proposed food source being selected by the onlooker bees
increases as the fitness value of the food source increases. After the food source is
selected, the onlooker bees will go to the selected food source and select a new
candidate food source position in the neighborhood of the selected food source. The
new candidate food source can be calculated and expressed by Eq. (11).
In the third task, any food source position that does not have an improved fitness
value will be abandoned and replaced by a new position that is randomly determined
by a scout bee. This helps avoid suboptimal solutions. The new random position
chosen by the scout bee will be calculated by Eq. (13) below:

x_{ij} = x_j^{min} + rand[0,1] \cdot (x_j^{max} - x_j^{min})   (13)

where $x_j^{min}$ and $x_j^{max}$ are the lower bound and the upper bound of the food source position in dimension $j$, respectively.
The number of iterations is defined as a termination criterion. The three major tasks
described above will be repeated until the number of iterations equals the determined
value.
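The three tasks can be assembled into the compact loop below, a sketch that assumes a fitness function to be maximized with positive values (required by the roulette selection of Eq. (12)); it reuses employed_bee_move() from above and is not the paper's C++ code.

```python
import numpy as np

def abc_search(fitness, lb, ub, SN=20, D=1, limit=20, max_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(lb, ub, size=(SN, D))            # random food sources
    fit = np.array([fitness(s) for s in x])
    trials = np.zeros(SN, dtype=int)
    for _ in range(max_iter):
        for i in range(SN):                          # task 1: employed bees
            v = employed_bee_move(x, i, lb, ub, rng)
            fv = fitness(v)
            if fv > fit[i]:
                x[i], fit[i], trials[i] = v, fv, 0
            else:
                trials[i] += 1
        p = fit / fit.sum()                          # task 2: Eq. (12) selection
        for _ in range(SN):                          # onlooker bees
            i = rng.choice(SN, p=p)
            v = employed_bee_move(x, i, lb, ub, rng)
            fv = fitness(v)
            if fv > fit[i]:
                x[i], fit[i], trials[i] = v, fv, 0
            else:
                trials[i] += 1
        i = int(np.argmax(trials))                   # task 3: scout bee, Eq. (13)
        if trials[i] > limit:
            x[i] = rng.uniform(lb, ub, size=D)
            fit[i], trials[i] = fitness(x[i]), 0
    return x[np.argmax(fit)]                         # best gamma value found
```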

4 Experimental Settings and Results

In this section, the performance evaluation of the proposed method for enhancing
image contrast is presented. In order to test the effectiveness of the proposed method,
the other approaches based on biologically inspired algorithms designed for image
enhancement, including the genetic algorithm (GA) [10], the particle swarm opti-
mization (PSO) [11], and the bat algorithm (BA) [12] were used for comparison with
the image enhancement results that were obtained from a number of different images.
The experiment was conducted on the standard image dataset obtained from the Low-
Light dataset (LOL) [24], as shown in Fig. 4. The contrast and the measure of enhancement (EME) [25] are used as indicators to compare the efficacy of the proposed method against the aforementioned methods; greater contrast and EME values indicate better image enhancement.


Fig. 4. LOL image set: (a) LOL1, (b) LOL2, (c) LOL3, (d) LOL4

All methods in this experiment were programmed in C++, and all experiments were
run on a PC with an Intel Core i7 CPU, 2.8 GHz and 16 GB memory. The number of
iterations for each method was set to 50. For the ABC methods, the number of
employed and onlooker bees was set to 20. For the parameter settings of the PSO
method, the number of particles was 20, and the parameters used in the PSO were
defined as: c1 = c2 = 2, ω = 0.7. For the parameter settings of the GA method, the
number of population (NP) was 20, the crossover probability (CR) was 0.3, and the
mutation rate (MR) was 0.15. For the parameter settings of the BA method, the number
of bats was 20, and the parameters used in the BA were defined as: fmax = 2, fmin = 0,
α = γ = 0.9. Note that these parameter settings were found to be appropriate for our
image sets in the preliminary study of this work.
Tables 1 and 2 show that the proposed method yields higher average contrast values than the GA, the PSO, and the BA methods, while the average EME results produced by the proposed method are also higher than those of the aforementioned methods. This indicates that the best quantitative evaluation results are achieved by our proposed
method. The improvement of the average contrast value using the proposed method
when compared to the GA, the PSO, and the BA methods was 31.46%, 26.26%, and
22.44%, respectively and the improvement of the average EME value using the pro-
posed method when compared to the GA, the PSO, and the BA methods was 5.64%,
4.43%, and 3.19%, respectively. Figure 5 illustrates a clear comparison of the results
obtained from the various algorithms being presented.
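The reported improvement percentages can be reproduced from the averages in Tables 1 and 2; for example, from the contrast averages in Table 1:

```python
ga, pso, ba, abc = 0.1602, 0.1668, 0.1720, 0.2106    # Table 1, "Average" row
for name, v in [("GA", ga), ("PSO", pso), ("BA", ba)]:
    print(f"vs {name}: {100 * (abc - v) / v:.2f}%")  # 31.46%, 26.26%, 22.44%
```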

Table 1. Proposed and existing methods contrast comparison with the LOL image set
Image GA PSO BA Proposed ABC
LOL1 0.0471 0.0499 0.0739 0.2043
LOL2 0.2589 0.2615 0.2652 0.2710
LOL3 0.1525 0.1648 0.1623 0.1692
LOL4 0.1821 0.1911 0.1867 0.1979
Average 0.1602 0.1668 0.1720 0.2106

Table 2. Proposed and existing methods EME comparison with the LOL image set
Image GA PSO BA Proposed ABC
LOL1 88.77 88.91 89.17 91.85
LOL2 28.43 28.75 29.21 30.80
LOL3 15.62 16.44 17.13 17.52
LOL4 25.23 25.76 26.28 26.77
Average 39.51 39.97 40.45 41.74

As shown in Fig. 5, the quality of the results obtained from image enhancement using the proposed method is noticeably higher than that of the other methods, and the detail in the enhanced image obtained by the proposed method is more clearly visible than in the images produced by the aforementioned methods. In addition, all of the enhanced images of the LOL image set processed by the proposed ABC method are illustrated in Fig. 6.


Fig. 5. Enhanced LOL1 image yielded from: (a) GA, (b) PSO, (c) BA, (d) proposed ABC


Fig. 6. Enhanced images of the LOL image set by using proposed ABC method: (a) LOL1,
(b) LOL2, (c) LOL3, (d) LOL4

5 Conclusions

In this work, image enhancement using an ABC-based gamma correction method is proposed. Success in developing an effective method for improving low-light images, in
which the parameter of the gamma correction is optimized in a supervised process by
the proposed ABC technique, is the major contribution of this work. In our experi-
ments, a detailed benchmarking between the proposed technique and other methods
using biologically inspired based algorithms, including the GA, the PSO, and the BA
methods is presented. The experiment was performed on the LOL image data set. The results show that the proposed technique is highly effective, delivering good results in terms of contrast and the EME. Therefore, it can be concluded that the proposed ABC method is a viable option for improving low-light images.

References
1. Gu, K., Zhai, G., Lin, W., Liu, M.: The analysis of image contrast: From quality assessment
to automatic enhancement. IEEE Trans. Cybern. 46(1), 284–297 (2015)
2. Park, S., Kim, K., Yu, S., Paik, J.: Contrast enhancement for low-light image enhancement:
A survey. IEIE Trans. Smart Process. Comput. 7(1), 36–48 (2018)

3. Srinivas, K., Bhandari, A.K.: Low light image enhancement with adaptive sigmoid transfer
function. IET Image Proc. 14(4), 668–678 (2019)
4. Singh, K., Vishwakarma, D.K., Walia, G.S., Kapoor, R.: Contrast enhancement via texture
region based histogram equalization. J. Mod. Opt. 63(15), 1444–1450 (2016)
5. Jung, C., Yang, Q., Sun, T., Fu, Q., Song, H.: Low light image enhancement with dual-tree
complex wavelet transform. J. Vis. Commun. Image Represent. 42, 28–36 (2017)
6. Wang, W., Wu, X., Yuan, X., Gao, Z.: An experiment-based review of low-light image
enhancement methods. IEEE Access 8, 87884–87917 (2020)
7. Yang, X.S.: Nature-inspired optimization algorithms: challenges and open problems.
J. Comput. Sci. 46, 101104 (2020)
8. Tzanetos, A., Dounias, G.: Nature inspired optimization algorithms or simply variations of
metaheuristics? Artif. Intell. Rev. 54(3), 1841–1862 (2020). https://doi.org/10.1007/s10462-
020-09893-8
9. Dhal, K.G., Ray, S., Das, A., Das, S.: A survey on nature-inspired optimization algorithms
and their application in image enhancement domain. Arch. Comput. Methods Eng. 26(5),
1607–1638 (2019)
10. Hashemi, S., Kiani, S., Noroozi, N., Moghaddam, M.E.: An image contrast enhancement
method based on genetic algorithm. Pattern Recogn. Lett. 31(13), 1816–1824 (2010)
11. Kanmani, M., Narsimhan, V.: An image contrast enhancement algorithm for grayscale
images using particle swarm optimization. Multimed. Tools Appl. 77(18), 23371–23387
(2018). https://doi.org/10.1007/s11042-018-5650-0
12. Asokan, A., Popescu, D.E., Anitha, J., Jude Hemanth, D.: Bat algorithm based non-linear
contrast stretching for satellite image enhancement. Geosciences 10(2), 78 (2020). https://
doi.org/10.3390/geosciences10020078
13. Karaboga, D.: An Idea Based on Honey Bee Swarm for Numerical Optimization. Technical
Report-TR06, Erciyes University, Engineering Faculty, Computer Engineering Department,
Turkey (2005)
14. Karaboga, D., Akay, B.: A comparative study of artificial bee colony algorithm. Appl. Math.
Comput. 214(1), 108–132 (2009)
15. Karaboga, D., Gorkemli, B., Ozturk, C., Karaboga, N.: A comprehensive survey: artificial
bee colony (ABC) algorithm and applications. Artif. Intell. Rev. 42(1), 21–57 (2012). https://
doi.org/10.1007/s10462-012-9328-0
16. Banharnsakun, A.: Artificial bee colony algorithm for solving the knight’s tour problem. In:
Proceedings of the International Conference on Intelligent Computing & Optimization 2018,
pp. 129–138 (2018)
17. Banharnsakun, A.: Feature point matching based on ABC-NCC algorithm. Evol. Syst. 9(1),
71–80 (2017). https://doi.org/10.1007/s12530-017-9183-y
18. Manjunath, B.S., Ma, W.Y.: Texture features for browsing and retrieval of image data. IEEE
Trans. Pattern Anal. Mach. Intell. 18(8), 837–842 (1996)
19. Banharnsakun, A.: Artificial bee colony algorithm for content-based image retrieval.
Comput. Intell. 36(1), 351–367 (2020)
20. Amiri, S.A., Hassanpour, H.: A preprocessing approach for image analysis using gamma
correction. Int. J. Comput. Appl. 38(12), 38–46 (2012)
21. Huang, S.C., Cheng, F.C., Chiu, Y.S.: Efficient contrast enhancement using adaptive gamma
correction with weighting distribution. IEEE Trans. Image Process. 22(3), 1032–1041 (2012)
22. Eskicioglu, A.M., Fisher, P.S.: Image quality measures and their performance. IEEE Trans.
Commun. 43(12), 2959–2965 (1995)
23. Li, S., Kwok, J.T., Wang, Y.: Combination of images with diverse focuses using the spatial
frequency. Information Fusion 2(3), 169–176 (2001)

24. Wei, C., Wang, W., Yang, W., Liu, J.: Deep retinex decomposition for low-light
enhancement. In: Proceedings of British Machine Vision Conference 2018, pp. 127–136
(2018)
25. Agaian, S.S., Panetta, K., Grigoryan, A.M.: A new measure of image enhancement. In:
Proceedings of IASTED International Conference on Signal Processing & Communication,
pp. 19–22 (2000)
Optimal State-Feedback Controller Design
for Tractor Active Suspension System
via Lévy-Flight Intensified Current Search
Algorithm

Thitipong Niyomsat1, Wattanawong Romsai2, Auttarat Nawikavatan3, and Deacha Puangdownreong3
1 Department of Industrial Engineering, Rajapark Institute, Bangkok, Thailand
2 National Telecom Public Company Limited: NT, Bangkok, Thailand
3 Department of Electrical Engineering, Southeast Asia University, Bangkok, Thailand
deachap@sau.ac.th

Abstract. This paper proposes the optimal state-feedback controller design for
the tractor active suspension system via the Lévy-flight intensified current search
(LFICuS) algorithm. As one of the newest and most efficient metaheuristic
optimization search techniques, the LFICuS algorithm is formed from the
behavior of the electrical current in the electric networks associated with the
random drawn from the Lévy-flight distribution, the adaptive radius
(AR) mechanism and the adaptive neighborhood (AN) mechanism. In this
paper, the LFICuS algorithm is applied to optimally design the state-feedback
controller for the tractor active suspension system to eliminate the transmitted
vibrations to the driver’s cabin caused by road roughness. Results obtained by
the LFICuS algorithm will be compared with those obtained by the conventional
pole-placement method. As results, the LFICuS algorithm can successfully
provide the optimal state-feedback controller for the tractor active suspension
system. The tractor active suspension system controlled by the state-feedback
controller designed by the LFICuS algorithm yields very satisfactory response
with smaller oscillation and faster regulating time than that designed by the pole-
placement method, significantly.

Keywords: State-feedback controller · Tractor active suspension · Lévy-flight intensified current search · Metaheuristic optimization

1 Introduction

Thailand is one of the agricultural countries located in Southeast Asia region. It has an
area of approximately 513,000 km2 (approximately 321 million Rais) and a population
of over 66 million people. More than 9 million Thais are farmers who usually use
tractors in their fields of rice, corn, cassava, sugar cane, palm and rubber, etc., on area
of approximately 220,542 km2 (approximately 138 million Rais) or approximately
43% of overall area [1].


A tractor is one of the heavy-duty machines and vehicles commonly used on farms. Using tractors for long periods harms drivers due to sustained, intense vibration. Mechanical vibration is transmitted to the tractor's driver by the unevenness of the road or soil profile. Moving elements within the machine or its devices can also cause harmful physiological and psychological effects. Normally, the endurance limit of the human body for vertical acceleration lies in the range of 4–8 Hz, with a root-mean-square (RMS) acceleration of less than 1 m/s² [2–4]. During ploughing and harrowing periods,
farm tractors’ drivers are subjected to such vibrations. With continuous exposure to
whole body vibration, it can cause severe discomfort and injuries including low back
and pain disorders, hernia, abscess, colon, testicular and prostate cancers. Vehicle
active suspension systems are needed in modern tractors to improve both ride quality
and handling vibration performance. Following the literatures, the active suspension
control system for road vehicles has been quite challenging over the last 20 years [5].
Various control strategies have been proposed for such the systems, such as linear
quadratic regulation (LQR) [6], robust control [7], sliding mode control [8] and state-
feedback control [9].
Recently, the control system design has been changed from the conventional
method to modern optimization using the potential metaheuristic search technique as an
optimizer [10]. One of the newest and most efficient metaheuristic optimization search
techniques is the Lévy-flight intensified current search (LFICuS) algorithm formed
from the flowing behavior of the electrical current in the electric networks [11]. The
LFICuS algorithm utilizes the random drawn from the Lévy-flight distribution for
generating the elite solutions in each search round. In addition, it possesses the adaptive
radius (AR) mechanism and the adaptive neighborhood (AN) mechanism to speed up
the search process. The LFICuS algorithm has performed the effectiveness against
many benchmark functions [11] and applied to design the proportional-integral-
derivative (PID) controller for the car active suspension system [12], the PID controller
for the brushless direct current (BLDC) motor speed control system [13] and the PID
controller for antenna azimuth position control system [14].
In this paper, the LFICuS algorithm is applied to design the optimal state-feedback
controller for the tractor active suspension system based on the state-space model
representation and modern optimization. This paper consists of five sections. After an
introduction is given in Sect. 1, the dynamic model of the tractor active suspension
system is described in Sect. 2. Problem formulation of the LFICuS-based state-
feedback controller design optimization is illustrated in Sect. 3. Results and discussions
are detailed in Sect. 4. Finally, conclusions are provided in Sect. 5.

2 Dynamic Model of Tractor Active Suspension System

The tractor active suspension system can be represented by the schematic diagram as
shown in Fig. 1. The front and rear suspensions are lumped by single wheel and axle
connected to the quarter portion of the tractor body through an active spring-damper
combination, where M1 is the tractor mass, M2 is the suspension mass, xs is the
displacement of tractor body, xw is the displacement of the suspension mass, k1 and k2
are the spring coefficients, and b1 and b2 are the damper coefficients, respectively.

Fig. 1. Schematic diagram of tractor active suspension system (modified from [9]).

Based on Newton's law, the equation of vertical motion of the tractor active
suspension system shown in Fig. 1 can be formulated as expressed in (1) and (2),
where u is the control force from the actuator regarded as the input of the system and
r is the road disturbance.

\frac{d^2 x_s}{dt^2} = \frac{1}{M_1}\left[ b_1\left(\frac{dx_w}{dt} - \frac{dx_s}{dt}\right) + k_1(x_w - x_s) + u \right]   (1)

\frac{d^2 x_w}{dt^2} = \frac{1}{M_2}\left[ b_1\left(\frac{dx_s}{dt} - \frac{dx_w}{dt}\right) + k_1(x_s - x_w) + b_2\left(\frac{dr}{dt} - \frac{dx_w}{dt}\right) + k_2(r - x_w) - u \right]   (2)
\begin{aligned}
\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \\ \dot{x}_3 \\ \dot{x}_4 \end{bmatrix} &=
\begin{bmatrix}
-\frac{M_1(b_1+b_2)+b_1 M_2}{M_1 M_2} & -\frac{M_1(k_1+k_2)+M_2 k_1+b_1 b_2}{M_1 M_2} & -\frac{b_1 k_2+b_2 k_1}{M_1 M_2} & -\frac{k_1 k_2}{M_1 M_2} \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
+
\begin{bmatrix} 1 & 1 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} u \\ r \end{bmatrix} \\
\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} &=
\begin{bmatrix}
0 & \frac{M_1+M_2}{M_1 M_2} & \frac{b_2}{M_1 M_2} & \frac{k_2}{M_1 M_2} \\
0 & \frac{M_1 b_2}{M_1 M_2} & \frac{M_1 k_2}{M_1 M_2} & 0
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
\end{aligned}
\tag{3}

Regarding the modern control system, the state-space model representation can be formulated from (1) and (2) as stated in (3), where $x_1 = x_s$, $x_2 = \dot{x}_s$, $x_3 = y$ and $x_4 = \dot{y}$ are the state variables, $[u\ r]^T$ are the input variables, $y = (x_s - x_w)$ is the output variable, $y_1 = y$ is the output depending on the control force $u$ (with $r = 0$), and $y_2 = y$ is the output depending on the road disturbance $r$ (with $u = 0$) [9].
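As an illustration of the companion-form realization in (3), the following SciPy sketch builds the matrices and computes the open-loop response of y1 to the control-force input. The parameter values are the Kubota M110X figures reported in Sect. 4, and the code is a reconstruction for illustration, not the authors' MATLAB script.

```python
import numpy as np
from scipy import signal

M1, M2 = 700.0, 90.0                     # tractor and suspension masses [kg]
k1, k2 = 62_000.0, 570_000.0             # spring coefficients [N/m]
b1, b2 = 500.0, 22_500.0                 # damper coefficients [N.s/m]
d = M1 * M2

# Companion-form A of Eq. (3): first row carries the characteristic coefficients
A = np.array([
    [-(M1*(b1 + b2) + b1*M2)/d, -(M1*(k1 + k2) + M2*k1 + b1*b2)/d,
     -(b1*k2 + b2*k1)/d,        -(k1*k2)/d],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0]])
B = np.array([[1.0], [0.0], [0.0], [0.0]])        # control-force channel u
C1 = np.array([[0.0, (M1 + M2)/d, b2/d, k2/d]])   # y1 = xs - xw w.r.t. u

sys_u = signal.StateSpace(A, B, C1, [[0.0]])
t, y1 = signal.step(sys_u)               # open-loop suspension deflection
```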

3 Problem Formulation

From the state-space model of the tractor active suspension system in (3), a new state variable $x_5 = \int y\,dt$ is added into the system model to achieve zero dynamics. Once the system response reaches the steady-state interval, this integral action will produce zero steady-state error. The closed-loop state-space model representation for the full-state feedback controller is given in (4) [9], where $[x_1, x_2, x_3, x_4, x_5]^T = [x_s,\ \dot{x}_s,\ y = (x_s - x_w),\ \dot{y},\ \int y\,dt]^T$. The model in (4) shows that after the tractor tire is subjected to the road disturbance, $x_3 = y = (x_s - x_w)$ will ultimately reach the equilibrium point.
\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \\ \dot{x}_3 \\ \dot{x}_4 \\ \dot{x}_5 \end{bmatrix}
= \left(
\begin{bmatrix}
0 & 1 & 0 & 0 & 0 \\
-\frac{b_1 b_2}{M_1 M_2} & 0 & \frac{b_1}{M_1}\left(\frac{b_1}{M_1}+\frac{b_1}{M_2}+\frac{b_2}{M_2}\right)-\frac{k_1}{M_1} & -\frac{b_1}{M_1} & 0 \\
\frac{b_2}{M_2} & 0 & -\left(\frac{b_1}{M_1}+\frac{b_1}{M_2}+\frac{b_2}{M_2}\right) & 1 & 0 \\
\frac{k_2}{M_2} & 0 & -\left(\frac{k_1}{M_1}+\frac{k_1}{M_2}+\frac{k_2}{M_2}\right) & 0 & 0 \\
0 & 0 & 1 & 0 & 0
\end{bmatrix}
- \begin{bmatrix} 0 \\ \frac{1}{M_1} \\ 0 \\ \frac{M_1+M_2}{M_1 M_2} \\ 0 \end{bmatrix}
\begin{bmatrix} K_1 & K_2 & K_3 & K_4 & K_5 \end{bmatrix}
\right)
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix}
+ \begin{bmatrix}
0 & 0 \\
\frac{1}{M_1} & \frac{b_1 b_2}{M_1 M_2} \\
0 & -\frac{b_2}{M_2} \\
\frac{M_1+M_2}{M_1 M_2} & -\frac{k_2}{M_2} \\
0 & 0
\end{bmatrix}
\begin{bmatrix} u \\ r \end{bmatrix}

y = \begin{bmatrix} 0 & 0 & 1 & 0 & 0 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix}
\tag{4}
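A sketch of how the augmented matrices of (4) can be assembled and the loop closed through the force channel is given below; the gains and parameter values are those reported in Sect. 4, stability being checked from the closed-loop eigenvalues. This is a reconstruction for illustration, not the authors' code.

```python
import numpy as np

def augmented_model(M1, M2, k1, k2, b1, b2):
    """Five-state matrices of Eq. (4); x5 integrates y for zero steady-state error."""
    a = b1/M1 + b1/M2 + b2/M2
    Aa = np.array([
        [0.0, 1.0, 0.0, 0.0, 0.0],
        [-(b1*b2)/(M1*M2), 0.0, (b1/M1)*a - k1/M1, -b1/M1, 0.0],
        [b2/M2, 0.0, -a, 1.0, 0.0],
        [k2/M2, 0.0, -(k1/M1 + k1/M2 + k2/M2), 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0, 0.0]])
    Ba = np.array([
        [0.0, 0.0],
        [1.0/M1, (b1*b2)/(M1*M2)],
        [0.0, -b2/M2],
        [(M1 + M2)/(M1*M2), -k2/M2],
        [0.0, 0.0]])
    return Aa, Ba

Aa, Ba = augmented_model(700.0, 90.0, 62_000.0, 570_000.0, 500.0, 22_500.0)
K = np.array([250.0, 500.0, 300.0, 200.0, 150.0])  # pole-placement gains (Sect. 4)
Acl = Aa - np.outer(Ba[:, 0], K)                   # close the loop via the u column
print(np.linalg.eigvals(Acl).real.max() < 0)       # True if the loop is stable
```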

The control objective of the tractor active suspension system is to create control
force u from the actuator in such a way that the output y will be able to regulate the road
disturbance with smallest overshoot and shortest regulating time. With this control
objective, the sum-squared error (SSE) between the input r (r = 0) and the output y is
set as the objective function f(K) of the LFICuS-based state-feedback controller design
optimization as expressed in (5), where N is the number of data. The objective function
f(K) in (5) will be minimized by the LFICuS algorithm by searching for the optimal

five values of gain K = [K1, K2, K3, K4, K5] in (4) within their corresponding
boundaries and giving very satisfactory responses to meet the inequality constraints as
stated in (6), where Mp is the maximum percent overshoot, Mp_max is the maximum
allowance of Mp, treg is the regulating time, treg_max is the maximum allowance of treg,
ess is the steady-state error, ess_max is the maximum allowance of ess, K1_min and K1_max
are the boundaries of K1, K2_min and K2_max are the boundaries of K2, K3_min and K3_max
are the boundaries of K3, K4_min and K4_max are the boundaries of K4, K5_min and K5_max
are the boundaries of K5, respectively.

X
N X
N
Min f ðKÞ ¼ ½ri  yi 2 ¼ ½yi 2 ð5Þ
i¼1 i¼1
9
Subject to Mp  Mp max ; >
>
treg  treg max ; >
>
>
>
>
>
ess  ess max ; >
>
=
K1 min  K1  K1 max ;
ð6Þ
K2 min  K2  K2 max ; >
>
>
>
K3 min  K3  K3 max ; >
>
>
>
K4 min  K4  K4 max ; >
>
;
K5 min  K5  K5 max
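A sketch of how the constrained objective (5)–(6) can be evaluated inside a metaheuristic follows. The penalty treatment and the response_metrics() helper are assumptions for illustration, since the paper does not spell out its constraint handling.

```python
import numpy as np

def objective(K, simulate, lb, ub, Mp_max, treg_max, ess_max, penalty=1e9):
    """Eq. (5) SSE with the Eq. (6) constraints folded in as penalties."""
    K = np.asarray(K, dtype=float)
    if np.any(K < lb) or np.any(K > ub):         # gain bounds of Eq. (6)
        return penalty
    t, y = simulate(K)                           # regulation response, r = 0
    f = float(np.sum(y ** 2))                    # sum-squared error, Eq. (5)
    Mp, treg, ess = response_metrics(t, y)       # hypothetical helper
    for violated in (Mp > Mp_max, treg > treg_max, abs(ess) > ess_max):
        if violated:
            f += penalty
    return f
```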

The LFICuS algorithm [11–14] uses random numbers drawn from the Lévy-flight distribution to generate the neighborhood members as the elite solutions in each iteration. The Lévy-flight random distribution L can be calculated by (7), where s is the step length, λ is an index and Γ(λ) is the Gamma function expressed in (8). Also, the AR and AN mechanisms are conducted in the LFICuS algorithm by reducing the search radius R and the number of neighborhood members n to speed up the search process. The LFICuS algorithm for designing an optimal state-feedback controller of the tractor active suspension system can be described step-by-step as follows.

L \sim \frac{\lambda \Gamma(\lambda) \sin(\pi\lambda/2)}{\pi} \cdot \frac{1}{s^{1+\lambda}}   (7)

\Gamma(\lambda) = \int_0^{\infty} t^{\lambda-1} e^{-t}\, dt   (8)
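One practical way to draw steps from the law in (7)–(8) is Mantegna's algorithm, shown below. This is a standard sampler for Lévy flights and an assumption here, as the paper does not state which sampler it uses.

```python
import numpy as np
from math import gamma, sin, pi

def levy_step(lam=0.3, s=0.01, size=1, rng=None):
    """Heavy-tailed Lévy-flight steps via Mantegna's approximation."""
    rng = rng or np.random.default_rng()
    sigma = (gamma(1 + lam) * sin(pi * lam / 2) /
             (gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2))) ** (1 / lam)
    u = rng.normal(0.0, sigma, size)
    v = rng.normal(0.0, 1.0, size)
    return s * u / np.abs(v) ** (1 / lam)
```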

Step-0 Initialize the objective function f(K) in (5) and constraint functions in (6),
search space X = [K1_min, K1_max], [K2_min, K2_max], [K3_min, K3_max],
[K4_min, K4_max] and [K5_min, K5_max], memory lists (ML) W, Ck and
N = ∅, maximum allowance of solution cycling jmax, number of initial
solutions N, number of neighborhood members n, search radius R = X,
k = j = 1.
Step-1 Uniformly randomize initial solutions Xi = {K1, K2, K3, K4, K5} within X. Evaluate f(Xi) via (5) and (6), then rank and store Xi in W.
Step-2 Let x0 = Xk as selected initial solution. Set Xglobal = Xlocal = x0.
Optimal State-Feedback Controller Design for Tractor Active Suspension System 19

Step-3 Generate new solutions xi = {K1, K2, K3, K4 and K5} by Lévy-flight random
in (7) and (8) around x0 within R. Evaluate f(xi) via (5) and (6), and set the
best one as x*.
Step-4 If f(x*) < f(x0), keep x0 into Ck, update x0 = x* and set j = 1. Otherwise,
keep x* into Ck and update j = j+1.
Step-5 Activate AR mechanism by R = qR, 0 < q < 1 and invoke AN mechanism
by n = an, 0 < a < 1.
Step-6 If j ≤ jmax, go back to Step-3.
Step-7 Update Xlocal = x0 and keep Xglobal into N.
Step-8 If f(Xlocal) < f(Xglobal), update Xglobal = Xlocal.
Step-9 Update k = k+1 and set j = 1. Let x0=Xk as selected initial solution.
Step-10 If k ≤ N, go back to Step-2. Otherwise, stop the search process and report
the best solution Xglobal = {K1, K2, K3, K4 and K5} found.
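Condensing Steps 0–10, the following Python skeleton shows the overall flow with the AR/AN schedule described in Sect. 4 (radius cut to 25% and then 5% of the search space, members halved twice); it reuses levy_step() above and is a sketch, not the authors' MATLAB implementation.

```python
import numpy as np

def lficus(f, lb, ub, N=50, n=100, max_iter=200, lam=0.3, s=0.01, seed=0):
    rng = np.random.default_rng(seed)
    starts = rng.uniform(lb, ub, size=(N, len(lb)))         # Step-1
    starts = sorted(starts, key=f)                          # ranked memory list
    x_glob = starts[0]
    for x0 in starts:                                       # Step-2, per direction
        R, members = ub - lb, n
        for it in range(1, max_iter + 1):                   # Steps 3-6
            cand = np.clip(x0 + levy_step(lam, s, (members, len(lb)), rng) * R,
                           lb, ub)
            x_best = min(cand, key=f)                       # elite neighbor
            if f(x_best) < f(x0):
                x0 = x_best                                 # Step-4
            if it == 100:                                   # AR/AN state (i)
                R, members = 0.25 * (ub - lb), members // 2
            if it == 150:                                   # AR/AN state (ii)
                R, members = 0.05 * (ub - lb), members // 2
        if f(x0) < f(x_glob):                               # Steps 7-8
            x_glob = x0
    return x_glob                                           # Step-10
```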

4 Results and Discussions

Referring to model in (4), the numerical values of the suspension model parameters of
Kubota M110X tractor [9] are conducted as follow, M1 = 700 kg, M2 = 90 kg,
k1 = 62,000 N/m, k2 = 570,000 N/m, b1 = 500 N.s/m and b2 = 22,500 N.s/m. The
state-feedback controller for the Kubota M110X tractor active suspension system was
designed by the pole-placement method [9] as stated in (9).

K = [250  500  300  200  150]   (9)

To design an optimal state-feedback controller for the tractor active suspension system, the LFICuS algorithm was coded in MATLAB version 2017b (License No. #40637337) and run on an Intel(R) Core(TM) i7-10510U CPU @ 1.80 GHz (2.30 GHz) with 16.0 GB RAM. The road disturbances were simulated by the step function representing
the step road and the sinusoidal function representing the bumpy and pothole roads.
The search parameters of the LFICuS are set from the preliminary study, i.e., R (initial
search radius) = X (search spaces) = [K1_min, K1_max], [K2_min, K2_max], [K3_min,
K3_max], [K4_min, K4_max] and [K5_min, K5_max], step length s = 0.01, index λ = 0.3,
number of initial neighborhood members n = 100 and number of search directions
N = 50. Each search direction will be terminated by the maximum iteration (Max_Iter)
of 200. Number of states for activating the AR and AN mechanisms h = 2, state-(i): at
the 100th iteration, R = 25% of X and n = 50, state-(ii): at the 150th iteration, R = 5%
of X and n = 25. The constraint functions in (6) are set as stated in (10). 50-trials are
run to obtain the optimal values of the gain K = [K1, K2, K3, K4, K5].

Subject to:
M_p \le 10.00\%,\quad t_{reg} \le 5.00\ \text{s},\quad e_{ss} \le 0.01\%,
1{,}000 \le K_1 \le 5{,}000,\quad 1{,}000 \le K_2 \le 5{,}000,\quad 1{,}000 \le K_3 \le 5{,}000,
100 \le K_4 \le 1{,}000,\quad 100 \le K_5 \le 1{,}000   (10)

Once the search process over the 50 trial runs had finished, the LFICuS successfully provided the optimal state-feedback controller for the tractor active suspension system as stated in (11). The convergent rates over the 50 trial runs are plotted in Fig. 2.

K = [3,814.22  3,693.58  3,172.14  127.49  271.86]   (11)

Fig. 2. Convergent rates of LFICuS-based state-feedback controller design over 50-trial runs.

The step responses of the tractor active suspension system without controller
(passive suspension system), with the state-feedback controller designed by the pole-
placement method in (9) and with the state-feedback controller designed by the LFICuS
algorithm in (11) are depicted in Fig. 3, where the step road profile is plotted by thin-
solid blue line. It was found that, the tractor passive suspension system (thin-dotted
black line) yields great oscillation and slow response with Mp = 29.07%, treg = 9.78 s.
and ess = 0.00%. The tractor active suspension system with the state-feedback con-
troller designed by the pole-placement method in (9) (thin dash-dotted black line) gives
smaller oscillation and faster response than the passive suspension system with Mp =
6.02%, treg = 4.57 s. and ess = 0.00%. For the tractor active suspension system with
the state-feedback controller designed by the LFICuS algorithm (11) (thick-solid black
line), it provides smaller oscillation and faster response than the active suspension
system with the state-feedback controller designed by the pole-placement method with
Mp = 2.42%, treg = 2.16 s. and ess = 0.00%.
The sinusoidal responses of the tractor active suspension controlled system are
plotted in Fig. 4, where the sinusoidal road profile is plotted by thin-solid blue line. It
can be observed that, the tractor passive suspension system (thin-dotted black line)
yields great oscillation and slow response with Mp = 48.07%, treg = 9.69 s. and ess =
0.00%. The tractor active suspension system with the state-feedback controller
designed by the pole-placement method in (9) (thin dash-dotted black line) gives
smaller oscillation and faster response than the passive suspension system with Mp =
8.98%, treg = 4.53 s. and ess = 0.00%. For the tractor active suspension system with
the state-feedback controller designed by the LFICuS algorithm (11) (thick-solid black
line), it provides smaller oscillation and faster response than the active suspension
system with the state-feedback controller designed by the pole-placement method with
Mp = 7.54%, treg = 2.01 s. and ess = 0.00%.
From overall results in Fig. 3 and Fig. 4, it can be noticed that the tractor active
suspension system controlled by the state-feedback controller designed by the LFICuS
algorithm provides very satisfactory response superior to the tractor passive suspension
system and the tractor active suspension system controlled by the state-feedback
controller designed by the pole-placement method, significantly.

Fig. 3. Step responses of the tractor active suspension controlled system.


22 T. Niyomsat et al.

Fig. 4. Sinusoidal responses of the tractor active suspension controlled system.

5 Conclusions

The application of the Lévy-flight intensified current search (LFICuS) algorithm to


design an optimal state-feedback controller for the tractor active suspension system has
been proposed in this paper. Based on the modern optimization, the LFICuS algorithm
utilizing the random drawn from the Lévy-flight distribution as well as AR and AN
mechanisms has been applied to optimize the values of the feedback gains of the state-
feedback controller to eliminate the tractor vibrations due to road roughness. From
simulation results by MATLAB, the LFICuS algorithm could provide the optimal state-
feedback controller for the tractor active suspension system. In a comparison in which road disturbances (step and bumpy-pothole roads) were assumed to occur in the system, the tractor active suspension system controlled by the state-feedback controller designed by the LFICuS algorithm provided a very satisfactory response with smaller oscillation and faster regulating time than that designed by the pole-placement method, significantly. For future research, applications of the LFICuS algorithm will be
extended to other control system design optimization problems including the PIDA,
FOPID and FOPIDA controllers for more complicated real-world systems.

References
1. National Statistical Office Homepage. http://www.nso.go.th
2. Deprez, K., Moshou, D., Anthonnis, J., De Baerdemaeker, J., Ramon, H.: Improvement of
vibrational comfort on agricultural vehicles by passive and semiactive cabin suspensions.
Comput. Electron. Agric. 49, 431–440 (2005)
3. Muzammil, M., Siddiqui, S.S., Hasan, F.: Physiological effect of vibrations on tractor drivers
under variable ploughing conditions. J. Occup. Health 46, 403–409 (2004)

4. Scarlett, A.J., Price, J.S., Stayner, R.M.: Whole-body vibration: evaluation of emission and
exposure levels arising from agricultural tractors. J. Terramech. 44, 65–73 (2007)
5. Hrovat, D.: Survey of advanced suspension developments and related optimal control
applications. Automatica 33(10), 1781–1817 (1997)
6. Zhen, L., Cheng, L., Dewen, H.: Active suspension control design using a combination of
LQR and backstepping. In: 25th IEEE Chinese Control Conference, pp. 123–125. Harbin,
Heilongjiang, China (2006)
7. Yousefi, A., Akbari, A., Lohmann, B.: Low order robust controllers for active vehicle
suspensions. In: The IEEE International Conference on Control Applications, pp. 693–698,
Munich, Germany (2006)
8. Chamseddine, A., Noura, H., Raharijaona, T.: Control of linear full vehicle active
suspension system using sliding mode techniques. In: The IEEE International Conference on
Control Applications, pp. 1306–1311. Munich, Germany (2006)
9. Shamshiri, R., Ismail, W.I.W.: Design and analysis of full-state feedback controller for a
tractor active suspension: implications for crop yield. Int. J. Agric. Biol. 15, 909–914 (2013)
10. Zakian, V.: Control Systems Design: A New Framework. Springer-Verlag (2005)
11. Romsai, W., Leeart, P., Nawikavatan, A.: Lévy-flight intensified current search for
multimodal function minimization. In: The 2020 International Conference on Intelligent
Computing and Optimization (ICO’2020), pp. 597–606. Hua Hin, Thailand (2020)
12. Romsai, W., Nawikavatan, N., Puangdownreong, D.: Application of Lévy-flight intensified
current search to optimal PID controller design for active suspension system. Int. J. Innov.
Comput. Inf. Control 17(2), 483–497 (2021)
13. Leeart, P., Romsai, W., Nawikavatan, A.: PID controller design for BLDC motor speed
control system by Lévy-flight intensified current search. In: The 2020 International
Conference on Intelligent Computing and Optimization (ICO’2020), pp. 1176–1185. Hua
Hin, Thailand (2020)
14. Romsai, W., Lurang, K., Nawikavatan, A., Puangdownreong, D.: Optimal PID controller
design for antenna azimuth position control system by Lévy-flight intensified current search
algorithm. In: The 18th International Conference on Electrical Engineering/Electronics,
Computer, Telecommunications and Information Technology (ECTI-CON 2021), pp. 858–
861. Chiang Mai, Thailand (2021)
The Artificial Intelligence Platform
with the Use of DNN to Detect Flames: A Case
of Acoustic Extinguisher

Stefan Ivanov and Stanko Stankov

Department of Automation, Information and Control Systems, Technical University of Gabrovo, Hadji Dimitar 4, 5300 Gabrovo, Bulgaria
st_ivanov@abv.bg

Abstract. In practice, it is possible to combine an acoustic extinguishing of


flames with their detection using artificial intelligence, which is the main aim of
this article. This paper presents the possibility of using DNN (Deep Neural
Network) in an autonomous acoustic extinguisher for flame detection. A robotized mobile platform was developed and applied to test algorithms for fire detection and evaluation of the fire source. Experimental results show that DNN
can be used in the autonomous acoustic fire extinguisher. Based on the research
work, it is feasible to apply multiple DNN algorithms and models in a single
intelligent and autonomous acoustic fire extinguisher (a new approach to fire
protection).

Keywords: Deep Neural Networks (DNN) · Fire detection · High-power acoustic extinguisher · Low-cost sensor · Machine learning

1 Introduction

In addition to the danger to human and animal life and material losses, fires are a
significant cause of devastation of the environment. They result in deforestation,
desertification, and air pollution. It is estimated that the occurrence of fires results in
20% of the total CO2 emissions to the atmosphere. Besides, fires cause the recirculation
of heavy metals and radionuclides [1]. Consequently, the efforts of scientists are aimed
at finding innovative and effective ways to detect fires as soon as possible (this is a key
issue especially in forests and sparsely populated areas), so that firefighting action can
be taken before the fire spreads. The solutions described in [2, 3] may be useful for data
transmission from hard-to-reach areas.
On a global scale, satellite imaging can be used for fire detection. Chitade and
Katiyar present color segmentation capabilities for segmenting satellite images. This
technique may be applicable for fire detection [4]. However, a weakness of tracking fire
locations using satellites is that the spatial resolution and time scale are too low, which
unfortunately prevents the effective use of this knowledge on a local scale [5, 6]. This is
essential because the fire caused by wind gusts spreads very quickly, causing sub-
stantial financial losses, environmental degradation, and, above all, often the loss of
human life.


Artificial intelligence, including a subdiscipline of machine learning that does not


require human control – deep learning – is helpful there. It finds its application, in
particular, in places where the use of typical sensors is very difficult or impossible
(especially in open spaces due to the limited range of classical sensors) [7–18].
Regardless of whether conventional sensors or artificial intelligence are applied, the
primary goal is the detection of flames, which is related to the simultaneous detection of
the location of the fire.
A new scientific development in recent years has been the use of acoustic waves to
extinguish flames. Since sound waves are not chemical, they do not pollute the envi-
ronment. Low-cost intelligent sensors may be installed in the acoustic extinguisher, so
that extinguishing can be started immediately after flame detection (without unneces-
sary time delay and without human intervention). This technique can become an ele-
ment supporting the safety of industrial halls, warehouses, flammable liquid tanks (no
barrier to the propagation of acoustic waves), e.g., as a stationary (permanently
installed) fire extinguishing system. However, the use of portable extinguishers is also
possible, but sometimes problematic. The advantage of these systems is usually a high
data processing speed (less than 10 ms). Such research is currently being conducted
within the framework of cooperation in Bulgaria and Poland, but, in general, research is
being conducted in the United States, Europe and Asia, for example, [19–25]. This has
resulted in many patents such as [26–34]. It is interesting in terms of analyzing the
extinguishing capabilities of acoustic waves.
Research work carried out in Central and Eastern Europe, with the support of
professional services that deal with fire protection, resulted in the development of an
acoustic extinguisher that can be equipped with an intelligent module for flame
detection (scientific novelty). Implementing such a module may contribute to reducing
to a minimum the time delay to start the firefighting action because the system can be
activated automatically as soon as flames are detected (without the need for its manual
activation by a human). This is particularly important in the case of extinguishing a fire,
before the flames even have time to spread.
During the research work, a prototype model of an autonomous high-power
acoustic fire extinguisher was developed (Fig. 1). Several new patents can be found in
this area [31–34].

Fig. 1. Simplified 3D model of the prototype of the autonomous acoustic fire extinguisher.

Since the authors propose a method of flame detection using a deep neural network in this paper, they focus not on the extinguishing system as a whole but on the flame detection module. Such robots may be used in crisis management [35]. To analyze the influence of various factors on the possibility of the occurrence of a given phenomenon, various models based on real data can be applied, as is done, among others, in the mathematical sciences [36–38].
This paper is divided into several sections. In Section II, it is shown that it is
possible to design a system based on artificial vision systems using mobile robots that
have artificial human-like visual attention. This allows them to support the work of
firefighters in difficult-to-reach places. This section presents the possibilities of flame
detection using deep neural networks. In this context, an intelligent module can be part
of the acoustic extinguisher equipment, which is a novel scientific approach. The
architecture of the neural networks used and the operation algorithm implemented on
the robotized platform are presented in Section III. In Section IV, a short summary is
provided, which synthesizes the most important information from the article and
directions for future research.

2 The Use of Deep Neural Networks for Flame Detection

Undoubtedly, in the era of recent years, intelligent computing and optimization are
important in the development of society and innovative solutions. Work in this area is
being carried out by many scientific and academic centers around the world [39–41].
Research on the use of deep neural networks for flame detection is part of this trend.
The architecture of broadband information systems for crisis management is essential
[42], as is familiarity with digital image recognition [43].
The intelligent module can be part of the acoustic extinguisher equipment. This is
especially important because the acoustic system does not require human presence, and
thus the human is not exposed to the influence of low-frequency acoustic field, which
may cause various health problems [44, 45]. In addition to wide open space (for
example, forests and wildlands), neural networks allow for flame detection in buildings,
means of transport, and places where environmental conditions (ambient temperature,
dust) significantly affect the effectiveness of the use of other fire detection techniques.
Examples are, among others, sand dryers, foundries, and heat treatment plants. In
addition to the well-known and available methods, the benefits of using deep neural
networks include low cost of purchasing components and high performance (fire
detection efficiency is typically over 90%).
In this paper, Fig. 2 shows an example of a robot, constructed by us, detecting
flames using deep neural networks. This robotized mobile platform simulates the
operation of the acoustic fire extinguisher. All electronic hardware and algorithms for
fire detection, motor control, fire approaching, and detection of fake fire sources can be
used directly in the autonomous fire extinguisher.

Fig. 2. A flame detection robot using deep neural networks.

The developed robot uses Jetson Nano as the main platform for fire detection.
A Logitech C310 USB camera is connected to the Jetson Nano and provides a resolution of 1280 × 720 pixels.
visualization of a video stream from a camera and for drawing the contours of the fire
sources and flames that are found in the video signal.
The control of the robot drive is performed with the help of a specially designed
control board based on the ESP32 microcontroller. The board controls two DC motors
and gathers data from temperature, ultrasound, gas, and flame sensors. The control
board receives commands from Jetson Nano for its movements in the space and returns
sensor data to Jetson Nano that can be used to better determine the distance to the fire
source.
To detect fire using the robotic platform, two types of neural networks were applied
in the tests, which have the following advantages and disadvantages of their
architectures:
• SSD MobileNet – has high performance but low accuracy when searching for small
objects compared to other architectures. When searching for large objects, a
MobileNet SSD can have higher accuracy than R-CNN;
• Mask R-CNN – based on R-CNN, it has the ability to return the location of an
object and apply a mask on its pixels. Unlike SSD MobileNet, this neural network
has a relatively longer response time.

3 Architecture of Neural Networks Used

MobileNet is a neural network architecture designed for use in embedded applications


running on a device with limited computational power. MobileNet utilizes so-called
depth-wise separable filters. The SSD (Single Shot Detector) may be used to detect
multiple objects within an image. It is based on a convolutional network that works in the following way: it initially extracts feature maps and then applies convolution filters to detect objects. The combination of MobileNet and SSD represents SSD MobileNet, which is applied in the current research.
The Mask R-CNN can be described as an improvement of Faster R-CNN network
because it also returns the object mask of the detected object. The network initially
extracts feature maps from the images, which are then passed through a Region Pro-
posal Network (RPN), which returns the coordinates of the bounding boxes of the
detected objects. Finally, using segmentation, the network generates masks on the
detected objects, in our case the fire source.
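For inference, both trained detectors can be served through the TF2 Object Detection API's SavedModel interface; the sketch below assumes an exported model directory and illustrative names, and is not the exact code running on the Jetson Nano.

```python
import numpy as np
import tensorflow as tf

detect_fn = tf.saved_model.load("fire_model/saved_model")   # assumed export path

def detect_fire(frame_bgr, threshold=0.5):
    """Return normalized [ymin, xmin, ymax, xmax] boxes of likely fire regions."""
    rgb = frame_bgr[..., ::-1]                               # OpenCV BGR -> RGB
    inp = tf.convert_to_tensor(rgb[np.newaxis, ...], dtype=tf.uint8)
    out = detect_fn(inp)                                     # OD-API output dict
    boxes = out["detection_boxes"][0].numpy()
    scores = out["detection_scores"][0].numpy()
    return boxes[scores >= threshold]
```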
Neural network training data is collected from a variety of freely available sources
on the Internet. The original array contains about 300 images. With the use of a
‘labeling’ tool in the image preprocessing process, the fire coordinates are localized in
each image and a label is assigned.
With the help of a Python script that performs rotation, translation, and scaling of
images, a new array with about 3000 processed images is generated. The larger the
database, the higher the expected accuracy of the neural network. For this purpose, new
images are additionally generated by changing the brightness, and this new array
already has about 15,000 images. This array is used in the training process of the two
selected neural networks.
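The augmentation script is described only in outline; the following OpenCV sketch shows one plausible variant generator (rotation, translation, scaling, and the brightness change used for the third array). The ranges are assumptions, and in a detection setting the bounding-box labels would have to be transformed with the same matrix.

```python
import cv2
import numpy as np

def augment(img, rng):
    """Produce one randomized variant of a training image."""
    h, w = img.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2),
                                rng.uniform(-15, 15),       # rotation [deg]
                                rng.uniform(0.8, 1.2))      # scaling
    M[:, 2] += (rng.uniform(-0.1, 0.1) * w,                 # translation x
                rng.uniform(-0.1, 0.1) * h)                 # translation y
    out = cv2.warpAffine(img, M, (w, h))
    beta = rng.uniform(-40, 40)                             # brightness shift
    return cv2.convertScaleAbs(out, alpha=1.0, beta=beta)
```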
The Transfer Learning method was used for the training of neural networks. For
this purpose, a trained model of a neural network is taken, which has multiple layers
and recognizes a large number of classes. The last layers, which are fully connected
layers, depending on the number of classes, are cut out and replaced with a new fully
connected layer, which is trained to detect fire.
Two types of neural networks were trained – the architectures mentioned above: SSD MobileNet and Mask R-CNN. In the learning process, all images provided for training must be scaled to the size expected by the input layer of the neural network: 300 × 300 for SSD MobileNet and 800 × 600 for Mask R-CNN in the developed system.
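As a classification-style illustration of the Transfer Learning method just described (cutting the final fully connected layers and training a new head), the Keras sketch below freezes a pretrained MobileNetV2 backbone. The actual system trained detection models (SSD MobileNet, Mask R-CNN) through the TensorFlow Object Detection API, so this is an analogy, not the authors' pipeline.

```python
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(input_shape=(300, 300, 3),
                                         include_top=False, weights="imagenet")
base.trainable = False                     # keep the pretrained feature extractor
x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
out = tf.keras.layers.Dense(1, activation="sigmoid")(x)    # fire / no-fire head
model = tf.keras.Model(base.input, out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)   # hypothetical datasets
```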
The training itself was performed on a personal computer with an NVIDIA
GeForce GTX 1080 Ti video card using the TensorFlow library.
The trained neural networks were tested on a set of 1000 images of fires in different
fire positions and different overall illumination of the images themselves.
To verify the successful training, it is necessary to examine how the neural network
behaves when we have objects resembling fire in the image. Figure 3 shows an image
of a fire in a closed room, as in the image there are objects with a color similar to the
fire. The image shows that the recognition is successful.

Fig. 3. Detection of fire indoors.

Another important aspect of successful fire detection is the ability to detect it in


daylight. Figure 4 shows an image of a burning roof, and the recognition is also
successful.

Fig. 4. Detection of fire on a burning building.

Since two types of neural networks are used for fire detection, SSD MobileNet and Mask R-CNN, differences in their behavior can be observed.
The trained models are downloaded to the Jetson Nano board. The algorithm implemented on the robotized platform is the following (a condensed sketch of the loop follows the list):
1. Initialization of the resources of the Jetson Nano;
2. Reading of an image from the USB camera;
3. Scaling of the image according to the size expected by SSD MobileNet or Mask R-CNN, respectively;
4. The neural network recognizes the current image and returns the coordinates of the detected fire;
5. Depending on the coordinates of the fire, the robotized platform is rotated so that the fire is situated in the center of the images sent by the USB camera;
6. The robot is moved towards the fire source;
7. The robot decides when to stop in front of the fire source based on the video image, information from an ultrasonic sensor, and readings from two temperature sensors;
8. The robot switches on a signal that simulates the control signal for activation of the acoustic fire extinguisher.
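The sketch below condenses this loop in Python, assuming detect_fire() from the inference sketch above; the serial command strings, thresholds, and the read_sensors()/send_cmd() helpers for the ESP32 board protocol are hypothetical, since the paper does not document them.

```python
import serial                                   # pyserial link to the ESP32 board

SAFE_DIST_M, SAFE_TEMP_C = 2.0, 60.0            # assumed stopping thresholds
link = serial.Serial("/dev/ttyUSB0", 115200)

def control_step(frame):
    """One pass: detect, center the flame, approach, then trigger the signal."""
    boxes = detect_fire(frame)
    if len(boxes) == 0:
        send_cmd(link, "SCAN")                  # keep rotating to search
        return
    ymin, xmin, ymax, xmax = boxes[0]
    cx = (xmin + xmax) / 2                      # horizontal box center, 0..1
    if cx < 0.45:
        send_cmd(link, "LEFT")                  # rotate platform toward fire
    elif cx > 0.55:
        send_cmd(link, "RIGHT")
    else:
        dist, temp = read_sensors(link)         # ultrasonic + temperature data
        if dist > SAFE_DIST_M and temp < SAFE_TEMP_C:
            send_cmd(link, "FORWARD")           # move toward the fire source
        else:
            send_cmd(link, "EXTINGUISH")        # simulated activation signal
```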
When the trained networks are run on the Jetson Nano, their recognition speed and accuracy are as shown in Table 1.

Table 1. Recognition speed and accuracy of SSD MobileNet vs Mask R-CNN.


Metric SSD MobileNet Mask R-CNN
Speed of recognition [ms] 103 308
Accuracy [%] 79.4 96.1

The data show that the Mask R-CNN has higher recognition accuracy, at the
expense of lower recognition speed.
Some of the test images are taken by the USB camera onboard. In this way, it is
possible to evaluate how the system behaves using the resolution of its own camera.
The conclusion is that the USB camera has good quality video signal and that the real
fire sources are successfully detected. Figure 5 shows an image from the robot camera,
in which the actual source of the fire is recognized using a MobileNet network.

Fig. 5. Recognition of a fire source by a mobile robot.



The ability to recognize real fire from static fire images displayed in front of a
camera is added to the developed mobile robot control software. The software also
includes an algorithm to make a rough determination of the distance to the fire source.
The developed electronic board for robot control uses two temperature sensors, as well
as a flame sensor, to protect the robotic platform (and in the future the autonomous
acoustic fire extinguisher) from too close positioning to the fire.

4 Conclusion

A deep learning system that realizes an acoustic fire extinguisher is described and
evaluated. While acoustic technology shows promise, new research is needed to
improve the ability of acoustic waves to effectively extinguish flames (currently the
range of this technology is about 2 m), as well as to prepare, test, and improve (if
necessary) flame detection algorithms (the second issue was the subject of this article).
In this paper, a mobile platform was used to detect the flame by using deep neural
networks. The results obtained are encouraging and show the possibility of using deep
neural networks effectively for fire detection. All algorithms and know-how achieved
during the research work can be applied in the autonomous fire extinguisher. In
practice, the research work continues with the use of high-power acoustic waves
without exceeding safe sound pressure levels (110 dB); maintaining sound pressure levels that are safe for humans imposes some limitations. In addition, there is
a need to analyze the extinguishing capabilities of acoustic waves for different sub-
stances and fuels, depending on the specified wave parameters. On the other hand, it
becomes possible to extinguish flames without human intervention if both techniques
(acoustic extinguishing with artificial intelligence) are combined. The advantage is that
there is no time delay between the detection of the flame and the start of extinguishing.
This is a new approach to fire protection that motivates the authors of this article to
collaborate with other researchers in acoustic fire protection (especially from Poland
due to the first high-power acoustic fire extinguisher). The projected direction of future
research is also to analyze the possibilities of extinguishing acoustic waves taking into
account the multipoint distribution of sound sources so that the acoustic stream is
directed as precisely as possible to the source of flames [21, 23, 46]. In this analyzed
process, DNN as well as other machine learning approaches can be implemented.
Moreover, the authors, on the basis of achieved results, conclude that autonomous
acoustic fire extinguishers can be realized using low-cost hardware platforms and
implementing algorithms for artificial intelligence based on deep neural networks.

References
1. Toulouse, T., Rossi, L., Akhloufi, M., Çelik, T., Maldague, X.: Benchmarking of wildland
fire colour segmentation algorithms. IET Image Proc. 9(12), 1064–1072 (2015)
2. Šerić, L., Stipaničev, D., Krstinić, D.: ML/AI in intelligent forest fire observer network. In:
3rd International Conference on Management of Manufacturing Systems. EAI, Dubrovnik
(2018)

3. Šerić, L., Stipaničev, D., Štula, M.: Observer network and forest fire detection. Information
Fusion 12(3), 160–175 (2011)
4. Chitade, A.Z., Katiyar, S.K.: Colour based image segmentation using k-means clustering.
Int. J. Eng. Sci. Technol. 2(10), 5319–5325 (2010)
5. San-Miguel-Ayanz, J., Ravail, N.: Active fire detection for fire emergency management:
potential and limitations for the operational use of remote sensing. Nat. Hazards 35, 361–376
(2005)
6. Wilk-Jakubowski, J.: Information systems engineering using VSAT networks. Yugosl.
J. Oper. Res. 31(3), 409–428 (2020)
7. Szegedy, Ch., Toshev, A., Erhan, D.: Deep neural networks for object detection. In:
Proceedings of the 26th International Conference on Neural Information Processing
Systems, pp. 2553–2561. Curran Associates Inc., New York (2013)
8. Chen, T., Wu, P., Chiou, Y.: An early fire-detection method based on image processing. In:
Proceedings of International Conference on Image Processing, pp. 1707–1710. IEEE Press,
Singapore (2004)
9. Foley, D., O’Reilly, R.: An evaluation of convolutional neural network models for object
detection in images on low-end devices. In: Proceedings for the 26th AIAI Irish Conference
on Artificial Intelligence and Cognitive Science, pp. 350–361. Trinity College Dublin,
Dublin (2018)
10. Janků, P., Komínková Oplatková, Z., Dulík, T.: Fire detection in video stream by using
simple artificial neural network. Mendel 24(2), 55–60 (2018)
11. Kurup, A.R.: Vision based fire flame detection system using optical flow features and
artificial neural network. Int. J. Sci. Res. 3(10), 2161–2168 (2014)
12. Zhang, X.: Simple understanding of mask RCNN. Medium (2018), https://medium.com/
@alittlepain833/simple-understanding-of-mask-rcnn-134b5b330e95
13. Rossi, L., Akhloufi, M., Tison, Y.: On the use of stereovision to develop a novel
instrumentation system to extract geometric fire fronts characteristics. Fire Saf. J. 46(1–2),
9–20 (2011)
14. Li, Z., Mihaylova, L.S., Isupova, O., Rossi, L.: Autonomous flame detection in videos with a
Dirichlet process Gaussian mixture color model. IEEE Trans. Industr. Inf. 14(3), 1146–1154
(2018)
15. Çelik, T.: Fast and efficient method for fire detection using image processing. Electronics and
Telecommunications Research Institute Journal 32(6), 881–890 (2010)
16. Toulouse, T., Rossi, L., Campana, A., Çelik, T., Akhloufi, M.: Computer vision for wildfire
research: an evolving image dataset for processing and analysis. Fire Saf. J. 92, 188–194
(2017)
17. Marbach, G., Loepfe, M., Brupbacher, T.: An image processing technique for fire detection
in video images. Fire Saf. J. 41(4), 285–289 (2006)
18. Horng, W.-B., Peng, J.-W., Chen, C.-Y.: A new image-based real-time flame detection
method using color analysis. In: IEEE International Conference on Networking. Sensing and
Control, pp. 100–105. IEEE Press, Tucson (2005)
19. Friedman, A.N., Stoliarov, S.I.: Acoustic extinction of laminar line-flames. Fire Saf. J. 93,
102–113 (2017)
20. Węsierski, T., Wilczkowski, S., Radomiak, H.: Wygaszanie procesu spalania przy pomocy
fal akustycznych. Bezpieczeństwo i Technika Pożarnicza 30(2), 59–64 (2013)
21. Wilk-Jakubowski, J.: Analysis of flame suppression capabilities using low-frequency
acoustic waves and frequency sweeping techniques. Symmetry 13(7), 1299 (2021)
22. Stawczyk, P., Wilk-Jakubowski, J.: Non-invasive attempts to extinguish flames with the use
of high-power acoustic extinguisher. Open Eng. 11(1), 349–355 (2021)

23. Niegodajew, P., Gruszka, K., Gnatowska, R., Šofer, M.: Application of acoustic oscillations
in flame extinction in a presence of obstacle. In: XXIII Fluid Mechanics Conference. IOP,
Zawiercie (2018)
24. Niegodajew, P., et al.: Application of acoustic oscillations in quenching of gas burner flame.
Combust. Flame 194, 245–249 (2018)
25. Radomiak, H., Mazur, M., Zajemska, M., Musiał, D.: Gaszenie płomienia dyfuzyjnego przy
pomocy fal akustycznych. Bezpieczeństwo i Technika Pożarnicza 40(4), 29–38 (2015)
26. Methods and systems for disrupting phenomena with waves, by: Tran, V., Robertson, S.
(Nov. 24, 2015). Patent US, no application: W02016/086068
27. Fire extinguishing appliance and appended supplementary appliances, by: Davis, Ch.B.
(Apr. 13, 1987). Patent US, no application 07/040393
28. Remote lighted wick extinguisher, by: Thigpen, H.D. (Oct. 29, 1997). Patent US, no
application: 08/960,372
29. Sposób gaszenia płomieni falami akustycznymi (The method of extinguishing flames with
acoustic waves, in Polish), by: Wilczkowski, S., Szecówka, L., Radomiak, H., Moszoro, K.
(Dec. 18, 1995). Patent PL, PAT.177792, no application: P.311909
30. Urządzenie do gaszenia płomieni falami akustycznymi (System for suppressing flames by
acoustic waves, in Polish), by: Wilczkowski, S., Szecówka, L., Radomiak, H., Moszoro, K.
(Dec. 18, 1995). Patent PL, PAT.177478, no application: P.311910
31. Urządzenie do gaszenia płomieni falami akustycznymi (System for suppressing flames by
acoustic waves, in Polish), by: Wilk-Jakubowski, J. (Feb. 13, 2018). Small patent PL,
RWU.070441, no application: W.127019
32. Urządzenie do gaszenia płomieni falami akustycznymi (Device for flames suppression with
acoustic waves, in Polish), by: Wilk-Jakubowski, J. (Nov. 30, 2018). Patent PL,
PAT.233026, no application: P.428002
33. Urządzenie do gaszenia płomieni falami akustycznymi (Device for flames suppression with
acoustic waves, in Polish), by: Wilk-Jakubowski, J. (Nov. 30, 2018). Patent PL,
PAT.233025, no application: P.427999
34. Urządzenie do gaszenia płomieni falami akustycznymi (Device for flames suppression with
acoustic waves, in Polish), by: Wilk-Jakubowski, J. (Jan. 18, 2019). Patent PL, PAT.234266,
no application: P.428615
35. Harabin, R., Wilk-Jakubowski, G., Ivanov, S.: Robotics in crisis management: a review of
the literature. Technol. Soc. 2021 (under review). University of Social Sciences in Łódź &
Varna University of Management, Łódź-Varna (2020)
36. Marek, M.: Wykorzystanie ekonometrycznego modelu klasycznej funkcji regresji liniowej
do przeprowadzenia analiz ilościowych w naukach ekonomicznych. Rola informatyki w
naukach ekonomicznych i społecznych. Innowacje i implikacje interdyscyplinarne.
Wydawnictwo Wyższej Szkoły Handlowej im. B. Markowskiego w Kielcach, Kielce (2013)
37. Wilk-Jakubowski, J.: Predicting satellite system signal degradation due to rain in the
frequency range of 1 to 25 GHz. Pol. J. Environ. Stud. 27(1), 391–396 (2018)
38. Wilk-Jakubowski, J.: Total signal degradation of Polish 26–50 GHz satellite systems due to
Rain. Pol. J. Environ. Stud. 27(1), 397–402 (2018)
39. Intelligent Computing & Optimization, Conference proceedings 2018 (ICO 2018). https://
www.springer.com/gp/book/9783030009786
40. Intelligent Computing and Optimization, Proceedings of the 2nd International Conference on
Intelligent Computing and Optimization 2019 (ICO 2019). https://www.springer.com/gp/
book/9783030335847
41. Intelligent Computing and Optimization Proceedings of the 3rd International Conference on
Intelligent Computing and Optimization 2020 (ICO 2020). https://link.springer.com/book/
10.1007/978-3-030-68154-8
34 S. Ivanov and S. Stankov

42. Wilk-Jakubowski, J.: Overview of broadband information systems architecture for crisis
management. Informatyka, Automatyka, Pomiary w Gospodarce i Ochronie Środowiska 10
(2), 20–23 (2020)
43. Wilk, J.Ł: Techniki cyfrowego rozpoznawania krawędzi obrazów. Wydawnictwo Sto-
warzyszenia Współpracy Polska-Wschód. Oddział Świętokrzyski, Kielce (2009)
44. Tempest, W.: Infrasound and Low Frequency Vibration. Academic Press Inc., London
(1976)
45. Noga, A.: Przegląd obecnego stanu wiedzy z zakresu techniki infradźwiękowej i możliwości
wykorzystania fal akustycznych do oczyszczania urządzeń energetycznych. Zeszyty
Energetyczne 1, 225–234 (2014)
46. Yi, E.-Y., Bae, M.-J: A study on the directionality of sound fire extinguisher in electric fire.
Convergence Research Letter of Multimedia Services Convergent with Art, Humanities and
Sociology 3, 1449–1452 (2017)
Adaptive Harmony Search for Cost
Optimization of Reinforced Concrete Columns

Aylin Ece Kayabekir1, Sinan Melih Nigdeli2, and Gebrail Bekdaş2

1 Department of Civil Engineering, Istanbul Gelisim University, 34310 Avcılar, Istanbul, Turkey
aekayabekir@gelisim.edu.tr
2 Department of Civil Engineering, Istanbul University-Cerrahpaşa, 34320 Avcılar, Istanbul, Turkey
{melihnig,bekdas}@iuc.edu.tr

Abstract. The performance of metaheuristic algorithms used in engineering optimization is evaluated via the robustness of the method. To develop a better algorithm for specific problems, investigations have to be continued by applying methods to new problems. In the present study, an adaptive harmony search (AHS) that automatically updates the algorithm parameters is presented for the optimum cost design of reinforced concrete (RC) columns. The results were compared with an approach that uses constant parameters throughout the optimization process. The results proved that AHS is not greatly affected by the choice of initial algorithm parameters.

Keywords: Optimization · Reinforced concrete · Metaheuristics · Harmony search

1 Introduction

Metaheuristics are algorithms that are used to solve challenging problems through a process that applies tasks in a defined order. Heuristics date back to the earliest humans, since they employ the human mind [1], and algorithms began to be generated by formulating processes as metaphors. One of the oldest algorithms, tabu search, was inspired by the human mind [2]; subsequently, evolution [3–5], various processes [6–10], swarm intelligence [11–13] and nature [14–18] have been used as inspiration. These algorithms are greatly helpful for problems that cannot be solved analytically or can only be solved via numerical iterations [19–21]. Civil engineering, and especially structural engineering, has many such problems, and metaheuristics are a popular tool for optimization and analysis [22]. Most studies using metaheuristics in structural engineering address optimization problems, including truss structures [23–26], structural control tuning [27–35] and the optimum design of reinforced concrete (RC) members [36–56], but metaheuristics can also be used in the structural analysis of complex and nonlinear systems [57–64].
In the present study, adaptive harmony search is presented to optimize the dimension and reinforcement variables of RC columns so as to minimize the total material cost. The adaptive version of the algorithm was compared with the classical one to show the advantage of the adaptive version regarding the choice of algorithm parameters. The reason for using an adaptive version of the algorithm is to avoid the parameter-setting process, and the results showed that adaptive algorithms are not dramatically affected by the parameters, whereas the opposite holds for the classical form of the algorithm.

2 Methodology

In this section, the optimization process based on the Adaptive Harmony Search (AHS) metaheuristic algorithm for the cost minimization of reinforced concrete (RC) columns is introduced. The loading and reinforcement conditions of the RC column under the axial force (Nz) and bending moment (My) can be seen in Fig. 1.

Fig. 1. RC column under uniaxial bending moment

Several assumptions are made in the analysis of RC members. The assumptions made in this optimization study are as follows:
1- Plane sections normal to the axis remain plane after bending deformation. Consequently, the strain at any point in the cross-section is proportional to its distance from the neutral axis.
2- For the ductile design of columns, the axial force must be limited to the values proposed in the design regulations.
3- The stress-strain relationship for the concrete is assumed to be parabolic and is then idealized as an equivalent rectangular stress block.
4- The tensile strength of the concrete is very low and is ignored in the design calculations.
5- Buckling and second-order effects are not considered for the column; the investigation is therefore not suitable for slender columns.
6- The column is subjected to a bending moment around a single axis only.
The Harmony Search (HS) algorithm, inspired by a musician's process of searching for the most appropriate combination of notes (harmony), is a metaheuristic algorithm developed by Geem et al. [8]. The optimization process via HS can be summarized in five steps.
Step 1: The design constants (Table 1) and the ranges of the design variables (Table 2) of the optimization problem, the algorithm-specific parameters such as the harmony memory size (HMS), in other words the population number (pn), the initial harmony memory considering rate (HMCRin) and the initial pitch adjusting rate (PARin), and the stopping criterion of the optimization (the maximum number of iterations, MI) are defined.
Step 2: An initial HM matrix is generated as in Eq. (1). This matrix consists of pn candidate solution sets in total, each containing the values of all design variables (Xi, i = 1–N). The problem has three design variables, as seen in Table 2.
$$\mathrm{HM} = \begin{bmatrix} X_{1,1} & X_{1,2} & \cdots & X_{1,pn} \\ X_{2,1} & X_{2,2} & \cdots & X_{2,pn} \\ \vdots & \vdots & & \vdots \\ X_{N-1,1} & X_{N-1,2} & \cdots & X_{N-1,pn} \\ X_{N,1} & X_{N,2} & \cdots & X_{N,pn} \end{bmatrix} \quad (1)$$

Initial values of the design variables are randomly generated between the maximum Xi(max) and minimum Xi(min) limits defined in Step 1 (Eq. (2)).
$$X_i = X_{i(\min)} + \mathrm{rand} \cdot (X_{i(\max)} - X_{i(\min)}) \quad (2)$$

In Eq. (2), rand is a random number between 0 and 1.
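As a minimal sketch of Steps 1–2 (not the authors' code; the variable names and random seed are assumptions, while the bounds and HMS follow Tables 1–2 and Sect. 3), the initial harmony memory of Eq. (1) can be generated with Eq. (2) as follows:

import numpy as np

rng = np.random.default_rng(42)

# Design-variable bounds from Table 2: breadth b (mm), height h (mm), ratio rho
x_min = np.array([250.0, 300.0, 0.01])
x_max = np.array([400.0, 500.0, 0.06])

pn = 30  # harmony memory size (HMS), i.e., the population number

# Eq. (2): X_i = X_i(min) + rand * (X_i(max) - X_i(min)); one column per candidate
HM = x_min[:, None] + rng.random((3, pn)) * (x_max - x_min)[:, None]
print(HM.shape)  # (3, 30): N design variables x pn candidate solution sets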



Step 3: The objective function of the problem is calculated for each solution set, and the design constraints (Table 3, calculated according to ACI 318: Building Code Requirements for Structural Concrete [65]) are checked. A penalty value is assigned to the objective function of the solutions that do not satisfy the design constraints. All objective values are then stored in a vector.
The objective function (OF) given in Eq. (3) determines the total material cost, and an appropriate solution set that minimizes this function is searched for during the optimization process.

$$OF = C_c V_c + C_s W_s \quad (3)$$

In Eq. (3), Cc and Cs are the cost per unit volume of concrete and the cost per unit weight of reinforcing steel, respectively. The volume of the concrete and the weight of the reinforcing steel are symbolized by Vc and Ws, respectively.
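A hedged sketch of how the penalized objective of Eq. (3) might be evaluated is given below. This is an illustration, not the paper's exact formulation: the unit costs and column length come from Table 1 and the reinforcement-ratio limits from Table 3, while the choices that Vc excludes the steel volume and that the capacity checks (Md, Nd) are omitted are our assumptions.

C_C = 40.0      # $/m^3, cost of concrete (Table 1)
C_S = 700.0     # $/t, cost of steel (Table 1)
GAMMA_S = 7.86  # t/m^3, specific gravity of steel (Table 1)
L = 3.0         # m, column length (Table 1)
PENALTY = 1e6   # large value added for infeasible solutions

def objective(b_mm, h_mm, rho):
    """Eq. (3): OF = Cc*Vc + Cs*Ws, penalized when Table 3 limits are violated."""
    b, h = b_mm / 1000.0, h_mm / 1000.0    # mm -> m
    a_s = rho * b * h                      # longitudinal steel area (m^2)
    v_c = (b * h - a_s) * L                # concrete volume (m^3)
    w_s = a_s * L * GAMMA_S                # steel weight (t)
    cost = C_C * v_c + C_S * w_s
    # Reinforcement limits from Table 3 (flexural/axial capacity checks omitted)
    if not (0.01 * b * h <= a_s <= 0.06 * b * h):
        cost += PENALTY
    return cost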
Step 4: A new solution set is generated. According to the HS algorithm rules, the new value of each design variable (Xi(new)) can be generated via the two equations given in Eqs. (4) and (5). The first equation (Eq. (4)) is similar to the generation of the initial solutions.

$$X_{i(new)} = X_{i(\min)} + \mathrm{rand} \cdot (X_{i(\max)} - X_{i(\min)}) \quad (4)$$

The second equation (Eq. (5)) randomly generates new values within the range obtained by multiplying the pitch adjusting rate (PAR) by the difference between the ultimate limits of the design variable (Xi(max) − Xi(min)). This generation is done around a randomly selected solution set.

$$X_{i(new)} = X_{i,k} + \mathrm{rand} \cdot PAR \cdot (X_{i(\max)} - X_{i(\min)}) \quad (5)$$

In Eq. (5), Xi,k expresses the value of a design variable in the selected solution set. Which of these two equations is used is decided according to the harmony memory considering rate (HMCR).
In AHS, the HMCR and PAR values are updated according to Eqs. (6) and (7) with respect to the current iteration number (IN).

$$PAR = PAR_{in}\left(1 - \frac{IN}{MI}\right) \quad (6)$$

$$HMCR = HMCR_{in}\left(1 - \frac{IN}{MI}\right) \quad (7)$$

Step 5: The new solution set is compared with the existing solutions. If, in terms of the objective function, the new solution is better than the worst solution in the existing solution matrix, the worst solution is replaced by the new one; otherwise, no modification is made to the solution matrix. The last two steps are repeated until the stopping criterion is satisfied. A compact sketch of the whole loop is given below.
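Putting Steps 2–5 together, the AHS loop might be sketched as follows (again an illustration under our assumptions, reusing the hypothetical objective, x_min, x_max, HM, pn and rng defined above):

MI = 10_000                   # maximum number of iterations (Sect. 3)
HMCR_IN, PAR_IN = 0.5, 0.25   # initial parameter values (Case 1 in Sect. 3)

costs = np.array([objective(*HM[:, j]) for j in range(pn)])

for it in range(MI):
    # Eqs. (6)-(7): linear decay of the parameters with the iteration count
    hmcr = HMCR_IN * (1.0 - it / MI)
    par = PAR_IN * (1.0 - it / MI)

    x_new = np.empty(3)
    k = rng.integers(pn)          # randomly selected solution set for Eq. (5)
    for i in range(3):
        if rng.random() < hmcr:
            # Eq. (5): adjust around the value stored in the selected candidate
            x_new[i] = HM[i, k] + rng.random() * par * (x_max[i] - x_min[i])
            x_new[i] = np.clip(x_new[i], x_min[i], x_max[i])
        else:
            # Eq. (4): fresh random value inside the variable range
            x_new[i] = x_min[i] + rng.random() * (x_max[i] - x_min[i])

    # Step 5: replace the worst stored solution if the new one is better
    c_new = objective(*x_new)
    worst = np.argmax(costs)
    if c_new < costs[worst]:
        HM[:, worst], costs[worst] = x_new, c_new

best = np.argmin(costs)
print(HM[:, best], costs[best])   # optimum b (mm), h (mm), rho and cost ($)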

3 Numerical Example

In Tables 1 and 2, the numerical example data for the design constants and variables are presented. The HS parameters HMS, HMCRin and PARin are defined as 30, 0.5 and 0.25, respectively, for classical HS and for Case 1 of AHS. In the second case of AHS, HMCRin and PARin are both taken as 1 to validate that AHS is not strongly dependent on the specific parameter values. In classical HS, the algorithm becomes a pure random search method when both parameters are taken as 1; this case is therefore also presented as random search (RS).

Table 1. Design constants of the optimization


Definition Symbol Value
Flexural moment Mu 300 kNm
Axial force Nu 2000 kN
Length of column L 3m
Strain corresponding to ultimate stress of concrete εc 0.003
Max. aggregate diameter Dmax 16 mm
Yield strength of steel fy 420 MPa
Compressive strength of concrete f′c 30 MPa
Elasticity modulus of steel Es 200000 MPa
Specific gravity of steel γs 7.86 t/m3
Cost of the concrete per m3 Cc 40 $
Cost of the steel per ton Cs 700 $

Table 2. Design variables of the optimization


Definition Symbol Value
Breadth of column b 250 mm < b < 400 mm
Height of column h 300 mm < h < 500 mm
Reinforcement ratio ρ 0.01 < ρ < 0.06

In the process, a maximum number of iterations is used as the stopping criterion, and this number is taken as 10000 for the numerical example. Ten cycles of the optimization process were performed for the evaluation of the results. The optimum results are presented in Table 4 with the minimum (OFmin), average (OFave), maximum (OFmax) and standard deviation (std) of these 10 cycles.

Table 3. Design constraints of the optimization


Definition Constraint
Maximum axial force (Nmax) Nd ≤ Nmax = 0.5 f′c bh
Minimum steel area, Asmin As ≥ Asmin = 0.01bh
Maximum steel area, Asmax As ≤ Asmax = 0.06bh (seismic design)
Flexural strength capacity, Md Md ≥ Mu
Axial force capacity, Nd Nd ≥ Nu

Table 4. Optimum results


RS HS AHS (Case 1) AHS (Case 2)
b (mm) 399.862 400 400 400
h (mm) 499.7068 500 500 500
ρ 0.01267 0.012409 0.012409 0.012409
OFmin ($) 65.7658 64.9650 64.96500 64.9650
OFave ($) 67.2248 64.9663 64.96504 64.9653
OFmax ($) 68.4724 64.9683 64.96515 64.9662
std 7.93E-01 9.76E-04 4.58E-05 3.58E-04

4 Conclusion

In the present study, the cost optimization of RC columns was investigated with AHS, and the results were compared with classical HS. Two cases of HS parameters were investigated.

For the first case of parameters, tested as the best combination for the HS algorithm, HS can find the optimum results with only a slight difference from AHS, which uses these parameters as initial values. Even in that situation, the std value over 10 runs of the optimization process is very different for HS and AHS. In that case, AHS is robust compared to HS, and this can easily be followed via the average and maximum values of OF, which are close to the best one.

As is known, these algorithm parameters strongly affect the optimum design, including convergence and sensitivity. For that reason, a second case was run, in which the HMCR and PAR values are taken as 1. This means that all candidate solutions are generated over the whole solution range, and the HS approach reduces to a random search. In AHS, the initial values taken as 1 are updated over the iterations. This case is the worst choice for these parameters, and the RS results are not effective in finding the best optimum solution reported in Case 1. The std value of RS is also very large compared to the others. In contrast, AHS remains effective with these parameters: the best solution can be found within 10 cycles, and an acceptable std value is obtained that is even smaller than the Case 1 results of HS.

In conclusion, parameter setting is very important for metaheuristic algorithms. This requires validating results for different parameter values, but such validation is not urgently needed for adaptive algorithms, which actively change these parameters during the optimization process.

Acknowledgments. This study was funded by Scientific Research Projects Coordination Unit
of Istanbul University-Cerrahpasa. Project number: FYO-2019–32735.

References
1. Sörensen, K., Sevaux, M., Glover, F.: A history of metaheuristics. Handbook of heuristics,
pp. 1–18 (2018)
2. Glover, F.: Future paths for integer programming and links to artificial intelligence. Comput.
Oper. Res. 13(5), 533–549 (1986)
3. Goldberg, D.E., Samtani, M.P.: Engineering optimization via genetic algorithm. In:
Proceedings of Ninth Conference on Electronic Computation. ASCE, New York, NY,
pp. 471–482 (1986)
4. Holland, J.H.: Adaptation in Natural and Artificial Systems. University of Michigan Press,
Ann Arbor, Michigan (1975)
5. Storn, R., Price, K.: Differential evolution–a simple and efficient heuristic for global
optimization over continuous spaces. J. Global Optim. 11(4), 341–359 (1997)
6. Kirkpatrick, S., Gelatt, C.D., Vecchi, M.P.: Optimization by simulated annealing. Science
220(4598), 671–680 (1983)
7. Erol, O.K., Eksin, I.: A new optimization method: big bang–big crunch. Adv. Eng. Softw. 37
(2), 106–111 (2006)
8. Geem, Z.W., Kim, J.H., Loganathan, G.V.: A new heuristic optimization algorithm:
harmony search. Simulation 76, 60–68 (2001)
9. Rao, R.V., Savsani, V.J., Vakharia, D.P.: Teaching–learning-based optimization: a novel
method for constrained mechanical design optimization problems. Comput. Aided Des. 43
(3), 303–315 (2011)
10. Rao, R.: Jaya: a simple and new optimization algorithm for solving constrained and
unconstrained optimization problems. Int. J. Ind. Eng. Comput. 7(1), 19–34 (2016)
11. Kennedy, J., Eberhart, R.C.: Particle swarm optimization. In: Proceedings of IEEE
International Conference on Neural Networks No. IV, November 27-December 1, pp. 1942–
1948. Perth Australia (1995)
12. Dorigo, M., Maniezzo, V., Colorni, A.: The ant system: optimization by a colony of
cooperating agents. IEEE Trans. Syst. Man Cybern. B 26, 29–41 (1996)
13. Karaboga, D., Basturk, B.: A powerful and efficient algorithm for numerical function
optimization: artificial bee colony (ABC) algorithm. J. Global Optim. 39(3), 459–471 (2007)
14. Yang, X.S., Deb, S.: Engineering optimisation by cuckoo search. Int. J. Math. Model.
Numer. Optim. 1(4), 330–343 (2010)

15. Yang, X.S.: Firefly algorithm, stochastic test functions and design optimisation. Int. J. of
Bio-Inspir. Com. 2(2), 78–84 (2010)
16. Yang, X. S.: A new metaheuristic bat-inspired algorithm. In: Nature Inspired Cooperative
Strategies for Optimization (NICSO 2010), pp. 65–74. Springer, Berlin, Heidelberg (2010)
17. Yang, X.S.: Flower pollination algorithm for global optimization. In: International
Conference on Unconventional Computing and Natural Computation, pp. 240–249.
Springer, Berlin, Heidelberg (2012)
18. Mirjalili, S., Mirjalili, S.M., Lewis, A.: Grey wolf optimizer. Adv. Eng. Softw. 69, 46–61
(2014)
19. Vasant, P., Zelinka, I., Weber, G.W. (eds.): Intelligent Computing & Optimization. In:
Proceedings of the 2nd International Conference on Intelligent Computing and Optimization
2018 (ICO 2018). Springer (2018)
20. Vasant, P., Zelinka, I., Weber, G.W. (eds.): Intelligent Computing & Optimization. In:
Proceedings of the 2nd International Conference on Intelligent Computing and Optimization
2019 (ICO 2019). Springer (2019)
21. Vasant, P., Zelinka, I., Weber, G.W. (eds.): Intelligent Computing & Optimization.
Proceedings of the 3rd International Conference on Intelligent Computing and Optimization
2020 (ICO 2020). Springer (2020)
22. Toklu, Y.C., Bekdas, G., Nigdeli, S.M.: Metaheuristics for Structural Design and Analysis.
John Wiley & Sons (2021)
23. Talatahari, S., Goodarzimehr, V.: A discrete hybrid teaching-learning-based optimization
algorithm for optimization of space trusses, J. Struct. Eng. Geo-Techniques. 9(1) (2019)
24. Salar, M., Dizangian, B.: Sizing optimization of truss structures using ant lion optimizer. In:
2nd International Conference on Civil Engineering, Architecture and Urban Management in
Iran. August-2019 Tehran University (2019)
25. Bekdaş, G., Yucel, M., Nigdeli, S.M.: Evaluation of metaheuristic-based methods for
optimization of truss structures via various algorithms and lèvy flight modification.
Buildings 11(2), 49 (2021)
26. Leung, A.Y.T., Zhang, H.: Particle swarm optimization of tuned mass dampers. Eng. Struct.
31(3), 715–728 (2009)
27. Bekdaş, G., Nigdeli, S.M.: Estimating optimum parameters of tuned mass dampers using
harmony search. Eng. Struct. 33, 2716–2723 (2011)
28. Pourzeynali, S., Salimi, S., Kalesar, H.E.: Robust multi-objective optimization design of tmd
control device to reduce tall building responses against earthquake excitations using genetic
algorithms. Sci. Iran. 20(2), 207–221 (2013)
29. Arfiadi, Y.: Reducing response of structures by using optimum composite tuned mass
dampers. Procedia Eng. 161, 67–72 (2016)
30. Farshidianfar, A., Soheili, S.: ABC optimization of tmd parameters for tall buildings with
soil structure interaction. Interaction and Multiscale Mechanics 6(4), 339–356 (2013)
31. Bekdaş, G., Nigdeli, S.M., Yang, X.S.: A novel bat algorithm based optimum tuning of mass
dampers for improving the seismic safety of structures. Eng. Struct. 159, 89–98 (2018)
32. Yucel, M., Bekdaş, G., Nigdeli, S.M., Sevgen, S.: Estimation of optimum tuned mass
damper parameters via machine learning. J. Build. Eng. 26, 100847 (2019)
33. Ulusoy, S., Bekdas, G., Nigdeli, S.M.: Active structural control via metaheuristic algorithms
considering soil-structure interaction. Struct. Eng. Mech. 75(2), 175–191 (2020)
34. Ulusoy, S., Nigdeli, S.M., Bekdaş, G.: Novel metaheuristic-based tuning of PID controllers
for seismic structures and verification of robustness. J. Build. Eng. 33, 101647 (2021)
35. Ulusoy, S., Bekdaş, G., Nigdeli, S.M., Kim, S., Geem, Z.W.: Performance of optimum tuned
PID controller with different feedback strategies on active-controlled structures. Appl. Sci.
11(4), 1682 (2021)

36. Coello, C.C., Hernandez, F.S., Farrera, F.A.: Optimal design of reinforced concrete beams
using genetic algorithms. Expert Syst. Appl. 12, 101–108 (1997)
37. Govindaraj, V., Ramasamy, J.V.: Optimum detailed design of reinforced concrete continuous beams using genetic algorithms. Comput. Struct. 84, 34–48 (2005). https://doi.org/10.1016/j.compstruc.2005.09.001
38. Fedghouche, F., Tiliouine, B.: Minimum cost design of reinforced concrete T-beams at ultimate loads using Eurocode2. Eng. Struct. 42, 43–50 (2012). https://doi.org/10.1016/j.engstruct.2012.04.008
39. Leps, M., Sejnoha, M.: New approach to optimization of reinforced concrete beams.
Comput. Struct. 81, 1957–1966 (2003). https://doi.org/10.1016/S0045-7949(03)00215-3
40. Akin, A., Saka, M.P.: Optimum detailed design of reinforced concrete continuous beams
using the harmony search algorithm. In: The tenth international conference on computational
structures technology, pp. 131 (2010)
41. Bekdaş, G., Nigdeli, S.M.: Cost optimization of t-shaped reinforced concrete beams under
flexural effect according to ACI 318. In: 3rd European Conference of Civil Engineering
(2012)
42. Bekdaş, G., Nigdeli, S.M.: Optimization of T-shaped RC flexural members for different
compressive strengths of concrete. Int. J. Mech. 7, 109–119 (2013)
43. Bekdaş, G., Nigdeli, S.M., Yang, X.: Metaheuristic Optimization for the Design of
Reinforced Concrete Beams under Flexure Moments
44. Bekdaş, G., Nigdeli, S.M.: Optimum design of reinforced concrete beams using teaching-
learning-based optimization. In: 3rd International Conference on Optimization Techniques in
Engineering (OTENG’15), p. 7–9 (2015)
45. Kayabekir, A.E., Bekdaş, G., Nigdeli, S.M.: Optimum design of t-beams using jaya
algorithm. In: 3rd International Conference on Engineering Technology and Innovation
(ICETI), Belgrad, Serbia (2019)
46. Koumousis, V.K., Arsenis, S.J.: Genetic algorithms in optimal detailed design of reinforced
concrete members. Comput-Aided Civ. Inf. 13, 43–52 (1998)
47. Yepes, V., Martí, J.V., García-Segura, T.: Cost and CO2 emission optimization of precast–
prestressed concrete U-beam road bridges by a hybrid glowworm swarm algorithm. Autom.
Constr. 49, 123–134 (2015)
48. Rafiq, M.Y., Southcombe, C.: Genetic algorithms in optimal design and detailing of
reinforced concrete biaxial columns supported by a declarative approach for capacity
checking. Comput. Struct. 69, 443–457 (1998)
49. Gil-Martin, L.M., Hernandez-Montes, E., Aschheim, M.: Optimal reinforcement of RC
columns for biaxial bending. Mater. Struct. 43, 1245–1256 (2010)
50. Camp, C.V., Pezeshk, S., Hansson, H.H.: Flexural design of reinforced concrete frames using a genetic algorithm. J. Struct. Eng.-ASCE 129(1), 105–115 (2003)
51. Govindaraj, V., Ramasamy, J.V.: Optimum detailed design of reinforced concrete frames
using genetic algorithms. Eng. Optimiz. 39(4), 471–494 (2007)
52. Ceranic, B., Fryer, C., Baines, R.W.: An application of simulated annealing to the optimum
design of reinforced concrete retaining structures. Comput. Struct. 79, 1569–1581 (2001)
53. Camp, C.V., Akin, A.: Design of retaining walls using big bang–big crunch optimization.
J Struct. Eng.-ASCE. 138(3), 438–448 (2012)
54. Kaveh, A., Abadi, A.S.M.: Harmony search based algorithms for the optimum cost design of
reinforced concrete cantilever retaining walls. Int. J. Civ. Eng. 9(1), 1–8 (2011)
55. Talatahari, S., Sheikholeslami, R., Shadfaran, M., Pourbaba, M.: Optimum design of gravity retaining walls using charged system search algorithm. Math. Probl. Eng. 2012, Article ID 301628 (2012)

56. Sahab, M.G., Ashour, A.F., Toropov, V.V.: Cost optimisation of reinforced concrete flat slab
buildings. Eng. Struct. 27, 313–322 (2005). https://doi.org/10.1016/j.engstruct.2004.10.002
57. Toklu, Y.C.: Nonlinear analysis of trusses through energy minimization. Comput. Struct. 82
(20–21), 1581–1589 (2004)
58. Nigdeli, S.M., Bekdaş, G., Toklu, Y.C.: Total potential energy minimization using
metaheuristic algorithms for spatial cable systems with increasing second order effects. In:
12th International Congress on Mechanics (HSTAM2019), pp. 22–25 (2019)
59. Bekdaş, G., Kayabekir, A.E., Nigdeli, S.M., Toklu, Y.C.: Advanced energy‐based analyses
of trusses employing hybrid metaheuristics. Struct. Des. Tall and Spec. Build. 28(9), e1609
(2019)
60. Toklu, Y.C., et al.: Total potential optimization using metaheuristic algorithms for solving
nonlinear plane strain systems. Appl. Sci. 11(7), 3220 (2021)
61. Toklu, Y.C., Bekdaş, G., Kayabekir, A.E., Nigdeli, S.M., Yücel, M.: Total potential
optimization using hybrid metaheuristics: a tunnel problem solved via plane stress members.
In: Advances in Structural Engineering—Optimization, pp. 221–236. Springer, Cham (2021)
62. Toklu, Y.C., Kayabekir, A.E., Bekdaş, G., Nigdeli, S.M., Yücel, M.: Analysis of plane-stress
systems via total potential optimization method considering nonlinear behavior. J. Struct.
Eng. 146(11), 04020249 (2020)
63. Toklu, Y.C., Bekdaş, G., Kayabekir, A.E., Nigdeli, S.M., Yücel, M.: Total potential
optimization using metaheuristics: analysis of cantilever beam via plane-stress members. In:
International Conference on Harmony Search Algorithm, pp. 127–138. Springer, Singapore
(2020)
64. Kayabekir, A.E., Toklu, Y.C., Bekdaş, G., Nigdeli, S.M., Yücel, M., Geem, Z.W.: A novel
hybrid harmony search approach for the analysis of plane stress systems via total potential
optimization. Appl. Sci. 10(7), 2301 (2020)
65. ACI Committee: Building Code Requirements for Structural Concrete (ACI 318-05) and Commentary. American Concrete Institute (2008)
Efficient Traffic Signs Recognition Based
on CNN Model for Self-Driving Cars

Said Gadri and Nour ElHouda Adouane

Laboratory of Informatics and Its Applications of M'sila LIAM, Department of Computer Science, Faculty of Mathematics and Informatics, University Mohamed Boudiaf of M'sila, 28000 M'sila, Algeria
{said.kadri,nourelhouda.adouane}@univ-msila.dz

Abstract. Self-driving cars, or autonomous cars, provide many benefits for humanity, such as the reduction of deaths and injuries in road accidents, the reduction of air pollution, and an increase in the quality of car control. For this purpose, cameras or sensors are placed on the car, and an efficient control system must be set up; this system receives images from the different cameras and/or sensors in real time, especially those representing traffic signs, and processes them to allow highly autonomous control and driving of the car. Among the most promising algorithms used in this field are convolutional neural networks (CNN). In the present work, we have proposed a CNN model composed of several convolutional layers, max-pooling layers, and fully connected layers. As programming tools, we have used Python, TensorFlow, and Keras, which are currently the most used in the field.

Keywords: Machine learning · Deep learning · Traffic signs recognition · Convolutional neural networks · Autonomous driving · Self-driving cars

1 Introduction

One of the applications of machine learning (ML) and deep learning (DL) is the field of autonomous driving, or self-driving cars. It is a new high technology that may operate the self-driving of future cars. As a baseline algorithm, a CNN (Convolutional Neural Network) model is used to predict the control command from the video frames. One interesting task of this control system is to recognize the different traffic signs present on the road in order to guarantee safe driving [1]. For this purpose, a CNN model is trained to map pixels from processed images taken from cameras and sensors placed on the car. This kind of model has proved its performance in many other domains such as medical imaging, pattern recognition (text, speech, etc.), computer vision, and other interesting applications [2]. Several benefits can be achieved using this high technology, notably the reduction of deaths and injuries in road accidents, the reduction of air pollution, and an increase in the quality of car control; in one word, the main objective is to achieve the safety of humans. In this way, an automatic detection system helps the driver to recognize the different signs, and consequently some risks, quickly, especially when the driver is in a bad mental state or drives in a crowded city or any other complex environment that can cause the driver to overlook messages sent from the traffic signs placed on the side of the road. Thus, the role of such a system is to report correct messages to the driver as soon as possible, thereby reducing the driver's burden, increasing the safety of driving and decreasing the risk of accidents. The present paper is organized as follows: Sect. 1 is a short introduction presenting the area of our work and its advantages and benefits. Section 2 is a detailed overview of related works in the same area. In the third section, we describe our proposed model. The fourth section presents the experimental work done to validate our proposed model. In Sect. 5, we illustrate the results obtained when applying our new model. In Sect. 6, we discuss the results obtained in the previous section. In the last section, we summarize the realized work and suggest some perspectives for future research.

2 Related Work

The idea of autonomous driving started at the end of the 1920s, but the first autonomous car appeared in the 1980s. Some promising projects were realized in this period, such as the autonomous car called NAVLAB in 1988 and its control system ALVINN [3, 4]. Among the most important tasks in the field of self-driving cars or autonomous vehicles is traffic sign recognition. For this purpose, several methods based on feature extraction have been developed, including Scale-Invariant Feature Transformation (SIFT) [5], Histogram of Oriented Gradients (HOG) [6], and Speeded Up Robust Features (SURF) [7]. The use of ANN in autonomous driving is not new: Pomerleau used in the ALVINN system a fully connected neural network with a single hidden layer of 29 neurons to predict steering commands for the vehicle. The rise of machine learning and DL, and especially of the famous CNN models, helped to improve significantly the performance of traffic sign detection, and surprising results have been achieved [8]. In 2004, DARPA seeded a project named DAVE, or DARPA autonomous vehicle (Net-Scale Technologies, 2004), based on the use of a CNN model. Some years later, new methods based on CNNs were also developed, such as the semantic-segmentation-aware (SSA) method [9] and the DP-KELM method [10, 11]. More recently, the Nvidia team trained a large CNN to map images obtained from driving a real car to steering commands [12]. Today, CNN models have been developed and applied to many other interesting applications, notably AlexNet [13], VGG [14], GoogleNet [15], ResNet [16], the R-CNN series [17, 18], Yolo [19], SSD [20], and R-FCN [21]. These models are widely used by researchers in different areas of object recognition and give excellent performance on most of the available datasets. This encourages researchers in the field of traffic sign recognition and self-driving cars to develop new models that are more performant and accurate. For instance, the GoogleNet structure, which is a multilabel CNN, has been used for road scene recognition [22]. Similarly, a ResNet architecture based on a multilevel CNN has been used to classify traffic signs [23]. [24] used a hybrid method that combines CNN and RNN networks to extract deep features based on object semantics and global and contextual appearances, then used them for scene recognition. [25] realized a system based on a feed-forward CNN model to detect objects. [26] developed an efficient system based on stacked encoder-decoder 2D CNNs to perform contextual aggregation and to predict a disparity map. [27] proposed a new model based on generative adversarial networks (GANs), using source data, a prediction map, and a ground truth label as input for lane detection. [28] proposed a new method that combines CNN and RNN in cascade for feature learning and lane prediction. [29] proposed a contextual deconvolution model combining two contextual modules, the channel and spatial modules; the model also uses global and local features for semantic segmentation.

3 The Proposed Model

In the present work, we have developed an automatic system that allows us to detect and classify given images representing traffic sign panels. For this purpose, we have proposed a CNN model composed of several convolutional layers, max-pooling layers, and a fully connected layer. As programming tools, we have used Python, TensorFlow, and Keras, which are the most used in the field. Figure 1 presents a detailed diagram of the CNN model proposed to improve the performance of the classification task.

[Fig. 1 diagram: INPUT image of 28 × 28 × 1 → Conv1 (16 filters of size 3 × 3) → MaxPooling of size 2 × 2 → Conv2 (32 filters of size 3 × 3) → MaxPooling of size 2 × 2 → Conv3 (64 filters of size 3 × 3) → full-connected (flatten) layer → Output (62 classes, labeled 0–61)]
Fig. 1. The architecture of the proposed CNN model

4 Experimental Work
4.1 Used Dataset
In our experiments, we have used the Belgium traffic signs dataset, a collection of images usually labeled in French or Dutch, since these two languages are the official and most spoken languages in Belgium. The collection can be divided into six (06) categories of traffic signs: warning signs, priority signs, prohibitory signs, mandatory signs, parking and standing-on-the-road signs, and designatory signs. After downloading the Belgium traffic signs files (training and testing files) and taking a look at the folder structure of this dataset, we can see that the training as well as the testing data folders contain 62 subfolders, which represent the 62 types of traffic signs used for classification. All images have the (.ppm: Portable PixMap) format. Thus, the performed task is to classify a given image into one of 62 classes representing traffic sign panels. Figure 2 shows examples of traffic sign panels from the Belgium traffic signs dataset, and Fig. 3 gives the distribution of these panels by class (type/group).

Fig. 2. Examples of images in Belgium Traffic Signs Dataset

Fig. 3. Distribution of panels by labels in Belgium Traffic Signs Dataset
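Given the folder layout described above (one subfolder per class, .ppm files inside), a hedged loading sketch might look as follows; the directory names are assumptions, and scikit-image is used here only because it reads Portable PixMap files directly:

import os
import numpy as np
from skimage import color, io, transform

def load_split(root_dir):
    """Load one Belgium TS split: each subfolder name is the class label."""
    images, labels = [], []
    for class_dir in sorted(os.listdir(root_dir)):
        class_path = os.path.join(root_dir, class_dir)
        if not os.path.isdir(class_path):
            continue
        for fname in os.listdir(class_path):
            if fname.endswith(".ppm"):
                img = io.imread(os.path.join(class_path, fname))
                img = color.rgb2gray(img)               # model input is 28x28x1
                img = transform.resize(img, (28, 28))
                images.append(img)
                labels.append(int(class_dir))           # folder name = class id
    x = np.asarray(images)[..., np.newaxis]             # shape (n, 28, 28, 1)
    return x, np.asarray(labels)

# Hypothetical paths to the extracted training/testing archives
x_train, y_train = load_split("BelgiumTSC/Training")
x_test, y_test = load_split("BelgiumTSC/Testing")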



4.2 Programming Tools

Python: Python is currently one of the most popular languages for scientific applications. It has a high-level interactive nature and a rich collection of scientific libraries, which makes it a good choice for algorithmic development and exploratory data analysis. It is increasingly used in academic establishments and also in industry. A well-known package, scikit-learn, integrates a large number of ML algorithms for supervised and unsupervised problems, such as decision trees, logistic regression, naïve Bayes, KNN, ANN, etc. This package makes ML accessible to non-specialists working on general-purpose problems.
TensorFlow: TensorFlow is a multipurpose open-source library for numerical computation using data flow graphs. It offers APIs for beginners and experts to develop for desktop, mobile, web, and cloud. TensorFlow can be used from many programming languages such as Python, C++, Java, Scala, and R, and runs on a variety of platforms including Unix, Windows, iOS, and Android. We note also that TensorFlow can run on single machines (CPU, GPU, TPU) or be distributed across machines with many hundreds of GPU cards.
Keras: Keras is the official high-level API of TensorFlow, characterized by many important features: it is a minimalist, highly modular neural network library written in Python; it is capable of running on top of either TensorFlow or Theano; it enjoys large adoption in industry and the research community; it allows easy production of models; it supports both convolutional networks and recurrent networks as well as combinations of the two; it supports arbitrary connectivity schemes (including multi-input and multi-output training); and it runs seamlessly on CPU and GPU.

4.3 Evaluation
To validate the different ML algorithms and obtain the best model, we have used the cross-validation method, which consists in splitting our dataset into 10 parts, training on 9 and testing on 1, and repeating for all combinations of train/test splits (a minimal sketch of this protocol follows the list below). For the CNN model, we have used two measures: the loss value and the accuracy metric.
1. Accuracy metric: the ratio of the number of correctly predicted instances to the total number of instances in the dataset, multiplied by 100 to give a percentage (e.g., 90% accurate).
2. Loss value: used to optimize an ML algorithm or DL model. It must be calculated on the training and validation datasets. Its simple interpretation is based on how well the ML algorithm or the built DL model is doing on these two datasets: it gives the sum of errors made for each example in the training or validation set.
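The 10-fold protocol described above can be sketched with scikit-learn's KFold as follows (build_model is a hypothetical factory returning a fresh compiled CNN; this is not the authors' exact script):

import numpy as np
from sklearn.model_selection import KFold

kfold = KFold(n_splits=10, shuffle=True, random_state=0)
scores = []
for train_idx, test_idx in kfold.split(x_train):
    model = build_model()  # hypothetical: returns a fresh compiled Keras CNN
    model.fit(x_train[train_idx], y_train[train_idx], epochs=10, verbose=0)
    loss, acc = model.evaluate(x_train[test_idx], y_train[test_idx], verbose=0)
    scores.append(acc)
print(f"mean accuracy: {np.mean(scores):.4f}")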

5 Illustration of the Obtained Results

To build an efficient predictive model and achieve a higher accuracy rate, we have performed the following task: designing a CNN (Convolutional Neural Network) model composed of several layers, as presented in Sect. 3 and Fig. 1. We can describe our proposed model as follows:
• The first convolutional layer Conv1, constituted of 16 filters of size (3 × 3).
• A max-pooling layer of size (2 × 2), allowing the reduction of the dimensions (width, height) of the images issued from the previous layer after applying the different filters of Conv1.
• A second convolutional layer Conv2, constituted of 32 filters of size (3 × 3).
• A max-pooling layer of size (2 × 2), allowing the reduction of the dimensions (width, height) of the images issued from the previous layer after applying the different filters of Conv2.
• A third convolutional layer Conv3, constituted of 64 filters of size (3 × 3).
• A flatten layer.
• A fully connected layer FC of size 100, transforming the output of the previous layer into a one-dimensional vector.
• An output layer represented by a reduced one-dimensional vector whose size is the number of traffic sign classes (62).
• A "ReLU" activation function is used for the previous layers, and a "softmax" function is used at the output to normalize the obtained values.

Table 1. Description of the proposed CNN model


Layer type Output shape Nb. parameters
conv2d_1 (Conv2D) (None, 26, 26, 16) 160
max_pooling2d_1 (MaxPooling2) (None, 13, 13, 16) 0
conv2d_2 (Conv2D) (None, 11, 11, 32) 4640
max_pooling2d_2 (MaxPooling2) (None, 5, 5, 32) 0
conv2d_3 (Conv2D) (None, 3, 3, 64) 18496
flatten_1 (Flatten) (None, 576) 0
dense_1 (Dense) (None, 62) 35774
Total parameters 59,070
Trainable parameters 59,070
Non-trainable parameters 0

To validate our CNN model, we have used two measures: the loss value and the accuracy metric. Below is the code, written with TensorFlow and Keras, that allowed us to build our model (Table 1).

from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential()
# Conv1: 16 filters of size 3x3 on 28x28x1 grayscale inputs
model.add(Convolution2D(16, (3, 3), activation='relu',
                        kernel_initializer='he_uniform',
                        input_shape=(28, 28, 1)))
model.add(MaxPooling2D(2, 2))
# Conv2: 32 filters of size 3x3
model.add(Convolution2D(32, (3, 3), activation='relu',
                        kernel_initializer='he_uniform'))
model.add(MaxPooling2D(2, 2))
# Conv3: 64 filters of size 3x3
model.add(Convolution2D(64, (3, 3), activation='relu',
                        kernel_initializer='he_uniform'))
model.add(Dropout(0.25))
model.add(Flatten())
# Fully connected layer of size 100, then softmax output over the 62 classes
model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(62, activation='softmax'))
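The paper does not list the training call itself; a plausible continuation (the optimizer, loss and hyperparameters below are our assumptions) would compile and fit the model along these lines:

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',  # integer class labels
              metrics=['accuracy'])
history = model.fit(x_train, y_train,
                    validation_data=(x_test, y_test),
                    epochs=30, batch_size=32)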

Table 2 below summarizes the results obtained after applying the proposed CNN model.

Table 2. Loss value and accuracy value obtained when applying the proposed model
Loss value Accuracy value
Training set 0.0921 97.31%
Test set 0.0169 99.80%

Fig. 4. Training loss Vs Validation loss of the CNN model



Fig. 5. Training accuracy Vs Validation accuracy of the CNN model

6 Discussion

Table 2 presents the results obtained when applying the proposed CNN model on the training set and the test set. Two performance measures are considered in this case: the loss value, which calculates the sum of errors after training the model, and the accuracy value, which gives the rate of correctness. It is clear that the loss value is very low while the accuracy is very high, and both depend on the size of the set used; this is the reason why the accuracy on the training set is usually higher than the accuracy on the test set (in our case they are very close).

In the same way, Fig. 4 shows the evolution of the training loss and validation loss over time in terms of the number of epochs. It begins very high for both the training and test sets and ends very low as the number of epochs increases.

Similarly, Fig. 5 plots the evolution of the training accuracy and validation accuracy in terms of the number of epochs. Contrary to the loss value, the accuracy starts very low and ends very high. This property is clearer with the training set because of its large size.

Finally, we can also underline that the performance of our proposed model (a classification accuracy of 99.80%) is very high compared to the other models cited in the related-work section (Sect. 2), which helps to increase the quality of car control by recognizing all the traffic signs placed on the road and, consequently, to guarantee more safety for humans and vehicles. We can extend this work to the detection of other objects such as pedestrians, animals, and other complex obstacles.

7 Conclusion and Future Suggestions

In recent years, traffic sign detection has been based essentially on the ML approach, which gives high performance. Since then, important progress has been made in the ML area, especially with the emergence of a new subfield called deep learning. It is mainly based on the use of neural networks of simple interconnected units to extract meaningful patterns from a large amount of data in order to solve complex problems such as medical image classification, fraud detection, character recognition, etc. Currently, we can use larger datasets to learn powerful models and better techniques to avoid overfitting and underfitting. To date, the results obtained in this area of research are very impressive in different domains: we talk about very high accuracy values, which often exceed the threshold of 90%; for example, the accuracy rate on the digits set is over 97%. In the present paper, we have performed a classification task on a traffic signs dataset. For this purpose, we have built a CNN model to perform this classification task, and the achieved performance is very encouraging. As perspectives of this promising work, we propose to improve these results by improving the architecture of the built CNN model, changing some model parameters such as the number of filters, the number of convolution and max-pooling layers, the size of each filter, the number of training epochs and the size of the data batches. Another suggestion that seems important is to combine CNN with recurrent networks, ResNets and other types of ANN. We can also extend our model to detect other objects such as pedestrians, animals, and other complex obstacles.

References
1. Intelligent Computing and Optimization. In: Conference proceedings ICO 2018, Springer,
Cham, ISBN: 978-3-030-00978-6. https://www.springer.com/gp/book/9783030009786
2. Intelligent Computing and Optimization. In: Proceedings of the 2nd International
Conference on Intelligent Computing and Optimization 2019 (ICO 2019), Springer
International Publishing, ISBN: 978-3-030-33585-4. https://www.springer.com/gp/book/
9783030335847
3. Thorpe, C., Hebert, M.H., Kanade, T., Shafer, S.A.: Vision and navigation for the Carnegie-Mellon Navlab. IEEE Trans. Pattern Anal. Mach. Intell. 10(3), 362–373 (1988)
4. Pomerleau, D.A.: ALVINN: an Autonomous Land Vehicle in a Neural Network. Technical
Report, Carnegie Mellon University, Computer Science Department (1989)
5. Nassu, B.T., Ukai, M.: Automatic recognition of railway signs using sift features. In:
Intelligent Vehicles Symposium, pp. 348–354 (2010)
6. Creusen, I.M., Wijnhoven, R.G.J., Herbschleb, E., de With, P.H.N.: Color exploitation in HOG-based traffic sign detection. In: IEEE International Conference on Image Processing, pp. 2669–2672 (2010)
7. Duan, J., Viktor, M.: Real time road edges detection and road signs recognition. In:
International Conference on Control, Automation and Information Sciences, pp. 107–112
(2015)
8. Intelligent Computing and Optimization. In: Proceedings of the 3rd International Conference
on Intelligent Computing and Optimization 2020 (ICO 2020). https://link.springer.com/
book/10.1007/978-3-030-68154-8
9. Gidaris, S., Komodakis, N.: Object detection via a multi-region and semantic segmentation-
aware CNN model. In: IEEE International Conference on Computer Vision, pp. 1134–1142
(2015)
10. Zeng, X., Ouyang, W., Wang, X.: Multi-stage contextual deep learning for pedestrian
detection. In: IEEE International Conference on Computer Vision, pp. 121–128 (2013)

11. Zeng, Y., Xu, X., Shen, D., Fang, Y., Xiao, Z.: Traffic sign recognition using kernel extreme
learning machines with deep perceptual features. IEEE Trans. Intell. Transp. Syst. 18(6),
1647–1653 (2017)
12. Bojarski, M., et al.: End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316 (2016)
13. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional
neural networks. In: International Conference on Neural Information Processing Systems,
pp. 1097–1105 (2012)
14. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image
recognition. Comput. Sci. (2014)
15. Szegedy, C., et al.: Going deeper with convolutions. In: Computer Vision and Pattern
Recognition, pp. 1–9 (2015)
16. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In:
Computer Vision and Pattern Recognition, pp. 770–778 (2016)
17. Girshick, R.: Fast R-CNN. In: IEEE International Conference on Computer Vision,
pp. 1440–1448 (2015)
18. Ren, S., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region
proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2017)
19. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time
object detection. In: Computer Vision and Pattern Recognition, pp. 779–788 (2016)
20. Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., Berg, A.C.: SSD: Single
Shot MultiBox Detector. Springer International Publishing (2016)
21. Dai, J., Li, Y., He, K., Sun, J.: R-fcn: object detection via region-based fully convolutional
networks. In: 30th Conf. Neural Info. Proc. Syst (NIPS 2016), Barcelona, Spain (2016)
22. Chen, L., Zhan, W., Tian, W., He, Y., Zou, Q.: Deep integration: a multi-label architecture
for road scene recognition. IEEE Trans. Image Process. 2019(28), 4883–4898 (2019)
23. Zhang, L., Li, L., Pan, X., Cao, Z., Chen, Q., Yang, H.: Multi-level ensemble network for scene recognition. Multimed. Tools Appl. 78, 28209–28230 (2019). https://doi.org/10.1007/s11042-019-07933-2
24. Sun, N., Li, W., Liu, J., Han, G., Wu, C.: Fusing object semantics and deep appearance
features for scene recognition. IEEE Trans. Circuits Syst. Video Technol. 29, 1715–1728
(2019)
25. Parmar, Y., Natarajan, S., Sobha, G.: Deep range – deep-learning-based object detection and
ranging in autonomous driving. IET Intell. Trans. Syst. 2019(13), 1256–1264 (2019)
26. Nguyen, T.P., Jeon, J.W.: Wide context learning network for stereo matching. Signal
Process. Image Commun. 2019(78), 263–273 (2019)
27. Zou, Q., Jiang, H., Dai, Q., Yue, Y., Chen, L., Wang, Q.: Robust lane detection from
continuous driving scenes using deep neural networks. IEEE Trans. Veh. Technol. 2020(69),
41–54 (2020)
28. Ghafoorian, M., Nugteren, C., Baka, N., Booij, O., Hofmann, M.: EL-GAN: embedding loss driven generative adversarial networks for lane detection. In: Leal-Taixé, L., Roth, S. (eds.) ECCV 2018. LNCS, vol. 11129, pp. 256–272. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-11009-3_15
29. Fu, J., et al.: Contextual deconvolution network for semantic segmentation. Pattern Recognit.
101, 107152 (2020)
Optimisation and Prediction of Glucose
Production from Oil Palm Trunk
via Simultaneous Enzymatic Hydrolysis

Chan Mieow Kee1, Wang Chan Chin1, Tee Hoe Chun1, and Nurul Adela Bukhari2

1 Centre for Bioprocess Engineering, Faculty of Engineering and the Built Environment, SEGi University, Jalan Teknologi, Kota Damansara, 47810 Petaling Jaya, Selangor Darul Ehsan, Malaysia
mkchan@segi.edu.my
2 Energy and Environment Unit, Engineering and Processing Research Division, Malaysian Palm Oil Board (MPOB), 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia

Abstract. Malaysia is the second-largest palm oil producer in the world. Nevertheless, limited research was found on using oil palm trunk (OPT) for glucose production. The objective of this study is to optimise the glucose production from OPT via a simultaneous enzymatic process. Response Surface Methodology (RSM) was adopted to optimise the mass of OPT, the stirring speed, and the hydrolysis time for glucose production. All three parameters were significant, with p < 0.001. A quadratic regression model described the experimental data well, with predicted R2 = 0.9700 and adjusted R2 = 0.8828. Meanwhile, an artificial neural network (ANN) predicted the data with correlation R = 0.9612 and mean square error = 0.0021. The highest concentration of glucose, 30.1 mmol/L, was produced by using 30 g of OPT at 225 rpm for 16 h at 60 °C. The predictions from both RSM and ANN are comparable and highly accurate.

Keywords: Oil palm trunk · Simultaneous enzymatic process · Glucose · Starch · Optimisation

1 Introduction

The oil palm industry contributes huge amounts of lignocellulosic biomass in Malaysia, as the country is the second-largest palm oil producer in the world after Indonesia [1], with 19.47 million tonnes of palm oil produced in 2020 [2, 3]. Replantation and milling activities produce a large amount of biomass, including oil palm fronds, oil palm trunk (OPT), empty fruit bunches, mesocarp fibers and palm kernel shells. According to [4], 100 g of dried OPT consists of 27 g of starch, which could be converted to glucose, a high-value product. The latest global market analysis released by the United States Department of Agriculture (USDA) indicated that sugar production is forecast up 6 million tons to 186 million, in response to the high sugar demand in China and India [5].

There are three commonly used hydrolysis processes to produce glucose from starch: acid hydrolysis, sequential enzymatic hydrolysis and simultaneous enzymatic hydrolysis. Conventionally, glucose is produced by acid hydrolysis, which uses strong acids such as hydrochloric acid and sulphuric acid to break down the structure of the starch molecules into disaccharides or monosaccharides [6]. This method is simple; however, the formation of undesired byproducts, low yield and high process temperature are the disadvantages of acid hydrolysis. Recently, acid hydrolysis has been replaced by enzymatic hydrolysis, which is environmentally friendly, with no inhibitory byproducts formed owing to the specificity of the enzymes [7].

Starch hydrolysis to produce glucose via sequential enzymatic hydrolysis is performed through three processes: gelatinization, liquefaction and saccharification [8]. The gelatinization process weakens the hydrogen bonds between the starch molecules to ease the downstream process. After that, starch is degraded to disaccharides by the liquefaction process, followed by the saccharification process, which degrades the disaccharides to monosaccharides, using α-amylase and glucoamylase enzymes. The sequential enzymatic process is thus a two-stage process which produces glucose by liquefaction followed by saccharification. On the other hand, the simultaneous enzymatic process combines liquefaction and saccharification by mixing the two enzymes and adding them into the starch solution [9]. Simultaneous enzymatic hydrolysis is preferable, as the process requires a single reaction vessel and reduces residence time and capital cost. To date, studies have been done on producing glucose from cassava starch [10], native rye starch [11] and sago hampas [12] via the enzymatic process. Nevertheless, for a country with an abundance of oil palm biomass residues, limited research was found on using OPT as the source of starch for glucose production.

Process optimization is important in process improvement to ensure that the process design and cost remain competitive [13]. Prediction helps engineers to estimate the performance of a process without the time-consuming and high-cost experimental analysis. Mathematical statistical models such as Response Surface Methodology (RSM) and Artificial Neural Networks (ANN) are adopted by researchers to develop prediction models that recognise the pattern of the data distribution and perform prediction without explicit, rule-based programming.

The objective of this study is to find out the performance of simultaneous enzymatic hydrolysis in glucose production, using α-amylase and glucoamylase. The impact of three factors, namely the stirring speed, the mass of OPT and the hydrolysis time, was studied, and the optimum condition was identified. The collected data were analysed and served as the inputs to develop prediction models via the ANN and RSM approaches.

2 Methodology
2.1 Material
The oil palm trunk powder was kindly supplied by the Malaysian Palm Oil Board. Potassium iodide, iodine crystals, sodium alginate and calcium chloride were purchased from Merck Sdn Bhd, while chitosan powder was purchased from the Thermo Fisher Scientific Company. α-Amylase and glucoamylase were purchased from Shaanxi Yuantai Biological Technology and Xi'an Lyphar Biotech Co., Ltd. All the chemicals were used as received.

2.2 Preparation of Immobilized Beads of Enzymes


About 0.2 g of α-amylase and 0.2 g of glucoamylase were immobilized in alginate beads by mixing the enzymes with 0.2 g of sodium alginate in 10 ml of reverse osmosis (RO) water. The mixture was added dropwise into a 0.2 M calcium chloride solution, and the beads were left immersed in the solution for 2 h. The beads were then washed thoroughly with RO water and immersed in a mixture of chitosan and glacial acetic acid for 1 h. After the coating process, the beads were washed with RO water and stored at 4 °C for 24 h for complete solidification [14] before use.

2.3 Extraction of Starch and Enzymatic Hydrolysis


Starch was extracted by heating the appropriate amount of OPT in RO water, as shown in Table 1, at 100 °C for 20 min [4]. The mixture was sieved and cooled to room temperature before the immobilized enzymes were dispersed into the starch mixture. The presence of starch was determined by using the iodine test [15]. The hydrolysis experiment was carried out according to the conditions in Table 1, at 60 °C. The concentration of glucose was measured by using a One Touch Select Simple glucometer [16].

2.4 Experimental Design and Statistical Analysis by Response Surface Methodology (RSM)
The three important variables affecting the enzymatic hydrolysis process, namely the stirring speed (A), ranging from 150 to 300 rpm, the mass of OPT as substrate (B), ranging from 5 to 20 g, and the hydrolysis time (C), ranging from 8 to 24 h, were optimized by using Design Expert 11 (version 11.1.2.0). An RSM central composite design (CCD) was applied to find out the interaction between the three parameters in glucose production via simultaneous enzymatic hydrolysis of OPT. Twenty experiments, including six centre points, were designed with the three parameters (A, B and C), and the concentration of glucose was adopted as the response. A sketch of how such a quadratic response surface can be fitted outside Design Expert is given below.

Table 1. Experiment design and the respective response in terms of glucose concentration
Run  A: Stirring speed (rpm)  B: Mass of OPT (g)  C: Hydrolysis time (h)  Glucose concentration (mmol/L)
1 150 5 8 2.1
2 300 5 8 4.3
3 150 20 8 9.8
4 300 20 8 12.9
5 150 5 24 3.7
6 300 5 24 5.1
7 150 20 24 12.4
8 300 20 24 17.2
9 98.87 12.5 16 5.1
10 351.13 12.5 16 12.5
11 225 0 16 0
12 225 25.11 16 19.8
13 225 12.5 2.55 5.7
14 225 12.5 29.45 10
15 225 12.5 16 13.2
16 225 12.5 16 13.6
17 225 12.5 16 12.8
18 225 12.5 16 13.6
19 225 12.5 16 13.2
20 225 12.5 16 13.2

2.5 Glucose Prediction by RSM and Artificial Neural Network (ANN)


It is desirable to work out a mathematical model which can predict the glucose concentration from inputs of stirring speed (factor A), mass of OPT (factor B) and hydrolysis time (factor C), as shown in Table 1. However, on examining the correlation between the output response and the input factors based on the experimental data set from Table 1, the output response tends to exhibit a complex nonlinear relationship with the three independent input variables, which makes it difficult for researchers to identify appropriate trend-line equations using conventional spreadsheet tools. Thus, approaches combining DOE (Design of Experiments) by RSM with the curve-fitting capability of ANN regression analysis are common for mathematical modelling by researchers [17].

3 Results and Discussion

3.1 Morphology of Immobilised Enzymes


Figure 1 presents the morphology of a pure alginate bead and of enzymes immobilised in an alginate bead at varied magnifications. It is notable that the surface roughness of the alginate beads increased due to the presence of enzymes. This finding is consistent with

Fig. 1. Cross-sectional morphology of (a) pure alginate bead and (b) immobilised enzymes in alginate bead at (i) 50×, (ii) 1000× and (iii) 10k× magnification.

the SEM images published by Kumar et al. [18], where a rough surface was observed on the xylanase-entrapped alginate bead. This rough surface also indicates that the enzymes were successfully immobilised on the alginate beads.

3.2 Hydrolysis Performance of Immobilised Enzymes


The results in Table 1 show that the highest amount of glucose was produced by the immobilised enzymes when 25.11 g of OPT was used as the substrate to produce starch, under a stirring speed of 225 rpm for 16 h. Since the optimised mass of OPT exceeded the pre-defined range of 5–20 g, it is reasonable to deduce that the enzymatic activity was yet to be optimised. Additional experiments were therefore conducted by increasing the OPT mass up to 40 g. The results showed that as the mass of OPT increased from 25.11 g to 30 g, the amount of glucose produced increased from 19.8 to 30.1 mmol/L, as presented in Table 2. However, when the mass of OPT was further increased to 40 g, only 22.8 mmol/L of glucose was produced, which could be due to substrate inhibition [19].

Table 2. Hydrolysis performance of immobilised enzyme at high OPT condition.


A: Stirring speed (rpm)  B: Mass of OPT (g)  C: Hydrolysis time (h)  Glucose concentration (mmol/L)
225 25.11 16 19.8
225 30 16 30.1
225 40 16 22.8

The results in Table 3 show that all three factors, namely stirring speed, mass of OPT and time, were significant, with p < 0.05 observed [20]. Additionally, A², B² and C² also contributed significant impacts to the model, with p-values below 0.001 recorded.

Table 3. ANOVA surface response analysis result


Source Sum of squares df Mean square F-value p-value
Model 525.48 9 58.39 69.16 <0.0001 (significant)
A-stirring speed 41.98 1 41.98 49.73 <0.0001 (significant)
B-mass of OPT 362.90 1 362.90 429.84 <0.0001 (significant)
C-time 20.01 1 20.01 23.70 0.0007 (significant)
AB 2.31 1 2.31 2.74 0.1290
AC 0.1013 1 0.1013 0.1199 0.7363
BC 2.53 1 2.53 3.00 0.1140
A² 37.34 1 37.34 44.23 <0.0001 (significant)
B² 21.48 1 21.48 25.44 0.0005 (significant)
C² 54.55 1 54.55 64.61 <0.0001 (significant)
Residual 8.44 10 0.8443
Lack of fit 7.99 5 1.60 17.62 0.0034
Pure error 0.4533 5 0.0907
Cor Total 533.92 19

The reusability of the enzyme, in terms of glucose production, is shown in Fig. 2. The results show that as the number of reuse cycles increased, the hydrolysis performance of the enzyme reduced. This could be due to the combined effect of weak non-covalent bonds and the leaching of enzyme caused by the hydrophilicity of the alginate beads, whose pore size increased after every reuse cycle. A similar finding was reported by Li et al. [21], where the performance of Trichoderma cellulase immobilized on alginate was reduced to 60% relative activity when it was reused in the third cycle. Similarly, the relative activity was approximately 50% in the third cycle of reuse in this study.

Fig. 2. Reusability of enzyme in glucose production



3.3 Prediction of Glucose Production


3.3.1 RSM Modelling
Glucose production by simultaneous enzymatic hydrolysis was predicted using the models suggested by RSM and ANN. The model summary tabulated in Table 4 shows that the quadratic model is the best model to predict the glucose production, with a lack-of-fit p-value of 0.0034, predicted R² of 0.97 and adjusted R² of 0.8828. The cubic model has the highest predicted R² of 0.9969 and adjusted R² of 0.9712, but it is aliased.

Table 4. Fit summary by RSM


Model  Sequential p-value  Lack of fit p-value  Predicted R²  Adjusted R²
Linear <0.0001 <0.0001 0.7575 0.7112
2FI 0.8905 <0.0001 0.7151 0.4622
Quadratic <0.0001 0.0034 0.9700 0.8828
Cubic 0.0009 0.4300 0.9969 0.9712

Glucose concentration = −23.42465 + 0.137211A + 0.864899B + 0.964734C + 0.000956AB + 0.000187AC + 0.009375BC − 0.000286A² − 0.021703B² − 0.030400C²    (1)

Equation (1) was suggested by the quadratic model, and the prediction is presented in Fig. 3.
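As a quick plausibility check, Eq. (1) can be evaluated directly. The following minimal Python sketch (the function name glucose_rsm is ours, not from the paper) reproduces the measured response at the CCD central point of Table 1:

```python
def glucose_rsm(A, B, C):
    """Predicted glucose concentration (mmol/L) from Eq. (1) for stirring
    speed A (rpm), mass of OPT B (g) and hydrolysis time C (h)."""
    return (-23.42465 + 0.137211*A + 0.864899*B + 0.964734*C
            + 0.000956*A*B + 0.000187*A*C + 0.009375*B*C
            - 0.000286*A**2 - 0.021703*B**2 - 0.030400*C**2)

# Central point of the CCD (runs 15-20 in Table 1, measured 12.8-13.6 mmol/L):
print(round(glucose_rsm(225, 12.5, 16), 1))  # ~13.3
```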

3.3.2 ANN Modelling


ANN regression analysis with a backpropagation algorithm developed by the authors for training the neural network was performed with the data set provided in Table 1. The trained results are summarized in Table 5, where four models (three shallow and one deep) are considered for comparison purposes. The performance metrics obtained from the calculated results suggest that Model 4 has the best performance; however, the overall results between the models do not differ much for curve fitting of the experimental data. Thus, Model 1 is selected for its simplicity to formulate an equation, derived from the neural network matrices, for predicting the glucose concentration {y} from the input components {x1 x2 x3}, as shown in Eq. (2).
{y} = [0.3769  0.2189  0.5648  0.2067] · {2/(1 + e^(−2x'1)) − 1,  2/(1 + e^(−2x'2)) − 1,  2/(1 + e^(−2x'3)) − 1,  1}ᵀ    (2)

Table 5. Models and results by ANN analysis


Model  ANN type  No. of hidden layers  No. of neurons (activation function)  Correlation R  Average MSE
Model 1  Shallow  1  4 (tanh)  0.9612  0.0021
Model 2  Shallow  1  8 (tanh)  0.9611  0.0020
Model 3  Shallow  1  12 (tanh)  0.9593  0.0022
Model 4  Deep  3  8-6-4 (sigmoid)  0.9759  0.0014

where

{x'1, x'2, x'3}ᵀ = [0.3036  0.8727  0.2574  0.0093; 0.0030  0.5973  0.0107  0.1404; 0.1946  0.8420  0.5047  0.3624] · {x1, x2, x3, 1}ᵀ

Here, a pre-processing procedure of linear conversion from {min → max} to {0 → 1} for each parameter is preferred, to enhance the performance of the ANN regression analysis when tanh and sigmoid activation functions are applied.
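For illustration, Eq. (2) can be implemented in a few lines. The sketch below (the names W_hidden, w_out and predict are ours) takes the weight values exactly as printed above and assumes the inputs have already been min-max scaled to [0, 1]; the returned value is likewise a scaled prediction that must be converted back to mmol/L:

```python
import numpy as np

# Hidden-layer weights (rows map [A, B, C, bias] to x'1..x'3) and the
# output weights of Eq. (2), as printed in the paper.
W_hidden = np.array([[0.3036, 0.8727, 0.2574, 0.0093],
                     [0.0030, 0.5973, 0.0107, 0.1404],
                     [0.1946, 0.8420, 0.5047, 0.3624]])
w_out = np.array([0.3769, 0.2189, 0.5648, 0.2067])

def predict(x_scaled):
    """x_scaled: [A, B, C] after linear scaling to [0, 1]."""
    x_aug = np.append(x_scaled, 1.0)   # append bias input
    h = np.tanh(W_hidden @ x_aug)      # tanh(x) = 2/(1 + exp(-2x)) - 1
    return w_out @ np.append(h, 1.0)   # scaled glucose prediction
```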

3.3.3 Comparison of Fitting Results


The predicted output glucose concentration y values from the RSM and trained ANN methods are plotted against the experimental data for comparison, as shown in Fig. 3. The RSM quadratic model fits the experimental results well, satisfying the pre-designed conditions of input factors and output responses on the six central points, whereas the ANN model obtained good accuracy in predicting outputs by regression analysis with the 4-neuron shallow model. A deviation from the experimental data, with a lower y value, is found for data point 12 in the ANN model because, on examining the data coordinates, this point appears as noise lying in the lower-range cluster instead of the higher-range cluster.

Fig. 3. Fitting results from the RSM and ANN methods: (a) curve fitting errors; (b) predicted vs. actual results. (y: glucose concentration)

4 Conclusion

The ANOVA analysis showed that stirring speed, mass of OPT and hydrolysis time contributed significant impacts on glucose production. The highest glucose concentration of 30.1 mmol/L was recorded when 30 g of OPT was used as the substrate at a stirring speed of 225 rpm for a 16 h hydrolysis time at 60 °C. The equations developed from both the RSM and ANN approaches predicted the glucose production well, with R² of 0.96 to 0.97.

Acknowledgement. The authors are grateful for the support by SEGi University and Malaysian
Palm Oil Board (MPOB) for providing the OPT sample.

References
1. Shahbandeh, M.: Palm Oil Export Volume Worldwide 2020/21, by Country: Statista Dossier
on the Palm Oil Industry in Malaysia. Statistical Report, Statista (2021)
2. Leslie, C.K.O., et al.: SureSawit™ true-to-type-a high throughput universal single nucleotide
polymorphism panel for DNA fingerprinting, purity testing and original verification in oil
palm. J. Oil Palm Res. 31, 561–571 (2019)
3. Parveez, G.K.A., et al.: Oil palm economic performance in Malaysia and R&D progress in
2020. J. Oil Palm Res. 33, 181–214 (2021)
4. Eom, I.Y., Yu, J.H., Jung, C.D., Hong, K.S.: Efficient ethanol production from dried oil
palm trunk treated by hydrothermolysis and subsequent enzymatic hydrolysis. Biotechnol.
Biofuels 83, 1–11 (2015)
5. USDA: Sugar Production Up Globally in 2021/22, Stable in the United States and Mexico:
Sugar: World Market and Trade. Technical Report, USDA FDS (2021)
6. Azmi, A.S., Malek, M.I.A., Puad, N.I.M.: A review on acid and enzymatic hydrolyses of
sago starch. Int. Food Res. J. 24, 265–273 (2017)
7. Wang, T., Lü, X.: Overcome saccharification barrier: advances in hydrolysis technology. In:
Lü, X. (ed.) Advances in 2nd Generation of Bioethanol Production, pp. 137–159. Woodhead
Publishing, England (2021)
8. Pervez, S., Aman, A., Iqbal, S., Siddiqui, N.N., Qader, S.A.U.: Saccharification and
liquefaction of cassava starch: an alternative source for the production of bioethanol using
amylolytic enzymes by double fermentation process. BMC Biotechnol. 14, 49 (2014)
9. Marulanda, V.A., Gutierrez, C.D.B., Alzate, C.A.C.: Thermochemical, biological, biochem-
ical, and hybrid conversion methods of bio-derived molecules into renewable fuels. In:
Hosseini, M. (ed.) Advanced Bioprocessing for Alternative Fuels, Biobased Chemicals, and
Bioproducts: Technologies and Approaches for Scale-Up and Commercialization, pp. 59–
81. Woodhead Publishing, England (2019)
10. Sumardiono, S., Budiarti, G., Kusmiyati: Conversion of cassava starch to produce glucose
and fructose by enzymatic process using microwave heating. In: The 24th Regional
Symposium on Chemical Engineering, 01024. MATEC Web Conf., France (2017)
11. Strąk-Graczyk, E., Balcerek, M.: Effect of pre-hydrolysis on simultaneous saccharification
and fermentation of native rye starch. Food Bioprocess Technol. 13, 923–936 (2020). https://
doi.org/10.1007/s11947-020-02434-9
12. Husin, H., Ibrahim, M.F., Bahrin, E.K., Abd-Aziz, S.: Simultaneous saccharification and
fermentation of sago hampas into biobutanol by Clostridium acetobutylicum ATCC 824.
Energy Sci. Eng. 7, 66–75 (2019)

13. Magnússon, A.F., Al, R., Sin, G.: Development and application of simulation-based methods
for engineering optimization under uncertainty. Comput. Aided Chem. Eng. 48, 451–456
(2020)
14. Raghu, S., Pennathur, G.: Enhancing the stability of a carboxylesterase by entrapment in
chitosan coated alginate beads. Turk. J. Biol. 42, 307–318 (2018)
15. Elzagheid, M.I.: Laboratory activities to introduce carbohydrates qualitative analysis to
college students. World J. Chem. Educ. 6, 82–86 (2018)
16. Philis-Tsimikas, A., Chang, A., Miller, L.: Precision, accuracy, and user acceptance of the
one touch select simple blood glucose monitoring system. J. Diabetes Sci. Technol. 5, 1602–
1609 (2011)
17. Betiku, E., Okunsolawo, S.S., Ajala, S.O., Odedele, O.S.: Performance evaluation of
artificial neural network coupled with generic algorithm and response surface methodology
in modeling and optimization of biodiesel production process parameters from Shea tree
(Vitellaria paradoxa) nut butter. Renew. Energy 76, 408–417 (2015)
18. Kumar, S., Haq, I., Yadav, A., Prakash, J., Raj, A.: Immobilization and biochemical
properties of purified xylanase from Bacillus amyloliquefaciens SK-3 and its application in
kraft pulp biobleaching. J. Clin. Microbiol. 2, 26–34 (2016)
19. Aslanzadeh, S., Ishola, M.M., Richards, T., Taherzadeh, M.J.: An overview of existing
individual unit operations. In: Qureshi, N., Hodge, D., Vertes, A. (eds.) Biorefineries:
Integrated Biochemical Processes for Liquid Biofuels, pp. 3–36. Woodhead Publishing,
England (2019)
20. Betiku, E., Akindolani, O.O., Ismaila, A.R.: Enzymatic hydrolysis optimization of sweet
potato (Ipomoea batatas) peel using a statistical approach. Braz. J. Chem. Eng. 30, 467–476
(2013)
21. Li, L.J., Xia, W.J., Ma, G.P., Chen, Y.L., Ma, Y.Y.: A Study on the enzymatic properties
and reuse of cellulase immobilized with carbon nanotubes and sodium alginate. AMB
Express 9, 112 (2019)
Synthetic Data Augmentation of Cycling
Sport Training Datasets

Iztok Fister, Grega Vrbančič, Vili Podgorelec, and Iztok Fister Jr.

University of Maribor, Koroška Ul. 43, 2000 Maribor, Slovenia


iztok.fister1@um.si

Abstract. Planning sport sessions automatically is becoming a very important aspect of improving an athlete's fitness. So far, many Artificial Intelligence methods have been proposed for planning sport training sessions. These methods depend largely on the data on which Machine Learning models are built and later evaluated. However, one of the biggest concerns of Machine Learning is dealing with data that are not present in the training dataset but are unavoidable for predicting further improvements. This also represents a bottleneck in the domain of Sport Training, where algorithms can hardly predict future training sessions beyond the attribute values of the features presented in a training dataset. Usually, this results in an under-trained trainee. In this paper we look at this problem and propose a novel method for synthetic data augmentation applied to the original dataset.

Keywords: Synthetic data augmentation · Cycling sport training · TRIMP

1 Introduction

The performance of Machine Learning (ML) approaches, methods and techniques is largely dependent on the characteristics of the target dataset, which consists of numerous training samples. Datasets that contain a higher number of samples, which are also more diverse, in general offer more expressive power to the learning algorithms applied in various ML models, e.g. classification and regression. When the datasets are not diverse enough, or the samples in the datasets do not represent all possible features of the real world, algorithms may suffer in their ability to deal with their tasks and, consequently, cannot deliver the optimal outcome for a particular classification/regression problem. That behavior can also be pointed out as a drawback of ML [14].
The lack of diverse datasets and the small number of samples in datasets in the field of Data Science are nowadays commonly addressed using various data augmentation methods. These methods are able to increase the amount of data either by adding slightly modified copies of already existing data or by creating new synthetic data from existing data. Besides this primary goal, data augmentation techniques also attempt to increase the generalization abilities of ML models by reducing overfitting and expanding the decision boundary of the

model [12,21]. There are many techniques for increasing the generalization of ML models, such as dropout [23], batch normalization [11] or transfer learning [19,25]. However, data augmentation techniques address the mentioned problem from the root, which is the training dataset. While for certain ML problems, such as image classification, established practices of data augmentation exist, this is not the case when dealing with time-series or sports datasets, which are commonly in structured form, with features extracted from time-series data. With this specificity of sports datasets in mind, the common existing approaches to data augmentation would not result in the expected performance improvement of the predictive ML model. Therefore, the need arises for a domain-specific method for synthetic data augmentation.
Sport is becoming a very interesting venue for ML researchers, who are confronted with building ML models based on past training data in order to plan future sport training sessions for a particular person. Several solutions exist in the literature [6,20,22]. The Artificial Sport Trainer (AST) [5] is an example of a more complex solution for assisting athletes with the automatic planning of sport training sessions. Besides planning, the AST is also able to cover the other phases of sport training, i.e., realization, control, and evaluation.
The main weakness of the AST is that it is unable to identify training sessions with intensities beyond those present in an archive. This paper introduces a synthetic data augmentation method for generating uncommon sport training sessions. This means that the method is appropriate for prescribing training sessions which ensure that athletes in training will improve their fitness further. The method consists of several steps, in which the features need to be identified first. Then, the interdependencies among them are captured using the new Training Stress Measure (TSM). This metric serves for recognizing the most intensive training sessions, which act as a basis for the generation of the uncommon training sessions using synthetic data augmentation. Finally, the set of uncommon training sessions enriches the existing archive.
The proposed method was applied on an archive of sport training sessions
generated for an amateur cyclist by the SportyDataGen generator [7]. The results
showed that the method can also be used for enriching the archive of the sport
training sessions in practice.
The contributions of this paper are:
– to propose a method to augment the training dataset of existing sport activ-
ities with synthetic data augmentation,
– to evaluate the method on data suitable for an amateur cyclist.

2 Problem Statement
Two subjects are important for understanding the material that follows:
– data augmentation,
– basics of cycling training.
The former highlights the principles of data augmentation, with emphasis on the
synthetic data augmentation, while the latter reveals the basics of cycling sport
training.

2.1 Data Augmentation

In general, the term data augmentation covers various strategies, approaches and
techniques which are trying to increase the size of the training dataset artificially,
or to make the utilized dataset more diverse in order to represent the real life
distribution of the training data better.
For the purpose of tackling image classification tasks, data augmentation is an already established practice, especially when utilizing deep Convolutional Neural Networks (CNN) [17]. Dating back to 2009, when one of the first well-known CNN architectures, AlexNet [16], was presented, techniques for cropping, mirroring, and color augmentation of training datasets were utilized in order to improve the generalization capabilities of the trained ML model, as well as to reduce the problem of overfitting [12]. The approaches and techniques for the augmentation of images can be divided into two categories: classical image data augmentation, also known as basic data augmentation, and deep learning data augmentation [15]. The first group of approaches consists primarily of techniques which manipulate the existing training dataset with geometric transformations and photometric shifting. The geometric transformation techniques include flipping images horizontally or vertically, rotating, shearing, cropping, and translating the images, while the photometric techniques cover color space shifting, adding image filters and introducing noise to the images [15]. The more advanced approach to image data augmentation is the utilization of deep learning. Such approaches can be split into three groups: generative adversarial networks [10], neural style transfer [13], and meta metric learning [8]. The last group includes techniques which utilize one neural network to optimize other neural networks. Some representatives of this kind of data augmentation technique are neural augmentation [24], auto augmentation [4], and smart augmentation [18].
While the approaches for image data augmentation are well explored and are, in general, a part of standard procedure, this is not yet the case for the augmentation of time-series datasets. However, research in this field has been gaining new momentum in recent years. Similar to the image data augmentation techniques, some of the common approaches for the augmentation of time-series datasets are based on random transformations of the training data, such as adding noise, scaling or cropping. However, the problem with utilizing such approaches when dealing with time-series datasets is that there is a diverse range of time series, each with different features, and not every transformation is therefore suitable for application to every dataset. Based on these specifics of time-series datasets, alternative approaches were developed, such as the synthesis of time series. The group of synthesis-based approaches includes a number of different methods and techniques, from pattern mixing to generative models and decomposition methods [12]. While the presented approaches are proven to work well on specific target tasks, sports data are somewhat specific in terms of data augmentation. Since, in general, such datasets are commonly derived from the recorded training sessions of different athletes, which are in fact time series, one could try

to utilize the known techniques for time-series dataset augmentation and extract the features afterwards. However, such an approach would not be able to provide us with data inferred from training sessions beyond those present in the original dataset. In other words, with such an approach we could not augment the original dataset in such a manner that we would obtain features for more uncommon sport training sessions; we could only obtain a slightly more diverse variation of the already present training sessions.

2.2 Basics of Cycling Training

The goal of sport training is to achieve a competitive performance for an athlete that is typically tested in competitions. The process is directed by some principles, of which the most important are the following two: the principle of progressive load and the principle of reversibility [9]. According to the first, only steadily increasing the amount of training can lead to an increase in an athlete's fitness, while, according to the second, fitness is lost through the athlete's inactivity, rest, or recovery. The athlete's body senses the increased amount of training as physical stress. When this stress is not too high, fatigue is experienced by the athlete after hard training. This can be overcome by introducing a rest phase into the process or performing less intensive workouts. On the other hand, if the stress is too high, we can talk about overtraining. Consequently, this demands a recovery phase, in which the athlete usually needs the help of medical services to recover.
The amount of training, typically prescribed by a sport trainer, is determined by a training load. The training load specifies the volume of training and the intensity at which it must be realized. Thus, the volume is simply a product of duration and frequency. With these concepts in mind, the training plan can be expressed mathematically as a triple:

Training plan = ⟨Duration, Intensity, Frequency⟩.    (1)

Duration denotes the amount of training; typically, cycling training is specified by a duration in minutes. Intensity can be determined by various measurements, but mostly the following two are accepted in cycling: Heart Rate (HR) and Power (PWR). Although PWR provides accurate intensity measurements [1], this study focuses on the HR due to a lack of experimental data including PWR. Based on the so-called Functional Threshold Heart Rate (FTHR), specific to each individual athlete, HR intensity zones are defined that help trainers to simplify the planning process. Let us mention that the FTHR is an approximation of the maximum HR (max HR) denoting the Anaerobic Threshold (AnT), where lactate begins to accumulate in the blood due to insufficient oxygen intake [3]. The frequency is the answer to the question of how many times the training is performed during the same period. An example of the HR intensity zones suitable for an athlete with FTHR = 180 (a 40-year-old athlete) is presented in Table 1, from which it can be seen that there are six HR zones with their corresponding names. The column

Table 1. HR zones suitable for a 40-year-old athlete with FTHR = 180.

HR zone Name % FTHR FTHR


Zone-1 Active recovery <60 <108
Zone-2 Aerobic endurance 60–70 108–126
Zone-3 Tempo 70–80 126–144
Zone-4 Lactate threshold 80–90 144–162
Zone-5 VO2 max 90–100 162–180
Zone-6 Anaerobic capacity ≥100 ≥180

“% FTHR” denotes the bounds of the particular HR zone interval expressed as


a percentage of FTHR, and the column “FTHR” their absolute values.
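As a small illustration of how Table 1 is derived from an athlete's FTHR, the following Python sketch (the function name hr_zones is ours) computes the absolute zone bounds from the percentage bounds:

```python
def hr_zones(fthr):
    """Absolute HR zone bounds (beats/min) from the %FTHR bounds of Table 1."""
    pct = {"Zone-1": (0.0, 0.60), "Zone-2": (0.60, 0.70),
           "Zone-3": (0.70, 0.80), "Zone-4": (0.80, 0.90),
           "Zone-5": (0.90, 1.00), "Zone-6": (1.00, None)}
    return {zone: (round(lo * fthr), round(hi * fthr) if hi else None)
            for zone, (lo, hi) in pct.items()}

print(hr_zones(180)["Zone-2"])  # (108, 126), matching Table 1
```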

3 Proposed Method
When the AST generates a training plan based on the existing training sessions, it is confronted with the same problem as other ML methods, i.e., how to handle objects with qualities beyond those presented in the testing database. Moreover, in the theory of Sport Training, this limitation even violates the training principle of progressive overload, which states that a progressively harder training plan needs to be performed by athletes in training to increase their fitness. As a result, a data augmentation method is proposed in this paper to overcome the problem. The method consists of the following steps:
– identification of features in an archive of sport activities,
– identification of interdependence among features,
– identification of the most intensive training sessions,
– generation of uncommon training sessions based on synthetic data,
– data enrichment of the existing archive with the uncommon training sessions.
An archive of sport activities needs to be collected in the first step. Usually, these activities are obtained from mobile devices worn by athletes, capable of monitoring the athletes' performances during the realization of training sessions. Unfortunately, collecting sport activities in this way is slightly complicated, because obtaining the data needs the permission of the athlete. On the other hand, the number of sport activities is limited by the number of sport sessions realized by the athlete. In line with this, an online generator of endurance sports activity collections has been proposed by Fister et al. [7], and it was also used in our study. The generator is capable of generating a collection with a specific number of fields containing features for various sport disciplines and profiles of athletes. Here, we focus on endurance cycling and thus manipulate the features presented in Table 2. As can be seen from the table, each sport activity is identified by its identifier and six features representing training load indicators achieved by an athlete during the sport training session. Interestingly, all attributes are either real or integer numbers.

Table 2. Features and the corresponding domains of attributes.

Nr. Feature name Abbreviation Attribute’s domain


1 Sport activity identifier ID ID ∈ N
2 Duration of the sport activity Duration Duration ∈ R
3 Distance of the sport activity Distance Distance ∈ R
4 Average HR HR HR ∈ N
5 Burned calories Calories Calories ∈ N
6 Average altitude Alt Alt ∈ R
7 Maximum altitude Alt max Alt max ∈ R

The observed features highlight the performances achieved by athletes during the realization of the training session. The simplest measure for the estimation of physical stress is the TRaining IMPulse (TRIMP), proposed by Banister [2], expressed as:

TRIMP = Duration · HR,    (2)

which highlights the relationship between the duration and the average HR. If we want to take all the relations among the six observed features into account, a new measure is necessary. Indeed, we propose the so-called Training Stress Measure (TSM), defined as follows:

TSM = K · (Distance · HR · Calories) / Duration,    (3)

where K is expressed using the following equation:

K = Alt / Alt max.    (4)
As can be seen from Eq. (3), the TSM is an extension of the already mentioned TRIMP measure, and reflects the relationships between all the features except Calories. This feature is connected strongly with power meters; in the case of using HR trackers (as in our study), this value is only estimated, and therefore can be omitted.
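Assuming Calories is indeed omitted and Duration is converted from minutes to hours (so that the Distance/Duration ratio becomes an average speed in km/h), the TSM values listed later in Table 4 (Sect. 4) can be reproduced. A minimal Python sketch under these assumptions (the function name tsm is ours):

```python
def tsm(duration_min, distance_km, hr, alt, alt_max):
    """Training Stress Measure, Eqs. (3)-(4), with Calories omitted and the
    Distance/Duration ratio expressed as an average speed in km/h."""
    k = alt / alt_max                                # K of Eq. (4)
    speed_kmh = distance_km / (duration_min / 60.0)  # average speed
    return k * speed_kmh * hr

# Session ID 427 of Table 4 (listed TSM: 5214.03):
print(round(tsm(62.43, 39.97, 136, 542.36, 543.4), 1))  # ~5214
```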
The motivation behind the generation of the uncommon training sessions is to take a potential candidate from the set of the most intensive training sessions and ride the same course more intensively (i.e., with a higher average HR). In line with this, only three load indicators remain important for the generation of uncommon training sessions, i.e., Duration, Distance and HR. Moreover, the first two indicators refer to the speed of riding, in other words Speed = Distance/Duration, while the third determines the intensity of the training session. Increasing this indicator means raising the intensity and, indirectly, also the speed. How much the speed will increase cannot be expressed explicitly, due to the psycho-physical characteristics of an athlete, but it must be predicted.

In this study, the HR is increased by ΔHR in order to change a potential candidate into a really uncommon training session. The speed is predicted as follows: a linear regression between Speed and the modified heart rate HR + ΔHR is calculated. As a result, a regression line Y = a · X + b is fitted to the potential candidates, and moving the regression line upward by an offset proportional to the increase in average HR serves as a tool for predicting the new Speed value. After obtaining this value, the new value of TSM is calculated according to Eq. (3). The M uncommon training sessions are then included in the collection of training sessions, thus enriching it with more intensive training sessions.
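One plausible reading of this step, sketched below under the assumption that the regression line is fitted over the M candidate sessions and that the duration is rescaled to match the predicted speed, is the following (all names are ours):

```python
import numpy as np

def augment(sessions, delta_hr=1):
    """Generate uncommon sessions: raise HR by delta_hr and predict the new
    speed from a linear regression of speed on HR over the candidates."""
    hr = np.array([s["hr"] for s in sessions], dtype=float)
    speed = np.array([s["distance"] / (s["duration"] / 60.0)
                      for s in sessions])                # average speed, km/h
    a, b = np.polyfit(hr, speed, 1)                      # line Y = a*X + b
    uncommon = []
    for s in sessions:
        new_hr = s["hr"] + delta_hr
        new_speed = a * new_hr + b                       # predicted speed
        new_duration = 60.0 * s["distance"] / new_speed  # minutes
        uncommon.append({**s, "hr": new_hr, "duration": new_duration})
    return uncommon
```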

4 Experimental Results
The goal of our experimental work was to show that uncommon sport activities generated using synthetic data augmentation can improve the performance of an athlete, and thus obey the first principle of sport training. The experiments followed the steps of the proposed method for synthetic data augmentation. At first, a collection of N = 500 sport activities was generated by SportyDataGen [7]. Then, a set of potential uncommon sport activities was identified, from which the M = 10 most intensive training sessions were extracted. This set served as the potential candidates from which the uncommon training sessions were generated by increasing the intensity of the candidate training sessions by ΔHR = 1. Thus, it is assumed that a small stepwise increase of the intensity is enough to improve the fitness of an athlete, measured by TSM, significantly. Finally, the existing collection was enriched with the set of uncommon training sessions, and the new training plan was proposed by the CI algorithm for planning training sessions from sport activities.

Table 3. Characteristics of a collection of training sessions (FTHR = 180).

HR zone FTHR Effective HR Sessions


Zone-1 <108 72–107 39
Zone-2 108–126 108–125 229
Zone-3 126–144 126–144 193
Zone-4 144–162 144–151 39
Zone-5 162–180 n/a 0
Zone-6 >180 n/a 0
Avg./Total 111.50 500

The characteristics of the collection are illustrated in Table 3, from which it can be seen that they belong to a 40-year-old amateur athlete, with the majority of sport activities realized in intensity Zone-2 and Zone-3 (i.e., aerobic endurance and tempo). Thirty-nine sport activities belong to intensity Zone-1, and the same number of activities to intensity Zone-4. These characteristics show that the cyclist was prepared primarily for endurance events.

The set of the M = 10 most intensive training sessions from the collection of sport activities is presented in Table 4, from which it can be seen that only two training sessions from the set belong to the most intensive zone, Zone-4. These sessions are presented in bold in Table 4.

Table 4. The most intensive training sessions according to the TSM (M = 10).

ID HR zone Duration Distance HR Calories Alt Alt max TSM


427 3 62.43 39.97 136 610 542.36 543.4 5214.03
347 3 62.87 37.94 143 549 842.58 844.2 5168.71
124 3 70.02 42.49 138 423 770.14 772.6 5008.55
183 4 57.52 34.41 144 700 616.91 642.6 4963.02
338 3 70.12 41.01 136 550 308.24 308.8 4763.54
301 3 65.22 38.64 134 614 178.64 182.8 4655.43
462 4 59.2 34.43 148 822 202.65 225.4 4643.59
401 3 66.45 39.72 129 622 354.31 356.6 4596.28
500 3 59.92 36.73 130 561 434.58 456.6 4550.31
459 3 62.8 37.2 128 501 1535.69 1538.4 4541.82
Total 48105.28

The set of the most intensive sport training sessions constituted the set of potential candidates, from which the set of uncommon training sessions was generated by increasing the HR of the candidate training sessions by ΔHR = 1 (Table 5). As can be seen from this table, the motivation behind

Table 5. The generated set of uncommon training sessions (ΔHR = 1).



ID  Duration  Distance  HR  Calories  Alt  Alt max  TSM′  ΔTSM
501 66.51 39.97 137 610 542.36 543.40 4930.18 −283.85
502 63.48 37.94 144 549 842.58 844.20 5154.50 −14.21
503 70.81 42.49 139 423 770.16 772.60 4988.29 −20.26
504 57.62 34.41 145 700 616.91 642.60 4988.54 25.52
505 68.24 41.01 137 550 308.24 308.80 4930.71 167.17
506 64.21 38.64 135 614 178.64 182.80 4764.02 108.59
507 57.83 34.43 149 822 202.65 225.40 4786.10 142.51
508 65.74 39.72 130 622 354.31 356.60 4681.91 85.63
509 60.84 36.73 131 561 434.58 456.60 4515.98 −34.33
510 61.54 37.20 129 501 1535.69 1538.40 4671.21 129.39
Total 48411.43 176.77

increasing the intensity of the potential sport activities was to realize the particular activity in a higher intensity zone. In line with this, the candidate with ID = 347 in Zone-3 moved to an uncommon training session belonging to Zone-4. In summary, increasing the intensity of the candidate sport activities increased the total TSM by ΔTSM = 176.77 units.

5 Conclusion

One of the major bottlenecks of ML methods is the fact that the data necessary for predicting further improvements are not present in the test datasets. In line with this, data augmentation methods have been developed that increase the amount of data by creating new synthetic data from existing data.
In our study, synthetic data augmentation was applied to ARM, which is one of the well-known ML methods. The ARM method is a part of the AST, capable of analyzing existing sport training activities. These sport activities serve as a basis for planning sport training sessions in various sport disciplines. We focused on synthetic data augmentation in cycling sport, and the results showed its huge potential for use in practice.
There are several directions for the future development of the proposed method, for instance: (1) using synthetic data augmentation in other sports disciplines, like running, triathlon, etc., (2) widening the number of training load indicators, (3) incorporating power meter data into the analysis, and (4) inventing a new training stress measure, also able to express calories and power meter data.

References
1. Allen, H., Coggan, A.R., McGregor, S.: Training and Racing with a Power Meter,
3rd edn. VeloPress, Boulder (2019)
2. Banister, E.W.: Modeling elite athletic performance. Physiol. Test. Elite Athletes
347, 403–422 (1991)
3. Clark, M.A., Lucett, S.C., Sutton, B.G.: NASM Essentials of Personal Fitness
Training, 4th edn. Jones & Bartlett Learning, Burlington (2014)
4. Cubuk, E.D., Zoph, B., Mane, D., Vasudevan, V., Le, Q.V.: AutoAugment: learning
augmentation strategies from data. In: Proceedings of the IEEE/CVF Conference
on Computer Vision and Pattern Recognition, pp. 113–123 (2019)
5. Fister, I., Fister Jr., I., Fister, D.: Computational Intelligence in Sports. ALO, vol.
22. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-03490-0
6. Fister, I., Rauter, S., Yang, X.-S., Ljubič, K., Fister Jr., I.: Planning the sports
training sessions with the bat algorithm. Neurocomputing 149, 993–1002 (2015)
7. Fister Jr., I., Vrbancic, G., Brezočnik, L., Podgorelec, V., Fister, I.: SportyData-
Gen: an online generator of endurance sports activity collections. In: CECIIS:
Central European Conference on Information and Intelligent Systems, pp. 171–
178. IEEE (2018)
8. Frans, K., Ho, J., Chen, X., Abbeel, P., Schulman, J.: Meta learning shared hier-
archies. arXiv preprint arXiv:1710.09767 (2017)

9. Friel, J.: The Cyclist’s Training Bible: The World’s Most Comprehensive Training
Guide, 5th edn. VeloPress, Boulder (2018)
10. Goodfellow, I.: Generative adversarial nets. In: Advances in Neural Information
Processing Systems, vol. 27 (2014)
11. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by
reducing internal covariate shift. In: International Conference on Machine Learning,
pp. 448–456. PMLR (2015)
12. Iwana, B.K., Uchida, S.: An empirical survey of data augmentation for time series
classification with neural networks. PLoS ONE 16(7), e0254841 (2021)
13. Jing, Y., Yang, Y., Feng, Z., Ye, J., Yizhou, Yu., Song, M.: Neural style transfer:
a review. IEEE Trans. Visual. Comput. Graph. 26(11), 3365–3385 (2019)
14. Kauwe, S.K., Graser, J., Murdock, R., Sparks, T.D.: Can machine learning find
extraordinary materials? Comput. Mater. Sci. 174, 109498 (2020)
15. Khalifa, N.E., Loey, M., Mirjalili, S.: A comprehensive survey of recent trends in
deep learning for digital images augmentation. Artif. Intell. Rev., 1–27 (2021)
16. Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny
images (2009)
17. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to
document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
18. Lemley, J., Bazrafkan, S., Corcoran, P.: Smart augmentation learning an optimal
data augmentation strategy. IEEE Access 5, 5858–5869 (2017)
19. Pan, S.J., Yang, Q.: A survey on transfer learning. IEEE Trans. Knowl. Data Eng.
22(10), 1345–1359 (2009)
20. Rauter, S.: New approach for planning the mountain bike training with virtual
coach (2018)
21. Shorten, C., Khoshgoftaar, T.M.: A survey on image data augmentation for deep
learning. J. Big Data 6(1), 1–48 (2019)
22. Silacci, A., Taiar, R., Caon, M.: Towards an AI-based tailored training planning
for road cyclists: a case study. Appl. Sci. 11(1), 313 (2021)
23. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.:
Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn.
Res. 15(1), 1929–1958 (2014)
24. Wang, J., Perez, L., et al.: The effectiveness of data augmentation in image clas-
sification using deep learning. Convolutional Neural Netw. Vis. Recogn. 11, 1–8
(2017)
25. Weiss, K., Khoshgoftaar, T.M., Wang, D.: A survey of transfer learning. J. Big
Data 3(1), 1–40 (2016)
Hybrid Pooling Based Convolutional Neural
Network for Multi-class Classification of MR
Brain Tumor Images

Gazi Jannatul Ferdous, Khaleda Akhter Sathi, and Md. Azad Hossain

Department of Electronics and Telecommunication Engineering, Chittagong


University of Engineering and Technology, Chittagong 4349, Bangladesh
u1708023@student.cuet.ac.bd,
{sathi.ete,azad}@cuet.ac.bd

Abstract. This paper aims to improve the conventional convolutional neural network (CNN) for the classification of MRI brain tumor images by generalizing the pooling operations that play a central role in spatial dimensionality reduction. A hybrid pooling operation based on max and average pooling is developed to learn and adopt the complex and variable patterns of the tumor regions without discarding or distorting the information. The proposed hybrid pooling operation boosts the invariance properties when used in place of average or max pooling. In addition, the performance of the proposed hybrid pooling based CNN is evaluated by performing a ten-fold cross-validation method. The results indicate that the proposed network structure achieves a mean accuracy, precision, recall, and F1-score of 99.48%, 99.49%, 99.68%, and 99.42%, respectively. The computation time during the training stage increases slightly, by 4.5% and 5.2%, compared to the conventional max and average pooling based CNN models. However, the classification accuracy improves by approximately 0.56% and 0.49% over the single max and average pooling based models, respectively. Thereby, this incremental classification accuracy could add effective decision support for radiologists or physicians in brain cancer treatment.

Keywords: Brain tumor · Convolutional neural network · Image classification · Max-pooling · Average-pooling · Hybrid pooling

1 Introduction

According to the global cancer report 2020 [1], there have been more than 0.25 million new deaths from brain cancer. The prognosis and survival rate of brain cancer can be improved by early detection of brain tumors together with their specific class. Magnetic resonance imaging (MRI) is the most common tool employed in the early detection of brain tumors, thereby reducing brain cancer mortality. In a standard MRI process, a powerful magnet is placed around the patient's head to force the protons to align with the induced magnetic field. At the same time, a radio-frequency wave is passed through the patient to stimulate the protons and spin them out of their original position. When the


radio-frequency wave is stopped, the MRI sensors detect the energy released by the protons as they realign to the magnetic field. Physicians or radiologists are able to characterize the brain images produced by the MRI technique. However, the diagnostic accuracy of this kind of imaging technique is fully dependent on the experience and technical level of the radiologist, which sometimes results in a false diagnosis as well as a faulty treatment process. For this reason, computer-aided diagnosis (CAD) is considered an alternative to conventional MRI reading, to help radiologists improve the performance of brain tumor diagnosis. The CAD system mainly uses the visible attributes of raw images, such as texture, color, and shape, to diagnose abnormalities in the brain images. Despite achieving accurate diagnoses, it takes a lot of time to process the attributes of the brain tumor images. The most recent advancement of deep learning (DL) approaches has the potential to perform automatic feature extraction from brain tumor images in a very short time, which makes the CAD system fully automated. Among various DL approaches, the convolutional neural network (CNN) is extensively used for classification tasks on MRI brain tumor image datasets [2–6].
For instance, Sultan et al. [2] developed a custom CNN model comprised of three convolutional, max-pooling, and dropout layers for the classification of two publicly available brain tumor datasets. The CNN model utilized a softmax layer for categorizing the tumors into the three classes of meningioma, glioma, and pituitary, with accuracies of 96.13% and 98.7% for the two datasets. In another study, Anaraki et al. [3] proposed a hybrid model based on CNNs and Genetic Algorithms (GA) for brain tumor classification into three different categories, with an accuracy of 94.2%. Also, Das et al. [4] designed a CNN model composed of convolution and pooling layers to extract the tumor features adaptively. The extracted features are fed to a fully connected layer to classify T1-weighted contrast-enhanced MRI images into different classes through the softmax activation layer. The CNN classification system achieved the highest classification accuracy of 94.39% on the testing dataset. Moreover, Mzoughi et al. [5] introduced a three-dimensional (3D) CNN for classifying glioma brain tumors into two grades, low and high. The 3D-CNN model classification accuracy of glioma tumor grade was found to be approximately 96.49%. Furthermore, Paul et al. [6] developed two kinds of classification networks, based on a CNN and a fully connected neural network, to classify axial brain tumor images. The CNN outperformed the fully connected neural network because of its effective feature extraction capability through two convolutional layers and two max-pooling layers. Then, through the fully connected layer, the CNN model achieved an accuracy of 91.43% for brain tumor classification.
All the contemporary CNN classifiers are based on either max pooling or average pooling to reduce the overall network complexity by decreasing the dimension of the image. In the case of max pooling, the maximum-valued pixels are selected and the lower-valued pixels are discarded, whereas in the case of average pooling the filtered images are produced by averaging the mixed pixel values. Consequently, information is distorted or lost in the produced filtered images. To overcome these limitations, instead of choosing a single pooling function, a combination of max and average pooling can be used. Moreover, in place of the standard single pooling operation, the hybrid pooling method can boost the classification performance of the CNN classifier.

In this paper, a novel hybrid pooling operation is proposed to tackle the discarding of pixels as well as the distortion in the feature values. This is achieved by performing pixel-wise summation of the max-pooled and average-pooled filtered images. The significance of the proposed hybrid pooling is examined by implementing it in a simple CNN model, which is composed of two components: (a) a feature extraction network with a hybrid pooling layer, and (b) a classification network. The feature extraction network is designed to adopt complex features of brain tumors from MRI images. Then, the generalized hybrid pooling structure is explored to resolve the difficulties that arise in traditional pooling systems during dimensionality reduction. Finally, the extracted features are fed into the classification network to distinguish the tumors into three different classes.
The key contributions of this work include: (1) a simple CNN architecture is
proposed for multi-classification purposes, which is trained on raw MRI brain tumor
images. (2) A hybrid pooling structure is proposed, which performs pixel-to-pixel
summation to reduce feature distortion during dimensionality reduction of feature
maps. (3) A ten-fold cross-validation method is performed to examine the proposed
hybrid pooling based CNN model performance.
The remaining parts of the paper are organized into three sections. Section 2
provides a detail explanation of the proposed methodology. The results and perfor-
mance analysis of experimentation on publicly available brain tumor datasets and
comparisons with the single pooling based models are presented in Sect. 3. In addition,
Sect. 4 provides the concluding remarks.

2 Methodology

The complete process of classification is presented in Fig. 1. It consists of four stages: 1) Image acquisition is performed, in which all MRI images are labeled from the raw datasets as glioma, meningioma, or pituitary. Then, dimensionality reduction from 512 × 512 pixels to 32 × 32 pixels is performed and the images are fed into the CNN model without any segmentation process. 2) The feature extraction network is modeled with a hybrid pooling layer, taking advantage of the single pooling layers to obtain undistorted deep features. 3) Classification is carried out using the softmax function, which calculates the probability of occurrence of the three classes of tumors in the MRI images. 4) Finally, ten-fold cross-validation is performed on the whole dataset to estimate the network performance on new brain tumor images.

Fig. 1. Diagram of the overall process of proposed hybrid pooling based CNN classifier.

2.1 Benchmark Data Set


The brain tumor dataset employed in this work was created by Cheng [7]. The owner arranged the images into axial, sagittal, and coronal groups of 994, 1045, and 1025 images, respectively, collected from 233 brain tumor patients. In total, 3064 T1-weighted contrast-enhanced MRI images are available, comprising 1426 glioma, 708 meningioma, and 930 pituitary images.

Fig. 2. Illustration of the proposed hybrid pooling based CNN architecture, comprising the input, two blocks, the classification block, and the output. Block-1 and Block-2 differ only in the pooling layer: Block-1 contains a hybrid pooling layer consisting of max and average pooling, whereas Block-2 contains a max pooling layer.

2.2 Hybrid Pooling Based CNN Architecture


The architecture of the proposed network comprises the input, two feature extraction blocks, and a classification block, as illustrated in Fig. 2. The first feature extraction block, 'Block-1', mainly performs convolution and hybrid pooling operations, whereas 'Block-2' performs convolution and max pooling. In Block-1, the convolutional layer first produces a set of 64 feature maps of size 30 × 30 from the input image. After that, the proposed hybrid pooling operation, described in detail in the next subsection, is applied to the convolved images, which gives hybrid pooled images of size 15 × 15. Between the convolutional and hybrid pooling layers, a batch normalization (BN) layer is used to regularize and accelerate the model. In addition, a non-saturating rectified linear unit (ReLU) activation function is employed to diminish the training time. Block-2 differs from Block-1 only in the pooling section, where a max pooling layer is used instead of the hybrid pooling layer. It gives a set of 128 feature maps of size 6 × 6 that are utilized as the input of the final classification block. The classification block comprises two fully connected (FC) layers: (a) the first layer represents the flattened output of the last max pooling layer, and (b) the second layer represents the final output based on the number of tumor classes. The softmax activation function is employed to classify the tumor classes by calculating the probability of any class p over the q classes as the following exponential function y(z) [8]:

y(z)_p = e^{z_p} / Σ_{q=1}^{q} e^{z_q}    (1)

The whole network contains a total of 9 layers, whose properties and trainable parameters are summarized in Table 1.
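For concreteness, the layer stack of Table 1 can be written down in a few lines of Keras. The paper does not state its implementation framework, so the following TensorFlow/Keras sketch is only an assumed reconstruction; hybrid_pool implements Eq. (5) of the next subsection with the mixing probability β = 0.7:

```python
from tensorflow.keras import layers, models, optimizers

def hybrid_pool(x, beta=0.7):
    """Hybrid pooling: beta * (max-pooled + average-pooled feature maps)."""
    m = layers.MaxPooling2D(pool_size=2, strides=2)(x)
    a = layers.AveragePooling2D(pool_size=2, strides=2)(x)
    return layers.Lambda(lambda t, b=beta: b * t)(layers.Add()([m, a]))

inp = layers.Input(shape=(32, 32, 1))
x = layers.Conv2D(64, 3)(inp)                        # Block-1: 30x30x64
x = layers.Activation("relu")(layers.BatchNormalization()(x))
x = hybrid_pool(x)                                   # 15x15x64
x = layers.Conv2D(128, 3)(x)                         # Block-2: 13x13x128
x = layers.Activation("relu")(layers.BatchNormalization()(x))
x = layers.MaxPooling2D(pool_size=2, strides=2)(x)   # 6x6x128
x = layers.Flatten()(x)                              # 4608 features
out = layers.Dense(3, activation="softmax")(x)       # 3 tumor classes

model = models.Model(inp, out)                       # 89,091 parameters
model.compile(optimizer=optimizers.RMSprop(learning_rate=1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])
```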

2.2.1 Hybrid Pooling


The major role of the pooling layer is to decrease the dimensionality of the convolved images, which reduces the number of neuronal connections. This layer contains a set of pooled images obtained by traversing the pooling window over the convolved images. The general formula for the pooling operation [9] can be expressed as

Y_j^z = f(q_j^z · ↓(Y_j^{z−1}) + b_j^z)    (2)

where ↓(·) denotes the down-sampling operation and b_j^z represents the perturbation added at each down-sampling step. The most commonly used pooling operations, i.e., average pooling f_avg(x) and maximum pooling f_max(x), are performed using Eqs. (3) and (4), where x_{i,j} represents an individual element of a pooled region of size h × w.

f_avg(x) = (1/(h·w)) Σ_{i,j=1}^{h,w} x_{i,j}    (3)

f_max(x) = max_{i,j=1}^{h,w} x_{i,j}    (4)

Fig. 3. Schematic representation of the hybrid pooling operation applied on a set of z feature
maps.

The major limitations of the maximum and average pooling operations are, respectively, excessive distortion and blurring of the effective tumor region of the raw image. Therefore, the introduction of a hybrid pooling method based on maximum and average pooling can overcome the above-mentioned limitations. It provides a tradeoff between these two pooling methods so that the important parts of the tumor region are preserved. The hybrid pooling operation shown in Fig. 3 is obtained using the following equation:

f_hyb(x) = β · [ max_{i,j=1}^{h,w} x_{i,j} + (1/(h·w)) Σ_{i,j=1}^{h,w} x_{i,j} ]    (5)
where β represents the probability of mixing the two pooling methods, with values between 0.1 and 1.0. Based on the value of β, the output pooled feature maps retain the tumor characteristics more accurately by providing a tradeoff between the maximum and average pooled images. The selection of β = 0.7 from β ∈ [0.1, 1] provides the optimum classification result, as shown in Table 2, by retaining the optimal pixel values. The table also presents non-equal probabilities of mixing the two pooling methods, as mentioned in some recent works [9–11], which result in poorer classification results than the proposed hybrid pooling method.
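A single pooling window makes the effect of Eq. (5) concrete; a minimal NumPy sketch (the function name hybrid_pool_window is ours):

```python
import numpy as np

def hybrid_pool_window(window, beta=0.7):
    """Eq. (5) on one pooling region: beta * (max + mean)."""
    return beta * (window.max() + window.mean())

w = np.array([[0.9, 0.1],
              [0.3, 0.5]])
print(hybrid_pool_window(w))  # 0.7 * (0.9 + 0.45) = 0.945
```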

Table 1. Layers properties of hybrid pooling based CNN architecture.


Layers  Feature size  Specification  Parameters
1. Input  32 × 32 × 1  –  0
2. Convolution  30 × 30 × 64  Filter size: 3 × 3; filter number: 64  640
3. BN & ReLU  30 × 30 × 64  –  256, 0
4. Hybrid pooling  Max pooling: 15 × 15 × 64  Filter size: 2 × 2; stride: [2 2]  0
                   Avg pooling: 15 × 15 × 64  Filter size: 2 × 2; stride: [2 2]  0
5. Convolution  13 × 13 × 128  Filter size: 3 × 3; filter number: 128  73856
6. BN & ReLU  13 × 13 × 128  –  512, 0
7. Max pooling  6 × 6 × 128  Filter size: 2 × 2; stride: [2 2]  0
8. Flatten  4608  –  0
9. Dense  3  –  13827
Total = 89,091

2.3 Model Validation


In this experiment, the ten-fold cross-validation method is utilized to verify the proposed hybrid pooling based CNN model by randomly dividing the data into ten approximately equal sections. One section is then selected in sequence each time as the test set, and the rest of the sections are used as the training set. In every iteration, the network is trained using data shuffling. Finally, the model evaluation metrics are estimated by averaging the ten results. The ten-fold cross-validation steps are summarized as follows (a code sketch follows the list):

• Divide the dataset into ten parts containing an equal number of images in each part.
• For each iteration, select one part in sequence as the test set, and use the remaining dataset as the training set to train the model.
• After training, average the results of the ten iterations to obtain the final test results.
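A minimal sketch of this protocol, assuming a build_model() helper that returns a freshly compiled instance of the network described above, X (N × 32 × 32 × 1 images) and y (N × 3 one-hot labels):

```python
import numpy as np
from sklearn.model_selection import KFold

def cross_validate(X, y, build_model, n_splits=10):
    """Average test accuracy over n_splits folds with data shuffling."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = []
    for train_idx, test_idx in kf.split(X):
        model = build_model()                 # re-initialise for each fold
        model.fit(X[train_idx], y[train_idx],
                  epochs=15, batch_size=16, verbose=0)
        _, acc = model.evaluate(X[test_idx], y[test_idx], verbose=0)
        scores.append(acc)
    return float(np.mean(scores))
```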

Table 2. Effect of the probability of mixing, β, on the classification performance of the hybrid pooling based CNN model.

β  Testing accuracy (%)
   (1−β)·max + β·avg  β·max + (1−β)·avg  β·max + β·avg
1.0 99.02 99.15 99.15
0.9 99.12 99.25 99.32
0.7 99.19 99.38 99.48
0.5 99.16 98.93 99.15
0.3 98.66 99.38 99.02
0.2 99.35 99.28 99.15

To train the model for optimal results, hyperparameter selection is an important factor. For this reason, three hyperparameters, i.e., the number of epochs, the optimizer, and the initial learning rate, were varied to find the optimum performance. A default mini-batch size of 16 with a categorical cross-entropy loss function was chosen, and the model accuracy was monitored with a learning-rate reduction factor of 0.3. The variation of the classification performance of the proposed model, in terms of training and testing accuracy, with the variation of the three hyperparameters is depicted in Fig. 4. In the case of the number of epochs, the training and testing accuracy are found to be best at epoch 15, as no further improvements are observed after this epoch. On the other hand, the best classification accuracy in both the training and testing stages is acquired by choosing the RMSprop optimizer, which mainly optimizes the loss function. The effect of the initial learning rates between 0.0001 and 0.1 on the training accuracy is found to be almost equal; in contrast, the testing accuracy is highest for an initial learning rate of 0.0001. Therefore, based on the optimum classification accuracy, the RMSprop optimizer and the minimum learning rate of 0.0001 are selected for training and testing the model. The final selected hyperparameters for the proposed model are summarized in Table 3. The network is trained and tested in a Google Colaboratory (Colab) environment with a Graphics Processing Unit (GPU). During the runtime of the code, the model occupied 2.12 GB of GPU RAM and 39.02 GB of disk storage in the Colab environment.

Table 3. Hyperparameter settings of proposed model.


Hyperparameter Value
Loss function Categorical Cross Entropy
Number of epochs 15
Optimizer RMSprop
Batch size 16
Initial learning rate 0.0001
Learning rate reduction factor 0.3
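In Keras these settings might be wired up as below; the factor of 0.3 comes from Table 3, while the monitored quantity and the patience of the learning-rate callback are assumptions, since the text states only the reduction factor:

from tensorflow.keras.callbacks import ReduceLROnPlateau
from tensorflow.keras.optimizers import RMSprop

model = build_hybrid_pool_cnn()                  # sketch defined earlier
model.compile(optimizer=RMSprop(learning_rate=1e-4),
              loss='categorical_crossentropy', metrics=['accuracy'])
# Monitored quantity and patience are assumed; only factor=0.3 is from Table 3.
reduce_lr = ReduceLROnPlateau(monitor='val_accuracy', factor=0.3, patience=2)
# model.fit(x_train, y_train, epochs=15, batch_size=16,
#           validation_data=(x_test, y_test), callbacks=[reduce_lr])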

3 Results and Performance Analysis

For the standard evaluation of the proposed hybrid pooling based classifier, the most extensively used quality index, i.e., classification accuracy, is calculated for the training and testing stages by measuring the ratio of the number of correctly classified samples to the total amount of data.

\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\% \qquad (6)

where
• TP (True Positive) = positive samples correctly recognized as positive.
• TN (True Negative) = negative samples correctly recognized as negative.
• FP (False Positive) = negative samples incorrectly recognized as positive.
• FN (False Negative) = positive samples incorrectly recognized as negative.
The designed model's average classification accuracy in the training and testing stages is found to be 99.99% and 99.48%, respectively, by averaging the ten-fold accuracy results. Moreover, the overall confusion matrix shown in Table 4 is used to determine further performance indices such as precision, recall, and F1-score, which are calculated using the equations given below:

\mathrm{Precision} = \frac{TP}{TP + FP} \times 100\% \qquad (7)

Precision measures the rate of correctly classified positive samples among all samples predicted as positive.

\mathrm{Recall} = \frac{TP}{TP + FN} \times 100\% \qquad (8)

Recall (or sensitivity) determines the rate of correctly captured actual positive samples among all labeled positive samples.

F\text{-}score = (1 + \alpha^2) \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\alpha^2 \cdot \mathrm{Precision} + \mathrm{Recall}} \qquad (9)

The F1-score calculates the harmonic average of the precision and recall with the weight parameter α = 1. According to the TP value of each class, the proposed model is capable of identifying the pituitary tumor class most precisely, with the lowest number of wrong identifications. On the other hand, the meningioma tumor class is identified with a slightly greater number of incorrect samples than the other two classes.
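For reference, Eqs. (6)–(9) with α = 1 can be expressed as a small helper function; the call below feeds it the pituitary row of Table 4 purely as an example, and values computed from a single overall confusion matrix may differ slightly from the fold-averaged figures reported in Table 5:

def report(tp, tn, fp, fn):
    # Eqs. (6)-(9); the F-score uses the weight alpha = 1, i.e. the F1-score.
    accuracy = (tp + tn) / (tp + tn + fp + fn) * 100
    precision = tp / (tp + fp) * 100
    recall = tp / (tp + fn) * 100
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

print(report(tp=927, tn=2129, fp=3, fn=5))  # example: pituitary row of Table 4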

Fig. 4. Diagram of training and testing accuracy of the modeled classifier with the variation of
number of epochs, optimizer and initial learning rate.

Table 5 presents the class-specific performance of the proposed model in terms of precision, recall, and F1-score. From this table, it is shown that the pituitary tumor class provides the highest precision value of 99.57%. On the other hand, the glioma tumor class shows slightly lower precision with good recall and F1-score results of 99.79% and 99.61%, respectively. Among the three tumor classes, the meningioma tumor class provides the lowest recall and F1-score.
The change in model accuracy and loss at different folds (k = 1 to 10) during the training and testing stages is depicted in Fig. 5. It is observed that the training accuracy is almost equal to >99% for all folds, whereas the testing accuracy increases drastically from fold k = 1 to k = 2, from >92% to >99%. After that, the testing accuracy retains this constant value up to fold k = 10. In the case of the training loss, the loss decreases gradually from fold k = 1 to k = 4, from <0.00025 to almost zero. After fold k = 4, the model shows superior performance on the multi-classification task.

Table 4. Confusion matrices of three tumor classes.


Tumor Type Samples TP TN FP FN
Meningioma 708 698 2353 10 3
Pituitary 930 927 2129 3 5
Glioma 1426 1423 1630 3 8

Table 5. Performance evaluation matrices of proposed model.


Tumor classes Classification report
Precision (%) Recall (%) F1-Score (%)
Meningioma 99.46 97.68 98.08
Pituitary 99.57 99.59 99.57
Glioma 98.44 99.79 99.61

Fig. 5. Accuracy and loss plot in training and testing stage with the variation of number of folds
for the proposed hybrid pooling classifier.

Fig. 6. Training time and testing accuracy plot of single and hybrid pooling based classifier.

The time requirement during the training stage is also investigated for the proposed classifier against classifiers using only a single pooling method. From Fig. 6 it is shown that the hybrid pooling based classifier requires an additional 0.19 min of training time over the max pooling based classifier and 0.22 min over the average pooling based classifier. Despite the additional training time, the classification accuracy of the proposed hybrid pooling model is increased by 0.56% and 0.49% over the max and average pooling based models, respectively.

4 Conclusion

In this paper, a simple brain tumor classification network is developed that utilizes a hybrid pooling layer to reduce the dimensionality of the feature maps without discarding the pixels in the tumor region. To retain the tumor features without any distortion, the pixel-wise summation of the max-pooled and average-pooled images is performed. The experimental results show that the proposed hybrid pooling based CNN classifier outperforms single pooling based models. Moreover, mixing the maximum and average pooled features with the equal probability β = 0.7 to construct the hybrid pooled features provides superior classification accuracy on the brain tumor dataset.

References
1. Sung, H., Ferlay, J., Siegel, R.L., Laversanne, M., Soerjomataram, I., Jemal, A., Bray, F.:
Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide
for 36 cancers in 185 countries. CA: A Cancer J. Clin. 71(3), 209–249 (2021). https://doi.
org/10.3322/caac.21660
2. Sultan, H.H., Salem, N.M., Al-Atabany, W.: Multi-classification of brain tumor images
using deep neural network. IEEE Access 7, 69215–69225 (2019)
3. Anaraki, A.K., Ayati, M., Kazemi, F.: Magnetic resonance imaging-based brain tumor
grades classification and grading via convolutional neural networks and genetic algorithms.
Biocybern. Biomed. Eng. 39(1), 63–74 (2019)
4. Das, S., Aranya, O.R.R., Labiba, N.N.: Brain tumor classification using convolutional neural
network. In: 2019 1st International Conference on Advances in Science, Engineering and
Robotics Technology (ICASERT), pp. 1–5, IEEE (2019)
5. Mzoughi, H., et al.: Deep multi-scale 3D convolutional neural network (CNN) for MRI
gliomas brain tumor classification. J. Digit. Imaging 33, 903–915 (2020)
6. Paul, J.S., Plassard, A.J., Landman, B.A., Fabbri, D.: Deep learning for brain tumor
classification. In: Medical Imaging 2017: Biomedical Applications in Molecular, Structural,
and Functional Imaging, vol. 10137, p. 1013710, International Society for Optics and
Photonics (2017)
7. Cheng, J.: Brain tumor dataset (version 5) (2017). https://doi.org/10.6084/m9.figshare.
1512427.v5
8. Nasrabadi, N.M: Pattern recognition and machine learning. J. Electron. Imag. 16(4), 049901
(2007)
9. Chen, J., Hua, Z., Wang, J., Cheng, S.: A convolutional neural network with dynamic
correlation pooling. In: 2017 13th International Conference on Computational Intelligence
and Security (CIS), pp. 496–499, IEEE (2017)

10. Tong, Z., Tanaka, G.: Hybrid pooling for enhancement of generalization ability in deep
convolutional neural networks. Neurocomputing 333, 76–85 (2019)
11. Tong, Z., Aihara, K., Tanaka, G.: A hybrid pooling method for convolutional neural
networks. In: Hirose, A., Ozawa, S., Doya, K., Ikeda, K., Lee, M., Liu, D. (eds.) ICONIP
2016. LNCS, vol. 9948, pp. 454–461. Springer, Cham (2016). https://doi.org/10.1007/978-
3-319-46672-9_51
Importance of Fuzzy Logic in Traffic
and Transportation Engineering

Aditya Singh(&)

School of Civil Engineering, Lovely Professional University, Phagwara, India


aditya.11602217@lpu.in

Abstract. In this paper, the way traffic problems can be minimized with the help of fuzzy logic, which is a part of artificial intelligence, is discussed. The paper also highlights the issues with the conventional traffic system practiced in an Indian state, discusses the various benefits of using advanced traffic systems incorporating fuzzy logic along with its numerous advantages as well as disadvantages, and highlights the present scenario of accidents and fatalities in an Indian state following the old conventional system of traffic.

Keywords: Flexible · Inaccurate · Fuzzification · Gaussian · Defuzzification

1 Introduction

When things are ambiguous or unclear, the word fuzzy is used. In the physical world, people often face situations that are too indeterminate to be classified as false or true; in those cases fuzzy logic is beneficial because of its flexible reasoning about such problems. Through this, people are able to take into consideration uncertain as well as inaccurate problems faced in any circumstances. When Boolean systems are discussed, only the completely true and completely false conditions are considered, and they are represented as the values 1 and 0 respectively [1]. In the case of fuzzy systems, however, logic is not restricted to completely true and completely false conditions. Instead, they provide logic for the intermediate states between completely true and completely false, which can be partly true as well as partly false. An algorithm based on fuzzy logic assists in solving different problems after taking into consideration all the accessible data; it then takes the most appropriate decision based on the obtained input. The fuzzy logic approach basically imitates the way a person makes a decision after considering all the options available between completely true and completely false [2].

1.1 What is the Need to Use Fuzzy Logic?


Fuzzy logic is a part of artificial intelligence, which is nowadays a popular technology
in today’s world. In the conventional traffic system, the technology is already outdated
in the present era. The conventional traffic system is very basic form of traffic system
and with increase in road traffic, number of automobiles, road accidents and road
fatalities, the conventional traffic system is unable to cope up with them, and in the
future it will be more difficult for it to handle them. Hence, the better and improved


approach is to use fuzzy logic and automation in the traffic system of a place to cope with the increasing traffic problems.
There are some major features of fuzzy logic discussed below [2]:
• Fuzzy logic provides flexibility as well as a smooth application of methods corresponding to machine learning.
• It is the most appropriate technique for applying reasoning in doubtful as well as inaccurate situations.
• It assists in imitating the human logic present in a person's thought process.
• It permits a person to create nonlinear functions of arbitrary complexity.
• It may have two values representing two likely solutions.
• It views inference as a procedure of propagating flexible constraints.
• It should be built with the assistance of specialists who can reliably guide a person throughout the process.
With the incorporation of fuzzy logic and the modernization of the conventional traffic system, the number of traffic accidents and road fatalities in a place is expected to reduce considerably.

1.2 Objectives
The major objectives of the study are given in the following:
• To understand the traffic conditions in a place.
• To observe the present traffic accidents and road fatalities.
• To modernize the conventional system of traffic in a place.
• To reduce the traffic accidents and road fatalities in a given place.
• To improve the traffic system and make people secure on roads.

2 Motivation and Reason for Choosing Goa as the Study Area

The author chose the state of Goa as the study area because of the high fatality rates in the state, which was also reported by Gururaj in 2005, putting the state in the top 3 for the maximum rate of road fatalities per one hundred thousand people [3]. The state has 224 km of NHs, 232 km of SHs and 815 km of DHs in length and an area of over 3700 km², making it well connected through its road network. It is bounded by the Western Ghats, which make the terrain rolling in nature, and by the Arabian Sea; only parts of the state have plain terrain. Being a tourist paradise, it is heavily visited by domestic and international tourists. Its current population is around 1.5 million people, with almost the same number of vehicles [4]. Two-wheelers make up 69% and cars over 20% of the total vehicle population [5]. Having good quality roads and highways, people often drive rashly. Being a tropical state with around 300 cm of average annual rainfall, many accidents occur due to skidding of vehicles during the rainy season [4]. At many intersections, there is either an absence of traffic lights or the conventional

system of traffic is practiced in Goa. According to the author, this can be solved up to a certain extent with the incorporation of fuzzy systems and automation in the traffic system of the state.

3 History of Fuzzy Logic

People started working on the concept of fuzzy logic from the third decade of the 20th century, and in 1965 the term fuzzy logic was first used by a professor named Lotfi Zadeh at UC Berkeley in California. Zadeh found that the existing computer logic was incapable of handling data related to subjective or indistinct human thoughts. The algorithm is still implemented in numerous areas, including artificial intelligence and control theory. It was created to permit the computer to find the distinctions between data which is partly true and partly false, somewhat comparable to the reasoning process of a person, which admits some brightness and a little darkness and so on [2].

4 Literature Review

Atakan et al. (2021) worked on fuzzy logic to create strategies to control the timing of traffic signals. They found that their proposed model is better and more effective than conventional traffic signals [6].
Tohan and Wong (2021) worked on fuzzy logic to quantify the problem of traffic congestion in a given place. They found that their method of solving the problem of traffic congestion is more effective than the existing methods [7].
Kheder and Rukaibi (2021) worked on fuzzy logic to improve the safety of pedestrians along with improving walkability as well as traffic flow. Their proposed model was found to be effective and practical [8].
Wu et al. (2020) worked on fuzzy logic to create a dynamic decision-making system to assist an intelligent navigation strategy within inland traffic separation schemes. Their proposed method helped in improving navigation systems [9].
Tomar et al. (2018) worked on fuzzy logic along with logistic regression in order to manage traffic in a given place. Their proposed model was found to be more efficient than the existing ones [10].
Rout et al. (2020) worked on fuzzy logic and the Internet of Things to provide routes for emergency automobiles, which can assist in smart cities. Their proposed model was able to save time and provide the most efficient route for such automobiles without wasting unnecessary time [11].
Jovanovic et al. (2021) worked on fuzzy logic to control a diverging interchange, particularly a diamond one which has already crossed its saturation point, together with a ramp metering system. They found that their model was more practical for solving the existing problem [12].
Alemneh et al. (2020) worked on fuzzy logic to improve the safety of pedestrians on roads. They claimed that their proposed method is effective in addressing pedestrian safety issues and better than the existing system [13].

Kheder and Almutairi (2021) studied fuzzy systems in order to use neuro-fuzzy inference systems to reduce the problem of traffic noise pollution and create a model for Arabian conditions. They stated that their proposed traffic noise model is better than other such models and the most suitable one for Arabian conditions [14].
Komsiyah and Desvania (2021) used a fuzzy inference system to analyse traffic signals and create a simulation for a three-way intersection. They found that their proposed model performs far better than the existing conventional one [15].
Abbasi et al. (2021) created a vehicle weighting model, FWDP, with the assistance of fuzzy logic, which prioritizes data in vehicular ad hoc networks. They stated that their proposed model is more useful and efficient in solving the existing problem [16].

5 Main Focus of the Paper Along with Issues and Problems

The paper mainly focuses on fuzzy logic, which is a part of artificial intelligence, to be incorporated into the traffic system of a place in order to modernize it. It highlights the need for implementing fuzzy systems and automating the conventional traffic system in the state of Goa. The major issue observed by the author in the state of Goa was rash driving on roads and highways, mostly because people take undue advantage of the good quality roads in the state. Since the state also attracts a big number of tourists of various types, it is necessary to monitor and control their actions. The paper's aim is to improve the traffic system in the state of Goa and to reduce the number of road accidents as well as fatalities on roads and highways to a negligible value.

6 Architecture of the System of Fuzzy Logic

The architecture of fuzzy logic consists of four major parts, which are the following [2]:
• Rule Base
• Fuzzification
• Inference Engine
• Defuzzification
Rule Base
Rule Base consists of all the available instructions and rules, along with the if-then conditions delivered by specialists, in order to govern the decision-making system based on linguistic information. The latest advancements in the theory of fuzzy logic propose numerous effective techniques to assist in the design and tuning of fuzzy controllers. With these improvements and advances, the number of fuzzy rules is reduced considerably.

Fuzzification
Fuzzification is utilized to transform crisp inputs into fuzzy sets. The crisp inputs are mainly the measurements taken by sensors, such as pressure, temperature, rpm and so on, which are then handed over to the control system for processing.

Inference Engine
The inference engine determines how well the present fuzzy input matches each single rule and then decides which rules are to be fired, taking the input field into consideration. The fired rules are then combined in order to create the control action.

Defuzzification
Defuzzification is utilized to transform the fuzzy sets obtained with the help of the inference engine into crisp values; it is the opposite of the fuzzification process. Many defuzzification techniques exist, and the most appropriate one is utilized with a particular expert system to decrease the rate of errors.
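As an illustration of how the four parts fit together, the short scikit-fuzzy sketch below builds a toy traffic controller; the variable names, universes and the two rules are invented for demonstration and are not taken from this paper:

import numpy as np
import skfuzzy as fuzz
from skfuzzy import control as ctrl

density = ctrl.Antecedent(np.arange(0, 101, 1), 'traffic_density')  # % occupancy
green = ctrl.Consequent(np.arange(0, 61, 1), 'green_time')          # seconds
density['low'] = fuzz.trimf(density.universe, [0, 0, 60])           # fuzzification
density['high'] = fuzz.trimf(density.universe, [40, 100, 100])
green['short'] = fuzz.trimf(green.universe, [0, 0, 30])
green['long'] = fuzz.trimf(green.universe, [20, 60, 60])
rules = [ctrl.Rule(density['low'], green['short']),                 # rule base
         ctrl.Rule(density['high'], green['long'])]
sim = ctrl.ControlSystemSimulation(ctrl.ControlSystem(rules))       # inference engine
sim.input['traffic_density'] = 75
sim.compute()                                                       # defuzzification
print(sim.output['green_time'])                                     # crisp green time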

7 Methodology

The input values are taken and sent for fuzzification: the distance, the crowd and the time of driving are formed into fuzzy sets and fuzzified. From fuzzification, the result is passed to the rule base and inference engine, where discussions with traffic personnel and the experience of drivers feed the rule base. From the rule base and inference engine, the output is sent for defuzzification: the output membership functions, speed and brake alerts are converted into crisp values. Finally, the crisp output values are produced [17].

Fig. 1. Flow chart of fuzzy logic: Input → Fuzzification → Rule Base and Inference Engine → Defuzzification → Output.

8 Membership Function in Fuzzy Logic

Membership functions are the graphs that describe how every point in the input space is mapped to a membership value in the range of 0–1. The input space is generally called the universal set (u), which consists of all the possible elements of concern in each specific application [1].
There are some major kinds of fuzzifiers mentioned below (a short sketch follows the list):
• Singleton
• Gaussian
• Triangular or Trapezoidal
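A short sketch of these fuzzifiers over a small universal set u, again using scikit-fuzzy; the breakpoints are arbitrary, and a singleton is represented here simply as a membership vector that is 1 at a single point:

import numpy as np
import skfuzzy as fuzz

u = np.arange(0, 11, 1)                    # a small universal set u
tri = fuzz.trimf(u, [2, 5, 8])             # triangular fuzzifier
trap = fuzz.trapmf(u, [1, 3, 7, 9])        # trapezoidal fuzzifier
gau = fuzz.gaussmf(u, 5, 1.5)              # Gaussian fuzzifier (mean, sigma)
single = (u == 5).astype(float)            # singleton: 1 at one point, 0 elsewhere
print(tri.max(), trap.max(), gau.max())    # memberships lie in [0, 1]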

9 Fuzzy Control

The term fuzzy control can be described by the following points [1] (see Tables 1 and 2):
• It is a method of incorporating the way a person thinks into a control system.
• It can imitate the logical thought process of human beings, meaning the way a person is able to draw conclusions from known things.
• It is not created to deliver reasoning with precision, but reasoning that is acceptable.
• All situations involving uncertainty about numerous things can be handled with the assistance of fuzzy logic.

Table 1. Showing the comparison between efficiency of two major traffic data collection
methods [5].
Efficiency of traffic data collection methods Minimum value (%) Maximum value (%)
Traditional method 70 95
AI method 95 100

Table 2. Showing increase of the population of vehicles over the years [5].
Year Vehicle population in Goa (lakhs)
2014 10.09
2018 14.10

10 Applications of the System of Fuzzy Logic

There are major applications of fuzzy logic that are discussed below [1]:
• They are utilized in the field of aerospace engineering, where satellites as well as spacecraft require altitude control.
• Large multinational companies use fuzzy logic to evaluate their individual businesses as well as in their decision-support systems.
• They are used to control and handle traffic and the speed control of vehicles in motorized transportation systems.
• They are utilized with neural networks, as the combination is able to imitate the way a human makes decisions, but at a faster rate. This is accomplished by combining data and transforming it into significant information by creating partly true and partly false conditions as part of fuzzy sets.
• They are widely utilized in current control systems such as expert systems.
• They are utilized in natural language processing as well as in numerous demanding applications involving AI.
• In the chemical industries, they are utilized in the processes of chemical distillation, drying and pH control.

11 Result and Discussion

In this section, several graphs explain the difference between the traditional system of traffic data collection and the artificial intelligence system of traffic data collection, showing the latter to be superior in the field of traffic and transportation engineering and implying that several artificial intelligence techniques should be used to improve the traffic system. The section also highlights the drastic increase in the number of vehicles over the years, which makes the conventional traffic system more and more outdated as time passes. Hence, there is a need for fuzzy systems and other AI techniques to be used.

Fig. 2. Efficiency of traffic data collection methods: the traditional method ranges from 70% (minimum) to 95% (maximum), while the AI method ranges from 95% to 100% (graph plotted on the data provided by GoodVision).

From Fig. 2, it is clear that the traditional system of traffic data collection is not very efficient, and it can be said that the traditional traffic system will become more and more outdated in the future. It is also evident that the efficiency of data collection with the assistance of artificial intelligence is significantly higher, which implies that automation and numerous AI methods, including fuzzy logic, can be used in the state of Goa.

Fig. 3. Vehicle population in Goa: 10.09 lakhs in 2014 and 14.1 lakhs in 2018 (graph plotted from data provided by the Goa government traffic department).

With the help of Fig. 3, it is evident that the number of vehicles in the state of Goa is increasing greatly over the years, which also implies that this trend will continue in the future. Hence, fuzzy systems and the automation of the traditional traffic system in the state will help in controlling and managing the traffic.

12 Major Advantages of Fuzzy Logic

There are major advantages of the system of fuzzy logic, highlighted below:
• The above-mentioned system can work with all kinds of input, including inaccurate, unclear or noisy input information.
• Since the algorithm can be described with a small amount of data, only a small amount of memory is needed.
• It is built on set theory, a concept from mathematics, and the reasoning is not at all complex or difficult.
• Its design is very simple and easy to understand.
• It delivers effective solutions to difficult problems in all areas, as it is similar to how a person reasons and takes decisions, but at a faster rate.
• It can help in managing and controlling large volumes of traffic.
• It can assist in making new traffic policies in order to reduce accidents.
• It is broadly used for practical as well as commercial purposes in different fields of study.
• Its reasoning is imprecise but acceptable.
• If used with artificial intelligence, it assists humans in controlling consumer products as well as machines.

13 Major Disadvantages of Fuzzy Logic

There are some major disadvantages of the system of fuzzy logic which are mentioned
below:
• It is ambiguous, as different researchers use dissimilar methods to solve a given problem; hence, there is no systematic way to solve a given problem.
• Since it works on both inaccurate and accurate data, it generally compromises accuracy.
• Evidence of its features is hard to obtain in many situations, as humans are not always able to provide a scientific explanation.

14 Conclusion

This paper focuses on the use of fuzzy systems and why they are needed to control and manage the traffic system in the state of Goa. There are other artificial intelligence techniques that are also necessary to apply to the conventional traffic system in the state of Goa, but they are currently out of the scope of this paper. Fuzzy logic is a part of artificial intelligence and should be applied in the state of Goa. Figure 1 described the basic working of fuzzy logic, and Fig. 2 shows that the efficiency of AI methods in traffic data collection is much better than the traditional one. As observed in Fig. 3, the number of automobiles is increasing rapidly over the years, and it is evident that this trend will continue in the future. Controlling traffic in the state of Goa will therefore become a bigger problem; hence fuzzy systems and other artificial intelligence techniques need to be implemented in the traffic system. This will further help in decreasing road accidents and fatalities to a bare minimum and in controlling as well as managing the road traffic. It will also help in improving traffic safety and the traffic system, along with making people secure on the roads and highways in the state of Goa. There are some major disadvantages of fuzzy logic which cannot be ignored, but with enough precautions they can be minimized; in cases where fuzzy logic is not required, its use should be avoided. For the time being, fuzzy logic can be used in the traffic system of the state of Goa; with more advances in technology, it might be possible to implement better techniques in the traffic system of the state and to modify it. Research is still ongoing in the field of artificial intelligence and its various techniques. It is a time-consuming process, but with future research it is evident that the traffic system can be improved significantly.

References
1. GeeksforGeeks. https://www.geeksforgeeks.org/fuzzy-logic-introduction/amp/
2. Guru99. https://www.guru99.com/what-is-fuzzy-logic.html
3. Traffic Collisions in India. https://en.m.wikipedia.org/wiki/Traffic_collisions_in_India
4. Goa. https://en.m.wikipedia.org/wiki/Goa
5. Analysis of M.V. Accidents in Goa. https://www.goatransport.gov.in/roadsafety (2018)
6. Tunc, I., Yesilyurt, A.Y., Soylmez, M.T.: Different fuzzy logic control strategies for traffic
signal timing control with state inputs. IFAC-PapersOnLine 54(2), 265–270 (2021)
7. Tohan, T.D., Wong, Y.D.: Fuzzy logic-based methodology for quantification of traffic
congestion. Physica A 570, 125784 (2021)
8. Kheder, S., Al Rukaibi, F.: Enhancing pedestrian safety, walkability and traffic flow with
fuzzy logic. Sci Total Environ. 701, 134454 (2020)
9. Wu, B., Cheng, T., Yip, T.L., Wang, Y.: Fuzzy logic based dynamic decision-making
system for intelligent navigation strategy within inland traffic separation schemes. Ocean
Eng. 197, 106909 (2020)
10. Tomar, S., Singh, M., Sharma, G., Arya, K.V.: Traffic management using logistic regression
with fuzzy logic. Procedia Comput. Sci. 132, 451–460 (2018)
11. Rout, R.R., Vemireddy, S., Raul, S.K., Somayajulu, D.V.L.N.: Fuzzy logic-based
emergency vehicle routing: an IoT system development for smart city applications. Comput.
Electr. Eng. 88, 106839 (2020)
12. Jovanovic, A., Kukik, K., Stevanovic, A.: A fuzzy logic simulation model for controlling an
oversaturated diverge diamond interchange and ramp metering system. Math. Comput.
Simul. 182, 165–181 (2021)
13. Alemneh, E., Senouchi, S.-M., Messous, M.-A.: An energy-efficient adaptive beaconing rate
management for pedestrian safety: a fuzzy logic-based approach. Pervasive Mobile Comput.
69, 101285 (2020)
14. AlKheder, S., Almutairi, R.: Roadway traffic noise modelling in the hot hyper-arid Arabian
Gulf region using adaptive neuro-fuzzy interference system. Transport. Res. Part D:
Transport Environ. 97, 102917 (2021)
15. Komsiyah, S., Desvania, E.: Traffic lights analysis and simulation using fuzzy inference
system of Mamdani on three-signaled intersections. Procedia Comput. Sci. 179, 268–280
(2021)
16. Abbasi, F., Zarei, M., Rahmani, A.M.: FWDP: a fuzzy logic-based vehicle weighting model
for data prioritization in vehicular ad hoc networks. Veh. Commun. 100413 (2021)
17. Gupta, R., Chaudhari, O.K.: Application of fuzzy logic in prevention of road accidents using
multi criteria decision alert. Curr. J. Appl. Sci. Technol. 39(36), 51–61 (2020)
18. GoodVision. https://www.walterpmoore.com/traffic-studies
A Fuzzy Based Clustering Approach
to Prolong the Network Lifetime in Wireless
Sensor Networks

Enaam A. Al-Hussain(&) and Ghaida A. Al-Suhail

Department of Computer Engineering, University of Basrah, Basrah, Iraq


enaam.mansor@uobasrah.edu.iq

Abstract. The selection of cluster heads (CHs) in wireless sensor networks


(WSNs) is still a crucial issue to reduce the consumed energy in each node and
increase the network lifetime. Therefore, in this paper an energy-efficient
modified LEACH protocol based on the fuzzy logic controller (FLC) is sug-
gested to find the optimal number of CHs. The fuzzy chance is combined with
the probability of CH selection in LEACH to produce a new selection criterion.
The FLC system depends on two inputs: the residual energy of each node and the node's distance from the base station (sink node). Accordingly, the modified clustering protocol can improve the network lifetime, decrease the consumed energy, and send more information than the original LEACH protocol. The proposed scheme is implemented using the Castalia simulator integrated with OMNET++, and the simulation results indicate that the suggested modified LEACH protocol achieves better energy consumption and network lifetime than the traditional LEACH.

Keywords: Cluster head · FIS · LEACH · Network lifetime · Castalia · OMNET++ · Wireless sensor networks

1 Introduction

Nowadays, the tremendous advancement of sensor equipment technology contributes


to huge implementation capabilities in many fields, such as underwater monitoring,
health monitoring, smart infrastructure monitoring, multimedia surveillance, Internet of
Things (IoT), and other fields of use. Among these, in a targeted area environment,
sensor devices are often distributed randomly over settings that can dynamically
change. Information from such nodes may be sensed, processed, and sent to adjacent
nodes and base station (BS). However, these sensors have many restricted features,
such as limited memory, low computing, low processing, and most importantly, low
power. As sensor nodes have limited resources, the clustering process mechanism is
favored as an energy-efficient technique in WSNs. When networking is restricted to a
few nodes, the strategy can conserve network energy. It would effectively extend the
network’s lifespan by minimizing the consumed energy using multi-hop transmission
and data aggregation [1–5]. In particular, Low Energy Adaptive Clustering Hierarchy
(LEACH) protocol is one of the most well-known protocols [6], which depends on


adaptive clustering to balance energy consumption. It can be considered a benchmark clustering routing protocol in WSNs and MANETs, where the SNs in the network field are separated into clusters. Each cluster has one sensor node referred to as the leader node (or CH), which is selected randomly. On the contrary, though the energy of sensor nodes is conserved by LEACH, its energy efficiency is still somewhat disadvantaged because of random, faster power drainage, particularly where smaller numbers of nodes per cluster are induced by the unequal distribution of nodes among clusters, and because of the time limit due to the use of the TDMA MAC protocol [6–8]. To avoid the random selection of the CHs, to find the optimum number of selected CHs, and to resolve the complexity of the relation between the network lifetime and the other parameters of the sensor nodes, many approaches have been developed, such as (i) the Fuzzy Inference System (FIS) [9], (ii) Adaptive Neural-Network Fuzzy Systems [10], and (iii) metaheuristic intelligent algorithms like the swarm algorithms ABC and ACO, and flower pollination [11–13]. Therefore, a new modified LEACH protocol via a fuzzy logic controller is presented in this paper, which efficiently improves the network lifespan and decreases the number of dead nodes during its rounds. The modified LEACH protocol aims to select the CHs based upon the Type-1 Fuzzy Inference Method (T1-FIS). The CHs are chosen by considering two parameters, (i) the residual energy (REN) and (ii) the node's distance from the base station (DBS), based on a threshold value. The rest of this paper is structured as follows. Firstly, the related works are addressed in Sect. 2. In Sect. 3 the modified LEACH protocol is described in detail. Section 4 displays and discusses the simulation results. Finally, in Sect. 5, the conclusion is drawn.

2 Related Works

LEACH is a routing protocol that utilizes a clustering technique to create random, adaptive, and self-configured clusters. In LEACH, all sensor nodes are grouped into clusters, each of which has a leader node (or CH) that handles the TDMA schedule and sends the aggregated information to the BS. Since only the CH sends data to the BS, the network's energy consumption is significantly reduced. In each round of the LEACH protocol, the CH is elected at random, and the probability of being CH is proportional to the number of nodes. After several rounds, the chance of a low- or high-energy sensor node being CH is the same, which contributes to an energy imbalance among the CHs across the whole structure; consequently, the lifetime of the network is reduced [1, 2]. To improve the LEACH protocol, and given the complexity of describing the relation between the network lifetime and the other parameters of the nodes, a FIS is one of the most well-known intelligent schemes that can be nominated to solve such a problem. The reasons are that it does not need precise system information and that it is classified as a powerful tool among Artificial Intelligence (AI) methods, able to build efficient solutions by combining many input data parameters and then providing the desired cost criterion. Much research has been devoted to clustering parameters based on fuzzy rules to determine and choose network CHs. For instance, Abidi et al. [1] introduced a fuzzy CH selection algorithm based on the LEACH protocol using three input parameters (remaining energy, neighbours alive, and distance from the BS) to select the CH. In [2], Ayati et al. designed a three-level clustering method with numerous inputs at each level of clustering: remaining energy and centrality, transmission quality and distance from the BS, overall delay, and denial-of-service (DOS) attacks. Al-Kashoash et al. [3] also
proposed a FIS for CH selection based on two inputs: Residual Energy and Received
Signal Strength (RSSI) value to increase network lifetime and packet transmission rate.
The proposed algorithm demonstrated a significant increase in network lifetime compared to the LEACH protocol. The authors in [4–6] propose efficient solutions for
balancing the energy depletion of the SNs and extending the life of the WSN. These
methods make use of three fuzzy variables: node residual energy, distance to BS, and
distance to CH. The simulation results demonstrate that, when compared to the LEACH protocol, the proposed algorithms significantly improve the energy efficiency and network lifetime of WSNs. Additionally, Lee et al. [7] suggested a clustering scheme for mobile sensor nodes that utilizes three inputs: residual energy, movement speed, and pause time. The issue of energy consumption in WSNs has remained a focus of research in recent years, and as a result many current studies continue to work in this direction by developing efficient methods for increasing the network's efficiency, such as the authors in [8–10], who employ fuzzy clustering algorithms to improve network reliability and lifespan. Many other intelligent methods have also been suggested, in terms of adaptive neural-network fuzzy systems and metaheuristic intelligent algorithms like the swarm algorithms ABC and ACO, and flower pollination [11–14]. Additionally, Balaji et al. [15] recommended multi-hop data packet exchange, with the data packets eventually being sent to the BS. When packets are transmitted from the source sensor to the BS via the CH, they are routed using type-1 fuzzy logic with three parameters, which predicts with a high degree of confidence the nodes that are close to the BS. On the other hand, in [16], the selection of
the fuzzy logic cluster head is based on three inputs: remaining energy, node density,
and distance to the BS (sink). Due to the fact that WSNs suffer from a number of issues
related to energy consumption and network scalability as a result of their complexity
and nonlinear behavior, some recent research has focused on automating the con-
struction and optimization of the rule base table in a FIS. For example, Tran et al. [17]
improve energy efficiency in large-scale sensor networks by using energy-based
multihop clustering in conjunction with the Dijkstra algorithm to determine the shortest
path. Meanwhile, Fanian et al. [18] propose a Fuzzy Multi-hop Cluster based routing
Protocol and an intelligent scheme called the Shuffled Frog Leaping Algorithm for
improving the rule base table in a FIS. Additionally, the authors in [19, 20] suggest that
an energy-efficiency and reliability-based cluster head selection scheme would be an
ideal way to improve the overall accomplishment of WSNs.

3 Modified LEACH Protocol Design

In this section, a selection approach for cluster heads (CHs) using the fuzzy inference scheme is suggested to improve the network lifetime and reduce the energy consumption of the LEACH protocol in WSNs. The organization of this section is as follows: (i) the network model assumptions are stated first, (ii) the details of the design of the fuzzy logic controller are presented, and (iii) finally the operation of the suggested protocol is given.

3.1 Network Model


The criteria required to describe the network model based on the proposed
Fuzzy LEACH protocol are considered as follows:
1. N sensor nodes are considered uniformly distributed over an M × M area of interest, and all the nodes and the BS are stationary (non-mobile).
2. All SNs have the capability to sense, aggregate, and forward the data to the BS (i.e.,
acts as a sink node).
3. In the network, the nodes are non-chargeable and are homogeneous in initial energy
terms.
4. The Sink Node (BS) is situated in the center of the network field. It is often assumed that the communication links to the other nodes are symmetrical, so that the data rate and energy consumption of any two nodes are symmetrical in terms of packet transmission.
5. The nodes are operated in power control mode, based on the receiving distance
from the SN.
6. At the Sink node (BS), the selected CH nodes would not be selected again in any
new round of selection.
7. In each round of the Set-Up phase, the cluster heads are still selected randomly but
with extra fuzzy logic criteria to enhance the CHs selection process of the LEACH
protocol.

3.2 Design of Fuzzy Logic Controller (FLC)


There are commonly four key steps in the FLC [14]:
1. Fuzzification: Convert the crisp values of each input variable to fuzzy values in the form of membership functions. Triangular, trapezoidal, and Gaussian membership functions are the most well-known types of MFs, but to avoid discontinuities in the input domain, the Gaussian membership function is proposed, which is defined as in Eq. 1:

f(x; \sigma, c) = \exp\left( -\frac{(x - c)^2}{2\sigma^2} \right) \qquad (1)

where c is the mean and \sigma is the standard deviation.
2. Rule evaluation: Apply the output of step 1 to the fuzzy rules to evaluate the fuzzy output. A typical rule of the Mamdani fuzzy model is used due to its widespread acceptance:
Rn: if x1 is X1 and x2 is X2 then Y is y.
3. Aggregation: Integrate each rule's output into a single fuzzy set. Many aggregation methods can be used, such as (i) max (maximum), (ii) sum (sum of the rules' output sets), and (iii) probor (probabilistic OR).

4. Defuzzification: Transform the fuzzy set values into a single crisp number. Many types of defuzzification methods exist; the weighted average method and the centroid method (sometimes called the center of area) are the most popular methods for defuzzification. An illustrative sketch of such a controller is given after this list.
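The paper builds this controller in Xfuzzy and exports C++ code for Castalia (see below); purely for illustration, the same two-input Mamdani system with Gaussian MFs (Eq. 1) and the rule base of Table 2 can be sketched in Python with scikit-fuzzy. The universes, means and standard deviations here are assumptions, since the tuned values are only shown graphically in Fig. 2:

import numpy as np
import skfuzzy as fuzz
from skfuzzy import control as ctrl

ren = ctrl.Antecedent(np.arange(0, 3.01, 0.01), 'REN')        # residual energy (J)
dbs = ctrl.Antecedent(np.arange(0, 71, 1), 'DBS')             # distance to BS (m)
chance = ctrl.Consequent(np.arange(0, 1.01, 0.01), 'chance')  # CH chance
for term, c in zip(['Low', 'Medium', 'High'], [0.0, 1.5, 3.0]):
    ren[term] = fuzz.gaussmf(ren.universe, c, 0.6)            # Eq. (1) Gaussians
for term, c in zip(['Close', 'Average', 'Far'], [0, 35, 70]):
    dbs[term] = fuzz.gaussmf(dbs.universe, c, 14)
for term, c in zip(['VL', 'L', 'M', 'H', 'VH'], [0.0, 0.25, 0.5, 0.75, 1.0]):
    chance[term] = fuzz.gaussmf(chance.universe, c, 0.1)
table = [('Low', 'Close', 'M'), ('Low', 'Average', 'L'), ('Low', 'Far', 'VL'),
         ('Medium', 'Close', 'H'), ('Medium', 'Average', 'M'), ('Medium', 'Far', 'L'),
         ('High', 'Close', 'VH'), ('High', 'Average', 'H'), ('High', 'Far', 'M')]
rules = [ctrl.Rule(ren[e] & dbs[d], chance[o]) for e, d, o in table]  # Table 2
fis = ctrl.ControlSystemSimulation(ctrl.ControlSystem(rules))
fis.input['REN'], fis.input['DBS'] = 2.4, 20.0                # an example node
fis.compute()
print(fis.output['chance'])                                   # defuzzified chance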
Figure 1 demonstrates the fuzzy inference system for selecting CHs. In this model, the Fuzzy Logic Controller (FLC) is assessed using two parameters, the residual energy and the node distance to the base station, and it produces the output parameter, namely the fuzzy chance. Later, to improve the cluster head selection, the fuzzy chance and the LEACH probability criterion are combined to produce a new chance of finding these CHs.

Fig. 1. The fuzzy inference system for the CHs selection

In our proposal, we use the Mamdani approach as the FIS because of its simplicity. The role of the Fuzzy Logic Controller (FLC) is to measure the probability of CH selection based on two input descriptors, as shown in Table 2: (i) the residual energy (REN) and (ii) the distance between each node and the base station (DBS). Thus, this controller is designed with two designated inputs (REN and DBS) and one output (chance). The linguistic variables and the fuzzy logic rule base are shown in Tables 1 and 2.

Table 1. Inputs/output linguistic variables.

Parameter              Linguistic variables
Residual energy (REN)  Low, Medium, High
Distance to BS (DBS)   Close, Average, Far
Chance                 VL, L, M, H, VH

Table 2. Fuzzy rules.

Residual energy (REN)  Distance to BS (DBS)  Chance
Low                    Close                 M
Low                    Average               L
Low                    Far                   VL
Medium                 Close                 H
Medium                 Average               M
Medium                 Far                   L
High                   Close                 VH
High                   Average               H
High                   Far                   M

In the FLC system, Gaussian membership functions are employed rather than triangular or trapezoidal membership functions to represent the linguistic variables and to avoid discontinuities when the input MFs do not cover each input domain completely. Figure 2 depicts the membership functions of REN and of DBS, respectively.

Fig. 2. Fuzzy inference system of proposed algorithm (a) MFs of input variable REN, (b) MFs
of input variable DBS, and (c) MFs of output variable chance.

In addition, the fuzzy inference system (FIS) is simulated using the Xfuzzy tool, a development environment that combines many powerful tools to design and tune the parameters of a fuzzy system. It also offers an excellent ability to generate C++ code for this system, which can be integrated with the Castalia simulator as in Fig. 3, which demonstrates the simple architecture of the modified LEACH protocol using Xfuzzy tools and the Castalia simulator integrated with OMNET++. The graphical user interface of Xfuzzy shows the specifications by means of drop-down structures, so that the complete system or any rule base can be selected as the active specification, in a few stages: (i) Description stage: create a preliminary description of the system by using two tools (xfedit and xfpkg) that assist in the description of fuzzy systems. (ii) Verification stage: study the behavior of the fuzzy system under development, observing the values of the various internal variables for input values over the given range. (iii) Tuning stage: adjust the different MFs. (iv) Synthesis stage: generate a system representation that can be used externally, such as with xfcpp, which is used to develop a C++ description.

Fig. 3. Simple architecture for the modified LEACH protocol using Xfuzzy tools and the OMNET++/Castalia simulator.

3.3 Operation of the Modified LEACH Protocol


In this subsection, we describe the proposed CH selection algorithm using T1-FIS, depending on the residual energy (REN) and the distance from the BS (DBS) of each node. This algorithm performs its operation in rounds, based on the LEACH protocol, according to specific, defined criteria. Thereby, each round starts with the following steps of the proposed fuzzy-based algorithm:
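The constraints of Sects. 3.1–3.2 pin down what each round must do: evaluate the fuzzy chance for every node, combine it with the LEACH probability criterion, and never re-elect a previous CH. The sketch below is a hedged reconstruction under those constraints only; the multiplicative combination with the standard LEACH threshold T(n), the node field names, and the CH fraction p = 0.05 (5 clusters per 100 nodes, as in Table 3) are all assumptions, and the fis object is the scikit-fuzzy sketch given earlier:

import random

def leach_threshold(p, r):
    # Standard LEACH threshold T(n) for round r with desired CH fraction p.
    return p / (1 - p * (r % int(1 / p)))

def select_cluster_heads(nodes, r, fis, p=0.05):
    # Each node dict carries 'energy', 'dist_bs' and 'was_ch' (placeholder names).
    heads = []
    for node in nodes:
        if node['was_ch']:                      # criterion 6: no re-election
            continue
        fis.input['REN'] = node['energy']       # evaluate the FLC of Fig. 1
        fis.input['DBS'] = node['dist_bs']
        fis.compute()
        new_chance = leach_threshold(p, r) * fis.output['chance']
        if random.random() < new_chance:        # combined selection criterion
            node['was_ch'] = True
            heads.append(node)
    return heads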

4 Simulation Results

This section discusses the performance assessment of the modified LEACH of Sect. 3 using the Castalia simulator and OMNET++. The proposed algorithm is examined on a network consisting of 100 SNs spread uniformly over an area of 100 × 100 m². We consider the base station to be located at (50, 50). The initial energy of all nodes is set to 3 J. The obtained results are compared with the original LEACH protocol. The environmental network parameters utilized in the simulation are given in Table 3.

Table 3. Simulation parameters.

Parameter          Value          Parameter            Value
Network size       100 × 100 m²   Initial energy       3 J
No. of nodes       100            Simulation time      300 s
No. of clusters    5              Round time           20 s
Location of BS     (50, 50) m     Packet header size   25 bytes
Node distribution  Random         Data packet size     2000 bytes
Energy model       Battery        Bandwidth            1 Mbps

Figure 4 illustrates the First Node Dead (FND), Half Nodes Dead (HND), and Last Node Dead (LND) metrics. The results demonstrate that the modified LEACH outperforms the original LEACH protocol by about 50.94%, 14.1667%, and 13.259% in terms of FND, HND, and LND respectively.
Meanwhile, Figs. 5 and 6 present the energy consumption of the nodes and the total number of alive nodes, measured against the number of network communication rounds, to evaluate the proposed protocols. The results show that the modified LEACH outperforms the traditional LEACH in terms of consumed energy and the number of alive nodes per round, by about 13.69% and 15.29% over the first 100 s.

Fig. 4. FND, HND, and LND.

Fig. 5. Total energy consumption. Fig. 6. Total no. of alive nodes.



5 Conclusions

Due to the increase and expansion of WSN applications, particularly in the last few years, it has become important to find effective solutions to WSN challenges. Energy saving is one of the primary challenges confronting these networks. In this paper, a CH selection scheme is proposed using a fuzzy logic system (T1-FIS). It depends on the residual energy and the node distance from the BS in order to maximize the network lifetime and decrease the energy consumption per sensor node. The modified LEACH protocol is developed and simulated in Castalia (v3.2) and OMNET++ (v4.6). The results showed that the modified protocol can successfully decrease the energy consumption and increase the network lifespan compared to the original LEACH and other existing fuzzy-based LEACH proposals. For future work, the proposed scheme can involve different parameters in the design of the fuzzy inference system, such as centrality, SNR, RSSI, and packet size. Moreover, Artificial Intelligence (AI) algorithms such as GA, PSO, ABC, and FPA are also eligible to optimize the QoS performance of the LEACH protocol.

References
1. Abidi, W., Ezzedine, T.: Fuzzy cluster head election algorithm based on LEACH protocol
for wireless sensor networks. In: 13th International Wireless Communications and Mobile
Computing Conference (IWCMC), pp. 993–997. IEEE (2017)
2. Ayati, M., Ghayyoumi, M.H., Keshavarz-Mohammadiyan, A.: A fuzzy three-level clustering
method for lifetime improvement of wireless sensor networks. Ann. Telecommun. 73(7–8),
535–546 (2018). https://doi.org/10.1007/s12243-018-0631-x
3. Al-Kashoash, H.A., Rahman, Z.A.S., Alhamdawee, E.: Energy and RSSI based fuzzy
inference system for cluster head selection in wireless sensor networks. In: Proceedings of
the International Conference on Information and Communication Technology, pp. 102–105
(2019)
4. Abbas, S.H., Khanjar, I.M.: Fuzzy logic approach for cluster-head election in wireless sensor
network. Int. J. Eng. Res. Adv. Technol. 5, 14–25 (2019)
5. Mahboub, A., Arioua, M., Barkouk, H., El Assari, Y., El Oualkadi, A.: An energy-efficient
clustering protocol using fuzzy logic and network segmentation for heterogeneous
WSN. Int. J. Electr. Comput. Eng. 9, 4192 (2019)
6. Kwon, O.S., Jung, K.D., Lee, J.Y.: WSN protocol based on LEACH protocol using fuzzy.
Int. J. Appl. Eng. Res. 12, 10013–10018 (2017)
7. Lee, J.S., Teng, C.L.: An enhanced hierarchical clustering approach for mobile sensor
networks using fuzzy inference systems. IEEE Internet Things J. 4, 1095–1103 (2017)
8. Phoemphon, S., So-In, C., Aimtongkham, P., Nguyen, T.G.: An energy-efficient fuzzy-based
scheme for unequal multihop clustering in wireless sensor networks. J. Ambient. Intell.
Humaniz. Comput. 12(1), 873–895 (2020). https://doi.org/10.1007/s12652-020-02090-z
9. Al-Husain, E.A., Al-Suhail, G.A.: E-FLEACH: an improved fuzzy based clustering protocol
for wireless sensor network. Iraqi J. Electri. Electron. Eng. 17, 190–197 (2021)
10. Lata, S., Mehfuz, S., Urooj, S., Alrowais, F.: Fuzzy clustering algorithm for enhancing
reliability and network lifetime of wireless sensor networks. IEEE Access 8, 66013–66024
(2020)

11. Thangaramya, K., Kulothungan, K., Logambigai, R., Selvi, M., Ganapathy, S., Kannan, A.:
Energy aware cluster and neuro-fuzzy based routing algorithm for wireless sensor networks
in IoT. Comput. Netw. 151, 211–223 (2019)
12. Sharma, N., Gupta, V.: Meta-heuristic based optimization of WSNs energy and lifetime-a
survey. In: 2020 10th International Conference on Cloud Computing, Data Science &
Engineering (Confluence), pp. 369–374. IEEE (2020)
13. Yuvaraj, D., Sivaram, M., Mohamed Uvaze Ahamed, A., Nageswari, S.: An efficient lion
optimization based cluster formation and energy management in WSN based IoT. In: Vasant,
P., Zelinka, I., Weber, G.W. (eds.) ICO 2019. AISC, vol. 1072, pp. 591–607. Springer,
Cham (2020). https://doi.org/10.1007/978-3-030-33585-4_58
14. Devika, G., Ramesh, D., Karegowda, A.G.: Swarm intelligence–based energy‐efficient
clustering algorithms for WSN: overview of algorithms, analysis, and applications. Swarm
Int. Optim. Algorithms Appl., 207–261 (2020)
15. Balaji, S., Julie, E.G., Robinson, Y.H.: Development of fuzzy based energy efficient cluster
routing protocol to increase the lifetime of wireless sensor networks. Mob. Netw. Appl. 24,
394–406 (2019)
16. Rajput, A., Kumaravelu, V.B.: FCM clustering and FLS based CH selection to enhance
sustainability of wireless sensor networks for environmental monitoring applications.
J. Ambient. Intell. Humaniz. Comput. 12(1), 1139–1159 (2020). https://doi.org/10.1007/
s12652-020-02159-9
17. Tran, T.N., Van Nguyen, T., Bao, V.N.Q., An, B.: An energy efficiency cluster-based
multihop routing protocol in wireless sensor networks. In: International Conference on
Advanced Technologies for Communications (ATC), pp. 349–353. IEEE (2018)
18. Fanian, F., Rafsanjani, M.K.: A new fuzzy multi-hop clustering protocol with automatic rule
tuning for wireless sensor networks. Appl. Soft Comput. 89, 106115 (2020)
19. Murugaanandam, S., Ganapathy, V.: Reliability-based cluster head selection methodology
using fuzzy logic for performance improvement in WSNs. IEEE Access 7, 87357–87368
(2019)
20. Van, N.T., Huynh, T.T., An, B.: An energy efficient protocol based on fuzzy logic to extend
network lifetime and increase transmission efficiency in wireless sensor networks. J. Intell.
Fuzzy Syst. 35, 5845–5852 (2018)
Visual Expression Analysis from Face Images
Using Morphological Processing

Md. Habibur Rahman1(&), Israt Jahan1, and Yeasmin Ara Akter2
1
Department of Computer Science and Engineering, East Delta University,
Chattogram, Bangladesh
2
School of Science, Engineering and Technology, East Delta University,
Chattogram, Bangladesh
yeasmin.a@eastdelta.edu.bd

Abstract. Visual expression analysis is an active field of computer vision where the emotion of any given scenario is analyzed as anger, disgust, fear, surprise, sadness, contempt, happiness and many more. Human facial expression can be used as an effective tool in this research arena. Detecting emotion requires identifying whether or not there is a face in the image. Most systems have been prepared with grayscale images, but this manuscript proposes using MTCNN, a face detector that recognizes faces from RGB images. The methodology includes RGB color images from a customized Flickr dataset. The face is considered the region of interest (ROI) of any given image. The ROI is further converted into binary images after evaluating combinations of morphological operations (erosion, dilation, opening and closing), which selects the best morphological technique, i.e., subtracting the eroded image from the gray image, for retrieving facial features. After extracting the features, Random Forest, Logistic Regression, SVM, xGBoost, GBM and CNN classifiers have been implemented to find the best classifier. Consequently, based on the performance analysis, CNN is the best model, with 99.71% train accuracy and 98.01% test accuracy, for classifying four facial expressions: 'anger', 'happiness', 'sadness' and 'surprise'.

Keywords: Computer vision · Morphological operations · MTCNN · Image processing · CNN

1 Introduction

The human brain is a complicated system that learns things by remembering patterns in given data. It is possible to develop a technical approach that can learn patterns and participate in decision making like humans. The input data can be of various types, for example, image data, textual data, etc. A system developed to understand and participate in decision making based on video or image data is called computer vision [1]. Visual expression analysis is an application of computer vision that extracts information from an image to determine the emotion, since emotion is an essential tool for analyzing a person's feelings. In this regard, human facial expression can be used as a pivotal point to detect emotion. The changes in facial patterns during each expression convey information about a person's mood. The eyes, lips, and nose hold most of the data that helps to recognize the human sentimental condition. Hence, this paper focuses on morphological operations to extract information from the organs mentioned above to analyze facial expression.

2 Related Works

Expressions can be easily determined from speech and face movement. However, researchers are attracted to differentiating expressions from images because of the wide availability of image and video data. Byungsung Lee et al. introduced a real-time method to recognize facial expressions from video images for a well-being life care system that gives an emotional care service [2]. They used PCA and template matching algorithms for face detection. To detect face candidates, an HT skin colour model, mean filtering and morphological operations were applied. S. L. Happy et al. offered a supervised real-time classification algorithm that operates on grayscale frontal face images [3]. They employed a Haar classifier for face detection, LBPH for feature extraction and PCA for classification. The proposed system fails to detect emotions from rotated and occluded images.
A hybrid feature extraction and facial expression recognition method is proposed by Maliha Asad et al. [4], where features are extracted by applying PCA, Fisher LDA and HOG individually, and recognition is done by an SVC classifier. The extended CK+ dataset was used to train 7 emotions, but only 5 emotions were tested due to the poor detection rate. Jayalekshmi J and Tessy Mathew trained their proposed system on the JAFFE dataset [5]. Their system employed Zernike moments, Local Binary Patterns (LBP) and the Discrete Cosine Transform (DCT) for feature extraction.
Advait Apte et al. applied morphological operations, mean and standard deviation on the Cohn-Kanade AU-coded dataset for pre-processing and feature extraction. They achieved 92% accuracy by applying scaled conjugate gradient backpropagation in a neural network [6]. Allen Joseph and P. Geetha proposed a facial expression system based on facial geometry and trained it with the KDEF dataset [7]. They used the discrete wavelet transform and a fuzzy combination to enhance images, modified mouthmap and eyemap algorithms to extract the mouth region, and neural networks to classify facial expressions. The Viola-Jones algorithm is used as the face detector in [3–6] and [7].
Guan-Chun Luh et al. recently utilized the Yolo, YoloV2 and YoloV3 deep neural networks to study facial expression-based emotion recognition on the JAFFE, RaFD and CK+ datasets [8]. G. G. Lakshmi Priya and L. B. Krithika introduced a GFE-HCA approach for facial expression recognition in which the MMI dataset is used with five emotions [9]. They also detected faces using the Viola-Jones algorithm but extracted only edge-based and shape-based features, which were then sent to a self-organized map based NN classifier for emotion classification.
Based on the state-of-the-art works, it is clear that most existing research depends on grayscale images only. The Viola-Jones algorithm is commonly used as the face detector and fails to detect faces rotated by more than 120°. To detect faces from RGB images, however, we discuss a different approach in which we apply MTCNN, Haar cascade, morphological operations, several machine learning algorithms and a CNN classifier for emotion classification.

3 Dataset

In today's era, images are not laborious to obtain; consequently, image datasets are also readily available. This research work intends to interpret facial expressions by detecting the human face in input images. Most existing image datasets contain cropped facial images. As a result, a dataset was adopted by selecting the necessary images from the album "The face we make" available on Flickr [10]. Our dataset consists of a total of 804 RGB images, of which 160 images are for "Anger", 216 for "Happiness", 204 for "Sadness", and 224 for "Surprise". Figure 1 shows a sample of our collected data.

Fig. 1. Samples of the dataset

4 Methodology

Our proposed system consists of two phases. The first phase includes the Haar cascade and MTCNN algorithms to detect the face, and the second phase performs multiple classifications to recognize expressions. Combining these two phases, the overall system design is illustrated in Fig. 2:

Fig. 2. Overview of working procedure of proposed system



4.1 Face Detection


In this research, the Haar cascade and Multi-task Cascaded Convolutional Neural Network (MTCNN) algorithms are used for detecting the human face in the images. The Haar cascade is a face detection algorithm proposed by Viola and Jones, trained on 4,960 images of human faces [11]. This algorithm requires frontal human faces and works best on grayscale images. To make the dataset compatible with the detector, the images need to be converted into grayscale. The detection rate of the Haar cascade algorithm is 95.27% on this dataset. Another approach for face detection is MTCNN, a deep learning approach described by Kaipeng Zhang et al. in 2016 [12]. Unlike the Haar cascade algorithm, MTCNN can distinguish a face in RGB images. On this dataset, the detection rate of MTCNN is almost 100%. In this paper, MTCNN is proposed as the face detector for our system since it has a higher detection rate than the Viola-Jones algorithm. The facial expression analysis depends on the face detection result: the localized faces are used in the second phase of the whole work for further classification. The output images of the Viola-Jones algorithm and MTCNN are shown in Fig. 3.

Fig. 3. a) Original image b) Detected face using Viola-Jones algorithm c) Detected face using
MTCNN.

4.2 Pre-processing
Pre-processing adjusts the raw data for further consideration. Among the numerous pre-processing methods, we employed the following techniques:

4.2.1 Cropping and Augmentation


As preprocessing, we first cropped the image according to the bounding box (generated as in Sect. 4.1) as the region of interest (ROI). Thereafter, two augmentation techniques, "horizontal flip" and "rotation", have been used to increase the number of data samples in the dataset by modifying the data [13]. Additionally, images were resized to 120 × 120 × 1 because the proposed system requires a fixed width and height for each image. Figure 4 shows a sample ROI and the output of the augmentation techniques.
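The following is a hedged OpenCV sketch of the resizing and the two augmentations described above; the file name and the 15° rotation angle are illustrative assumptions.

import cv2

# Resize the cropped ROI to the fixed 120 x 120 grayscale input size
roi = cv2.resize(cv2.imread("roi.png", cv2.IMREAD_GRAYSCALE), (120, 120))
flipped = cv2.flip(roi, 1)  # horizontal flip
h, w = roi.shape
matrix = cv2.getRotationMatrix2D((w / 2, h / 2), 15, 1.0)  # rotate about the center
rotated = cv2.warpAffine(roi, matrix, (w, h))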

Fig. 4. a) Region of Interest or ROI b) Flipped image of ROI c) Rotated image of ROI.

4.2.2 Morphological Processing


Morphological operations are mathematical operations performed on an image to extract features and obtain the most distinctive facial emotion information. Combinations of four morphological operations, namely erosion, dilation, opening and closing, are experimented with in this paper to select the best technique for facial feature extraction. Erosion enlarges dark regions and shrinks bright regions, whereas dilation enlarges bright regions and shrinks dark regions. Opening is defined as an erosion followed by a dilation, and closing is defined as a dilation followed by an erosion. These operations work only on grayscale or binary images. Hence, the system performs RGB to grayscale conversion on the ROI. The eyes, lips and nose are the key elements for dissecting each expression. Erosion followed by a subtraction extracts features from these areas: erosion removes pixels from these areas, and subtraction of the eroded image from the grayscale image brings back the detached pixels.

Output Image = Subtracted Image = Gray Image − Eroded Image    (1)

There are a few more experimented combinations that can extract facial features:

Output Image = E = Erosion + Opening    (2)

D = Dilation + Closing    (3)

Output Image = Gray Image + (E − D)    (4)

Output Image = Gray Image + E    (5)

Output Image = Gray Image − (E + D)    (6)

Output Image = Gray Image − (E − D)    (7)

The output images are the final images for each technique that participate in classification. The images are converted into image vectors using the image-to-feature-vector method. Thus, the system extracts features from the eyes, lips and nose to train the classifiers.
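As a concrete illustration of Eq. (1), the sketch below erodes the grayscale ROI and subtracts the result from it, so the pixels removed around the eyes, lips and nose become the extracted features; the file name and the 3 × 3 structuring element are assumptions.

import cv2
import numpy as np

gray = cv2.resize(cv2.imread("roi.png", cv2.IMREAD_GRAYSCALE), (120, 120))
kernel = np.ones((3, 3), np.uint8)                # structuring element (assumed size)
eroded = cv2.erode(gray, kernel, iterations=1)    # shrinks bright regions
features = cv2.subtract(gray, eroded)             # Eq. (1): gray image - eroded image
vector = features.flatten()                       # image-to-feature-vector step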
The output images of the experimented combinations of morphological operations
are shown below (Fig. 5):

Fig. 5. a) Eroded image of ROI. Output image of- b) Subtracted image. c) Erosion + Opening.
d) Gray image + E – D. e) Gray image + E. f) Gray image – E + D.

4.3 Classical Machine Learning Approaches


Machine learning classification algorithms generate predictions using computational statistics and methods delivered by mathematical optimization based on the training data. In this paper, the Random Forest, Logistic Regression and Support Vector Machine classifiers, and boosting classifiers such as XGBoost and the Gradient Boosting Machine, are used for classification. Hyper-parameter tuning is the process of finding the best parameters of the model architecture to obtain the ideal model for a system [14]. These methods are trained on the collected dataset. The result analysis focuses on the accuracy and overfitting observed throughout the hyper-parameter tuning process.
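A minimal sketch of such tuning with scikit-learn's GridSearchCV follows; the parameter grid and the placeholder data are illustrative assumptions, not the paper's exact settings.

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression

X_train = np.random.rand(100, 120 * 120)       # placeholder feature vectors
y_train = np.random.randint(0, 4, 100)         # placeholder labels for 4 emotions
param_grid = {"C": [0.01, 0.1, 1, 10], "solver": ["lbfgs", "liblinear"]}
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)  # best hyper-parameters found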

4.4 Deep Learning (CNN) Approach


Convolutional Neural Network (CNN) is a deep neural network usually applied to interpret visual imagery. Deep learning models are constructed using layers. The layers between the input and output layers are called hidden layers and are connected sequentially. They contain neurons, weights and biases that help generate feedback and update the features. Our constructed CNN uses max pooling in the hidden layers, a stride value and padding on an input image to extract features. The shape of an output layer is computed as
Output shape = ((m − f + 1)/S + 1) × ((n − f + 1)/S + 1)    (8)
Here, m × n is the input size, f × f is the kernel size, and S is the stride value. All layers are summed up, and the total features are the trainable parameters of the model. The model learns from the features using an optimizer. In the final layer, the activation function classifies the images by retrieving the highest probability from each class's probability distribution [15]. In the proposed CNN model, "adam" is used as the optimizer with a learning rate, and the softmax function is used as the activation function for classification. Based on the classification result, the model improves using backpropagation to reduce the error and repeats the whole process over epochs. The combination of layers of the Convolutional Neural Network (CNN) that works best for our system is illustrated in Fig. 6.
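A hedged Keras sketch consistent with the final model described in Sect. 5 (four convolutional layers, a 0.4 dropout, global average pooling, a four-way softmax and the Adam optimizer) is given below; the filter counts are assumptions, since the exact layer widths appear only in Fig. 6.

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(120, 120, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.Dropout(0.4),                    # dropout value reported in Sect. 5
    layers.GlobalAveragePooling2D(),        # stabilizing layer reported in Sect. 5
    layers.Dense(128, activation="relu"),
    layers.Dense(4, activation="softmax"),  # anger, happiness, sadness, surprise
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])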

5 Result Analysis

One of the major focuses of this research is implementing and comparing the outcomes of classical machine learning and deep learning strategies. Hence, we have demonstrated the result analysis in two ways.

Fig. 6. CNN layers of sequential model



Firstly, the machine learning classifiers' performance analysis is specified through experiments that show the impact of the augmentation techniques and the combinations of morphological operations (discussed in Sect. 4.2.2) on the classifiers. For the augmentation techniques, based on the first morphological operation (Eq. 1), the investigation is performed in three ways: without applying augmentation, applying both flip and rotation, and applying only the flip method. Every experiment is analyzed through the hyper-parameter tuning phase to see the changes in the classifiers' results. Table 1 depicts the results (accuracy) of "without applying augmentation" and "applying both augmentation techniques".

Table 1. Experimental results of "without applying augmentation" and "applying both augmentation techniques".

Machine learning classifiers | Pre-tuning: without augmentation (%) | Pre-tuning: flip and rotation (%) | Post-tuning: without augmentation (%) | Post-tuning: flip and rotation (%)
Logistic regression | 90.54 | 89.39 | 93.03 | 89.72
SVM | 89.55 | 87.89 | 89.55 | 87.89
Random Forest | 92.54 | 90.38 | 91.54 | 92.21
XGBoost | 88.56 | 81.43 | 91.54 | 91.04
Gradient Boosting | 90.05 | 91.54 | 91.54 | 90.88

Table 1 clearly indicates that the accuracy of all classifiers (except the Gradient Boosting Machine) is higher for the experiment done without applying any augmentation techniques. In both cases, the training accuracy is almost 100%, which means that the classifiers overfit. A third experiment, applying flip only, was carried out to overcome this issue.
Table 2 shows the results generated before and after the hyper-parameter tuning process. All the hyper-parameters selected for each classifier are discussed in Sect. 4.3. The results obtained through this test help to reduce overfitting. The Logistic Regression classifier gives the highest accuracy with the lowest overfitting rate.

Table 2. Experimental results of "applying the flip augmentation technique" before (Pre) and after (Post) hyper-parameter tuning.

Classifiers | Accuracy (%) Pre / Post | Precision (%) Pre / Post | Recall (%) Pre / Post | F1-score (%) Pre / Post
Logistic regression | 95.52 / 97.51 | 95.5 / 97.81 | 95.5 / 97.47 | 95.5 / 97.62
SVM | 96.52 / 96.51 | 96.5 / 96.51 | 96.5 / 96.38 | 96.5 / 96.54
Random Forest | 95.27 / 96.01 | 95.3 / 95.87 | 95.3 / 95.78 | 95.3 / 96.02
Gradient Boost | 94.28 / 94.78 | 94.3 / 94.66 | 94.3 / 94.51 | 94.3 / 94.88
XGBoost | 86.57 / 95.77 | 87.0 / 95.75 | 86.6 / 95.56 | 86.4 / 95.81

Table 3. Experimental results of the morphological techniques before (Pre) and after (Post) hyper-parameter tuning.

Classifiers | Erosion + Opening (%) Pre / Post | Gray image + E − D (%) Pre / Post | Gray image − E + D (%) Pre / Post | Gray image + E (%) Pre / Post
Logistic Regression | 95.27 / 93.53 | 90.80 / 90.30 | 90.05 / 89.55 | 91.29 / 91.04
SVM | 93.53 / 93.53 | 88.81 / 88.81 | 90.05 / 90.05 | 90.55 / 90.55
Random Forest | 95.02 / 94.03 | 91.04 / 92.79 | 90.55 / 90.05 | 90.30 / 91.79
Gradient Boost | 94.78 / 95.52 | 91.54 / 92.29 | 90.05 / 91.54 | 90.55 / 90.05
XGBoost | 87.31 / 95.77 | 84.83 / 97.01 | 82.34 / 92.54 | 86.57 / 90.80

Table 3 shows the experimental results of the morphological techniques where the flip augmentation technique was applied. Although the motive of this experiment was to improve the accuracy, the results did not improve.
The second method of evaluating our system is the analysis of the CNN model. It is initially constructed with five convolution layers, where batch size = 128, epochs = 100 and dense = 512. The accuracy of this combination is 95.21% and it results in underfitting, which cannot be accepted. Due to the high dense value, the model becomes too complex. To overcome underfitting and increase performance, the batch size is reduced to 32 and the dense value to 128, and the number of epochs is increased to 200. The accuracy for this configuration is 95%, and it results in overfitting. The layers are then reduced from 5 to 4, and a dropout layer is added with a value of 0.4, which results in 98.01% accuracy. The problem is that this model is not stable. To fix this, a global average pooling layer is added along with a max-pooling layer to simplify the input of the fully connected layer; this also has 98.01% accuracy, and both precision and recall are 98%. The output image of the first morphological operation (Eq. 1) is used to extract features for the above CNN experiments. Furthermore, the accuracies obtained after applying a few more morphological techniques (Sect. 4.2.2), namely Eqs. 2, 4, 6 and 7, are 93%, 98%, 90% and 98%, respectively.
The model loss is calculated using the categorical cross-entropy function. The training loss and validation loss of the CNN model without global average pooling are 0.0577 and 0.1455, respectively. On the other hand, the training loss and validation loss with global average pooling are 0.0109 and 0.0716. In both cases, model accuracy is the same, but the global average pooling model is more stable, as depicted in Fig. 7.

Fig. 7. Model loss graph – a) without global average pooling b) with global average pooling.
Model accuracy graph – c) without global average pooling d) with global average pooling.

The confusion matrix gives an overall idea of the performance of the classifiers. It also shows how much data is misclassified during testing. In the confusion matrices, the emotion classes are identified as "angry" = 0, "happy" = 1, "sad" = 2 and "surprise" = 3. The confusion matrices of all classifiers, including CNN, are illustrated in Fig. 8.

Fig. 8. Confusion matrix of a) Logistic Regression b) SVM c) Random Forest d) XGBoost e) Gradient boosting machine f) CNN

Based on the result analysis, the CNN model has a training accuracy of 99.71% and a test accuracy of 98.01%; it also has the highest precision and recall values. In this paper, CNN is proposed as the best classifier since it has the lowest model loss and overfitting rate compared with the other classifiers.

6 Visualization

We visualized the output of all steps of this system, such as the ROI, erosion and subtraction, by developing a graphical user interface using the PyQt5 framework. It interprets the emotion from an input image, and it can also compare the predicted results among the classifiers. The figure below shows the interface of the proposed system (Fig. 9).

Fig. 9. Graphical user interface of the proposed system

7 Conclusion and Future Work

In this paper, our attempt was to predict emotion from images of facial expressions. After collecting and annotating the data, we employed MTCNN, since it performs better than the Viola-Jones algorithm and has an almost 100% detection rate for face detection. Before classification, a combination of several morphological methods was applied to extract useful features from the face. Among the machine learning algorithms, Logistic Regression gave the best performance with 97.51% accuracy. However, CNN is the best classifier in this research, with 98.01% accuracy and the lowest overfitting; that is why CNN is proposed as the ablest classifier for our design. The proposed model is limited to detecting four expressions only. Moreover, one particular expression can have several types of signs, and all of them should be detected. Thus, our further study on this topic is to overcome these issues, to experiment with other deep learning models, and to improve our model to make it capable of detecting multiple human faces and their emotions.

References
1. Jason, B.: A gentle introduction to computer vision. Mach. Learn. Mastery (2019). https://
machinelearningmastery.com/what-is-computer-vision/
2. Lee, B., Chun, J., Park, P.: Classification of facial expression using SVM for emotion care
service system. In: Proceedings of the 9th ACIS International Conference on Software
Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing SNPD
2008 2nd International Workshop on Advanced Internet Technology and Applications,
pp. 8–12 (2008). https://doi.org/10.1109/SNPD.2008.60
3. Happy, S.L., George, A., Routray, A.: A real time facial expression classification system
using local binary patterns. In: 4th International Conference on Intelligent Human Com-
puter Interaction: Advancing Technology for Humanity IHCI 2012, pp. 1–5 (2012). https://
doi.org/10.1109/IHCI.2012.6481802
4. Asad, M., Gilani, S.O., Jamil, M.: Emotion detection through facial feature recognition. Int.
J. Multimed. Ubiquitous Eng. 12(11), 21–30 (2017). https://doi.org/10.14257/ijmue.2017.
12.11.03
5. Jayalekshmi, J., Tessy, M.: Facial expression recognition system ‘sentiment analysis’. In:
2017 International Conference on Networks & Advances in Computational Technologies
(NetACT), Trivandrum, pp. 1–8 (2017)
6. Apte, A., Basavaraj, A., Nithin, R.K.: Efficient facial expression recognition and classification
system based on morphological processing of frontal face images. In: 2015 IEEE 10th
International Conference on Industrial and Information Systems, ICIIS 2015 - Conference
Proceedings, pp. 366–371 (2015). https://doi.org/10.1109/ICIINFS.2015.7399039
7. Joseph, A., Geetha, P.: Facial emotion detection using modified eyemap–mouthmap
algorithm on an enhanced image and classification with tensorflow. Vis. Comput. 36(3),
529–539 (2019). https://doi.org/10.1007/s00371-019-01628-3
8. Luh, G., Wu, H., Yong, Y., Lai, Y., Chen, Y.: Yolov3 deep neural networks. In: 2019 International Conference on Machine Learning and Cybernetics, pp. 1–7 (2019). https://ieeexplore.ieee.org/document/8949236
9. Krithika, L.B., Priya, G.G.L.: Graph based feature extraction and hybrid classification
approach for facial expression recognition. J. Ambient. Intell. Humaniz. Comput. 12(2),
2131–2147 (2020). https://doi.org/10.1007/s12652-020-02311-5
10. Dexter, M.: The face we make. Flickr (2012). https://www.flickr.com/photos/thefacewe
make/albums
11. Viola, P., Jones, M.J.: Robust real-time face detection. Int. J. Comput. Vis. 57(2), 137–154 (2004)
12. Zhang, K., Zhang, Z., Li, Z., Member, S., Qiao, Y., Member, S.: (MTCNN) multi-task
cascaded convolutional networks. IEEE Signal Process. Lett. 23(10), 1499–1503 (2016)
13. Jason, B.: How to configure image data augmentation when training deep learning neural networks. Mach. Learn. Mastery (2019). https://machinelearningmastery.com/how-to-configure-image-data-augmentation-when-training-deep-learning-neural-networks/
14. Jordan, J.: Hyperparameter tuning for machine learning models. Jermy Jordan (2017).
https://www.jeremyjordan.me/hyperparameter-tuning/
15. Canedo, D., Neves, A.J.R.: Facial expression recognition using computer vision: a
systematic review (2019). https://doi.org/10.3390/app9214678
Detection of Invertebrate Virus Carriers Using
Deep Learning Networks to Prevent Emerging
Pandemic-Prone Disease in Tropical Regions

Daeniel Song Tze Hai, J. Joshua Thomas(&), Justtina Anantha Jothi,
and Rasslenda-Rass Rasalingam

Department of Computing, School of Engineering, Computing, and Built
Environment, UOW Malaysia KDU Penang University College,
10400 Penang, Malaysia
{jjoshua,justtina,rasslenda}@kdupg.edu.my

Abstract. Insects are a class of invertebrate organisms. They are a hugely successful group, comprising animals such as bees, butterflies, cockroaches, flies, mosquitoes, and ants. Mosquito-borne diseases are those spread by the bite of an infected mosquito. Zika, West Nile virus, Chikungunya virus, dengue, and malaria are diseases that are spread to people by mosquitoes. The purpose of this work is to show that computer vision can classify the mosquito species that spread dengue (an emerging pandemic-prone disease) by using a Convolutional Neural Network (CNN). The work assists non-specialists in identifying three types of mosquito species through a simple web interface with deep learning models working at the backend. A Convolutional Neural Network (CNN) has been implemented to extract features from mosquito images and identify mosquito species such as Aedes, Anopheles, and Culex. A total of 2,111 mosquito images were collected and used to train the CNN to perform mosquito species classification. The experimental results show that deep learning has the ability to identify the mosquito species. A series of experiments has been conducted with data augmentation, regularization techniques, and stride and filter methods in the convolutional layer to improve the performance of the algorithms and obtain higher prediction results.

Keywords: Dengue · Deep learning · Convolutional neural network · Data augmentation · Mosquito

1 Introduction

According to The Star Online (2020), the world is changing rapidly, and a vast array of innovations has been seen just in the last decade, carrying just about everything to the next stage. For example, houses with virtual assistants such as Google Assistant, Alexa, and Siri are transforming into smart homes. In 2019, Gartner (2019) surveyed companies that are operating with AI or ML. The results show that 59% of respondents have an average of four AI/ML projects, which indicates that AI/ML is a technology trend in 2020 and 2021.

In the past decade, mosquitoes have been recognized as the main vectors of disease-causing infectious agents, with an enormous influence on global health [2]. More than half of the world's human population is at risk of exposure to mosquito-borne diseases, and more than 1 billion cases of these infections are reported annually. Mosquito-borne diseases are the main threat in South-East Asia, where dengue outbreaks have a significant effect on human health [5]. Besides dengue, malaria and West Nile virus are common mosquito-transmitted diseases. Mosquito species such as Aedes, Anopheles, and Culex spread these diseases. This project focuses on deep learning, where the TensorFlow and Keras deep learning libraries [6] are adopted for developing and evaluating the deep learning model. This project uses a Convolutional Neural Network (CNN) [10] as the deep learning algorithm to classify mosquito species such as Aedes, Anopheles, and Culex. In this article, we present a convolutional neural network with computer vision that is able to detect and analyse image inputs and perform classification of dengue mosquitoes. In Sect. 2, we explain wing geometric morphometrics, CNN, CNN layers, and their parameters. In Sect. 3, we introduce the dataset used in the work and the overall structure of the convolutional neural network used for this project. In Sect. 4, we explain the various stages of the algorithm and discuss its implementation, covering the pooling layer, dropout layer, fully connected layer, callbacks, and saving the model. In Sect. 5, we discuss the experimental results with different data sizes, accuracy, early stopping, training loss, and prediction results with a confusion matrix. The conclusion of the work is covered in Sect. 6.

2 Related Work

2.1 Wing Geometric Morphometrics


Wing geometric morphometrics [7] is used to identify mosquito genera such as Aedes, Anopheles, and Culex. In the data collection process, the right wing of every adult female mosquito is removed. After that, the right wings are photographed using 40x magnification with a digital camera. Eighteen landmarks were then digitized on the images using software [7]. In the wing geometric morphometric process, a method named canonical variate analysis (CVA) is used to explore the degree of dissimilarity of the wing shape in a morphospace between species and to calculate the Mahalanobis distance. Next, cross-validated reclassification tests were carried out for each species, and a Neighbor-Joining tree was built to show the patterns of species segregation [7]. Mosquito wing geometric morphometry is a proven technique of mosquito identification and is cheap and easy to use. [7] mentioned this method could classify epidemiologically relevant vector mosquito species whose identification has proven troublesome using other techniques. As a result, the three mosquito genera were classified with an accuracy of 90%, but this method is costly and requires expert knowledge and skills.

2.2 Convolutional Neural Networks


A simple Convolutional Neural Network (CNN) architecture is made up of three types of layers, known as convolution layers, pooling layers (also known as subsampling layers) and fully connected layers. A simple CNN architecture is built when those layers are stacked, as shown in Fig. 1. In this section, the layers of a CNN will be discussed.

Fig. 1. Basic convolutional neural network layers (unknown)

A matrix of pixel values is what the computer sees when an image enters the network. The network will see different matrices depending on the resolution and dimensions of the image, such as a 28 × 28 × 3 matrix, where 28 × 28 is the image width and height and 3 represents the RGB channels [9].

2.3 Convolutional Layers


The convolutional layer plays an essential role in how the convolutional neural network works. The convolution layer is made up of many feature maps that are obtained by convolving the convolution kernel with the input signal. For a single-channel two-dimensional (2-D) image, every convolution kernel is a matrix of weights that can be either a 3 × 3 or 5 × 5 matrix. The convolution operation uses convolution kernels to process inputs of variable size and extract various input features in the convolution layer [10].
The convolutional layer primarily features weight sharing and sparse interaction. The weight sharing mechanism decreases the number of network training parameters [10] and can effectively avoid the network overfitting caused by a wide range of parameters, enhancing the network's operating performance [3, 9, 11]. The convolutional layer's sparse interaction not only reduces the model's storage requirements but also needs less computation to produce the output, consequently enhancing the model's performance [3]. The convolutional layer can reduce the complexity of the model considerably by optimizing its output. Three hyperparameters can be optimized: depth, stride, and padding.
Depth: The depth of the output volume generated by the convolutional layer can be manually adjusted via the number of neurons looking at the same input region. Reducing the depth can considerably decrease the network's total number of neurons, but it also substantially decreases the model's pattern recognition capabilities.

Stride: Stride is one of the hyperparameters that can reduce the convolutional layer parameters. Stride specifies the number of pixel shifts over the input matrix, and the default value is 1. The stride can be increased, but this results in a smaller feature map, as potential locations are skipped over [1, 3]. With a stride of 1 (e.g., a 3 × 3 filter over a 7 × 7 input), the filter moves one pixel at a time and yields a feature map of size 5 × 5 only. If the stride is 2, the filter moves two pixels at a time and yields a feature map of size 3 × 3; this reduces the overlap and the output size.
Padding: There are two classes of padding, "valid" and "same". "Valid" means the convolutional layer does not pad at all and does not maintain the input size. "Same" means the convolutional layer is padded so that the output size is kept the same as the input size. "Same" padding, also known as zero-padding, can prevent the loss of information that might occur at the image boundary, as that information is only captured when the filter passes over it [1].
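The effect of the two padding modes can be checked directly in Keras; this tiny sketch (with an all-zero placeholder input) prints the resulting output shapes.

import numpy as np
from tensorflow.keras import layers

x = np.zeros((1, 28, 28, 1), dtype="float32")      # placeholder 28 x 28 image
valid = layers.Conv2D(32, 3, padding="valid")(x)   # -> (1, 26, 26, 32), shrinks
same = layers.Conv2D(32, 3, padding="same")(x)     # -> (1, 28, 28, 32), preserved
print(valid.shape, same.shape)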
Max Pooling: Max pooling takes the maximum value from the feature map and provides better efficiency and performance with simple linear classifiers and sparse coding. The statistical properties of max pooling make it very well suited to sparse representation. However, the primary weak point of max pooling is that the pooling operator extracts only the maximum value from the feature map and neglects all other values. This condition may result in unacceptable outcomes because of the loss of information [3].

2.4 Fully Connected Layers


The image's feature map is extracted after a sequence of convolutional and pooling layers, and the feature map neurons are then connected to a fully connected layer [9]. Fully connected means that each upper-level neuron is interconnected with every next-level neuron, as shown in Fig. 1. Thus, the convolutional and pooling layers can be considered a feature extractor, while the fully connected layer can be regarded as a classifier [8].
A fully connected layer typically examines which categories are the closest matches to the high-level features and what weight they carry. The likelihood of the various categories can be obtained by measuring the dot product with the weights of the preceding layers. The fully connected layer processes the input and then outputs an N-dimensional vector, where N indicates the number of categories, containing the probability of each category [8].
Dropout is a technique that can help a CNN model overcome the overfitting problem. Dropout drops units from the input and hidden layers according to a probability. Max-pooling dropout maintains the behaviour of the max-pooling layer while allowing other feature values to influence the output of the pooling layer. Data augmentation can increase the training data size from N up to 10 × N and prevents the model from overfitting; examples of data augmentation are random cropping, random rotation, and flipping. Batch normalization is a technique, similar to dropout, used to prevent the model from overfitting; besides that, a higher learning rate can be used with batch normalization. Transfer learning is essential when the training data is limited, as it enables the model to achieve excellent performance and prevents the model from overfitting. A callbacks function is a

set of functions used to investigate and find the best parameters for the model during the training phase. Examples of callbacks functions are EarlyStopping, LearningRateScheduler, ModelCheckpoint, and TensorBoard.

3 Methodology

The dataset is distributed across three phases: training, validation, and testing. First of all, the dataset is used to train the CNN model and to validate the CNN model after training. If the performance of the trained and validated model does not meet the requirements, the CNN must be modified. If the performance of the model reaches the requirements, the model is taken to perform image classification of mosquito species. If the predicted results do not meet the requirements, the CNN must be modified again until the model meets all the requirements.
Figure 2 shows the overall convolutional neural network structure of this project. The CNN structure for this project is made up of 3 convolutional layers, two max-pooling layers, three dropout layers, one batch normalization layer, and one fully connected layer. The optimizer for the CNN structure is Adam.

Dataset: A total of 2,111 mosquito images were collected through VectorBase [4].
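A sketch of the dataset split used in the best-performing experiment of Sect. 5 (90% training / 10% validation) follows; the placeholder arrays stand in for the loaded images and labels and are reused by the later sketches.

import numpy as np
from sklearn.model_selection import train_test_split

images = np.zeros((2111, 28, 28, 1), dtype="float32")  # placeholder for the 2,111 images
labels = np.random.randint(0, 3, 2111)                 # 0=Aedes, 1=Anopheles, 2=Culex
X_train, X_val, y_train, y_val = train_test_split(
    images, labels, test_size=0.10, stratify=labels, random_state=42)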

A. Building the CNN Architecture


It is essential to design the Convolutional Neural Network (CNN) and to fine-tune the hyper-parameters to get the best model, because the hyper-parameter values decide the behaviour of the training model and how the model learns the training data. For example, the input dimensions of each layer decrease as the convolution and pooling operations proceed. Therefore, hyper-parameter values such as the filter size, stride and padding affect the input dimensions and can leave the model with a negative dimension. A model with a negative dimension causes execution to stop when the code runs. Therefore, it is essential to design the CNN architecture properly before proceeding to the implementation phase. Below is the calculation of the CNN model's activation shapes and the parameters contained after each convolution and pooling operation.
The activation size is calculated by multiplying all the values of the activation shape. In Fig. 2, first of all, the activation shapes of the convolution and pooling operations are calculated using the formulas shown in Eq. (1) and Eq. (2). Next, the formula for the number of parameters of a layer is shown in Eq. (3). For the first stage, the dimensions of the input image are 28 × 28 (width and height), and the color channel value is 1, which is grayscale. Thus, the activation shape of the input image is (28, 28, 1) and the activation size is 28 * 28 * 1 = 784. For the third stage, the convolutional layer has the same filter size, kernel size, stride value, and padding value as the layer above. As the padding of this layer is zero-padding, the input dimensions remain the same. Thus, the activation size is 28 * 28 * 32 = 25,088 and the number of parameters is 32 × (32 × 3 × 3) + 32 = 9,248. For the fourth stage, the max-pooling layer has a 2 × 2 kernel size, a stride value of two, and "valid" padding, so Eq. (1) is adopted: the dimension value is ceil((28 − 2 + 1)/2) = ceil(13.5) = 14. Therefore, the activation shape is (14, 14, 32) and the activation size is 14 * 14 * 32 = 6,272. This process continues until the flatten layer. The flatten layer connects the input dimensions to the dense layer, so the activation shape is (3,136, 1) and the activation size is 3,136 * 1 = 3,136. The next layer is the dense layer, which has a (256, 1) activation shape, and the activation size is 256 * 1 = 256. As for the number of parameters, the calculation is (3,136 × 256) + 256 = 803,072. The activation shape of the batch normalization layer is (256, 1), the activation size is 256 * 1 = 256, and the number of parameters is 4 × 256 = 1,024. The last layer uses a softmax activation function to perform multi-class classification.

Fig. 2. Overall structure of convolutional neural network structure



(ceil((N − f + 1)/s), ceil((N − f + 1)/s), Number of filters)    (1)

(N, N, Number of filters)    (2)

(Previous Input Layer × (Filter × Kernel Size)) + Number of Bias    (3)
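The worked example above can be verified with a few lines of Python implementing Eqs. (1) and (3); the helper names are illustrative.

import math

def pool_out(n, f, s):
    # Eq. (1): output dimension of a "valid" pooling/convolution stage
    return math.ceil((n - f + 1) / s)

def conv_params(filters, channels, k):
    # Eq. (3): weights plus one bias per filter
    return filters * (channels * k * k) + filters

print(pool_out(28, 2, 2))      # 14, matching ceil(13.5) in the text
print(conv_params(32, 32, 3))  # 9,248 parameters for the third stage
print(3136 * 256 + 256)        # 803,072 parameters for the dense layer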

4 Implementation

A. Build the Convolutional Neural Network: In the data pre-processing step, the mosquito images were resized to 28 × 28 dimensions and converted to grayscale. Thus, the input dimensions of the convolutional neural network must match the dimensions of the processed image data. First of all, the first convolutional layer receives the input training data to perform the convolution operation. As for the hyper-parameters of this convolutional layer, it has 32 filters, a 3 × 3 kernel size, and zero-padding, which maintains the dimensions of the output. After the convolution operation, a ReLU activation function is used to replace all negative activations with 0. Activation functions improve the non-linear properties of the model. Thus, the output of this layer has 32 tensors of size 28 by 28 (28 × 28 × 32). The first convolutional layer detects low-level features, such as curves and edges. However, the objective of the model is to classify mosquito species by recognizing mosquito patterns, and one convolutional layer is not enough for this. Therefore, more convolutional layers are added so that the model has a better network for recognizing high-level features. As for the pooling layer, max pooling is adopted to perform the pooling operation, with a filter size of 2 × 2 and a stride value of 2.
Dropout Layer
Dropout is a regularization technique for preventing overfitting that gives significant improvements to the neural network. The dropout layer temporarily drops activation units from the network by setting them to 0. In this project, the dropout layer is used on the input units and the hidden units. On the input units, the dropout value is set to 0.25, while on the hidden units it is set to 0.5. The flatten layer helps to reshape multi-dimensional tensors into a 1D tensor. For example, if the activation shape of the previous layer is (7, 7, 64), the flatten layer reshapes it into a 1D tensor (7 * 7 * 64 = 3,136). Hence, the 1D tensor of shape (3,136) can be used as the input of the dense layer (fully connected layer). The batch normalization layer is a regularization technique for preventing overfitting; it normalizes the activation values using the batch statistics (mean and variance).
Output Layer: The last dense layer is used to detect the high-level features and to output which categories are the closest matches to those features and the weight they carry. The fully connected layer processes the input and then outputs an N-dimensional vector, where N indicates the number of categories, containing the probability of each category. Besides that, the softmax activation function is used to perform multi-class classification.
Compilation: During the compilation of the model, the loss function "categorical_crossentropy" is adopted, since the softmax activation function is used in the dense layer [5]. Besides that, Adam is used as the optimizer of the model [11]. Adam performs computationally efficiently with individual learning rates for different parameters and requires only a little memory. The metrics argument specifies a list of metrics for different outputs when evaluating the model, such as training accuracy, training loss, validation accuracy, and validation loss.
Callbacks Function: During the implementation, TensorBoard is used, as it is an essential tool that allows the researcher to visualize dynamic graphs of the training and test metrics. TensorBoard enables the researcher to investigate the model and find the best tuning parameters for it. Besides that, Model Checkpoint is implemented, as it automatically saves the best model during the training process; it saves the model with the minimum validation loss. Additionally, Early Stopping is implemented to help the researcher notice when the model is overfitting or underfitting. Overfitting of the training dataset may be caused by too many epochs, while underfitting may be caused by too few epochs. Hence, the researcher can define a large number of training epochs during the implementation without worrying about the performance of the model, because early stopping halts the training when the performance of the model stops improving. After building the CNN architecture and implementing the callbacks function, the model is ready to be trained. "X_train" and "to_categorical(y_train)" represent the training data and the labels of the training data. Batch size represents the number of samples fed into the network at a time, and epochs represent the number of times the training dataset is fed through the neural network. Shuffle means the training data will be shuffled before every epoch, and validation_data holds the validation dataset and its labels. Callbacks represent the callbacks functions mentioned above, and verbose set to two prints one output line per epoch.
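Continuing the sketches above, the callbacks and training call described in this section might look as follows; the file paths, patience value, batch size and epoch count are illustrative assumptions.

from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint, TensorBoard
from tensorflow.keras.utils import to_categorical

callbacks = [
    TensorBoard(log_dir="logs"),                                          # metric graphs
    ModelCheckpoint("best.h5", monitor="val_loss", save_best_only=True),  # best model
    EarlyStopping(monitor="val_loss", patience=10),                       # stop on overfit
]
model.fit(X_train, to_categorical(y_train), batch_size=32, epochs=200,
          shuffle=True, validation_data=(X_val, to_categorical(y_val)),
          callbacks=callbacks, verbose=2)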

5 Experimental Results

In this phase, several experiments were conducted, covering the data size, the batch size, the regularization technique, the data augmentation technique and the model prediction results. These experiments are intended to find out how these factors affect the performance of the model. In the data-size experiment, the validation data size is set to 10%, 20%, and 30% of the dataset; for example, if the validation size is 10%, then the training size is 90%. Hence, the number of training and validation samples for each test case is shown in Table 1.

Table 1. Test cases for the data size.

Figure 3 shows the training accuracy for different data sizes. As the graph shows, all of the training accuracies increase slowly as the training goes on. However, at epoch 65, test case 1 has the highest training accuracy (94.53%) compared to the other two, where test case 2 has 92.31% and test case 3 has 93.83%. The reason test case 3 does not continue training is that it started to overfit on the validation data, and the training process was stopped by the Keras callbacks function "EarlyStopping". All of the training losses decrease slowly, and test case 1 has the lowest training loss (0.1415) at epoch 65.

Fig. 3. Training accuracy of different data sizes

Test case 2 has a training loss of 0.2097 and test case 3 has 0.1710. Thus, test case 1 is still the best CNN model among these test cases. As the training started, the validation accuracy of all test cases began to increase rapidly. However, the performance began to rise more slowly after epoch 35. Eventually, test case 3 has the worst validation accuracy (76.61%), followed by test case 2 (80.16%), while test case 1 (84.62%) has the best validation accuracy.

Fig. 4. Validation loss of different data sizes.



Figure 4 shows the validation loss of the different data sizes. The graphs show that test case 3 has the highest validation loss (0.7884) and started to overfit at epoch 33. Test case 2 has a validation loss of 0.5781 and began to overfit at epoch 39, while test case 1 has a validation loss of 0.4879 and started to overfit at epoch 65.

Table 2. Summary of the different data sizes.

Test case 1 has the highest training accuracy and validation accuracy; besides that, it has the lowest training loss and validation loss. The performance of test cases 2 and 3 is worse because both have less training data compared to test case 1. Hence, splitting the dataset into 90% training data and 10% validation data is the best method for the CNN model. The size of the dataset is one of the biggest problems in deep learning. According to research, each class requires 1,000 examples. Unfortunately, only approximately 600 mosquito images could be found for each category, because images of these mosquito species are limited on the internet. Table 2 shows a summary of the different data sizes.
Prediction Result
A total of 9 images have been used as test images: 3 Aedes images, 3 Anopheles images, and 3 Culex images. These test images were not used in the training data, so they can be used for the model to perform image classification. The prediction result is shown in Fig. 5, the confusion matrix of the model's predicted results. In the figure, the predicted label is on the x-axis while the true label is on the y-axis. A dark blue cell indicates that all of the test images are accurately predicted, a medium blue cell indicates that some of the test images are accurately predicted, and a light blue cell indicates that the test images are incorrectly predicted.

Fig. 5. Confusion matrix of the predicted model

Therefore, the 3 Aedes and 3 Culex test images are accurately predicted, while 2 Anopheles test images are correctly predicted. However, one Anopheles test image is incorrectly predicted: the model predicts it as Culex, but it is actually Anopheles. Hence, 8 out of 9 test images are correctly predicted and 1 out of 9 is incorrectly predicted.

6 Conclusion

Computer vision can classify mosquito species such as Aedes, Anopheles, and Culex. Hence, one of the computer vision algorithms, the Convolutional Neural Network (CNN), is adopted to perform image classification of mosquito species. The CNN achieves quite high validation accuracy and has excellent results in the accuracy of mosquito species classification. The dataset of mosquito species is one of the keys to success: more than 2,000 mosquito species images were collected from the internet and fed to the model to increase its reliability in recognizing the mosquito species. Besides the dataset, the architecture of the CNN plays an essential role in this project. Plenty of experiments were conducted to find the best hyperparameters for the model.

Acknowledgement. The research work was conducted at the UOW Malaysia, KDU Penang University College IPA lab. The authors are thankful to the ICO2021 reviewers, who gave comments to improve the article.

References
1. Albawi, S., Mohammed, T.A., Al-Zawi, S.: Understanding of a convolutional neural network.
In: 2017 International Conference on Engineering and Technology (ICET), pp. 1–6 (2017).
https://doi.org/10.1109/ICEngTechnol.2017.8308186
2. Famakinde, D.O.: Mosquitoes and the lymphatic filarial parasites: research trends and
budding roadmaps to future disease eradication. Trop. Med. Infect. Dis. 3(4), 1 (2018).
https://doi.org/10.3390/tropicalmed3010004
3. Thomas, J.J., Karagoz, P., Ahamed, B.B., Vasant, P. (eds.): Deep Learning Techniques and
Optimization Strategies in Big Data Analytics. IGI Global (2019)
4. Giraldo-Calderón, G.I., et al.: VectorBase: an updated bioinformatics resource
for invertebrate vectors and other organisms related with human diseases. Nucleic Acids
Res. 43(Database Issue), D707–D713 (2015). https://doi.org/10.1093/nar/gku1117
5. Ismail, T.N.S.T., Kassim, N.F.A., Rahman, A.A., Yahya, K., Webb, C.E.: Day biting habits
of mosquitoes associated with mangrove forests in Kedah, Malaysia. Trop. Med. Infect. Dis.
3(77), 1–8 (2018). https://doi.org/10.3390/tropicalmed3030077
6. Park, J., Kim, D., Choi, B., Kang, W., Kwon, H.: Classification and morphological analysis
of vector mosquitoes using deep convolutional neural networks. Sci. Rep. 10(1), 1012
(2020). https://doi.org/10.1038/s41598-020-57875-1
7. Wilke, A., et al.: Morphometric wing characters as a tool for mosquito identification.
PLoS ONE 11, 1–12 (2016). https://doi.org/10.1371/journal.pone.0161643
8. Zhang, Q.: Convolutional neural networks. In: 3rd International Conference on Electrome-
chanical Control Technology and Transportation, pp. 434–439 (2018). https://doi.org/10.
5220/0006972204340439
9. Murugappan, M., Thomas, J.V.J., Fiore, U., Jinila, Y.B., Radhakrishnan, S.: COVIDNet:
implementing parallel architecture on sound and image for high efficacy. Future Internet 13
(11), 269 (2021)
10. Chui, K.T., Gupta, B.B., Liu, R.W., Zhang, X., Vasant, P., Thomas, J.J.: Extended-range
prediction model using NSGA-III optimized RNN-GRU-LSTM for driver stress and
drowsiness. Sensors 21(19), 6412 (2021)
11. Thomas, J.J., Fiore, U., Lechuga, G.P., Kharchenko, V., Vasant, P. (eds.): Handbook of
Research on Smart Technology Models for Business and Industry. IGI Global (2020).
https://doi.org/10.4018/978-1-7998-3645-2
Classification and Detection of Plant Leaf
Diseases Using Various Deep Learning
Techniques and Convolutional Neural Network

Partha P. Mazumder(&), Monuar Hossain(&), and Md Hasnat Riaz(&)

Department of Computer Science and Telecommunication Engineering,
Noakhali Science and Technology University, Noakhali 3814, Bangladesh
monuar0112@student.nstu.edu.bd

Abstract. In this paper, we developed a Convolutional Neural Network model for detecting and classifying simple leaf images of (mostly) diseased and healthy plants with the help of different deep learning methodologies. We used the open PlantVillage database of 54,306 images covering 14 different plant species in a set of 38 distinct classes of diseased and healthy plants to train our model. Among the different model architectures trained, the best performance reached a 99.22% success rate, using 0.4% of the whole dataset for testing, in identifying the corresponding plant and disease combination. This significantly good success rate makes the model a very useful advisory or early-warning tool, and an approach that could be further extended to support an integrated plant disease identification system operating in real cultivation conditions, or a clear path toward smartphone-assisted crop disease diagnosis over large areas.

Keywords: Plant disease classification · Neural network · InceptionV3 · Xception

1 Introduction

Plant diseases have a long-lasting effect on agricultural products; they are estimated to cause monetary losses of more than $30–50 billion annually [1]. Modern technologies have given human society the potential to produce ample food to meet the demand of more than 7 billion people. However, food security always remains threatened by a number of factors, such as significant climate change (Tai et al. 2014) [2], the decline in pollinators (from the reports of the Plenary of the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services of its 4th session, 2016) [3], plant diseases (Strange and Scott 2005) [4], and many others. There are also disease-causing agents called pathogens. However, we can lessen crop losses and take different measures to overpower specific micro-organisms if plant diseases are efficiently diagnosed and distinguished early. Thus, plant pathologists have shared their knowledge with farmers through farming communities. That is why machine learning comes into the picture. To improve the diagnostic results, several studies on machine learning-based automated plant diagnosis have been conducted. Convolutional neural networks (CNNs) are widely perceived as one of the most promising classification techniques in machine learning.
uisite features for the classification from the images automatically during their learning
processes. Recently, CNN have demonstrated excellent performance in large scale
general image classification tasks [5], traffic sign recognition [6], leaf classification [7],
and so on. Computer vision, and object recognition techniques in particular, has made
immense advancements in the past few years. The PASCAL VOC Challenge (Ever-
ingham et al. 2010) [8], and more recently the Large Scale Visual Recognition Chal-
lenge (ILSVRC) (Russakovsky et al. 2015) [8] based on the ImageNet dataset (Deng
et al. 2009) [9] have been widely used as yardstick for a quantity of visualization-
related problems in computer vision, including object classification. In 2012, a large,
deep convolutional neural network achieved a top-5 error of 16.4% for the classifi-
cation of images into 1000 possible categories (Krizhevsky et al. 2012) [10]. In the next
3 years, different types of advancements in deep convolutional neural networks less-
ened the error rate up to 3.57% (Krizhevsky et al. 2012 [10]; Simonyan and Zisserman
2014 [11]; Zeiler and Fergus 2014 [12]; He et al. 2015 [13]; Szegedy et al. 2015 [14]).

2 Related Works

Research in the agricultural sector is focused on improving the quality and yield of products while reducing waste. The quality of agricultural products may be degraded by different types of plant diseases, caused by pathogens such as fungi, bacteria, and viruses. With the help of different types of applications, many systems have been suggested to solve, or at least reduce, the problems faced by farmers by harnessing image processing and different automatic classification tools.
Suhaili Kutty et al. [15] discussed a process to classify Anthracnose and Downy Mildew, two watermelon leaf diseases, using neural network analysis. They used a digital camera with a specific calibration procedure under a controlled environment. Their classification is based on color features extracted from the RGB color model, where the RGB pixel color indices are taken from an identified ROI (region of interest). A median filter is used to reduce image noise and to aid segmentation, and the neural network pattern recognition toolbox is utilized for classification. The proposed method achieved 75.9% accuracy based on its RGB mean color components.
Sanjeev Sannakki et al. [16] identified diseases with the help of image processing and AI techniques on images of grape plant leaves. In their proposed system, a grape leaf image with a complex background is taken as input. Noise is removed using anisotropic diffusion, and segmentation is done by k-means clustering. After segmentation, features are extracted by computing the Gray Level Co-occurrence Matrix, and classification finally takes place using a Feed Forward Back Propagation Network classifier. The Hue feature is also used for more accurate results.
Akhtar et al. [17] implemented support vector machine (SVM) approaches for the classification and detection of rose leaf diseases such as black spot and anthracnose. The authors applied the threshold method for segmentation, and Otsu's algorithm was mainly used to establish the threshold values. In this approach, different DWT, DCT, and texture-based features (eleven Haralick features) are extracted, which are then combined with the SVM approach and yield quite high accuracy.
The study of Usama Mokhtar et al. [18] employed a method that uses the Gabor wavelet transform to extract suitable features from tomato leaf images, in combination with Support Vector Machines (SVMs). They described a technique for detecting two tomato leaf diseases: Powdery Mildew and Early Blight. The Gabor wavelet transformation is applied both for extracting feature vectors and in classification. Cauchy, Laplacian, and Invmult kernels are used in the SVM for the output decision of whether a tomato leaf is infected with Powdery Mildew or Early Blight. The proposed approach delivers excellent results with an accuracy of 99.5%.
Supriya et al. [19] worked with cotton leaves. They first capture the affected leaf and then pre-process it by converting it into another color space. They use Otsu's global thresholding method during segmentation, and the color co-occurrence method for extracting different features such as color and texture. A Multi SVM (Multi Support Vector Machine) classifier is used for detecting the diseases.
Kiran R. Gavhale et al. [20] presented a number of image processing techniques to extract the diseased part of a leaf. For pre-processing, image enhancement is performed in the DCT domain, followed by color space conversion. Segmentation is then done with the help of k-means clustering, and feature extraction uses the GLCM matrix. For classifying canker and anthracnose diseases of citrus leaves, SVMs with radial basis and polynomial kernels are used.
N.J. Janwe and Vinita Tajane [21] suggested identifying medicinal plant diseases using the Canny edge detection algorithm, histogram analysis, and CBIR. Medicinal plants are identified according to their edge features: the leaf image is converted to gray scale and the edge histogram is calculated. The proposed algorithm is Canny edge detection.

3 Research Methodology
3.1 Dataset
We use the PlantVillage dataset to build this classifier. We inspect a total of 54,306 images of plant leaves, which have a spread of 38 class labels assigned to them. Each class label is a crop-disease pair, and we attempt to predict the crop-disease pair given just the picture of the plant leaf. In all the methods used in this paper, we resize the images to 256 × 256 pixels, and we carry out both the model optimization and predictions on these downscaled images. Across all our experiments, we work with the colored version of the whole PlantVillage dataset (Fig. 1).

Fig. 1. Example of leaf images from the PlantVillage dataset, representing every crop-disease pair used [22]. (1) Apple Scab, Venturia inaequalis (2) Apple Black Rot, Botryosphaeria obtusa (3) Cedar Apple Rust, Gymnosporangium juniperi-virginianae (4) Apple healthy, Malus (5) Blueberry healthy, Vaccinium sect. Cyanococcus (6) Cherry healthy, Prunus avium (7) Cherry Powdery Mildew, Podosphaera clandestina (8) Corn Gray Leaf Spot, Cercospora zeae-maydis (9) Corn Common Rust, Puccinia sorghi (10) Corn healthy, Zea mays subsp. mays (11) Corn Northern Leaf Blight, Exserohilum turcicum (12) Grape Black Rot, Guignardia bidwellii (13) Grape Black Measles (Esca), Phaeomoniella aleophilum, Phaeomoniella chlamydospora (14) Grape healthy, Vitis (15) Grape Leaf Blight, Pseudocercospora vitis (16) Orange Huanglongbing (Citrus Greening), Candidatus Liberibacter spp. (17) Peach Bacterial Spot, Xanthomonas campestris (18) Peach healthy, Prunus persica (19) Bell Pepper Bacterial Spot, Xanthomonas campestris (20) Bell Pepper healthy, Capsicum annuum Group (21) Potato Early Blight, Alternaria solani (22) Potato healthy, Solanum tuberosum (23) Potato Late Blight, Phytophthora infestans (24) Raspberry healthy, Rubus idaeus (25) Soybean healthy, Glycine max (26) Squash Powdery Mildew, Erysiphe cichoracearum (27) Strawberry healthy, Fragaria × ananassa (28) Strawberry Leaf Scorch, Diplocarpon earlianum (29) Tomato Bacterial Spot, Xanthomonas campestris pv. vesicatoria (30) Tomato Early Blight, Alternaria solani (31) Tomato Late Blight, Phytophthora infestans (32) Tomato Leaf Mold, Passalora fulva (33) Tomato Septoria Leaf Spot, Septoria lycopersici (34) Tomato Two-Spotted Spider Mite, Tetranychus urticae (35) Tomato Target Spot, Corynespora cassiicola (36) Tomato Mosaic Virus (37) Tomato Yellow Leaf Curl Virus (38) Tomato healthy, Solanum lycopersicum.

3.2 Measurement of Performance


To get a proper sense of how our approach will perform on unseen data, and also to keep track of whether any of our approaches overfit, we run all our experiments across a whole range of train-test splits, namely 80–20 (80% of the whole dataset used for training, and 20% for testing), 60–40 (60% for training, 40% for testing), 40–60 (40% for training, 60% for testing), and 20–80 (20% for training, 80% for testing) (Fig. 2).
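As an illustration, the splits above can be produced with scikit-learn. A minimal sketch follows; the image paths and labels below are dummy stand-ins, and stratification is an assumption (the paper does not state how the splits were drawn):

```python
from sklearn.model_selection import train_test_split

# Dummy stand-ins for the real image paths and class labels.
image_paths = [f"img_{i}.jpg" for i in range(100)]
labels = [i % 4 for i in range(100)]

for test_size in (0.2, 0.4, 0.6, 0.8):
    # stratify keeps class proportions in both partitions (an assumption).
    X_train, X_test, y_train, y_test = train_test_split(
        image_paths, labels, test_size=test_size,
        stratify=labels, random_state=42)
    print(f"test_size={test_size}: train={len(X_train)}, test={len(X_test)}")
```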

3.3 Approach
We evaluate the applicability of deep convolutional neural networks for the classification problem described above. We focus on two popular architectures, namely InceptionV3 [23] and Xception [24].
To summarize:
1. Choice of deep learning architecture:
I. InceptionV3
II. Xception
2. Choice of training mechanism:
i. Transfer Learning.
ii. Training from scratch
3. Choice of dataset:
i. Color
4. Choice of training-testing set distribution:
i. Train: 80%, Test: 20%
ii. Train: 60%, Test: 40%
iii. Train: 40%, Test: 60%
iv. Train: 20%, Test: 80%
To enable a fair comparison between the results of all the experimental configurations, we standardized the hyper-parameters, using the following values in all of the experiments:
• Base learning rate: 0.001
• Batch size: 32
• Default Image Size: tuple (256,256)
• Epoch: 100
• Depth: 3
• Optimizer: Adam
All the above experiments were conducted using Keras, a fast, open-source framework for deep learning. The basic results, such as the overall accuracy, can also be replicated using a standard instance of Keras.
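As an illustration of this configuration, the following is a minimal Keras sketch of the transfer-learning variant with the hyper-parameters listed above; the data-loading step is omitted, and `train_ds`/`val_ds` are hypothetical `tf.data` datasets of 256 × 256 RGB images over the 38 classes:

```python
import tensorflow as tf

NUM_CLASSES = 38  # crop-disease pairs in PlantVillage

# Xception backbone pre-trained on ImageNet, without its classification head.
base = tf.keras.applications.Xception(
    weights="imagenet", include_top=False, input_shape=(256, 256, 3))

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="categorical_crossentropy",
    metrics=["accuracy"])

# train_ds and val_ds are assumed to yield (image, one-hot label) batches of 32.
# model.fit(train_ds, validation_data=val_ds, epochs=100)
```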

Fig. 2. Flowchart of the entire work: Start → Input Image → Image pre-processing and labelling → Augmentation process → NN Training → Testing → Classified disease / Healthy Image → Output result → End.



3.4 Results
The overall accuracy obtained on the PlantVillage dataset varied from 80.28% (training from scratch, epoch = 25, optimizer = Adam) to 92.04% (epoch = 150, optimizer = Adam), and reached 85.53% when training using InceptionV3 (epoch = 25, optimizer = Adam). Using the Xception model, the accuracy varies from 98.625% to 99.22% (Table 1 and Figs. 3, 4).

Table 1. Different test-train splits of the dataset using Xception and accuracy at the end of 100 epochs

Model | Split | Accuracy | Size
Xception (epoch 100) | 20% Test & 80% Train | 99.13% | 88 MB
Xception (epoch 100) | 40% Test & 60% Train | 99.22% | 88 MB
Xception (epoch 100) | 60% Test & 40% Train | 98.625% | 88 MB
Xception (epoch 100) | 80% Test & 20% Train | 98.32% | 88 MB

Fig. 3. Training and validation accuracy using training from Xception (60% Test & 40% Train)

Fig. 4. Training and validation loss using training from Xception (60% Test & 40% Train)

4 Conclusion

Pre-trained models have been widely used in machine learning and computer vision applications, including plant disease identification. Convolutional neural networks have made immense advancements in object recognition and image classification in the past few years. The main purpose of this system is to improve the efficiency of automatic plant disease detection. Our results show that a large, deep convolutional neural network can achieve significant results on a highly challenging dataset with purely supervised learning. Experimental results show that the proposed system can successfully detect and classify plant diseases with an accuracy of 99.22%. In future work, we will extend our database to identify more plant diseases and use a larger amount of data for training the classifier. As the training data increases, the accuracy of the system should rise, and we can then compare the accuracy and speed of the system.

References
1. Sastry, K.S.: Plant Virus and Viroid Diseases in the Tropics, vol. II. Springer, Heidelberg
(2013). https://doi.org/10.1007/978-94-007-7820-7
2. Tai, A.P., Martin, M.V., Heald, C.L.: Threat to future global food security from climate
change and ozone air pollution. Nat. Clim. Chang 4, 817–821 (2014). https://doi.org/10.
1038/nclimate2317

3. Report of the Plenary of the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services on the work of its fourth session (2016). Kuala Lumpur. http://www.ipbes.net/sites/default/files/downloads/pdf/IPBES-4-4-19-Amended-Advance.pdf. Accessed 04 Jan 2021
4. Strange, R.N., Scott, P.R.: Plant disease: a threat to global food security. Phytopathology 43,
83–116 (2005). https://doi.org/10.1146/annurev.phyto.43.113004.133839
5. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional
neural networks. In: Advances in Neural Information Processing Systems, pp. 1–9 (2012)
6. Hall, D., McCool, C., Dayoub, F., Sunderhauf, N., Upcroft, B.: Evaluation of features for
leaf classification in challenging conditions. In: 2015 IEEE Winter Conference on
Applications of Computer Vision, pp. 797–804 (2015)
7. Jin, J., Fu, K., Zhang, C.: Traffic sign recognition with hinge loss trained convolutional
neural networks. IEEE Trans. Intell. Transp. Syst. 15, 1991–2000 (2014)
8. Everingham, M., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The pascal visual
object classes (VOC) challenge. Int. J. Comput. Vis. 88, 303–338 (2010). https://doi.org/10.
1007/s11263-009-0275-4
9. Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: ImageNet: a large-scale
hierarchical image database. In: IEEE Conference on Computer Vision and Pattern
Recognition, CVPR 2009. (IEEE) (2009)
10. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional
neural networks. In: Pereira, F., Burges, C.J.C., Bottou, L., Weinberger, K.Q. (eds.)
Advances in Neural Information Processing Systems, pp. 1097–1105. Curran Associates,
Inc. (2012)
11. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image
recognition. arXiv:1409.1556 (2014)
12. Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: Fleet,
D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8689, pp. 818–833.
Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10590-1_53
13. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. arXiv:
1512.03385 (2015)
14. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., et al.: Going deeper with
convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition (2015)
15. Kutty, S.B., et al.: Classification of watermelon leaf diseases using neural network analysis.
In: 2013 IEEE Business Engineering and Industrial Applications Colloquium (BEIAC),
pp. 459–464, April 2013. IEEE
16. Sannakki, S.S., Rajpurohit, V.S., Nargund, V.B., Kulkarni, P.: Diagnosis and classification
of grape leaf diseases using neural networks. In: 2013 Fourth International Conference on
Computing, Communications and Networking Technologies (ICCCNT), pp. 1–5, July 2013.
IEEE
17. Akhtar, A., Khanum, A., Khan, S.A., Shaukat, A.: Automated plant disease analysis
(APDA): performance comparison of machine learning techniques. In: 2013 11th
International Conference on Frontiers of Information Technology, pp. 60–65, December
2013. IEEE
18. Mokhtar, U., Ali, M.A., Hassenian, A.E., Hefny, H.: Tomato leaves diseases detection
approach based on support vector machines. In: 2015 11th International Computer
Engineering Conference (ICENCO), pp. 246–250, December 2015. IEEE
19. Patki, S.S., Sable, G.S.: Cotton leaf disease detection & classification using multi SVM. Int.
J. Adv. Res. Comput. Commun. Eng. 5(10), 165–168 (2016)

20. Gavhale, K.R., Gawande, U.: An overview of the research on plant leaves disease detection
using image processing techniques. IOSR J. Comput. Eng. (IOSR-JCE) 16(1), 10–16 (2014)
21. Tajane, V., Janwe, N.J.: Medicinal plants disease identification using canny edge detection
algorithm, analysis and CBIR. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 4(6), 530–536
(2014)
22. Mohanty, S.P., Hughes, D.P., Salathé, M.: Using deep learning for image-based plant
disease detection. Front. Plant Sci. 7, 1419 (2016). p. 3
23. Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.: Inception-v4, inception-resnet and the
impact of residual connections on learning. In: Proceedings of the AAAI Conference on
Artificial Intelligence, vol. 31, no. 1, February 2017
24. Chollet, F.: Xception: deep learning with depthwise separable convolutions. In: Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258
(2017)
Deep Learning and Machine Learning Applications
Distributed Self-triggered Optimization
for Multi-agent Systems

Komal Mehmood1(B) and Maryam Mehmood2


1 University of Engineering and Technology, UET Lahore, Lahore, Pakistan
komal_jarral@yahoo.com
2 Mirpur University of Science and Technology, MUST Mirpur AK, Mirpur, Pakistan
aquerius90@yahoo.com

Abstract. In this paper, a distributed constrained convex optimization problem for multi-agent consensus is investigated. The exchange of information among agents is required for convergence in distributed optimization algorithms, and scaling of such multi-agent networks is largely hampered by bandwidth limitations. To address this issue, we propose self-triggered information exchange among the agents. Specifically, each agent computes its next information-exchange time based on the current state information. The proposed algorithm reduces the communication burden compared to its periodic counterpart. When consensus is reached, the interval between two consecutive information exchanges becomes large, and as a result the overall data-rate requirement is reduced. Numerical results prove the effectiveness of the proposed self-triggered mechanism.

Keywords: Distributed networks · Event-triggered optimization · Self-triggered optimization · Multi-agent systems · Distributed convex optimization

1 Introduction
In the past few years, many efforts have been made to improve multi-agent systems, including their cooperation, consensus, formation, optimization, and so on [1–4]. Among these, multi-agent consensus is an important problem in which all the agents should achieve a common state. Many interesting results have been obtained for multi-agent consensus problems in the past decade, particularly for non-linear systems. For example, optimal consensus has been achieved while rejecting external disturbances for a class of non-linear multi-agent systems in [5]. A novel algorithm has been proposed for a system with time-varying asymmetric state constraints on the agents in [6], and second-order consensus for multi-agent systems has been proposed in [7].
Apart from achieving consensus in multi-agent systems, there also exist other concerns regarding most practical problems. Another important area of study is the optimization of multi-agent systems, which makes all the agents converge to

the optimal value. Specifically, distributed optimization has aroused the interest of researchers. The main aim of distributed optimization is to optimize the global cost function, described as the sum of each agent's individual cost function. Each agent has access only to its own local cost and the costs of its neighbors. Optimal consensus and distributed optimization problems have therefore been popular research topics recently. Distributed optimization problems with consensus are more complex than consensus problems without optimization. In the past few years, a few novel discrete-time algorithms have been proposed to solve these problems, including subgradient algorithms [8,9] and the alternating direction method of multipliers [10]. Some researchers have also come up with continuous-time algorithms for distributed optimization problems [11–16].
As we move towards multiple agents, we need to find ways to reduce the bandwidth and computational requirements per agent. One way to overcome this problem is to allow every agent to collect information from neighboring nodes and update its values according to some rules, instead of using the traditional scheme of sampling at equal time intervals.
Event-triggered algorithms for multi-agent distributed optimization problems have been proposed in [17–21]. [18,20] solve consensus problems, while in [17] a time-varying directed balanced graph is studied. A distributed optimization algorithm with uncoordinated step sizes converges geometrically to the optimal solution in [19]. [21] proposed finite-time consensus control for second-order multi-agent systems.
In event-driven schemes, a regular measurement of the state error is required, to be compared with a defined threshold in order to determine when to update the system parameters. In this paper, we consider a self-driven scheme for multi-agent systems, where there is no need to keep track of the state-error measurement; instead, the next update time is pre-computed at the previous update.
The remainder of the paper is organized as follows. Section 2 explains some useful notation and graph theory; Sect. 3 describes a multi-agent system with distributed optimization and analyzes its optimality; Sect. 4 reviews a distributed event-triggered algorithm; Sect. 5 proposes the self-triggered algorithm; Sect. 6 contains a numerical example with simulation results; Sect. 7 gives a conclusion.

2 Preliminaries and Graph Theory

A set of N scalar agents is interconnected according to a communication topology described by a communication graph G = (V, E), in which V = {1, 2, ..., N} is the set of nodes and E ⊆ V × V is the set of edges. If agent i and agent j are directly connected, they form an edge (i, j) ∈ E and are termed neighboring agents. In this paper we study an undirected weighted graph G, i.e. (i, j) ∈ E and (j, i) ∈ E are equivalent. A = [a_ij] is the weighted adjacency matrix, whose diagonal entries are all zero, i.e. a_ij = 0 for i = j, and a_ij > 0 whenever i and j are connected by an edge, i.e. (i, j) ∈ E. G is assumed to be a connected graph, meaning there is a path from each i to every j through distinct nodes. The degree matrix of G is the diagonal matrix D = diag{d_i} with diagonal entries d_i = Σ_{j ∈ N_i} a_ij, where N_i = {j ∈ V : (j, i) ∈ E} is the set of neighboring agents of i. The Laplacian matrix of G, denoted by L, is the difference of the degree and adjacency matrices, i.e. L = D − A.
For undirected graphs, L is a symmetric and positive semi-definite matrix. All rows of L sum to zero, and thus the vector of 1's is an eigenvector corresponding to the eigenvalue λ1(L) = 0, i.e., L·1 = 0. If the graph is connected, L has exactly one eigenvalue equal to zero and all other eigenvalues are positive, i.e., 0 = λ1(L) < λ2(L) ≤ ... ≤ λn(L).
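To make these definitions concrete, here is a minimal Python/NumPy sketch that builds D and L and checks the eigenvalue properties for an illustrative 4-node connected graph (the topology below is made up for the example, not the one in Fig. 1):

```python
import numpy as np

# Weighted adjacency matrix of an illustrative undirected 4-node cycle graph
# (symmetric, zero diagonal).
A = np.array([[0.0, 1.0, 0.0, 1.0],
              [1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0],
              [1.0, 0.0, 1.0, 0.0]])

D = np.diag(A.sum(axis=1))   # degree matrix: d_i = sum_j a_ij
L = D - A                    # Laplacian L = D - A

eigvals = np.sort(np.linalg.eigvalsh(L))
print(eigvals)               # smallest eigenvalue is 0, the rest are positive
print(L @ np.ones(4))        # L * 1 = 0: rows of L sum to zero
```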

3 System Model

We consider a general model of a distributed convex optimization problem for a class of multi-agent systems. The optimization problem is formulated as:

$$\min \sum_{i=1}^{n} \left( a_i x_i^2 + b_i x_i + c_i \right) \tag{1}$$

$$\text{s.t.} \quad \sum_{i=1}^{n} x_i = X_D \tag{2}$$

where i = 1, 2, 3, ..., n indexes the agents, x_i is the state of agent i, and X_D is a constant such that the states of all agents always sum up to X_D. The objective function of each agent is quadratic, and the global cost function is the sum of the cost functions of all agents.
The Lagrangian of the above problem is formulated as:

$$L(x_1, x_2, \dots, x_n, \lambda) = \sum_{i=1}^{n} \left( a_i x_i^2 + b_i x_i + c_i \right) + \lambda \left( X_D - \sum_{i=1}^{n} x_i \right) \tag{3}$$

where λ is the Lagrange multiplier. Setting the partial derivative with respect to x_i to zero, we get

$$\frac{\partial L(x_i, \lambda)}{\partial x_i} = 2 a_i x_i + b_i - \lambda = 0 \tag{4}$$

$$\lambda = 2 a_i x_i + b_i \tag{5}$$

From (4) and (5),

$$\lambda = 2 a_1 x_1 + b_1 = 2 a_2 x_2 + b_2 = \dots = 2 a_n x_n + b_n$$

or

$$\lambda_1 = \lambda_2 = \dots = \lambda_n = \lambda^{*} \tag{6}$$

From (2) and (6),

$$\lambda^{*} = \frac{X_D + \sum_{i=1}^{n} \frac{b_i}{2 a_i}}{\sum_{i=1}^{n} \frac{1}{2 a_i}} \tag{7}$$

This gives the optimal value of λ, which all agents have to reach after a certain number of updates, and

$$x_i^{*} = \frac{\lambda^{*} - b_i}{2 a_i} \tag{8}$$

is the optimal value of x_i. Thus (7) and (8) give the values of λ and x for all agents when the system reaches consensus.

4 Distributed Event-Triggered Algorithm

Our main concern is to come up with a self-triggered sampler to solve the average consensus problem. To propose a self-triggered sampler, we first recall the event-triggered sampling rule proposed in [18]; we then design a self-triggered sampler by exploiting the error condition from that event-triggered sampling rule. The distributed event-triggered sampler proposed in [18] is described as:

$$\dot{\lambda}_i = -k_i \sum_{j \in N_i} a_{ij} \left( \lambda_i(t_k^i) - \lambda_j(t_k^j) \right) \tag{9}$$

$$\lambda_i(k+1) = \lambda_i(k) + d\lambda_i \tag{10}$$

with k_i = 2a_i, k_i > 0. Here λ_i(t_k^i) and λ_j(t_k^j) are the most recently updated values of agent i and its neighboring agents j ∈ N_i, and t_k^i denotes the k-th sampling time of agent i. The event-triggering condition for agent i as designed in [18] is:

$$\left\| \lambda_i(t_k^i) - \lambda_i(t) \right\| \le c_i \left\| \sum_{j \in N_i} a_{ij} \left( \lambda_i(t_k^i) - \lambda_j(t_k^j) \right) \right\| \tag{11}$$

The proof of (11) can be found in [18].

5 Distributed Self-triggered Algorithm

In this section, we propose a self-triggered algorithm to solve the distributed optimization problem.
Self-triggered optimization is an aperiodic sampling method in which we do not need to calculate the error and compare it with the threshold at each iteration, as in the event-triggered case. Instead, at one update time we calculate the time of the next update, i.e. at t_k we calculate the time of the next update t_{k+1}.

Lemma 1: Let

$$g(t) := \left\| \lambda_i(t_k^i) - \lambda_i(t) \right\| \le c_i \left\| \sum_{j \in N_i} a_{ij} \left( \lambda_i(t_k^i) - \lambda_j(t_k^j) \right) \right\| =: \delta_k^i \tag{12}$$

Then g(t) is bounded by

$$g(t) \le \frac{\| d\lambda_i(t_k) \|}{L_{d\lambda_i, \lambda_i}} \left( e^{L_{d\lambda_i, \lambda_i}(t - t_k)} - 1 \right) =: \bar{g}(\lambda_k^i, t - t_k) \tag{13}$$

where L_{dλ_i,λ_i} is the Lipschitz constant of dλ_i with respect to λ_i. Since the bound \bar{g}(λ_k^i, t − t_k) is increasing with time t, there will come a time t = t_{k+1} when \bar{g}(λ_k^i, t_{k+1} − t_k) = δ_k^i. Then,

$$\frac{\| d\lambda_i(t_k) \|}{L_{d\lambda_i, \lambda_i}} \left( e^{L_{d\lambda_i, \lambda_i}(t_{k+1} - t_k)} - 1 \right) = \delta_k^i \tag{14}$$

$$e^{L_{d\lambda_i, \lambda_i}(t_{k+1} - t_k)} = 1 + \frac{\delta_k^i \, L_{d\lambda_i, \lambda_i}}{\| d\lambda_i(t_k) \|}$$

$$L_{d\lambda_i, \lambda_i}(t_{k+1} - t_k) = \ln\!\left(1 + \frac{\delta_k^i \, L_{d\lambda_i, \lambda_i}}{\| d\lambda_i(t_k) \|}\right)$$

$$t_{k+1} - t_k = \frac{1}{L_{d\lambda_i, \lambda_i}} \ln\!\left(1 + \frac{\delta_k^i \, L_{d\lambda_i, \lambda_i}}{\| d\lambda_i(t_k) \|}\right)$$

Thus the rule for the next update time of agent i is:

$$t_{k+1}^i = t_k^i + \frac{1}{L_{d\lambda_i, \lambda_i}} \ln\!\left(1 + \frac{\delta_k^i \, L_{d\lambda_i, \lambda_i}}{\| d\lambda_i(t_k) \|}\right) \tag{15}$$

This self-triggered sampler (15) ensures t_{k+1}^i − t_k^i > 0 for all k.

Proof of Lemma 1: Let

$$g(s) = \left\| \lambda_i(t_k^i) - \lambda_i(s) \right\|$$

Differentiating and then integrating from s_k to s,

$$\frac{d}{ds} g(s) = -\frac{d}{ds} \lambda_i(s) = -d\lambda_i(s)$$

$$g(s) = \int_{s_k}^{s} d\lambda_i(\sigma) \, d\sigma = \int_{s_k}^{s} \left( d\lambda_i(\sigma) - d\lambda_i(t_k) \right) d\sigma + \int_{s_k}^{s} d\lambda_i(t_k) \, d\sigma$$

$$g(s) = \int_{s_k}^{s} \frac{d\lambda_i(\sigma) - d\lambda_i(t_k)}{\lambda_i(\sigma) - \lambda_i(t_k)} \left( \lambda_i(\sigma) - \lambda_i(t_k) \right) d\sigma + \int_{s_k}^{s} d\lambda_i(t_k) \, d\sigma$$

Taking norms on both sides and using the Lipschitz property of dλ_i,

$$\| g(s) \| \le L_{d\lambda_i, \lambda_i} \int_{s_k}^{s} \| g(\sigma) \| \, d\sigma + \int_{s_k}^{s} \| d\lambda_i(t_k) \| \, d\sigma$$

By the Leibniz theorem,

$$\frac{d}{ds} \| g(s) \| \le L_{d\lambda_i, \lambda_i} \| g(s) \| + \| d\lambda_i(t_k) \|$$

$$\| g(s) \| \le \frac{\| d\lambda_i(t_k) \|}{L_{d\lambda_i, \lambda_i}} \left( e^{L_{d\lambda_i, \lambda_i}(s - s_k)} - 1 \right)$$
Remark 1: After every event, we find out which agent has the nearest next update time. When an update time t_k^i of agent i is reached, it calculates the value of λ̇_i using the current state information from its neighboring agents and updates its state according to (9). The time of the next update t_{k+1}^i is then calculated using (15). In the meantime, agent i transmits its updated information to its neighbors, and the neighbors, on receiving this information, update their state values too and also recalculate the times of their coming updates, respectively.
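A minimal Python sketch of the next-update-time computation in (15), as used in the procedure of Remark 1; the function name is illustrative, and how δ_k^i and the Lipschitz constant are obtained in practice is an assumption left to the implementer:

```python
import math

def next_update_time(t_k, delta_k, d_lambda_norm, lipschitz):
    """Self-triggered rule (15): time of the next sample of agent i.

    t_k           -- current update time t_k^i
    delta_k       -- threshold delta_k^i from the triggering condition (12)
    d_lambda_norm -- ||d lambda_i(t_k)||, norm of the current update
    lipschitz     -- Lipschitz constant L of d lambda_i w.r.t. lambda_i
    """
    return t_k + (1.0 / lipschitz) * math.log(
        1.0 + delta_k * lipschitz / d_lambda_norm)

# The inter-sample interval grows as delta_k grows or the update norm shrinks,
# matching the behavior reported near consensus.
print(next_update_time(0.0, delta_k=0.5, d_lambda_norm=2.0, lipschitz=1.0))
```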

5.1 Communication Delays

Up to this point we have ignored the communication delays between the agents. In this section, we consider the possible delay with which updated information from the neighboring agents reaches agent i.
Let τ_ij be the time it takes for data to travel from agent i to agent j or vice versa, i.e. τ_ij = τ_ji. Then the new update equation is given by

$$\dot{\lambda}_i = -k_i \sum_{j \in N_i} a_{ij} \left( \lambda_i(t_k^i) - \lambda_j(t_k^i - \tau_{ij}) \right) \tag{16}$$

Here t_k^i is the time when agent i updates its state using the latest state information from its neighbors, and the latest update from a neighbor j that can reach i before time t_k^i is the value at time t_k^i − τ_ij.
After calculating λ̇_i, agent i calculates the time of its next update t_{k+1}^i using (15). For this, agent i first needs to compute the value of δ_k^i; from (11) and (12), the new value of δ_k^i with the added communication delay is

$$\delta_k^i := c_i \left\| \sum_{j \in N_i} a_{ij} \left( \lambda_i(t_k^i) - \lambda_j(t_k^i - \tau_{ij}) \right) \right\|$$

Hence, following Lemma 1, Eq. (14) with the maximum possible delay in the arrival of states at node i is given by

$$\frac{\| d\lambda_i(t_k) \|}{L_{d\lambda_i, \lambda_i}} \left( e^{L_{d\lambda_i, \lambda_i}(t_{k+1} - t_k - \tau_{ij}^{*})} - 1 \right) = \delta_k^i \tag{17}$$

where

$$\tau_{ij}^{*} = \max_{j \in N_i} \tau_{ij}$$

and

$$t_{k+1}^i = t_k^i + \tau_{ij}^{*} + \frac{1}{L_{d\lambda_i, \lambda_i}} \ln\!\left(1 + \frac{\delta_k^i \, L_{d\lambda_i, \lambda_i}}{\| d\lambda_i(t_k) \|}\right) \tag{18}$$

6 Numerical Example

This section contains a numerical example to demonstrate the effectiveness of the proposed algorithm.
Consider a multi-agent system with 6 agents connected via an undirected graph with a fixed communication topology, as shown in Fig. 1. Table 1 lists the coefficients a_i, b_i, and c_i of the cost function of each agent.

Fig. 1. Communication topology

Let X_D = 600. The initial values of the agents are set as x(0) = (100, 90, 90, 120, 110, 90)^T. From Eq. (5), the initial values of λ_i for each agent are λ1(0) = 20.42, λ2(0) = 16.37, λ3(0) = 21.43, λ4(0) = 23.70, λ5(0) = 19.45, λ6(0) = 18.08.
The optimal value of λ as calculated by Eq. (7) is λ* = 19.76, and the corresponding optimal values of x_i are x1* = 96.5966, x2* = 113.5872, x3* = 82.0788, x4* = 96.0156, x5* = 112.1391, x6* = 99.5827. The constant k_i = 2a_i.
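As a quick sanity check of (7) and (8), the following minimal Python sketch reproduces these optimal values from the Table 1 coefficients:

```python
# Coefficients from Table 1 and the constraint constant X_D.
a = [0.096, 0.072, 0.105, 0.082, 0.074, 0.088]
b = [1.22, 3.41, 2.53, 4.02, 3.17, 2.24]
X_D = 600.0

# Eq. (7): optimal Lagrange multiplier lambda*.
lam = (X_D + sum(bi / (2 * ai) for ai, bi in zip(a, b))) \
      / sum(1 / (2 * ai) for ai in a)

# Eq. (8): optimal state of each agent.
x_opt = [(lam - bi) / (2 * ai) for ai, bi in zip(a, b)]

print(round(lam, 4))                 # ~19.7666 (reported as 19.76)
print([round(x, 4) for x in x_opt])  # matches x_1*..x_6* above, up to rounding
print(round(sum(x_opt), 4))          # 600.0: constraint (2) is satisfied
```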
The proposed algorithm with the above-mentioned parameter values is illustrated by a computer simulation. Figure 2 shows the changing values of λ_i at every update. According to the update equation, the change in the state of each agent also depends on the states of its neighboring agents. It can be seen that

Table 1. Coefficients of all agents

i    | 1     | 2     | 3     | 4     | 5     | 6
a_i  | 0.096 | 0.072 | 0.105 | 0.082 | 0.074 | 0.088
b_i  | 1.22  | 3.41  | 2.53  | 4.02  | 3.17  | 2.24
c_i  | 51    | 31    | 78    | 42    | 62    | 45

after a few iterations all agents achieve consensus and all λ's reach their optimal value. Similarly, Fig. 3 shows the corresponding evolution of x_i, where all agents achieve their respective optimal values.
In Fig. 4, we plot the inter-sampling times of all six agents against the number of iterations using the self-triggered sampling algorithm, compared with a corresponding periodic update rule. The sampling period for periodic updates is set to 0.2 ms and remains constant across all iterations. In contrast, the inter-sampling times of all agents under self-triggered sampling increase with every iteration, because with every iteration the states of the agents are updated and they come closer to consensus. Once consensus is reached, the inter-sampling times become very large, thus reducing the communication between agents during steady state.
For the next result, we added a transient at node 1 that changed the state of agent 1; as a result, a disturbance occurred in the system, reducing the inter-sampling times as shown in Fig. 5. A transient at node 1 also disturbed its neighbors, so we can see a drastic fall in the inter-sampling times of agent 1 and its neighbors due to the transient. In Fig. 6, we plot the inter-sampling times of agent 1 with and without a transient to give a clear idea of how the proposed sampler behaves at a transient. Finally, Fig. 7 shows another way of plotting the self-triggered samples against time, for the case without a transient occurring at node 1, showing the gap between updates for each agent individually. It can be seen that the inter-sampling times of all agents were very small in the beginning and increased with time as the system moved towards steady state. Here the x-axis shows time in seconds and the y-axis shows the agents' numbers.

Fig. 2. The evolution of λ_i with self-triggered sampling.
Fig. 3. State evolution with self-triggered sampling.

Fig. 4. Inter-sampling times obtained with the self-triggered implementation and a periodic sampler with inter-sampling time 0.0002 s.
Fig. 5. A sudden decrease in the inter-sampling times when a transient occurs at node 1.

Fig. 6. Inter-sampling time of agent 1 with and without transient.
Fig. 7. Samples of each agent.

7 Conclusion

In this paper, a distributed convex optimization problem for a class of multi-agent systems has been studied. The aim of this paper is to reduce computational complexity and unnecessary communication between agents. Thus, a self-triggered algorithm has been proposed which, unlike an event-triggered algorithm, needs no measurement of the agents' state error but only knowledge of the states of the neighboring agents. The proposed algorithm is simple and helps reach consensus while reducing the number of triggering events, controller updates, and communication transmissions. The algorithm has also been extended to the case with communication delays. In the future, we look forward to extending the proposed approach to multi-agent systems with directed graphs; the communication topology could also be changed to a switching topology.

References
1. Wen, X., Qin, S.: A projection-based continuous-time algorithm for distributed
optimization over multi-agent systems. Complex Intell. Syst., 1–11 (2021)
2. Margellos, K., Falsone, A., Garatti, S., Prandini, M.: Distributed constrained opti-
mization and consensus in uncertain networks via proximal minimization. IEEE
Trans. Automatic Control (2017)
3. Qiu, Z., Liu, S., Xie, L.: Distributed constrained optimal consensus under fixed
time delays. In: Control, Automation, Robotics and Vision (ICARCV), 2016 14th
International Conference on IEEE, pp. 1–6 (2016)

4. Feng, K., Wang, Y., Zhou, H., Wang, Z.H., Liu, Z.W.: Second-order consensus of
multi-agent systems with nonlinear dynamics and time-varying delays via impulsive
control. In: Control and Decision Conference (CCDC), 2016 Chinese. IEEE, pp.
1304–1309 (2016)
5. Wang, X., Hong, Y., Ji, H.: Distributed optimization for a class of nonlinear multi-
agent systems with disturbance rejection. IEEE Trans. Cybern. 46(7), 1655–1666
(2016)
6. Meng, W., Yang, Q., Si, J., Sun, Y.: Consensus control of nonlinear multiagent
systems with time-varying state constraints. IEEE Trans. Cybern. (2017)
7. Su, H., Liu, Y., Zeng, Z.: Second-order consensus for multiagent systems via inter-
mittent sampled position data control. IEEE Trans. Cybern. 50(5), 2063–2072
(2020). https://doi.org/10.1109/TCYB.2018.2879327
8. Lou, Y., Shi, G., Johansson, K.H., Hong, Y.: Approximate projected consensus
for convex intersection computation: convergence analysis and critical error angle.
IEEE Trans. Automatic Control 59(7), 1722–1736 (2014)
9. Nedic, A., Ozdaglar, A.: Distributed subgradient methods for multi-agent opti-
mization. IEEE Trans. Automatic Control 54(1), 48–61 (2009)
10. Wei, E., Ozdaglar, A.: Distributed alternating direction method of multipliers. In:
Decision and Control (CDC), 2012 IEEE 51st Annual Conference on IEEE, pp.
5445–5450 (2012)
11. Huang, B., Zou, Y., Meng, Z.: Distributed continuous-time constrained convex
optimization with general time-varying cost functions. Int. J. Robust Nonlinear
Control 31(6), 2222–2236 (2021)
12. Yang, S., Liu, Q., Wang, J.: A multi-agent system with a proportional-integral
protocol for distributed constrained optimization. IEEE Trans. Automatic Control
62(7), 3461–3467 (2017)
13. Gharesifard, B., Cortés, J.: Distributed continuous-time convex optimization on
weight-balanced digraphs. IEEE Trans. Automatic Control 59(3), 781–786 (2014)
14. Kia, S.S., Cortés, J., Martı́nez, S.: Distributed convex optimization via continuous-
time coordination algorithms with discrete-time communication. Automatica 55,
254–264 (2015)
15. Liu, Q., Wang, J.: A second-order multi-agent network for bound-constrained dis-
tributed optimization. IEEE Trans. Automatic Control 60(12), 3310–3315 (2015)
16. Shi, G., Johansson, K.H., Hong, Y.: Reaching an optimal consensus: dynamical
systems that compute intersections of convex sets. IEEE Trans. Automatic Control
58(3), 610–622 (2013)
17. Li, H., Liu, S., Soh, Y.C., Xie, L.: Event-triggered communication and data rate
constraint for distributed optimization of multiagent systems. IEEE Trans. Syst.
Man Cybern. Syst. (2017)
18. Chen, G., Dai, M., Zhao, Z.: A distributed event-triggered scheme for a convex
optimization problem in multi-agent systems. In: Control Conference (CCC), 2017
36th Chinese. IEEE, pp. 8731–8736 (2017)
19. Lü, Q., Li, H., Liao, X., Li, H.: Geometrical convergence rate for distributed opti-
mization with zero-like-free event-triggered communication scheme and uncoordi-
nated step-sizes. In: Information Science and Technology (ICIST), 2017 Seventh
International Conference on IEEE, pp. 351–358 (2017)
20. Li, X., Tang, Y., Karimi, H.R.: Consensus of multi-agent systems via fully dis-
tributed event-triggered control. Automatica 116, 108898 (2020)
21. Li, Q., Wei, J., Yuan, J., Gou, Q., Niu, Z.: Distributed event-triggered adaptive
finite-time consensus control for second-order multi-agent systems with connectiv-
ity preservation. J. Franklin Institute (2021)
Automatic Categorization of News
Articles and Headlines Using Multi-layer
Perceptron

Fatima Jahara, Omar Sharif, and Mohammed Moshiul Hoque(B)

Department of Computer Science and Engineering, Chittagong University of Engineering and Technology, Chittagong 4349, Bangladesh
fatimajahara@ieee.org, {omar.sharif,moshiul_240}@cuet.ac.bd

Abstract. News categorization is the task of automatically assigning news articles or headlines to a particular class. The proliferation of social media and various web 2.0 platforms has resulted in substantial textual online content. The majority of this textual data is unstructured, which makes it extremely hard and time-consuming to organize, manipulate, and manage. Due to its fast and cost-effective nature, automatic news classification has attracted increased attention from news agencies in recent years. This paper introduces a deep learning-based framework using a multilayer perceptron (MLP) to classify Bengali news articles and headlines into multiple categories: accident, crime, entertainment, and sports. Due to the unavailability of a Bengali news corpus, this work also developed a dataset containing 76,343 news articles and 76,343 headlines. Additionally, this work investigates the performance of the proposed classifier using five word-embedding techniques. The comparative analysis reveals that MLP with the Keras embedding layer outperformed the other embedding models, achieving the highest accuracy of 98.18% (news articles) and 94.53% (news headlines).

Keywords: Natural language processing · Text classification · News categorization · Deep learning · News corpus

1 Introduction

With the rapid increase of online news sources and the availability of the Internet, people prefer to read the daily news from news portals. Thousands of news portals constantly provide updated news articles and headlines every hour in Bengali. Most of this textual content is unorganized or unstructured. Thus, it has become almost impracticable for a group of editors to categorize these massive amounts of news articles by reading each of them. Moreover, the variability of different arrangements and categorization methods makes it troublesome for users to find their preferred news articles/headlines for a particular class without browsing through individual news portals. Manual classification of massive online news content is time-consuming, complicated, and costly due to its messy nature. Thus, automatic news classification can be a

potential solution that uses deep learning (DL) and NLP to analyze a massive amount of news content in a more agile, inexpensive, and reliable way. Bengali news classification can improve the user experience and enable web searching by news category. Moreover, Bengali online news portals can use it to sort news articles/headlines and make searching for an article easy.
Textual news classification is the process of classifying or tagging texts collected from news articles into predefined categories. Although several techniques are available for news classification in high-resource languages, only a minimal number of studies have addressed news classification in low-resource languages, including Bengali. Bengali is the 7th most widely spoken language globally, and with the rapid growth of Bengali news portals, an automated classification system has become a necessity. Classification of Bengali news, concerning both articles and headlines, is challenging due to the lack of a standard news corpus and NLP tools. Additionally, variations of textual content across different news classes produce an imbalanced dataset, making the classification task more complicated. Most previous studies on Bengali news classification focused on classifying news based on headlines or articles using ML techniques with TF-IDF features, which provided lower accuracy. To address these issues, this work introduces a deep learning technique (i.e., a multi-layer perceptron) with Word2Vec embedding to categorize Bengali news articles and headlines. The specific contributions of this work are:

– Develop a corpus containing 76,343 news articles (523,403 unique words) and 76,343 news headlines (61,470 unique words) across four news classes.
– Investigate various word embedding techniques, including Keras embedding, Word2Vec, and FastText, with parameter tuning.
– Propose a multi-layer perceptron (MLP)-based model with optimized hyperparameters to classify Bengali news articles and headlines.
– Investigate and compare the performance of the proposed model with existing techniques.

2 Related Work

Several studies have been carried out on textual news classification in high-resource languages such as English and Chinese. However, Bengali news classification remains at a primitive stage to date. Cecchini et al. [1] used three models, SVM, MAXENT, and CNN, with TF-IDF to classify Chinese news articles. Stein et al. [2] proposed NB, DT, RF, SVM, and MLP for English news article classification based on both authors and topics.
Mandal et al. [3] proposed an ML-based technique to classify Bengali web texts using DT, KNN, NB, and SVM on 1000 documents. Alam et al. [4] used LR, NN, NB, RF, and Adaboost models with Word2Vec and TF-IDF feature extraction for Bengali news article classification, where NN with Word2Vec outperformed the other models. A recent work used m-BERT and XLM-RoBERTa to categorize Bengali textual news [5]. Rabib et al. [6] used several ML techniques (e.g., SVM, NB, RF, and LR) for news classification; this work also used BiLSTM and CNN for fine-tuned predictions of Bengali news into 12 categories. Another recent study used DL-based methods, including MLP, CNN, RNN, LSTM, C-LSTM, Bi-LSTM, HAN, and CHAN, to classify news articles and titles into 10 predefined categories, where Bi-LSTM with Word2Vec (Skip-gram) performed the best [7]. Shopon et al. [8] proposed a BiLSTM model to classify Bengali news articles into 12 different categories based on news captions. Shahin et al. [9] investigated ANN, SVM, LSTM, and Bi-LSTM models for classifying Bengali news headlines into 8 classes, where Bi-LSTM outperformed the other models. Most past studies focused on classifying Bengali news based either on articles or on headlines, and concentrated on using default models without mentioning hyperparameter optimization. Moreover, none of these works investigated the Keras embedding layer on the Bengali news classification task. This work proposes an MLP-based news article and headline classification system with Keras embedding layers and hyperparameter optimization to address the weaknesses of past Bengali news classification systems.

3 News Corpus Development

Due to the unavailability of a standard corpus in Bengali, this research developed a textual news corpus. The corpus development process followed the directions suggested by Das et al. [10]. The steps for dataset preparation are:
– Data Accumulation: The data collection task was automated by web-scraping the news portals. A total of 76,359 news articles were accumulated from 5 renowned sources: Prothom Alo, Daily Nayadiganta, Daily Samakal, Kaler Kantho, and Bhorer Kagoj (Table 1b). The collected news articles span the years 2015 to 2020. Five participants accumulated data from these sources, and almost 76,343 news articles were scraped to create the corpus for the news classification system. Crawling was done based on four categories: accident, crime, sports, and entertainment. The title, author, date, and description are the four main pieces of meta information that were also encoded. Both the news articles and headlines were crawled.
– Data Cleaning: Data cleaning is the process of preparing data by removing unnecessary content, which prevents coarse data from producing inaccurate results. General Bengali news articles may contain Bengali as well as English digits and also some foreign words. These digits, foreign words, and various punctuation marks were removed from the dataset due to their insignificance for classification. We also removed 398 common Bangla stop words to focus on the key words that bear meaning in the data context. The raw data contained 15,655,516 total words and 662,996 unique words in the news articles; after preprocessing and cleaning, a total of 11,843,411 words with 523,403 unique words remain in the corpus. The raw news-headline data contained 516,781 total words and 60,167 unique words; a total of 373,860 words (61,470 unique words) were included in the corpus for the headlines.

3.1 Data Statistics

The developed corpus contains 76,343 news documents, where the news articles contribute 15,655,516 words and the headlines contribute 516,781 words. Table 1 summarizes the developed corpus for each category.

Table 1. Class-wise data statistics.

Class | Data | Total words (Articles) | Total words (Headlines) | Unique words (Articles) | Unique words (Headlines)
Accident | 11,841 | 2,077,659 | 74,387 | 100,018 | 6,834
Crime | 11,222 | 2,998,583 | 78,160 | 127,494 | 10,175
Entertainment | 17,568 | 2,828,029 | 120,263 | 166,258 | 20,174
Sports | 35,712 | 7,751,245 | 243,971 | 269,226 | 22,984
Total | 76,343 | 15,655,516 | 516,781 | 662,996 | 60,167

Figure 1a shows the distribution of the dataset into the four categories, where 46.8% of the data belongs to the Sports category while the other classes contribute similar proportions. Figure 1b shows the source-wise distribution of the news corpus, where the maximum amount of data was accumulated from the news portal 'Kaler Kantho'.

(a) Class-wise data distribution. (b) Source-wise data distribution:

News portal | Data
Kaler Kantho | 41,950
Prothom Alo | 20,613
Bhorer Kagoj | 10,795
Daily Nayadiganta | 2,066
Daily Samakal | 919

Fig. 1. Summary of data distribution in the corpus.

4 News Classification Framework

The proposed research aims to develop a news classification framework that can classify Bengali news articles and headlines into four predefined categories. Figure 2 demonstrates the abstract framework of the proposed news classification system, which consists of four main modules: preprocessing, embedding model generation, classification model generation, and prediction.

Fig. 2. Abstract view of the proposed news classification framework.

4.1 Data Preprocessing

Several preprocessing steps on the raw data are necessary before feeding it to the embedding or classification models (a minimal sketch of these steps follows the list):

– Label Encoding: This is the process of converting labels into numerical values, which specifically converts text categories into a machine-readable form. In this work, all labels are encoded with a unique integer (from 1 to 4).
– Tokenization: This process divides a sentence/text into a sequence of words called tokens [11]. A token is a string of contiguous characters grouped as a semantic unit and delimited by spaces, punctuation marks, and newlines. Each news item is tokenized, producing a total of 11,843,411 tokens for articles and 373,860 for headlines.
– Word Encoding: This process transforms the words of a text into numbers, mapping each unique word to a particular value (for example, 224 for the word ). A predefined set of words in the tokenized train set is chosen by limiting the number of most frequent words for encoding with a word index (ranging from 1 to 331,530).
– Text Sequencing: This process converts a text document into a list of integers; each word in the news (i.e., articles and headlines) is assigned an integer value in the document.
– Padding: Not all articles have the same length in the corpus. Thus, to train the model, padding is used to scale the lists to the same length [12]. This work used post-padding with '0' at the end of the sequence to make all sequences of length 2642; anything longer than 2642 is truncated.
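A minimal Keras sketch of these preprocessing steps; `train_texts` and `train_labels` are hypothetical stand-ins for the raw corpus and its category names:

```python
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Hypothetical stand-ins for the corpus.
train_texts = ["...news article one...", "...news article two..."]
train_labels = ["sports", "crime"]

# Label encoding: category names -> integers.
labels = LabelEncoder().fit_transform(train_labels)

# Tokenization + word encoding, keeping the 100k most frequent words.
tokenizer = Tokenizer(num_words=100_000)
tokenizer.fit_on_texts(train_texts)

# Text sequencing: each document becomes a list of word indices.
sequences = tokenizer.texts_to_sequences(train_texts)

# Post-padding/truncating every sequence to length 2642.
X = pad_sequences(sequences, maxlen=2642, padding="post", truncating="post")
print(X.shape)  # (num_documents, 2642)
```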

4.2 Word Embedding

Word embedding is a feature-extraction technique for selecting a set of relevant features from texts, reducing the amount of input data for classifier training. Although several embedding techniques are widely used in text classification, this work used the three most common ones: the Keras embedding layer, Word2Vec, and FastText. FastText exploits sub-word information to construct an embedding model, where word representations are learned from character n-grams and the sum of n-gram vectors [13]. Two variants, continuous bag of words (CBOW) and Skip-gram, are considered for Word2Vec and FastText with 'Gensim'. Table 2 summarizes the most common parameters, with their corresponding values, used to generate the embedding models.

Table 2. Parameters of embedding models.

Embedding | Attribute | Description | Optimal value
Keras embedding layer | input_dim | Size of the vocabulary | 100,000
Keras embedding layer | output_dim | Embedding dimension | 800
Keras embedding layer | input_length | Input sequence length | 2642
Word2Vec & FastText | sg | Training algorithm: CBOW (0) or Skip-gram (1) | 0, 1
Word2Vec & FastText | size (output_dim) | Embedding dimension | 800
Word2Vec & FastText | window | Maximum distance between a target word and the words around it | 5
Word2Vec & FastText | min_count | Minimum count of words | 5
Word2Vec & FastText | input_length | Input sequence length | 2642

For Word2Vec and FastText (both CBOW and Skip-gram), the same set of values is used. The embedding model maps the vocabulary and represents each document through a feature matrix of dimension (input_length × output_dim) = (2642 × 800), which is then fed to the classifier model. The resulting representation is a dense vector of real values. The embedding layer works like a lookup table where the words are the keys and the feature vectors are the values.
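A minimal Gensim sketch of training the Word2Vec/FastText variants with the Table 2 parameters; `tokenized_docs` is a hypothetical stand-in corpus, and the keyword `vector_size` follows the Gensim 4 API (older Gensim versions call it `size`, as named in Table 2):

```python
from gensim.models import FastText, Word2Vec

# Tiny stand-in corpus so the sketch runs; each document is a list of tokens.
tokenized_docs = [["this", "is", "a", "token", "stream"] * 2] * 10

# sg=0 -> CBOW, sg=1 -> Skip-gram (Table 2).
w2v_skipgram = Word2Vec(sentences=tokenized_docs, vector_size=800,
                        window=5, min_count=5, sg=1)
fasttext_cbow = FastText(sentences=tokenized_docs, vector_size=800,
                         window=5, min_count=5, sg=0)

# Each vocabulary word maps to an 800-dimensional vector, e.g.:
print(w2v_skipgram.wv["token"].shape)  # (800,)
```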

4.3 Classifier Model Generation

This work proposes an MLP-based deep learning model to classify Bengali news articles and headlines. Moreover, several ML classifiers, such as LR, RF, NB, DT, KNN, and SVM, are also trained with TF-IDF features to investigate their performance on the textual news classification task.
The MLP consists of an input layer, an output layer, and one hidden layer between them. The neurons in the hidden layer perform the classification of the features with the help of a non-linear activation function (ReLU) and predefined weights and biases. Figure 3 illustrates the architecture of the MLP classifier model.

Fig. 3. Architecture of MLP-based model for news classification.

The input layer takes the processed train set and passes it to the adjacent layer of the model. Since no processing is done in this layer, the input and output shapes are the same: (None, input_length) = (None, 2642), where input_length denotes the length of the preprocessed input data. The embedding layer extracts the features, generating a feature matrix of size (None, input_length, output_dim) = (None, 2642, 800), where output_dim represents the dimension of the embedding. The matrix is then reduced to a one-dimensional vector of shape (None, output_dim) = (None, 800) using the GlobalAveragePooling1D layer. The resulting vector then passes to the dense layer, a hidden layer with 450 units that uses the 'relu' activation function to learn the specific parameters. To avoid overfitting, a dropout rate of 0.5 is applied through the dropout layer; the output shape of this layer is (None, units) = (None, 450). The final dense layer produces the output prediction of shape (None, units) = (None, 4); it has a number of units equal to the number of categories and uses the 'softmax' activation function to predict the class-label probabilities.
The generated model is first compiled using the 'Adam' optimizer with a learning rate of 0.001 and the 'sparse_categorical_crossentropy' loss function. For tuning the model, we use 'accuracy' as the metric. The preprocessed train set and validation set, together with their encoded labels, are fed to the compiled model for training and tuning. We adopt a set of necessary hyperparameters for tuning the model to obtain the optimal parameter values. After the model is tuned through hyperparameter optimization, the optimal values are used to train the model on the train set, creating the trained classifier model; a minimal sketch follows.
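A minimal Keras sketch of this architecture under the stated optimal hyper-parameters (vocabulary 100k, embedding dimension 800, sequence length 2642, four classes); `X` and `labels` from the preprocessing sketch above are assumed:

```python
import tensorflow as tf

VOCAB_SIZE, EMB_DIM, SEQ_LEN, NUM_CLASSES = 100_000, 800, 2642, 4

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=VOCAB_SIZE, output_dim=EMB_DIM,
                              input_length=SEQ_LEN),      # (None, 2642, 800)
    tf.keras.layers.GlobalAveragePooling1D(),             # (None, 800)
    tf.keras.layers.Dense(450, activation="relu"),        # hidden layer
    tf.keras.layers.Dropout(0.5),                         # against overfitting
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"])

# model.fit(X, labels, validation_data=(X_val, y_val),
#           batch_size=32, epochs=100)
```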

4.4 Prediction

The trained classifier model is used to predict the labels of unseen test samples; the class labels of the unlabeled test set are used for the evaluation of the trained classifier model. A total of 7592 news articles of the preprocessed test set are fed to the trained model, which then predicts the label of each news article using the 'softmax' probability distribution (Eq. 1):

$$\mathrm{Softmax}(\theta_i) = \frac{\exp(\theta_i)}{\sum_{i=1}^{n} \exp(\theta_i)} \tag{1}$$

Here, θ_i denotes the output feature vector from the trained model, and n represents the number of categories. The output values range from 0 to 1, and the class with the highest probability is taken as the predicted label.
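Prediction then reduces to an argmax over the softmax outputs; a minimal sketch, with `model` from the previous sketch and `X_test` a hypothetical preprocessed test matrix:

```python
import numpy as np

# Each row of probs is a softmax distribution over the 4 categories.
probs = model.predict(X_test)
predicted_labels = np.argmax(probs, axis=1)
```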

5 Experiments

The classifier models were implemented with the Python 3.6.9 framework and scikit-learn 0.22.2. Pandas and NumPy 1.18.5 were used to prepare the data, and 'scikit-learn' was used to implement the machine learning classifiers. Parameters of the classifiers were selected by a trial-and-error approach during experimentation. The experiments were performed on a general-purpose computer with an Intel® Core™ i3-5005U CPU running at 2.00 GHz, 4.0 GB RAM, and 64-bit Windows 10 Pro. Google Colab with Keras and a TensorFlow backend was used as the deep learning framework. Overall, 80% (61,122 documents) of the data is used for training, 10% (7,629 documents) for validation, and 10% (7,592 documents) for testing. The MLP model with the embedding models is tuned with different hyperparameters; Table 3 summarizes the hyperparameters utilized by the MLP classifier.

Table 3. Hyper-parameter summary.

Hyper-parameter | Search space | Optimal value
Learning rate | [0.1, 0.01, 0.001, 0.0001, 0.00001, 0.000001] | 0.001
Optimizer | [Adam, Adamax, Adagrad, Adadelta, Nadam, RMSprop, SGD, Ftrl] | Adam
Activation function | [ReLU, tanh, sigmoid, softmax, softplus, softsign, selu, elu] | ReLU
Dropout | [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0] | 0.5
No. of hidden layers | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] | 1
No. of units (per hidden layer) | [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 900, 1000] | 450
Batch size | [1, 32, 64, 128, 256, 1024, 61122] | 32
Embedding dimension | [10, 50, 100, 150, 200, 250, 300, 350, 400, 500, 600, 700, 800] | 800
Vocab size | [5k, 10k, 30k, 50k, 80k, 100k, 330k] | 100k

6 Results Analysis

Several measures, such as accuracy (Ac), precision (Pr), recall (Re), and F1-score, are considered to evaluate the proposed textual news classification model on the developed corpus. Table 4 shows the performance of the proposed MLP-based news classification system with different embedding techniques.

Table 4. Performance of MLP-based Bengali news classification.

Embedding technique | News articles: Ac (%) / Pr / Re / F1 | News headlines: Ac (%) / Pr / Re / F1
Keras embedding layer | 98.18 / 0.98 / 0.98 / 0.98 | 94.53 / 0.95 / 0.95 / 0.95
Word2Vec (CBOW) | 97.47 / 0.97 / 0.97 / 0.97 | 72.34 / 0.71 / 0.72 / 0.71
Word2Vec (Skip-gram) | 97.77 / 0.98 / 0.98 / 0.98 | 82.97 / 0.84 / 0.83 / 0.83
FastText (CBOW) | 96.98 / 0.97 / 0.97 / 0.97 | 97.35 / 0.97 / 0.97 / 0.97
FastText (Skip-gram) | 97.35 / 0.97 / 0.97 / 0.97 | 79.40 / 0.80 / 0.79 / 0.80

Results indicate that the proposed MLP model with Keras embedding achieved the highest scores for categorizing news articles (98.18%) and headlines (94.53%) compared to the other embedding techniques. Since the Keras embedding layer works as a layer of the neural network, it is trained with the MLP model through backpropagation. This helps the Keras embedding layer learn relevant features effectively, and it thus performs better than the pre-trained Word2Vec and FastText models.
Table 5 shows the class-wise performance of the proposed Bengali news classification model with the Keras embedding layer for news articles and headlines.

Table 5. Category-wise performance measures.

Category name | News articles: Ac / Pr / Re / F1 | News headlines: Ac / Pr / Re / F1 | No. of test data
Sports | 97.63 / 1.00 / 0.97 / 0.99 | 96.80 / 0.97 / 0.96 / 0.97 | 3564
Entertainment | 99.02 / 0.95 / 0.99 / 0.97 | 91.78 / 0.90 / 0.94 / 0.92 | 1747
Accident | 98.80 / 0.98 / 0.99 / 0.99 | 94.53 / 0.96 / 0.92 / 0.94 | 1171
Crime | 98.01 / 0.98 / 0.98 / 0.98 | 91.56 / 0.91 / 0.92 / 0.92 | 1110
Weighted average | 98.18 / 0.98 / 0.98 / 0.98 | 95.53 / 0.95 / 0.95 / 0.95 | 7592

Category-wise performance analysis revealed that Entertainment obtained the maximum accuracy (99.02%), while Sports and Accident obtained the highest F1-score (0.99) for news article categorization. On the other hand, for news headlines, the Sports class achieved both the highest accuracy (96.80%) and the highest F1-score (0.97) among all classes. Although the Entertainment category obtained the maximum accuracy, its F1-score decreased because several Sports samples were misclassified as Entertainment. Concerning headline categorization, the Sports category performed better because several sports headlines contained the relevant content.

7 Comparison with Existing Approaches


We investigated the performance of existing techniques [3, 4] on the developed dataset. Table 6 shows a comparison of the accuracy of previous Bengali textual news classification techniques.

Table 6. Performance comparison.

Method | Techniques | Ac (%), news articles | Ac (%), news headlines
Alam et al. [4] | LR + TF-IDF | 97.92 | 92.77
 | RF + TF-IDF | 97.22 | 90.61
 | NB + TF-IDF | 97.07 | 92.88
Mandal et al. [3] | DT + TF-IDF | 93.77 | 88.31
 | KNN + TF-IDF | 54.08 | 59.68
 | SVM + TF-IDF | 98.01 | 93.82
Proposed | MLP + Keras embedding | 98.18 | 94.53

The comparative analysis showed that the proposed method outperformed the previous techniques, achieving the highest classification accuracy of 98.18% (news articles) and 94.53% (news headlines). A possible reason is that the MLP learns by estimating errors and updating weights through backpropagation, whereas the classical ML models do not. Moreover, TF-IDF produces a fixed feature matrix, whereas the Keras embedding layer has the privilege of being trained and updating its embedding values during model training.
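For reference, a minimal sketch of the kind of TF-IDF + classical-ML baseline listed in Table 6 (here SVM + TF-IDF); the two-document corpus is a placeholder, not the developed dataset:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_texts = ["placeholder sports article ...", "placeholder crime article ..."]
train_labels = ["Sports", "Crime"]

baseline = make_pipeline(TfidfVectorizer(), LinearSVC())  # fixed TF-IDF features
baseline.fit(train_texts, train_labels)
print(baseline.predict(["another placeholder article ..."]))
```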

8 Error Analysis
The results confirmed that the MLP model with Keras embedding performed better than the other models for Bengali news classification. To better understand the model's performance, a detailed error analysis is performed using the confusion matrices in Fig. 4.

Fig. 4. Confusion matrices for (a) news articles and (b) news headlines.

It is observed that the Sports category obtained the most accurate predictions (3474 true positives out of 3564) (Fig. 4a). However, 86 Sports samples were misclassified as the Entertainment class. A few Sports news items, such as one reporting that the Bangladesh national cricket team's left-handed opener Soumya Sarkar tied the knot with Khulna's daughter Priyanti Debnath Pooja, include Entertainment content (a celebration) within the Sports class, which often confused the classifier. Moreover, some Crime articles can also be read as Accident news, which led to 19 misclassified Crime articles. As regards headline classification, Accident and Crime data are mostly misclassified with each other: 42 Accident headlines are misclassified as Crime, and 54 Crime headlines are misclassified as Accident (Fig. 4b). For instance, the model confused an Accident headline ('Three sentenced to life imprisonment, two acquitted in Dia-Rajiv death case') with the Crime class. In most cases, news headlines are not well structured and lack the inner context of the news data, which reduces performance. In contrast, news articles contain more words than headlines, which helps the classifier model learn more relevant features to distinguish the classes.
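A small illustrative sketch of how the confusion matrices in Fig. 4 can be produced with scikit-learn; y_true and y_pred are placeholders:

```python
from sklearn.metrics import confusion_matrix

labels = ["Sports", "Entertainment", "Accident", "Crime"]
y_true = ["Sports", "Crime", "Accident", "Sports"]     # placeholder ground truth
y_pred = ["Sports", "Accident", "Accident", "Sports"]  # placeholder predictions
cm = confusion_matrix(y_true, y_pred, labels=labels)   # rows: true, columns: predicted
print(cm)
```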

9 Conclusion

This work introduced a multilayer perceptron (MLP)-based model for classifying Bengali textual news articles and headlines. Due to the unavailability of a standard corpus, a Bengali news corpus was developed to perform the news classification task. This work investigated five embedding techniques with a tuned MLP classifier model. Moreover, the performance of the proposed MLP-based model was compared to six ML baselines (with TF-IDF features). Results showed that the MLP with Keras embedding layer achieved the highest news classification accuracy of 98.18% (for news articles) and 94.53% (for headlines) on the developed corpus, outperforming the existing ML baselines for Bengali news categorization. The performance of the current implementation can be enhanced with more data in the corpus from other news categories (such as politics, technology, and business). Further investigations with longer or shorter headlines and extensive article contents can be performed. Other embeddings (e.g., GloVe, bag-of-words) and classifier models such as CNN and LSTM may also be investigated to improve performance. Multi-label news classification can be addressed for the generalization of the model.

Acknowledgement. This work was supported by the CUET NLP Lab, Chittagong
University of Engineering & Technology, Chittagong, Bangladesh.

References
1. Cecchini, D., Na, L.: Chinese news classification. In: IEEE International Conference
on Big Data and Smart Computing (2018)
2. Stein, A.J., Weerasinghe, J., Mancoridis, S., Greenstadt, R.: News article text
classification and summary for authors and topics. Comput. Sci. Inf. Technol. (CS
& IT) 10, 1–12 (2020)
3. Mandal, A.K., Sen, R.: Supervised learning methods for Bangla web document categorization (2014)
4. Alam, M.T., Islam, M.M.: BARD: Bangla article classification using a new comprehensive dataset. In: 2018 International Conference on Bangla Speech and Language Processing (ICBSLP) (2018)
5. Alam, T., Khan, A., Alam, F.: Bangla text classification using transformers. CoRR
abs/2011.04446 (2020)
6. Rabib, M., Sarkar, S., Rahman, M.: Different machine learning based approaches of
baseline and deep learning models for Bengali news categorization. Int. J. Comput.
Appl. 176(18), 10–16 (2020)
7. Rahman, R.: A benchmark study on machine learning methods using several fea-
ture extraction techniques for news genre detection from bangla news articles &
titles. In: 7th International Conference on Networking, Systems and Security (2020)
8. Shopon, M.: Bidirectional LSTM with attention mechanism for automatic Bangla
news categorization in terms of news captions. In: Mallick, P.K., Meher, P.,
Majumder, A., Das, S.K. (eds.) Electronic Systems and Intelligent Computing.
LNEE, vol. 686, pp. 763–773. Springer, Singapore (2020). https://doi.org/10.1007/
978-981-15-7031-5 72
9. Shahin, M.M.H., Ahmmed, T., Piyal, S.H., Shopon, M.: Classification of Bangla
news articles using bidirectional long short term memory. In: 2020 IEEE Region
10 Symposium (TENSYMP), pp. 1547–1551 (2020)
10. Das, A., Iqbal, M.A., Sharif, O., Hoque, M.M.: BEmoD: development of Bengali
emotion dataset for classifying expressions of emotion in texts. In: Vasant, P.,
Zelinka, I., Weber, G.W. (eds.) ICO 2020. AISC, vol. 1324, pp. 1124–1136. Springer,
Cham (2021). https://doi.org/10.1007/978-3-030-68154-8 94
11. Rai, A., Borah, S.: Study of various methods for tokenization. In: Mandal, J.K.,
Mukhopadhyay, S., Roy, A. (eds.) Applications of Internet of Things. LNNS, vol.
137, pp. 193–200. Springer, Singapore (2021). https://doi.org/10.1007/978-981-15-
6198-6 18
12. Trappey, A.J., Trappey, C.V., Wu, J.L., Wang, J.W.: Intelligent compilation of
patent summaries using machine learning and natural language processing tech-
niques. Adv. Eng. Inf. 43, 101027 (2020)
13. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with
subword information. Trans. Assoc. Comput. Linguist. 5, 135–146 (2017)
Using Machine Learning Techniques
for Estimating the Electrical Power of a New-
Style of Savonius Rotor: A Comparative Study

Youssef Kassem1,2(&) , Hüseyin Çamur2 , Gokhan Burge2,


Adivhaho Frene Netshimbupfe2, Elhamam A. M. Sharfi2,
Binnur Demir2, and Ahmed Muayad Rashid Al-Ani2
1 Faculty of Engineering, Mechanical Engineering Department, Near East University, 99138 Nicosia, North Cyprus
2 Faculty of Civil and Environmental Engineering, Near East University, 99138 Nicosia, North Cyprus
{yousseuf.kassem,huseyin.camur,binnur.demirerdem}@neu.edu.tr, Ahmed.alani123@icloud.com

Abstract. The ability and accuracy of machine learning techniques have been investigated for static modeling of a new-style wind turbine. The main aim of this study is to predict the electrical power (EP) of the new-style Savonius rotor as a function of aspect ratio, overlap ratio, number of blades, wind speed, and rotational speed. In this paper, the EP of the proposed rotors was estimated using a Multilayer Feed-Forward Neural Network (MFFNN), a Cascade Feed-forward Neural Network (CFFNN), and an Elman neural network (ENN) based on experimental data. Additionally, the proposed models were compared with the previous models used in Ref. [6] to show their ability and accuracy. The results indicated that the ENN model has higher predictive accuracy compared to the other models.

Keywords: Machine learning models · Mechanical power · Multiple linear regressions · Savonius turbine · New-style

1 Introduction

The energy sector is at the center of the economic crisis and the environmental problems in most developing countries. This sector is the biggest source of waste and the primary cause of budget deficits and ballooning debt, in addition to being a primary cause of air pollution and related deaths. Moreover, the electricity crisis has intensified due to population growth, rising living standards, and growing industry sectors, which have led to an increase in energy demand and to increased electricity costs associated with fossil fuel-based electrical energy production. Generally, most Arab countries do not suffer from a scarcity of energy sources, such as oil, gas, sunlight, and wind. Nowadays, all the world's countries are looking to utilize renewable energy resources instead of fossil fuels to mitigate climate change. Also, the use of renewable energies


can be an alternative solution for solving the electricity crisis in most countries and for reducing the consumption of fossil fuels.
Wind energy is one of the leading alternative energy resources for electricity production globally. Wind turbines are utilized to convert wind kinetic energy into electrical energy. In the literature, wind turbines help to meet basic domestic needs in countries all over the world [1]. In addition, low-cost wind turbines will help to reduce greenhouse gas emissions and fossil fuel consumption [2].
The Savonius wind turbine has a simple structure and is suitable for operation at low wind speeds. Thus, it can be utilized to generate electricity for domestic applications. Several researchers have investigated the influence of rotor geometries and blade shape on the Savonius rotor's performance [3, 4]. For example, Mahmoud et al. [3] investigated the effect of rotor geometries and end plates on Savonius performance. The results showed that the power coefficient (CP) increased when upper and lower end plates were used.
Based on the above, investigating the effect of turbine geometry and blade shape is important for evaluating the performance of the Savonius rotor. In the literature, the performance behaviour of the Savonius turbine shows high nonlinearity. Several empirical models, including machine learning and mathematical models, have been used to predict the performance of the Savonius turbine, including the power coefficient, mechanical power, and torque [5]. For example, Sargolzaei and Kianifar [5] used three machine learning models to estimate the torque of their proposed rotor. The results showed that the ANFIS model gave the best accuracy compared to the other models.
Continuing the authors' investigation of Savonius turbine performance [6], the goal of this study is to predict the electrical power of a new-style Savonius wind turbine using three machine learning tools, namely the Multilayer Feed-Forward Neural Network (MFFNN), the Cascade Feed-forward Neural Network (CFFNN), and the Elman neural network (ENN). Also, the accuracy of the models is compared with previous models (the multilayer perceptron neural network (MLPNN) and the radial basis function neural network (RBFNN)) used in Ref. [6].

2 Experimental Data

Figure 1 shows the 2D and 3D views of the proposed rotors. In this study, the effects of blade number (NB), blade height (H), blade diameter (D), external gap (L′), and wind speed (WS) on the mechanical power (MP) of the proposed rotors are investigated, as shown in Fig. 1. The blades and the shaft of the rotor are made from PVC and stainless steel, respectively, and the end disks are made from fiberglass. Details of the experimental setup and measurements are given in Ref. [6]. In this research, the experimental data of the new-style Savonius wind turbine were collected to develop and validate the proposed models (MFFNN, CFFNN, and ENN) in comparison with the MLPNN and RBFNN models. The aspect ratio, overlap ratio, wind speed, rotational speed, and number of blades are used as input variables, and the electrical power is the output variable.

3 Prediction Methods

Many models and techniques, such as machine learning and mathematical models, are used as alternative tools to describe complex systems, and they are utilized in a wide variety of applications. In this study, four empirical models (the Multilayer Feed-Forward Neural Network, the Cascade Feed-forward Neural Network, the Elman neural network, and multiple linear regression) are developed to estimate the electrical power of the new-style Savonius rotor. TRAINLM (Levenberg-Marquardt backpropagation) is utilized as the training function, and the mean squared error (MSE) is estimated to find the best performance of the training algorithm. The developed models are described in detail in Refs. [7–9]. MATLAB software was used to develop the proposed models.
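The models themselves were developed in MATLAB; purely as an illustrative stand-in, the sketch below fits an analogous small feed-forward regressor in scikit-learn (which offers L-BFGS/Adam solvers rather than the Levenberg-Marquardt algorithm behind TRAINLM) on five placeholder inputs:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.random((100, 5))   # placeholder [NB, H/D, L'/D, WS, RPM] samples
y = rng.random(100)        # placeholder electrical power values

# One hidden layer with 6 tangent-sigmoid neurons, as reported for the MFFNN.
model = MLPRegressor(hidden_layer_sizes=(6,), activation="tanh",
                     solver="lbfgs", max_iter=2000).fit(X, y)
mse = np.mean((model.predict(X) - y) ** 2)  # MSE, the model-selection criterion
```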

Fig. 1. The 2D and 3D views and dimensions of the new-style Savonius rotors

4 Results and Discussions

4.1 Artificial Models


The descriptive statistics of the experimental data are presented in Table 1. The data are divided into training and testing groups, and the results of the models are compared with each other. The optimum network architecture for all models was determined through the trial-and-error method. It should be noted that the optimum numbers of hidden layers (HLs) and neurons (NNs) in the MFFNN, CFFNN, and ENN models were estimated based on the minimum value of the MSE.

It is found that the best transfer function for the hidden neurons is the tangent-sigmoid function. One hidden layer with 6 neurons is selected as the best configuration for the MFFNN model (5:1:1), with a minimum MSE of 3.64 × 10⁻⁷, while 1 hidden layer with 8 neurons is chosen as the optimum for the CFFNN model (5:1:1), with an MSE of 9.20 × 10⁻⁷. Additionally, the ENN model (5:1:1) with 5 neurons has the minimum MSE, with a value of 3.14 × 10⁻⁷. For the training phase, the R-squared value was found to be about 1 for all proposed models, as shown in Fig. 2.

Table 1. Selected parameters used in this study.

Parameter | Variable | Explanation | Standard deviation | Minimum | Maximum | Variation coefficient
Input 1 | NB | Number of blades | 0.82 | 2.00 | 4.00 | 0.82
Input 2 | H/D | Aspect ratio | 1.31 | 1.88 | 6.25 | 1.31
Input 3 | L′/D | Overlap ratio | 0.59 | 0.00 | 1.88 | 0.59
Input 4 | WS | Wind speed (m/s) | 3.19 | 3.00 | 12.00 | 3.19
Input 5 | RPM | Rotational speed | 38.02 | 11.80 | 173.60 | 38.02
Input 6 | MP | Mechanical power (W) | 2.81 | 0.01 | 12.24 | 2.81
Output | EP | Electrical power (W) | 2.39 | 0.01 | 10.41 | 2.39

4.2 Performance Evaluation of Empirical Models for Testing Data


In this study, R-squared and the root mean squared error (RMSE) are determined to find the best model for estimating the value of EP. The comparison of the predicted and actual values of EP for all models is shown in Fig. 3. The ENN model has the highest R-squared value (0.999996) and the lowest RMSE (0.000437) compared to the other models.
Furthermore, the performance of the developed models is compared with the previous models used in Ref. [6]. The highest R-squared value of 0.999996 and the lowest RMSE value of 0.000437 are obtained from the ENN model. It is concluded that the ENN model is the best model for estimating the EP of the new-configuration Savonius rotor and is more precise than the CFFNN, MFFNN, MLPNN, and RBFNN models (Table 2).
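A minimal sketch of the two evaluation metrics used here, assuming arrays of actual and estimated electrical power for the test set:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

actual = np.array([0.50, 2.00, 7.30])     # placeholder EP values [W]
estimated = np.array([0.49, 2.02, 7.28])

r2 = r2_score(actual, estimated)                        # R-squared
rmse = np.sqrt(mean_squared_error(actual, estimated))   # RMSE
```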

Fig. 2. Comparison of experimental data and the estimated values found by the machine learning models (estimated vs. actual power, in W; linear fits: MFFNN y = 0.9997x + 0.001, R² = 0.999993; CFFNN y = 0.9994x + 0.002, R² = 0.999978; ENN y = 0.9999x + 0.0007, R² = 0.999994).

Fig. 3. Comparison of experimental data and the estimated values found by the empirical models (electrical power [W] vs. wind speed [m/s]; MFFNN: R² = 0.999994, RMSE = 0.000543; CFFNN: R² = 0.999984, RMSE = 0.000931; ENN: R² = 0.999996, RMSE = 0.000437).

Table 2. Performance evaluation of the models.

Statistical indicator | MFFNN (current study) | CFFNN (current study) | ENN (current study) | MLPNN (Ref. [6]) | RBFNN (Ref. [6])
R-squared | 0.999994 | 0.999984 | 0.9999966 | 0.999130 | 0.950136
RMSE | 0.000543 | 0.000931 | 0.000437 | 0.07046 | 0.53245

5 Conclusions

The main objective was to examine the application of artificial neural network models (the Multilayer Feed-Forward Neural Network, the Cascade Feed-forward Neural Network, and the Elman neural network) for predicting the electrical power (EP) of new-configuration Savonius rotors. These models were also compared with a multilayer perceptron neural network (MLPNN) and a radial basis function neural network (RBFNN) to show the predictive accuracy of the proposed models. In this work, the impact of the blade number, aspect ratio, overlap ratio, wind speed, and rotational speed on the electrical power was investigated, and the experimental data were used to develop the proposed models. Moreover, the coefficient of determination (R²) and the root mean squared error (RMSE) were used to select the best empirical model. The ENN model was found to be the best model for estimating the EP of the new-configuration Savonius rotor, and it is more precise than the CFFNN, MFFNN, MLPNN, and RBFNN models.

References
1. Abdulmula, A.M., Sopian, K., Haw, L.C., Fazlizan, A.: Performance evaluation of standalone
double axis solar tracking system with maximum light detection MLD for telecommunication
towers in Malaysia. Int. J. Power Electron. Drive Syst. 10(1), 444 (2019)
2. Arreyndip, N.A., Joseph, E., David, A.: Wind energy potential assessment of Cameroon’s
coastal regions for the installation of an onshore wind farm. Heliyon 2(11), e00187 (2016)
3. Mahmoud, N., El-Haroun, A., Wahba, E., Nasef, M.: An experimental study on improvement
of Savonius rotor performance. Alex. Eng. J. 51(1), 19–25 (2012)
4. Driss, Z., Mlayeh, O., Driss, S., Maaloul, M., Abid, M.S.: Study of the incidence angle effect
on the aerodynamic structure characteristics of an incurved Savonius wind rotor placed in a
wind tunnel. Energy 113, 894–908 (2016)
5. Sargolzaei, J., Kianifar, A.: Neuro–fuzzy modeling tools for estimation of torque in Savonius
rotor wind turbine. Adv. Eng. Softw. 41(4), 619–626 (2010)
6. Kassem, Y., Gökçekuş, H., Çamur, H.: Artificial neural networks for predicting the electrical
power of a new configuration of Savonius rotor. In: Aliev, R., Kacprzyk, J., Pedrycz, W.,
Jamshidi, M., Babanli, M., Sadikoglu, F. (eds.) ICSCCW 2019. AISC, vol. 1095, pp. 872–
879. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-35249-3_116
7. Kassem, Y., Gokcekus, H.: Do quadratic and Poisson regression models help to predict
monthly rainfall? Desalin. Water Treat. 215, 288–318 (2021)
174 Y. Kassem et al.

8. Kassem, Y., Gokcekus, H., Camur, H., Esenel, E.: Application of artificial neural network,
multiple linear regression, and response surface regression models in the estimation of
monthly rainfall in Northern Cyprus. Desalin. Water Treat. 215, 328–346 (2021)
9. Li, X., Han, Z., Zhao, T., Zhang, J., Xue, D.: Modeling for indoor temperature prediction
based on time-delay and Elman neural network in air conditioning system. J. Build. Eng. 33,
101854 (2021)
Tree-Like Branching Network
for Multi-class Classification

Mengqi Xue, Jie Song, Li Sun, and Mingli Song(B)

Zhejiang University, Hangzhou 310007, Zhejiang, China


{mqxue,sjie,lsun,brooksong}@zju.edu.cn

Abstract. In multi-task learning, network branching, i.e. specializing branches for different tasks on top of a shared trunk, has been a golden rule. In multi-class classification tasks, however, previous work usually arranges all categories at the last layer of deep neural networks, which implies that all the layers are shared by these categories regardless of
their varying relationships. In this paper, we study how to convert a
trained typical neural network into a branching network where layers are
properly shared or specialized for the involved categories. We propose a
three-step branching strategy, dubbed as Tree-Like Branching (TLB), to
exploit network sharing and branching for multi-class classification. TLB
first mines inherent category relationships from a trained neural network
in a layer-wise manner. Then it determines the appropriate layer in the
network on which specialized branches grow to reconcile the conflict-
ing decision patterns of different categories. Finally TLB adopts knowl-
edge distillation to train the derived branching network. Experiments on
widely used benchmarks show that the derived tree-like network from
TLB achieves higher accuracy and lower cost compared to prior models,
meanwhile exhibiting better interpretability.

Keywords: Multi-task learning · Multi-class classification · Deep


neural network · Knowledge distillation

1 Introduction
In computer vision, deep neural networks, especially convolutional neural networks (CNNs), have continuously celebrated success in many vision tasks [9,11,19,22]. The development of CNNs has enormously increased network capacity, which leads to greater discrimination ability and better generalizability.
Hence, jointly solving multiple tasks in one model is drawing more attention
than tackling single tasks in isolation because of the multi-objective nature hid-
den in many real-world problems like self-driving, bioinformatics and so on. In
multi-task learning, a sharing-and-branching mechanism [1,3,6,13] is established where heterogeneous tasks share the foremost portion of the network and have individual task-specific portions of the network in the later layers. On the one hand, network sharing introduces a regularization effect on multi-task learning and reduces the number of parameters. On the other hand, network branching specializes branches for different tasks, which reconciles the contradictions

between different tasks and thus improves the overall performance of multi-task learning. When it comes to classification, popular networks mostly place all categories at the single last layer [19,22,24,25], which makes the entire network shared by all these categories. However, different categories, like different tasks, also share different amounts of decision patterns due to their varying category similarities. In other words, a multi-class classification problem is in fact a multi-task problem if we view each category as a binary classification task. Motivated
by this observation, in this paper we propose a network sharing-and-branching
strategy named Tree-Like Branching (TLB) to convert a trained neural network
into a branching network for multi-class classification. TLB consists of three main steps. First, it employs agglomerative hierarchical clustering [14] on class-specific gradient features from a trained neural network to build the category hierarchy. With the category hierarchy, specialized branches grow on the original network, turning it into a neural decision tree. Finally, TLB adopts distillation [5] to train the derived tree-like network. Note that we use only unlabelled data in every step, as labeled data is usually out of reach due to privacy or security issues. The proposed TLB gradually divides the whole multi-class feature space into several class-related feature spaces, which resembles the divide-and-conquer principle of decision trees (DTs) [17]. Decision making in the derived tree follows a coarse-to-fine process, and the class-adaptive branches show various topologies with diverse category sets. Such derived tree-like networks enjoy better interpretability thanks to their structural organization. Experiments on
popular benchmarks including CIFAR10 [7], CIFAR100 [7] and a mixed dataset
using CIFAR100 and Oxford Flowers [15] show that TLB achieves higher accu-
racy, lower computation cost and better interpretability than prior networks. In
a nutshell, we make the following three main contributions:
– We argue category relationship is vital for multi-class classification and pro-
pose to adopt agglomerative hierarchical clustering on gradient features to
build it.
– We propose a novel three-step branching strategy named TLB to build the
tree-like branching network for multi-class classification.
– Experiments show that the network derived by TLB exhibits higher accuracy,
lower computation cost and better interpretability than prior methods.

2 Relation to Prior Work


2.1 Multi-task Learning
Multi-task learning (MTL), which aims to train a single model that can simultaneously solve more than one task, has been widely used in the field of machine learning. Some previous works [3,12,13] study network sharing and strive for the best sharing architectures for multi-task learning, depending on the tasks at hand. In this paper, inspired by the sharing-and-branching mechanism
tasks at hand. In this paper, inspired by the sharing-and-branching mechanism
in MTL, we solve multi-class classification problems by a tree-like branching
network.

2.2 Knowledge Distillation

Knowledge distillation (KD) [5] is a process that efficiently transfers knowledge


from a large teacher model to a small student model, which is a typical model
compression and acceleration technique. The teacher model teaches the student
model to acquire the knowledge, ending up with a small model which has compa-
rable or even superior performance. Following [5], various types of KD algorithms
have been proposed to exploit the hidden knowledge and produce compact mod-
els, such as attention-based [26], NAS-based [10], and graph-based KD [8]. Unlike
these methods, we utilize KD to design and train the branching network.

2.3 Neural Trees

The neural tree is a new type of model that integrates the characteristics of decision trees into convolutional neural networks. For example, soft decision trees
(SDTs) [2,21] train a decision tree to mimic a deep neural network’s predictions.
Adaptive Neural Trees (ANTs) [23] unite neural networks and decision trees to
grow a tree-like neural network using some primitive modules. These neural trees
inherently gain higher interpretability but lower accuracy than modern deep
networks. Our method exploits the category relationship to design the branching
network for multi-class classification, without any performance degradation.

3 Method
3.1 Problem Setup

Assume there is an N-way classification problem, where the input and output spaces are denoted by X and Y. The category set is defined as C = {1, 2, ..., N}. Furthermore, assume there is a trained typical neural network, denoted by f_Θ : X → Y and parameterized by Θ. For the sake of clarity, typical neural networks refer to existing prevailing networks such as VGG [19], ResNet [4] and GoogLeNet [22]. The training data is denoted by X = {x_k}_{k=1}^K.
Note that in this paper we adopt unlabeled data to convert the trained typical
network into branching network, since labeled data is usually out of reach due
to privacy issue or expensive cost.
Block-Pooling Module. Modern popular backbone network architectures are usually comprised of block-pooling modules. A block consists of a group of layers, such as convolutional layers with the same number of filters and feature map size, or special architectures like the bottleneck [4]. The following pooling operation is usually implemented by max pooling or average pooling layers. Such block-pooling modules are sequentially stacked one after another in the network. We use m_i to denote the i-th block-pooling module, which is parameterized by Θ_i. The whole network architecture M can thus be defined as M = {m_i}_{i=1}^L, where L denotes the number of block-pooling modules. Please note that here we omit some other components, for instance, average pooling layers or fully connected layers [4].
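As a schematic only (the exact block design is backbone-dependent), a block-pooling module m_i could look like the following in PyTorch:

```python
import torch.nn as nn

def block_pooling(in_ch: int, out_ch: int, num_convs: int = 2) -> nn.Sequential:
    """A group of conv-BN-ReLU layers followed by a pooling operation."""
    layers = []
    for k in range(num_convs):
        layers += [nn.Conv2d(in_ch if k == 0 else out_ch, out_ch, 3, padding=1),
                   nn.BatchNorm2d(out_ch),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(2))  # the pooling that closes the module
    return nn.Sequential(*layers)
```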

Fig. 1. Pipeline of the proposed TLB strategy using a baseline consisting of four modules. The symbols m and R denote the block-pooling module and the router, respectively.

In the following subsections, we delineate the proposed TLB as a three-step strategy (see Fig. 1 for an illustration). At the first step, TLB mines inherent cate-

egy (see Fig. 1 for an illustration). At the first step, TLB mines inherent cate-
gory relationships from a trained neural network in a layer-wise manner. Then it
determines the appropriate layer in the network on which specialized branches
grow to reconcile the conflicting decision patterns of different categories at the
second step. At the final step, TLB adopts knowledge distillation to train the
derived branching network.

3.2 Step 1: Class Relationship from Trained Networks

Given an unlabeled sample x, the trained network produces the output ŷ ∈ R^N, ŷ = f(x) = [ŷ_1, · · · , ŷ_N]^T. In the vector ŷ, the element ŷ_i represents a score related to a specific class c_i in the category set C. We compute the derivative of ŷ_i w.r.t. the parameters Θ_j of the j-th block-pooling module through the trained network as follows:

$$g^j_i = \frac{\partial \hat{y}_i}{\partial \Theta_j} \qquad (1)$$

In the j-th module, the distance between the p-th category and the q-th category is approximated by

$$d^j_{p,q} = \left\| g^j_p - g^j_q \right\| = \sqrt{\sum_{k} \left( g^j_{p,k} - g^j_{q,k} \right)^2} \qquad (2)$$

The distances are calculated using pair-wise comparison on C, and thus a similarity matrix D^j ∈ R^{N×N} is constructed. The similarity matrix D^j contains the category relationship of C using the knowledge learned at module m_j.
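Under simplifying assumptions (a single unlabeled sample and direct access to the parameters Θ_j of one module), Step 1 could be sketched in PyTorch as:

```python
import torch

def similarity_matrix(model, module_params, x, num_classes):
    """Build D^j from class-score gradients (Eqs. 1-2); module_params is a
    list of the parameter tensors of the j-th block-pooling module."""
    scores = model(x)  # shape (1, N); parameters must require gradients
    grads = []
    for i in range(num_classes):
        g = torch.autograd.grad(scores[0, i], module_params, retain_graph=True)
        grads.append(torch.cat([t.flatten() for t in g]))  # g_i^j as one vector
    G = torch.stack(grads)      # (N, P) class-gradient features
    return torch.cdist(G, G)    # (N, N) pairwise Euclidean distances
```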

3.3 Step 2: Building Branching Networks

As categories rely on different patterns to make decisions, e.g., animals versus vehicles, we adopt different branches to solve classification problems which differ vastly. To this end, we enable the original trained network to grow branches on itself. In order to determine the best location for branching in the trained network, we use the similarity matrices obtained in the previous step to separate C at appropriate modules while keeping the network at low complexity. Specifically, for each similarity matrix D^j, we adopt the row vector D^j_i to represent the i-th category and employ hierarchical clustering [18] on the category vectors to construct category hierarchies for C. In hierarchical clustering, the Euclidean distance is selected as the measure of dissimilarity between vectors, and UPGMA [20] is chosen as the linkage criterion, which computes pairwise dissimilarities. At first, each D^j_i is treated as a separate cluster; hierarchical clustering is then performed in a bottom-up manner and stops when there are only two clusters, which correspond to two disjoint category sets C^j_l and C^j_r, with C^j_l ∪ C^j_r = C. The number of clusters is set to 2, which indicates that if module m_j is the branching point, it will grow two branches, one for C^j_l and the other for C^j_r. The ratio between the inter-cluster distance and the intra-cluster distance is employed to determine whether to branch. Taking C^j_l as an example, the ratio is computed as follows:
Dinter Clj i∈Clj j∈Crj d ( Di , Dj )


rlj = = , (3)
Dintra Cl · Crj
j
i∈Clj d ( D i , u)

where u is the arithmetic mean of the class vectors in C^j_l and d is the Euclidean distance. With the ratios of the two subsets, an indicator ρ_j is introduced to determine whether m_j is suitable for branching:

$$\rho_j = w(j) \cdot (r^j_l + r^j_r) \qquad (4)$$

where w(j) is a scaling function which scales ρ_j according to the location of the module. As the deeper layers are preferred for branching, we set w(j) = 1/j.
We first calculate ρ_j for every module m_j. The branching point is then chosen by j* = arg max_j ρ_j when max_j ρ_j ≥ τ; the threshold τ is used to prevent unnecessary branching. When the first split node comes up, all the modules before it, namely m_1 to m_{j*−1}, automatically become the root node, and all the following modules, namely m_{j*+1} to m_L, are duplicated as two branch nodes with the same architecture. A special router R^{j*}_φ parameterized by φ is introduced, defined as R^{j*} : X_{j*} → [0, 1]. R^{j*}_φ deals with the features from the split node m_{j*} and passes them to the branches with the corresponding categories. In this way, a branched model has made its first appearance. We repeat the process of branching on the

resulting branches until the maximum depth is reached. Eventually, the final tree-like network is derived. Figure 1 provides an example of a whole branching process.
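A sketch of the branching test, assuming D_j is the (N, N) similarity matrix of module j whose rows represent the categories; SciPy's average linkage corresponds to the UPGMA criterion:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import cdist

def branching_indicator(D_j: np.ndarray, j: int) -> float:
    """Split C into two clusters and compute rho_j = w(j) * (r_l + r_r), Eqs. 3-4."""
    labels = fcluster(linkage(D_j, method="average"), t=2, criterion="maxclust")
    ratios = []
    for side in (1, 2):
        C_in, C_out = D_j[labels == side], D_j[labels != side]
        inter = cdist(C_in, C_out).mean()  # mean inter-cluster distance
        intra = np.linalg.norm(C_in - C_in.mean(axis=0), axis=1).mean()  # to centroid u
        ratios.append(inter / intra)
    return (1.0 / j) * sum(ratios)  # w(j) = 1 / j
```

The module m_j with the largest indicator (above the threshold τ) would then be chosen as the branching point.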

3.4 Step 3: Training with Knowledge Distillation

Due to the lack of labeled training data, we employ knowledge distillation [5] to exploit the knowledge learned by the trained model. To this end, we follow the general teacher-student framework, in which the pre-trained typical network is the teacher and the derived network is the student. The teacher provides a pseudo label for an unlabelled sample by softening the predicted probabilities of the input data. Similarly, the pseudo labels used for training the routers, which the student uses to determine which branch the data should be delivered to, are also given in this manner. Taking a sample x_k as an example, we adopt P̂_t(x_k) to denote the soft targets produced by the teacher. The derived network is trained by fitting the soft targets when given the same data point. Specially, the pseudo labels for the routers should be converted to binary values, 0 or 1, due to their binary decision making for routing; accordingly, the element P̂_t^n(x_k) of the vector P̂_t(x_k) is set to 1 when the n-th category belongs to the left branch and 0 for the right. The loss functions are defined as:

$$L_{KD}(\hat{P}_t, \hat{P}_s) = D_{KL}\big(\hat{P}_t(x_k), \hat{P}_s(x_k)\big) \qquad (5)$$
$$L_{total} = (1 - \lambda)\, L_{CE} + \lambda\, L_{KD} \qquad (6)$$

where D_KL denotes the Kullback-Leibler divergence between the predicted categorical distributions from the teacher and the student networks, and L_CE denotes the cross-entropy loss. The student network is optimized by minimizing Eq. 6, where λ is a trade-off hyper-parameter balancing the effects of the two terms. At inference, the classification decisions are made by the specific branches chosen by the routers.
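A sketch of the distillation objective in Eqs. (5)-(6), assuming teacher and student logits for a batch and hard pseudo labels taken from the teacher:

```python
import torch.nn.functional as F

def tlb_loss(student_logits, teacher_logits, lam=0.6):
    """(1 - lambda) * L_CE + lambda * L_KD, with pseudo labels from the teacher."""
    p_t = F.softmax(teacher_logits, dim=1)              # soft targets P_t
    log_p_s = F.log_softmax(student_logits, dim=1)
    kd = F.kl_div(log_p_s, p_t, reduction="batchmean")  # L_KD (Eq. 5)
    pseudo = teacher_logits.argmax(dim=1)               # hard pseudo labels
    ce = F.cross_entropy(student_logits, pseudo)        # L_CE
    return (1 - lam) * ce + lam * kd                    # Eq. (6)
```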

4 Experiments
4.1 Experimental Setup

Training Details. Our approach is implemented in PyTorch [16] on a Quadro P6000 GPU. For data augmentation, all samples are resized to 32 × 32 with a random horizontal flip. A thin ResNet-18 [4] with 11 million parameters is used as the network architecture of the teacher. Our optimizer is SGD with a momentum of 0.9, a weight decay coefficient of 0.0005, and a batch size of 128. The initial learning rate is 0.1, decayed by a factor of 10 after 150 epochs, with 200 epochs in total. The hyper-parameters λ and τ are set to 0.6 and 2.2, respectively. The sample used for calculating the category relationship is randomly chosen from the testing data. Teachers and derived students have the same training settings.

Table 1. Performance comparison. We compare TLB with the teacher and a random strategy, which randomly arranges categories in each branch while keeping the same network architecture as that from TLB.

Dataset | Parameters at training (M): Teacher / TLB | Parameters at inference (M): Teacher / TLB | Accuracy (%): Teacher / Random / TLB
CIFAR10 | 11.17 / 14.60 | 11.17 / 8.37 | 93.3 / 94.7 / 95.2
CIFAR100 | 11.22 / 17.43 | 11.22 / 8.54 | 73.5 / 73.3 / 75.9
CIFAR-S1 | 11.17 / 17.36 | 11.17 / 8.51 | 90.1 / 87.5 / 90.4
CIFAR-S2 | 11.17 / 14.60 | 11.17 / 8.37 | 89.5 / 89.5 / 91.2
Mixed 1 | 11.18 / 18.37 | 11.18 / 8.67 | 53.7 / 55.1 / 59.9
Mixed 2 | 11.18 / 18.49 | 11.18 / 8.66 | 54.4 / 53.9 / 58.2
Mixed 3 | 11.18 / 18.37 | 11.18 / 8.67 | 52.4 / 56.1 / 58.8

4.2 Datasets and Experimental Results

Table 1 summarizes our experimental results on different datasets and compares our method with the teacher and a random strategy, which shuffles and rearranges the category subsets of the students while keeping the same network architectures.
Results on CIFAR. We first adopt CIFAR10 and CIFAR100 [7] to verify the effectiveness of the proposed TLB. CIFAR10 contains 60,000 images from 10 classes, 50,000 for training and 10,000 for testing. CIFAR100 [7] consists of 100 classes with 600 images per class. To validate that TLB is not specific to the categories involved, we also construct two datasets, CIFAR-S1 and CIFAR-S2, by randomly sampling CIFAR100 to evaluate the proposed method. From Table 1 we can see that TLB invariably outperforms the teachers and the random strategy. The consistent accuracy improvement indicates that TLB is an effective strategy not only on well-structured datasets but also on randomly generated ones. It is noticed that the random strategy causes a drop of 2.6% on the CIFAR-S1 dataset, revealing that incorrect network sharing is detrimental to classification.
Results on Mixed Dataset. We construct a mixed dataset to show that TLB can reconcile the conflicting decision patterns of different categories. The dataset is constructed with one half sampled from the Oxford Flowers dataset [15] and the other half from CIFAR100. The Oxford Flowers dataset exhibits a very different distribution from CIFAR100; therefore, categories in these two datasets rely on different patterns to distinguish themselves from others. Experimental results in Table 1 show that TLB significantly outperforms the teacher and the random strategy on all three mixed datasets, by an absolute 2.7%–6.4%. Moreover, detailed results in Table 2 reveal that TLB achieves higher accuracy on data from CIFAR100 and Oxford Flowers separately, which indicates that the proposed method can fully exploit the sharing-and-branching mechanism to boost classification performance.

Table 2. Detailed test accuracy (%) on the mixed dataset.

Method | Mixed 1: CIFAR100 (10) / Oxford Flowers (10) | Mixed 2: CIFAR100 (10) / Oxford Flowers (10) | Mixed 3: CIFAR100 (10) / Oxford Flowers (10)
Teacher | 73.8 / 47.8 | 75.5 / 49.2 | 66.3 / 52.4
TLB | 76.2 / 53.2 | 75.9 / 56.0 | 77.4 / 55.3

Fig. 2. (a) The network architecture derived from CIFAR10 by TLB, with a shared root, a split node, and leaf branches for vehicles and animals. (b) Accuracy (%) of students and routers on the three mixed datasets using TLB or the random strategy.

We also discuss the computation cost and interpretability of TLB. In Table 1, we can see that during the training phase the derived students have more parameters than the teachers due to branching. However, at inference each category is routed to a specific branch, which leads to a lower inference cost. Figure 2(a) illustrates the tree-like network architecture derived from CIFAR10 by TLB, with two class-related branches for vehicles and animals, which matches human visual perception characteristics. In Fig. 2(b), the accuracy of the routers increases with the accuracy of the students, and TLB always exceeds the random strategy, which implies that TLB divides the multi-class feature space properly so that the routers have less uncertainty when routing. In general, our proposed TLB can derive tree-like network architectures which have better interpretability from the perspective of the model and lower cost at inference time.

5 Conclusion

In this paper, we propose a novel three-step branching strategy, named Tree-Like Branching (TLB), to explore and exploit the network sharing-and-branching mechanism for multi-class classification. This approach makes use of category relationships to convert a trained typical neural network into a tree-like network with properly designed shared and class-adaptive layers using knowledge distillation. Extensive experiments on widely used benchmarks demonstrate that TLB achieves superior performance and lower cost, meanwhile exhibiting better interpretability with accurate network sharing and branching.

Acknowledgement. This work is funded by National Key Research and Develop-


ment Project (Grant No: 2018AAA0101503) and State Grid Corporation of China
Scientific and Technology Project: Fundamental Theory of Human-in-the-loop Hybrid-
Augmented Intelligence for Power Grid Dispatch and Control.

References
1. Bakker, B., Heskes, T.: Task clustering and gating for bayesian multitask learning.
J. Mach. Learn. Res. 4(May), 83–99 (2003)
2. Frosst, N., Hinton, G.: Distilling a neural network into a soft decision tree. arXiv
preprint arXiv:1711.09784 (2017)
3. Gao, Y., Ma, J., Zhao, M., Liu, W., Yuille, A.L.: NDDR-CNN: layerwise feature fusing in multi-task CNNs by neural discriminative dimensionality reduction. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
pp. 3205–3214 (2019)
4. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In:
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
pp. 770–778 (2016)
5. Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network.
arXiv preprint arXiv:1503.02531 (2015)
6. Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh
losses for scene geometry and semantics. In: Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition, pp. 7482–7491 (2018)
7. Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny
images (2009)
8. Lee, S., Song, B.C.: Graph-based knowledge distillation by multi-head attention
network. In: BMVC, p. 141 (2019)
9. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic
segmentation. In: Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, pp. 3431–3440 (2015)
10. Macko, V., Weill, C., Mazzawi, H., Gonzalvo, J.: Improving neural architecture
search image classifiers via ensemble learning. arXiv preprint arXiv:1903.06236
(2019)
11. Mahjourian, R., Wicke, M., Angelova, A.: Unsupervised learning of depth and ego-
motion from monocular video using 3d geometric constraints. In: Proceedings of
the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5667–5675
(2018)
12. Meyerson, E., Miikkulainen, R.: Beyond shared hierarchies: Deep multitask learn-
ing through soft layer ordering. arXiv preprint arXiv:1711.00108 (2017)
13. Misra, I., Shrivastava, A., Gupta, A., Hebert, M.: Cross-stitch networks for multi-
task learning. In: Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, pp. 3994–4003 (2016)
14. Müllner, D.: Modern hierarchical, agglomerative clustering algorithms. arXiv
preprint arXiv:1109.2378 (2011)
184 M. Xue et al.

15. Nilsback, M.E., Zisserman, A.: Automated flower classification over a large number
of classes. In: 2008 Sixth Indian Conference on Computer Vision, Graphics & Image
Processing, pp. 722–729. IEEE (2008)
16. Paszke, A., et al.: Automatic differentiation in PyTorch (2017)
17. Quinlan, J.R.: Induction of decision trees. Mach. Learn. 1(1), 81–106 (1986)
18. Rokach, L., Maimon, O.: Clustering methods. In: Data Mining and Knowledge
Discovery Handbook, pp. 321–352. Springer, Boston (2005). https://doi.org/10.
1007/0-387-25465-X 15
19. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale
image recognition. arXiv preprint arXiv:1409.1556 (2014)
20. Sokal, R.R.: A statistical method for evaluating systematic relationships. Univ.
Kansas Sci. Bull. 38, 1409–1438 (1958)
21. Suárez, A., Lutsko, J.F.: Globally optimal fuzzy decision trees for classification and
regression. IEEE Trans. Pattern Anal. Mach. Intell. 21(12), 1297–1311 (1999)
22. Szegedy, C., et al.: Going deeper with convolutions. In: Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)
23. Tanno, R., Arulkumaran, K., Alexander, D., Criminisi, A., Nori, A.: Adaptive
neural trees. In: International Conference on Machine Learning, pp. 6166–6175.
PMLR (2019)
24. Vasant, P., Zelinka, I., Weber, G.W.: Intelligent Computing & Optimization (2018).
https://doi.org/10.1007/978-3-030-00978-6
25. Vasant, P., Zelinka, I., Weber, G.W.: Intelligent Computing and Optimization
(2019). https://doi.org/10.1007/978-3-030-33585-4
26. Zagoruyko, S., Komodakis, N.: Paying more attention to attention: Improving the
performance of convolutional neural networks via attention transfer. arXiv preprint
arXiv:1612.03928 (2016)
Multi-resolution Dense Residual Networks
with High-Modularization for Monocular
Depth Estimation

Din Yuen Chan, Chien-I Chang(&), Pei Hung Wu,


and Chung Ching Chiang

Department of Computer Science and Information Engineering,


National Chiayi University, Chiayi, Taiwan
dychan@mail.ncyu.edu.tw

Abstract. Deep-learning neural networks (DNNs) have been acknowledged to capably solve the ill-posed monocular depth estimation problem in self-driving applications. In this paper, we propose a dense residual multi-resolution supervised DNN for accurate monocular depth estimation of traffic landscape scenes. The proposed DNN is constructed by regularly integrating dense residual short-cut connections into a multi-resolution backbone. However, since some implicitly influential features may no longer be viable at the end of learning, the DNN structure for generating the details of the estimated monocular depths should not be made too deep. Basically, the structural depth of a DNN can be suppressed by effectively exploiting functional residual connections. In the proposed DNN structure, the amount of short-cut connections is kept moderate through rational employment. Particularly, for achieving high modularization, we address three layered modules to generate the adequate levels and layers, by which the results can easily be controlled to meet a requested prediction/inference accuracy. The visualization and quantitative results demonstrate the superiority of the proposed DNN over other compared DNNs in street landscape experiments.

Keywords: Monocular depth estimation · Deep learning neural networks · Multi-resolution backbone · Dense residual module

1 Introduction

Accurate monocular depth estimation is becoming widespread for perspective scene understanding in computer vision. Even though monocular depth estimation is essentially an ill-posed problem, because it needs to correctly map a large variety of 3D scenes to few 2D scenes, DNN solutions continue to provide plausible, progressively improving results. For example, in the early period, Eigen et al. [1] verified that DNNs can successfully attain accurate monocular depth estimation. Given this workable verification, DNN-based monocular depth estimation has become more intriguing and a convincing application for self-driving and smart navigation in autonomous vehicles, robotics, mobile devices, and advanced


driver assistance system (ADAS). Specifically, monocular depth estimation DNNs, abbreviated here as mono-Nets, can be catalogued into three types by their training resources. Firstly, the type-I mono-Nets [1–6] are trained purely using available high-resolution ground truth (GT) depth maps, which are completely supplied for loss computation. Even though GT depth maps can be captured by high-quality depth devices, they still need laborious handcrafted annotations. Namely, a sufficient acquisition of GT depth maps usually requires a large amount of high-quality labeled training data, from the primary collection with the depth sensor to the subsequent handcrafted correction and refinement. In fact, although this process seems too laborious nowadays, with the maturing hybrid installation, increasing resolution, and implementation convenience of various TOF depth sensors with color cameras, together with drawing-assistant/image-processing software tools, the labeling difficulties will be greatly mitigated sooner or later. The networks in [1] have a two-stage structure for coarse-to-fine inference, where the latter stage takes charge of refining the estimation. Analogously, Song et al. proposed a depth-to-depth auto-encoder DNN [2], in which the GT depth map and a roughly estimated depth map are combined as inputs to a second network for further improvement of the depth estimation. In [3], the DNN contrasts a global perspective profile predictor and a local depth detail predictor, which are an auto-encoder CNN with symmetric skip-connections and a resolution-invariant CNN that learn the piecewise smooth depths and the discontinuous depths in gradient fields, respectively. Following these two modules, an integration module merges the relatively fine details into the global information to generate the depth map from a single color image. In [4], an atrous spatial pyramid pooling (ASPP) module, a cross-channel learner, and a full-image encoder are integrated to compose a scene understanding module using extracted dense features. Behind this module, an ordinal regression layer is applied to transfer the problem of depth estimation to one of ordinal regression. The autoencoder network in [5] exploits transfer learning to acquire highly accurate depth estimation with a pre-trained truncated DenseNet-169. Although there are many short-cut connections and layers, the structure of the truncated DenseNet-169 is not regarded as complicated. This confirms the significance of effective kernel initialization, whether using legacy artificial neural networks or prevalent DNNs. In [6], a multi-scale feature fusion module extracts multi-scale features along the learning of the encoder module. The multi-scale feature maps and the primarily estimated depth map are then concatenated as the input of a refinement module, which is merely composed of a few successive plain convolutional layers, to achieve the depth map with clear
object boundaries. The type-II mono-Nets are trained by using the dual information of
rectified stereo color-image pairs or nearby images [7–10]. They are acknowledged to be unsupervised approaches, which focus on exploring unsupervised cues when real-world GT depths, 3D geometric appearances, and semantic contours are unavailable for training. In general, those approaches require a functional synthesis module to create a virtual depth map and synthesize virtual stereo-image pairs for settling the loss function. The Type-III mono-Nets have semi-supervised and self-supervised characteristics [11–14]. Common Type-III mono-Nets incorporate geometrically correlative GT resources to achieve specific multiple tasks, which may include a monocular depth estimation network and semantic
segmentation as well as, perhaps, planar 3D reconstruction. Because their respective subnetworks have closely related task-specific goals in terms of the characteristics of the predicted signals, those subnetworks can be mutually enhanced by learning simultaneously and intimately transferring common advantageous features. In general, owing to the availability of more resources, the type-III mono-Nets can intrinsically obtain better predictions than the two former types, especially for diverse photographing scenarios. However, practical self-driving will encounter inevitable car vibrations and non-uniform sunshine; thus, the rectification and calibration of the training stereo image pairs need complicated preprocessing, which is an obstacle to real-time self-driving. To the best of our knowledge of the structures and available resources, based on the surveys stated above, the creative chances and improvable points of Type-I mono-Nets are relatively limited compared to those of Type-II and Type-III mono-Nets. However, the equipment capturing the training resources of Type-I mono-Nets can easily be installed on practical autonomous vehicles. The high-resolution representation network (HR-Net) [15] was primarily addressed for human posture estimation; in effect, it provides a suitable multi-resolution architecture for exploring and delving into appropriate multi-resolution features. Hence, motivated by HR-Net, we propose a highly regular multi-resolution monocular depth estimation network with dense-residual blocks for landscape monocular depth estimation, whose architecture can be highly modularized, an inherently preferred property for firmware and hardware solutions. The contributions of this study are three-fold.
• The developed multi-resolution architecture can handle the smoothness and sharpness of semantic profiles in the generated depth map.
• The addressed modularization focuses on the regularity of the structural combination and the fixation of the number of convolutional channels. The resulting simplification can be easily attained and directly benefits firmware/hardware solutions of monocular DNNs.
• The proposed layered modules allow mono-DNNs to be easily implemented as teacher-student DNNs and facilitate rational control of adequate levels and layers under the constraint of the leverage between prediction accuracy and network size.

2 Three Layered Regular Modules for Multi-resolution DNN

For pursuing smooth, sharp, and non-fragmentary semantic profiles in the generated depth map, features at different resolutions need to be simultaneously extracted and preserved. Motivated by HR-Net [15], the proposed mono-DNN is developed as an architectural simplification of HR-Net that can easily maintain multi-scale perspective features. In our proposed DNN, dense short-cut connections are embedded into the modules to extract appropriate landscape features from diverse traffic scenarios with deepened residual learning. Moreover, the proposed DNN has no long-range connections, e.g., long-distance skip paths. This not only attains easy modularization, but also makes the addressed approach prone to real-time firmware/hardware implementation. Particularly, the proposed system is constructed by
means of a progressive design in the order of increasing block size. Although the number of layers is three in this study, the number of layers and the manner of assembling can be arbitrary. In general, the proposed layered modularization can fairly simplify the design of teacher-student DNNs. The three modules are designed from small layer to big layer for the convenience and flexibility of building the multi-resolution mono-DNN in terms of the interaction and expansion of distinct resolutions. They are described as follows.
The proposed DNN has a gradually extended architecture in which parallel resolution-specific paths are added as the network deepens. The first-layer module is a four-tier regular dense residual block, denoted as 4t-RRDB, which can be regarded as the basic-layer module for mono-DNN building, as shown in Fig. 1. Its regularization is the fixation of the convolutional channel number. This can facilitate the wide deployment of the depth-wise separable convolution. Because the number of resulting feature maps is directly consistent with the number of subsequent filters, the 1 × 1 convolutions can be saved. Hence, the depth-wise separable convolution can be simplified by removing the 1 × 1 convolutions following the 2D convolutions when the conjunction of those 1 × 1 convolutions is likely in vain or redundant. In Fig. 1, interlaced concatenation first uniformly blends, slice by slice, multiple feature cubes delivered through different connections. Instead of the generic 1 × 1 convolution, Conv1d-1 × 1 × L, a short-term 1 × 1 convolution, performs a learnable short-range weighted sum of L slices. Thus, the shrinkage of slices is not excessive, so the network prevents useful embedded clues from being promptly diminished in the subsequently assembled features. Multiple short-skip connections are comprehensively settled in this basic module for regular dense residual learning.
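To make the channel-fixation idea concrete, a minimal Keras sketch follows (our illustration under TensorFlow 2.x, not the authors' code; all function names are ours). The interlaced concatenation interleaves feature-cube slices, a grouped 1 × 1 convolution stands in for Conv1d-1 × 1 × L as a learnable short-range weighted sum of L slices, and the simplified depth-wise separable convolution drops the pointwise 1 × 1 step because the channel count is kept fixed.

```python
import tensorflow as tf
from tensorflow.keras import layers

def interlaced_concat(tensors):
    """Uniformly blend L feature cubes slice by slice:
    channels come out interleaved (a1, b1, a2, b2, ...)."""
    x = tf.stack(tensors, axis=-1)                  # (B, H, W, C, L)
    s = tf.shape(x)
    return tf.reshape(x, [s[0], s[1], s[2], s[3] * s[4]])

def conv1d_1x1_L(x, out_channels):
    """Stand-in for Conv1d-1x1xL: a grouped 1x1 convolution in which each
    output slice is a learnable weighted sum of its own group of L
    interleaved input slices (requires TF >= 2.3 for `groups`)."""
    return layers.Conv2D(out_channels, kernel_size=1, groups=out_channels)(x)

def simplified_dw_conv3x3(x):
    """Depth-wise separable 3x3 convolution with the pointwise 1x1 removed,
    valid here because the channel number is held fixed, so each filter
    performs one-on-one dedicated filtering."""
    x = layers.DepthwiseConv2D(3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)
```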
For regularization of the next modular expansion, we progressively extend one more path to form the second-layer module with lowered resolution. We treat the 4t-RRDB as the basic component to build the higher-layer module called the 2-resolution two-level RRDB module (2rTRM), which is the expansion of the 4t-RRDB module. As shown in Fig. 2, the two parallel paths are intimately integrated to acquire a highly symmetric two-level 4t-RRDB-based module. Within 2rTRM, transferring the feature maps across the different paths is performed only once, from the high to the low resolution. This lets the efficacies of different-resolution fusion and inter-path interference within a numerous-parameter DNN be more easily traceable. Analogously, we treat 2rTRM as the modular element to further cover three resolutions, achieving the third-layer module named the 3-resolution two-level RRDB module, abbreviated as 3rTRM. As shown in Fig. 3, it can be considered the expansion-advanced module, where the 2rTRM modules are concisely consolidated: 3rTRM has two 2rTRMs of different resolutions in the status of overlapped embedding. According to the systematic propriety of informational interchange and structural expansion in the HR-Net topology, the 2nd-layer module is constructed based on 4t-RRDB, and then we concisely aggregate the modules to form the 3rd-layer module.

Fig. 1. Detailed structure of 4t-RRDB, where K and BN express the channel number and the batch normalization, respectively. D_Conv.3 × 3 represents the depth-wise separable 3 × 3 2D convolution, and Conv1d-1 × 1 × L expresses the 1 × 1 1D convolution for shrinking L times the number of feature cube slices. The black skip connection links the head and the tail of 4t-RRDB, and the dense colored lines provide the inner skipping connections.

Fig. 2. Structural detail of 2-resolution two-level RRDB module (2rTRM).

Fig. 3. Structural detail of 3-resolution two-level RRDB module (3rTRM) where sub-networks
masked by two pink dash-line blocks are two overlapped 2rTRMs with different resolution
combinations.

3 High-Modularized Multi-resolution Architecture

Observing the regular ingredient association and the maintained channel numbers of the three modules in Figs. 1, 2 and 3, lightweight monocular DNN simplification can be easily attained. For example, keeping the number of convolutional channels identical to the number of input cube slices facilitates a large-scale reduction of convolutions by straightforwardly replacing the generic 3 × 3 one-to-many convolution with either the depth-wise separable 3 × 3 convolution or even the simplified depth-wise separable 3 × 3 convolution, which performs only one-on-one dedicated filtering for each array of bundled filters. With the proposed modules in hand, the implementation of the proposed multi-resolution monocular estimation DNN can have variants of fully regular modularization. Moreover, as shown in Fig. 4, the proposed DNN can easily be cast as a student-teacher DNN: by adding more 3rTRMs, a student-teacher framework is attained in which the student network plays the role of the fundamental network shown in Fig. 4.

Fig. 4. Diagram of proposed monocular estimation multi-resolution DNN.

The DNN constructed in Fig. 4 already outperforms the compared DNNs in tests on the KITTI dataset under the training loss function designed for the depth estimation field. Hence, we evaluate only this configuration as the proposed DNN. The first term of the total training loss given later is the mean square error (MSE) of the pixel-wise depth difference, denoted by

$L_{\mathrm{MSE}}\left(D_p, D^*\right) = \frac{1}{N}\sum_{i=1}^{N}\left(d_i - d_i^*\right)^2, \qquad (1)$

where $d_i$ and $d_i^*$ are the $i$th pixel values in the predicted depth map $D_p$ and the GT depth map $D^*$, respectively, which have $N$ points. The second term is the profile loss given by

$L_{\mathrm{SSIM}}\left(D_p, D^*\right) = 1 - \mathrm{SSIM}\left(D_p, D^*\right), \qquad (2)$

where

$\mathrm{SSIM}(X, Y) = \frac{2\mu_X \mu_Y + c_1}{\mu_X^2 + \mu_Y^2 + c_1} \cdot \frac{2\sigma_X \sigma_Y + c_2}{\sigma_X^2 + \sigma_Y^2 + c_2} \cdot \frac{\sigma_{X,Y} + 0.5\,c_2}{\sigma_X \sigma_Y + 0.5\,c_2},$

while computing the structural similarity index measure (SSIM) for images $X$ and $Y$, where $\mu$ and $\sigma$ are the mean and standard deviation, respectively, of the compared images, and $\sigma_{X,Y}$ is their covariance. The third term is the gradient loss defined by

$L_G\left(D_p, D^*\right) = \frac{1}{N}\sum_{i=1}^{N}\left(\left|\nabla_h(d_i) - \nabla_h(d_i^*)\right| + \left|\nabla_v(d_i) - \nabla_v(d_i^*)\right|\right), \qquad (3)$

where $\nabla_h(\cdot)$ and $\nabla_v(\cdot)$ calculate the horizontal and vertical components of the gradient, respectively, for the pixel in parentheses, and the gradient operator is $\nabla(\cdot) = (\nabla_h(\cdot), \nabla_v(\cdot))$. The final term is the edge-aware smoothness/regularization term defined by

$L_S\left(D_p, D^*\right) = \frac{1}{N}\sum_{i=1}^{N}\left(\left|\nabla_h(d_i)\right| e^{-\left|\nabla_h(I_i)\right|} + \left|\nabla_v(d_i)\right| e^{-\left|\nabla_v(I_i)\right|}\right), \qquad (4)$

where $I_i$ is the $i$th-pixel strength of the luminance image that is sent to the proposed DNN as the depth estimation target in Fig. 4. With the aforementioned four terms in hand, their weighted sum gives the total loss

$L_T\left(D_p, D^*\right) = \alpha_1 L_{\mathrm{MSE}}\left(D_p, D^*\right) + L_{\mathrm{SSIM}}\left(D_p, D^*\right) + \alpha_2 L_G\left(D_p, D^*\right) + \alpha_3 L_S\left(D_p, D^*\right), \qquad (5)$

where we empirically set $\alpha_1 = 0.1$, $\alpha_2 = 0.3$ and $\alpha_3 = 0.5$, considering both the normalization and the significance of dynamic range.
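As a concrete reference, a minimal TensorFlow sketch of the total loss (5) follows (our hedged illustration, not the authors' code). It assumes single-channel depth maps and a single-channel luminance image scaled to [0, 1] (the paper does not state the value ranges), uses forward differences for $\nabla_h$ and $\nabla_v$, and lets tf.image.ssim stand in for the three-factor SSIM above.

```python
import tensorflow as tf

def _grads(x):
    """Forward-difference horizontal/vertical gradient components."""
    gh = x[:, :, 1:, :] - x[:, :, :-1, :]
    gv = x[:, 1:, :, :] - x[:, :-1, :, :]
    return gh, gv

def total_loss(d_true, d_pred, luminance, a1=0.1, a2=0.3, a3=0.5):
    # (1) pixel-wise MSE
    l_mse = tf.reduce_mean(tf.square(d_pred - d_true))
    # (2) profile loss; tf.image.ssim computes the SSIM index
    l_ssim = 1.0 - tf.reduce_mean(tf.image.ssim(d_pred, d_true, max_val=1.0))
    # (3) gradient loss, Eq. (3)
    ghp, gvp = _grads(d_pred)
    ght, gvt = _grads(d_true)
    l_g = tf.reduce_mean(tf.abs(ghp - ght)) + tf.reduce_mean(tf.abs(gvp - gvt))
    # (4) edge-aware smoothness, Eq. (4): depth gradients are down-weighted
    # at strong luminance edges
    ghi, gvi = _grads(luminance)
    l_s = (tf.reduce_mean(tf.abs(ghp) * tf.exp(-tf.abs(ghi))) +
           tf.reduce_mean(tf.abs(gvp) * tf.exp(-tf.abs(gvi))))
    # (5) weighted total
    return a1 * l_mse + l_ssim + a2 * l_g + a3 * l_s
```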

4 Experiments

In this section, we evaluate the performance of our mono-Net by comparing it with some state-of-the-art mono-Nets on the KITTI dataset for depth prediction from single-view landscape color images. Each original low-resolution depth map is interpolated into a GT depth map of 1244 × 376 pixels, the same pixel count as the color images in the KITTI dataset, by using the NYU Depth V2 toolset. There are in total 11,348 landscape depth-color pairs in the KITTI dataset, which were divided into 9,078 training frames and 2,270 testing frames for our simulation work. We implemented our depth estimation network with our own Python 3.6 code on Windows 10 with an Intel i7-9700k and a GeForce RTX 2080 Ti, 16 GB. The recursive training routine is set to 20 epochs with 10,000 steps per epoch and a learning rate of 0.0001; the batch size is only
one frame, to match the capability of inexpensive GPU hardware. In the simulation, our proposed DNN shown in Fig. 4 is compared with other DNNs in the literature, namely Eigen's DNN [1], Song's DNN [2], Fu's DNN [4] and Alhashim's DNN [5]. The qualitative comparisons in visualization between our mono-Net and the other four mono-Nets are shown in Fig. 5, where the displayed estimated depth maps are mainly selected from the KITTI dataset for the aim of testing traffic landscape scenes. For fair comparisons, we only measure the quantitative quality of the depths located at the positions of the originally available GT depth pixels in the KITTI dataset, excluding the depths of interpolated pixels. The quantitative qualities include four reconstruction-error metrics, namely the absolute relative difference (Abs_rel), the squared relative difference (Sq_rel), the RMSE, and the logarithmic RMSE (RMSE in log), as well as the percentage of correctly depth-estimated pixels. The smaller the reconstruction errors, the higher the accuracy the mono-DNN obtains. For counting the number of correctly predicted pixels, the difference between the estimated depth and the GT depth at the $j$th checked pixel is formulated as $\delta_j = \max\left(d_j / d_j^*,\; d_j^* / d_j\right)$. Neglecting the subscript of $\delta_j$, this difference measured over all checked depths is identically expressed by the symbol $\delta$, such that the error-toleration criteria can be set as $\delta < 1.25$, $\delta < 1.25^2$ and $\delta < 1.25^3$ for correct-prediction approval. The quantitative comparisons are tabulated in Tables 1 and 2, which demonstrate that the proposed mono-DNN acquires higher percentages of correctly estimated pixels and lower reconstruction errors on average than the compared mono-Nets, respectively.
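For reproducibility, these metrics can be computed as in the following NumPy sketch (ours, not the authors' code); the `valid` mask standing in for "originally available GT depth pixels" is an assumption, since KITTI marks missing depths as zero.

```python
import numpy as np

def depth_metrics(d_pred, d_gt):
    """Reconstruction errors and threshold accuracies for one depth map pair."""
    valid = d_gt > 0                       # assumed marker of original GT pixels
    dp, dg = d_pred[valid], d_gt[valid]    # both must be strictly positive
    abs_rel  = np.mean(np.abs(dp - dg) / dg)
    sq_rel   = np.mean((dp - dg) ** 2 / dg)
    rmse     = np.sqrt(np.mean((dp - dg) ** 2))
    rmse_log = np.sqrt(np.mean((np.log(dp) - np.log(dg)) ** 2))
    delta = np.maximum(dp / dg, dg / dp)   # per-pixel ratio test
    accs  = [np.mean(delta < 1.25 ** k) for k in (1, 2, 3)]
    return abs_rel, sq_rel, rmse, rmse_log, accs
```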

Table 1. Comparisons of the percentages of correct pixels estimated from our mono-Net and the other four mono-Nets under the error-toleration thresholds 1.25, 1.25², and 1.25³.
δ < 1.25  δ < 1.25²  δ < 1.25³
Eigen's DNN [1] 0.692 0.899 0.967
Song's DNN [2] 0.893 0.976 0.992
Fu's DNN [4] 0.932 0.984 0.994
Alhashim's DNN [5] 0.886 0.965 0.986
Proposed 0.999 1.000 1.000

Table 2. Comparisons of the average reconstruction errors for our mono-Net and the other mono-Nets in terms of absolute relative difference, squared relative difference, RMSE (linear), and RMSE (in log).
Abs_rel Sq_rel RMSE
Eigen's DNN [1] 0.190 1.515 7.156
Song's DNN [2] 0.108 0.278 2.454
Fu's DNN [4] 0.072 0.307 2.727
Alhashim's DNN [5] 0.093 0.589 4.170
Proposed 0.028 0.353 3.903

Since the proposed DNN is fully modularized through layer-by-layer development, its simplification is very easy. In our simulation, we replaced 4t-RRDB with the plain 4-tier residual module throughout the entire network, acquiring a several-fold regular computation reduction; the monotonous 4-tier residual module has only a skip connection linking its head and tail, without inner short-distance shortcuts. However, when the basic module keeps the dense inner short-distance shortcuts, the proposed DNN can maintain only a gradual degradation of depth inference accuracy while saving convolutional layers one module bundle at a time. Figure 5 demonstrates that the proposed mono-Net achieves estimated depth maps with pertinent shape preservation and dividable-area smoothness for street landscape scenes.

(a)–(c), (e) KITTI image-depth pairs; (d) KITTI and NYU-v2 image-depth pairs.

Fig. 5. Qualitative comparisons in visualization of estimated depth maps selected from the KITTI dataset for (a) Eigen's DNN, (b) Song's DNN, (c) Fu's DNN, (d) Alhashim's DNN, and (e) the proposed DNN, where the first, second and third rows display the color images, the GT depth maps and the estimated depth maps, respectively.

5 Conclusion

In this paper, we presented a highly modularized dense residual multi-resolution supervised DNN toward accurate monocular depth estimation for traffic landscape scenes. The architecture of the proposed DNN is built by regularly integrating dense residual connections into the multi-resolution backbone, so it need not be deepened to generate the details of the estimated monocular depths. To highly modularize the proposed architecture, we address three layered modules that facilitate rational control of the adequate levels and layers under the requested prediction accuracy. The visualization and quantitative results of the landscape experiments demonstrate the superiority of our DNN over the compared DNNs. Particularly, the proposed mono-Net can generate well shape-preserving depth maps, such that the depth-generated multi-views could offer considerable perceptual comfort for bare-eye 3D viewing.

Acknowledgments. This paper is supported by funding granted by the Ministry of Science and Technology of Taiwan, MOST 109-2221-E-415-016.

References
1. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-
scale deep network. In: Advances in Neural Information Processing Systems, vol. 27 (NIPS),
December 2014
2. Song, M., Kim, W.: Depth estimation from a single image using guided deep network. IEEE
Access 7, 142595–142606 (2019)
3. Kim, Y., Jung, H., Min, D., Sohn, K.: Deep monocular depth estimation via integration of
global and local predictions. IEEE Trans. Image Process. 27(8), 4131–4144 (2018)
4. Fu, H., Gong, M., Wang, C., Batmanghelich, K., Tao, D.: Deep ordinal regression network
for monocular depth estimation. In: Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, pp. 2002–2011 (2018)
5. Alhashim, I., Wonka, P.: High quality monocular depth estimation via transfer learning.
arXiv preprint arXiv:1812.11941 (2018)
6. Hu, J., Ozay, M., Zhang, Y., Okatani, T.: Revisiting single image depth estimation: toward
higher resolution maps with accurate object boundaries. In: IEEE Winter Conference on
Applications of Computer Vision, Waikoloa Village, HI, USA, pp. 1043–1051, March 2019
7. Godard, C., Mac Aodha, O., Brostow, G.J.: Unsupervised monocular depth estimation with
left-right consistency. In: Proceedings IEEE Conference on Computer Vision and Pattern
Recognition, pp. 270–279, July 2017
8. Pilzer, A., Lathuilière, S., Sebe, N., Ricci, E.: Refine and distill: exploiting cycle-inconsistency and knowledge distillation for unsupervised monocular depth estimation. In: CVPR, pp. 9768–9777, June 2019
9. Wong, A., Soatto, S.: Bilateral Cyclic Constraint and Adaptive Regularization for
Unsupervised Monocular Depth Prediction, pp. 5644–5653. CVPR, Open Access paper
(June 2019)
10. Ye, X., Fan, X., Zhang, M., Xu, R., Zhong, W.: Unsupervised monocular depth estimation
via recursive stereo distillation. IEEE Trans. Image Process. 30, 4492–4504 (2021)
11. Jiao, J., Cao, Y., Song, Y., Lau, R.: Look deeper into depth: monocular depth estimation
with semantic booster and attention-driven loss. In: Ferrari, V., Hebert, M., Sminchisescu,
C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11219, pp. 55–71. Springer, Cham (2018).
https://doi.org/10.1007/978-3-030-01267-0_4
12. Godard, C., Aodha, O.M., Firman, M., Brostow, G.: Digging into self-supervised monocular
depth estimation, pp. 3828–3838. ICCV, Open Access paper (Oct. 2019)
13. Lee, J.H., Han, M.-K., Ko, D.W., Suh, I.H.: From big to small: multi-scale local planar
guidance for monocular depth estimation, CVPR, arXiv preprint arXiv:1907.10326, June
2020

14. Song, X., et al.: MLDA-Net: multi-level dual attention-based network for self-supervised
monocular depth estimation. IEEE Trans. Image Process. 30, 4691–4705 (2021)
15. Sun, K., Xiao, B., Liu, D., Wang, J.: Deep high-resolution representation learning for human
pose estimation. In: Proceedings of the IEEE conference on computer vision and pattern
recognition, pp. 5693–5703 (2019)
A Decentralized Federated Learning Paradigm
for Semantic Segmentation of Geospatial Data

Yash Khasgiwala, Dion Trevor Castellino(&), and Sujata Deshmukh

Department of Computer Engineering, Fr. Conceicao Rodrigues College
of Engineering, Mumbai, India
{yashkhasgiwala,diontrevorc}@gmail.com,
sujata.deshmukh@fragnel.edu.in

Abstract. Data-driven deep learning is recognized as a promising approach to
building precise and robust models to classify and segment complex images.
Road extraction from such complex aerial images is a hot research topic in
geospatial data mining as it is essential for sustainable development, urban
planning, and climate change research. Individual satellites rarely possess enough data samples to build their own personalized models. Centralizing
satellite data to train a model on varied data is also infeasible due to privacy
concerns and legal hurdles. This makes it a challenge to train Deep Learning
algorithms since their success is directly proportional to the amount of diverse
data available for training, preventing Deep Learning from reaching its full
potential. Federated Learning (FL) sidesteps this problem by collaboratively
learning a shared prediction model without sharing data across satellites. This
paper constructs a semantic segmentation-based FL system that leverages the
strengths of shared learning and Residual architecture combined with Unet for
road extraction. We train and evaluate our system on a public road dataset
reorganized into a heterogeneous distribution of data scattered among multiple
clients and compare it with models trained locally on individual satellites. We
further try to enhance the performance of our FL-based model by implementing
various versions of Unet.

Keywords: Federated learning · Data privacy · Dense Unet · Residual Unet · Semantic segmentation

1 Introduction

Every day, tonnes of devices collect millions of data points that can be used to improve
services. However, data is not shared freely because of privacy concerns, computa-
tional constraints of collecting data at a centralized location, and legal restraints. For
example, satellites revolving around the earth cover different geological landmasses. To
train models to differentiate between certain geospatial features, it is necessary to
leverage data that these satellites collect. However, since these aerial images could
contain sensitive data specific to a region, data sharing is restricted. Such situations
warrant the use of Federated Learning (FL), where multiple clients (or devices) col-
lectively learn a model by sharing the weights of the local models trained on their
respective data. This kind of learning is possible in today’s world because of the
massive amount of data split among various devices. FL has many advantages, such as
generating a more intelligent model while ensuring lower latency, less energy con-
sumption, and most importantly, keeping data privacy intact as the clients only share
the updated parameters, not the data itself.
Consequently, this eliminates the requirement of powerful hardware, thereby
making it possible to execute these computations on various interconnected IoT
devices. This can be extended to include the satellites on which we depend for navi-
gational purposes that cover the different specific regions of our planet. These satellites
can capture and map high-resolution images of the landmass they observe. We can
mine geospatial data such as buildings, roads, vegetation, etc., with the help of
semantic segmentation from these images. Semantic segmentation is the procedure of
labeling specific areas of an image. Combined with semantic segmentation, FL can be
used to significant effect for road extraction from geospatial data.

2 Related Work

Google conceptualized FL to improve model training on devices with both data and
computational restraints such as cellular devices, tablets, etc. (Konecny et al. [1]). This
development then spurred the creation of an FL system independent of a central server
which usually orchestrated the training process. Apart from reducing dependence on
the central server, this also speeds up the model training process as the latency reduces
due to direct peer-to-peer communication (Roy et al. [2]). Nvidia then explored the
feasibility of this paradigm in the field of medical segmentation of brain tumor samples,
where data privacy is of utmost importance (Rieke et al. [3]). FL was used to improve traffic congestion monitoring, as the existing methods were not successful without continuous monitoring (Xu et al. [4]).
Consequently, it was employed by a Network Intrusion Detection System to secure
Satellite-Terrestrial Integrated Networks (Li et al. [5]). This method is also used in its
asynchronous form as a part of geospatial applications (Sprague et al. [6]). Outside the
FL paradigm, Aerial Detection and classification of vehicles is achieved through
semantic segmentation (Audebert et al. [7]).
While most previous research attempts rely on homogeneously distributed data,
most of the FL-aided segmentation on heterogeneous data is done in the Healthcare
sector. We simulate a heterogeneous distribution of data spread among three satellites
to mimic real-world satellite locations, which are used to train models locally on each
satellite. The performance of these local models is validated against the performance of
the FL model, which takes advantage of the same data. Both of these approaches strive
to implement semantic segmentation on their respective data. We have used a modified
version of the Residual Unet [6] to compare the performance of these two approaches.
Modifications in the Residual Unet are made concerning the number of residual blocks
and the upsampling algorithm. We then use a Dense [7] and a standard Unet [8] with
FL to gauge the relative performance of the Deep Residual Unet with the FL approach.

3 Methodology
3.1 Algorithm
In this section, we discuss the algorithm for FL and semantic segmentation. In the real-
world implementation of FL, each federated member will have its data coupled with it
in isolation. FL aims to ship a copy of global model weights to all the local clients,
where each client will use these weights to gradually learn and train on the local data
while simultaneously updating its model weights. The client will then ship its model
weights back to a centralized server after each epoch, where these intermediate local
model weights are aggregated to form an updated global model. The parameters of this
global model are then shared with the clients to continue training based on the new
parameters.
To simulate FL in a real-world scenario, we randomly distribute data in a hetero-
geneous manner among three satellites. We then build the global model with an input
shape of (256, 256, 3). We obtain the weights of the global model and initialize an
empty list to store the weights of the local models. The client-side receives the global
model weights and the total number of data points across all clients from the centralized
server. The local model at the client end is built and initialized with global weights.
This model is trained on the data possessed by that particular client. The weights of this
model are then multiplied by a scaling factor which is the ratio of the total number of
local samples and total samples across all clients. This is to ensure that the clients’
parameters are given weightage concerning the size of data they contain i.e., a client
with a relatively large amount of data points will get more weightage than a client with
a relatively small amount of data points. After each local model is trained once, i.e. one
communication round (1 epoch), the local parameters of each client are added and set
to the global model. These global model parameters are then sent back to the clients for
further training until a fixed number of communication rounds are complete.
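A minimal sketch of one communication round follows (our illustration of the described scheme; `build_model` and the client data structure are hypothetical names): each client's weights are scaled by its data share, and the scaled weight lists are summed element-wise into the new global parameters.

```python
import numpy as np

def scale_weights(weights, n_local, n_total):
    """Multiply every parameter array by the client's share of the data."""
    factor = n_local / n_total
    return [factor * w for w in weights]

def aggregate(scaled_weight_lists):
    """Element-wise sum of the scaled client weights -> global weights."""
    return [np.sum(layer_group, axis=0)
            for layer_group in zip(*scaled_weight_lists)]

def communication_round(global_model, build_model, clients, n_total):
    """clients: iterable of (x, y, n_k), with n_k the local sample count."""
    global_w = global_model.get_weights()
    scaled = []
    for x, y, n_k in clients:
        local = build_model()              # hypothetical model factory
        local.set_weights(global_w)        # start from the global parameters
        local.fit(x, y, epochs=1, verbose=0)
        scaled.append(scale_weights(local.get_weights(), n_k, n_total))
    global_model.set_weights(aggregate(scaled))
```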
Semantic segmentation is the process of assigning each pixel in an image to a class
label. Its architecture consists of an encoder-decoder module. The encoder learns a
rudimentary representation of the input image while the decoder semantically projects
this learned feature map onto the pixel space to get a dense classification. During
training, each pixel is assigned a class label predicted by the model. Loss is computed
between each pixel of the predicted mask and the ground truth mask. The gradients are
then computed using backpropagation. Training is done until the loss function con-
verges to a global minimum. The Intersection over Union is calculated to evaluate how
accurately the predicted mask resembles the ground truth mask (Fig. 1).

Fig. 1. Client-side and server-side FL algorithm

3.2 Model
Residual Unet is an improvement over the traditional Unet architecture. It is designed
to solve the problem of vanishing and exploding gradients in deep networks. It benefits
from the identity mapping feature of deep residual networks and semantic represen-
tation learning of Unet architecture. We implement a deeper and modified version of
the Residual Unet. The model consists of an encoder, a bridge, and a decoder. The
encoder facilitates the learning of a summarised representation of the input image. The
bridge connects the encoder and the decoder. The decoder utilizes the feature map from
the encoder to restore the segmentation map.
As seen in Fig. 2, the encoder has four residual blocks (64, 128, 256, and 512
kernels in the convolution operations in each residual block, respectively). Instead of
using a pooling operation to reduce the spatial dimension of the feature map, a stride of
two is applied in the first convolution operation of the second, third, and fourth residual
block; this reduces the dimensions of the feature map by a factor of 2. The input of the
encoder block is concatenated with its output. This eases network training and helps in
information exchange between lower and higher levels of the network without
degradation. This concatenated output of the encoder block is fed to the corresponding
decoder block and the succeeding encoder block as a skip connection to facilitate
upsampling and feature extraction, respectively. The bridge also consists of a convo-
lution operation (1024 kernels) incorporating a stride of two. The decoder has four residual blocks (512, 256, 128, and 64 kernels in the convolution operations of each residual block, respectively). At the start of each decoder block, the lower-level
feature map is up-sampled using a transposed convolution instead of the upsampling 2d
operation. The transposed convolution operation has weights; hence it learns to
reconstruct the image from the lower feature maps rather than using the nearest/bilinear
interpolation technique used by the upsampling 2d operation. The transposed convo-
lution and the corresponding encoder block outputs are concatenated and passed
through a similar residual convolution block as the encoder. The output of the
transposed convolution operation and the decoder block is concatenated to simulate the
identity mapping of the residual network. The output of the last decoder block passes
through a 1  1 convolution operation (1 kernel) and a sigmoid activation layer to
project it into a single dimension segmentation mask.
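The wiring just described can be sketched in Keras as below (a simplified reading of Fig. 2, not the authors' released code). Note that the paper concatenates block input and output rather than adding them, so a strided 1 × 1 projection is assumed here to keep the shortcut's spatial size matched; the Conv-BN-ReLU ordering is likewise our assumption.

```python
from tensorflow.keras import layers

def res_block(x, filters, stride=1):
    """Residual conv block; a stride of 2 in the first conv replaces pooling."""
    shortcut = layers.Conv2D(filters, 1, strides=stride, padding="same")(x)
    y = layers.Conv2D(filters, 3, strides=stride, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    # the '+' of Fig. 2: concatenation of block input and output, x + F(x)
    return layers.Concatenate()([shortcut, y])

def decoder_block(x, skip, filters):
    """Transposed conv learns the upsampling, then fuses the encoder skip."""
    up = layers.Conv2DTranspose(filters, 3, strides=2, padding="same")(x)
    y = res_block(layers.Concatenate()([up, skip]), filters)
    return layers.Concatenate()([up, y])   # identity mapping of the residual net

# head: out = layers.Conv2D(1, 1, activation="sigmoid")(last_decoder_output)
```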

3.3 Loss Function and Evaluation Metrics


We employ a combination of binary cross-entropy (BCE) loss and soft dice loss. BCE loss computes the difference between the class probability of each pixel in the predicted and ground truth masks, thereby assigning equal learning weight to each pixel in an image. This proves to be a disadvantage for datasets with an imbalance between mask and non-mask pixels. BCE loss is a per-pixel loss that is determined discretely, without considering whether the adjacent pixels are part of the mask or not. On the other hand, soft dice loss is a measure of the overlap between the predicted and target masks. The ground truth boundary and predicted boundary pixels can be viewed as two sets in semantic segmentation. Dice loss directs the two sets of masks to overlap. In dice loss, the numerator considers the overlap between the two sets at the local scale, while the denominator takes the total number of boundary pixels at the global scale into account. As a result, dice loss effectively accounts for both local and global information, making it possible to achieve high accuracy. As seen in (1) (where $p_i$ stands for the prediction, $y_i$ for the ground truth, and $i$ ranges from 1 to $N$), we combine the two loss functions to benefit from the gradient stability of BCE loss and the local and global context of soft dice loss.

$\mathrm{BCE\text{-}Dice\ Loss} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right] + 1 - \frac{2\,|X \cap Y|}{|X| + |Y|} \qquad (1)$

$TI = \frac{TP}{TP + \alpha\,FN + \beta\,FP}, \quad \alpha = 0.6,\ \beta = 0.4 \qquad (2)$

$J(A, B) = \frac{|A \cap B|}{|A \cup B|} \qquad (3)$

The Tversky index [9], as shown in (2), is helpful for highly imbalanced datasets since it contains constants α and β that penalize false negative and false positive predictions, respectively, to different degrees. This increased control over the evaluation metric gives a more accurate measure of prediction than the standard dice coefficient while informing us about the model's performance in edge cases. The Jaccard index (3), also known as Intersection over Union (IoU), is the ratio between the overlap of two sets and their union. These indices range between 0 and 1, where 0 signifies the worst performance and 1 signifies the best performance.
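A Keras-backend sketch of (1)–(3) follows (our hedged implementation; the ε smoothing terms are ours, added for numerical stability and not part of the equations).

```python
import tensorflow as tf
from tensorflow.keras import backend as K

def bce_dice_loss(y_true, y_pred, eps=1e-6):
    """Eq. (1): mean BCE plus (1 - soft dice coefficient)."""
    bce = K.mean(tf.keras.losses.binary_crossentropy(y_true, y_pred))
    inter = K.sum(y_true * y_pred)
    dice = (2.0 * inter + eps) / (K.sum(y_true) + K.sum(y_pred) + eps)
    return bce + 1.0 - dice

def tversky_index(y_true, y_pred, alpha=0.6, beta=0.4, eps=1e-6):
    """Eq. (2): alpha weights false negatives, beta weights false positives."""
    tp = K.sum(y_true * y_pred)
    fn = K.sum(y_true * (1.0 - y_pred))
    fp = K.sum((1.0 - y_true) * y_pred)
    return (tp + eps) / (tp + alpha * fn + beta * fp + eps)

def jaccard_index(y_true, y_pred, eps=1e-6):
    """Eq. (3): intersection over union."""
    inter = K.sum(y_true * y_pred)
    union = K.sum(y_true) + K.sum(y_pred) - inter
    return (inter + eps) / (union + eps)
```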

Fig. 2. Modified Residual Unet architecture. The '+' sign indicates concatenation of block input and output (identity mapping), i.e. x + F(x). The 'x' sign indicates a skip connection from the encoder to the decoder.

4 Experiments
4.1 Dataset
We utilize the Massachusetts Road Dataset [10] to implement semantic segmentation. The dataset consists of 1171 aerial images of the state of Massachusetts. Each of these images, with a height and width of 1500 pixels, covers a wide range of urban, suburban, and rural regions, encompassing 2600 square kilometers. These images are cropped into image tiles of size 256 × 256. On account of hardware constraints, we limit our dataset to 3200 images and their corresponding target masks, which include 480 samples to be used as a test set. The remaining 2720 samples are distributed among three satellites to simulate FL, wherein satellites A, B and C comprise 960, 1280, and 480 training samples, respectively. All the pixels of the target masks are converted into 0 or 1, depending on the class they represent (road or background), to achieve the task of binary segmentation.
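The preprocessing can be reproduced along the following lines (a sketch under the stated numbers; the random seed and the decision to discard the 1500-pixel border remainder that does not fill a full tile are our assumptions).

```python
import numpy as np

def crop_tiles(image, tile=256):
    """Crop a 1500x1500 aerial image into non-overlapping 256x256 tiles;
    the border remainder that does not fill a full tile is discarded."""
    h, w = image.shape[:2]
    return [image[r:r + tile, c:c + tile]
            for r in range(0, h - tile + 1, tile)
            for c in range(0, w - tile + 1, tile)]

def distribute(samples, sizes=(960, 1280, 480), seed=0):
    """Randomly assign training tiles to satellites A, B and C,
    simulating the heterogeneous split used in the experiments."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(samples))
    out, start = [], 0
    for n in sizes:
        out.append([samples[i] for i in order[start:start + n]])
        start += n
    return out   # [A, B, C] sample lists
```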

4.2 Implementation and Results


We implement the two training methods by employing the Keras framework and minimizing the loss function (1) through the Adam [11] optimization algorithm. In this implementation, we utilize fixed-size images (256 × 256) to train the model. These images are randomly flipped horizontally to implement data augmentation. We train the model on an NVIDIA TESLA P100 GPU with a mini-batch size of 16. The learning rate was initially set to 0.01 and reduced by a factor of 0.1 every 15 epochs. The network converges in 40 epochs.
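In Keras terms, the schedule and training call look roughly as follows (a sketch; `model`, `train_x`, `train_y` and `loss_fn` are placeholders, with `loss_fn` playing the role of the BCE-Dice loss of Sect. 3.3).

```python
import tensorflow as tf

def lr_schedule(epoch):
    """0.01 initially, multiplied by 0.1 every 15 epochs."""
    return 0.01 * (0.1 ** (epoch // 15))

def train(model, train_x, train_y, loss_fn):
    """Compile and fit with the paper's stated settings (Adam, batch 16,
    40 epochs). Random horizontal flips, e.g. via
    tf.image.random_flip_left_right, would be applied to train_x in a
    tf.data pipeline before this call."""
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
                  loss=loss_fn)
    model.fit(train_x, train_y, batch_size=16, epochs=40,
              callbacks=[tf.keras.callbacks.LearningRateScheduler(lr_schedule)])
```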
Model evaluation results on the test set are listed in Tables 1 and 2. The FL model is compared with the local models trained on each satellite. As seen in Table 1, the proposed FL procedure achieves a much better segmentation performance than the local models, evidenced by the lowest BCE-Dice loss of 0.257, the highest mean IoU of 0.581, and the highest Tversky index of 0.740, all without sharing clients' data. As seen in Fig. 3, the FL-trained model reaches a global minimum more smoothly than the locally trained models. Of the three individual satellites, Satellite C has the fewest data points; hence it performs relatively worse than the other two satellites, as evidenced by our results. Satellite A comes next, with a very average performance that is still unsatisfactory. Even with more data points, Satellite B's performance, although satisfactory for the data it possesses, still falls short of the precision achieved by the FL method.
We further try to achieve better performance by training various versions of Unets
with the same FL method. Dense Unet performs better than the standard and Residual
Unet with the highest mean IoU and Tversky Index of 0.592 and 0.743, respectively.
This is because, in each dense block, all the previous layers are concatenated in the
channel dimension and fed as input to the next layer. In the residual model, only the
identity function of the prior input and respective block output is concatenated and
passed ahead. This short-circuit connection in the Dense Unet ensures the reuse of prior feature maps, which helps Dense Unet achieve better results than Residual Unet with fewer parameters and lower computational cost. As a result of the dense connections, it is easier to compute gradients by backpropagation in Dense Unet because each layer has direct access to the final error signal.

Table 1. Comparison between local models and FL model (Residual Unet)


Satellite BCE-Dice loss mIoU Tversky index
Sat A 0.293 0.549 0.708
Sat B 0.286 0.557 0.712
Sat C 0.300 0.542 0.675
FL 0.257 0.581 0.740

Table 2. Comparison between Unet style architectures used with FL


Model BCE-Dice loss mIoU Tversky index
Unet 0.271 0.562 0.728
Dense Unet 0.266 0.592 0.743
Residual Unet 0.257 0.581 0.740

Fig. 3. (clockwise) Mean IoU plot for Residual, Dense, and Standard Unet; BCE-Dice loss,
mean Tversky Index and IoU plots respectively for the 4 trained models.

As seen in Fig. 4, the individual models plot fragmented bits of roads, while the Dense Unet based FL model plots the roads correctly. The local models wrongly classify many houses as roads, whereas the FL model barely makes such errors. As seen in columns 2 and 6 of Fig. 4, our FL-based model further detected roads that were not labeled in the ground truth mask.

Fig. 4. Segmentation results on the test set. (From top to bottom) Original image, ground truth
mask, Sat A model prediction, Sat B model prediction, Sat C model prediction, Residual Unet FL
model prediction, and Dense Unet FL model prediction.

5 Conclusion

In this paper, we propose an FL-based semantic segmentation system for road


extraction from geospatial data. We compare it to training on a single device with the
corresponding data constraints. We discover that it performs significantly better than
the local models while maintaining the privacy of the data possessed by the individual
clients. To further improve the inference accuracy of the aerial federated segmentation
system, we train the data on various encoder-decoder models and ascertain that Dense
Unet performs best. It also detects roads that are not present in the ground truth

masks while keeping misclassification errors to a minimum. FL is undoubtedly a
promising approach to deliver precise and secure models. By permitting multiple
devices to train collaboratively without the necessity to exchange data, FL addresses
privacy concerns without impacting performance. In the future, this paradigm can be
explored in great detail by experimenting with different loss functions, the number of
satellites, and other variations of FL (Asynchronous FL) and applied in various fields
thereby extracting maximum value from the data at hand without making it available to
entities other than the source.

References
1. Konecny, J., McMahan, H.B., Yu, F.X., Richtarik, P., Suresh, A.T., Bacon, D.: Federated
learning: Strategies for improving communication efficiency. arXiv preprint arXiv:1610.
05492 (2016)
2. Roy, A.G., Siddiqui, S., Pölsterl, S., Navab, N., Wachinger, C.: BrainTorrent: a peer-to-peer
environment for decentralized federated learning. arXiv abs/1905.06731 (2019)
3. Rieke, N., et al.: The future of digital health with federated learning. NPJ Digital Medicine 3,
1–17 (2020)
4. Xu, C., Mao, Y.: An improved traffic congestion monitoring system based on federat-
edlearning. Information 11(7), 365 (2020). https://doi.org/10.3390/info11070365
5. Li, K., Zhou, H., Tu, Z., Wang, W., Zhang, H.: Distributed network intrusion detection
system in satellite-terrestrial integrated networks using federated learning. IEEE Access 8,
214852–214865 (2020). https://doi.org/10.1109/ACCESS.2020.3041641
6. Sprague, M.R., Jalalirad, A., Scavuzzo, M., Capota, C., Neun, M., Do, L., Kopp, M.:
Asynchronous federated learning for geospatial applications. DMLE/IOTSTREAMING
@PKDD/ECML (2018)
7. Audebert, N., Le Saux, B., Lefèvre, S.: Segment-before-detect: vehicle detection and
classification through semantic segmentation of aerial images. Remote Sensing. 9(4), 368
(2017). https://doi.org/10.3390/rs9040368
8. Zhang, Z., Liu, Q., Wang, Y.: Road extraction by deep residual U-Net. IEEE Geosci.
Remote Sens. Lett. 15(5), 749–753 (2018). https://doi.org/10.1109/LGRS.2018.2802944
9. Guan, S., Khan, A.A., Sikdar, S., Chitnis, P.V.: Fully dense UNet for 2-D sparse
photoacoustic tomography artifact removal. IEEE J. Biomed. Health Inf. 24 (2020)
10. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image
segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015.
LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-
24574-4_28
11. Abraham, N., Khan, N.M.: A novel focal tversky loss function with improved attention U-
Net for lesion segmentation. In: 2019 IEEE 16th International Symposium on Biomedical
Imaging (ISBI 2019), pp. 683–687 (2019)
12. Geoffrey, E.H., Mnih, V.: Machine learning for aerial image labeling (2013)
13. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. CoRR abs/1412.6980
(2015)
14. Intelligent Computing & Optimization, Conference proceedings ICO 2018, Springer, Cham,
ISBN 978-3-030-00978-6

15. Intelligent Computing and Optimization, Proceedings of the 2nd International Conference on
Intelligent Computing and Optimization 2019 (ICO 2019), Springer International Publish-
ing, ISBN 978-3-030-33585-4
16. Intelligent Computing and Optimization, Proceedings of the 3rd International Conference on Intelligent Computing and Optimization 2020 (ICO 2020)
Development of Contact Angle Prediction
for Cellulosic Membrane

Ahmad Azharuddin Azhari bin Mohd Amiruddin1,2,
Mieow Kee Chan1(&), and Sokchoo Ng3
1 Centre for Water Research, Faculty of Engineering, Built Environment and Information Technology, SEGi University, Jalan Teknologi, Kota Damansara, 47810 Petaling Jaya, Selangor, Malaysia
ahmad.azhari@privacyrequired.org, mkchan@segi.edu.my
2 Department of Chemical Engineering, Universiti Teknologi PETRONAS, Bandar Seri Iskandar, 32610 Perak, Malaysia
3 Faculty of Arts and Science, International University of Malaya-Wales, Kuala Lumpur, Malaysia
ashleyng@iumw.edu.my

Abstract. Contact angle (CA) of a membrane determines its application.
Accurate CA prediction models are available. Nevertheless, detailed membrane roughness properties and thermodynamic data on the interaction between the membrane and the water droplet are required. These data are not easily accessible, and they are not available for newly developed materials. This study aims to apply an Artificial Neural Network to estimate the CA by using pure water flux, membrane porosity and pore size as inputs. The model was tested on two types of filtration processes: dead end (DE) and cross flow (CF). The results showed that the prediction for DE achieved an overall accuracy of 99% with a sample size of 53 data sets. The prediction for CF could be done by using the DE + CF model, with a maximum R2 at the training stage of 0.9456. In conclusion, a novel statistical solution to predict the CA of cellulosic membranes was developed with high accuracy.

Keywords: Cellulose membrane · Artificial neural network · Contact angle · Pure water flux · Cross flow

1 Introduction

Membrane is a selective barrier, which allows the selected component in the feed
stream to pass through it, while the rest of the components are retained. This separation
process is governed by pressure difference, temperature difference or concentration
gradient between the feed stream and product stream. Currently, membranes are widely
used in wastewater treatment to remove dyes [1] and oil [2]. In medical and phar-
maceutical field, membranes are used to remove uremic toxic from the blood of patients
with kidney failure [3] and for protein separation [4]. The efficiency of the separation processes strongly depends on the surface properties of the membrane, such as hydrophilicity.


Hydrophilicity of a membrane is revealed by the value of the contact angle (CA). A membrane is classified as hydrophilic if the contact angle is less than 90° and as hydrophobic if the CA is more than 90°. Zhang et al. [5] fabricated superhydrophobic polyvinylidene fluoride (PVDF) membranes with a CA of more than 150°, and the membranes showed superior performance in water-oil mixture purification; as a result, oil with more than 99.95% purity was obtained. Wang et al. [6] fabricated a hydrophilic electrospun nanofiber membrane-supported thin-film composite membrane for water treatment. The low-pressure plasma treatment reduced the CA of the composite membrane from 137° to 0°. The osmotic water flux of the membrane was at least 40% higher compared to the commercial membrane.
Young's equation and the Cassie-Baxter model are widely used to determine the contact angle of a surface. However, the accuracy of the equations strongly depends on the nature of the sample. Young's equation is applicable to an ideal homogeneous surface, which is rigid, perfectly flat and non-reactive [7]. Meanwhile, the contact angle of a flat heterogeneous surface can be estimated by the Cassie-Baxter model [8]. Practically, it is hard to obtain a sample with a perfectly smooth surface. Thus, Luo et al. [9] concluded that the Cassie-Baxter model needs to be modified with appropriate geometrical models to obtain accurate results close to experimental data. Bahramian and Danesh [10] attempted to evaluate the contact angle of a sample by estimating the solid-vapour and solid-liquid interfacial tensions. However, detailed thermodynamic data such as the liquid critical temperature and the melting point of the solid sample were required. These data are not easily accessible and may not be available, especially for newly developed materials.
A contact angle goniometer has been used to measure the contact angle of a membrane. However, the instrument is costly, and the accuracy of the result strongly depends on the experience of the user. A previous study showed that the contact angle of a membrane was affected by the pure water flux (PWF), membrane porosity and pore size [11]. However, detailed modelling work to predict the contact angle has not been done. Thus, the objective of this study is to estimate the contact angle of cellulosic membranes via a mathematical approach. A multi-layered Artificial Neural Network (ANN) that models the hydrophilicity of the membrane was developed. PWF, membrane porosity and mean pore size were used as the inputs, and the network then estimated the contact angle of the membrane by implementing feed-forward propagation.

2 Methodology

2.1 Data Collection


Data on membrane properties, including PWF (L/hr.m2), porosity (dimensionless), pore size (in diameter, nm) and CA, were obtained through random sampling of various cellulose-based membranes in the literature [12–27].

2.2 Model Development


The model was developed using the MATLAB Function Fitting Neural Network feature, an ANN optimization tool for data fitting analysis. The input to the network was entered as a '3 by n' array, where n is the number of data sets used for the modelling, with the columns corresponding to any consistent order of the membrane properties data. The membrane properties data were grouped into one variable, and the contact angle was the target. Two sets of input-target pairs were prepared for the two different filtration processes, namely dead end (DE) and cross flow (CF) filtration.
Figure 1 shows the algorithm of the contact angle prediction. Firstly, the network was fed with the membrane data set, with the property data labelled as the inputs and the CA as the target, in the nftool GUI of MATLAB, an ANN algorithm optimized for curve fitting. The training method used was the Levenberg-Marquardt (LM) algorithm, with two hidden layers using a tangent-sigmoid transfer function and a linear transfer function. The algorithm also included normalization of the data set to the range [−1, 1]. Initially, a neuron size of 10 was adopted. The training-validation-testing (TVT) split of the membrane data was pre-set as 70-15-15 at the initial stage of the study. This indicated that 70% of the collected data was used as training data to create the contact angle prediction model, and 15% of the data was used for verification/validation purposes, which showed the adaptation of the model to additional data. The remaining data was used for testing purposes, to evaluate the performance of the developed model on foreign data. The performance of the network was analysed using two indicators: the regression factor, R2, and the mean square error (MSE). R2 indicated the linear fit of the predicted contact angles, while the MSE showed the mean square error between the predicted and actual contact angles. In this study, the desired R2 was 0.99 and the desired MSE was within ±5 [28].
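For readers without MATLAB, the workflow can be approximated in Python as below (our sketch, not the authors' code: scikit-learn has no Levenberg-Marquardt trainer, so L-BFGS stands in, with one hidden layer of 10 tanh neurons and a linear output as in nftool).

```python
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error, r2_score

def fit_contact_angle_model(X, y, seed=0):
    """X: (n, 3) array of [PWF, porosity, mean pore size]; y: contact angle."""
    # 70-15-15 train/validation/test split
    X_tr, X_tmp, y_tr, y_tmp = train_test_split(X, y, test_size=0.30,
                                                random_state=seed)
    X_va, X_te, y_va, y_te = train_test_split(X_tmp, y_tmp, test_size=0.50,
                                              random_state=seed)
    scaler = MinMaxScaler(feature_range=(-1, 1)).fit(X_tr)  # [-1, 1] scaling
    net = MLPRegressor(hidden_layer_sizes=(10,), activation="tanh",
                       solver="lbfgs", max_iter=5000, random_state=seed)
    net.fit(scaler.transform(X_tr), y_tr)
    for name, Xs, ys in (("train", X_tr, y_tr), ("verify", X_va, y_va),
                         ("test", X_te, y_te)):
        pred = net.predict(scaler.transform(Xs))
        print(name, "MSE =", mean_squared_error(ys, pred),
              "R2 =", r2_score(ys, pred))
    return net, scaler
```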

Fig. 1. Flowchart of the methodology



Next, the model was optimized by studying the effect of the neuron size and the TVT split on the developed model. The studied range of neuron sizes was 6 to 18, while the TVT split ranged from 60-20-20 to 80-10-10. Lastly, a normality test was conducted on the collected data using SPSS v22 to further improve the model performance. This was carried out by observing the kurtosis and skewness values of the data; the data are normally distributed if the kurtosis and skewness values are within ±3. After excluding the non-normally distributed data, the model was developed using the optimum neuron size and TVT split identified in the earlier study.
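The same screening can be scripted outside SPSS, e.g. with SciPy (a sketch; SciPy's default Fisher definition gives excess kurtosis, which is what SPSS reports).

```python
from scipy import stats

def is_normal(data, limit=3.0):
    """Flag a variable as non-normal when |skewness| or |kurtosis| exceeds 3."""
    skew = stats.skew(data)
    kurt = stats.kurtosis(data)   # Fisher (excess) kurtosis by default
    return abs(skew) <= limit and abs(kurt) <= limit
```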

3 Result and Discussion

3.1 Effect of Filtration Process


The total data set consists of 77 types of cellulose membranes, of which 53 sets of data were collected from DE filtration, while the remaining 24 sets were collected from CF filtration. Table 1 shows the range of the membrane properties data. It was found that all the contact angle data were less than 90°, which is due to the nature of the hydrophilic cellulose membrane.

Table 1. The range of the data collected from 77 types of cellulose membranes.
Parameter Range Average CA value
Mean pore size (nm) 1–10 38.67°
 10–172 57.34°
PWF (L/(m2.h)) 0–100 55.52°
 100–347 39.78°
Porosity 0.23–0.50 43.91°
 0.50–0.853 26.28°

The results shown in Table 2 are the four best-performing network models from multiple trials of the different filtration processes. Each model is built from the same data set, but with different seeding for the generation of the neurons' initial weights. DE + CF indicates that all the collected membrane properties data were used to develop the prediction model, assuming the effect of the filtration mode to be negligible. For DE and CF, the collected data were separated according to the filtration process. Table 2 shows that the maximum R2 for DE + CF was 0.9456 at the training stage. The R2 value improved significantly when the mode of the filtration process was considered: the maximum R2 value for DE was 0.9992 at the training stage.
This can be explained by the fundamental difference between the filtration processes. The water flows tangentially across the membrane surface in CF, while it flows perpendicularly to the membrane surface in DE. According to Amot et al. [19], the flux collected from CF filtration was generally higher than that from DE under the same pressure difference and water velocity for the same membrane. This implies that the mode of the filtration process affects the PWF and thus influences the R2 and MSE of the contact angle prediction model.
On the other hand, compared to DE, a lower R2 value was obtained for CF due to the small sample size. With a sample size of only 24, the CF model consistently showed large error margins during the testing and validation stages. Furthermore, the R2 value from the training stage of the CF model generation failed to reach the desired value of 0.99. Comparatively, 53 sets of data were collected from the literature for DE, and a maximum R2 value of 0.9992 was obtained at the training stage.

Table 2. Performance of four models developed from CF + DE, DE, and CF membrane data
sets, using TVT split of 70-15-15, with neuron size 10
Model Filtration processes DE + CF DE CF
Analysis MSE R2 MSE R2 MSE R2
1 Train 24.3500 0.9456 3.1576 0.9992 23.3227 0.4213
Verify 11.1018 0.9382 7.8304 0.9978 5.2629 0.9747
Test 80.1626 0.8311 13.5341 0.9713 62.7982 0.7002
2 Train 54.8051 0.8512 6.8173 0.9821 23.1988 0.4277
Verify 70.8633 0.8152 14.2553 0.9839 8.1841 0.9707
Test 23.1905 0.9443 7.1031 0.9857 11.3628 0.6166
3 Train 32.3205 0.9253 6.1354 0.9865 58.3246 0.1195
Verify 71.3056 0.8175 14.0482 0.9344 29.8945 0.5546
Test 154.0834 0.5669 14.5321 0.8673 120.9473 0.5833
4 Train 53.6511 0.8665 0.4987 0.9989 29.6789 0.3441
Verify 72.2611 0.7161 1.3064 0.9961 9.0283 0.3795
Test 23.6551 0.9373 3.3995 0.9892 70.9823 -0.4355

3.2 Effect of Neuron Size and TVT Data Split


An attempt to identify the optimal neuron size and TVT setting for DE filtration was carried out, and the results are shown in Tables 3 and 4. The R2 values were within the range of 0.98 to 1.0 when the neuron size was increased from 6 to 18, and 91% of the cases showed an R2 value of at least 0.99. This indicates that the neuron size is a non-significant contributor to the prediction performance. It is notable that at both low and high neuron sizes, the MSE values were high; even where the corresponding R2 values were relatively high, such high MSE values are undesirable. The desired MSE values were obtained when the neuron size was 10 with a TVT data split of 70-15-15, as shown in Table 4: 0.4987, 1.3064 and 3.3995 were obtained for the training, validation and testing stages. Ideally, while errors are unavoidable, they should be kept to a minimum so that the priority of an accurate prediction is not compromised. Thus, a neuron size of 10 was considered the optimum value for this case, showing a low MSE value across the entire range of models. This value can easily change with a different set of data and under different TVT split conditions. Figures 2 and 3 below show the error histogram and the regression plots of the contact angle values of the optimum (neuron size 10) model developed.

Fig. 2. Histogram of contact angle errors from the TVT process of the most optimal DE network

Fig. 3. Regression plot of the contact angle errors from the TVT process of the most optimal DE
network, (a) Training, (b) Test, (c) Validation

Table 3. R2 for the optimal TVT setting and neuron size analysis for DE membrane data
TVT (%) Analysis/Neuron size 6 8 10 12 14 16 18
60-20-20 Train 0.9879 0.9982 0.9991 0.9992 1.000 0.9923 0.9813
Verify 0.9724 0.9053 0.9950 0.9695 0.9687 0.9855 0.9281
Test 0.9639 0.9795 0.9023 0.9822 0.9813 0.9489 0.9771
70-15-15 Train 0.9910 0.9919 0.9989 0.9996 0.9977 1.0000 0.9946
Verify 0.9676 0.9782 0.9961 0.9979 0.9833 0.9796 0.9743
Test 0.9806 0.9679 0.9892 0.9929 0.9865 0.9949 0.9602
80-10-10 Train 0.9936 0.9842 0.9998 0.9975 0.9989 0.9995 0.9950
Verify 0.9832 0.9693 0.9819 0.9994 0.9915 0.9985 0.9980
Test 0.9987 0.9385 0.9987 0.9868 0.9890 0.9795 0.9954

Table 4. MSE for the optimal TVT setting and neuron size analysis for DE membrane data
TVT (%) Analysis/Neuron size 6 8 10 12 14 16 18
60-20-20 Train 5.1909 0.6171 0.2621 0.3222 0.0038 4.1245 8.0640
Verify 10.2962 20.2044 9.7183 11.3304 15.9511 6.2275 32.9429
Test 27.0720 14.6689 20.4325 6.0475 8.1152 15.5020 6.8805
70-15-15 Train 3.4760 3.5643 0.4987 0.3961 0.9558 0.0029 2.1038
Verify 14.8919 7.5487 1.3064 1.1581 4.8992 5.3336 14.3581
Test 17.6584 8.6214 3.3995 7.3113 6.2851 4.2906 12.1641
80-10-10 Train 2.1808 6.7577 0.0850 1.0030 0.4937 0.1827 1.8193
Verify 16.0764 5.8134 5.2753 0.9080 3.5165 13.0305 1.4328
Test 2.1569 5.4514 0.3552 3.5709 1.6201 10.5707 4.6378

3.3 Optimization via Normality Approach


A normality test was conducted on all the collected membrane properties data for DE filtration, and the results are shown in Table 5. The kurtosis and skewness values for all the data were within the range of ±3, except for the mean pore size data, whose kurtosis value was 4.376. The normality test was conducted again after excluding six sets of extremely large mean pore size data, which were within 80–100 nm. The normality results are shown in Table 6; all the remaining data were normally distributed.

Table 5. Normality results for all the DE membrane data


Membrane properties Skewness Kurtosis
Mean pore diameter 2.449 4.376
Pure water flux 0.567 −0.364
Porosity 0.339 −0.278
Contact angle 0.858 0.365

Table 6. Normality results after excluding the extreme DE membrane data


Membrane properties Skewness Kurtosis
Mean pore diameter 1.543 2.647
Pure water flux 0.616 −0.385
Porosity 0.285 0.289
Contact angle 0.719 −0.134

Table 7. Performance of the models using only normally distributed DE membrane data
Model Analysis MSE R2
1 Train 3.16 0.9992
  Verify 7.83 0.9978
  Test 13.53 0.9713
2 Train 6.82 0.9821
  Verify 14.26 0.9839
  Test 7.10 0.9857
3 Train 6.14 0.9865
  Verify 14.05 0.9344
  Test 14.53 0.8673
4 Train 0.50 0.9989
  Verify 1.31 0.9961
  Test 3.40 0.9892

The prediction model was developed again using only the normally distributed data, with a TVT split of 70-15-15 and a neuron size of 10. Table 7 shows the performance of the prediction model. The results showed that the normality of the data did not affect the accuracy of the predicted contact angle. This might be because the magnitude of the membrane parameters, i.e., the significance of each data point, contributes more to determining the non-linear relationship of the membrane-contact angle behaviour. Furthermore, the range of the data differed for each membrane property. For instance, the mean pore diameter and pure water flux are not restricted to a specific range of values, while the porosity and contact angle are essentially restricted to scales of 0 to 1 and 0 to 180°, respectively.

4 Conclusion

A black-box model for predicting the contact angle of cellulose-based membranes, using properties such as PWF, porosity, and mean pore size as inputs, was successfully developed using an ANN. Developing a model that combined both the CF and DE filtration data sets was not productive, due to the different filtration mechanisms. The model for the CF membrane parameters failed to achieve high accuracy, which could be due to the lack of available samples, as only 24 samples were used in the model training. A performance of >0.99 in the regression factor and a mean square error of up to ~3 were obtained with the sample size of 53 data sets used in the training, validation and testing of the model for predicting the contact angle of DE membranes. A neuron size of 10 and a 70-15-15 TVT split were adopted to develop this model. This indicated that the prediction was done successfully. It is recommended to evaluate the DE model with more data points from cellulosic membranes to improve its flexibility and applicability. Additionally, more data need to be collected to develop the prediction system for the CF model. The experimental results also suggest using the model developed from CF + DE to predict CF.

Acknowledgments. The support from SEGi University is highly appreciated.

Feature Engineering Based Credit Card
Fraud Detection for Risk Minimization
in E-Commerce

Md. Moinul Islam¹, Rony Chowdhury Ripan¹, Saralya Roy¹, and Fazle Rahat²(B)

¹ Chittagong University of Engineering and Technology, Chittagong, Bangladesh
² Bangladesh University of Business and Technology, Dhaka, Bangladesh
fazlerahat@bubt.edu.bd

Abstract. In today’s financial business, financial fraud is a rising con-
cern with far-reaching repercussions, and data mining has a crucial role in
identifying fraudulent transactions. However, credit card fraud detection
can be challenging for several significant reasons: the normal and
fraudulent behaviours of profiles change frequently, fraudulent data are
scarce, the dataset is highly imbalanced, and so on. Besides, the
efficiency of fraud identification in online transactions is greatly impacted
by the dataset sampling method and feature selection. Our study investi-
gates the performance of five popular machine learning approaches,
namely Logistic Regression (LR), Random Forest (RF), Support Vector Clas-
sifier (SVC), Gradient Boosting (GBC), and K-Nearest Neighbors (KNN),
in terms of feature selection. Feature selection is done by Sequential
Forward Selection, in addition to extending the models’ performance by
handling imbalanced data using Random Undersampling and feature scal-
ing using PCA transformation & RobustScaler for both numerical and
categorical data. Finally, the performance of the different machine learn-
ing techniques is assessed based on accuracy, precision, recall, and F1-
measure on a benchmark credit card dataset.

Keywords: Credit card fraud detection · Cyber security · Sequential feature selection · Machine learning · Comparative analysis

1 Introduction
In this modern era of technology, credit and debit card usage has increased sig-
nificantly in recent years, along with the amount of fraud. Financial fraud
in E-Commerce is a pervasive issue with extensive consequences for the financial
industry, and data mining has been crucial in identifying fraudulent credit card
transactions. Because of the increase in fraudulent transactions, many
banks lose billions each year. According to the European Central Bank,
overall fraud in the Single Euro Payments Area reached 1.33 billion euros in
2012, an increase of 14.8% over 2011. Additionally, payments made through non-
traditional channels such as mobile, internet, and others account for 60% of
fraud; in 2008, this figure was 46% [5]. As new fraud patterns arise, the detec-
tion system faces new obstacles daily. A significant amount of research has been
conducted in the field of card fraud detection. Data mining is a common
approach for detecting credit card fraud since it can address many of these difficulties.
Identification of credit card fraud involves classifying transactions into two categories:
legitimate (valid) transactions and fraudulent ones. Credit card fraud detection
relies on tracking a customer’s spending habits. A variety of approaches have
been utilized to tackle these challenges, including AI, genetic algorithms, SVMs,
decision trees, and naive Bayes [1–3,6,8,16].
Many researchers have conducted credit card fraud detection studies utiliz-
ing various machine learning methods to derive information from available credit
card fraud data. For instance, Lucas et al. [12] suggested a framework that models
a series of credit card transactions using HMMs from three distinct binary view-
points, resulting in eight alternative sets of sequences from the (training) set of trans-
actions. The Hidden Markov Model (HMM) then models each of these sequences
by assigning a probability to each transaction based on its preceding transaction
sequence. Finally, these probabilities are employed as extra features in a Random
Forest fraud detection classifier. Operating an optimized light gradient boost-
ing machine, Taha et al. [17] proposed an approach for fraud detection in online
transactions (OLightGBM), in which the parameters of the light gradient boosting
machine (LightGBM) are tuned using a Bayesian-based hyperparameter opti-
mization approach. Using the Artificial Neural Network (ANN) technique and
backpropagation, Dubey et al. [7] developed a model in which the customer’s credit
card data, including numerous attributes such as name, time, last transaction, and
transaction history, is collected first. The data is then separated into two categories,
train data (80%) and test data (20%), which are used to forecast whether the
transactions are normal or fraudulent. To incorporate transaction sequences,
Jurgovsky et al. [10] formulated fraud detection as a sequential classification
problem, using Long Short-Term Memory (LSTM) networks. In a further
comparison, they discovered that the LSTM outperforms a baseline random forest
(RF) classifier in terms of accuracy for transactions where the consumer is
physically present at the merchant location. To learn the behaviour aspects of
normal and aberrant transactions, Xuan et al. [21] employed two kinds of random
forests and then compared their credit fraud detection performance. In [15],
Fraud-BNC, a customized algorithm based on the Bayesian Network Classifier
(BNC), was introduced for detecting credit card fraud. Fraud-BNC is generated
automatically by organizing the components of BNC algorithms into a classification
scheme and searching for the combinations of these components that are most
effective for locating credit card fraud in a credit card fraud detection dataset,
using a Hyper-Heuristic Evolutionary Algorithm (HHEA). Yee et al. [22]
demonstrated a comparison of several supervised classification methods, such as
Tree Augmented Naive Bayes (TAN), Naive Bayes, K2, logistics, and J48 classifiers,
for credit card fraud detection in a laboratory setting. In addition, they demonstrated how data
preparation techniques such as standardization and PCA might aid the classifiers
in achieving higher levels of accuracy.

2 Methodology

Our research study focuses on analyzing credit card fraud data based on
Sequential Forward Selection, a feature selection algorithm, using several popu-
lar machine learning classification techniques to evaluate their performance in
detecting credit card fraud. The overall methodology of our study is organized
into two steps, (a) a data preprocessing module and (b) a feature selection
module, as shown in Fig. 1.

Fig. 1. Architectural overview of our model

2.1 Dataset

Credit card datasets contain numerous bank-related characteristics that are
utilized to create a fraud detection framework from the credit card data. Our
dataset comprises 284,807 instances, each having 31 features, where a categorical
feature indicates the amount of money spent during the transaction, features
V1–V28 are obtained from a PCA transformation, and the “Class” feature is the
binary representation of the target. In this study, for evaluation purposes, we use
a benchmark credit card fraud detection dataset from Kaggle [19].

2.2 Data Preprocessing Module

Out of the 284,807 transactions in our dataset, 492 were flagged as fraudulent;
according to the dataset’s real transaction class, the fraud rate is 0.172% of all
transactions. The data contains numerical results from a PCA transformation
due to confidentiality issues: users’ transaction confidentiality would be violated
if the major components, as well as additional details on the data, were made
public. For our prediction models and analysis, this data frame might include
errors, and our classifiers might be overfitted since the vast majority of the
transactions are legitimate.

(1) Handling of Imbalanced Data: There are two main approaches to ran-
dom re-sampling for imbalanced classification: oversampling and undersampling.
In this study, Random Undersampling is used for the imbalanced class dis-
tribution; it essentially consists of deleting data to provide a more balanced
dataset, thereby preventing the overfitting of our models. Under-sampling
techniques delete instances of the majority class from the training dataset in
order to better balance the class distribution, for example reducing the
imbalance from a 30:1000 distribution to 30:30, 30:31, etc.
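
A minimal sketch of this step with the imbalanced-learn library is given below; the paper does not name an implementation, so imbalanced-learn and the local file name "creditcard.csv" for the Kaggle data are assumptions.

```python
# Sketch of random under-sampling with imbalanced-learn (an assumed choice;
# the paper does not name a library). "creditcard.csv" is the Kaggle file.
from collections import Counter
import pandas as pd
from imblearn.under_sampling import RandomUnderSampler

df = pd.read_csv("creditcard.csv")
X, y = df.drop(columns=["Class"]), df["Class"]

rus = RandomUnderSampler(random_state=42)   # randomly drops majority-class rows
X_res, y_res = rus.fit_resample(X, y)
print(Counter(y), "->", Counter(y_res))     # e.g. {0: 284315, 1: 492} -> {0: 492, 1: 492}
```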

(2) Feature Scaling: The V1–V28 features are already scaled due to PCA, while
two features (time and amount) are not. These two features need to be scaled.
Fraud dataset values are given in different ranges, varying from feature to
feature. For instance, Fig. 2a and Fig. 2b show the data distributions of the two
features “time” and “amount”, respectively.

Fig. 2. Data distribution of “Time” (a) and “Amount” (b) features

The value is very small for some data instances, while it is much higher for
others, as seen in Fig. 2. Therefore, data scaling is used to normalize the
spectrum of feature values. In order to do this, we used a Robust Scaler that
normalizes the “time” and “amount” features by removing the median and
scaling the data according to the Interquartile Range (IQR = 75th percentile of
the data − 25th percentile of the data). Besides, the Robust Scaler is less prone
to outliers.
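
This scaling step maps directly onto scikit-learn's RobustScaler, as sketched below, continuing from the DataFrame loaded above.

```python
# Scaling only "Time" and "Amount" with RobustScaler (median / IQR based);
# V1-V28 are left untouched since they are already PCA outputs.
from sklearn.preprocessing import RobustScaler

scaler = RobustScaler()  # subtracts the median, divides by the IQR
df[["Time", "Amount"]] = scaler.fit_transform(df[["Time", "Amount"]])
```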
Feature Engineering Based Credit Card Fraud Detection 221

2.3 Feature Selection

The sequential forward selection (SFS) method is a bottom-up approach that
begins with an empty output set and gradually fills it with the characteristics
selected by an evaluation metric as the search proceeds [13]. In every iteration,
one feature is taken from the list of features that have not yet been included and
added to the selected set, such that, compared with adding any other new
feature, the enlarged feature set generates the least amount of erroneous
classification. SFS is commonly used because of how quickly and easily it works.
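
The paper does not say which SFS implementation was used; one common option is mlxtend's SequentialFeatureSelector, sketched below on the under-sampled data from the previous step, with Random Forest as an illustrative wrapped estimator.

```python
# Sequential Forward Selection with mlxtend (an assumed implementation),
# keeping the top 15 features as done later in Sect. 3. The wrapped
# estimator here (Random Forest) is only one of the five classifiers tried.
from mlxtend.feature_selection import SequentialFeatureSelector as SFS
from sklearn.ensemble import RandomForestClassifier

sfs = SFS(RandomForestClassifier(random_state=42),
          k_features=15,      # stop once 15 features are selected
          forward=True,       # bottom-up: start from an empty set
          floating=False,
          scoring="accuracy",
          cv=5)
sfs = sfs.fit(X_res, y_res)
print(sfs.k_feature_idx_)     # indices of the selected features
X_sfs = sfs.transform(X_res)
```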

2.4 Applying Machine Learning Techniques

Using a variety of machine learning classification algorithms, namely Logistic
Regression (LR), Random Forest (RF), SVM, Gradient Boosting Classifier
(GBC), and K-Nearest Neighbor (KNN), on the credit card fraud detection
dataset, we can predict whether or not a transaction is fraudulent.
Classification algorithms such as Logistic Regression [18] utilize observations to
classify them according to a probability distribution. Instead of a linear function,
Logistic Regression uses a more complicated cost function defined by the
“sigmoid function” or “logistic function”; the sigmoid function (σ) limits the
output of the Logistic Regression model to between 0 and 1. We apply the
Random Forest classifier [9], a supervised algorithm based on decision tree
models. Using a bootstrap sampling method, it first generates K specific training
data subsets from our dataset; then, by training on these subsets, it creates K
decision trees. The class of each item in the test dataset is predicted by all of the
decision trees based on their votes. We then apply the Support Vector Classifier
[20]. The SVM classifier plots each data point in n-dimensional space, where n is
the number of features of the dataset, to figure out the hyperplane that best
differentiates between the two classes, and performs binary classification;
“GridSearchCV” [14] was used to perform hyperparameter tuning to get the
optimum values for the SVM, and the RBF kernel was used as the learning
parameter. The Gradient Boosting Classifier [11] depends on a loss function. A
gradient boosting model’s additive aspect comes from the fact that trees are
added to the model over time, and as this occurs, the existing trees are not
changed; their values remain constant. The KNN algorithm [4] is based on the
idea that similar items occur near each other. This algorithm computes the
distance between the current data point and all stored data points, stores each
data point’s index and distance in a sorted list, parses the sorted array, and
classifies the instance as belonging to the mode of its K nearest neighbours.
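
For the SVC tuning mentioned above, a hedged GridSearchCV sketch follows; the parameter grid itself is an illustrative guess, since the paper does not report the values actually searched.

```python
# GridSearchCV tuning of the RBF-kernel SVC, as described above; the
# parameter grid is illustrative, not the one actually used in the paper.
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {"C": [0.1, 1, 10, 100], "gamma": ["scale", 0.01, 0.1, 1]}
grid = GridSearchCV(SVC(kernel="rbf"), param_grid, scoring="accuracy", cv=5)
grid.fit(X_sfs, y_res)
print(grid.best_params_, grid.best_score_)
```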

3 Performance Evaluation
To demonstrate the experimental outcomes, we utilize assessment metrics such
as precision, accuracy, recall, and F1-score. A false positive (FP) in terms of
detecting credit card fraud means that a non-fraudulent (actually negative)
transaction has been categorized as fraud. Recall is specified as the proportion of
genuine positive elements to the entire number of positive elements, i.e., the sum
of the true positives (TP) and the false negatives (FN). Accuracy is calculated by
dividing the sum of the true positives (TP) and true negatives (TN) by the total
number of occurrences. The F1-score is the harmonic mean of precision and
recall.

T rue P ositive
P recision = (1)
T rue P ositive + F alse P ositive
T rue P ositive
Recall = (2)
T rue P ositive + F alse N egative
TP + TN
Accuracy = (3)
TP + TN + FP + FN
P recision × Recall
F 1 − Score = 2 × (4)
P recision + Recall
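
In practice, Eqs. (1)–(4) correspond directly to scikit-learn's metric functions; "y_test" and "y_pred" below are placeholders for the held-out labels and a classifier's predictions.

```python
# Computing Eqs. (1)-(4) with scikit-learn; y_test and y_pred are
# placeholders for held-out labels and classifier predictions.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1-score :", f1_score(y_test, y_pred))
```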

Handling of Imbalanced Data: As mentioned in Sect. 2, the dataset is
very asymmetrical, with the real transaction class showing that fraud occurs at
a rate of 0.172% of all transactions. This can also be seen in the bar plot
representation of class values shown in Fig. 3a, where “normal” transactions
greatly outnumber “fraud” transactions. The imbalance problem is solved in
this study using the “Random Under-Sampling” method, as illustrated in
Fig. 3b.

Fig. 3. Bar plot representation of “class” feature before (a) and after (b) using random
under-sampling

Feature Scaling: After handling the imbalanced data in our dataset, features were
scaled using the PCA transformation & RobustScaler method. After scaling, the
“Time” (Fig. 4a) and “Amount” (Fig. 4b) features show a much better distri-
bution than before, as shown in Fig. 2.
Fig. 4. Data distribution of “Time” (a) and “Amount” (b) features after scaling

Comparison Results: After that, we apply several popular machine learn-
ing algorithms: Logistic Regression (LR), Random Forest (RF), Support Vector
Classifier (SVC), Gradient Boosting (GBC), and K-Nearest Neighbors (KNN).
Before applying each classification algorithm, feature selection is done using
the Sequential Forward Selection (SFS) algorithm. In this study, SFS returns the
top 15 features from all the features. To assess the behaviour of the various
classification techniques before and after SFS, we show the performance metrics
in Table 1 and Table 2 on the basis of accuracy, precision, recall, and F1-score,
respectively.

Table 1. Without feature selection

      Accuracy (%) Precision (%) Recall (%) F1-score (%)
KNN   91.88        93            92         92
RF    92.89        94            93         93
SVC   92.39        93            92         92
LR    90.86        91            91         91
GBC   93.91        94            94         94

Table 2. With feature selection

      Accuracy (%) Precision (%) Recall (%) F1-score (%)
KNN   95.43        96            95         95
RF    95.94        96            96         96
SVC   95.43        96            95         95
LR    91.37        92            91         91
GBC   95.43        96            95         95

Table 1 shows that, without using SFS, the Gradient Boosting Classifier
(GBC) achieves better results than the other classifiers in predicting credit card
fraud: GBC achieves the highest accuracy (93.91%) and 94% in precision,
recall, and F1-score. After using the SFS algorithm, the Random Forest
classifier achieves the best result (95.94% accuracy) among the classifiers on
our dataset, as shown in Table 2. It is observed that all classification algorithms
except Logistic Regression (LR) show a marked improvement in accuracy after
using SFS.
Fig. 5. Comparison of Classification Accuracy before and after Feature Selection

Besides, a bar chart comparing accuracy before and after using the “SFS”
method is shown in Fig. 5.
In addition, the Receiver Operating Characteristic (ROC) curves of the clas-
sifiers with and without Sequential Forward Selection (SFS) are presented
in Fig. 6. From Fig. 6a, it is noted that RF, SVM, and GBC have the highest
AUROC score of 0.98 before using the SFS method. From Fig. 6b, it is observed
that only RF has the highest AUROC score of 0.99 after using feature selection.

Fig. 6. ROC curve of classifiers before (a) and after (b) using SFS

4 Conclusion and Future Work

On the fraud detection benchmark dataset, we investigated several machine
learning approaches in terms of feature selection utilizing Sequential Forward
Selection (SFS). To successfully detect fraudulent or legitimate transactions, we
used a variety of prominent machine learning techniques on our dataset,
including Logistic Regression (LR), Random Forest (RF), Support Vector
Machine (SVM), Gradient Boosting Classifier (GBC), and K-Nearest Neighbor
(KNN). Our experimental results indicate that while the Gradient Boosting
Classifier (GBC) obtains higher accuracy without feature selection, Random
Forest achieves the highest accuracy after feature selection. Additionally, it is
noted that all classification algorithms except Logistic Regression (LR)
improved significantly following feature selection. This evaluation may assist
financial companies in preventing fraudulent transactions early on by applying
this model and making more informed judgments on how to manage fraudulent
transactions, thereby saving people money. We hope that our experimental
investigation will aid in the development of a control strategy for preventing
future fraud transactions.

References
1. Vasant, P., Zelinka, I., Weber, G.-W. (eds.): ICO 2018. AISC, vol. 866. Springer,
Cham (2019). https://doi.org/10.1007/978-3-030-00979-3
2. Vasant, P., Zelinka, I., Weber, G.-W. (eds.): ICO 2019. AISC, vol. 1072. Springer,
Cham (2020). https://doi.org/10.1007/978-3-030-33585-4
3. Vasant, P., Zelinka, I., Weber, G.-W. (eds.): ICO 2020. AISC, vol. 1324. Springer,
Cham (2021). https://doi.org/10.1007/978-3-030-68154-8
4. Altman, N.S.: An introduction to kernel and nearest-neighbor nonparametric
regression. Am. Stat. 46(3), 175–185 (1992)
5. Bahnsen, A.C., Aouada, D., Stojanovic, A., Ottersten, B.: Feature engineering
strategies for credit card fraud detection. Expert Syst. Appl. 51, 134–142 (2016)
6. Bahnsen, A.C., Stojanovic, A., Aouada, D., Ottersten, B.: Improving credit card
fraud detection with calibrated probabilities. In: Proceedings of the 2014 SIAM
International Conference on Data Mining, pp. 677–685. SIAM (2014)
7. Dubey, S.C., Mundhe, K.S., Kadam, A.A.: Credit card fraud detection using artifi-
cial neural network and backpropagation. In: 2020 4th International Conference on
Intelligent Computing and Control Systems (ICICCS), pp. 268–273. IEEE (2020)
8. Gaikwad, J.R., Deshmane, A.B., Somavanshi, H.V., Patil, S.V., Badgujar, R.A.:
Credit card fraud detection using decision tree induction algorithm. Int. J. Innov.
Technol. Explor. Eng. (IJITEE) 4(6), 66 (2014)
9. Ho, T.K.: Random decision forests. In: Proceedings of 3rd International Conference
on Document Analysis and Recognition, vol. 1, pp. 278–282. IEEE (1995)
10. Jurgovsky, J., et al.: Sequence classification for credit-card fraud detection. Expert
Syst. Appl. 100, 234–245 (2018)
11. Li, C.: A gentle introduction to gradient boosting (2016). http://www.ccs.neu.edu/
home/vip/teach/MLcourse/4_boosting/slides/gradient_boosting.pdf
12. Lucas, Y., et al.: Towards automated feature engineering for credit card fraud
detection using multi-perspective HMMs. Future Gener. Comput. Syst. 102, 393–
402 (2020)
13. Marcano-Cedeño, A., Quintanilla-Domı́nguez, J., Cortina-Januchs, M., Andina,
D.: Feature selection using sequential forward selection and classification applying
artificial metaplasticity neural network. In: IECON 2010–36th Annual Conference
on IEEE Industrial Electronics Society, pp. 2845–2850. IEEE (2010)
14. Paper, D.: Scikit-Learn Classifier Tuning from Complex Training Sets, pp. 165–188.
Apress, Berkeley (2020)
15. de Sá, A.G., Pereira, A.C., Pappa, G.L.: A customized classification algorithm for
credit card fraud detection. Eng. Appl. Artif. Intell. 72, 21–29 (2018)
16. Singh, G., Gupta, R., Rastogi, A., Chandel, M.D., Ahmad, R.: A machine learning
approach for detection of fraud based on SVM. Int. J. Sci. Eng. Technol. 1(3),
192–196 (2012)
17. Taha, A.A., Malebary, S.J.: An intelligent approach to credit card fraud detection
using an optimized light gradient boosting machine. IEEE Access 8, 25579–25587
(2020)
18. Tolles, J., Meurer, W.J.: Logistic regression: relating patient characteristics to out-
comes. JAMA 316(5), 533–534 (2016)
19. ULB, M.L.G.: Credit card fraud detection (2018). https://www.kaggle.com/mlg-
ulb/creditcardfraud
20. Vapnik, V.N.: An overview of statistical learning theory. IEEE Trans. Neural Netw.
10(5), 988–999 (1999)
21. Xuan, S., Liu, G., Li, Z., Zheng, L., Wang, S., Jiang, C.: Random forest for credit
card fraud detection. In: 2018 IEEE 15th International Conference on Networking,
Sensing and Control (ICNSC), pp. 1–6. IEEE (2018)
22. Yee, O.S., Sagadevan, S., Malim, N.H.A.H.: Credit card fraud detection using
machine learning as data mining technique. J. Telecommun. Electron. Comput.
Eng. (JTEC) 10(1–4), 23–27 (2018)
DCNN-LSTM Based Audio Classification
Combining Multiple Feature Engineering
and Data Augmentation Techniques

Md. Moinul Islam¹, Monjurul Haque², Saiful Islam³, Md. Zesun Ahmed Mia⁴,⁵(B), and S. M. A. Mohaiminur Rahman¹

¹ Chittagong University of Engineering and Technology, Chittagong, Bangladesh
² Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
³ Ahsanullah University of Science and Technology, Dhaka, Bangladesh
⁴ Bangladesh University of Engineering and Technology (BUET), Dhaka, Bangladesh
⁵ University of Liberal Arts Bangladesh (ULAB), Dhaka, Bangladesh

Abstract. Everything we know is based on our brain’s ability to process
sensory data. Hearing is a crucial sense for our ability to learn. Sound is
essential for a wide range of activities, such as exchanging information,
interacting with others, and so on. To convert sound electrically, the
audio signal comes into play. Because of their countless essen-
tial applications, audio signals and their classification are of important
value. However, even in this day and age, classifying audio signals remains a
difficult task. To classify audio signals more accurately and effectively,
we propose a new model. In this study, we apply a new method for audio
classification that combines the strengths of Deep Convolutional Neural
Network (DCNN) and Long Short-Term Memory (LSTM) models with a
unique combination of feature engineering to get the best possible
outcome. Here, we integrate data augmentation and feature extraction
together before fitting the data into the model to evaluate its perfor-
mance. A higher degree of accuracy is observed after the experiment. To
validate the efficacy of our model, a comparative analysis is made with
recent reference works.

Keywords: DCNN-LSTM · Spectrograms · Short Time Fourier Transform · Data augmentation · Spectral feature extraction · MFCC · Melspectrogram · Chroma STFT · Tonnetz

1 Introduction
Digital and analog audio signals both use varying amounts of electrical voltage
to delineate sound. Our daily lives depend heavily on audio signals of various
origins; no one would be able to hear anything without them. Audio signals are
now required not just by humans, but also by man-made machines. Human-
like sound comprehension has several uses, involving intelligent machine control
and monitoring, acoustic information use, acoustic surveillance, and categoriza-
tion and information extraction applications such as exploring audio archives
and audio-assisted multimedia assets [9]. For many years, categorizing audio or
sound has been an important area of research. To achieve this classi-
fication, multiple models and features have been tried and experimented with
over the years, many of which have proved to be helpful and accurate in the pro-
cess of classifying and separating audio and sound. Many possible applications
exist in the area of sound detection and classification, including matrix factor-
ization, the categorization of music genres, wavelet filterbanks, automated music
tagging, dictionary learning, bird song classification, IoT-embedded automated
audio categorization, and emotion recognition [1–3,6,8,12]. Since deep learning
was introduced, it has boosted research in various fields and swiftly superseded
traditional machine learning algorithms by exhibiting superior performance on
numerous tasks. With or without artificial intelligence, there are countless pos-
sible approaches for developing audio recognition and classification models that
use various audio feature extraction procedures. The detection and categoriza-
tion of ambient sound is a fascinating subject with several applications, ranging
from crime detection to environmental context-aware analysis. For audio clas-
sification, prominent classifier models include those that use sensible artificial
intelligence or linear predictive coding, as well as Deep Neural Networks,
Decision Tree Classifiers, and Random Forests.
A few contributions have been made to the field of audio categorization. In
recent research studies, convolutional neural networks were shown to be very
efficient in classifying brief audio samples of ambient noises. The authors in
[11] used the publicly accessible ESC-10, ESC-50, and UrbanSound8K data sets
and enhanced them by adding arbitrary temporal delays to the original tracks,
conducted class-dependent time stretching and pitch shifting on the ESC-10
training set, and extracted log-scaled mel-spectrograms from all record-
ings, to develop a model composed of two convolutional ReLU layers with max-
pooling, two fully connected ReLU layers, and a softmax output layer
trained on a low-level audio data representation. The authors used 300 epochs
for the short-segment version and 150 epochs for the long-segment variant and
tested the model using fivefold cross-validation (ESC-10 and ESC-50) and ten-
fold cross-validation (UrbanSound8K) with a single training fold to show that
the CNN outperformed solutions based on manually engineered features. Palanisamy
et al. [10] showed that standard deep CNN models trained on ImageNet can
be used as strong foundation networks for audio categorization. They claimed
that just by fine-tuning basic pre-trained ImageNet models with a sin-
gle set of input features for audio tasks, they could achieve cutting-edge
results on the UrbanSound8K and ESC-50 datasets, as well as good performance
on the GTZAN dataset, and that, judging by qualitative visualizations of the
spectrograms, CNN models might learn the bounds of the energy distributions
in the spectrograms. Abdoli et al. [4] presented a method for classifying ambient
sound that uses a 1D Convolutional Neural Network (CNN) to attempt to
acquire a representation straight from the audio input in order to capture the
signal’s precise temporal characteristics. The performance of their proposed
end-to-end approach for detecting ambient noises was found to be 89% accurate.
The suggested
end-to-end 1D design for ambient sound categorization employs fewer parame-
ters than the bulk of previous CNN architectures while reaching a mean accuracy
11.24% to 27.14% greater than equivalent 2D designs.
In our research, we introduce an entirely new audio classification technique
by integrating two separate models: a deep CNN and an LSTM. Before training
the data with our newly proposed model, we used a unique combination of
feature engineering methods to find the best results. There are three phases
to sound classification: audio signal preprocessing, spectral feature extraction,
and classification of the corresponding audio signal. The UrbanSound8K dataset
has been utilized for audio categorization. There are 8732 labelled audio
slices in total in this dataset, divided into ten classes of ambient noise: air
conditioner, car horn, children playing, dog bark, drilling, engine idling, gun
shot, jackhammer, siren, and street music. Data augmentation is first used to
improve the model’s training so that it can yield good results. Three data augmen-
tation methods were investigated: time stretching, noise injection, and pitch
shifting. To convert the audio data to numerical values, we used NumPy arrays
in Python. The audio was then transformed from the time domain to the
frequency domain via the Fourier Transform. In addition to the Zero
Crossing Rate, we computed a number of feature extraction approaches:
Chroma STFT (Short-Time Fourier Transform), MFCC (Mel-
frequency Cepstral Coefficients), Mel spectrogram, RMS, and Tonnetz. These
spectral feature extraction approaches are combined to create a new model. After
that, the 34,928 numerical samples, with a total of 5,867,904 fields, were obtained
by integrating data augmentation and spectral feature extraction before training
the data with the model. We trained with 80% of the data, tested with 10%, and
validated with 10%. Finally, we trained the data with our recommended model, a
hybrid of a deep CNN and an LSTM. The deep CNN has three layers. We used
the Adam optimizer for improved optimization. Hyperparameter tuning uses batch
normalization, max pooling, and dropout all at once. ReLU and Softmax
were used to fit the model, and Softmax was also employed for the output layer.
The LSTM model’s input layer receives data from the CNN’s output layer. The
LSTM model makes use of two layers. As with the deep CNN, we used the Adam
optimizer and activation functions such as ReLU and Softmax to fit the model
better and improve tuning; here, however, dropout alone was used for hyper-
parameter tuning. With this setup, the accuracy of audio classification was
significantly enhanced. Finally, our novel model is compared with models from
other recent reference works in order to highlight its worth.

2 Methodology

The overall methodology of our suggested audio classification model is described
in this section. We used the benchmark dataset UrbanSound8K [13] to vali-
date our model. This dataset contains 8732 brief audio samples (with a dura-
tion of 4 s or less) taken from a variety of urban recordings: air condi-
tioners, vehicle horns, kids playing, barking dogs, drilling, engine revving, gun-
shots, jackhammers, sirens, and street music. As stated above, the dataset is
divided into ten (10) classes. It was found that vehicle horn, gunshot, and siren
noises are not uniformly distributed across the classes.

2.1 Data Augmentation

Data augmentation is a simple technique for generating synthetic data with
variations from the existing samples, offering the model larger data samples
with more variety and allowing it to avoid overfitting and generalize better.
There are several augmentation methods for audio, such as noise injection,
time shifting, pitch shifting, speed changing, time stretching, and others.
This research adopted three data augmentation techniques: background
noise injection, pitch shifting, and time stretching. In noise injection, the
sample data was merged with a separate recording that includes external noise
from a variety of acoustic sources. Each data point was generated by

\[ m = x_i (1 - w) + w y_i \tag{1} \]

where x_i is the original audio sample of the dataset, y_i is the background noise
that is injected, and w is a weight parameter chosen randomly for each
merge within the range 0.001 to 0.009. During pitch shifting, the pitch of the
audio samples is shifted up or down by a particular value; each sample was
pitch-shifted by [−2, −1, 1, 2]. Time stretching is an audio processing technique
that lengthens or shortens the duration of a sample without altering its pitch.
The augmentation techniques were applied using the Librosa library. Figures 1a,
1b and 1c illustrate the data augmentation techniques applied to the dataset.
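
A hedged sketch of the three augmentations with Librosa and NumPy is shown below; the file paths are illustrative, and drawing w uniformly from [0.001, 0.009] is an assumption about how the random weight in Eq. (1) was sampled.

```python
# Sketch of the three augmentations with Librosa/NumPy. File paths are
# illustrative; the uniform draw of w over [0.001, 0.009] is an assumption
# about how the random weight in Eq. (1) was sampled.
import numpy as np
import librosa

x, sr = librosa.load("sample.wav", sr=None)       # original clip
noise, _ = librosa.load("background.wav", sr=sr)  # external noise recording
noise = np.resize(noise, x.shape)                 # align lengths

w = np.random.uniform(0.001, 0.009)               # weight from Eq. (1)
x_noisy = x * (1 - w) + w * noise                 # background noise injection

x_pitched = librosa.effects.pitch_shift(x, sr=sr, n_steps=2)  # shift by +2
x_slow = librosa.effects.time_stretch(x, rate=0.8)            # longer, same pitch
```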

2.2 Spectral Feature Extraction

When using feature extraction, the acoustic signal is transformed into a series
of acoustic feature vectors that accurately describe the input audio sound. The
goal is to condense the massive amount of data in each file into a consid-
erably smaller collection of characteristics of a known size. We have used
spectral characteristics to solve our classification problem, utilizing the
Short-Time Fourier Transform to transform the augmented audio samples from
the time domain to the frequency domain, as displayed in Fig. 2a. There are
numerous spectral features; among them, we have employed six: Zero Crossing
Rate, Chroma STFT, MFCC, Mel spectrogram, Tonnetz, and the RMS value
computed for each frame. Figures 2b, 2c and 2d show the spectrogram plots for
the feature extraction techniques.
(a) After Background Noise Injection (b) After Time Stretching

(c) After Pitch Shifting

Fig. 1. Data augmentation illustration

(a) Audio Conversion to Frequency Domain (b) Mel-Scaled Power Spectrogram
(c) Mel Frequency Cepstral Coefficients (d) Chroma Short Time Fourier Transform

Fig. 2. Audio conversion & spectral feature extraction

The Zero-Crossing Rate indicates how many times the signal shifts from pos-
itive to negative and vice versa, divided by the frame duration [7], where sgn is
the sign function:
\[ Z_i = \frac{1}{2 w_L} \sum_{n=1}^{w_L} \left| \mathrm{sgn}[x_i(n)] - \mathrm{sgn}[x_i(n-1)] \right| \tag{2} \]
The Chroma rate of an audio signal depicts the strength of each of the audio
signal’s twelve distinct pitch classes; these can be used to distinguish between
the pitch class profiles of audio streams. Chroma STFT contains information
regarding pitch and signal structure categorization and uses the short-term Fourier
transform to generate chroma properties. MFCCs (Mel Frequency Cep-
stral Coefficients) are concise representations of the spectrum; by transforming
the conventional frequency scale to the Mel scale, MFCC takes into consideration
human perceptual sensitivity at the relevant frequencies. The Mel spectrogram is a
combination of the Mel scale and the spectrogram, where the Mel scale denotes a
nonlinear transformation of the frequency scale: the y-axis indicates the Mel scale,
while the x-axis depicts time. Tonnetz detects harmonic shifts in audio recordings
to calculate tonal centroid features; it is an infinite planar representation of pitch
relationships in an audio sample.
For feature scaling purposes in our proposed method, we utilized two
standard techniques, one-hot encoding and StandardScaler. One-hot encoding
replaces the label-encoded categorical data with binary indicator values.
StandardScaler is a standardization technique that scales the independent
features to bring them into the same fixed range; it follows the Standard Normal
Distribution (SND), so the mean is set to 0 and the data is scaled to unit variance.
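
The stacked feature vector can be sketched with Librosa and scikit-learn as below; aggregating each feature by its mean over frames is an assumption, and the per-feature dimensions shown do not necessarily reproduce the 169-dimensional vector reported in Sect. 3.

```python
# Hedged sketch of the six-feature stack with Librosa, plus the scaling and
# one-hot encoding steps. Mean-over-frames aggregation and the per-feature
# sizes are assumptions; they need not reproduce the paper's 169 features.
import numpy as np
import librosa
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.utils import to_categorical

def extract_features(y: np.ndarray, sr: int) -> np.ndarray:
    stft = np.abs(librosa.stft(y))
    feats = [
        np.mean(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40), axis=1),
        np.mean(librosa.feature.melspectrogram(y=y, sr=sr), axis=1),
        np.mean(librosa.feature.chroma_stft(S=stft, sr=sr), axis=1),
        np.mean(librosa.feature.tonnetz(y=librosa.effects.harmonic(y), sr=sr), axis=1),
        np.mean(librosa.feature.zero_crossing_rate(y), axis=1),
        np.mean(librosa.feature.rms(y=y), axis=1),
    ]
    return np.hstack(feats)   # one fixed-length vector per audio clip

# X: matrix of stacked feature vectors; labels: integer class ids (0-9)
# X = StandardScaler().fit_transform(X)        # zero mean, unit variance
# y = to_categorical(labels, num_classes=10)   # one-hot encoded targets
```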

2.3 Deep CNN-LSTM Model Architecture

In the DCNN-LSTM design, CNN layers for feature extraction on input data
are integrated with LSTMs that provide sequence prediction, resulting in a highly
efficient feature extraction system. We combined CNN and LSTM models, both
of which use spectrograms as their input. To generate the DCNN-LSTM
model, deep CNN layers on the front end were combined with LSTM layers and
a Dense layer on the output. In this architecture, two sub-models are used for
feature extraction and feature interpretation across a large number of iterations:
the deep CNN model for extracting features and the LSTM model for inter-
preting them (Fig. 3).
Fig. 3. Overall proposed methodology

We present a model that consists of three 2D convolutional layers and two
MaxPooling2D layers arranged into a stack of the desired depth. The Conv2D
layers interpret the spectral characteristics of the spectrograms, and the pooling
layers consolidate the interpretation. The first Conv2D layer, which processes
the input shape, uses 64 filters, a kernel size of 5, and a stride of 1 before a
MaxPooling layer is applied to decrease the size of the input. Our framework
utilizes a 5×5 filter matrix, the argument defining the size of the kernel’s window.
With the stride set to 1 in the first layer, the filter moves one unit at a time as it
convolves around the input volume. Using the “same padding” technique, this
convolutional layer yields the same height and width as the original input. We
chose ReLU as the activation function for this layer instead of sigmoid units
because of its many advantages over more conventional units, including efficient
gradient propagation and faster calculation than sigmoid units. Also, in order to
minimize overfitting, a dropout of 0.3 is applied; the other two Conv2D layers
use the same padding method and activation function (ReLU).
To stack the LSTM layers, we first created two LSTM layers with 128 hidden
units each and set return_sequences to true in order to stack them. To avoid
overfitting, the output of both LSTM layers, a 3D array, is followed by
TimeDistributed Dense layers with a dropout of 0.2. ReLU was used as the
activation function in both layers, with sizes of 64 and 128; the first layer’s input
shape of (21, 8) tells the LSTM how many instances it should process once the
input has been applied. Afterward, the outcome of the TimeDistributed Dense
layer is used as the input to the Flatten layer, and the process repeats until the
desired result is achieved. Once the flattening is finished, we are left with a
vector of input data, which is passed through the final Dense layer. By utilizing
the Softmax activation function in this Dense layer of the network, we transform
the information into a discrete probability distribution over the classes.
We utilized Adam, an optimization technique that adapts the learning rate
of each parameter to changes in its environment. The Adam optimizer out-
performs earlier optimizers in performance and provides a tuned gradient
descent. Adaptive learning rates are used to estimate the appropriate amount of
learning for individual parameters. In many circumstances, Adam has been
shown to favor error surfaces with flat minima, which makes it a good optimizer.
The parameters β1 and β2 only specify the periods over which the learning rates
decay, not the learning rate itself: if they decay too quickly, the learning rates
will fluctuate wildly; if they decay too slowly, learning the rates will take a long
time. The learning rates are calculated automatically based on moving estimates
of the parameter gradient and the squared parameter gradient in all circumstances.
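
A runnable Keras sketch of the stack described above follows. The paper leaves the exact tensor shapes ambiguous, so reshaping the feature vector to a 13×13 single-channel "image" (13 × 13 = 169) and flattening the 3×3×64 CNN output into a 9-step sequence are assumptions made only to keep the sketch self-consistent.

```python
# Hedged Keras sketch of the DCNN-LSTM stack: three Conv2D layers with two
# MaxPooling2D layers, dropout 0.3, two stacked LSTMs (128 units,
# return_sequences=True), a TimeDistributed Dense with dropout 0.2, and a
# softmax output, compiled with Adam. Input/reshape sizes are assumptions.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(13, 13, 1)),                 # 169 features as 13x13
    layers.Conv2D(64, 5, strides=1, padding="same", activation="relu"),
    layers.MaxPooling2D(2),
    layers.Conv2D(64, 5, padding="same", activation="relu"),
    layers.MaxPooling2D(2),
    layers.Conv2D(64, 5, padding="same", activation="relu"),
    layers.Dropout(0.3),
    layers.Reshape((9, 64)),                         # 3x3 grid -> 9 time steps
    layers.LSTM(128, return_sequences=True),
    layers.LSTM(128, return_sequences=True),
    layers.TimeDistributed(layers.Dense(64, activation="relu")),
    layers.Dropout(0.2),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),          # 10 UrbanSound8K classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```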

3 Results

Our research strategy involved identifying attributes that are both effective and
accurate for the DCNN-LSTM model. In this section, we assess our model in
terms of the experiments conducted. We also evaluate the effectiveness of our
proposed ensemble method and pre-trained weights, and finally compare against
some previous state-of-the-art models.

(a) Loss vs no. of epochs (b) Accuracy vs no. of epochs

Fig. 4. Validation loss & accuracy of our proposed model

In the data preprocessing module, we stacked three data augmentation tech-
niques, background noise injection, time stretching, and pitch shifting, to reduce
overfitting and evaluate the performance of our model. To extract spectral features
from the spectrograms, MFCCs, Mel spectrogram, Chroma STFT, and Tonnetz were
stacked with one another, in addition to computing the zero-crossing rate (ZCR) and
Root Mean Square (RMS) value for each frame of the audio data, and we obtained 169
features in total to work with. Stacking the techniques was effective in enhancing
our model’s performance considerably. We then fed the data into our proposed
DCNN-LSTM model illustrated in Sect. 2.3, evaluated the performance metrics, and
validated the model on the dataset. Stacking those techniques helped us reach
a validation accuracy of 93.19% with an epoch size of 26; Stratified 10-fold
cross-validation was used to ensure the robustness of the result, with the CNN
alone (three Conv2D layers, 50 epochs) reaching 86.1% and the LSTM alone (two
layers, 200 epochs) reaching 87.75% in the training process. Figure 4 shows the
validation loss and accuracy of our proposed DCNN-LSTM model on the y-axis
and the number of epochs on the x-axis; with the increase in the number of epochs,
the validation error of our model decreases exponentially for both the training and
testing data. The epoch count was set at 50, but the error stopped improving after
26 epochs, and the accuracy and loss were returned by the early Callback function
without further increasing the computational time of the model. Table 1 shows a
comparison of the accuracy of our proposed model with previous state-of-the-art
models.

Table 1. Proposed model vs previous state-of-the-art models

Model                                                 Dataset       Accuracy (%)
logmel-CNN [16]                                       ESC-50        78.3
DCNN + Mix-up [17]                                    UrbanSound8K  83.7
DenseNet (Pretrained Ensemble) [10]                   UrbanSound8K  87.42
Conv1D + Gammatone [4]                                UrbanSound8K  89
DCNN with Multiple Features + Mix-up [14]             ESC-50        88.5
GoogleNet [5]                                         UrbanSound8K  93
TSCNN-DS [15]                                         UrbanSound8K  97.2
Proposed DCNN-LSTM + Stacked Features & Augmentation  UrbanSound8K  93.19

4 Conclusion
This paper proposes an approach to urban sound classification that comprises
a deep neural network combining two different neural network models, CNN and
LSTM, together with two separate stacks of multiple data augmentation and
feature extraction techniques. UrbanSound8K, one of the finest datasets in this
domain, has been used to train and test our models. With the aforementioned
feature engineering, training, validating, and testing the model on this dataset
yields a decent result of 93.19% accuracy, which is close to the state-of-the-art
result and better than other previous works.
Though we have focused on data augmentation for a single dataset, the
comparison would be more relevant if we could also work with other prominent
datasets. Our model achieves this accuracy without any use of pre-trained
models or transfer learning, so there remains scope for future work using these
two, possibly improving our existing accuracy. Moreover, a simple stack of
DCNN-LSTM has been used effectively for urban sound classification and has
achieved a high score; whether various combinations of more sophisticated
recurrent or convolutional neural network models can bring a much better score
remains a matter for future research.

References
1. Vasant, P., Zelinka, I., Weber, G.-W.: Intelligent Computing and Optimization.
Springer, Cham (2019). https://doi.org/10.1007/978-3-030-00979-3. ISBN 978-3-
030-00978-6
2. Vasant, P., Zelinka, I., Weber, G.-W.: Intelligent Computing and Optimization.
Springer, Cham (2019). https://doi.org/10.1007/978-3-030-33585-4. ISBN 978-3-
030-33585-4
3. Vasant, P., Zelinka, I., Weber, G.-W.: Intelligent Computing and Optimization.
Springer, Cham (2020). https://doi.org/10.1007/978-3-030-68154-8
4. Abdoli, S., Cardinal, P., Koerich, A.L.: End-to-end environmental sound classifi-
cation using a 1D convolutional neural network. Expert Syst. Appl. 136, 252–263
(2019)
5. Boddapati, V., Petef, A., Rasmusson, J., Lundberg, L.: Classifying environmen-
tal sounds using image recognition networks. Procedia Comput. Sci. 112, 2048–
2056 (2017). Knowledge-Based and Intelligent Information & Engineering Systems:
Proceedings of the 21st International Conference, KES-2017, 6–8 September 2017,
Marseille, France
6. Costa, Y.M., Oliveira, L.S., Silla, C.N.: An evaluation of convolutional neural net-
works for music classification using spectrograms. Appl. Soft Comput. 52, 28–38
(2017)
7. Giannakopoulos, T., Pikrakis, A.: Audio features. In: Giannakopoulos, T., Pikrakis,
A. (eds.) Introduction to Audio Analysis, pp. 59–103. Academic Press, Oxford
(2014)
8. Hershey, S., et al.: CNN architectures for large-scale audio classification (2017)
9. Li, J., Dai, W., Metze, F., Qu, S., Das, S.: A comparison of deep learning methods
for environmental sound detection. In: 2017 IEEE International Conference on
Acoustics, Speech and Signal Processing (ICASSP), pp. 126–130. IEEE (2017)
10. Palanisamy, K., Singhania, D., Yao, A.: Rethinking CNN models for audio classi-
fication (2020)
11. Piczak, K.J.: Environmental sound classification with convolutional neural net-
works. In: 2015 IEEE 25th International Workshop on Machine Learning for Signal
Processing (MLSP), pp. 1–6. IEEE (2015)
12. Salamon, J., Bello, J.P.: Unsupervised feature learning for urban sound classifi-
cation. In: 2015 IEEE International Conference on Acoustics, Speech and Signal
Processing (ICASSP), pp. 171–175. IEEE (2015)
13. Salamon, J., Jacoby, C., Bello, J.P.: A dataset and taxonomy for urban sound
research. In: Proceedings of the 22nd ACM International Conference on Multime-
dia, MM 2014, pp. 1041–1044. Association for Computing Machinery, New York
(2014)
14. Sharma, J., Granmo, O.C., Goodwin, M.: Environment sound classification using
multiple feature channels and attention based deep convolutional neural network.
In: INTERSPEECH, pp. 1186–1190 (2020)
15. Su, Y., Zhang, K., Wang, J., Madani, K.: Environment sound classification using
a two-stream CNN based on decision-level fusion. Sensors 19(7), 1733 (2019)
16. Tokozume, Y., Harada, T.: Learning environmental sounds with end-to-end con-
volutional neural network. In: 2017 IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP), pp. 2721–2725. IEEE (2017)
17. Zhang, Z., Xu, S., Cao, S., Zhang, S.: Deep convolutional neural network with
mixup for environmental sound classification. In: Lai, J.-H., et al. (eds.) PRCV
2018. LNCS, vol. 11257, pp. 356–367. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-03335-4_31
Sentiment Analysis: Developing an Efficient
Model Based on Machine Learning and Deep
Learning Approaches

Said Gadri(&), Safia Chabira, Sara Ould Mehieddine, and Khadidja Herizi

Laboratory of Informatics and Its Applications of M’sila LIAM, Department
of Computer Science, Faculty of Mathematics and Informatics, Univ.
Mohamed Boudiaf of M’sila, M’Sila, Algeria
{said.kadri,safia.chabira,sara.omah,herizikha}@univ-msila.dz

Abstract. Sentiment analysis is a subfield of text mining. It is the process of
categorizing opinions expressed in a piece of text; a simple form of such
analysis would be to predict whether the opinion about something is positive or
negative (polarity). The present paper proposes an efficient sentiment analysis
model based on machine learning (ML) and deep learning (DL) approaches.
A DNN (Deep Neural Network) model is used to extract the relevant features
from customer reviews, perform the training task on most of the samples of the
dataset, validate the model on a small subset called the test set, and consequently
compute the accuracy of sentiment classification. For the programming stage,
we benefited from the large opportunities offered by the Python language, as
well as the TensorFlow and Keras libraries.

Keywords: Machine learning · Deep learning · Artificial Neural Networks · Natural language processing · Social media

1 Introduction

Today, social media such as Twitter, Facebook, Instagram, etc., have become an
important means that allow people to share their opinions and sentiments about a
product they want to buy, or to express their views about a particular topic, company
service, or political event [1]. Many business companies need to process these
sentiments/opinions and exploit them in many interesting applications, such as
improving the quality of their services, drawing up efficient business strategies, and
attracting a large number of customers [2]. Nowadays, sentiment analysis (SA) is
considered among the hottest research topics in the NLP and text mining fields. It
can be defined as the process of automatically extracting the relevant information
that expresses the opinion of the user about a given topic [1, 3]. A simple form of
such analysis would be to predict whether the opinion about something is positive,
negative, or neutral (polarity). There exist other forms of sentiment analysis, like
predicting a rating scale for a product review, predicting polarity on different
aspects of the product, and detecting subjectivity and
objectivity in sentences, etc. [1, 2, 4]. SA is useful in a wide range of applications,


notably: business activities, government services, biomedicine, recommender systems.
For instance, in the domain of e-business, companies can study customers’ feedback
on a product in order to provide better products, services, and marketing strategies,
and to attract more customers [2]. In the field of recommender systems, we use SA to
improve recommendations for books, movies, hotels, restaurants, and many other
services [5]. There exist four approaches to process the problem of sentiment analysis,
including lexicon-based approach, machine learning approach, deep learning approach,
and hybrid approach [1]. The lexicon-based approach was the first approach that has
been used by researchers for the task of sentiment analysis. It is based on two principal
techniques: the dictionary-based technique which is performed using a dictionary of
terms like those in wordnet, and the corpus-based technique which is based on a
statistical analysis of the content of documents combined with some statistical algo-
rithms such as hidden Markov models HMMs [6], the Conditional Random Field CRF
[7]. The machine learning approach [8] has been proposed by many researchers for SA, and
is based on classical ML algorithms, such as Naïve Bayes NB [9], Support Vector
Machine SVM [10], etc. The deep learning approach has been proposed more recently
and has achieved great success in many fields, such as computer vision [11–13], image
processing [14, 15], object detection [16, 17], network optimization [18], sensor net-
works [19, 20], and system security [21]. It gives better results in terms of accuracy but
needs massive data. Many models are currently used, including DNN, CNN, RNN,
LSTM. Our main objective in this work is to classify opinions expressed by customers
in short reviews to determine whether the reviews’ sentiment towards the movie
service is positive or negative. For this purpose, we used the traditional machine
learning ML approach and the deep learning DL approach. For the first approach, we
applied many ML algorithms including LR, NB, SVM. For the DL approach, we built a
DNN model to perform the same task. We finished our work by establishing a com-
parison between the two approaches.

2 Related Work

In 2016, Gao et al. [22] developed a sentiment analysis system using the Adaboost
algorithm combined with a CNN model. They performed their experimental work on
the movie reviews IMDB dataset. Their main objective was to study the contribution of
different filter lengths and exploit their potential in the final polarity of the sen-
tence. Singhal et al. [23] presented a survey of the sentiment analysis and deep learning
areas. The survey comprises well-known models such as CNN, RNTN, RNN, LSTM.
They applied their experiments on many datasets, including sentiment treebank, movie
reviews, MPQA, customer reviews. Al-Sallab et al. [24] realized an opinion mining in
Arabic using an RNN model. For the dataset, they used many datasets, namely: online
comments from QALB, Twitter, newswire articles written in MSA providing complete
and comprehensive input features for the autoencoder. Preethi et al. [5] realized sentiment
analysis for recommender systems in the cloud. They used RNN and NB classifiers and
the Amazon dataset. The main performed task is to recommend the places that are new
for the user’s current location by analyzing the different reviews and computing the
score grounded on them. Jangid et al. [25] developed a financial sentiment analysis
system and used many models: CNN, LSTM, RNN, and for dataset they used financial
tweets. The main performed work was the realization of aspect-based sentiment
analysis. Zhang et al. [26] presented a detailed survey of deep learning for the senti-
ment analysis field. They used: CNN, DNN, RNN, LSTM models. For datasets, they
performed experiments on social network sites. The principal tasks realized in this work were
sentiment analysis with word embeddings, sarcasm analysis, emotion analysis, and multi-
modal data for sentiment analysis. Wu et al. [27] realized sentiment analysis with
variational autoencoder using LSTM and Bi-LSTM algorithms and StockTwits dataset.
Many interesting tasks have been performed through this work such as encoding and
decoding, sentiment prediction. Wang et al. [28] proposed a hybrid method that uses
sentiment analysis of reviews related to movies to improve a preliminary recommen-
dation list obtained from the combination of CF and content-based methods. Gupta
et al. [29] combined sentiment and semantic features in an LSTM model for
emotion detection. Salas-Zárate et al. [30] developed an ontology-based model to
analyze sentiments in tweets related to diabetes datasets. Sharef et al. [31] discussed the
use of sentiment analysis in the field of big data. Several other studies applied deep
learning-based sentiment analysis in different domains, notably: finance [32], recom-
mender systems for cloud services [5], etc. Pham et al. [11] used multiple layers of
knowledge representation to analyze travel reviews and detect sentiment for five
parameters: rating value, room, location, deadlines, and services.

3 Sentiment Analysis Process

In the present work, we developed a sentiment analysis system based on the ML and DL
approaches, which have been considered the most performant over the last decade. As
explained in Sect. 1, the main objective is to classify opinions expressed
by customers in short reviews to determine whether the reviews’ sentiment towards
the movie is positive or negative. Our project can be divided into the following steps:
1. Downloading the used dataset; in our case, we used the movie reviews dataset
IMDB.
2. Selecting the most important columns.
3. Performing some preprocessing tasks, including: cleaning spaces, removing punc-
tuation, removing stopwords, removing links and non-character letters, splitting
texts and representing them by individual word vectors, then transforming them into
their base form by stemming and lemmatization, and converting all term vectors into
numerical vectors by using a collection of codings, including binary coding, TF
coding, TF-IDF coding, n-grams, word embeddings, etc.
4. For the ML approach, we applied the following algorithms: LR, SVM, NB.
5. For the DL approach, we proposed a new model based on several hidden layers,
each composed of simple neurons (detailed next).
6. Running a training task on 80% of the samples of our dataset (the train set) to train
the selected ML algorithms, as well as the new DNN model.

Fig. 1. Sentiment analysis process (train set, 80% of samples, and test set, 20% of samples →
preprocessing stage: selecting relevant values, cleaning spaces, removing punctuation, removing
stop-words, converting texts into lowercase letters → train/test vectors → training and test stages
with the ML algorithms (KNN, NB, LR, SVM, …) and the DNN model, evaluated with the
k-fold cross-validation technique to obtain the model accuracy)

7. Validating the ML algorithms and the DNN model on 20% of the samples of our
dataset (the test set). For this purpose, we also used the k-fold cross-validation tech-
nique (usually k = 10) in order to determine the performance of the different ML
algorithms and the DNN model.
8. For the programming stage, we used Python combined with TensorFlow and Keras,
which offer simple APIs and many rich libraries for ML and DL.
Figure 1 presents a detailed diagram of the SA process in our system.
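As a concrete illustration of steps 1, 3, 6, and 7 above, the following is a minimal
Python/scikit-learn sketch; the CSV file name and the lowercase column names
('text' and 'label', cf. Table 1) are assumptions, not the authors' exact code, and
stemming/lemmatization is omitted for brevity.

```python
# A minimal sketch of the preprocessing and TF-IDF coding described above.
import re
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

df = pd.read_csv("imdb_reviews.csv")           # hypothetical file name

def clean(text):
    text = text.lower()                        # convert to lowercase letters
    text = re.sub(r"http\S+", " ", text)       # remove links
    text = re.sub(r"[^a-z\s]", " ", text)      # remove punctuation / non-letters
    return re.sub(r"\s+", " ", text).strip()   # clean extra spaces

df["text"] = df["text"].apply(clean)

# TF-IDF coding: convert term vectors into numerical vectors (stopwords removed)
vectorizer = TfidfVectorizer(stop_words="english", max_features=5000)
X = vectorizer.fit_transform(df["text"])
y = (df["label"] == "positive").astype(int)    # 1: positive, 0: negative

# 80% train / 20% test split, as in steps 6 and 7
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```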

4 The Proposed Sentiment Analysis Model

After applying some preprocessing tasks to prepare the texts, we proceed to the devel-
opment of our sentiment analysis model as follows:
1. First, we applied several ML algorithms, including Logistic Regression (LR),
Gaussian Naïve Bayes (NB), and Support Vector Machine (SVM). For this purpose, we
used the scikit-learn library of Python, which contains the best-known ML algorithms.
2. Designing a DNN model (Deep Neural Network): We proposed a DNN model
composed of six (06) fully connected layers described as follows: layer 1 (750
neurons, expecting 2 input variables: text, label), layer 2 (512 neurons), layer 3
(128 neurons), layer 4 (64 neurons), layer 5 (16 neurons), and layer 6, the output
layer (2 neurons), to predict the class (1: positive polarity, 0: negative polarity).
• The six (06) fully connected layers are defined using the Dense class of Keras which
permits to specify the number of neurons in the layer as the first argument, the
initialization method as the second argument, and the activation function using the
activation argument.
• We initialize the network weights to small random numbers generated from a
uniform distribution (‘uniform’) or a normal distribution (‘normal’). We use the
rectifier (‘relu’) activation on most layers and the sigmoid function in the output layer.
Sentiment Analysis: Developing an Efficient Model Based on Machine Learning 241

• We use a sigmoid function on the output layer to ensure our network output is
between 0 and 1 and easy to map to either a probability of class 1 or 0.
• We compile the model using the efficient numerical libraries of Keras under the
covers (the so-called backend) such as TensorFlow. The backend automatically
chooses the best way to represent the network for training and making predictions to
run on your hardware (we have used CPU in our application).
• When compiling, we must specify some additional properties required when
training the network. We note that training a network means finding the best set of
weights to make predictions for this problem.
• When training the model, we must specify the loss function to evaluate a set of
weights, the optimizer used to search through different weights of the network, and
any optional metrics we would like to collect and report during training. Since our
problem is a binary classification, we have used a logarithmic loss, which is defined
in Keras as “binary_crossentropy”.
• We will also use the efficient gradient descent algorithm “adam” because it is an
efficient default.
• Finally, since it is a classification problem, we report the classification accuracy as
the performance metric.
• Execute the model on some data.
• We train or fit our model on our loaded data by calling the fit() function on the
model; the training process will run for a fixed number of iterations through the
dataset, called epochs, which we must specify using the epochs argument. We can
also set the number of instances that are evaluated before a weight update in the
network is performed, called the batch size, set using the batch_size argument.
For our case, we fixed the following values: epochs = 15, batch_size = 32. These
were chosen experimentally by trial and error.
• We trained our DNN on the training set and evaluated its perfor-
mance on the held-out test set using the evaluate() function. This
will generate a prediction for each input and output pair and collect scores,
including the average loss and any configured metrics, such as accuracy; a
minimal sketch of the whole model follows.
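Gathering the bullets above, a minimal Keras sketch of the described DNN could look
as follows. It assumes TF-IDF inputs of dimension 5000 from the earlier preprocessing
sketch, and it replaces the 2-neuron sigmoid output described in the text with the
equivalent single sigmoid unit, which pairs naturally with binary cross-entropy; this is
an illustrative reconstruction, not the authors' exact code.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Six fully connected (Dense) layers: 750-512-128-64-16-output
model = keras.Sequential([
    layers.Dense(750, activation="relu", kernel_initializer="random_uniform",
                 input_shape=(5000,)),       # assumes 5000 TF-IDF features
    layers.Dense(512, activation="relu"),
    layers.Dense(128, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),   # output in [0, 1]: 1 positive, 0 negative
])

# Logarithmic loss for binary classification, adam optimizer, accuracy metric
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

# 15 epochs and batch size 32, as fixed in the text
history = model.fit(X_train.toarray(), y_train, epochs=15, batch_size=32,
                    validation_data=(X_test.toarray(), y_test))

loss, acc = model.evaluate(X_test.toarray(), y_test)
```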
Figure 2 shows the architecture and the characteristics of the proposed DNN
model.

Fig. 2. Architecture of the proposed DNN model (input layer followed by hidden layers 2–6)



5 Experimental Work
5.1 Used Dataset
In our experimentation, we used the Movie Reviews Dataset (IMDB), which is one of the
most popular movie datasets used for sentiment analysis classification. It contains a set
of 50,000 highly polar reviews. It can be divided into two subsets: the train set,
containing 40,000 movie reviews, and the test set, containing 10,000 movie reviews for
testing. The two subsets are presented in CSV format. IMDB data is available on many
websites such as Kaggle. Each CSV file contains two principal columns, which are
described in Table 1:

Table 1. Description of the used dataset

Field   Signification
Text    The text of the posted review
Label   The target variable or the class (positive/negative), where positive expresses a
        positive review about the movie and negative expresses a negative review

5.2 Programming Tools

Python: Python is currently one of the most popular languages for scientific appli-
cations. It has a high-level, interactive nature and a rich collection of scientific libraries,
which makes it a good choice for algorithmic development and exploratory data analysis.
It is increasingly used in academic establishments and also in industry. A famous
companion library, scikit-learn, integrates a large number of ML algorithms for
supervised and unsupervised problems.
Tensorflow: TensorFlow is a multipurpose open-source library for numerical com-
putation using data flow graphs. It offers APIs for beginners and experts to develop for
desktop, mobile, web, and cloud. TensorFlow can be used from many programming
languages such as Python, C++, Java, Scala, R, and Runs on a variety of platforms
including Unix, Windows, iOS, Android.
Keras: Keras is the official high-level API of TensorFlow. It is a minimalist, highly
modular neural network library written in Python, capable of running on top of either
TensorFlow or Theano. It enjoys large adoption in the industry and research community,
makes the production of models easy, supports both convolutional networks and
recurrent networks as well as combinations of the two, and runs seamlessly on CPU
and GPU.

5.3 Evaluation
To validate the different ML algorithms and obtain the best model, we have used the
cross-validation method, which consists of splitting our dataset into 10 parts, training
on 9, testing on 1, and repeating for all combinations of train/test splits (a minimal
code sketch follows this list). For the DNN model, we have used two parameters: the
loss value and the accuracy metric.
1. Accuracy Metric: the number of correctly predicted instances divided by the total
number of instances in the dataset, multiplied by 100 to give a percentage (e.g.,
90% accurate).
2. Loss Value: used to optimize an ML algorithm or DL model. It must be calculated
on the training and validation datasets. Its simple interpretation is based on how well
the ML algorithm or the built DL model is doing on these two datasets. It gives the
sum of errors made for each example in the training or validation set.
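A minimal sketch of this 10-fold cross-validation with scikit-learn, under the
assumption that X and y are the TF-IDF features and labels from the earlier sketches,
could be:

```python
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

models = {
    "LR": LogisticRegression(max_iter=1000),
    "NB": GaussianNB(),          # GaussianNB needs a dense matrix
    "SVM": SVC(),
}

# k = 10 folds, as stated in step 7 of the process
kfold = KFold(n_splits=10, shuffle=True, random_state=42)
for name, clf in models.items():
    scores = cross_val_score(clf, X.toarray(), y, cv=kfold, scoring="accuracy")
    print(f"{name}: mean accuracy = {scores.mean():.4f}")
```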

5.4 The Obtained Results


As explained in Sect. 3, after applying some preprocessing tasks to prepare the texts,
we proceeded to the development of our sentiment analysis model using the classical ML
and the DL approaches. In the present section, we illustrate the results obtained when
executing the training and testing steps of our model for the two approaches
(Table 2).

Table 2. The accuracy average after applying different ML algorithms

Algorithm  Accuracy (BIN coding)  Accuracy (TF-IDF coding)
LR         54.53%                 59.74%
NB         54.36%                 57.34%
SVM        54.34%                 60.40%

Fig. 3. Binary coding vs TF-IDF coding (accuracy bar chart over LR, LDA, KNN, CART, NB,
and SVM)

The following Table 3 summarizes the obtained results when applying the DNN
model:

Table 3. Loss and accuracy values obtained when applying the proposed DNN model

              BIN coding                TF-IDF coding
Training set  Loss: 0.048; Acc: 98.69%  Loss: 0.043; Acc: 98.84%
Test set      Loss: 0.3827; Acc: 100%   Loss: 0.1082; Acc: 100%

Fig. 4. Training loss vs validation loss of the DNN model (a. binary coding; b. TF-IDF coding)

Fig. 5. Training accuracy vs validation accuracy of the DNN model (a. binary coding;
b. TF-IDF coding)

6 Discussion

The main advantage of our designed SA model is that it combines the classical ML and
the DL approaches. In the first stage, after downloading the IMDB dataset and per-
forming some preprocessing tasks on it, we applied some ML algorithms, including LR,
NB, and SVM. Table 2 summarizes the results obtained when applying the ML algorithms
for the two coding models, the binary model (Col 2) and the TF-IDF model (Col 3). We
observe the following: (a) the obtained accuracy is not high for any ML algorithm
(relatively low, <70%); (b) the accuracy is low for both coding models, but there is a
small improvement when using the TF-IDF coding model, as shown in Table 2 and
Fig. 3. In the second stage, we applied the DL approach through the new DNN (Fig. 2)
on the training and the test datasets using the two coding models. Two performance
measures are considered in this case: the loss value, which computes the sum of errors
after training the model, and the accuracy value, which gives the rate of correctness. For
a performant model, the loss value must be very low and the accuracy value must be
very high, which is the case for our proposed model (>98%). We also observe that
there is no clear improvement when changing the coding model from the binary model
to the TF-IDF model. Figures 4a and 4b show the evolution of the training loss and the
validation loss over time, in terms of the number of epochs (epochs = 15; batch
size = 32), for the two coding models. We observe that the value of the loss function for
both coding models is similar, and no clear improvement is marked here. Similarly,
Figs. 5a and 5b plot the evolution of training accuracy and validation accuracy in
terms of the number of epochs. For the two coding models, contrary to the loss
function, the accuracy starts relatively low and ends very high (>98%). The accuracy
for the binary model and the TF-IDF model is approximately the same and no
clear improvement is marked, i.e., the coding model does not influence the performance
of the DNN model.

7 Comparison Between ML and DL Approaches

We concluded our study by establishing a comparison between the ML and DL approa-
ches. This comparison shows that the performance of the DNN model in terms of
accuracy is always higher, whatever ML algorithm it is compared with, as shown
in Table 4.

Table 4. Comparison between ML and DL approaches

Algorithm  Accuracy (BIN coding)  Accuracy (TF-IDF coding)
LR         54.53%                 59.74%
NB         54.36%                 57.34%
SVM        54.34%                 60.40%
DNN Model  100%                   100%

8 Conclusion and Future Suggestions

In the present paper, we presented the different approaches used in the sentiment
analysis field, especially the DL approach. We illustrated the well-known studies and
research done in this interesting area. In the experimental part, we used two coding
models, binary and TF-IDF, to transform the input data into numerical values before
providing it to the DL models, together with the well-known IMDB dataset. We also
presented the detailed architecture of the proposed DNN by giving the different layers
and the number of neurons in each layer. We conducted many experiments to evaluate
the different ML algorithms as well as the proposed DNN model on the dataset. The
experiments performed here show that our model gives high accuracy for sentiment
analysis detection, which is not the case when applying the classical ML algorithms.
As perspectives for this work, we will focus our future studies on the following:
• Using other coding models, such as word embeddings (Word2Vec) and n-grams.
• Exploiting other DL models, notably: CNN, RNN, LSTM, and other hybrid models
combined with: TF, TF-IDF, word embedding, and n-grams techniques to improve
the obtained accuracy.
• We also plan to use other datasets and establish a wide comparison between them.

References
1. Dang, N.C., Moreno-García, M.N., De la Prieta, F.: Sentiment analysis based on deep
learning: a comparative study. Electronics 9, 483 (2020). https://doi.org/10.3390/
electronics9030483. www.mdpi.com/journal/electronics
2. Yang, L., Li, Y., Wang, J., Sherratt, R.S.: Sentiment analysis for e-commerce product
reviews in Chinese based on sentiment lexicon and deep learning. IEEE Access 8 (2020).
https://doi.org/10.1109/ACCESS.2020.2969854
3. Intelligent Computing & Optimization, Conference Proceedings ICO 2018, Springer, Cham.
https://doi.org/10.1007/978-3-030-00979-3. https://www.springer.com/gp/book/978303000
9786. ISBN 978-3-030-00978-6
4. Intelligent Computing and Optimization, Proceedings of the 2nd International Conference on
Intelligent Computing and Optimization 2019 (ICO 2019), Springer International Publish-
ing, ISBN 978-3-030-33585-4. https://www.springer.com/gp/book/9783030335847
5. Preethi, G., Krishna, P.V., Obaidat, M.S., Saritha, V., Yenduri, S.: Application of deep
learning to sentiment analysis for recommender system on cloud. In: Proceedings of the
2017 International Conference on Computer, Information and Telecommunication Systems
(CITS), Dalian, China, 21–23, 2017, pp. 93–97 (2017)
6. Soni, S., Shara, A.: Sentiment analysis of customer reviews based on hidden Markov model.
In: Proceedings of the 2015 International Conference on Advanced Research in Computer
Science Engineering & Technology (ICARCSET 2015), Unnao, India, 6 March 2015, pp. 1–
5 (2015)
7. Pinto, D., McCallum, A., Wei, X., Croft, W.B.: Table extraction using conditional random
fields. In: Proceedings of the 26th International ACM SIGIR Conference on Research and
Development in Informaion Retrieval, Toronto, Canada, 28 July–1 August 2003, pp. 235–
242 (2003)
8. Zhang, X., Zheng, X.: Comparison of text sentiment analysis based on machine learning. In:
Proceedings of the 15th International Symposium on Parallel and Distributed Computing
(ISPDC), Fuzhou, China, 8–10 July 2016, pp. 230–233 (2016)
9. Malik, V., Kumar, A.: Sentiment analysis of Twitter data using Naive Bayes
algorithm. Int. J. Recent Innov. Trends Comput. Commun. 6, 120–125 (2018)
10. Firmino Alves, A.L., Baptista, C.d.S., Firmino, A.A., Oliveira, M.G.d., Paiva, A.C.D.: A
comparison of SVM versus Naive-Bayes techniques for sentiment analysis in tweets: a case
study with the 2013 FIFA confederations cup. In: Proceedings of the 20th Brazilian
Symposium on Multimedia and the Web, João Pessoa, Brazil, 18–21 November 2014,
pp. 123–130 (2014)
11. Szegedy, C., et al.: Going deeper with convolutions. In: Computer Vision and Pattern
Recognition, pp. 1–9 (2015)
12. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In:
Computer Vision and Pattern Recognition, pp. 770–778 (2016)
13. Girshick, R.: Fast R-CNN. In: IEEE International Conference on Computer Vision,
pp. 1440–1448 (2015)
14. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional
neural networks. In: International Conference on Neural Information Proceedings Systems,
pp. 1097–1105 (2012)
15. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image
recognition. Comput. Sci. (2014)
16. Ren, S., Girshick, R., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection
with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149
(2017)
17. Dai, J., Li, Y., He, K., Sun, J.: R-FCN: object detection via region-based fully convolutional
networks (2016)
18. Tu, Y., Lin, Y., Wang, J., Kim, J.-U.: Semi-supervised learning with generative adversarial
networks on digital signal modulation classification. Comput. Mater. Continua 55(2),
243–254 (2018)
19. Wang, J., Gao, Y., Liu, W., Sangaiah, A.K., Kim, H.-J.: Energy efficient routing algorithm
with mobile sink support for wireless sensor networks. Sensors 19(7), 1494 (2019)
20. Wang, J., Gao, Y., Wang, K., Sangaiah, A.K., Lim, S.-J.: An affinity propagation-based self-
adaptive clustering method for wireless sensor networks. Sensors 19(11), 2579 (2019)
21. Tang, Z., Ding, X., Zhong, Y., Yang, L., Li, K.: A self-adaptive Bell-LaPadula model based
on model training with historical access logs. IEEE Trans. Inf. Forensics Security 13(8),
2047–2061 (2018)
22. Gao, Y., Rong, W., Shen, Y., Xiong, Z.: Convolutional neural network based sentiment
analysis using Adaboost combination. In: Proceedings of the 2016 International Joint
Conference on Neural Networks (IJCNN), Vancouver, BC, Canada, 24–29 July 2016,
pp. 1333–1338 (2016)
23. Singhal, P., Bhattacharyya, P.: Sentiment analysis and deep learning: a survey. Center for
Indian Language Technology, Indian Institute of Technology: Bombay, Indian (2016)
24. Al-Sallab, A., Baly, R., Hajj, H., Shaban, K.B., El-Hajj, W., Badaro, G.: Aroma: a recursive
deep learning model for opinion mining in Arabic as a low resource language. ACM Trans.
Asian Low-Resour. Lang. Inf. Process. TALLIP 16, 1–20 (2017)
25. Jangid, H., Singhal, S., Shah, R.R., Zimmermann, R.: Aspect-based financial sentiment
analysis using deep learning. In: Proceedings of the Companion of the the Web Conference
2018 on The Web Conference, Lyon, France, 23–27 April 2018, pp. 1961–1966 (2018)
26. Zhang, L., Wang, S., Liu, B.: Deep learning for sentiment analysis: a survey. WIREs Data
Min. Knowl. Discov. 8, e1253 (2018)
27. Wu, C., Wu, F., Wu, S., Yuan, Z., Liu, J., Huang, Y.: Semi-supervised dimensional
sentiment analysis with variational autoencoder. Knowl. Based Syst. 165, 30–39 (2019)
28. Wang, Y., Wang, M., Xu, W.: A sentiment-enhanced hybrid recommender system for movie
recommendation: a big data analytics framework. Wire. Commun. Mob. Comput. (2018)
29. Gupta, U., Chatterjee, A., Srikanth, R., Agrawal, P.: A sentiment-and-semantics-based
approach for emotion detection in textual conversations. arXiv 2017, arXiv:1707.06996
30. Salas-Zárate, M.P., Medina-Moreira, J., Lagos-Ortiz, K., Luna-Aveiga, H., Rodriguez-
Garcia, M.A., Valencia-García, R.J.C.: Sentiment analysis on tweets about diabetes: an
aspect-level approach. Comput. Math. Methods Med. 2017 (2017)
31. Sharef, N.M., Zin, H.M., Nadali, S.: Overview and future opportunities of sentiment analysis
approaches for big data. JCS 12, 153–168 (2016)
32. Sohangir, S., Wang, D., Pomeranets, A., Khoshgoftaar, T.M.: Big data: deep learning for
financial sentiment analysis. J. Big Data 5(1), 1–25 (2018). https://doi.org/10.1186/s40537-
017-0111-6
Improved Face Detection System

Ratna Chakma, Juel Sikder(&), and Utpol Kanti Das

Department of Computer Science and Engineering, Rangamati Science
and Technology University, Rangamati, Bangladesh

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 248–257, 2022.
https://doi.org/10.1007/978-3-030-93247-3_25

Abstract. Biometric applications have been using face detection approaches
for security purposes such as human–crowd surveillance, many security-related
areas, and computer interaction. It is a crucial arena of recent research because
there is no fixed system to find the faces in a test image. Face detection is
challenging due to varying illumination conditions, pose variations, the com-
plexity of noises, and image backgrounds. In this research, we present a system
that can detect and recognize faces using different pre-processing techniques;
the Viola-Jones process, combined with Haar cascades, GLCM & Gabor filter
features, and a Support Vector Machine (SVM), is proposed for gaining a better
accuracy level in the detection of facial portions and the recognition of faces.
The proposed system has achieved better results than other face detection and
recognition systems. The experiments were done in a MATLAB environment
on different images of the FEI Face, Georgia Tech face, Faces95, Faces96, and
MIT-CBCL databases. The experimental results achieve detection of faces with
a reasonable average accuracy rate of 98.32%.

Keywords: Face detection · Face recognition · Viola-Jones · SVM

1 Introduction

The digital image processing field addresses face-related problems as multidimensional
ones in order to obtain better solutions, particularly with the hosting of various pictures
on websites such as Picasa, Facebook, Twitter, Photo Bucket, etc. Supervised learning-
based detection is one of the more helpful techniques today [1]. Existing systems have
attracted different fields for deployment in real-world conditions, for example, building
safety, checking people's identities, trapping criminals, etc. As reported for the databases
used in our proposed methodology [2–6], several factors, such as head scale and outlook
variation, decoration variation, changing impulse, brightness variation, pose changes,
environment variation, degree variation, accuracy and error problems, and expression
variation, make face detection and face recognition systems fail to perform successfully.
The Viola-Jones family of object detection algorithms, using techniques such as the
integral image, cascades, contrast stretching, GLCM, the Gabor filter, and an SVM
classifier, has shown better face recognition performance than previous techniques. In
this study, facial part detection with the Viola-Jones process, combined with dimension
reduction through GLCM & Gabor filters, a Support Vector Machine (SVM), and the
Haar cascade machine learning method, is suggested to bring modern outcomes beyond
the limitations of previous work. This paper is divided into five sections: Section 1
presents an introduction with a short
description. Section 2 reviews past work in the face detection and recognition areas.
Section 3 summarizes the suggested improved face detection and recognition
process. The system results are presented and analyzed in Sect. 4. Finally,
Sect. 5 offers conclusions and future research plans.

2 Literature Review

The investigators in [7] proposed a face identification process utilizing
Principal Component Analysis (PCA), Gabor filters, and Artificial Neural Networks
(ANN), combined with Viola and Jones, on the CMU database of images containing
different styles, exposures, and revelations; it was able to reduce false acceptance
errors in detection and obtained an accuracy of 90.31%, improving on past work.
The exploration in [8] introduced a training process for fast execution and validity
using the FDDB dataset, in which the training images for detecting facial features
were divided between two separate networks, and the outcome was obtained
utilizing a Convolutional Neural Network (CNN). The authors of [9] proposed another
face detection system utilizing Viola-Jones, which achieved 92% accuracy on the Bao
database when detecting facial portions across various expressions. In [10], the
researchers suggested a face recognition system using machine learning methods on
the ORL database, where PCA and LDA presented 97% and 100% exactness on two
subsets of aspects. In the research of [11], PCA and 2DPCA were applied to evaluate
face recognition performance using the ORL and YALE databases; 2DPCA is more
powerful than PCA at extracting face characteristics while achieving faster com-
putational time. A hash-type system for human face recognition driven by the
quintet triple binary pattern (QTBP) was proposed in [12]. Utilizing a combination of
SVM and KNN, it attained adequate recognition performance on the AT&T, Face94,
CIE, AR, and LFW databases compared with older works, while minimizing the high
computational complexity time.

3 Methodology

The block diagram of the proposed system is shown in Fig. 1.

3.1 Input Image


First, facial test images are taken from the FEI Face, Georgia Tech face, Faces95,
Faces96, and MIT-CBCL databases. The image size is 640 × 480 pixels for the FEI
face database and the Georgia Tech face database, 180 × 200 pixels for the Faces95
database, 196 × 196 pixels for the Faces96 database, and of varying dimensions for the
MIT-CBCL face database. These databases are used for the evaluation of the proposed idea.

Fig. 1. Block diagram of the proposed system (input image → contrast stretching → extract
facial part → extract features → SVM classifier with feature database → identified face)

3.2 Contrast Stretching


Sometimes input face images may contain noise and may be blurred in some regions
because of technical errors while capturing the image. So, to remove that unwanted
noise and those blurred portions, the system uses partial contrast stretching [13]. By
optimal use of available colors and other characteristics, partial contrast stretching
strengthens the input image as much as possible. It intensifies the visual outlook,
ameliorates the signal-to-noise proportion, smoothes the region’s inner portion while
preserving its boundaries, and eradicates the noise and undesired parts not relevant to
the intended bit [14].

3.3 Extract Facial Part


Three strategies are considered for detecting human face portions utilizing the Viola-
Jones algorithm [9]: (1) an integral image is used with Haar-like features to extract
the rectangular-shaped facial features, (2) a machine learning strategy, the AdaBoost
algorithm, is used to select a subset of structures among all accessible structures,
and (3) numerous features are capably combined in the cascade classifier, which is
built on the results of the various filters. The facial parts detection
process is shown in Fig. 2.

Fig. 2. Detection of facial parts process

Here, the original image is taken from one of the databases and processed by the face
detection algorithm to detect the human face under various poses. Face parts are then
searched in the detected image by the object detection algorithm. The nose detection
process encodes the image with Haar features arranged in weak classifiers; the detected
nose region is cropped and marked with a bounding box. The mouth detection process
[15] likewise encodes the mouth region with weak classifiers that detect the mouth
using Haar features. The eye-detection process is carried out by the left-eye and
right-eye detectors using the Viola-Jones process, searching the eye areas of a face to
detect the left eye and the right eye [9]. The detection of the face plays an important
role using the cascade classifier [9]. For every resulting attribute, the number of black
pixels is divided by the number of white pixels. Haar features, which are similar to
rectangle features, are used to detect human faces quickly, as follows:

q(z) = Number of black rectangles / Number of white rectangles    (1)

Machine learning uses the AdaBoost algorithm, which builds a robust classifier as a
linear combination of less powerful (weak) classifiers, as displayed in Eq. (2).

S(z) = n₁S₁(z) + n₂S₂(z) + …    (2)

Each image from the human face databases is scanned at every location using the
cascade process; a region that passes every stage is considered a face, and otherwise
the region is rejected as a non-face area.
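For illustration only, the same Viola-Jones detection with pretrained Haar cascades
can be sketched in Python with OpenCV; the paper's own experiments were run in
MATLAB, and the image file name here is hypothetical.

```python
import cv2

# OpenCV ships pretrained Haar cascade models for faces and facial parts
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")

img = cv2.imread("subject01.jpg")              # hypothetical test image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# The cascade scans the image at multiple scales and positions
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)
    roi = gray[y:y + h, x:x + w]               # search facial parts inside the face
    for (ex, ey, ew, eh) in eye_cascade.detectMultiScale(roi):
        cv2.rectangle(img, (x + ex, y + ey), (x + ex + ew, y + ey + eh),
                      (0, 255, 0), 2)

cv2.imwrite("detected.jpg", img)
```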

3.4 Extract Features


To extract feature vectors from the detected facial parts, GLCM & Gabor filters are used.
The GLCM functions represent the texture features of an image. After detecting the facial
characteristics, the significant features are extracted. The extracted features are used to
determine the meaning of a given trial. The system used the following methods for
feature extraction. Textures are features whose focus is on the distinct pixels that
create an image. The proposed method divides texture features into mainly two types:
signal processing and statistical. The statistical type contains GLCM, grey-level
histogram, run-length matrices, and auto-correlation features for texture extraction.
GLCM procedures extract 2nd-order statistical texture features [17]. Textural features
include entropy, correlation, contrast, homogeneity, and energy. Texture is par-
ticularly appropriate for this study because of its properties. The system also used
Gabor filter features. The Gabor filters are parameterized by standard deviation,
orientation, and the radial center frequency [18]. The method combined the GLCM
and Gabor filters into one feature set. This combination of the Gabor filters and the
GLCM features generates a better outcome on the face dataset.
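A minimal Python sketch of this combined GLCM + Gabor feature set, assuming a 2-D
uint8 grayscale facial-part patch, could use scikit-image (in older versions the GLCM
functions are spelled greycomatrix/greycoprops); the distance, angle, and frequency
settings below are assumptions.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from skimage.filters import gabor

def extract_features(patch):
    # GLCM: 2nd-order statistical texture features at four orientations
    glcm = graycomatrix(patch, distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=256, symmetric=True, normed=True)
    glcm_feats = [graycoprops(glcm, prop).mean()
                  for prop in ("contrast", "correlation", "energy", "homogeneity")]

    # Gabor responses at four orientations; frequency 0.3 is an assumption
    gabor_feats = []
    for theta in (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4):
        real, _ = gabor(patch, frequency=0.3, theta=theta)
        gabor_feats += [real.mean(), real.var()]

    # One combined feature vector, as described in the text
    return np.array(glcm_feats + gabor_feats)
```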

3.5 SVM Classifier


The proposed system used SVM to identify the face. The extracted features of all
database faces are stocked in the feature database. The SVM classifier calculates the
feature number of database images and the feature value of the input facial parts; based
on these values; the classifier will separate the input image from feature databases. The
Support Vector Machine is supplied as a separator. SVM will compare the test sample
feature set to all training samples and select the shortest distance. Support vector
machines were initially calculated for binary classification. The SVM is learned by
features given as an input to its training procedure. During training, the SVM identifies
the appropriate boundaries in the feature databases. Using received features, the SVM
starts classification steps. The SVM classifier classifies the input test face using
extracted features compared with the feature database and identifies the desired person
from the face database [19, 20].
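As an illustration of this classification stage, the following sketch assumes
extract_features() from the previous sketch and a hypothetical list train_pairs of
(patch, person_id) training examples; it is not the authors' MATLAB implementation.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Build the feature database from the training faces
X_train = [extract_features(patch) for patch, _ in train_pairs]
y_train = [person_id for _, person_id in train_pairs]

# scikit-learn's SVC handles the multiclass case via one-vs-one internally
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X_train, y_train)

# Identify the person on a new facial-part patch (test_patch is hypothetical)
person = clf.predict([extract_features(test_patch)])[0]
```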

4 Result and Analysis

The FEI face database [2] has 200 subjects with 14 image poses per subject, for a total
of 2800 images. Every image has a resolution of 640 × 480 pixels and presents a
bright, frontal, upright view against the same white background, with rotation up to
180 degrees, scale variation of about 10%, outlook variation, hairstyle changes, and
decoration variation. The Georgia Tech face database [3] has 50 individuals with
15 RGB image poses per subject. Its attributes include brightness variation, different
facial expressions, outlook variation, changing impulse, and scale. The Faces95
database [4] has 72 persons with an image resolution of 180 × 200 pixels. Its features
include a red screen background and changes of the frontal image position caused by
lighting. The Faces96 database [5] has 152 persons with an image resolution of
196 × 196 pixels. The MIT-CBCL face database [6] contains ten persons. Its
characteristics are the frontal position of the face, high-quality image pixels, brightness
variation, pose changes, environment variation, and rotation up to 30 degrees. We have
utilized 120, 40, 30, 41, and 10 testing images (taking 11, 11, 11, 11, and 4 poses per
person) and 1320, 440, 330, 451, and 40 training images for our research for the FEI
face database, the Georgia Tech face database, the Faces95 database, the Faces96
database, and the MIT-CBCL face database, respectively.
Table 1. Comparative analysis of other methods to our proposed method

Ref.      Methodological approach                                Database                 Performance (%)
[7]       Viola and Jones method, Artificial Neural Networks     CMU (Carnegie Mellon     90.31%
          (ANN), Gabor filters, Principal Component Analysis     University) database
          (PCA)
[8]       Convolutional Neural Network (CNN)                     FDDB database            88.9%
[9]       Viola-Jones algorithm                                  Bao database             92%
[10]      Machine learning: PCA, Linear Discriminant Analysis    ORL database             97%
          (LDA), Support Vector Machine, Naïve Bayes,
          Multilayer Perceptron, PCA + LDA (Configuration B)
[21]      Principal Component Analysis (PCA), Eigenface          ORL database             92.5%
                                                                 Face94 database          92.10%
[22]      Support Vector Machine (SVM), Multiclass SVM,          YALE database            84%
          Principal Component Analysis (PCA)
[23]      Deep learning, Convolutional Neural Network (CNN),     ORL database             91%
          SIAMESE network                                        LFW database             81%
Proposed  Viola-Jones algorithm with Haar cascade, GLCM &        FEI Face, Georgia Tech   98.32%
method    Gabor filter and Support Vector Machine (SVM)          face, Faces95, Faces96
                                                                 and MIT-CBCL

Detected and recognized images in different poses for the FEI face database, the
Georgia Tech face database, and the MIT-CBCL face database are shown in Fig. 3,
Fig. 4, and Fig. 5, respectively. In this research, facial parts detection and recognition
have been done with the FEI Face, Georgia Tech face, Faces95, Faces96, and
MIT-CBCL face databases, whose descriptions have been given in Table 1.

Fig. 3. Experimental Result-1 (FEI Face Database)



Fig. 4. Experimental Result-2 (Georgia Tech Face Database)

Fig. 5. Experimental Result-3 (MIT-CBCL Face Database)

The accuracy for detection and recognition is found with the simple equation shown
below; the results are given in Table 2:

Accuracy Rate = 100 − (FAR + FRR) / 2    (3)

Table 2. Detection and recognition performance

Database           False reject rate (%)  False accept rate (%)  Accuracy rate (%)
FEI Face           2.14                   1.13                   98.37
Georgia Tech face  2.18                   2.11                   97.86
Faces95            3.41                   0.00                   98.30
Faces96            4.72                   0.00                   97.64
MIT-CBCL face      1.11                   0.00                   99.45
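As a quick arithmetic check of Eq. (3), the FEI Face row of Table 2 can be reproduced
directly; this snippet is only an illustration, not part of the original system:

```python
# Accuracy Rate = 100 - (FAR + FRR) / 2, applied to the FEI Face row of Table 2
far, frr = 1.13, 2.14
accuracy = 100 - (far + frr) / 2
print(accuracy)   # 98.365, reported as 98.37 in Table 2
```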

The execution of the detection and recognition rate in each database is shown
in Fig. 6.

Fig. 6. Accuracy graph

5 Conclusion and Future Works

In this research, the proposed improved face detection system has been tested on the
FEI Face, Georgia Tech face, Faces95, Faces96, and MIT-CBCL databases using
pre-processing techniques, the Viola-Jones algorithm with the Haar cascade machine
learning method, GLCM & Gabor filters, and a Support Vector Machine (SVM). The
experimental results achieve detection of facial portions and recognition of faces across
various human expressions with reasonable accuracy rates of 98.37%, 97.86%, 98.30%,
97.64%, and 99.45% for the FEI Face, Georgia Tech face, Faces95, Faces96, and MIT-
CBCL databases, respectively. We plan to introduce better segmentation techniques,
better feature extraction methods, and classification algorithms in future work. We will
also apply our approach in other domains of interest, such as detecting facial parts in
streamed videos with complex backgrounds, medical image analysis, and satellite
image analysis.

References
1. Sikder, J., Das, U.K., Chakma, R.J.: Supervised learning-based cancer detection. Int. J. Adv.
Comput. Sci. Appl. (IJACSA) 12(5) (2021). https://doi.org/10.14569/IJACSA.2021.0120
5101
2. https://fei.edu.br/cet/facedatabase.html
3. http://www.anefian.com/research/facereco.htm

4. Libor Spacek’s Facial Images Databases. http://cmp.felk.cvut.cz/spacelib/faces/faces95.html


5. Libor Spacek’s Facial Images Databases. http://cmp.felk.cvut.cz/spacelib/faces/faces96.html
6. http://cbcl.mit.edu/software-datasets/heisele/facerecognition-database.html
7. Da'San, M., Alqudah, A., Debeir, O.: Face detection using Viola and Jones method and
neural networks. In: 2015 International Conference on Information and Communication
Technology Research (ICTRC 2015). IEEE (2015)
8. Triantafyllidou, D., Tefas, A.: Face detection based on deep convolutional neural networks
exploiting incremental facial part learning. In: 23rd International Conference on Pattern
Recognition (ICPR), Cancun Center, Cancun, Mexico, 4–8 December 2016 (2016)
9. Vikram, K., Padmavathi, S.: Facial parts detection using Viola Jones algorithm. In: 2017
International Conference on Advanced Computing and Communication Systems (ICACCS
2015), Coimbatore, India, 06–07 January 2017 (2017)
10. Sharma, S., Bhatt, M., Sharma, P.: Face recognition system using machine learning
algorithm. In: Proceedings of the Fifth International Conference on Communication and
Electronics Systems (ICCES 2020), IEEE Conference Record # 48766; IEEE Xplore (2020).
ISBN 978-1-7281-5371-1
11. Dandpat, S.K., Meher, S.: Performance improvement for face recognition using PCA and
two-dimensional PCA. In: 2013 International Conference on Computer Communication and
Informatics (ICCCI 2013), Coimbatore, India, 04–06 January 2013 (2013)
12. Tuncer, T., Dogan, S., Abdar, M., Pławiak, P.: A novel facial image recognition method
based on perceptual hash using quintet triple binary pattern. Multimedia Tools Appl. 79(39),
29573–29593 (2020)
13. Das, U.K., Sikder, J., Salma, U., Anwar, A.S.: Intelligent cancer detection system. In: 2021
International Conference on Intelligent Technologies (CONIT), pp. 1–6. IEEE, June 2021
14. Sikder, J., Das, U.K., Anwar, A.M.S.: Cancer cell segmentation based on unsupervised
clustering and deep learning. In: Vasant, P., Zelinka, I., Weber, G.-W. (eds.) ICO 2020. AISC,
vol. 1324, pp. 607–620. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-68154-8_53
15. Sikder, J., Chakma, R., Chakma, R.J., Das, U.K.: Intelligent face detection and recognition
system. In: 2021 International Conference on Intelligent Technologies (CONIT), pp. 1–5.
IEEE, June 2021
16. El Maghraby, A., Abdalla, M., Enany, O., El, M.Y.: Detect and analyze face parts
information using Viola-Jones and geometric approaches. Int. J. Comput. Appl. 101(3), 23–
28 (2014)
17. Mohanaiah, P., Sathyanarayana, P., GuruKumar, L.: Image texture feature extraction using
GLCM approach. Int. J. Sci. Res. Publ. 3(5), 1–5 (2013)
18. Li, W., Qian, D.: Gabor-filtering-based nearest regularized subspace for hyperspectral image
classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 7(4), 1012–1022 (2014)
19. Mahmud, T., Sikder, J., Chakma, R.J., Fardoush, J.: Fabric defect detection system. In:
Vasant, P., Zelinka, I., Weber, G.-W. (eds.) ICO 2020. AISC, vol. 1324, pp. 788–800.
Springer, Cham (2021). https://doi.org/10.1007/978-3-030-68154-8_68
20. Sikder, J., Sarek, K.I., Das, U.K.: Fish disease detection system: a case study of freshwater
fishes of Bangladesh. Int. J. Adv. Comput. Sci. Appl. (IJACSA) 12(6) (2021). https://doi.
org/10.14569/IJACSA.2021.01206100
21. Matin, A., Mahmud, F.: Recognition of an individual using the unique features of human
face. In: 2016 IEEE International WIE Conference on Electrical and Computer Engineering
(WIECON-ECE), AISSMS, Pune, India, 19–21 December 2016 (2016)

22. Sani, M.M., Ishak, K.A., Samad, S.A.: Evaluation of face recognition system using support
vector machine. In: Proceedings of 2009 IEEE Student Conference on Research and
Development (SCOReD 2009), UPM Serdang, Malaysia, 16–18 November 2009 (2009)
23. Wang, W., Yang, J., Xiao, J., Li, S., Zhou, D.: Face Recognition based on deep learning. In:
Zu, Q., Hu, Bo., Gu, N., Seng, S. (eds.) HCC 2014. LNCS, vol. 8944, pp. 812–820.
Springer, Cham (2015). https://doi.org/10.1007/978-3-319-15554-8_73
Paddy Price Prediction
in the South-Western Region
of Bangladesh

Juliet Polok Sarkar1(B), M. Raihan1, Avijit Biswas2,
Khandkar Asif Hossain1, Keya Sarder1, Nilanjana Majumder1,
Suriya Sultana1, and Kajal Sana1

1 North Western University, Khulna 9100, Bangladesh
2 Bangabandhu Sheikh Mujibur Rahman Science and Technology University,
Gopalganj 8100, Bangladesh

Abstract. In the current scenario, farmers are losing a lot of profit due
to price fluctuations caused by climatic change and other price influenc-
ing factors. Farmers are affected emotionally and financially as a result
of this. Price forecasting may aid the agriculture supply chain in mak-
ing critical decisions to reduce and manage the risk of price fluctua-
tions. Predictive analysis is supposed to solve the problems as a result of
reduced agricultural productivity due to uncertain climatic conditions,
global warming, and other factors. This research focuses on identifying
appropriate data models that aid in achieving high price forecast accu-
racy and generality. Our dataset’s class imbalance was reduced using
SMOTE. However, SMOTE was not particularly beneficial because our
dataset only comprised data from three districts in the Khulna division.
We used Linear Regression, a machine learning regression method,
to predict the price of the crop. To compare the prediction results, we
used a Neural Network. The data for forecasting paddy prices was origi-
nally collected by visiting local farmers in Bangladesh’s Khulna Division.
There are 154 instances in this dataset, each with its own set of 10 unique
attributes, such as division name, district name, sub-district name, mar-
ket name, pricing type, crop price, crop quantity, date, crop category,
and crop name. The models we built have an RMSE value of 114.48 and an
MAE value of 80.08 for Linear Regression, while for the Neural Network
we got a lowest RMSE value of 338.2241 and an MAE value of 293.1295.
Thus, it is concluded that the Linear Regression model performed better,
and there is still potential for improvement.

Keywords: Machine learning · Predictive analysis · Neural networks ·
Price forecasting · Price prediction · Linear regression · Prediction

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 258–267, 2022.
https://doi.org/10.1007/978-3-030-93247-3_26

1 Introduction
More than half of Bangladesh’s 15 million hectares of the total land is used for
agriculture. In our nation, agriculture is the most important economic pillar.
Agriculture is the primary source of income for the majority of families. Agricul-
ture accounts for the majority of the country’s gross growth. To meet the needs
of the country’s people, 60% of the land is used for agriculture. Modernization
of agricultural practices is expected to meet the requirements. As a result, the
farmers and the country’s economies are expected to expand. Prices have a big
influence on productivity, consumption, and government policies.
Prices influence farmers’ production decisions and consumers’ purchasing
decisions to a significant degree. Many variables, such as market support and
stabilization steps, product export and import, and so on, influence the price
of a commodity. Agricultural product prices are much more volatile than non-
agricultural products and services. Since food accounts for about 66% of con-
sumer spending, 45% of which is spent on rice (BBS, 1991), and rice covers 70%
of cropped land (BBS, 1986), the analysis of rice price is crucial for farmers,
merchants, buyers, and the government. Rice prices, which are influenced by
a variety of variables, are extremely difficult to predict. Aus, Aman, and Boro
are three of the most common crop varieties. Crops in those groups are grown
on various time schedules during the year. For Aus, Aman, and Boro, the cul-
tivation seasons are hot and humid (March to June), moist and rainy (July
to October), and cold and dry (November to December). Different districts in
Bangladesh have different climates, so environmental factors specific to these
areas must be considered. This will aid in the selection of the best districts for
the cultivation of various crops. This paper attempted to evaluate the changes in
crop prices over time, as well as their growth in location, yield, and output, and
to calculate the magnitude of annual price fluctuation and measure the degree of
volatility to identify the riskiness of selected crops in comparison to other crops.
The data produced could aid farmers in deciding how to best distribute
their limited resources among less risky crops. Many researchers have focused
on supply response (price-supply relationship) studies, i.e. actual price shifts
and relationships in corresponding areas, but there has never been any work on
price flexibility to understand how much the current quantity harvested affects
post-harvest prices in the markets. The principle of price flexibility is especially
important for agricultural products. Machine learning has become a vital pre-
diction technique in recent years thanks to the rising trend toward Big Data;
it can forecast prices more correctly based on their features, independent of the
previous year’s data. In this study, we tried to use machine learning and math-
ematical methods to predict paddy prices to help the farmers determine
the allowable cost of their crop production and plan accordingly.
In this study, data was acquired from local farmers of the south-western part of
Bangladesh, the Khulna division to be exact; therefore, this dataset is skewed. The
majority of real-world datasets are skewed. Researchers have suggested strategies
for managing unbalanced data at both the data and algorithmic levels, however,
the SMOTE methodology utilized in studies has proved to perform better in
the literature. We presented a logistic regression model based on the SMOTE
dataset rebalancing method and other approaches in this research. SMOTE was
employed to correct for class imbalance in our dataset. Considering our dataset
only included data from three districts throughout the Khulna division, SMOTE
proved ineffective, and the logistic regression model could not achieve high accu-
racy. So instead of utilizing SMOTE, We utilized Linear Regression and a Neural
Network model. Our model gives a low mean absolute error but a high root mean
squared error value; thus, the accuracy of the price prediction is somewhat
mediocre, but we plan on improving the model in the future. We obtained the
lowest Root Mean Squared Error value 338.2241 and the lowest Mean Absolute
Error value 293.1295 using Neural Network and the lowest Root Mean Squared
Error value 114.48, Mean Absolute Error value 80.08, Median Absolute Error
value 41.87, explained variance score value 0.92, and R2 score value 0.92 using
Linear Regression.
The remainder of the paper is laid out as follows. A brief review of recent
applications of analytics and paddy price forecasting is provided in Sect. 2. The
proposed techniques are discussed in Sect. 3. The experimental setup and findings
are presented in Sect. 4, and the paper is concluded in Sect. 5.

2 Related Works

Machine learning and prediction algorithms such as Logistic Regression, Decision


Trees, XGBoost, Neural Nets, and Clustering were used to identify and process
the pattern among data to predict the crop’s target price. When compared to
all other algorithms, it was found that XGBoost predicts the target better [1].
Rachana et al. proposed a forecasting model based on machine learning tech-
niques to predict crop price using the Naive Bayes Algorithm and crop benefit
using the K Nearest Neighbour technique. The assumptions are separate from
and unrelated to other factors that can be used to forecast prices [2]. R Man-
jula et al. have suggested using machine learning technology to forecast crop
prices. It briefly discussed how to use four algorithms: SVM (Support Vector
Machine), MLR (Multiple Linear Regression), Neural Network, and Bayesian
Network, as well as some examples of how they’ve been used in the past. The
dataset consisted of 21,000 homes, which were split into training and testing data
in an 80:20 ratio. They discovered that a linear model has a high bias (underfit),
while a model with a high model complexity has a high variance (overfit). As a
consequence of integrating the above models, the desired result can be obtained
[3]. A group of academics created a hybrid model that combines the effects of
multiple linear regression (MLR), an auto-regressive integrated moving average
(ARIMA), and Holt-Winters models for better forecasts. The suggested app-
roach is tested for the Iberian power market data collection by forecasting the
hourly day-ahead spot price with dataset periods of 7, 14, 30, 90, and 180 days.
The results reveal that the hybrid model beats the benchmark models and deliv-
ers promising outcomes in the vast majority of research settings [4]. Similarly,

another research group proposed a predictive model using three different types
of Machine Learning models, namely Random Forest, XGBoost, and LightGBM,
as well as two machine learning techniques, Hybrid Regression and Stacked Gen-
eralization Regression, to find the best solutions. They used the “Housing Price
in Beijing” dataset, which contains over 300,000 data points and 26 variables
that reflect housing prices exchanged between 2009 and 2018. These variables,
which acted as dataset features, were then used to forecast each house’s average
price per square meter [5]. Rohith used machine learning techniques and the sup-
port vector regression Algorithm to perform a study to determine crop price. A
decision tree regression machine-learning regression technique was implemented,
in which features of an object are observed and a model is trained in the struc-
ture of a tree to predict future data and generate meaningful continuous output
[6]. Furthermore, a group of researchers proposed prediction models based on
time-series and machine learning architectures such as SARIMA, HoltWinter’s
Seasonal method, and LSTM and analyzed their operation using RMSE value.
They found that the LSTM model turned out to achieve the best performance
[7]. A prediction model integrated with the fuzzy information granulation, MEA,
and SVM was proposed by another research team. It was concluded that the
MEA-SVM model obtained greater prediction accuracy [8]. Helin Yin et al. pro-
posed a hybrid STL-ATTLSTM model alongside two benchmark models,
the STL and LSTM mechanisms, and compared their prediction performances.
They discovered that the best performance was gained by the STL-ATTLSTM
model [9]. Likewise, a deep neural network method was proposed which later
was compared with a linear regression model and a traditional artificial neural
network and their performances were evaluated and inaugurated by Gregory D.
Merkel et al. They observed that the proposed DNN surpassed the ANN by
9.83% WMAPE [10].

3 Methodology
Figure 1 depicts the overall workflow of our research. Our research has been
divided into four categories:
– Information gathering
– Synthetic Minority Oversampling Technique
– Data Mining for Exploratory Purposes
– Relationships: Linear vs. Non-Linear

3.1 Information Gathering


We visited local farmers in Bangladesh’s Khulna Division to collect paddy price
data for the forecast. There are a total of 154 instances in this dataset, each
with its own set of 10 unique features such as division name, district name,
sub-district name, market name, price type, crop price, date, and crop id. Our
dataset contains basic information on various types of paddy prices in the Khulna
Divisions’ different sub-districts and districts.

Fig. 1. Work-flow of the study (flowchart: Start → Import Dataset with 8 Features → Data Processing → Dataset Training → Neural Network / Linear Regression → Calculate MSE → Compare Performance → End)

3.2 Synthetic Minority Oversampling Technique (SMOTE)

SMOTE is an oversampling approach introduced by [11] to avoid a drop in classifier performance due to dataset class imbalance. In contrast to typical over-sampling approaches, SMOTE produces new instances from minor classes "synthetically," rather than reproducing them. It works in feature space rather than data space, assuming that a minority class instance and its nearest vector have the same class value. Each instance is treated as a vector in SMOTE, and synthetic samples are created at random along the line separating the minority sample from its nearest neighbor. To make the produced instances comparable to the original minority class instances, they are assigned based on the characteristics of the original dataset [12].
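As an illustration, this step can be sketched with the imbalanced-learn package; the feature matrix X and label vector y are assumed placeholders for the paddy dataset, not names from our code.

from collections import Counter
from imblearn.over_sampling import SMOTE

def balance_classes(X, y, random_state=42):
    # Synthesize minority-class samples along the lines joining each minority
    # instance to its nearest minority neighbours (feature space, not data space).
    smote = SMOTE(random_state=random_state)
    X_res, y_res = smote.fit_resample(X, y)
    print("before:", Counter(y), "after:", Counter(y_res))
    return X_res, y_res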

3.3 Data Mining for Exploratory Purposes


We first plotted our results, looking for linear relationships and considering dimensionality reduction, in particular to address multicollinearity, which can undermine the explainability of our model and reduce its overall robustness. Then, to gain more insight, we built a correlation heatmap, which let us see right away whether there were any linear relationships in the data with respect to each feature.
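A minimal sketch of this exploratory step, assuming the dataset has been loaded into a hypothetical pandas DataFrame df, might look as follows.

import matplotlib.pyplot as plt
import seaborn as sns

def explore(df):
    numeric = df.select_dtypes("number")
    sns.pairplot(numeric)                 # scan for linear relationships
    corr = numeric.corr()                 # pairwise Pearson correlations
    sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
    plt.show()                            # heatmap flags multicollinearity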

3.4 Relationships: Linear vs. Non-linear


A linear regression model is built on the assumption of linear relationships, but a
neural network can detect non-linear relationships. A positive linear relationship,
a negative linear relationship, and a non-linear relationship are all depicted in
the graph below.

Linear Regression: The relationship between the independent and dependent variables is discovered using regression analysis. In Linear Regression, the outcome variable is a numerical value; in Logistic Regression, it is a categorical value. To obtain the smallest error, we fitted the best straight line: our regression model weighted every feature in every observation and calculated the error against the observed output. We used Python to create a linear regression and examined its results on this dataset. The root mean squared error was used to compare the two models.
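A sketch of this baseline with scikit-learn, under the assumption of placeholder arrays X (features) and y (prices), is shown below.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = LinearRegression().fit(X_train, y_train)  # weights every feature
pred = model.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, pred))  # error vs. observed prices
print(f"RMSE: {rmse:.2f}")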

Neural Network: When data is distributed such that a straight line cannot bisect it, neural networks are used to group the typical data using circles and ellipses. We achieved the same results with a simple sequential neural network. As a consequence of matrix operations, a sequential neural network is simply a series of linear combinations; however, a non-linear element in the form of an activation function enables non-linear relationships to be captured. We used ReLU as our activation function in this example: since we have not normalized or standardized our data, a saturating activation such as tanh would be of little use to us. (Again, based on our results, this was another aspect that had to be chosen on a case-by-case basis.)
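A minimal sketch of such a sequential network is given below; the layer widths and the input dimension are illustrative assumptions, while the ReLU choice follows the text.

from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

def build_regressor(n_features):
    model = Sequential([
        Dense(64, activation="relu", input_shape=(n_features,)),
        Dense(32, activation="relu"),  # non-linear activations between linear layers
        Dense(1),                      # single continuous price output
    ])
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model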

4 Experimented Results and Discussions


The SMOTE method was used to fix the issues associated with unbalanced occurrences within our dataset.
However, as we can see from Table 1, SMOTE was not particularly beneficial
because our dataset only comprised data from three districts in the Khulna divi-
sion. Therefore, we used Linear Regression, a machine learning regression method, to predict the price of the crop. We have prepared a comparison table to best evaluate our work (Table 2).

Table 1. Accuracy comparison with and without SMOTE

Attribute   Without SMOTE   With SMOTE
Accuracy    0.99            0.82
f1-Score    0.99            0.90
Precision   0.99            0.99
Recall      1.00            0.82

Table 2. Comparison with other existing systems

Reference number     Algorithm name                                 RMSE
[1]                  XGBoost                                        56.50
[3]                  Simple, Polynomial, Multivariate Regression    16,545,470
[7]                  LSTM                                           7.27
[9]                  STL-ATTLSTM                                    380
Our research study   Linear Regression                              114.48
Our research study   Neural Network                                 338.2241

4.1 Neural Network

Figure (a) depicts the RMSE values of several neural network models. We got a Root Mean Squared Error value of 790.7305 for the adam optimizer with 2 hidden layers, an RMSE value of 344.6472 with 3 hidden layers, an RMSE value of 338.2241 with 4 hidden layers, and an RMSE value of 350.9243 with 5 hidden layers. Figure (b) depicts the MAE values of the same models. We got a Mean Absolute Error value of 645.6945 for the adam optimizer with 2 hidden layers, an MAE value of 307.1933 with 3 hidden layers, an MAE value of 473.4105 with 4 hidden layers, and an MAE value of 293.1295 with 5 hidden layers.

(a) Neural Network RMSE



(b) Neural Network MAE

4.2 Linear Regression

Figure (a) depicts the error values, both RMSE and MAE, of the linear regression model. We got a Root Mean Squared Error value of 114.48, a Mean Absolute Error value of 80.08, a Median Absolute Error value of 41.87, an explained variance score of 0.92, and an R2 score of 0.92. Figure (c) depicts the partial pair plot of the features used in the model. Figure (b) depicts the heatmap of the features used in our model for training.
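The metrics quoted above can be reproduced with scikit-learn, as in the following sketch; y_test and pred stand for the held-out prices and the model's predictions.

import numpy as np
from sklearn.metrics import (explained_variance_score, mean_absolute_error,
                             mean_squared_error, median_absolute_error, r2_score)

rmse = np.sqrt(mean_squared_error(y_test, pred))
mae = mean_absolute_error(y_test, pred)
med_ae = median_absolute_error(y_test, pred)
evs = explained_variance_score(y_test, pred)
r2 = r2_score(y_test, pred)
print(rmse, mae, med_ae, evs, r2)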

(c) Pair Plot

5 Conclusion
It was decided that this study should be conducted since Bangladesh is a sig-
nificant paddy producer. As paddy prices fluctuate, farmers, traders, and con-
sumers who are involved in the production, selling, and consumption of paddy
are exposed to risk. Because of this, it is necessary to predict the price of paddy.
We surveyed several districts, sub-districts, and unions to predict the actual price of paddy, and also included local markets in the price predictions.
to balance out the classes in our dataset. SMOTE wasn’t very beneficial because
our dataset only comprised data from three districts in the Khulna division. We
obtained the lowest Root Mean Squared Error value 338.2241 and the lowest
Mean Absolute Error value 293.1295 using Neural Network and the lowest Root
Mean Squared Error value 114.48, Mean Absolute Error value 80.08, Median
Absolute Error value 41.87, explained variance score value 0.92, and R2 score
value 0.92 using Linear Regression. The outcome of the prediction demonstrates
that using a Neural Network is not a smart option, as Linear Regression is more
efficient and faster. Because all attributes do not have the same balance in all
regions, predicting the crop price is challenging. Only a few features, such as loca-
tion, date, crop amount, and so on, are included in the dataset we acquired from
local farmers, which is insufficient for higher accuracy. Other efficient models
could be built in the future by incorporating more parameters, such as weather,
profit and loss statements, and so on, for potentially better outcomes.

References
1. Samuel, P., Sahithi, B., Saheli, T., Ramanika, D., Kumar, N.A.: Crop price predic-
tion system using machine learning algorithms. Quest J. Softw. Eng. Simul. 6(1),
14–20 (2020)
2. Rachana, P., Rashmi, G., Shravani, D., Shruthi, N., Kousar, R.S.: Crop price fore-
casting system using supervised machine learning algorithms. Int. Res. J. Eng.
Technol. (IRJET) 6, 4805–4807 (2019)
3. Manjula, R., Jain, S., Srivastava, S., Kher, P.R.: Real estate value prediction using
multivariate regression models. In: IOP Conference Series: Materials Science and
Engineering. vol. 263, p. 042098. IOP Publishing (2017)
4. Bissing, D., Klein, M.T., Chinnathambi, R.A., Selvaraj, D.F., Ranganathan, P.:
A hybrid regression model for day-ahead energy price forecasting. IEEE Access 7,
36833–36842 (2019)
5. Truong, Q., Nguyen, M., Dang, H., Mei, B.: Housing price prediction via improved
machine learning techniques. Procedia Comput. Sci. 174, 433–442 (2020)
6. Rohith, R., Vishnu, R., Kishore, A., Deeban, C.: Crop price prediction and fore-
casting system using supervised machine learning algorithms. Int. J. Adv. Res. Comput. Commun. Eng. 9(3), 27–29 (2020). https://doi.org/10.17148/IJARCCE.2020.9306
7. Sabu, K.M., Kumar, T.M.: Predictive analytics in agriculture: forecasting prices
of Arecanuts in Kerala. Procedia Comput. Sci. 171, 699–708 (2020)
8. Zhang, Y., Na, S.: A novel agricultural commodity price forecasting model based
on fuzzy information granulation and MEA-SVM model. Math. Probl. Eng. 2018
(2018). https://doi.org/10.1155/2018/2540681
9. Yin, H., Jin, D., Gu, Y.H., Park, C.J., Han, S.K., Yoo, S.J.: STL-ATTLSTM: veg-
etable price forecasting using STL and attention mechanism-based LSTM. Agri-
culture 10(12), 612 (2020)
10. Merkel, G.D., Povinelli, R.J., Brown, R.H.: Short-term load forecasting of natural
gas with deep neural network regression. Energies 11(8), 2008 (2018)
11. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: Smote: synthetic
minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)
12. Chawla, N.V.: Data mining for imbalanced datasets: an overview. In: Maimon, O.,
Rokach, L. (eds.) Data Mining and Knowledge Discovery Handbook, pp. 875–886.
Springer, Boston (2009). https://doi.org/10.1007/978-0-387-09823-4_45
Paddy Disease Prediction Using
Convolutional Neural Network

Khandkar Asif Hossain1(B), M. Raihan1, Avijit Biswas2, Juliet Polok Sarkar1, Suriya Sultana1, Kajal Sana1, Keya Sarder1, and Nilanjana Majumder1

1 North Western University, Khulna 9100, Bangladesh
raihan1146@cseku.ac.bd
2 Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Gopalganj 8100, Bangladesh

Abstract. The economy of Bangladesh depends on the agricultural development of the country. In any year, the loss of crops will affect the gross economy of the country. So, during cultivation, farmers need to pay attention to the growth of the crops. But the crops get infected with different diseases, such as Blight and Spot, even when farmers pay close attention to cultivation. So, it is necessary to detect the diseases and take appropriate measures as soon as possible. Image capturing, pre-processing, segmentation, feature extraction, and classification are all steps in the disease detection process. This paper describes methods for detecting plant diseases, mostly utilizing photographs of their leaves. A Convolutional Neural Network (CNN) method predicts paddy disease using the SGD optimizer with a learning rate of 0.00004 and two hidden layers, achieving 73.33% accuracy with disease images as input.

Keywords: Machine learning · Supervised learning · CNN · Classification · Image processing · Crop disease · Prediction

1 Introduction

Bangladesh is an agricultural nation. According to a National Agricultural Census report and a World Bank collection of growth indicators for 2020, 16.5 million farmer families reside in Bangladesh and 37.75% of the population works in agriculture. Rice is Bangladesh's key food crop, and it is estimated that 75% of agricultural land is used for rice cultivation, which accounted for 28% of GDP in 2020. Bangladesh, India, China, Pakistan, Thailand, Burma, Japan, and Indonesia are among the best places to grow rice. It comes in a wide variety in our region; Aus, Aman, and Boro are the main ones to note. In Bangladesh, an additional form of rice known as "IRRI" also grows well. Rice provides us with puffed rice, popcorn, rice cake, cloth, paper, wine, and other products. We use straw as a fuel source and to build straw huts. Above all, we must ensure that rice is produced properly.

However, rice leaf diseases reduce yield. There are a variety of rice leaf diseases, but due to their prevalence in Bangladesh, we have focused on two diseases, blight and spot, in this paper. Bacterial blight causes elongated lesions around the leaf tips and edges that turn from white to yellow, then grey. The study of plant patterns and of what we see visually when a plant is infected is referred to as plant disease study. Farmers typically rely on a procedure in which agricultural experts inspect the plants with their own eyes before concluding which disease the plants have based on basic tests. The weakness of this method is that a large number of farmers depend on a limited number of agriculture experts; as a result, before a specialist has time to inspect a plant, it is severely afflicted and the disease spreads to other plants. Furthermore, in poor countries like Bangladesh, most farmers lack awareness of when to seek professional aid when a disease strikes. The method we discuss is a machine learning technique that employs a set of specific algorithms.
We used Google to find 140 photos of diseased paddy leaves for the forecast.
Images of two common paddy diseases, Blight and Spot, are included in our
dataset. Our database includes solutions for a variety of plant textures, including
several types of blight and spot.
We tried to make a model that can successfully identify blight and spot
disease. Our model successfully identified blight and spot disease with 73.33%
accuracy.
Our model accuracy is low for lack of proper sample images. Our model can
only identify general blight and spot disease but those diseases have subcate-
gories like bacterial blight and brown spot. We plan on working on a system
that can successfully identify any crop disease with a higher accuracy rate and
might also provide an immediate remedy without any help from an agriculture
officer.

2 Related Works

A deep convolutional neural network model was presented to identify apple leaf
disease prompted by the classical AlexNet, GoogLeNet, and their performance
improvements. A collection of 13,689 photos of damaged apple leaves was used
to define the four most frequent apple leaf diseases. The suggested model had
a 97.62% overall accuracy [1]. Instead of considering the entire leaf, Jayme and
his collaborators suggested a method that used individual lesions and spots to
diagnose disease. They used Transfer learning on a GoogLeNet CNN that had
been pre-trained. The identification of moderately diseased images was found to
be the most unsuccessful, although success rates in the other cases were much
higher [2]. Ritesh et al. specified a CNN-based predictive model for disease clas-
sification and prediction in paddy crops. They used disease images from the UCI
Machine Learning Repository, which included three forms of diseases, and found
that the test set had 90.32% accuracy and the training set had 93.58% accuracy
[3]. Similarly, a group of researchers proposed a prediction model based on var-
ious CNN architectures and compared it to a previous model that used feature

extraction to extract features before classifying with KNN and SVM. Transfer
learning was used to achieve the highest accuracy of 94%, with training accuracy
of 92% and testing accuracy of 90% [4]. Mr. V Suresh et al. proposed a CNN-
based predictive model to identify paddy crop disease. They used a dataset of
54,305 photos that covered 14 different crop organisms. They discovered that
the highest accuracy for a particular disease was 96.5% in that report [5]. Likewise, SVM, Bayesian Network, Neural Network, and Multiple Linear Regression methods were evaluated and compared by Yun Hwan Kim et al. [6]. Besides, for establishing video inspection methods of crop diseases, a group of researchers suggested a customized deep learning-based architecture in which Faster-RCNN was used. The proposed approach was shown to be more effective for video inspection than VGG16, ResNet-50, ResNet-101, and YOLOv3 [7]. Shima Ramesh et al. proposed a machine learning model that was applied to a database containing
160 papaya leaf images for training the model. The proposed model had a 70%
overall accuracy [8]. Furthermore, Sharada P. et al. suggested a deep convolu-
tional neural network for species classification and disease prediction in crops.
They used 54,306 diseased as well as healthy images and found that the test set
had 99.35% accuracy [9].

3 Methodology

Figure 1 depicts the overall workflow of our research. Our research has been
divided into four categories. They are,

– Information gathering
– Data pre-processing
– Data conditioning
– Machine Learning Algorithms in Action

3.1 Information Gathering

We used Google to find 140 photos of diseased paddy leaves for the forecast.
Images of two common paddy diseases, Blight and Spot, are included in our
dataset. Our database includes solutions for a variety of plant textures, including
several types of blight and spot.
We tried to make a model that can successfully identify blight and spot
disease. Our model successfully identified blight and spot disease with 73.33%
accuracy.
Our model’s accuracy is low for lack of proper sample images. Our model
can only identify general blight and spot disease but those diseases have subcat-
egories like bacterial blight and brown spot. We plan on working on a system
that can successfully identify any crop disease with a higher accuracy rate and
might also provide an immediate remedy without any help from an agriculture
officer.

Fig. 1. Work-flow of the study (flowchart: Start → Organize the Data → Visualize and Process Datasets → Build CNN using Keras Sequential Model → Train CNN → Plot Predictions with Confusion Matrix → Compare Performance → End)

3.2 Data Preprocessing


Our data has been divided into three sets: training, validation, and testing. We
accomplished this by dividing the data into sub-directories for each data set.
There are 140 pictures in total, half of which are Blight and half of which are
Spot. We don’t have nearly as much data as we need for the activities we’ll be
performing, so we’ll just use a portion of it to have a similar amount of images in
both classes. The script adds 40 samples to the training set, 15 to the validation
set, and 15 to the evaluation set. There is an equal amount of Blight and Spot
in each package.

3.3 Data Conditioning


We used Keras’ ImageDataGenerator class to generate batches of data from the
train, valid, and test directories to train the model. We used ImageDataGen-
erator.flow from the directory() to construct a DirectoryIterator that produces
batches of normalized tensor image data from the data directories.
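A minimal sketch of this step is shown below; the directory layout and class-folder names (blight/ and spot/ under data/train, data/valid, and data/test) are assumptions based on the description above.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

gen = ImageDataGenerator(rescale=1.0 / 255)  # normalize pixel values to [0, 1]
train_batches = gen.flow_from_directory(
    "data/train", target_size=(224, 224), classes=["blight", "spot"], batch_size=10)
valid_batches = gen.flow_from_directory(
    "data/valid", target_size=(224, 224), classes=["blight", "spot"], batch_size=10)
test_batches = gen.flow_from_directory(
    "data/test", target_size=(224, 224), classes=["blight", "spot"],
    batch_size=10, shuffle=False)  # keep order for the confusion matrix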

3.4 Machine Learning in Action

We’ve used a Keras Sequential model to construct the CNN after we get the
preprocessed and qualified dataset.

Convolutional Neural Network (CNN): A convolutional neural network (CNN, or ConvNet) is a type of deep neural network for analyzing visual images; it is based on the shared-weight design of the convolution kernels that slide over the input in its hidden layers, giving it translation-invariance properties.

Construction: The model's first layer is a 2-dimensional convolutional layer consisting of 32 output filters, each with a kernel size of 3 × 3, and a ReLU activation function. We defined the input shape only on this first layer, matching the shape of our data: each image is 224 × 224 pixels with three color channels (Red, Green, Blue), giving an input shape of (224, 224, 3). After that, the data's dimensionality was reduced by adding a max-pooling layer. The output of the convolutional layer was then flattened and passed to a Dense layer; since this Dense layer is the network's output layer, it has two nodes, one for blight and one for spot. The Softmax activation function was applied to the output, producing a probability distribution over the blight and spot classes for each sample. The model was compiled with the Adam, SGD, and RMSprop optimizers at learning rates from 0.0001 to 0.00001, a categorical cross-entropy loss, and accuracy as our output metric. The verbose parameter was set to 2, which specifies the verbosity of the log output printed to the console during training. We defined 10 as the number of epochs to run.

4 Experimented Results and Discussions

Using the SGD optimizer, a 0.00004 learning rate, 2 hidden layers, and ReLU and Softmax activation functions, we achieved 73.33% accuracy. Using the RMSProp optimizer, a 0.00001 learning rate, 2 hidden layers, and ReLU and Softmax activation functions, we achieved 70.00% accuracy. Using the Adam optimizer, a 0.00002 learning rate, 2 hidden layers, and ReLU and Softmax activation functions, we achieved 56.67% accuracy (Figs. 2, 3, 4, 5 and 6).

Fig. 2. Matrices and Plot 1: (a) Plot; (b) SGD, 0.00004, hl2, Own Dataset

Fig. 3. Matrices and Plot 2: (a) RMSProp, 0.00001, hl2, Own Dataset

Fig. 4. Matrices and Plot 3: (a) Adam, 0.00002, hl2, Own Dataset

Fig. 5. Matrices, UCI Dataset: (a) SGD, 0.00004, hl2, UCI Dataset

Fig. 6. Matrices and Plot 2, UCI Dataset: (a) Plot

Comparison: We compared our model using a publicly available dataset from UCI to get a better grasp of our model's efficiency. Although our model achieved 73.33% accuracy with our own dataset, it performed well, reaching 95% accuracy with the UCI dataset. We also compared the similar work of other researchers (Table 1).

Table 1. Comparison with other existing systems

Reference number     Sample size   Accuracy
[8]                  160           70%
UCI Dataset          80            95%
Our research study   140           73.33%

5 Conclusion

The present work demonstrates how to use a sequential model and CNN to predict paddy disease. A system that can successfully predict paddy disease from sample images of the leaves was proposed. However, the sample dataset of paddy disease was scarce, so a digital archive was made that can help agricultural officers learn about crop diseases. Using the SGD optimizer, a 0.00004 learning rate, 2 hidden layers, and ReLU and Softmax activation functions, the model achieved a prediction accuracy of 73.33% in identifying the paddy diseases Blight and Spot on our dataset, and a prediction accuracy of 95% on the UCI dataset. A Flask web application was suggested to integrate with the above-mentioned model, which demonstrated successful prediction of Blight and Spot.
This web application cannot identify paddy diseases other than Blight and Spot. In this work, a separate website for collecting disease image data and a separate website for predicting the disease were constructed, which is not user-friendly.
This system can be fine-tuned further and more diseases can be added for
identification. Two separate websites can be merged to provide a seamless expe-
rience to the user by using the Django framework. Then the data acquisition
system can be automated by using robots or drones.

References
1. Liu, B., Zhang, Y., He, D., Li, Y.: Identification of apple leaf diseases based on deep
convolutional neural networks. Symmetry 10(1), 11 (2018)
2. Barbedo, J.G.A.: Plant disease identification from individual lesions and spots using
deep learning. Biosys. Eng. 180, 96–107 (2019)

3. Sharma, R., Das, S., Gourisaria, M.K., Rautaray, S.S., Pandey, M.: A model for
prediction of paddy crop disease using CNN. In: Das, H., Pattnaik, P.K., Rautaray,
S.S., Li, K.-C. (eds.) Progress in Computing, Analytics and Networking. AISC, vol.
1119, pp. 533–543. Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-2414-1_54
4. Sharma, P., Hans, P., Gupta, S.C.: Classification of plant leaf diseases using machine
learning and image preprocessing techniques. In: 2020 10th International Conference
on Cloud Computing, Data Science and Engineering (Confluence), pp. 480–484.
IEEE (2020)
5. Ferentinos, K.P.: Deep learning models for plant disease detection and diagnosis.
Comput. Electron. Agric. 145, 311–318 (2018)
6. Kim, Y.H., Yoo, S.J., Gu, Y.H., Lim, J.H., Han, D., Baik, S.W.: Crop pests predic-
tion method using regression and machine learning technology: survey. IERI Proce-
dia 6, 52–56 (2014)
7. Li, D., et al.: A recognition method for rice plant diseases and pests video detection
based on deep convolutional neural network. Sensors 20(3), 578 (2020)
8. Ramesh, S., Hebbar, R., Niveditha, M., Pooja, R., Shashank, N., Vinod, P., et al.:
Plant disease detection using machine learning. In: 2018 International Conference on
Design Innovations for 3Cs Compute Communicate Control (ICDI3C), pp. 41–45.
IEEE (2018)
9. Mohanty, S.P., Hughes, D.P., Salathé, M.: Using deep learning for image-based plant
disease detection. Front. Plant Sci. 7, 1419 (2016)
Android Malware Detection System:
A Machine Learning and Deep Learning
Based Multilayered Approach

Md Shariar Hossain(&) and Md Hasnat Riaz(&)

Department of Computer Science and Telecommunication Engineering, Noakhali Science and Technology University, Noakhali, Chittagong, Bangladesh

Abstract. In our growing world of technology, mobile phones have become one of the most used devices, and from the very beginning Android has been the most popular operating system. This vast popularity naturally invited cybercriminals to attack this OS with malware applications to steal or access important user data. It is critical to detect whether an app is malware or not during installation or at run time. This paper proposes a malware detection system for the Android operating system that combines static and dynamic analysis with both machine learning and deep learning classifiers. We evaluate a multilayer detection process that works on user-permission data for static analysis and network traffic data for dynamic analysis. User permissions, read from the AndroidManifest.xml file, help the model detect malware before it is installed, and the network traffic data helps the model detect malware at run time. We applied these datasets to both machine learning and deep learning classifiers to make the model more accurate and efficient. The features were extracted from real Android applications, and we used a deep Auto Encoder to pre-process and clean the dataset. Having obtained some interesting accuracy rates with this combined multilayer model, we propose the most accurate classifier to make the final verdict about malware, which works at a high rate for both static and dynamic analysis.

Keywords: Android malware detection · Machine learning · Static and dynamic analysis · Combined multilayer model · Deep auto encoder

1 Introduction

Mobile malware targets devices, attacks them, and causes loss or leakage of the user's confidential information. Mobile malware has started to progress at an alarming rate and has been much more predominant on Android because it is an open platform and leads all other mobile OS platforms in popularity. Android owned 71.81% of the mobile operating system market globally as of March 2021, where iOS had 27.43%, Samsung 0.38%, and KaiOS 0.14% [1]. Android-powered devices were estimated at 1.2 billion in 2018, and the figure for worldwide shipments in 2022 indicates that there will be 1.4 billion Android devices [2]. Furthermore, Android's official marketplace, the Google Play store, contains more than 3 million Android applications; by 2016 the Google Play store was adding about 1300


new apps per day [3]. As Android has been the fastest-growing operating system for a long time, it has created an enormous space for hackers and cybercriminals to play on this ground. The production rate of Android malware reached 9411 per day in 2018, meaning a new malware sample is produced every 10 s [2]. Nowadays, Android-based applications have become a major source of services for various smart city applications [21, 22], and any infiltration of these service-based applications can have catastrophic results.
More than 350,000 new malware samples are registered by the AV-TEST Institute in a single day; they are classified on the basis of their characteristics [4]. Researchers have recorded the most dangerous malware of 2020 for the Android platform: BLACKROCK, FAKESKY, the EventBot malware, AGENT SMITH, ExoBot, NotCompatible, Lastacloud, Android Ransomware, Android Police virus, Svpeng virus, Ghost Push virus, Mazar Malware, Gooligan Malware, HummingBad virus, HummingWhale virus, GhostCtrl virus, Lockdroid ransomware, Invisible Man, LeakerLocker ransomware, DoubleLocker ransomware, Lokibot virus, Tizi Android, Anubiscrypt ransomware, and Opt-Out virus [5].

Fig. 1. Basic software stack of the Android framework (layers, top to bottom: Applications; Application Framework; Libraries and Android Runtime (ART); Linux Kernel).

Figure 1 shows the basic software stack of the Android framework. On top of the Linux Kernel sit the libraries, the Application Framework, and application software. Application software runs in the framework, which includes Java-compatible libraries; this is the target zone of most hackers and cybercriminals, who use malicious libraries to make an application malicious. The Application Framework includes the Activity Manager, Content Providers, various services, and the user permission section.
Usually, a user grants permissions to an application while installing it without noticing what the permissions are, whether they are necessary for that app, and, most importantly, what risks granting that access may create. Even Google's security firewall in its official app marketplace, the Google Play Store, considered safer than any third-party source, can still be bypassed: security experts reported a list of 75 applications from the Google Play Store that were infected with the Xavier Android virus [5]. Moreover, installation is not the only infection route. Even after installing a statically verified app whose pre-install permissions have been scanned, there is still a great chance of infection through network traffic functionality at run time.

Malware detection is a core part of cyber and device security. There are two known analytical approaches for malware detection, namely static analysis and dynamic analysis. In static analysis, detection is done through the source code, the Manifest file, and APIs before the program's execution. Static analysis is faster at detecting a malicious app and can prevent malware before it is installed; however, it can be evaded by obfuscation or encryption techniques.
On the other hand, dynamic analysis works on the execution behavior of the application. This analysis method monitors the application's execution characteristics and Android malware activity at run time. Researchers have shown both static and dynamic ways to detect Android malware by applying machine learning classifiers, with different accuracy rates for different classifiers; some of these worked with permission-based datasets for static analysis [6], while network traffic data was used for the dynamic approach. But there is an interesting relationship in accuracy when we pre-process the dataset with a Deep Auto Encoder (DAE) and apply it to both machine learning and deep learning classifiers for static and dynamic analysis.
In our research, we evaluated both static and dynamic analysis methods. We have proposed a combined analysis system with a good accuracy rate on both machine learning and deep learning classifiers, where data was extracted from the AndroidManifest.xml file for permissions and from APK and Java files for the network traffic dataset. We used a Deep Auto Encoder (DAE) to clean the data [7]. We show the change in accuracy rate before and after pre-processing with the DAE. Information for the dynamic analysis was collected from the operating system during run time, such as system calls, network traffic access, files, and memory modifications. For static analysis, we extracted permission features from the AndroidManifest.xml file using various feature selection techniques [8].

2 Related Works

A basic behavior of mobile malware is sending sensitive data of the mobile phone user to malicious remote servers. Suleiman Y. Yerima in [9] proposed and investigated a parallel machine learning based classification approach for early detection of Android malware. Using genuine malware samples and benign applications, a hybrid classification model was created from the parallel combination of heterogeneous classifiers.
To find the best combination of feature selection and classifier, different feature selection methods were applied to different machine learning classifiers [10].
There are two methods in a malware detection system: static analysis and dynamic analysis. Static analysis is a strategy that surveys malicious behavior in the source code, the data, or the binary files without directly executing the application, whereas dynamic analysis is a set of strategies that studies the behavior of the malware in execution through signal re-enactments [11].
One such system screens various permission-based features and events obtained from Android applications, and investigates these features using machine learning classifiers to classify whether an application is goodware or malware for static analysis [6, 12, 13]. In dynamic analysis, Android malware has been identified by first analyzing its network traffic features and then building a rule-based classifier for detection [14, 15].

Ankur Singh Bist in [16] presented an outline of deep learning strategies for malware detection, such as the convolutional neural network, deep belief network, auto-encoder, restricted Boltzmann machine and recurrent neural network; there, data is pre-processed by an auto-encoder to make a clean dataset with high accuracy [2].
Other research focused on examining the behavior of mobile malware through a hybrid approach, which correlates and reconstructs the results from static and dynamic malware analysis to create a trace of the malicious event [17].
As an example, Crowdroid is a machine learning-based system that recognizes Trojan-like malware on Android devices by investigating the number of times each system call has been issued by an application during the execution of an activity that requires user interaction [18].

3 Data and Features

Our system is a combination of static and dynamic analysis. In the static part, we take data from the permissions, intents, uses-features and APIs extracted from the AndroidManifest.xml file. In the dynamic analysis, we take data from network traffic based on DNS, TCP and UDP.
Datasets were taken in Comma Separated Value (CSV) format, obtained by feature extraction and converted into CSV for both the static and dynamic approaches [11]. For the static analysis, we use data of dimensions 398 × 331, where the training set starts from 20% of the data and increases correspondingly. For the dynamic analysis, we use data of dimensions 7845 × 17, including 4704 benign and 3141 malicious samples. Here the training set also starts from 20% and increases correspondingly, with a random state of 45.
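A sketch of this setup follows, assuming hypothetical CSV exports of the two datasets; the file name is illustrative, while the 'type' label column is the one listed in Table 1.

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("network_traffic.csv")       # 7845 x 17 dynamic dataset
X, y = df.drop(columns=["type"]), df["type"]  # 'type' marks benign/malicious
# begin with 20% of the data for training and grow it in later runs
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.2, random_state=45)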

Table 1. Features from permissions and network traffic datasets

Top 10 permission features for static analysis:
1. android.permission.INTERNET
2. android.permission.READ_PHONE_STATE
3. android.permission.ACCESS_NETWORK_STATE
4. android.permission.WRITE_EXTERNAL_STORAGE
5. android.permission.ACCESS_WIFI_STATE
6. android.permission.READ_SMS
7. android.permission.WRITE_SMS
8. android.permission.RECEIVE_BOOT_COMPLETED
9. android.permission.ACCESS_COARSE_LOCATION
10. android.permission.CHANGE_WIFI_STATE

Features from network traffic for dynamic analysis: name, tcp_packets, dist_port_tcp, external_ips, vulume_bytes, udp_packets, tcp_urg_packet, source_app_packets, remote_app_packets, source_app_bytes, remote_app_bytes, duracion, avg_local_pkt_rate, avg_remote_pkt_rate, source_app_packets.1, dns_query_times, type (dtype = object)

3.1 Analysis of Android Manifest and Permissions


Every Android app must have a Manifest file which is written at the root of the source
code set. It describes essential information of the app to the Android Build tools, the
Operating System and Google Play.
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools"
    package="com.example.MyApp"
    android:installLocation="auto">

    <uses-permission android:name="android.permission.READ_PHONE_STATE" />
    <uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />
    <uses-permission android:name="android.permission.INTERNET" />
    <uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />
    <uses-permission android:name="android.permission.CHANGE_NETWORK_STATE" />
    <uses-permission android:name="android.permission.RECORD_AUDIO" />
    <uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />

</manifest>

This is an AndroidManifest.xml file from an Android application, where permissions are requested before the app is installed. The whole file sits inside the <manifest> block, which contains many sub-blocks; the <uses-permission> entries that declare all the user permissions are direct children of <manifest> and are under the control of the developer, not the users. It is up to the developer whether permissions are requested from the user at run time or not. In our study, we used 398 permissions from 331 different applications.
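For illustration, the permission extraction can be sketched with Python's standard XML parser; this assumes a decoded (plain-text) manifest, since binary manifests inside an APK would first need a tool such as apktool or androguard, and the file path is a placeholder.

import xml.etree.ElementTree as ET

ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

def extract_permissions(manifest_path):
    # Collect the android:name attribute of every <uses-permission> element.
    root = ET.parse(manifest_path).getroot()
    return [e.get(ANDROID_NS + "name") for e in root.iter("uses-permission")]

print(extract_permissions("AndroidManifest.xml"))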

4 Research Methodology

4.1 Deep Auto Encoder (DAE)


An auto encoder is a type of artificial neural network used to learn efficient data codings in an unsupervised manner. The goal of an auto encoder is to learn a representation (encoding) for a set of data, typically for dimensionality reduction, by training the network to disregard signal noise. Simply put, an auto encoder is a non-recurrent feedforward neural network, comparable to the single-layer perceptrons that make up a multilayer perceptron (MLP), with an input layer, one or more hidden layers and an output layer, all connected in sequence. The number of nodes (neurons) in the output layer is the same as in the input layer. Its purpose is to reproduce its inputs (minimizing the difference between the input and the output) rather than to predict a target value $Y$ given inputs $X$. Hence, auto encoders are unsupervised learning models. An auto encoder consists of two parts, the encoder and the decoder, which can be defined as transitions $\Phi$ and $\Psi$ such that

$$\Phi : X \to F, \qquad \Psi : F \to X,$$
$$\Phi, \Psi = \operatorname*{arg\,min}_{\Phi,\Psi} \lVert X - (\Psi \circ \Phi)(X) \rVert^{2}.$$



Fig. 2. Structure of a deep auto encoder (DAE): input layer, hidden layers, output layer.
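A minimal sketch of such a deep auto encoder in Keras is given below; the layer widths and code dimension are illustrative assumptions, with n_features matching the dataset's column count.

from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model

def build_dae(n_features, code_dim=8):
    inp = Input(shape=(n_features,))
    h = Dense(64, activation="relu")(inp)          # encoder Phi: X -> F
    code = Dense(code_dim, activation="relu")(h)
    h = Dense(64, activation="relu")(code)         # decoder Psi: F -> X
    out = Dense(n_features, activation="linear")(h)
    dae = Model(inp, out)
    dae.compile(optimizer="adam", loss="mse")      # minimize ||X - (Psi o Phi)(X)||^2
    return dae

Training with dae.fit(X, X, ...) makes the network reproduce its inputs; the reconstructed (cleaned) data, or the encoder output, can then feed the downstream classifiers.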

4.2 Research Approach


As shown in Fig. 3, we have two different analysis layers named static and dynamic
analysis in our model. Static analysis analyzes the user permissions (i.e. Internet, read
phone state, read network state, write external storage, read SMS, access coarse
location), Activity intents, features and Application Programming Interface (API).
Dynamic analysis analyze the network traffic features such as DNS, TC, UDP, TCP
Packets, Source App Packets, Remote App packets. We extracted our datasets from
these features as comma separated values (CSV) and applied these CSV datasets to the
machine learning and deep learning classifiers. We elected some popular classifiers for
both machine learning and deep learning respectively Naïve Bayes (NB), K Neighbors,
Decision Tree, Random forest, and Multilayer perceptron (MLP), Convolutional neural
network (CNN).
Table 3 and Table 2 are the tabular representation of accuracy percentages for the
static and dynamic analysis. After applying these classifiers we have got the different
accuracy rate for the different classifiers which are tabulated in Table 2. For the
machine learning classifiers we have got Decision tree algorithm with 94% of accuracy
for static analysis and Random forest classifiers with 92% of accuracy rate for dynamic
analysis which are the highest among all the machine learning. As stated we used MLP
and CNN as deep learning algorithms where we have got 90% and 75% of accuracy for
Multilayer perceptron (MLP) in static analysis and dynamic analysis respectively.
Table 2 shows the accuracy percentages of different classifiers for the static and
dynamic analysis before applying the Deep Auto Encoder (DAE) for data pre-
processing. After the measurement of the performance in Table 2 we applied DAE to
the datasets.
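As an illustration, the classifier comparison can be sketched with scikit-learn as below; hyperparameters are defaults except k = 3 for K Neighbors (as used in Table 5), and the split variables are the assumed ones from Sect. 3.

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

classifiers = {
    "Naive Bayes": GaussianNB(),
    "K Neighbors": KNeighborsClassifier(n_neighbors=3),
    "Decision Tree": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(),
}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    print(name)
    print(classification_report(y_test, clf.predict(X_test)))  # precision/recall/f1/support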

Fig. 3. Methodology of the combined multilayer Android malware detection system (flowchart: permission-based data (permissions, intents, uses-features, API) feeds the static analysis layer and network-based data (DNS, TCP, UDP) feeds the dynamic analysis layer; after data pre-processing with the auto-encoder, machine learning classifiers (Naive Bayes, K-Neighbors, Decision Tree, Random Forest) and deep learning classifiers (MLP, CNN) are trained and tested, the most accurate classifier is selected, and the app is flagged as malicious or benign).

Table 3 shows the performance of the classifiers after applying the DAE to the datasets for pre-processing, where the machine learning classifier K Neighbors was the most accurate, with 88% and 94% for static and dynamic analysis respectively. The deep learning classifiers showed a great improvement after applying the DAE, as both reached 90% accuracy in dynamic analysis, with the Multilayer Perceptron (MLP) achieving the highest rates of 88% and 90% for static and dynamic analysis respectively. According to Fig. 3, our study selects the most accurate classifiers from the combination of machine learning and deep learning algorithms for both static and dynamic analysis, and the selected classifiers are used in our final proposed model. We considered precision, recall, f1 score and support as the evaluation parameters of the classifiers.

Table 2. Accuracy of static and dynamic analysis according to algorithms, before preprocessing

Algorithms                Classifiers                          Static analysis (Accuracy)   Dynamic analysis (Accuracy)
Machine learning          Naïve Bayes (NB)                     84%                          45%
Machine learning          K Neighbors                          89%                          89%
Machine learning          Decision Tree                        94%                          88%
Machine learning          Random forest                        91%                          92%
Deep learning algorithms  Multilayer perceptron (MLP)          90%                          75%
Deep learning algorithms  Convolutional neural network (CNN)   89%                          71%

Table 3. Accuracy of static and dynamic analysis according to algorithms, after preprocessing

Algorithms                Classifiers                          Static analysis (Accuracy)   Dynamic analysis (Accuracy)
Machine learning          Naïve Bayes (NB)                     82%                          80%
Machine learning          K Neighbors                          88%                          94%
Machine learning          Decision Tree                        81%                          89%
Machine learning          Random forest                        84%                          92%
Deep learning algorithms  Multilayer perceptron (MLP)          88%                          90%
Deep learning algorithms  Convolutional neural network (CNN)   86%                          90%

5 Results and Discussion

Figures 4 and 5 show the plots of accuracy according to our train-test splitting of the dataset. For both static and dynamic analysis we started splitting the data at a 50%–50% train-test split and gradually moved up to 80%–20%; both show the same accuracy trend. Tables 4 and 5 explain the parameters for the decision tree and K-Neighbors classifiers. Kang, H., Jang, J.W., Mohaisen, A., and Kim, H.K. compared the average accuracy of the most popular Android malware detection systems, Crowdroid and Andro-profiler, on malware families such as Adwo, AirPush, Boxer, FakeBattScar, FakeNotify and Gin Master with 350 benign applications, where they found 99% accuracy for Andro-profiler and 35% accuracy for Crowdroid [19]. TinyDroid has a 97% accuracy rate, as it is a lightweight detection system [20]. We obtained an average accuracy rate of 94% for both static and dynamic analysis with our combined multilayered process.

Fig. 4. Accuracy and F1-score of static analysis (Decision Tree) according to train-test split (80%-20% to 50%-50%).

Fig. 5. Accuracy and F1-score of dynamic analysis (K-Neighbors) according to train-test split (80%-20% to 50%-50%).

Table 4. Parameters of the decision tree classifier

Decision tree   Precision   Recall   F1-score   Support
0               0.97        0.89     0.93       37
1               0.91        0.98     0.94       43
Accuracy                             0.94       80
Macro avg       0.94        0.93     0.94       80
Weighted avg    0.94        0.94     0.94       80

Table 5. Parameters of the K-Neighbors classifier

K neighbors (k = 3)   Precision   Recall   F1-score   Support
Benign                0.96        0.86     0.91       955
Malicious             0.85        0.96     0.90       612
Accuracy                                   0.94       1567
Macro avg             0.91        0.91     0.91       1567
Weighted avg          0.91        0.91     0.91       1567

According to our proposed model, the two best classifiers are:

1) Static analysis: Decision Tree (DAE not applied), 94% accuracy
2) Dynamic analysis: K Neighbors (DAE applied), 94% accuracy

Among all the classifiers we used, these are our best two for static and dynamic analysis according to Tables 2 and 3. The goal of this study was to propose a malware detection system for the Android operating system that combines static and dynamic analysis with both machine learning and deep learning classifiers in a multilayered detection process. We built our final model from the combination of static and dynamic layers and picked the most accurate classifiers according to our datasets; the resulting final proposed model is shown in Fig. 6. Our accuracy rate is not the best compared to others, but the layers and model we provide, with an average accuracy rate of 94%, can have a great impact on the Android malware detection sector through its installation-time (static) and run-time (dynamic) detection algorithms.

Fig. 6. Final proposed model (input → static analysis with Decision Tree → data preprocessing by Deep Auto Encoder → dynamic analysis with K-Neighbors; verdict at each stage: malicious or benign).

6 Conclusion

In this paper, we presented a combination of machine learning and deep learning classification approaches in a multilayer Android malware detection system using static and dynamic analysis methods, where an Auto Encoder was used to preprocess the datasets. The proposed model utilizes a wide range of features from the Android manifest file, including permission features, and network traffic features. The growing malware community is a brutal threat, and existing detection tools, even Google's official marketplace, can be bypassed. These approaches constantly call for alternatives where the feature space and the accuracy rate will be more efficient. The combined detection model in this paper proposes a multilayer scheme that provides a tool working both before installation and at run time. If Android malware passes static-layer detection, it will be detected by the dynamic layer, where both layers work with the most accurate classifiers for their respective features. This proposed model will be very effective in classifying a new application because 1) the dynamic app features use a Deep Auto Encoder and 2) the permission features are taken from one of the root files of an application before installation.
As future work, we aim to develop and evaluate an Android malware detection engine using more neural network algorithms with normalized and more specific pre-processing of the dataset. We intend to analyze the AndroidManifest.xml file to extract features from the Services, Activities, and Broadcast Manager, giving us more specific features for dynamic analysis at run time.

References
1. Statcounter. https://gs.statcounter.com/os-market-share/mobile/worldwide. Accessed 05 Apr
2021
2. Naway, A., Li, Y.: Android malware detection using autoencoder. arXiv preprint arXiv:1901.07315 (2019)
3. CleverTop. https://clevertap.com/blog/mobile-growth-statistics/. Accessed 13 Apr 2021
4. AVTEST.Org. https://www.avtest.org/en/statistics/malware. Accessed 10 Apr 2021
5. GIZMEEK. https://gizmeek.com/researchers-listed-the-most-dangerous-malware-viruses-android-viruses-of-2020. Accessed 11 Apr 2021

6. Tchakounté, F.: Permission-based malware detection mechanisms on android: analysis and


perspectives. J. Comput. Sci. 1(2) (2014)
7. Wang, W., Zhao, M., Wang, J.: Effective android malware detection with a hybrid model
based on deep autoencoder and convolutional neural network. J. Ambient. Intell. Humaniz.
Comput. 10(8), 3035–3043 (2018). https://doi.org/10.1007/s12652-018-0803-6
8. Mahindru, A., Sangal, A.L.: FSDroid:-A feature selection technique to detect malware from
android using machine learning techniques. Multimedia Tools Appl. 1–53
9. Yerima, S.Y., Sezer, S., Muttik, I.: Android malware detection using parallel machine
learning classifiers. In: 2014 Eighth International Conference on Next Generation Mobile
Apps, Services and Technologies, pp. 37–42. IEEE, September 2014
10. Mas’ud, M.Z., Sahib, S., Abdollah, M.F., Selamat, S.R., Yusof, R.: Analysis of features
selection and machine learning classifier in android malware detection. In: 2014 Interna-
tional Conference on Information Science & Applications (ICISA), pp. 1–5. IEEE, May
2014
11. López, C.C.U., Cadavid, A.N.: Framework for malware analysis in Android
12. Zarni Aung, W.Z.: Permission-based android malware detection. Int. J. Sci. Technol. Res. 2
(3), 228–234 (2013)
13. Tchakounté, F.: A malware detection system for android (2015)
14. Arora, A., Garg, S., Peddoju, S.K.: Malware detection using network traffic analysis in
android based mobile devices. In: 2014 Eighth International Conference on Next Generation
Mobile Apps, Services and Technologies, pp. 66–71. IEEE, September 2014
15. Zaman, M., Siddiqui, T., Amin, M.R., Hossain, M.S.: Malware detection in Android by
network traffic analysis. In: 2015 International Conference on Networking Systems and
Security (NSysS), pp. 1–5. IEEE, January 2015
16. Bist, A.S.: A survey of deep learning algorithms for malware detection. Int. J. Comput. Sci.
Inf. Secur. (IJCSIS), 16(3) (2018)
17. Mas’ud, M.Z., Sahib, S., Abdollah, M.F., Selamat, S.R., Yusof, R., Ahmad, R.: Profiling
mobile malware behaviour through hybrid malware analysis approach. In: 2013 9th
International Conference on Information Assurance and Security (IAS), pp. 78–84. IEEE,
December 2013
18. Burguera, I., Zurutuza, U., Nadjm-Tehrani, S.: Crowdroid: behavior-based malware
detection system for android. In: Proceedings of the 1st ACM workshop on Security and
Privacy in Smartphones and Mobile Devices, pp. 15–26, October 2011
19. Kang, H., Jang, J.W., Mohaisen, A., Kim, H.K.: Detecting and classifying android malware
using static analysis along with creator information. Int. J. Distribut. Sensor Netw. 11(6),
479174 (2015)
20. Chen, T., Mao, Q., Yang, Y., Lv, M., Zhu, J.: TinyDroid: a lightweight and efficient model
for Android malware detection and classification. Mobile Inf. Syst. (2018)
21. Haque, A.K., Bhushan, B., Dhiman, G.: Conceptualizing smart city applications: Require-
ments, architecture, security issues, and emerging trends. Expert. Syst. (2021). https://doi.
org/10.1111/exsy.12753
22. Haque, B., Shurid, S., Juha, A.T., Sadique, M.S., Asaduzzaman, A.S.M.: A novel design of
gesture and voice controlled solar-powered smart wheel chair with obstacle detection. In:
2020 IEEE International Conference on Informatics, IoT, and Enabling Technologies
(ICIoT), pp. 23–28 (2020). https://doi.org/10.1109/ICIoT48696.2020.9089652
IOTs, Big Data, Block Chain and Health Care
Blockchain as a Secure and Reliable
Technology in Business and Communication
Systems

Vedran Juričić1(&), Danijel Kučak2, and Goran Đambić2

1 Faculty of Humanities and Social Sciences, University of Zagreb, Zagreb, Croatia
vjuricic@ffzg.hr
2 Algebra University College, Zagreb, Croatia
{danijel.kucak,goran.dambic}@algebra.hr

Abstract. Today we are witnessing a constant growth of data from different sources and in different fields, like government, health, travel, business and entertainment. Traditionally, this data is stored in one central place located on a company's or organization's server, providing easier control, management, analysis and business intelligence. However, this architecture has certain disadvantages, like the existence of a single point of failure, vulnerability and privacy issues. A distributed ledger technology like Blockchain presents a new architecture that does not store data in one location and does not have a central authority. Although Blockchain technology offers improvements in security, transparency and traceability, it is criticized for its resource consumption, complexity and its lack of clear usefulness. Because of its complexity and distributed architecture, users are still suspicious about its security and the privacy of its data. This paper analyses various aspects of applying Blockchain technology in modern systems, focusing on data storage, processing and protection. By enumerating common threats and security issues, this paper tries to show that Blockchain is trustworthy and reliable and that it can be a potential solution to, or participant in solving, current security problems.

Keywords: Data protection · Data privacy · Security · Vulnerability · Distributed system · Smart contracts · Anonymity

1 Introduction

The constant growth of available government and commercial services, cloud com-
puting, social networks and the Internet of Things leads to a constant increase of the
amount of generated and stored data. About 90% of all the data in the world has been
generated from 2011 until 2013 [34] and there are justified predictions that data will
continue to increase exponentially. For example, IDC has predicted that global data
volume will grow from 4.4 zettabytes to 44 zettabytes in 7 years [11]. Some authors
claim that data will be growing 40% per year in the next decade [15].
Data gathering, processing and analysis in traditional business, government and
public organizations was performed on computers in their local network and

organizations attempted to make their services available to users through the Internet.


Organizations commonly had a shared data storage available through network infrastructure [26], allowing simultaneous data access to multiple users or computers without unnecessary redundancy, and allowing storage of and operations on data from different sources. Processing, i.e. computing, in modern information systems is physically decoupled from the storage, because this architecture provides hardware and software independence, fragmentation and unused-space reduction, and easier scalability [8]. This allows data to be located remotely, in an external network or in a cloud, without affecting the actual services or applications provided by organizations. Personal data, bank accounts, asset information etc. are therefore scattered across multiple locations, unknown to the users or owners. Also, some organizations are required to perform a daily backup of their entire data to different locations, and applications of different vendors are required to communicate with one another to perform certain tasks. Every application needs to follow security policies, granting or restricting access to its resources [6], which means that user credentials are shared between applications through the network. Each layer, application or communication introduces a certain security or privacy risk.
The foundation of the relationship between users and organizations is trust [1], which can be defined as "the belief that the vendor will act cooperatively to fulfil the customer's expectations without exploiting their vulnerabilities" [30]. Users' perception of trust in a certain technology or application has an important impact on whether they adopt and use that technology [17]. This is especially relevant when organizations are dealing with sensitive user data, for example financial and medical institutions, where users expect their data to be completely secure and their privacy completely protected.
Blockchain, a distributed ledger technology, was introduced about 10 years ago [25]. It belongs to a class of disruptive technologies, which means it significantly impacts and alters traditional businesses, industries and organizations. It was successfully applied in the cryptocurrency Bitcoin [4] in 2008 and is rapidly being integrated into various aspects of modern applications and processes because of its architecture, design and specific characteristics. The technology does not require a central authorization authority, but relies on a principle of equality and justice, and promises numerous benefits and innovations in modern information and communication systems. Marc Andreessen, one of the leading Silicon Valley investors, claims that "blockchain is the most important invention since the internet itself". Johann Palychata said that blockchain should be considered an invention like the steam or combustion engine [9].
This paper presents the most relevant characteristics of the Blockchain technology, focusing on data, security, protection and privacy. Because of its advantages, this technology has very wide application in various sectors. This paper presents and analyses the most recent or important applications in finance, public and social services, and education. However, users are often not aware that the technology is not completely safe and without any risk, primarily because it relies on existing vulnerable technologies, but also because of the Blockchain mechanisms and operations. The paper enumerates possible threats and issues with data security and protection, and provides Blockchain-based solutions or proofs of the suitability of Blockchain usage.

2 Blockchain Technology

The blockchain is a sequence of blocks that contains a complete list of transactions [20]. Each block consists of the block header and the block body. The body contains a transaction counter and a certain number of transactions, which depends on the block size and the size of each transaction. The header contains a timestamp (a number of seconds since 1/1/1970), a nonce (a random number) and a hash value of the previous block, its parent. The first block in this chain is called the genesis block and has no parent. Each block also contains a hash value of all the transactions in the block (the Merkle tree root hash).
The hash function used is SHA-256, a cryptographic mathematical function that was chosen by Satoshi Nakamoto in the implementation of the cryptocurrency Bitcoin [25]. Hash functions have a unidirectional characteristic, which means that for a given input an output is fast and easy to calculate, but it is almost impossible to find an input for a given output. They also have a characteristic of uniqueness: for a given input only one unique hash value exists, and it is almost impossible to find two equal hash values for two different inputs, regardless of their bit distance.
As a result of these characteristics, it is easy for anyone to calculate the hash value in a block header from the transaction data in the block body. If any bit in the transaction data is modified, the calculated hash value will differ from the value found in the block header, which means that someone has tampered with the transaction data (or the hash value). As already mentioned, a block also contains a hash of its parent block, which enables validation of all previous blocks and all transactions in the whole chain.
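To make the linking concrete, the following minimal Python sketch (an illustration, not Bitcoin's actual block format; the field names are our own) hashes a block header that commits to the transactions and to the parent block's hash, so that altering any transaction invalidates the stored hash:

```python
import hashlib
import json

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def make_block(parent_hash: str, transactions: list) -> dict:
    # The header commits to the parent block and to all transactions
    # (a single hash of the serialized list, standing in for a full
    # Merkle tree root).
    tx_root = sha256(json.dumps(transactions, sort_keys=True).encode())
    header = {"parent": parent_hash, "tx_root": tx_root}
    return {"header": header,
            "hash": sha256(json.dumps(header, sort_keys=True).encode()),
            "transactions": transactions}

# A tiny chain: a genesis block (no parent) plus one child.
genesis = make_block("0" * 64, [{"from": "a", "to": "b", "amount": 5}])
child = make_block(genesis["hash"], [{"from": "b", "to": "c", "amount": 2}])

# Tampering with a transaction changes the recomputed root, so the
# stored header hash no longer matches.
genesis["transactions"][0]["amount"] = 500
recomputed = sha256(json.dumps(genesis["transactions"], sort_keys=True).encode())
assert recomputed != genesis["header"]["tx_root"]  # tampering detected
```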
Blockchain is a distributed database of records, and all transactions that have been created are available to all participants or, from the technical aspect, to all nodes in its network [9]. When this database changes, for example when a node creates a new transaction, the database is resent and updated throughout the network. All nodes can verify all transactions, from the current state back to the first transaction in the genesis block. The problem this technology has successfully solved is whom to trust in a completely untrusted environment: a node that has calculated one hash value or a node with another hash value, i.e. which user can publish the next block in the chain.
Blockchain solves this by implementing one of many possible consensus models, like Proof of Work, Proof of Stake, Round Robin, Proof of Authority, Proof of Elapsed Time and Byzantine fault-tolerant variants [14, 39]. Each node in a network has a chance to publish an entire block, but competes with other nodes to receive a prize, most likely a financial reward [39]. The competing nodes are called miners and currently receive a reward of 12.5 Bitcoins, or about 69,000 US dollars [5].
No consensus model is perfect, and a model is usually suitable only for a small number of usage scenarios. One of the most popular models is the Proof of Work (PoW) consensus model, where nodes solve a computationally intensive puzzle and their solution is the proof that they have performed work. The puzzle complexity is variable and adapted by the network itself, in order to achieve a solution in approximately 10 min. In Proof of Stake (PoS), all users that create blocks are obligated to invest a certain amount of money (cryptocurrency). If the created block can be validated, the network returns them the whole amount.
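The PoW puzzle can be sketched in a few lines of Python: a miner searches for a nonce such that the block hash falls below a target, and the target (difficulty) is what the network tunes to keep the block interval near 10 min. This is a toy illustration with an arbitrary difficulty, not Bitcoin's exact target encoding:

```python
import hashlib

def mine(header: str, difficulty_bits: int) -> int:
    """Search for a nonce so that sha256(header + nonce) < target."""
    target = 2 ** (256 - difficulty_bits)  # smaller target = harder puzzle
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{header}{nonce}".encode()).hexdigest()
        if int(digest, 16) < target:
            return nonce  # proof that roughly 2**difficulty_bits hashes were tried
        nonce += 1

nonce = mine("parent-hash|tx-root|timestamp", difficulty_bits=20)
print("found nonce:", nonce)
```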
PoW is often criticized for its enormous resource (power) consumption. Miners' processing power is six to eight times greater than that of today's 500 most powerful supercomputers [31]. These resources are spent on solving unimportant, virtual mathematical problems. Current trends in blockchain development are aimed at finding real-life problems in science that can be used instead of the original puzzle and in that way increase a blockchain's usefulness. For example, the Folding@home project simulates biomedical processes, SETI@home processes data from telescopes, and there are some attempts to implement solving NP-complete problems [7].
Current blockchain systems can be classified as public, private and consortium blockchains, which differ in consensus models, efficiency, etc. [7]. A public blockchain is completely open, permitting users from the whole world as its participants, and therefore has the greatest number of active systems and nodes. Private blockchains are mostly limited to one organization, and consortium blockchains to a selected set of nodes. Wang et al. [37] have made a comparison between these classes based on permissions, immutability, efficiency, centralization and consensus process. It is shown that the public blockchain has the most suitable characteristics for security, authentication and equality, but very low efficiency. Efficiency is a measure of block propagation, i.e. the time required to create a new block of transactions. Private and consortium blockchains have high efficiency, sacrificing integrity and increasing the risk of data tampering.
One of the extensions of blockchain functionality is the smart contract, which marks the beginning of the Blockchain 2.0 era. A smart contract contains functions and state data, and is executed within the blockchain network. That way, a blockchain can perform additional operations, like storing data, performing calculations, sending funds to accounts and exposing data to the public [39], and a network can thus be developed or adapted to support different procedures or applications.
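As a minimal illustration of the "functions plus state data" idea (a Python analogy of our own, not Ethereum's Solidity semantics), a contract can be modeled as state that only its published functions may mutate, with every call recorded like a transaction:

```python
class EscrowContract:
    """Toy smart contract: state data plus functions executed by every node."""

    def __init__(self, buyer: str, seller: str, amount: int):
        self.state = {"buyer": buyer, "seller": seller,
                      "amount": amount, "released": False}
        self.log = []  # every call would be a transaction on-chain

    def release(self, caller: str) -> None:
        # Only the buyer may release the escrowed funds, and only once.
        if caller != self.state["buyer"] or self.state["released"]:
            raise PermissionError("release not allowed")
        self.state["released"] = True
        self.log.append(("release", caller))

contract = EscrowContract("alice", "bob", amount=10)
contract.release("alice")
print(contract.state, contract.log)
```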

3 Applications of Blockchain Technology

Blockchain applications can be observed by following the evolution of blockchain technology. First, blockchain was known only as the technology standing behind digital currencies. Because of its characteristics, it was then applied in the whole financial sector. In recent years, blockchain technology can be found in various areas of everyday life, such as education, culture, healthcare etc. [38]. However, most blockchain applications and models are still developed in the financial sector, while the benefits of the technology are noticed, for example, in transportation, where it helps improve the tracking and recording of product paths, and in healthcare, where it makes health records easier to manage [3].
One of the sectors where blockchain can be seen not only as a technology that should be applied, but as a technology that must be applied soon because of the big increase in use, is online learning and education in general. Online education has become more popular over the past years, but there are many negative aspects to it. As everything is online, students' privacy is questionable, mostly in terms of students' academic achievements and available student works. Also, sharing learning materials among students and teaching materials among teachers can be challenging regarding security. Maybe the most important part of online education is knowledge and ability tests, and many doubt the credibility of the scores students achieve when being examined online. Sharing data and privacy challenges are the main reasons why new data storing and sharing methods should be included in online education systems, where data should not be available to the public or any other interested parties. Blockchain technology is expected to decrease the possibility for the public to get access to students' work, profiles or any relevant data.
Another important feature that is available in online education because of blockchain technology, and is of great importance, is evidence of finished online courses. In other words, it is possible to issue valid certificates that prove that a specific student finished an online course. Online education platforms save all the relevant data, such as information about the course, teachers, students, students' grades, date of exam, etc., and then encrypt the data, so that digital certificates with all the data can be given to the student. The student can then give the digital certificate to an employer or any other interested institution, which can then access all the information about the person (in this case, the former student) by using the public key [35]. Not only is sharing information easier, but so is recovering a certificate in case of losing one.
Using blockchain in healthcare also has great potential. Hölbl et al. [16] conducted research on currently active examples of blockchain technology implementation in healthcare systems and on prototypes made for academic research. They outlined that blockchain technology could contribute to healthcare, as it could be used in exchanging medical records, drug prescriptions, risk data management concerning patients and medicines, etc. It could also make patient data more secure, as these data are extremely important and sensitive.
Nowadays, patient-driven interoperability is the desirable approach in healthcare, instead of institution-driven interoperability. However, patient-driven interoperability raises many questions, for example privacy, security, patient engagement, legal procedures and regulations etc. Blockchain technology could solve some of the problems mentioned regarding data transactions, privacy and the security of medical and patients' data [13]. Since it is already recognized among researchers that blockchain technology could improve healthcare systems, there is an increasing amount of research connected to it. There are many descriptive works on the subject, but not many prototype implementations [16]. One of the main reasons is that blockchain technology itself should be upgraded so that it could respond to all the challenging needs of health systems.
Many researchers claim that blockchain technology can help decentralize standard voting systems [36]. Although blockchain technology has many advantages regarding security and privacy, taking previous research into consideration, it is concluded that blockchain technology still has too many weaknesses to be implemented in voting systems. At first glance, implementing blockchain technology could make voting easier, and it could reduce costs and the number of people engaged in the whole process. On the other hand, it could have a serious impact on privacy and data security.
As can be seen in the financial sector, education or healthcare, personal data security is often a problem in modern technology. Blockchain was first used as a technology in cryptocurrency, and it has proven that it can be trustworthy. Zyskind et al. [43] developed a platform that enables users to own and control their data, but also lets companies provide customized services for them. It enables users to be aware of all the situations when their personal or other important data is collected, but the users will always remain the real owners of their data. On the other hand, it makes it easier for companies to work, because they do not have to fear they will violate a privacy policy. The decentralized platform they developed using blockchain technology makes working with sensitive data easier when taking legal rights and other regulations into consideration.
Zikratov et al. [42] explained how blockchain technology can help provide data integrity. It can be done using transactions and authentication, so that data integrity security is increased. They also mentioned disadvantages of using blockchain, as the system is hard to operate because sufficient computing power is not always available. Also, the key that is used to encrypt the content could be cracked using a brute-force approach.
Feng et al. [12] also recognized protecting privacy as one of the greatest challenges when using blockchain technology. They introduced several methodologies that should help protect privacy, and they also highlighted their disadvantages together with the practical implementation of the methods. The security objectives in most of the methodologies are blurring the transactional relationship and hiding the origin and/or relationship. The main disadvantages of the observed methodologies are waiting delay, no protection of data and/or transaction target, storage overhead and insufficient anonymity. There are many known advantages of blockchain technology, but the authors pointed out that it will not be possible to make the most of the technology until the privacy and security issues are completely resolved.
Among all applications of blockchain, it is also argued that blockchain technology could help improve the vehicular ecosystem. Smart vehicles are connected with vehicle owners, car manufacturers, road infrastructure etc.; in other words, they are connected to the Internet. As they are highly connected, it is hard to secure them, mostly because of the large amount of data. The security of the vehicle is then endangered, but so is the security of the passengers. Centralization, lack of privacy and safety threats are the main weaknesses of the security and privacy methods currently used in smart vehicles. Dorri et al. [10] suggested implementing a security architecture based on blockchain technology. Public keys provide security, and as decentralization is one of the most common blockchain technology properties, the problem with centralized control is also solved in that case. A comparison of blockchain technology and conventional technologies is made for insurance, electric vehicles, wireless remote software and car-sharing services. The advantages of using blockchain technology are distributed data exchange, secured payment, user data privacy, and distributed authorization.

4 Data Protection and Privacy

Blockchain is one of the core technologies in the fields of finance, business and science. There are some differences in the perception of its security and privacy: on the one hand, blockchain is used because of its improvements in security and privacy, but on the other hand, there exist certain scenarios where its usage is not yet recommended.
There are many reports of security incidents in blockchain systems. This technology relies on the existing infrastructure and computer equipment, making it vulnerable to all "ordinary" or "classical" attacks. Sengupta et al. [33] identified the most relevant attacks and grouped them into four categories: physical, network, software and data attacks. Some of them include Man in the Middle, Sybil, spyware, worms, etc. There are known, more or less successful, countermeasures for them, but they are not directly related to blockchain technology.

4.1 Vulnerabilities and Attacks


There are some vulnerabilities arising from the blockchain architecture, consensus models and protocols. These vulnerabilities are used to perform attacks on the blockchain network and often result in users' financial damage. For example, in 2013, hackers stole over 4,000 bitcoins from a company that stores cryptocurrencies in wallets [24]. In 2017, over seven million dollars in the Ethereum cryptocurrency were stolen from the startup CoinDash [41]. The same year, a crucial system component was destroyed, resulting in inaccessible funds in almost 600 wallets worth over 500,000 US dollars [28].
Some of the most fundamental attacks on the blockchain network are double spending and the eclipse attack. Double spending happens when an attacker buys goods from a merchant: he creates a transaction and waits until it appears in the next block, and when it appears, he takes the purchased goods. He then releases two more blocks on a longer, privately mined branch, one containing a transaction that transfers the same funds to a second address of the attacker. The eclipse attack exploits bandwidth limitations in the network, which cause devices not to be directly connected to all or many other devices. The attacker exploits this characteristic and floods the target with his own IP addresses, making the victim's device connect only to the attacker's devices. That way, the attacker can send his victims incorrect data.
Zamani et al. [40] reviewed 38 security incidents with the aim of identifying their platform and root cause. The result of their research shows that only a small number of incidents are caused by protocols, insider threats and social engineering. Application vulnerability was found in 7 incidents and server or infrastructure breach in 12, making them the causes of more than half of the observed incidents. Application vulnerability refers to flaws and errors in functionality, and most of them are a result of using smart contracts, i.e. there is a programming or logical error allowing access to other people's wallets and data, theft, and unauthorized money generation. For example, Zerocoin had a programming error allowing an attacker to generate multiple spends that were then sold and withdrawn [18]. An example of an infrastructure breach is the CoinDash website hack that allowed an Ethereum user address to be changed to the hacker's [21].
Li et al. [22] made a taxonomy of blockchain's risks and their causes, separately analysing systems with and without smart contracts. They enumerated 5 risks that both systems have in common (Blockchain 1.0 and 2.0): 51% vulnerability, private key security, criminal activity, double spending, and transaction privacy leakage; another 4 risks they found are related to smart contracts (exclusive to Blockchain 2.0).
The 51% vulnerability is a vulnerability in the consensus mechanism that is used to confirm new transaction blocks. In a Proof of Work consensus mechanism, when a user or a pool of users controls more than half of the computing resources in a network, they gain control of the entire blockchain. An attacker is then able to present his own chain as the genuine one, which enables performing the double spending attack [32]. Users are usually grouped into mining pools (AntPool.com, BW.com, NiceHash, GHash.io etc.) because it is more likely for a pool to solve a puzzle than for an individual. The prize for solving a puzzle is then distributed to all users in the pool. Proof of Stake and Delegated Proof of Stake are also vulnerable to this attack, even if a pool has less than 51% of the computing power. The loss from this attack in the Bitcoin network amounts to 18 million US dollars [32].
The private key in a blockchain represents the user's identity and security credential. It is not generated by a third-party agency, but by the user themselves. It has been discovered that the ECDSA algorithm is vulnerable when it does not generate the required randomness, which allows an attacker to recover the user's private key. Because there exists no centralized trusted third-party organization, it is difficult to track this kind of activity.
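The randomness failure mentioned above can be made concrete: if a signer reuses the per-signature nonce k for two different messages, simple modular algebra recovers the private key. The following self-contained Python sketch (a didactic illustration over secp256k1, not production code) signs two messages with the same k and then recovers the key d from the two signatures:

```python
import hashlib

# secp256k1 domain parameters (the curve used by Bitcoin)
p = 2**256 - 2**32 - 977
n = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def ec_add(P, Q):
    """Point addition on y^2 = x^3 + 7 over GF(p); None is the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    if P[0] == Q[0] and (P[1] + Q[1]) % p == 0:
        return None
    if P == Q:
        lam = 3 * P[0] * P[0] * pow(2 * P[1], -1, p) % p
    else:
        lam = (Q[1] - P[1]) * pow(Q[0] - P[0], -1, p) % p
    x = (lam * lam - P[0] - Q[0]) % p
    return (x, (lam * (P[0] - x) - P[1]) % p)

def ec_mul(k, P):
    """Double-and-add scalar multiplication."""
    R = None
    while k:
        if k & 1:
            R = ec_add(R, P)
        P = ec_add(P, P)
        k >>= 1
    return R

def h(msg: bytes) -> int:
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % n

def sign(z: int, d: int, k: int):
    r = ec_mul(k, G)[0] % n
    s = pow(k, -1, n) * (z + r * d) % n
    return r, s

d, k = 0xC0FFEE, 123456789            # private key and (reused!) nonce
z1, z2 = h(b"pay 1 BTC to Alice"), h(b"pay 1 BTC to Bob")
r1, s1 = sign(z1, d, k)
r2, s2 = sign(z2, d, k)               # same k => r1 == r2, which betrays the reuse

# From s1 - s2 = k^{-1}(z1 - z2) mod n, first recover k, then d.
k_rec = (z1 - z2) * pow(s1 - s2, -1, n) % n
d_rec = (s1 * k_rec - z1) * pow(r1, -1, n) % n
assert (k_rec, d_rec) == (k, d)       # private key fully recovered
```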

4.2 Improving Blockchain Security


Although blockchain security weaknesses have been identified, the scientific and technical community constantly works on solving different issues and proposing new approaches. This chapter enumerates those that successfully solve the most critical security issues or that have an important impact on other solutions.

Karame et al. [19] analysed double spending attacks and discovered that the techniques recommended by Bitcoin developers are not always effective. They performed a detailed analysis of the requirements for this attack to occur and proved that proposed techniques like introducing listening periods, promoting regular users to observers, etc. have difficulties detecting double-spent transactions in certain scenarios. They argue that it is crucial to implement an effective detection technique against this attack and proposed their own modification to the current Bitcoin implementation. They performed an evaluation and tests that seem to show great success.
Wang et al. [37] claim that blockchain can even be used to improve security in distributed networks, because the existing centralized solutions for malware detection are themselves vulnerable to attacks. Noyes [27] proposed an interesting anti-malware solution that enables users in a blockchain to distribute virus patterns among themselves. The solution is called BitAV, and the results of the performed testing showed that it can improve scanning speed and enhance tolerance to denial-of-service attacks.
One of the approaches dealing with the private key risk is multisig. A multisignature scheme means that a group of signers can together produce a compact signature of a single transaction [23]. When using multisig, a transaction is allowed only when signed multiple times, and it can be used as additional wallet protection. For example, a transaction is not accepted if it is not signed with both the user's personal key and the signature of the online wallet site [29].
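A minimal sketch of the m-of-n acceptance policy follows (our own illustration; signature verification is abstracted to a boolean, where a real wallet would check an ECDSA or Schnorr signature against each public key):

```python
def multisig_valid(tx_sigs: dict, authorized: set, m: int) -> bool:
    """Accept a transaction only if at least m of the n authorized
    parties produced a valid signature over it."""
    valid = {signer for signer, ok in tx_sigs.items()
             if signer in authorized and ok}
    return len(valid) >= m

# 2-of-2 policy: the user's personal key AND the online wallet site must sign.
authorized = {"user_key", "wallet_site_key"}
print(multisig_valid({"user_key": True}, authorized, m=2))             # False
print(multisig_valid({"user_key": True, "wallet_site_key": True},
                     authorized, m=2))                                 # True
```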
Park and Park [29] have proposed a combination of blockchain and cloud computing. While the security of storing and transmitting data in cloud computing has already been studied in detail, it lacks privacy protection and anonymity. Because blockchain shows good characteristics in this area, it could be combined with cloud computing and thus upgraded to a more secure service. This has great potential in healthcare for storing patient data in off-chain storage. Blockchain transactions are relatively small and linear, and are not designed for storing large files, especially images and scans. These are then stored in a distributed database or in a cloud (off-chain), and immutable hashes of this data are stored on-chain, guaranteeing their authenticity.
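The off-chain pattern is easy to sketch: only a digest of the large file goes on-chain, and a later reader re-hashes the off-chain copy to verify it was not altered (a schematic Python illustration; the two dictionaries stand in for cloud storage and for publishing a transaction):

```python
import hashlib

cloud_storage = {}   # stands in for off-chain storage (cloud or distributed DB)
chain = []           # stands in for immutable on-chain records

def store(file_id: str, data: bytes) -> None:
    cloud_storage[file_id] = data                              # large blob off-chain
    chain.append((file_id, hashlib.sha256(data).hexdigest()))  # digest on-chain

def verify(file_id: str) -> bool:
    data = cloud_storage[file_id]
    on_chain_digest = dict(chain)[file_id]
    return hashlib.sha256(data).hexdigest() == on_chain_digest

store("mri-scan-42", b"...large medical image bytes...")
cloud_storage["mri-scan-42"] += b"tampered"
print(verify("mri-scan-42"))  # False: off-chain copy no longer matches the chain
```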
Mechanisms that ensure users' privacy are also being researched. The Self-Sovereign Identity (SSI) model is a successor to the user-centric approach that gives users the possibility to share their identity across different services. The SSI model preserves the right to partial disclosure of someone's data and identity, and enables users to have control over their personal data. Personal data in this model is no longer stored in a raw format across different online services, enabling information about users' actions, transactions and general communication to be anonymized. The next step in identity model development is implementing self-sovereignty in the blockchain [2], which, besides the self-sovereign mechanism, enables controlled access to users' identities and data for anyone in a blockchain network. This includes an exchange of digital assets between users, like documents, attributes and claims, without the need for a central, third-party authority.

5 Conclusion

This paper describes the blockchain technology and its most common usage scenarios in different fields, like science, education and business. Blockchain, because of its architecture, underlying technology and protocols, shows some very desirable characteristics for modern information systems and guarantees data immutability, trust and transparency in an untrusted distributed environment.

It makes no special requirements of its users, because its implementation is based on common protocols and technology. This also makes it vulnerable to the known security issues of those technologies, and many authors have also identified vulnerabilities in the blockchain itself, caused by issues in consensus mechanisms, cryptography algorithms and smart contracts. Each security issue has a more or less successful solution, and some of them are enumerated in this paper.

Even though various solutions have been found and tested, each of them focuses on a single problem. Little research has been done on blockchain network behavior in conditions where several solutions are implemented simultaneously. The challenges of the blockchain technology still exist, but it has a large user, developer and science community and is very attractive nowadays, not just for research but for usage in real and complex information systems. It has significantly improved, matured and stabilized, and authors generally agree that its advantages and possibilities far surpass its disadvantages.

References
1. Al-Omari, H., Al-Omari, A.: Building an e-government e-trust infrastructure. Am. J. Appl.
Sci. 3(11), 2122–2130 (2006). https://doi.org/10.3844/ajassp.2006.2122.2130
2. Bernabe, J.B., Canovas, J.L., Hernandez-Ramos, J.L., Moreno, R.T., Skarmeta, A.: Privacy-
preserving solutions for Blockchain: review and challenges. IEEE Access 7, 164908–164940
(2019)
3. Beck, R., Avital, M., Rossi, M., Thatcher, J.: Blockchain technology in business and
information systems research. Bus. Inf. Syst. Eng. 59, 381–384 (2017)
4. Bitcoin – Open source P2P money (n.d.). https://bitcoin.org/en/
5. BitInfoCharts. Bitcoin (BTC) price stats and information. BitInfoCharts (2019). https://
bitinfocharts.com/bitcoin/
6. Blaze, M., Feigenbaum, J., Ioannidis, J., Keromytis, A.D.: The role of trust management in
distributed systems security. In: Vitek, J., Jensen, C.D. (eds.) Secure Internet Programming.
LNCS, vol. 1603, pp. 185–210. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-
48749-2_8
7. Buterin, V.: On Public and Private Blockchains (2015). https://blog.ethereum.org/2015/08/
07/onpublic-and-private-blockchains/
8. Cao, W., et al.: PolarFS: an ultra-low latency and failure resilient distributed file system for
shared storage cloud database. Proc. VLDB Endow. 11, 1849–1862 (2018)
9. Crosby, M., Nachiappan, P., Verma, S., Kalyanaraman, V.: BlockChain technology: beyond
bitcoin. Appl. Innov. Rev. 2, 7–19 (2016)
10. Dorri, A., Steger, M., Kanhere, S.S., Jurdak, R.: BlockChain: a distributed solution to
automotive security and privacy. IEEE Commun. Mag. 55(12), 119–125 (2017). https://doi.
org/10.1109/MCOM.2017.1700879
11. EMC: The Digital Universe of Opportunities: Rich Data and the Increasing Value of the
Internet of Things. EMC (2014). https://www.emc.com/leadership/digital-universe
12. Feng, Q., He, D., Zeadally, S., Khan, K., Kumar, N.: A survey on privacy protection in
blockchain system. J. Netw. Comput. Appl. 126, 45–58 (2018)
13. Gordon, W., Catalini, C.: Blockchain technology for healthcare: facilitating the transition to
patient-driven interoperability. Comput. Struct. Biotechnol. J. 16, 224–230 (2018)
14. Gramoli, V.: From blockchain consensus back to Byzantine consensus. Future Gener.
Comput. Syst. (2017)
15. Hajirahimova, M., Aliyeva, A.: About big data measurement methodologies and indicators.
Int. J. Modern Educ. Comput. Sci. 9, 1–9 (2017)
16. Hölbl, M., Kompara, M., Kamisalic, A., Nemec Zlatolas, L.: A systematic review of the use
of blockchain in healthcare. Symmetry 10(10), 470 (2018)
17. Hoehle, H., Huff, S., Goode, S.: The role of continuous trust in information systems
continuance. J. Comput. Inf. Syst. 52(4), 1–9 (2012)
18. Insom, P.: Zcoin's Zerocoin Bug explained in detail. Zcoin (2017). https://zcoin.io/zcoins-
zerocoin-bug-explained-in-detail/
19. Karame, G., Androulaki, E., Capkun, S.: Double-spending fast payments in bitcoin. In:
Proceedings of the ACM Conference on Computer and Communications Security, pp. 906–
917 (2012)
20. Lee Kuo Chuen, D.: Handbook of Digital Currency, 1st edn. Elsevier, Amsterdam (2015)
21. Leyden, J.: CoinDash crowdfunding hack further dents trust in cryptotrading world. The
register (2017). https://www.theregister.com/2017/07/18/coindash_hack/
22. Li, X., Peng, J., Chen, T., Luo, X., Wen, Q.: A survey on the security of blockchain systems.
Future Gener. Comput. Syst. (2017)
23. Ma, C., Jiang, M.: Practical lattice-based multisignature schemes for blockchains. IEEE
Access 7, 179765–179778 (2019). https://doi.org/10.1109/ACCESS.2019.2958816
24. Mcmillan, R.: $1.2M Hack Shows Why You Should Never Store Bitcoins on the Internet.
Wired (2013). https://www.wired.com/2013/11/inputs/
25. Nakamoto, S.: Bitcoin: A Peer-to-Peer Electronic Cash System. Bitcoin.org (2009). https://
bitcoin.org/bitcoin.pdf
26. Narayan, S., Chandy, J.: Parity redundancy in a clustered storage system. In: 4th
International Workshop on Storage Network Architecture and Parallel I/Os SNAPI, pp. 17–
24 (2007)
27. Noyes, C.: Bitav: fast anti-malware by distributed blockchain consensus and feedforward
scanning (2016). arXiv preprint arXiv:1601.01405
28. Parity Technologies. A Postmortem on the Parity Multi-Sig Library Self-Destruct. Parity
(2017). https://www.parity.io/a-postmortem-on-the-parity-multi-sig-library-self-destruct/
29. Park, J., Park, J.: Blockchain security in cloud computing: use cases, challenges, and
solutions. Symmetry. 9, 164 (2017)
30. Pavlou, P.A., Fygenson, M.: Understanding and predicting electronic commerce adoption:
an extension of the theory of planned behavior. MIS Q. 30(1), 115–143 (2006)
31. Santos, M.: Not even the top 500 supercomputers combined are more powerful than the
Bitcoin network. 99 Bitcoins (2019). https://99bitcoins.com/not-even-the-top-500-
supercomputers-combined-are-more-powerful-than-the-bitcoin-network/
32. Sayeed, S., Marco-Gisbert, H.: Assessing blockchain consensus and security mechanisms
against the 51% attack. Appl. Sci. 9(9), 1788 (2019)
33. Sengupta, J., Ruj, S., Dasbit, S.: A comprehensive survey on attacks, security issues and
blockchain solutions for IoT and IIoT. J. Netw. Comput. Appl. (2019)
34. SINTEF. Big Data, for better or worse: 90% of world's data generated over last two years.
ScienceDaily (2013). www.sciencedaily.com/releases/2013/05/130522085217.htm
35. Sun, H., Wang, X., Wang, X.: Application of blockchain technology in online education. Int.
J. Emerg. Technol. Learn. (iJET). 13, 252 (2018)
36. Taş, R., Tanrıöver, Ö.Ö.: A systematic review of challenges and opportunities of blockchain
for e-voting. Symmetry 12(8), 1328 (2020). https://doi.org/10.3390/sym12081328
37. Wang, H., Zheng, Z., Xie, S., Dai, H., Chen, X.: Blockchain challenges and opportunities: a
survey. Int. J. Web Grid Serv. 14, 352–375 (2018)
38. Wu, J., Tran, N.: Application of blockchain technology in sustainable energy systems: an
overview. Sustainability 10, 3067 (2018)
39. Yaga, D., Mell, P., Roby, N., Scarfone, K.: Blockchain Technology Overview. National
Institute of Standards and Technology Internal Report 8202 (2019)
40. Zamani, E., He, Y., Phillips, M.: On the security risks of the blockchain. J. Comput. Inf.
Syst. (2018)
41. Zhao, W.: $7 Million Lost in CoinDesk ICO Hack. Coindesk (2017). https://www.coindesk.
com/7-million-ico-hack-results-coindash-refund-offer
42. Zikratov, I., Kuzmin, A., Akimenko, V., Niculichev, V., Yalansky, L.: Ensuring data
integrity using blockchain technology. In: 2017 20th Conference of Open Innovations
Association (FRUCT), pp. 534–539 (2017)
43. Zyskind, G., Nathan, O.: Decentralizing privacy: using blockchain to protect personal data.
In: 2015 IEEE Security and Privacy Workshops, pp. 180–184. IEEE (2015)
iMedMS: An IoT Based Intelligent
Medication Monitoring System
for Elderly Healthcare

Khalid Ibn Zinnah Apu, Mohammed Moshiul Hoque(B), and Iqbal H. Sarker

Department of Computer Science and Engineering, Chittagong University of Engineering and Technology (CUET), Chittagong 4349, Bangladesh
{khalidex,mmoshiul 240,iqbal}@cuet.ac.bd

Abstract. In recent years, the ageing population has been growing swiftly, and ensuring proper healthcare for the elderly and physically challenged people has gained much attention from academic, medical and industrial experts. Many older people undergo sickness or disability, making it challenging for them to look after themselves concerning timely medicine taking. Any trivial lapse, such as forgetting to take medications or taking doses at the wrong time, may cause potentially disastrous problems. This paper presents an intelligent medication system (called 'iMedMS') using IoT technology to monitor whether the patient is taking medicine according to the physician's prescription and schedule. The system includes an alert system that notifies the patient about the exact time of medication and a feedback system that sends an SMS to the patient, physician or caregiver when any pillbox is empty. Moreover, the system embeds seven physical buttons to express the patient's various feelings after taking medicine. Several experimental results show the functionality of the proposed medication system.

Keywords: Health informatics · Internet of Things · Intelligent medication monitoring · Elderly healthcare · Medication reminder


1 Introduction
The proportion of older adults living alone has increased dramatically in recent years. Many of them undergo various illnesses or disabilities, making it challenging to manage their healthcare. There are many diseases where taking medicine at a scheduled time is considered of utmost importance. Any trivial carelessness, such as disregarding medications or taking them at an unscheduled time, may cause disastrous difficulties. Elders are more affected by the timing of taking a particular drug than others; in order to prevent any dysfunction or illness, appropriate timing is an obligation [2,3]. According to WebMD research, about 46% of people forget to take their medication at the scheduled time, with older adults making up most of that proportion. Moreover, the elderly are usually prescribed various drugs that need to be taken at specific times. Keeping track of taking the correct drugs at a specific time each day can become an arduous practice for the elderly, as it is not as apparent as it could be for a younger individual.
Older people usually suffer from declining sight, memory or reasoning capabilities, which decrease proportionately with age. Thus, it is very challenging for older people to remember which pill to take at which time. A human caregiver or nurse may be employed to care for or monitor the older patient, but past surveys report a case in which a nurse gave a patient a paralytic drug in place of the antacid that was ordered by the physician, causing the patient's demise [4]. It is also challenging for physicians or caregivers to know up-to-the-minute information about a patient. Therefore, an IoT based medication monitoring system can be a helpful solution for older people. IoT is considered a practical solution in health monitoring to track any patient's health status in real time. It ensures that the individual's physiological data is secured in the cloud and reduces the need to visit clinics for routine checkups. Moreover, the patient's health can be observed and diseases diagnosed by any physician remotely. For highly infectious diseases such as COVID-19, it is always a more reliable idea to monitor affected patients using remote health monitoring technology. Thus, an IoT based device or technique is the best option for the patient, caregiver and physician [5].
Several medicine containers or pillboxes are available in the market, but most of them have restricted usage and are not adapted for the elderly due to their large size, high price and poor user-friendliness. Moreover, it is essential to receive specific feedback from the patient after they consume a particular drug. This work proposes an intelligent medication intake monitoring system for the elderly and impaired users to manage their scheduled medicine effectively. The proposed system is easy to use, portable and cost-effective. The specific contributions are highlighted in the following:
– Develop an intelligent medication monitoring system using IoT devices and cloud-based technology.
– Develop a software interface to interact with the medication system, incorporating a medicine intake reminder module, a notification generation module, and a patient feedback module.
– Investigate the functionality of the proposed system using quantitative data collected from patients and doctors.

2 Related Work
Work on intelligent medication monitoring systems or smart pillboxes is scarce. Kiruthiga et al. [1] developed an IoT based medication system that can observe whether the proper amount of medicine is received by the patient at the scheduled time. This system cannot report the remaining amount of medicine, is not portable, and lacks a means to share the patient's feelings if any complication occurs after taking a particular pill. Some works presented a magic pillbox [6,7] that promotes both the monitoring and physiological features of a patient's healthcare. Nevertheless, it merely informs the patient to take pills according to her/his vital symptoms without extensive schedules, and this system cannot be used when the patient is away from the residence. Haitin and Asseo [8] present a medicine dispensing scheme comprising a pill cabinet and a pill-dosage dispensing system for single and multiple patients, but it provides limited health-monitoring services. Chen et al. [9] developed a microcontroller based pillbox that can deliver a pill at a pre-defined time. However, this system did not provide any means to record the time of the patient's pill intake. Huang et al. [10] proposed a smart pillbox based on the pill case technique, which supplies a pill from the pill case at a scheduled time. However, the proper functioning of this system solely depends on the availability of Internet connectivity. A design of an intelligent drug box proposed in [11] can remind the patient of his/her schedule for taking a medication. Minaam et al. [12] proposed a smart pillbox containing nine distinct sub-boxes as a medicine reservoir. However, this box works for nine distinct pills only, and a provision for sharing the patient's feedback is absent.
Dhukaram et al. [13] demonstrate a pill management system containing different kinds of medication schedules, but this system did not generate any notification or feeling-sharing component after taking a pill. Wang et al. [14] presented a smartphone-based medication intake reminder system that can alert the patient if he/she forgot to take a scheduled pill. However, this system cannot notify the doctor about the patient's status and compatibility issues with a particular medicine. Features for designing medication reminder apps have been proposed in [15]. This system generates too many short messages, which may be challenging to follow, and lacks a way to share any feeling if a complication arises after taking a pill. An IoT-based health monitoring system used numerous wearable sensors to capture various physiological data and share them with doctors and patients' relatives [16]. However, this system uses remote monitoring that is costly, and wearing so many sensors is not practical and increases discomfort. Bansal et al. [17] used a ZigBee transceiver to transfer physiological information within a small range. However, this system did not provide any feedback module. An Android-based solution was proposed by Lakshmanachari et al. [18], which gathers data from the patients and sends it to the data centre for further processing. Nevertheless, the proposed system does not contain any pill management module or alert system. Haider et al. [19] propose a medicine planner which can fill pills automatically into the pillbox. However, the patient feedback system and monitoring functions are missing in this system. The proposed medication intake monitoring system develops a pill container with an automatic cloud-based pill scheduling and patient feeling-sharing facility to address the previous methods' shortcomings.

3 Proposed System Architecture


This work aims to develop a tool that allows the owner to track each pill to ingest naturally and easily. No unusual training or detailed knowledge is required to run the tool. The device can log the pill name, the scheduled time, and the actual time of taking a pill. The tool is connected wirelessly through a cloud framework to record data and manage the patient's medication intake monitoring. Figure 1 shows an overview of the proposed medicine intake monitoring system. The proposed architecture comprises three main modules: (i) prescription generation, (ii) pill management and (iii) feelings sharing.

Fig. 1. Proposed architecture for IoT based medication system

Each of the modules interacts with the cloud-based system so that every interaction is updated in real time. Each user is tagged with a unique identification number for tracking him/her.

3.1 Smart Prescription Generation Module

Pre-processing prescriptions into the desired format is essential, as the data is used to implement the IoT system. The medicine name, dose and duration are written and saved in the cloud database system to interact with the Arduino. At the bottom of the prescription, various remarks are attached so that the doctor can suggest issues related to the patient's health. The prescription can be implemented on a digital pill dispenser system and is formatted as JSON metadata for further processing. Figure 2 shows an interface of the prescription generation module in which a doctor can write the name and dose of a specific drug. For example, the doctor's input 1+1+1 denotes 3 times a day (morning: 9.00 am, noon: 2.00 pm and night: 10.00 pm), and 0 indicates that no medicine is required at that specific time.

Fig. 2. Cloud based IoT integrated prescription interface

The prescription and patient data are stored on the cloud server in JSON format. The medicine parameter (M) can be represented as a function (Eq. 1), and the prescription in JSON is denoted by the symbol P. To generate the prescription, a function parameter for each medicine is defined by the doctor and pre-processed by the functions $M_1, M_2, \ldots, M_n$, where the subscript n denotes the medication number:

M_1 = \frac{F(M[C]) \,\|\, F(D) \,\|\, F(E)}{F(i_T)}    (1)

Here, $F(M[C])$ denotes a function to define the medicine category (i.e., capsule or tablet), $F(D)$ represents a function to specify the medicine intake duration in number of days, $F(E)$ denotes a function for the instruction on how to take the medicine (i.e., on an empty or full stomach), and $F(i_T)$ indicates a function to define the interval time of taking a dosage of the medicine (e.g., every 8 h or 3 times a day).
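A plausible shape for the stored JSON (the field names here are illustrative assumptions, not the paper's actual schema) encodes each medicine's category, duration, instruction and interval from Eq. 1 together with the 1+1+1-style dose pattern:

```python
import json

# Hypothetical prescription record following the Eq. 1 parameters.
prescription = {
    "patient_id": "P-0042",              # unique identification number
    "medicines": [
        {
            "name": "MedicineA",
            "category": "tablet",        # F(M[C])
            "duration_days": 14,         # F(D)
            "instruction": "full stomach",   # F(E)
            "dose_pattern": "1+1+1",     # morning 9 am, noon 2 pm, night 10 pm
            "interval_hours": 8          # F(i_T)
        }
    ]
}
print(json.dumps(prescription, indent=2))  # payload sent to the cloud server
```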

3.2 Medicine Management Module

The medicine management system is an integration of hardware and software. Various hardware components are used to develop the medication management module. A GSM module (SIM900) establishes a connection between the device and the prescription server. The system sends data to the Arduino Mega 2560, built on the ATmega2560 CMOS 8-bit microcontroller (MC). This MC provides 54 digital input/output pins, of which 15 can be used as PWM outputs, 16 analog inputs, 4 UARTs, a 16 MHz crystal oscillator, a USB connection, a power jack, an ICSP header, and a reset switch [14]. Figure 3 depicts the significant components of the medicine management module.
Fig. 3. Components of medicine management module

The IR sensor unit contains four components: an IR emitter, an IR receiver, the pill compartment, and an empty-pill detection unit. The input of the sensor emitter (the pill-taking schedule provided by the cloud) is given to the IR receiver. The empty detection unit checks whether the compartment is empty or not. If it is found empty, an SMS is sent to the relatives' phone via the GSM module. If the pill compartment is not empty, then a medicine intake alert is generated by the system. Figure 4 shows that the developed module includes four main constituents: pillbox, SIM 900 module, Arduino, and feeling sharing.

Fig. 4. Hardware implementation of smart pill box system
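The detection logic described above reduces to a simple branch, sketched below in Python as a host-side simulation (the function names and the SMS call are placeholders standing in for the Arduino/GSM firmware, not the authors' code):

```python
def send_sms(number: str, text: str) -> None:
    print(f"SMS to {number}: {text}")   # stands in for the SIM900 module

def on_schedule_trigger(compartment_count: int, relative_phone: str) -> str:
    """Simulate the IR empty-detection branch at a scheduled dose time."""
    if compartment_count == 0:
        # Empty compartment: notify the caregiver instead of alerting locally.
        send_sms(relative_phone, "Pillbox compartment is empty, please refill.")
        return "sms_sent"
    # Otherwise raise the local medicine-intake alert (buzzer + LED).
    return "alert_generated"

print(on_schedule_trigger(0, "+880-1XXXXXXXXX"))   # sms_sent
print(on_schedule_trigger(3, "+880-1XXXXXXXXX"))   # alert_generated
```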

3.3 Medicine Intake Alert Module


The alert system takes input from the prescription unit via the cloud server through the SIM900 GSM module. When a medication intake schedule activates, the device communicates with the server to activate the buzzer with an LED glowing. Thus, the patient is aware of the pill-taking time. When the alarm system gets activated, the specific compartment door is triggered automatically using a step motor governed by the Arduino [12]. The medication management system fetches the server's data and activates the alert system with a flashing LED according to the scheduled medicine intake time. The alarm activation depends on the time, quantity and medicine type, which can be expressed as in Eq. 2:

O(A) = \frac{M[qty] \,\|\, L(S)}{D(T)}    (2)

Here, $O(A)$ denotes the buzzer alert, $M[qty]$ represents the pill quantity (number of doses), $L(S)$ indicates the LED status on (1)/off (0), and $D(T)$ indicates the pill-taking time. For example, O(A) = 11 indicates that the alert is activated for the pill in row 1, column 1 to be taken from the compartment, with the buzzer activated, and O(A) = 21 indicates that the alert is activated for row 2, column 1, with the buzzer alarm.
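The two-digit alert codes in the example decode to a compartment position plus the buzzer/LED flags; a tiny helper (our own illustration of the stated convention) makes this explicit:

```python
def decode_alert(code: int) -> dict:
    """Decode O(A) codes such as 11 or 21 into a compartment position."""
    row, col = divmod(code, 10)
    return {"row": row, "column": col, "buzzer": True, "led": True}

print(decode_alert(11))  # {'row': 1, 'column': 1, 'buzzer': True, 'led': True}
print(decode_alert(21))  # {'row': 2, 'column': 1, 'buzzer': True, 'led': True}
```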

3.4 Notification Generation Module


If a specific drug is finished, the medicine box door automatically closes and alerts that the medicine stock is empty. When an alarm is triggered, a notification is sent to the patient through SMS using SIM900 to remind him/her of the medicine intake time. The system notifies the doctor about the dose and pill after the patient takes the medication, based on a predefined rule such as "If the patient X has NOT TAKEN a medicine at Day Y, Dose A, then send". Figure 5 shows a sample snapshot of a notification sent through SMS.
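The predefined rule can be read as a simple condition over the intake log; the sketch below (hypothetical data structures of our own, standing in for the cloud-side rule engine) fires a notification when a dose is missing:

```python
from typing import Optional

intake_log = {("P-0042", "day-3", "dose-A"): False}   # False = not taken

def check_rule(patient: str, day: str, dose: str) -> Optional[str]:
    """Rule: if patient X has NOT TAKEN medicine at day Y, dose A, then send."""
    if not intake_log.get((patient, day, dose), False):
        return f"Notify doctor/caregiver: {patient} missed {dose} on {day}"
    return None

print(check_rule("P-0042", "day-3", "dose-A"))
```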

3.5 Feeling Sharing Module

A patient may feel complications or discomfort after taking a medication. Thus, the device includes a feeling sharing module to share the patient's various feelings with the doctor or caregiver after receiving a prescribed medicine. The device embeds seven push buttons to convey seven kinds of feelings of the patient. The following patient's feelings are stored and sent to the doctor for taking necessary measures.

– Neutral: this button can be used when the patient does not feel any change in his/her health condition after medication.
– Good: this button can be used for a positive change in the patient's health condition.
– Better: this button is used while the patient is on the path to a cure.
– Pain: this button can be used if the patient feels an ache in any body part.
– Headache: this button can be used when the patient feels severe headaches.
– Vomiting: this button can be pushed in the case of nausea.
– Major complication: this button is used if the patient is in a dire situation or severe health condition.

Fig. 5. Sample SMS notification by iMedMS

4 Experiments
The developed iMedMS was evaluated for the efficiency and accuracy of the cloud-based prescription module and the intelligent IoT pill-management system. We used Linux CentOS 7, an 8-core Xeon CPU, and a 1 Gbps dedicated connection, including load balancing, to implement the cloud server. The doctor prepares prescriptions through the prescription generation module using the cloud server. The medication box interacts with the cloud system through the SIM 900 module and the Arduino. A buzzer connected to the Arduino starts to ring at the scheduled pill-taking time upon receiving a cloud server command. The system notifies the doctor and caregiver if the patient has taken the medicine, and the corresponding data is stored. The system also stores data related to any feelings shared by the patient. The system can also detect and notify if any of the desired pillboxes is empty.

The doctor initiates the process by completing prescriptions in the cloud-based system. Then the user device (iMedMS) automatically communicates with the cloud server and is prepared for alert generation. When the patient takes a pill, this data is sent to the cloud server via the GSM (SIM900)-GPRS modules. After a pill is taken, feeling-sharing data is collected from the patient and sent to the server.

4.1 Data Collection


The performance of iMedMS is analyzed after setting up the prescription in the cloud system, concerning the medication schedule, buzzer alarm, LED flashing, and the patient's feeling-sharing responses. The iMedMS was tested and simulated with 76 interacting patients and 11 doctors. A total of 8,567 medication trials was carried out over 14 days across 228 prescriptions, each having three medicine instances in 4 doses per day. Table 1 shows a summary of the data collected from iMedMS.

Table 1. Summary of collected data

Attributes            Values
Number of medicines   372
Number of doses       8567
Feedback received     1750
Total patients        76
Total prescriptions   228
Total doctors         11

\text{Accuracy} = \frac{\text{No. of correct events}}{\text{Total no. of events}} \times 100\%    (3)

4.2 Results
To evaluate the proposed system's performance, the success and failure ratios are measured for three functionalities: (i) prescription analysis and dose detection, (ii) buzzer and light interaction, and (iii) feedback propagation to the doctor via the cloud system. Equations 3–5 are used to measure the accuracy, success ratio (SR) and failure ratio (FR):

SR = \frac{\text{No. of correct events}}{\text{Total no. of events}} \times 100\%    (4)

FR = \frac{\text{No. of undetected events}}{\text{Total no. of events}} \times 100\%    (5)
The results indicate that the proposed system achieved a success rate of 93.75%, whereas it failed to perform its functionality in 6.25% of cases. The errors occurred due to communication failures between the IoT devices and the cloud server in a few cases.
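As a quick numeric check of Eqs. 3–5 (using the paper's aggregate counts only as an assumed illustration), the ratios are one-liners:

```python
def ratio(part: int, total: int) -> float:
    return part / total * 100

total_events = 8567                     # total medication trials (doses)
correct = round(total_events * 0.9375)  # events handled correctly (assumed split)
undetected = total_events - correct

print(f"SR = {ratio(correct, total_events):.2f}%")      # ~93.75%
print(f"FR = {ratio(undetected, total_events):.2f}%")   # ~6.25%
```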
Table 2 illustrates the results of these measures (SR and error rate (ErrR)) in performing the three functionalities, based on the experimental simulation. In prescription analysis and dosage detection, the error rate is calculated based on failures of prescription text processing and failures to separate the medication doses indicated by the doctor. In function (ii), the error rate is calculated based on whether the buzzer alarm and light interact within the specified medication dose time or not; in (iii), the error rate is calculated based on the feedback received by the doctor via the cloud system.
Table 2. Performance on different functions

Functionality                                           SR (%)   ErrR (%)
(i) Prescription analysis and dosage detection          95.50    3.10
(ii) Buzzer and light interaction                       93.80    2.50
(iii) Feedback propagated to doctor via cloud system    91.20    1.75

Additionally, the execution time and power consumption of the system were also measured. The execution time of the system refers to the time required (in ms) for triggering various events (such as run task, layout, function call and so on) in the cloud system. The event utilization time also shows the IoT communication breakdown of the data processing queue path in a cloud environment, where CPU utilization represents the IoT communication system's bandwidth. Parsing data (31.3%) and the run task (31.7%) utilized the largest portion (63% in total) of the total run-time events. Figure 6 shows the IoT communication with the server in the waterfall model: in communication with the server, JS, markup, CSS, fonts and other assets are processed as a waterfall, where CPU utilization and execution bandwidth are visualized in time and bar graphs.

Fig. 6. IoT communication with the cloud

Power consumption varies across different states while the sensor nodes are activated. The device consumes the lowest amount of power while it is in an idle state. The system needs a certain amount of power to be kept active, consumed in the idle mode for running the system. The sensor needs additional power to sense, process and transmit data. The power consumption increases with the increased amount of processing. Thus, the total power consumption ($P_c$) of the system can be calculated using Eq. 6:

P_c = P_i + P_s + P_p + P_t    (6)

The total power consumed by the system is 6.451 W. The highest power consumption occurred for data transmission ($P_t$), amounting to 2.686 W, where the idle state ($P_i$) consumed 1.136 W, the sensing state ($P_s$) consumed 1.166 W, and 1.436 W was consumed for processing ($P_p$).

5 Conclusion

This paper developed an IoT-based intelligent medication intake monitoring system called iMedMS. The developed system can generate a cloud-based prescription indicating the medicine name, schedule, and dosage amount. Alerts through a buzzer and SMS are generated to remind the patient of the medication schedule. The iMedMS also embeds a feeling sharing module to express seven kinds of feelings if any discomfort or complication occurs after taking any medicine. The cloud-server framework manages all communication between the patient and the system, in which relevant data can be recorded for further investigation. Although the proposed framework is inexpensive and automatic, some functionality should be included to improve performance. The whole system can be implemented in mobile apps with an interface to the cloud to enhance portability. Integration of iMedMS with computerized medical and private health records can be actualized to ensure real-time monitoring. The size of iMedMS may be reduced to be more realistic and manageable. Moreover, voice interaction can be incorporated for effortless operation.

References
1. Kiruthiga, S., Arunthadhi, B., Deepthyvarma, R., Divya Shree, V.R.: IOT based
medication monitoring system for independently living patient. Int. J. Eng. Adv.
Technol. 9(3), 3533–3539 (2020)
2. Sankar, A.P., Nevedal, D.C., Neufeld, S., Luborsky, M.R.: What is a missed dose?
Implications for construct validity and patient adherence. AIDS Care 19(6), 775–
780 (2007)
3. Sawand, A., Djahel, S., Zhang, Z., Naït-Abdesselam, F.: Multidisciplinary
approaches to achieving efficient and trustworthy eHealth monitoring systems. In:
Proceedings of IEEE/CIC International Conference on Communications in China
(ICCC), Shanghai, China, pp. 187–192 (2014)
4. Chelvam, Y.K., Zamin, N.: M3DITRACK3R: a design of an automated patient
tracking and medicine dispensing mobile robot for senior citizens. In: Proceedings
of 2014 International Conference on I4CT, Langkawi, Malaysia, pp. 36–41 (2014)
5. Almotiri, S.H., Khan, M.A., Alghamdi, M.A.: Mobile health (m-health) system in
the context of IoT. In: Proceedings of 2016 IEEE 4th International Conference on
FiCloudW, Vienna, Austria, pp. 39–42, August 2016

6. Wan, D.: Magic medicine cabinet: a situated portal for consumer healthcare. In:
Gellersen, H.-W. (ed.) HUC 1999. LNCS, vol. 1707, pp. 352–355. Springer, Heidel-
berg (1999). https://doi.org/10.1007/3-540-48157-5_44
7. Wan, D., Gershman, A.V.: Online Medicine Cabinet, US Patent. no. US6539281B2
(2003)
8. Haitin, D., Asseo, G.: Medication Dispensing System including Medicine Cabinet
and Tray Therefor. US Patent. no. US20040054436A1 (2006)
9. Chen, B.-B., Ma, Y.-H., Xu, J.-Li.: Research and implementation of an intelligent
medicine box. In: Proceedings of 2019 4th International Conference on IGBSG,
Hubei, China, pp. 203–205 (2019)
10. Huang, S.-C., Chang, H.-Y., Jhu, Y.-C., Chen, G.-Y.: The intelligent pill box-design
and implementation. In: Proceedings of 2014 IEEE International Conference on
Consumer Electronics, Taiwan, Taipei, pp. 235–236 (2014)
11. Shashank, S., Tejas, K., Pushpak, P., Rohit, B.: A smart pill box with remind and
consumption using IOT. Int. Res. J. Eng. Technol. 4(12), 152–154 (2017)
12. Minaam, D.S.A., Abd-ELfattah, M.: Smart drugs: improving healthcare using
smart pill box for medicine reminder and monitoring system. Future Comput.
Inf. J. 3(2), 443–456 (2018)
13. Dhukaram, A.V., Baber, C.: Elderly cardiac patients’ medication management:
patient day-to-day needs and review of medication management system. In: Pro-
ceedings of IEEE ICHI, Philadelphia, PA, USA, pp. 107–114 (2013)
14. Wang, M.Y., Tsai, P.H., Liu, J.W.S., Zao, J.K.: Wedjat: a mobile phone based
medicine in-take reminder and monitor. In: Proceedings of 2009 9th IEEE Inter-
national Conference on BIBE, Taichung, Taiwan, pp. 423–430 (2009)
15. Stawarz, K., Cox, A.L., Blandford, A.: Don’t forget your pill! designing effective
medication reminder apps that support users’ daily routines. In: Proceedings of
SIGCHI Conference on Human Factors in Computing Systems, Toronto, Canada,
pp. 2269–2278 (2014)
16. Valsalan, P., Baomar, T.A.B., Baabood, A.H.O.: IOT based health monitoring
system. J. Crit. Rev. 7(4), 739–742 (2020)
17. Bansal, M., Gandhi, B.: IoT based smart health care system using CNT electrodes.
In: Proceedings of 2017 ICCCA, Greater Noida, India, pp. 1324–1329 (2017)
18. Lakshmanachari, S., Srihari, C., Sudhakar, A., Nalajala, P.: Design and imple-
mentation of cloud based patient health care monitoring systems using IoT. In:
Proceedings of 2017 ICECDS, Chennai, India, pp. 3713–3717 (2017)
19. Haider, A.J., Sharshani, S.M., Sheraim, H.S., et al.: Smart medicine planner for
visually impaired people. In: Proceedings of ICIoT, Doha, Qatar, pp. 361–366
(2020)
Internet Banking and Bank Investment
Decision: Mediating Role of Customer
Satisfaction and Employee Satisfaction

Jean Baptiste Bernard Pea-Assounga(&) and Mengyun Wu

School of Finance and Economics, Jiangsu University, 301 Xuefu Road,


Zhenjiang 212013, China
{aspeajeanbaptiste,mewu}@ujs.edu.cn

Abstract. This study looked into the effects of internet banking and of staff and customer satisfaction on bank investment decisions using selected Congolese banks. A total of 1800 questionnaires were administered to the ten surveyed banks, out of which 1500, representing 83.33%, were considered. Using SPSS, SmartPLS, and Stata statistical software, the data were analyzed employing percentages of respondents, correlation analysis, and the System of Regression Equations approach. The overall findings show that internet banking has a positive effect on bank investment decisions, and that customer satisfaction and employee satisfaction partially mediate the nexus between internet banking and bank investment decisions, reinforcing prior studies and supporting generalization. Based on the findings, the following recommendation is offered to both individual and institutional investors: investors should be aware that several factors, including customer satisfaction and employee satisfaction, can influence their investment decision-making.

Keywords: Bank investment decision · Customer satisfaction · Employee satisfaction · Internet banking · Republic of Congo

1 Introduction

The internet and advanced innovations have influenced new business models and bank investment decisions [1, 2]. Internet banking is defined as a banking channel that lets consumers perform an extensive range of financial and nonfinancial services via a bank's website [2, 3]. Numerous banks deploy internet banking systems in an attempt to decrease costs while enhancing customer services [4, 5]. Despite the potential benefits that consumers may gain from internet banking, its implementation in a firm is limited and does not always meet expectations [5, 6]. To be innovative, organizations must effectively use and obtain input from many resources, including human capital, customers, etc. [7].
Traditional investment theory explains company investment through fluctuations in interest rates: lower market rates reduce the cost of capital and increase the number of profitable investment projects. However, research from several countries shows that investment decisions are mainly based on rules of thumb rather than standard financial models [8].

In practical terms, this may mean that investment choices are less sensitive to borrowing-cost changes than the traditional theory of investment suggests. Also, Hjelseth, Meyer, and Walle [9] stated that investment decisions are frequently based on simple criteria that disregard financing costs. A common financial assumption is that firms make investment decisions based on a required profit rate that is influenced by the cost of capital. A lower interest rate should raise the frequency of beneficial investments.
In the papers we have reviewed, many researchers focus only on the impact of internet banking on bank performance [4, 5], without considering employee satisfaction, customer satisfaction, and bank investment decision variables. Therefore, this study seeks to fill the gap by investigating the mediating role of Customer Satisfaction (CSAT) and Employee Satisfaction (ES) in the nexus between Internet Banking (IB) and Bank Investment Decision (BID) in the Congolese context.
Firms and individuals must make three types of financial decisions, namely capital, dividend, and investment decisions. Making investment decisions is a core duty for many companies, notably those in the financial sector. Choosing the correct mix of short-term and long-term investments enhances an organization's overall revenues. The increased use and frequent changes of technology, together with stakeholders' demands, have made these decisions more complex for financial and banking organizations. A study of this rising issue in the Republic of Congo will enable banks to better understand the complexities of financial investment decisions as well as how to boost their total profit through the right investment decisions.
The effects of not carrying out this research can be tremendous on banks, gov-
ernment, and the other stakeholders of the banks as well as academicians. For instance,
banks stand to lose a large amount of profit, through a couple of wrong decisions. This
could deny banks the funds needed to propel growth and expansions. Also, the con-
sequence of making the wrong decisions could deny the stakeholders the opportunity to
maximize their dividends and therefore could have an indirect impact on the banks’
dividend decisions. In addition, traditional financial theory posits that people make rational decisions, but at the same time some people make irrational decisions that affect their future [10]. Furthermore, the customers and the employees of the banks risk
losing service quality and good working conditions respectively if banks continuously
make wrong investment decisions through their lack of understanding of how the
growing technology in the industry and stakeholder demands affect investment deci-
sions [11]. The objective of this research is therefore to address the question of how
banks in the Republic of Congo are supposed to make investment decisions in a
growing competitive global financial industry, which is mostly driven by changing
technology and stakeholders’ satisfaction or demands.
In this paper, we examined how internet banking can favorably affect banks’
investment decisions by using SmartPLS-3, Stata, and SPSS to analyze data from a
survey of banking institutions in the Republic of Congo.
This study contributes to banks’ management and strategic innovations theory and
practice. The study also contributes to banking literature and practice in two ways. 1)
Better knowledge of how banking innovation services affect management and mar-
keting concepts like BID, ES, and CSAT. 2) Showing the role of ES and CSAT in the
relationship between IB and bank investment decisions. This research will help bank

and organization managers to develop their markets by boosting their technology and
innovations services. Lastly, the findings of this research are vital in developing
countries as they will encourage politicians and bank management to pursue policies
that foster technology and innovation in the banking sector, finally benefiting the
overall financial system.
The study is organized as follows: the first section provides background information and a brief introduction. The second section discusses the research literature. The third part describes the data source, sample, and analytical model used. The study's practical implications, limitations, recommendations for further research, and conclusion are discussed in the last section of the paper.

2 Review of the Literature and Formulation of Hypotheses

The review of the literature and hypotheses formulation are discussed in this part.

2.1 Literature Review and Conceptual Framework


According to scholars, human capital (employees) and customers are the most important sources of competitive advantage [7]. Using human capital is frequently more significant than merely possessing it for value creation [12], performance [13], and risk recovery [7]. An investment is a process of putting money into various financial resources or institutions with uncertain prospective returns.
Nandini [14] defined investment as a financial commitment with favorable returns.
Investment returns might be in the form of financial gains, regular income, or a
combination of both. The daily investment decisions made by individuals or institu-
tions such as banks determine tomorrow’s losses or profits [14]. However, not every
investment pays off since investors are not always rational [10]. Innovation technology,
stakeholders’ satisfaction, knowledge, and judgment all affect bank investment deci-
sions. To invest may have significant effects on the future of banks if they know when,
where, and how to invest. Essentially, everyone makes investments at some point in
their lives, whether it is saving or depositing money in a bank, buying stocks, insur-
ance, equipment, or building the infrastructures, etc. Nevertheless, every investment
entails risks as well [14]. Antony and Joseph [15] described investment decisions as a
psychological procedure since organizations and individuals make decisions based on
available options. Financial experts commonly utilize basic, specialized, and judg-
mental investment investigations. Technical, fundamental, and instinctive analyses all
rely on established financial hypotheses that are in line with rationality [16]. A study by
Flor and Hansen [17] stated that Technological advances impact a firm’s investment
decision, as they affect the investment cost.
The empirical literature on this subject is extensive, as business investment has long
been a focus of financial research. Corporate investments and CEO confidence, effi-
ciency and investment management forecast [18], and volatility and investment [19].
Unlike these studies, we use the key marketing variables namely, employees’ and
customers’ satisfaction, and analyze their effects on the companies’ investment deci-
sions (see Fig. 1).

Fig. 1. The study's conceptual framework: IB relates to BID directly (H1) and indirectly through CSAT (H2, H4) and ES (H3, H5)

2.2 Study Hypotheses Development

Relationship Between Internet Banking and Bank Investment Decision


The marginal benefits of technology continue to outweigh the marginal costs. Hu and Xie
[20] recognized the value of internet banking in growing the productivity, efficiency, and
profitability of the banking industry. Also, some studies suggest a link between inno-
vation technology and Investment decisions. For instance, Turedi and Zhu [21] observed
a positive moderating effect of IT decision-making structure mechanisms on the IT
investment–organization performance relationship. Božić and Botrić [22] have argued
that the decision to invest adequately in innovation is highly complex, owing on the one hand to insufficient resources and on the other to the different innovation directions that businesses must choose from. In the same vein, previous studies also show that a firm's decision to invest in innovation (R&D) increases with its size, market share, and diversification, and with demand-pull and technology-push forces [23]. Based on the foregoing explanation, we hypothesized that:
H1: Internet banking has a significant positive effect on bank investment decisions.
Internet Banking and Customer Satisfaction
Nazaritehrani and Mashali [24] investigated “Development of E-banking channels and
market share in developing countries”. Their findings show that internet banking
lowers bank-operating expenses and boosts consumer satisfaction and retention. Simi-
larly, in the banking sector, the availability of internet banking services and user-
friendliness appear to be correlated with high customer satisfaction and retention. In
internet banking, there is a considerable relationship between e-customer satisfaction and
loyalty. Moreover, another work by Rahi, Ghani, and Ngah [2] suggests that numerous
banks have implemented IB to decrease costs whilst enhancing customers’ services.
H2: There is a positive significant relationship between internet banking and cus-
tomer satisfaction.

Internet Banking, and Employee Satisfaction


As technology advances, so do citizens' expectations of banking services. While some people still prefer traditional banking, new technology has had a favorable impact on banking. People prefer using ATMs and point-of-sale devices for shopping to waiting in bank queues. With the advent of new technology like mobile banking, the internet, and ATMs, bank branches are getting quieter and employees are less stressed. Overall, technology influences employee satisfaction positively [25], and several e-banking services influence banking operations and job satisfaction. Also, Hammoud, Bizri, and El Baba [26] argued that technology and innovation have significantly strengthened the banking system.
H3: Internet banking has a significant positive effect on employee satisfaction.
Customer Satisfaction, and Bank Investment Decision
How does customer satisfaction contribute to more capital investment? Firstly, as
customer satisfaction includes both new consumer perceptions and experiences of the
quality of a business’s services and products, high customer satisfaction can lead to
highly predictable revenue flows and prospective chances for growth. Customers tend
to buy more from companies with which they are more engaged. Fornell et al. [27]
illustrate that customer satisfaction results in higher client expenditure and potential demand. As a consequence, companies with high customer satisfaction can generate more income.
Secondly, customer satisfaction builds a steady and loyal client base [28], which reduces cash-flow volatility and future capital costs, as well as enhancing consumer loyalty, improving the company's reputation, lowering transaction costs, and increasing workforce efficiency and productivity. Consequently, under neoclassical investment theory, great customer satisfaction encourages enterprises to invest more in resources. Vo et al. [29] found that enterprises with higher consumer satis-
faction will spend more on future capital expenditures. Overall, this study suggests that
consumer satisfaction influences a firm’s investment policy.
H4: There is a positive significant relationship between CSAT and Bank investment
decisions.
Employee Satisfaction and Bank Investment Decision
The human resource is regarded as an organization’s most valuable asset. Also,
employee motivation and job satisfaction play a big role in employees’ performance. In
other words, employees’ contentment with their jobs is critical to their performance.
Research by Bai et al. [30] demonstrates that restricting businesses’ power to fire
employees has two opposed effects on investment. The protection from wrongful ter-
mination and the fear of being fired might help employees to focus on their tasks, take
creative risks, and develop skills that benefit their current employment. These impacts
may lead to greater profitability and new or more desirable opportunities, resulting in
higher investments. Conversely, increasing the cost of labor transitions reduces resource spending, as organizations prepare for increasingly irreversible investments.
H5: Employee satisfaction has a positive significant relationship with Bank
investment decisions.

The Mediating Roles of CSAT and ES on the Nexus Between IB and BID
Economic theory states that the expected return on investment (ROI) and the cost of
capital (COC) drive a company's investment decisions. Tobin's famous Q theory compares the company's ratio Q (the marginal market value of a unit of assets) with the marginal cost of investment. As Tobin's Q incorporates both expected cash flows and capital costs,
organizations can spend more when there is tremendous growth potential or low costs.
Sorescu and Sorescu [31] show that investing in companies with great customer
satisfaction generates financial returns with low risk. Similarly, Merrin et al. [32]
asserted that consumer satisfaction can be used to gauge stock price sentiment. Banks are offering creative solutions to help their customers and employees, with scarce resources, reduce operating costs, risks, and inefficiencies while enhancing customer satisfaction and employee productivity [33]. Given the theoretical underpinnings
and empirical evidence, and the fact that internet banking may be necessary for staff
creativity, customers, and investment decisions, we predict that ES and CSAT mediate
the relationship between IB and BID. The following hypotheses will be investigated in
light of this assumption:
H6: Customer satisfaction would mediate the relationship between IB and Bank investment decisions.
H7: Employee satisfaction can mediate the relationship between IB and Bank investment decisions.

3 Data and Methodology

This research employs quantitative tools to collect and analyze the data. The survey
questionnaires quantified the study variables. This study’s goal is to examine the effects
of internet banking on bank investment decisions using employee and customer satis-
faction as mediators. The study’s participants were randomly selected from among bank
staff and customers. The survey instruments were translated into French and then back
into English to ensure that the intended implications of each question were understood.
The population of the investigation consisted of 2,011 employees and 496,009 customers (source: Bank of Central African States (BEAC) and Central African Banking Commission (COBAC), 2019), distributed according to the number of commercial banks operating in the country. The data were gathered from 11 commercial banks in the Republic of Congo, mainly in Brazzaville and Pointe-Noire. Based on the provided
population, a sample size of 1800 participants with a 5% confidence level was deter-
mined. The research engages a sample size of 1200 customers and 600 banks’ staff.
There were 1500 valid surveys gathered and evaluated, 1000 customers and 500
employees from the 1800-targeted respondents, reflecting an 83.33% response rate.
The study also employed a Likert scale ranging from 1 to 5, with 1 denoting strongly disagree and 5 denoting strongly agree, and incorporated questions from other experts who had undertaken similar research. The data were gathered from October 2019 to June 2020.
The paper's measurements were derived from past research and adapted for the present investigation. The internet banking items were measured using the questions adopted from Rahi, Ghani, and Ngah [2]; the items for employee satisfaction were

adopted from Yee, Yeung, and Cheng [34], while the items for customer satisfaction were taken from Hammoud, Bizri, and El Baba [26]; finally, the items for bank investment decision were adopted from Ogunlusi and Obademi [16].
Justification for Path Analysis Using System of Regression Equations
Path analysis using a System of Regression Equations (SRE) is one of the most extensively utilized multivariate investigative procedures due to its ability to handle the non-standard data distributions encountered in the social sciences. The System of Regression Equations characterizes the coefficients of path analysis as regression coefficients [35]. A well-defined hypothesis test can convincingly reject theoretical predictions. Additionally, it enables the plotting of residuals and the examination of data concerns such as heteroskedasticity, outliers, autocorrelation, and non-normality [35, 36]. As a result, employing SRE modeling in path analysis makes rational sense for achieving the study's objectives [36].
We derived the following system of equations from the conceptual framework
depicted in Fig. 1:
BID  = a0 + B0*IB + e1
CSAT = a1 + B1*IB + e2
ES   = a2 + B2*IB + e3
BID  = a3 + B3*CSAT + e4
BID  = a4 + B4*ES + e5                                    (1)
BID  = a5 + B5*IB + B6*CSAT + e6
BID  = a6 + B6*IB + B7*ES + e7
BID  = a7 + B8*IB + B9*CSAT + B10*ES + e8

with B0, B1, B2, B3, B4, B5, B6, B7, B8, B9, and B10 ≠ 0.
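A hedged sketch of estimating this system, assuming a pandas DataFrame df with columns IB, CSAT, ES, and BID; this uses ordinary least squares via statsmodels rather than the authors' exact Stata reg3/2SLS routine:

import pandas as pd
import statsmodels.formula.api as smf

# One formula per equation of the system in Eq. 1.
formulas = [
    "BID ~ IB", "CSAT ~ IB", "ES ~ IB",
    "BID ~ CSAT", "BID ~ ES",
    "BID ~ IB + CSAT", "BID ~ IB + ES",
    "BID ~ IB + CSAT + ES",
]

def fit_system(df: pd.DataFrame) -> None:
    """Fit each equation and report unstandardized coefficients and R^2."""
    for f in formulas:
        res = smf.ols(f, data=df).fit()
        print(f, res.params.round(3).to_dict(), f"R2={res.rsquared:.3f}")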


Confirmatory Factor Analysis (CFA)
To guarantee the data’s trustworthiness, we used the statistical package for social
sciences (SPSS V.26) to perform discriminant and convergent validity tests. The
Principal Component Analysis (PCA) with Varimax Rotation (Varimax with Kaiser
Normalization) was used to find factors with eigenvalues greater than one. This study
also employs exploratory factor analysis (EFA). The CFA represents a statistical
multivariate approach employed to assess the accuracy of measured items in describing
constructs. This study uses CFA to examine the common method variance. Normally,
variables having a loading factor of 0.500 and higher are considered for analysis [37].
The exploratory factor analysis (EFA) identified four factors accounting for 74.23% of the overall variance in the research variables, with KMO = 0.763 and the scree plot changing direction at the fifth factor. Bartlett's Test of Sphericity was used to assess construct validity, whilst the Kaiser-Meyer-Olkin (KMO) measure was used to assess individual variable sampling adequacy. Bartlett's Test of Sphericity returned a test statistic of 30020.196 for the association among the research constructs, which is significant and relevant (p < 0.001). Each factor's scales exhibit a high connection with one another, confirming the scales' convergent validity [35]. The CFA and EFA results demonstrate that the loading values of the 18 items are greater than 0.70. This implies that each item is strongly integrated into its underlying construct, as shown by the CFA and EFA results, and demonstrates the indicators' (items') reliability and sufficiency [38].
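As an illustration of these diagnostics, a minimal sketch using the third-party factor_analyzer package is shown below; the function name and the items DataFrame (respondents × items) are assumptions, since the authors report SPSS output:

import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import (
    calculate_bartlett_sphericity, calculate_kmo)

def efa_diagnostics(items: pd.DataFrame, n_factors: int = 4):
    """Bartlett's sphericity, overall KMO, and varimax-rotated loadings."""
    chi2, p = calculate_bartlett_sphericity(items)   # construct validity
    _, kmo_overall = calculate_kmo(items)            # sampling adequacy
    fa = FactorAnalyzer(n_factors=n_factors, rotation="varimax")
    fa.fit(items)
    return chi2, p, kmo_overall, fa.loadings_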

Measurements Validity and Reliability


The Cronbach’s Alpha, composite reliability, and Average Variance Extracted coeffi-
cients were determined. The Cronbach’s Alpha (CA) is a measure of scale or item
internal consistency or reliability and is defined as:
CA = (k / (k - 1)) * (1 - (Σ_{i=1}^{k} σ²_{Y_i}) / σ²_X)    (2)

where σ²_X denotes the variance of the observed total item scores and σ²_{Y_i} represents the variance of item i.
The average variance extracted (AVE) and composite reliability (CR) are defined as follows:

AVE = Σλ² / n    (3)

CR = (Σλ)² / [(Σλ)² + Σ(1 - λ²)]    (4)

Here λ denotes an item's factor loading and n denotes the total number of indicators.
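For concreteness, a small sketch of Eqs. 2-4 (assuming, as is usual, that standardized loadings are used for CR and AVE) that reproduces the Table 1 figures for the IB construct:

import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """CA per Eq. 2; `items` is an (n_respondents, k_items) array."""
    X = np.asarray(items, dtype=float)
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1).sum()      # sum of item variances
    total_var = X.sum(axis=1).var(ddof=1)        # variance of total scores
    return k / (k - 1) * (1 - item_vars / total_var)

def ave(loadings) -> float:
    """AVE per Eq. 3: mean of the squared loadings."""
    lam = np.asarray(loadings)
    return float((lam ** 2).mean())

def composite_reliability(loadings) -> float:
    """CR per Eq. 4."""
    lam = np.asarray(loadings)
    num = lam.sum() ** 2
    return float(num / (num + (1 - lam ** 2).sum()))

ib_loadings = [0.747, 0.771, 0.746, 0.772, 0.741]    # Table 1, IB items
print(round(ave(ib_loadings), 3))                    # 0.571
print(round(composite_reliability(ib_loadings), 3))  # 0.869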
The Cronbach’s Alpha (CA) values were over the acceptable range of 0.70. The
Composite reliability (CR) is also higher than 0.70 and indicates that the constructs are
very reliable [35]. The average variance extracted (AVE) coefficients were assessed for convergent validity. The AVE values for this research range between 0.571 and 0.663, which is acceptable and above the threshold of 0.5. Table 1 summarizes the
validity and reliability findings.
The discriminant validity of estimating models was also assessed using the Fornell-
Larcker criterion. According to the Fornell-Larcker criterion, the AVEs square root must
be larger than any other correlation of the constructs’ associations with others [39].
Table 2 shows that the results of this examination satisfy the Fornell-Larcker criteria.
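A minimal check of this criterion, assuming an AVE mapping per construct and the construct correlation matrix as a pandas DataFrame:

import numpy as np
import pandas as pd

def fornell_larcker_ok(ave: dict, corr: pd.DataFrame) -> pd.Series:
    """True per construct if sqrt(AVE) exceeds its largest correlation."""
    root_ave = pd.Series({c: np.sqrt(v) for c, v in ave.items()})
    off_diag = corr.where(~np.eye(len(corr), dtype=bool))  # drop diagonal
    return root_ave > off_diag.abs().max()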
We conducted descriptive statistics to evaluate the variables’ standard deviations,
means, and correlations. The study found significant and positive correlations between
research variables, with coefficients ranging from 0.389 to 0.842. The correlation coefficients in Table 3 are below 0.9, indicating no common method bias concerning the research constructs [40].
Common Method Bias (CMB) Test
Since data for both endogenous and exogenous constructs were collected via questionnaires, a common method bias test is required [40]. Harman's single-factor test was used to evaluate the study's constructs. The results showed that a single merged component explains around 44% of the model's variance, which is below the 50% threshold, suggesting no common method bias in the constructs [40].
model’s fit indices such as SRMR, RMSEA, NFI, and CFI were also examined. As
reported in Table 4, all indices of fit were acceptable. Therefore, the indicators of scale
were considered suitable for subsequent investigations.
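Harman's test itself reduces to checking the variance explained by the first unrotated factor; a hedged sketch using an unrotated principal component as that factor, with items holding the questionnaire responses:

import numpy as np
from sklearn.decomposition import PCA

def harman_single_factor(items: np.ndarray) -> float:
    """% of total variance explained by the first unrotated factor."""
    X = (items - items.mean(axis=0)) / items.std(axis=0)  # standardize
    pca = PCA(n_components=1).fit(X)
    return 100.0 * pca.explained_variance_ratio_[0]       # want < 50%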

Table 1. Indicators loadings and reliability of construct


Indicators FL CA CR AVE
IB1 0.747 0.819 0.869 0.571
IB2 0.771
IB3 0.746
IB4 0.772
IB5 0.741
ES1 0.732 0.814 0.878 0.644
ES2 0.861
ES3 0.861
ES4 0.745
CSAT1 0.832 0.823 0.883 0.653
CSAT2 0.792
CSAT3 0.804
CSAT4 0.804
BID1 0.773 0.870 0.907 0.663
BID2 0.732
BID3 0.895
BID4 0.762
BID5 0.895
Notes: ES- Employee Satisfaction, IB–Internet Banking, BID-
Bank Investment Decision, CSAT-Customer Satisfaction, FL-Item
Loadings, CR-Composite Reliability, AVE-Average Variance
Extracted, and CA- Cronbach’s Alpha.

Table 2. Validity of discriminant (Fornell-Larcker criterion)


BID CSAT ES IB
BID 0.854
CSAT 0.542 0.808
ES 0.464 0.446 0.802
IB 0.667 0.610 0.410 0.756
Note: Bold and underlined Values are
the square root of AVE

Table 3. Inter-Items means (M), Std. Deviation, and Correlation


Variables Mean Std. Deviation (1) (2) (3) (4)
(1) BID 17.04 4.239 1.000
(2) IB 16.51 3.991 0.620** 1.000
(3) CSAT 13.32 3.400 0.842** 0.566** 1.000
(4) ES 14.75 3.339 0.456** 0.389** 0.434** 1.000
Note: N = 1500; **. Correlation is significant at the 0.01 level (2-tailed)

Table 4. Model fit summary


Measure Saturated model Threshold value
SRMR 0.075 ≤ 0.08
RMSEA 0.029 ≤ 0.05 – 0.08
NFI 0.891 >0.95 or >0.90
CFI 0.972 >0.95
Note: CFI-Comparative Fit Index, SRMR-
Standardized Root Mean Square Residual,
NFI-Normed Fit Index, and RMSEA-Root
Mean Square Error of Approximation

4 Results

The questionnaire yielded the following demographic information: gender, age, educational level, and monthly income. According to the results, 752 participants were male (50.1%) and 748 respondents were female (49.9%). 400 respondents, representing 26.7% of the sample, were between the ages of 18–25 years; 503 respondents (33.5%) were between 26–35 years; and 356 respondents (23.7%) were between 36–45 years. Additionally, the results indicate that 123 respondents, or 8.2%, are from the elementary level; 240 respondents, or 16.0%, from high school; 361 respondents, or 24.1%, from the diploma level; 511 respondents, or 34.1%, from the undergraduate level; and 265 respondents, or 17.7%, from the postgraduate level.

4.1 Hypothesis Analysis


We used Stata (reg3, 2SLS estimation) to perform the structural equation model (path analysis), since, as proposed by Westland [35], two-stage least squares regression represents one of the strongest multivariate techniques for assessing path analysis coefficients. The outcomes of the variables that were tested revealed a positive and significant association with the predictor factors.
Direct Effects of IB on BID, CSAT and ES; and the Effects of CSAT and ES on
BID
To examine the direct effects of internet banking (IB), customer satisfaction (CSAT),
and employee satisfaction (ES) on bank investment decisions, we tested a structural
equation model as presented in Fig. 2. The outcomes indicated that the unstandardized coefficients (see Table 5) from internet banking to bank investment decision, customer satisfaction, and employee satisfaction were respectively 0.659 (p < 0.001), 0.482 (p < 0.001), and 0.326 (p < 0.001). Thus, H1, H2, and H3 were accepted.
Both customer satisfaction and employee satisfaction exerted statistically significant effects on bank investment decision (b = 0.175, p < 0.001, and b = 0.578, p < 0.001, respectively). Thus, H4 and H5 were also confirmed.

Table 5. Hypothesis test, unstandardized coefficients

          (1) BID   (2) CSAT  (3) ES    (4) BID   (5) BID   (6) BID   (7) BID   (8) BID
IB        0.659***  0.482***  0.326***                      0.136***  0.555***  0.128***
          (0.0215)  (0.0181)  (0.0199)                      (0.0106)  (0.0223)  (0.0107)
CSAT                                    0.175***            0.084***            0.070***
                                        (0.0108)            (0.0124)            (0.0129)
ES                                                0.578***            0.320***  0.458***
                                                  (0.0292)            (0.0267)  (0.0117)
Constant  6.165***  5.369***  9.375***  1.388***  8.513***  0.344*    3.163***  -0.010
          (0.365)   (0.308)   (0.338)   (0.148)   (0.442)   (0.162)   (0.430)   (0.185)
N         1500      1500      1500      1500      1500      1500      1500      1500
R²        0.385     0.320     0.151     0.888     0.207     0.899     0.439     0.900
F-Stat    937.42    705.07    267.38    11874.4   392.21    6676.47   585.52    4498.44
Note: Standard errors are in parentheses; *** p < 0.001, ** p < 0.01, * p < 0.05. BID-Bank Investment Decision, IB-Internet Banking, CSAT-Customer Satisfaction, ES-Employee Satisfaction. Models (1)–(8) correspond to the equations in Eq. 1.

4.2 Mediation Analysis


Table 6 demonstrates the indirect effect of internet banking (IB) on bank investment decision (BID); this effect is mediated via the customer satisfaction (CSAT) and employee satisfaction (ES) factors.

Table 6. The mediation tests (Direct effects, indirect effects, and Total effects)
Direct effects
Paths        Coef.  St. Err.  Z        P > z  [CI at 95%]
CSAT ← IB    0.610  0.018     33.691   0.000  0.572–0.641
BID ← CSAT   0.870  0.007     122.176  0.000  0.855–0.884
BID ← ES     0.024  0.010     2.522    0.012  0.006–0.044
BID ← IB     0.127  0.010     12.229   0.000  0.107–0.148
ES ← IB      0.410  0.023     18.170   0.000  0.364–0.457
Indirect effects
BID ← IB     0.540  0.016     32.860   0.000  0.501–0.567
Total effects
CSAT ← IB    0.610  0.018     33.691   0.000  0.572–0.641
BID ← CSAT   0.870  0.007     122.176  0.000  0.855–0.884
BID ← ES     0.024  0.010     2.522    0.012  0.006–0.044
BID ← IB     0.667  0.015     45.097   0.000  0.638–0.701
ES ← IB      0.410  0.023     18.170   0.000  0.364–0.457
Note: Bootstrapping outputs from SmartPLS

Table 7 indicates that the indirect relationship between IB and bank investment decision via CSAT was statistically significant (b = 0.530, p < 0.001). Similarly, the mediating effect of ES on the relationship between IB and bank investment decision was 0.010 (p = 0.012). For these reasons, statistically positive and significant mediation effects may be inferred, indicating that H6 and H7 were likewise accepted.
The results in Tables 6 and 7 and Fig. 2 demonstrate that the direct and indirect effects of IB on bank investment decision (BID) are positive and statistically significant, indicating that customer satisfaction (CSAT) and employee satisfaction (ES) partially mediate the effect of IB.

Table 7. Specific indirect effects

Paths            Std. Coef.  St. Err.  Z       P > z  [95% CI]
IB → CSAT → BID  0.530       0.016     32.698  0.000  0.495–0.556
IB → ES → BID    0.010       0.004     2.527   0.000  0.002–0.018
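Each specific indirect effect is the product of the two path coefficients along the route (e.g., 0.610 × 0.870 ≈ 0.530 for IB → CSAT → BID). A hedged percentile-bootstrap sketch of this product-of-paths idea, assuming NumPy arrays ib, med, and bid of equal length (not the authors' SmartPLS routine):

import numpy as np

def boot_indirect(ib, med, bid, n_boot=5000, seed=0):
    """95% percentile CI for the indirect effect a*b of ib -> med -> bid."""
    rng = np.random.default_rng(seed)
    n, estimates = len(ib), []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)                          # resample rows
        a = np.polyfit(ib[idx], med[idx], 1)[0]              # ib -> med slope
        X = np.column_stack([np.ones(n), ib[idx], med[idx]])
        b = np.linalg.lstsq(X, bid[idx], rcond=None)[0][2]   # med -> bid | ib
        estimates.append(a * b)
    return np.percentile(estimates, [2.5, 97.5])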

Fig. 2. Structural equation modelling

4.3 Discussion
We notice in this study that innovation components and stakeholders influence bank
investment decisions. Our outcomes support prior studies showing BID is affected by
IB and stakeholders [17, 21, 23], and contradict other researchers’ findings that BID is
positively linked to only a few of the above components [29, 31]. Also, we found that
various components of innovation and technologies have varied effects on BID. In
particular, IB has both direct and indirect impacts on the decision to invest in banks.
Similarly, employee and customer satisfaction partially mediated the relationship between internet banking and bank investment decisions. These results demonstrate that a focus on workers and customers must be maintained to guarantee the benefits anticipated from banks' decisions. In addition, stakeholders tend to play the most important role in promoting bank investment decisions. This is possibly due to the relationship-oriented culture in the banks' corporate atmosphere, which emphasizes social and interpersonal relationships. Companies with significant innovation technologies can build successful interactions with various stakeholders, leading to additional market opportunities and enhanced outputs. In short, these results add
to the literature on innovation by revealing key mechanisms by which components of
innovation help banks to make better decisions. They also contribute to the HRM lit-
erature by advising which IB aspects a firm should focus on more heavily. In addition,
our findings support the idea that numerous variables can mitigate the impact of IB on
bank investment decisions. Nonetheless, our research is amongst the first to explore the
role of CSAT and employee satisfaction in mediation. In particular, the impact of the
components of the IB on bank investment decisions is partly mediated by both customer
satisfaction and employee satisfaction. Customer satisfaction tends to have more effect
than employee satisfaction, as seen in Tables 6 and 7 and leads to bank investment
decisions. These results add to current innovation and HRM literature and may serve as
guidance for companies to develop a comprehensive HR process to boost the IB, the
loyalty of stakeholders, and banks' investment decisions. These results shed new light on the structural mechanisms that ease bank investment decision-making.

5 Conclusion, Limitations, and Opportunities for Further Studies

Our results have two major consequences for practitioners. First, as the IB aspect is
related to customer satisfaction, employee satisfaction, and bank investment decisions,
managers should aim to continually improve and sustain their IB through investments
in selection and recruitment of employees, development, and training of employees,
optimization and design of procedures, and other human resource management
(HRM) activities. Managers must remember that different elements of innovation have
different cumulative effects on investment decisions. Thus, based on the investment
functioning emphasized by their market strategies, they should devote more resources
to unique components. In particular, if their companies are to boost the benefits of
investment, they should make greater efforts to improve the satisfaction of stake-
holders. However, if the priority of the plan is to increase revenues from investments,
particular attention has to be paid to customer satisfaction. Their workers may
already have advanced experience, abilities, and skills relevant to their job. In this
situation, it is not a primary concern to develop human resources further.
In this paper, we have built a theoretical framework that explains the mediating
roles of customer satisfaction and employee satisfaction on the relationship between IB
and bank investment decisions, and verified the hypotheses by examining data gathered
from Congolese banks. The outcomes demonstrate that the component of innovation,
namely internet banking, is positively associated with customer satisfaction and
employee satisfaction, which ultimately contribute positively to the bank investment
Internet Banking and Bank Investment Decision: Mediating Role 327

decision. Both customer satisfaction and employee satisfaction partially mediate the
effects of IB on bank investment decisions.
Most innovation research focuses only on exploring the individual or contextual
components that improve it since few companies can survive, prosper and succeed in
the competitive world in which we live without innovation. This study gives a
framework on which more research will help companies to understand when and how
the positive effects of the creative activities of their workers can be stimulated, while
also minimizing the negative effects.
There are some drawbacks to our research, which in turn give possible directions
for further investigation. Firstly, we used a cross-sectional method to explore the
inherent mechanism of the effect of internet banking on bank investment decisions.
A cross-sectional design, however, does not disclose causality among constructs. To
establish a clear causal association and analyze the possible period lag effects of IB
accumulation, future work can conduct longitudinal investigations. Second, this
research was performed in the context of Congolese banks. As financial industries are typically innovation-oriented and knowledge-intensive, the associations among IB, customer satisfaction, employee satisfaction, and decision-making on bank investment may be stronger in this environment than in other organizations. To corroborate the validity of
our study results, it is suggested that future studies should gather data from different
companies. Third, by concentrating on the mediating role of stakeholders (employee
and customer) satisfaction, this study explores the underlying process between IB and
bank investment decisions. Stakeholders subjectively assess the scale elements of the
above-mentioned constructs. More realistic metrics for bank investment decisions will
need to be obtained in the future study, such as a rate for recurrent investment or capital
investment for investment decision-making. Lastly, our research emphasizes two major
mediators, namely customer satisfaction and employee satisfaction between IB and
bank investment decisions. Yet, some significant contextual components can mitigate
their impact, particularly from an innovation security perspective. Future research will
therefore need to investigate the moderating effects of such contextual variables to gain
more insights, such as internet security and perceived risk. Moreover, the future study
can also extend this work by adding other mediating variables such as bank perfor-
mance, bank competitive advantage, and bank sustainability.

Appendix

Questionnaire of the study


“Items questionnaire”
“Internet Banking (IB)”
“IB1. You feel confident while using the e-banking method to access money”
“IB2. Internet banking enables me to complete a transaction quickly”
“IB3. Online banking enhances your effectiveness in doing banking transactions”
“IB4. You find online banking useful”
(continued)
328 J. B. B. Pea-Assounga and M. Wu

(continued)
Questionnaire of the study
“IB5. Online banking saves your time”
“Customer Satisfaction (CSAT)”
“CSAT1. I am satisfied with the transaction processing via E-Banking services”
“CSAT2. I think I made the correct decision to use the E-Banking services”
“CSAT3. My satisfaction with the E-Banking services is high”
“CSAT4. Overall, E-Banking services are better than my expectations”
“Employee Satisfaction (ES)”
“ES1. We are satisfied with the salary of this bank”
“ES2. We are satisfied with the promotion opportunity of this bank”
“ES3. We are satisfied with the job nature of this bank”
“ES4. We are satisfied with the relationship of my fellow workers of this company”
“Bank Investment Decision (BID)”
“BID1. My investment reports better results than expected”
“BID2. My investment in IT and Innovation has demonstrated increased cash flow growth in
the past 5 years”
“BID3. My investment in technology innovation has a lower risk compared to the market
financial products in general”
“BID4. My investment in sustainability activities has a high degree of safety”
“BID5. My investment proceeds will be used in a way that benefits society”

References
1. Aboobucker, I., Bao, Y.: What obstruct customer acceptance of internet banking? Security
and privacy, risk, trust, and website usability and the role of moderators. J. High Technol.
Managem. Res. 29, 109–123 (2018). https://doi.org/10.1016/j.hitech.2018.04.010
2. Rahi, S., Abd Ghani, M., Hafaz Ngah, A.: Integration of unified theory of acceptance and
use of technology in internet banking adoption setting: evidence from Pakistan. Technol.
Soc. 58, 101120 (2019). https://doi.org/10.1016/j.techsoc.2019.03.003
3. Hoehle, H., Scornavacca, E., Huff, S.: Three decades of research on consumer adoption and
utilization of electronic banking channels: a literature analysis. Decis. Support Syst. 54, 122–
132 (2012)
4. Alalwan, A.A., Baabdullah, A.M., Rana, N.P., Tamilmani, K., Dwivedi, Y.K.: Examining
adoption of mobile internet in Saudi Arabia: extending TAM with perceived enjoyment,
innovativeness and trust. Technol. Soc. 55, 100–110 (2018). https://doi.org/10.1016/j.
techsoc.2018.06.007
5. Martins, C., Oliveira, T., Popovič, A.: Understanding the Internet banking adoption: a
unified theory of acceptance and use of technology and perceived risk application. Int. J. Inf.
Manage. 34, 1–13 (2014). https://doi.org/10.1016/j.ijinfomgt.2013.06.002
6. Rahi, S., Abd Ghani, M.: Customer’s perception of public relation in e-commerce and its
impact on e-loyalty with brand image and switching cost. J. Internet Banking Commerce.
2016, 21 (2016)
Internet Banking and Bank Investment Decision: Mediating Role 329

7. Ma, L., Zhai, X., Zhong, W., Zhang, Z.-X.: Deploying human capital for innovation: a study
of multi-country manufacturing firms. Int. J. Prod. Econ. 208, 241–253 (2019)
8. Lane, K., Rosewall, T.: Firms’ investment decisions and interest rates. RBA Bull. 2015, 1–7
(2015)
9. Hjelseth, I.N., Meyer, S.S., Walle, M.A.: What factors influence firms’ investment
decisions? Econ. Comment. 2017(10) (2017). http://hdl.handle.net/11250/2558941
10. Velmurugan, G., Selvam, V., Abdul, N.N.: An empirical analysis on perception of investors’
towards various investment avenues. MJSS 6, 427 (2015). https://doi.org/10.5901/mjss.
2015.v6n4p427
11. Carbó‐Valverde, S., Cuadros‐Solas, P.J., Rodríguez‐Fernández, F.E.Y.: The effect of banks’
IT investments on the digitalization of their customers. Glob. Policy. 11, 9–17 (2020).
https://doi.org/10.1111/1758-5899.12749
12. Holcomb, T.R., Holmes, R.M., Jr., Connelly, B.L.: Making the most of what you have:
managerial ability as a source of resource value creation. Strat Manage. J. 30, 457–485
(2009). https://doi.org/10.1002/smj.747
13. Ndofor, H.A., Sirmon, D.G., He, X.: Firm resources, competitive actions, and performance:
investigating a mediated model with evidence from the in-vitro diagnostics industry. Strat.
Manage. J. 32, 640–657 (2011)
14. Nandini, P.: Gender differences in investment behavior with reference to equity investments.
Doctoral dissertation, Pondicherry University (2018)
15. Antony, A., Joseph, A.I.: Influence of behavioural factors affecting investment decision—an
AHP analysis. Metamorphosis 16, 107–114 (2017). https://doi.org/10.1177/
0972622517738833
16. Ogunlusi, O.E., Obademi, O.: The impact of behavioural finance on investment decision-
making: a study of selected investment banks in Nigeria. Global Bus. Rev. 22, 1–17 (2019)
17. Flor, C.R., Hansen, S.L.: Technological advances and the decision to invest. Ann Finan. 9,
383–420 (2013)
18. Goodman, T.H., Neamtiu, M., Shroff, N., White, H.D.: Management forecast quality and
capital investment decisions. Account. Rev. 89, 331–365 (2014). https://doi.org/10.2308/
accr-50575
19. Panousi, V., Papanikolaou, D.: Investment, idiosyncratic risk, and ownership. J. Finan. 67,
1113–1148 (2012). https://doi.org/10.1111/j.1540-6261.2012.01743.x
20. Hu, T., Xie, C.: Competition, innovation, risk-taking, and profitability in the Chinese
banking sector: an empirical analysis based on structural equation modeling. Discret. Dyn.
Nat. Soc. 2016, 1–10 (2016). https://doi.org/10.1155/2016/3695379
21. Turedi, S., Zhu, H.: How to generate more value from IT: the interplay of IT investment,
decision making structure, and senior management involvement in IT governance. CAIS 4,
26 (2019)
22. Božić, L., Botrić, V.: Innovation investment decisions: are post (transition) economies
different from the rest of the EU? Eastern J. Eur. Stud. 8, 25–43 (2017). https://nbn-
resolving.org/urn:nbn:de:0168-ssoar-61825-3
23. Crespi, G., Zuniga, P.: Innovation and productivity: evidence from six Latin American
countries. World Dev. 40, 273–290 (2012). https://doi.org/10.1016/j.worlddev.2011.07.010
24. Nazaritehrani, A., Mashali, B.: Development of e-banking channels and market share in
developing countries. Finan. Innov. 6(1), 1–19 (2020). https://doi.org/10.1186/s40854-020-
0171-z
25. Turkyilmaz, A., Akman, G., Ozkan, C., Pastuszak, Z.: Empirical study of public sector
employee loyalty and satisfaction. Industr. Manage. Data Syst. 111, 675–696 (2011). https://
doi.org/10.1108/02635571111137250
330 J. B. B. Pea-Assounga and M. Wu

26. Hammoud, J., Bizri, R.M., El Baba, I.: The impact of e-banking service quality on customer
satisfaction: evidence from the Lebanese banking sector. SAGE Open 8, 215824401879063
(2018)
27. Fornell, C., Rust, R.T., Dekimpe, M.G.: The effect of customer satisfaction on consumer
spending growth. J. Mark. Res. 47, 28–35 (2010). https://doi.org/10.1509/jmkr.47.1.28
28. Sarkar Sengupta, A., Balaji, M.S., Krishnan, B.C.: How do customers cope with service
failure? A study of brand reputation and customer satisfaction. J. Bus. Res. 68, 665–674
(2015)
29. Vo, L.V., Le, H.T.T., Le, D.V., Phung, M.T., Wang, Y.-H., Yang, F.-J.: Customer
satisfaction and corporate investment policies. J. Bus. Econ. Manag. 18, 202–223 (2017)
30. Bai, J., Fairhurst, D., Serfling, M.: Employment protection, investment, and firm growth.
Rev. Finan. Stud. 33, 644–688 (2020). https://doi.org/10.1093/rfs/hhz066
31. Sorescu, A., Sorescu, S.M.: Customer satisfaction and long-term stock returns. J. Mark. 80,
110–115 (2016). https://doi.org/10.1509/jm.16.0214
32. Merrin, R.P., Hoffmann, A.O.I., Pennings, J.M.E.: Customer satisfaction as a buffer against
sentimental stock-price corrections. Mark Lett. 24, 13–27 (2013). https://doi.org/10.1007/
s11002-012-9219-9
33. Obeng, A.Y., Mkhize, P.L.: An exploratory analysis of employees and customers’ responses
in determining the technological innovativeness of banks. Electron. J. Inf. Syst.
Develop. Countries. 80, 1–23 (2017). https://doi.org/10.1002/j.1681-4835.2017.tb00586.x
34. Yee, R.W.Y., Yeung, A.C.L., Cheng, T.C.E.: The impact of employee satisfaction on quality
and profitability in high-contact service industries. J. Oper. Manag. 26, 651–668 (2008).
https://doi.org/10.1016/j.jom.2008.01.001
35. Westland, J.C.: Structural Equation Models: From Paths to Networks. Springer International
Publishing, Cham (2015). https://doi.org/10.1007/978-3-319-16507-3
36. Pea-Assounga, J.B.B., Yao, H.: The mediating role of employee innovativeness on the nexus
between internet banking and employee performance: evidence from the republic of Congo.
Math. Probl. Eng. 2021, 1–20 (2021). https://doi.org/10.1155/2021/6610237
37. Ringle, C.M., Wende, S., Becker, J.-M.: SmartPLS 3. Bönningstedt: SmartPLS (2015)
38. Henseler, J., Ringle, C.M., Sarstedt, M.: A new criterion for assessing discriminant validity
in variance-based structural equation modeling. J. Acad. Mark. Sci. 43(1), 115–135 (2014).
https://doi.org/10.1007/s11747-014-0403-8
39. Hair, J.F., Sarstedt, M., Ringle, C.M., Gudergan, S.P.: Advanced Issues in Partial Least
Squares Structural Equation Modeling. Sage Publications, Thousand Oaks (2017)
40. Lowry, P.B., Gaskin, J.: Partial least squares (PLS) structural equation modeling (SEM) for
building and testing behavioral causal theory: when to choose it and how to use it. IEEE
Trans. Profess. Commun. 57, 123–146 (2014). https://doi.org/10.1109/TPC.2014.2312452
Inductions of Usernames’ Strengths
in Reducing Invasions on Social
Networking Sites (SNSs)

Md. Mahmudur Rahman1 , Shahadat Hossain2(B) , Mimun Barid3 ,


and Md. Manzurul Hasan4
1
Bangabandhu Sheikh Mujibur Rahman Aviation and Aerospace University
(BSMRAAU), Dhaka, Bangladesh
mahmud@bsmraau.edu.bd
2
City University, Dhaka, Bangladesh
shahadat.cse@cityuniversity.edu.bd
3
University of South Asia, Dhaka, Bangladesh
4
American International University-Bangladesh (AIUB), Dhaka, Bangladesh
manzurul@aiub.edu

Abstract. The internet allows people to make social contacts and to


communicate among different Social Networking Sites (SNSs). If users
are ignorant of their exposure, their identities may be revealed and cyber-attacks may be facilitated. Hence, password secrecy is usually prioritized to protect our personal information. Besides, using the same username across many SNSs exposes users' identities to other users and intruders. Hackers can use usernames to track usage patterns and manipulate social media accounts or systems. As a result, in terms of security, usernames must be treated the same as passwords. This empirical study illuminates the analysis of usernames' strengths by predicting weak usernames with machine learning models in order to limit poor username selection. We have analyzed a dataset of 83,958 Reddit usernames to see how frequently people choose weak usernames for their accounts. Our predictive models correctly categorize strong and weak usernames with an average accuracy of 87%.

Keywords: Username selection · SNS · Support vector classification ·


Linear support vector classification · Random Forest · KNN

1 Introduction
The advancement of the internet has created a plethora of opportunities for com-
munications through Social Networking Sites (SNSs). Increasingly complicated
algorithms are being used to make it more challenging for hackers to steal personal information from popular websites. No system is secure unless it is standalone, but we have to stay connected with others in the twenty-first century. In this connection, we keep trying to make our credentials increasingly

complex, i.e., moving from polynomial-time to exponential-time breakable systems. Even so, this cannot guarantee that all users have strong usernames. Using the same username on various social networking sites allows hackers to recognize the pattern and to hack a user account. We usually focus on strong passwords, whereas both usernames and passwords are essential for digital authentication as well as digital authorization. Knowing the correct username is required to locate a particular resource within the system, while knowing the correct password is required to access that resource. Since ancient times, passwords have been employed as a unique code to keep malicious individuals out. On the other hand, people use basic and easy-to-remember passwords [24]. As computing power and internet access have increased, more cyberattacks are being observed, and new techniques for threatening victims are evolving. For example, a network of internet-connected devices (a botnet) may be utilized to dramatically lower the Time to Crack (TTC) [6], requiring more difficult passwords.
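As a rough illustration of why TTC matters, the worst case for an exhaustive search over an alphanumeric alphabet can be timed as below; the guess rates are illustrative assumptions, not measurements from this study:

# Back-of-the-envelope Time-to-Crack for exhaustive search.
def ttc_seconds(alphabet_size: int, length: int, guesses_per_sec: float) -> float:
    """Worst-case seconds to enumerate every credential of a given length."""
    return alphabet_size ** length / guesses_per_sec

# 8 characters over [a-zA-Z0-9] (62 symbols):
print(ttc_seconds(62, 8, 1e9) / 86400)  # single machine (~1e9 guess/s): ~2.5 days
print(ttc_seconds(62, 8, 1e12))         # botnet-scale (~1e12 guess/s): ~218 s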
Nonetheless, a lack of cybersecurity awareness prevents consumers from putting appropriate safeguards in place. This study demonstrates that users are willing to forego protections in exchange for convenience. As a result, the risk of an attack increases, especially if multiple accounts reuse identical data. Hence, poor password management magnifies the effect of breaches: the most recent attacks are measured in millions of compromised login and password combinations [16]. Many attackers target weak users by obtaining credentials from accounts that can be used to access other websites. According to the Identity Theft Resource Center [3], there were approximately 1,108 data breaches affecting US consumers in 2020. However, to the best of our knowledge, research on usernames remains very limited. Therefore, our study employs machine learning (ML) classification techniques to assess usernames' strengths in order to reduce the risk of selecting a common (though this may not always apply) and weaker username for different SNSs. Our dataset of 83,958 usernames from Reddit is investigated along various parameters, and username strength has been predicted through our models, which can then appropriately categorize any weak username. This reduces the likelihood of being attacked by well-known notorious hackers. Our research demonstrates that weaker usernames are widely used, and our predictive models determine whether a username is vulnerable or not. The objectives of this paper are given below:

– To observe username strengths in SNSs.


– To predict username strengths using RF, KNN, SVC, and LinearSVC.
– To analyze and to compare evaluation matrices of applied models.
– To give a parametric observation on the Reddit dataset.

The rest of the sections are as follows. First, we describe some related works
in Sect. 2. Then, in Sect. 3, we describe the procedure we have followed to perform
the analysis. Next, in Sect. 4, we give our findings and classification performance
before concluding in Sect. 5.
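For orientation, a hedged sketch of the four classifiers named in the objectives above, applied to character n-gram features; the toy usernames, labels, and feature choices here are illustrative assumptions, not the paper's actual pipeline:

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC, LinearSVC

# Toy data: replace with the 83,958-username Reddit dataset and its labels.
usernames = ["john1990", "admin", "sarah_smith", "qwerty",
             "xK9_qTz4vB", "Nf3_rLu8_pQ", "ze9Vw_27Hq", "t8GmR4xy_Q"]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = weak, 1 = strong

models = {
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=3),
    "SVC": SVC(),
    "LinearSVC": LinearSVC(),
}
for name, clf in models.items():
    pipe = make_pipeline(
        TfidfVectorizer(analyzer="char", ngram_range=(1, 3)), clf)
    acc = cross_val_score(pipe, usernames, labels, cv=2).mean()
    print(name, round(acc, 3))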

2 Related Work
Individuals commonly use usernames and passwords to log into their accounts.
Many devices now use iris scanners and pattern matching for user authentication; however, alphanumeric text is still the most prevalent. As a result, many cyber-security professionals have concentrated on pass-
words: research shows that nearly all users establish short, memorable passwords
that they reuse across numerous accounts [9,21]. Furthermore, more stringent
passwords and complicated access control measures are being created (e.g., two-
factor authentication). Today, users can implement their passwords in various
situations, but they may choose to ignore them for the sake of ease. While pass-
word complexity has always been prioritized, less attention has been given to usernames, with only a few studies assessing their impact on account security [12,16]. However, research has revealed that increasingly personalized login credentials, such as biometric recognition (e.g., fingerprint, iris), are being used. In any case, account names serve as the first line of security for websites that hold sensitive data (e.g., bank accounts, trading data). Fandakly et al. [10] emphasized account names as the first line of credentials impacting account security.
Shi [19] proposed a mechanism of user discrimination based on the username
characteristics. Furthermore, to secure users’ passwords online, the authors of
[23] presented a virtual password idea in which users can choose from a variety
of virtual password schemes ranging in security from weak to robust. Basta et al.
[7] noticed that relatively little emphasis had been placed on the username format
and observed that most firms designate an account using specific versions of the
person's first and last names. As a result, acquiring usernames becomes
pretty straightforward. According to Wang et al. [25], hackers are attracted
to cloud-based password managers. With the master password and the user’s
phone, they suggested a bilateral generative password manager. In addition,
Perito et al. [17] utilized the N-gram model to assess the uniqueness of a user’s
username and to locate several profiles belonging to the same person.
Leona et al. [22] investigated five distinct password management behaviors
to see how well consumers understood password quality. According to their
research, users comprehend good and bad passwords, but the concept of pass-
word security varies from person to person. Password reuse is another source of
security vulnerability and to detect it, Jeffrey et al. [14] devised a two-pronged
strategy involving detection and remediation. Coban et al. [8] also concentrated
on username reuse, extracting similar usernames from other internet domains
using various machine learning algorithms.

3 Methodology

This section defines the process and methods we follow in this study. The
architecture of the proposed username selection approach is shown in Fig. 1.

Fig. 1. Architecture of username selection approach

3.1 Username Selection Process


The username contains a string of letters, numbers, and a few special
characters. Using the same username twice within the same SNS is not
permitted. Some users worry about their accounts being compromised even though
they use strong, unique passwords from a password manager and only reuse
usernames. If we use a similar username on several SNSs, our information
can be traced through that single username, which is extremely susceptible to
being breached [17]. According to Kumar et al. [15], more than 50% of users in
different SNSs use the same logins. Therefore, in order to protect ourselves
from online profiling, we must adopt different usernames. In this research, we
have examined usernames' strengths and the human behaviors that lead to the
use of vulnerable usernames. Human tendencies that lead to breaches while
using usernames include:

1. Username shown in text: Normally, the password is masked while typing,
but the username is always unveiled in clear text, which makes it easy for
miscreants to steal it and use it for password generation via a brute-force
mechanism.
2. Social sharing of username: Social sharing of usernames also facilitates
credential hacking, as the usernames are exposed publicly.
3. Email as username: Most social sites now use the email address as the
username, which increases vulnerability, as email addresses are featured in
business diaries and business cards.
4. Similar usernames on different sites: Users have a high tendency to reuse
usernames, like passwords, which can serve as a stepping stone for attacks
on low-security websites.
5. Personal information based username: Many users adopt usernames based
on their personal names, which link to the other accounts of the same
person.

3.2 Dataset Description

We collect username data from Kaggle, where we find a dataset titled
Reddit usernames [2]. The Reddit dataset contains around 26 million usernames
of users who have commented at least once in the comment section. For our subsequent

study, we take 83,958 usernames from the 26 million records in an MS Excel file
and exclude the frequencies of their comments. Following that, we employ several
parameters to determine the usernames' strengths. Security experts have stated
that several features are required to form a secure username [1], so we measure
strength following those specifications. We use the length of the username, the
number of digits in the username, the existence of special characters, and camel
case as our attributes. We identify these characteristics for each username
after selecting these attributes. For example, if a username consists only of
letters, its strength will be 'very weak'. Similarly, if the username contains
only one feature, it is considered 'weak'. If it has two features, we consider
the username of 'normal' strength. Further, if it has three features, it is
'strong', and if it contains all four features, its strength will be 'very
strong'.
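To make this labeling rule concrete, the following minimal Python sketch (our
illustration, not code from the study; the length threshold of eight characters
is an assumption based on the observation reported with Fig. 2) derives the four
attributes and the resulting strength class:

import re

def username_features(name):
    """Count how many of the four strength criteria a username meets."""
    features = 0
    features += len(name) > 8                           # assumed length criterion
    features += any(ch.isdigit() for ch in name)        # contains digits
    features += bool(re.search(r"[^A-Za-z0-9]", name))  # special characters
    features += bool(re.search(r"[a-z][A-Z]", name))    # camel case
    return features

LABELS = ["very weak", "weak", "normal", "strong", "very strong"]

def strength(name):
    return LABELS[username_features(name)]

print(strength("john"))          # very weak: letters only, no other feature
print(strength("johnDoe_1990"))  # very strong: all four features present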

Fig. 2. Bar chart of usernames’ trends

We have made some observations after visualizing our dataset. First, we
find that different users prefer different types of usernames: some choose
short usernames, and some use long ones. Figure 2 shows that most Reddit users
select usernames longer than eight characters. Besides, only a few people use
camel case, making their usernames guessable. The tendencies to use numbers and
special characters are also low, which likewise makes usernames predictable.
Based on these four criteria, we have measured the strengths of the usernames
(Fig. 3); a username maintaining all four criteria indicates high strength.
We have found that few people have very strong usernames, and most people use
normal, weak, or very weak usernames.

Fig. 3. Different strengths of usernames

3.3 Data Pre-processing

To classify our data properly with ML classification algorithms, we have
performed some data processing. We have converted our strength feature from
nominal to numeric data so that we can accurately identify weak usernames.
The username strengths (very weak, weak, normal) are given a label of '0', and
the strengths (strong, very strong) are given a label of '1'. The dataset is
shuffled and split into training (70%) and testing (30%) data for applying the
different models. In our dataset, we find that weak usernames far outnumber
those with the strong label. To mitigate this imbalance, we have applied the
Synthetic Minority Oversampling Technique (SMOTE) from the Python
imbalanced-learn library before the Random Forest (RF) analysis.
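A minimal sketch of this preprocessing pipeline, assuming the extracted
features sit in a pandas DataFrame df with a nominal strength column (the
column names are our own placeholders), could look as follows with
scikit-learn and imbalanced-learn:

from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

# Binarize the nominal labels: very weak/weak/normal -> 0, strong/very strong -> 1
df["label"] = df["strength"].map(
    {"very weak": 0, "weak": 0, "normal": 0, "strong": 1, "very strong": 1}
)

X = df[["length", "digits", "special_chars", "camel_case"]]  # assumed column names
y = df["label"]

# Shuffle and split 70/30, as described above
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, shuffle=True, random_state=42
)

# Oversample the minority (strong) class on the training split only
X_train, y_train = SMOTE(random_state=42).fit_resample(X_train, y_train)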

3.4 Random Forest (RF)

Decision trees are classifiers that recursively partition the instance space.
The nodes of a decision tree form a directed tree with a root node and no
incoming edges [13,18]. We choose RF because it has low bias and variance. A
different decision tree algorithm, J48, is superior to RF in terms of
correctness [4] if some other issues are compromised. Tree density in RF
improves efficiency and accuracy estimation, but it slows down the computation.
Many features should be considered when splitting a node, along with a minimum
number of leaf nodes. Both regression and classification can be done with the
RF algorithm, which has greater stability than other decision tree algorithms.

3.5 K-nearest Neighbor (KNN)

KNN is a non-parametric classifier. A KNN model is created using training
samples, and the quality of those samples determines the outcome of the KNN
process (classification or regression) [5,11]. To use KNN, we need to specify
the following information. For a positive k (k = 1, 2, 3, ..., n), a new member
is identified by the majority vote of its neighbors, where k is the number of
nearest neighbors. KNN regression calculates the average value of the k nearest
neighbors for each new member that joins the model. The training samples are
incorporated into the model using Euclidean distance.

3.6 Support Vector Classification (SVC)


Support vector machines for classification (SVC) are supervised learning models
with associated learning algorithms used in machine learning to evaluate data
and predict results [20]. The basic SVM is a non-probabilistic binary linear
classifier that takes input data and predicts which of two possible output
classes is produced for each given input. It is based on a set of labeled
training examples separated into two categories. The SVC training technique
then generates a model that assigns fresh examples to one of the two categories.
The SVC model maps the instances as points in space, with a substantial gap
separating the examples of the various categories. New examples are then mapped
into the same space and classified according to which side of the divide they
fall on.

3.7 Linear Support Vector Classification (LinearSVC)


This SVC method performs classification using a linear kernel function and works
well with large numbers of samples. The LinearSVC model has more parameters
than the SVC model, such as penalty normalization ('L1' or 'L2') and the loss
function. The kernel itself cannot be changed, because LinearSVC depends on the
linear kernel. After training, different regularizations can be utilized in the
model. Non-linear classifiers are slower than linear classifiers.

4 Experiment and Result Analysis


Our experiment constructs a machine learning system using Python (version 3)
to learn various usernames. We have used Scikit-learn, a Python library. After
completing data preprocessing, we have started our experiment by classifying
the dataset with various machine learning algorithms. We have selected five
features for the classification. Next, we have scaled our dataset and split the
83,958 usernames into a training (70%) dataset and a testing (30%) dataset.
Once training has been completed, we have used the RF, KNN, SVC, and LinearSVC
classification models to predict our testing data. We have recorded the results
for further analysis.
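Continuing the earlier preprocessing sketch (again our illustration; the
hyperparameters shown are scikit-learn defaults, not values reported in this
study, and for brevity the same oversampled split is reused for all four
models, whereas the text above applies SMOTE before the RF analysis):

from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC, LinearSVC
from sklearn.metrics import accuracy_score, confusion_matrix

models = {
    "RF": RandomForestClassifier(random_state=42),
    "KNN": KNeighborsClassifier(),   # Euclidean distance by default
    "SVC": SVC(),
    "LinearSVC": LinearSVC(),        # linear kernel with L1/L2 penalty options
}

for name, model in models.items():
    model.fit(X_train, y_train)      # fit on the training data
    y_pred = model.predict(X_test)   # predict the held-out test data
    print(name, accuracy_score(y_test, y_pred))
    print(confusion_matrix(y_test, y_pred))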

Table 1. Confusion matrices of the RF, KNN, SVC, and LinearSVC models

Model      Predicted  Actual weak  Actual strong
RF         Weak       TP = 20276   FN = 983
           Strong     FP = 2356    TN = 1573
KNN        Weak       TP = 19456   FN = 1840
           Strong     FP = 1956    TN = 1936
SVC        Weak       TP = 20227   FN = 1069
           Strong     FP = 2268    TN = 1624
LinearSVC  Weak       TP = 20173   FN = 1123
           Strong     FP = 2409    TN = 1483

Test data have been predicted by fitting the training data to the models, and
the results are represented in confusion matrices. A confusion matrix shows the
True Positive (TP), False Positive (FP), False Negative (FN), and True Negative
(TN) values from the test data (Table 1).
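As a worked example of how Table 1 yields the metrics in Table 2, the standard
formulas applied to the RF counts give the following (our arithmetic; the
sensitivity reported in Table 2 appears to follow a different averaging
convention than the plain TP/(TP + FN) ratio):

# RF confusion-matrix entries from Table 1
TP, FN, FP, TN = 20276, 983, 2356, 1573

precision   = TP / (TP + FP)                    # 0.896, ~0.90 in Table 2
specificity = TN / (TN + FP)                    # 0.400, matching Table 2
accuracy    = (TP + TN) / (TP + FN + FP + TN)   # 0.867, ~0.87 in Table 2
sensitivity = TP / (TP + FN)                    # 0.954; Table 2 reports 0.90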

Table 2. Result table

Model      Precision  Sensitivity  Specificity  Accuracy
RF         0.90       0.90         0.40         0.87
KNN        0.91       0.91         0.50         0.85
SVC        0.90       0.90         0.42         0.87
LinearSVC  0.89       0.89         0.38         0.86

After applying the test data to our models, we obtain the evaluation metrics
shown in Table 2. As we aim to predict usernames that are weak in strength, we
concentrate our analysis on how well the models predict weak usernames from
the dataset. We have found that, in terms of accuracy, the Random Forest and
SVC models achieve a score of 0.87, the highest among the models. In terms of
sensitivity, both RF and SVC achieve scores of 0.90, meaning these two models
predict true positives 90% correctly; the KNN model achieves a sensitivity of
0.91. We have observed that all models show low specificity, as there are fewer
strong usernames in the Reddit dataset. Our models can predict the weak
usernames, which assists users in choosing strong usernames for any social
media account.

Fig. 4. Importance of features.

Furthermore, we obtain the feature importances (Fig. 4) from our RF model by
applying feature_importances_, a method of the Python Random Forest classifier
library. For example, we have observed that usernames containing numbers,
having a minimum length of eight characters, and using camel case impact the
RF model the most. In contrast, usernames containing special characters have
less importance in our RF model.
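A short sketch of how such a chart can be produced from a fitted scikit-learn
forest (rf being the trained RandomForestClassifier from the earlier sketch):

import matplotlib.pyplot as plt

rf = models["RF"]                            # fitted RandomForestClassifier
plt.bar(X.columns, rf.feature_importances_)  # one bar per attribute, as in Fig. 4
plt.ylabel("Importance")
plt.show()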

5 Future Research Direction and Conclusion


Despite gaining considerable insights into different username strengths, we have
found few datasets and insufficient literature on usernames' strength analysis
across various SNSs. In the future, we hope to apply other machine learning
techniques to verify whether the same individual has previously used a username
on another account, eliminating the possibility of online tracing and profile
hacking in addition to enforcing username strength.
Along with a password, a user's username may carry crucial information
regarding online privacy. Unfortunately, compromised accounts often have
weak usernames (no numeric characters, special characters, camel case, or
sufficient length). This article emphasizes the need for strong usernames for
account security and presents machine learning classifiers to detect weak
usernames. Our analyses have shown that the Random Forest (RF) and Support
Vector Classifier (SVC) can correctly categorize 87% of our weak usernames.
This study will help SNS owners secure their clients better and urge users to
choose more robust usernames to avoid being hacked.

References
1. The importance of creating a strong online username. https://www.bellco.org/
advice-planning/fraud-prevention/online-security/username-tips.aspx
2. Reddit usernames. https://kaggle.com/colinmorris/reddit-usernames
3. Identity Theft Resource Center's 2020 annual data breach report reveals 19 percent
decrease in breaches, January 2021. https://www.idtheftcenter.org/identity-theft-
resource-centers-2020-annual-data-breach-report-reveals-19-percent-decrease-in-
breaches/
4. Ali, J., Khan, R., Ahmad, N., Maqsood, I.: Random forests and decision trees. Int.
J. Comput. Sci. Issues (IJCSI) 9(5), 272 (2012)
5. Altman, N.S.: An introduction to Kernel and nearest-neighbor nonparametric
regression. J. Am. Stat. 46(3), 175–185 (1992)
6. Anish Dev, J.: Usage of botnets for high speed MD5 hash cracking. In: Third
International Conference on Innovative Computing Technology (INTECH 2013),
pp. 314–320 (2013). https://doi.org/10.1109/INTECH.2013.6653658
7. Basta, A., Basta, N., Brown, M.: Computer security and penetration testing. Cen-
gage Learning (2013)
8. Çoban, Ö., Inan, A., Ozel, S.A.: Your username can give you away: matching
Turkish OSN users with usernames, vol. 10, pp. 1–15 (2021). http://www.ijiss.org/ijiss/index.php/ijiss/article/view/896
9. Das, A., Bonneau, J., Caesar, M., Borisov, N., Wang, X.: The tangled web of
password reuse. In: NDSS, vol. 14, pp. 23–26 (2014)
10. Fandakly, T., Caporusso, N.: Beyond passwords: enforcing username security as
the first line of defense. In: Ahram, T., Karwowski, W. (eds.) AHFE 2019. AISC,
vol. 960, pp. 48–58. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-
20488-4_5
11. Fix, E., Hodges, J.L., Jr.: Discriminatory analysis-nonparametric discrimination:
small sample performance. California Univ. Berkeley, Technical report (1952)

12. Grassi, P., et al.: Digital identity guidelines: authentication and lifecycle manage-
ment, 22 June 2017. https://doi.org/10.6028/NIST.SP.800-63b
13. Ho, T.K.: The random subspace method for constructing decision forests. IEEE
Trans. Pattern Anal. Mach. Intell. 20(8), 832–844 (1998)
14. Jenkins, J., Grimes, M., Proudfoot, J., Lowry, P.: Improving password cyberse-
curity through inexpensive and minimally invasive means: detecting and deter-
ring password reuse through keystroke-dynamics monitoring and just-in-time
fear appeals. Inform. Technol. Dev. 20, 196–213 (2013). https://doi.org/10.1080/
02681102.2013.814040
15. Kumar, S., Zafarani, R., Liu, H.: Understanding user migration patterns in social
media. In: Twenty-Fifth AAAI Conference on Artificial Intelligence (2011)
16. Onaolapo, J., Mariconti, E., Stringhini, G.: What happens after you are PWND:
Understanding the use of leaked webmail credentials in the wild. In: Proceedings
of the 2016 Internet Measurement Conference, pp. 65–79 (2016). https://doi.org/
10.1145/2987443.2987475
17. Perito, D., Castelluccia, C., Kâafar, M.A., Manils, P.: How unique and traceable
are usernames? CoRR abs/1101.5578 (2011). http://arxiv.org/abs/1101.5578
18. Rokach, L., Maimon, O.: Top-down induction of decision trees classifiers-a sur-
vey. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 35(4), 476–487
(2005). https://doi.org/10.1109/TSMCC.2004.843247, http://ieeexplore.ieee.org/
document/1522531/
19. Shi, Y.: A method of discriminating user’s identity similarity based on username
feature greedy matching. In: Proceedings of the 2nd International Conference on
Cryptography, Security and Privacy, pp. 5-9. ACM, March 2018. https://doi.org/
10.1145/3199478.3199512
20. de Souza, D.L., Granzotto, M.H., de Almeida, G.M., Oliveira-Lopes, L.C.: Fault
detection and diagnosis using support vector machines-a SVC and SVR compari-
son. J. Safety Eng. 3(1), 18–29 (2014)
21. Stainbrook, M., Caporusso, N.: Convenience or strength? Aiding optimal strategies
in password generation. In: Ahram, T.Z., Nicholson, D. (eds.) AHFE 2018. AISC,
vol. 782, pp. 23–32. Springer, Cham (2019). https://doi.org/10.1007/978-3-319-
94782-2_3
22. Tam, L., Glassman, M., Vandenwauver, M.: The psychology of password manage-
ment: a tradeoff between security and convenience. Behav. IT 29, 233–244 (2010).
https://doi.org/10.1080/01449290903121386
23. Umadevi, P., Saranya, V.: Stronger authentication for password using virtual pass-
word and secret little functions. In: International Conference on Information Com-
munication and Embedded Systems (ICICES 2014), pp. 1–6. IEEE, February 2014.
https://doi.org/10.1109/ICICES.2014.7033936
24. Ur, B., Bees, J., Segreti, S.M., Bauer, L., Christin, N., Cranor, L.F.: Do users’
perceptions of password security match reality? In: Proceedings of the 2016 CHI
Conference on Human Factors in Computing Systems, pp. 3748–3760 (2016)
25. Wang, L., Li, Y., Sun, K.: Amnesia: A bilateral generative password manager. In:
36th IEEE International Conference on Distributed Computing Systems, ICDCS
2016, Nara, Japan, June 27-30, 2016. pp. 313–322. IEEE Computer Society (2016).
https://doi.org/10.1109/ICDCS.2016.90
Tomato Leaf Disease Recognition Using
Depthwise Separable Convolution

Syed Md. Minhaz Hossain1,2 , Khaleque Md. Aashiq Kamal1 , Anik Sen1,2 ,
and Kaushik Deb2(B)
1
Department of Computer Science and Engineering, Premier University,
Chattogram 4000, Bangladesh
2
Department of Computer Science and Engineering, Chittagong University
of Engineering and Technology, Chattogram 4349, Bangladesh
debkaushik99@cuet.ac.bd

Abstract. Various plant diseases are the main reason behind reduced
production, resulting in significant losses in agriculture. The evolution
of deep learning and its diversified use in different fields extend
the opportunity to recognize plant diseases accurately. The challenges in
plant disease recognition include datasets limited to homogeneous backgrounds
and the high memory demand of a large number of parameters. In this work, a
dataset of 2880 tomato plant images is used to train a depthwise separable
convolution-based model that reduces the trainable parameters for
memory-restricted devices such as mobile phones. An independent set of test
images, including 612 tomato plant images of nine diseases, is used to assess
the model under different illuminations and orientations. The depthwise
separable convolution-based tomato leaf disease recognition model, entitled
reduced MobileNet, outperforms according to the trade-off among accuracy,
computational latency, and scale of parameters, achieving 98.31% accuracy
and a 92.03% F1-score.

Keywords: Tomato leaf diseases · Memory size · Computational latency · Depthwise separable convolution · Sustainable accuracy

1 Introduction
One of the most cultivated crops in the present world is the tomato. According
to Statista, about 180.77 kilo-metric tons of tomatoes were cultivated worldwide
in 2019 [14]. Production quantities are vastly affected by the diseases of
tomato plants. Therefore, early identification of diseases plays an important
role in monitoring the plants in the agricultural industry. For this reason,
different methods are applied in the field of agriculture. The well-known
application of chemical methods harms the health of fresh plants and humans and
influences the environment negatively. Moreover, these methods increase the
cost of tomato production. In general, diseases infect the leaves and leaflets,
the roots, the stems, and the fruits of tomato plants. In this study, diseases
affecting leaves and leaflets are considered.

Machine learning opens the scope for automated post-harvest monitoring [1],
prediction of crop production with respect to weather parameters [4], plant
leaf disease recognition [8], and the guidance of robots in the field of
agriculture. Typical machine learning models are suitable and successful in
certain conditions and specific setups, but their accuracy decreases
considerably in uncontrolled conditions. Considering the diversification of
deep learning models, researchers have applied them to achieve advanced
performance in agriculture. However, the use of deep learning still faces some
challenges: limited device memory (number of parameters), sustainable accuracy
(no fall in accuracy when testing a new dataset), and computational latency
(floating-point operations and multiply-accumulate operations).
Sustainable accuracy is a vital concern in CNN-based plant leaf disease (PLD)
recognition models: adding new PLD images reduces the accuracy [5]. In
addition, various works are restricted to symmetric backgrounds [5,15] and
sensitive to image-capturing conditions [11]. Among all the PLD recognition
works, two benchmark works for tomato leaf disease recognition, [2] and [3],
achieve better accuracy; however, they do not investigate the restriction to
symmetric backgrounds.
Moreover, most of the cutting-edge CNN models, such as VGG in [2,5,15],
InceptionV4 in [15], AlexNet in [5], DenseNet in [15], and InceptionV3,
DenseNet201, and a custom CNN model in [13], achieved promising accuracy rates
thanks to their deep and dense constructions. However, these models are limited
by the memory available for mobile and IoT-based plant leaf disease recognition
and by the computational costs of faster convergence.
We propose a depthwise separable convolution (DSC)-based tomato leaf disease
(DSCPLD) recognition model called reduced MobileNet to overcome the mentioned
restrictions of present PLD recognition models. Our emphasis is on establishing
a concrete trade-off among accuracy, number of parameters, and computational
latency for mobile/IoT-based tomato leaf disease recognition, using a
modification of MobileNet based on [9].

2 Related Work

Manual monitoring of plant diseases is chaotic, labor-intensive, and
challenging, and its reliability depends on the situation. Therefore,
researchers investigate automatic detection systems to overcome this hectic
problem and make farmers' activities more effective and accurate. Several
upgrades have been applied to CNN models for detecting PLDs in recent years.
Ferentinos et al. [5] developed a CNN model for recognizing 58 diseases of
25 plants. They achieved a 99.53% accuracy rate for VGG; on the other hand,
accuracy decreased by 25–35% for data unknown to the training model. In [15],
VGG, ResNet, Inception, and DenseNet were used, and DenseNet achieved 99.75%
accuracy in recognizing 38 PLDs of 14 classes; however, the computational cost
is an issue. Liang et al. [11] proposed a modified CNN model for detecting rice
blast disease that reached better accuracy than the

Table 1. Dataset description of the tomato leaf disease recognition model

Disease class                            #Org. images  Train  Validation  Test
Bacterial spot                           490           320    102         68
Early blight                             490           320    102         68
Healthy                                  490           320    102         68
Late blight                              490           320    102         68
Leaf mold                                490           320    102         68
Mosaic virus                             490           320    102         68
Septoria leaf spot                       490           320    102         68
Spider mites (two-spotted spider mite)   490           320    102         68
Yellow leaf curl virus                   490           320    102         68
Total                                    4410          2880   918         612

feature extraction technique; the modified CNN model achieved an accuracy of
95.83%. However, this model is sensitive to image-capturing conditions and
requires a larger dataset. The authors in [6] proposed a novel CNN model based
on 4199 images of rice leaf diseases, which recognized five types of rice leaf
diseases while decreasing the network parameters. Their model achieved 99.78%
training accuracy and 97.35% validation accuracy. The authors of [2] applied a
CNN-based model to detect nine diseases of tomatoes. They used 10,000 tomato
images for training and 500 images for testing from PlantVillage. The accuracy
of the model is 91.2%.
The authors in [10] studied the computational complexity and memory
requirements for PLD recognition. The work in [12] compared various pooling
strategies, namely mean-pooling, max-pooling, and stochastic pooling, for
detecting rice leaf diseases using CNN models, achieving 95.48% accuracy for
stochastic pooling. The researchers found that larger sample sizes are needed
to optimize the number of parameters.

3 Materials and Method


In this section, our proposed model is discussed in detail.

3.1 Dataset
In our work, 2880 original RGB images of tomato plants with nine different
diseases are used to train the model, 918 images are used to validate it, and
612 images are used to test it. The total dataset consists of 4410 images,
collected from the PlantVillage dataset1 . Samples of the nine classes of tomato
leaf diseases are shown in Fig. 1, and complete information on the dataset is
given in Table 1.
1 https://www.kaggle.com/emmarex/plantdisease.

Fig. 1. Samples of tomato leaf disease images: (a) Bacterial Spot, (b) Early Blight, (c)
Healthy, (d) Late Blight, (e) Leaf Mold, (f) Mosaic Virus, (g) Septorial Leaf Spot, (h)
Two Spotted Spider Mite, and (i) Yellow Leaf Curl Virus.

3.2 Applying Directional Augmentation to Images


Images captured in different orientations are one of the main issues in a
tomato leaf disease recognition system. The features of an image can be
spatially transformed due to the relative arrangement of the capturing device.
Moreover, capturing images from different angles to overcome this issue is also
problematic [7]. As a result, we applied directional augmentation techniques
to enlarge our dataset, which increases our model's capacity.
Mirror symmetry of an image means reflecting all pixels about a line considered
as an axis. In horizontal mirror symmetry, a vertical line of the image is
selected and all pixels are reflected about it; in vertical mirror symmetry, a
horizontal line of the image is selected and all pixels are reflected about it.

3.3 Applying Lighting Disturbance to Images


The weather has a vital role in capturing an image, and image quality is
affected by sunlight, shadow, and cloudy weather. To improve the generalization
ability, we create images by changing the sharpness, brightness, and contrast
values.
Increasing the sharpness of an image makes edges and borders more intense as
the objects in the image emerge. Let Q(x, y) = [r(x, y), g(x, y), b(x, y)]T
denote a pixel of an RGB image. We apply the Laplace operator to each pixel to
sharpen the image, as in Eq. 1.
∇²[Q(x, y)] = [∇²[r(x, y)], ∇²[g(x, y)], ∇²[b(x, y)]]T    (1)
Adjusting an image's brightness means increasing or decreasing the pixel values
in RGB mode. Suppose the original RGB value is B0 and the brightness
transformation factor is d. The changed RGB value B after applying the
brightness transformation factor is shown in Eq. 2.

B = B0 × (1 + d) (2)

Adjusting an image's contrast means increasing the larger RGB values and
decreasing the smaller ones relative to the brightness median. Suppose the
original RGB value is B0, the brightness transformation factor is d, and the
brightness median is i. The changed RGB value B after applying the contrast
transformation is shown in Eq. 3.

B = i + (B0 − i) × (1 + d) (3)
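Equations 2 and 3, together with the mirror transforms of Sect. 3.2, translate
directly into NumPy. The sketch below is our illustration; the disturbance
factor d is chosen arbitrarily:

import numpy as np

def adjust_brightness(img, d):
    """Eq. 2: B = B0 * (1 + d), clipped to the valid 8-bit range."""
    return np.clip(img.astype(np.float32) * (1 + d), 0, 255).astype(np.uint8)

def adjust_contrast(img, d):
    """Eq. 3: B = i + (B0 - i) * (1 + d), with i the brightness median."""
    i = np.median(img)
    return np.clip(i + (img.astype(np.float32) - i) * (1 + d), 0, 255).astype(np.uint8)

def mirror(img, horizontal=True):
    """Horizontal/vertical mirror symmetry (Sect. 3.2)."""
    return img[:, ::-1] if horizontal else img[::-1, :]

# Example: brighten a leaf image by 20%, then lower its contrast by 20%
# augmented = adjust_contrast(adjust_brightness(leaf_rgb, 0.2), -0.2)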

3.4 Disease Recognition Using Reduced MobileNet

In this section, we describe the basic depthwise separable convolution, the
basic module of reduced MobileNet, and the model design and tuning.

Depthwise Separable Convolution. Depthwise separable convolution comprises
two convolutions: a depthwise convolution and a pointwise convolution. It
splits a 3 × 3 convolution into a 3 × 3 depthwise convolution and a 1 × 1
pointwise convolution, so the DSC operation consists of two steps. Depthwise
convolution is a channel-wise convolution that operates on each input channel
individually. Pointwise convolution, which is similar to a traditional
convolution with kernel size 1 × 1, then combines the outputs of the individual
channels. The computational cost of traditional convolution (CostC) is shown
in Eq. 4.

CostC = M · M · K · K · N · P    (4)
On the other hand, the cost of depthwise separable convolution (CostD) is shown
in Eq. 5.

CostD = M · M · K · K · N + M · M · N · P    (5)

The traditional convolution's weight (WC) is shown in Eq. 6.

WC = K · K · N · P    (6)

The depthwise separable convolution's weight (WD) is shown in Eq. 7.

WD = K · K · N + N · P    (7)

where N is the number of input channels, P is the number of output channels,
K × K is the width and height of the kernel, and M × M is the width and height
of the input feature map.
Finally, Eqs. 8 and 9 show the reduction factors for the weights (FW) and the
operations (FCost).

FW = WD / WC = 1/P + 1/K²    (8)

Table 2. Reduced MobileNet architecture for tomato leaf disease recognition

Function Filter/Pool #Filters Output #Parameters


Input - - 224 × 224 0
Convolution 3 × 3 32 32 × 222 × 222 896
Depthwise convolution 3 × 3 32 32 × 64 × 64 32,800
Pointwise convolution 1 × 1 64 64 × 64 × 64 2112
Depthwise convolution 3 × 3 64 64 × 1 × 1 262,208
Pointwise convolution 1 × 1 128 128 × 1 × 1 8320
Global average pooling - - 1 × 1 × 128 0
Dense - - 1 × 1 × 12 1,161
Softmax - - 1 × 1 × 12 0

FCost = CostD / CostC = 1/P + 1/K²    (9)
Thus, the cost of computing a depthwise separable convolution with a K × K
filter can be nearly K² times lower than that of the traditional convolutional
layer [9].
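Plugging illustrative shapes into Eqs. 4, 5, and 9 (the values below are our
own example: a 64 × 64 feature map, a 3 × 3 kernel, 32 input channels, and 64
output channels) makes the saving concrete:

M, K, N, P = 64, 3, 32, 64  # feature-map size, kernel size, input/output channels

cost_c = M * M * K * K * N * P               # Eq. 4: traditional convolution
cost_d = M * M * K * K * N + M * M * N * P   # Eq. 5: depthwise + pointwise

print(cost_d / cost_c)      # 0.1267...
print(1 / P + 1 / K ** 2)   # Eq. 9 yields the same ratio: 0.1267...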

Basic Depthwise Separable Convolution Modules. Two variations of the
depthwise separable convolution module are used: in the first, the pointwise
convolution directly follows the depthwise convolution; in the other, batch
normalization and ReLU are applied after each of the depthwise and pointwise
convolutions. Based on these concepts, we propose reduced MobileNet, built on
the module shown in Fig. 2, for recognizing tomato leaf diseases.

Fig. 2. Primary module for tomato leaf disease recognition based on depthwise sepa-
rable convolution.

Model Design and Tuning. The architecture of the MobileNet-based tomato leaf
disease recognition model, entitled reduced MobileNet, is presented in Table 2
with an input size of 224 × 224. We split our dataset into three parts, train,
validation, and test, in the ratio 70-20-10. The RMSprop optimizer with a
learning rate of 0.001 is used, the batch size is 32, and the model is trained
for 200 epochs.
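A minimal Keras sketch following the layer sequence of Table 2 is given below.
This is our reconstruction, not the authors' code: the strides, padding, and
the use of nine output units (matching the nine disease classes, where Table 2
lists twelve) are assumptions, so the parameter counts will not match the
table exactly.

import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input((224, 224, 3)),
    layers.Conv2D(32, 3, activation="relu"),                  # standard convolution
    layers.DepthwiseConv2D(3, strides=3, activation="relu"),  # channel-wise convolution
    layers.Conv2D(64, 1, activation="relu"),                  # 1x1 pointwise convolution
    layers.DepthwiseConv2D(3, strides=2, activation="relu"),
    layers.Conv2D(128, 1, activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(9, activation="softmax"),                    # nine disease classes
])

model.compile(
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
# model.fit(train_ds, validation_data=val_ds, batch_size=32, epochs=200)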

4 Result and Observation


The tomato leaf disease recognition experiments are performed on an Intel(R)
Core i7-8700U 3.2 GHz machine with 8 GB of RAM. The proposed system is
implemented with the sklearn packages of Python.

4.1 Performance Evaluation


To evaluate our proposed reduced MobileNet recognition model's outcome, we
compare it with VGG16, VGG19, and AlexNet based on mean test accuracy and mean
F1-score. As the number of samples in the dataset is imbalanced, we use
performance measures such as mean class accuracy and mean class F1-score. The
comparison among the tomato leaf disease recognition models with respect to
training accuracy, validation accuracy (val accuracy), mean test accuracy, and
mean F1-score is shown in Table 3.

Table 3. Performance of various tomato leaf disease recognition models

Models Train accuracy Val accuracy Test accuracy F1-score


VGG16 99.45% 99.05% 99.10% 92.54%
VGG19 99.48% 99.21% 98.21% 90.19%
AlexNet 97.32% 95.12% 94.65% 86.78%
Reduced MobileNet 99.23% 98.72% 98.31% 92.03%

Table 4. A concrete representation of computational latency and model size of various tomato leaf disease recognition models

Models Image size FLOPs MACC # Parameters


VGG16 180 × 180 213.5 M 106.75 M 15.2 M
VGG19 180 × 180 287.84 M 143.92 M 20.6 M
AlexNet 224 × 224 127.68 M 63.84 M 6.4 M
Reduced MobileNet 224 × 224 3.70 M 2.15 M 0.31 M

4.2 Selection of the Best Model Based on All Criteria

Table 3 shows that VGG16 achieves a better mean test accuracy of 99.10% and an
F1-score of 92.54% on our tomato dataset, which is 0.79% better in accuracy and
0.51% better in F1-score than our proposed model. However, VGG16 requires almost
49 times more parameters than our proposed recognition model, as shown in
Table 4. Considering all the factors in Tables 3 and 4, reduced MobileNet is
the best among the tomato leaf disease recognition models for mobile and
IoT-based recognition. Per-class performances for tomato leaf disease
recognition are shown in Table 5.
The confusion matrix, Accuracy vs epoch curve and Loss vs epoch curve of
reduced MobileNet are shown in Figs. 3 and 4(a–b).

Table 5. Accuracy, precision, recall, and F1-score for each class of tomato leaf disease

Class                                          Accuracy  Precision  Recall  F1-score  Support
Tomato bacterial spot                          97.71%    86%        96%     90%       68
Tomato early blight                            98.04%    93%        78%     85%       68
Tomato healthy                                 97.71%    82%        94%     88%       68
Tomato late blight                             98.69%    97%        91%     94%       68
Tomato leaf mold                               96.89%    89%        82%     85%       68
Tomato mosaic virus                            98.53%    88%        100%    94%       68
Tomato septoria leaf spot                      99.35%    100%       94%     97%       68
Tomato spider mites (two-spotted spider mite)  99.35%    100%       94%     97%       68
Tomato yellow leaf curl virus                  98.53%    93%        94%     93%       68
Total                                          98.31%    93.33%     90.78%  92.03%    612

Fig. 3. The confusion matrix of reduced MobileNet model.

Fig. 4. (a) Accuracy vs epoch curve for tomato leaf disease recognition model and (b)
Loss vs epoch curve for tomato leaf disease recognition model.

4.3 Processing Steps Using Our Reduced MobileNet Model

A processing example of a tomato leaf image using reduced MobileNet is depicted
in Fig. 5(a–e), with some activations from each of the layers.

Fig. 5. Activations on: (a) convolution layer; (b) first depthwise convolution layer; (c)
first pointwise convolution layer; (d) second depthwise convolution layer; (e) second
pointwise convolution layer.

4.4 Evaluation of Generalization for Our Proposed Model

To evaluate the generalization of our reduced MobileNet, we test the model on
the tomato leaf disease dataset taiwan.7z (https://data.mendeley.com/
datasets/ngdgg79rzb/1/files/255e82b6-2b3a-41d2-bd07-7bfaf284a533, accessed
on 17 February 2021). We consider only tomato bacterial spot, tomato healthy,
and tomato late blight images for testing our DSCPLD model. There are 493
infected tomato leaf images, including 176 bacterial spot images, 160 healthy
images, and 157 late blight images. Reduced MobileNet achieves the best mean
test accuracy of 92.45% in recognizing the three tomato disease classes, and
accuracy falls 5.86% below that obtained when testing with our own dataset, as
shown in Table 6.

Table 6. Evaluation for generalization using various optimizers

Datasets SGD Adam RMSprop


Tomato dataset 78.75% 84.34% 92.45%
Our dataset 80.06% 92.21% 98.31%

4.5 Comparison

In our work, we observe a fall in accuracy when testing on a new set of tomato
images; however, generalization is better than in the work of [5]. Our proposed
recognition model demonstrates that we can reduce the computational latency and
memory space needed for mobile and IoT-based tomato leaf disease recognition
compared to other benchmark CNN models, as shown in Table 4. A comparison with
other state-of-the-art work is given in Table 7, where NR = not resolved,
R = resolved, and PR = partially resolved.

Table 7. Comparison of the performances between our model and a benchmark model

Reference Classes Models Generalization Complexity Memory Accuracy


[2] 10 Custom NR NR R 92.23%
Our work 9 Reduced MobileNet R R R 98.31%

5 Conclusion and Future Work


Precision agriculture is a crucial point in the agro-industry. Improvements in
technologies make it easy to detect and classify diseases accurately. However,
in precision agriculture, sustainable accuracy, complexity analysis for detection
time and memory size are also becoming important factors.
In our work, we include images under uneven illumination and different orien-
tations, making the model more efficient to trace the tomato leaf diseases appro-
priately. However, accuracy falls at 5.86% using reduced MobileNet in case of
testing new data from another dataset. This model provides better performance
than [5] in terms of rate of fall in accuracy. Besides, reduced MobileNet is very
effective for mobile and IoT-based tomato leaf disease recognition due to the
lower network parameters of the model and lower computational cost.
Further, we will focus on the stages of tomato leaf diseases to visualize the
symptoms’ changes with time.

References
1. Vasilyev, A.A., Samarin, G.N., Vasilyev, A.N.: Processing plants for post-harvest
disinfection of grain. In: Vasant, P., Zelinka, I., Weber, G.-W. (eds.) ICO 2019.
AISC, vol. 1072, pp. 501–505. Springer, Cham (2020). https://doi.org/10.1007/
978-3-030-33585-4_49
2. Agarwal, M., Singh, A., Arjaria, S., Sinha, A., Gupta, S.: ToLed: tomato leaf
disease detection using convolution neural network. Procedia Comput. Sci. 167,
293–301 (2020)
3. Ashok, S., Kishore, G., Rajesh, V., Suchitra, S., Sophia, S.G.G., Pavithra, B.:
Tomato leaf disease detection using deep learning techniques. In: 2020 5th Inter-
national Conference on Communication and Electronics Systems (ICCES), pp.
979–983 (2020). https://doi.org/10.1109/ICCES48766.2020.9137986
4. Borse, K., Agnihotri, P.G.: Prediction of crop yields based on fuzzy rule-based
system (FRBS) using the Takagi Sugeno-Kang approach. In: Vasant, P., Zelinka,
I., Weber, G.-W. (eds.) ICO 2018. AISC, vol. 866, pp. 438–447. Springer, Cham
(2019). https://doi.org/10.1007/978-3-030-00979-3_46
5. Ferentinos, K.P.: Deep learning models for plant disease detection and diagnosis.
Comput. Electron. Agric. 145, 311–318 (2018)
6. Hossain, S.M.M., et al.: Rice leaf diseases recognition using convolutional neural
networks. In: International Conference on Advanced Data Mining and Applica-
tions, pp. 299–314 (2021)
7. Hossain, S.M.M., Deb, K., Dhar, P.K., Koshiba, T.: Plant leaf disease recognition
using depth-wise separable convolution-based models. Symmetry 13(3), 511 (2021)

8. Hossain, S.M.M., Deb, K.: Plant leaf disease recognition using histogram based
gradient boosting classifier. In: Vasant, P., Zelinka, I., Weber, G.-W. (eds.) ICO
2020. AISC, vol. 1324, pp. 530–545. Springer, Cham (2021). https://doi.org/10.
1007/978-3-030-68154-8_47
9. Howard, A.G., et al.: MobileNets: efficient convolutional neural networks for mobile
vision applications (2017)
10. Kaur, S., Pandey, S., Goel, S.: Plants disease identification and classification
through leaf images: a survey. Arch. Comput. Methods Eng. 26, 507–530 (2019)
11. Liang, W.J., Zhang, H., Zhang, G.F., Cao, H.X.: Rice blast disease recognition
using a deep convolutional neural network. Sci. Rep. 9(1), 1–10 (2019)
12. Lu, Y., Yi, S., Zeng, N., Liu, Y., Zhang, Y.: Identification of rice diseases using
deep convolutional neural networks. Neurocomputing 267, 378–384 (2017)
13. Patidar, S., Pandey, A., Shirish, B.A., Sriram, A.: Rice plant disease detection and
classification using deep residual learning. In: Bhattacharjee, A., Borgohain, S.K.,
Soni, B., Verma, G., Gao, X.-Z. (eds.) MIND 2020. CCIS, vol. 1240, pp. 278–293.
Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-6315-7_23
14. Shahbandeh, M.: Vegetables production worldwide by type 2019. https://www.
statista.com/statistics/264065/global-production-of-vegetables-by-type/
15. Too, E.C., Yujian, L., Njuki, S., Yingchun, L.: A comparative study of fine-tuning
deep learning models for plant disease identification. Comput. Electron. Agric.
161, 272–279 (2019)
End-to-End Scene Text Recognition
System for Devanagari and Bengali Text

Prithwish Sen(B) , Anindita Das , and Nilkanta Sahu

Indian Institute of Information Technology, Guwahati, India


{prithwish.sen,anindita.das,nilkanta}@iiitg.ac.in
http://iiitg.ac.in/faculty/nilkanta/

Abstract. Scene text detection and recognition have been explored
extensively in the recent past, but very few of these works address Indian
languages. In this paper, an end-to-end system for the detection and
recognition of Devanagari and Bengali text from scene images is proposed. The
work is done in three stages, namely detection, matra removal, and character
recognition. Firstly, the PP-YOLO network is used for scene text detection.
As both languages under consideration have a matra line, a U-Net based
matra-line removal strategy is used next; U-Net achieves almost 100% accuracy
for segmentation, which contributes to the overall performance of the proposed
scheme. Finally, character recognition is done with the help of a CNN, trained
on a dataset created with Devanagari and Bengali text images. Experimental
results show the efficiency of the individual stages as well as of the scheme
as a whole.

Keywords: OCR · Bengali · Devanagari · Matra · U-Net · CNN · PP-YOLO

1 Introduction
Scene text holds enormous semantic information that helps in understanding a
scene better. Because of this, the detection and recognition of text in natural
scenes have drawn significant attention from the computer vision community over
the last decade. The problem with natural scene images is that they exhibit
varied illumination, distortion, fonts, backgrounds, etc. Most of the existing
work on natural scene text detection and recognition focuses primarily on
English, while minimal work has been done on Indian languages.
Indian languages have vast diversity in their patterns and ways of
representation, which makes it challenging to find a unique OCR solution. As
far as OCR is concerned, the Bengali (Bangla) and Devanagari languages have
received a good share of attention from researchers. Bengali and Devanagari are
spoken by 210 million and 182 million people worldwide, respectively, and are
also among the most spoken languages in India; Bengali is the national language
of Bangladesh. Usually, Indian languages contain vowels, consonants, and
compound letters. Compound
Indian languages contain vowels, consonants, and compound letters. Compound

letters are combinations of two or more letters and have different shapes.
These languages do not have upper/lower cases. Most of the characters in a word
are connected by a horizontal line called the 'matra' line. Bengali and
Devanagari (and other Indian language) OCR is more challenging than English OCR
because the large set of compound letters and the matra line make it difficult
to segment the characters.
The process of scene text detection/recognition involves many functional
steps, such as text detection, word segmentation, character segmentation, and
an OCR engine or character classifier. OCR is the process of automatically
recognizing text from images of text. In the proposed scheme, we consider text
as an object and apply one of the efficient object detection algorithms,
PP-YOLO, for text detection [14]. The recognition task is divided into two
sub-tasks: first matra removal and then character recognition. For matra
removal, probably for the first time, we use a U-Net architecture.

2 Literature Review

The problem of scene text recognition mainly consists of two sub-problems:
scene text detection and text recognition. Some researchers tried to solve
these two tasks separately, while others approached them as a single problem.
For scene text detection, hand-crafted statistical features were used
initially: approaches based on statistical features [25] and human intelligence
were used for text detection and recognition. These hand-crafted schemes use
properties like the Stroke Width Transform [17], where an image operator is
used to evaluate the stroke width of pixels. In 2011, HOG features [22] were
presented to spot words in unconstrained environments. MSERs [16] include a
selector to exploit higher-order properties of text. Mishra et al. [13] used an
English dictionary to compute higher-order priors for recognition.
With the rise of deep learning, researchers started exploring deep learning
based schemes [18] and achieved significant success. A Sanskrit-targeted OCR
system [6] was proposed in which an attention-based LSTM model is used for
extracting Sanskrit characters. Azeem et al. [4] detect a meter counter,
segment its digits, and recognize them using a Mask-RCNN. In 2021 [23], a
spatial graph-based CNN to identify paragraphs was proposed; this process
involves two steps, line splitting and clustering. Ghosh [7] and his group
tried to represent the semantic information of movie posters using transfer
learning to recognize text from graphics-rich posters. Recent OCR systems tend
to train both detection and recognition [21] together. In 2017 [24], a CNN- and
RNN-based end-to-end model to recognize French text with the extraction of
textual content was proposed. Again, an RCNN-based deep network was introduced
to address the text detection and recognition of scene images [11], evaluating
features only once. Zhang et al. [26] combine text reading and information
extraction with a fusion of textual and multimodal visual features. In 2021,
Huang et al. [9] proposed an end-to-end Multiplexed Multilingual Mask

TextSpotter, which identifies scripts at the word level and also recognizes
them.

3 Proposed Scheme

In the proposed scheme (Fig. 1), an image, primarily a natural scene image with
or without text content, is fed into the system. The system first localizes the
words in the scene and then performs several other steps to recognize each and
every character in the scene text. The detailed methodology is described below.

Fig. 1. System diagram

3.1 Text Detection

We consider text detection as a regression problem so as to predict separate
bounding boxes and associated confidence scores. A neural network is trained to
predict text bounding boxes and confidence scores directly from natural scene
text images in one evaluation. The PP-YOLO model [12] is trained with 2000
Bengali natural scene text images from IIIT-ILST [1]. It is obvious that the
lower the FPS, the higher the accuracy of the detection scheme. PP-YOLO
produces some localization errors, but it is unlikely to forecast false
positives in the background.

3.2 Matra Line Removal Using U-Net

After detecting the text, a top-down segmentation approach is applied to
extract the characters. In most Bengali and Devanagari words, the presence of
the matra line makes it difficult to separate the characters; removing the
matra line [19] allows the characters in a word to be isolated easily.

To remove the matra line, we use the U-Net model, which has been used
successfully for image segmentation in other applications [20]. U-Net takes two
sets of input: the original text images and images containing only the matra
lines. The model is trained on these data and outputs the difference between
the two images. It contains a contracting path and an expansive path, the left
and right sides respectively. The left side reproduces the typical structure of
a CNN; at each downsampling step in the network, the number of feature channels
doubles. The right-side path follows upsampling steps in which the feature
channels are halved. Trimming is essential because border pixels are lost in
each convolution. Finally, feature vectors are mapped to the corresponding
classes. In total, our model contains 23 convolutional layers. When this model
is tested on sample images, it removes the matra line with 100% accuracy.
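A compact U-Net-style sketch of this segmentation step is shown below. It is
our illustration, assuming 64 × 64 grayscale word images; the depth and filter
counts are placeholders, not the authors' exact 23-layer configuration.

import tensorflow as tf
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    # Two 3x3 convolutions, as in a typical U-Net stage
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

inp = layers.Input((64, 64, 1))
c1 = conv_block(inp, 32)                  # contracting path
p1 = layers.MaxPooling2D()(c1)
c2 = conv_block(p1, 64)                   # feature channels double per step
p2 = layers.MaxPooling2D()(c2)
b = conv_block(p2, 128)                   # bottleneck

u2 = layers.UpSampling2D()(b)             # expansive path
u2 = layers.Concatenate()([u2, c2])       # skip connection
c3 = conv_block(u2, 64)                   # feature channels halve per step
u1 = layers.UpSampling2D()(c3)
u1 = layers.Concatenate()([u1, c1])
c4 = conv_block(u1, 32)

out = layers.Conv2D(1, 1, activation="sigmoid")(c4)  # matra-line mask
model = Model(inp, out)
model.compile(optimizer="adam", loss="binary_crossentropy")
# model.fit(word_images, matra_only_images, ...)  # the two input sets of Sect. 3.2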

3.3 Character Segmentation and Classification

After the matra line is removed from the word, the word undergoes character
segmentation with connected component analysis, which finds a contour for each
character and separates them. Connected component analysis is the process of
finding connected edges of the same pixel intensity in a given image. Hence, a
distinguishable boundary between pixels of the same intensity and pixels of
different intensities is found, and each character in the image is separated
based on its region of interest.
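A sketch of this segmentation step with OpenCV is given below (our
illustration; the Otsu binarization and the minimum-area noise filter are
assumptions):

import cv2

def segment_characters(word_gray):
    """Split a matra-free grayscale word image into per-character crops."""
    # Binarize so that text pixels become foreground
    _, binary = cv2.threshold(word_gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # One external contour per connected component
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    chars = []
    for cnt in sorted(contours, key=lambda c: cv2.boundingRect(c)[0]):
        x, y, w, h = cv2.boundingRect(cnt)
        if w * h > 20:                    # assumed filter against small noise blobs
            chars.append(word_gray[y:y + h, x:x + w])
    return chars                          # character crops ordered left to right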
A multilingual OCR system generally uses one of two approaches: (i) a single
classifier for the character sets of all languages under consideration, or
(ii) a separate classifier for each language, preceded by a language
classifier. In the proposed scheme, we use the first approach.

Fig. 2. CNN block diagram

The segmented characters from above are fed into the classifier shown in
Fig. 2, which outputs the predicted class for each character of the input
image. First, the input characters are pre-processed with grayscale conversion
and histogram equalization. A convolutional neural network then takes the input
and generates the corresponding feature maps. ReLU is used as the activation
function for each convolutional layer, which mitigates the vanishing gradient
problem. After successful training, we predict which character class each input
character belongs to with the softmax function.
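The preprocessing and classification stage can be sketched as follows (our
illustration: the 32 × 32 input size, the filter counts, and n_classes, which
would cover the combined Bengali and Devanagari character set, are
assumptions):

import cv2
import tensorflow as tf
from tensorflow.keras import layers

def preprocess(char_bgr):
    """Grayscale conversion and histogram equalization, as described above."""
    gray = cv2.cvtColor(char_bgr, cv2.COLOR_BGR2GRAY)
    return cv2.equalizeHist(cv2.resize(gray, (32, 32))) / 255.0

n_classes = 150  # assumed size of the combined character set

model = tf.keras.Sequential([
    layers.Input((32, 32, 1)),
    layers.Conv2D(32, 3, activation="relu"),   # ReLU counters vanishing gradients
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(n_classes, activation="softmax"),  # predicted character class
])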

4 Experimental Result
To train our text detection module, we use the IIIT-ILST [1] dataset along with
2000 collected scene images. For the recognition module, a combination of
BanglaLekhaIsolated [2], ISI Bengali [3], Devanagari [15], and our collection
of printed datasets is used. The whole process of scene text recognition can be
seen in Fig. 3.

Fig. 3. Scene text recognition pipeline

For text detection performance, the precision and recall metrics are evaluated.
Further, to find the average precision, the area under the curve plotted in
Fig. 4 must be evaluated. This is done by finding the 11-point segmentation of
recall and the interpolated precision. Table 1 shows the different values of
segmented recall and interpolated precision. Hence, the resulting average
precision is found to be 90.32%.

Table 1. Segmented recall and interpolated precision

Segmented recall        0     0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9    1
Interpolated precision  1     1     0.83  0.84  0.86  0.86  0.86  0.83  0.88  0.976  1
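Averaging the eleven interpolated precision values from Table 1 reproduces the
reported figure (a simple arithmetic sketch; we average the listed values
directly rather than re-deriving the interpolation):

interp_precision = [1, 1, 0.83, 0.84, 0.86, 0.86, 0.86, 0.83, 0.88, 0.976, 1]

average_precision = sum(interp_precision) / len(interp_precision)
print(round(average_precision, 4))  # 0.9033, i.e. ~90.32%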

Fig. 4. Performance evaluation plot

To check the versatility of the proposed architecture, the whole recognition
dataset is divided into a training set (85%) and a testing set (15%).
Firstly, we train our model with the text detection dataset mentioned above;
the detection pipeline is trained with the help of Darknet on an NVIDIA Tesla
K80 GPU. Secondly, the recognition model is trained on the mixed dataset with a
batch size of 100, 60 epochs, and a learning rate of 0.001. The proposed model
achieves 98.23% accuracy, which shows that it performs well across different
languages. The proposed model also achieves almost 100% accuracy in matra line
removal using the U-Net model. The results of applying the proposed model to
the datasets are portrayed, along with a comparison, in Table 2.

Table 2. Model accuracy comparison

Language Other works Our method


Bengali and Devanagari 92.00% [8] 98.89%
95.39% [10]
80.02% [25]
86.21% [5]

5 Conclusion
In this paper, we considered two different Indian languages, Devanagari and
Bengali, in natural scene text images. The proposed scheme consists of a
three-stage pipeline for detection and segmentation followed by an end-to-end
Optical Character Recognition (OCR) system. Firstly, for detecting the text in
a natural scene text image, a neural network trained with Darknet is used. As
both the languages under consideration have a matra line, a U-Net based
matra-line removal strategy is used for the first time. Finally, the character
recognition pipeline is trained with our created mixed dataset. Furthermore,
our proposed method outperforms other existing approaches in terms of accuracy.
In the future, we are interested in carrying out further testing of the
proposed scheme's performance with other Indian languages too.

References
1. http://cvit.iiit.ac.in/research/projects/cvit-projects/iiit-ilst
2. https://data.mendeley.com/datasets/hf6sf8zrkc/2
3. https://www.isical.ac.in/∼ujjwal/download/SegmentedSceneCharacter.html
4. Azeem, A., Riaz, W., Siddique, A., Saifullah, U.A.K.: A robust automatic meter
reading system based on mask-RCNN. In: 2020 IEEE International Conference
on Advances in Electrical Engineering and Computer Applications (AEECA), pp.
209–213. IEEE (2020)
5. Bhunia, A.K., Kumar, G., Roy, P.P., Balasubramanian, R., Pal, U.: Text recog-
nition in scene image and video frame using color channel selection. Multimedia
Tools Appl. 77(7), 8551–8578 (2018)
6. Dwivedi, A., Saluja, R., Sarvadevabhatla, R.K.: An OCR for classical Indic docu-
ments containing arbitrarily long words. In: Proceedings of the IEEE/CVF Confer-
ence on Computer Vision and Pattern Recognition Workshops, pp. 560–561 (2020)
7. Ghosh, M., Roy, S.S., Mukherjee, H., Obaidullah, S.M., Santosh, K., Roy, K.:
Understanding movie poster: transfer-deep learning approach for graphic-rich text
recognition. Vis. Comput. 37, 1–20 (2021)
8. Ghoshal, R., Roy, A., Parui, S.K., et al.: Recognition of Bangla text from outdoor
images using decision tree model. Int. J. Knowl. Based Intell. Eng. Syst. 21(1),
29–38 (2017)
9. Huang, J., et al.: A multiplexed network for end-to-end, multilingual OCR. arXiv
preprint arXiv:2103.15992 (2021)
10. Islam, R., Islam, M.R., Talukder, K.H.: Extraction and recognition of Bangla texts
from natural scene images using CNN. In: El Moataz, A., Mammass, D., Mansouri,
A., Nouboud, F. (eds.) Image and Signal Processing, pp. 243–253. Springer Inter-
national Publishing, Cham (2020)
11. Li, H., Wang, P., Shen, C.: Towards end-to-end text spotting with convolutional
recurrent neural networks. In: 2017 IEEE International Conference on Computer
Vision (ICCV), pp. 5248–5256 (2017). https://doi.org/10.1109/ICCV.2017.560
12. Long, X., et al.: PP-YOLO: an effective and efficient implementation of object
detector. arXiv preprint arXiv:2007.12099 (2020)
13. Mishra, A., Alahari, K., Jawahar, C.: Scene text recognition using higher order
language priors. In: BMVC - British Machine Vision Conference. BMVA, Surrey,
UK, September 2012. https://doi.org/10.5244/C.26.127, https://hal.inria.fr/hal-
00818183
14. Naosekpam, V., Kumar, N., Sahu, N.: Multi-lingual Indian text detector for mobile
devices. In: Singh, S.K., Roy, P., Raman, B., Nagabhushan, P. (eds.) Computer
Vision and Image Processing, pp. 243–254. Springer, Singapore (2021)
15. Narang, V., Roy, S., Murthy, O.R., Hanmandlu, M.: Devanagari character recogni-
tion in scene images. In: 2013 12th International Conference on Document Analysis
and Recognition, pp. 902–906. IEEE (2013)
16. Neumann, L., Matas, J.: Text localization in real-world images using efficiently
pruned exhaustive search. In: 2011 International Conference on Document Analysis
and Recognition, pp. 687–691 (2011). https://doi.org/10.1109/ICDAR.2011.144
17. Ofek, E., Epshtein, B., Wexler, Y.: Detecting text in natural scenes with stroke
width transform. In: 2010 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), pp. 2963–2970. IEEE Computer Society, Los Alamitos, CA,
USA, June 2010. https://doi.org/10.1109/CVPR.2010.5540041
18. Peng, X., Wang, C.: Building super-resolution image generator for OCR accuracy
improvement. In: Bai, X., Karatzas, D., Lopresti, D. (eds.) DAS 2020. LNCS, vol.
12116, pp. 145–160. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-
57058-3 11
19. Rahman, A., Cyrus, H.M., Yasir, F., Adnan, W.B., Islam, M.M.: Segmentation
of handwritten Bangla script. In: 2013 International Conference on Informatics,
Electronics and Vision (ICIEV), pp. 1–5 (2013). https://doi.org/10.1109/ICIEV.
2013.6572635
20. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomed-
ical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F.
(eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015).
https://doi.org/10.1007/978-3-319-24574-4 28
21. Subedi, B., Yunusov, J., Gaybulayev, A., Kim, T.H.: Development of a low-cost
industrial OCR system with an end-to-end deep learning technology. IEMEK J.
Embed. Syst. Appl. 15(2), 51–60 (2020)
22. Wang, K., Belongie, S.: Word spotting in the wild. In: Daniilidis, K., Maragos, P.,
Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6311, pp. 591–604. Springer, Heidelberg
(2010). https://doi.org/10.1007/978-3-642-15549-9 43
23. Wang, R., Fujii, Y., Popat, A.C.: General-purpose OCR paragraph identification
by graph convolution networks. arXiv preprint arXiv:2101.12741 (2021)
24. Wojna, Z., et al.: Attention-based extraction of structured information from street
view imagery. In: 2017 14th IAPR International Conference on Document Analysis
and Recognition (ICDAR), vol. 1, pp. 844–850 (2017). https://doi.org/10.1109/
ICDAR.2017.143
25. Yao, C., Bai, X., Shi, B., Liu, W.: Strokelets: a learned multi-scale representation
for scene text recognition. In: Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, vol. 25, pp. 4042–4049, June 2014. https://doi.
org/10.1109/CVPR.2014.515
26. Zhang, P., et al.: TRIE: end-to-end text reading and information extraction for
document understanding. In: Proceedings of the 28th ACM International Confer-
ence on Multimedia, pp. 1413–1422 (2020)
A Deep Convolutional Neural Network
Based Classification Approach for Sleep
Scoring of NFLE Patients

Sarker Safat Mahmud1(B), Md. Rakibul Islam Prince1, Md. Shamim2, and Sarker Shahriar Mahmud3
1 Mechatronics and Industrial Engineering, Chittagong University of Engineering and Technology, Chattogram 4349, Bangladesh
2 Computer Science and Engineering, Khulna University of Engineering and Technology, Khulna 9203, Bangladesh
3 Mechanical Engineering, Bangladesh University of Engineering and Technology, Dhaka 1000, Bangladesh

Abstract. The sleep stage classification problem is very important for diagnosing any kind of sleep-related disease. This classification is a sensitive task that has so far been done manually by experts. With the advancement of Machine Learning (ML) in every aspect, many classification problems are now solved automatically by ML models, and sleep scoring is no different. However, most approaches to sleep scoring target healthy patients. This paper therefore uses a state-of-the-art Deep Convolutional Neural Network model to solve the problem. We did not use the Polysomnography (PSG) files employed traditionally; instead, our approach uses only raw data from the EEG electrodes. It can classify the sleep stages of patients suffering from Nocturnal Frontal Lobe Epilepsy (NFLE). Up-to-date data analysis tools are used throughout this approach.

Keywords: NFLE · CNN · Sleep stage classification · Hypnogram · MNE module

1 Introduction
Sleep is one of the most important actions in our daily routine. Sleeping takes
up a significant portion of one’s day. Sleep is a restorative period when almost
all body repairing activities are done. It enables the body to remain fit for the
next day. It lifts one’s mood, improves thinking and memory, reduces stress and
blood pressure, and eventually boosts the immune system. Sleep disturbances and disorders of any kind can hamper a person's activities throughout life. Sleep apnea is the most dangerous symptom of a sleep disturbance, as it can lead to serious
the most dangerous symptom of a sleep disturbance, as it can lead to serious
illnesses like high blood pressure, stroke, heart failure, irregular heartbeats, and

heart attacks, as well as Alzheimer’s disease [1–7]. So every aspect of sleep has
been thoroughly explored.
Classification of sleep stages is done manually by human experts [8]. CNN
performed better than other methods and obtained high performance for sleep stage classification in EEG recordings, with one-dimensional CNN models yielding higher accuracy [9]. Sleep stage diagnosis is both expensive and inconvenient for patients: sleep labs require a highly regulated atmosphere, human experts, and high-tech devices, and patients must stay for multiple nights to complete the diagnosis. As a result, portable gadgets are substantially more cost-effective and convenient for those patients [9]. The same review also analyzed the limitations and capabilities of the Deep Learning (DL) approach to sleep stage categorization over the last 10 years, concluding that DL is the best tool for sleep stage scoring among the various artificial intelligence technologies. A Recurrent Neural Network (RNN) is one of
the numerous forms of Deep Learning Techniques that can predict future states
based on the prior sequential stream of data. As a result, RNN is used in time
series applications such as capturing the stage transition of the sleep stages in
[10]. In [11], automatic sleep stage scoring is done based on raw single-channel
EEG and they used a new hybrid architecture with CNN and bidirectional LSTM
in the MASS and Sleep-EDF datasets. In [12], an Artificial Neural Network was used to detect epilepsy patients from EEG signals. On the Sleep Heart Health Study (SHHS) database, a CNN-based algorithm was utilized to automatically classify sleep stages using cardiac rhythm from an ECG signal, and it achieved an accuracy of 0.77 [13].
The microstructure of sleep, which includes the cyclic alternating pattern (CAP) and arousals, is a major topic of development and research [9]. Because of its short duration, conventional sleep scoring often misses this microstructure [9]. CAP is an EEG activity that causes sleep instability and sleep disturbance [14]. CAP occurs frequently throughout the non-REM stages of sleep.
CAP levels are frequently elevated in epileptic illnesses such as “Nocturnal
Frontal Lobe Epilepsy (NFLE)” [14]. Consequently, the traditional hypnograms
of NFLE sufferers and healthy people are not the same. Most sleep scoring research to date has targeted healthy people; hardly any work has been done specifically for NFLE patients, and none of it has used deep learning techniques. Even though the arrival of CAP has a significant impact on sleep phases, only a little work has been done on automatic sleep scoring for these kinds of patients. This paper focuses not only on developing an automatic sleep stage scoring technique for NFLE patients but also on bringing a Deep Neural Network (DNN) into action.

2 Background
2.1 Sleep Scoring

Experts have categorized all stages of sleep into two categories: non-REM and
REM (Rapid Eye Movement). Each type is associated with distinct brain activ-
ity. The brain becomes more active in the REM stage, and the eye movement is
significantly faster than in the non-REM period when the eye moves slowly. The
non-REM sleep comes first, followed by shorter REM sleep; the REM duration increases each time the cycle repeats. To enter the REM stage, the
sleep cycle must pass through 3 or 4 phases in the non-REM period, depending
on the scoring system. These non-REM periods are divided into light and deep
sleep. Eye movement and muscular action slow down in the Light non-REM
stage. During the Deep Non-REM period, the body is repaired, muscles and
bones are built. The immune system improves and the body temperature drops
down significantly. Heart rates and respiration increase during the REM stage.
In the REM period, our brain becomes far more active. This period is marked by
intense dreams. Adults spend 20% of their entire sleep time in the REM state,
while babies spend 50% of their time there [15].
The electroencephalogram, or EEG, is used to assess electrical activity in
the brain, which is used to score sleep. Other biological signals include the elec-
trooculogram (EOG) for eye movement and the electromyogram (EMG) for mus-
cle tone measurement. They’re useful for scoring as well. This type of sleep study
is called Polysomnography (PSG), a parametric type of sleep study.
So based on this evaluation of the PSG data, the sleep scoring is followed
mainly by two standards. In 2007, the American Academy of Sleep Medicine
(AASM) divided sleep stages into five groups [16], and before that, the scoring standards were dominated by Rechtschaffen and Kales (R&K, 1968) [17].
According to R & K rules, sleep stages are classified into seven divisions. Wake,
stage 1, stage 2, stage 3, stage 4, stage REM, and movement time are the different
stages. On the other side, AASM standards modified the guidelines and kept 5
important stages from the R&K rules [18]. S1 to S4 are referred to as N1, N2, N3
in the AASM classification. The AASM standard is a simplified version of the
R&K regulations. However, in the vast majority of situations, these two criteria
are used to score sleep. The CAP sleep database that has been used also followed
the R&K scoring excluding the movement time.

2.2 Convolutional Neural Network

Convolutional Neural Networks (CNN) are widely utilized for pattern recognition
tasks including image analysis and other computer vision applications. A CNN
is a type of artificial neural network that belongs to the machine learning and artificial intelligence (AI) domains. The special ability of pattern recognition has
made CNN much more powerful than the other AI techniques. The hidden layers
of CNN consist of convolution layers which help it to learn the patterns in the
input data. Those patterns could be image fragments, allowing CNN to compare
them and see if they’re present in other images or the same one. In the context
of machine learning, these little pieces are referred to as “features.” The same approach can be applied to one-dimensional data sequences, known as a 1D CNN. A CNN, like other deep learning approaches, learns these features and then
performs other operations like pooling and classification using fully connected
layers following convolution with filters [19].

2.3 Electroencephalogram (EEG)


The EEG test looks for abnormalities in brain waves or electrical activity. Elec-
trodes composed of small metal discs with thin wires are placed on the scalp during the process. The electrodes detect minute electrical charges pro-
duced by brain cells’ activity. The most common technology used in sleep study
is the electroencephalogram (EEG). The approaches can be extended for application in various disciplines of neuroscience research, although the focus here is on sleep research. One of the physiological changes that occurs during sleep is an alter-
ation in the electroencephalogram (EEG) signal, which can be utilized to identify
and monitor sleep apnea occurrences. According to research published in The
Neurodiagnostic Journal [20], electroencephalography, or EEG, technology that
analyzes brain function could enable earlier identification of common mental and
neurological illnesses such as autism, ADHD, and dementia. This EEG test is
the most popular method of diagnosing epilepsy.

3 Deep CNN Based Sleep Stage Classification


The classifier model is only the core of the whole architecture; there are several stages before the final classification is reached. The whole process is illustrated in Fig. 1.

Known EEG signal of 6 Noise reduction and


Convert EDF file to CSV
specific channel from each resample every signal to
dataframe
patients 8 hr sleep EDF file 256 Hz

Make the whole dataset for


Train model with all patients
Build Model with 1D CNN Deep Learning
dataset
Classification

Compare all the models and


Evaluate Model Make Prediction
keep the best model

Fig. 1. System diagram of deep CNN based sleep stage classification.



3.1 System Architecture


The main objective of this research is to employ a Deep Convolutional Network to develop an autonomous sleep stage scoring system for NFLE patients. The CAP Sleep Database (CAPSLPDB) is used here, the only database that provides PSG recordings with CAP annotations [14]. In this database, sleep stages are classified according to the R&K rules. It is a collection of 108 polysomnographic (PSG) recordings registered at the Sleep Disorders Center of the Ospedale Maggiore of Parma, Italy [21]. As the model is now in its preparatory stage, there is much scope for further research.
The CAP sleep database has 40 NFLE patients aged between 14 and 67. As this research focuses on people aged 14 to 27, fifteen people fall within this category, having a similar kind of hypnogram. From the annotations, each person's hypnogram for around 8 h was generated. Hand-engineered features and expert personnel were responsible for the data annotation task. The generated PSG file consisted of 22 channels, and among them only 6 channels were selected according to their significance level. Since this is a classification problem, a Deep CNN was applied to make the task automatic. Through the hypnogram generated from the model, future sleep stages will be predicted.

3.2 Pre-processing of Data


This research work faced several problems, including timestamp and frequency discrepancies, and various preprocessing methods were used to address those issues. To begin with, the hand-engineered data annotations and the EDF files do not share the same timestamp. For instance, many EDF files contained more data than their respective annotation files. To fix that issue, unscored data were manually removed from the corresponding patient's EDF file. As previously mentioned, the 6 most significant EEG channels are ROC–LOC, FP2–F4, F4–C4, F7–T3, F3–C3, and T4–T6. Besides, some EDF files have a sample rate of 512 Hz, whereas others have a sample rate of 256 Hz. So, in the latter step, all the data were taken at a constant sample rate of 256 Hz by downsampling the higher one.
“MNE” and “pyedflib” are two key Python modules that were used for extracting, analyzing, and visualizing EDF files. The MNE package is used to extract all of the EDF files and transform them into DataFrames. Using the MNE module, those EDF files were read and all of the EEG sensor records were analyzed. All the signals had a frequency range of 0.3 to 30 Hz because a bandpass filter was applied at the beginning for noise reduction. The values of the 6 channels in each patient's EDF file were then fed to the 1D CNN model, as sketched below.

3.3 Model Architecture


Deep CNN Layers: The 1D CNN differs slightly from the standard 2D CNN.
However, the end goal remains the same. Instead of using images, the raw signal
data will be used to find patterns. The varying amplitude values of the wave
function can be found in the raw EDF file of the EEG signals. The alpha frequency band (8–12 Hz) is most prominent in the wake stage, according to [22]. As sleep deepens, the alpha band fades while the theta (4–8 Hz) and then delta (0.5–4 Hz) bands become dominant. With the model's 1D convolution
process, these wave function’s cyclic occurrences are detected as patterns. The
convolution layer mainly tries to recognize the features by learning those patterns
from the changes in wave function amplitude. The first layers search for patterns
in smaller ranges. It tries to identify patterns from a larger timestamp as the
layers get deeper. By adjusting several hyperparameters, four Conv1D layers
were eventually utilized. The following filters make up these four layers: 64, 128,
256, and 512. For the convolution procedure, a 3 ranked tensor was used as the
kernel size. The pooling layer is introduced to the model after each convolution
1D layer, resizing the distorted values from the signal. Since it also shrinks
down the maximum values and preserves the major information, therefore, less
computational power is needed. The model includes a dropout layer after each
convolution and pooling layer. After each epoch, the model tries to learn the
features (in our case the patterns of the signals), and then it moves on to the
fully connected layers, where the main task is to classify the patterns based on
the annotations (Table 1).

Table 1. Model summary

Layer (type)                     Output shape        Param #
conv1d (Conv1D)                  (None, 7678, 64)    1216
max_pooling1d (MaxPooling1D)     (None, 2559, 64)    0
dropout (Dropout)                (None, 2559, 64)    0
conv1d_1 (Conv1D)                (None, 2557, 128)   24704
max_pooling1d_1 (MaxPooling1D)   (None, 852, 128)    0
dropout_1 (Dropout)              (None, 852, 128)    0
conv1d_2 (Conv1D)                (None, 850, 256)    98560
max_pooling1d_2 (MaxPooling1D)   (None, 283, 256)    0
dropout_2 (Dropout)              (None, 283, 256)    0
conv1d_3 (Conv1D)                (None, 281, 512)    393728
max_pooling1d_3 (MaxPooling1D)   (None, 93, 512)     0
dropout_3 (Dropout)              (None, 93, 512)     0
flatten (Flatten)                (None, 47616)       0
dense (Dense)                    (None, 128)         6094976
dense_1 (Dense)                  (None, 64)          8256
dropout_4 (Dropout)              (None, 64)          0
dense_2 (Dense)                  (None, 6)           390
Total params: 6,621,830
Trainable params: 6,621,830
Non-trainable params: 0
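The architecture in Table 1 can be reproduced, with the same output shapes and parameter counts, by the following Keras sketch; the dropout rates are assumptions, since the paper does not report them:

from tensorflow.keras import layers, models
from tensorflow.keras.optimizers import Adam

model = models.Sequential([layers.Input(shape=(7680, 6))])  # 30 s at 256 Hz, 6 channels
for filters in (64, 128, 256, 512):
    model.add(layers.Conv1D(filters, 3, activation="relu"))
    model.add(layers.MaxPooling1D(3))
    model.add(layers.Dropout(0.3))        # rate assumed, not given in the paper
model.add(layers.Flatten())
model.add(layers.Dense(128, activation="relu"))
model.add(layers.Dense(64, activation="relu"))
model.add(layers.Dropout(0.3))
model.add(layers.Dense(6, activation="softmax"))  # six R&K stages (no movement time)

# Hyperparameters as listed in Table 2.
model.compile(optimizer=Adam(learning_rate=1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])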

Hyperparameters: Table 2 lists the hyperparameters that were used in the model.

Table 2. Hyperparameters of the DNN based model

Parameters Status
Optimizer Adam
Loss function Categorical Cross-entropy
Batch Size 32
Epoch 50
Learning rate 1e−4
Activation function ReLU, Softmax
The number of nodes at the output layer 6
The number of nodes at the input layer (7680,6)

Activation Function: The activation function plays an important role in a neural network. In a neural network, the weights and biases are updated based on the error of an output. The activation function makes this back-propagation possible, and because of it, each neuron in a layer is decided to be activated or not; it also introduces non-linearity into the network. The proposed Deep CNN mainly used two activation functions: ReLU and Softmax. Other powerful activation functions also exist, such as the Sigmoid, Softplus, Tanh, and Exponential functions.

– Rectified Linear Units (ReLU): The math of the ReLU function is simple: any negative input becomes zero, and any positive or zero input stays the same. This keeps the CNN mathematically healthy by preventing learned values from rushing towards infinity or getting stuck near 0. It is used in most CNNs.

ReLU(x) = max(0, x)    (1)

– Softmax: Softmax is a special type of sigmoid function that is mainly used for multiclass classification in the output layer. It converts the inputs into a probability distribution; as a result, the outputs sum to 1 and every element of the output vector lies between 0 and 1.

Softmax(x_i) = exp(x_i) / Σ_j exp(x_j)    (2)
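Both functions take only a few lines in NumPy; the max-subtraction in the softmax sketch below is a standard numerical-stability trick and does not change the result, since softmax is invariant to additive shifts of its input:

import numpy as np

def relu(x):
    return np.maximum(0, x)          # negatives become zero, the rest pass through

def softmax(x):
    e = np.exp(x - np.max(x))        # shift for numerical stability
    return e / e.sum()

print(softmax(np.array([1.0, 2.0, 3.0])))  # sums to 1, each entry in (0, 1)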

4 Result and Discussion


The accuracy of this model, obtained from the CAP sleep database, which is the only database for NFLE patients [9], is 60.46% for 15 persons. The CNN model did not reach high accuracy, but it still has potential for NFLE patients, since very little deep learning work has been done on this dataset. The shortfall in accuracy is caused by the variation in people's sleep cycles. Since these patients are not healthy and their CAP fluctuations are considerably more variable than those of normal people, their sleep does not follow the same trend in the long run, as is clear from Fig. 3. NFLE patients tend to remain in the S1, S2, and S3 stages more frequently than normal people, so the model could not find consistent patterns for all of the people. The model obtained better accuracy when it was trained with fewer participants, so there is a lot more scope to diversify the model. Fig. 2 shows the accuracy obtained on the training set; here, the best weights of the model are captured.

Fig. 2. Accuracy on training set.

Fig. 3. Difference between normal and NFLE patients in sleep stage frequency

5 Future Work and Conclusion


Since little work has been done on automatic sleep scoring for NFLE patients, the main target will be to practically implement this model in the medical field as an experimental feature. However, the first task will be to increase its accuracy. After observing different models and approaches, we have decided to build a hybrid model with a CNN and a bidirectional LSTM, since the bidirectional LSTM layer has the ability to capture stage transitions. Another approach for increasing accuracy is transfer learning; if our next model still does not show good potential, we will move to the transfer learning approach.

References
1. Bianchi, M.T., Cash, S.S., Mietus, J., Peng, C.-K., Thomas, R.: Obstructive sleep
apnea alters sleep stage transition dynamics. PLoS ONE 5(6), e11356 (2010)
2. Stefani, A., Högl, B.: Sleep in Parkinson’s disease. Neuropsychopharmacology
45(1), 121–128 (2020)
3. Pallayova, M., Donic, V., Gresova, S., Peregrim, I., Tomori, Z.: Do differences
in sleep architecture exist between persons with type 2 diabetes and nondiabetic
controls? J. Diabetes Sci. Technol. 4(2), 344–352 (2010)
4. Tsuno, N., Besset, A., Ritchie, K., et al.: Sleep and depression. J. Clin. Psychiatry
66(10), 1254–1269 (2005)
5. Siengsukon, C., Al-Dughmi, M., Al-Sharman, A., Stevens, S.: Sleep parameters,
functional status, and time post-stroke are associated with offline motor skill learn-
ing in people with chronic stroke. Front. Neurol. 6, 225 (2015)
6. Mantua, J., et al.: A systematic review and meta-analysis of sleep architecture and
chronic traumatic brain injury. Sleep Med. Rev. 41, 61–77 (2018)
7. Zhang, F., et al.: Alteration in sleep architecture and electroencephalogram as an
early sign of Alzheimer’s disease preceding the disease pathology and cognitive
decline. Alzheimer’s Dement. 15(4), 590–597 (2019)
8. Schulz, H.: Rethinking sleep analysis: comment on the AASM manual for the scor-
ing of sleep and associated events. J. Clin. Sleep Med. 4(2), 99–103 (2008)
9. Loh, H.W., et al.: Automated detection of sleep stages using deep learning tech-
niques: a systematic review of the last decade (2010–2020). Appl. Sci. 10(24), 8963
(2020)
10. Hsu, Y.-L., Yang, Y.-T., Wang, J.-S., Hsu, C.-Y.: Automatic sleep stage recurrent
neural classifier using energy features of EEG signals. Neurocomputing 104, 105–
114 (2013)
11. Supratak, A., Dong, H., Wu, C., Guo, Y.: DeepSleepNet: a model for automatic
sleep stage scoring based on raw single-channel EEG. IEEE Trans. Neural Syst.
Rehabil. Eng. 25(11), 1998–2008 (2017)
12. Sallam, A.A., Kabir, M.N., Ahmed, A.A., Farhan, K., Tarek, E.: Epilepsy detec-
tion from EEG signals using artificial neural network. In: Vasant, P., Zelinka,
I., Weber, G.-W. (eds.) ICO 2018. AISC, vol. 866, pp. 320–327. Springer, Cham
(2019). https://doi.org/10.1007/978-3-030-00979-3 33
13. Sridhar, N., et al.: Deep learning for automated sleep staging using instantaneous
heart rate. NPJ Digital Med. 3(1), 1–10 (2020)
14. Terzano, M.G., et al.: Atlas, rules, and recording techniques for the scoring of cyclic
alternating pattern (cap) in human sleep. Sleep Med. 3(2), 187–199 (2002)
15. Felson, S.: Stages of sleep: REM and non-REM sleep cycles, October 2020
16. Ebrahimi, F., Mikaeili, M., Estrada, E., Nazeran, H.: Automatic sleep stage classi-
fication based on EEG signals by using neural networks and wavelet packet coeffi-
cients. In: 2008 30th Annual International Conference of the IEEE Engineering in
Medicine and Biology Society, pp. 1151–1154. IEEE (2008)
17. Moser, D., et al.: Sleep classification according to AASM and Rechtschaffen and
Kales: effects on sleep scoring parameters. Sleep 32(2), 139–149 (2009)
18. Danker-Hopfe, H., et al.: Interrater reliability for sleep scoring according to the
Rechtschaffen and Kales and the new AASM standard. J. Sleep Res. 18(1), 74–84
(2009)
19. Mandy: Convolutional neural networks (CNNs) explained. Available: https://
deeplizard.com/learn/video/YRhxdVk sIs
20. Moeller, J., Haider, H.A., Hirsch, L.J.: Electroencephalography (EEG) in the
diagnosis of seizures and epilepsy (2019). UpToDate https://www.uptodate.com/
contents/electroencephalography-eeg-in-the-diagnosis-of-seizures-and-epilepsy.
Accessed 29 Sept 2020
21. Chui, K.T., Zhao, M., Gupta, B.B.: Long short-term memory networks for driver
drowsiness and stress prediction. In: Vasant, P., Zelinka, I., Weber, G.-W. (eds.)
ICO 2020. AISC, vol. 1324, pp. 670–680. Springer, Cham (2021). https://doi.org/
10.1007/978-3-030-68154-8 58
22. Vilamala, A., Madsen, K.H., Hansen, L.K.: Deep convolutional neural networks for
interpretable analysis of EEG sleep stage scoring. In: 2017 IEEE 27th International
Workshop on Machine Learning for Signal Processing (MLSP), pp. 1–6. IEEE
(2017)
Remote Fraud and Leakage Detection System
Based on LPWAN System for Flow Notification
and Advanced Visualization in the Cloud

Dario Protulipac, Goran Djambic, and Leo Mršić(&)

Algebra University College, Ilica 242, 10000 Zagreb, Croatia


{dario.protulipac,goran.djambic,leo.mrsic}@algebra.hr

Abstract. This research presents a possible solution for water flow monitoring
and alarming system, specially designed for use at remote locations without
electricity and Internet access. To make such a system affordable, the low-cost
widely available components were tested along with LoRa open network. In
addition to the design of the device, it is demonstrated how electricity con-
sumption can be reduced in the sensor platform and the range of the whole
system is measured. The research includes distances measured at 92 different points, with the measurements covering an area of 21.5 km2 in the City of Zagreb, Croatia.

Keywords: LoRa · IoT · Microcontroller · Water flow meter

1 Introduction

One of the common disasters that can befall a building that is rarely inhabited is the rupture of a water pipe. This phenomenon is mainly caused by the freezing of residual water in the elbows of pipes and valves during the winter. As the temperature rises, if the main valve was not closed or was damaged, water will leak.
research is to propose a system that will inform the owner or the person caring for such
a facility that an adverse event has occurred. It must be considered that the electricity is
mostly turned off at such facilities, and therefore the internet is not available either. The
system that will perform such a task must therefore have its own power supply with the
longest possible life of autonomous operation and must not rely on becoming con-
nected to the Internet from the facility itself. As a suitable solution for reporting
unwanted water flow, the system is proposed in this paper. This system consists of
three parts: (i) water flow sensor; (ii) LPWAN central transceiver; (iii) background
system (backend). The water flow sensor is located on the building itself. Its role is to
measure the flow of water through a pipe. If a flow occurs, then the sensor must report
its size to the central transceiver. It must also report the moment when the water
stopped flowing through the pipe. The sensor itself consists of a water flow meter, a
microcontroller and a LoRa LPWAN module. Depending on the type of water flow
meter, it can be placed immediately behind the water meter or as in the case of this
work in which a water meter of section R 1/2” was used, in a place like a garden
tap. The microcontroller and the LoRa LPWAN module can be a few meters away from
the measuring point, depending on the voltage drop on the connection cable between

the microcontroller and the water flow meter. This circuit is completely autonomous. It
can run on battery power for a long time and does not need a commercial network
connection. The central transceiver, as well as the sensor platform, uses a
LoRa LPWAN module that has the same communication parameters set as on the
sensor platform. The receiver also consists of a microcontroller connected to an
LPWAN transceiver. In this case, we use a type of microcontroller that has a built-in
Wi-Fi 802.11n module (Fig. 1).

Fig. 1. System architecture

This dual connection allows it to forward the messages it receives from the sensor
via the LoRa LPWAN receiver to the background system via a local or public IP
network. Like the sensor platform, the central transceiver can be powered by a battery,
but it is recommended to connect it to a mains socket. The position of the central
transceiver is not tied to the position of the background system, but it is important that
the LoRa module on it and the LoRa module on the sensor platform are within radio
range and that it is possible to connect to an IP router equipped with a Wi-Fi interface. Using the MQTT protocol, the central transceiver will notify the background
system of the occurrence or cessation of water flow at the sensor location. In addition to
these two messages, the sensor also sends periodic messages. They are important to
confirm that everything is OK with the sensor. In the system proposed in this paper,
periodic normal state messages are sent every 12 h, while in the case of an alarm,
messages are sent every 15 min. As a background system, one Raspberry Pi micro-
computer was used. Of course, depending on the needs, it is possible to use any other
Intel or Arm based computer, powered by Linux, Windows or MacOS operating
system. The background system receives messages from the central transceiver via the
MQTT intermediary. An intermediary program (middleware) subscribes to the MQTT messages related to the described alarm system, prepares the received messages in a suitable format, and saves them in a time-series database (TSDB). As
part of the background system, a web-based user interface has been added that allows
you to see if the water flow sensor is in an alarm state and when it last responded to the
central transceiver. The same application has the ability to report a message to the end
user via email or via the instant messaging system. For an example of this paper, the
Telegram instant messaging system was used. The whole system is designed as a
demonstration and only one water flow sensor and one central transceiver are used. By
introducing the LoRaWAN communication protocol, which is a software upgrade of
the existing system, and using a multi-channel central transceiver, the system can be
expanded to a number of sensors that send data to multiple central transceivers. Also,
the background system does not have to be on a single computer and can be deployed
to multiple servers as needed. In this way, it is possible to build a very robust and
flexible alert network that covers more widespread areas [12, 13, 14].

2 Literature Review

The LoRa protocol is a modulation of wireless data transmission based on existing


Chirp Spread Spectrum (CSS) technology. With its characteristics, it belongs to the
group of low power consumption and large coverage area (LPWAN) protocols.
Looking at the OSI model, it belongs to the first, physical layer [1]. The history of the
LoRa protocol begins with the French company Cycleo, whose founders created a new
physical layer of radio transmission based on the existing CSS modulation [2]. Their
goal was to provide wireless data exchange for water meters, electricity and gas meters.
In 2012, Semtech acquired Cycleo and developed chips for client and access devices.
Although CSS modulation had hitherto been applied to military radars and satellite
communications, LoRa had simplified its application, eliminating the need for precise
synchronization, with the introduction of a very simple way of encoding and decoding
signals. [2] In this way, the price of chips became acceptable for widespread use. LoRa
uses unlicensed frequency spectrum for its work, which means that its use does not
require the approval or lease of a concession from the regulator.
These two factors, low cost and free use, have made this protocol extremely
popular in a short period of time.
The EBYTE E32 (868T20D) module was used in this work [6]. The module
is based on the Semtech SX1276 chip. The maximum output power of the module is
100 mW, and the manufacturer has declared a range of up to 3 km using a 5 dBi
antenna without obstacles, at a transfer rate of 2.4 kbps. This module does not have an
integrated LoRaWAN protocol, but is designed for direct communication (P2P). If it is
to be used for LoRaWAN, then the protocol needs to be implemented on a micro-
controller. Communication between the module and the microcontroller is realized
through the UART interface (serial port) and two control terminals which are used to
determine the operating state of the module. The module returns feedback via its AUX pin.
LoRaWAN is a software protocol based on the LoRa protocol [1]. Unlike the
patent-bound LoRa transmission protocol, LoRaWAN is an open industry standard
operated by the nonprofit LoRa Alliance. The protocol uses the unlicensed ISM (Industrial, Scientific and Medical) band for its work. In Europe, LoRaWAN uses the ISM part
of the spectrum that covers the range between 863–870 MHz [4]. This range is divided
into 15 channels of different widths. For a device to be LoRaWAN compatible, it must
be able to use at least the first five channels of 125 kHz and support transmission
speeds of 0.3 to 5 kbps. To protect against frequency congestion, the duty cycle of a LoRaWAN device is very low, and the transmission time must not exceed 1% of the total operation of the device [4].

In addition to defining the type of devices and the way they communicate via
messages, the LoRaWAN protocol also defines the appearance of the network itself [5].
It consists of end devices, usually various types of sensors in combination with
LoRaWAN devices. The sensors report to central transceivers or concentrators. One sensor can report to multiple hubs, which improves the resilience and range of the
network. Hubs are networked to servers that process incoming messages. One of the
tasks of the server is to recognize multiple received messages and remove them. Central
transceivers must be able to receive a large number of messages using multi-channel
radio transceivers and adaptive mode, adapting to the capabilities of the end device.
The security of the LoRaWAN network is ensured by authorizing the sensor to the
central transceiver, and messages can be encrypted between the sensor and the appli-
cation server via AES encryption [5].
MQTT is a simple messaging protocol [11]. It sits in the application layer of the TCP/IP model (layers 5–7 of the OSI model). It was originally designed for messaging in M2M systems (direct messaging between machines). Its main advantage is its small demand on network and computer resources. For these reasons, it has become one of the primary protocols in the IoT world. The protocol is based on the principle of subscribing to messages and publishing them through intermediaries. An intermediary, commonly called a broker, is a server that receives and distributes messages to clients, which may publish messages or subscribe to them in order to receive them. Two clients never communicate with each other directly [7, 8]. A minimal subscriber in the style used by the backend middleware is sketched below.
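This is a minimal sketch using the paho-mqtt 1.x client API; the broker host and topic name are assumptions, and a real middleware would write the parsed payload to the TSDB instead of printing it:

import paho.mqtt.client as mqtt

def on_message(client, userdata, msg):
    print(f"{msg.topic}: {msg.payload.decode()}")   # TSDB write would go here

client = mqtt.Client()
client.on_message = on_message
client.connect("localhost", 1883)       # assumed: broker on the Raspberry Pi
client.subscribe("waterflow/sensor1")   # hypothetical topic name
client.loop_forever()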

3 Methodology

The most important property of the sensor platform is its reliability. To make sure that an accident is reported in time, we must first ensure the reliability of the platform. Precisely for this reason, the solution proposed in this paper uses periodic reporting from the sensor platform to the system. The device will report periodically every 12 h, and this is taken care of by the alarm system on the microcontroller. Namely, the STM32F411
is equipped with a clock that monitors real time (RTC), and offers the ability to set two
independent alarms. In this case, one of them is in charge of waking up the process that
sends periodic messages with the current state of the measured water flow through the
meter [3, 15, 16].
Before the software implementation of the measurement, it should be noted that the
pulse given by the sensor at its output voltage is 5 V. Although the used microcon-
troller will tolerate this voltage at its input, it is better to lower it to the declared input
value of 3.3 V. Such a voltage is obtained by two resistors, one with a value of 10 kΩ and the other of 22 kΩ, connected in a simple voltage divider [9]. The connection
method is clearly shown in the diagram. The flow volume measurement itself is done
by monitoring the number of pulses sent by the water sensor via a standard time
counter. Each pulse will be registered by the microcontroller as an interrupt. When
pulses appear, it is possible to measure the flow and report it via LoRa radio
transmission.
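As a worked check of the divider, assuming the 10 kΩ resistor sits between the sensor output and the divider tap, with the 22 kΩ resistor to ground, the tap voltage is V_out = V_in · R2 / (R1 + R2) = 5 V · 22 / (10 + 22) ≈ 3.4 V, which is close enough to the 3.3 V input level mentioned above.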
The frequency of the timer is set to 1 MHz via a divider. By comparing the number of clock cycles between two interrupts, one can very easily obtain the pulse frequency given by the water flow sensor. Knowing the pulse frequency and the pulse characteristic, the water flow can be calculated using a pre-defined procedure, as sketched below.
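The following sketch shows one way to express that procedure; the pulses-per-flow constant K is an assumption (K ≈ 7.5 Hz per L/min is a typical datasheet value for hobby-grade R 1/2" flow meters, not a figure from this paper):

TIMER_HZ = 1_000_000   # timer clock after the divider, as described above

def flow_l_per_min(ticks_between_pulses, k=7.5):
    # Timer ticks between two interrupts -> pulse frequency -> flow rate.
    pulse_freq_hz = TIMER_HZ / ticks_between_pulses
    return pulse_freq_hz / k

print(flow_l_per_min(50_000))  # a 20 Hz pulse train gives roughly 2.7 L/min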
The first measured flow value greater than zero sets the sensor platform to an alarm state. As long as there is flow, periodic reporting will take place every 15 min instead of every 12 h. Five minutes after the flow stops, the device will announce the end of the alarm, and the next report will be made regularly after 12 h, or earlier in the event of a new alarm. Internally, the alarm system works in such a way that the last measured value of the water flow is read every 5 s. This value, together with the current counter time, is continuously stored by the measurement process in the form of a time-and-flow structure. The read value is stored in a field of three elements. If after three readings all three elements in the field are equal, it can be determined that there was no flow in the last 15 s, and the device exits the alarm state. The system waits another five minutes before announcing the end of the alarm over the LoRa connection. If the flow occurs again within these five minutes, the system will act as if the alarm had not stopped; that is, it will send a flow message after 15 min (Fig. 2). A sketch of this logic follows the figure.

Fig. 2. Water flow sensor connection diagram
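This Python sketch mirrors the 5-s polling, the three-element field, and the five-minute grace period described above; read_last_measurement and send_lora are hypothetical stand-ins for the measurement process and the UART-attached LoRa module, and the periodic 15-min reporting is omitted for brevity:

import time
from collections import deque

SAMPLE_PERIOD_S = 5     # the last measurement is polled every 5 s
END_DELAY_S = 5 * 60    # five-minute grace period before announcing the end

def read_last_measurement():
    # Hypothetical accessor; on the real device this returns the
    # (counter_time, flow) record kept by the interrupt-driven process.
    return (0, 0.0)

def send_lora(message):
    print("LoRa:", message)   # stand-in for the UART transmission

recent = deque(maxlen=3)      # three most recent (counter_time, flow) records
alarm, stopped_since = False, None

while True:
    record = read_last_measurement()
    recent.append(record)

    if record[1] > 0 and not alarm:
        alarm = True
        send_lora("flow alarm start")

    # Three identical records mean no pulse arrived in the last 15 s.
    if alarm and len(recent) == 3 and len(set(recent)) == 1:
        stopped_since = stopped_since or time.time()
        if time.time() - stopped_since >= END_DELAY_S:
            alarm, stopped_since = False, None
            send_lora("flow alarm end")
    else:
        stopped_since = None  # flow resumed; keep the alarm running

    time.sleep(SAMPLE_PERIOD_S)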

LoRa notifications are intentionally delayed so that, in the event of the flow repeatedly starting and stopping, radio messages are not sent too often.

4 Results and Discussion

During the measurement, the circuit is supplied with 5 V DC. This is the recommended
operating voltage for the LoRa module and water flow sensor used, while the micro-
controller can be powered by 5 V or 3.3 V. In this measurement, the first goal is to
show that the peak current value will not reach a value greater than 300 mA, which is
the maximum that the microcontroller circuit can withstand. This data allows us to
power the entire circuit through the microcontroller using the built-in USB port and
thus simplify the appearance of the entire sensor. The second goal is to reduce power
consumption in order to prolong the autonomy of the sensor operation as much as
possible. As an external power supply, a laboratory power supply R-SPS3010 from Nice-power was used, which can provide a stable operating voltage from 0 to 30 V
with a current of up to 10 A. The universal measuring instrument UT139B from UNI-T
is connected in series. It is set to measure milliamperes during the measurement,
keeping the maximum measured value on the screen.

4.1 Range Measurement


The range was measured from the Zagreb settlement of Vrbani 3, which is located next to Lake Jarun. This location gives us an insight into what range can be expected in urban and in rural conditions. Namely, from the central transceiver to the north
there is a very urban part with many residential buildings and dense traffic infras-
tructure, while on the south side is Lake Jarun and the Sava River, which are mostly
green areas, smaller forests, and only a few lower buildings. The limiting factor is the
position of the antenna of the central transceiver, which was located on the first floor of
a residential building, approximately 4 m above ground level and surrounded by
buildings. When measuring on the side of the central transceiver, an omnidirectional
antenna with a gain of 3.5 dBi was used, which is stationary placed on the outside of
the window of a residential building. On the sensor side, for mobility, a smaller antenna
with 2 dBi gain was used. The signal was sent in the open, with the antenna held in the hand. The position
of each measurement was recorded via a GPS device on a mobile device and later
transferred to Google Earth. In Google Earth, it is possible to import recorded mea-
suring points and measure the distance between them and the antenna of the central
transceiver. According to the manufacturer's specification, the maximum range that can
be expected from these modules is 3 km in almost ideal conditions with a 5 dBi
antenna. In order to somehow approach this distance despite the unfavorable mea-
surement position, the data transfer rate was reduced from the standard module settings
from 2.4 kbps to 300 bps. Due to the small amount of data that needs to be transmitted,
this is not a limiting factor in practice, and due to the low transmission speed, fewer errors were obtained when recognizing the received signal, with increased success in receiving messages over long distances. In the figure below, the measured range
of the fabricated LoRa system is shown. The position of the central transceiver is
shown with an asterisk, while the points from which the signal from the sensor
managed to reach it are shown in green. Red dots indicate places where it was not
possible to communicate between the sensor and the central transceiver. As expected,
the largest range of 3393 m was achieved to the southeast, where apart from a couple of
residential buildings near the antenna, there were no additional obstacles. Towards the
southwest, the obtained result was 2773 m. However, towards the urban part of the city, the maximum achieved range was 982 m to the east, and to the north it was only
860 m (Fig. 3).
Fig. 3. Central transceiver antenna position and measuring range
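As a side note on the distance computation: the point-to-antenna distances read off in Google Earth are great-circle distances, which can equally be computed directly from the recorded GPS coordinates with the haversine formula; the coordinates below are hypothetical, not measurement points from the paper:

import math

def haversine_m(lat1, lon1, lat2, lon2):
    # Great-circle distance between two WGS-84 points, in metres.
    r = 6371000.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical antenna and measurement-point coordinates near Lake Jarun:
print(haversine_m(45.7803, 15.9230, 45.7700, 15.9600))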

4.2 Results
According to the specification, the maximum consumption of the used module is
130 mA. The measured consumption of the water flow sensor is 4 mA. The maximum
current that can be conducted through the development board is 300 mA, and the circuit on the development platform used is designed so that the Vbus USB terminal and the 5 V terminals of the circuit are on the same bus. From this we can
conclude that the entire interface with the sensor and the LoRa module can be powered
by the USB interface. However, it is necessary to optimize the consumption so that the
circuit can run on a commercially available battery for as long as possible. Table 1 shows the current measurements during the operation of the microcontroller. Here, the microcontroller operated at its maximum clock of 96 MHz and without any power optimization. Data are given separately for each element to make it easier to track the optimization (Table 1).

Table 1. Circuit current without optimization


Connected system components Current [mA] State
Microcontroller 26.65 Wait
Microcontroller 26.88 Event stop
Microcontroller + LoRa Module 39.16 Wait
Microcontroller + LoRa Module 121.5 Signal send
Microcontroller + LoRa Module + Sensor 42.51 Wait
Microcontroller + LoRa Module + Sensor 125.7 Signal send

As the flow sensor itself cannot be optimized, the values of the current flowing through it are singled out in Table 2 and are simply added to the results obtained at the end of each step. The first step of optimization is to lower the processor clock to 48 MHz. Table 3 shows that by reducing the operating clock, the current decreased by 11 mA, a reduction of slightly more than 40% in the consumption of the microprocessor.
Remote Fraud and Leakage Detection System Based 377

Table 2. Current through the water sensor


Current [mA] State
3.35 Idle
4.03 Flow

Table 3. Current with reduced microprocessor clock speed


Connected system components Current [mA] State
Microcontroller 15.50 Wait
Microcontroller 15.91 Event stop
Microcontroller + LoRa Module 28.15 Wait

As the LoRa module on the sensor platform is not used for receiving messages,
there is no need to keep it constantly active. Fortunately, this module has a mode in
which it shuts down its radio transceiver. By changing the code on the microcontroller,
an operating mode was introduced where the radio transceiver is turned on only when
necessary. With this procedure, the total current through the microcontroller and the
LoRa module dropped to 17.7 mA in standby mode. The STM32F411 microcontroller
has various energy saving functions. One of them is a sleep state in which we stop the
processor clock completely and listen only to interrupts coming from external devices or clocks. As FreeRTOS was used in this work, instead of directly sending the microprocessor to sleep, the FreeRTOS tickless mode was used [10]. In it, FreeRTOS stops
working and puts the microprocessor to sleep. This lowers the current through the
circuit consisting of the microcontroller and the LoRa module to 5.87 mA in standby
mode, with the total current through the entire circuit now being only 9.22 mA in
standby mode. Measuring the current has successfully shown that it is possible to use a USB port to power the entire circuit. Also, through several interventions in the microprocessor's program code, it was possible to lower the current from 42.51 mA to 9.22 mA, a reduction of 78%. This is very important because standby is the state the circuit is in almost all the time. Using a portable USB charger (power bank) with a capacity of 10000 mAh (the most common value at the time of writing), such consumption allows for approximately 40 days of autonomous operation of the sensor. Radio signal acquisition showed very
good results considering the power and position of the antenna. This measurement is an
indication of how even without a great search for the ideal antenna position, a quite
decent range can be achieved with a device that has the output power of an average
home Wi-Fi system. The maximum measured distance was 3393 m in terms of mea-
surements from ground level and without optical visibility. There is also a large dif-
ference in the behavior of LoRa radio protocols between urban and rural areas. While
in an uninhabited area the range exceeded the manufacturer’s specifications, in places
with several residential buildings, the range dropped sharply. It can be concluded that for the purpose of reporting adverse events in rural and remote areas, LoRa LPWAN is an excellent solution. The smaller range in urban areas can easily be compensated for with more densely placed central transceivers.

4.3 Future Research


Further power savings can be achieved if the stop or standby mode of the microcontroller is used instead of the sleep state. In stop mode, the microcontroller's CPU shuts down, and in standby mode, the memory shuts down as well. In these states, the microprocessor needs a little more time to wake up, and when writing the code, it is necessary to pay attention to restoring the initial settings of the microcontroller. Also, instead of the STM32F4 series, which belongs to the higher-performance series, a series of microcontrollers specially made for low consumption can be chosen, e.g., the STM32L series. The range of the device, which proved to be very good even in these
conditions, can be further improved by placing the antennas in a position that allows
unobstructed optical visibility between the sensor antenna and the antenna of the
central transceiver. It should be borne in mind that in practice both antennas will be
stationary and it will be possible to adjust the antenna on the sensor platform side. If the
monitored objects are located in approximately the same direction of the antenna, the
signal should cover a narrower area, so instead of omnidirectional antennas, directional
antennas can be used that allow greater range.

5 Conclusion

This paper shows how, today, a home-made prototype device with functionality almost unimaginable only a decade ago can be built with really modest financial expenses. If the price of the central platform, which can be realized with any standard computer, is neglected, less than HRK 250 was spent on the entire platform.
Of course, the knowledge required to build the system and the time spent on devel-
opment are incomparably greater. During operation, most attention and time was spent
on developing the sensor platform as the most critical part of the system. Ultimately, a
completely autonomous and reliable sensor platform was successfully designed and
built, which, in addition to its basic function, had to serve as an intermediary for
adjusting the LoRa module and as an instrument for measuring range. To achieve the
longest possible autonomy of the sensor platform, studying the current through the
sensor platform, the consumption of microcontrollers and radio modules was gradually
reduced to almost one fifth of the original standby consumption. This was done entirely in software, by shutting down individual components of the system at times when they are not needed and quickly turning them on when the need arises. The
operation of the entire system was finally tested using a signal generator, thus con-
firming the correct operation of all three parts of the system. In order to obtain
confirmation that the developed system can meet the requirement for radio range, the
distance was measured at 92 different points and the measurement covered an area of
21.5 km2. The radio ranges obtained by measurement in this paper fully confirmed that LoRa is a more than acceptable solution for monitoring water flow in houses
where the owners do not live most of the year, and are located within or on the wider
periphery of the settlement. The obtained results show that such a prototype could be
applied in practice even now, without major changes in implementation, only with the
connection to the solder plate and placement in a suitable housing. Also, by using other
types of sensors, the prototype can serve as a basis for collecting various other
information from less frequently visited locations.

References
1. LoRa developers portal. https://lora-developers.semtech.com/
2. Slats, A.: A Brief History of LoRa: Three Inventors Share Their Personal Story at The
Things Conference 2020 (2020). https://blog.semtech.com/a-brief-history-of-lora-three-
inventors-share-their-personal-story-at-the-things-conference. Dec 2020
3. Semtech: LoRaTM Modulation Basics, Application note AN1200.22, 2015 (2020). https://
semtech.my.salesforce.com/sfc/p/E0000000JelG/a/2R0000001OJa/
2BF2MTeiqIwkmxkcjjDZzalPUGlJ76lLdqiv.30prH8. Dec 2020
4. LoRa Alliance: LoRaWAN Regional Parameters, 2020 (2020). https://lora-alliance.org/sites/
default/files/2020-06/rp_2-1.0.1.pdf. Dec 2020
5. LoRa Alliance: LoRaWAN 1.1 Specification, 2020 (2020). https://lora-alliance.org/sites/
default/files/2018-04/lorawantm_specification_-v1.1.pdf. Dec 2020
6. EBYTE E32–868T20D User manual (2021). https://www.ebyte.com/en/downpdf.aspx?id=
132. Mar 2021
7. Noviello, C.: Mastering STM32. Leanpub (2018). https://leanpub.com/mastering-stm32. Dec 2020
8. Kurniawan, A.: Internet of Things Projects with ESP32. Packt Publishing (2019). ISBN 978-1-78995-687-0
9. Horowitz, P., Hill, W.: The Art of Electronics, 3rd edn. Cambridge University Press (2015)
10. Barry, R.: Mastering the FreeRTOSTM Real Time Kernel (2020). https://www.freertos.org/
fr-content-src/uploads/2018/07/161204_Mastering_the_FreeRTOS_Real_Time_Kernel-A_
Hands-On_Tutorial_Guide.pdf. Nov 2020
11. HiveMQ, MQTT & MQTT 5 Essentials e-book (2021). https://www.hivemq.com/download-
mqtt-ebook/. Feb 2021
12. Mrsic, L., Zajec, S., Kopal, R.: Appliance of social network analysis and data visualization
techniques in analysis of information propagation. In: Nguyen, N.T., Gaol, F.L., Hong, T.-
P., Trawiński, B. (eds.) ACIIDS 2019. LNCS (LNAI), vol. 11432, pp. 131–143. Springer,
Cham (2019). https://doi.org/10.1007/978-3-030-14802-7_11
13. Mrsic, L., Jerkovic, H., Balkovic, M.: Interactive skill based labor market mechanics and
dynamics analysis system using machine learning and big data. In: Sitek, P., Pietranik, M.,
Krótkiewicz, M., Srinilta, C. (eds.) ACIIDS 2020. CCIS, vol. 1178, pp. 505–516. Springer,
Singapore (2020). https://doi.org/10.1007/978-981-15-3380-8_44
14. Intelligent Computing & Optimization: Conference proceedings ICO 2018. Springer, Cham
(2018). ISBN 978–3–030–00978–6 https://www.springer.com/gp/book/9783030009786
15. Intelligent Computing and Optimization. Proceedings of the 2nd International Conference on
Intelligent Computing and Optimization 2019 (ICO 2019). Springer International Publish-
ing, ISBN 978–3–030–33585–4. https://www.springer.com/gp/book/9783030335847
16. Intelligent Computing and Optimization, Proceedings of the 3rd International Conference on
Intelligent Computing and Optimization 2020 (ICO 2020). https://doi.org/10.1007/978-3-
030-68154-8
An Analysis of AUGMECON2 Method
on Social Distance-Based Layout Problems

Şeyda Şimşek1(&), Eren Özceylan2, and Neşe Yalçın1


1 Industrial Engineering Department, Adana Alparslan Türkeş Science and Technology University, 01250 Adana, Turkey
{ssimsek,nyalcin}@atu.edu.tr
2 Industrial Engineering Department, Gaziantep University, 27100 Gaziantep, Turkey
eozceylan@gantep.edu.tr

Abstract. In the COVID-19 era, social distance has become a new source of
concern for people. Decision-makers have a limited idea of how to allocate
people according to social distance due to the lack of preparedness for the
pandemic. It is essential to think about both distributing as many individuals as
possible in a particular area and minimizing the infection risk. The multi-objective nature of this new concern gives decision-makers the opportunity to solve the problem using enhanced methodologies. The AUGMECON2 method, one of the
recent popular generation methods, is used to produce the exact Pareto sets for
the problem. The scale and time constraints of the challenge have been exam-
ined, and recommendations have been made to decision-makers on the trade-off
between the number of people and the infection risk.

Keywords: COVID-19 · Social distancing · Multi-objective optimization · Layout optimization · AUGMECON2

1 Introduction

Coronavirus disease 2019 (COVID-19), a novel disease, was discovered in Wuhan, China, at the end of 2019. This one-of-a-kind disease, which is infectious, is caused by
a coronavirus. Because of the virus’s spread throughout the world, this disease has
quickly become a global concern [1]. The virus spreads through droplets that exit from
the infected person’s nose or mouth [2]. Infected people may spread the virus via
airborne besides droplets to others who come into contact with them. Respiratory
diseases, like coronavirus, are inextricably linked to close physical contact, as are many
other diseases [3]. When it comes to transmission via physical contact, social distance
may be defined as a combination of non-pharmaceutical treatments used to prevent the
transmission of any contagious disease. It’s also known as physical separation, and the
goal is to create a predetermined amount of physical distance between people while
limiting the likelihood of close contact [4]. In the study of Chu et al. [5], more than
25,000 patients are investigated to determine the impact of distance between patients
and other persons. According to their findings, it is demonstrated that the importance of
using face masks and keeping a safe distance to prevent the virus from spreading. Also,

it is recommended that people wear masks and maintain a physical distance of at least one meter, or two meters if possible, between themselves and infected people. The fundamental benefit of distance is that it prevents SARS-CoV-2 infection and reduces transmission.
With the coronavirus pandemic, governments have recommended and even
imposed measures for social distancing through legislation to ensure the safety of their
nation. Even though the measures are self-evident, there are no widely accepted norms
for applying the social distance rule. As a result, public spaces such as restaurants,
markets, offices and universities make their own decisions regarding how to apply
social distance measures. To ensure the application of social distance rules, the allo-
cation of people based on social distance measures has recently been handled in the
literature. When allocating people according to social distance, two objectives spring to
mind: ensuring safety and locating as many people as possible. It is not a trivial task to
create a safer environment by simultaneously locating as many people as possible while
ensuring the lowest possible virus load and infection risk, because more people mean more possible sources of the virus.
The main purpose of this paper is to introduce a general approach as a multi-objective
optimization problem including maximizing the number of tables seated by only one
person (also can be thought of as maximizing the number of people) under social
distance and minimizing the infection risk. These competing objectives encourage the
use of multi-objective optimization techniques. When these techniques are considered for
this study, the generation methods are considered superior methods in the aspects of
computational speed, giving Pareto sets as all possible solutions and providing advan-
tages to decision-makers. One of the most used generation methods is the ε-constraint method, which has advanced versions such as the augmented ε-constraint method (AUGMECON) introduced by Mavrotas [6] and the AUGMECON2 method introduced by Mavrotas and Florios [7]. Due to its novelty and feasibility, the AUGMECON2 version of the method is applied in this study. To the best
of the authors’ knowledge, this method has been applied for the first time to an allocation
problem including both virus load and infection risk and social distance in a finite area.
The rest of the study is organized as follows: Sect. 2 conducts a broad assessment
of the literature on social distance-based layout optimization. Section 3 introduces a
generic approach for multi-objective optimization problems and methods. The findings
of the applications are presented in Sect. 4, and the conclusions and future roadmap are
indicated in Sect. 5.

2 Literature Review

Even though distance constraint is not a new concern, optimization based on social
distance can be considered a novel topic. When social distance constraint is taken into
account, the spread of the virus and infection risk must also be considered. In the
related literature, these two aspects have been handled by some researchers (Table 1).
Due to the huge decline in both passenger numbers and GDP associated with air
travel [20], the studies [8–13] suggest various models for the air transportation sector.
The allocation of passengers on the plane, the apron and the waiting queue are all
Table 1. A literature review about social distance-based layout problems.

Reference | Application area | Methodology | Performance metrics/goals
Milne et al. [8] | Air transportation | Agent-based modeling and simulation | Aisle and window risk, window risk, seat interferences, boarding times / Assessing boarding methods
Salari et al. [9] | Air transportation | Mathematical modeling and simulation | Distance between seats, aisle and people, number of people / Maximizing the safe load under social distancing
Milne et al. [10] | Air transportation | Agent-based modeling and simulation | Boarding time, seat interferences, infection risk / Comparing the boarding methods under social distancing
Cotfas et al. [11] | Air transportation | Agent-based modeling and simulation | Boarding time, seat interferences, extra luggage storage duration, aisle seat and window seat risk / Comparing the boarding methods
Milne et al. [12] | Air transportation | Agent-based modeling | Boarding time, virus spread, window risk, aisle risk / Evaluating boarding methods under health constraint
Pavlik et al. [13] | Air transportation | Mathematical modeling | Number of passengers, total risk / Minimizing transmission risk
Moore et al. [14] | Public transportation | Mathematical modeling and heuristics | Configuration of the vehicle, group-based seat arrangement / Assigning people according to physical distancing
Kudela [15] | Artificial examples | Mathematical modeling and heuristics | Distance between points, number of the points, time / Best practicing of social distancing
Contardo and Costa [16] | Allocation in dining room | Mathematical modeling | The configuration of tables, the shape of room and tables, sitting sense of the customers / Finding the best way to place tables that are socially distant
Fischetti et al. [17] | Allocation in restaurants, beach, theater | Mathematical modeling | Distance between people, virus function related with distance / Maximizing the number of tables while minimizing the infection risk
Dündar and Karaköse [18] | Allocation in the university | Heuristics | Social distancing, surface area / Maximum allocation of seats, maximum of minimum of social distancing model
Ugail et al. [19] | Allocation in the university | Mathematical modeling | Position of doors, windows, dimension of physical space / Suggesting optimal designs under social distancing
significant challenges for this industry. Also, the problem of public transport including
school buses and trains has been addressed by Moore et al. [14]. Aside from trans-
portation models, every aspect of social life requires layout optimization based on
social distance due to the lack of preparedness for the pandemic. Some studies [15–19]
have attempted to solve this problem by allocating people under social distance con-
straints for their safety to different types of places such as universities, restaurants,
beaches, etc.

3 Multi-objective Optimization on Social Distance

Multi-objective integer programming is an essential research area due to many real-life situations which need discrete representations by integer variables [21]. Solving
algorithms for multi-objective mathematical programming may be classified into three
categories: a priori methods, interactive methods, and a posteriori (generation) methods
[22].
When considering these techniques for this study, the generation methods might be
useful due to their computational speed [7]. Furthermore, the generation methods have
lots of advantages such as giving all possible solutions (i.e. the Pareto sets) and pro-
viding a whole picture for a multi-objective problem to decision-makers. Also known
as a posteriori methods, the generation methods include the most commonly used solution techniques, the ε-constraint and weighting methods. When these two techniques are compared, the ε-constraint method has several advantages: it alters the feasible region and represents richer efficient sets, produces the unsupported efficient solutions in multi-objective integer and mixed integer problems, does not need to scale the objective functions and so eliminates the strong effects of scaling on results, and controls the number of generated efficient solutions by altering the ranges of the objective functions [6].
The ε-constraint method takes the most significant objective as the objective function while considering the other objective(s) as constraint(s). Thanks to this process, efficient solutions are produced by changing the right-hand side of each constraint [23].
The AUGMECON method proposed by Mavrotas [6], the previous version of AUGMECON2, is the augmented ε-constraint method. The solving steps of this method for a classical maximization problem are given in Eq. (1).

$$\max\Big(f_1(x) + \mathit{eps}\times\big(S_2/r_2 + S_3/r_3 + \dots + S_p/r_p\big)\Big)$$
$$\text{s.t.}\quad f_2(x) - S_2 = e_2,\quad f_3(x) - S_3 = e_3,\quad \dots,\quad f_p(x) - S_p = e_p,$$
$$x \in S,\qquad S_i \in \mathbb{R}^{+}. \tag{1}$$
All the e values are parameters for the right-hand sides and all the r values are the ranges of the respective objective functions. The S values denote the surplus variables, and eps is a small parameter typically chosen in the interval [10^-6, 10^-3] [24]. The AUGMECON2 method introduced by Mavrotas and Florios [7], given in Eq. (2), mainly differs in the objective function part:

$$\max\Big(f_1(x) + \mathit{eps}\times\big(S_2/r_2 + 10^{-1}\times S_3/r_3 + \dots + 10^{-(p-2)}\times S_p/r_p\big)\Big). \tag{2}$$

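To make the mechanics of the method concrete, the following is a minimal sketch of an AUGMECON-style grid loop for a bi-objective integer program, written in Python with the PuLP library. The toy objective coefficients, problem size, grid density, and the assumption that both objectives are expressed in maximization form are illustrative choices, not the authors' implementation (which was written in GAMS); the acceleration that distinguishes AUGMECON2, i.e., bypassing redundant grid points using the surplus values, is also omitted for brevity.

import pulp

N, G = 10, 5                            # toy problem size and grid density (assumptions)

def solve(e2=None, r2=1.0, eps=1e-3):
    m = pulp.LpProblem("augmecon", pulp.LpMaximize)
    x = [pulp.LpVariable(f"x{i}", cat="Binary") for i in range(N)]
    f1 = pulp.lpSum(x)                                       # objective 1 (maximize)
    f2 = pulp.lpSum(0.1 * (i + 1) * x[i] for i in range(N))  # objective 2 (toy)
    if e2 is None:
        m += f1                                  # payoff step: optimize f1 alone
    else:
        s2 = pulp.LpVariable("s2", lowBound=0)   # surplus variable S2
        m += f1 + eps * s2 / r2                  # augmented objective, cf. Eq. (2)
        m += f2 - s2 == e2                       # objective 2 treated as a constraint
    m.solve(pulp.PULP_CBC_CMD(msg=False))
    return pulp.value(f1), pulp.value(f2)

# Grid the right-hand side e2 over the payoff-table range of f2 to trace the Pareto set.
_, f2_hi = solve()                               # f2 value at the optimum of f1
f2_lo = 0.0                                      # assumed lower bound of f2
r2 = max(f2_hi - f2_lo, 1e-9)
pareto = [solve(e2=f2_lo + k * (f2_hi - f2_lo) / (G - 1), r2=r2) for k in range(G)]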
The model proposed in this study comes from a recent study by Fischetti et al. [17], which was inspired by wind turbine allocation and used this approach for layout optimization under the social distance constraint. Over a discrete set V of possible points, a binary variable x_i, equal to 1 if a table (person) is allocated at point i and 0 otherwise, is defined in Eq. (8). The minimum and maximum numbers of tables (people) are imposed as a constraint in Eq. (5). An infection risk variable I_ij is also defined in Eq. (10); it depends on d_ij, the distance between the people at points i and j.
In the study of Fischetti et al. [17], the objective function is the difference between the profit from the number of tables and the total infection risk. The trade-off between these objectives motivated the problem addressed in this paper.
Thus, the maximizing objective is accepted as maximizing the number of people as
a single objective without any coefficient and adapted to the AUGMECON2 method as
seen in Eq. (3). Then, the objective minimizing the total infection risk is defined as a
constraint given in Eq. (4) according to the general approach of the AUGMECON2.
$$\max\; z = \sum_{i \in V} x_i + \mathit{eps}\times(S_2/r_2) \tag{3}$$
$$\text{s.t.}\quad \sum_{i \in V} w_i - S_2 = e_2 \tag{4}$$
$$N_{\min} \le \sum_{i \in V} x_i \le N_{\max} \tag{5}$$
$$x_i + x_j \le 1 \quad \forall\,[i,j] \in E_1 \tag{6}$$
$$\sum_{j \in V} I_{ij}\, x_j \le w_i + M_i(1 - x_i) \quad i \in V \tag{7}$$
$$x_i \in \{0, 1\} \quad i \in V \tag{8}$$
$$w_i \ge 0 \quad i \in V \tag{9}$$
$$I_{ij} \propto 1/d_{ij}^{3} \tag{10}$$

Since there must be a defined minimum distance between two people, the constraint
given in Eq. (6) ensures the social distance between two people. This distance is
defined as 3 m for the applied model in this study. The size of the distance can vary from country to country and by government as well. The constraint given in Eq. (7) deactivates the infection risk at a point when that point is not allocated to any person. Lastly, the constraint in Eq. (9) ensures that the infection risk is never less than zero.
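For illustration, the complete model of Eqs. (3)-(10) can be sketched for a single e2 level as follows, again in Python/PuLP. The candidate coordinates, Nmin/Nmax, the e2, r2 and eps values, and the choice to write Eq. (4) with a slack so that the total risk is capped at e2 are all illustrative assumptions; the authors' actual runs used GAMS with the CPLEX solver.

import math
import pulp

pts = [(1, 2), (4, 2), (4, 6), (9, 3), (8, 8)]       # toy candidate points (assumption)
V = range(len(pts))
d = {(i, j): math.dist(pts[i], pts[j]) for i in V for j in V if i != j}
I = {k: 1.0 / d[k] ** 3 for k in d}                   # infection risk, Eq. (10)
E1 = [(i, j) for i in V for j in V if i < j and d[(i, j)] < 3.0]  # pairs closer than 3 m

m = pulp.LpProblem("layout", pulp.LpMaximize)
x = {i: pulp.LpVariable(f"x{i}", cat="Binary") for i in V}   # allocation, Eq. (8)
w = {i: pulp.LpVariable(f"w{i}", lowBound=0) for i in V}     # risk at point i, Eq. (9)
s2 = pulp.LpVariable("s2", lowBound=0)
e2, r2, eps = 0.5, 1.0, 1e-3                          # one grid level (assumption)

m += pulp.lpSum(x.values()) + eps * s2 / r2           # Eq. (3)
m += pulp.lpSum(w.values()) + s2 == e2                # Eq. (4) in slack form: risk <= e2
m += pulp.lpSum(x.values()) >= 1                      # Eq. (5) with Nmin = 1 ...
m += pulp.lpSum(x.values()) <= len(pts)               # ... and Nmax = |V|
for i, j in E1:
    m += x[i] + x[j] <= 1                             # Eq. (6): social distance
for i in V:
    Mi = sum(I[(i, j)] for j in V if j != i)          # big-M bound for point i
    m += (pulp.lpSum(I[(i, j)] * x[j] for j in V if j != i)
          <= w[i] + Mi * (1 - x[i]))                  # Eq. (7)

m.solve(pulp.PULP_CBC_CMD(msg=False))
print([i for i in V if x[i].value() > 0.5], pulp.value(pulp.lpSum(w.values())))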
4 Applications

In this section, the AUGMECON2 method is applied to the proposed model. The
source codes [25] have been modified and then applied to the defined datasets. All
applications are performed on an Intel(R) Core(TM) i7-4702MQ CPU @ 2.20 GHz
computer with 8 GB RAM running Windows 8.1 and solved in GAMS 34.3.0 envi-
ronment using CPLEX 20.1.0 solver.
In the study, three datasets with 150 points are used as coordinates of the potential
allocation points. The first dataset gathered from the study of Fischetti et al. [17] is
Dataset1 whose variables are varied between 6.8 and 14.9 for x-coordinate and 8.4 and
23.4 for y-coordinate. Dataset2 and Dataset3 are created randomly. Dataset2 varies between 1 and 12.9 for the x-coordinate and 2 and 11.3 for the y-coordinate, while Dataset3 varies between 2 and 16.9 for the x-coordinate and 4 and 12.9 for the y-coordinate. The distances between the points of each dataset, which is considered and evaluated independently, are irregular. All the datasets can be obtained from GitHub [26].

Table 2. Payoff tables and execution times for the Dataset2.

Input size | Payoff under max f1 (f1, f2) | Payoff under min f2 (f1, f2) | Time (sec.)
20 points | (3, 0.12) | (1, 0.0) | 1.63
30 points | (3, 0.09) | (1, 0.0) | 0.56
40 points | (3, 0.09) | (1, 0.0) | 0.71
50 points | (3, 0.09) | (1, 0.0) | 0.75
100 points | (8, 0.49) | (1, 0.0) | 37.26
150 points | (12, 0.99) | (1, 0.0) | 12,611.56

All applications are performed according to the six different sized inputs of each dataset. These six inputs are generated sequentially from the first 20, 30, 40, 50 and 100 of the 150 points, and all 150 points, for each dataset. As an example, the payoff tables with respect to both objectives (illustrated as $\max f_1 = z = \sum_{i \in V} x_i$ and $\min f_2 = \sum_{i \in V} w_i$ for easy representation), and the execution times for all inputs of Dataset2, are given in Table 2.
The payoff table gives the individual optimization of each objective on its diagonal [7]. In Table 2, the payoff tables are given and the conflicting behavior of maximizing people and minimizing the total amount of infection risk is confirmed: more people always end up with more infection risk. At the same time, when the possible points increase from 20 to 30 and from 30 to 40, the number of people stays the same but the total amount of infection risk decreases for Dataset1 and Dataset2, because more candidate allocation points give more allocation options and thus reduce the infection risk.
The trade-offs between the number of people and the total amount of infection risk
for that number of people can be seen from the Pareto frontiers. The model has been
implemented for the 150 points of Dataset2, and 5 different options are observed as efficient points, depicted in Fig. 1 with the number of people on the horizontal axis and the total amount of infection risk on the vertical axis. Any decision-maker can
easily evaluate and decide to choose one option. For example, a decision-maker can
evaluate and make a healthy decision for this problem according to the obtained five
efficient solutions based on the number of people and the total amount of infection risk
that are (12, 0.99), (11, 0.71), (10, 0.50), (8, 0.24), (1, 0).

Fig. 1. A Pareto frontier for 150 points of the Dataset2.

Table 2 also shows that the execution times are less than 1 min for the 20, 30, 40,
50 and 100 points. Also, the execution times of Dataset1 and Dataset3 give similar
results as Dataset2. However, the execution time beyond 100 points increases surprisingly and drastically; an exponential increase is observed for all three datasets.
The AUGMECON2 method also yields the allocation points for each efficient solution. Here, assume that the difference in the amount of infection risk between 11 and 12 people is tolerated and 12 people are selected. For this selection, Fig. 2 shows a visual representation of the allocation points.
As seen from Fig. 2, blue points show all possible allocation points and red points
show the allocated points at the optimal result for Dataset2. All allocated points have a
minimum of 3 m from each other. As a result, a total of 12 points out of 150 are
determined by considering social distancing norms and infection risk.
Fig. 2. All possible allocation points and the allocated points for the Dataset2.

5 Conclusions

The social distance measures that first appeared in our lives with COVID-19 prompted us to enact new rules. One of them is allocating people according to social distance measures. In this context, placing people in a finite area while following social distance measures is handled in this study. COVID-19 is a recent disease, but social distance strategies are appropriate for any respiratory disease that is linked to close contact. The
aim of this study is to present a general approach as a multi-objective optimization
problem including maximizing people under social distance and minimizing the
infection risk. Among the many methods suitable to solve multi-objective problems,
the AUGMECON2 method is used. The original model has been modified and inte-
grated into the method. In the analysis, three datasets are taken into account and the
results are evaluated. Time and size limitations are observed. The Pareto efficient sets
are generated and possible efficient solutions are provided as a whole picture to the
decision-maker. This multi-objective problem might be assumed to view through the
eyes of two decision-makers. One is an owner of any place such as a restaurant who
wants to increase the number of customers while the other is a customer who wants to
sit in a safe location as much as possible. A method satisfying the preferences of both
decision-makers has been investigated to meet the trade-off between these two
conflicting objectives. It is observed from the obtained results that the overall infection risk increases by roughly 40% for each additional person, and the 39% rise from 11 to 12 people is regarded as tolerable by decision-makers. As a result, the analysis supports the conflicting structure of the problem, confirming that the more people there are, the higher the infection risk.
The AUGMECON2 method is an improvable method and has other recent versions such as the AUGMECON-R and A-AUGMECON2 methods. Therefore, a further research direction may be a broad comparison of all AUGMECON versions, which may provide broader perspectives on this problem. Due to the time and size limitations, the method is only applied to a limited area. Because of this, some suggestions are presented for
further research. With some advanced methods and heuristics, the problem may be extended and solved for bigger areas. In addition, the structure of the problem may be enriched by adding new objectives. The possible allocation points may include chairs, tables or desks seated by more than one person per point. With further epidemiological research, products such as air conditioners and air cleaners, which can affect viruses in different ways, may also be placed in the layout. Lastly, the distribution of possible points might be made more regular or more irregular depending on the creativity and needs of the selected places.

References
1. Yu, Y., et al.: Patients with COVID-19 in 19 ICUs in Wuhan, China: a cross-sectional study.
Crit. Care 24, 1–10 (2020)
2. Aldila, D., et al.: A mathematical study on the spread of COVID-19 considering social
distancing and rapid assessment: the case of Jakarta, Indonesia. Chaos Solitons Fractals 139,
1–22 (2020)
3. Sun, C., Zhai, Z.: The efficacy of social distance and ventilation effectiveness in preventing
COVID-19 transmission. Sustain. Cities Soc. 62, 1–10 (2020)
4. Moosa, I.A.: The effectiveness of social distancing in containing Covid-19. Appl. Econ. 52,
6292–6305 (2020)
5. Chu, D.K., Akl, E.A., Duda, S., Solo, K., Yaacoub, S., Schünemann, H.J.: Physical
distancing, face masks, and eye protection to prevent person-to-person transmission of
SARS-CoV-2 and COVID-19: a systematic review and meta-analysis. Lancet 395, 1973–
1987 (2020)
6. Mavrotas, G.: Effective implementation of the ε-constraint method in multi-objective mathematical programming problems. Appl. Math. Comput. 213, 455–465 (2009)
7. Mavrotas, G., Florios, K.: An improved version of the augmented ε-constraint method (AUGMECON2) for finding the exact Pareto set in multi-objective integer programming problems. Appl. Math. Comput. 219, 9652–9669 (2013)
8. Milne, R.J., Delcea, C., Cotfas, L.A., Ioanaş, C.: Evaluation of boarding methods adapted for
social distancing when using apron buses. IEEE Access 8, 151650–151667 (2020)
9. Salari, M., Milne, R.J., Delcea, C., Kattan, L., Cotfas, L.A.: Social distancing in airplane seat
assignments. J. Air Transp. Manag. 89, 1–14 (2020)
10. Milne, R.J., Cotfas, L.A., Delcea, C., Craciun, L., Molanescu, A.G.: Adapting the reverse
pyramid airplane boarding method for social distancing in times of COVID-19. PLoS ONE
15, 1–26 (2020)
11. Cotfas, L.A., Delcea, C., Milne, R.J., Salari, M.: Evaluating classical airplane boarding
methods considering COVID-19 flying restrictions. Symmetry 12, 1–26 (2020)
12. Milne, R.J., Delcea, C., Cotfas, L.A.: Airplane boarding methods that reduce risk from
COVID-19. Saf. Sci. 134. (2021). https://doi.org/10.1016/j.ssci.2020.105061
13. Pavlik, J.A., Ludden, I.G., Jacobson, S.H., Sewell, E.C.: Airplane seating assignment
problem. Serv. Sci. 13, 1–18 (2021)
14. Moore, J.F., Carvalho, A., Davis, G.A., Abdulhasan, Y., Megahed, F.M.: Seat assignments
with physical distancing in single-destination public transit settings. IEEE Access 9, 42985–
42993 (2021)
15. Kudela, J.: Social distancing as p-dispersion problem. IEEE Access 8, 149402–149411
(2020)
16. Contardo, C., Costa, L.: On the optimal layout of a dining room in the era of COVID-19
using mathematical optimization. http://arxiv.org/abs/2108.04233
17. Fischetti, M., Fischetti, M., Stoustrup, J.: Safe distancing in the time of COVID-19. Eur. J.
Oper. Res. (2021). https://doi.org/10.1016/j.ejor.2021.07.010
18. Dundar, B., Karakose, G.: Seat assignment models for classrooms in response to Covid-19
pandemic. J. Oper. Res. Soc. 1–13 (2021). https://doi.org/10.1080/01605682.2021.1971575
19. Ugail, H., et al.: Social distancing enhanced automated optimal design of physical spaces in
the wake of the COVID-19 pandemic. Sustain. Cities Soc. 68 (2021). https://doi.org/10.
1016/j.scs.2021.102791
20. ICAO, Effects of Novel Coronavirus (COVID-19) on Civil Aviation: Economic Impact
Analysis, Economic Development Air Transport Bureau. https://www.icao.int/sustainability/
Documents/COVID-19/ICAO_Coronavirus_Econ_Impact.pdf
21. Özlen, M., Azizoğlu, M.: Multi-objective integer programming: a general approach for
generating all non-dominated solutions. Eur. J. Oper. Res. 199, 25–35 (2009)
22. Nikas, A., Fountoulakis, A., Forouli, A., Doukas, H.: A robust augmented ε-constraint method (AUGMECON-R) for finding exact solutions of multi-objective linear programming problems. Oper. Res. Int. J. (2020). https://doi.org/10.1007/s12351-020-00574-6
23. Mavrotas, G.: Generation of efficient solutions in multiobjective mathematical programming problems using GAMS: effective implementation of the ε-constraint method. https://www.researchgate.net/publication/228612972
24. Mavrotas, G., Florios, K.: AUGMECON2: a novel version of the ε-constraint method for finding the exact Pareto set in multi-objective integer programming problems. https://www.gams.com/modlib/adddocs/epscmmip.pdf
25. GAMS Source Code for AUGMECON2. https://www.gams.com/latest/gamslib_ml/libhtml/
gamslib_epscmmip.html
26. Datasets for applications. https://github.com/Seydase/Datasets.git
An Intelligent Information System and Application for the Diagnosis and Analysis of COVID-19

Atif Mehmood, Ahed Abugabah(&), Ahmad A. L. Smadi, and Reyad Alkhawaldeh

College of Technological Innovation, Zayed University, Abu Dhabi, UAE
Ahed.abugabah@zu.ac.ae

Abstract. The novel coronavirus spread across the world at the start of 2020.
Millions of people have been infected due to COVID-19. At the start, the availability of corona test kits was challenging. Researchers analyzed the situation and produced COVID-19 detection systems for X-ray scans. Artificial intelligence (AI) based systems produce better results in terms of COVID detection. Due to the overfitting issue, many AI-based models cannot produce the best results, which directly impacts model performance. In this study, we introduce a CNN-based technique for classifying normal, pneumonia, and COVID-19 cases. In the proposed model, we use batch normalization to regularize the model and achieve promising results for the three binary classes. The proposed model produces 96.56% accuracy for the classification of COVID-19 vs. Normal. Finally, we compared our model with other deep learning-based approaches and found that our approach outperformed them.

Keywords: COVID-19 · CNN · Batch normalization · Classification

1 Introduction

The novel coronavirus spread worldwide to more than 200 countries and put an unpredictable load on healthcare systems. COVID-19 directly affects the lungs and disturbs the upper respiratory system. It was first found in Wuhan, China, and within a short time it spread widely, with a direct effect on lung tissue. At the start, coronavirus cases increased exponentially and reached a million cases around the world. Once infected with COVID-19, a patient may have various symptoms and indicators of infection, including fever, coughing, and respiratory sickness, among other things. Severe occurrences of the infection can result in pneumonia, trouble breathing, multi-organ failure, and even death in complex situations [1]. Due to sudden outbreaks, the healthcare systems of many developed countries broke down, and the COVID-19 situation required more ventilators in hospitals. In this critical situation, many countries

announced lockdowns, prohibited people from leaving their houses, and strictly banned gatherings in the community. In the fight against COVID-19, it is critical and necessary to
conduct effective screening of infected people so that confirmed patients can be seg-
regated and treated as soon as possible after being identified [2].
The test is performed based on patient respiratory specimens. It has been shown
that the lungs of patients suffering from COVID-19 symptoms exhibit various visual
characteristics. Examples of these are ground-glass opacities, which can be utilized to
discriminate between COVID-19 infected persons and non-infected individuals.
A method that relies on chest radiography, according to scientists, has the potential to
be a helpful tool in the diagnosis, measurement, and follow-up of COVID-19 patients.
Because of the complicated anatomical patterns of lung participation that might alter in
extent and appearance over time, the correctness of CXR identification of COVID-19
infection is heavily reliant on radiographic expertise [3]. As a result of an insufficient number of specialized thoracic radiologists, it is difficult to provide appropriate interpretation of complicated chest investigations, particularly in developing countries, where chest imaging is often reviewed by general radiologists and physicians. There are significant benefits to using a chest radiology image-based
detection mechanism over a standard approach [4].
Recently, there have been many cutting-edge methodologies and applications for real-world problems [5, 6]. Radiography-based detection has the advantages of being quick,
analyzing more than one case at the same time, having better availability, and, most
importantly, being highly beneficial in hospitals with a limited amount of testing kits
and facilities. As a result of the relevance of radiation in the global healthcare system
and the widespread availability of radiology imaging devices throughout the country,
radiography-based approaches are becoming increasingly accessible [7]. In artificial intelligence (AI), deep learning is considered a subset of machine learning. These algorithms were inspired by the structure of the human brain, whose basic unit is modeled as the artificial neuron. Deep learning techniques, in particular convolutional neural networks (CNNs), have consistently outperformed humans in many computer tasks such as computer vision and classification [8]. In recent research, deep learning-based approaches have produced various results for detecting pneumonia and COVID-19 disease. The main advantage of deep learning-based approaches is that researchers do not need handcrafted features. Machine learning-based techniques rely on handcrafted features, which directly reduces model performance and takes extra resources and time [9].
Wang et al. [10] developed a deep learning-based approach for classifying normal, pneumonia, and COVID-19 cases. Apostolopoulos et al. [4] introduced new pre-trained models that achieved 93.48% to 98.75% accuracy for binary classification. Researchers have also tested seven different deep learning-based approaches, using small data samples for all techniques and attaining 80.23% to 90.19% accuracy for classifying normal and COVID-19 patients. Wang and Xia [11] introduced a new CNN-based
architecture distinguishing between normal, pneumonia, and COVID-19. They used
more than 10000 images for the classification purpose during the experimental process
and achieved 92.4% accuracy. In researchers [12] developed another CNN-based
technique that applied on the chest X-ray images. They also used the pre-trained model,
especially Xception architecture. That architecture is already pre-trained on the
ImageNet database. They achieved 89.6% average accuracy on multi-class classification. Alazab et al. [13] developed three different techniques for the classification of COVID-19 X-ray scans and attained 90% to 97% classification performance. Researchers have also used pre-trained CNN models such as Inception, ResNet 101, ResNet 152, and InceptionV3, applying them to three different binary classes with 5-fold cross-validation. Sethy et al. [14] designed another model combining a CNN and a support vector machine (SVM), and showed that the combination with ResNet50 produced the best results.
The most significant factor contributing to the success of AI-based solutions in the medical field is the automatic extraction of features from the input data samples. However, deep learning-based approaches still face many issues that decrease classification performance on COVID-19 scans. The primary issue with many models is overfitting, which prevents them from producing the best results. In our research study, we overcame this issue by regularizing the model so that it performs best in terms of classification. Specifically, we used batch normalization in the proposed model to avoid the overfitting issue.

2 Methodology

Our proposed model is based on three major stages: first, pre-processing the data samples; second, extracting the most valuable features from the processed data; and third, classifying the normal and COVID-19 patients. This study designed a CNN-based technique composed of a number of convolutional layers and pooling layers together with batch normalization. The proposed model flow chart is shown in Fig. 1, and Fig. 2 shows the details of the proposed model.

Fig. 1. The proposed model flow chart includes data collection to final classification results.

2.1 Data Preprocessing


We acquired the input data samples from the Kaggle database and used our model to classify the three binary classes. CNN-based approaches need more data to reduce the overfitting issue. To overcome this matter, we used an augmentation approach to extend the X-ray scan data, with parameters including flipping, zoom, brightness, and shifting.
2.2 CNN Model


In the proposed model, CNN layers extract local characteristics from the input sample data. These layers have biases and weights that are updated during the training procedure, and the weights are shared across spatial positions. For dimensionality reduction, we used max-pooling layers. When a CNN-based approach is trained from scratch, parameters such as the activation function and learning rate are tuned together with batch normalization [15]. The proposed model uses 12 convolutional layers with 4 max-pooling layers and three fully connected layers, as shown in Fig. 2.

Fig. 2. Proposed CNN-based model with normalization technique.

3 Results and Analysis

During the experimental process, we used the Keras library on a workstation with 32 GB RAM. In this study, we used 1290 scans belonging to COVID-19 patients, 1946 to normal subjects, and 2154 to pneumonia patients. For classification, we extracted features from all data samples, splitting the data into 80% for training and 20% for testing.

Table 1. Proposed CNN-based approach performance on three binary classes.

Binary classes | Accuracy (%) | Sensitivity (%) | Specificity (%)
COVID-19 vs. Normal | 96.56 | 94.73 | 97.08
COVID-19 vs. Pneumonia | 93.39 | 91.89 | 93.58
Normal vs. Pneumonia | 92.41 | 90.46 | 93.31

The proposed model results are shown in Table 1. Our CNN-based approach produced the best results for the classification of COVID-19 vs. Normal, attaining 96.56% accuracy. For the remaining two binary classes, we achieved 93.39% and 92.41%, respectively. Table 2 shows the comparison with other deep learning models; our proposed model produced promising results.
Table 2. Proposed model comparison with other approaches.

Methods | Accuracy (%)
Ozturk et al. [3] | 87.02
Khan et al. [8] | 89.60
Hemdan et al. [16] | 90
Proposed model | 95.56

4 Conclusion

Nowadays, COVID-19 cases are still increasing daily. This situation still needs a proper computer-aided diagnosis (CAD) system based on deep learning approaches that can detect COVID-19 on time. In this study, we introduced a CNN-based approach combined with batch normalization for the classification of COVID-19. Our proposed model extracts the most useful features from the three binary classes and produces promising results in classification. During the experimental process, we acquired the data samples from the Kaggle database. Finally, we attained 96.56%, 93.39%, and 92.41% on the three binary classifications, respectively.

Funding. This research is supported by Zayed University, Office of Research.

References
1. Mahase, E.: Coronavirus: COVID-19 has killed more people than SARS and MERS
combined, despite lower case fatality rate (2020)
2. Liu, Y., Gayle, A.A., Wilder-Smith, A., Rocklöv, J.: The reproductive number of COVID-19 is higher compared to SARS coronavirus. J. Travel Med. (2020)
3. Ozturk, T., Talo, M., Yildirim, E.A., Baloglu, U.B., Yildirim, O., Acharya, U.R.: Automated
detection of covid-19 cases using deep neural networks with x-ray images. Comput. Biol.
Med. 121, 103792 (2020)
4. Apostolopoulos, I.D., Mpesiana, T.A.: Covid-19: automatic detection from x-ray images
utilizing transfer learning with convolutional neural networks. Phys. Eng. Sci. Med. 43(2),
635–640 (2020)
5. Vasant, P., Zelinka, I., Weber, G.W.: Intelligent Computing & Optimization, vol. 866.
Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00979-3
6. Vasant, P., Zelinka, I., Weber, G.W.: Intelligent Computing and Optimization: Proceedings
of the 2nd International Conference on Intelligent Computing and Optimization 2019 (ICO
2019), vol. 1072. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-33585-4
7. Wang, D., et al.: Clinical characteristics of 138 hospitalized patients with 2019 novel
coronavirus–infected pneumonia in Wuhan, China. JAMA 323(11), 1061–1069 (2020)
8. Khan, A.I., Shah, J.L., Bhat, M.M.: CoroNet: a deep neural network for detection and
diagnosis of covid-19 from chest x-ray images. Comput. Methods Programs Biomed. 196,
105581 (2020)
9. Lalmuanawma, S., Hussain, J., Chhakchhuak, L.: Applications of machine learning and
artificial intelligence for Covid-19 (SARS-CoV-2) pandemic: a review. Chaos Solitons
Fractals 139, 110059 (2020)
10. Wang, L., Lin, Z.Q., Wong, A.: Covid-net: a tailored deep convolutional neural network
design for detection of covid-19 cases from chest x-ray images. Sci. Rep. 10(1), 1–12 (2020)
11. Wang, H., Xia, Y.: ChestNet: a deep neural network for classification of thoracic diseases on
chest radiography. arXiv preprint arXiv:1807.03058 (2018)
12. Chollet, F.: Xception: deep learning with depth wise separable convolutions. In: Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258
(2017)
13. Alazab, M., Awajan, A., Mesleh, A., Abraham, A., Jatana, V., Alhyari, S.: Covid-19
prediction and detection using deep learning. Int. J. Comput. Inf. Syst. Ind. Manag. Appl. 12,
168–181 (2020)
14. Sethy, P.K., Behera, S.K.: Detection of coronavirus disease (covid-19) based on deep
features (2020)
15. Mehmood, A., et al.: A transfer learning approach for early diagnosis of Alzheimer’s disease
on MRI images. Neuroscience 460, 43–52 (2021)
16. El-Din Hemdan, E., Shouman, M.A., Karar, M.E.: COVIDX-Net: a framework of deep learning classifiers to diagnose COVID-19 in X-ray images. arXiv e-prints (2020)
Hand Gesture Recognition Based Human Computer Interaction to Control Multiple Applications

Sanzida Islam(B), Abdul Matin, and Hafsa Binte Kibria

Department of Electrical and Computer Engineering, Rajshahi University of Engineering and Technology, Rajshahi 6204, Bangladesh

Abstract. Human Computer Interaction (HCI) is nothing but a system where humans can interact with the computer more naturally and effi-
ciently. The main aim is to eliminate the generally used controllers such as mouse, keyboard, pointers, etc., which work as a barrier between humans and computers. This research provides a method for detecting
hand gestures using computer vision techniques for controlling various
applications in real-time. The proposed method detects all the skin-
colored objects from the captured frames and then detects the face by
using Haar based classifier. The number of fingers is detected by the con-
vexity defect approach and then the movement of the hand is tracked.
These are considered as the features of the hand gesture recognition sys-
tem. This hand gesture recognition system doesn’t require any dataset,
hence this is simpler to develop. The detected face is blocked. After the
gesture is recognized, they’re translated into actions. 20 commands are
generated from the hand gestures and sent to the computer via the key-
board. Due to this method, multiple applications like-video player, music
player, PDF reader, slideshow presentation, etc. whichever application
takes input from the keyboard can be controlled with this single system.
The system can be used for different purposes like human-robot commu-
nication, e-learning, touch-less interaction with the computer, etc.

Keywords: Computer vision · Hand gesture recognition · Convexity defect · Computer application control

1 Introduction
In this digital era, the computer is a part of our daily life. So it is necessary to
have the interaction between humans and computers as natural as possible. Input
devices like mouse, keyboard, joystick, pointers, etc. are like a barrier between
humans and the computer. Gesture-based input can remove this barrier and
make the interaction more natural and easier. Gestures can be generated from
the motion of any part of the human body such as the face, head, eyes, hands,
etc. [1].

Hand gestures are used for normal interaction among humans. Hand gestures are used very frequently as they are simple and expressive at the same time. Hand gesture recognition is used for creating user interfaces, such as for home appliances and medical systems [2]. It can also be used for performing mouse operations
[3]. There can be two types of hand gestures. Static or dynamic hand gestures
[4]. The static gesture means a shape of the hand, whereas the dynamic ges-
ture means a series of hand gestures like hand gestures obtained from a video.
The proposed method uses some of the most common natural gestures to give
instructions to the computer like-move up, down, left, right along with finger
count. This work aims to present a real-time system for hand gesture recogni-
tion based on the detection of the number of fingers and their movement. These
gestures will be converted to inputs to control multiple desktop applications.
In this system, input image is acquired from the webcam and then some
pre-processing is done to reduce the noise. After the processing is completed,
the number of finger is detected from the region of interest. Then the movement
of hand is tracked and finally the gesture is recognized and instructions are
generated. At last, various applications can be controlled using those predefined
gestures.

2 Literature Review

The basic steps in the gesture recognition system are image acquisition, process-
ing, feature extraction, and gesture recognition. Many different methods have
been implemented by researchers based on their application. Each method has
its pros and cons in time requirement, simplicity, cost, accuracy and efficiency.
In [5], a dynamic hand gesture recognition system is developed to control
the VLC media player. This system contains a central computation module that
performs the image segmentation using skin detection and approximate median
technique. The motion direction of the hand region is used as a feature and
a decision tree is used as a classification tool in this application. This system
can only control the VLC media player where as our proposed system can con-
trol multiple applications as it sends the command through the keyboard. The
average accuracy of this system is around 75%.
In another paper [6], the skin region is detected by the skin color model based
on Hue. A segmentation algorithm is developed to separate the hand from the
face. This paper uses the least square method to fit the trajectory of hand gravity
motion. The angle and direction of the hand movement are used for recognizing
four gestures (left, right, up, and down).
Another paper [7], presented a system to control an industrial robot with
hand gestures. This system uses a convexity defect approach to count the fingers
[8] and give commands according to the instructions predefined. The instructions
are given to the robot via serial communication to perform only four operations
(left, right, forward, and backward).
In [9], a hand gesture recognition system is presented which can recognize
seven gestures and launch different applications using them. This system uses
a convex hull and convexity defect approach for feature extraction [8] and also
uses a haar cascade for classifying hand gestures without exposing fingers (palm
and fist).
The proposed method uses the convexity defect approach to detect the number of fingers and tracks the movement of the hand in four directions. Thus it can generate more unique gestures instead of just five.

3 Proposed Architecture of HGR System


There are several steps in this proposed system. Such as image acquisition via
webcam, RGB to HSV conversion, skin color detection, eliminate face by using
haar cascade, noise elimination, skin segmentation, thresholding, binary image
enhancement using morphological transformation erosion, dilation, and gaussian
blur, feature extraction using contour, convex-hull, and convexity defects, count
the number of fingers in front of the webcam and track the hand movement.
After completing all the above steps, the decision is taken about the command
to be given to control the chosen application. Figure 1 shows the block diagram
of the proposed system.

Fig. 1. Block diagram of the hand gesture recognition system.

3.1 Skin Detection and Face Elimination

The image was acquired from the webcam in the form of frames. Then it was
converted to HSV after applying Gaussian blur. After that skin-colored objects
were detected from the frames using a mask. Then face was detected using the
Fig. 2. Hand detection and face elimination.
Fig. 3. Threshold image of the hand.

Haar cascade classifier and blocked by a black rectangle on the frame [10], so that the face, one of the biggest skin-colored objects in the frame, is not taken into consideration. Now the only skin-colored object remaining inside the frame is the hand, as shown in Fig. 2.
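A minimal OpenCV sketch of these two steps (HSV skin mask plus Haar-based face blocking) follows; the HSV bounds and cascade parameters are common defaults and should be treated as assumptions rather than the authors' exact values.

import cv2
import numpy as np

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def skin_mask_without_face(frame):
    blurred = cv2.GaussianBlur(frame, (5, 5), 0)
    hsv = cv2.cvtColor(blurred, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array([0, 30, 60]), np.array([20, 150, 255]))  # skin range
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (fx, fy, fw, fh) in face_cascade.detectMultiScale(gray, 1.3, 5):
        cv2.rectangle(mask, (fx, fy), (fx + fw, fy + fh), 0, -1)  # block the face region
    return mask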

3.2 Noise Removal and Image Enhancement


After the skin color is detected morphological image processing is applied to
remove the remaining noise. At first, erosion is applied. It removes the extra
pixels from the hand. In the second stage, dilation is applied which adds pixels
to the missing parts of the hand. At last, the Gaussian blur is applied to reduce
more noise from the frame.

3.3 Thresholding
This is used for the segmentation of an image to obtain a binary image. This
method compares each pixel’s intensity with the threshold value. If the value is
greater than the threshold value then it’s replaced with a white pixel otherwise
with a black pixel. So the output contains only the skin-colored object in white.
Figure 3 shows the threshold image of the hand.
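A combined sketch of the noise-removal (Sect. 3.2) and thresholding stages with OpenCV follows; the kernel size, iteration counts, and threshold value are assumptions.

import cv2
import numpy as np

def clean_and_threshold(mask):
    kernel = np.ones((3, 3), np.uint8)
    mask = cv2.erode(mask, kernel, iterations=2)    # erosion: remove stray skin pixels
    mask = cv2.dilate(mask, kernel, iterations=2)   # dilation: fill holes in the hand
    mask = cv2.GaussianBlur(mask, (5, 5), 0)        # smooth the remaining noise
    _, binary = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)
    return binary                                   # white hand on black background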

3.4 Feature Extraction


After these steps are done, the image is ready for feature extraction. There are
three stages in this proposed method.

Contour. The first step is to find the contour of the hand. Contour is the curve
that joins the boundary points of the hand having the same intensity or color.
These points are found due to the change in the intensity of neighboring pixels.
Fig. 4. Contour and convex hull of hand.
Fig. 5. Convexity defect.

It’s found from the threshold image that was formed in the previous stage. In
Fig. 4, the green curve showing the outline of the hand is called the contour.
If there is more than one skin-colored object then the contour is drawn for the
biggest object in the frame. If there are multiple people in front of the camera
then contour will be drawn around the hand which is the largest among all. This
can be based on the distance where they are sitting. The nearest person’s hand
will be seen as the biggest one and the contour will be drawn around it.

Convex Hull. It’s a polygon that bounds all the contour points. It surrounds all
the white pixels from a binary image. The red polygon drawn in Fig. 4 represents
the convex hull of the hand. It determines the fingertip location.

Convexity Defect. The difference between the contour and convex hull is
called the convexity defect. These are the parts of the convex hull but it’s not
the part of the main object. In Fig. 5 the straight lines represent the convex
defects. Three components of the convex defect are a start point, endpoint, and
far point. Yellow dots represent a far point. Using these points the number of
fingers is determined. The process to determine the number of fingers is discussed
in the next section.
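In OpenCV, these three feature-extraction stages can be sketched as follows, assuming the binary image from the previous step; this uses the standard contour/convex-hull/convexity-defect calls rather than the authors' exact code.

import cv2

def extract_features(binary):
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None, None
    hand = max(contours, key=cv2.contourArea)        # biggest skin-colored object
    hull_idx = cv2.convexHull(hand, returnPoints=False)
    defects = cv2.convexityDefects(hand, hull_idx)   # rows of (start, end, far, depth)
    return hand, defects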

3.5 Classification
After the convexity defect points are found, the number of fingers is counted. The classification of a hand gesture is done by the number of fingers and their direction of movement. The cosine rule is used to find the angle at the far point between the lines drawn from the start and end points (i.e., the convex points or fingertips) to the far point, for all defects in the gesture shown at any moment. A triangle like the one in Fig. 6 is formed with the convex points and the far point, as shown in Fig. 7:

$$B^2 = A^2 + C^2 - 2AC\cos\gamma \tag{1}$$
Fig. 6. A triangle.
Fig. 7. A triangle is formed with the start, end, and far points.

From this equation the angle can be calculated.

$$\gamma = \cos^{-1}\!\left(\frac{A^2 + C^2 - B^2}{2AC}\right) \tag{2}$$

If the angle is smaller than 90° then it's considered as a defect. Thus the number of fingers is found as

$$\mathit{fingers} = \mathit{defects} + 1 \tag{3}$$
After the fingers are counted, the movement of the center of the hand is tracked (up, down, left, or right). This is done by tracking the change in the position of the hand in the X and Y coordinates across frames. Let the initial position of the center pixel be (x1, y1) and, after 5 consecutive frames, the position be (x2, y2). If x2 is greater than x1, the hand has moved in the right direction, and vice versa. The same rule applies when the movement is along the y axis. Since there can be some unintentional hand movement, a movement is considered only if it's greater than 5 cm. This way each finger count can be moved in four directions, generating 20 commands.
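A sketch of this classification step is shown below: the cosine rule of Eqs. (1)-(3) counts the fingers, and a simple displacement test classifies the movement direction. The pixel-displacement threshold stands in for the 5 cm rule and is an assumption.

import math

def count_fingers(contour, defects):
    if defects is None:
        return 1                                       # no defects: a single finger
    count = 0
    for s, e, f, _ in defects[:, 0]:
        A = math.dist(contour[s][0], contour[f][0])    # start point to far point
        C = math.dist(contour[e][0], contour[f][0])    # end point to far point
        B = math.dist(contour[s][0], contour[e][0])    # start point to end point
        gamma = math.acos((A**2 + C**2 - B**2) / (2 * A * C))  # Eq. (2)
        if gamma < math.pi / 2:                        # angle below 90 degrees
            count += 1                                 # one more defect between fingers
    return count + 1                                   # Eq. (3)

def direction(p1, p2, min_move=50):                    # pixel threshold (assumption)
    (x1, y1), (x2, y2) = p1, p2                        # hand centre, 5 frames apart
    dx, dy = x2 - x1, y2 - y1
    if max(abs(dx), abs(dy)) < min_move:
        return None                                    # ignore unintentional movement
    if abs(dx) > abs(dy):
        return "right" if dx > 0 else "left"
    return "down" if dy > 0 else "up"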

3.6 Gesture Recognition and Action Generation


After the gestures are recognized, the instructions are given to the application via the keyboard. After performing one action, a small delay of 5 s is taken before another command is given. Since many applications take commands from the keyboard, all of them can be controlled with this system. Slide presentations, music players, video players, image viewers, PDF readers, and many other applications can be operated with these 20 gestures.
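The final mapping from recognized gestures to keystrokes can be sketched with the pyautogui library as below; the partial command table follows Table 1, and both the library choice and the key names are assumptions about one possible implementation.

import time
import pyautogui

# (fingers, direction) -> key name; a partial mapping following Table 1.
COMMANDS = {(1, "right"): "f", (2, "left"): "left", (2, "right"): "right",
            (3, None): "m", (4, "up"): "pageup", (4, "down"): "pagedown",
            (5, None): "space"}

def act(fingers, direction):
    key = COMMANDS.get((fingers, direction)) or COMMANDS.get((fingers, None))
    if key:
        pyautogui.press(key)   # the focused application receives the keystroke
        time.sleep(5)          # 5 s delay before accepting the next command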

4 Results
The hand gesture recognition system has been implemented successfully to con-
trol different applications running on a computer. The proposed system can use
Table 1. Hand gestures used to control video player/music player etc.

No. of fingers | Hand movement | Keyboard action | Function
1 | Right | F | Toggle fullscreen
2 | Left | Home/Left | Backward/Previous
2 | Right | End/Right | Forward/Next
3 | Any | M | Mute/Unmute
4 | Up | Page Up | Volume up
4 | Down | Page Down | Volume down
5 | Any | Space | Play/Pause

20 hand gestures in total. Each of the five finger counts can move in four directions (up, down, left, and right). Figure 8 shows the output image where skin color is detected and the face is blocked using the Haar cascade. The threshold image is also shown in the right corner, and the contour and convex hull are drawn on this binary image of the hand. Later, the number of fingers is calculated and shown as text in the frame. If the background color matches the skin color the detection fails, which is why a white or other non-skin-colored background is used.

Fig. 8. Skin color detection, face elimination, thresholding, and finger detection.

Table 1 shows some of the hand gestures, keyboard actions, and their corresponding functions that were used to control applications like the VLC media player, music player, etc. Figures 9, 10, 11, 12 and 13 show different gestures controlling the VLC media player (volume up/down, forward/backward, play/pause, etc.).
Table 2 shows some of the hand gestures, keyboard actions, and their corresponding functions that were used to control applications like PDF readers, slide
Fig. 9. Move forward when 2 fingers move to the right.
Fig. 10. Increase the volume when 4 fingers move upward.
Fig. 11. Decrease the volume when 4 fingers move downward.
Fig. 12. Mute when 3 fingers are shown.
Fig. 13. Pause/play when 5 fingers are shown.
Fig. 14. Presentation slide change with a hand gesture (five fingers).

presentations, photo viewers, etc. Figure 14 shows the slide change using a hand gesture in Microsoft PowerPoint. There are still many gestures left that can be used to give more commands to these applications. Since these applications take common keyboard instructions like page up, page down, home, and end, they don't need separate systems to control them; this single system alone can control all applications that take commands from the keyboard.
Table 3 shows the recognition rate of each gesture. All the gestures are recognized quickly and accurately, with an overall accuracy of 97.8% in good lighting conditions. A poor lighting environment degrades the overall performance, as it affects the outcome of skin detection: in low lighting the system is unable to identify the skin color because the pixel values are too dark, whereas in bright lighting conditions the whole surface of the hand skin is well defined.
Table 2. Hand gestures used to control pdf reader/slide presentation/photo viewer etc.

No. of fingers | Hand movement | Keyboard action | Function
1 | Up | Ctrl + Plus | Zoom in
1 | Down | Ctrl + Minus | Zoom out
2 | Left | Home/Left | Previous
2 | Right | End/Right | Next
3 | Any | M | Mute
4 | Up | Page Up | Scroll up
4 | Down | Page Down | Scroll down
5 | Any | Space | Next

Table 3. Hand gesture recognition rates.

No. of fingers | Direction of hand | No. of tests | Recognition rate
1 | Up/Down | 80 | 99%
1 | Left/Right | 90 | 99%
2 | Up/Down | 85 | 99%
2 | Left/Right | 88 | 99%
3 | Up/Down | 90 | 95%
3 | Left/Right | 70 | 95%
4 | Up/Down | 80 | 98%
4 | Left/Right | 110 | 98%
5 | Up/Down | 85 | 98%
5 | Left/Right | 90 | 98%

5 Conclusions and Future Work


From the successful experiment, it can be asserted that the proposed system can recognize the gestures properly in real time. It can control multiple applications like the VLC media player, PowerPoint presentations, PDF readers, the Chrome browser, etc. with the classified gestures: users can open a video or book in an application and control it with hand gestures, as seen in the performance of the hand gesture recognition system. In the future, CNN-based segmentation can be used instead of skin color-based segmentation, which will reduce the problems with low lighting and with other skin-colored objects appearing in front of the camera. With some modifications, this system can be useful for training humanized social robots or for touch-less interaction between humans and computers.
References
1. Shukla, J., Dwivedi, A.: A method for hand gesture recognition. In: 2014 Fourth
International Conference on Communication Systems and Network Technologies,
pp. 919–923 (2014)
2. Holte, M.B., Tran, C., Trivedi, M.M., Moeslund, T.B.: Human pose estimation
and activity recognition from multi-view videos: comparative explorations of recent
developments. IEEE J. Sel. Topics Signal Process. 6(5), 538–552 (2012)
3. Veeriah, J.V., Swaminathan, P.: Robust hand gesture recognition algorithm for
simple mouse control. Int. J. Comput. Commun. Eng. 2(2), 219–221 (2013)
4. Rautaray, S.S., Agrawal, A.: Vision based hand gesture recognition for human
computer interaction: a survey. Artif. Intell. Rev. 43(1), 1–54 (2012). https://doi.
org/10.1007/s10462-012-9356-9
5. Paliwal, M., Sharma, G., Nath, D., Rathore, A., Mishra, H., Mondal, S.: A dynamic
hand gesture recognition system for controlling VLC media player. In: 2013 Inter-
national Conference on Advances in Technology and Engineering (ICATE), pp.
1–4. IEEE (2013)
6. Jingbiao, L., Huan, X., Zhu, L., Qinghua, S.: Dynamic gesture recognition algo-
rithm in human computer interaction. In: 2015 IEEE 16th International Conference
on Communication Technology (ICCT), pp. 425–428. IEEE (2015)
7. Ganapathyraju, S.: Hand gesture recognition using convexity hull defects to control an industrial robot. In: 2013 3rd International Conference on Instrumentation Control and Automation (ICA), pp. 63–67. IEEE (2013)
8. Mesbahi, S.C., Mahraz, M.A., Riffi, J., Tairi, H.: Hand gesture recognition based on
convexity approach and background subtraction. In: 2018 International Conference
on Intelligent Systems and Computer Vision (ISCV), pp. 1–5. IEEE (2018)
9. Haria, A., Subramanian, A., Asokkumar, N., Poddar, S., Nayak, J.S.: Hand gesture
recognition for human computer interaction. Procedia Comput. Sci. 115, 367–374
(2017)
10. Cuimei, L., Zhiliang, Q., Nan, J., Jianhua, W.: Human face detection algorithm
via HAAR cascade classifier combined with three additional classifiers. In: 2017
13th IEEE International Conference on Electronic Measurement and Instruments
(ICEMI), pp. 483–487. IEEE (2017)
Towards Energy Savings in Cluster-Based
Routing for Wireless Sensor Networks

Enaam A. Al-Hussain and Ghaida A. Al-Suhail

Department of Computer Engineering, University of Basrah, Basrah, Iraq


enaam.mansor@uobasrah.edu.iq

Abstract. Wireless Sensor Networks (WSNs) are mainly composed of a


number of Sensor Nodes (SNs) that gather data from their physical surroundings
and transmit it to the Base Station (BS). These sensors, however, have several
limitations, including limited memory, limited computational capability, relatively
limited processing capacity, and, most crucially, limited battery power.
Given these restricted resources, clustering techniques are mainly utilized to
reduce the energy consumption of WSNs and consequently enhance their
performance. The Low Energy Adaptive Clustering Hierarchy (LEACH) protocol
serves as a good benchmark for clustering techniques in WSNs.
Although LEACH conserves the energy of sensor nodes, its energy efficiency is still
considerably compromised by unpredictable and fast power draining.
Therefore, the goal of this paper focuses on how the LEACH protocol may be
used effectively in the field of environmental monitoring systems to address
issues about energy consumption, efficiency, stability, and throughput in a
realistic simulation environment. The realistic performance analysis and
parameter tuning were carried out utilizing the OMNET++/Castalia Simulator to
serve as a baseline for future developments.

Keywords: WSNs · LEACH · Clustering · Energy efficiency · OMNET · Castalia

1 Introduction

Recently, Wireless sensor networks (WSNs) have been regarded as a significant


research area due to their critical involvement in a variety of applications. Wireless
sensor nodes collect data, analyze it for optimization, and then send it to the sink via a
network of intermediary nodes. The network of these nodes as a whole constitutes the
wireless sensor network, which is capable of organizing data and transmitting it to the
requester (sink) [1]. Meanwhile, energy efficiency is still a critical problem in the
design of WSN routing protocols owing to resource constraints and the
non-rechargeability of sensor-node batteries [2, 3].
Notably, clustering is a widely used approach for managing the topology of
WSNs, since it can significantly enhance the network's performance. It groups
nodes according to predefined criteria such as ensuring QoS, optimizing
resource requirements, and balancing network load. The leader node that manages each
cluster is called the Cluster Head (CH). This node is responsible for data collection from


cluster members (CMs) and transmitting it to the Base Station. Clustering techniques
eliminate the need for resource-constrained nodes to transfer data directly to gateways
(sinks), which results in energy depletion, inefficient resource utilization, and
interference.
Numerous studies on energy efficiency and data collection for cluster-based routing
algorithms have been conducted [4–7]. Most of these strategies consist of two
phases: (i) a Setup phase and (ii) a Steady-State phase. The first phase involves the
selection and formation of CHs, as well as the assignment of a TDMA schedule to
member nodes by the CH [8]. The latter phase is responsible for transmitting the
sensed data to the CHs in the TDMA slots allocated during the setup phase.
The CHs then collect the data from the CMs and transfer it to the Base Station.
LEACH, PEGASIS, TEEN, APTEEN, and HEED [9–12] are regarded as the primary
hierarchical routing protocols in WSNs. Each has numerous variants that are
adapted to certain applications.
Typically, the Sensor Nodes (SNs) consume a great deal of energy during data
transmission rather than data processing. As a result, it is critical to minimize redundant
sensed data transmission to the BS through the efficient deployment of Cluster Heads
(CHs) in a network. Hence, it is important to evaluate the routing protocol in major
aspects and scenarios to guarantee the real-world design of WSNs and ensure optimal
environment simulation for further improvement utilizing a variety of optimization
methods.
In this paper, the LEACH protocol is evaluated as a good benchmark for single-hop
clustering algorithms. Numerous scenarios are presented to evaluate the overall
energy efficiency and throughput. Moreover, in order to find the typical values for each
scenario, several parameters are considered, including the optimal CHs percentage,
packets received by the Sink (BS) located in various locations under various node
density and data rates. Extensive simulation demonstrates that once the node density of
the same area size increases, the network’s energy consumption decreases, resulting in
extending the network lifetime of a WSN. Additionally, it is observed that when the
CH percentage is optimal, the energy consumption of a network is minimal. However,
when the CH percentage of a network exceeds an optimal value, energy consumption
increases, significantly reducing the network’s lifetime.
The rest of this paper will be structured as follows. Firstly, the literature review is
addressed in Sect. 2. In Sect. 3 the LEACH protocol is described in detail. Meanwhile,
in Sect. 4 the network model is discussed. Section 5 displays and discusses the sim-
ulation results. Finally, in Sect. 6, the conclusion has been drawn.

2 Related Works

The Low Energy Adaptive Clustering Hierarchy (LEACH) protocol [13] is one of the
most well-known protocols. It reduces energy consumption by employing adaptive
clustering and serves as a good benchmark for clustering routing protocols in
WSNs and MANETs. Within LEACH, the nodes in the network field are grouped into
clusters. Each cluster has a single leader node, identified as the cluster head (CH), and
this node is selected in a random manner. Moreover, while the LEACH protocol saves

energy from sensor nodes, its energy efficiency is likely impacted by random and fast
energy dissipation, which is increased by the cluster’s unequal distribution of nodes and
the time restriction imposed by the TDMA MAC Protocol [13–15].
In the LEACH protocol, the CHs are randomly assigned to operate as relay nodes for
data transmission; afterward, the cluster heads shift roles with regular nodes so
that a uniform amount of energy is spent across all nodes. This role rotation
extends the lifetime of nodes while decreasing the energy consumption of transmission.
Numerous studies have recently examined the routing and energy consumption
challenges related to the LEACH protocol by modifying its mathematical models to
increase overall performance in a variety of efficient ways [16, 17]. Meanwhile,
intelligent algorithms [18–22] are also used as a viable strategy for lowering the energy
consumption of WSNs and extending the network's lifetime. Furthermore, other
researchers have stressed the critical role of Fuzzy Logic Systems (FLS) in the decision-
making process for CH efficiency in WSNs [23]. All these studies focus on the
predefined protocol with specific parameters that affect the routing efficiency of the
optimized LEACH protocol. Such parameters include the sensor node's lifetime, the
total number of packets received, the transmission latency, and the scalability of
the number of sensor nodes.
Nevertheless, most works evaluated their proposed protocols in a virtual environment
without examining the effect of the original protocol's parameters on the
network's efficiency. Thus, it is critical to evaluate the routing protocol in its major
aspects and scenarios using realistic simulation environments such as the Castalia and
OMNET++ simulators. This approach ensures that WSNs are designed for a real-world
environment and provides a realistic implementation for further development of the
LEACH protocol and its versions (LEACH-C, M-LEACH,…etc.) using various
optimization techniques.

3 Low Energy Adaptive Clustering Hierarchy Protocol

LEACH is a pioneering WSN clustering routing protocol. The LEACH protocol's major
purpose is to enhance energy efficiency by random CH selection. LEACH operates
in rounds that consist of two phases: a Set-Up phase and a Steady-State phase. Clusters are
constructed and a cluster head (CH) is elected for each cluster during the setup phase.
Meanwhile, during the steady phase, the data is sensed, aggregated, compressed, and
transmitted to the base station.
i. Set-Up Phase: The Set-Up step involves the selection and construction of CHs, as
well as the assignment of a TDMA schedule to member nodes.

1. Cluster Head Selection: Each node takes part in the CH selection process by
randomly generating a value between 0 and 1. If the random number generated
by the SN is smaller than the threshold value T(n), the node becomes a CH;
otherwise it is considered a CM and waits for ADV messages to join a nearby CH.
Equation 1 gives the value of T(n) (a Python sketch of this selection step
appears after this list).

T(n) = \begin{cases} \dfrac{P}{1 - P\,(r \bmod 1/P)} & \text{if } n \in G \\ 0 & \text{otherwise} \end{cases} \qquad (1)

where P is the CH percentage used at the beginning of each round, r is the current
round number, and G is the set of nodes that have not served as CH in the last
1/P rounds. P is chosen such that the expected number of CH nodes for each round is K:

P = K/N \qquad (2)

2. Cluster Formation: Once the CHs are elected, they broadcast ADV messages
to the rest of the sensors using CSMA MAC protocol. Non-CHs must maintain
their receivers throughout the Set-Up phase to hear all CHs’ ADV messages.
After this phase is complete, each sensor determines which cluster it belongs to
based on the RSSI value. Meanwhile, each sensor node (SN) transmits JOIN-
REQ messages to its corresponding CH using CSMA.
3. Schedule Creation: Each CH node generates a TDMA schedule based on the
number of JOINT-REQ messages received. The schedule is broadcast back to
the cluster’s nodes to inform them when they can transmit.
ii. Steady-State Phase: The steady-state or transmission phase is where environ-
mental reports are communicated from the network field. During this phase, each
sensor node transmits its data to the CH during its assigned time slot (intra-cluster
communication); meanwhile, each CH aggregates the data from its corresponding
CMs and sends it to the BS (inter-cluster communication).
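
To make the set-up phase concrete, the sketch below implements the per-round CH self-election of Eqs. 1–2 and a simplified cluster formation in Python. It is a minimal illustration under stated assumptions: nodes are 2-D points, the RSSI-based choice of the strongest ADV is approximated by Euclidean distance, and energy accounting and TDMA scheduling are omitted; all names are ours, not the authors' code.

```python
# Minimal sketch of LEACH's set-up phase (Eqs. 1-2), illustrative only.
import math
import random

P = 0.05  # desired CH percentage, so K = P * N CHs are expected per round

def threshold(r: int) -> float:
    """T(n) from Eq. 1 for nodes in G (not CH in the last 1/P rounds)."""
    return P / (1.0 - P * (r % int(1.0 / P)))

def elect_cluster_heads(nodes, r, last_ch_round):
    """Each node in G draws U(0,1); it becomes CH if the draw < T(n)."""
    heads = []
    for n in nodes:
        in_G = last_ch_round[n] is None or r - last_ch_round[n] >= int(1.0 / P)
        if in_G and random.random() < threshold(r):
            heads.append(n)
            last_ch_round[n] = r
    return heads

def form_clusters(nodes, heads, pos):
    """Non-CH nodes join the 'strongest' (here: nearest) advertising CH."""
    clusters = {h: [] for h in heads}
    for n in nodes:
        if n not in clusters and heads:
            nearest = min(heads, key=lambda h: math.dist(pos[n], pos[h]))
            clusters[nearest].append(n)  # JOIN-REQ to the chosen CH
    return clusters
```

Note how the denominator of T(n) shrinks over a 1/P-round cycle, so nodes that have not yet served as CH face an ever-higher election probability, which is what rotates the CH role across the network.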
The key advantages and limitations of the LEACH protocol can be summarized as
follows (Table 1):

4 Network Model

The following criteria are considered when describing the network model based on the
proposed protocol:
1. Sensor nodes are uniformly distributed across an M × M area of interest, and
throughout the process, all nodes and the BS remain stationary (non-mobile).
2. Each sensor node is capable of sensing, aggregating, and transmitting data to and
from the base station (BS), which acts as the sink node, and other sensors.
3. The network's nodes are non-rechargeable and have homogeneous initial energy.
4. To ensure optimal performance, the Sink Node (BS) is positioned in the network
field’s center. Quite frequently, the assumption is made that the communication
links between the nodes are symmetrical. As a result, when it comes to packet
transmission, any two nodes’ data rate and energy consumption are symmetrical.
5. The nodes operate in power control mode, with the output power determined by the
receiving distance between them.

Table 1. Advantages and limitations of the LEACH protocol.

Advantages:
▪ The clustering technique used by the LEACH protocol results in decreased
communication between the sensor network and the BS, extending the network's lifetime
▪ The CH utilizes a data aggregation technique to reduce correlated data on a local
level, resulting in a significant reduction in energy consumption
▪ Each sensor node has a reasonable chance of becoming the CH and subsequently a
member node, which maximizes the lifetime of the network
▪ By utilizing TDMA scheduling, intra-cluster collisions are avoided, extending the
battery life of sensor nodes

Limitations:
▪ Expansion of the network may result in a trade-off between the energy distances of a
CH and the BS
▪ Due to the random-number principle, nodes are not guaranteed to become CHs again,
which further reduces their energy efficiency
▪ No consideration is made of heterogeneity in terms of energy, computational
capabilities, and link reliability
▪ The TDMA approach imposes constraints on each frame's time slot

5 Simulation Results and Performance Analysis

This section discusses LEACH's performance evaluation. The LEACH protocol is
examined on a network of 100 sensor nodes uniformly distributed over a
100 × 100 m² area. The BS is positioned at the center of the sensor field. All nodes
have an initial energy of 3 J. Moreover, we use a round time of 20 s in our scenarios
with a maximum simulation time of 300 s. All data messages have the same size, and
the slot time is set to 0.5 in all simulation situations. The full overview of the
simulation parameters is shown in Table 2.

Table 2. Simulation parameters.

Parameter          | Value
Network size       | 100 × 100 m²
No. of nodes       | 100
No. of clusters    | 5
Location of BS     | (50, 50) m
Node distribution  | Uniform
BS mobility        | Off
Energy model       | Battery
Application ID     | Throughput test
Initial energy     | 3 J
Simulation time    | 300 s
Round time         | 20 s
Packet header size | 25 Bytes
Data packet size   | 2000 Bytes
Bandwidth          | 1 Mbps

5.1 Performance Evaluation of LEACH Protocol


In this section, numerous factors are considered when evaluating Low Energy Adaptive
Clustering, including the number of nodes, the CH percentage, and the area size.
The LEACH protocol's performance is quantified in terms of the total energy consumed
by sensor nodes during each round for data processing and communication.
Reliability is another metric, evaluated by the total number of received data packets.
Experimental Case I
Figures 1 and 2 depict the effect of node density (number of nodes per m²) and area size
on energy consumption, where 50, 100, and 200 sensor nodes are uniformly distributed
across 100 × 100 m² and 200 × 200 m² areas, respectively. Each node has an initial
energy of 3 J, with a CH percentage of 5%. If the CH percentage remains constant but the
network's node density increases, the number of CHs in the network increases in
proportion to the node density. The energy consumption of nodes is minimal at
CH = 5% for the 100 × 100 m² network with 100 nodes (5 CHs selected) and for the
200 × 200 m² network with 200 nodes (10 CHs selected). This is because, as the
coverage area increases, the nodes consume more energy transmitting the sensed
information to the sink with the fewest CHs possible.

Fig. 1. Total energy consumption.
Fig. 2. Total energy consumption.

Figure 1 shows that when the CH percentage is optimal, the energy consumption
of the network becomes minimal. However, when the CH percentage exceeds the
optimal value, energy consumption increases, significantly reducing the network's
lifetime. It is therefore important to choose the optimal CH percentage to avoid
extra power consumption by the sensor nodes.
Experimental Case II
Figures 3(a–d) illustrate the effect of node density (number of sensor nodes per m²),
area size, packet rate, and CH percentage on the total number of packets received at
the sink. The network is configured as in Table 3:

Table 3. Network configuration.

Area (m²) | Node density | No. of nodes | CH percentage | Packet rate
100 × 100 | 0.002        | 20           | 5%, 8%, 10%   | 1, 3
100 × 100 | 0.006        | 60           | 5%, 8%, 10%   | 1, 3
100 × 100 | 0.01         | 100          | 5%, 8%, 10%   | 1, 3
200 × 200 | 0.002        | 80           | 5%, 8%, 10%   | 1, 3
200 × 200 | 0.006        | 240          | 5%, 8%, 10%   | 1, 3
200 × 200 | 0.01         | 400          | 5%, 8%, 10%   | 1, 3

(a) packet rate = 1 packet/sec/node. (b) packet rate = 3 packet/sec/node.

(c) packet rate = 1 packet/sec/node. (d) packet rate = 3 packet/sec/node.

Fig. 3. (a–d): The effect of node density, area size, and the packet rates with CHs percentage on
the total number of packets received at the sink

In Figs. 3(a–d), the obtained results illustrate that increasing the packet rate results
in a decrease in the network's packet reception rate; this occurs due to increased CH
congestion. A higher packet rate enables source sensor nodes to relay the sensed data
more quickly to their CHs during their assigned time slots. Each CH then receives more
packets from its associated sensor nodes than it can forward to the sink, and
congestion arises in the WSN as a result. The sensor buffers begin to overflow,
increasing packet loss and lowering the rate at which packets are received in the WSN.

6 Conclusions and Discussion

The Low Energy Adaptive Clustering Hierarchy (LEACH) protocol is evaluated with
respect to several considerations, including node density, CH percentage, packet rate,
and area size.
As seen from the findings, when the CH percentage remains constant but the node
density increases, the number of CHs in the network increases in proportion to the
number of nodes. Moreover, when the CH percentage is optimal, the energy
consumption of the network is minimal; when it exceeds the optimal value, energy
consumption increases, significantly reducing the network's lifetime. It is therefore
important to choose the optimal CH percentage to avoid extra power consumption by
the sensor nodes. The energy consumption of nodes is minimal at CH = 5% for the
100 × 100 m² network with 100 nodes (5 CHs selected) and for the 200 × 200 m²
network with 200 nodes (10 CHs selected). This is because, as the coverage area
increases, the nodes consume more energy transmitting the sensed information to the
sink with the fewest CHs possible. As the number of CHs increases, the amount of
energy consumed is reduced proportionately.
In addition, the obtained results illustrate that increasing the packet rate can cause a
decrease in the network's packet reception rate due to increased CH congestion. Once
the packet rate is increased, source sensor nodes relay the sensed data more quickly to
their CHs during their assigned time slots, and each CH receives more packets from its
associated sensor nodes than it can forward to the sink. As a result, congestion arises
in the WSN and the sensor buffers begin to overflow. This means that packet loss
becomes high and a significant reduction occurs in the resultant packet rate during
packet delivery in the WSN.
For future work, fuzzy logic systems and intelligent algorithms such as FPA, GWO,
ACO, and ABC algorithms can be utilized to improve the routing strategy in the
LEACH protocol. Additionally, multi-hop routing techniques can be also considered
for optimal monitoring system design.

References
1. Priyadarshi, R., Gupta, B., Anurag, A.: Deployment techniques in wireless sensor networks:
a survey, classification, challenges, and future research issues. J. Supercomput. 76(9), 7333–
7373 (2020). https://doi.org/10.1007/s11227-020-03166-5
2. Banđur, Đ, Jakšić, B., Banđur, M., Jović, S.: An analysis of energy efficiency in Wireless
Sensor Networks (WSNs) applied in smart agriculture. Comput. Electron. Agric. 156, 500–
507 (2019)
3. Kalidoss, T., Rajasekaran, L., Kanagasabai, K., Sannasi, G., Kannan, A.: QoS aware trust
based routing algorithm for wireless sensor networks. Wireless Pers. Commun. 110(4),
1637–1658 (2019). https://doi.org/10.1007/s11277-019-06788-y
4. Ketshabetswe, L.K., Zungeru, A.M., Mangwala, M., Chuma, J.M., Sigweni, B.: Heliyon 5,
e01591 (2019)
5. Mann, P.S., Singh, S.: Energy-efficient hierarchical routing for wireless sensor networks: a
swarm intelligence approach. Wireless Pers. Commun. 92(2), 785–805 (2016). https://doi.
org/10.1007/s11277-016-3577-1
6. Fanian, F., Rafsanjani, M.K.: Cluster-based routing protocols in wireless sensor networks: a
survey based on methodology. J. Netw. Comput. Appl. 142, 111–142 (2019)
7. Singh, H., Bala, M., Bamber, S.S.: Taxonomy of routing protocols in wireless sensor
networks: a survey. Int. J. Emerg. Technol. 11, 63–83 (2020)
8. Rostami, A.S., Badkoobe, M., Mohanna, F., Keshavarz, H., Hosseinabadi, A.A.R.,
Sangaiah, A.K.: Survey on clustering in heterogeneous and homogeneous wireless sensor
networks. J. Supercomput. 74, 277–323 (2018)
9. Al-Shaikh, A., Khattab, H., Al-Sharaeh, S.: Performance comparison of LEACH and
LEACH-C protocols in wireless sensor networks. J. ICT Res. Appl. 12, 219–236 (2018)
10. Khedr, A.M., Aziz, A., Osamy, W.: Successors of PEGASIS protocol: a comprehensive
survey. Comput. Sci. Rev. 39, 100368 (2021)
11. Asqui, O.P., Marrone, L.A., Chaw, E.E.: Evaluation of TEEN and APTEEN hybrid routing
protocols for wireless sensor network using NS-3. In: Rocha, Á., Ferrás, C., Montenegro
Marin, C.E., Medina García, V.H. (eds.) ICITS 2020. AISC, vol. 1137, pp. 589–598.
Springer, Cham (2020). https://doi.org/10.1007/978-3-030-40690-5_56
12. Ullah, Z.: A survey on Hybrid, Energy Efficient and Distributed (HEED) based energy
efficient clustering protocols for wireless sensor networks. Wirel. Pers. Commun. 112(4),
2685–2713 (2020). https://doi.org/10.1007/s11277-020-07170-z
13. Kwon, O.S., Jung, K.D., Lee, J.Y.: WSN protocol based on leach protocol using fuzzy. Int.
J. Appl. Eng. Res. 12, 10013–10018 (2017)
14. Lee, J.S., Teng, C.L.: An enhanced hierarchical clustering approach for mobile sensor
networks using fuzzy inference systems. IEEE Internet Things J. 4, 1095–1103 (2017)
15. Amutha, J., Sharma, S., Sharma, S.K.: Strategies based on various aspects of clustering in
wireless sensor networks using classical, optimization and machine learning techniques:
Review, taxonomy, research findings, challenges and future directions. Comput. Sci. Rev.
40, 100376 (2021)
16. Basavaraj, G.N., Jaidhar, C.D.: H-LEACH protocol with modified cluster head selection for
WSN. In: International Conference on Smart Technologies for Smart Nation (SmartTech-
Con), pp. 30–33. IEEE (2017)
17. Cui, Z., Cao, Y., Cai, X., Cai, J., Chen, J.: Optimal LEACH protocol with modified bat
algorithm for big data sensing systems in Internet of Things. J. Parallel Distrib. Comput. 132,
217–229 (2019)

18. Devika, G., Ramesh, D., Karegowda, A.G.: Swarm intelligence-based energy‐efficient
clustering algorithms for WSN: overview of algorithms, Analysis, and Applications. In:
Swarm Intelligence Optimization, pp. 207–261 (2020)
19. Tamtalini, M.A., El Alaoui, A.E.B., El Fergougui, A.: ESLC-WSN: a novel energy efficient
security aware localization and clustering in wireless sensor networks. In: 1st International
Conference on Innovative Research in Applied Science, Engineering and Technology
(IRASET), pp. 1–6. IEEE (2020)
20. Sharma, N., Gupta, V.: Meta-heuristic based optimization of WSNs energy and lifetime-a
survey. In: 10th International Conference on Cloud Computing, Data Science & Engineering
(Confluence), pp. 369–374. IEEE (2020)
21. Yuvaraj, D., Sivaram, M., Mohamed Uvaze Ahamed, A., Nageswari, S.: An efficient Lion
optimization based cluster formation and energy management in WSN Based IoT. In:
Vasant, P., Zelinka, I., Weber, G.-W. (eds.) ICO 2019. AISC, vol. 1072, pp. 591–607.
Springer, Cham (2020). https://doi.org/10.1007/978-3-030-33585-4_58
22. Mitiku, T., Manshahia, M.S.: Fuzzy logic controller for modeling of wind energy harvesting
system for remote areas. In: Vasant, P., Zelinka, I., Weber, G.-W. (eds.) ICO 2019. AISC,
vol. 1072, pp. 31–44. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-33585-4_4
23. Al-Husain, E., Al-Suhail, G.: E-FLEACH: an improved fuzzy based clustering protocol for
wireless sensor network. Iraqi J. Electr. Electron. Eng. 17, 190–197 (2021)
Utilization of Self-organizing Maps for Map
Depiction of Multipath Clusters

Jonnel Alejandrino1, Emmanuel Trinidad2, Ronnie Concepcion II3, Edwin Sybingco1,
Maria Gemel Palconit1, Lawrence Materum1, and Elmer Dadios3

1 Department of Electronics and Computer Engineering, De La Salle University,
2401 Taft Avenue, 1004 Manila, Philippines
{jonnel_alejandrino,edwin.sybingco,maria_gemel_palconit,lawrence.materum}@dlsu.edu.ph
2 Department of Electronics Engineering, Don Honorio Ventura State University,
2001 Bacolor, Philippines
ettrinidad@dhvsu.edu.ph
3 Department of Manufacturing Engineering and Management, De La Salle University,
2401 Taft Avenue, 1004 Manila, Philippines
{ronnie.concepcion,elmer.dadios}@dlsu.edu.ph

Abstract. Clustering of multipath components (MPC) simplifies the analysis of


the wireless environment to produce the channel impulse response which leads
to an effective channel model. Automatic clustering of the MPC has been uti-
lized as a replacement to the traditional manual approach. The arbitrary nature of
MPC that interacts with the surrounding environment still challenges wireless
researchers to utilize algorithms that are fitted based on the measured data of the
channel. For enhancing the clustering process, visualization plays a considerable
part in inferring knowledge in the dataset and the clustering results. Hence, the
combination of the automatic and manual approach in clustering enhances the
process, leading to efficient and accurate extraction of the clusters using visu-
alization. Self-Organizing Map (SOM) has been proven helpful in aiding the
clustering and visualization in different fields process which can be combined to
form a hybrid system in clustering problems. In this paper, the investigation of
the effectiveness of SOM in visualizing the MPC extracted from the COST2100
channel model (C2CM) and visualize clustering tendencies of the dataset.

Keywords: Clustering · Multipath components (MPC) · Visualization · Self-organizing maps

1 Introduction

Wireless devices grow exponentially along with the demand for higher data rates,
reliability, and massive connectivity, which imposes stringent standards to be met by
wireless designers [1]. These demands can be met by exploiting the spatial
domain of the wireless system through multiple-input multiple-output (MIMO)
antenna systems. In designing and simulating a radio system, channel models are used
to represent the environment and to assess the effect of multiple obstructions on the


propagated signals. Highly efficient wireless communication systems are evaluated
first by simulating the environment using channel models, which lessens the need
to build the systems for testing. Cluster-based channel models have been proposed
for data analysis and low complexity; they cluster the measured values to obtain the
channel impulse response (CIR). Clustering can be seen as an unsupervised task in the
field of machine learning due to the absence of target labels at the output, and it
arranges the dataset based on the similarity of features. Many methods have been
used to cluster wireless multipath components; the techniques are based on
algorithms that extract significant features of the MPC.
Studies and measurements in channel modeling have concluded that MPC arrives
in clusters [2]. These phenomena are the basis of producing cluster-based channel
models. Many measurement campaigns have been proposed to extract the clustering
structure of MPC. The traditional approach uses a manual identification of clusters by
visual means [3]. The manual approach can be practical if only small amounts of data
are gathered. However, in the case of multipaths, especially in urban environments,
the manual approach becomes tedious, and clusters are hard to distinguish visually
due to overlapping data points. Another drawback of manual clustering is the
subjective nature of the approach, which may lead to different interpretations. The
MIMO angular features of each MPC result in high-dimensional data, with typically
5 to 7 dimensions needed to evaluate each MPC. Czink et al. [4] proposed a
framework for automatically clustering the MPC using k-means that enhances the
clustering process compared with the manual approach.
The automatic framework yielded rich developments and investigations of different
clustering algorithms to use in clustering MPCs. However, clustering algorithms pose
problems such as the initialization process of determining the optimal number of
clusters. Furthermore, automatic clustering using algorithms lessens the subjective
nature but limits the physical interpretation of the clustered MPC data. In addition, the
human-in-the-loop process has been proposed in the literature [5, 6], where the human
intervention in different parts of the clustering process has been reported useful.
Interactivity makes the human domain knowledge be applied and effectively interpret
the clustering result or the data’s inherent structure. For inferring knowledge before-
hand, visualization is an efficient tool to represent and reveal hidden structures of data
[7]. In the past decade, relevant techniques have been proposed to overcome some
drawbacks of using SOM. In [8], the SOM is modified to consider the winning fre-
quency of each neuron in the map. The quantization and topographic error are used to
evaluate their proposed modified SOM. Furthermore, to reduce the randomization of
the initial weights, a proposed augmentation that considers the selection of weights is
studied in [9]. Also, the utilization of SOM to find the number of clusters in [10] has
been done. A protocol independent approach was proposed in [11] that utilized the
clustering of the SOM that reduces computational costs. Relative to the two-step
approach, the initial value of SOM was used as an input to the K-means clustering
algorithm to reduce the initial value problem of the latter algorithm [12]. The number of
clusters is usually predetermined in many clustering algorithms. With the proposed
approach in [13], the number of clusters was derived from the topology based on a
polar SOM.

Lastly, an unsupervised image classification problem in [14] is addressed using a
modified two-layer SOM, considered a Deep SOM, whose extension E-DSOM is
shown to have results comparable with autoencoders. The above-mentioned literature
shows that the SOM can be of significant aid in different stages of unsupervised
learning problems, and various techniques have been proposed to aid the
visualization of datasets.
The rest of this paper is organized as follows. Section 2 briefly reviews related
works in the clustering process of MPCs and some techniques that aid the SOM.
Section 3 presents the dataset used and the procedure of the SOM. The results are
presented and analyzed in Sect. 4. Finally, the paper is concluded in Sect. 5.

2 Multipath Clustering

Beyond models for wireless sensor networks based on ANN and SVM [2], many channel
models have been standardized and used in testing wireless systems. Cluster-based
channel models have caught the attention of researchers, leading to different approaches
for clustering MPCs. A middle-ground approach is proposed in [15] to perform
automatic and manual clustering of the MPC. Through the use of MIMO, a hybrid data
acquisition model has been proposed by Alejandrino et al. [16], and the angular
properties of the MPC have also been extracted. To date, no single algorithm
outperforms the others, owing to the stochastic nature of the measured data, which
varies from one environment to another; the clustering of MPCs remains a challenge for
wireless system engineers. A channel model for an indoor scenario with two additional
clustering metrics, namely the MPC length and the arrival interval, was proposed in
[17]; adding these parameters to the clustering domain introduces additional
dimensionality into the clustering process. The work in [18] introduces a score-fusion
technique using five cluster validity indices (CVIs) to obtain the optimal number of
clusters in a simulated urban environment, with the MPC data fed into the K-means
algorithm. A comparison of clustering algorithms is shown in [19], where K-Power
means was seen to have the most accurate performance. A comparative study of
signal-processing and AI-based techniques is shown in [20], where the proposed
algorithm shows an improvement proportional to the increasing number of clusters.
Several techniques have been developed to visualize data, from the traditional
scatter plot to sophisticated techniques such as the application-based cluster and
connectivity-specific routing protocol [21]. Dimensionality-reduction techniques have
been utilized to project the MPC and visualize it in a scatterplot matrix. Reducing the
dimension of high-dimensional datasets lowers the computational cost and provides a
clearer picture through several visualizations. Different algorithms have formed a
broad scope of research due to the vast number of measurement campaigns, which
differ from one environment to another. The visualization of measured data and
clustered results has been limited to traditional 2D and 3D scatterplots. Hence, this
paper aims to apply the SOM to visualize all the parameters in a topology-preserving map.

3 SOM and Dataset

This section describes the advantages of SOM in clustering applications and the
acquisition of the dataset. SOM is utilized in this proposed visualization because its
approach is closely related to multipath clustering when compared with other
artificial neural networks. SOM also has the potential to visualize the clustering and
the mapping of topology-based components of multipath communication. Data are
acquired through a series of modeling and manipulation steps. The topology is
described by configuring the map through its indicated neurons. The model vectors
are initialized, updated, and shifted toward the best-matching neuron for each given
input vector. The resulting SOM serves as a map that captures the dataset and draws
neighboring groups of points together. The SOM algorithm, the cluster-based model
used, and the data acquisition model are elaborated below.

3.1 SOM
SOM has been widely used in different fields for clustering and exploratory data
analysis [23, 24]. The learning is identified as competitive and cooperative as opposed
to the error-correcting nature of the artificial neural networks. Connections of the
weights are based on the number of features n of the dataset vector, as shown in Fig. 1.
The SOM, proposed by Teuvo Kohonen and also called the Kohonen network, is an
unsupervised competitive neural network that aims to represent high-dimensional
patterns on a lattice topology, usually with a rectangular or hexagonal structure. The
structure dictates the number of neighboring points of each neuron, with four and six
neighbors for rectangular and hexagonal structures, respectively.
SOM can also be seen as a topology-based clustering method and provides a map-
ping that aids the visualization of cluster structures in the dataset [25]. One example is in
agricultural application of wireless connectivity [26]. The neighborhood function pre-
serves the topology, and the data points can be projected into 2D space, which can be
easily visualized. Two advantages of using SOM for clustering and visualization are,
first, that the clustering structure of the dataset can be visualized and, second, that the
distribution of the data can be visualized [25]. Essentially, the learning process of the
SOM consists of the competition for the best matching unit (BMU), cooperation
among neighboring neurons, and the weight update.
The initialization of a map describes its topology, where the number of neurons is
indicated. The model vectors are initialized and then updated by being moved toward
the BMU or winner neuron c for each input vector x. The initialized SOM map can be
seen as a net that captures the structure of the dataset, organizing itself to match and
close in on neighboring points. The SOM algorithm can be summarized as follows [24].
1. Initialize random values of the weights w_i.
2. Find the winning neuron c at time t using the Euclidean norm:

c = \arg\min_i \lVert x(t) - m_i(t) \rVert

where x = [x_1, \ldots, x_M] \in \mathbb{R}^M

Fig. 1. SOM architecture [22]

3. The weights of the winner neuron and neighbors are updated using:

m_i(t+1) = m_i(t) + h_{ci}(t)\,[x(t) - m_i(t)]

where t is the learning step and h_{ci}(t) is the neighborhood function.
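
The following minimal NumPy sketch illustrates the three steps above (random initialization, BMU search, and Gaussian-neighborhood weight update), together with the quantization error Qe used later to evaluate the map. It is a toy stand-in written for illustration, not the MATLAB SOM Toolbox configuration used in the experiments; all function and parameter names are our own.

```python
# Minimal sequential SOM training loop (illustrative stand-in).
import numpy as np

rng = np.random.default_rng(0)

def train_som(X, rows=10, cols=10, steps=1000, lr0=0.5, sigma0=3.0):
    n_features = X.shape[1]
    W = rng.random((rows * cols, n_features))            # step 1: random weights
    grid = np.array([(i, j) for i in range(rows) for j in range(cols)], float)
    for t in range(steps):
        x = X[rng.integers(len(X))]
        c = np.argmin(np.linalg.norm(W - x, axis=1))     # step 2: BMU c
        d2 = np.sum((grid - grid[c]) ** 2, axis=1)       # lattice distance to BMU
        sigma = sigma0 * np.exp(-t / steps)              # shrinking neighborhood
        h = np.exp(-d2 / (2 * sigma ** 2))               # Gaussian h_ci(t)
        lr = lr0 * np.exp(-t / steps)
        W += lr * h[:, None] * (x - W)                   # step 3: weight update
    return W, grid

def quantization_error(X, W):
    """Qe: mean distance from each sample to its BMU's weight vector."""
    return np.mean(np.min(np.linalg.norm(X[:, None, :] - W[None], axis=2), axis=1))
```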


To evaluate the performance of the map, the quantization error Qe and the
topographic error Te are used [8]. For the neighborhood function, the Gaussian
function is the most common choice. The unified distance matrix (U-matrix) proposed
by Ultsch [24] is used as a visualization technique to show the boundaries between
clusters of data on the SOM. It is obtained by calculating the Euclidean distance
between neighboring neurons to reveal the local structure of the data, and the
visualization uses color-coding schemes to show the distance between neurons. Hit
(size) markers represent the distribution of the data over the neurons.
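
A U-matrix as just described can be computed directly from the trained weight grid: each cell holds the mean Euclidean distance from a neuron to its immediate lattice neighbors, so bright (large) values mark cluster boundaries. The sketch below assumes a rectangular lattice with a 4-neighborhood and the weights reshaped to (rows, cols, n_features); it is illustrative only.

```python
# Minimal U-matrix computation over a rectangular SOM lattice.
import numpy as np

def u_matrix(W_grid):
    rows, cols, _ = W_grid.shape
    U = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            dists = []
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    dists.append(np.linalg.norm(W_grid[i, j] - W_grid[ni, nj]))
            U[i, j] = np.mean(dists)  # bright = far from neighbors = boundary
    return U
```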

3.2 COST2100 Channel Model


The cluster-based channel model has gained attention over the past decade. The
European Cooperation in Science and Technology (COST) has developed the C2CM
[21] which covers the parameters of MPC in the azimuth and elevation angles. In
addition, the stochastic nature of MIMO channels can be reproduced alongside the
multi-link and single link properties. The extracted MPC features are stacked as a
vector containing the parameters such as azimuth of arrival and departure, the elevation
of arrival and departure, delay, and power for each MPC. The vector can be represented
as x = [\tau, \theta_{AOA}, \varphi_{AOA}, \theta_{AOD}, \varphi_{AOD}]. Hence, the
measurements can be stacked into a matrix X corresponding to one snapshot of the
measurements. The 5-dimensional feature vectors can be normalized, transformed, and
fed to clustering algorithms. The semi-urban non-line-of-sight scenario for a single
link is used in this study due to

the huge amount of MPC produced in one snapshot. The extracted data consists of
1500 MPC, each with the corresponding features azimuth and elevation of arrival and
departure, the delay, and the relative power. However, the relative power is truncated
as a feature in the visualization process. Each MPC also has a corresponding cluster ID
that serves as the ground truth for validation. The dataset of one snapshot consists of 20
clusters with corresponding distributions.
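
As an illustration of the data layout just described, the sketch below stacks one snapshot of 1500 MPCs into a 1500 × 5 matrix X (delay plus the four angular features, with power truncated) and applies a z-score normalization before clustering or SOM input. The synthetic draws merely stand in for the actual C2CM output; the chosen distributions and ranges are assumptions for illustration.

```python
# Minimal sketch of stacking and normalizing one snapshot of MPC features.
import numpy as np

rng = np.random.default_rng(1)
n_mpc = 1500
# columns: [tau, theta_AOA, phi_AOA, theta_AOD, phi_AOD]  (synthetic stand-in)
X = np.column_stack([
    rng.exponential(100e-9, n_mpc),              # delay tau (s)
    rng.uniform(-np.pi / 2, np.pi / 2, n_mpc),   # elevation of arrival
    rng.uniform(-np.pi, np.pi, n_mpc),           # azimuth of arrival
    rng.uniform(-np.pi / 2, np.pi / 2, n_mpc),   # elevation of departure
    rng.uniform(-np.pi, np.pi, n_mpc),           # azimuth of departure
])
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)    # z-score per feature
```

Normalizing per feature keeps the delay (in seconds) from being dwarfed by the angular features (in radians) when Euclidean distances are computed during clustering or SOM training.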

4 Visualization and Analysis

The SOM Toolbox is used to implement the experiment in this paper, as suggested
in [23]. The experiment is conducted as follows and depicted in Fig. 2. The dataset
serves as input in MATLAB, followed by the initialization of the map structure.
The dataset is projected first at initialization, alongside the U-matrix and the
corresponding parameters (Fig. 2).

Fig. 2. Procedure of the experiment

The first step is to initialize the dataset and the topology of the map. The random
initialization is projected in Fig. 3, where the scattered points show the untrained SOM.
In the proposed map topology, the number of neurons in the grid is set equal to the
number of MPCs to provide better resolution. Concerning the training process, the
batch algorithm is preferred over sequential training for computational efficiency. The
rough tuning computes the global structure of the map, and the fine-tuning then
proceeds with the neighborhood exploration of the map. Experimentally, the number of
iterations was varied over 30, 500, 1000, and 1500 steps. Training with 1500 iterations
produced more errors than with 1000 iterations, resulting in overtraining of the map;
hence, the map with 1000 steps for both tuning stages, which has the lower topographic
and quantization errors, is used. The quantization error is Qe = 0.501 and the
topographic error is Te = 0.37. After the optimal reduced error is found, the U-matrix is
presented alongside the parameters in Fig. 4, where the features are also seen to be
organized according to their weights.
The U-matrix and hits are also computed and projected, as shown in Fig. 5. Visual
inspection of the map shows the boundaries between clusters, where brighter colors
indicate a larger distance between neurons and darker colors indicate neurons closer to
each other. Figure 5 also visualizes the distribution of data points over the nodes, and
it can be observed that the boundaries have empty hits, colored red. A larger marker
means more MPCs in that node. In addition, the hits agree with the number of clusters
in the ground-truth data, which consists of 20 clusters.

Fig. 3. Random initialization of the SOM

Fig. 4. SOM after rough and fine tuning

The use of SOM in visualizing data exploits all the features of each MPC, since the
connection of the input vector to the nodes is dictated by the number of parameters.
However, the iterative process of reducing the errors and the training steps can be
shortened by utilizing variants of the SOM, which are also considered for future
investigation.

Fig. 5. HITS projected in the U-matrix

5 Conclusion

Visualization of measured data has been widely used in revealing the cluster structure
of data. In this paper, the SOM is utilized to analyze its performance in visualizing the
clusters of MPC. The manual approach’s laborious process and subjective nature can
be overcome using the SOM visualization of the MPCs in which the U-matrix reveals
cluster boundaries effectively. As the wireless propagation environment becomes
complex, the need for such visualization assistance can be of great use to show cluster
structure for CIR extraction. By visualizing the data, knowledge can be inferred before
utilizing algorithms that can increase computational costs. In addition, visualization can
also evaluate clustering results, and the advantages of both the automatic and manual
approaches can be combined efficiently.

References
1. Series, M.: Minimum Requirements Related to Technical Performance for IMT-2020 Radio
Interface(s) Report 2410-0 (2017)
2. Alejandrino, J., Concepcion II, R., Lauguico, S., Palconit, M.G., Bandala, A., Dadios, E.:
Congestion detection in wireless sensor networks based on artificial neural network and
support vector machine. In: 12th International Conference on Humanoid, Nanotechnology,
Information Technology, Communication and Control, Environment, and Management
(HNICEM), pp. 1–6. IEEE (2020)

3. Oestges, C., Clerckx, B.: Modeling outdoor macrocellular clusters based on 1.9-GHz
experimental data. IEEE Trans. Vehicular Technol. 56(5), 2821–2830 (2007)
4. Czink, N., Cera, P., Salo, J., Bonek, E., Nuutinen, J., Ylitalo, J.: A framework for automatic
clustering of parametric MIMO channel data including path powers. In: Vehicular
Technology Conference, pp. 1–5. IEEE (2006)
5. Keim, D.A.: Information visualization and visual data mining. Trans. Visual. Comput.
Graph. 8(1), 1–8 (2002)
6. Concepcion, R., II., dela Cruz, C.J., Gamboa, A.K., Abdulkader, S.A., Teruel, S.I., Macaldo,
J.: Advancement in computer vision, artificial intelligence and wireless technology: a crop
phenotyping perspective. Int. J. Adv. Sci. Technol. 29(6), 7050–7065 (2020)
7. Chen, W., Guo, F., Wang, F.: A survey of traffic data visualization. Trans. Intell.
Transp. Syst. 16(6), 2970–2984 (2015)
8. Chaudhary, V., Ahlawat, A., Bhatia, R.S.: An efficient self-organizing map learning
algorithm with winning frequency of neurons for clustering application. In: 3rd International
Advance Computing Conference (IACC), pp. 672–067. IEEE (2013)
10. Mishra, M., Behera, H.: Kohonen self organizing map with modified K-means clustering for
high dimensional data set. Int. J. Appl. Inf. Syst. 2(3), 34–39 (2012)
11. Alejandrino, J., et al.: Protocol-independent data acquisition for precision farming. J. Adv.
Comput. Intell. Intell. Inf. 25(4), 397–403 (2021)
12. Wang, H., Yang, H., Xu, Z., Zheng, Y.: A clustering algorithm use SOM and K-means in
intrusion detection. In: International Conference on E-Business and E-Government,
pp. 1281–1284 (2010)
13. Xu, L., Chow, T., Ma, E.: Topology-based clustering using polar self-organizing map. Trans.
Neural Netw. Learn. Syst. 26(4), 798–808 (2015)
14. Wickramasinghe, C.S., Amarasinghe, K., Manic, M.: Deep self-organizing maps for
unsupervised image classification. IEEE Trans. Indust. Inf. 15(11), 5837–5845 (2019)
15. Materum, L., Takada, J., Ida, I., Oishi, Y.: Mobile station spatio-temporal multipath
clustering of an estimated wideband MIMO double-directional channel of a small urban 4.5
GHz microcell. EURASIP J. Wirel. Commun. Netw. 2009, 1–16 (2009)
16. Alejandrino, J., Concepcion, R., Almero, V.J., Palconit, M.G., Bandala, A., Dadios, E.: A
hybrid data acquisition model using artificial intelligence and IoT messaging protocol for
precision farming. In: 12th International Conference on Humanoid, Nanotechnology,
Information Technology, Communication and Control, Environment, and Management
(HNICEM), pp. 1–6. IEEE (2021)
17. Li, J., Ai, B., He, R., Yang, M., Zhong, Z., Hao, Y.: A cluster-based channel model for
massive MIMO communications in indoor hotspot scenarios. Trans. Wirel. Commun. 18(8),
3856–3870 (2019)
18. Moayyed, M.T., Antonescu, B., Basagni, S.: Clustering algorithms and validation indices for
mmWave radio multipath propagation. In: Wireless Telecommunications Symposium
(WTS), pp. 1–7. IEEE (2019)
19. Teologo, A.: Cluster-wise Jaccard accuracy of KPower means on multipath datasets. Int.
J. Emerg. Trends Eng. Res. 7, 203–208 (2019)
20. Ladrido, J.M., Alejandrino, J., Trinidad, E., Materum, L.: Comparative survey of signal
processing and artificial intelligence based channel equalization techniques and technologies.
Int. J. Emerg. Trends Eng. Res. 7(9), 31–322 (2019)
21. Alejandrino, J., Concepcion, R., Lauguico, S., Flores, R., Bandala, A., Dadios, E.:
Application-based cluster and connectivity-specific routing protocol for smart monitoring
system. In: 12th International Conference on Humanoid, Nanotechnology, Information
Technology, Communication and Control, Environment, and Management (HNICEM),
pp. 1–6. IEEE (2020)

22. Palamara, F., Piglione, F., Piccinin, N.: Self- organizing map and clustering algorithms for
the analysis of occupational accident databases. Saf. Sci. 49(8), 1215–1230 (2011)
23. Kohonen, T.: Essentials of the self-organizing map. Neural Netw. 37, 52–65 (2013)
24. Krak, I., Barmak, O., Manziuk, E., Kulias, A.: Data classification based on the features
reduction and piecewise linear separation. In: Vasant, P., Zelinka, I., Weber, G.-W. (eds.)
ICO 2019. AISC, vol. 1072, pp. 282–289. Springer, Cham (2020). https://doi.org/10.1007/
978-3-030-33585-4_28
25. Shieh, S.-L., Liao, I.-E.: A new approach for data clustering and visualization using self-
organizing maps. Expert Syst. Appl. 39(15), 11924–11933 (2012)
26. Concepcion, R.S., II., et al.: Adaptive fertigation system using hybrid vision-based lettuce
phenotyping and fuzzy logic valve controller towards sustainable aquaponics. J. Adv.
Comput. Intell. Intell. Inf. 25(5), 610–617 (2021)
Big Data for Smart Cities and Smart Villages:
A Review

Tajnim Jahan, Sumayea Benta Hasan, Nuren Nafisa, Afsana Akther Chowdhury,
Raihan Uddin, and Mohammad Shamsul Arefin

Chittagong University of Engineering and Technology, Chittagong, Bangladesh
sarefin@cuet.ac.bd

Abstract. An urgent need remains for cities to get smarter in order to handle
large-scale urbanization and to find new methods to manage complexity, increase
efficiency, and improve the quality of life. With the progress of urbanization,
urban management is facing a series of challenges in this new state of affairs.
The smart city, a modern form of municipal construction, has flourished
gradually under the rapid development of new intelligence technologies. Big data
technology serves as important support for the construction and betterment of
smart cities. Therefore, by reviewing forty papers, this research presents the
characteristics of smart cities as well as smart villages, analyzes the application of
big data technologies in smart city and smart village design, and shows findings
that could be used by researchers for further research.

Keywords: Big Data · Internet of Things · Smart city · Smart village · Smart sensors · Cloud computing · Networks

1 Introduction

Due to the current progress in Information and Communication Technology, the idea of
the Smart City has become a promising avenue for improving the quality of everyday
urban life. Technology has been exploited to improve access to public transport,
traffic management, water optimization, and power delivery, and to enhance
regulatory services, schools, hospitals, etc. These systems generate a
large amount of data for analytical purposes. Cities and urban regions provide a
different way of life than rural areas. There are established entities identifying this
flourishing area of cross-disciplinary practice, and they point to conventional,
technical, financial, and political factors which keep evolving, thus offering
opportunities for further refinement of the idea of the smart city [1]. For
sustainability, it is a heavy task to conduct a study and produce a proposal for such
extensive areas of city coverage, as well as for the differences within society [7].
Smart objects are being deployed to support and enable an enlarged range of
solutions based on cellular architectures and wireless sensor networks. A few
examples of smart objects in our daily life are smartphones, watches, tablets,
clinical devices, smart TVs and cars, security networks, building automation
sensors, and access control systems [10].


In smart villages, networks and services are continuously improved as a result of
upgraded telecommunication technologies, innovations, and the better use of
knowledge for the benefit of inhabitants and businesses. One of the most important
findings on the countryside's infrastructure is in the scope of energy; another is
apparent in digital architecture and skills [5]. Building a smart city or village
requires several components; Fig. 1 shows the basic components of smart cities.
Turning to big data: the term refers to amounts of data so immense that traditional
data-processing applications are not capable of capturing and processing them and
presenting the outcomes within a suitable period [7]. Big data encompasses a broad
variety of data types, including structured, unstructured, and semi-structured data.
Using big data for smart cities and villages is a very challenging task because
planning a smart city is a balancing act. Some of the challenges include data
mobility, data security, data integrity, data volume, cost, and data validity. The
problem is difficult to define because there are so many parts and components to
consider.

Fig. 1. Components of Smart City

Basically, big data technologies are the advanced software systems that incorporate
data sharing, data mining, data storage, and data visualization. The broad term spans
data and data frameworks and includes the tools and techniques used to examine and
transform data.
'Ahab', a distributed and cloud-based stream-processing architecture, has been
proposed in [11]; it offers a consolidated way for operators to better understand and
enhance the managed infrastructure in response to data from the underlying
architecture resources, i.e., the connected IoT devices, the runtime edge
configuration, and the implementation environment of the overall application. In [22],
time-series analysis has been used in urban studies while considering diverse
environmental impacts; analysis of the time series helps in resolving the past, which
serves to warn about the future. In [39], a NoSQL database has been used; NoSQL
incorporates a wide range of distinct database technologies that are being developed
for designing modern applications, and a non-relational (non-SQL) database
delivering a process for the procurement and retrieval of data has been depicted. A
multi-agent method using data mining was developed by the authors of [19] for
addressing the tasks of gathering and processing sensor data. Different technologies
for different purposes in big data platforms are used in the reviewed papers, such as
APIs, Navitia.io, CitySDK, SPARQL queries, R programming, predictive analytics,
blockchain, etc.
This paper reviews forty papers on sustainable smart cities and smart villages that
include applications of big data. In the following, we discuss the methodology used
to prepare this review, then present the details of the collected papers, and finally
give an overview of their findings.

2 Methodology

A systematic review is a survey of the evidence on a clearly formulated question that
uses systematic and explicit methods to identify, select, and critically appraise
relevant primary research. It is a review of the details of previous research. The
following systematic review gives the guidelines used to conduct the survey and
extract the findings in an organized way.

2.1 Phase 1- Planning


This section describes how the relevant papers were selected (database, search terms,
and inclusion/exclusion criteria). The literature was sourced from prestigious
publishers such as Springer, ACM, MDPI, Elsevier, IEEE Access, Wiley, and
Taylor & Francis. The following search terms were used in this review: “big data in
smart city”, “big data in urban area”, “big data in sustainable smart city”, “big data
analysis in smart cities”, “big data framework for smart community”, “smart cities
and smart villages research”.

2.2 Phase 2- Conducting


This section focuses on how the papers were checked. The papers were strictly
screened for reliability and validity before being accepted as final sample papers for
review. The chosen papers were carefully considered against the aim of the research.

2.3 Phase 3- Reporting


After this strict screening, 40 relevant research papers were chosen for review. The
papers are categorized into five types for orderly evaluation: concept-based,
framework-based, web-analysis-based, data-analysis-based, and technology-based
research. The categorized evaluation identifies the contributions, the process of
work, and the flaws of each study.

3 Paper Collection

This section discusses the collection of research papers for our work. Many papers
have been published on big data in smart cities and villages, but our search found
available research starting from the year 2014; hence, we start reviewing papers from
that year. Because few papers are available for 2014, 2015, and 2016 individually,
these years were merged. In total, we gathered forty papers across the five categories
mentioned in Sect. 2.3, over the time scale 2014–2016, 2017, 2018, and 2019–2020.
Table 1 shows the considered papers of those years at a glance.

Table 1. Evolutions of big data for smart cities and smart villages

| Category | 2014–2016 | 2017 | 2018 | 2019–2020 |
| Conceptual research | Vision and paradigms [13] | Sustainability Union [16] | Research in Europe [8] | Urban Health [15] |
| | Transformations of studies [32] | Digital economy [35] | Systematic review [28] | Big Rhetoric in Toronto's [36] |
| | | | Synergistic Approach [24] | Case Study Shanghai [14] |
| | | | Monitoring system [22] | Development [5] |
| Framework based research | Cloud-based [11] | Knowledge based [18] | IoT data analytics [2] | Smart road monitoring [19] |
| | Reasoning from attractors [1] | IoT for sustainability [21] | Decision management [17] | API deployment [37] |
| | | Analytics framework [25] | Data mining [26] | |
| | | Smart Urban Planning [27] | Redefining Smart City [23] | |
| | | | IoT and Big Data [40] | |
| Data analysis based research | SmartSantander testbed [10] | Integrating multi-source [31] | Data quality perspective [29] | Multimedia data [30] |
| | | | | Transit-oriented development [34] |
| Web analysis based research | Big Data platform [39] | | | Enhancing pedestrian mobility [38] |
| Technology based research | Connected communities [4] | Healthcare data processing [20] | Processing and analysis [12] | AI and Big Data [9] |
| | Data analysis [3] | Multisource geospatial [33] | | University Campus [7] |
| | | | | Cities and Villages Research [6] |

4 Detailed Review of Papers

This section focuses on the contribution, dataset, implementation and evaluation of


each paper. Table 2 represents the conceptual research where authors have presented
their thoughts on smart city using big data.

Table 2. Conceptual research

| Paper title | Contribution | Dataset | Evaluation |
| Vision and paradigms [13] | Integrated IoT ecosystems vision | DBpedia dataset | Development of novel services |
| Transformations of studies [32] | Transformation on urban studies | OSM dataset, POI data, GPS records, travel survey | Transformations of urban studies in China |
| Research in Europe [8] | Conceptual boundaries | Smart cities and villages research | Research has been promoted |
| Urban Health [15] | Advances beyond infrastructure | Temperature, humidity, wind speed and direction, rainfall, health | Sharpen the efficiency and livability |
| Sustainability Union [16] | Acknowledge the obstacles | Examines different definitions | Sustainability in technological endeavors |
| Systematic Review [28] | Data harvesting and mining processes | Historical, survey and local demographics | Data produced, stored, mined, and visualized |
| Synergistic Approach [24] | Intelligence given to a city UPTS | World population growth between 1950–2100 | Improve data gathering related to the UPTS |
| Monitoring System [22] | Integrated Environmental Monitoring System | Temperature, relative humidity, CO, SO2, UV index, and noise | Demonstrate effects on environment |
| Big Rhetoric in Toronto's [36] | Benjamin Bratton's Stack Theory | Google LLC, personal, governance, environmental | Stephen Graham's phenomenon of ubiquitous computerized matrix |
| Digital economy [35] | Applications in digital economy | Concepts of 13 papers | Papers concept |
| Case Study Shanghai [14] | Overcome the issue of check-in distribution | 71 green parks locations | In-depth experimental analysis |
| Development [5] | Accelerated the community | Energy, mobility, waste | Deconstructed popular features |

In [14], the authors carried out an in-depth experimental analysis of check-in habits, using intensity maps and norms derived from LBSN data (seven districts of Shanghai). After processing the collected Weibo data, 71 green park locations were chosen. The authors' focus was on characterizing the check-in distribution of visitors across various green spaces.

The authors of [5] inspected popular features of smart villages and cities, sustainability and community, and brought them together to increase the value of community-centred development (CCD) in applied development projects, products and services. A new community-centered method is suggested for development, highlighting that sustainable living cannot be achieved by technological solutions alone.
The authors of [36] considered data from Sidewalk Labs, an affiliate of Google LLC, took personal, governance and environmental data, and introduced Benjamin Bratton's Stack Theory as an approach for conceptualizing the logics of smart cities specifically and digital capitalism more generally.
The research in [15] advances the need to enlarge the notion of Big Data beyond infrastructure to encompass urban health, thus serving a more compatible set of data that may lead to knowledge about the connectivity of the community with the city and how this relates to the theme of urban health.
In [22] the researchers gave an evidence-based case study demonstrating the effects of a few factors, including that data calibration conducted outdoors is not highly rigorous, that the cost of the approach is appropriate for collecting onsite data but may require cost reduction for mass production in a scale-up project, and lastly that the battery life of the device generally lasted for 6 to 8 h.
Table 3 represents the framework based research. In [19] a multi-agent method was established by the authors to address the tasks of collecting and processing sensor data, and a convergent model for big sensor data processing was offered that comprises three layers using fog, mobile and cloud computing technologies.
The authors developed a structure in [37] that ensures interoperability of historical energy data, real-time data and online data for managing district energy, providing energy information to facilitate prosumption operations. The authors of [2] presented a multi-tier fog computing model with analytics services, in an environment of Raspberry Pi and Spark, for smart city applications. In the multi-tier fog computing model they considered both ad-hoc fogs with opportunistic computing resources and dedicated fogs with dedicated computing resources.
The authors established a Big Data analytics (BDA) embedded framework in [17], which serves two main purposes. First, it simplifies the exploitation of urban Big Data (UBD) for planning, modeling and maintaining smart cities. Second, it employs BDA to manage and process massive UBD to improve the quality of urban services.
In [40] researchers proposed a framework which utilizes the Hadoop ecosystem with Spark on top to process large amounts of information. Previous massive studies have been brought together for smart cities and sustainable cities by [26], which also includes research directed at a more conceptual, theoretical and overarching level.

Table 3. Framework based research

| Paper title | Contribution | Dataset | Evaluation |
| Cloud-based [11] | Able to autonomously optimize applications | User API helps to provide data | Able to autonomously optimize |
| IoT data analytics [2] | Multi-tier fog computing model | Five datasets | QoS aware resource management schemes |
| Reasoning from attractors [1] | Introduce the principle of attractors as a novel paradigm | Attractor types, SCN context | Multidimensionality in identifying multifunctional communities |
| Decision management [17] | Facilitate exploitation | Traffic, parking lots, pollution, water consumption | Integrate data normalizing and filtering techniques |
| Knowledge based [18] | Defined an architecture | Mobility, energy, health, food, education, weather forecast | Architecture and solution have been presented |
| Smart road monitoring [19] | Convergent model | Traffic flow, number of road accidents, temperature indicators | Multi-agent approach developed |
| IoT for sustainability [21] | Augmenting the informational landscape | Environment, waste-water management, traffic, transport, buildings, energy, mobility | Brings together a large number of previous studies |
| Analytics framework [25] | Use profits of data to improve the quality of life | Sensors, detectors, GPS, chip cards, social media | Build sustainable city |
| Data mining [26] | Enhanced insights and better-informed decision-making | Selects 14 research studies | Intends to develop and discuss a systematic architecture |
| Smart Urban planning [27] | Real-time rational decision making and managing user-centric events | Data from different projects | Data filtration as well as normalization are utilized |
| Redefining Smart City [23] | Dimensions of culture, metabolism, and governance | Data sets from 2004 and 2018 | Redefined paradigm |
| API deployment [37] | Explores the role of APIs | MySQL, Couch and generated metadata | Structure ensures interoperability |
| IoT and Big Data [40] | Build up the smart city | Wireless gadgets, climate and water, activity datasets | Enormous amount of information |

Table 4. Data analysis based research

| Paper title | Contribution | Dataset | Evaluation |
| SmartSantander testbed [10] | Correlates temperature, traffic, seasons and the working days | SmartSantander | Analyzed even when bursty behaviors are present |
| Data quality perspective [29] | Ensures potentiality of data | IEEE research papers | Develops to ensure data quality actions |
| Multimedia data [30] | Processing and management of multimedia data | CCTV surveillance, New York, ICT data sets | System extracts meaningful information |
| Integrating multi-source [31] | Actual land use in big cities | Footprint, taxi dataset, WeChat, and POI and street view data | Precise delineating information |
| Transit-oriented development [34] | Investigates transit-oriented development attributes | OpenStreetMap, a 2017 point-of-interest set, bike-sharing data | Currency and fineness of spatiotemporal grain |

Table 4 represents the data analysis based articles. The authors of [34] investigated ways of exploiting big and open data (BOD) to examine the connections between transit-oriented development (TOD) attributes and the quantitative outcomes of TOD. By exploiting BOD, this study reconfirms the existence of tradeoffs, such as raising the net ratio of frequent riders versus enhancing metro ridership, across different metro station areas. Walkability of metro station areas was measured assuming 30% or 50% car-priority streets and compared with primitive data; BOD has the advantages of currency and fineness of spatiotemporal grain. The survey in [30] covers the processing and management of big multimedia data collected from smart city applications using various machine learning approaches such as SDL and UDL, APIs, and operating systems including Linux and iOS, together with cloud computing.
The work in [31] provides detailed descriptive knowledge of actual land use in Tianhe District, China, and shows the use of multi-source big data to infer actual land use in cities. The authors' method will be particularly helpful for urban planning, allowing planners to identify actual building-level land use in large cities of China as well as in other quickly developing countries.
Table 5 represents the web analysis based articles. [38] provides a foundational argument for applying IoT technologies to boost pedestrian mobility, using an understanding of pedestrian movements to inform future framework development. Microsoft Excel and PowerBI were used for static analysis as well as advanced interactive visualization and analysis. The work also improved pedestrian flow in the Melbourne CBD and drew a clear connection to the trends identified by analyzing the pedestrian data.

Table 5. Web analysis based research

| Paper title | Contribution | Dataset | Evaluation |
| Big Data platform [39] | Fill the gap between the big data platform vision and its realization | Real time data sets, SmartSantander testbed | Handle both historical and real time data |
| Enhancing pedestrian mobility [38] | Pedestrian flow in the Melbourne CBD | 53 specific locations | Foundational argument |

A new platform to realize big data, CiDAP, has been discussed in [39]; it is able to handle historical data as well as real-time data, and it is flexible with different scales of data, although many issues like the security of data and of the system are ignored. The system was deployed and integrated with an ongoing IoT experimental testbed, and it provides a valuable example for designers of future smart city platforms, filling the gap between what a big data platform looks like at a high level and how it must be realized.
Table 6 shows the technology based research evaluation. By encouraging cross-scientific debate on multifaceted challenges, the special issue [6] offers a useful overview of the most recent developments in the multifaceted and frequently overlapping fields of smart cities and smart villages research. The authors deliver a combined discussion of the major issues and challenges related to Smart Cities and Villages Research, together with various soft issues in this scientific domain, including happiness, well-being, security, and safety, while defining the way for future research.
The proposal in [7] combines technologies such as IoT, Hadoop and Cloud Computing in a conventional university campus, basically through the perception of data by the Internet of Things. The distributed and multilevel analysis approaches used here could be a strong starting point for finding a reliable and effective solution for the evaluation of an intelligent environment based on sustainability.
In [12] the authors inspected Internet of Things, Cloud Computing, Big Data and sensor technologies, with a focus on finding their common operations and combining them, and offered new processes for collecting and managing sensor data in a smart building operating in an IoT environment.
The study [33] presented, for the first time, a gravity model of urban buildings and population. The authors built a multi-scale population model to disaggregate census data and achieved a high-accuracy population map at a fine spatial resolution of 25 m.
In [20] the authors proposed the PRIMIO model, which introduces VM migration accounting for user mobility and cloudlet computational load, and estimated the rates of resource over-provisioning caused by VM migration, allowing the whole system to utilize computing resources optimally. Here, the user's mobility and the outlined VM resources in the cloud address the VM migration problem.

Table 6. Technology based research

| Paper title | Contribution | Dataset | Evaluation |
| Connected communities [4] | TreSight, for smart traversing and defendable cultural estate | OpenDataTrentino regarding points of interest, weather, typical restaurants | Context-aware solution |
| Data analysis [3] | Hierarchical fog computing architecture | Sensor network | Employing advanced machine learning algorithms |
| AI and Big Data [9] | Offers theoretical value | Organizations' websites, policy documents, and newspaper articles | Integrated SIS technologies |
| Processing and analysis [12] | Combine four aforementioned technologies and functionality | Temperature, movement, light and moisture | Find common operations and combine |
| University Campus [7] | Implementation of an intelligent environment | Consumption of drinks in examination seasons, areas with the highest population density | Facilitate management |
| Healthcare data processing [20] | Model of joint VM migration including Ant Colony Optimization | User mobility and cloudlet server load | Utilize computing resources optimally |
| Cities and Villages Research [6] | Overview of the most recent developments | Selected 15 research studies | Discussion of the key issues and challenges |
| Integrating multisource [33] | Iterative model | Population density, land cover, road, Real Time Tencent User Density (RTUD) | Evaluate equitable standard living areas in terms of census units |

5 Discussion

In this study, we have expressed the concept of smart towns from the angle of different data and studied various concepts, data processing techniques and frameworks. After reviewing forty papers we have observed that there is no notable concept or framework for making a smart village, because of many constraints. It can be noted that rural areas face poverty, low levels of education and limited access to technology as their main problems. As smart village research is a newcomer, researchers can focus on making villages smarter in the future by exploring the issues and challenges. To make a smart village, it should be equipped with strong interconnection between existing and new smart technologies that are able to communicate with one another.

6 Conclusion

The most important purpose of a smart city is to enhance the lives of its population by providing a sustainable environment at minimum expense. Doing so requires a realization of various facts, e.g., data collection and processing. Big data technology has provided efficient support for the betterment of smart cities. There are many challenges for smart cities: day by day technologies are being updated, and with that, issues of conflict in data, security, privacy and authenticity are arising. To face these issues and make reliable systems for smart areas, researchers have to work carefully, as research on smart cities and smart villages will be a trend in the coming years. This research has reviewed forty research papers, categorized into five sectors with time scales from 2014 to 2020, while summarizing the authors' contributions in the field of evaluation of big data in smart cities and smart villages.

References
1. Ianuale, N., Schiavon, D., Capobianco, E.: Smart Cities, Big Data, and Communities:
reasoning from the viewpoint of attractors. IEEE Access 4, 41–47 (2016). https://doi.org/10.
1109/ACCESS.2015.2500733
2. He, J., Wei, J., Chen, K., Tang, Z., Zhou, Y., Zhang, Y.: Multitier fog computing with large-
scale IoT data analytics for Smart Cities. IEEE Internet Things J. 5(2), 677–686 (2018).
https://doi.org/10.1109/JIOT.2017.2724845
3. Tang, B., Chen, Z., Hefferman, G., Wei, T., He, H., Yang, Q.: A hierarchical distributed fog
computing architecture for Big Data analysis in Smart Cities. ACM (2015). https://doi.org/
10.1145/2818869.2818898
4. Sun, Y., Song, H., Jara, A.J., Bie, R.: Internet of Things and Big Data analytics for smart and
connected communities. IEEE Access 4, 766–773 (2016). https://doi.org/10.1109/ACCESS.
2016.2529723
5. Zavratnik, V., Podjed, D., Trilar, J., Hlebec, N., Kos, A., Duh, E.S.: Sustainable and
community-centred development of Smart Cities and Villages. Sustainability 12(10), 3961
(2020). https://doi.org/10.3390/su12103961
6. Visvizi, A., Lytras, M.D.: Sustainable Smart Cities and Smart Villages research: rethinking
security, safety, well-being, and happiness. Sustainability 12(1), 215 (2019). https://doi.org/
10.3390/su12010215
7. Villegas-Ch, W., Palacios-Pacheco, X., Luján-Mora, S.: Application of a Smart City model
to a traditional University Campus with a Big Data architecture: a sustainable Smart
Campus. Sustainability 11(10), 2857 (2019). https://doi.org/10.3390/su11102857
8. Visvizi, A., Lytras, M.: It’s not a fad: Smart Cities and Smart Villages research in European
and global contexts. Sustainability 10(8), 2727 (2018). https://doi.org/10.3390/su10082727
9. Mark, R., Anya, G.: Ethics of using Smart City AI and Big Data: the case of four large
European cities. ORBIT J. 2(2), 1–36 (2019). https://doi.org/10.29297/orbit.v2i2.110
10. Jara, A.J., Genoud, D., Bocchi, Y.: Big Data for Smart Cities with KNIME a real experience
in the SmartSantander testbed. Intell. Technol. Appl. Big Data Analyt. 45(8), 1145–1160
(2014). https://doi.org/10.1002/spe.2274

11. Vogler, M., Schleicher, J.M., Inzinger, C., Dustdar, S.: Ahab: a cloud-based distributed Big
Data analytics framework for the Internet of Things. Big Data Cloud Things 47(3), 443–454
(2016). https://doi.org/10.1002/spe.2424
12. Plageras, A.P., Psannis, K.E., Stergiou, C., Wang, H., Gupta, B.B.: Efficient IoT-based
sensor BIG Data collection–processing and analysis in smart buildings. Future Gen. Comput.
Syst. 82, 349–357 (2018). https://doi.org/10.1016/j.future.2017.09.082
13. Petrolo, R., Loscrì, V., Mitton, N.: Towards a smart city based on cloud of things, a survey
on the smart city vision and paradigms. Emerg. Telecommun. Technol. 28(1), e2931 (2015)
14. Liu, Q., et al.: Analysis of green spaces by utilizing Big Data to support Smart Cities and
environment: a case study about the city center of Shanghai. ISPRS Int. J. Geo-Inf. 9(6), 360
(2020). https://doi.org/10.3390/ijgi9060360
15. Allam, Z., Tegally, H., Thondoo, M.: Redefining the use of Big Data in Urban Health for
increased liveability in Smart Cities. Smart Cities 2(2), 259–268 (2019). https://doi.org/10.
3390/smartcities2020017
16. Kudva, S., Ye, X.: Smart Cities, Big Data, and sustainability union. Big Data Cognit.
Comput. 1(1), 4 (2017). https://doi.org/10.3390/bdcc1010004
17. Silva, B., et al.: Urban planning and Smart City decision management empowered by real-
time data processing using Big Data analytics. Sensors 18(9), 2994 (2018). https://doi.org/
10.3390/s18092994
18. Badii, C., Bellini, P., Cenni, D., Difino, A., Nesi, P., Paolucci, M.: Analysis and assessment
of a knowledge based Smart City architecture providing service APIs. Future Gen. Comput.
Syst. 75, 14–29 (2017). https://doi.org/10.1016/j.future.2017.05.001
19. Finogeev, A., Finogeev, A., Fionova, L., Lyapin, A., Lychagin, K.A.: Intelligent monitoring
system for smart road environment. J. Ind. Inf. Integr. 15, 15–20 (2019). https://doi.org/10.
1016/j.jii.2019.05.003
20. Islam, M., Razzaque, A., Hassan, M.M., Nagy, W., Song, B.: Mobile cloud-based big
healthcare data processing in Smart Cities. IEEE Access 5, 11887–11899 (2017). https://doi.
org/10.1109/ACCESS.2017.2707439
21. Bibri, S.E.: The IoT for smart sustainable cities of the future: an analytical framework for
sensor-based big data applications for environmental sustainability. Sustain. Cities Soc. 38,
230–253 (2018). https://doi.org/10.1016/j.scs.2017.12.034
22. Wong, M., Wang, T., Ho, H., Kwok, C., Keru, L., Abbas, S.: Towards a Smart City:
development and application of an improved integrated environmental monitoring system.
Sustainability 10(3), 623 (2018). https://doi.org/10.3390/su10030623
23. Allam, Z., Newman, P.: Redefining the Smart City: culture, metabolism and governance.
Smart Cities 1(1), 4–25 (2018). https://doi.org/10.3390/smartcities1010002
24. Lucas, C.M., de Mingo López, L., Blas, N.G.: Natural computing applied to the underground
system: a synergistic approach for Smart Cities. Sensors 18(12), 4094 (2018). https://doi.org/
10.3390/s18124094
25. Abbad, H., Bouchaib, R.: Towards a Big Data Analytics Framework for Smart Cities. ACM
(2017). https://doi.org/10.1145/3175628.3175647
26. Bibri, S.E., Krogstie, J.: The Big Data deluge for transforming the knowledge of smart
sustainable cities: a data mining framework for urban analytics. ACM (2018)
27. Babar, M., Arif, F.: Smart urban planning using Big Data analytics based Internet of Things.
Future Gen. Comput. Syst. 77, 65–76 (2017). https://doi.org/10.1145/3123024.3124411
28. Moustaka, V., Vakali, A., Anthopoulos, L.G.: A systematic review for Smart City Data
analytics. ACM Comput. Surv. 51(5), 1–41 (2019). https://doi.org/10.1145/3239566
29. Baldassarre, M.T., Caballero, I., Caivano, D., Garcia, B.R., Piattini, M.: From big data to
smart data: a data quality perspective. ACM (2018). https://doi.org/10.1145/3281022.
3281026

30. Usman, M., Jan, M.A., He, X., Chen, J.: A survey on big multimedia data processing and
management in smart cities. ACM Comput. Surv. 52(3), 1–29 (2019). https://doi.org/10.
1145/3323334
31. Niu, N., et al.: Integrating multi-source big data to infer building functions. Int. J. Geograph.
Inf. Sci. (2017). https://doi.org/10.1080/13658816.2017.1325489
32. Long, Y., Liu, L.: Transformations of urban studies and planning in the big/open data era: a
review. Int. J. Image Data Fusion 7(4), 295–308 (2016). https://doi.org/10.1080/19479832.
2016.1215355
33. Yao, Y., et al.: Mapping fine-scale population distributions at the building level by
integrating multisource geospatial big data. Int. J. Geograph. Inf. Sci. (2017). https://doi.org/
10.1080/13658816.2017.1290252
34. Zhou, J., Yang, Y., Webster, C.: Using big and open data to analyze transit-oriented
development: new outcomes and improved attributes. J. Am. Plan. Assoc. 86(3), 364–376
(2020). https://doi.org/10.1080/01944363.2020.1737182
35. Tan, K.H., Ji, G., Lim, C.P., Tseng, M.-L.: Using big data to make better decisions in the
digital economy. Int. J. Prod. Res. 55(17), 4998–5000 (2017). https://doi.org/10.1080/
00207543.2017.1331051
36. Tierney, T.F.: Big Data, big rhetoric in Toronto’s Smart City. Archit. Cult. 7(3), 351–363
(2019). https://doi.org/10.1080/20507828.2019.1631062
37. Jnr, B.A., Petersen, S.A., Ahlers, D., Krogstie, J.: API deployment for big data management
towards sustainable energy prosumption in smart cities-a layered architecture perspective.
Int. J. Sustain. Energy 39(3), 263–289 (2019). https://doi.org/10.1080/14786451.2019.
1684287
38. Carter, E., Adam, P., Tsakis, D., Shaw, S., Watson, R., Ryan, P.: Enhancing pedestrian
mobility in Smart Cities using Big Data. J. Manag. Analyt. 7(2), 173–188 (2020). https://doi.
org/10.1080/23270012.2020.1741039
39. Cheng, B., Longo, S., Cirillo, F., Bauer, M., Kovacs, E.: Building a big data platform for
smart cities: experience and lessons from Santander. IEEE Access (2015). https://doi.org/10.
1109/BigDataCongress.2015.91
40. Yadav, P., Vishwakarma, S.: Application of Internet of Things and Big Data towards a Smart
City. IEEE Access (2018). https://doi.org/10.1109/IoT-SIU.2018.8519920
A Compact Radix-Trie: A Character-Cell
Compressed Trie Data-Structure
for Word-Lookup System

Rahat Yeasin Emon(&) and Sharmistha Chanda Tista

Department of Computer Science and Engineering, Chittagong University


of Engineering and Technology, Chattagram 4349, Bangladesh
u1304007@student.cuet.ac.bd,
rahat_yeasin_emon@hotmail.com,
tista_chanda@cuet.ac.bd

Abstract. String words are sequences of characters. An efficient data structure is needed to store a word-list in memory with low space complexity. The trie-tree is a popular word lookup data structure, whose word lookup time complexity is O(l) (‘l’ is the searched-word length). The array-based trie-tree, which has linear searching time complexity, is a memory-inefficient data structure with many unused character-cells. Dynamic data structure (e.g., linked-list, binary search tree) based trie-trees compress character-cells through word prefix sharing. This paper proposes a more character-cell-compressed, space-efficient trie-tree for word-list storing and searching, which has a new empty-node property (getting data from another trie-node) that reduces the character-cell requirement. The proposed trie data structure needs very few character-cells. From the experimental results, we have seen that when the proposed data structure is used to represent a dictionary word-list, 99.95% of character-cells are compressed and 99.90% of trie-nodes are empty.

Keywords: Data structure · Trie · Radix-trie/PATRICIA-trie · Word-lookup data structures · Character-cells · Space complexity

1 Introduction and Background

A tree is a special type of data structure, defined as a hierarchical collection of nodes. It has one root node and several hierarchical subtree nodes. Each node has a value or key and a list of child nodes. A bottom node in the tree hierarchy which doesn't have a child is called a leaf node.
The trie-tree [1] is a widely used word lookup data structure whose time complexity is O(l). It is used in several types of computer-science applications such as dictionary management [6–8], auto word suggestion [9], spell-checking [10, 11], pattern matching [12, 13], IP-address lookup [14–16], natural language processing, data mining [17, 18], database systems, compilers, computer networks, and text compression.
The trie-tree is a character-wise tree where string words are stored in a tree-type manner. The key of a node is a single character, and the number of child nodes of a node is usually the size of the alphabet of the corresponding word list. For example, the size of
the alphabet of the English language is 26. In that case, to represent an English word-list in a trie-tree, the value of a node is one of the 26 English letters and each node has at most 26 child nodes.

Fig. 1. Building process (word insertion) of trie-tree

The above Fig. 1 depicts the insertion of the word-list – ‘tree’, ‘trie’, ‘tank’, ‘work’, and ‘wood’ – into a trie-tree.
If dynamic data structures (e.g., linked lists) are used to represent the child-list of a node, then the trie-tree structure can compress data through prefix sharing. For example, Fig. 1(e) above is a trie-tree for the words ‘tree’, ‘trie’, ‘tank’, ‘work’, and ‘wood’. These five words have 20 character cells in total, but the trie-tree needs only 15 character cells to represent them; here 25% of the character cells are compressed. The proposed methodology aims to improve the trie-tree's character-cell compression capability.
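
To make the prefix-sharing count concrete, here is a minimal Python sketch (our own illustration, not code from the paper) that builds a child-list trie with dictionaries and counts the stored character cells for the five example words:

def build_trie(words):
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})   # reuse an existing prefix node if present
        node['$'] = True                     # end-of-word marker
    return root

def count_cells(node):
    # Each trie node stores exactly one character, so count non-marker keys.
    return sum(1 + count_cells(child)
               for ch, child in node.items() if ch != '$')

words = ['tree', 'trie', 'tank', 'work', 'wood']
trie = build_trie(words)
print(sum(map(len, words)), count_cells(trie))   # prints: 20 15

Running it prints 20 and 15: the raw words hold 20 characters, while the shared-prefix trie stores only 15, the 25% compression described above.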
The array-based trie-tree has word lookup time complexity O(l). As its word lookup time complexity is very low, this array-based trie-tree has historically been used widely [2–5], but it has a high memory requirement.
The above Fig. 2 depicts the implementation and memory requirement of an array-based trie-tree for the word ‘tree’. In Fig. 2, we show that to build a trie-tree of the word ‘tree’, there are 104 cells in total, among which 4 cells are used and the remaining 100 cells are unused.
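
A short Python sketch (ours, using one plausible accounting convention: a node allocates its 26-slot child array lazily, and a slot counts as used when it points to a child) reproduces Fig. 2's 104-cell, 4-used figure for ‘tree’:

ALPHABET = 26   # lowercase English letters

class ArrayNode:
    def __init__(self):
        self.children = None    # 26-slot child array, allocated lazily
        self.is_word = False

def insert(root, word):
    node = root
    for ch in word:
        if node.children is None:
            node.children = [None] * ALPHABET   # one full alphabet row per branching node
        idx = ord(ch) - ord('a')                # assumes lowercase input
        if node.children[idx] is None:
            node.children[idx] = ArrayNode()
        node = node.children[idx]
    node.is_word = True

def count_cells(node):
    # Returns (allocated slots, used slots) over all allocated child arrays.
    if node.children is None:
        return 0, 0
    alloc = ALPHABET
    used = sum(c is not None for c in node.children)
    for c in node.children:
        if c is not None:
            a, u = count_cells(c)
            alloc += a
            used += u
    return alloc, used

root = ArrayNode()
insert(root, 'tree')
print(count_cells(root))   # (104, 4): four 26-slot arrays, four occupied slots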
PATRICIA or Radix-trie [2], sometimes called a compact prefix-tree, is a space-optimized representation of the native trie. The radix-trie saves space by merging single-parent nodes (nodes that have only one child) with their child node.

Fig. 2. Implementation of trie-tree of word ‘tree’ (array-based child-list)

Fig. 3. Radix-trie (from native trie-tree)

In Fig. 3, we see that single-parent nodes are merged with their child nodes. This compact representation of the radix-trie has a minimal number of trie-nodes, which improves the space and searching time complexity. But the radix-trie can have lots of nodes that hold identical data, which is memory-inefficient.

An ASCII character-cell consumes 8 bits and a Unicode character-cell consumes 16 bits of memory. A data structure is space-efficient when it has a minimal number of character-cells. This paper presents a new character-cell-compressed trie-tree for storing and searching string words. From the experimental results, we have seen that the proposed trie can compress 99.95% of the character cells needed to represent a dictionary word-list.

2 Proposed Trie

2.1 Character Path Node and Maximum Prefix Matched Node


The character sequence needed to traverse to a node from the root of a trie-tree is termed the character-path of that node. For a searched word, the longest-prefix-matched character-path node in the trie is termed the maximum prefix matched node (Fig. 4).

Fig. 4. Maximum prefix matched node

In the above trie, to search the word ‘become’, the maximum matched prefix is ‘be’, and to search the word ‘begat’, the maximum matched prefix is ‘beg’; in that case the ‘be’ character-path node and the ‘begin’ character-path node are the maximum prefix-matched nodes of the search strings ‘become’ and ‘begat’, respectively.

2.2 Algorithm for Proposed Trie – Data Entry Procedure.

Algorithm – Procedure 1: Data entry in the proposed-trie node

1: Procedure enter_data_to_node(Node node, String word_substring)
2:   Check or create the character_path_node of word_substring
3:   If such a character-path-node exists or is possible to create then
4:     Put node.referenceCharacterPathNode = that character_path_node
5:     Put node.data = null
6:   Else put node.data = word_substring
7:     Put node.referenceCharacterPathNode = null
8: End procedure

Fig. 5. Proposed-trie data-entry process

Fig. 5(a) represents a trie-tree of the words (‘road’, ‘abandon’, ‘abroad’, and ‘about’), where a node's entry-data is depicted beside the node. In Fig. 5(b), node-1 (‘road’), node-2 (‘ab’) and node-3 (‘andon’) store their entry-data internally. In Fig. 5(c), node-4's entry-data ‘road’ already exists in the trie-tree (here node-1), so node-4 points to the ‘road’

character-path node as its data-node. In Fig. 5(d), node-5 creates the ‘out’ character-path node and points to that node as its data-node.

2.3 Algorithm for Proposed Trie – Insert word in Trie-tree


Algorithm – Procedure 2: Insert word into the proposed-trie

1: Procedure insert_word_to_proposed_trie(String word)
2:   Go to the maximum prefix-matched-node of word
3:   If a maximum prefix-matched-node is not found then
4:     Create new_node(), which is a child of the root-node
5:     Put new_node.data = word
6:     Put new_node.referenceCharacterPathNode = null
7:     Put root.childList.add(new_node)
8:   Else if a maximum prefix-matched-node is found then
9:     Put current_node = maximum prefix-matched-node of the entry word
10:    Check the matched prefix and unmatched suffixes between current_node and the entry string word
11:    Split current_node as:
12:      Put current_node.data = matched_prefix
13:      Create two new nodes, new_node1() and new_node2()
14:      Put enter_data_to_node(new_node1, unmatched_suffix_of_current_node)
15:      Put enter_data_to_node(new_node2, unmatched_suffix_of_entry_word)
16: End Procedure

The above Fig. 6 depicts the split-node process of the proposed trie, and Fig. 7 is a graphical representation of the proposed-trie word insertion process. Figure 7(a) depicts an empty trie, starting with a root node. In Fig. 7(b), a new word ‘abandon’ is inserted in the empty trie. The root node creates a child node (here node-1); as it is a child of the root, it stores the ‘abandon’ data internally.
In Fig. 7(c) and Fig. 6, a new word ‘abroad’ is inserted. The ‘abandon’ node is the maximum prefix-matched node of the word ‘abroad’, where the matched prefix is ‘ab’. Here node-1 splits and creates two child nodes (node-2 and node-3). Node-1 keeps the matched prefix ‘ab’, node-2 holds the ‘road’ data (the unmatched suffix of the entry word), and node-3 holds the ‘andon’ data (the unmatched suffix of the ‘abandon’ node).
In Fig. 7(g), a new word ‘road’ is inserted into the proposed trie. Here we see that a ‘road’ character-path node already exists, so we only need to put a word-ending sign in node-4.
The character-path node lookup property reduces the character-cells of the proposed trie-tree.
Figure 8 depicts the proposed trie of the words (‘road’, ‘abroad’, ‘abandon’, ‘injury’, ‘inboard’, ‘board’, ‘juryboard’). These words have a total of 44 character cells, while the proposed trie data structure requires 22 character cells to store these seven words. The character-cell compression ratio is 50% and the empty-node ratio is 40%.
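
As a runnable illustration of the split step that Procedure 2 builds on, the following Python sketch (ours) implements plain radix-trie insertion with node splitting; the character-path referencing of Procedure 1 (the empty-node property) is deliberately omitted, so this is the baseline the proposed trie improves upon, not the full method:

class Node:
    def __init__(self, data=""):
        self.data = data         # the substring (edge label) stored in this node
        self.children = {}       # first character of a child's data -> child Node
        self.is_word = False

def insert(root, word):
    node, i = root, 0
    while True:
        child = node.children.get(word[i]) if i < len(word) else None
        if child is None:
            if i < len(word):                      # no shared prefix: add a new leaf
                leaf = Node(word[i:])
                leaf.is_word = True
                node.children[word[i]] = leaf
            else:
                node.is_word = True                # word ends exactly at this node
            return
        j = 0                                      # common prefix of child.data and word[i:]
        while j < len(child.data) and i + j < len(word) and child.data[j] == word[i + j]:
            j += 1
        if j == len(child.data):                   # consumed the whole label: descend
            node, i = child, i + j
            continue
        # Split, as in Procedure 2: child keeps the matched prefix,
        # its old suffix moves into a new node that inherits the children.
        rest = Node(child.data[j:])
        rest.children, rest.is_word = child.children, child.is_word
        child.data, child.children, child.is_word = child.data[:j], {rest.data[0]: rest}, False
        if i + j < len(word):
            leaf = Node(word[i + j:])
            leaf.is_word = True
            child.children[word[i + j]] = leaf
        else:
            child.is_word = True                   # entry word is a prefix of the old label
        return

root = Node()
for w in ['abandon', 'abroad', 'about', 'road']:
    insert(root, w)
print(sorted(root.children))   # ['a', 'r']: the 'ab...' subtree plus 'road'

After inserting ‘abroad’, the ‘abandon’ node splits into ‘ab’ with children ‘andon’ and ‘road’, mirroring Fig. 7(c); the proposed trie would additionally let the second ‘road’ suffix be an empty node referencing the existing ‘road’ character-path.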

Fig. 6. Insert new word and split node

Fig. 7. Word insertion process of proposed-trie (graphical presentation)


A Compact Radix-Trie: A Character-Cell Compressed Trie Data-Structure 447

Fig. 8. Proposed-trie

3 Experimental Results

3.1 Compaction of Character-Cells of the Proposed Data Structure


The following table shows the character-cell requirements of various data sets using the proposed compact trie.

Table 1. Proposed data structure character-cell compaction.

| Data set | Total character-cells | Proposed-trie character-cells | Compressed ratio |
| 20,000 English words | 135,418 | 33 | 99.98% |
| 466,544 English words | 4,396,422 | 106 | 99.99% |
| 18,622 French words | 134,303 | 56 | 99.96% |
| 26,280 German words | 201,903 | 76 | 99.96% |
| 112,940 Bangla words | 846,296 | 154 | 99.98% |
| 23,066 Hindi words | 139,322 | 237 | 99.82% |

In our first data set, 20,000 dictionary words have 135,418 character cells. In the third column, we see that to represent this huge data set the proposed compact trie needs only 33 character cells; in total 135,385 (135,418 − 33) character-cells are compressed, a compression ratio of 99.98%. In Table 1, for every data set, the compression ratio is around 99.95%.

3.2 Comparison of Radix-Trie and Proposed Compact-Trie (Node Requirement)


Table 2 shows the experimental results for the node requirements of the radix-trie and the proposed compact trie.

Table 2. Node requirement, radix-trie, and proposed compact-trie.

| Data set | Radix-trie total nodes | Proposed compact-trie total nodes | Empty nodes | Empty-node percentage |
| 20,000 English words | 23,525 (all non-empty) | 25,922 | 25,894 (28 nodes non-empty) | 99.89% |
| 466,544 English words | 596,084 (all non-empty) | 626,689 | 626,625 (64 nodes non-empty) | 99.98% |

For the first data set, the radix-trie has 23,525 non-empty nodes to store 20,000 dictionary words. The proposed compact trie has 25,922 nodes to represent the same data set; among them, 25,894 nodes are empty and 28 (25,922 − 25,894) are non-empty, giving an empty-node percentage of 99.89%. Here we see that for every data set, the proposed compact trie's empty-node percentage is nearly 99.90%.

4 Conclusion

This paper has introduced a character-cell-compressed, improved trie-tree for word lookup systems. We have introduced a new empty-node property for the trie-tree. From the experimental results, we have seen that 99.90% of the proposed trie's nodes are empty. These empty nodes reduce the character-cell requirement to a large extent. To represent a popular dictionary word-list, the proposed trie can compress almost 99.95% of the character cells.
An ASCII character consumes 8 bits of memory, and a Unicode character consumes 16 bits of memory. The proposed data structure reduces space complexity by reducing the number of character cells. As the proposed trie compresses word-list character cells to a large extent, the methodology can also be used for text compression. Based on the proposed compact radix-trie, we intend to publish a text compression algorithm in the coming days.

References
1. Fredkin, E.: Trie memory. Commun. ACM 3, 490–499 (1960)
2. Morrison, D.R.: PATRICIA—practical algorithm to retrieve information coded in alphanu-
meric. J. ACM 15(4), 514–534 (1968)
3. Askitis, N., Sinha, R.: HAT-trie: a cache-conscious trie-based data structure for strings. In:
Proceedings of the 30th Australasian Conference on Computer science, pp. 97–105 (2007)
4. Heinz, S., Zobel, J., Williams, H.: Burst tries. ACM Trans. Inf. Syst. 20, 192–223 (2002)
5. Hanandeh, F., Alsmadi, I., Akour, M., Daoud, E.: KP-trie algorithm for update and search
operations. Int. Arab J. Inf. Technol. 13(6) (2016)
6. Parmar, P., Kumbharana, C.K.: Implementation of trie structure for storing and searching of
English spelled homophone words. Int. J. Sci. Res. Publ. 7(1) (2017)
7. Ferrández, A., Peral, J.: MergedTrie: efficient textual indexing. PLOS ONE 14, e0215288
(2019)
8. Aoe, J.-I., Morimoto, K., Sato, T.: An efficient implementation of trie structures. Softw.
Pract. Exp. 22(9), 695–721 (1992)
9. Boo, V.K., Anthony, P.: A data structure between trie and list for auto completion. In:
Lukose, D., Ahmad, A.R., Suliman, A. (eds.) KTW 2011. CCIS, vol. 295, pp. 303–312.
Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-32826-8_31
10. Bhaire, V.V., Jadhav, A.A., Pashte, P.A., Magdum, P.G.: Spell checker. Int. J. Sci. Res.
Publ. 5(4) (2015)
11. Xu, Y., Wang, J.: The adaptive spelling error checking algorithm based on trie tree. In: 2nd
International Conference on Advances in Energy, Environment and Chemical Engineering
(AEECE) (2016)
12. Deng, D., Li, G., Feng, J.: An efficient trie-based method for approximate entity extraction
with edit-distance constraints. In: 2012 IEEE 28th International Conference on Data
Engineering (2012)
13. Baeza-Yates, R.A., Gonnet, G.: Fast text searching for regular expressions or automaton
searching on tries. J. ACM 43(6), 915–936 (1996)
14. Lim, H., Yim, C., Swartzlander, E.E.: Priority tries for IP address lookup. IEEE Trans.
Comput. 59(6), 784–794 (2010)
15. Nilsson, S., Karlsson, G.: Ip-address lookup using LC-tries. IEEE J. Select. Areas Commun.
17(6), 1083–1092 (1999)
16. Thair, M., Ahmed, S.: Tree-combined trie: a compressed data structure for fast IP address
lookup. Int. J. Adv. Comput. Sci. Appl. 6(12) (2015)
17. Qu, J.-F., Liu, M.: A fast algorithm for frequent itemset mining using Patricia* structures. In:
Cuzzocrea, A., Dayal, U. (eds.) DaWaK 2012. LNCS, vol. 7448, pp. 205–216. Springer,
Heidelberg (2012). https://doi.org/10.1007/978-3-642-32584-7_17
18. Savnik, I., Akulich, M., Krnc, M., Škrekovski, R.: Data structure set-trie for storing and
querying sets: theoretical and empirical analysis. PLOS ONE 16(2), e0245122 (2021)
Digital Twins and Blockchain:
Empowering the Supply Chain

Jose Eduardo Aguilar-Ramirez1 , Jose Antonio Marmolejo-Saucedo1(B) ,


and Roman Rodriguez-Aguilar2
1
Facultad de Ingenierı́a, Universidad Panamericana, Augusto Rodin 498,
03920 Ciudad de México, Mexico
jmarmolejo@up.edu.mx
2
Facultad de Ciencias Económicas y Empresariales, Universidad Panamericana,
Augusto Rodin 498, 03920 Ciudad de México, Mexico
rrodrigueza@up.edu.mx

Abstract. Industry 4.0 is here, and it arrived with very promising new technologies that can foster supply chain management across industries. In this paper we review multiple sources to identify the main characteristics of Digital Twin and Blockchain technologies and how they can work together to fulfill the needs of the supply chain. We identify some advantages and disadvantages that must be properly analyzed before adopting this approach in any business. Many applications behind these new benefits are still in development, but we believe these two technologies have great potential.

Keywords: Digital Twin · Blockchain · Supply chain · ERP ·


Digitalization

1 Introduction

Technology has had a huge impact on the development of the human race. Industry 1.0 was led by steam machines surpassing human capacity; Industry 2.0 was led by the introduction of electricity in factories, as well as Henry Ford's assembly line; Industry 3.0 was led by the development of computer automation and information technology (IT). Now, we are facing Industry 4.0, led by the Internet of Things (IoT), Artificial Intelligence (AI), computer-based algorithms such as machine learning, and all of the above connected to display data in real time for decision making.
In this new type of industry where everything is connected and digitalized, the need to share data in real time for better decision making, while maintaining data integrity throughout the supply chain, is essential. That is where Digital Twins (DT) and Blockchain (BC) come in.
Digital Twins help you replicate any physical object or system in a digital environment, where you can run multiple tests as well as monitor the current


Fig. 1. Document type available in Scopus data base

state of the object. This technology relies heavily on the IoT and on the connectivity and digitalization of all components. Blockchain, on the other hand, is a technology that partners well with digital twins in terms of data integrity and storage.
This paper briefly describes how these two technologies complement each other and explains their benefits and some limitations.

2 Research Methodology
Since our goal is to analyze the benefits and limitations of digital twins with blockchain technology, we focused our search mainly around these key words: Digital Twins and Blockchain. Table 1 shows the number of papers currently available in the Scopus database.

Table 1. Number of documents per key words available in Scopus data base

| Key Words | Documents |
| Blockchain | 23,937 |
| Digital Twins | 6,997 |
| Digital Twins and Blockchain | 101 |
| Digital Twins and Blockchain and supply chain | 16 |

The main sources of information were the following types of documents: articles, conference papers and book chapters. Other sources were used for getting some real-life examples. This information is presented in Fig. 1.

Fig. 2. Documents available by year

Results for Digital Twins combined with Blockchain increased over time, indicating that there is growing interest in the topic, as demonstrated in Fig. 2. In conclusion, the search for DT and BC's integration with the supply chain had the lowest number of results, highlighting that this is an area that needs more research.

2.1 Literature Review

Digital Twin. DT is a new kind of technology. The idea of a “twin” goes back to NASA's Apollo program, which used two identical physical space vehicles to mirror each other's conditions during the mission; the term “digital twin” itself appeared in NASA work around 2010 Boschert and Rosen (2016). Although this first approach didn't include a digital representation, it clearly showed great benefits.
Its scope has changed throughout the years. It was first described as a prototype that could mirror real conditions for simulation Boschert and Rosen (2016), then as a tool to assist in the product life cycle management of a product Huang et al. (2020). Ultimately it also refers to the digital representation of a physical counterpart and lets companies manage the life cycle of their products or even the supply chain Dietz and Pernul (2020).
Nowadays, many companies use DT to simulate and test different circumstances without closing or delaying daily activities Felipe Bustamante and Singh (2020).
DT provides real-time visualization of what is happening with the physical asset. This enables the past, present and future performance of the asset to be tracked and, in combination with BC, guarantees the integrity of the information by recording every transaction of the asset Raj (2021).

Blockchain. The concept was introduced in 2008 by Satoshi Nakamoto. BC is a data structure of blocks chained to each other in sequential order. The reason this technology is considered tamper-proof is its public, cyphered ledger: any modification is registered in a block, and each block is connected to the previous and next blocks, creating a “blockchain” Zhong and Huang (2020). This makes BC a secure mechanism that fulfills different security needs across the supply chain. Those abilities have already been identified by Enterprise Resource Planning (ERP) vendors, who are currently trying to integrate them into their ERP systems Parikh (2018).
Sometimes information must be shared with partners outside the company, and since connections between different partners' ERP systems are not allowed, or must be verified by another entity, information doesn't flow as efficiently as it could; with a new model of decentralized platforms based on blockchain, communication efficiency will increase Sokolov and Kolosov (2021).
ERP and BC also share something in common: both store information from multiple areas of the business and both share the same information with other areas. The difference lies in the accessibility of this data as well as how it is stored: BC does it in a decentralized way, while an ERP stores everything in one place Haddara et al. (2021).
BC also has a powerful tool named smart contracts. Smart contracts allow a natural flow of decisions in the supply chain without the need for a central authority checking whether the conditions are sufficient to keep the flow going Borowski (2021). The terms under which smart contracts work must be settled by experts from the same company or between different companies Nielsen et al. (2020). Boundaries must be applied to limit access to contracts and prevent them from being modified, together with full transparency about who altered them Putz et al. (2021).
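
As a toy illustration of this idea (ours, not from any of the cited papers; a production contract would run on-chain, e.g. in Solidity, and the parties, price and temperature condition here are invented), the following Python sketch shows the escrow-style flow a supply-chain smart contract automates, with the agreed condition checked by code rather than by a central authority:

from dataclasses import dataclass, field

@dataclass
class ShipmentContract:
    # Toy escrow-style "smart contract": payment is released automatically
    # once the delivery condition agreed by the parties is met.
    buyer: str
    seller: str
    price: float
    max_transit_temp: float            # condition negotiated by the parties
    events: list = field(default_factory=list)
    settled: bool = False

    def record_delivery(self, measured_temp: float):
        self.events.append(('delivered', measured_temp))   # append-only event log
        if measured_temp <= self.max_transit_temp and not self.settled:
            self.settled = True
            return f"release {self.price} from {self.buyer} to {self.seller}"
        return "hold payment: condition violated or already settled"

contract = ShipmentContract('RetailCo', 'FarmCo', 1000.0, max_transit_temp=8.0)
print(contract.record_delivery(measured_temp=6.5))   # condition met: payment released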

2.2 Integrating DT and BC


Since a Digital Twin is a digital representation of a physical entity, it is not limited to one object; it can also mirror a whole system with multiple individual entities Hemdan and Mahmoud (2021), Dietz and Pernul (2020). For example, we can create a DT of a building, and by connecting that DT with another building on the same block, we could create a system describing the whole block, and so on; following this logic, we could create a digital twin of a city.
By combining both technologies, the storage and transmission of the valuable information that the DT of the system carries would be safe and restricted thanks to BC's public, cyphered ledger Hasan et al. (2020), Wang et al. (2021).
When connecting multiple DTs with each other, a huge amount of information and storage is required from a centralized system. With BC technology integrated, a decentralized system can be used for better performance Wang et al. (2021).
One proposed model for integrating both technologies is the use of peer-to-peer networks that enable effective communication between participants from the same team as well as from other teams inside or outside the company Huang et al. (2020).
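
A minimal Python sketch (our own illustration; real systems add consensus protocols and distributed storage on top of this, and the asset name and readings are invented) shows the hash-chaining that makes a DT's recorded state history tamper-evident:

import hashlib, json, time

def make_block(twin_state, prev_hash):
    # One ledger entry recording a digital-twin state update; chaining each
    # block to the previous block's hash makes silent tampering detectable.
    block = {'ts': time.time(), 'state': twin_state, 'prev': prev_hash}
    payload = json.dumps(block, sort_keys=True).encode()
    block['hash'] = hashlib.sha256(payload).hexdigest()
    return block

chain = [make_block({'asset': 'pump-17', 'temp_c': 41.2}, prev_hash='0' * 64)]
chain.append(make_block({'asset': 'pump-17', 'temp_c': 43.9}, chain[-1]['hash']))

def verify(chain):
    # Recompute each block's hash and check the linkage to its predecessor.
    for prev, cur in zip(chain, chain[1:]):
        body = {k: cur[k] for k in ('ts', 'state', 'prev')}
        payload = json.dumps(body, sort_keys=True).encode()
        if cur['prev'] != prev['hash'] or cur['hash'] != hashlib.sha256(payload).hexdigest():
            return False
    return True

print(verify(chain))   # True; editing any recorded state breaks verification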

2.3 Benefits

By fostering DT with BC technology, the main concerns about the level of digitalization, data storage and the transmission security of information are tackled, as mentioned previously. By keeping data secure, any tampering with intellectual property rights (IPR) shared through the supply chain via their DT could be detected, avoiding leaks Nielsen et al. (2020).
For example, when buying a pre-owned car, with these technologies you could track how many owners the car has had and which parts of the vehicle are still original, thanks to its DT with BC technology Heber et al. (2017). Likewise, thanks to these traceability functions, any manufacturer could detect a failure in a batch-production item and easily track and correct those failures, increasing operational and service levels Hemdan and Mahmoud (2021).
As stated before, implementing a DT enhanced with BC in each product would create a digital certification that would nullify any kind of fraud and detect fake products with this proof of authenticity Raj (2021).
Finally, using and analyzing historical data, we can simulate when a potential breakdown may occur and prepare for that scenario. This ability is also known as the “predictive twin” Raj (2021). So DT is not only limited to product life management; it is also a way to predict possible outcomes for the physical twin.
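
As a toy sketch of the “predictive twin” idea (ours; the vibration readings, alarm threshold and linear trend are all assumed for illustration), historical sensor data recorded by the twin can be extrapolated to estimate when an alarm level will be crossed:

import numpy as np

# Hypothetical hourly vibration readings (mm/s) from a pump's digital twin.
history = np.array([2.1, 2.3, 2.2, 2.6, 2.8, 3.1, 3.3, 3.6])
hours = np.arange(len(history))

slope, intercept = np.polyfit(hours, history, 1)   # fit a linear trend
FAILURE_THRESHOLD = 5.0                            # assumed alarm level

# Extrapolate the trend to estimate when the threshold will be crossed.
hours_to_failure = (FAILURE_THRESHOLD - intercept) / slope - hours[-1]
print(f"Projected threshold crossing in ~{hours_to_failure:.1f} hours")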

2.4 Limitations

The level of digitalization required in businesses to enable these new Industry 4.0 technologies is high. DT and BC rely heavily on IoT, sensors, machine learning and 5G to capture, transfer and analyze data; without any of these, the process would not be possible. Heavy investment in these systems would be required to obtain the benefits Nielsen et al. (2020).
Since a high level of connectivity is required between multiple sensors and systems to measure, analyze and display data about the physical asset in real time, a question arises: how many sensors are required to get a complete evaluation of the object? There is no simple answer. The number of sensors will vary depending on the industry. A very suitable approach is to track only the key inputs that are needed to meet your objective Aaron Parrott and Warshaw (2020).
This also leads us to the requirements for data transmission. Linked to the level of digitalization, the multiple sensors of every DT created demand a lot of processing, storage and transmission capacity; until those tools reach higher performance, there may be a bottleneck in the transmission process Tao et al. (2020).
Further, the distrust between different companies in the same chain is a boundary that needs to be addressed Tao et al. (2020); but with BC technology in DT, a solution to this behavior is available, since BC is immutable and secure Hasan et al. (2020). And finally, BC technology must be standardized across all

industries in order to create a successful connection of all DTs and systems. Since BC is still being developed and formed, a common standard is still pending Tao et al. (2020).

3 Conclusion
Much of BC technology is yet to be investigated and properly discussed. There are many applications that are still in early stages of development and, in some cases, far from industrial operation Sokolov and Kolosov (2021). For that same reason, a fully proven system or process that involves a DT with BC technology may not be available soon, but there is a lot of potential behind these technologies. Imagine a whole supply chain industry connected: providers, manufacturers, and customers, where each product has its own DT that provides immutable information and lets customers validate its genuine origin while companies deliver a great service level. This could be possible with both technologies working together.
Closest to that level, the aircraft industry has long used DT; it is a high-tech sector keen on tracking all components of an aircraft throughout their lifetime Mandolla et al. (2019). For this reason, it may be a benchmark for every other industry looking for best practices with these new tools.

References
Aaron Parrott, B.U., Warshaw, L.: Digital twins bridging the physical and digital
Deloitte (2020)
Borowski, P.F.: Digitization, digital twins, blockchain, and Industry 4.0 as elements of
management process in enterprises in the energy sector. Energies 14(7), 1885 (2021)
Boschert, S., Rosen, R.: Digital twin—the simulation aspect. In: Hehenberger, P.,
Bradley, D. (eds.) Mechatronic Futures, pp. 59–74. Springer, Cham (2016). https://
doi.org/10.1007/978-3-319-32156-1 5
Dietz, M., Pernul, G.: Digital twin: empowering enterprises towards a system-of-systems
approach. Bus. Inf. Syst. Eng. 62(2), 179–184 (2020)
Felipe Bustamante, J.H., Dekhne, A., Singh, V.: Improving Warehouse Operations-
Digitally. Mckinsey (2020)
Haddara, M., Norveel, J., Langseth, M.: Enterprise systems and blockchain technology:
the dormant potentials. Procedia Comput. Sci. 181, 562–571 (2021)
Hasan, H.R., et al.: A blockchain-based approach for the creation of digital twins. IEEE
Access 8, 34113–34126 (2020)
Heber, D., Groll, M., et al.: Towards a digital twin: how the blockchain can foster
E/E-traceability in consideration of model-based systems engineering. In: DS 87-3
Proceedings of the 21st International Conference on Engineering Design (ICED 17),
Product, Services and Systems Design, Vancouver, Canada, 21–25 August 2017, vol.
3, pp. 321–330 (2017)
Hemdan, E.E.-D., Mahmoud, A.S.A.: BlockTwins: a blockchain-based digital twins
framework. In: Choudhury, T., Khanna, A., Toe, T.T., Khurana, M., Gia Nhu,
N. (eds.) Blockchain Applications in IoT Ecosystem. EICC, pp. 177–186. Springer,
Cham (2021). https://doi.org/10.1007/978-3-030-65691-1 12

Huang, S., Wang, G., Yan, Y., Fang, X.: Blockchain-based data management for digital
twin of product. J. Manuf. Syst. 54, 361–371 (2020)
Mandolla, C., Petruzzelli, A.M., Percoco, G., Urbinati, A.: Building a digital twin for
additive manufacturing through the exploitation of blockchain: a case analysis of the
aircraft industry. Comput. Ind. 109, 134–152 (2019)
Nielsen, C.P., da Silva, E.R., Yu, F.: Digital twins and blockchain-proof of concept.
Procedia CIRP 93, 251–255 (2020)
Parikh, T.: The ERP of the future: blockchain of things. Int. J. Sci. Res. Sci. Eng.
Technol. 4(1), 1341–1348 (2018)
Putz, B., Dietz, M., Empl, P., Pernul, G.: EtherTwin: blockchain-based secure digital
twin information management. Inf. Process. Manag. 58(1), 102425 (2021)
Raj, P.: Empowering digital twins with blockchain. Adv. Comput. 121, 267 (2021)
Sokolov, B., Kolosov, A.: Blockchain technology as a platform for integrating corporate
systems. Autom. Control Comput. Sci. 55(3), 234–242 (2021)
Tao, F., et al.: Digital twin and blockchain enhanced smart manufacturing service
collaboration and management. J. Manuf. Syst. (2020)
Wang, W., Wang, J., Tian, J., Lu, J., Xiong, R.: Application of digital twin in smart
battery management systems. Chin. J. Mech. Eng. 34(1), 1–19 (2021)
Zhong, S., Huang, X.: Special Focus on Security and Privacy in Blockchain-Based
Applications. Science China Press (2020)
Detection of Malaria Disease Using Image
Processing and Machine Learning

Md. Maruf Hasan(B) , Sabiha Islam , Ashim Dey(B) , Annesha Das ,


and Sharmistha Chanda Tista

Computer Science and Engineering, Chittagong University of Engineering and


Technology, Chittagong 4349, Bangladesh
{u1604089,u1604070}@student.cuet.ac.bd,
{ashim,annesha,tista_chanda}@cuet.ac.bd

Abstract. Malaria is a contagious disease that claims millions of lives


each year. A standard laboratory malaria diagnosis requires a careful
study of both healthy and infected red blood cells. Malaria can be diagnosed by looking at a drop of the patient's blood, spread on a slide as a blood smear, under a microscope. The quality of the blood smear also influences the accuracy and correctness of malaria classification and detection, which results in a large number of inevitable errors that are not acceptable.
a computer-aided method for the automatic detection of malaria par-
asites using image processing and machine learning techniques. Unin-
fected or parasitized blood cells have been classified using handcrafted
features extracted from red blood cell images. We have implemented
Adaboost, K-Nearest Neighbor, Decision Tree, Random Forest, Support
Vector Machine and Multinomial Naive Bayes machine learning models
on a dataset of 27,558 cell images. Among these algorithms, Adaboost,
Random Forest, Support Vector Machine, and Multinomial Naive Bayes
achieved an accuracy of about 91%. Furthermore, the ROC curve demon-
strates that the Random Forest classification model is the best. We hope
that by decreasing the requirement for human intervention throughout
the detection process, this approach can greatly improve the efficiency
of malaria disease detection.

Keywords: Malaria disease · Blood smear images · Image


processing · Machine learning · Computer-aided diagnosis

1 Introduction
Malaria has become one of the severe infectious diseases for humankind. The bite
of Anopheles mosquitoes is the main reason for transmitting this disease. Accord-
ing to Wikipedia, out of 400 species, only 30 species of Anopheles mosquitoes
are malaria vectors. Nowadays, it is a serious public health issue around the
globe, particularly in third-world countries. As per WHO (World Health Orga-
nization), 1.5 billion malaria cases have been averted since 2000, but 409,000 people

died of malaria in 2019 [1]. The transmission of the malaria virus depends on cli-
mate conditions. Especially during the rainy season, this disease spreads rapidly
because this is the breeding season for the Anopheles mosquitoes. It grows more
intense when temperature rises to the point that a mosquito’s life span can be
extended. Regarding the temperature issue, in many tropical areas such as Latin
America, Asia, and also Africa, the malaria disease spreading rate is around 90%.
According to WHO, in 2019, about 50% of the entire world’s population was in
danger of malaria. Malaria is the leading cause of death in Sub-Saharan Africa.
Western Pacific, the Eastern Mediterranean, the South-East Asia, and the Amer-
icas have all been recognized as high-risk zones by the WHO [2]. Most of the
time, malaria can be predominant in remote areas where it is hard to find proper
medical treatment. It is critical to detect malaria disease early and administer
appropriate treatment; otherwise, the disease can be fatal.
Qualified microscopists examine blood smears of infected erythrocytes as one
typical method of detecting malaria. These are traditional diagnostic methods
used in laboratories by microscopists, such as clinical diagnosis. Microscopic diag-
noses are the most widely used malaria diagnosis procedures, taking only 15 min
to complete. But the efficiency and accuracy of these methods depend on the degree of human proficiency, which is challenging to find most of the time; otherwise, accuracy fluctuates. Polymerase Chain Reaction (PCR) is the most
sensitive and specific approach to recognize malaria parasites and is more typ-
ical for species identification [3]. Microscopists use an alternative PCR method
for malaria diagnosis that allows sensitive and specific detection of Plasmod-
ium species DNA from peripheral blood. Rapid Diagnostic Test (RDT) is another diagnostic method; it provides reliable detection of malaria infections in distant locations with limited access to high-quality microscopy services [4]. This method is unsuccessful in some cases because effective results
depend on the experience and knowledge of microscopists, and also, human error
is inevitable. If there were more efficient automated diagnostic methods avail-
able for malaria detection, then this disease could easily be controlled. Recently, many automated machine learning and deep learning approaches have emerged to detect this disease, which are claimed to be more efficient than conventional approaches [5–9].
In this work, we have used machine learning algorithms with automatic image
recognition technologies for detecting parasite-infected red blood cells on stan-
dard microscope slide images. We have used the image smoothing technique,
gray scale conversion and feature extraction. The main objectives of our work
are:

– To locate region of interest and extract key features from standard micro-
scopic images of red blood cells using image processing techniques.
– To train various machine learning models using the extracted features for
classifying healthy and parasitized red blood cells.
– To find the most suitable approach based on different evaluation metrics for
detecting malaria disease.

The rest of the paper is arranged as follows: Sect. 2 presents related works we
have investigated. Our methodology is illustrated in Sect. 3. Section 4 exhibits
the obtained results in detail. In the end, Sect. 5 concludes the paper.

2 Related Work
Nowadays, malaria has become a fatal life-threatening disease, causing deep
research interest among scientists all over the world. Different techniques, meth-
ods, and algorithms have been used to detect parasitic blood cells in recent
times.
In the domain of machine learning, mostly handcrafted features are used for decision making. Previously, feature extraction depended on morphological factors [10], and classification was performed with Support Vector Machine (SVM) and Principal Component Analysis (PCA).
In disease recognition studies, Convolutional Neural Networks (CNN) gained
stimulating results in terms of efficiency and accuracy [5]. In the advanced
method, it is found that CNN is much more effective than the SVM classi-
fier method for the purpose of image featuring [6]. In [7], to extract features
of the optimal layer of a pretrained model, the 16-layered CNN model got a
detection accuracy of 97.37%, which is claimed to be superior to another transfer learning model with an accuracy of 91.99%. The CNN model was also explored
for extracting features from the 96 × 96 resolution cell image data in [8]. Among
the CNN architectures, the GoogleNet, ResNet, and VGGNet models showed an
accuracy rate in the range of 90% to 96%. They used Contrast Limited Adaptive
Histogram Equalization (CLAHE) for pre-processing the images to enhance the
quality. In [9], they have introduced the Multi-Magnification Deep Residual Net-
work, an enhanced deep learning approach for the categorization of microscopic
blood smear photos. They have handled the problem of vanishing gradients,
degradation, low-quality images by combining batch normalization and individ-
ual residual units.
There are multiple image pre-processing techniques, for instance, image enhancement and feature extraction, that can be used. In [11], images were converted into grayscale, and then Gray Level Co-occurrence Matrix (GLCM), Histogram of Oriented Gradients (HOG), and Local Binary Pattern (LBP) were applied for feature extraction. By using these pre-processing methods, differ-
ent machine learning algorithms had the highest accuracy of 97.93% with the
use of the Support Vector classification model. In [12], they have used differ-
ent machine learning algorithms such as Cubic SVM, Linear SVM, and Cosine
KNN, but Cubic SVM got the highest accuracy of 86.1% among them. They
have tested only 110 thin films for their system.
To choose a suitable and highly precise model for detecting the malaria par-
asite from a microscopic blood smear, a deep learning autoencoder achieved an accuracy of 99.23% while requiring only about 4600 flops per image [2]. Precisely, this model gave an accuracy of 99.51% with 28 × 28 images, whereas 32 × 32 images gave an accuracy of 99.22%. They sacrificed a small amount of accuracy, only

0.0029, to obtain a slightly higher image resolution quality for sensitive, specific,
and precise performance on a smartphone, as well as a low-cost phone and web
application for portable malaria diagnosis.

3 Methodology

First, we have identified a series of steps and designed a methodology to achieve


our goal. Our overall methodology is represented in Fig. 1. A publicly available
dataset was used in this work. The techniques for obtaining data, preprocessing
and model training are covered in the following subsections.

Fig. 1. Block diagram of our methodology.

3.1 Dataset Description

The first step is to collect images of blood smears from malaria patients. We col-
lected the dataset from Kaggle which is publicly available [13]. This dataset has
27,558 blood cell images which are divided into two classes: cells infected with

malaria, with 13,779 images, and cells that are not infected with malaria, also with 13,779 images. The original source of this dataset is Chittagong
Medical College Hospital, Bangladesh [14]. Thin blood sample slides were collected by photographing 150 P. falciparum-infected (commonly known as malaria-infected) and 50 healthy patients. Figure 2 shows some sample data
from the dataset.

Fig. 2. Sample images (a) Uninfected and (b) Parasitized.

3.2 Data Preprocessing


Transforming raw data before applying a machine learning algorithm is called
preprocessing. Preprocessing data is an important phase in ML as the quality of
data as well as functional details can be retrieved from it, which greatly affects
the performance and correctness of a model. Image preprocessing begins with
the input of an image and then performs some operations on that image, such
as image sharpening, image filtering, and image enhancement. Initially, we have
used the original images as an input, as shown in Fig. 3 and resized them to
120 × 120. Images can be smoothened by different blurring techniques such as
Averaging, Gaussian Blurring, Median Blurring, Bilateral Filtering provided by
OpenCV. Blurring techniques are beneficial in removing noise from images. Here,
smoothing is accomplished using the Gaussian blur technique, as illustrated in
Fig. 3. We have used Gaussian blurring, which is a very effective tool for removing
Gaussian noise from images. We have used OpenCV and Python to convert
images into grayscale images after smoothing them. We are more interested in
the patterns of these images because there isn’t much information in color as a
whole.
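A minimal OpenCV sketch of this preprocessing pipeline is given below. The (5, 5) Gaussian kernel size and the helper name preprocess_cell_image are assumptions for illustration, since the paper does not specify them.

import cv2

def preprocess_cell_image(path):
    """Resize, smooth, and grayscale one blood-cell image (illustrative sketch)."""
    image = cv2.imread(path)                       # load the original BGR image
    image = cv2.resize(image, (120, 120))          # resize to 120 x 120 as described above
    blurred = cv2.GaussianBlur(image, (5, 5), 0)   # Gaussian blur removes Gaussian noise
    # Grayscale conversion: the patterns matter more than the color information
    return cv2.cvtColor(blurred, cv2.COLOR_BGR2GRAY)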

3.3 Feature Extraction


In this step, we have identified our region of interest from the preprocessed
images. To locate the infected areas in these images, we have attempted to detect
all contours. Simply put, a contour is a curve joining all continuous points (along the boundary) that have a similar intensity or color. Contours are an

Fig. 3. Data processing steps.

effective tool for analysing shape as well as for object detection and recognition.
For our work, features are extracted in this step by obtaining the five largest
contour areas or bounded regions. We obtained the highest accuracy with the five largest contour areas; considering fewer than five reduced accuracy, while considering more than five left it unchanged. For uninfected images, 12,544 out of 13,779 images have 1 contour area, and only 273 images have 5 contour areas. For parasitized images, out of 13,779 images, only 1585 images have 1 contour area and 1585 images have 5 contour areas.
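The contour-based feature extraction can be sketched with OpenCV as follows; the binary threshold value of 127 and the helper name contour_area_features are assumptions, as the paper does not state its exact thresholding step.

import cv2
import numpy as np

def contour_area_features(gray, k=5):
    """Return the areas of the k largest contours as a feature vector (sketch)."""
    # The threshold value 127 is an assumed choice, not taken from the paper
    _, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    areas = sorted((cv2.contourArea(c) for c in contours), reverse=True)[:k]
    # Pad with zeros when an image has fewer than k contours
    return np.array(areas + [0.0] * (k - len(areas)))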

3.4 Model Training

To detect uninfected and parasitized blood smears, six classifiers have been
selected for training. They are AdaBoost (AD), K-Nearest Neighbor (KNN),
Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and
Multinomial Naive Bayes (MNB).
To train and evaluate our model, we have used 70% of the images for training
and 30% for testing. To find out which model is better for detecting malaria
disease, the suggested technique’s performance is evaluated using graphical and
statistical indicators, including the confusion matrix, accuracy, F1-score, recall,
precision, and ROC curve. The confusion matrix generates an array containing
the number of true positives (TP), false positives (FP), false negatives (FN),
and true negatives (TN).
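A condensed scikit-learn sketch of this training and evaluation step is shown below; the random feature matrix merely stands in for the extracted contour-area features, and only three of the six classifiers are included to keep the example short.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix

rng = np.random.default_rng(0)
X = rng.random((300, 5))        # placeholder for the five contour-area features
y = rng.integers(0, 2, 300)     # placeholder labels: 0 = uninfected, 1 = parasitized

# 70%/30% train/test split, as described above
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)

for name, model in [("AD", AdaBoostClassifier()), ("RF", RandomForestClassifier()), ("SVM", SVC())]:
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(name, confusion_matrix(y_test, y_pred), classification_report(y_test, y_pred), sep="\n")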

– Accuracy: Accuracy estimates the proportion of correctly predicted samples among all samples, regardless of whether the sample is positive or negative, as illustrated in the given formula.

  Accuracy = (TP + TN) / (TP + TN + FP + FN)    (1)
– Precision: Precision is defined as the ratio of correctly predicted positive samples to all predicted positive samples, as illustrated in the given formula.

  Precision = TP / (TP + FP)    (2)
– Recall: Recall is defined as the ratio of correctly predicted positive samples to all actual positive samples, as illustrated in the given formula.

  Recall = TP / (TP + FN)    (3)
– F1 score: The F1 metric describes the classification performance of the system. As illustrated in the given formula, it is calculated from the recall and precision rates.

  F1 = (2 · Precision · Recall) / (Precision + Recall) = 2TP / (2TP + FP + FN)    (4)

4 Result Analysis

After following the steps mentioned earlier in preprocessing and feature extrac-
tion, the classifiers are trained using the Scikit-learn library. The performance
of these classifiers is compared as shown in Table 1. The overall classification
performance varies between 84% and 91%. According to the classification report
of Table 1, the performance of the SVM, AD, RF, and MNB is slightly better
in terms of test accuracy and classification report. These classifiers achieved

Fig. 4. Confusion matrices.

an average accuracy of 90.63%. Figure 4 shows the confusion matrices of the


implemented classifiers. We can see that SVM can predict 3733 images correctly
as parasitized and 3703 images correctly as uninfected, AD can predict 3700
images correctly as parasitized and 3734 images correctly as uninfected, RF can
predict 3735 images correctly as parasitized and 3694 images correctly as uninfected, and MNB can
predict 3692 images correctly as parasitized and 3713 images correctly as unin-
fected. Then we explored the stacking ensemble technique by combining the best
performing models, but in Table 1 we can see that the test accuracy is 90.71%,
which is lower than the test accuracy of the RF classifier.
To select the best model among the four models with similar accuracy, we
have further investigated the AUC-ROC curve as shown in Fig. 5. The objective
of the AUC-ROC curve is to present the model’s overall detection rate. The
horizontal line in the diagram indicates the model’s false-positive rate, while the
vertical line indicates the model’s true-positive rate. We can conclude that the
performance of the Random Forest Classifier is noticeably superior in terms of
AUC as measured by the ROC curve.
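Continuing the training sketch above, the ROC comparison could be reproduced along these lines with scikit-learn's roc_curve and auc functions.

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import auc, roc_curve

# Fit RF and score the held-out set with positive-class probabilities
rf = RandomForestClassifier().fit(X_train, y_train)
fpr, tpr, _ = roc_curve(y_test, rf.predict_proba(X_test)[:, 1])
print("RF AUC:", auc(fpr, tpr))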

Fig. 5. ROC curve.

Table 1. Classification report in weighted average.

Model Accuracy Precision Recall F1-Score
DT 83.62 83.63 83.63 83.62
AD 90.59 90.57 90.64 90.58
KNN 88.05 88.04 88.08 88.04
RF 90.76 90.75 90.77 90.76
MNB 90.54 90.53 90.58 90.54
SVM 90.64 90.63 90.65 90.64
Ensemble (AD+RF+SVM+MNB) 90.71 90.70 90.73 90.71

5 Conclusion
Malaria is a contagious mosquito-borne disease and diagnosis of this disease
requires thorough and careful examination of red blood smears. This diagnosis
procedure is not only time-consuming but also its accuracy relies on the expertise
of pathologists. Nowadays, machine learning has become a popular strategy for
handling the most complicated real-world issues. In this work, we have utilized
machine learning along with image processing for reliable diagnosis of malaria
disease. First, handcrafted features were extracted by identifying region of inter-
est from a dataset of 27,558 microscopic images. For this purpose, five largest
contours have been considered from the preprocessed images. Then, six machine
learning models along with an ensemble model were trained using the extracted
features. We successfully classified parasitized and healthy blood smear images with the highest accuracy of about 91%. In
future, we aim to incorporate deep learning approaches in this work for more
accurate analysis and classification of red blood smear images.

References
1. WHO: World Malaria Report 2020. World Health Organization (2020). https://www.who.int/teams/global-malaria-programme/reports/world-malaria-report-2020. Accessed 23 Oct 2021
2. Fuhad, K.M.F., Tuba, J.F., Sarker, M.R.A., Momen, S., Mohammed, N., Rahman,
T.: Deep learning based automatic malaria parasite detection from blood smear
and its smartphone based application. Diagnostics 10(5) (2020). https://www.
mdpi.com/2075-4418/10/5/329
3. Hänscheid, T., Grobusch, M.P.: How useful is PCR in the diagnosis of malaria?
Trends Parasitol. 18(9), 395–398 (2002)
4. Wongsrichanalai, C., Barcus, M., Sinuon, M., Sutamihardja, A., Wernsdorfer, W.:
A review of malaria diagnostic tools: microscopy and rapid diagnostic test (RDT).
Am. J. Trop. Med. Hyg. 77, 119–27 (2008)
5. Khan, S., Islam, N., Jan, Z., Ud Din, I., Rodrigues, J.J.P.C.: A novel deep learning
based framework for the detection and classification of breast cancer using transfer
learning. Pattern Recogn. Lett. 125, 1–6 (2019). https://www.sciencedirect.com/
science/article/pii/S0167865519301059
6. Lecun, Y., Bengio, Y.: Convolutional networks for images, speech, and time-series
(1995)
7. Liang, Z., et al.: CNN-based image analysis for malaria diagnosis. In: 2016 IEEE
International Conference on Bioinformatics and Biomedicine (BIBM), pp. 493–496
(2016)
8. Militante, S.V.: Malaria disease recognition through adaptive deep learning models
of convolutional neural network. In: 2019 IEEE 6th International Conference on
Engineering Technologies and Applied Sciences (ICETAS), pp. 1–6 (2019)
9. Pattanaik, P., Mittal, M., Khan, M.Z., Panda, S.: Malaria detection using deep
residual networks with mobile microscopy. J. King Saud Univ. Comput. Inf. Sci.
(2020). https://www.sciencedirect.com/science/article/pii/S1319157820304171
10. Linder, N., et al.: A malaria diagnostic tool based on computer vision screening and
visualization of plasmodium falciparum candidate areas in digitized blood smears.
PLOS ONE 9(8), 1–12 (2014). https://doi.org/10.1371/journal.pone.0104855
11. Kumari, U., Memon, M., Narejo, S., Afzal, M.: Malaria disease detection using
machine learning (2021). https://www.researchgate.net/publication/348408910_Malaria_Disease_Detection_Using_Machine_Learning
12. Kumari, U., Memon, M., Narejo, S., Afzal, M.: Malaria detection using image
processing and machine learning. IJERT NTASU-2020, 09(03) (2021)
13. Arunava: Malaria cell images dataset. https://www.kaggle.com/iarunava/cell-images-for-detecting-malaria. Accessed 23 Oct 2021
14. Rajaraman, S., et al.: Pre-trained convolutional neural networks as feature extrac-
tors toward improved malaria parasite detection in thin blood smear images. PeerJ
6, e4568 (2018)
Fake News Detection of COVID-19 Using
Machine Learning Techniques

Promila Ghosh1 , M. Raihan1(B) , Md. Mehedi Hassan1 , Laboni Akter2 ,


Sadika Zaman1 , and Md. Abdul Awal3
1
North Western University, 9100 Khulna, Bangladesh
me@promila.info, raihan1146@cseku.ac.bd, mehedihassan@ieee.org
2
Khulna University of Engineering and Technology, 9203 Khulna, Bangladesh
3
Khulna University, 9208 Khulna, Bangladesh
m.awal@ece.ku.ac.bd

Abstract. COVID-19, or coronavirus, has become one of the most common terms of recent times. The SARS-CoV-2 virus caused a pandemic of respiratory illness named COVID-19. The coronavirus spreads through liquid droplets and virus particles released into the air by an infected person's breathing, coughing, or sneezing. This pandemic has become a grave death threat to people, including children. Unfortunately, some corrupt individuals spread false or fake news to disrupt the social balance, and because of this misguidance, numerous people have been misled about taking proper care. To address this issue, we have analyzed several machine learning techniques; among them, the ensemble method Random Forest achieved the best accuracy of 90%. Naive Bayes achieved 85%, and another ensemble method combining Naive Bayes with Support Vector Machine (SVM) achieved 88%.

Keywords: Coronavirus · Fake news detection · Ensemble learning ·


Random forest · Naive Bayes

A huge number of people have lost their lives, good health, and capital in the COVID-19 pandemic. The current COVID-19 outbreak was announced by the World Health Organization (WHO) as a worldwide public health emergency. The severity of this viral illness is reflected in worldwide figures of 2,268,011 positive cases (through 18 April 2020) and 155,185 reported deaths [1]. Risk communication was frequently inadequate during the COVID-19 pandemic; from this situation, "fake news" has spread, along with a lot of confusion and inconvenience. Spreading fake news on social media may have a severe impact, particularly on the political, reputational, and financial sectors, as well as on human society. A robust news verification infrastructure on social networks is therefore crucial for automated identification of false news [2]. The authenticity of news cannot be judged based on its substance alone.


The social characteristics of news must also be assessed. In this article, we have selected Naive Bayes and voting-based ensemble machine learning methods. Ensemble learning is a powerful approach to improve model accuracy [3]. We have implemented COVID-19 fake news classification by adopting Naive Bayes, a Naive Bayes with Support Vector Machine ensemble system, and Random Forest machine learning methods on a merged dataset with multiple preprocessing techniques.
The rest of the paper is organised as follows: related works are covered in Sect. 1 and the methodology in Sect. 2, where the analysis is described together with the accuracy of each classifier algorithm. In Sect. 3 the experimental results of this assessment are discussed. Finally, the work closes with the conclusion in Sect. 4.

1 Related Works
Hlaing et al. [2] presented a multidimensional fake news dataset. In this study,
77.81% news was categorized as true, 12.87% as false, 3.99% as non-factual data,
and the rest as primarily fake. Uppal et al. [4] applied two types of models, a news content model and a social context model, with a sample size of 6,586, and obtained an F1 of 0.76 and an accuracy of 74.62%. In [5], the Twitter PHEME dataset was used with two classes, rumours and non-rumours, adopting GRU, LSTM, Bi-RNN, and CNN methods. All the NN models were implemented with Keras (Conv1D, LSTM, Bi-LSTM, GRU, and Bi-GRU), and the best micro-average F1 value achieved was 0.564 by Bi-GRU [5]. Benamira et al. [6] pro-
posed a semi-supervised graph-based model for false news identification built on graph neural networks, focusing on content-based approaches and formulating detection as a binary text classification problem. Kaliyar et al. [7]
designed a comprehensive neural deep network that can manage not just news
item content, but the social networking relationships through tensor factoriza-
tion method, reflecting the social context of newspaper articles which contains a
mixture of information from users, groups, and news with 15257 number of users
in BuzzFeed news data. Dong et al. [8] proposed a novel dual-stream self-attentive random forest model. For the Text+Social AttForest-2 technique, ablation investigations on their dataset yielded 84.4%. Ahmad et al. [9] presented
many textual characteristics to distinguish between false and actual contents.
The ensemble learner Random Forest reached an accuracy of 99%. In a survey [14], among all the analyses, Naive Bayes, Random Forest, Decision Tree, Bi-LSTM, etc. performed well on different datasets. A dataset that contained news from
different financial websites was taken for fake financial news detection. They
applied Tree LSTM, SVM, and CNN-LSTM, of which CNN-LSTM obtained the best performance with 92.1% [15]. There are several works on false news detection with good accuracy, but most use public datasets covering different categories of news, not only COVID-19. From reviewing these papers, we derived the concepts and made the algorithm selection based on the performance reported in the former works. In our manuscript, we have analyzed the ensemble techniques. Our

main focus was to classify the COVID-19 fake and real news and tried to achieve
the best performance.

2 Methodology

False news during this pandemic created social disturbances. To counter the disturbance, we collected fake news data from the datasets described below, preprocessed them, and finally applied Naive Bayes, the ensemble method Random Forest, and another ensemble method combining Naive Bayes with SVM. The whole analysis is illustrated in Fig. 1.

Fig. 1. Work-flow of the study: dataset collection → merge datasets → dataset preprocessing → apply classifiers (Naive Bayes, Naive Bayes + SVM, Random Forest) → evaluate results → compare results.

2.1 Dataset Selection

Patwa et al. [10] introduced a COVID-19 fake and real news dataset for which they manually annotated 10,700 social media posts and articles. As shown in Fig. 3, we selected 6,420 samples for our study, consisting of 3,060 fake news items and the rest real news items. Another dataset, consisting of 9,727 fake and 7,171 real news items from different web portals and CBC NEWS, is represented graphically in Fig. 2 [11]. We merged the two datasets, as shown in Fig. 4. Figures 2, 3 and 4 are bar charts of "Value" vs "Count", where column "0" stands for "Fake" news and "1" for "Real" news.

Fig. 2. The visualization of fake and real news of a dataset.

Fig. 3. The visualization of fake and real news of another dataset.

Fig. 4. The visualization of fake and real news of the final dataset.

2.2 Dataset Preprocessing

At the beginning of data preprocessing, we observed the dataset and took the following steps. A word cloud generated from the dataset before preprocessing is displayed in Fig. 5.
Using different Python libraries, we converted the entire text column to lower case with the str.lower() function and removed the punctuation with a user-defined function. As the dataset contained different types of symbols, we removed them with a user-defined function as well. For the stopwords, the Natural Language Toolkit (NLTK) library has been adopted. A word that can be treated as a single unit is called a lemma. Stemming is an approach that removes the suffix from a word, reducing it to its lemma or root word. For example, "Claiming" is a word with the suffix "ing"; if we remove "ing", we get the root word "Claim". The PorterStemmer() function from NLTK has been applied for stemming. On final observation, there were some emojis and URLs; the presence of URLs is clearly visible in the word cloud of Fig. 5. With a user-defined function, we removed the URLs and emojis.
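A compact sketch of these cleaning steps is given below; clean_text is an illustrative helper name, and the ASCII-encoding trick used to strip emojis is a simplification of the user-defined functions mentioned above.

import re
import string
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

nltk.download("stopwords", quiet=True)
STOP_WORDS = set(stopwords.words("english"))
STEMMER = PorterStemmer()
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")

def clean_text(text):
    """Lower-case, drop URLs/emojis/punctuation, remove stopwords, stem (sketch)."""
    text = text.lower()
    text = URL_PATTERN.sub("", text)                                   # remove URLs
    text = text.encode("ascii", "ignore").decode()                     # crude emoji removal
    text = text.translate(str.maketrans("", "", string.punctuation))   # remove punctuation
    return " ".join(STEMMER.stem(w) for w in text.split() if w not in STOP_WORDS)

print(clean_text("Claiming a COVID-19 cure at https://example.com!"))  # -> "claim covid19 cure"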

Fig. 5. The text data generated by Word cloud.

2.3 Naive Bayes (NB)


Naive Bayes (NB) is one of the prominent classifiers in Machine Learning (ML). NB is based on Bayes' theorem, which comes from conditional probability and statistics. For Bayes' theorem, let K be a hypothesis and M an event; the conditional probability P(K|M), given the evidence, is as in Eq. 1.

P(K|M) = P(M|K) · P(K) / P(M)    (1)

NB considers each feature independent even when features are dependent. Given the probabilities of all the independent features, classification is carried out according to the likelihood. For this analysis, we split the dataset into two parts, training and testing data, with 25% testing data and a random seed of 50. Using the LabelEncoder() function, the dependent values have been transformed. Scikit-learn and TensorFlow (TF) are open-source ML libraries [12, 13]. Setting the maximum number of text features to 500, the TfidfVectorizer() function vectorized all the text data into numeric values [13]. Finally, we adopted the Multinomial Naive Bayes model from Scikit-learn, a classifier suitable for text or discrete data [12].
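A minimal sketch of this step follows, using scikit-learn's TfidfVectorizer (the text above attributes the vectorizer to TensorFlow; the scikit-learn equivalent is used here). The four example texts are placeholders standing in for the merged dataset.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import LabelEncoder

texts = ["covid cure found in herbs", "who reports new covid cases",
         "vaccine chips track people", "ministry confirms vaccination drive"]  # placeholder corpus
labels = ["fake", "real", "fake", "real"]                                      # placeholder labels

y = LabelEncoder().fit_transform(labels)                     # encode the dependent values
X = TfidfVectorizer(max_features=500).fit_transform(texts)   # at most 500 TF-IDF features

# 25% test split with seed 50, as described above
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=50)
nb = MultinomialNB().fit(X_train, y_train)
print("NB test accuracy:", nb.score(X_test, y_test))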

2.4 Naive Bayes and Support Vector Machine (NB and SVM)

Ensemble methods are combinations of multiple ML models. An ensemble method performs better classification compared with a single ML model. Voting is one part of the ensemble system. In classification, the voting technique selects the class with the most votes, which is called hard voting; voting by the highest summed probability is called soft voting (Fig. 6).

Fig. 6. Voting ensemble method: the dataset feeds Naive Bayes and Support Vector Machine, whose predictions are combined by voting into the final classification.

The SVM relies on a kernel function of the form shown in Eq. 2:

R(y, y′) = exp(−γ‖y′ − y‖)    (2)


To control the SVM model complexity, degree = 3 has been taken. Finally, we put the two models, NB and SVM, into the VotingClassifier() function from Scikit-learn [12].
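Continuing the sketch from Sect. 2.3, the voting ensemble could look as follows; hard voting is an assumption, since the text describes both voting variants without stating which one was finally used.

from sklearn.ensemble import VotingClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC

# degree=3 follows the complexity setting mentioned above
# (in scikit-learn it only takes effect with a polynomial kernel)
ensemble = VotingClassifier(
    estimators=[("nb", MultinomialNB()), ("svm", SVC(degree=3))],
    voting="hard",   # each model casts one vote for a class
)
ensemble.fit(X_train, y_train)
print("NB+SVM test accuracy:", ensemble.score(X_test, y_test))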

2.5 Random Forest (RF)

The RF model is one of the ensemble algorithms. It works as an ensemble of decision trees with a voting system for classification [2]. In the ML decision tree algorithm, the features act as nodes based on the class labels. RF constructs a decision tree for each random sample and obtains a result from each of them. Finally, the voting method selects the best result, as shown in Fig. 7.
For the RF model in our manuscript, we used CountVectorizer for vectorization. Then we set n_estimators = 10, which determines the number of decision trees, and random_state = 0 in the RandomForestClassifier() model of the Scikit-learn library [12].
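Continuing the same sketch, the RF step with the stated parameters might look like this; CountVectorizer is refit on the raw placeholder texts because count features are used here instead of TF-IDF.

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

X_counts = CountVectorizer().fit_transform(texts)   # bag-of-words counts, as described above
Xc_train, Xc_test, yc_train, yc_test = train_test_split(
    X_counts, y, test_size=0.25, random_state=50)

# n_estimators=10 decision trees and random_state=0, per the text above
rf = RandomForestClassifier(n_estimators=10, random_state=0).fit(Xc_train, yc_train)
print("RF test accuracy:", rf.score(Xc_test, yc_test))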

Fig. 7. Random forest classification: random sample selections 1…n each train a decision tree, and the trees' outputs are combined by voting into the final classification.

3 Experimental Outcomes and Discussions

The complete results analysis of our assessment using the previously described methods is elaborated as follows.

Table 1. The confusion matrix of NB

TN = 2766 FP = 424
FN = 461 TP = 2179

In the classification results, the True Positive (TP) stores the correctly classi-
fied positive results and similarly True Negative (TN) stores the negative results.
On the other hand, the False Positive (FP) takes the incorrectly predicted or
classified positive results and the False Negative (FN) takes the incorrectly classi-
fied Negative instances. A confusion matrix is a table with 2 types of dimensions,
“The Actual” and “The Predicted” and they have - True Positives (TP), True
Negatives (TN), False Positives (FP), False Negatives (FN). From Table 1, Table
2 and Table 3 we’ve got the confusion matrix of the following models. The con-
fusion matrix helps us to visualize the proper correct or incorrect classification
of the models. Accuracy, Precision, and Recall, along with the F1-score, depend on the measurement of TP, TN, FP and FN.
Accuracy defines the exactness of the models, i.e., how correctly the models work, as in Eq. 3.

Accuracy = (TP + TN) / (TP + FP + FN + TN)    (3)
The accuracy is presented in Table 4: NB got 85%, NB+SVM got 88%, and RF got the best with 90%. But for a better assessment of the models, we need more classification measurements, namely precision, recall and F1-score.
Precision, as in Eq. 4, is the ratio of true positive instances to all predicted positive instances. For the fake news classification, the precision values for NB, NB+SVM and RF are 0.86, 0.85 and 0.88 respectively; for the real news classification they are 0.84, 0.92 and 0.93.

Table 2. The confusion matrix of NB and SVM

TN = 3023 FP = 169
FN = 405 TP = 2233

Table 3. The confusion matrix of RF

TN = 3005 FP = 185
FN = 522 TP = 2118

Table 4. The accuracy of the models

Models Accuracy
NB 85%
NB+SVM 88%
RF 90%

Table 5. The Analyses of “Fake” value

Models Precision Recall F1-score


NB 0.86 0.87 0.86
NB and SVM 0.85 0.94 0.89
RF 0.88 0.95 0.91

Precision = TP / (TP + FP)    (4)
Another ratio, recall, measures how well the ML model identifies the true positives. The recall values for the "fake" class are 0.87, 0.94 and 0.95, and for the "real" class 0.83, 0.80 and 0.85. Eq. 5 gives the recall calculation method.

Recall = TP / (TP + FN)    (5)
The F1 score is the harmonic mean of recall and precision, as in the following Eq. 6.

F1 Score = 2 · (Precision · Recall) / (Precision + Recall)    (6)
For the "fake" class the F1 scores are 0.86, 0.89 and 0.91, while 0.83, 0.86 and 0.89 are obtained for the "real" class. The full view of precision, recall and F1-score is presented in Table 5 and Table 6.

Table 6. The Analyses of “Real” value

Models Precision Recall F1-score


NB 0.84 0.83 0.83
NB and SVM 0.92 0.80 0.86
RF 0.93 0.85 0.89

We merged two datasets here, so there is no former work on this exact dataset. As a fair comparison with previous work is therefore not possible, such a comparison is not included in our study.

4 Conclusion
The performance of the models described above is satisfactory, but this dataset is still limited; to deploy the model in a better manner in the future, we need more data. Reviewing this analysis from different aspects, Random Forest (RF) delivers the best performance compared with the other models. It is clear that RF achieved the best accuracy, 90%, ahead of Naive Bayes (NB) and the ensemble model of Naive Bayes and Support Vector Machine (NB and SVM). Besides the best accuracy, its F1-scores are also decent at 0.89 and 0.91. This model is at the earliest stage of our research, and we are currently working to raise its accuracy.

References
1. Pradhan, D., Biswasroy, P., Kumar Naik, P., Ghosh, G., Rath, G.: A review of
current interventions for COVID-19 prevention. Arch. Med. Res. 51(5), 363–374
(2020). https://doi.org/10.1016/j.arcmed.2020.04.020
2. Hlaing, M., Kham, N.: Defining news authenticity on social media using machine
learning approach. In: 2020 IEEE Conference on Computer Applications (ICCA).
IEEE (2021)
3. Islam, M., Raihan, M., Aktar, N., Alam, M., Ema, R., Islam, T.: Diabetes melli-
tus prediction using different ensemble machine learning approaches. In: 2020 11th
International Conference on Computing, Communication and Networking Tech-
nologies (ICCCNT) (2020). https://doi.org/10.1109/icccnt49239.2020.9225551
4. Uppal, A., Sachdev, V., Sharma, S.: Fake news detection using discourse segment
structure analysis. In: 2020 10th International Conference on Cloud Computing,
Data Science & Engineering (Confluence). IEEE (2020)
5. Kotteti, C., Dong, X., Qian, L.: Rumor detection on time-series of tweets via
deep learning. In: MILCOM 2019–2019 IEEE Military Communications Conference
(MILCOM). IEEE (2019)
6. Benamira, A., Devillers, B., Lesot, E., Ray, A.K., Saadi, M., Malliaros, F.D.: Semi-
supervised learning and graph neural networks for fake news detection. In: 2019
IEEE/ACM International Conference on Advances in Social Networks Analysis
and Mining (ASONAM), pp. 568–569. IEEE, August 2019

7. Kaliyar, R.K., Kumar, P., Kumar, M., Narkhede, M., Namboodiri, S., Mishra,
S.: DeepNet: an efficient neural network for fake news detection using news-user
engagements. In: 2020 5th International Conference on Computing, Communica-
tion and Security (ICCCS), pp. 1–6. IEEE, October 2020
8. Dong, M., Yao, L., Wang, X., Benatallah, B., Zhang, X., Sheng, Q.Z.: Dual-stream
self-attentive random forest for false information detection. In: 2019 International
Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, July 2019
9. Ahmad, I., Yousaf, M., Yousaf, S., Ahmad, M.: Fake news detection using machine
learning ensemble methods. Complexity 2020, 1–11 (2020). https://doi.org/10.
1155/2020/8885861
10. Patwa, P., et al.: Fighting an Infodemic: COVID-19 Fake News Dataset (2020)
11. Rahman, S., Raihan, M., Akter, L., Raihan, M.: Covid-19 news dataset both fake
and real (1.0). Zenodo (2021). https://doi.org/10.5281/zenodo.4722484. Accessed
13 Sept
12. scikit-learn: machine learning in Python - scikit-learn 0.24.2 documentation (2021).
Scikit-learn.org. https://scikit-learn.org/stable/. Accessed 12 Sept 2021
13. TensorFlow (2021). https://www.tensorflow.org/. Accessed 12 Sept 2021
14. Kumar, S., Kumar, S., Yadav, P., Bagri, M.: A survey on analysis of fake news
detection techniques. In: 2021 International Conference on Artificial Intelligence
and Smart Systems (ICAIS). IEEE (2021)
15. Zhi, X., et al.: Financial fake news detection with multi fact CNN-LSTM model.
In: 2021 IEEE 4th International Conference on Electronics Technology (ICET).
IEEE (2021)
Sustainable Modelling, Computing
and Optimization
1D HEC-RAS Modeling Using DEM Extracted
River Geometry - A Case of Purna River;
Navsari City; Gujarat, India

Azazkhan Ibrahimkhan Pathan1(&), P. G. Agnihotri1, D. Kalyan1,


Daryosh Frozan2, Muqadar Salihi1, Shabir Ahmad Zareer1,
D. P. Patel3, M. Arshad1, and S. Joseph4
1
Sardar Vallabhbhai National Institute of Technology,
Surat 395007, Gujarat, India
2
Dr. S. & S. S. Ghandhy College of Engineering and Technology,
Gujarat Technological University, Surat, India
3
Pandit Deendayal Energy University, Gandhinagar 382007, India
4
S.P.B Patel Engineering College, Mehsana, India

Abstract. For any hydraulic modeling, river cross-sections are the main input
parameters to create geometry. The research is intended to utilize the 1D
hydrodynamic flood modeling approach with the RAS Mapper capabilities of HEC-RAS (Hydrological Engineering Center-River Analysis System) on the downstream reach of the Purna River. Earlier, the geometry of the river was digitized with the HEC-GeoRAS extension in ARC-GIS. The present research uses the newly released HEC-RAS version 5.0.4, in which a 30 m resolution Cartosat-1 Digital Elevation Model (DEM), a projection file, and boundary conditions were used as the input dataset, and steady flow analysis has been carried out for
flood modeling. In the present research, river geometry like river centerline,
bank line, flow path line, and cross-section were directly digitized in GIS tools
in HEC-RAS called RAS mapper. The outcomes of the model are useful for
flood disaster authorities to mitigate flood and for forecasting future flooding
scenarios.

Keywords: Flood modeling · RAS mapper · 1D steady flow

1 Introduction

Flooding is certainly regarded as one of the world's most damaging natural disasters. During rainy seasons (June–September), the Himalayan Rivers cause flooding in 80%
of the total flood-affected region in India. In many states of India such as Gujarat,
Maharashtra, Andhra Pradesh, West Bengal, and Orissa extreme flooding is witnessed
mostly annually during the monsoon season, affecting a tremendous loss in properties
and lives. The primary causes of flooding in India are inadequate water systems par-
ticularly in the low land depositional area of the basins, inadequate river carrying
capacity due to sedimentation, and inadequate flood management techniques. To
minimize flood losses, appropriate flood management practices are needed, which in
turn requires space-time flux flow variation in 1-D as well as 2-D. Few researchers have


conducted studies on the hydrodynamic flood modeling with the integration of GIS of
the Indian basin river floods [1, 2].
An effective river flood model includes a proper representation of the river bed and floodplain geometry, with a precise specification of the model input parameters, to precisely forecast the magnitude of flow and the flood water level along the river path [3–5]. At present, software techniques have been created and are now being
modified to extract river geometry features, which are effective for hydrodynamic modeling based on GIS databases (Schwanghart and Kuhn [6]; Tesfa et al. [7]).
Several studies have been conducted which attempt to address bathymetric data
shortages in river flood modeling which highly depend on Digital Elevation Model
(DEM) GIS integration obtained from remote sensing satellites or other data sets
available globally [7–10]. Also, data assimilation techniques are used to recognize synthetic cross-sections similar to the river geometry (Roux and Dartus [11]).
In the present study, one dimensional flood modeling approach has been utilized
with the new version of HEC-RAS, in which RAS mapper has GIS capabilities to
extract the geometry of river. Cartosat-1 DEM is used for the modeling, which is
available freely at ISRO BHUVAN web site. This research demonstrates the utility of
DEM in flood modeling with the integration of GIS. This approach advances one-dimensional hydrodynamic flood modeling in regions where scarcity of collected data is a major issue. The novelty of this research includes the utilization of freely available satellite images for flood assessment [18].

2 Study Area and Data Required

Navsari city is situated on the coastal part of Gujarat near the Arabian Sea. The city is
at 20° 32′ and 21° 05′ north latitude and 72° 42′ and 73° 30′ east longitude. The
topographical area of the city is about 2210.97 km². The study area map is illustrated in
(Fig. 1). Due to heavy precipitation, the water level may rise in the study area and the
surrounding area gets inundated annually in monsoon. There is no setup provided by
the government in this region to reduce the impact of the floods.
The river flow data of the last 20 years are obtained from the Navsari irrigation
department. The two major flooding events which took place in the city of Navsari
were for the years 1968 and 2004. Cartosat-1 DEM is utilized for the extraction of river
geometry which is globally available (www.bhuvan.nrsc.gov.in). The spatial projection
needs to be set in Arc GIS for coordinate systems used in HEC-RAS [1, 12]. The
2004 year flood data is required for validating the model.

3 Method and Material

HEC-RAS mapper is a GIS function capable of collecting GIS information such as the
centerline of the river, bank lines, flow direction lines, and cross-section lines by river digitization. The input parameters for one dimensional flood modeling in the HEC-RAS mapper are presented in the following subsections. The methodology flowchart of the study is presented
in (Fig. 2).

Fig. 1. Location map of the study area

Fig. 2. Methodology flowchart



3.1 River Geometry Extraction


To build the river alignment within the river reach, a light blue line showing the river centerline, flowing from upstream to downstream, is drawn in Fig. 3. To separate the primary river from the left and right floodplain banks, red lines show the river bank lines. To regulate the flow of the river, flow path lines are digitized, also presented in red in Fig. 3. The green lines show the cross-section lines, perpendicular to the river flow, whose elevation data is extracted from the DEM.

Fig. 3. River geometry extraction

3.2 One Dimensional Flood Modeling Using HEC-RAS (Hydrological Engineering Center-River Analysis System)
HEC-RAS is software that describes water hydraulics flowing through common rivers
and other channels. It is a computer-based modeling program for water moving through
open channel systems and calculating profiles of water surfaces. HEC-RAS identifies
specific applications viable in floodplain mitigation measures [13, 14]. Saint Venant’s
equation is utilized in HEC-RAS to solve the energy equation for one-dimensional
hydrodynamic flood modeling [15, 16] expressed as,

Z_2 + Y_2 + (α_2 V_2²)/(2g) = Z_1 + Y_1 + (α_1 V_1²)/(2g) + h_e    (1)

where Y_1, Y_2 indicate the water depths at the cross-sections, Z_1, Z_2 the elevations of the main river channel, α_1, α_2 the velocity weighting coefficients, V_1, V_2 the average velocities, g the acceleration due to gravity, and h_e the energy head loss.
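As a worked illustration of Eq. 1, the Python sketch below solves the energy balance for the upstream depth Y_2 in an idealized rectangular channel; the channel width, elevations, and head loss are invented numbers for demonstration, not data from this study.

from scipy.optimize import brentq

G = 9.81  # acceleration due to gravity (m/s^2)

def upstream_depth(Q, b, z1, y1, z2, he, alpha=1.0):
    """Solve Eq. 1 for the upstream depth Y2 in a rectangular channel (V = Q / (b * Y))."""
    v1 = Q / (b * y1)
    rhs = z1 + y1 + alpha * v1 ** 2 / (2 * G) + he   # downstream energy head plus losses

    def residual(y2):
        v2 = Q / (b * y2)
        return z2 + y2 + alpha * v2 ** 2 / (2 * G) - rhs

    # Bracket the subcritical root between the critical depth and a large depth
    yc = (Q ** 2 / (G * b ** 2)) ** (1.0 / 3.0)
    return brentq(residual, yc, 100.0)

# Illustrative only: the 2004 peak discharge with an assumed 400 m wide channel
print(upstream_depth(Q=8836.0, b=400.0, z1=0.0, y1=11.0, z2=0.2, he=0.05))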

3.3 Execution of 1D Model


Cartosat-1 DEM is downloaded from the ISRO BHUVAN portal for the digitization of
river geometry such as river centerline, river bank lines, flow path lines, and cross-
section lines with the arrangement of the spatial coordinate system in HEC-RAS through
the RAS mapper window along with Google map as shown in (Fig. 4). The extraction of
the cross sections presents the station-elevation data through a geometric data window
as illustrated in (Fig. 5), and (Fig. 6) represents the river geometry in HEC-RAS. The maximum river discharge of the 2004 flood was used as the upstream condition and the normal depth of the Purna river as the downstream condition in HEC-RAS; the rugosity coefficient is taken as 0.035 (Chow and Maidment [16]); agriculture, barren land, built-up urban, forest, and water body classes are taken as land use and land cover; and steady flow analysis is carried out for flood modeling.

Fig. 4. Extraction of River geometry in RAS mapper with google base map

4 Results and Discussion

This study is performed in the lower part of the city. Discharge data for the 2004 flood event are used to simulate steady flow analysis. Due to data scarcity in the study area, and since only one gauging site is available, only the 2004 flood event could be simulated to verify model accuracy. Cross-sections extracted from the DEM provide good results in data-sparse regions, and a flood modeling approach in such regions would be effective during peak flood events (Gichamo et al. [12]). The results of the model for the present study indicate the depth of water at each cross-section.
The discharge was measured from the gauge station near Kurel village about 1.5 km
from the downstream side. The depth of water is measured corresponding to the

Fig. 5. Extracted river geometry in HEC-RAS

discharge of 8836 m³/s for 1D hydrodynamic flood modeling. Results obtained from the simulation indicate that cross-sections 1 and 2 are quite affected during peak discharge, and cross-sections 19 and 20, close to Navsari city, were more affected by the flood event. The simulated results show the water level at cross-section one and cross-section twenty (Fig. 6). The water level in the downstream part of the study area demonstrates that the people around cross-section twenty suffer more during peak discharge, and there were heavy losses of property and lives during the 2004 flood event. Figure 7 shows the simulated one-dimensional flood depth map for the 2004 flood event. 1D hydrodynamic flood modeling can be advantageous for regions where flash floods are a major annual phenomenon [10].

Fig. 6. (a) water level at CS-1; (b) water level at CS-2



Fig. 7. Predicted depth of water for the 2004 years flood event

5 Conclusion

The present study shows the applicability of the HEC-RAS for the river geometry
extraction with the application of geospatial techniques (HEC-RAS mapper function).
A 1D hydrodynamic flood modeling approach was presented using Cartosat-1 DEM on
the Purna River, Navsari, Gujarat, India. The new version of HEC RAS version 5 was
utilized in the present study for GIS applications in flood modeling. The river geometry, including the river centerline, bank lines, flow path lines, and cross-section cut lines, was digitized in RAS Mapper tools without using ARC GIS in the present study. The
Validation of the model is being carried out by comparing the observed water depth
with the simulated water depth at the location of the gauging site. The output of the model is promising and demonstrates the strong potential of the suggested method in areas of data scarcity. The applicability of open-source datasets would make this an effective worldwide approach in flood modelling (Table 1) [17].

Table 1. Differences between observed and simulated water depth at Gauge station [17]
Satellite Years Observed depth of water (m) Simulated depth of water (m) Difference (m)
Cartosat-1 2001 10.9 11.3 0.01
2002 10.23 12 0.27
2003 10.32 9.92 −0.22
2004 11.64 13.1 1.6
2005 13.3 12.32 0.92
2006 10.5 9.5 0.37
2007 10.96 11.69 −1.1
2008 9.75 7.89 0.02
2009 5.61 6.3 0.13
2010 10.9 11.3 −0.06
2011 10.23 12 0.02
2012 13.8 13.79 0.01

References
1. Khan, A., Pathan, I., Agnihotri, P.G.: 2-D Unsteady Flow Modelling and Inundation
Mapping for Lower Region of Purna Basin Using HEC-RAS (2020). Accessed 06 May 2020
2. Vijay, R., Sargoankar, A., Gupta, A.: Hydrodynamic simulation of river Yamuna for
riverbed assessment: a case study of Delhi region. Environ. Monit. Assess. 130(1–3), 381–
387 (2007). https://doi.org/10.1007/s10661-006-9405-4
3. Kale, V.S.: Flood studies in India: a brief review. J. Geol. Soc. India 49, 359–370 (1997)
4. Pathan, A.K.I., Agnihotri, P.G.: 2-D unsteady flow modelling and inundation mapping for
lower region of Purna basin using HEC-RAS. Nat. Environ. Pollut. Technol. 19, 277–285
(2020)
5. Merwade, V., Cook, A., Coonrod, J.: GIS techniques for creating river terrain models for
hydrodynamic modeling and flood inundation mapping. Environ. Model. Softw. 23(10–11),
1300–1311 (2008). https://doi.org/10.1016/j.envsoft.2008.03.005
6. Schwanghart, W., Kuhn, N.J.: TopoToolbox: a set of Matlab functions for topographic
analysis. Environ. Model. 5, 770–781 (2010). Accessed 03 Sept 2020
7. Tesfa, T.K., Tarboton, D.G., Watson, D.W., Schreuders, K.A., Baker, M.E., Wallace, R.M.:
Extraction of hydrological proximity measures from DEMs using parallel processing.
Environ. Model. Softw. 26, 1696–1709 (2011)
8. Abdulkareem, J.H., Pradhan, B., Sulaiman, W.N.A., Jamil, N.R.: Review of studies on
hydrological modelling in Malaysia. Model. Earth Syst. Environ. 4(4), 1577–1605 (2018).
https://doi.org/10.1007/s40808-018-0509-y
9. Pathan, A.I., Agnihotri, P.G.: A combined approach for 1-D hydrodynamic flood modeling
by using Arc-Gis, Hec-Georas, Hec-Ras Interface-a case study on Purna River of Navsari
City Gujarat. IJRTE 8, 1410–1417 (2019)
10. Maharjan, L., Shakya, N.: Comparative study of one dimensional and two dimensional
steady surface flow analysis. J. Adv. Coll. Eng. Manag. 2, 15 (2016). https://doi.org/10.
3126/jacem.v2i0.16095

11. Roux, H., Dartus, D.: Sensitivity analysis and predictive uncertainty using inundation
observations for parameter estimation in open-channel inverse problem. J. Hydraul. Eng.
134, 541–549 (2008). https://doi.org/10.1061/ASCE0733-94292008134:5541
12. Gichamo, T.Z., Popescu, I., Jonoski, A., Solomatine, D.: River cross-section extraction from
the ASTER global DEM for flood modeling. Environ. Model. Softw. 31, 37–46 (2012).
https://doi.org/10.1016/j.envsoft.2011.12.003
13. Ouma, Y.O., Tateishi, R.: Urban flood vulnerability and risk mapping using integrated multi-
parametric AHP and GIS: Methodological overview and case study assessment. Water
(Switzerland) 6(6), 1515–1545 (2014). https://doi.org/10.3390/w6061515
14. Ahmad, H., Akhtar Alam, M., Bhat, S., Ahmad, S.: One dimensional steady flow analysis
using HECRAS – a case of River Jhelum, Jammu and Kashmir. Eur. Sci. J. ESJ 12(32), 340
(2016). https://doi.org/10.19044/esj.2016.v12n32p340
15. Brunner, G.: HEC-RAS River Analysis System. Hydraulic Reference Manual. Version 1.0
(1995). Accessed 07 May 2020
16. Chow, V.T., Maidment, D.R., Larry, W.: Applied Hydrology, International edn. McGraw-
Hill, New York (1988)
17. Pathan, A.I., Agnihotri, P.G.: Application of new HEC-RAS version 5 for 1D hydrodynamic
flood modeling with special reference through geospatial techniques: a case of River Purna at
Navsari, Gujarat, India. Model. Earth Syst. Environ. 7(2), 1133–1144 (2021)
18. Pathan, A.I., Agnihotri, P.G., Patel, D., Prieto, C.: Identifying the efficacy of tidal waves on
flood assessment study—a case of coastal urban flooding. Arab. J. Geosci. 14(20), 1–21
(2021)
A Scatter Search Algorithm
for the Uncapacitated Facility Location
Problem

Telmo Matos(&)

CIICESI, Escola Superior de Tecnologia e Gestão,


Politécnico do Porto, Porto, Portugal
tsm@estg.ipp.pt

Abstract. Facility Location Problems (FLP) are complex combinatorial opti-


mization problems whose general goal is to locate a set of facilities that serve a
particular set of customers with minimum cost. Being NP-Hard problems, using
exact methods to solve large instances of these problems can be seriously
compromised by the high computational times required to obtain the optimal
solution. To overcome this difficulty, a significant number of heuristic algorithms
of various types have been proposed with the aim of finding good quality solu-
tions in reasonable computational times. We propose a Scatter Search approach
to solve effectively the Uncapacitated Facility Location Problem (UFLP). The
algorithm was tested on the standard testbed for the UFLP, obtaining state-of-the-art
results. Comparisons with current best-performing algorithms for the UFLP
show that our algorithm exhibits excellent performance.

Keywords: UFLP · Scatter Search · FLP

1 Introduction

Facility Location Problems are widely studied problems in the literature with several
practical applications, reaching areas such as telecommunications, design of a supply
chain management, transport utilities and water distribution networks. A well-known
variant of this problem is the Uncapacitated Facility Location Problem (UFLP). This
problem can be formulated as:
$$\text{Minimize } \sum_{i=1}^{m} \sum_{j=1}^{n} C_{ij} x_{ij} + \sum_{i=1}^{m} F_i y_i \qquad (1)$$

$$\text{s.t. } \sum_{i=1}^{m} x_{ij} = 1 \quad \forall\, j = 1, \dots, n \qquad (2)$$

$$x_{ij} \le y_i \quad \forall\, j = 1, \dots, n,\; i = 1, \dots, m \qquad (3)$$

$$x_{ij} \ge 0 \quad \forall\, j = 1, \dots, n,\; i = 1, \dots, m \qquad (4)$$


$$y_i \in \{0, 1\} \quad \forall\, i = 1, \dots, m \qquad (5)$$

where m represents the number of possible locations to open a facility and n the
number of customers to be served. $F_i$ indicates the fixed cost for opening a facility at
location i, and $C_{ij}$ represents the unit shipment cost between facility i and customer j.
The continuous variable $x_{ij}$ represents the amount sent from facility i to customer j, and
$y_i$ indicates whether facility i is open or not. The objective is to locate a set of facilities in
such a way that the total sum of the costs for opening those facilities and the
transportation costs for serving all customers is minimized.
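To make the model concrete, the sketch below (Python, with hypothetical toy data rather than an instance from the literature) evaluates objective (1) for a fixed set of open facilities; once the y variables are fixed, the optimal x simply serves each customer from its cheapest open facility.

```python
def uflp_cost(open_facilities, F, C):
    """Objective (1) of the UFLP for a fixed set of open facilities.

    With y fixed, the optimal x serves each customer j from the cheapest
    open facility, so evaluating a solution is a simple scan.
    F : fixed opening cost F[i];  C : unit shipment cost C[i][j].
    """
    opened = list(open_facilities)
    n = len(C[0])
    fixed = sum(F[i] for i in opened)
    service = sum(min(C[i][j] for i in opened) for j in range(n))
    return fixed + service

# hypothetical toy instance: 3 candidate sites, 4 customers
F = [10, 12, 8]
C = [[2, 4, 5, 3],
     [3, 1, 4, 6],
     [6, 5, 2, 2]]
print(uflp_cost([0, 2], F, C))  # 10 + 8 + (2 + 4 + 2 + 2) = 28
```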
The UFLP has been widely studied for the past 50 years, with the development of
both exact and heuristic methods.
Examples include the well-known Tabu Search [1–3], where some algorithms are
quite similar, differing in the use of flexible memory [2] to preserve facility-switching
movements and to intensify the search in a more promising region, gain functions to
measure the attractiveness of a movement [1], and even a procedure to create a starting
solution embedded in a traditional Tabu Search [4, 5].
Recent algorithms use information obtained by combining two or more heuristics
producing good quality results in low computational time. These are called hybrid
algorithms and there are some examples in literature such as the PBS algorithm [6]
(Population-Based Search) proposed by Wayne Pullan (using Genetic Algorithm with a
greedy algorithm) and the H-RW [7] algorithm proposed by Resende and Werneck
(using Genetic Algorithm, Tabu Search and Scatter Search).
A relatively new algorithm, the Monkey Algorithm, was proposed by Atta et al.
[8]. It is a swarm intelligence algorithm and consists of three main processes: an
improvement process (climb), a method to accelerate the search for the optimum
solution (watch-jump) and a perturbation process (somersault) to avoid falling back
into previously known solutions. The authors compare the proposed algorithm with
other recent heuristics (the Firefly Algorithm and the Artificial Bee Colony), achieving
fair computational results on the ORLIB dataset.
Traditional Genetic Algorithms [9], Artificial Neural Networks [10, 11], Lagrangean
Relaxation algorithms [12] and Dual Ascent procedures [13] are further examples of
algorithms proposed to solve the UFLP, which achieved good results on well-known
instances in the literature.
The main contribution of this work is to demonstrate that the proposed SS algo-
rithm is a simple and efficient procedure for solving the UFLP and can be applied to
other Facility Location Problems.
The rest of the paper is organized as follows. The methodology, including the
proposed algorithm for solving the UFLP, is described in Sect. 2. Experimental results
are presented and discussed in Sect. 3. Finally, Sect. 4 completes this paper, showing
the conclusions and future directions of research.
490 T. Matos

2 Scatter Search for the UFLP

The Scatter Search (SS) is an evolutionary metaheuristic proposed by Glover [14].


The SS method combines solution vectors, aiming to capture information that cannot
be obtained separately in the original vectors, and uses auxiliary heuristic approaches to
evaluate the produced combinations and generate new solution vectors. The population
of solutions is always evolving in successive generations, as expected in an evolu-
tionary method.
The solution population is represented by a set of reference solutions whose formation
requires good quality solutions and diversified solutions. The method proceeds
to the combination of solutions generating new ones, through the application of
weighted linear combinations of subsets of solutions that are later treated by auxiliary
heuristic procedures.
The SS is an evolutionary method that uses techniques of intensification, diversi-
fication, and a combination of solutions. In this algorithm, components are combined,
making this algorithm a very robust method with many applications not only in the
FLP but also in other areas of Operational Research.
The proposed algorithm to solve the UFLP problem is given in Fig. 1 and makes
use of the metaheuristic Scatter Search (SS). The procedure of the SS starts with a set of
initial solutions (seeds). Then the algorithm tries to produce a large number of random
solutions with different characteristics from the seeds (diversification generation
method). A local search procedure (improvement method) is applied to each of the
solutions (and seeds) to improve them. These improved solutions (and seeds) form the
population. Then, a small size population (reference set) consisting of elite and
diversified solutions (to force diversity) is obtained (forming the reference set update
method). A subset of solutions is defined (subset generation method) and combined
with solutions (usually in pairs) of that smaller population (combination method),
obtaining new solutions that are improved (improvement method) again. Then, these
solutions will populate the reference set (reference set update method). The process is
repeated until no new solutions are found in the reference set.
The algorithm can be divided into two phases: the initial phase and the scatter
search phase. The overall procedure that encompasses the two phases is the following
(a schematic sketch in code follows the list):

Initial Phase
1. Starts by producing solutions in the solutions generation method.
2. Apply an improvement method to the solutions obtained in 1.
3. Calls the reference set (refset) update method with improved solutions obtained in 2.
4. If the desired quality level is not reached go to 1.
Scatter Search Phase

5. Calls the subset generation method upon the reference set.


6. Obtain new solutions through the combination method.
7. Apply an improvement method to the solutions obtained in 6.
8. Calls the reference set update method.
9. If new solutions are obtained go to 5.
10. Terminate.
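The Python sketch below mirrors these ten steps under stated assumptions: the callbacks diversify, improve and combine (and the seed list) are problem-specific stand-ins rather than the paper's exact routines, and the reference-set sizes are illustrative values.

```python
import random

def scatter_search(seeds, diversify, improve, combine, ref_size=10, div_size=5):
    # -- initial phase (steps 1-4): build an improved population --
    population = [improve(s) for s in seeds]
    while len(population) < 10 * ref_size:
        population.append(improve(diversify()))
    population.sort(key=lambda s: s.cost)          # minimization
    # reference set = elite solutions + diversified solutions
    refset = population[:ref_size] + random.sample(population[ref_size:], div_size)

    # -- scatter search phase (steps 5-9): combine until the refset is stable --
    changed = True
    while changed:
        changed = False
        pairs = [(a, b) for k, a in enumerate(refset) for b in refset[k + 1:]]
        for a, b in pairs:
            trial = improve(combine(a, b))          # steps 6-7
            worst = max(refset, key=lambda s: s.cost)
            if trial.cost < worst.cost and trial not in refset:
                refset[refset.index(worst)] = trial  # step 8: refset update
                changed = True                       # step 9: new solution found
    return min(refset, key=lambda s: s.cost)         # step 10
```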

Fig. 1. Scatter Search framework.

The diversification generation method aims to generate diversified solutions so that


the algorithm can go through several alternative solutions with the objective of finding
a good solution. A pseudo-random method is used to generate the random solutions so
that at each new execution of the algorithm, the random solutions are always the same.
The improvement method will play a crucial role in the algorithm, since the more
robust and efficient the method, the better results will be found later.
The method is based on the Tabu Search approach proposed by L. Michel and
P. V. Hentenryck [1] and will be applied to diversified solutions and solutions obtained
after the solutions combination method.
The method for creating and updating the reference set consists of two subsets. One
subset is made by improved solutions, and another is based on diversified solutions.
Both subsets are based on the reference set (Refset).
Next, the subset generation method is called. This method specifies how subsets are
selected to proceed with the application of the combination method. The main objective
of the method is to find a balance between diversification and intensification, that is, the
choice of the number of diversified solutions and the improved solutions to carry out
the combination of the solutions. If we want to diversify the solutions, we will have to
include more solutions from the reference set of diversified solutions. If we intend to
intensify, we will have to include more from the reference set of improved solutions.
The combination method combines solutions to diversify and intensify the search.
This method has a great influence on the final solution: the more finely tuned the
balance between diversification and intensification, the higher the probability of
finding better solutions. The method combines the solutions based on ratios, so that
each solution has a different weight (this weight is based on the objective value of the
solution). This ratio also allows the application of diversification and intensification
techniques, as illustrated in the sketch below.
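A minimal illustration of such a ratio-based combination for 0/1 location vectors follows; the weighting rule is a plausible stand-in, since the exact formula is not spelled out here.

```python
import random

def combine(a, b, cost_a, cost_b):
    """Hypothetical ratio-based combination of two 0/1 facility vectors.

    For a minimization problem the cheaper parent gets the larger weight,
    so it contributes each disagreeing position with higher probability;
    positions where the parents agree are simply kept.
    """
    w_a = cost_b / (cost_a + cost_b)  # lower cost_a -> larger w_a
    return [ya if (ya == yb or random.random() < w_a) else yb
            for ya, yb in zip(a, b)]
```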

3 Results

The performance of the proposed algorithm was tested on a well-known benchmark,
producing extremely competitive results. The SS algorithm was coded in C, and the
experiments were conducted on an Intel® Pentium® CPU G645 @ 2.90 GHz with 8 GB
RAM, using the gcc compiler (with optimization) under the Ubuntu operating system.
The table below (Table 1) summarizes the instances considered in our computa-
tional experiments. The reference for all instances considered in this paper can be
found in the work of Resende and Werneck [7].

Table 1. List of all instances used to compare the proposed algorithm


Instance name     | #Instances
Orlib             | 15
Galvão Raggi (GR) | 50
FPP11*            | 30
FPP17*            | 30
Bilde Krarup (BK) | 220
PCodes            | 32
Total             | 377
*In comparison results with other algorithms, we group these instances as one.

The proposed algorithm was compared with Pullan's [6] population-based algorithm
(AMD Opteron 252 2.6 GHz, 4 GB), the H-RW [7] hybrid algorithm (SGI Challenge
with 28 196-MHz MIPS R10000 processors; only one processor was used) and Michel
and Van Hentenryck's [1] tabu search algorithm (850 MHz Intel Pentium III running
Linux), which we refer to as TS. Some instances could not be compared due to a lack
of comparison results.
The table below (Table 2) shows the results obtained with SS, Pullan, H-RW and
TS for common instances, along with the average computational time for each of the
different algorithms. As all authors use different machines to process their
algorithms, it is not possible to make a direct comparison regarding this parameter. In
the following table, the column DEV presents the percentage deviation (gap) computed
as $\mathrm{GAP} = \frac{UB - Z}{Z} \times 100$ (in which Z denotes the optimal solution) and CPU presents
the computational time (in seconds) needed to achieve the best upper bound (UB).
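As a worked example with hypothetical values: an upper bound UB = 1005 against an optimum Z = 1000 gives a gap of ((1005 − 1000)/1000) × 100 = 0.5%.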

Table 2. SS results and comparison with other algorithms


Classes           | Pullan        | H-RW          | TS            | SS
                  | DEV    CPU    | DEV    CPU    | DEV    CPU    | DEV   CPU
Orlib             | 0.000  0.010  | 0.000  0.050  | 0.030  0.160  | 0.000 0.720
Galvão Raggi (GR) | 0.000  0.010  | 0.000  0.090  | 0.100  0.160  | 0.000 0.390
FPP               | 0.005  8.210  | 69.380 1.740  | 95.710 0.650  | 0.015 1.670
Bilde Krarup (BK) | 0.000  <0.1   | 0.030  0.090  | 0.070  0.155  | 0.000 1.890
PCodes            | 0.000  0.310  | –      –      | –      –      | 0.000 1.320

We can see that SS achieved excellent results in very low computational time.
Compared with the state-of-the-art algorithms, SS competes with all of them in
solution quality and computational time. Pullan's algorithm appears to be very
effective for FPP and PCodes.
For FPP, SS achieved an average deviation of 0.015, while Pullan achieved every
optimal solution. In terms of computational time, we cannot compare directly due to
the different machines; nevertheless, the proposed algorithm achieved almost every
optimal solution in very low computational time.
For H-RW and TS, SS outperformed both algorithms on all instances, reaching
every optimal solution except for FPP: for FPP-11 it achieved all optimal solutions,
while for FPP-17 it missed four optimal solutions out of thirty instances, all four
within 0.035% deviation from the optimal solution.

4 Conclusions and Future Work

This paper presents a Scatter Search (SS) procedure for the UFLP which considers a set
of facilities and a set of clients. The objective is to locate a set of facilities in such a way
that the total sum of the costs for opening those facilities and the transportation costs
for serving all customers is minimized.
As seen in the results obtained, the SS approach produced excellent results in very
low CPU time. The algorithm was tested on the standard dataset, and our approach
efficiently found the optimal solution for almost every instance. Comparisons with
current best-performing algorithms for the UFLP show that our SS algorithm exhibits
excellent results.
This evolutionary approach brings substantial information to the search. The
proposed algorithm for the UFLP employs strategic designs to gather information
about solutions and applies diversification strategies that guide the search, leading to
optimal or near-optimal solutions.

Further research may apply other techniques to the UFLP, such as different types
of local search procedure for the improvement method, or even the inclusion of tabu
search to allow an intensification process. Different techniques may lead to even
better results.
The success obtained for other combinatorial optimization problems, and for the
UFLP specifically, demonstrates the efficiency of this procedure and suggests that the
application of SS to other FLPs will also produce state-of-the-art algorithms.

Acknowledgement. This work has been supported by national funds through FCT – Fundação
para a Ciência e Tecnologia through project UIDB/04728/2020.

References
1. Michel, L., Hentenryck, P.V.: A simple tabu search for warehouse location. Eur. J. Oper.
Res. 157, 576–591 (2004)
2. Sun, M.: Solving the uncapacitated facility location problem using tabu search. Comput.
Oper. Res. 33, 2563–2589 (2006)
3. Al Sultan, K., Al Fawzan, M.: A tabu search approach to the uncapacitated facility location
problem. Ann. Oper. Res. 86, 91–103 (1999)
4. Glover, F.: Tabu search—part I. ORSA J. Comput. 1, 190–206 (1989)
5. Glover, F.: Tabu search—part II. ORSA J. Comput. 2, 4–32 (1990)
6. Pullan, W.: A population based hybrid meta-heuristic for the uncapacitated facility location
problem. In: Proceedings of the first ACM/SIGEVO Summit on Genetic and Evolutionary
Computation - GEC 2009, p. 475. ACM Press, New York (2009)
7. Resende, M.G.C., Werneck, R.F.: A hybrid multistart heuristic for the uncapacitated facility
location problem. Eur. J. Oper. Res. 174, 54–68 (2006)
8. Atta, S., Mahapatra, P.R.S., Mukhopadhyay, A.: Solving uncapacitated facility location
problem using monkey algorithm. In: Bhateja, V., Coello Coello, C.A., Satapathy, S.C.,
Pattnaik, P.K. (eds.) Intelligent Engineering Informatics. AISC, vol. 695, pp. 71–78.
Springer, Singapore (2018). https://doi.org/10.1007/978-981-10-7566-7_8
9. Kratica, J., Tošic, D.: Solving the simple plant location problem by genetic algorithm.
RAIRO-Oper. Res. 35, 127–142 (2001)
10. Gen, M., Tsujimura, Y., Ishizaki, S.: Optimal design of a star-LAN using neural networks.
Comput. Ind. Eng. 31, 855–859 (1996)
11. Vaithyanathan, S., Burke, L., Magent, M.: Massively parallel analog tabu search using
neural networks applied to simple plant location problems. Eur. J. Oper. Res. 93, 317–330
(1996)
12. Erlenkotter, D.: A dual-based procedure for uncapacitated facility location. Oper. Res. 26,
992–1009 (1978)
13. Guignard, M.: A Lagrangean dual ascent algorithm for simple plant location problems. Eur.
J. Oper. Res. 46, 73–83 (1988)
14. Laguna, M., Marti, R.: Scatter search: methodology and implementations in C. Oper. Res.
Comput. Sci. Interfaces Ser. 24, 1–283 (2003)
An Effective Dual-RAMP Algorithm
for the Capacitated Facility Location Problem

Telmo Matos

CIICESI, Escola Superior de Tecnologia e Gestão, Politécnico do Porto,


Porto, Portugal
tsm@estg.ipp.pt

Abstract. Facility Location Problems are widely studied problems in the lit-
erature with several practical applications, reaching areas such as telecommu-
nications, design of a supply chain management, transport utilities and water
distribution networks. In this paper, we address the Capacitated Facility Loca-
tion Problems (CFLP), whose general goal is to determine where to locate a set
of facilities to serve a particular set of customers with minimum cost. The CFLP
problem has been widely studied for the past decades with the development of
exact and heuristics methods. We propose a new heuristic algorithm for the
Capacitated Facility Location Problem (CFLP) based on the RAMP (Relaxation
Adaptive Memory Programming) framework. In the dual side of the method, the
RAMP framework uses a Dual-Ascent procedure and a simple improvement
method based on Tabu Search was used to explore the primal side, making this
algorithm a very robust RAMP approach. The RAMP algorithm for the CFLP
obtained excellent results, demonstrating its potential for new applications to
other extensions and variations of Facility Location Problems.

Keywords: RAMP · Facility Location · Adaptive Memory Programming ·
Dual-Ascent

1 Introduction

The Capacitated Facility Location Problem (CFLP) is a well-known combinatorial


optimization problem belonging to the class of the NP-Hard problems [1]. Given a set
of possible locations to install facilities with a limited capacity and an installation cost,
a set of costumers with a predefined demand, and the unit cost of serving each client by
each facility, the problem consists in determining which facilities to open in order to
serve all the costumers with minimum cost.
The CFLP can be formulated as follows:
$$\min \sum_{i=1}^{m} \sum_{j=1}^{n} D_j C_{ij} x_{ij} + \sum_{i=1}^{m} F_i y_i \qquad (1)$$

$$\text{s.t. } \sum_{i=1}^{m} x_{ij} = 1, \quad j = 1, \dots, n \qquad (2)$$


$$\sum_{j=1}^{n} D_j x_{ij} \le S_i y_i, \quad i = 1, \dots, m \qquad (3)$$

$$x_{ij} \ge 0, \quad j = 1, \dots, n,\; i = 1, \dots, m \qquad (4)$$

$$y_i \in \{0, 1\}, \quad i = 1, \dots, m \qquad (5)$$

where m represents the number of possible locations to open a facility and n the number
of customers to be served. $S_i$ indicates the capacity of facility i and $F_i$ the fixed cost for
opening that facility. $D_j$ represents the demand of customer j and $C_{ij}$ the unit shipment
cost between facility i and customer j. The variable $x_{ij}$ denotes the amount (scaled to be
fractional) shipped from facility i to customer j, and $y_i$ indicates whether facility i is
open or not. The objective is to locate a set of facilities in such a way that the sum of the
costs for opening those facilities and the transportation costs for serving all customers
is minimized. Given a set of open facilities ($y_i = 1,\ i \in I$), the CFLP has the
particularity of becoming a Transportation Problem (TP) that can be solved in
polynomial time. The TP can be formulated as:
$$\min \sum_{i=1}^{m} \sum_{j=1}^{n} D_j C_{ij} x_{ij} \qquad (6)$$

$$\text{s.t. } \sum_{j=1}^{n} D_j x_{ij} \le S_i, \quad i = 1, \dots, m \qquad (7)$$

together with constraints (2) and (4).


where $C_{ij}$ is the unit shipment cost from facility i to customer j, $D_j x_{ij}$ is the amount sent
from facility i to customer j, $S_i$ is the availability of facility i and $D_j$ is the demand of
customer j. The objective is to determine an optimal transportation scheme between the
facilities and customers so that the transportation costs are minimized.
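For illustration only, the transportation subproblem (6)–(7) for a fixed set of open facilities can be handed to an off-the-shelf LP solver; the sketch below uses scipy.optimize.linprog as a stand-in (the computational study later in the paper solves the TP with CPLEX).

```python
import numpy as np
from scipy.optimize import linprog

def solve_tp(open_idx, S, D, C):
    """Solve the TP (6)-(7) for the open facilities in open_idx.

    Variables are flattened row-major: x[i][j] -> z[k*n + j], where k
    enumerates the open facilities.  S: capacities, D: demands, C: costs.
    """
    m, n = len(open_idx), len(D)
    c = np.array([D[j] * C[i][j] for i in open_idx for j in range(n)])
    # demand rows (2): sum_i x_ij = 1 for every customer j
    A_eq = np.zeros((n, m * n))
    for j in range(n):
        A_eq[j, j::n] = 1.0
    b_eq = np.ones(n)
    # capacity rows (7): sum_j D_j x_ij <= S_i for every open facility
    A_ub = np.zeros((m, m * n))
    for k, i in enumerate(open_idx):
        A_ub[k, k * n:(k + 1) * n] = D
    b_ub = [S[i] for i in open_idx]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.fun, res.x.reshape(m, n)
```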
Also, if we eliminate the capacity $S_i$ of each facility in Eq. (3) and set $D_j = 1$ in
the same equation and in the objective function, we obtain the uncapacitated variant of
the problem, namely the Uncapacitated FLP (UFLP), which is also widely studied in the
literature. Related to this problem, the Single Source CFLP (SSCFLP) considers all
decision variables to be binary, whereas the CFLP considers continuous decision
variables for the customers' assignments.
The CFLP has been widely studied for the past 50 years due to its complexity and
applicability, which justifies the significant number of proposed algorithms to solve this
problem based on both exact and heuristic methods. The first heuristic algorithm for the
CFLP was proposed by Jacobsen [2] in 1983 and was based on Kuehn and Hamburger's
[3] heuristic for the Uncapacitated FLP.
Relaxation techniques are frequently used to solve the CFLP. Cornuéjols [4] pro-
posed several algorithms based on mathematical relaxations to solve large CFLPs and
presented a comparison between several types of relaxations. Guignard and Spielberg
[5] presented a Dual Ascent procedure for the CFLP, of the kind initially proposed by
Bilde and Krarup [6] and Erlenkotter [7] for solving the UFLP. The main algorithm
starts by obtaining a linear programming relaxation of the original problem to produce
a lower bound.

Metaheuristics also have many applications to optimization problems, with high-
quality results. Feo and Resende [8] proposed a Greedy Randomized Adaptive Search
Procedure (GRASP) to tackle the CFLP, proposing the definition of elite solutions to
explore better ones. Later, Sun [9] proposed a Tabu Search for the CFLP, obtaining
excellent results. Guastaroba and Speranza [10] proposed a kernel search algorithm to
solve the CFLP, a heuristic framework based on the idea of identifying subsets of
variables and solving a sequence of MILP (mixed-integer linear programming)
problems, each of them restricted to one of the identified subsets of associated
variables. For more information about the kernel search method and its applications,
please refer to [10] and [11].
Matos et al. [12] proposed sequential and parallel RAMP approaches to tackle this
problem. The algorithm relies on a simple Dual-RAMP for exploring the primal-dual
relationship, wherein the dual side makes use of Lagrangean relaxation and the primal
side uses ADD and DROP neighbourhood structures.
Nature-inspired algorithmic approaches for solving the CFLP have proved to be
very effective and thus have become common tools for solving many optimization
problems. Some examples are the Firefly (with GA) by Rahmani and Mirhassani [13],
the Ant Colony by Venables and Moscardini [14] and Bee Colony, used in the work of
Cabrera et al. [15] and Levanova and Tkachuk [16].
In this paper, a simple RAMP algorithm for solving the CFLP is proposed,
combining information from a Dual-Ascent procedure on the dual side with primal-
feasible solutions on the primal side, obtained by a simple improvement method.
The remainder of the paper is structured as follows. The RAMP approach, including
the proposed algorithm for solving the CFLP, is described in Sect. 2. Experimental
results and outcomes are presented and discussed in Sect. 3. Finally, Sect. 4 completes
this paper, showing the conclusions and future directions of research.

2 RAMP for the CFLP

Relaxation Adaptive Memory Programming (RAMP) is a metaheuristic framework
proposed by César Rego [17] in 2005 that focuses on the exploration of the primal and
dual solution spaces of a problem and the relationship between them. This framework
explores and combines relaxation techniques, such as Lagrangean relaxation and sur-
rogate constraints, with adaptive memory techniques, like scatter search and path
relinking.
The RAMP method has different levels of sophistication. The simplest version of
the RAMP approach, usually called Dual-RAMP, explores more intensively the dual
side of the problem using relaxation techniques to obtain the associated dual solutions.
The exploration of the primal side is reduced to the projection of the dual solutions to
the primal solution space followed by an improvement method. On the other hand,
when RAMP intensively explores both sides of the problem, we have a PD-RAMP
approach where PD emphasizes the Primal-Dual search.
In a PD-RAMP algorithm, the dual search is integrated with evolutionary tech-
niques, such as genetic algorithms and Scatter Search, to effectively explore the primal
side. Despite being relatively recent, the RAMP method has already proven its
effectiveness by producing state-of-the-art algorithms for several complex
combinatorial optimization problems [12, 18–20].
We designed a Dual-RAMP model for the CFLP, as can be seen in Fig. 1. The
proposed RAMP algorithm uses a Dual-Ascent procedure based on the one proposed
by Guignard and Spielberg [5] on the dual side, and an improvement method based on
tabu search on the primal side. The local search procedure is based on the ADD/DROP
method [21]. Elite solutions are obtained by a Greedy Randomized Adaptive Search
Procedure (GRASP) based on the Feo and Resende [8] algorithm.

Fig. 1. Dual-RAMP procedure for the CFLP.

The proposed algorithm starts by obtaining a good quality solution through the
GRASP method, creating an initial solution through a local search procedure, which an
improvement method then tries to improve. The algorithm then proceeds with the dual
phase, generating dual solutions. A projection method projects the dual solutions onto
the primal solution space; subsequently, solutions are combined and an improvement
method is applied to each combined solution. In the primal phase, the elite solutions are
combined with improved solutions, attempting to produce better ones. The combined
solution is obtained by combining the two best solutions found in the projection
method with the best solution found so far.
The RAMP algorithm alternates between the primal and dual phases until the
maximum number of iterations is reached. A schematic sketch of this loop is given
below.
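A compact sketch of this alternation, under stated assumptions (all callbacks are problem-specific stand-ins rather than the paper's exact routines, and the elite-pool size is illustrative), is:

```python
def dual_ramp(max_iter, grasp, dual_ascent, project, improve, combine):
    """Skeleton of the Dual-RAMP alternation described above."""
    best = improve(grasp())               # initial elite solution via GRASP
    elite = [best]
    for _ in range(max_iter):
        dual = dual_ascent()              # dual phase: dual solution / bound
        primal = improve(project(dual))   # project onto the primal space, improve
        merged = improve(combine(primal, elite))  # primal phase: combine with elites
        if merged.cost < best.cost:
            best = merged
        elite = sorted(elite + [merged], key=lambda s: s.cost)[:10]
    return best
```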

3 Results

The performance of the proposed RAMP algorithm was evaluated on a standard set of
benchmark instances. The well-known OR-Library dataset¹ proposed by Beasley [22]
has 49 instances with known optimal solutions, of the following sizes (facilities ×
customers): 16 × 50, 25 × 50 and 50 × 50 for the small instances and 100 × 1000 for
the large ones.
The algorithm was coded in C programming language and run on an Intel Pentium
I7 2.40 GHz with 8 GB RAM under Ubuntu operating system using CPLEX 12.6 to
solve the Transportation Problem.
To compare our Dual-RAMP algorithm with the state-of-the-art approaches for the
CFLP we show, in Table 1, the results reported for these algorithms on the dataset
previously described. The table presents the average computational time for each
of the different algorithms. As all authors use different machines to process their
algorithms, it is not possible to make a direct comparison regarding this parameter. The
comparison covers different subsets of instances, since some authors do not provide
results for all instances.

Table 1. Comparison table for the ORLIB dataset.


Classes   | KS           | TS         | LR          | HA          | AB          | Dual-RAMP
          | GAP  CPU     | GAP  CPU   | GAP  CPU    | GAP  CPU    | GAP  CPU    | GAP  CPU
Small     | 0.00 0.63    | 0.00 0.24  | 0.02 1.49   | –    –      | –    –      | 0.00 0.60
Large     | 0.00 2158.83 | 0.07 48.63 | 0.21 75.79  | –    –      | 0.00 415.79 | 0.05 42.99
Avg.      | 0.00 1079.73 | 0.04 24.44 | 0.12 38.64  | –    –      | –    –      | 0.03 14.79
capa 8000 | 0.00 3604.88 | –    –     | 0.24 73.65  | 0.00 367.74 | –    –      | 0.03 43.79
capb 8000 | 0.00 1999.73 | –    –     | 0.40 131.84 | 0.00 317.47 | 0.00 319.26 | 0.11 39.71
capc 5000 | 0.00 2621.94 | –    –     | 0.00 105.74 | 0.00 129.34 | 0.00 577.30 | 0.00 35.93


In Table 1, the column GAP presents the gap computed as
$\mathrm{GAP} = \frac{UB - Z}{Z} \times 100$ (in which Z denotes the optimal solution) and CPU
presents the computational time (in seconds) needed to achieve the best upper
bound (UB).

¹ http://people.brunel.ac.uk/~mastjjb/jeb/orlib/capinfo.html

The Dual-RAMP algorithm is compared with Sun's Tabu Search (TS) [9],
Beasley's Lagrangean relaxation heuristic (LR) [23], Guastaroba and Speranza's
Kernel Search (KS) procedure [10], the Hybrid Approach (Bee Algorithm with Mixed
Integer Programming, HA) proposed by Cabrera et al. [15], and the Branch-and-Cut-
and-Price (denoted AB) proposed by Avella and Boccia [24]. Table 1 shows the
average results for the OR-Library instances divided into Small and Large. Some
specific instances (from the large subset) are also presented to make possible the
comparison with the HA algorithm.
From the results presented, the RAMP algorithm achieved optimal solutions for all
small instances. For the large test problems, it misses the optimal solution for 7 out of
12 test problems, but for all of these it achieved a deviation under 0.09% from the
optimal solution, except for Capb7000 and Capb8000 (0.21% and 0.11%,
respectively).
Over the 49 problems, the RAMP algorithm found optimal solutions for 42 of them
with low CPU time, being the best algorithm on average regarding solution quality
compared with TS and LR. Compared with TS, the proposed algorithm did not reach
as many optimal solutions (TS found 45 optimal solutions, RAMP found 42), but the
RAMP algorithm reached, on average, high-quality solutions with 0.03% deviation
from the optimal solutions in under 15 s.
We can see that the RAMP algorithm achieved excellent results with very low
computation time (even though, for CPU time, different machines were used), making
the exploration of the dual and primal sides of the problem a very effective approach
for solving the CFLP.

4 Conclusions

In this paper, we address the Capacitated Facility Location Problem (CFLP), which
considers a set of facilities and a set of clients with a certain demand. The objective is
to decide, at minimal cost, which facilities to open so that all clients have their demand
fulfilled while ensuring that the facilities' capacities are not exceeded.
We propose a RAMP algorithm for the CFLP that produced excellent results,
rivaling the current best-known algorithms for the solution of this problem. The
proposed RAMP framework combines a Dual Ascent procedure on the dual side with
an improvement method on the primal side. In the primal phase, elite solutions are
combined with the improvement method to produce better solutions.
The RAMP algorithm successfully achieved excellent results in very reduced time,
competing with the other best approaches in the literature.
Once again, the RAMP approach was able to efficiently solve a complex opti-
mization problem. The use of primal-dual exploration techniques that employ adaptive
memory in metaheuristics such as tabu search is extremely efficient when applied to
such problems. The application of this framework to other complex optimization
problems is expected to obtain results of the same quality as those obtained for the CFLP.

Acknowledgement. This work has been supported by national funds through FCT – Fundação
para a Ciência e Tecnologia through project UIDB/04728/2020.

References
1. Salhi, S., Mirchandani, P.B., Francis, R.L.: Discrete location theory. J. Oper. Res. Soc. 42,
1124 (1991)
2. Jacobsen, S.K.: Heuristics for the capacitated plant location model. Eur. J. Oper. Res. 12,
253–261 (1983)
3. Kuehn, A., Hamburger, M.: A heuristic program for locating warehouses. Manage. Sci. 9,
643–666 (1963)
4. Cornuéjols, G., Sridharan, R., Thizy, J.: A comparison of heuristics and relaxations for the
capacitated plant location problem. Eur. J. Oper. Res. 50, 280–297 (1991)
5. Guignard, M., Spielberg, K.: A direct dual method for the mixed plant location problem with
some side constraints. Math. Program. 17, 198–228 (1979)
6. Bilde, O., Krarup, J.: Sharp lower bounds and efficient algorithms for the simple plant
location problem. In: Annals of Discrete Mathematics, pp. 79–97 (1977)
7. Erlenkotter, D.: A dual-based procedure for uncapacitated facility location. Oper. Res. 26,
992–1009 (1978)
8. Feo, T., Resende, M.: Greedy randomized adaptive search procedures. J. Glob. Optim. 134,
109–134 (1995)
9. Sun, M.: A tabu search heuristic procedure for the capacitated facility location problem.
J. Heuristics 18, 91–118 (2012)
10. Guastaroba, G., Speranza, M.G.: Kernel search for the capacitated facility location problem.
J. Heuristics 18, 877–917 (2012)
11. Guastaroba, G., Speranza, M.G.: Kernel search: an application to the index tracking
problem. Eur. J. Oper. Res. 217, 54–68 (2012)
12. Matos, T., Oliveira, Ó., Gamboa, D.: RAMP algorithms for the capacitated facility location
problem. Ann. Math. Artif. Intell. 89(8–9), 799–813 (2021). https://doi.org/10.1007/s10472-
021-09757-z
13. Rahmani, A., Mirhassani, S.A.: A hybrid Firefly-Genetic Algorithm for the capacitated
facility location problem. Inf. Sci. 283, 70–78 (2014)
14. Venables, H., Moscardini, A.: Ant based heuristics for the capacitated fixed charge location
problem. In: Dorigo, M., Birattari, M., Blum, C., Clerc, M., Stützle, T., Winfield, A.F.T.
(eds.) ANTS 2008. LNCS, vol. 5217, pp. 235–242. Springer, Heidelberg (2008). https://doi.
org/10.1007/978-3-540-87527-7_22
15. Cabrera, G.G., Cabrera, E., Soto, R., Rubio, L.J.M., Crawford, B., Paredes, F.: A hybrid
approach using an artificial bee algorithm with mixed integer programming applied to a
large-scale capacitated facility location problem. Math. Probl. Eng. 2012, 14 (2012)
16. Levanova, T., Tkachuk, E.: Development of a bee colony optimization algorithm for the
capacitated plant location problem. In: II International Conference Optimization and
Applications (OPTIMA-2011), Petrovac, Montenegro, pp. 153–156 (2011)
17. Rego, C.: RAMP: a new metaheuristic framework for combinatorial optimization. In: Rego,
C., Alidaee, B. (eds.) Metaheuristic Optimization via Memory and Evolution: Tabu Search
and Scatter Search, pp. 441–460. Kluwer Academic Publishers (2005). https://doi.org/10.
1007/0-387-23667-8_20
18. Oliveira, Ó., Matos, T., Gamboa, D.: A dual RAMP algorithm for single source capacitated
facility location problems. Ann. Math. Artif. Intell. 89(8–9), 815–834 (2021). https://doi.org/
10.1007/s10472-021-09756-0
19. Matos, T., Gamboa, D.: Dual-RAMP for the capacitated single allocation hub location
problem. In: Gervasi, O., et al. (eds.) ICCSA 2017. LNCS, vol. 10405, pp. 696–708.
Springer, Cham (2017). https://doi.org/10.1007/978-3-319-62395-5_48

20. Matos, T., Maia, F., Gamboa, D.: Improving traditional dual ascent algorithm for the
uncapacitated multiple allocation hub location problem: a RAMP approach. In: The Fourth
International Conference on Machine Learning, Optimization, and Data Science, Volterra,
Tuscany, Italy, 13–16 September 2018, pp. 243–253. Springer, Cham (2019). https://doi.org/
10.1007/978-3-030-13709-0_20
21. Bornstein, C.T.: An ADD/DROP procedure for the capacitated plant location problem.
Pesqui. Operacional. 24, 151–162 (2003)
22. Beasley, J.: OR-library: distributing test problems by electronic mail. J. Oper. Res. Soc. 65,
1069–1072 (1990)
23. Beasley, J.E.: An algorithm for solving large capacitated warehouse location problems. Eur.
J. Oper. Res. 33, 314–325 (1988)
24. Avella, P., Boccia, M.: A cutting plane algorithm for the capacitated facility location
problem. Comput. Optim. Appl. 43, 39–65 (2009)
Comparative Study of Blood Flow Through
Normal, Stenosis Affected and Bypass Grafted
Artery Using Computational Fluid Dynamics

Anirban Banik1, Tarun Kanti Bandyopadhyay2,


and Vladimir Panchenko3
1
Department of Civil Engineering, National Institute of Technology Agartala,
Jirania 799046, Tripura (W), India
2
Department of Chemical Engineering, National Institute of Technology
Agartala, Jirania 799046, Tripura (W), India
3
Russian University of Transport, Obraztsova Street, Moscow 127994, Russia

Abstract. The flow of blood through a normal artery, an artery affected by
stenosis, and an artery with bypass grafting is simulated and studied using com-
putational fluid dynamics (CFD). The flow characteristics of blood were sim-
ulated using the available CFD software ANSYS. The modelling and meshing of
the normal artery, the artery with stenosis and the bypass grafted artery were
created using Gambit 2.4.6. From a grid independence study, a mesh of 15,234
cells was selected for the simulation. The CFD solver was used for the prediction
of the flow phenomena, static pressure, and shear strain rate. The reported CFD
results will be helpful for the proper diagnosis of stenosis, sharing knowledge
with doctors for overcoming the stenosis problem with the help of bypass
grafting. The research will also help researchers and doctors understand the flow
phenomena inside the normal artery, the artery with stenosis, and the artery with
bypass grafting.

Keywords: Computational fluid dynamics · Artery · Bypass grafting ·
Stenosis · Blood flow · Flow phenomena

1 Introduction

Cardiovascular disease is one of the major causes of death in developing and devel-
oped countries. The narrowing of an artery is called stenosis; it can lead to scenarios
such as heart attack and stroke, which cause death among people of different age
groups [1]. In the case of a blocked artery, bypass surgery is considered the most
suitable treatment procedure for restoring the blood flow inside the artery. However, it
has recently been found that restenosis leads to the failure of the bypass grafting,
which may be due to hyperplasia [2], and the failure of bypass grafting has increased
due to the lack of information regarding the behaviour of blood flow in the artery
under normal and stenosis-affected conditions.
Computational fluid dynamics (CFD) uses a combination of numerical solutions of
the governing equations and data structures to solve fluid-flow problems [3–5]. CFD
analysis is one of the most important tools for simulating the blood flow inside the
artery to provide an

appropriate picture that can reduce the failure rate of bypass grafting and also provide
the requisite information regarding the origin of the disease. Existing models account
for either the velocity field or the pressure field, but the simultaneous consideration of
both is absent in the literature [6]. In the studies concerned, the vessel wall of the
implemented model is assumed to be a compliant wall [7] or a simple or reduced
deformable wall [8]. Earlier CFD analyses considered only idealized geometry for
simulating blood flow characteristics and flow phenomena such as wall shear stress
and residence time [9]. With the advancement of modern imaging techniques, realistic
and more accurate geometries are now used to mimic blood flow in the artery, and the
results from such real-time simulations agree well with blood flow in both diseased
and normal arteries [10, 11].
The objective of the present research is to apply CFD to blood flow inside the
human artery and to obtain the necessary information regarding the velocity profile,
wall shear stress, pressure, and shear strain rate. The study also includes a comparison
of the blood flow through the normal artery, the artery affected by stenosis and the
artery with bypass grafting. The reported results provide a clear picture of cardio-
vascular disease and of the blood flow through an artery with a bypass graft.

2 Materials and Methods

Blood is the most essential body fluid in humans and other animals; it delivers nec-
essary substances such as nutrients and oxygen to the cells and carries metabolic waste
products away from those same cells [12]. It is generally composed of blood cells
suspended in blood plasma. Blood plasma constitutes 55% of the blood fluid, is 92%
water by volume, and also contains dissolved proteins, glucose, mineral ions,
hormones, etc. Table 1 and Table 2 list the properties of the artery and blood taken
into account during the modelling and simulation process. Table 3 shows the
consistency index (k) and flow behaviour index (n) for normal people and for people
with cardiovascular disease, which are taken into account during the mathematical
modelling of the blood flow through the normal artery, the artery with stenosis and the
bypass grafted artery.

Table 1. Dimensions of artery


Sl. No. | Characteristics | Units | Value
1       | Area            | m²    | 3–5
2       | Diameter        | m     | 0.025
3       | Length          | m     | 0.12–0.25
4       | Thickness       | m     | 0.00246

Table 2. Properties and range of variables of Blood


Sl. No. | Characteristics | Units  | Value
1       | Velocity        | m/s    | 0.11–4.59
2       | Viscosity       | Pa·s   | 0.12296–0.35248
3       | Blood density   | kg/m³  | 1050–1060
4       | Blood pH        | –      | 7.35–7.45

Table 3. Consistency index (k) and Flow behaviour index (n)


Sl. No. | Characteristics                                        | Value
1       | Consistency index of normal people                     | 0.11–4.59
2       | Consistency index of cardiovascular disease people     | 0.12296–0.35248
3       | Flow behaviour index of normal people                  | 1050–1060
4       | Flow behaviour index of cardiovascular disease people  | 7.35–7.45

3 Methodology

With the advancement of computers and growing computational power, computational
fluid dynamics (CFD) has become a widely used computational tool for predicting
solutions to fluid-related problems [13–16].

3.1 Assumptions
Some of the assumptions taken into consideration when developing the theo-
retical model for blood flow inside the artery are as follows:
1. The blood flowing inside the artery is assumed to be an incompressible and
isothermal non-Newtonian pseudo-plastic fluid.
2. The developed model of the blood flow through the artery is restricted to a flow
model only.
3. The model is assumed to be a single-phase laminar non-Newtonian pseudo-plastic
power-law model.
4. The velocity of the blood near the artery wall is assumed to be zero because of the
high adhesive force between the blood and the artery wall.

3.2 Governing Equations


The rheological behaviour of the blood flowing inside the artery is found to depend
on the apparent viscosity, which is affected by the velocity and shear strain rate. Blood
flow inside the artery follows a non-Newtonian pseudo-plastic power-law model, and
the apparent viscosity of the blood can be defined by Eq. (1),

$$\mu_{eff} = K' \left( \frac{8u}{d} \right)^{n-1} \qquad (1)$$

Blood flow under the different arterial conditions is governed by the continuity
equation, which can be written as Eq. (2),

$$\nabla \cdot u = 0 \qquad (2)$$

where $\nabla$ is defined by Eq. (3),

$$\nabla = i \frac{\partial}{\partial x} + j \frac{\partial}{\partial y} + k \frac{\partial}{\partial z} \qquad (3)$$

The unsteady-state momentum equation for the blood flow inside the artery is
expressed by Eq. (4),

$$\mu_{eff} \nabla^2 u - \nabla P = \rho \frac{\partial u}{\partial t} + \rho (u \cdot \nabla) u \qquad (4)$$

In steady state, $\partial u / \partial t$ is zero, and the momentum equation for the blood flow
inside the artery reduces to Eq. (5),

$$\mu_{eff} \nabla^2 u - \nabla P = \rho (u \cdot \nabla) u \qquad (5)$$
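As a small numerical companion to Eq. (1), the function below evaluates the apparent viscosity for a given consistency index and flow behaviour index; the values in the example call are illustrative only, not measurements from this study.

```python
def apparent_viscosity(K, n, u, d):
    """Effective viscosity of Eq. (1): mu_eff = K * (8u/d)**(n-1).

    K : consistency index (Pa.s^n); n : flow behaviour index (n < 1 for a
    pseudo-plastic fluid such as blood); u : mean velocity (m/s);
    d : artery diameter (m).
    """
    return K * (8.0 * u / d) ** (n - 1.0)

# illustrative values only, not data from the paper
print(apparent_viscosity(K=0.2, n=0.7, u=0.5, d=0.025))
```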

3.3 Boundary Conditions


The following boundary conditions were considered for the modelling and simulation
of blood flow through the normal, stenosis-affected, and bypass grafted artery:
1. The inlet of the artery is assumed to be a velocity inlet for modelling and
simulation purposes.
2. The outlet of the artery is assumed to be a pressure outlet where the gauge pressure
is assumed to be zero.
3. A no-slip condition is assumed near the wall of the artery because of the maximum
adhesive force between the wall of the artery and the blood.

3.4 Computational Technique


The model geometries of the artery under normal conditions, the artery with stenosis
and the artery with a bypass graft were produced using Gambit 2.4.6 to understand the
flow phenomena inside the artery. The simulation and post-processing of the problem
were done using ANSYS Fluent 6.3. The governing equations that govern the flow
inside the artery are applied to each sub-domain. The developed model is exported to
Fluent 6.3 after meshing and application of the boundary conditions to the model is
completed. Once the model is exported to the CFD-based solver (Fluent 6.3), the
material properties such as blood density, blood viscosity, and inlet velocity are
provided for normal people, people with stenosis, and people with a bypass graft. To
simplify the CFD procedure and reduce the simulation time, a first-order upwind
scheme was selected for the solution, SIMPLE pressure-velocity coupling with under-
relaxation was selected for convergence, and the solution of the problem is obtained
iteratively until convergence is reached.

4 Results and Discussions

4.1 Normal Artery


Figure 1 shows the static pressure (Pa) for the blood flow inside the artery under
normal conditions. The static pressure inside the artery is found to be 2.35 × 10³ Pa,
which is close to the actual normal systolic pressure. The pressure is found to be high
at the inlet section of the artery and decreases gradually towards the outlet due to the
gradual dissipation of the kinetic head of the blood flowing inside the artery. Figure 2
shows the velocity (m/s) of the blood flow inside the artery, which is high at the centre
section of the artery and low at the wall due to the no-slip condition near the wall.
Figure 3 shows the shear strain rate (s⁻¹) for blood flow through the normal artery.
The shear strain rate is found to be high at the wall due to the high adhesive force
between the artery wall and the blood, as a result of which the velocity near the wall is
reduced compared to the centre, where the adhesive force is minimum.

Fig. 1. Contour of static pressure (Pa) of the blood flow through artery under normal condition.

Fig. 2. Contour of velocity (m/s) of the blood flow through normal artery.

Fig. 3. Contour of shear strain rate (s⁻¹) for blood flow through artery under normal condition.

4.2 Stenosis Affected Artery


Stenosis is the abnormal narrowing of a blood vessel caused by contraction of the
artery wall, due for example to the deposition of cholesterol; in the aorta it is also
referred to as aortic coarctation [17]. Figure 4 shows the static pressure (Pa) contour
for the flow of blood in the stenosis-affected artery. The static pressure inside the artery
with stenosis increases (to 1.46 × 10⁴ Pa) due to the sudden contraction of the artery,
and gradually decreases along the length of the artery. Figure 5 illustrates the velocity
(m/s) for the flow of blood through the stenosis-affected artery. The velocity is found
to be high at the centre and low at or near the wall, as a no-slip condition was
considered. Figure 6 shows the shear strain rate (s⁻¹) contour for blood flow through
the artery with stenosis. From the contour, it has been found that the shear strain rate is
high in the vicinity of the wall and low at the centre because of the high adhesive force
between the artery wall and the blood molecules.

Fig. 4. Static pressure (Pa) for flow of blood inside artery with stenosis.

Fig. 5. Velocity (m/s) contour for flow of blood inside artery affected with stenosis.

Fig. 6. Shear strain rate (s⁻¹) for flow of blood inside stenosis-affected artery.

4.3 Bypass Grafted Artery


A bypass graft, often known as bypass surgery, is a surgical treatment that restores
normal blood flow to a clogged artery. In the bypass surgical procedure, the great
saphenous vein is extracted from the leg; one end of the vein is attached to the aorta
and the other end is attached to the artery with stenosis, to restore the blood flow in the
artery and to reduce the high pressure in it [18]. Figure 7 illustrates the contour
diagram of the pressure (Pa) of blood flow in the artery with bypass grafting. The
simulation shows the method of attaching a vein to the obstructed artery to restore the
normal blood flow. The static pressure is found to be 4.82 × 10³ Pa, which is close to
the normal systolic pressure. Figure 8 illustrates the velocity (m/s) for blood flow
inside the artery with bypass grafting, which shows the diversion of the blood away
from the obstructed portion, thus restoring the normal blood flow. Hence, the condition
of stenosis is overcome with the help of the bypass graft. The velocity of the blood is
found to be high at the centre of the artery with the bypass graft because of the low
adhesive force at the centre compared to the force of adhesion near the wall. Figure 9
shows the shear strain rate (s⁻¹) for the flow of blood through the bypass grafted
artery. As a no-slip condition was considered near the artery wall, the wall shear stress
and shear strain rate are observed to be significant.

Fig. 7. Static pressure (Pa) of blood flow through artery with bypass grafting.

Fig. 8. Velocity (m/s) for flow of blood through artery with bypass grafting.

Fig. 9. Shear strain rate (s⁻¹) for blood flow through artery with bypass grafting.

5 Conclusions

CFD analysis gives the flow structure through the normal artery, the stenosis-affected
artery and the bypass grafted artery. The pressure drop, wall shear stress and shear
strain rate are higher in the constricted artery, and the velocity distribution differs from
that of the normal artery. The bypass grafted artery restores the normal flow of blood
in the obstructed artery, which lowers the pressure drop and increases the blood flow
through the artery. The information gathered from the CFD analysis will give
researchers and doctors a clear understanding of the flow phenomena inside the artery
and the effect of pressure, and will help in the proper understanding of stenosis, early
diagnosis and proper treatment of patients, reducing deaths due to cardiovascular
disease. The information can also be used in developing a user-friendly and consumer-
friendly in-house stenosis monitoring device for early diagnosis of stenosis.

References
1. Koksungnoen, S., Rattanadecho, P., Wongchadakul, P.: 3D numerical model of blood flow
in the coronary artery bypass graft during no pulse and pulse situations: effects of an
anastomotic angle and characteristics of fluid. J. Mech. Sci. Technol. 32(9), 4545–4552
(2018). https://doi.org/10.1007/s12206-018-0851-z
2. Varshney, G., Katiyar, V.K.: Analysis of flow fields in stenosed artery with complete bypass
graft using numerical method. Indian J. Biomech., 65–69 (2009). Special Issue
3. Debnath, S., Banik, A., Bandyopadhyay, T.K., Saha, A.K.: CFD and optimization study of
frictional pressure drop through bends. Recent Pat. Biotechnol. 13, 74–86 (2019). https://doi.
org/10.2174/1872208312666180820153706
4. Banik, A., Bandyopadhyay, T.K., Biswal, S.K.: Computational fluid dynamics (CFD) sim-
ulation of cross-flow mode operation of membrane for downstream processing. Recent Pat.
Biotechnol. 13, 57–68 (2019). https://doi.org/10.2174/1872208312666180924160017
5. Banik, A., Bandyopadhyay, T.K., Biswal, S.K.: Computational fluid dynamics simulation of
disc membrane used for improving the quality of effluent produced by the rubber industry.
Int. J. Fluid Mech. Res. 44, 499–512 (2017). https://doi.org/10.1615/InterJFluidMechRes.
2017018630
6. Anderson, T.B., Jackson, R.: Fluid mechanical description of fluidized beds: equations of
motion. Ind. Eng. Chem. Fundam. 6, 527–539 (1967). https://doi.org/10.1021/i160024a007
7. Perktold, K., Rappitsch, G.: Computer simulation of local blood flow and vessel mechanics
in a compliant carotid artery bifurcation model. J. Biomech. 28, 845–856 (1995). https://doi.
org/10.1016/0021-9290(95)95273-8
8. Cairncross, R.A., Schunk, P.R., Baer, T.A., et al.: A finite element method for free surface
flows of incompressible fluids in three dimensions. Part I. Boundary fitted mesh motion. Int.
J. Numer. Meth. Fluids 33, 375–403 (2000). https://doi.org/10.1002/1097-0363(20000615)
33:3<375::AID-FLD13>3.0.CO;2-O
9. Perktold, K., Resch, M., Peter, R.O.: Three-dimensional numerical analysis of pulsatile flow
and wall shear stress in the carotid artery bifurcation. J. Biomech. 24, 409–420 (1991).
https://doi.org/10.1016/0021-9290(91)90029-M
10. Berthier, B., Bouzerar, R., Legallais, C.: Blood flow patterns in an anatomically realistic
coronary vessel: influence of three different reconstruction methods. J. Biomech. 35, 1347–
1356 (2002). https://doi.org/10.1016/S0021-9290(02)00179-3
11. Steinman, D.A.: Image-based computational fluid dynamics modeling in realistic arterial
geometries. Ann. Biomed. Eng. 30, 483–497 (2002). https://doi.org/10.1114/1.1467679
12. Hart, G.D.: Descriptions of blood and blood disorders before the advent of laboratory
studies. Br. J. Haematol. 115, 719–728 (2001). https://doi.org/10.1046/j.1365-2141.2001.
03130.x
13. Vasant, P., Zelinka, I., Weber, G.-W. (eds.): ICO 2018. AISC, vol. 866. Springer, Cham
(2019). https://doi.org/10.1007/978-3-030-00979-3
14. Vasant, P., Zelinka, I., Weber, G.-W. (eds.): ICO 2020. AISC, vol. 1324. Springer, Cham
(2021). https://doi.org/10.1007/978-3-030-68154-8

15. Vasant, P., Zelinka, I., Weber, G.-W. (eds.): ICO 2019. AISC, vol. 1072. Springer, Cham
(2020). https://doi.org/10.1007/978-3-030-33585-4
16. Kim, S.E., Boysan, F.: Application of CFD to environmental flows. J. Wind Eng. Ind.
Aerodyn. 81, 145–158 (1999). https://doi.org/10.1016/S0167-6105(99)00013-6
17. Nielsen, J.C., Powell, A.J., Gauvreau, K., et al.: Magnetic resonance imaging predictors of
coarctation severity. Circulation 111, 622–628 (2005). https://doi.org/10.1161/01.CIR.
0000154549.53684.64
18. Rihal, C.S., Raco, D.L., Gersh, B.J., Yusuf, S.: Indications for coronary artery bypass
surgery and percutaneous coronary intervention in chronic stable angina: review of the
evidence and methodological considerations. Circulation 108, 2439–2445 (2003). https://
doi.org/10.1161/01.CIR.0000094405.21583.7C
Transportation Based Approach for Solving
the Generalized Assignment Problem

Elias Munapo

Department of Business Statistics and Operations Research, School of Economic


Sciences, North West University, Mafikeng Campus,
Potchefstroom, South Africa

Abstract. This paper presents a transportation-based approach for solving the


difficult generalized assignment problem (GAP). The first transportation-based
method was proposed in 2014 in the form of a transportation branch and bound
algorithm and then improved in 2015. In that approach the branch and bound
algorithm was used to solve the GAP problem where the sub-problems were
transportation problems which are easier to solve than the usual linear pro-
gramming (LP) based sub-problems. The main weakness of the transportation
branch and bound algorithm is that there is no guarantee that the number of sub-
problems will not explode to unmanageable levels for large numbers of vari-
ables. This paper proposes a transportation-based approach whereby the GAP is
relaxed into a transportation problem. The only difference with the 2014 and
2015 approaches is that the relaxed GAP model is solved as an LP and at every
stage cuts are added to cater for the violated constraints. This approach has the
advantage that at every iteration a single infeasible optimal solution is generated
and used in the next stage and repeated until a feasible one is obtained.

Keywords: Generalized assignment problem · Transportation model · Relaxed
GAP model · Branch and bound · Cuts

1 Introduction

The generalized assignment problem (GAP) is very difficult to solve. It is a general
form of the assignment problem in which both tasks and agents have a size, and the size
of each task varies from one agent to the other. The GAP model has many real-life
applications, such as:
• vehicle routing,
• resource allocation,
• supply chain,
• machine scheduling and
• location.
At the moment several exact algorithms [3, 5, 10, 12–15] and heuristics [1, 2, 4, 11,
16] are available for solving the GAP model. The GAP model is NP hard and special
purpose branch and bound algorithms have been proposed in the last 40 years.


The term NP stands for nondeterministic polynomial time. The first special purpose
algorithm was developed by Ross and Soland in 1975 [13]. Branch and bound
related methods have an obvious weakness: they can still explode even with
proper handling of branches in terms of using:
• Linear Programming (LP) based cuts,
• Lagrangean relaxation,
• Penalties,
• Feasibility based tests,
• Logical feasibility tests and
• Feasible solution generators
to increase or lower the current bounds to a desirable level. The GAP model has very
important applications in real life and the search for more efficient methods is ongoing.
This paper proposes a transportation-based approach whereby the GAP is relaxed into a
transportation problem. The only difference is that the relaxed model is solved as an LP
and at every stage cuts are added to cater for the violated constraints. This approach has
the advantage that at every iteration a single infeasible optimal solution is generated
and used in the next stage and this process is repeated until a feasible optimal solution
is obtained.

2 Generalized Assignment Problem

A mathematical formulation of the generalized assignment problem may be given as in (1):

Z_OPT = Minimize c11y11 + c12y12 + … + cmnymn,

Subject to:

a11y11 + a12y12 + … + a1ny1n ≤ b1,
a21y21 + a22y22 + … + a2ny2n ≤ b2,
…
am1ym1 + am2ym2 + … + amnymn ≤ bm,                    (1)
y11 + y21 + … + ym1 = 1,
y12 + y22 + … + ym2 = 1,
…
y1n + y2n + … + ymn = 1,

where 0 ≤ yij ≤ 1 and integer; i = 1, 2, …, m is the set of agents; j = 1, 2, …, n is the set of tasks; cij is the cost of assigning agent i to task j; aij is the resource needed by agent i to do task j; and bi is the resource available to agent i.

3 Relaxing the Generalized Assignment Problem

The generalized assignment problem can be relaxed to become an ordinary transportation problem. A transportation model is easier to handle than the original generalized assignment problem, and faster methods to solve transportation models are available. The idea of relaxing the GAP model presented in this paper was recently used in a transportation branch and bound algorithm proposed by Munapo [6] and then improved by Munapo et al. [7]. The relaxation is given in (2):

Z_RELAX = Minimize c11y11 + c12y12 + … + cmnymn,

Subject to:

y11 + y12 + … + y1n = s1,
y21 + y22 + … + y2n = s2,
…
ym1 + ym2 + … + ymn = sm,                    (2)
y11 + y21 + … + ym1 = 1,
y12 + y22 + … + ym2 = 1,
…
y1n + y2n + … + ymn = 1,

where si is the largest integer satisfying (3):

si = Maximize yi1 + yi2 + … + yin,
ai1yi1 + ai2yi2 + … + ainyin ≤ bi,                    (3)

where si ≥ 0.
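Since (3) maximizes a plain count of binary variables under a single capacity constraint, each si can be computed without a solver by taking the smallest resource requirements first. A minimal Python sketch (the function name and data layout are ours, not the paper's):

```python
def max_tasks(a_i, b_i):
    """s_i: the largest number of tasks agent i can take within capacity b_i.

    Maximizing a count of 0/1 variables under one capacity constraint is
    solved exactly by adding the smallest requirements first
    (a cardinality knapsack).
    """
    used = count = 0
    for req in sorted(a_i):
        if used + req > b_i:
            break
        used += req
        count += 1
    return count

# Row 1 of the numerical illustration in Sect. 5.3:
# a_1 = [23, 36, 19, 34], b_1 = 56  ->  s_1 = 2, matching Table 4.
print(max_tasks([23, 36, 19, 34], 56))  # 2
```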

3.1 Balancing the Relaxed Transportation Model

The relaxed transportation model given in (2) is not balanced, as shown in Table 1.

Table 1. The relaxed transportation problem

                                        Supply
         c11    c12    …    c1n         s1
         c21    c22    …    c2n         s2
         …      …      …    …
         cm1    cm2    …    cmn         sm
Demand   1      1      …    1

For the transportation model to be balanced, Eq. (4) must hold:

s = (s1 + s2 + … + sm) = n.                    (4)

If

s > n,                    (5)

then a dummy column is added to the transportation model, as shown in Table 2.

Table 2. Balanced transportation problem when s > n

                                               Supply
         c11    c12    …    c1n    0           s1
         c21    c22    …    c2n    0           s2
         …      …      …    …      …
         cm1    cm2    …    cmn    0           sm
Demand   1      1      …    1      s − n

If

s < n,                    (6)

then a dummy row is added to the transportation table, as shown in Table 3.

Table 3. Balanced transportation problem when s < n

                                        Supply
         c11    c12    …    c1n         s1
         c21    c22    …    c2n         s2
         …      …      …    …
         cm1    cm2    …    cmn         sm
         0      0      …    0           n − s
Demand   1      1      …    1
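The balancing rules of Tables 1–3 can be written as a small helper. A sketch, assuming the costs are held as a list of rows and the si values from (3) are already computed (the helper name and data layout are ours):

```python
def balance(costs, supplies, n_tasks):
    """Pad the relaxed model with a zero-cost dummy row or column
    (Tables 2 and 3) so that total supply equals total demand."""
    s = sum(supplies)
    demands = [1] * n_tasks
    if s > n_tasks:                       # Table 2: dummy column, demand s - n
        costs = [row + [0] for row in costs]
        demands = demands + [s - n_tasks]
    elif s < n_tasks:                     # Table 3: dummy row, supply n - s
        costs = costs + [[0] * n_tasks]
        supplies = supplies + [n_tasks - s]
    return costs, supplies, demands
```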

3.2 Constructing LPs from the Balanced Relaxed Model

The additional column or row results in additional variables. If a column is added, then the relaxed transportation model becomes as given in (7):

Z_RELAX = Minimize c11y11 + c12y12 + … + cmnymn,

Subject to:

y11 + y12 + … + y1n + y1(n+1) = s1,
y21 + y22 + … + y2n + y2(n+1) = s2,
…
ym1 + ym2 + … + ymn + ym(n+1) = sm,
y11 + y21 + … + ym1 = 1,                    (7)
y12 + y22 + … + ym2 = 1,
…
y1n + y2n + … + ymn = 1,
y1(n+1) + y2(n+1) + … + ym(n+1) = s − n,

where y1(n+1), y2(n+1), …, ym(n+1) are pure integers and the rest are binary variables.
If a row is added, then the relaxed transportation model becomes as given in (8):

Z_RELAX = Minimize c11y11 + c12y12 + … + cmnymn,

Subject to:

y11 + y12 + … + y1n = s1,
y21 + y22 + … + y2n = s2,
…
ym1 + ym2 + … + ymn = sm,
y(m+1)1 + y(m+1)2 + … + y(m+1)n = n − s,                    (8)
y11 + y21 + … + ym1 + y(m+1)1 = 1,
y12 + y22 + … + ym2 + y(m+1)2 = 1,
…
y1n + y2n + … + ymn + y(m+1)n = 1,

where y(m+1)1, y(m+1)2, …, y(m+1)n are pure integers and the rest are binary variables. The additional row and column are called dummies.

4 Solving the Relaxed Balanced Transportation Model

The optimal solution of the GAP model and the optimal solution of the relaxed problem are related as given in (9):

Z_RELAX ≤ Z_OPT.                    (9)

The relationship given in (9) is very useful in solving the GAP model. The relaxed GAP is easier to solve than its original form, and its solution is tested for feasibility. The violated original GAP constraints are used to generate cuts that are added to the relaxed LP model. This process is repeated until there are no more violated original constraints.

4.1 Initial Iteration

Solving the relaxed GAP model (either (7) or (8)) using linear integer programming to obtain the initial solution is called the initial iteration. In this paper the relaxed GAP model (7) or (8) is denoted ReGAP(0).

4.2 Violations

If an optimal integer solution of ReGAP(0) does not satisfy one of the original GAP constraints ai1yi1 + ai2yi2 + … + ainyin ≤ bi, this is called a violation. For example, if an optimal solution of ReGAP(0) is y31 = 0, y32 = 1, y33 = 0, y34 = 1, y35 = 0 and one of the original GAP constraints is 31y31 + 41y32 + 29y33 + 63y34 + 18y35 ≤ 93, then the given optimal solution does not satisfy this constraint (41 + 63 = 104 > 93) and is therefore a violation.

4.3 Generation of Cuts from Violations

Once a violation is detected, the violated constraint can be used to generate a cut. In this paper only cuts with 0, 1 and −1 as coefficients are recommended. This comes from the fact that unimodular matrices are made up of 0, 1 and −1 entries. Suppose the original constraint is ai1yi1 + ai2yi2 + … + ainyin ≤ bi and the optimal solution of ReGAP(0) has yi2 = yij = yin = 1. This is a violation, and the cut given in (10) can be generated:

yi2 + yij + yin ≤ 2.                    (10)

This cut is valid if ai2 + aij + ain > bi. The number of cuts for any violation can be more than one.
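A sketch of this cut generation in Python (our function name and tolerance; the paper prescribes only the 0/±1 cut form):

```python
def violated_cuts(solution, A, b, tol=1e-6):
    """Detect violated GAP capacity rows and emit one cut per violation.

    For a violated row i whose currently assigned tasks are `active`,
    the cut demands that at most len(active) - 1 of them stay together,
    mirroring the 0/1-coefficient cuts of (10).
    """
    cuts = []
    for i, row in enumerate(A):
        active = [j for j in range(len(row)) if solution.get((i, j), 0) > 0.5]
        if sum(row[j] for j in active) > b[i] + tol:
            cuts.append((i, active, len(active) - 1))
    return cuts
```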

5 Transportation Based Approach for the GAP Model

Step 1: Relax the GAP model into a transportation linear program (ReGAP).
Step 2: Solve the problem (ReGAP) using linear integer programming techniques.
Step 3: Determine any violations. If there are none, go to Step 5; else go to Step 4.
Step 4: From the violated constraints, generate cuts, add them to the current problem and return to Step 2.
Step 5: The current solution is optimal.
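The complete Steps 1–5 loop can be sketched compactly with the open-source PuLP modeller — our tooling choice, since the paper does not prescribe a solver. The sketch assumes the usual case s ≥ n (dummy column, Table 2) and reuses max_tasks and violated_cuts from the earlier sketches:

```python
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, value

def solve_gap_by_cuts(c, A, b):
    """Relax the GAP to a balanced transportation model, solve it as an
    integer LP, and add cuts until no capacity constraint is violated."""
    m, n = len(c), len(c[0])
    s = [max_tasks(A[i], b[i]) for i in range(m)]          # supplies, Eq. (3)
    prob = LpProblem("ReGAP", LpMinimize)
    y = {(i, j): LpVariable(f"y_{i}_{j}", cat="Binary")
         for i in range(m) for j in range(n)}
    dummy = [LpVariable(f"d_{i}", lowBound=0, cat="Integer") for i in range(m)]
    prob += lpSum(c[i][j] * y[i, j] for i in range(m) for j in range(n))
    for i in range(m):                                     # row supplies = s_i
        prob += lpSum(y[i, j] for j in range(n)) + dummy[i] == s[i]
    for j in range(n):                                     # each task exactly once
        prob += lpSum(y[i, j] for i in range(m)) == 1
    prob += lpSum(dummy) == sum(s) - n                     # dummy column demand
    while True:                                            # Steps 2-4
        prob.solve()
        sol = {(i, j): y[i, j].value() for i in range(m) for j in range(n)}
        cuts = violated_cuts(sol, A, b)
        if not cuts:                                       # Step 5: feasible
            return value(prob.objective), sol
        for i, active, rhs in cuts:
            prob += lpSum(y[i, j] for j in active) <= rhs
```

This loop mirrors the relax–cut–re-solve sequence worked through in Sect. 5.3.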

5.1 Flow Chart

See Fig. 1.

[Fig. 1. Flow chart for the proposed GAP-transportation based approach: the GAP model is relaxed to obtain the ReGAP model in LP form; the ReGAP is solved in LP form to obtain an optimal solution; if the optimal solution is feasible it is optimal for the GAP model, otherwise the violated constraints are used to generate cuts, which are added to the current ReGAP in LP form, and the model is re-solved.]

5.2 Optimality

The solution obtained by linear integer programming at each iteration is optimal for the current relaxation but may be infeasible for the original GAP. The infeasibility is eliminated by the addition of cuts that are generated from the violations.

5.3 Numerical Illustration

Solve the GAP model given in (11):

Z_OPT = Minimize 59y11 + 151y12 + 103y14 + 59y15 + 191y21 + 301y22 + 77y23 + 181y24 + 193y25 + 361y31 + 171y32 + 61y33 + 45y34,

Subject to:

23y11 + 36y12 + 19y14 + 34y15 ≤ 56,
16y21 + 21y22 + 26y23 + 16y24 + 30y25 ≤ 51,
24y31 + 19y32 + 26y33 + 43y34 ≤ 55,
y11 + y21 + y31 = 1,
y12 + y22 + y32 = 1,                    (11)
y23 + y33 = 1,
y14 + y24 + y34 = 1,
y15 + y25 = 1,
yij = 0 or 1 for all i, j.

Solving the GAP model directly, it takes 23 branch and bound sub-problems in the worst case to verify the optimal solution given in (12):

Z_OPT = 585;  y14 = y15 = y21 = y26 = y32 = y33 = 1.                    (12)

For all other variables yij = 0. Relaxing the GAP model, we obtain Table 4.

Table 4. ReGAP in transportation model form

  59     151    ∞      103    59            2
  191    301    77     181    193           2
  361    171    61     45     ∞             2
  1      1      1      1      1

This is not balanced; it is balanced by the addition of a dummy column, since s > n, i.e. (2 + 2 + 2 = 6) > (1 + 1 + 1 + 1 + 1 = 5), i.e. 6 > 5. In addition, the demand (d) for the added dummy column is d = s − n = 6 − 5 = 1. The balanced transportation model is given in Table 5.

Table 5. Balanced transportation model

  59     151    ∞      103    59     0      2
  191    301    77     181    193    0      2
  361    171    61     45     ∞      0      2
  1      1      1      1      1      1 (= 6 − 5)

The ReGAP in LP form is given in (13):

Minimize 59y11 + 151y12 + 103y14 + 59y15 + 191y21 + 301y22 + 77y23 + 181y24 + 193y25 + 361y31 + 171y32 + 61y33 + 45y34,

Subject to:

y11 + y12 + y13 + y14 + y15 + y16 = 2,
y21 + y22 + y23 + y24 + y25 + y26 = 2,
y31 + y32 + y33 + y34 + y35 + y36 = 2,
y11 + y21 + y31 = 1,
y12 + y22 + y32 = 1,                    (13)
y23 + y33 = 1,
y14 + y24 + y34 = 1,
y15 + y25 = 1,
y16 + y26 + y36 = 1,
yij = 0 or 1 for all i, j.

The three main GAP constraints are given in (14a)–(14c):

23y11 + 36y12 + 19y14 + 34y15 ≤ 56                    (14a)

16y21 + 21y22 + 26y23 + 16y24 + 30y25 ≤ 51                    (14b)

24y31 + 19y32 + 26y33 + 43y34 ≤ 55                    (14c)

Initial Iteration
Solving the initial relaxed GAP model, i.e. ReGAP(0) given in (13), using linear integer programming techniques, the optimal solution is obtained in 2 branch and bound sub-problems. The optimal solution is presented in (15) and Table 6:

Z_ReGAP(0) = 411;  y11 = y15 = y23 = y26 = y32 = y34 = 1.                    (15)

Table 6. Optimal solution – initial iteration

  59[1]   151      ∞       103      59[1]   0       2
  191     301      77[1]   181      193     0[1]    2
  361     171[1]   61      45[1]    ∞       0       2
  1       1        1       1        1       1

(An entry marked [1] indicates a cell carrying one unit, i.e. an assignment.)

First Iteration
The main GAP constraints (14a) and (14c) are violated. The two cuts generated from the violations are y11 + y15 ≤ 1 and y32 + y34 ≤ 1. Adding these two cuts to ReGAP(0) gives ReGAP(1). Solving ReGAP(1) using linear integer programming techniques gives Table 7 and (16), and this is done in 2 branch and bound sub-problems:

Z_ReGAP(1) = 507;  y12 = y15 = y21 = y26 = y33 = y34 = 1.                    (16)

Table 7. Optimal solution – first iteration

  59      151[1]   ∞       103      59[1]   0       2
  191[1]  301      77      181      193     0[1]    2
  361     171      61[1]   45[1]    ∞       0       2
  1       1        1       1        1       1

Second Iteration
The main GAP constraints (14a) and (14c) are violated. The two cuts generated from the violations are y12 + y15 ≤ 1 and y33 + y34 ≤ 1. Adding these two cuts to ReGAP(1) gives ReGAP(2). Solving ReGAP(2) using linear integer programming techniques gives Table 8 and (17), and this is done in 2 branch and bound sub-problems.

Table 8. Optimal solution – second iteration

  59[1]   151[1]   ∞       103      59      0       2
  191     301      77[1]   181      193[1]  0       2
  361     171      61      45[1]    ∞       0[1]    2
  1       1        1       1        1       1

Z_ReGAP(2) = 525;  y11 = y12 = y23 = y25 = y34 = y36 = 1.                    (17)

Third Iteration
The main GAP constraints (14a) and (14b) are violated. The two cuts generated from the violations are y11 + y12 ≤ 1 and y23 + y25 ≤ 1. Adding these two cuts to ReGAP(2) gives ReGAP(3). Solving ReGAP(3) using linear integer programming techniques gives Table 9 and (18), and this is done in 3 branch and bound sub-problems:

Z_ReGAP(3) = 585;  y14 = y15 = y21 = y26 = y32 = y33 = 1.                    (18)

Table 9. Optimal solution – third iteration

  59      151      ∞       103[1]   59[1]   0       2
  191[1]  301      77      181      193     0[1]    2
  361     171[1]   61[1]   45       ∞       0       2
  1       1        1       1        1       1

Fourth Iteration
The main GAP constraints (14a), (14b) and (14c) are all satisfied. Thus the ReGAP(3)
optimal solution is the optimal solution for the original GAP model given in (11).

6 Conclusions

The transportation branch and bound algorithm proposed for the generalized assignment problem in 2014 [6] and improved in 2015 [7] has the obvious disadvantage that the number of branch and bound sub-problems required to verify optimality cannot be contained for large GAP models. The GAP transportation approach proposed in this paper has the strength that at every iteration the number of transportation branch and bound sub-problems is suppressed into a single problem, and this process is repeated until an optimal solution is obtained. Research on the GAP model is ongoing, and there is a need to incorporate new techniques for linear integer programming, such as the efficient branch cut and free algorithm [8]. In addition, a more efficient version of the interior point approach can be used to solve the GAP, as given in [9].

References

1. Asahiro, Y., Ishibashi, M., Yamashita, M.: Independent and cooperative parallel search methods for the generalized assignment problem. Optim. Methods Softw. 18(2), 129–141 (2003)
2. Chu, P.C., Beasley, J.E.: A genetic algorithm for the generalized assignment problem. Comput. Oper. Res. 24, 17–23 (1997)
3. Karabakal, N., Bean, J.C., Lohmann, J.R.: A steepest descent multiplier adjustment method for the generalized assignment problem. Report 92-11, University of Michigan, Ann Arbor, MI (1992)
4. Laguna, M., Kelly, J.P., González-Velarde, J.L., Glover, F.: Tabu search for the generalized assignment problem. Eur. J. Oper. Res. 82, 176–189 (1995)
5. Martello, S., Toth, P.: An algorithm for the generalized assignment problem. In: Brans, J.P. (ed.) Operations Research, vol. 81, pp. 589–603. North-Holland, Amsterdam (1981)
6. Munapo, E.: A transportation branch and bound algorithm for solving the generalized assignment problem. In: 6th International Conference on Applied Operational Research, Proceedings. Lecture Notes in Management Science, vol. 6, pp. 150–158 (2014)
7. Munapo, E., Lesaoana, M., Nyamugure, P., Kumar, S.: A transportation branch and bound algorithm for solving the generalized assignment problem. Int. J. Syst. Assur. Eng. Manag. 6, 217–223 (2015)
8. Munapo, E.: Branch cut and free algorithm for the general linear integer problem. In: Vasant, P., Zelinka, I., Weber, G.W. (eds.) ICO 2020. AISC, vol. 1324, pp. 491–505. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-68154-8_44
9. Munapo, E.: Network reconstruction – a new approach to the traveling salesman problem and complexity. In: Vasant, P., Zelinka, I., Weber, G.W. (eds.) ICO 2019. AISC, vol. 1072, pp. 260–272. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-33585-4_26
10. Nauss, R.M.: Solving the generalized assignment problem: an optimizing and heuristic approach. INFORMS J. Comput. 15(3), 249–266 (2003)
11. Osman, I.H.: Heuristics for the generalized assignment problem: simulated annealing and tabu search approaches. OR Spektrum 17, 211–225 (1995)
12. Pigatti, A., Poggi de Aragão, M., Uchoa, E.: Stabilized branch-and-cut-and-price for the generalized assignment problem. In: 2nd Brazilian Symposium on Graphs, Algorithms and Combinatorics. Electronic Notes in Discrete Mathematics, vol. 19, pp. 389–395. Elsevier, Amsterdam (2005)
13. Ross, G.T., Soland, R.M.: A branch and bound algorithm for the generalized assignment problem. Math. Program. 8, 91–103 (1975)
14. Savelsbergh, M.: A branch and price algorithm for the generalized assignment problem. Oper. Res. 45(6), 831–841 (1997)
15. Yagiura, M., Ibaraki, T., Glover, F.: An ejection chain approach for the generalized assignment problem. INFORMS J. Comput. 16, 133–151 (2004)
16. Yagiura, M., Ibaraki, T., Glover, F.: A path relinking approach with ejection chains for the generalized assignment problem. Eur. J. Oper. Res. 169, 548–569 (2006)
Generalized Optimization: A First Step Towards Category Theoretic Learning Theory

Dan Shiebler

University of Oxford, Oxford, UK
daniel.shiebler@kellogg.ox.ac.uk

Abstract. The Cartesian reverse derivative is a categorical generalization of reverse-mode automatic differentiation. We use this operator to generalize several optimization algorithms, including a straightforward generalization of gradient descent and a novel generalization of Newton's method. We then explore which properties of these algorithms are preserved in this generalized setting. First, we show that the transformation invariances of these algorithms are preserved: while generalized Newton's method is invariant to all invertible linear transformations, generalized gradient descent is invariant only to orthogonal linear transformations. Next, we show that we can express the change in loss of generalized gradient descent with an inner product-like expression, thereby generalizing the non-increasing and convergence properties of the gradient descent optimization flow. In the appendix we include several numerical experiments to illustrate how we can use the ideas in the paper to optimize polynomial functions over an ordered ring.

1 Background
Given a convex differentiable function l : Rn → R, there are many algorithms
that we can use to minimize it. For example, if we pick a step size α and a
starting point x0 ∈ Rn we can apply the gradient descent algorithm in which we
repeatedly iterate xt+1 = xt − α ∗ ∇l(xt ). For small enough α this strategy is
guaranteed to get close to the x that minimizes l [2].
Algorithms like gradient descent are often useful even when l is non-convex.
For example, under relatively mild conditions we can show that taking small
enough gradient descent steps will never increase the value of any differentiable
l : Rn → R [2]. The modern field of deep learning consists largely of applying
gradient descent and other algorithms that can be efficiently computed with
reverse-mode automatic differentiation to optimize non-convex functions [6].
Given the utility of these algorithms it is natural to explore when they can be
generalized beyond differentiable functions. For example, some authors [4,5,8]
use category theory to generalize automatic differentiation. Cockett et al. [3]
introduce Cartesian reverse derivative categories in which we can define an oper-
ator that shares certain properties with reverse-mode automatic differentiation

(RD.1 to RD.7 in Definition 13 of Cockett et al. [3]) and Wilson et al. [9] build
on this formulation to introduce a generalized perspective on gradient descent
that can be used to learn Boolean circuits.
Despite this progress, there has been relatively little research on the proper-
ties of these generalized algorithms. That is, although categorical machine learn-
ing has started to gain traction, categorical learning theory is still far behind. In
this paper we aim to reduce this gap by exploring the properties of optimizers
generalized over other categories. Our contributions are as follows:

• We use Cockett et al.’s [3] Cartesian reverse derivative to define generalized


analogs of several optimization algorithms, including a novel generalization
of Newton’s method.
• We derive novel results on the transformation invariances of these generalized
algorithms.
• We define the notion of an optimization domain over which we can apply these
generalized algorithms and characterize the properties that an optimization
domain must satisfy in order to support generalized gradient-based optimiza-
tion. We provide novel results that the optimization domain of polynomials
over ordered rings satisfies these properties.

2 Standard Optimization

As we described in Sect. 1, gradient descent optimizes an objective function l : Rn → R by starting at a point x0 ∈ Rn and progressing the discrete dynamical system xt+1 = xt − α ∗ ∇l(xt). Rewriting this as xt+α = xt − α ∗ ∇l(xt) and taking the limit α → 0 of this system yields the differential equation ∂x/∂t(t) = −∇l(x(t)), which we can think of as the continuous limit of gradient descent. More generally we have:

Definition 1. An optimizer for l : Rn → R with dimension k is a continuous function d : Rkn → Rkn.

Intuitively, an optimizer defines both a continuous system (∂x/∂t(t), ∂y/∂t(t), · · ·) = d(x(t), y(t), · · ·) and a discrete system (xt+1, yt+1, · · ·) = (xt, yt, · · ·) + αd(xt, yt, · · ·). Note that the discrete dynamical system is the Euler's method discretization of the continuous system. We can think of an optimizer with dimension k > 1 as using information beyond the previous value xt to determine xt+1.

In practice we usually work with optimizers that define dynamical systems in which l(x(t)) and l(xt) get closer to the minimum value of l as t increases. Given l : Rn → R we can construct the gradient descent optimizer d(x) = −∇l(x) and the Newton's method optimizer d(x) = −(∇²l(x))⁻¹∇l(x), both with dimension 1. We can also construct the momentum optimizer d(x, y) = (y, −y − ∇l(x)) with dimension 2.
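The three optimizers are easy to state concretely. The following numpy sketch is ours, with a quadratic objective chosen purely for illustration; it realizes each optimizer and runs the discrete system for gradient descent:

```python
import numpy as np

# Quadratic objective l(x) = 0.5 x^T A x - b^T x:
# gradient Ax - b, Hessian A.
A = np.array([[3.0, 0.0], [0.0, 0.5]])
b = np.array([1.0, 1.0])
grad = lambda x: A @ x - b
hess = lambda x: A

def grad_descent(x):            # d(x) = -∇l(x), dimension 1
    return -grad(x)

def newton(x):                  # d(x) = -(∇²l(x))⁻¹ ∇l(x), dimension 1
    return -np.linalg.solve(hess(x), grad(x))

def momentum(x, y):             # d(x, y) = (y, -y - ∇l(x)), dimension 2
    return y, -y - grad(x)

# Discrete system x_{t+1} = x_t + α d(x_t):
x, alpha = np.array([5.0, -3.0]), 0.1
for _ in range(100):
    x = x + alpha * grad_descent(x)
print(x)  # approaches the minimizer A⁻¹ b = [1/3, 2]
```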

2.1 Optimization Schemes


Definition 2. An optimization scheme u : (Rn → R) → (Rkn → Rkn ) is an
n-indexed family of maps from objectives l : Rn → R to optimizers d : Rkn → Rkn .

For example, the gradient descent optimization scheme is u(l)(x) = −∇l(x) and
the momentum optimization scheme is u(l)(x, y) = (y, −y − ∇l(x)).
In some situations we may be able to improve the convergence rate of the
dynamical systems defined by optimization schemes by precomposing an invert-
ible function f : Rm → Rn . That is, rather than optimize the function l : Rn → R
we optimize l ◦ f : Rm → R. However, for many optimization schemes there are
classes of transformations to which they are invariant: applying any such trans-
formation to the data cannot change the trajectory.
Definition 3. Suppose f : Rm → Rn is an invertible transformation and write
fk for the map (f × f × · · · ) : Rkm → Rkn . The optimization scheme u is
invariant to f if u(l ◦ f ) = fk−1 ◦ u(l) ◦ fk .

Proposition 1. Recall that an invertible linear transformation is a function f(x) = Ax where the matrix A has an inverse A⁻¹, and an orthogonal linear transformation is an invertible linear transformation where A⁻¹ = Aᵀ. Newton's method is invariant to all invertible linear transformations, whereas both gradient descent and momentum are invariant to orthogonal linear transformations.

Proof. First, we show that the Newton's method optimizer scheme NEW(l)(x) = −(∇²l(x))⁻¹∇l(x) is invariant to invertible linear transformations. Consider any function of the form f(x) = Ax where A is invertible. We have:

NEW(l ◦ f)(x) = −(∇²(l ◦ f)(x))⁻¹∇(l ◦ f)(x) = −A⁻¹(∇²l(Ax))⁻¹A⁻ᵀAᵀ∇l(Ax) =
−A⁻¹(∇²l(Ax))⁻¹∇l(Ax) = −f⁻¹((∇²l(f(x)))⁻¹∇l(f(x))) = f⁻¹(NEW(l)(f(x)))

Next, we show that the gradient descent optimizer scheme GRAD(l)(x) = −∇l(x) is invariant to orthogonal linear transformations, but not to linear transformations in general. Consider any function of the form f(x) = Ax where A is an orthogonal matrix. Then the following holds only when Aᵀ = A⁻¹:

GRAD(l ◦ f)(x) = −∇(l ◦ f)(x) = −Aᵀ(∇l(Ax)) = −A⁻¹(∇l(Ax)) = f⁻¹(GRAD(l)(f(x)))

Next, we show that the momentum optimizer scheme MOM(l)(x, y) = (y, −y − ∇l(x)) is also invariant to orthogonal linear transformations, but not to linear transformations in general. Consider any function of the form f(x) = Ax where A is an orthogonal matrix. Then the following holds only when Aᵀ = A⁻¹:

MOM(l ◦ f)(x, y)_x = y = A⁻¹Ay = f⁻¹(MOM(l)(f(x), f(y)))_x
MOM(l ◦ f)(x, y)_y = −y − ∇(l ◦ f)(x) = −A⁻¹Ay − Aᵀ∇l(Ax) = f⁻¹(MOM(l)(f(x), f(y)))_y

In order to interpret these invariance properties it is helpful to consider how they affect the discrete dynamical system defined by an optimization scheme.

Proposition 2. Given an objective function l : Rn → R and an optimization scheme u : (Rn → R) → (Rkn → Rkn) that is invariant to the invertible linear function f : Rm → Rn, the system yt+1 = yt + αu(l ◦ f)(yt) cannot converge faster than the system xt+1 = xt + αu(l)(xt).

Proof. Consider starting at some point x0 ∈ Rkn and repeatedly taking Euler steps xt+α = xt + αu(l)(xt). Now suppose instead that we start at the point y0 = fk⁻¹(x0) and take Euler steps yt+α = yt + αu(l ◦ f)(yt). We prove by induction that yt+α = fk⁻¹(xt+α), and therefore the two sequences converge at the same rate. The base case holds by definition, and by induction we can see that: yt+α = yt + αu(l ◦ f)(yt) = fk⁻¹(xt) + αfk⁻¹(u(l)(xt)) = fk⁻¹(xt+α).

Propositions 1 and 2 together give some insight into why Newton’s method
can perform so much better than gradient descent for applications where both
methods are computationally feasible [2]. Whereas gradient descent can be led
astray by bad data scaling, Newton’s method steps are always scaled optimally
and therefore cannot be improved by data rescaling.
It is important to note that Proposition 2 only applies to linear transfor-
mation functions f . Since Euler’s method is itself a linear method, it does not
necessarily preserve non-linear invariance properties.
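The invariance claims can also be checked numerically. The sketch below is ours (the matrices are arbitrary illustrations): a non-orthogonal rescaling leaves the Newton trajectory equivalent up to f⁻¹ but changes the gradient descent trajectory:

```python
import numpy as np

# Check the invariance claims on l(x) = 0.5 x^T Q x with a
# non-orthogonal rescaling f(x) = M x.
Q = np.diag([10.0, 1.0])
M = np.diag([2.0, 0.5])                      # invertible, not orthogonal
grad = lambda x: Q @ x
grad_f = lambda y: M.T @ Q @ (M @ y)         # ∇(l ∘ f)(y) = Mᵀ∇l(My)
hess_f = lambda y: M.T @ Q @ M               # ∇²(l ∘ f) = MᵀQM

x = np.array([1.0, 1.0]); y = np.linalg.inv(M) @ x
for _ in range(5):                           # Newton: y stays f⁻¹(x)
    x = x - np.linalg.solve(Q, grad(x))
    y = y - np.linalg.solve(hess_f(y), grad_f(y))
print(np.allclose(M @ y, x))                 # True: trajectories match

x = np.array([1.0, 1.0]); y = np.linalg.inv(M) @ x
for _ in range(5):                           # gradient descent: they diverge
    x = x - 0.05 * grad(x)
    y = y - 0.05 * grad_f(y)
print(np.allclose(M @ y, x))                 # False: not invariant
```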

3 Generalized Optimization

In this section we will use Cartesian differential categories [8] and Cartesian
reverse derivative categories [3] to generalize standard results on the behavior of
gradient descent as well as the results in Sect. 2.

Definition 4. An optimization domain is a tuple (Base, X) such that each morphism f : A → B in the Cartesian reverse derivative category Base has an additive inverse −f, and each homset Base[∗, A] out of the terminal object ∗ is further equipped with a multiplication operation f · g and a multiplicative identity map 1A : ∗ → A, forming a commutative ring with the left additive structure +. X is an object in Base such that the homset Base[∗, X] is further equipped with a total order f ≤ g, forming an ordered commutative ring.

Given an optimization domain (Base, X), the object X represents the space of objective values to optimize, and we refer to morphisms into X as objectives. We abbreviate the map 1B ◦ !A : A → B as 1AB, where !A : A → ∗ is the unique map into the terminal object ∗. Note that any map f : A → B in Base has the additive inverse −f = (−1AB)f.
For example, the objectives in the standard domain (Euc, R) are functions
l : Rn → R. Given an ordered commutative ring r we can form the r-polynomial
domain (Polyr , 1) in which objectives are r-polynomials lP : n → 1.

Definition 5. An objective l : A → X is bounded below in (Base, X) if there


exists some x : ∗ → X such that for any a : ∗ → A we have x ≤ l ◦ a.
In both the standard and r-polynomial domains an objective is bounded below
if its image has an infimum.

3.1 Generalized Gradient and Generalized n-Derivative

Definition 6. The generalized gradient of the objective l : A → X in (Base, X) is R[l]1 : A → A where R[l]1 = R[l] ◦ ⟨idA, 1AX⟩.

In the standard domain the generalized gradient of l : Rn → R is just the gradient R[l]1(x) = ∇l(x), and in the r-polynomial domain the generalized gradient of lP : n → 1 is R[lP]1(x) = (∂lP/∂x1(x), · · ·, ∂lP/∂xn(x)), where ∂lP/∂xi is the formal derivative of the polynomial lP in xi.

Definition 7. The generalized n-derivative of the morphism f : X → A in (Base, X) is Dn[f] : X → A where D1[f] = D[f] ◦ ⟨idX, 1XX⟩ and Dn[f] = D[Dn−1[f]] ◦ ⟨idX, 1XX⟩.

In the standard domain the generalized n-derivative of f : R → R is the n-derivative f(n) = ∂nf/∂xn, and in the r-polynomial domain the generalized n-derivative of lP : 1 → 1 is the formal n-derivative ∂nlP/∂xn.
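To stress that these generalized derivatives are purely algebraic, the following sketch (ours) computes formal derivatives of univariate polynomials with integer coefficients, i.e. over an ordered ring, with no limits involved:

```python
def formal_derivative(coeffs):
    """Formal derivative of sum_k coeffs[k] * x^k: only ring operations
    are used (d/dx of x^k is k * x^(k-1)), so integer coefficients stay
    integers."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def formal_n_derivative(coeffs, n):
    """n-fold formal derivative, mirroring Definition 7."""
    for _ in range(n):
        coeffs = formal_derivative(coeffs)
    return coeffs

# lP(x) = 1 - 3x + 2x^3 over the integers:
print(formal_derivative([1, -3, 0, 2]))       # [-3, 0, 6]   i.e. -3 + 6x^2
print(formal_n_derivative([1, -3, 0, 2], 2))  # [0, 12]      i.e. 12x
```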
The derivative over the reals has a natural interpretation as a rate of change. We can generalize this as follows:

Definition 8. We say that a morphism f : X → X in Base is n-smooth in (Base, X) if whenever Dk[f] ◦ t ≥ 0X : ∗ → X for all t1 ≤ t ≤ t2 : ∗ → X and k ≤ n, we have that f ◦ t1 ≤ f ◦ t2 : ∗ → X.

f is n-smooth if it cannot decrease on any interval over which its generalized derivatives of order n and below are non-negative. Some examples include:

• Any f : R → R is trivially 1-smooth in the standard domain.
• When r is a dense subring of a real-closed field, any polynomial lP : 1 → 1 is 1-smooth in the r-polynomial domain [7].
• For any r, the polynomial lP = Σ_{k=0}^{n} ck tᵏ : 1 → 1 of degree n is n-smooth in the r-polynomial domain, since for any t1 we can use the binomial theorem to write lP(t) = Σ_{k=0}^{n} ck tᵏ = Σ_{k=0}^{n} ck(t1 + (t − t1))ᵏ = lP(t1) + Σ_{k=1}^{n} ck′(t − t1)ᵏ, where ck′ is a constant such that (ck′)(k!) = Dk[lP](t1). Note that ck′ must exist by the definition of the formal derivative of lP, and must be non-negative if Dk[lP](t1) is non-negative.

3.2 Optimization Functors


In this section we generalize optimization schemes (Sect. 2.1) to arbitrary opti-
mization domains. This will enable us to characterize the invariance properties
of our generalized optimization schemes in terms of the categories out of which
they are functorial. Given an optimization domain (Base, X) we can define the
following categories:

Definition 9. The objects in the category Objective over the optimization domain (Base, X) are objectives l : A → X such that there exists an inverse function R[l]1⁻¹ : A → A where R[l]1⁻¹ ◦ R[l]1 = R[l]1 ◦ R[l]1⁻¹ = idA : A → A, and the morphisms between l : A → X and l′ : B → X are morphisms f : A → B where l′ ◦ f = l.

Note that Objective is a subcategory of the slice category Base/X. In the standard domain the objects in Objective are objectives l : Rn → R such that the function ∇l : Rn → Rn is invertible. In the r-polynomial domain, the objects in Objective are r-polynomials lP : n → 1 such that the function (∂lP/∂x1, · · ·, ∂lP/∂xn) : n → n is invertible.
Definition 10. A generalized optimizer over the optimization domain (Base, X) with state space A ∈ Base and dimension k ∈ N is an endomorphism d : Aᵏ → Aᵏ in Base. The objects in the category Optimizer over (Base, X) are generalized optimizers, and the morphisms between the generalized optimizers d : Aᵏ → Aᵏ and d′ : Bᵏ → Bᵏ are Base-morphisms f : A → B such that fᵏ ◦ d = d′ ◦ fᵏ : Aᵏ → Bᵏ. Note that morphisms only exist between generalized optimizers with the same dimension. The composition of morphisms in Optimizer is the same as in Base.

Recall that Aᵏ and fᵏ are respectively A and f tensored with themselves k times. In the standard domain a generalized optimizer with dimension k is a tuple (Rn, d) where d : Rkn → Rkn is an optimizer (Definition 1).
Definition 11. Given a subcategory D of Objective, an optimization functor over D is a functor D → Optimizer that maps the objective l : A → X to a generalized optimizer over (Base, X) with state space A.

Optimization functors are generalizations of optimization schemes (Definition 2) that map objectives to generalized optimizers. Explicitly, an optimization scheme u that maps l : Rn → R to u(l) : Rkn → Rkn defines an optimization functor in the standard domain.

The invariance properties of optimization functors are represented by the subcategory D ⊆ Objective out of which they are functorial. Concretely, consider the subcategory ObjectiveI of Objective in which morphisms are limited to invertible linear morphisms f in Base, and the subcategory Objective⊥ of ObjectiveI in which the inverse of f is f†. In both the standard domain and the r-polynomial domain, the morphisms in ObjectiveI are linear maps defined by an invertible matrix and the morphisms in Objective⊥ are linear maps defined by an orthogonal matrix (matrix inverse equal to matrix transpose). We will now generalize Proposition 1 by defining generalized gradient descent and momentum functors that are functorial out of Objective⊥ and a generalized Newton's method functor that is functorial out of ObjectiveI.

Definition 12. Generalized gradient descent sends the objective l : A → X to the generalized optimizer −R[l]1 : A → A with dimension 1, and generalized momentum sends the objective l : A → X to the generalized optimizer ⟨π1, −π1 − (R[l]1 ◦ π0)⟩ : A² → A² with dimension 2.

Generalized momentum and generalized gradient descent have a very similar structure, with the major difference being that generalized momentum uses a placeholder variable and generalized gradient descent does not. In the standard domain we have that −R[l]1(x) = −∇l(x) and ⟨π1, −π1 − (R[l]1 ◦ π0)⟩(x, y) = (y, −y − ∇l(x)), so generalized gradient descent and generalized momentum are equivalent to the gradient descent and momentum optimization schemes that we defined in Sect. 2.1. Similarly, in the r-polynomial domain generalized gradient descent maps lP : n → 1 to −R[lP]1 : n → n. Since Newton's method involves the computation of an inverse Hessian, it is not immediately obvious how we can express it in terms of Cartesian reverse derivatives. However, by the inverse function theorem we can rewrite the inverse Hessian as the Jacobian of the inverse gradient function, which makes this easier. That is: (∇²l)(x)⁻¹ = J∇l(x)⁻¹ = J(∇l)⁻¹(∇l(x)), where J∇l(x) = (∇²l)(x) is the Hessian of l at x, J(∇l)⁻¹(∇l(x)) is the Jacobian of the inverse gradient function evaluated at ∇l(x), and the second equality holds by the inverse function theorem. We can therefore generalize the Newton's method term −∇²(l)⁻¹∇l as −R[R[l]1⁻¹] ◦ ⟨R[l]1, R[l]1⟩ : A → A and generalize Newton's method as follows:

Definition 13. Generalized Newton's method sends l : A → X to the generalized optimizer −R[R[l]1⁻¹] ◦ ⟨R[l]1, R[l]1⟩ : A → A with dimension 1.

In the r-polynomial domain generalized Newton's method maps the polynomial lP : n → 1 to −R[R[lP]1⁻¹] ◦ ⟨R[lP]1, R[lP]1⟩ : n → n. We can now present the main result of this section, which is a generalization of Proposition 1:

Proposition 3. Generalized Newton's method is a functor from ObjectiveI to Optimizer, whereas both generalized gradient descent and generalized momentum are functors from Objective⊥ to Optimizer.
Proof. Since generalized gradient descent, generalized momentum and generalized Newton's method all act as the identity on morphisms, we simply need to show that each functor maps a morphism in its source category to a morphism in its target category.

First we show that generalized Newton's method NEW(l) = −R[R[l]1⁻¹] ◦ ⟨R[l]1, R[l]1⟩ is a functor out of ObjectiveI. Given an objective l : A → X and an invertible linear map f : B → A we have:

f ◦ NEW(l ◦ f) = −f ◦ R[R[l ◦ f]1⁻¹] ◦ ⟨R[l ◦ f]1, R[l ◦ f]1⟩ =(∗)
−f ◦ f⁻¹ ◦ R[R[l]1⁻¹] ◦ (f⁻† × f⁻†) ◦ ⟨f† ◦ R[l]1 ◦ f, f† ◦ R[l]1 ◦ f⟩ =
−R[R[l]1⁻¹] ◦ ⟨f⁻† ◦ f† ◦ R[l]1 ◦ f, f⁻† ◦ f† ◦ R[l]1 ◦ f⟩ =
−R[R[l]1⁻¹] ◦ ⟨R[l]1, R[l]1⟩ ◦ f = NEW(l) ◦ f

where (∗) holds by:

R[R[l ◦ f]1⁻¹] =(∗∗) R[f⁻¹ ◦ R[l]1⁻¹ ◦ f⁻†] = R[f⁻†] ◦ (idB × R[f⁻¹ ◦ R[l]1⁻¹]) ◦ ⟨π0, ⟨f⁻† ◦ π0, π1⟩⟩ =
f⁻¹ ◦ R[f⁻¹ ◦ R[l]1⁻¹] ◦ ⟨f⁻† ◦ π0, π1⟩ =
f⁻¹ ◦ R[R[l]1⁻¹] ◦ (idA × R[f⁻¹]) ◦ ⟨π0, ⟨R[l]1⁻¹ ◦ π0, π1⟩⟩ ◦ ⟨f⁻† ◦ π0, π1⟩ =
f⁻¹ ◦ R[R[l]1⁻¹] ◦ (idA × f⁻†) ◦ ⟨f⁻† ◦ π0, π1⟩ = f⁻¹ ◦ R[R[l]1⁻¹] ◦ (f⁻† × f⁻†)

and where (∗∗) holds by:

R[l ◦ f]1⁻¹ = f⁻¹ ◦ R[l]1⁻¹ ◦ R[f⁻¹] ◦ (1A × idB) = f⁻¹ ◦ R[l]1⁻¹ ◦ f⁻†

Next we show that generalized gradient descent GRAD(l) = −R[l]1 is a functor out of Objective⊥. Given an objective l : A → X and an invertible linear map f : B → A where f ◦ f† = idA and f† ◦ f = idB, we have:

f ◦ GRAD(l ◦ f) = −f ◦ R[l ◦ f]1 = −f ◦ R[l ◦ f] ◦ ⟨idB, 1BX⟩ =
−f ◦ R[f] ◦ (idB × R[l]1) ◦ ⟨idB, f⟩ =
−f ◦ f† ◦ π1 ◦ (idB × R[l]1) ◦ ⟨idB, f⟩ =
−π1 ◦ (idB × R[l]1) ◦ ⟨idB, f⟩ =
−R[l]1 ◦ f = GRAD(l) ◦ f

Next we show that generalized momentum MOM(l) = ⟨π1, −π1 − (R[l]1 ◦ π0)⟩ is a functor out of Objective⊥. Given an objective l : A → X and an invertible linear map f : B → A where f ◦ f† = idA and f† ◦ f = idB, we have:

f² ◦ MOM(l ◦ f) = (f × f) ◦ MOM(l ◦ f) = (f × f) ◦ ⟨π1, −π1 − (R[l ◦ f]1 ◦ π0)⟩ =
⟨f ◦ π1, f ◦ (−π1 − (R[l ◦ f]1 ◦ π0))⟩ =
⟨f ◦ π1, −f ◦ π1 − (f ◦ R[l ◦ f]1 ◦ π0)⟩ =
⟨f ◦ π1, −f ◦ π1 − (R[l]1 ◦ f ◦ π0)⟩ =
⟨π1, −π1 − (R[l]1 ◦ π0)⟩ ◦ (f × f) = MOM(l) ◦ (f × f) = MOM(l) ◦ f²

Proposition 3 implies that the invariance properties of our optimization functors


mirror the invariance properties of their optimization scheme counterparts. Not
only does Proposition 3 directly imply Proposition 1, but it also implies that the
invariance properties that gradient descent, momentum, and Newton’s method
enjoy are not dependent on the underlying category over which they are defined.

3.3 Generalized Optimization Flows


In Sect. 2 we demonstrated how we can derive continuous and discrete dynamical
systems from an optimizer d : Rkn → Rkn . In this section we extend this insight
to generalized optimizers.
To do this, we define a morphism s : X → Ak whose Cartesian derivative
is defined by a generalized optimizer d : Ak → Ak . Since we can interpret
morphisms in Base[∗, X] as either times t or objective values x, the morphism
s : X → Ak describes how the state of our dynamical system evolves in time.
Formally we can put this together in the following structure:
Definition 14. A generalized optimization flow over the optimization
domain (Base, X) with state space A ∈ Base and dimension k ∈ N is a
tuple (l, d, s, τ ) where l : A → X is an objective, d : Ak → Ak is a generalized
optimizer, s : X → Ak is a morphism in Base and τ is an interval in Base[∗, X]
such that for t ∈ τ we have d ◦ s ◦ t = D1 [s] ◦ t : ∗ → Ak .

Intuitively, l is an objective, d is a generalized optimizer, and s is the state map that maps times in τ to the system state, such that d ◦ s : X → Ak describes the Cartesian derivative of the state map, D1[s].

In the standard domain we can define a generalized optimization flow (l, d, s, R) from an optimizer d : Rkn → Rkn and an initial state s0 ∈ Rkn by defining a state map s : R → Rkn where s(t) = s0 + ∫₀ᵗ d(s(t′)) dt′. We can think of a state map in the standard domain as a simulation of Euler's method with infinitesimal α.
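A crude numerical stand-in for such a state map (our sketch, with an arbitrary smooth objective) runs Euler's method with a small step and checks that the loss never increases along the trajectory, anticipating the descending flows defined next:

```python
import numpy as np

# Approximate the state map s(t) = s0 + ∫ d(s(t')) dt' by Euler steps
# with a small alpha, for the gradient flow d = -∇l.
l = lambda x: (x[0] - 1) ** 2 + 3 * x[1] ** 2
grad = lambda x: np.array([2 * (x[0] - 1), 6 * x[1]])

alpha, s = 1e-3, np.array([4.0, -2.0])
losses = [l(s)]
for _ in range(5000):
    s = s + alpha * (-grad(s))       # x_{t+alpha} = x_t + alpha * d(x_t)
    losses.append(l(s))

assert all(a >= b for a, b in zip(losses, losses[1:]))  # loss never increases
print(s)  # approaches the minimizer (1, 0)
```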

Definition 15. A generalized optimization flow (l, d, s, τ ) over the optimization


domain (Base, X) is an n-descending flow if for any t ∈ τ and k ≤ n we have
Dk [l ◦ π0 ◦ s] ◦ t ≤ 0X : ∗ → X.

Note that if (l, d, s, τ ) is an n-descending flow and l ◦ π0 ◦ s : X → X is n-smooth


(Definition 8), then l ◦ π0 ◦ s must be monotonically decreasing in t on τ .

Definition 16. The generalized optimization flow (l, d, s, τ) over the optimization domain (Base, X) converges if for any δ > 0X : ∗ → X there exists some t ∈ τ such that for any t ≤ t′ ∈ τ we have −δ ≤ (l ◦ π0 ◦ s ◦ t′) − (l ◦ π0 ◦ s ◦ t) ≤ δ.

In the standard domain this reduces to a familiar definition of convergence [1]:


a flow converges if there exists a time t after which the value of the objective l
does not change by more than an arbitrarily small amount.
Now suppose (l, d, s, τ ) is an n-descending flow, l ◦ π0 ◦ s : X → X is n-
smooth and l is bounded below (Definition 5). Since l ◦ π0 ◦ s must decrease
monotonically in t it must be that (l, d, s, τ ) converges. In the next section we
give examples of optimization flows defined by the generalized gradient that
satisfy these conditions.

3.3.1 Generalized Gradient Flows

Definition 17. A generalized gradient flow is a generalized optimization flow of the form (l, −R[l]1, s, τ).

Given a smooth objective l : Rn → R, an example generalized gradient flow in the standard domain is (l, −∇l, s, R), where s(t) = s0 + ∫₀ᵗ −∇l(s(t′)) dt′ for some s0 ∈ Rn. One of the most useful properties of a generalized gradient flow is that we can write its Cartesian derivative with an inner product-like structure:

Proposition 4. Given a choice of time t ∈ τ and a generalized gradient flow (l, −R[l]1, s, τ), we have D1[l ◦ π0 ◦ s] ◦ t = −R[l]st† ◦ R[l]st ◦ 1X : ∗ → X, where R[l]st = R[l] ◦ ⟨s ◦ t ◦ !X, idX⟩ : X → A.

Proof.

D1[l ◦ s] ◦ t = D[l ◦ s] ◦ ⟨t, 1X⟩ = D[l] ◦ ⟨s ◦ π0, D[s]⟩ ◦ ⟨t, 1X⟩ = D[l] ◦ ⟨s, D[s] ◦ ⟨idX, 1X⟩⟩ ◦ t =
D[l] ◦ ⟨s, d ◦ s⟩ ◦ t = D[l] ◦ ⟨s, −R[l] ◦ ⟨idA, 1AX⟩ ◦ s⟩ ◦ t = −D[l] ◦ ⟨s, R[l] ◦ ⟨s, 1X⟩⟩ ◦ t =
−π1 ◦ R[R[l]] ◦ (⟨idA, 1AX⟩ × idA) ◦ ⟨s, R[l] ◦ ⟨s, 1X⟩⟩ ◦ t =
−π1 ◦ R[R[l]] ◦ ⟨⟨s, 1X⟩, R[l] ◦ ⟨s, 1X⟩⟩ ◦ t =
−π1 ◦ R[R[l]] ◦ ⟨⟨s ◦ t, 1X⟩, R[l] ◦ ⟨s, 1X⟩ ◦ t⟩ =
−π1 ◦ R[R[l]] ◦ (⟨s ◦ t, 1X⟩ × idA) ◦ R[l] ◦ ⟨s ◦ t, 1X⟩ =
−π1 ◦ R[R[l]] ◦ (⟨s ◦ t, 1X⟩ × idA) ◦ R[l]st ◦ 1X =
−R[R[l] ◦ ⟨s ◦ t ◦ !X, idX⟩] ◦ ⟨1X, R[l]st ◦ 1X⟩ =
−(R[l] ◦ ⟨s ◦ t ◦ !X, idX⟩)† ◦ π1 ◦ ⟨1X, R[l]st ◦ 1X⟩ =
−(R[l] ◦ ⟨s ◦ t ◦ !X, idX⟩)† ◦ R[l]st ◦ 1X = −R[l]st† ◦ R[l]st ◦ 1X

Intuitively, s ◦ t : ∗ → A is the state at time t and R[l]st ◦ 1X : ∗ → A is the value of the generalized gradient of l at time t. To understand the importance of this result, consider the following definition:

Definition 18. (Base, X) supports generalized gradient-based optimization when any generalized gradient flow over (Base, X) is a 1-descending flow.

Intuitively, an optimization domain supports generalized gradient-based optimization if loss decreases in the direction of the gradient. Proposition 4 is important because it helps us identify the optimization domains for which this holds. For example, Proposition 4 implies that both the standard domain and any r-polynomial domain support generalized gradient-based optimization:

• In the standard domain −R[l]st† ◦ R[l]st ◦ 1R = −‖∇l(s(t))‖², which must be non-positive by the definition of a norm. As a result, any generalized gradient flow (l, −R[l], s, τ) in the standard domain converges if l is bounded below.
• In the r-polynomial domain −R[lP]st† ◦ R[lP]st ◦ 1₁ = −Σ_{i=1}^{n} (∂lP/∂xi(st))(∂lP/∂xi(st)), which must be non-positive since in an ordered ring no negative element is a square. If r is a dense subring of a real-closed field, then any generalized gradient flow (l, −R[l], s, τ) in the r-polynomial domain converges if l is bounded below.

References

1. Ang, A.: Convergence of gradient flow. Course Notes at UMONS (2020)
2. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004). ISBN 0521833787
3. Cockett, R., et al.: Reverse derivative categories. arXiv e-prints arXiv:1910.07065 (2019)
4. Cruttwell, G.S.H., et al.: Categorical foundations of gradient-based learning. arXiv e-prints arXiv:2103.01931 (2021). [cs.LG]
5. Elliott, C.: The simple essence of automatic differentiation. In: Proceedings of the ACM on Programming Languages 2(ICFP), pp. 1–29 (2018)
6. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
7. Nombre: Does the derivative of a polynomial over an ordered ring behave like a rate of change? (2021). https://math.stackexchange.com/q/4170920
8. Blute, R.F., Cockett, J.R.B., Seely, R.A.G.: Cartesian differential categories. Theory Appl. Categ. 22(23), 622–672 (2009)
9. Wilson, P., Zanasi, F.: Reverse derivative ascent: a categorical approach to learning boolean circuits. In: Electronic Proceedings in Theoretical Computer Science, vol. 333, pp. 247–260, February 2021. https://doi.org/10.4204/eptcs.333.17
Analysis of Non-linear Structural Systems via Hybrid Algorithms

Sinan Melih Nigdeli¹, Gebrail Bekdaş¹, Melda Yücel¹, Aylin Ece Kayabekir², and Yusuf Cengiz Toklu³

¹ Department of Civil Engineering, Istanbul University-Cerrahpaşa, 34320 Avcılar, Istanbul, Turkey
  {melihnig,bekdas}@iuc.edu.tr, melda.yucel@ogr.iu.edu.tr
² Department of Civil Engineering, Istanbul Gelisim University, 34310 Avcılar, Istanbul, Turkey
  aekayabekir@gelisim.edu.tr
³ Department of Civil Engineering, Istanbul Beykent University, 34398 Sarıyer, Istanbul, Turkey
  cengiztoklu@beykent.edu.tr

Abstract. Metaheuristic methods are commonly used in problems treating the optimization of structural systems with respect to topology, shape and size. It has recently been shown that metaheuristic methods can also be used in the analysis of structural systems by applying the well-known mechanical principle of minimum energy. This method is called total potential optimization using metaheuristic algorithms (TPO/MA), and it has been shown to have certain advantages in dealing with nonlinear problems and with problems where classical methods, including the Finite Element Method, have difficulties. In this paper, a retaining wall example that is generated via plane-strain members is presented, and hybrid algorithms based on the Jaya algorithm are investigated. The hybrid algorithms may have advantages in the number of iterations needed to reach the final value.

Keywords: TPO/MA · Metaheuristics · Hybrid algorithms · Structural analysis · Optimization

1 Introduction

The basic principle of mechanics defines the minimum total potential energies of the
structural systems as the system equilibrium position. As it is known, the system
potential energy is equal to the sum of the energy created by the external effects and the
deformation energy formed in the system due to the effects. In other words, as a result
of external effects, the system comes to a deformation state (equilibrium position)
which will minimize its total potential energy. Analysis of structural systems can be
defined as the process of finding structural deformations and their internal effects
according to this basic principle of mechanics. For this purpose, from past to present,
design engineers have developed and used various numerical methods and approaches
for the analysis of structural systems. The general approach in these methods is to


determine the matrices defining the structural system and the loads, to determine the displacements of the structural system, and to obtain the cross-section effects by using these displacements. Although this approach gives sufficiently approximate results in linear analysis, it may not be very effective in nonlinear analysis. For this reason, it may be necessary to use a method based on iterative analysis for nonlinear systems.

In recent years, a method that is the same as the current analysis methods in terms of being based on energy theorems, but completely different in terms of analysis approach, has been proposed by Toklu [1]. The method can be defined as a minimization process that finds the system deformation state of minimum total potential energy with the help of metaheuristic algorithms. In other words, the method is an optimization process in which the design variables are the displacements and the objective function is the total potential energy of the system.

In the existing numerical methods, the solution for the system displacements, which are defined as unknowns, is based on mathematical approaches, whereas TPO/MA iteratively determines the minimum-energy state among various randomly generated displacement states. That is, the displacements, which are the unknowns of the existing methods, are obtained in TPO/MA from the first step through an optimization process. With this approach, TPO/MA provides great convenience compared to existing methods, as well as being very effective in considering the nonlinear behavior of the system without requiring any additional processing. Despite this superiority, since TPO/MA is an iterative minimization process based on a metaheuristic algorithm, the analysis of some systems can take rather long in terms of computation time. However, especially in recent years, developing processor technology and effective algorithms have enabled the method to perform close to existing methods in terms of computation time. Scientific studies on the TPO/MA method have proven that it is a high-performance, easy-to-apply and effective method, whether the system is linear or nonlinear. Among the mentioned scientific studies, various structural systems such as trusses, cables, tensegrities and plates, and various situations related to these systems, have been investigated [1–11].

In this study, the analysis of systems consisting of plane-strain elements with TPO/MA is presented. Linear and non-linear analyses of a retaining wall example modelled with plate elements were performed. As metaheuristic algorithms, the flower pollination and Jaya algorithms, which have been proven effective in scientific studies, were used. Some modifications and hybrid methods were also developed to improve the algorithms in terms of analysis computation time. In the tests, the proposed modifications and hybrid algorithms gave very effective results in terms of minimization computation time.

2 The Plane-Strain Members

Plate problems can be considered as structural systems generated from the triangular elements shown in Fig. 1. In the figure, u(x, y) and v(x, y) are displacement fields in the x and y directions respectively. Considering the linear variation in displacements, these displacement fields in the x–y plane can be defined as

u(x, y) = ui + C1x + C2y                    (1)

v(x, y) = vi + C3x + C4y                    (2)

where ui and vi are the displacements at the node symbolized with i. The constants C1, C2, C3 and C4 are related to the strains through the partial derivatives of Eqs. (1) and (2) with respect to x and y as follows:

εx = ∂u/∂x = C1                    (3)

εy = ∂v/∂y = C4                    (4)

γxy = ∂u/∂y + ∂v/∂x = C2 + C3                    (5)

in which the normal strain in the x-direction, the normal strain in the y-direction and the shear strain are symbolized with εx, εy and γxy respectively. Nodal displacements at the nodes named i, j and k can be calculated with Eqs. (6)–(8):

u(0, 0) = ui,  v(0, 0) = vi                    (6)

u(aj, bj) = uj,  v(aj, bj) = vj                    (7)

u(ak, bk) = uk,  v(ak, bk) = vk                    (8)

[Fig. 1. Triangular elements used in the generation of plate systems: node i at the origin, node j at (aj, bj) and node k at (ak, bk) in the x–y plane.]



Considering Eqs. (1) and (2), the relation of these nodal displacements can be written as

[uj]   [aj  bj  0   0 ] [C1]   [ui]
[vj] = [0   0   aj  bj] [C2] + [vi]                    (9)
[uk]   [ak  bk  0   0 ] [C3]   [ui]
[vk]   [0   0   ak  bk] [C4]   [vi]

and by the solution of this equation, C1, C2, C3 and C4 can be formulated as in Eqs. (10)–(13). The strain energy density e for an elastic body in two dimensions can be determined with Eq. (14). By substituting Eqs. (10)–(13) into Eqs. (3)–(5), the stresses are obtained as in Eqs. (15)–(17) for the linear case, where E and ν denote the elasticity modulus and Poisson's ratio respectively.

C1 = bk(uj − ui)/(aj bk − ak bj) + bj(uk − ui)/(ak bj − aj bk)                    (10)

C2 = ak(uj − ui)/(ak bj − aj bk) + aj(uk − ui)/(aj bk − ak bj)                    (11)

C3 = bk(vj − vi)/(aj bk − ak bj) + bj(vk − vi)/(ak bj − aj bk)                    (12)

C4 = ak(vj − vi)/(ak bj − aj bk) + aj(vk − vi)/(aj bk − ak bj)                    (13)

e = ∫₀^ε σ dε = (1/2)(σx εx + σy εy + τxy γxy)                    (14)

σx = E/((1 + ν)(1 − 2ν)) · ((1 − ν)εx + νεy)                    (15)

σy = E/((1 + ν)(1 − 2ν)) · (νεx + (1 − ν)εy)                    (16)

τxy = E/((1 + ν)(1 − 2ν)) · ((1 − 2ν)/2)γxy                    (17)

Considering nonlinear stress–strain relations, the formulations of the normal stress in the x-direction (σx), the normal stress in the y-direction (σy) and the shear stress (τxy) can be written as

σx = E/((1 + ν)(1 − 2ν)) · ((1 − ν)εx + νεy)³                    (18)

σy = E/((1 + ν)(1 − 2ν)) · (νεx + (1 − ν)εy)³                    (19)

τxy = E/((1 + ν)(1 − 2ν)) · ((1 − 2ν)/2 · γxy)³                    (20)

For the mth triangular element, the strain energy Um can be calculated by multiplying the strain energy density em by the volume Vm of the element (Eq. (21)). The formulation of the volume Vm is given in Eq. (22), where t is the thickness of the element. The strain energy equation for a system with n elements is written as in Eq. (23). The total potential energy Πp is found by subtracting the work done by the external forces from the total strain energy. For a system with p nodes and point loads Pxi in the x and Pyi in the y directions, Πp is formulated via Eq. (24).

Um = em Vm                    (21)

Vm = (aj bk − ak bj)t / 2                    (22)

U = Σ_{m=1}^{n} Um                    (23)

Πp = U − Σ_{i=1}^{p} (Pxi ui + Pyi vi)                    (24)
3 The Optimization Algorithm

The first algorithm, the flower pollination algorithm (FPA), was developed by Yang as an optimization method inspired by the natural pollination behavior of flowering plants [12]. The process is divided into two different stages, selected according to a special parameter based on the pollination style. A parameter called the switch probability (sp) determines whether the process is global pollination (Eq. 25) or local pollination (Eq. 26):

X_new,i = X_old,i + L(X_old,i − g*)                    (25)

X_new,i = X_old,i + ε(X_j − X_k)                    (26)

Here, to reach the minimization target, new solutions replace the old ones when they are better. rand(0, 1) is a function generating random values between the numbers in brackets. g* denotes the best candidate solution, i.e. the one with the minimum objective function among all flower pollinations. L is a random flight step drawn from a Lévy distribution, ε is a random value in the range 0 to 1, and X_j and X_k are two different, randomly selected solutions.

The second method is the Jaya algorithm (JA) proposed by Rao [13]. The main target of JA is to move toward the best solution (g*) while moving away from the worst solution (g_w); the optimum is thus approached through a "victory" strategy. Indeed, the word Jaya means victory in Sanskrit, which suits the main target of JA. The process is carried out via Eq. (27), with only one phase and no algorithm-specific parameters:

X_new,i = X_old,i + rand(0, 1)(g* − X_old,i) − rand(0, 1)(g_w − X_old,i)                    (27)

To improve the structural analysis process, JA is chosen as the base method because it has only one phase and is open to modification: different phases or parameters can be added to it. Three novel hybrid algorithms are developed by improving the JA process.

The first one combines JA with the Lévy distribution, which provides the randomization via pollinator flight in FPA. To realize this, the generation of random values with rand(0, 1) within the JA expression (Eq. (27)) is replaced by the Lévy distribution L. Also, a second phase is added from the student phase of Teaching-Learning-Based Optimization (TLBO). The two phases are selected via the switch probability. This first hybrid algorithm is named JALS.

TLBO was developed by Rao et al. [14], inspired by the teaching-learning process between a teacher and students. TLBO comprises two separate stages, called the teacher and student phases. The second stage is the student phase, where students improve their knowledge and grade levels by interacting with each other, making investigations, etc. This phase can be formalized via Eq. (28), where X_i and X_j are different, randomly determined candidate solutions:

X_new,i = X_old,i + rand(0, 1)(X_i − X_j),  if f(X_i) > f(X_j)                    (28)
X_new,i = X_old,i + rand(0, 1)(X_j − X_i),  if f(X_i) < f(X_j)

In the second algorithm, JA is combined with the student phase of TLBO to evaluate candidate solutions other than the best and the worst. This hybrid provides more randomization of solutions compared to the single-phase JA. In this algorithm the two phases are performed successively, and it is symbolized as JA2SP.

The last algorithm also considers JA together with the student phase of TLBO, but the phases are selected through the switch probability (sp) of FPA. The reason for this modification is to decrease the number of optimization phases executed per iteration, since using multiple phases increases the analysis time needed to find the optimal results. This algorithm is represented as JA1SP; a sketch of one of its iterations is given below. Additionally, JA1SP has no special parameters, since the sp parameter is replaced by a randomization process in the range [0, 1].
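One iteration of JA1SP can be sketched as follows. This is our sketch: the function name, the greedy keep-if-better rule and the population layout are assumptions; the paper specifies only the update equations and the randomized switch:

```python
import numpy as np
rng = np.random.default_rng()

def ja1sp_step(pop, fitness, objective):
    """One JA1SP iteration: each candidate is moved either by the Jaya
    update (Eq. 27) or by the TLBO student phase (Eq. 28), the phase
    being picked by a random switch in [0, 1] instead of a fixed sp."""
    best = pop[np.argmin(fitness)]
    worst = pop[np.argmax(fitness)]
    for i in range(len(pop)):
        x = pop[i]
        if rng.random() < rng.random():        # randomized switch, no sp parameter
            r1, r2 = rng.random(x.size), rng.random(x.size)
            cand = x + r1 * (best - x) - r2 * (worst - x)      # Eq. (27)
        else:
            j, k = rng.choice(len(pop), size=2, replace=False)
            sign = 1.0 if fitness[j] > fitness[k] else -1.0    # Eq. (28) cases
            cand = x + rng.random(x.size) * sign * (pop[j] - pop[k])
        f = objective(cand)
        if f < fitness[i]:                     # greedy replacement (our choice)
            pop[i], fitness[i] = cand, f
    return pop, fitness
```

Here pop is a (population × variables) numpy array of candidate displacement vectors and objective is, e.g., the total potential energy of Eq. (24).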

4 Numerical Example

The structural model and loading conditions of the cantilever retaining wall can be seen in Fig. 2. The structure has a thickness of 1 mm and is meshed into 14 members and 16 nodes. As material characteristics of the wall, the elasticity modulus (E) is 32 × 10⁶ kN/m² and the Poisson ratio is 0.2. The optimization outcomes are presented in Tables 1 and 2 for the linear and non-linear cases.
Table 1. Optimum results for linear solution case of retaining wall.

Node  FPA                  JA                   JA1SP                JA2SP                JALS
      Dx (mm)   Dy (mm)    Dx (mm)   Dy (mm)    Dx (mm)   Dy (mm)    Dx (mm)   Dy (mm)    Dx (mm)   Dy (mm)
1     0.00000   0.00000    0.00000   0.00000    0.00000   0.00000    0.00000   0.00000    0.00000   0.00000
2     0.00000   0.00000    0.00000   0.00000    0.00000   0.00000    0.00000   0.00000    0.00000   −0.00037
3     0.00000   0.00000    0.00000   0.00000    0.00000   0.00000    0.00000   0.00000    0.00128   0.00000
4     −0.00001  −0.00001   −0.00001  −0.00001   −0.00001  −0.00001   −0.00001  −0.00001   0.00172   −0.00022
5     −0.00001  0.00000    −0.00001  0.00000    −0.00001  0.00000    −0.00001  0.00000    0.00255   0.00000
6     −0.00001  0.00001    −0.00001  0.00001    −0.00001  0.00001    −0.00001  0.00001    0.00236   −0.00071
7     −0.00002  0.00000    −0.00002  0.00000    −0.00002  0.00000    −0.00002  0.00000    0.00500   0.00000
8     −0.00002  0.00000    −0.00002  0.00000    −0.00002  0.00000    −0.00002  0.00000    0.00500   −0.00031
9     −0.00014  −0.00003   −0.00014  −0.00003   −0.00014  −0.00003   −0.00014  −0.00003   0.00480   −0.00154
10    −0.00014  0.00004    −0.00014  0.00004    −0.00014  0.00004    −0.00014  0.00004    0.00500   −0.00219
11    −0.00034  −0.00003   −0.00034  −0.00003   −0.00034  −0.00003   −0.00034  −0.00003   0.00487   −0.00327
12    −0.00034  0.00005    −0.00034  0.00005    −0.00034  0.00005    −0.00034  0.00005    0.00500   −0.00336
13    −0.00057  −0.00002   −0.00057  −0.00002   −0.00057  −0.00002   −0.00057  −0.00002   0.00500   −0.00500
14    −0.00057  0.00005    −0.00057  0.00005    −0.00057  0.00005    −0.00057  0.00005    0.00500   −0.00486
15    −0.00081  −0.00001   −0.00081  −0.00001   −0.00081  −0.00001   −0.00081  −0.00001   0.00500   −0.00500
16    −0.00081  0.00005    −0.00081  0.00005    −0.00081  0.00005    −0.00081  0.00005    0.00500   −0.00500
Energy (kNm)      −0.03156753578   −0.03156734444   −0.03156753578   −0.03156753578   −0.03156753578
Iteration number  1462853          761048           1665053          1196979          1326900
Table 2. Optimum results for non-linear solution case of retaining wall.

Node  FPA                JA                 JA1SP              JA2SP              JALS
      Dx (mm)  Dy (mm)   Dx (mm)  Dy (mm)   Dx (mm)  Dy (mm)   Dx (mm)  Dy (mm)   Dx (mm)  Dy (mm)
1     0.0000   0.0000    0.0000   0.0000    0.0000   0.0000    0.0000   0.0000    0.0000   0.0000
2     0.0000   −0.0024   0.0000   −0.0024   0.0000   −0.0024   0.0000   −0.0024   0.0000   0.0000
3     −0.0020  0.0000    −0.0020  0.0000    −0.0020  0.0000    −0.0020  0.0000    −0.0020  −0.0020
4     −0.0131  −0.0123   −0.0131  −0.0123   −0.0131  −0.0123   −0.0131  −0.0123   −0.0131  −0.0131
5     −0.0173  0.0000    −0.0173  0.0000    −0.0173  0.0000    −0.0173  0.0000    −0.0173  −0.0173
6     −0.0228  0.0141    −0.0228  0.0141    −0.0228  0.0141    −0.0228  0.0141    −0.0228  −0.0228
7     −0.0360  0.0000    −0.0360  0.0000    −0.0360  0.0000    −0.0360  0.0000    −0.0360  −0.0360
8     −0.0422  0.0056    −0.0422  0.0056    −0.0422  0.0056    −0.0422  0.0056    −0.0422  −0.0422
9     −0.1189  −0.0246   −0.1189  −0.0246   −0.1189  −0.0246   −0.1189  −0.0246   −0.1189  −0.1189
10    −0.1123  0.0352    −0.1123  0.0352    −0.1123  0.0352    −0.1123  0.0352    −0.1123  −0.1123
11    −0.2833  −0.0270   −0.2833  −0.0270   −0.2833  −0.0270   −0.2833  −0.0270   −0.2833  −0.2833
12    −0.2788  0.0425    −0.2788  0.0425    −0.2788  0.0425    −0.2788  0.0425    −0.2788  −0.2788
13    −0.4930  −0.0240   −0.4930  −0.0240   −0.4930  −0.0240   −0.4930  −0.0240   −0.4930  −0.4930
14    −0.4903  0.0456    −0.4903  0.0456    −0.4903  0.0456    −0.4903  0.0456    −0.4903  −0.4903
15    −0.7307  −0.0156   −0.7307  −0.0156   −0.7307  −0.0156   −0.7307  −0.0156   −0.7307  −0.7307
16    −0.7298  0.0428    −0.7298  0.0428    −0.7298  0.0428    −0.7298  0.0428    −0.7298  −0.7298
Energy (kNm)      −40.719901868  −40.719901868  −40.719901868  −40.719901868  −40.719901868
Iteration number  675955         300302         529275         109687         176397

Fig. 2. Structural model of cantilever retaining wall [15]

5 Conclusion

According to the results, all classical and hybrid algorithms are effective in finding the same energy value as the final result, in both the linear and the non-linear case. For the linear case, the classical JA needs the fewest iterations, but this does not hold for the non-linear case, where the best algorithms are JA2SP (although its computing time is doubled by applying two phases per iteration) and JALS. Given these different findings, different modifications of the algorithms may lead to better solutions for various problems.
In conclusion, with both classical and hybrid algorithms, TPO/MA is an alternative structural analysis tool for nonlinear problems. The effectiveness of the methods will be further demonstrated by solving new types of problems.

Acknowledgments. This study was funded by Scientific Research Projects Coordination Unit
of Istanbul University-Cerrahpasa. Project number: FYO-2019-32735.

References
1. Toklu, Y.C.: Nonlinear analysis of trusses through energy minimization. Comput. Struct. 82
(20–21), 1581–1589 (2004)
2. Toklu, Y.C., Temür, R., Bekdaş, G.: Computation of nonunique solutions for trusses
undergoing large deflections. Int. J. Comput. Meth. 12(03), 1550022 (2015)
3. Nigdeli, S.M., Bekdaş, G., Toklu, Y.C.: Total potential energy minimization using
metaheuristic algorithms for spatial cable systems with increasing second order effects. In:
12th International Congress on Mechanics (HSTAM2019), 22–25 September 2019
4. Bekdaş, G., Kayabekir, A.E., Nigdeli, S.M., Toklu, Y.C.: Advanced energy-based analyses of
trusses employing hybrid metaheuristics. Struct. Des. Tall Spec. Build. 28(9), e1609 (2019)
5. Toklu, Y.C., Uzun, F.: Analysis of tensegric structures by total potential optimization using
metaheuristic algorithms. J. Aerosp. Eng. 29(5), 04016023 (2016)
6. Toklu, Y.C., et al.: Total potential optimization using metaheuristic algorithms for solving
nonlinear plane strain systems. Appl. Sci. 11(7), 3220 (2021)
7. Toklu, Y.C., Bekdaş, G., Kayabekir, A.E., Nigdeli, S.M., Yücel, M.: Total potential
optimization using hybrid metaheuristics: a tunnel problem solved via plane stress members.
In: Nigdeli, S.M., Bekdaş, G., Kayabekir, A.E., Yucel, M. (eds.) Advances in Structural
Engineering—Optimization. SSDC, vol. 326, pp. 221–236. Springer, Cham (2021). https://
doi.org/10.1007/978-3-030-61848-3_8
8. Toklu, Y.C., Kayabekir, A.E., Bekdaş, G., Nigdeli, S.M., Yücel, M.: Analysis of plane-stress
systems via total potential optimization method considering nonlinear behavior. J. Struct.
Eng. 146(11), 04020249 (2020)
9. Toklu, Y.C., Bekdaş, G., Kayabekir, A.E., Nigdeli, S.M., Yücel, M.: Total potential
optimization using metaheuristics: analysis of cantilever beam via plane-stress members. In:
Nigdeli, S.M., Kim, J.H., Bekdaş, G., Yadav, A. (eds.) ICHSA 2020. AISC, vol. 1275,
pp. 127–138. Springer, Singapore (2021). https://doi.org/10.1007/978-981-15-8603-3_12
10. Kayabekir, A.E., Toklu, Y.C., Bekdaş, G., Nigdeli, S.M., Yücel, M., Geem, Z.W.: A novel
hybrid harmony search approach for the analysis of plane stress systems via total potential
optimization. Appl. Sci. 10(7), 2301 (2020)
11. Toklu, Y.C., Bekdas, G., Nigdeli, S.M.: Metaheuristics for Structural Design and Analysis.
Wiley, Hoboken (2021)
12. Yang, X.-S.: Flower pollination algorithm for global optimization. In: Durand-Lose, J.,
Jonoska, N. (eds.) UCNC 2012. LNCS, vol. 7445, pp. 240–249. Springer, Heidelberg
(2012). https://doi.org/10.1007/978-3-642-32894-7_27
13. Rao, R.: Jaya: a simple and new optimization algorithm for solving constrained and
unconstrained optimization problems. Int. J. Ind. Eng. Comput. 7(1), 19–34 (2016)
14. Rao, R.V., Savsani, V.J., Vakharia, D.P.: Teaching–learning-based optimization: a novel
method for constrained mechanical design optimization problems. Comput. Aided Des. 43
(3), 303–315 (2011)
15. Topçu, A.: Sonlu Elemanlar Metodu, Eskişehir Osmangazi Üniversitesi, April 2019. http://
mmf2.ogu.edu.tr/atopcu/
Ising Model Formulation for Job-Shop
Scheduling Problems Based on Colored
Timed Petri Nets

Kohei Kaneshima1 and Morikazu Nakamura2(B)

1 Graduate School of Engineering and Science, University of the Ryukyus, Nishihara, Okinawa 903-0213, Japan
k208580@ie.u-ryukyu.ac.jp
2 Computer Science and Intelligent Systems, University of the Ryukyus, Nishihara, Okinawa 903-0213, Japan
morikazu@ie.u-ryukyu.ac.jp

Abstract. This paper presents a colored timed Petri net-based Ising model formulation for job-shop scheduling problems. By extracting fundamental properties of Petri nets, such as the structural precedence relation and the firing conflicts, we can incrementally construct the corresponding Ising model for a given job-shop scheduling problem. Our approach can overcome the difficulty of Ising model formulation for quantum annealing. This paper presents the formal composition method, an illustrated example, and some results of the computational evaluation for our binary search-based quantum annealing process.

Keywords: Ising model · Quantum annealing · Petri net · Scheduling problem · Optimization

1 Introduction
Combinatorial optimization has long been a fundamental research field in computer science and operations research. We reduce various problems of real life to combinatorial optimization problems for minimizing costs or maximizing throughput. In addition, recent hot areas such as machine learning and IoT data-intensive applications require combinatorial optimization as a core processing step.
Many practical combinatorial optimization problems are known to be NP-hard; polynomial-time deterministic algorithms have not been developed for them thus far [1]. The mathematical programming approach reduces the search space drastically by using mathematical techniques and obtains the exact solution for small but practical problems. Heuristic approaches find feasible solutions of reasonable quality, where reasonable means practically useful even though not exact [2].
Meta-heuristic algorithms are not problem-specific but applicable to many varieties of combinatorial optimization problems [3]. Genetic algorithms, simulated annealing, and tabu search are well-known meta-heuristics. Quantum

annealing is a new meta-heuristic algorithm inspired by quantum mechanics for combinatorial optimization [4,5]. Once we formulate a target combinatorial optimization problem as an Ising model, the annealing process finds the lowest energy state of the model, and that state gives us a solution of the target problem. D-Wave built the first commercial quantum annealing machine [6], and there are several digital machines for performing simulated quantum annealing. To date, many application examples of quantum annealing have been shown in the literature [7], even though several difficulties remain in utilizing the new platform more efficiently. The Ising model formulation is an obstacle to improving the usability of quantum annealing: the formulation requires deep knowledge of mathematics and sufficient skills obtained from experience.
To overcome this difficulty, we proposed a model-based approach to Ising model formulation [8], which can systematically generate the Ising model from the Petri net model of a target problem. Our previous work presented the formal framework of this Petri net modeling-based Ising model formulation. We introduced binary quadratic nets, a class of colored Petri nets, to represent Ising models in Petri net form. We use higher classes of Petri nets, such as colored timed Petri nets, to model target problems; we call this class the problem-domain Petri net and systematically convert problem-domain Petri net models into binary quadratic nets. Thus, we can easily obtain the corresponding Ising model once the target problem is modeled with high-level Petri nets. That is, our method drastically reduces the difficulty of the Ising model formulation.
In our previous paper [8], we showed a QUBO model formulation for a job-
shop scheduling problem as an example without a detailed explanation. The
formulation was for the single available resource case, where each resource type
contains only a single available resource. This paper presents the detailed QUBO
model formulation of job-shop scheduling problems and shows some evaluation
results.

2 Quantum Annealing
Quantum annealing is a new optimization algorithm inspired by quantum mechanics for combinatorial optimization. The annealing process searches for optimal solutions composed of the values of the Ising variables, s = (s1, s2, . . . , sN), si ∈ {−1, +1}, that minimize the energy represented by the Hamiltonian:

H_P(s) = Σ_{i=1}^{N} h_i s_i + Σ_{i<j} J_{i,j} s_i s_j,    (1)

where hi is the magnetic field coefficient at site si , and Ji,j is the interaction
coefficient between si and sj . Ising variables correspond to discrete variables in
the target optimization problem. The lowest energy state of the Hamiltonian,
the ground state of HP , corresponds to the optimal solution.
Note that we can convert the decision variables on {+1, −1} to {0, 1}, and vice versa. The {0, 1}-based formulation is called a QUBO (Quadratic Unconstrained Binary Optimization) model. We often use the term Ising model even for a QUBO model, except when we need to distinguish them, because the two models are equivalent to each other.
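As a check on this equivalence, the following minimal sketch (illustrative, not a library API) converts a QUBO coefficient dictionary over x ∈ {0, 1} into Ising coefficients over s ∈ {−1, +1} via the substitution x_i = (s_i + 1)/2, collecting the constant energy offset separately.

```python
from collections import defaultdict

def qubo_to_ising(Q):
    """Convert a QUBO dict {(i, j): coeff} over x in {0, 1} into Ising
    coefficients (h, J, offset) over s in {-1, +1} via x = (s + 1) / 2."""
    h, J = defaultdict(float), defaultdict(float)
    offset = 0.0
    for (i, j), q in Q.items():
        if i == j:            # linear term q * x_i
            h[i] += q / 2.0
            offset += q / 2.0
        else:                 # quadratic term q * x_i * x_j
            J[(i, j)] += q / 4.0
            h[i] += q / 4.0
            h[j] += q / 4.0
            offset += q / 4.0
    return dict(h), dict(J), offset
```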

3 Petri Net Modeling for Job-Shop Scheduling Problems

This section presents the Petri net modeling method for job-shop scheduling problems. To represent the necessary information of the problems, we use colored timed Petri nets: the processing times in the problem are represented as the firing times of transitions, and colors denote the capability of resources.
A colored timed Petri net is a ten-tuple CTPN = (Σ, P, T, F, V, C, E, TS, FD, M0) with a set of places P = {P1, P2, . . . , Pm}, a set of transitions T = {T1, T2, . . . , Tn}, a connectivity function F : (P × T) ∪ (T × P) → N, a set of arc variables V, a color function C : P → Σ, an arc function E, time stamps TS, a firing duration FD : T → TS, and an initial marking M0. The readers can refer to [8–10] for more details, such as firing, state changes, firing duration, and arc functions.
In this paper, we consider the conventional job-shop scheduling problem defined as follows.
Definition 1. Given the following input data:

J = {J1, J2, ..., Jn},    (2)
Ji = (Oi,1, Oi,2, ..., Oi,m), i = 1, 2, ..., n,    (3)
Task = {Oi,j | i = 1, ..., n, j = 1, ..., m},    (4)
R = {m1, m2, ..., mr},    (5)
RT = {R1, R2, ..., R|RT|},    (6)
R = ∪_{i=1}^{|RT|} Ri such that ∀Ri, Rj ∈ RT, i ≠ j ⟹ Ri ∩ Rj = ∅,    (7)
PT : Task × R → N,    (8)
RR : Task → RT,    (9)

the problem is to obtain a schedule that processes all the jobs, equivalently all the tasks, with the minimum makespan, that is, the minimum total length of the schedule. Here, J is a set of jobs, each job Ji is an ordered list of tasks, Task is the set of all tasks, R is a set of resources, RT is a partition of R, PT(Oi,j, mr) is the processing time when task Oi,j is processed by machine mr, N is the set of natural numbers, and RR(Oi,j) returns the required resource type for task Oi,j.
In this definition, the conventional job-shop scheduling problem is the case where |Ri| = 1 for all i in (6) and |RT| = m. It should be noted that the problem becomes flexible job-shop scheduling if |Ri| ≥ 2 for some i.
Definition 2 represents the modeling method for a given job-shop scheduling
problem by a colored timed Petri net.
Definition 2 (Colored Timed Petri Net for Job-Shop Scheduling Problem). For a given job-shop scheduling problem defined in Definition 1, we create a colored timed Petri net CTPN = (Σ, P, T, F, V, C, E, TS, FD, M0) as follows:

Σ = R    (10)
P = P^re ∪ P^st    (11)
P^re = {Pr | ∀r ∈ RT}    (12)
P^st = {Pi,j | i = 1, 2, ..., n, j = 1, 2, ..., m + 1}    (13)
T = {Ti,j | ∀Oi,j ∈ Task}    (14)

F(x, y) = 1 if x = Pi,j and y = Ti,j;
          1 if x = Ti,j and y = Pi,j+1;
          1 if x = Ti,j and y = Pr with r = RR(Oi,j);
          1 if x = Pr and y = Ti,j with r = RR(Oi,j);
          0 otherwise    (15)

V(x, y) = {v} if x = Pr, y = Ti,j, r = RR(Oi,j);
          {v} if x = Ti,j, y = Pr, r = RR(Oi,j);
          ∅ otherwise    (16)

Type(v) = R    (17)

E(x, y) = {v} if x = Pr, y = Ti,j and F(Pr, Ti,j) = 1;
          {v} if x = Ti,j, y = Pr and F(Ti,j, Pr) = 1;
          ∅ otherwise    (18)

FD(Ti,j) = PT(Oi,j, RR(Oi,j)), ∀Ti,j ∈ T    (19)

C(p) = UNIT if p ∈ P^st;
       R if p ∈ P^re    (20)

M0(p) = UNIT if p = Pi,1, i = 1, 2, ..., n;
        ∅ if p ∈ P^st \ {Pi,1 | i = 1, 2, ..., n};
        mi if p = Pi ∈ P^re    (21)
The following example contains three jobs, four tasks per job, and three resources.
Example 1. For a job-shop scheduling problem instance:

– J = {(O1,1, O1,2, O1,3, O1,4), (O2,1, O2,2, O2,3, O2,4), (O3,1, O3,2, O3,3, O3,4)},
– R = {m0c1, m1c1, m2c1}, RT = {m0, m1, m2},
– m0 = RR(O1,1) = RR(O2,1) = RR(O3,1) = RR(O3,2), m1 = RR(O1,2) = RR(O2,2) = RR(O1,3) = RR(O2,3), m2 = RR(O1,4) = RR(O2,4) = RR(O3,3) = RR(O3,4),
– PT(O1,2, m1) = PT(O1,3, m1) = PT(O2,1, m0) = PT(O2,4, m2) = 1,
– PT(O1,1, m0) = PT(O1,4, m2) = PT(O2,3, m1) = PT(O3,1, m0) = PT(O3,3, m1) = PT(O3,4, m2) = 2, PT(O2,2, m1) = PT(O3,2, m0) = 3.

Figure 1 depicts the corresponding colored timed Petri net model, created using CPNTools.

Fig. 1. Petri net model for job-shop scheduling problems

4 Petri Net-Based Ising Model Formulation

In this section, we present the method of Ising (QUBO) model construction. As specified in [8], the superposition principle allows us to construct binary quadratic nets incrementally.
For generality, we use the notation Ti instead of Ti,j in this section, since the algorithm only needs the identity of each task; note that the notation Ti,j also carries structural information, namely the j-th task of the i-th job.
In the colored timed Petri net model defined in Sect. 3, the scheduling problem corresponds to determining the starting time of each task. Therefore, firing count vectors Xk, k = 1, 2, ..., are suitable as decision variables, where Xk(Ti) = 1 if Ti starts firing at step k (completing at step k + FD(Ti)), and Xk(Ti) = 0 otherwise.
For job-shop scheduling problems, we have to consider three constraints: the precedence relation, the resource conflict, and the completeness of all tasks. We can construct these constraints straightforwardly from the problem-domain Petri net model.
Firstly, we construct the Hamiltonian for the precedence relation. The relation is represented by the Petri net structure: there exists a precedence relation between an ordered pair of tasks (Ti, Tj) if (Ti• \ P^re) ∩ (•Tj \ P^re) ≠ ∅. Let Prec be the set of precedence relations extracted from the Petri net model. The following Hamiltonian represents the penalty for breaking the relation; that is, Hprece(X) becomes zero if all the precedence relations are satisfied in the schedule:

Hprece(X) = Σ_{k=0}^{MaxTime} Σ_{(Ti,Tj)∈Prec, h≤k+FD(Ti)} Xk(Ti) Xh(Tj),    (22)

where MaxTime is the parameter corresponding to the delivery time deadline.


Secondly, we consider the resource conflict, a fundamental issue in shared-resource systems, where the mutual exclusion condition must be satisfied. Suppose task Ti starts at step k with resource mr; then another task Tj cannot start using the same resource until step k + FD(Ti), that is, Xh(Tj) cannot be 1 for k ≤ h < k + FD(Ti). We can extract all such relations (Ti, Tj, k, h) and collect them in the set C^timed. The following Hamiltonian denotes the penalty for breaking the mutual exclusion condition; that is, Hconflict(X) is zero if mutual exclusion is satisfied for all the shared resources:

Hconflict(X) = Σ_{(Ti,Tj,k,h)∈C^timed} Xk(Ti) Xh(Tj).    (23)

Lastly, the completeness of all tasks confirms that each task fires exactly once; that is, the following condition should be satisfied:

Σ_{k=0}^{MaxTime} Xk(Ti) = 1, ∀Ti ∈ T.    (24)

We can express this condition as the following Hamiltonian, which becomes zero if all the tasks are processed exactly once:

Hcomplete(X) = Σ_{i=1}^{|T|} ( Σ_{k=0}^{MaxTime} Xk(Ti) − 1 )².    (25)

The total Hamiltonian is constructed as follows.


H(X) = A · Hprece (X) + B · Hconflict (X) + C · Hcomplete (X), (26)
where A, B, and C are parameters that balance the minimizing pressure among the three constraints. We need to set these parameters carefully to obtain good-quality solutions.
The Hamiltonian (26) is a QUBO model, i.e., (0, 1)-based, and we can convert it directly to the corresponding (−1, +1)-based Ising model shown in (1).
The Hamiltonian (26) does not contain terms for the objective function of minimizing the makespan. This formulation, proposed in [11], keeps the Ising (QUBO) model simple; therefore, the annealing process with the Hamiltonian (26) yields only feasible schedules.
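As a concrete illustration of how Eqs. (22), (23), (25), and (26) can be turned into a single QUBO, the sketch below (our own illustration, not the authors' implementation) collects the penalty coefficients into a dictionary keyed by variable pairs, where (i, k) stands for X_k(T_i); the sets Prec and C_timed and the durations FD are assumed to have been extracted from the Petri net model as described, with integer task identifiers.

```python
def build_qubo(tasks, FD, Prec, C_timed, max_time, A=2.0, B=2.0, C=2.0):
    """Collect Eqs. (22), (23), (25) into one QUBO dict for Eq. (26).
    Keys are pairs ((i, k), (j, h)), where (i, k) stands for X_k(T_i)."""
    Q = {}

    def add(u, v, coeff):
        key = (u, v) if u <= v else (v, u)   # canonical ordering of the pair
        Q[key] = Q.get(key, 0.0) + coeff

    # H_prece, Eq. (22): T_j must not start at any step h <= k + FD(T_i)
    # whenever its predecessor T_i starts at step k.
    for (i, j) in Prec:
        for k in range(max_time + 1):
            for h in range(min(k + FD[i], max_time) + 1):
                add((i, k), (j, h), A)

    # H_conflict, Eq. (23): mutual exclusion on shared resources.
    for (i, j, k, h) in C_timed:
        add((i, k), (j, h), B)

    # H_complete, Eq. (25): (sum_k x - 1)^2 expands, using x^2 = x, to
    # -sum_k x + 2 sum_{k<h} x_k x_h (plus a constant offset, omitted).
    for i in tasks:
        for k in range(max_time + 1):
            add((i, k), (i, k), -C)
            for h in range(k + 1, max_time + 1):
                add((i, k), (i, h), 2.0 * C)
    return Q
```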
To obtain the minimum makespan, we apply a binary search to determine the minimum MaxTime: the algorithm finds a feasible solution with the minimum MaxTime parameter via binary search. Algorithm 1 shows the steps, where QA(H, MaxTime) denotes the invocation of the quantum annealer with the Hamiltonian H and parameter MaxTime.

Algorithm 1 Binary Search-based Scheduling with QA

Input: Hamiltonian H
Output: Xk, k = 1, 2, ..., MaxTime
Initialisation:
1: low ← (Σj FD(Tj)) / |T|;
2: high ← Σj FD(Tj);
3: repeat
4:   mid ← (low + high)/2; MaxTime ← mid;
5:   if QA(H, MaxTime) outputs feasible solutions then
6:     high ← mid − 1;
7:   else
8:     low ← mid + 1;
9:   end if
10: until low > high;
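A Python rendering of Algorithm 1 might look as follows, under the reading that the search minimizes MaxTime (feasible runs tighten the upper bound, infeasible runs raise the lower bound); the call qa_solve is a hypothetical annealer wrapper assumed for illustration, and the bounds follow lines 1 and 2 above.

```python
def binary_search_schedule(FD, qa_solve):
    """Algorithm 1: find the smallest MaxTime admitting a feasible schedule.
    FD: dict mapping each task to its firing duration;
    qa_solve(max_time): hypothetical annealer call returning a feasible
    solution (firing count vectors) or None."""
    low = sum(FD.values()) // len(FD)    # line 1: average duration bound
    high = sum(FD.values())              # line 2: serial schedule always fits
    best = None
    while low <= high:                   # lines 3-10
        mid = (low + high) // 2
        solution = qa_solve(mid)
        if solution is not None:         # feasible: try a tighter horizon
            best, high = solution, mid - 1
        else:                            # infeasible: relax the horizon
            low = mid + 1
    return best
```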

5 Evaluation
We implemented our method in Python, using CPNTools [10] as the GUI software to create Petri net models and SNAKES [12] to represent Petri net objects in Python programs. PyQubo is a useful software package for converting Python objects into the specific input formats of annealing machines [13].
We evaluated the number of iterations of the binary search in Algorithm 1; Fig. 2 depicts the result. The horizontal axis indicates the number of jobs in the job-shop scheduling problem instance, where jssx denotes an instance with x jobs. The vertical axis shows the number of iterations averaged over 100 runs. The result shows that only a limited number of iterations is needed to obtain an optimal schedule, even though we cannot confirm the optimality.
Owing to the stochastic characteristics of quantum annealing, we sometimes fail to obtain feasible solutions, so careful tuning of the parameters A, B, and C in (26) and of the other annealing parameters is required in practical use. Figure 3 shows the ratio of infeasible solutions among 100 runs. The ratio becomes larger for larger instances; in the experiment, we used the same parameters for all instance sizes, and the ratio can be reduced by tuning the parameters according to the problem size.

Fig. 2. Iterations in binary search
Fig. 3. Ratio of infeasible solutions

6 Conclusion
We propose a colored timed Petri net-based Ising model formulation for job-shop scheduling problems. Our approach can overcome the difficulty of Ising model formulation for quantum annealing. This paper presents the formal composition method, an illustrated example, and some results of the computational evaluation for our binary search-based quantum annealing process. In the future, we will extend the method to multiple-resource-requirement problems; these are more practical formulations than the conventional ones, but the complexity of the generated model must be reduced for efficient quantum annealing.

References
1. Garey, M.R., Johnson, D.S.: Computers and Intractability; A Guide to the Theory
of NP-Completeness. W. H. Freeman & Co., New York (1990)
2. Hoos, H.H., Stützle, T.: 1 - introduction. In: Hoos, H.H., Stützle, T. (eds.) Stochas-
tic Local Search. The Morgan Kaufmann Series in Artificial Intelligence, pp. 13–59.
Morgan Kaufmann, San Francisco (2005). https://www.sciencedirect.com/science/
article/pii/B9781558608726500184
3. Gendreau, M., Potvin, J.Y.: Metaheuristics in combinatorial optimization. Ann.
Oper. Res. 140(1), 189–213 (2005). https://doi.org/10.1007/s10479-005-3971-7
4. Kadowaki, T., Nishimori, H.: Quantum annealing in the transverse Ising model.
Phys. Rev. E 58, 5355–5363 (1998). https://doi.org/10.1103/PhysRevE.58.5355
5. Farhi, E., Goldstone, J., Gutmann, S., Lapan, J., Lundgren, A., Preda, D.: A
quantum adiabatic evolution algorithm applied to random instances of an NP-
complete problem. Science 292(5516), 472–475 (2001). https://science.sciencemag.
org/content/292/5516/472
6. Johnson, M.W., et al.: Quantum annealing with manufactured spins. Nature


473(7346), 194–198 (2011). https://doi.org/10.1038/nature10012
7. Lucas, A.: Ising formulations of many NP problems. Front. Phys. 2, 5 (2014)
8. Nakamura, M., Kaneshima, K., Yoshida, T.: Petri net modeling for Ising model
formulation in quantum annealing. Appl. Sci. 11(16), 7574 (2021). https://www.
mdpi.com/2076-3417/11/16/7574
9. Murata, T.: Petri nets: properties, analysis and applications. Proc. IEEE 77(4),
541–580 (1989)
10. Jensen, K., Kristensen, L.M., Wells, L.: Coloured petri nets and CPN tools for mod-
elling and validation of concurrent systems. Int. J. Softw. Tools Technol. Transf.
9(3), 213–254 (2007). https://doi.org/10.1007/s10009-007-0038-x
11. Venturelli, D., Marchand, D.J.J., Rojo, G.: Job shop scheduling solver based on
quantum annealing. arXiv:1506.08479v2 [quant-ph] (2016)
12. Pommereau, F.: SNAKES: a flexible high-level petri nets library (tool paper). In:
Devillers, R., Valmari, A. (eds.) PETRI NETS 2015. LNCS, vol. 9115, pp. 254–265.
Springer, Cham (2015). https://doi.org/10.1007/978-3-319-19488-2 13
13. Tanahashi, K., Takayanagi, S., Motohashi, T., Tanaka, S.: Application of Ising machines and a software development for Ising machines. J. Phys. Soc. Jpn. 88(6), 061010 (2019). https://doi.org/10.7566/JPSJ.88.061010
Imbalanced Sample Generation
and Evaluation for Power System
Transient Stability Using CTGAN

Gengshi Han, Shunyu Liu, Kaixuan Chen, Na Yu, Zunlei Feng, and Mingli Song(B)

Zhejiang University, Hangzhou 310007, Zhejiang, China
{hangengshi,liushunyu,chenkx,na yu,zunleifeng,brooksong}@zju.edu.cn

Abstract. Although deep learning has achieved impressive advances in transient stability assessment of power systems, insufficient and imbalanced samples still limit the training of data-driven methods. This paper proposes a controllable sample generation framework based on the Conditional Tabular Generative Adversarial Network (CTGAN) to generate specified transient stability samples. To fit the complex feature distribution of the transient stability samples, the proposed framework first models the samples as tabular data and uses Gaussian mixture models to normalize the tabular data. Then we transform multiple conditions into a single conditional vector to enable multi-conditional generation. Furthermore, this paper introduces three evaluation metrics to verify the quality of samples generated with the proposed framework. Experimental results on the IEEE 39-bus system show that the proposed framework effectively balances the transient stability samples and significantly improves the performance of transient stability assessment models.

Keywords: Power system · Transient stability · Sample generation · Conditional generative adversarial network

1 Introduction
Power system transient stability assessment is one of the most significant ways
to ensure the security and stability of power systems. It assesses the ability of a
power system to recover to the original secure state or transition to a new secure
state after withstanding a specific disturbance [1]. Therefore, fast and accu-
rate transient stability assessment is needed to deal with emergencies in time
and effectively ensure the secure operation of power systems. However, Time
Domain Simulation (TDS), a traditional method of transient stability assess-
ment, is extremely time-consuming due to the nonlinear complexity of power
systems [2]. In recent years, to improve the computational speed of assessment
models, several transient stability assessment methods based on deep learning


are proposed [3–6]. These assessment methods are usually data-driven, and need
large-scale valid samples [7–10].
However, there are two problems that need to be addressed when training the
assessment model in practice. Firstly, the insufficient samples cannot effectively
represent the distribution of features, resulting in the risk of model overfitting.
Moreover, since the category distribution of samples is highly imbalanced, the
learning of the unstable samples is usually inhibited, leading to poor performance
of trained models on unstable samples.
To address the insufficiency and imbalance of samples and improve the performance of transient stability assessment models, we use a sample generation model to supplement the transient stability samples, especially the unstable ones. The Generative Adversarial Network (GAN) [11] is widely used in sample generation tasks, training a generator and a discriminator in an adversarial process. However, the generation process of GAN is uncontrollable, resulting in a large number of unnecessary samples. Instead, the Conditional GAN (CGAN) realizes a conditional generation mechanism on top of the GAN architecture to generate required samples [12]. Furthermore, since transient stability samples are usually recorded as tabular data, we focus on the dedicated CTGAN method, which implements mode-specific normalization and conditional generation for tabular data [13].
Therefore, this paper proposes an imbalanced sample generation framework based on CTGAN for power system transient stability. Considering the structural characteristics of transient stability samples, the generation framework first models the samples as tabular data and uses the Gaussian Mixture Model (GMM) to normalize the tabular data [14,15]. Multiple conditions, including the transient stability and the load level, are converted into a single condition vector to enable multi-conditional generation. Besides, we design a multi-metric evaluation to effectively evaluate the obtained sample generation framework; the evaluation includes the effect of conditional generation, distance calculation, and the performance of transient stability assessment models trained with generated samples. Case studies on the IEEE 39-bus system show that the proposed framework can effectively balance the transient stability samples and significantly improve the performance of transient stability assessment models.

2 Sample Generation Framework for Transient Stability


In this section, we detail the proposed sample generation framework based on CTGAN for power system transient stability. As shown in Fig. 1, the proposed framework first models transient stability samples as tabular data, then transforms the data using one-hot codes and GMM normalization, and finally trains the CTGAN model.

2.1 Transient Stability Sample Representation

To construct appropriate input characteristics, we should consider not only the correlation between the characteristics and transient stability, but also whether the characteristics can be obtained in real time or quickly calculated in an actual power system. Assuming that no further fault occurs during the transient process, the transient stability of the power system is determined at the moment of fault clearing. Therefore, we take the values at the moment of fault clearing as the representation of transient stability samples. A transient stability sample is represented by the voltage magnitude and voltage angle of the bus nodes, the active and reactive power of the load nodes, and the active and reactive power of the generator nodes at the moment of fault clearing.

Fig. 1. Illustration of the proposed imbalanced sample generation framework based on CTGAN for power system transient stability.

2.2 Transformation of the Multi-condition Vector

Following the idea of conditional generation, the transient stability and the load level of transient stability samples are used as generation conditions to realize multi-conditional generation. However, in common CGANs the conditional vector is a one-hot code, which can only represent a single condition. Therefore, a simple transformation is designed in the proposed framework to convert multiple condition vectors into a single condition vector:

cond1 ⊕ cond2 ⊕ · · · ⊕ condn    (1)

where cond represents a condition, n the number of conditions, and ⊕ the serial concatenation operation. That is, the n condition vectors are serially concatenated, and the result is used as the condition input of the CTGAN model.
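A minimal illustration of Eq. (1) is sketched below, assuming the two conditions used in this paper (transient stability with two categories and 18 load levels from 60% to 145% in 5% steps); all names and the category ordering are illustrative.

```python
import numpy as np

def one_hot(index, size):
    v = np.zeros(size)
    v[index] = 1.0
    return v

def multi_condition_vector(stability, load_level_idx, n_levels=18):
    """Eq. (1): serially concatenate the per-condition one-hot codes."""
    cond1 = one_hot(stability, 2)               # e.g. 0 = stable, 1 = unstable
    cond2 = one_hot(load_level_idx, n_levels)   # 60%..145% in 5% steps
    return np.concatenate([cond1, cond2])

# e.g. condition "unstable at the 90% load level" (index 6 counted from 60%)
cond = multi_condition_vector(stability=1, load_level_idx=6)
```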

2.3 Normalization with GMM

To eliminate the dimensional influence between different characteristics, the samples must be transformed appropriately before being fed into the model for training. Transient stability samples are composed of the feature values of bus, load, and generator nodes; these continuous values cannot be encoded as one-hot codes.
Considering the complex distribution of transient stability samples, plain min-max normalization is unable to fit it. Therefore, when processing transient stability samples, a variational GMM is used to process the continuous values and fit the complex distribution of each feature. The basic steps of the normalization are as follows:

Learning GMM. For each continuous column Ci, we use a variational Gaussian mixture model to learn a GMM distribution:

P_{Ci}(c_{i,j}) = Σ_{k=1}^{mi} μk N(c_{i,j}; ηk, φk)    (2)

where mi is the number of modes, and μk, ηk, and φk are the weight, mean, and standard deviation of the k-th mode, respectively.

Calculating Probability Density. For each value c_{i,j} in column Ci, we calculate the probability density of each mode:

ρk = μk N(c_{i,j}; ηk, φk)    (3)

Normalization. We find the mode with the highest ρk among the mi modes and normalize the value with it. For instance, if the second of three modes has the highest probability density, the value c_{i,j} is transformed into the one-hot code [0, 1, 0] together with a scalar β_{i,j} = (c_{i,j} − η2)/(4φ2) normalized to [−1, 1].
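A sketch of these three steps using scikit-learn's variational Gaussian mixture is given below; the number of candidate modes and the clipping to [−1, 1] are illustrative choices, only a single continuous column is handled, and the argmax over posterior mode weights is used as a proxy for picking the highest ρk.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

def normalize_column(values, n_modes=10):
    """Mode-specific normalization of one continuous column (Eqs. (2)-(3))."""
    c = np.asarray(values, dtype=float).reshape(-1, 1)
    gmm = BayesianGaussianMixture(n_components=n_modes, max_iter=200)
    gmm.fit(c)
    resp = gmm.predict_proba(c)                  # posterior weight of each mode
    modes = resp.argmax(axis=1)                  # mode with the highest weight
    eta = gmm.means_.ravel()[modes]              # mean of the selected mode
    phi = np.sqrt(gmm.covariances_.ravel()[modes])
    beta = np.clip((c.ravel() - eta) / (4.0 * phi), -1.0, 1.0)
    one_hot = np.eye(n_modes)[modes]             # mode indicator code
    return beta, one_hot
```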

2.4 CTGAN-Based Network

We adopt the CTGAN model as the basic sample generation model, which includes a generator and a discriminator, both constructed with fully connected layers.
The processed transient stability samples are used as the training input of the constructed CTGAN-based network. During training, the discriminator and generator are trained in turns to obtain the model for the sample generation framework. To test the model, we apply it to the generation of labeled transient stability samples; we can also control the generating conditions to purposefully generate samples with specific labels, for example transient unstable samples.

3 Multi-metric Evaluation
After realizing the generation framework for power system transient stability samples, it is necessary to evaluate it. This paper designs a multi-metric evaluation for the transient stability sample generation framework. As shown in Fig. 2, the evaluation is composed of three metrics: the effect of conditional generation, the distance between real samples and generated synthetic samples, and the performance of assessment models trained with generated samples.

Fig. 2. Illustration of three evaluation metrics for the sample generation framework based on CTGAN.

3.1 Conditional Generation

The sample generation framework should be able to control the transient stability and the load level characteristics of power system samples during generation. By comparing the proportions of transient stability samples generated under different settings (no condition, condition set to transient stable, and condition set to transient unstable), the conditional generation ability with respect to transient stability can be evaluated; the load level condition is evaluated in the same way.

3.2 Distance Calculation

Without generating conditions, the generated samples should be as similar to the real samples as possible. Therefore, calculating the similarity, or distance, between the two distributions is an efficient metric for evaluating the generation framework.
First, a dimensionality reduction method, such as Principal Component Analysis (PCA) [16], is used to reduce the samples to a suitable dimension. Second, the dimensionality-reduced samples are converted into discrete probability distributions through a binning operation. Finally, we calculate the distance between the probability distributions of the synthetic and real samples, using common measures of similarity between two distributions such as the KL divergence, JS divergence, and Wasserstein distance [17].
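A possible realization of this pipeline with scikit-learn and SciPy is sketched below; note that SciPy's jensenshannon returns the JS distance, i.e., the square root of the JS divergence, so it is squared here. The one-dimensional projection and bin count are illustrative choices.

```python
import numpy as np
from sklearn.decomposition import PCA
from scipy.spatial.distance import jensenshannon
from scipy.stats import wasserstein_distance

def distribution_distances(real, synth, n_bins=50):
    """Project both sample sets to 1-D with PCA, bin them, and compare."""
    pca = PCA(n_components=1).fit(real)
    r = pca.transform(real).ravel()
    s = pca.transform(synth).ravel()
    edges = np.histogram_bin_edges(np.concatenate([r, s]), bins=n_bins)
    p, _ = np.histogram(r, bins=edges)
    q, _ = np.histogram(s, bins=edges)
    p = p / p.sum()                      # discrete probability vectors
    q = q / q.sum()
    js_div = jensenshannon(p, q) ** 2    # square of the JS distance
    w_dist = wasserstein_distance(r, s)  # 1-D Wasserstein on raw projections
    return js_div, w_dist
```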

3.3 Performance of Assessment Models

To evaluate the generated samples more practically, the performance of a transient stability assessment model trained with the generated samples is a proper metric. Classical classification networks are selected as the power system transient stability assessment models, the generated samples are used for their training, and the performance is obtained by testing the assessment models.
More specifically, the real dataset of transient stability samples is randomly divided into Strain and Stest. We randomly generate Sgen and form the united set Sunion = Strain + Sgen. Then Strain, Sgen, and Sunion are each used to train a transient stability assessment model, and the resulting models are tested on Stest to obtain the accuracy, the recall rate of transient stable samples, and the recall rate of transient unstable samples. These test scores serve as the evaluation metric for the quality of the generation framework.
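A sketch of this protocol with scikit-learn is shown below, using the hyper-parameters reported later in Sect. 4.2 (MLP hidden layer size 200 with 500 iterations; decision tree depth 100); the label convention (1 = stable) and the data loading are assumptions made for illustration.

```python
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, f1_score, recall_score

def evaluate_models(train_sets, X_test, y_test):
    """Train on S_train, S_gen, S_union in turn and test on S_test.
    train_sets: dict name -> (X, y); label 1 = stable is assumed here."""
    for name, (X_tr, y_tr) in train_sets.items():
        for model in (DecisionTreeClassifier(max_depth=100),
                      MLPClassifier(hidden_layer_sizes=(200,), max_iter=500)):
            model.fit(X_tr, y_tr)
            y_hat = model.predict(X_test)
            print(name, type(model).__name__,
                  recall_score(y_test, y_hat, pos_label=1),  # RecallP
                  recall_score(y_test, y_hat, pos_label=0),  # RecallN
                  f1_score(y_test, y_hat),
                  accuracy_score(y_test, y_hat))
```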

4 Experiment

In this section, we study the proposed framework on the classical IEEE 39-bus power system [18] and show its performance by evaluating the effect of conditional control, the distance between distributions, and the scores of assessment models trained with generated samples.

4.1 Experimental Setup

Time Domain Simulation Samples. Matpower [19] and the Power System Analysis Toolbox (PSAT) [20] are applied to obtain the original dataset of real samples, taking the IEEE 39-bus system as the base system. The power system contains 39 buses, 10 generators, 19 loads, and 46 transmission lines. For simulating the transient stability samples, we adopt the following principles:

1. Randomly change both the active and reactive power of all loads to between 60% and 145% of the basic load level.
2. Use Matpower to compute the optimal power flow for the next TDS.
3. Randomly select a fault line, set a three-phase grounding fault at 20% to 80% of the line, and clear it after a time from 1/60 to 1/3 s.
4. Use PSAT to run the time domain simulation for 10 s.
5. Label the stability of the generated sample by the values of the generators after TDS.

By performing the simulation operations above, we generate a total of 14,221 transient stability samples, including 11,510 stable samples and 2,711 unstable samples, as the original dataset.

Generation Model Training. CTGAN is used as the primary sample generation model, which includes a generator and a discriminator. In the generator, two fully connected layers are used, each equipped with a batch normalization layer and a ReLU activation layer; the tanh and softmax activation functions are used for the output layer. In the discriminator, two fully connected layers are used, and dropout layers filter the nodes appropriately to reduce overfitting.

4.2 Evaluation Metrics

The CTGAN-based generation framework for power system transient stability samples is trained with the simulated samples as the training set. After that, the quality of the generation framework must be evaluated. This paper designs a multi-metric evaluation for the generation framework of transient stability samples, composed of three evaluation metrics.

The Effect of Conditional Generation. We evaluate the ability to control the transient stability and the load level of power system samples during generation. Table 1 shows the result of conditional generation with different transient stability condition settings: no condition, stable, and unstable. When the condition is set to transient stable, the proportion of stable samples generated is increased by 18.7% compared with the samples generated without condition; when the condition is set to unstable, the increment of unstable samples is 48.8%. The result shows that the transient stability ratio of the generated samples can be effectively controlled, and the framework can effectively balance the transient stability samples by generating more unstable samples.

Table 1. The result of generation with different transient stability conditions.

Condition Stable proportion (%) Unstable proportion (%)


Without condition 59.92 40.08
With condition (stable) 71.10 28.90
With condition (unstable) 40.38 59.62

Moreover, the result of conditional generation with different load level con-
dition settings is shown in Table 2. We set the conditions as no condition, and
as 18 load levels (60% to 145%, with a step of 5%). We count the number of
samples of corresponding load level in the generated samples under the control
of generation conditions, and calculate the proportion for comparison. When the
condition is set to a specific load level, the proportion of the corresponding load
level generated will be higher than that of the samples generated without condi-
tion. The results show that the generation framework can effectively control the
load level proportion of the generated samples.

Table 2. The result of generation with different load level conditions.

Condition  Generated without condition (%)  Generated with load level condition (%)  Rate of improvement (%)
70%        2.49                             3.35                                     34.54
80%        3.89                             4.64                                     19.32
90%        2.70                             4.92                                     81.90
100%       3.38                             4.59                                     36.09
110%       5.81                             8.86                                     52.60
120%       2.51                             2.58                                     2.67
130%       2.50                             4.60                                     84.53
140%       0.54                             1.81                                     233.76

The Distance Between Real and Generated Sample Distributions. The generated samples should be as similar to the real samples as possible; therefore, calculating the distance between the two distributions is an efficient metric for evaluating the generation framework.
Table 3 shows the JS divergence and Wasserstein distance calculated between distributions. We randomly select 2,000 samples from the real samples as set S_real^A, repeat the operation to get S_real^B, and generate 2,000 samples as set S_gen. From Table 3, we can see that the distance between S_real^A and S_gen and the distance between S_real^A and S_real^B are of the same order of magnitude, which means that the samples generated by the generation framework are similar to the real samples under these distance measurements.

Table 3. The distance between distributions.

Distributions         JS divergence  Wasserstein distance
S_real^A, S_real^B    0.002826       0.001429
S_real^A, S_gen       0.063141       0.006388
S_real^B, S_gen       0.063084       0.005939

The Performance of Assessment Models Trained with Generated Samples. The performance of assessment models trained with generated samples is a valuable metric. In this paper, we select the Multilayer Perceptron (MLP) and the Decision Tree (DT) as the power system transient stability assessment models for training and testing, since they are classical network models for classification problems. The hidden layer size of the MLP is 200, and the maximum number of iterations is 500. The maximum depth of the DT is 100.
Table 4 shows the test results of assessment models trained with different datasets. We randomly divide the real dataset into Strain with 8,533 samples and Stest with 5,688 samples, randomly generate Sgen with 8,533 samples, and form the united set Sunion = Strain + Sgen. Strain, Sgen, and Sunion are each used to train a transient stability assessment model, and the resulting models are tested on Stest. The scores of the model trained with Sgen are lower than those of the model trained with Strain; however, the scores of the model trained with Sunion are higher than those of the model trained with Strain. The recall rate of unstable samples is increased by 1.48% for the DT and by 2.74% for the MLP. The results show that adding the generated samples to the training set improves the performance of transient stability assessment models, especially on the unstable label, which is the scarce class in the training set.

Table 4. The test results of assessment models trained with different datasets. RecallP
is the recall rate of stable samples, and RecallN is the recall rate of unstable samples.

Assessment model Train dataset RecallP RecallN F1 score Accuracy


DT Strain 0.9770 0.9348 0.9813 0.9694
DT Sgen 0.9430 0.6856 0.9376 0.8969
DT Sunion 0.9788 0.9486 0.9837 0.9734
MLP Strain 0.9883 0.9261 0.9861 0.9771
MLP Sgen 0.7719 0.8061 0.8509 0.7780
MLP Sunion 0.9832 0.9515 0.9863 0.9775

5 Conclusion
In this paper, we address the imbalanced distribution and insufficiency of samples in power system transient stability assessment research. We propose a CTGAN-based controllable sample generation framework for transient stability. In the generation framework, the transient stability samples are first processed into tabular data; the transient stability and load level are then converted into the conditional vector, and a variational Gaussian mixture model is used to fit and normalize the tabular data; finally, the CTGAN model is trained with the processed samples. Moreover, we design a multi-metric evaluation to effectively assess the generation framework from three aspects: the effect of conditional generation, the distance between the real and generated sample distributions, and the performance of assessment models trained with generated samples. Experiments demonstrate that samples generated by the proposed framework are valid and effective under multiple metrics.

Acknowledgement. This work is funded by National Key Research and Develop-


ment Project (Grant No: 2018AAA0101503) and State Grid Corporation of China
Scientific and Technology Project: Fundamental Theory of Human-in-the-loop Hybrid-
Augmented Intelligence for Power Grid Dispatch and Control.

References
1. Wei, W., Yong, T., Huadong, S., Shiyun, X.: A survey on research of power sys-
tem transient stability based on wide-area measurement information. Power Syst.
Technol. 36(9), 81–87 (2012)
2. Tang, C., Graham, C., El-Kady, M., Alden, R.: Transient stability index from
conventional time domain simulation. IEEE Trans. Power Syst. 9(3), 1524–1530
(1994)
3. Gao, K., Yang, S., Liu, S., Li, X.: Transient stability assessment for power system
based on one-dimensional convolutional neural network. Autom. Electric Power
Syst. 43(12), 18–26 (2019)
4. Li, N., Li, B., Gao, L.: Transient stability assessment of power system based on
XGBoost and factorization machine. IEEE Access 8, 28403–28414 (2020)
5. Tacchi, M.: Model based transient stability assessment for power systems. In: Euro-
pean Control Conference, p. 328 (2020)
6. Hu, W., et al.: Real-time transient stability assessment in power system based on
improved SVM. J. Mod. Power Syst. Clean Energy 7(1), 26–37 (2019)
7. Wang, B., Fang, B., Wang, Y., Liu, H., Liu, Y.: Power system transient stability
assessment based on big data and the core vector machine. IEEE Trans. Smart
Grid 7(5), 2561–2570 (2016)
8. Vasant, P., Zelinka, I., Weber, G.W.: Intelligent Computing & Optimization.
Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00979-3
9. Vasant, P., Zelinka, I., Weber, G.W.: Intelligent Computing and Optimization.
Springer, Cham (2019). https://doi.org/10.1007/978-3-030-33585-4
10. Vasant, P., Zelinka, I., Weber, G.W.: Intelligent Computing and Optimization.
Springer, Cham (2020). https://doi.org/10.1007/978-3-030-68154-8
11. Goodfellow, I.J., et al.: Generative adversarial nets. In: Annual Conference on
Neural Information Processing Systems, pp. 2672–2680 (2014)
12. Mirza, M., Osindero, S.: Conditional generative adversarial nets. arXiv preprint
arXiv:1411.1784 (2014)
13. Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tab-
ular data using conditional GAN. In: Annual Conference on Neural Information
Processing Systems, pp. 7333–7343 (2019)
14. Anzai, Y.: Pattern Recognition and Machine Learning. Elsevier, Amsterdam (2012)
15. Tsukakoshi, K., Ida, K.: Analysis of GMM by a Gaussian wavelet transform. In: Proceedings of the Conference on Systems Engineering Research, pp. 467–472 (2012)
16. Yata, K., Aoshima, M.: Principal component analysis based clustering for high-
dimension, low-sample-size data. arXiv preprint arXiv:1503.04525 (2015)
17. Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein generative adversarial networks.
In: International Conference on Machine Learning, pp. 214–223 (2017)
18. Pai, M.: Energy Function Analysis for Power System Stability. Springer, Boston
(2012)
19. Zimmerman, R.D., Murillo-Sánchez, C.E., Thomas, R.J.: MATPOWER: steady-
state operations, planning, and analysis tools for power systems research and edu-
cation. IEEE Trans. Power Syst. 26(1), 12–19 (2010)
20. Ayasun, S., Nwankpa, C.O., Kwatny, H.G.: Voltage stability toolbox for power
system education and research. IEEE Trans. Educ. 49(4), 432–442 (2006)
Efficient DC Algorithm for the Index-Tracking
Problem

F. Hooshmand(&) and S. A. MirHassani

Department of Mathematics and Computer Science, Amirkabir University of


Technology (Tehran Polytechnic), Tehran, Iran
{f.hooshmand.khaligh,a_mirhassani}@aut.ac.ir

Abstract. Index tracking is one of the successful strategies in portfolio management. This paper reviews three well-known models of the index tracking problem, namely the return-based, value-based, and beta-based models, and compares their performance in terms of tracking accuracy on in-sample and out-of-sample data over real instances. Due to the low tracking error of the portfolio obtained by the value-based model, and the NP-hardness of this problem, an efficient iterative method based on the difference of convex functions is proposed to find high-quality feasible solutions within a short amount of time. Computational results over real-world instances confirm the effectiveness of the proposed method.

Keywords: Index tracking problem · Value-based model · Difference of convex functions · Iterative DC algorithm

1 Introduction

Optimization techniques are successfully applied in finance; see [1, 2], and [3]. In particular, portfolio management is a popular research field in optimization and includes active and passive strategies. In the active strategy, the fund manager frequently (e.g., daily or weekly) checks the status of the portfolio and rebalances it based on technical and fundamental analyses. In the passive strategy, by contrast, a suitable portfolio is constructed and kept unchanged for a long time. One of the well-known methods of passive management is the index tracking (IT) approach, which constructs a portfolio mirroring the market index as closely as possible while containing a limited number of assets. Due to the long-term growth of the market index return, an IT portfolio is expected to yield an appropriate return over a long horizon.
The IT problem has received great attention from researchers, and depending on the function used to calculate the tracking error, different optimization models have been presented in the literature. Concerning the tracking-error function, existing formulations can be classified into return-based, value-based, and beta-based models. The aim of return-based models is to construct a portfolio whose return over historical data has minimum deviation from the return of the index; for related works, see Gaivoronski et al. [4], Mezali and Beasley [5], Sant'Anna et al. [3], and Moeini [6]. Value-based models create a portfolio whose value over historical data has minimum deviation from the value of the index scaled by a constant factor. For example,

see Gaivoronski et al. [4] and Guastaroba and Speranza [7]. Beta-based models aim at minimizing the distance between the beta coefficients of the portfolio and the market index; for example, see Canakgoz and Beasley [8] and Chen and Kwon [9]. The enhanced IT (EIT) problem is an extension of IT aiming at outperforming the market index. For example, Guastaroba et al. [10] proposed a model based on the omega ratio for the EIT problem, and Filippi et al. [11] used two objectives, the maximization of the expected return and the minimization of the tracking error. Due to the NP-hardness of IT and EIT problems, they are mostly solved via heuristics and meta-heuristics such as genetic algorithms [3] and kernel search [7, 11].
The main contributions of this paper are as follows. First, the IT models are reviewed, and the good performance of the value-based model is justified by evaluating its tracking error over in-sample and out-of-sample data. Then, to overcome the difficulty of solving the value-based model, an efficient method based on the difference of convex (DC) functions is proposed. The most relevant paper to our work is the study of Moeini [6], who proposed a DC algorithm for the return-based model. We improve the method of Moeini [6] by introducing a combinatorial cut and a quality-regulator constraint, and by embedding the basic DC algorithm into an iterative method. Computational results over real-world instances confirm the superiority of our method over that of Moeini [6] in terms of solution quality, and indicate that our method can achieve high-quality solutions in a short amount of time.
The rest of this paper is organized as follows: Sect. 2 provides an overview of different formulations of the IT problem. Section 3 reviews the theory of DC programming. Section 4 reformulates the value-based model as a DC program, for which an efficient iterative DC-based algorithm is presented in Sect. 5. The performance of our algorithm is investigated in Sect. 6. Finally, Sect. 7 concludes and offers directions for future research.

2 Comparative Consideration of IT Models

2.1 IT Models
Let I (indexed by i) be the set of assets in the market, and let T (indexed by t, t') be the set of previous time periods for which the historical returns are available. An investor intends to construct a portfolio containing C assets, provided that the fraction of capital invested in each selected asset i belongs to the range [l_i, u_i] and the index is mirrored as closely as possible. Short-selling is not allowed, and since the portfolio is kept unchanged for a long time, transaction costs are neglected. Consider r_{i,t} and r^0_t, respectively, as the return of asset i and the index return in time period t, and let β_i be the beta coefficient of asset i. In what follows, we present the return-based, value-based, and beta-based models and compare their efficiency in terms of the tracking error. We refer to these models as RM, VM, and BM, respectively, for short.
Model RM
The model RM minimizes the sum of squared differences between the portfolio and index returns over the historical periods t ∈ T. The decision variables are defined as follows:

$\delta_i$: binary variable that is 1 if asset $i$ is selected, 0 otherwise.
$x_i$: the fraction of the capital invested in the selected asset $i$.

RM is formulated as the following mixed-integer nonlinear program (MINLP):

(RM)

$$\min \sum_{t=1}^{T} \left( \sum_{i \in I} r_{i,t} \, x_i - r^0_t \right)^2 \qquad (1)$$

$$\text{s.t.} \quad \sum_{i \in I} \delta_i = C \qquad (2)$$

$$\sum_{i \in I} x_i = 1 \qquad (3)$$

$$l_i \delta_i \le x_i \le u_i \delta_i \qquad \forall i \in I \qquad (4)$$

$$x_i \ge 0 \qquad \forall i \in I \qquad (5)$$

$$\delta_i \in \{0, 1\} \qquad \forall i \in I \qquad (6)$$

Model VM
The model VM minimizes the sum of squared differences between the portfolio and
index values over historical periods $t \in T$, assuming that the index value equals 1 (i.e.
the amount of the investor's capital) at the beginning of the first period. VM is formulated
as the following MINLP:

(VM)

$$\min \sum_{t \in T} \left( \sum_{i \in I} \prod_{t'=1}^{t} (1 + r_{i,t'}) \, x_i - \prod_{t'=1}^{t} (1 + r^0_{t'}) \right)^2 \qquad (7)$$

$$\text{s.t. } (2)\text{–}(6)$$

Model BM
BM is formulated as the following MINLP, minimizing the difference between the beta
coefficients of the portfolio and the market index:

(BM)

$$\min \left( \sum_{i \in I} \beta_i \, x_i - 1 \right)^2 \qquad (8)$$

$$\text{s.t. } (2)\text{–}(6)$$

2.2 Evaluation of Models


Here, the performance of the models RM, VM, and BM is evaluated on four datasets
(available at https://or-brescia.unibs.it/instances) associated with the weekly returns of
index FTSE100 (composed of 100 assets). Each dataset consists of 104 weeks of in-
sample observations, and 52 weeks of out-of-sample ones. The in-sample information
is included into the model, and then, the portfolio obtained by the model is evaluated
over the out-of-sample realizations. The datasets are named based on the market trends
(increasing or decreasing) over in-sample and out-of-sample periods as down-down
(DD), down-up (DU), up-down (UD), and up-up (UU). Two values C = 10, 15 are
examined for the cardinality. All models are solved via solver BARON, and three
criteria, namely in-sample absolute deviation (ISAD), out-of-sample absolute deviation
(OSAD), and out-of-sample lower deviation (OSLD) are used to evaluate the tracking
errors of the portfolios obtained by each model. Considering $x^*$ as the optimal portfolio
obtained by a given model, the aforementioned criteria are calculated as (9)–(11), where
$(u)^- = \max(-u, 0)$. The results are provided in Table 1.

$$\mathrm{ISAD} = \frac{1}{104} \sum_{t=1}^{104} \frac{\left| \sum_{i \in I} \prod_{t'=1}^{t} (1 + r_{i,t'}) \, x^*_i - \prod_{t'=1}^{t} (1 + r^0_{t'}) \right|}{\prod_{t'=1}^{t} (1 + r^0_{t'})} \times 100 \qquad (9)$$

$$\mathrm{OSAD} = \frac{1}{52} \sum_{t=105}^{156} \frac{\left| \sum_{i \in I} \prod_{t'=1}^{t} (1 + r_{i,t'}) \, x^*_i - \prod_{t'=1}^{t} (1 + r^0_{t'}) \right|}{\prod_{t'=1}^{t} (1 + r^0_{t'})} \times 100 \qquad (10)$$

$$\mathrm{OSLD} = \frac{1}{52} \sum_{t=105}^{156} \frac{\left( \sum_{i \in I} \prod_{t'=1}^{t} (1 + r_{i,t'}) \, x^*_i - \prod_{t'=1}^{t} (1 + r^0_{t'}) \right)^-}{\prod_{t'=1}^{t} (1 + r^0_{t'})} \times 100 \qquad (11)$$
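For concreteness, these three criteria can be computed directly from the weekly return matrices. The following NumPy sketch (variable names are illustrative assumptions, not the authors' code) mirrors (9)–(11):

```python
import numpy as np

def tracking_criteria(R, r0, x, n_in=104):
    """ISAD, OSAD and OSLD (in %) for a portfolio x, mirroring Eqs. (9)-(11).

    R    : (T, n) matrix of weekly asset returns (in-sample weeks first)
    r0   : (T,) vector of weekly index returns
    x    : (n,) portfolio weights returned by one of the models
    n_in : number of in-sample weeks (104 here; the rest are out-of-sample)
    """
    port_val = np.cumprod(1.0 + R, axis=0) @ x    # sum_i prod_t'(1+r_{i,t'}) x_i
    idx_val = np.cumprod(1.0 + r0)                # prod_t'(1+r0_{t'})
    rel_dev = (port_val - idx_val) / idx_val      # relative deviation per week
    isad = 100.0 * np.mean(np.abs(rel_dev[:n_in]))
    osad = 100.0 * np.mean(np.abs(rel_dev[n_in:]))
    osld = 100.0 * np.mean(np.maximum(-rel_dev[n_in:], 0.0))   # (u)^- part only
    return isad, osad, osld
```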

Table 1. Comparison of models RM, VM and BM

ID    C    ISAD (%)            OSAD (%)            OSLD (%)
           RM    VM    BM      RM    VM    BM      RM    VM    BM
DD    10   10.2  0.6   8.7     1.7   2.8   2.6     0.6   0.2   1.9
DU    10   4.2   0.6   3.5     3.5   1.3   2.8     3.5   0.9   1.5
UD    10   4.2   0.7   12.2    5.5   9.3   11.2    5.3   0.0   1.7
UU    10   3.9   0.4   9.4     1.4   1.6   3.9     1.3   1.5   0.6
DD    15   1.8   0.5   3.2     2.3   2.4   1.6     0.0   0.0   0.6
DU    15   4.0   0.4   1.8     1.8   1.1   3.5     0.7   0.5   3.4
UD    15   2.9   0.5   6.9     8.1   5.9   6.5     0.0   0.0   3.2
UU    15   2.3   0.3   6.5     3.2   3.1   2.3     3.2   0.6   0.6
Ave.       4.2   0.5   6.5     3.4   3.4   4.3     1.8   0.5   1.7

As can be seen, the model VM has the better performance regarding tracking errors
over both in-sample and out-of-sample data. It is worth mentioning that the optimal solution
to BM is achieved in about 2 s; however, solving the models RM and VM via solver
BARON is time-consuming, so the resolution process is stopped at a time limit of 1000 s
and the best solutions found are utilized. The good performance of the model VM, on
the one hand, and the difficulty of solving it via optimization solvers, on the other hand,
motivate us to propose an efficient DC-based algorithm to solve it.

3 A Review on DC Programming

This section provides some basic concepts of DC programming and presents a brief
review of the classic DC algorithm (DCA). For a comprehensive, detailed description,
see Dinh and Le Thi [12]. Let $h : \mathbb{R}^n \to \mathbb{R}$ be a convex function with the domain
$\mathrm{dom}\, h = \{ x \in \mathbb{R}^n : h(x) < +\infty \}$. The sub-differential of $h$ at $x_0 \in \mathrm{dom}\, h$, denoted by
$\partial h(x_0)$, is stated as below, where $\langle a, b \rangle$ refers to the inner product of $a$ and $b$:

$$\partial h(x_0) = \{ y \in \mathbb{R}^n : h(x) \ge h(x_0) + \langle x - x_0, y \rangle \quad \forall x \in \mathbb{R}^n \}$$

Additionally, the conjugate of $h(x)$, denoted by $h^*(y)$, is defined as follows:

$$h^*(y) = \sup_x \left\{ x^T y - h(x) \right\} \qquad \forall y \in \mathbb{R}^n$$

Considering $g$ and $h$ as lower semi-continuous convex functions, the standard form
of a DC program, denoted by $P_{dc}$, is defined as follows, and the functions $g$ and $h$ are
called DC components:

$$P_{dc}: \quad \inf \{ f(x) = g(x) - h(x) : x \in \mathbb{R}^n \}$$

The convention $+\infty - (+\infty) = +\infty$ is used and hence $\mathrm{dom}\, f = \mathrm{dom}\, g$.
It is worth mentioning that if the set $X$ is convex, the problem

$$\inf \{ f(x) = g(x) - h(x) : x \in X \}$$

can be restated as a standard DC program, as demonstrated below:

$$\inf \{ g(x) + \chi_X(x) - h(x) : x \in \mathbb{R}^n \}$$

where $\chi_X(x)$ is an indicator function that is 0 if $x \in X$ and $+\infty$ otherwise.
The dual program associated with $P_{dc}$ is formulated as below:

$$D_{dc}: \quad \inf \{ h^*(y) - g^*(y) : y \in \mathbb{R}^n \}$$

The point $x^* \in \mathrm{dom}\, f$ is said to be a critical point to $P_{dc}$ if

$$\partial g(x^*) \cap \partial h(x^*) \ne \emptyset$$

With respect to the above definitions, the following theorem states the necessary
condition for local optimality.
Theorem 1. If $x^*$ is a local optimal solution to $P_{dc}$, then $\partial h(x^*) \subseteq \partial g(x^*)$.
Proof. See Dinh and Le Thi [12].
The general framework of the DC algorithm (DCA) for the standard DC program $P_{dc}$ is
provided in Algorithm 1.
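In outline, each DCA iteration computes a (sub)gradient of $h$ at the current point and then minimizes the resulting convex majorant of $f$. A minimal Python sketch of this generic scheme (the callables `grad_h` and `solve_convex` are illustrative placeholders for the two oracles, not a specific implementation from the paper):

```python
import numpy as np

def dca(x0, grad_h, solve_convex, tol=1e-8, max_iter=1000):
    """Generic DCA for min f(x) = g(x) - h(x).

    grad_h(x)       : a (sub)gradient of the convex function h at x
    solve_convex(y) : argmin_x { g(x) - <x, y> }, a convex subproblem
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        y = grad_h(x)                 # linearize -h around the current point
        x_new = solve_convex(y)       # minimize the resulting convex majorant
        if np.linalg.norm(x_new - x) <= tol:
            return x_new
        x = x_new
    return x
```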

 
Theorem 2. The sequence $\{x^{(k)}\}$, obtained by DCA, converges to a critical point for
any arbitrary starting point $x^{(0)}$, and the sequence $\{ g(x^{(k)}) - h(x^{(k)}) \}$ is decreasing.
Proof. See Dinh and Le Thi [12].
Corollary 1. If $h$ (see $P_{dc}$) is differentiable, the sequence $\{x^{(k)}\}$, obtained by DCA,
converges to a critical point to $P_{dc}$, satisfying the necessary local optimality condition.
Proof. Theorem 2 indicates that the sequence $\{x^{(k)}\}$, obtained by DCA, converges to a
critical point $x^*$. Thus, we have $\partial g(x^*) \cap \partial h(x^*) \ne \emptyset$. However, since $h$ is a differentiable
function, $\partial h(x^*)$ is a singleton set, and accordingly $\partial h(x^*) \subseteq \partial g(x^*)$. Thus, $x^*$
satisfies the necessary condition for local optimality, stated in Theorem 1.
DCA has been successfully applied to combinatorial optimization and different
classes of hard non-convex problems. For a comprehensive overview, see [13].

4 Adopting DCA to Solve VM

In this section, first the MINLP model VM is equivalently reformulated as a DC


program. Then, DCA is adopted to solve it.

4.1 Reformulation of VM as a DC Program

By relaxing the binary restriction of the variable $\delta_i$ and adding the constraints
$\sum_{i \in I} \delta_i (1 - \delta_i) \le 0$ and $0 \le \delta_i \le 1$, VM is equivalently reformulated as VM′.

(VM′)

$$\min \sum_{t \in T} \left( \sum_{i \in I} \prod_{t'=1}^{t} (1 + r_{i,t'}) \, x_i - \prod_{t'=1}^{t} (1 + r^0_{t'}) \right)^2$$

$$\text{s.t. } (2)\text{–}(5)$$

$$\sum_{i \in I} \delta_i (1 - \delta_i) \le 0$$

$$0 \le \delta_i \le 1 \qquad \forall i \in I$$
Further, since $\sum_{i \in I} \delta_i (1 - \delta_i)$ is a nonnegative concave function, considering $m$ as
a sufficiently large positive number, the model VM′ can be equivalently reformulated as
VM″, which is a DC program. This idea is taken from Moeini [6].

(VM″)

$$\min \sum_{t \in T} \left( \sum_{i \in I} \prod_{t'=1}^{t} (1 + r_{i,t'}) \, x_i - \prod_{t'=1}^{t} (1 + r^0_{t'}) \right)^2 + m \left( \sum_{i \in I} \delta_i (1 - \delta_i) \right)$$

$$\text{s.t. } (2)\text{–}(5), \quad 0 \le \delta_i \le 1 \quad \forall i \in I$$

Therefore, the DC program VM″ is equivalent to VM and can be solved via DCA.
Since DC components of the objective functions of VM″ are differentiable, with respect
to Corollary 1, DCA converges to a solution satisfying the necessary local optimality
condition. Algorithm 2 shows that how DCA is adopted for VM″.
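Concretely, the adaptation linearizes the concave part of the penalty, $-m \sum_i \delta_i^2$, at the current point and solves the remaining convex subproblem. The following sketch of one such step is an illustration under stated assumptions (here $V$ is the matrix of cumulative asset values, $v_0$ the index value path; SciPy's SLSQP stands in for the exact convex solver, and all names are hypothetical, not the authors' GAMS implementation):

```python
import numpy as np
from scipy.optimize import minimize

def dca_step_vm(V, v0, delta_k, m, C, l, u):
    """One DCA iteration for VM'': h(d) = m * sum(d_i^2) is linearized at
    delta_k and the remaining convex subproblem in z = [x, d] is solved."""
    T, n = V.shape
    y = 2.0 * m * delta_k                        # gradient of h at delta_k

    def obj(z):
        x, d = z[:n], z[n:]
        f = np.sum((V @ x - v0) ** 2)            # value-based tracking error
        return f + m * np.sum(d) - y @ d         # g(x, d) - <y, d>

    cons = [
        {"type": "eq",   "fun": lambda z: np.sum(z[n:]) - C},    # cardinality (2)
        {"type": "eq",   "fun": lambda z: np.sum(z[:n]) - 1.0},  # budget (3)
        {"type": "ineq", "fun": lambda z: u * z[n:] - z[:n]},    # x_i <= u_i d_i (4)
        {"type": "ineq", "fun": lambda z: z[:n] - l * z[n:]},    # x_i >= l_i d_i (4)
    ]
    bounds = [(0.0, None)] * n + [(0.0, 1.0)] * n                # (5), relaxed (6)
    z0 = np.concatenate([np.full(n, 1.0 / n), delta_k])
    res = minimize(obj, z0, bounds=bounds, constraints=cons, method="SLSQP")
    return res.x[:n], res.x[n:]
```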

The main advantage of DCA is its short running time; however, it does not
guarantee the global optimality of the solution, and the quality of the solution obtained by
DCA depends on the starting point. Therefore, Moeini [6] suggested that, instead of
running DCA only once, it be implemented multiple times from different starting points,
with the best solution found returned at the end. For this purpose, he examined five
starting points: 1) the optimal solution to the linear programming relaxation of the
original model, 2) a modified solution obtained by rounding the previous starting point,
3) the vector with all 0 entries, 4) the vector with all 0.5 entries, and 5) the vector with
all 1 entries. In the following section, we present a novel iterative algorithm based on
DCA which is superior to the method of Moeini [6] in terms of solution quality.

5 Iterative DCA-Based Algorithm

In our new algorithm, DCA is implemented on the following model (instead of VM″),
which we refer to as restricted VM″ (RVM″).

(RVM″)

$$\min \sum_{t \in T} \left( \sum_{i \in I} \prod_{t'=1}^{t} (1 + r_{i,t'}) \, x_i - \prod_{t'=1}^{t} (1 + r^0_{t'}) \right)^2 + m \left( \sum_{i \in I} \delta_i (1 - \delta_i) \right)$$

$$\text{s.t. } (2)\text{–}(5), \quad 0 \le \delta_i \le 1 \quad \forall i \in I$$

$$\sum_{i : \delta_i^{\langle s \rangle} = 1} (1 - \delta_i) + \sum_{i : \delta_i^{\langle s \rangle} = 0} \delta_i \ge 1 \qquad \forall s \in S \qquad (12)$$

$$\sum_{t \in T} \left( \sum_{i \in I} \prod_{t'=1}^{t} (1 + r_{i,t'}) \, x_i - \prod_{t'=1}^{t} (1 + r^0_{t'}) \right)^2 \le \mathrm{RHS} \qquad (13)$$

The combinatorial cut (12) removes the solution $\delta^{\langle s \rangle}$ from the feasible region, and
the quality-regulator cut (13) ensures that the objective function value of the model VM
is less than or equal to RHS. At the beginning of the algorithm, these cuts are not
contained in RVM″; they are added as the algorithm proceeds. Moreover, the
parameter UB is defined as the objective value of the best feasible solution identified
so far for the model VM, and it is initialized at $+\infty$.
At each iteration $s$ of the algorithm, DCA is implemented on RVM″ starting
from a given starting point, the detail of which is discussed in Remark 1 (note that at
the beginning of the algorithm, due to the absence of the cuts (12) and (13), RVM″ is
similar to VM″). The solution returned by DCA is denoted by $(\tilde{\delta}, \tilde{x})$. If at least one
component of $\tilde{\delta}$ violates the binary restriction, those components which are sufficiently
close to 0 (resp. 1) are replaced by 0 (resp. 1); see Eq. (14), where $n$ is a given accuracy.

$$\tilde{\delta}_i := \begin{cases} 0 & \text{if } \tilde{\delta}_i < n \\ 1 & \text{if } 1 - \tilde{\delta}_i < n \\ \tilde{\delta}_i & \text{otherwise} \end{cases} \qquad (14)$$

Then, the following model, namely FixedVM, is solved. It is a restricted version of
the model VM that fixes the variable $\delta_i$ at $\tilde{\delta}_i$ provided that $\tilde{\delta}_i$ takes a binary value.

(FixedVM)

$$\min \sum_{t \in T} \left( \sum_{i \in I} \prod_{t'=1}^{t} (1 + r_{i,t'}) \, x_i - \prod_{t'=1}^{t} (1 + r^0_{t'}) \right)^2$$

$$\text{s.t. } (2)\text{–}(6), \quad \delta_i = \tilde{\delta}_i \quad \forall i \in I : \tilde{\delta}_i = 1 \text{ or } \tilde{\delta}_i = 0$$

Since a high percentage of the binary variables of the model FixedVM are fixed, its
optimal solution is achievable in a short amount of time via optimization solvers. We
denote the optimal solution and the optimal objective value of model FixedVM by
$(\delta^{\langle s \rangle}, x^{\langle s \rangle})$ and $z^{\langle s \rangle}$, respectively. If $z^{\langle s \rangle} < \mathrm{UB}$, then UB and RHS are correspondingly
updated as $\mathrm{UB} := z^{\langle s \rangle}$ and $\mathrm{RHS} := \mathrm{UB}/2$. Otherwise, RHS is increased by a small amount $\Delta$ and,
accordingly, the cut (13) is updated. Further, the cut (12) is added to RVM″ to remove
the solution $\delta^{\langle s \rangle}$ from the feasible region, so that at the next resolution of RVM″ the
solution $\delta^{\langle s \rangle}$ is not generated again. Afterward, DCA is implemented on the model
RVM″ and the same process is repeated until RHS becomes greater than UB. Finally, the
best found feasible solution is returned. We refer to this new iterative DCA-based
algorithm as NIDCA, the main steps of which are summarized in Algorithm 3.
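A compact sketch of the NIDCA loop just described follows. The callables `run_dca` and `solve_fixed_vm` are hypothetical placeholders for the DCA and FixedVM steps, and the greedy bookkeeping is a reading of the text above, not the authors' exact implementation:

```python
import numpy as np

def nidca(delta0, increment, run_dca, solve_fixed_vm, xi=1e-4):
    """Skeleton of the NIDCA loop of Sect. 5.

    run_dca(cuts, rhs, start) : DCA on RVM'' with cuts (12) and bound (13);
                                returns the relaxed solution (delta, x)
    solve_fixed_vm(delta)     : solves FixedVM; returns (delta_s, x_s, z_s)
    """
    UB, RHS = np.inf, np.inf            # cuts (12)/(13) inactive at the start
    cuts, best, start = [], None, delta0
    while RHS <= UB:
        d, _ = run_dca(cuts, RHS, start)
        # rounding step (14): snap components that are nearly binary
        d = np.where(d < xi, 0.0, np.where(1.0 - d < xi, 1.0, d))
        delta_s, x_s, z_s = solve_fixed_vm(d)
        if z_s < UB:
            UB, RHS, best = z_s, z_s / 2.0, (delta_s, x_s)
        else:
            RHS += increment            # relax the quality-regulator cut (13)
        cuts.append(delta_s)            # combinatorial cut (12) on delta_s
        start = delta_s                 # feeds the Remark 1 starting-point rule
    return best, UB
```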

Remark 1. In the first iteration of NIDCA, the starting point needed in DCA is
determined randomly as $\delta^I_i := \mathrm{round}(U(0,1))$ and $x^I_i = 0$, where $U(0,1)$ refers to the
uniform distribution on the interval $(0,1)$, and $\mathrm{round}(u)$ returns the nearest integer
value to $u$. However, in other iterations, the feasible solutions found in the previous
iterations are also taken into account in the construction of the starting point. Indeed,
we set $\delta^I_i := \mathrm{round}\left( U(0,1) + \frac{1}{s-1} \sum_{s' < s} \delta_i^{\langle s' \rangle} \right)$ and $x^I_i = 0$. Additionally, the parameters
$n$ and $s$ are experimentally calibrated as $n := 10^{-4}$ and $s := 5$.

6 Computational Results

This section evaluates the performance of the proposed algorithm compared to the
method of Moeini [6] over instances of index FTSE100. Experiments are carried out on
a laptop running Windows 10 operating system with a Core(TM) i7 processor, and
16 GB of RAM. The proposed algorithm is coded in the GAMS mathematical
modeling-language 28.2.0 [14], and the BARON solver, included in GAMS, is used to
solve all optimization models. The results are reported in Table 2, where the columns
labeled $z^M$ and $z^{\mathrm{NIDCA}}$ show the objective values associated with the best solutions
obtained by the algorithm of Moeini [6] and our algorithm NIDCA, respectively. The
column labeled Imp, calculated as $\mathrm{Imp} = ((z^M - z^{\mathrm{NIDCA}}) / z^M) \times 100$, indicates the
amount of improvement in the objective achieved if NIDCA is used instead of the
method of Moeini [6]. The columns labeled $T^M$ and $T^{\mathrm{NIDCA}}$ represent the time (in
seconds) spent by the algorithm of Moeini [6] and our algorithm NIDCA, respectively.

Table 2. Evaluation of the performance of the proposed algorithm

ID    C    z^M      z^NIDCA   Imp (%)   T^M (s)   T^NIDCA (s)
DD    10   0.5193   0.0133    97        19        126
DU    10   0.0086   0.0067    22        18        101
UD    10   0.0360   0.0226    37        17        96
UU    10   0.5193   0.0135    97        18        132
DD    15   0.0250   0.0059    76        16        166
DU    15   0.0054   0.0033    39        17        156
UD    15   0.0567   0.0094    83        17        78
UU    15   0.0110   0.0021    81        19        109
Ave.       –        –         67        18        120

As can be seen in Table 2, our algorithm outperforms the algorithm of Moeini [6]
by an average improvement of 67%, at the cost of only a reasonable increase in running
time.

7 Conclusions

In this paper, a novel iterative DC-based algorithm was presented to solve the model
VM, and computational results indicated that the proposed method outperforms the
algorithm of Moeini [6] with an average improvement of 67%. The proposed
approach is applicable to the model RM and to other mixed-integer linear programming
(MILP) and MINLP models appearing in different applications of operations
research. The integration of the proposed method with the well-known feasibility-pump
algorithm, as a hybrid algorithm for quickly finding high-quality feasible solutions to
hard MILP and MINLP optimization problems, is suggested for future work.

References
1. Ozer, F., Toroslu, I.H., Karagoz, P., Yucel, F.: Dynamic programming solution to ATM cash
replenishment optimization problem. In: Vasant, P., Zelinka, I., Weber, G.-W. (eds.) ICO
2018. AISC, vol. 866, pp. 428–437. Springer, Cham (2019). https://doi.org/10.1007/978-3-
030-00979-3_45
2. Pholkerd, P., Auephanwiriyakul, S., Theera-Umpon, N.: Companies trading signs prediction
using fuzzy hybrid operator with swarm optimization algorithms. In: Vasant, P., Zelinka, I.,
Weber, G.-W. (eds.) ICO 2019. AISC, vol. 1072, pp. 420–429. Springer, Cham (2020).
https://doi.org/10.1007/978-3-030-33585-4_42
3. Sant’Anna, L.R., Filomena, T.P., Guedes, P.C., Borenstein, D.: Index tracking with
controlled number of assets using a hybrid heuristic combining genetic algorithm and non-
linear programming. Ann. Oper. Res. 258, 849–867 (2017)
4. Gaivoronski, A.A., Krylov, S., Van der Wijst, N.: Optimal portfolio selection and dynamic
benchmark tracking. Eur. J. Oper. Res. 163, 115–131 (2005)
5. Mezali, H., Beasley, J.E.: Index tracking with fixed and variable transaction costs. Optim.
Lett. 8, 61–80 (2014)
6. Moeini, M.: Solving the index tracking problem: a continuous optimization approach. Cent.
Eur. J. Oper. Res. (2019) https://doi.org/10.1007/s10100-019-00633-0
7. Guastaroba, G., Speranza, M.G.: Kernel search: an application to the index tracking
problem. Eur. J. Oper. Res. 217, 54–68 (2012)
8. Canakgoz, N.A., Beasley, J.E.: Mixed-integer programming approaches for index tracking
and enhanced indexation. Eur. J. Oper. Res. 196, 384–399 (2009)
9. Chen, C., Kwon, R.H.: Robust portfolio selection for index tracking. Comput. Oper. Res. 39,
829–837 (2012)
10. Guastaroba, G., Mansini, R., Ogryczak, W., Speranza, M.G.: Linear programming models
based on omega ratio for the enhanced index tracking problem. Eur. J. Oper. Res. 251, 938–
956 (2016)
11. Filippi, C., Guastaroba, G., Speranza, M.G.: A heuristic framework for the bi-objective
enhanced index tracking problem. Omega 65, 122–137 (2016)
12. Dinh, T.P., Le Thi, H.A.: Convex analysis approach to D.C. programming: theory, algorithm
and applications. Acta Math. Vietnamica 22, 289–355 (1999)
13. Le Thi, H.A., Dinh, T.P.: DC programming and DCA: thirty years of developments. Math.
Program. 169, 5–68 (2018)
14. Brooke, A., Kendrick, D., Meeraus, A., Raman, R.: GAMS - a user’s guide. GAMS
Development Corporation, Washington (2014). http://www.gams.com
Modelling External Debt Using VECM
and GARCH Models

Naledi Blessing Mokoena(&), Johannes Tshepiso Tsoku,


and Martin Chanza

Department of Statistics and Operations Research, North-West University,


Corner Dr Albert Luthuli and University Drive, Mmabatho 2735, South Africa
{Johannes.tsoku,Martin.chanza}@nwu.ac.za

Abstract. This research uses the Vector Error Correction Model (VECM) and
the Generalized Autoregressive Conditional Heteroscedasticity (GARCH)
model to investigate the drivers of external debt (ED) during the period 1990 to
2018. The VECM findings demonstrated that ED, GDP, exports, government
expenditure, and capital formation all have a long-run equilibrium relationship.
Exports, capital formation, and GDP were found to be negatively related to ED,
whereas government expenditure was found to be positively related to ED.
The GARCH model results revealed that exports and GDP were positively
correlated with external debt. Furthermore, capital formation and government
expenditure were negatively correlated with external debt. GARCH (1, 1) model
was found to be more efficient in modeling external debt in South Africa. The
results of the paper have significant policy implications for South Africa.

Keywords: External debt · VECM · ARCH model · GARCH model

1 Introduction

The issue of external debt in developing countries has been an ongoing concern ever
since these countries achieved independence from their European colonial rulers.
External debt (ED) has been identified as one of the major obstacles to the growth of
the economy in most developing countries [5]. Some of the reasons ED hinders the
growth of an economy is due to the continual mismanagement of resources, corrupt
government, unemployment rate, high rate of population growth and poverty rate [4].
Access to ED is of importance to most developing countries because it has the
ability to boost the country’s economic growth [3]. The ED, depending on the way it is
used, can impact the economy in both a positive and negative way; it impacts the
economy positively when used for investment purposes and negatively when used for
public and private consumption. Generally, a lower rate of ED affects the economy
positively and vice versa [13].
[31] believes that in order to reduce ED in a country, the national government
should reduce the national budget deficit, and the policy on government’s overall tax
expenditure should be conducted within the imperative context to limit interest
payments in the budget and gradually reduce the percentage of the total spending by
government that is dedicated to servicing the national debt.
The application and modelling of long run relationships in finance has attracted the
attention of both academics and practitioners for many years [19]. Among many of the
academics are [22, 23, 29, 33, 35, 36, 44] and [14]. In this paper, the focus is on
modelling the relationship between ED, Gross Domestic Product (GDP), Exports
(EXP), Government Expenditure (GE) and Capital Formation (CF) using the VECM
and GARCH model.
Most studies in the past have used different techniques to model the determinants of
external debt, such as [2], who made use of the ARDL technique, [10] and [5], who
applied cointegration analysis, [13], who used the VECM approach and [43], who
employed the panel least square method, to mention just a few. There are very few
studies that have modelled ED with the GARCH model, and this paper seeks to fill this
gap in the current literature on the determinants of ED in South Africa (SA).
Furthermore, the paper seeks to model the determinants of ED and assess the modelling
ability of the VECM and the GARCH model using the MSE, RMSE and MAPE
criteria.
The following sections make up the body of the paper: the review of the relevant
literature is discussed in Sect. 2; the methodology of the paper is outlined in Sect. 3;
Sect. 4 presents the outcomes of the paper and the discussion of results; the conclusion
is found in Sect. 5.

2 Literature Review

The study examines important empirical research that was done utilizing various
methods. [9] analysed the economic and political factors that influence ED using the
data from 1975 to 2012. The study used the pooled OLS estimator, the fixed effect
estimator and the Generalized Method of Moments to do dynamic panel data analysis.
The results of the study showed that governments that are not constrained accumulated
more debt and those that are democratically administered accumulated more debt than
autocratic governments.
[4] investigated the macroeconomic determinants of ED in Pakistan. The cointegration
technique and the ARDL model were used in modelling the annual data
ranging from 1976 to 2010. Exchange rate (ER), fiscal deficit (FD), and trade openness
(TO) were all found to be statistically significant drivers of ED in the study. Evidence
of a positive long run association between FD and ED, and between the nominal ER and
the ED burden of Pakistan, was found.
[32] attempted to present a basic method of time series analysis, modelling and
forecasting performance of ARIMA, GARCH (1, 1) and ARIMA-GARCH (1, 1)
models using daily closing price for the GE company in the USA over the period 2001
to 2004. The findings revealed that the 95% confidence interval of the ARIMA (2, 1, 2)
is wider than that of the combined model ARIMA (2, 0, 2)-ARCH (1, 1). The ARIMA-
GARCH forecasting shows that the share price moves between 22.19 and 34.04.
Modelling External Debt Using VECM and GARCH Models 579

[43] examined the macroeconomic factors of ED in oil and gas importing and
exporting nations from 2004 to 2013, using the panel least square approach. The panel
data analysis found that the determinants of ED and their consequences differ between
exporting and importing countries. The findings from the panel data for exporting
countries revealed that increases in economic growth (EG), general government rev-
enue (GGR), foreign exchange reserves (FER), oil price (OL), and domestic investment
(DI) are important factors in lowering the ED, whereas the results for importing
countries differed slightly. Increases in GGR, EG and gross domestic savings are also
essential contributors in lowering ED, according to the findings.
For the period 1975 to 2010, [2] examined the effects of ED on EG and investment
in the Philippines. The data was modeled using the ARDL approach. The
findings revealed that ED, EG, and investment have a negative significant association.
Domestic debt was found to have a negative association with investment and a positive
link with EG. In order for EG to grow in poor nations, [2] suggested that these
countries should put in place policies that will help them reduce their debt burden and
not allow the debt to reach an unsustainable level.
The study by [20] compared ARMA and GARCH models for three metal com-
modities. The data of silver price covered the period between 1980 and 2013, while the
data for nickel and copper price covered the period between June 2004 and June 2014.
The forecasted values and running standard deviations of the models were cross-
validated using MASE, MAPE and correct pairs of sign measure. The forecasted values
obtained from the ARMA and GARCH models were equal in accuracy. On the other
hand, the GARCH model was found to be more efficient at forecasting the running
standard deviation.
The study by [7] used the GARCH model to analyse weekly closing share price
data of JAPFA Comfeed Indonesia over the period June 2015 to October 2016. The
conditional mean model ARIMA (0, 1, 2) and conditional variance model GARCH (1,
1) were found to be the best models. The results showed that the application of the two
models for forecasting the share price data for the following 5-week period were very
close to the real values and that the forecast values were within the 95% confidence
interval.
[38] used the Engle Granger cointegration approach, the VECM and the nonlinear
VECM to examine the effects of the real policy interest rate on the banking sector loan
rate, real ER, real stock prices, and deposit rate. The data used was monthly time series
data ranging from January 2002 to April 2018. The results from the Engle Granger
demonstrated that the real loan rates, interest rate, and deposit rate had a bivariate
cointegration connection. With the exception of the real ER, all of the variables in the
VECM demonstrated a long-term association. The nonlinear VECM results allowed for
the assessment of eleven assumptions, highlighting the symmetric connection and
legitimate pass-through impact while rejecting the exogeneity assumption for all
variables.
The VECM was used in a study by [1] to assess the influence of ER depreciation on
the level of foreign direct investment (FDI) influx into Nigeria. The study employed
data from the Central Bank Statistical Bulletin for a yearly time series. The dependent
580 N. B. Mokoena et al.

variable was FDI inflow, while the independent variables were ER depreciation, EG,
inflation rate (IR), ED, and openness degree. The ADF test for stationarity was applied
in the study, and all variables were determined to be stationary at their first difference.
According to the findings, ER depreciation and trade openness are important predictors
of FDI inflows into Nigeria.

3 Methodology

This paper uses quarterly time series data for SA ranging from 1990Q2 to 2018Q4,
with 115 observations for each variable. The variables employed are ED, GDP,
Exports (EXP), Government Expenditure (GE) and Capital Formation (CF), with ED
being the dependent variable. The data were transformed using a logarithmic
transformation. The EViews 10 Student Version and RStudio version 1.1.463 software
were used in the analysis.
Generally, macroeconomic variables are not stationary in nature, and non-stationary
time series provide misleading findings, so it is critical to determine whether the time
series variables are stationary or not. "A stationary process is one whose statistical
properties do not change over time" [34]. The paper
unit root. The ADF and PP stationarity tests are given as:
Xp
Dyt ¼ a þ bt þ dyt1 þ c Dyt1
i¼1 i
þ ut ð1Þ

Dyt ¼ h þ dyt1 þ et ð2Þ

where D represents the variables at first difference, yt denotes the observed time
series variables, a and h denote a constant. The b; d, and ai are coefficients while ut is
the error term. The d denotes a non-parametric correction to the t-statistic which makes
it robust to the existence of serial correlation and heteroscedasticity, et is I(0) and may
also be heteroscedastic. The PP test frequently gives the same conclusions as the ADF
tests. Subsequent to determining the integration order, the next stage is to compute the
Johansen cointegration technique.
To examine for the presence of a long run relationship, the paper used the Johansen
cointegration technique introduced by [21] and [22]. The Johansen cointegration
technique uses the trace test and maximum eigenvalue to determine the number of
cointegration vectors [6]. The statistical equations for the trace test and maximum
eigenvalue test are:
X
ktrace ¼ T i¼r þ 1
lnð1  kÞ ð3Þ

kmax ¼ Tlnð1  kr þ 1 Þ ð4Þ


Modelling External Debt Using VECM and GARCH Models 581

where r is the number of cointegrating vectors, T is the number of usable obser-


vations and k is the estimated eigenvalues. The null hypothesis of no cointegration
(r ¼ 0) is tested against the alternative of cointegration (r [ 0).
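The same test is available in statsmodels; a sketch continuing the example above (the deterministic-term and lag settings are illustrative, chosen to match a VAR(4) in levels):

```python
from statsmodels.tsa.vector_ar.vecm import coint_johansen

# det_order=0 includes a constant; k_ar_diff=3 corresponds to 4 lags in levels.
res = coint_johansen(df[["LED", "LCF", "LEXP", "LGDP", "LGE"]],
                     det_order=0, k_ar_diff=3)
print("trace statistics:", res.lr1)      # compare with critical values in res.cvt
print("max-eigen statistics:", res.lr2)  # compare with critical values in res.cvm
```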
If cointegration is present, a VECM can be estimated. A VECM is viewed as a
restricted VAR intended for use with series that are not stationary and that are said to
have a cointegrating relationship [41]. If the variables have a long run connection of the
same order, then an error correction model (ECM) exists amongst the variables and two
equations arise:
a) The long run equation:

$$ED_t = \beta_0 + \beta_1 GDP_t + \beta_2 EXP_t + \beta_3 GE_t + \beta_4 CF_t + u_t \qquad (5)$$

where ED, GDP, EXP, GE and CF represent the variables used in the paper and $u_t$ is
the stochastic error term with constant variance and zero mean.
b) The VEC equation:

$$\Delta ED_t = \alpha_1 + \sum_{i=1}^{m} \beta_{1i} \, \Delta GDP_{t-i} + \sum_{i=1}^{m} \theta_{1i} \, \Delta EXP_{t-i} + \sum_{i=1}^{m} \delta_{1i} \, \Delta GE_{t-i} + \sum_{i=1}^{m} \phi_{1i} \, \Delta CF_{t-i} + \upsilon_1 ECT_{t-1} + u_t \qquad (6)$$

where $\Delta$ denotes the difference operator, $m$ is the number of lags, ECT indicates the
error correction term and $u_t$ denotes the stochastic error term with constant variance and
mean zero.
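A corresponding VECM fit can be sketched as follows (continuing the example above; the rank, lag and deterministic settings are illustrative assumptions, not the authors' exact specification):

```python
from statsmodels.tsa.vector_ar.vecm import VECM

# One cointegrating relation, 3 lagged differences, constant in the relation.
vecm = VECM(df[["LED", "LCF", "LEXP", "LGDP", "LGE"]],
            k_ar_diff=3, coint_rank=1, deterministic="ci")
fit = vecm.fit()
print(fit.beta)    # cointegrating (long run) vector, cf. Eq. (5)
print(fit.alpha)   # adjustment speeds on the error-correction term, cf. Eq. (6)
```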
The GARCH model is the most widely used method for predicting and modeling
volatility. The model was proposed by [8] in order to generalize Engle's ARCH model
from 1982 [28]. In order to estimate a GARCH model, one first needs to determine whether
heteroscedasticity exists in the series. ARCH effects are described as the presence of
autocorrelation or heteroscedasticity in the squared residuals of a series. Engle's ARCH
test is a Lagrange Multiplier (LM) test that is used to determine whether or not a series
contains ARCH effects. For the regression on squared residuals, the LM test statistic
employs the F-statistic, which has a $\chi^2$ distribution. The null hypothesis is rejected if
the p-value of the $\chi^2$ statistic is less than the 5% significance level, meaning that the
residuals exhibit the presence of ARCH effects; the GARCH model can then be estimated.
The GARCH(p, q) process is given by:

$$\sigma_t^2 = \omega + \sum_{i=1}^{q} \alpha_i \, \varepsilon_{t-i}^2 + \sum_{i=1}^{p} \beta_i \, \sigma_{t-i}^2 \qquad (7)$$

where $p \ge 0$, $q > 0$, $\omega > 0$, $\alpha_i \ge 0$ for $i = 1, \ldots, q$, and $\beta_i \ge 0$ for $i = 1, \ldots, p$.
Here $\sigma_t^2$ is the conditional variance, which depends on the past squared errors
$\varepsilon_{t-i}^2$ and the past conditional variances $\sigma_{t-i}^2$. The model articulates that tomorrow's
variance is a function of today's squared innovation, today's variance and the weighted-average
long term variance [12]. The model is subject to non-negativity constraints to
make sure that the variance is strictly positive. The stationarity condition $\alpha + \beta < 1$
should hold to guarantee weak stationarity of the GARCH process [27].
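A GARCH(1, 1) fit of the kind estimated later in the paper can be sketched with the `arch` package (the `resid` series and the zero-mean specification are illustrative assumptions; the paper's model also includes a regression mean equation):

```python
from arch import arch_model

# `resid` would be the residual series from the mean regression of LED on the
# regressors; GARCH(1, 1) with a zero mean reproduces the variance part of (7).
am = arch_model(resid, mean="Zero", vol="GARCH", p=1, q=1)
res = am.fit(disp="off")
print(res.summary())    # omega, alpha[1], beta[1] correspond to Eq. (7)
```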
Using the residuals of the models, diagnostic tests were computed to check whether
the selected model is efficient and adequate. The Breusch–Godfrey (BG) LM test and
the Ljung–Box Q-test were used to check for serial correlation. The LM test
statistic is calculated as follows:

$$LM_h = T \left[ K - \mathrm{tr}\left( \widetilde{\Sigma}_R^{-1} \widetilde{\Sigma}_e \right) \right] \qquad (8)$$

where $\widetilde{\Sigma}_R$ symbolizes the covariance matrix of the residuals of the restricted
models and $\widetilde{\Sigma}_e$ denotes the covariance matrix of the residuals of the unrestricted
models. The Ljung–Box test statistic is given by:

$$Q_{LB} = T(T+2) \sum_{j=1}^{k} \frac{r_j^2}{T - j} \qquad (9)$$

where $T$ is the number of observations and $k$ denotes the highest order of autocorrelation
for which $r_j^2$, the $j$th squared sample autocorrelation, is tested. The White's test and
the LM test were used to assess the assumption of heteroscedasticity. The following
equations represent the White's test and the LM test, respectively:

$$nR^2 \sim \chi^2_{df} \qquad (10)$$

$$LM_E = nR^2 \qquad (11)$$

where $R^2$ denotes the regression's coefficient of determination and $n$ the number of
observations. The Jarque–Bera (JB) test was computed to assess whether the residuals
of the model were symmetrically distributed, and it was calculated using the following
equation:

$$JB = \frac{N - k}{6} \left[ S^2 + \frac{(K - 3)^2}{4} \right] \qquad (12)$$

where $k$ denotes the number of parameters calculated, $N$ denotes the number of
observations, $K$ signifies a variable's kurtosis, and $S$ denotes a variable's skewness
[42].
Model evaluation is an important part of developing a model, as it helps to find the
model that best represents the data. The forecasting ability of the models was
examined using three error measures: the MSE, RMSE and MAPE.
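For reference, the three measures can be computed as in the following sketch (the function name and inputs are illustrative):

```python
import numpy as np

def error_measures(actual, fitted):
    """MSE, RMSE and MAPE used to compare the VECM and GARCH(1, 1) fits."""
    actual, fitted = np.asarray(actual, float), np.asarray(fitted, float)
    err = actual - fitted
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    mape = 100.0 * np.mean(np.abs(err / actual))
    return mse, rmse, mape
```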

4 Discussion of Findings

This section of the paper presents the data analysis and interpretation of the results. The
financial series are converted to logarithms in order to make them unit-free. Table 1
shows the descriptive statistics for the series employed in the study.

Table 1. Results from descriptive statistics


Measures LED LGDP LEXP LGE LCF
Mean 10.5668 5.8771 4.0869 11.4542 9.3032
Median 11.2400 5.9245 4.0910 11.4393 9.1838
Maximum 12.5690 7.1166 5.0814 12.9071 10.5520
Minimum 7.4339 4.3268 2.9549 9.7741 8.1236
Std.Dev 1.4794 0.8459 0.6827 0.8961 0.8106
Skewness −0.7746 −0.1903 −0.1447 −0.0580 0.1321
Kurtosis 2.3791 1.7645 1.6403 1.7676 1.5128
JB test 13.3467 8.0090 9.2601 7.3422 10.9328
Probability 0.0013 0.0182 0.0098 0.0254 0.0042

Table 1 displays the descriptive statistics for the study’s financial series. Because
the means and medians of all five variables are within the same range and not far apart,
this could indicate that the data series are slightly symmetric. All of the variables have
small standard deviations, indicating that the data is more concentrated around the
mean. LED, LGDP, LEXP, and LGE are all negatively skewed, while LCF is posi-
tively skewed, according to the findings. Overall, the data are mildly skewed because
the skewness is between −1 and 1. The values of the kurtosis for the five variables are
less than 3, indicating a platykurtic distribution. Because all of the variables’ proba-
bilities are significant, it can be concluded that they are not normally distributed. To
assess whether the series is stationary or not, the ADF and PP stationarity tests were
computed, and the findings are summarised and presented in Table 2.
The results of the ADF and PP stationarity tests of each variable at the level and at
the first difference are shown in Tables 2 and 3. Because the p-values of the test statistic
for both tests are less than the 0.05 significance level, the LGDP series is stationary at
the level. After the first difference, all of the variables became stationary; the
differenced LGDP remains stationary as well. As a result, all variables are

Table 2. Stationarity test results – ADF test


Variables ADF test
Level 1st difference
Test statistic P-value Test statistic P-value
LED −1.9798 0.2954 −7.2642 0.0000
LCF −0.0521 0.9510 −4.4572 0.0004
LEXP −0.8613 0.7971 −8.6325 0.0000
LGDP −4.7185 0.0002 −8.0680 0.0000
LGE −1.5597 0.4996 −3.0021 0.0379
Note: ADF and PP significant at 5% significance level.

Table 3. Stationarity test results – PP test


Variables PP test
Level 1st difference
Test statistic P-value Test statistic P-value
LED −1.8095 0.3743 −7.9213 0.0000
LCF −0.0114 0.9549 −11.2820 0.0000
LEXP −0.8792 0.7917 −8.6226 0.0000
LGDP −4.5182 0.0003 −8.2546 0.0000
LGE −1.3476 0.6054 −51.1657 0.0001
Note: ADF and PP significant at 5% significance level.

integrated of order one, I(1). Before applying the Johansen test, it is critical to
determine the optimal lag length, which eliminates serial correlation in the residuals and
provides the deterministic trend for the VAR model. The lag order for the VAR is
chosen using the lag order selection criteria, and the results are given in Table 4.

Table 4. VAR lag order selection criteria


Lag LogL LR FPE AIC SC HQ
0 203.5740 NA 1.68e−08 −3.7117 −3.5868 −3.6610
1 983.9308 1473.197 1.24e−14 −17.8305 −17.0811* −17.5267
2 1015.425 56.5133 1.10e−14 −17.9519 −16.5780 −17.3949
3 1032.440 28.9407 1.29e−14 −17.8026 −15.8042 −16.9925
4 1117.907 137.3864 4.23e−15* −18.9328* −16.3100 −17.8696*
5 1133.557 23.6939 5.16e−15 −18.7581 −15.5101 −17.4416
6 1152.578 27.0206 5.98e−15 −18.6463 −14.7745 −17.0767
7 1182.921 40.2694* 5.69e−15 −18.7462 −14.2499 −16.9234
8 1201.471 22.8835 6.89e−15 −18.6256 −13.5048 −16.5497
Note: * indicates the lag order selected by the criterion. FPE: final prediction error;
LR: sequential modified LR test statistic (each test at the 5% level); AIC: Akaike
information criterion; HQ: Hannan-Quinn information criterion; SC: Schwarz
information criterion.

The SC and the AIC propose different lag orders, as seen in Table 4: the SC chose a lag
order of 1, while the AIC chose a lag order of 4. In this instance, the lag order with the
lowest AIC value is chosen, so the appropriate lag length is decided to be 4. Table 5 and
Table 6 show the Johansen cointegration test results, which were performed with 4 lags.

Table 5. Johansen cointegration test results


Hypothesized no. of CE(s)   Eigenvalue   Trace statistic   Prob.**   Max-Eigen statistic   Prob.**
None* 0.3347 98.2727 0.0001 44.8303 0.0017
At most 1* 0.1941 53.4424 0.0136 23.7420 0.1440
At most 2 0.1311 29.7003 0.0513 15.4573 0.2581
At most 3 0.0665 14.2430 0.0765 7.5657 0.4246
At most 4* 0.0589 6.6774 0.0098 6.6774 0.0098
Note: Trace test indicates 2 cointegration eqn(s) at the 5% level, Max-eigenvalue test indicates 1
cointegration eqn(s) at the 5% level
* indicates rejection of the hypothesis at the 5% level
** [33] p-values

The trace test revealed that there are 2 cointegrating vectors. On the other hand, the
maximum eigenvalue test revealed only one cointegrating vector. [30] claimed that the
trace tests’ power is superior to that of the maximum eigenvalue tests in some cases.
The trace test is favored over the maximum eigenvalue test in this scenario. It may be
concluded that the model contains many cointegration equations. As a result, the study
indicates that the variables have a long run relationship and that a VECM can be used
to analyze the cointegrated series’ short-term dynamics. Table 6 summarizes the results
of the VEC estimates.

Table 6. VECM estimates


Cointegrating Eq CointEq1
LED(-1) 1.0000
LCF(-1) −0.0555
(0.5219)
[−0.1064]
LEXP(-1) −2.9671
(1.3891)
[−2.1359]
LGDP(-1) −5.1747
(1.7784)
[−2.9098]
LGE(-1) 6.2925
(1.2823)
[4.90725]
C −39.6334
Note: t-statistics in [] &
Standard errors in ()
586 N. B. Mokoena et al.

The long run relationship amongst the variables in Table 6 is presented by estimating
the equation below:

$$LED = 39.6334 - 0.0555\, LCF - 2.9671\, LEXP - 5.1747\, LGDP + 6.2925\, LGE \qquad (13)$$

The equation above demonstrates that LCF, LEXP and LGDP have a negative long run
relationship with LED, while LGE has a positive long run relationship with LED. This
can be interpreted by saying that a unit increase in LCF, LEXP or LGDP results in a
reduction in LED, and a unit increase in LGE results in an upsurge in LED. Table 7
presents the summary of the ECMs.
LED. Table 7 presents the summary of the ECM’s.

Table 7. Summary of ECM’s


Error correction D(LED) D(LCF) D(LEXP) D(LGDP) D(LGE)
CointEq1 −0.0426 0.0386 −0.0008 −0.0047 −0.0401
(0.0197) (0.0114) (0.0060) (0.0021) (0.0081)
[−2.1617] [3.3967] [−0.1283] [−2.2971] [−4.9323]
Note: Standard errors in ( ) and t-statistics in [ ]

The error correction term, which quantifies the speed with which the system adjusts
to equilibrium, is negative, indicating that the system will eventually reach equilibrium.
For LED, LEXP, LGDP, and LGE, the size of these coefficients suggests that around
4.26%, 0.08%, 0.47%, and 4.01% of the disequilibrium is adjusted, respectively. The
coefficient of D(LCF) is positive, implying that a 3.86% increase in LCF is linked to a
percentage change in the cointegrating equation in the near run. The following step is to
compute the GARCH modeling method. The ARCH model was used to determine the
existence of ARCH effects, and the results are summarized in Table 8.

Table 8. Heteroscedasticity test


F-statistic 45.1896 Prob. F(1, 112) 0.0000
Obs*R-squared 32.7732 Prob.Chi-Square (1) 0.0000

The results of the ARCH LM test show that heteroscedasticity exists among the
residuals, since the p-value (0.0000) is less than the 0.05 significance level; the
residuals therefore exhibit ARCH effects in the model, and a GARCH model can be
estimated. The results of the GARCH estimation are summarized in Table 9.
Modelling External Debt Using VECM and GARCH Models 587

Table 9. The results of the GARCH (1, 1) model estimates


Variable Coefficient Std. error z-statistic Prob
C 10.0260 0.5591 17.9330 0.0000
LCF −1.6624 0.0799 −20.8158 0.0000
LEXP 0.7185 0.2797 2.5685 0.0102
LGDP 3.0579 0.2524 12.1163 0.0000
LGE −0.4267 0.1250 −3.4139 0.0006
Variance equation
C 0.0087 0.0026 3.4069 0.0007
RESID(-1)^2   1.1262    0.3242   3.4733    0.0005
GARCH(-1)     −0.0659   0.0463   −1.4250   0.1542

According to the results presented in Table 9 above, the mean equation is given as:

$$\widehat{led}_t = 10.0260 - 1.6624\, lcf_t + 0.7185\, lexp_t + 3.0579\, lgdp_t - 0.4267\, lge_t \qquad (14)$$

The coefficients of LEXP and LGDP are positive and significantly predict the series
by 0.7185 and 3.0579, respectively. The coefficients of LCF and LGE are negative and
significantly predict the series by −1.6624 and −0.4267, respectively. The coefficients
of all the variables are statistically significant at the 0.05 significance level. A positive and
statistically significant relationship exists between LEXP and LED and also between
LGDP and LED, implying that higher volatility in LEXP and LGDP increases the LED
volatility. A negative and statistically significant relationship exists between LCF and
LED and also between LGE and LED, meaning that lower volatility in LCF and LGE
decreases the LED volatility. The variance equation is given as:

$$\hat{h}_t = 0.0087 + 1.1262\, \hat{u}^2_{t-1} - 0.0659\, \hat{h}_{t-1} \qquad (15)$$

The results of the GARCH model include the time-varying volatility constant
(0.0087), a component which depends on past errors ($1.1262\, \hat{u}^2_{t-1}$) and a component
which depends on the past variance ($-0.0659\, \hat{h}_{t-1}$). The sum of the ARCH and
GARCH terms is more than 1, which means that the variance is increasing over time.
The diagnostic tests are computed to determine the adequacy of the two models, and the
results are summarised in Table 10 and Table 11.
The next and final step in the estimation of the results is evaluating the performance
of the VECM and GARCH model using the MSE, RMSE and MAPE. The results are
summarized in Table 12 below.
The results in Table 12 show lower values for the GARCH (1, 1) model than for the
VECM on all three error measures. This strongly suggests that the GARCH (1, 1) model
is more efficient than the VECM for the selected time series. The results of the error
measures are also supported by the results of the diagnostic tests, which showed that the
GARCH (1, 1) model is the only model that passed all of its diagnostic tests and was
proven to be adequate.

Table 10. Diagnostic statistics for the VECM

                    H0                                   Test          Test statistic                       Prob.                            Decision
Serial correlation  No serial correlation                LM Test       18.7239; 24.2642; 35.3883; 24.2473   0.8105; 0.5048; 0.0817; 0.5042   No serial correlation among residuals
Normality           Residuals are normally distributed   JB-Joint      16.7162                              0.0809                           Residuals are normally distributed
Heteroscedasticity  No heteroscedasticity                White's Test  547.6993                             0.0174                           Residuals are heteroscedastic

Table 11. Diagnostic statistics for the GARCH model

                    H0                                   Test          Test statistic    Prob.             Decision
Serial correlation  No serial correlation                Q-Statistic   See Appendix 1    See Appendix 1    No serial correlation among residuals
Normality           Residuals are normally distributed   JB-Joint      1.4691            0.4797            Residuals are normally distributed
Heteroscedasticity  No heteroscedasticity                White's Test  1.9854            0.1017            Residuals are not heteroscedastic

Table 12. Error measures for the VECM and GARCH model

         VECM       GARCH(1, 1)
MSE      0.7889     0.0834
RMSE     0.8882     0.2888
MAPE     205.5909   2.1069

5 Conclusion

The study modelled the determinants of external debt using the VECM and GARCH
models with the intention of identifying and recommending the most effective approach.
According to the findings of the cointegration test, LCF, LEXP, and LGDP have a
negative long run relationship with LED, while LGE has a positive long run relationship
with LED. The findings of [9] and [10] contradict these findings. The findings of the
ARCH LM test revealed that ARCH effects are present in the residuals. The coefficients
of the constant term and the ARCH term were positive and both statistically significant;
the GARCH term, however, was negative and statistically insignificant at the 0.05
significance level. The ARCH and GARCH terms added up to more than one, indicating
that the variance is increasing over time.
The analysis revealed the GARCH (1, 1) model to be more desirable and adequate
as compared to the VECM. [7, 20, 27] and [18] also discovered that the GARCH (1, 1)
model outperformed the VECM model. Therefore, it can be concluded that the
GARCH model has a better ability to forecast external debt. The paper is limited to the
two proposed techniques. The study recommends extending this study to include other
GARCH type models. The paper also recommends that the South African government
should implement effective external debt management strategies in safeguarding the
financial stability of the country.

Appendix 1

References
1. Agugua, E.A., Onuoha, F.C. and Nwaeze, N.C.: Impact of exchange rate depreciation on
foreign direct investment inflows in Nigeria: vector error correction mechanism (VECM)
approach (2017)
2. Akram, N.: Is public debt hindering economic growth of the Philippines? Int. J. Soc. Econ.
42(3), 202–221 (2015)
590 N. B. Mokoena et al.

3. Awan, A.G., Qasim, H.: The impact of external debt on Economic Growth of Pakistan.
Glob. J. Manag. Soc. Sci. Humanit. 6(1), 30–61 (2020)
4. Awan, R.U., Anjum, A., Rahim, S.: An econometric analysis of determinants of external
debt in Pakistan. Br. J. Econ. Manag. Trade 5(4), 382–391 (2015)
5. Bader, M., Magableh, I.K.: An enquiry into the main determinants of public debt in Jordan:
an econometric study. Dirasat Adm. Sci. 36(1), 181–190 (2009)
6. Banumathy, K., Azhagaiah, R.: Long-run and short-run causality between stock price and
gold price: evidence of VECM analysis from India. Manag. Stud. Econ. Syst. 1(4), 247–256
(2015)
7. Barusman, M.Y.S., Usman, M., Ambarwati, R., Virginia, E.: Application of generalized
autoregressive conditional heteroscedasticity (GARCH) model for forecasting. J. Eng. Appl.
Sci. 13(10), 3418–3422 (2018)
8. Bollerslev, T.: Generalized autoregressive conditional heteroskedasticity. J. Econometr. 31
(3), 307–327 (1986)
9. Chiminya, A., Nicolaidou, E.: An empirical investigation into the determinants of external
debt in Sub Saharan Africa. In: Biennial Conference of the Economic Society of South
Africa, University of Cape Town, pp. 1–22. http://2015.essa.org.za/fullpaper/essa_3098.pdf
10. Cholifihani, M.: The role of public domestic debt in economic development: the empirical,
directorate of bilateral foreign funding quick share for quick learner (QSQL) Forum (2009)
11. Dickey, D.A., Fuller, W.A.: Distribution of the estimators for autoregressive time series with
a unit root. J. Am. Stat. Assoc. 74(366a), 427–431 (1979)
12. Ding, J.: Time series predictive analysis of bitcoin with ARMA-GARCH model in Python
and R (2018)
13. Dritsaki, C.: Causal nexus between economic growth, exports and government debt: the case
of Greece. Proc. Econ. Financ. 5, 251–259 (2013)
14. Engle, R.F., Granger, C.W.: Co-integration and error correction: representation, estimation,
and testing. Econometr.: J. Econometr. Soc. 251–276 (1987)
15. Fan, J.H., Akimov, A., Roca, E.: Dynamic hedge ratio estimations in the European union
emissions offset credit market. J. Clean. Prod. 42, 254–262 (2013)
16. Floyd, J.E.: Vector autoregression analysis: estimation and interpretation. Unpublished
Manuscript, University of Toronto, 19 September 2005 (2005)
17. Franses, P.H., Van Dijk, D.: Forecasting stock market volatility using (non-linear) GARCH
models. J. Forecast. 15(3), 229–235 (1996)
18. Frimpong, J.M., Oteng-Abayie, E.F.: Modelling and forecasting volatility of returns on the
Ghana stock exchange using GARCH models (2006)
19. Guirguis, M.: An application of a johansen cointegration test and a vector error correction,
(VEC) model to test the granger causality between general government revenues and general
government total expenditures in Greece. Available at SSRN 3253642 (2018)
20. Hansson, M., Andersson, O., Holmberg, O.: ARMA and GARCH models for silver, nickel
and copper price returns (2015)
21. Johansen, S.: Statistical analysis of cointegration vectors. J. Econ. Dyn. Control 12(2–3),
231–254 (1988)
22. Johansen, S., Juselius, K.: Maximum likelihood estimation and inference on cointegration—
with applications to the demand for money. Oxford Bull. Econ. Stat. 52(2), 169–210 (1990)
23. Kao, C.: Spurious regression and residual-based tests for cointegration in panel data.
J. Econometr. 90(1), 1–44 (1999)
24. Kavussanos, M.G., Visvikis, I.D., Alexakis, P.D.: The lead-lag relationship between cash
and stock index futures in a new market. Eur. Financ. Manag. 14(5), 1007–1025 (2008)
25. Lau, E., Lee, A.S.Y.: Determinants of external debt in thailand and the Philippines. Int.
J. Econ. Financ. Issues 6(4), 1973–1980 (2016)
Modelling External Debt Using VECM and GARCH Models 591

26. Lee, D., Schmidt, P.: On the power of the KPSS test of stationarity against fractionally-
integrated alternatives. J. Econometr. 73(1), 285–302 (1996)
27. Lim, C.M., Sek, S.K.: Comparing the performances of GARCH-type models in capturing the
stock market volatility in Malaysia. Proc. Econ. Financ. 5, 478–487 (2013)
28. Liu, H., Erdem, E., Shi, J.: Comprehensive evaluation of ARMA–GARCH (-M) approaches
for modeling the mean and volatility of wind speed. Appl. Energy 88(3), 724–732 (2011)
29. Lütkepohl, H.: Vector Autoregressive Models, pp. 1645–1647. Springer, Heidelberg (2011)
30. Lütkepohl, H., Saikkonen, P., Trenkler, C.: Maximum eigenvalue versus trace tests for the
cointegrating rank of a VAR process. Econometr. J. 4, 287–310 (2001)
31. Luüs, C.: South Africa’s growing public debt. The South African Financial Markets Journal.
5. The South African Institute of Financial Markets (2012)
32. Malik, V.: ARIMA/GARCH (1, 1) modelling and forecasting for a GE stock price using R.
ELK Asia Pac. J. Mark. Retail Manag. 8, 2349–2317 (2017)
33. MacKinnon, J.G., Haug, A.A., Michelis, L.: Numerical distribution functions of likelihood
ratio tests for cointegration. J. Appl. Economet. 14(5), 563–577 (1999)
34. Nason, G.P.: Stationary and non-stationary time series. Stat. Volcanol. Spec. Publ. IAVCEI
1 (2006)
35. Osterwald-Lenum, M.: A note with quantiles of the asymptotic distribution of the maximum
likelihood cointegration rank test statistics. Oxford Bull. Econ. Stat. 54(3), 461–472 (1992)
36. Pedroni, P.: Critical values for cointegration tests in heterogeneous panels with multiple
regressors. Oxford Bull. Econ. Stat. 61(S1), 653–670 (1999)
37. Phillips, P.C., Perron, P.: Testing for a unit root in time series regression. Biometrika 75(2),
335–346 (1988)
38. Sahin, A.: Loom of symmetric pass-through. Economies 7(1), 11 (2019)
39. Silvennoinen, A., Teräsvirta, T.: Multivariate GARCH models. In: Mikosch, T., Kreiß, J.P.,
Davis, R., Andersen, T. (eds.) Handbook of financial time series, pp. 201–229. Springer,
Heidelberg (2009). https://doi.org/10.1007/978-3-540-71297-8_9
40. Sims, C.A.: Macroeconomics and reality. Econometr. J. Econometr. Soc. 1–48 (1980)
41. Songul, H.: The relationship between the financial development and the economic growth in
Turkey (2011)
42. Ssekuma, R.: A study of cointegrating models with applications. Doctoral dissertation
(2011)
43. Waheed, A.: Determinants of external debt: a panel data analysis for oil & gas exporting and
importing countries. Int. J. Econ. Financ. Issues 7(1), 234–240 (2017)
44. White, H.: A heteroskedasticity-consistent covariance matrix estimator and a direct test for
heteroskedasticity. Econometr. J. Econ. Soc. 817–838 (1980)
Optimization of Truss Structures with Sizing
of Bars by Using Hybrid Algorithms

Melda Yücel(&), Gebrail Bekdaş, and Sinan Melih Nigdeli

Department of Civil Engineering, Istanbul University-Cerrahpaşa,


34320 Avcılar, Istanbul, Turkey
melda.yucel@ogr.iu.edu.tr,
{bekdas,melihnig}@iuc.edu.tr

Abstract. Structural engineering is one of the fields where optimization
approaches are widely used. In this regard, the major aim is to generate the
lightest, cheapest or most sustainable structural design for any model, such as a
beam, retaining wall, truss, etc. Therefore, in the present study, optimization
analyses were carried out for independent structural models, namely 10-bar and
25-bar truss designs, and the smallest structural weights were found for both
structures. The optimization approaches are metaheuristic algorithms, which
were developed by drawing inspiration from various features of nature, the
abilities of living beings and the mechanisms of chemical or physical processes.
In this respect, the Jaya algorithm (JA) is hybridized with different techniques,
namely the flower pollination algorithm (FPA) and teaching-learning-based
optimization (TLBO). The approaches were compared to each other by calculating
different statistical measures, including the best objective function value together
with the average and standard deviation of this value.

Keywords: Truss structures · Optimization · Metaheuristic algorithms · Flower pollination algorithm · Teaching-learning-based optimization · Jaya algorithm

1 Introduction

Optimization is a process in which a design is improved to achieve an objective in the
most effective way. This task is sometimes done via direct mathematical calculation,
but generally it is a nonlinear problem, since it is constrained by several factors and the
solution of the problem must be known in order to check these constraints. Engineering
design optimization belongs to this type of problem, and structural engineering has
various problems in this category that can be effectively solved via metaheuristic
algorithms [1–7].
One of the structural design optimization problems is the optimization of truss
structures. The main objective is to provide a truss structure that is the lowest weight
and directly the lowest cost because of less material usage. Truss structures have been
optimized by using several metaheuristic methods [8–18]. Besides optimization of truss
structures, metaheuristics were combined with total potential minimization theory to
analyze truss structures [19].


In the present study, the Jaya algorithm (JA), which has a single phase, is turned into an
advanced optimization method by hybridizing it with other metaheuristics, including the
flower pollination algorithm (FPA) (as JA-FPA) and teaching-learning-based
optimization (TLBO) (as JA-SPTLBO), to increase performance in the optimization of
truss structures in terms of the robustness of the algorithms, which is measured via the
solutions of different cycle runs of the same optimization process. Also, the main
objective of this study is to develop optimization techniques that provide more
effective solutions for truss systems, yielding optimal design combinations that ensure
the minimum structural weight.

2 The Optimization Algorithm

Jaya algorithm (JA), proposed by Rao [20], finds the optimum solution by using the
current best solution ($g^*$) besides moving away from the current worst solution ($g_w$). In
that case, this process is a victory approach, like its name "Jaya", which means victory
in the Sanskrit language. This process can be formulated via Eq. (1), where $X_{new,i}$ and
$X_{old,i}$ represent the new and old existing solutions.

$$X_{new,i} = X_{old,i} + \mathrm{rand}(0,1) \, (g^* - X_{old,i}) - \mathrm{rand}(0,1) \, (g_w - X_{old,i}) \qquad (1)$$

Unlike other algorithms, JA is a single-phase one and therefore does not use any algorithm-specific parameters for choosing between phases. Owing to this feature, it is very simple yet effective. To improve the robustness of the algorithm, hybridizing JA with other algorithms by adding a second phase can be effective. JA uses the best and worst solutions, but using the other existing solutions may increase the performance and help avoid the local-optima problems that can be observed in a run of the optimization cycle.
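Since only the update rule is given, a minimal Python sketch of one JA iteration may help; the greedy acceptance, the bound clipping, and the toy quadratic fitness below are illustrative assumptions, not the paper's truss setup.

import numpy as np

def jaya_step(pop, fitness, lb, ub, rng):
    # One Jaya iteration (Eq. 1): move every candidate toward the current
    # best solution and away from the current worst; keep a move only if
    # it improves the (minimized) fitness.
    scores = [fitness(x) for x in pop]
    best = pop[int(np.argmin(scores))].copy()
    worst = pop[int(np.argmax(scores))].copy()
    for i in range(len(pop)):
        cand = (pop[i]
                + rng.random(pop[i].shape) * (best - pop[i])
                - rng.random(pop[i].shape) * (worst - pop[i]))
        cand = np.clip(cand, lb, ub)          # respect the design ranges
        if fitness(cand) < fitness(pop[i]):   # greedy acceptance
            pop[i] = cand
    return pop

rng = np.random.default_rng(0)
pop = rng.uniform(0.1, 35.0, size=(20, 10))   # 20 candidates, 10 design variables
pop = jaya_step(pop, lambda x: float(np.sum(x ** 2)), 0.1, 35.0, rng)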
The first algorithm added to JA was developed by Yang [21] and is called the flower pollination algorithm (FPA). Its local pollination phase was added to JA's phase of Eq. (1) to generate the hybrid method JA-FPA, which chooses a single phase in each iteration using a parameter called the switch probability (sp). The local pollination is formulated as Eq. (2):

$X_{new,i} = X_{old,i} + \varepsilon\,(X_{j} - X_{k})$  (2)

Here ε is drawn from a uniform distribution over the range 0 to 1, and $X_j$ and $X_k$ are two different, randomly chosen solutions.
Secondly, a second phase is added from the student phase of teaching-learning-based optimization (TLBO), developed by Rao et al. [22] and inspired by the teaching-learning process between teacher and students. The student phase models the process in which students improve their knowledge and grade levels by interacting with each other. It is formalized via Eq. (3), where $X_i$ and $X_j$ are different, randomly determined candidate solutions. As in TLBO, this phase is applied consecutively after the phase using Eq. (1) in JA-SPTLBO.

$$X_{new,i} = \begin{cases} X_{old,i} + rand(0,1)\,(X_{i} - X_{j}), & f(X_{i}) > f(X_{j}) \\ X_{old,i} + rand(0,1)\,(X_{j} - X_{i}), & f(X_{i}) < f(X_{j}) \end{cases} \qquad (3)$$
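For illustration, a sketch of the student phase as it could be applied after the JA update in JA-SPTLBO is given below; the random peer selection, the minimization convention, and the greedy acceptance are assumptions.

import numpy as np

def student_phase(pop, fitness, rng):
    # TLBO student phase (cf. Eq. 3): each candidate learns from a random
    # peer, stepping toward the better of the pair (minimization assumed).
    n = len(pop)
    for i in range(n):
        j = int(rng.choice([k for k in range(n) if k != i]))
        r = rng.random(pop[i].shape)
        if fitness(pop[i]) < fitness(pop[j]):
            cand = pop[i] + r * (pop[i] - pop[j])   # X_i is the better one
        else:
            cand = pop[i] + r * (pop[j] - pop[i])
        if fitness(cand) < fitness(pop[i]):         # keep only improvements
            pop[i] = cand
    return pop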

3 Numerical Examples

3.1 Structural Truss Models


In the current study, two different truss models were investigated through optimization processes with respect to structural design applications. Both the 10 and 25-bar truss models were handled to minimize the total structural weight by providing optimum sizes for the cross-section of each bar. The truss models can be seen in Figs. 1 and 2, where all of the design details for section properties, loading conditions, etc., together with the space coordinates of the 10 and 25-bar models, are shown, respectively. The loading conditions for the 10-bar truss model are given as P1 and P2, which have the values 150 and 50 kips, respectively.

Fig. 1. Structural model for 10-bar truss with design parameters [23].

Fig. 2. Structural model for 25-bar truss with design parameters [11].

On the other hand, to reach the best models, each optimization stage, which was designed with constant algorithm parameters of 20 population members and 5000 iterations, was operated multiple times. For this reason, the process was performed with 25 separate cycles for the structural models, which were optimally designed with three optimization algorithms: JA, JA-FPA, and JA-SPTLBO.
The model handled as the first structural problem is based on the optimum design of the 10-bar truss to reach the lowest weight. In Table 1, the design properties, constant parameters and structural constraints are presented for each member type (nodes and bars).
While modeling the 25-bar truss, the symmetry properties of the structure were considered, and for this reason a grouping approach was utilized for the bars. The design properties of this structural model are presented in Table 2, while the multiple loading cases and the design constraints (stresses for the steel bars and displacement boundaries for the nodes) of the 25-bar truss structure can be seen in Tables 3 and 4, respectively.

Table 1. Some design properties and constraints for the 10-bar truss structure.

| Category | Property | Symbol | Ranges or values | Unit |
|---|---|---|---|---|
| Design parameters | Bar cross-section area | Abar | 0.1–35 | inch² |
| Design constants | Modulus of elasticity | Es | 10^7 | psi |
| Design constants | Steel weight per unit of volume | ρs | 0.1 | lb/inch³ |
| Design constants | Number of bars | – | 10 | – |
| Design constants | Length of bars | Lbar | – | inch |
| Constraint conditions (nodes) | Displacement limit on nodes toward all directions | d | \|d\| ≤ 2 | inch |
| Constraint conditions (bars) | Stress limit for tensile and compressive forces | σ | \|σ\| ≤ 25 | ksi |
| Design target | Minimization of total weight | F(w) | $F(w) = \rho_s \sum_{i=1}^{bar\,number} A_i L_i$ | lb |

Table 2. Some properties of the optimization design of the 25-bar truss model.

| Category | Definition | Notation | Ranges or values | Unit |
|---|---|---|---|---|
| Design parameters | Bar cross-section area | Abar | 0.01–3.4 | inch² |
| Design constants | Elastic modulus | Es | 10^7 | psi |
| Design constants | Steel weight per unit of volume | ρs | 0.1 | lb/inch³ |
| Design constants | Total number of bars | – | 25 | – |
| Design constants | Number of nodes | – | 10 | – |
| Design constants | Group number of bars | – | 8 | – |

Table 3. Loading conditions for the 25-bar truss.

| Case number | Node number | Loads (Fx, Fy, Fz) | Unit |
|---|---|---|---|
| 1 | 1 | 1000, 10000, −5000 | lb |
| 1 | 2 | 0, 10000, −5000 | lb |
| 1 | 3 | 500, 0, 0 | lb |
| 1 | 6 | 500, 0, 0 | lb |
| 2 | 1 | 0, 20000, −5000 | lb |
| 2 | 2 | 0, −20000, −5000 | lb |

Table 4. Constraints according to design regulation conditions for the structural design of the 25-bar truss model.

Nodes (all nodes): displacement boundary for the nodes of the truss structure, |d| ≤ 0.35 inch.

Bars (stress boundaries for the steel bar groups of the truss):

| Group number of bars | Design parameters | Compression stress limit (psi) | Tension stress limit (psi) |
|---|---|---|---|
| 1 | A1 | 35092 | 40000 |
| 2 | A2–5 | 11590 | 40000 |
| 3 | A6–9 | 17305 | 40000 |
| 4 | A10–11 | 35092 | 40000 |
| 5 | A12–13 | 35092 | 40000 |
| 6 | A14–17 | 6759 | 40000 |
| 7 | A18–21 | 6959 | 40000 |
| 8 | A22–25 | 11080 | 40000 |

3.2 Optimum Designing of 10-Bar Truss


The first application is based on the optimal design of the 10-bar truss structure to minimize the total weight. To realize this, the aforementioned settings were used: a constant population of 20 candidate members and a total of 5000 iteration steps, repeated over 25 cycles. The results for the optimum design can be seen in Table 5, where the minimized value of the total weight is achieved as 4677.0373 lb by JA among all the metaheuristics. However, although JA is the best algorithm at finding the minimum weight, its standard deviation is the largest among the JA variants.

Table 5. Optimum values of design parameters and statistical evaluations for the objective function.

| Group number of bars | Design parameters | JA | JA-FPA | JA-SPTLBO |
|---|---|---|---|---|
| 1 | A1 | 23.5715 | 23.7320 | 23.3961 |
| 2 | A2 | 0.1001 | 0.1005 | 0.1013 |
| 3 | A3 | 25.3125 | 25.1431 | 25.3240 |
| 4 | A4 | 14.2887 | 14.5148 | 14.3240 |
| 5 | A5 | 0.1000 | 0.1000 | 0.1000 |
| 6 | A6 | 1.9712 | 1.9700 | 1.9703 |
| 7 | A7 | 12.3711 | 12.4224 | 12.3970 |
| 8 | A8 | 12.7694 | 12.7986 | 12.9055 |
| 9 | A9 | 20.4200 | 20.1902 | 20.3511 |
| 10 | A10 | 0.1000 | 0.1000 | 0.1000 |
| Total weight: best value | | 4677.0373 | 4677.2268 | 4677.1583 |
| Total weight: average | | 4872.2253 | 4679.3908 | 4677.4173 |
| Total weight: standard deviation | | 618.3222 | 1.3695 | 0.2205 |

Total population number: 20; iteration number: 5000; total cycles: 25 (all algorithms).

3.3 Grouping Approach for 25-Bar Truss


The second application is the optimization process carried out for a truss structure with 25 bars. In this process, all of the steel bars of the truss model were grouped according to axis similarity and the symmetry of the space coordinates, and optimized with this approach within the design ranges. The optimization applications were realized in a similar way to the 10-bar truss model in terms of the algorithm parameters. Here, the best (minimum) total weight is determined as 545.0435 lb with JA-SPTLBO. This result is also evaluated through several statistical measures, all of which are reported in Table 6.

Table 6. Optimization results for the 25-bar truss model with some statistical measures.

| Group number of bars | Design parameters | JA | JA-FPA | JA-SPTLBO |
|---|---|---|---|---|
| 1 | A1 | 0.0102 | 0.0103 | 0.0100 |
| 2 | A2–5 | 2.0410 | 2.0386 | 2.0504 |
| 3 | A6–9 | 3.0006 | 2.9994 | 3.0041 |
| 4 | A10–11 | 0.0100 | 0.0100 | 0.0100 |
| 5 | A12–13 | 0.0100 | 0.0104 | 0.0100 |
| 6 | A14–17 | 0.6811 | 0.6837 | 0.6834 |
| 7 | A18–21 | 1.6256 | 1.6283 | 1.6159 |
| 8 | A22–25 | 2.6749 | 2.6712 | 2.6730 |
| Total weight: best value | | 545.0463 | 545.0574 | 545.0435 |
| Total weight: average | | 562.0948 | 545.1864 | 545.0747 |
| Total weight: standard deviation | | 33.4640 | 0.0841 | 0.0232 |

Total population number: 20; iteration number: 5000; total cycles: 25 (all algorithms).

4 Results and Discussions

4.1 Optimum Designing of 10-Bar Truss


Comparing classical JA with the hybrid algorithms JA-FPA and JA-SPTLBO, which were proposed by combining JA with the local pollination of FPA and the student phase of TLBO, it can be seen that JA is the most effective at reaching the weight minimization objective. However, it produced an extremely large standard deviation of the objective function (minimum weight). For this reason, the best method can be accepted as JA-SPTLBO. For the optimal design of the 10-bar truss structure, the minimum weight was determined as 4677.1583 lb with the JA-SPTLBO hybrid algorithm, and this value was ensured with a small standard deviation of 0.2205.

4.2 Grouping Approach for 25-Bar Truss


For the 25-bar truss structure, the minimum total weight was also achieved by JA-SPTLBO, similarly to the 10-bar truss. The optimum outcomes show that JA-SPTLBO found the best weight as 545.0435 lb with a standard deviation of 0.0232. The classical version of JA comes close to the minimum weight, but its standard deviation of the objective function is far larger than JA-SPTLBO's. Although the deviation of JA-FPA is smaller than that of JA, JA-FPA cannot find the best weight for this structure.

5 Conclusion

In summary, although JA is the most successful algorithm for the 10-bar truss, JA-SPTLBO can be recognized as the most effective method among all of the versions of JA, since it found the best results while obtaining smaller error amounts for the minimized weights. For the 25-bar structure, JA-SPTLBO similarly succeeded in the main purpose of minimizing the total weight, and in these optimization processes its standard deviations (error amounts) were smaller than those of the JA-FPA hybrid algorithm and classic JA. These results show that JA-SPTLBO is more steady and reliable in finding similar and almost equal optimum results for the design variables and the objective function, owing to its low standard deviation and the closeness of the average weight to the best value.

Acknowledgments. This study was funded by Scientific Research Projects Coordination Unit
of Istanbul University-Cerrahpasa. Project number: FYO-2019-32735.

References
1. Toklu, Y.C., Bekdas, G., Nigdeli, S.M.: Metaheuristics for Structural Design and Analysis.
Wiley, New York (2021)
2. Carbas, S., Toktas, A., Ustun, D. (eds.): Nature-Inspired Metaheuristic Algorithms for
Engineering Optimization Applications. Springer Tracts in Nature-Inspired Computing,
Springer, Singapore (2021). https://doi.org/10.1007/978-981-33-6773-9
3. Kayabekir, A.E., Bekdaş, G., Nigdeli, S.M.: Metaheuristic Approaches for Optimum Design
of Reinforced Concrete Structures: Emerging Research and Opportunities: Emerging
Research and Opportunities. IGI Global (2020)
4. Kayabekir, A.E., Bekdaş, G., Yücel, M., Nigdeli, S.M., Geem, Z.W.: Harmony search
algorithm for structural engineering problems. In: Carbas, S., Toktas, A., Ustun, D. (eds.)
Nature-Inspired Metaheuristic Algorithms for Engineering Optimization Applications.
Springer Tracts in Nature-Inspired Computing, pp. 13–47. Springer, Singapore (2021).
https://doi.org/10.1007/978-981-33-6773-9_2
5. Kayabekir, A.E., Bekdaş, G., Nigdeli, S.M.: Developments on metaheuristic-based
optimization in structural engineering. In: Nigdeli, S.M., Bekdaş, G., Kayabekir, A.E.,
Yucel, M. (eds.) Advances in Structural Engineering—Optimization. SSDC, vol. 326, pp. 1–
22. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-61848-3_1
6. Nigdeli, S.M., Bekdaş, G., Kayabekir, A.E., Yucel, M.: Advances in Structural Engineering
—Optimization: Emerging Trends in Structural Optimization (2020)
7. Ulusoy, S., Kayabekir, A.E., Nigdeli, S.M., Bekdaş, G.: Metaheuristic-based structural
control methods and comparison of applications. In: Carbas, S., Toktas, A., Ustun, D. (eds.)
Nature-Inspired Metaheuristic Algorithms for Engineering Optimization Applications.
STNC, pp. 251–276. Springer, Singapore (2021). https://doi.org/10.1007/978-981-33-
6773-9_12
8. Talatahari, S., Goodarzimehr, V.: A discrete hybrid teaching-learning-based optimization
algorithm for optimization of space trusses. J. Struct. Eng. Geo-Tech. 9(1), 55–72 (2019)
9. Salar, M., Dizangian, B.: Sizing optimization of truss structures using ant lion optimizer. In:
2nd International Conference on Civil Engineering, Architecture and Urban Management in
Iran, Tehran University, August 2019 (2019)

10. Bekdaş, G., Yucel, M., Nigdeli, S.M.: Evaluation of metaheuristic-based methods for
optimization of truss structures via various algorithms and Lèvy flight modification.
Buildings 11(2), 49 (2021)
11. Bekdaş, G., Nigdeli, S.M., Yang, X.S.: Sizing optimization of truss structures using flower
pollination algorithm. Appl. Soft Comput. 37, 322–331 (2015)
12. Degertekin, S.O., Hayalioglu, M.S.: Sizing truss structures using teaching-learning-based
optimization. Comput. Struct. 119, 177–188 (2013)
13. Degertekin, S.O., Lamberti, L., Hayalioglu, M.S.: Heat transfer search algorithm for sizing
optimization of truss structures. Latin Am. J. Solids Struct. 14, 373–397 (2017)
14. Dede, T., Bekiroglu, S., Ayvaz, Y.: Weight minimization of trusses with genetic algorithm.
Appl. Soft Comput. 11, 2565–2575 (2011)
15. Aydogdu, I., Ormecioglu, T.O., Carbas, S.: Electrostatic discharge algorithm for optimum
design of real-size truss structures. In: Carbas, S., Toktas, A., Ustun, D. (eds.) Nature-
Inspired Metaheuristic Algorithms for Engineering Optimization Applications. STNC,
pp. 93–109. Springer, Singapore (2021). https://doi.org/10.1007/978-981-33-6773-9_5
16. Aydogdu, I., Carbas, S., Akin, A.: Effect of levy flight on the discrete optimum design of
steel skeletal structures using metaheuristics. Steel Compos. Struct. 24(1), 93–112 (2017)
17. Tejani, G.G., Kumar, S., Gandomi, A.H.: Multi-objective heat transfer search algorithm for
truss optimization. Eng. Comput. 37(1), 641–662 (2019). https://doi.org/10.1007/s00366-
019-00846-6
18. Mortazavi, A., Togan, V.: Metaheuristic algorithms for optimal design of truss structures. In:
Nigdeli, S.M., Bekdaş, G., Kayabekir, A.E., Yucel, M. (eds.) Advances in Structural
Engineering—Optimization. SSDC, vol. 326, pp. 199–220. Springer, Cham (2021). https://
doi.org/10.1007/978-3-030-61848-3_7
19. Bekdaş, G., Kayabekir, A.E., Nigdeli, S.M., Toklu, Y.C.: Advanced energy-based analyses
of trusses employing hybrid metaheuristics. Struct. Des. Tall Spec. Build. 28(9), e1609
(2019)
20. Rao, R.: Jaya: a simple and new optimization algorithm for solving constrained and
unconstrained optimization problems. Int. J. Ind. Eng. Comput. 7(1), 19–34 (2016)
21. Yang, X.-S.: Flower pollination algorithm for global optimization. In: Durand-Lose, J.,
Jonoska, N. (eds.) UCNC 2012. LNCS, vol. 7445, pp. 240–249. Springer, Heidelberg
(2012). https://doi.org/10.1007/978-3-642-32894-7_27
22. Rao, R.V., Savsani, V.J., Vakharia, D.P.: Teaching–learning-based optimization: a novel
method for constrained mechanical design optimization problems. Comput. Aided Des. 43
(3), 303–315 (2011)
23. Schmit, L.A., Jr., Farshi, B.: Some approximation concepts for structural synthesis.
AIAA J. 12(5), 692–699 (1974)
Information Extraction from Receipts Using
Spectral Graph Convolutional Network

Bui Thanh Hung(&)

Faculty of Information Technology, Ton Duc Thang University,


19 Nguyen Huu Tho Street, Tan Phong Ward, District 7,
Ho Chi Minh City, Vietnam
buithanhhung@tdtu.edu.vn

Abstract. Information extraction from receipts is the process of recognizing text and extracting key texts from scanned receipts. This task plays a critical role in a wide range of applications in finance, accounting and taxation. A graph neural network is a model capable of capturing the dependencies in a graph via message passing between its nodes, and it has demonstrated high effectiveness on various deep learning tasks. In this paper, we apply a spectral graph convolutional network to extract key information from receipts. Experimental results show that the spectral graph convolutional network achieves good results for this task.

Keywords: Graph neural network · Spectral graph convolutional network · Information extraction · Unstructured data · Receipts

1 Introduction

Automatically extracting structured information from unstructured and/or semi-structured machine-readable documents is the task of Information Extraction (IE). Basically, information extraction from receipts has two main steps: Optical Character Recognition (OCR) and Information Extraction (tagging). Since receipts encode semantic information in their visual layout, the information extraction step should not be based solely on the machine-readable text; it should also be informed by the layout, i.e., the position of each word relative to the other words in the document. Information extraction from receipts can therefore be considered a special task, and indeed it remains a challenging problem.
The main role of information extraction from receipts is to categorize the text boxes into the corresponding information fields, for example: company (company name, product distributor), address, date (transaction date), total (total price) and other (not in the above 4 fields). The input of the model is an image, and each text box in the output is classified into the corresponding information field.
This task plays a vital role in enhanced transparency, data analytics, working capital improvement and better tracking in the finance, accounting and taxation domains. It also supports supply chain optimization, a backbone that ensures the proper functioning of many companies.
A graph-based method is an attractive approach for this problem for the following reasons:


– Local pattern: similar to the Convolutional Neural Network (CNN) model, but instead of pixel points, nodes that are connected to each other have a stronger relationship than with nodes that are farther away in the graph [1, 2].
– Positional feature: information about the position/coordinates of a node on the image also helps the model distinguish the information fields more easily.
– Textual feature: similar to the positional feature, textual information is also important, for example for distinguishing the address field from other data fields.
– Stacking multiple Graph Neural Network modules on top of each other helps the model learn high-level features better.
Based on the advantages of the graph-based method, in this research we apply the Spectral Graph Convolutional Network to solve information extraction from receipts. The rest of the paper is structured as follows: related works are discussed in Sect. 2; Sect. 3 presents the methodology; our experiments are described in Sect. 4; and finally, a summary and future directions are presented in Sect. 5.

2 Related Works

Extracting information from images can be solved by template-based or natural language processing-based methods. However, each method has its respective limitations:
– The template-based method simply applies predefined rules to forms and documents with a fixed layout/structure that does not change much, and then uses text/keyword matching to determine the corresponding fields. The biggest shortcoming of this method is that each rule has to be defined separately for each form, so it cannot adapt to new forms; moreover, it is completely dependent on each person's domain knowledge [3–5].
– In the NLP-based method, the contents obtained from the text boxes are put into a text classification or NER model to classify or identify entities belonging to each corresponding information field. The advantage of this method over the template-based one is the ability to adapt to new data. However, several shortcomings remain: heavy dependence on the layout of the form, limited handling of data represented as tables, and no use of information/features about the position of the text box, even though such layout information would be of great help in identifying the respective fields [6–8].
There are two kinds of graph convolutional networks: spatial convolution and spectral convolution methods [9]. Our framework belongs to the spectral convolution method. Spectral graph convolution exploits the Convolution Theorem from signal processing and does not scan the input feature matrix by sliding a filter (like a regular 1D or 2D convolution layer). In this way, an expensive convolution in the spatial domain can be computed as a cheap multiplication by first transforming the signals to the frequency domain; applying the convolution theorem helps to reduce the computational complexity of the convolution.
Song et al. proposed a graph-state LSTM [10], a model that enables a varied number of incoming dependencies at each memory cell. Wang et al. designed a directed graph scheme for jointly extracting entities and relations [11]. Marcheggiani et al. proposed encoding sentences with a syntactic dependency graph for semantic role labeling [12]. Gui et al. exploited global semantics in a lexicon-based GCN to avoid word ambiguities [13]. Szegedi et al. used fragments in a spatial GCN by breaking down the image of an invoice into text pieces with positional information [14].
Our work is based on the spectral graph convolutional network; this model reduces the computational complexity of the convolution, making it work more effectively.

3 Methodology

3.1 The Proposed Model


To solve the information extraction problem, we first need two sub-problems: scene text detection and scene text recognition. The output of these two problems is used to build the features and graphs for the information extraction. The proposed model has four components: Word Extractor, Feature Extraction, Graph Modeling, and a Spectral Graph Convolutional Network for node classification. The model is presented in Fig. 1 and described in detail in the next sections.

[Figure: pipeline from Images → Word Extractor → Feature Extraction → Graph Modeling → Spectral Graph Convolutional Neural Network → Extracted Entities.]

Fig. 1. The proposed model



3.2 Word Extractor


With the receipt images, we encode the graph based on the following information:
– The bounding boxes corresponding to each text line of the image. This text detection part can use popular object detection models or models specialized for scene text detection, such as EAST, Differentiable Binarization, CRAFT, …
– The contents of those text boxes. This text recognition part can use off-the-shelf tools like Tesseract or scene text recognition models such as CRNN with CTC loss, Attention-OCR, …
The Word Extractor thus includes two parts, text detection and OCR as mentioned above; we use previous work to perform this task.

3.3 Feature Extraction


Feature extraction is the process of creating features for the nodes of the graph. The nodes here are the bounding boxes obtained after the text detection step. The definition of the edges of the graph belongs to the graph modeling section, which is covered in more detail below. Here we describe how to build the initial features for the nodes of the graph, aggregated from several different attributes as follows:
– Boolean features, based on the output of the text recognition model:
  • isDate: whether the text is a date (1/0)
  • isZipCode: 6 characters forming a valid zip/area code (1/0)
  • isKnownCity, isKnownDept, isKnownCountry: respectively check whether the text is the name of a known city, department or country (1/0)
  • nature: contains 8 elements, in turn checking the properties isAlphabetic, isNumeric, isAlphaNumeric, isNumberWithDecimal, isRealNumber, isCurrency, hasRealAndCurrency and mix. This yields an 8-dimensional binary vector corresponding to the 8 sub-attributes.
– Numeric features: the relative distances from the current text box to the corresponding 4 neighbouring boxes (top, bottom, left, right).
– Text feature: based on the output of the text recognition model, we use a pre-trained GloVe model to get the word embedding vector.
Finally, we concatenate all those attributes and get a 317-dimensional feature vector (1 + 1 + 3 + 8 + 4 + 300) as the initial node feature for each node (each text box) in the graph.
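A sketch of assembling such a 317-dimensional node feature could look as follows; the boolean helpers are naive stand-ins for the rules above, and the glove dictionary is assumed to map lowercased words to 300-dimensional vectors.

import numpy as np

def node_features(text, rel_dists, glove):
    # 1 (isDate) + 1 (isZipCode) + 3 (city/dept/country) + 8 (nature)
    # + 4 (relative distances) + 300 (GloVe) = 317 dimensions.
    is_date = float(any(c.isdigit() for c in text) and ('/' in text or '-' in text))
    is_zip = float(len(text) == 6 and text.isdigit())
    known = np.zeros(3)                       # isKnownCity/Dept/Country, stubbed
    nature = np.zeros(8)
    nature[0] = float(text.isalpha())         # isAlphabetic
    nature[1] = float(text.isdigit())         # isNumeric
    nature[2] = float(text.isalnum() and not text.isalpha() and not text.isdigit())
    # ... the remaining nature flags are omitted for brevity
    emb = glove.get(text.lower(), np.zeros(300))   # 300-d word embedding
    return np.concatenate([[is_date, is_zip], known, nature,
                           np.asarray(rel_dists, float), emb])

vec = node_features("TOTAL", [0.1, 0.2, 0.0, 0.05], glove={})
assert vec.shape == (317,)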

3.4 Graph Modeling


As mentioned above for the numeric features, these 4 coordinate features are based on the relative position to the 4 text boxes above, below, left, and right, as shown in Fig. 2 below.

Fig. 2. Relative position of 4 text boxes

The 4 parameters $RD_L$, $RD_R$, $RD_T$ and $RD_B$ are calculated as follows:

$RD_L = (Right(Word_{Left}) - Left(Word_{Source})) / Width_{Page}$

$RD_T = (Bottom(Word_{Top}) - Top(Word_{Source})) / Height_{Page}$

$RD_R = (Left(Word_{Right}) - Right(Word_{Source})) / Width_{Page}$

$RD_B = (Top(Word_{Bottom}) - Bottom(Word_{Source})) / Height_{Page}$

For example, $RD_L$ is calculated from the coordinates of the bounding boxes (the output of the text detection model): the distance from the Source bounding box to the Left bounding box, divided by the width of the image; the other parameters are computed analogously. Figure 3 shows an example of the coordinates of the bounding boxes.

Fig. 3. The example of coordinates of the bounding boxes
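The following sketch computes the four relative distances from axis-aligned boxes (x0, y0, x1, y1) with y growing downward, following the formulas above; the default value of 1.0 for a missing neighbour is an assumption.

def relative_distances(src, left, right, top, bottom, page_w, page_h):
    # Each argument is a box (x0, y0, x1, y1) or None if that neighbour
    # does not exist; a missing neighbour defaults to a full page away.
    rdl = (left[2] - src[0]) / page_w if left else 1.0    # Right(W_Left) - Left(W_Source)
    rdt = (top[3] - src[1]) / page_h if top else 1.0      # Bottom(W_Top) - Top(W_Source)
    rdr = (right[0] - src[2]) / page_w if right else 1.0  # Left(W_Right) - Right(W_Source)
    rdb = (bottom[1] - src[3]) / page_h if bottom else 1.0
    return [rdl, rdr, rdt, rdb]

# e.g. a source box with one neighbour directly to its left:
print(relative_distances((100, 40, 180, 60), (20, 40, 90, 60),
                         None, None, None, page_w=600, page_h=800))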



In addition, there is one more thing to pay attention to when building a graph for each receipt. In the image above, the lines between the text boxes are shown quite clearly, but on closer inspection one can notice that there is no connection between the two text boxes anticipé and le. The reason is that determining which text boxes are connected to each other needs to follow the rules below:
Firstly, consider the four sides (top, bottom, left, right) and determine the corresponding $RD_L$, $RD_R$, $RD_T$, $RD_B$ by selecting the text boxes with the closest distance.
The order of priority is from top to bottom and from left to right, and each direction has only one edge connecting to another text box.
In the example above, since the text boxes anticipé and fois have already been connected, there is no connection between anticipé and le, although the two text boxes fois and le are located directly below and have the same distance to the anticipé text box. Figure 4 is an illustration showing the nodes and edges of a receipt in the SROIE dataset [15].

Fig. 4. The nodes and edges of a receipt in the SROIE dataset

3.5 Spectral Graph Convolutional Network


We use the spectral graph convolutional network model for the node classification task. As presented in Sect. 2, spectral graph convolutions do not scan the input feature matrix by sliding a filter; they exploit the Convolution Theorem from signal processing, which states that an expensive convolution in the spatial domain can be computed as a cheap multiplication by first transforming the signals to the frequency domain.
Inference with a spectral GCN is slightly peculiar, since the model only works on a network with a fixed number of nodes. The idea is to use a semi-supervised learning setting: the model takes a partially-labeled dataset as input and outputs labels for all the nodes (including those that were unlabeled in the input).
Kipf et al. [16] proposed the following forward-propagation formula to model a spectral convolutional layer:

$H_{t+1} = \delta(L H_t W_t)$

where $H_{t+1}$ is the output tensor (feature matrix), $\delta$ the activation function, $L$ a normalization of the adjacency matrix of the graph, $H_t$ the input tensor (feature matrix), and $W_t$ the learnable weights.
Note that $L$ is not a learnable parameter and is the same for all layers. In Kipf's approximate convolutions on graphs, $L$ is precomputed according to

$L = D^{-1/2} A D^{-1/2}$

where $D$ is the diagonalized row-sum matrix and $A$ the adjacency matrix with self-connections added to each node. Replacing $L$ in the first formula by its components gives

$H_{t+1} = \delta\!\left(D^{-1/2} A D^{-1/2} H_t W_t\right)$

Using the advantages of the spectral graph convolutional network, we classify each node into the four entity fields: company name, company address, date and total cost.
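A small numpy sketch of this propagation rule, with self-connections added as in Kipf's formulation, is shown below; the tanh activation and the random inputs are illustrative.

import numpy as np

def gcn_layer(A, H, W, act=np.tanh):
    # One spectral graph convolution: act(D^{-1/2} (A + I) D^{-1/2} H W).
    # A: (n, n) adjacency, H: (n, f_in) features, W: (f_in, f_out) weights.
    A_hat = A + np.eye(A.shape[0])          # add self-connections
    d = A_hat.sum(axis=1)                   # diagonalized row sum
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = D_inv_sqrt @ A_hat @ D_inv_sqrt     # precomputed, not learnable
    return act(L @ H @ W)

rng = np.random.default_rng(0)
A = np.array([[0, 1], [1, 0]], float)       # two connected nodes
H = rng.normal(size=(2, 317))               # 317-d node features
H1 = gcn_layer(A, H, rng.normal(size=(317, 16)) * 0.1)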

4 Experiments

4.1 Dataset
We used the SROIE dataset in the experiments. SROIE (Scanned Receipts OCR and Information Extraction) is the dataset used in the RRC Competition at ICDAR 2019 [15]. It includes 3 subtasks: text detection, text recognition and key information extraction. Some receipts from this dataset are shown in Fig. 5.

Fig. 5. Some receipts of SROIE dataset [15]

When we analyzed the dataset, we saw that the SROIE dataset still has some cases where the annotated data is not correct, such as the labels of the text boxes. This is unavoidable and can easily confuse the model. To ensure effectiveness, we applied the following rules to limit false positives and filter false negatives (a sketch of these rules follows the list):
– For all text boxes in the image, set a threshold to filter out cases with low confidence scores, for example 0.7. If a score is lower than the threshold, re-label the box as other.
– For each receipt, there is only one text box with the label total or date. Taking the output with the highest probability for each text box can therefore lead to the appearance of many total/date text boxes; we simply keep the text box with the highest confidence score.
– In some cases, text boxes are mistakenly predicted to be total (e.g., text boxes with the content Total Cost, …) when the correct box is the numeric text box in the same row on the right. This wrong prediction is also partly due to the incorrectly annotated data set. The simplest way to handle this case is to take the text box with the highest confidence score for the total label and look to its right; if there is another text box there, reassign the label total to that text box.
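A sketch of the first two rules could look as follows; the label names and the 0.7 threshold follow the text, while the data layout is an assumption, and the right-alignment rule for total is omitted.

import numpy as np

def postprocess(probs, labels, threshold=0.7):
    # probs: (n_boxes, n_classes) class probabilities per text box;
    # labels: class names including "other".
    pred = [labels[i] for i in np.argmax(probs, axis=1)]
    conf = np.max(probs, axis=1)
    # Rule 1: low-confidence boxes are re-labeled as "other".
    pred = [p if c >= threshold else "other" for p, c in zip(pred, conf)]
    # Rule 2: keep only the highest-confidence box for "total" and "date".
    for field in ("total", "date"):
        idx = [i for i, p in enumerate(pred) if p == field]
        if len(idx) > 1:
            keep = max(idx, key=lambda i: conf[i])
            for i in idx:
                if i != keep:
                    pred[i] = "other"
    return pred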
We use the spectral Graph Convolutional Network model with the pytorch_geometric [17], Keras [18] and Tensorflow [19] libraries to train the model. The parameters used in the model are as follows: 4 GCN layers; the first layer learns 16 different filters (i.e., it can learn 16 different graph connection patterns) and the second layer learns 32 filters, recognizing 32 combinations of the 16 patterns learned in layer 1. This hierarchical compositionality property, also found in CNNs, gives the model the power to generalize to unseen layouts.
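A model sketch consistent with this setup, using the GCNConv layer of pytorch_geometric, is given below; the sizes of the third and fourth layers and the five output classes (the four fields plus other) are assumptions not fixed by the text.

import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class ReceiptGCN(torch.nn.Module):
    # 4-layer GCN: 16 filters in the first layer, 32 in the second, as
    # described above; the remaining sizes are illustrative assumptions.
    def __init__(self, in_dim=317, n_classes=5):
        super().__init__()
        self.conv1 = GCNConv(in_dim, 16)
        self.conv2 = GCNConv(16, 32)
        self.conv3 = GCNConv(32, 32)
        self.conv4 = GCNConv(32, n_classes)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        x = F.relu(self.conv2(x, edge_index))
        x = F.relu(self.conv3(x, edge_index))
        return self.conv4(x, edge_index)     # per-node class logits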
For each test receipt image, the extracted text is compared to the ground truth. An extracted text is marked as correct if both the submitted content and the category of the extracted text match the ground truth; otherwise, it is marked as incorrect. The F1 score is computed based on precision and recall. The results are shown in Table 1.

Table 1. The results of the model

| Extracted entities | Precision | Recall | F1 |
|---|---|---|---|
| Company name | 0.87 | 0.84 | 0.85 |
| Company address | 0.93 | 0.93 | 0.93 |
| Date | 0.94 | 0.96 | 0.95 |
| Total cost | 0.93 | 0.93 | 0.93 |
| Micro average | 0.9175 | 0.915 | 0.915 |

Looking at the results, some predictions on the test set are wrong for individual text boxes, but overall the results are quite good. Some results are shown in Fig. 6.

Fig. 6. The results of spectral GCN model

5 Conclusion

In this paper, we proposed the spectral Graph Convolutional Network model for information extraction from receipts, which achieved strong results. For evaluation purposes, we ran experiments on SROIE and evaluated by precision, recall and F1-score, and we applied further preprocessing of the dataset to obtain better results. The experiments show that the proposed model achieves promising results. In the future, we plan to explore other graph neural network models to find the best model for this task, and to combine GNN models with natural language processing techniques to find the optimal model for information extraction from receipts.

References
1. Hung, B.T., Tien, L.M.: Facial expression recognition with CNN-LSTM. In: Kumar, R.,
Quang, N.H., Kumar Solanki, V., Cardona, M., Pattnaik, P.K. (eds.) Research in Intelligent
and Computing in Engineering. AISC, vol. 1254, pp. 549–560. Springer, Singapore (2021).
https://doi.org/10.1007/978-981-15-7527-3_52
2. Hung, B.T., Semwal, V.B., Gaud, N., Bijalwan, V.: Violent video detection by pre-trained
model and CNN-LSTM approach. In: Singh Mer, K.K., Semwal, V.B., Bijalwan, V.,
Crespo, R.G. (eds.) Proceedings of Integrated Intelligence Enable Networks and Computing.
AIS, pp. 979–989. Springer, Singapore (2021). https://doi.org/10.1007/978-981-33-6307-6_
99
3. d’Andecy, V.P., Hartmann, E., Rusiñol, M.: Field extraction by hybrid incremental and a-
priori structural templates. In: 2018 13th IAPR International Workshop on Document
Analysis Systems (DAS), April 2018, pp. 251–256 (2018). https://doi.org/10.1109/DAS.
2018.29
4. Rusiñol, M., Benkhelfallah, T., dAndecy, V.P.: Field extraction from administrative
documents by incremental structural templates. In: 2013 12th International Conference on
Document Analysis and Recognition, August 2013, pp. 1100–1104 (2013). https://doi.org/
10.1109/ICDAR.2013.223
5. Schuster, D., et al.: Intellix – end-user trained information extraction for document archiving.
In: 2013 12th International Conference on Document Analysis and Recognition, August
2013, pp. 101–105 (2013). https://doi.org/10.1109/ICDAR.2013.28
6. Palm, R.B., Winther, O., Laws, F.: CloudScan - a configuration-free invoice analysis system
using recurrent neural networks. In: 2017 14th IAPR International Conference on Document
Analysis and Recognition (ICDAR), vol. 1, pp. 406–413. IEEE (2017)
7. Hung, B.T.: Combining syntax features and word embeddings in bidirectional LSTM for
Vietnamese named entity recognition. In: Balas, V.E., Solanki, V.K., Kumar, R. (eds.)
Further Advances in Internet of Things in Biomedical and Cyber Physical Systems. ISRL,
vol. 193, pp. 101–110. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-57835-0_9
8. Hung, B.T.: Document classification by using hybrid deep learning approach. In: Vinh, P.C.,
Rakib, A. (eds.) ICCASA/ICTCC -2019. LNICSSITE, vol. 298, pp. 167–177. Springer,
Cham (2019). https://doi.org/10.1007/978-3-030-34365-1_13
9. Zhang, S., Tong, H., Xu, J., Maciejewski, R.: Graph convolutional networks: algorithms,
applications and open challenges. In: Chen, X., Sen, A., Li, W.W., Thai, M.T. (eds.) CSoNet
2018. LNCS, vol. 11280, pp. 79–91. Springer, Cham (2018). https://doi.org/10.1007/978-3-
030-04648-4_7
10. Song, L., Zhang, Y., Wang, Z., Gildea, D.: N-ary relation extraction using graph state
LSTM. arXiv preprint arXiv:1808.09101 (2018)
11. Wang, S., Zhang, Y., Che, W., Liu, T.: Joint extraction of entities and relations based on a
novel graph scheme. In: IJCAI, pp. 4461–4467 (2018)
12. Marcheggiani, D., Titov, I.: Encoding sentences with graph convolutional networks for
semantic role labeling. arXiv preprint arXiv:1703.04826 (2017)

13. Gui, T., et al.: A lexicon-based graph neural network for Chinese NER. In: Proceedings of
the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th
International Joint Conference on Natural Language Processing (EMNLP-IJCNLP),
pp. 1039–1049 (2019)
14. Szegedi, G., Veres, D.B., Lendák, I, Horváth, T.: Context-based information classification
on Hungarian invoices. In: ITAT, pp. 147–151 (2020)
15. Huang, Z., et al.: ICDAR 2019 competition on scanned receipt OCR and information
extraction. In: 2019 International Conference on Document Analysis and Recognition
(ICDAR), pp. 1516–1520. IEEE (2019)
16. Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016)
17. Fey, M., Lenssen, J.E.: Fast graph representation learning with PyTorch geometric. In: ICLR
Workshop on Representation Learning on Graphs and Manifolds (2019)
18. Chollet, F.: Keras (2015). https://github.com/fchollet/keras
19. Abadi, M., et al.: Tensorflow: a system for large-scale machine learning. In: Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation, OSDI 2016, pp. 265–283 (2016)
An Improved Shuffled Frog Leaping Algorithm
with Rotating and Position Sequencing in
2-Dimension Shapes for Discrete Optimization

Kanchana Daoden(&)

Department of Smart Electronic Engineering, Faculty of Industrial Technology,


Uttradit Rajabhat University, Tha It, Thailand
kanchana.dao@uru.ac.th

Abstract. Various optimization algorithms have been proposed to solve the 2D cutting problem. The shuffled frog leaping algorithm (SFLA) is a heuristic search algorithm consisting of local exploration and global search. SFLA is improved here by enhancing the local exploration step, using BL and BLF to determine the best placement in the frog's position-moving stage of the leaping step, and then letting SFLA converge to the optimum point in the global exploration stage. The proposed approaches are called 1) the improved SFLA with BL and 2) the improved SFLA with BLF. After this improvement, SFLA with BLF is selected due to its apparent efficiency and compared with two other techniques, the Genetic Algorithm (GA) with BLF and the Adaptive No Fit Polygon (ANFP). The proposed algorithm is tested on instances of the ESICUP data set provided for the trial. The computational results show that the proposed algorithm performs best in terms of convergence time, followed by the GA with BLF and the ANFP. In terms of the percentage of waste area, on the other hand, the adaptive NFP method provided the lowest percentage of material waste, followed by the GA with BLF and finally SFLA with BLF.

Keywords: Shuffled frog leaping algorithm (SFLA) · Two-dimensional cutting and packing problem · Heuristic

1 Introduction

The cutting and packing problem is a significant problem in many fields and in the control systems of manufacturing processes [1]. The optimization problem involves finding the proper arrangement of pieces within a fixed area to minimize material waste and save production costs [2].
E. Hopper summarised the 2D packing problem with various meta-heuristics such as GA, simulated annealing, and naïve and stochastic optimization [3]. The efficiency of the algorithms was tested on packing items of different sizes and many polygon pieces in a limited area, and the comparative results of every technique were shown in efficiency graphs against the general techniques.
Baosu Guo presented better packing results by rotating the materials freely during the packing process [4]. An equal angle range is often used in existing packing algorithms regardless of shape characteristics, which may miss the best packing positions and reduce packing quality. That research solves this problem with an irregular packing algorithm based on a principal component analysis methodology: the main components of the convex shape are calculated, the rotation angles are then searched according to the first principal components, and the irregular shape is rotated accordingly. The results showed that filling times were reduced and material utilization improved with the proposed algorithm.
This paper is structured as follows: the introduction is presented in Sect. 1. In Sect. 2, the advantages of the BL algorithm, the BLF algorithm, and the original SFLA are explained. Then, in Sect. 3, the proposed method of improving the SFLA algorithm is described. Section 4 shows the results of the improved SFLA with BLF compared to the GA and ANFP. Section 5 contains the discussion, conclusion and future work.

2 SFLA with BL and BLF


2.1 BL (Bottom Left) and BLF (Bottom Left Fill)
In the cutting and packing problem, a placement algorithm is used to arrange items of different sizes [5]. E. Hopper presented the Bottom Left (BL) algorithm in 2000, developed for arranging 2D boxes in a container. BL moves each incoming box from the right and top side of the container toward the left and bottom side to gain more space, working both horizontally and vertically, and evolves an arrangement until all of the positions are satisfied.
The Bottom Left Fill (BLF) method develops the original BL method; the difference between BL and BLF is that BLF can later fill holes or empty spaces between rectangles, while the BL method cannot fill those gaps, so that space is wasted. An important property of these algorithms is that the sequence of rectangles can significantly impact the solution quality. Therefore, this study preferred the BLF algorithm as the tool to arrange 2D polygon items.
Figure 1(a) shows the BL algorithm. The polygons are sequentially placed in the order 1, 2, 3, 4, 5, 6, 7, and 8, with polygon number 8 placed at the upper right. After the Bottom Left (BL) algorithm is applied, however, worthless empty spaces remain. As Fig. 1(b) shows, the Bottom Left Fill (BLF) algorithm can resolve this space problem: it explores and keeps arranging items toward the bottom-left as much as possible, and also scans the remaining space to check whether it is adequate for the following box. Hence, when the 8th box comes in the row, BLF examines the space for it accordingly and places it appropriately. This method is thus quite helpful and efficient for allocating a space to the maximum extent. When a number of polygon items are to be ordered in the plane, BLF is simple and easy to apply to a limited placement layout or a large area.
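As a rough illustration of the BLF idea for axis-aligned rectangles, the brute-force Python sketch below scans candidate positions bottom-up and left-to-right; because the scan restarts at the bottom for every piece, earlier gaps can be filled, which is the BLF behaviour. The integer grid and the rectangle-only shapes are simplifying assumptions; real implementations track free gaps instead of scanning.

def blf_pack(rects, bin_width, step=1):
    # Place each rectangle (w, h) at the first non-overlapping position,
    # scanning rows bottom-up and columns left-to-right.
    placed = []                                       # (x, y, w, h)
    def overlaps(x, y, w, h):
        return any(x < px + pw and px < x + w and
                   y < py + ph and py < y + h
                   for px, py, pw, ph in placed)
    for w, h in rects:
        y = 0
        while True:
            for x in range(0, bin_width - w + 1, step):
                if not overlaps(x, y, w, h):
                    placed.append((x, y, w, h))
                    break
            else:
                y += step                             # no room in this row
                continue
            break                                     # piece placed
    height = max(y + h for _, y, _, h in placed)      # fitness: strip height
    return placed, height

layout, fitness = blf_pack([(4, 3), (3, 2), (2, 2), (2, 1)], bin_width=6)
print(fitness, layout)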

[Figure: the same eight pieces packed by BL (a) and BLF (b); BLF places piece 8 into a gap that BL leaves empty.]

Fig. 1. (a) BL algorithm. (b) BLF algorithm.

2.2 Shuffled Frog Leaping Algorithm (SFLA)


SFLA is a heuristic search method for solving discrete optimization problems by finding a near-optimum solution. SFLA was presented by Eusuff and Lansey [6] in 2006. The algorithm imitates the foraging behaviour of frogs. The frogs' initial population (F) is separated into subgroups after sorting the frogs in descending order of their fitness f(i).
Each subgroup is called a memeplex and consists of n frogs, so the total population equals F = m × n. The first frog is assigned to the first memeplex, the second frog to the second, and so on up to the mth frog in the mth memeplex; the (m + 1)th frog goes to the first memeplex again, until all the frogs are fully distributed among the memeplexes. After that, all of the frogs update their positions within each group; this process is called the local search.
The frog with the highest fitness value in a memeplex is called the best frog (Fb), and the one with the lowest is the worst frog (Fw). The evolution process starts by updating the position of Fw using the frog leaping process. The new position of the frog, moving from the worst toward the best, is called Fw(new). If it has a better fitness value than the old one, it replaces Fw; if not, the global best frog (Fg) is used instead of Fb in the above adjustment.
If the fitness value still does not improve, a new frog is generated randomly; this step is called censorship. Then all the frogs in the entire population are ranked descendingly by their fitness values; this is called the shuffling process. After that, the leaping process of the local search and the global exploration steps alternately take place until a predefined convergence criterion is satisfied or the fixed number of loops is reached. Then SFLA finishes with the evaluation step.

3 The Proposed Method

Given the advantages of the BL, BLF, and SFLA algorithms, this research applies the optimized height (the fitness value) produced by the BL and BLF placement process as the fitness function in SFLA, which also helps minimize the processing time of SFLA. The frog population of SFLA is therefore ranked by the height value in ascending order. Applying BL and BLF aims to examine the possible height that can fill the holes, minimizing the waste and increasing the quality of the process. The fitness function of both the BL and the BLF algorithms is to arrange the 2D polygon items while minimizing the height. The improved SFLA is shown in Fig. 2.

Algorithm ImprovingSFLA()
Begin
1   m: number of subgroups
2   n: number of frogs in each subgroup
3   N: number of local generations in each subgroup
4   The evaluation of SFLA
5   Begin
6     F: initial random frog population
7     Calculate the fitness of each frog by using BL and BLF
8     Rank all of the population (F) in ascending order
9     Explore Fbest (Fb) and Fworst (Fw) using Eq. (3)
10    Separate all of the population F into m subgroups
11    Inside each subgroup:
11.1    The shuffling and leaping process
          Explore Fb and Fw
          Move the frog position using Eq. (1)
          Compute the new position Fw(new) of the worst frog using Eq. (2)
          If Fw(new) is better than the current Fw, then exchange these two frogs;
          otherwise use Fg instead of Fb.
          If the condition is still not satisfied, randomly generate a
          new frog (censorship process).
          Do while the generation is not finished
11.2    Improving SFLA in the local exploration
          Update the positions of Fb and Fw by calculating D using Eq. (1)
          Apply the Euclidean distance in the local search:
          1. Update the frog position Fw(new), limited by Smax
          2. If Fw(new) is a duplicated (invalid) sequence, choose the
             closest valid sequence by Euclidean distance, using Eq. (4)
          3. Repeat until the sequence is finished
12    End if
13    Shuffle the evolution in the subgroups
14    Rank the population (F) in ascending order of fitness
15    Check until the generation condition is true
16  End of loop
Fig. 2. An improved SFLA algorithm.

3.1 The SFLA Model


Each frog encodes a sequence of box indexes, which both the BL and BLF methods use in their procedure. The algorithm arranges the 2D items to obtain a minimal height within a limited area, and this height is the fitness function in this case. The frogs update their positions from the worst frog (Fw) toward the best frog (Fb) in the leaping step. The improved SFLA checks the validity of each frog's sequence: within a sequence, each position number must appear exactly once. If the same number appears more than once, the frog's sequence is invalid, and we apply the new method presented here, since under the leaping rule of the original SFLA the new frog produced by the updating process according to Eq. (3) might result in an invalid sequence.

$\vec{D} = rand() \times (F_b - F_w)$  (1)

$F_w(new) = F_w + \vec{D}$  (2)

$F_w(new) = F_w + \min(\lVert rand \times (F_b - F_w) \rVert,\; S_{max})$  (3)

The position update is represented by the vector $\vec{D}$ in Eq. (1), where rand is a uniformly distributed random number in the range [0, 1], Fw(new) is the new position of Fw, and Smax is the maximum step size allowed in local exploration.

3.2 Leaping Process


As shown in Fig. 3, the worst frog attempts to jump toward the best frog. Define the random number; in this case, rand = 0.4105 and Smax = 4.

| Index | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Fb | 4 | 5 | 3 | 2 | 8 | 1 | 6 | 7 | 9 |
| Fw | 1 | 5 | 2 | 4 | 3 | 6 | 7 | 9 | 8 |
| Fb−Fw | 3 | 0 | 1 | −2 | 5 | −5 | −1 | −2 | 1 |
| Updating | 2.2 | 5.0 | 2.4 | 4.0 | 5.1 | 6.0 | 7.0 | 9.0 | 8.4 |
| Fw(new) | 2 | 5 | 2 | 4 | 5 | 6 | 7 | 9 | 8 |

Fig. 3. Example of invalid Fw(new) sequences.

Each number in the updating sequence, {2.2, 5.0, 2.4, 4.0, 5.1, 6.0, 7.0, 9.0, 8.4}, is rounded to the closest integer, giving Fw(new) = {2, 5, 2, 4, 5, 6, 7, 9, 8}. The new frog is shown in Fig. 4.

| Index | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Fw(new), invalid | 2 | 5 | 2 | 4 | 5 | 6 | 7 | 9 | 8 |

(The duplicated values 2 and 5 make the sequence invalid.)

Fig. 4. Example of invalid Fw(new) sequences.

Figure 4 shows the invalid sequence: some numbers appear more than once. In this experiment, number 2 appears twice, at locations 1 and 3, and number 5 at locations 2 and 5.

Consider index number 2 in Fig. 4: the value of the worst frog at this index is 5, which tries to move toward 5, so the only possibility is 5. In general, when the frogs move position, the value of the worst frog forms the lower bound and that of the best frog the upper bound. A valid sequence must consist of unique numbers, in this case 1 to 9. The invalid sequence in this example can be changed to a valid sequence if indexes 1, 2, 3, and 5 are filled with the numbers 3, 5, 2, and 1.
The repaired sequence is selected by means of the Euclidean distance, calculated via Eq. (4): the distance between p = (p1, p2) and q = (q1, q2) in 2-dimensional space. It is used to calculate the distance between each of the remaining possibilities and the original invalid sequence; the valid sequence with the smallest distance is then selected.

$d(p, q) = \sqrt{\sum_{i=1}^{2} (p_i - q_i)^2}$  (4)

The Euclidean distances from Fw(new) to the closest valid sequences are shown in Table 1; the shortest distances are d1 and d3.

Table 1. The Euclidean distances of the valid sequences closest to Fw(new).

| Sequence | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | Euclidean distance |
|---|---|---|---|---|---|---|---|---|---|---|
| d1 | 1 | 5 | 2 | 4 | 3 | 6 | 7 | 9 | 8 | 1.414 |
| d3 | 1 | 5 | 2 | 3 | 4 | 6 | 7 | 9 | 8 | 1.414 |

These calculated values are equal; therefore, either d1 or d3 can be selected as the new, improved location for Fw(new).
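The repair step can be sketched in Python as follows: conflicting positions are freed (here, keeping each value's first occurrence, which is one possible choice; the paper enumerates candidate positions differently), every assignment of the missing values is tried, and the candidate closest to the invalid sequence by Euclidean distance, generalized over all positions, is kept.

import numpy as np
from itertools import permutations

def repair(seq):
    # Brute-force repair of an invalid frog sequence; practical only for
    # a handful of conflicting positions.
    seq, n = list(seq), len(seq)
    missing = [v for v in range(1, n + 1) if v not in seq]
    seen, free = set(), []
    for i, v in enumerate(seq):
        if v in seen or not 1 <= v <= n:
            free.append(i)                    # duplicated or out of range
        else:
            seen.add(v)
    best, best_d = None, float("inf")
    for perm in permutations(missing):
        cand = seq[:]
        for i, v in zip(free, perm):
            cand[i] = v
        d = float(np.linalg.norm(np.subtract(cand, seq)))
        if d < best_d:
            best, best_d = cand, d
    return best

print(repair([2, 5, 2, 4, 5, 6, 7, 9, 8]))  # -> [2, 5, 1, 4, 3, 6, 7, 9, 8]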

3.3 Testing Data Sets


This research tested the three main variants of the algorithm mentioned above, using two sets of ESICUP instances with 2D irregular shapes available on the EURO Special Interest Group on Cutting and Packing (ESICUP) website (https://www.euro-online.org/websites/esicup/) [10].

4 The Experimental Results

4.1 Experimental and Parameters Setting


The parameter settings of the experiments reported in Figs. 5, 6 and 7 use an initial frog population of 100. We tested these parameters beforehand and chose the optimized set for the experiment: F = 100, m = 4, n = 25, q = 15, N = 1, i = {1, 2, 3, …, n} and Smax = 100%. The initial populations of the shuffled frog leaping algorithm (SFLA) and the genetic algorithm (GA) both start with 100 random individuals, where each SFLA individual consists of a sequence of two-dimensional shapes. For the GA parameters, the crossover rate is 0.5 and the mutation rate is 0.8.

4.2 The Experimental Results


A. Experimental Results on the Waste Area of 2D Shapes
The tests use the 2D irregular strip packing problems FU, from FUJITA/AKAGJI/KIROKAWA (1993), and JAKOBS, from JAKOBS (1996), of the ESICUP data set, in which each item differs in shape, width and height. The results are presented as follows.

[Figure panels, left to right: SFLA with BLF, GA with BLF, ANFP.]

Fig. 5. FU 11 piece of the dataset.

Fig. 6. Jakobs1 25 pieces of the dataset.

Fig. 7. Jakobs2 24 pieces of the dataset.



Each problem test consists of a sample set, a fixed material area, the number of items and the optimal height (fitness function). Figures 5, 6 and 7 show the arranged results comparing SFLA with BLF, GA with BLF, and ANFP on the three data sets, respectively: Fig. 5 shows the alignment results for the FU11 data set (11 shapes), Fig. 6 presents the arrangement of the Jakob1 data set, which has 25 shapes, and Fig. 7 shows the result for the 24 pieces of the Jakob2 data set. Visually assessing the results in Figs. 5, 6 and 7 shows that the ANFP method achieves the lowest waste area; the FU11 dataset contains 2D irregular shapes that are less complex and fewer in number than the Jakob1–2 datasets.

Table 2. The efficiency of the three different algorithm arrangements. The numbers show the waste area in percentage (%) after the arrangement; all three algorithms pack into the same fixed area.

| Example dataset | SFLA with BLF | GA with BLF | ANFP |
|---|---|---|---|
| Fu 9 pieces | 47.19 | 46.02 | 44.57 |
| Fu 24 pieces | 37.22 | 33.03 | 31.42 |
| Jakob1 5 pieces | 24.68 | 21.10 | 18.17 |
| Jakob1 10 pieces | 45.78 | 44.41 | 42.34 |
| Jakob1 20 pieces | 37.43 | 35.05 | 34.77 |
| Jakob1 25 pieces | 36.31 | 36.08 | 35.74 |
| Jakob2 8 pieces | 43.92 | 41.61 | 40.18 |
| Jakob2 16 pieces | 27.34 | 25.94 | 24.37 |
| Jakob2 24 pieces | 34.86 | 34.67 | 34.55 |
| Average | 37.19 | 35.32 | 34.01 |

When testing all three methods on the FU, Jakob1 and Jakob2 data sets with different numbers of parts, the resulting average percentages of waste area are shown in Table 2. The comparison found that the proposed algorithm (SFLA with BLF) returned 37.19%, the GA with BLF gave 35.32%, and the ANFP algorithm produced 34.01%.
B. Experimental Results on Convergence Rate
The efficiency of SFLA with BLF, GA with BLF, and ANFP is shown in Fig. 8. The y-axis is the average of the fitness function (the minimum height) and the x-axis is the iteration number. SFLA with BLF is drawn as a solid line with black dots, GA with BLF as a solid line with black triangles, and ANFP as a solid line with white dots. Over the first 20 iterations, SFLA with BLF shows a better convergence performance than GA with BLF and ANFP, respectively.

[Figure: average fitness function (y-axis) versus iteration number (x-axis) for SFLA with BLF, ANFP, and GA with BLF.]

Fig. 8. Convergence comparison of SFLA with BLF, GA with BLF, and ANFP over 20 iterations.

[Figure: the same convergence comparison extended to 100 iterations.]

Fig. 9. Convergence comparison of SFLA with BLF, GA with BLF, and ANFP over 100 iterations.

5 Conclusion

This paper proposed an application of the shuffled frog leaping algorithm (SFLA), improved with the Bottom Left (BL) and Bottom Left Fill (BLF) arrangement algorithms. The aim is to optimize time and minimize the waste area of the material. The results show the trend comparison between the three methods: SFLA with BLF gives the best performance in terms of processing time, GA with BLF the second best, and ANFP the last, as shown in Figs. 8 and 9. Nevertheless, considering the waste area after the arrangement process, ANFP presents a better result than GA with BLF and SFLA with BLF, giving waste-area percentages of 34.01, 35.32, and 37.19, respectively. Regarding applications, the performance examination on the cutting problem shows that industries manufacturing products under the condition of expensive production materials require careful planning of the material cutting layout; the ANFP method is probably the one to consider, as it minimizes waste, which means lower cost. For industries challenged by processing time, the SFLA with BLF method is preferred, while the GA with BLF method is suitable when neither speed nor waste is critical; in other words, it slightly balances time against lost material. In future work, we expect to improve the capabilities of SFLA-BLF in reducing the waste area of production materials or to develop ANFP to run faster.

Lean Procurement in an ERP Cloud Base

Adrian Chin-Hernandez1 , Jose Antonio Marmolejo-Saucedo1(B) ,


and Jania Saucedo-Martinez2
1
Facultad de Ingeniería, Universidad Panamericana, Augusto Rodin 498,
03920 Ciudad de México, Mexico
{0018607,jmarmolejo}@up.edu.mx
2
Facultad de Ingeniería Mecánica y Eléctrica, Universidad Autónoma de Nuevo
León, Av. Pedro de Alba s/n Cd. Universitaria, 66451 San Nicolás de los Garza,
Nuevo León, Mexico
jania.saucedomrt@uanl.edu.mx

Abstract. The purpose of this paper is to propose a program to be used in a cloud-based ERP, applying the principles of lean procurement and bringing the whole purchasing process under a single application. With the proposal suggested here, the process of ordering, approvals, receiving, and invoicing is intended to be clear and easy to manage across the whole purchase chain. The solution is aimed at medium and large companies, but it can be used by small companies if they are vendors of a big enterprise that can implement the proposed solution. This paper shows how, by using a cloud-based ERP solution, lean procurement can be easily implemented, providing examples of companies that have made this transition.

Keywords: Lean procurement · S&OP · MRP · Lean manufacturing · Cloud-based ERP · SAP · Ariba

1 Introduction
Lean manufacturing for procurement is a production practice that considers all the expenditure of available resources to bring economic value to the procurement process without any waste; this waste is the objective to reduce (Lestari 2015).
There are five main key elements in making a process lean:

– Value, which is given by the final customer
– The value stream, all the steps involved in the process to bring the product to the final client
– Flow, the value-adding steps in sequence that produce the article or provide the service
– Pull, letting the customer reach for your product
– Perfection, seeking excellence by reducing the waste

624 A. Chin-Hernandez et al.

Womack and Jones (1996) describe these elements as follows. The value is given by the customer; according to Decker and Stead (2008), products or services should be designed for and with clients, and be set at the right price. In the value stream, Rother and Shook (2003) include all the steps needed to produce the product or provide the service, even the ones that do not add value to the process. Perfection is reached when the ideal flow is identified and no more waste is produced; this is called perfection in lean thinking, and it is the main goal of this paper: to reduce all the waste in purchasing (Lestari 2015) by using Enterprise Resource Planning (ERP) tools that can be, and actually are, used by other companies.
Lean manufacturing is important because it reduces waste, increases efficiency, and improves competitiveness. The benefits of using lean thinking in procurement are:
– Improved lead times: a better relationship with the supplier, making it easy to see the status of the purchase orders and the requirements.
– Sustainability: less waste and better innovation in doing business.
– Employee satisfaction: provide employees not only with repetitive tasks but with a range of solutions for meeting the enterprise's requirements, giving them the time not only to react or to act, but to plan.
– Increased profits: productivity with less waste makes for a more profitable company.
Companies have now realized that a lean implementation (Afonso et al. 2021) will maximize value and reduce waste, saving money and yielding a better-quality product, satisfied customers, and improved processes.
In order to achieve customer satisfaction and stay in the market (Susanto 2018), a company needs to have all the inventory it will need to produce or to sell, and not to hold the inventory that is not required. If an enterprise can adapt to the necessities of the customer, the inventory levels need to be able to adapt as well.
Therefore the inventory and the reordering of the raw materials used in the production process are essential to meeting customer requirements. To adapt to change, innovation has to take place in the company; innovation is by far one of the most important competitive priorities in the current business context.
The supply base of any company plays a major role, since suppliers are the ones that initiate the production process or provide the tools or accessories necessary for delivering a service. For this reason innovation has to play a major role in the equation (Luzzini and Ronchi 2011).
In order to avoid a lack of material on the production line, the company can implement several methodologies. One starting point is a sales-and-operations-planning (S&OP) process, which is used by many businesses to coordinate efforts among finance, operations, marketing, and sales; in this plan the sales team has to come up with an estimate of how much it plans to sell in a period of time, and with this plan the whole company can plan accordingly (Tuomikangas and Kaipia 2014).

Fig. 1. Process overview: production planning and control.

Component delivery planning in supply chains is a crucial issue for companies. By optimizing component supplies, enterprises can save money and increase customer satisfaction (Hnaien et al. 2008). In order to have all the components available when they are required, a tool called Material Requirement Planning (MRP) can be used. The MRP proposes the purchase requisitions that are required, taking into account the lead time, the minimum and maximum stock levels, and the other master data. A purchase requisition can be converted into a purchase order with the approval of a higher level, as can be seen in Fig. 1.
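To make the MRP logic concrete, the sketch below shows how a purchase requisition might be proposed from projected stock, minimum/maximum levels, and lead time. It is a minimal illustration with hypothetical field names and values, not SAP's actual data model.

from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class MaterialMaster:            # hypothetical slice of MRP master data
    material: str
    min_stock: float             # reorder point
    max_stock: float             # replenish-up-to level
    lead_time_days: int

def propose_requisition(m: MaterialMaster, projected_stock: float, required_by: date):
    """Propose a purchase requisition when projected stock falls below the minimum."""
    if projected_stock >= m.min_stock:
        return None              # nothing to order
    return {
        "material": m.material,
        "quantity": m.max_stock - projected_stock,            # fill up to the maximum
        "order_by": required_by - timedelta(days=m.lead_time_days),
        "status": "awaiting approval",   # becomes a purchase order once approved
    }

item = MaterialMaster("Packaging film", min_stock=50, max_stock=300, lead_time_days=14)
print(propose_requisition(item, projected_stock=35, required_by=date(2022, 3, 1)))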
Users can have a complete view of every stage of product planning through an end-to-end solution that combines lean manufacturing processes with digital technologies. This involves everyone in the company, from designers to engineers, including those on the shop floor and the purchasers (Baecker 2012).
By reducing the extra steps in the manufacturing process, you start to apply the principle of identifying the value stream, by recognizing where your waste is. In order to provide the clarity and flexibility required, and to remove the extra steps, an Enterprise Resource Planning (ERP) system can be used; for the purposes of this paper, SAP Ariba and SAP S/4HANA.

2 Central Concept
Developing a strong business case is a prerequisite for a successful SAP implementation. An evaluation study of 120 SAP projects shows that 70% were initiated with a business case (Stevens 1998). A global survey reveals that 92% of the responding companies develop business cases for SAP implementation (Cooke and Peterson 1998). A solid business case will convince people to change (Al-Mashari and Zairi 2000).

As an integrated backend application, SAP ERP has tens of thousands of installations around the globe, focused on tracking and managing processes in both midsize and big enterprises (Rolia et al. 2009). Since its foundation in 1972, SAP's vision has remained to provide integrated applications that support real-time business processes. In its first applications, SAP ERP supported early business processes such as procurement, developing a purchasing solution. Customers and suppliers have taken advantage of the computing applications and interactions brought about by the changes of the Internet. Subscriptions to SAP solutions such as SAP Ariba and SAP S/4HANA Cloud are available via web browser, with only an internet connection, on almost any smart device (Chowdary 2019).
SAP Ariba is ground-breaking cloud-based software that recreates the process of buying, selling, and managing cash. It supports the end-to-end procurement process, from sourcing the materials from a fitting supplier through manufacturing the end product and selling it to the customers, choosing a qualified vendor according to the requirements, and generating value for spend across all the supply chains. Using SAP Ariba, the value stream or workflow is reduced by removing extra steps in the procurement process (Yarramalli et al. 2020).
The main goal of this research is to provide an integral solution for procurement and to reduce the unnecessary steps or waste, considering the following phases of internal acquisition to be improved:
– Ordering,
– Approvals,
– Receiving,
– Invoicing.
This can be seen in Fig. 2, and a minimal workflow sketch follows.
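As an illustration of these four phases, the document flow can be pictured as a simple state machine in which every later document stays linked to the purchase order. The sketch below mirrors the process described here; it is a hypothetical model, not an actual SAP Ariba API.

from enum import Enum, auto

class Stage(Enum):
    REQUISITION = auto()   # ordering: purchase requisition created
    APPROVED = auto()      # approvals: requisition converted into a purchase order
    RECEIVED = auto()      # receiving: goods receipt posted at the plant
    INVOICED = auto()      # invoicing: vendor invoice matched against the PO
    PAID = auto()

# Allowed transitions: every later document stays linked to the PO number.
NEXT = {
    Stage.REQUISITION: Stage.APPROVED,
    Stage.APPROVED: Stage.RECEIVED,
    Stage.RECEIVED: Stage.INVOICED,
    Stage.INVOICED: Stage.PAID,
}

def advance(stage: Stage) -> Stage:
    """Move a purchasing document one step along the chain."""
    if stage not in NEXT:
        raise ValueError(f"{stage.name} is a final stage")
    return NEXT[stage]

stage = Stage.REQUISITION
while stage is not Stage.PAID:
    stage = advance(stage)
    print(stage.name)           # APPROVED, RECEIVED, INVOICED, PAID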
Externally, the supplier will be able to track the request and provide the merchandise or the raw materials when needed, since they now have visibility of when the materials are required, where, and how many of them, together with a number that can be easily identified by them and by the company; this number is the Purchase Order (PO) number.
The waste or extra steps can be excess inventory, pointless operations, scrap, rework, or extra movements, even inside the storage location. The core feature of this concept is that by reducing waste activities, more resources are made available to concentrate on those activities that add value to the product or service (Lestari 2015).
This is achieved by unifying all the functions of procurement under one single tool, including not only the stock room but also the finance part and the suppliers. SAP Ariba is the world's leading e-purchasing platform, allowing buyers to act proactively and anticipate problems before they occur, both to avoid them and to prepare for them. SAP's vision is that, in the future, procurement departments could play a strategic role and make substantial impacts on their firms' and their ecosystems' revolution, reputations, and, overall, the way the company delivers (Allal-Chérif et al. 2021).

Fig. 2. Process overview: production planning and control. Dickersbach and Keller
(2011)

Fig. 3. Process overview: production planning and control.

In order to supervise the supply chain and collaborate with the vendors in an effective way, visibility over all the parts of the procurement process needs to be maintained. This visibility can be achieved with the proposed solution: being cloud-based, access can be granted to view the status of the procurement process.
The aim of this research is to propose a cloud-based ERP that reduces any extra steps in the purchase process.
Purchasing is a part of the supply chain that spans the whole business process, from planning the materials, finding vendors, and comparing quotations among the existing vendors, to transportation and the goods receipt in the warehouse, matching the supplier with the area that needs the materials or services, at the right time, quantity, and quality (Kappauf et al. 2011).
Determining the requirements is one of the key activities in purchasing, together with finding a supplier, comparing quotations, posting the goods receipt, and verifying the invoice, as can be seen in Fig. 3.
The proposed solution can reduce the number of different systems used in purchasing activities, helping to standardize the process and keeping all the relevant data under only one application: one application that can connect the supplier and the buyer, even when the supplier is using a solution that is different from the buyer's.

3 Solution
Any ERP requires a big effort to be implemented: it requires not only an important amount of money, depending on the size of the company, but also a lot of effort from the people who know the process, since they will be responsible for customizing the new system to the company. Any ERP is composed of different areas, sections, or modules, which cover all the processes of the enterprise. In order to be ready to implement SAP Ariba, all the master data needs to be in place and the configuration for the plants involved needs to be built; this includes the finance part and the structure of the company, such as how many storage locations exist in the whole company. There is no magic trick in a cloud-based ERP for improving the whole procurement chain: a study performed by the Dipartimento di Elettronica e Informazione in Italy indicates that implementation effort not only grows with the number of modules and submodules selected for implementation, but that ERP is found to require increasing resources to be implemented in larger companies and for a higher number of users, thus indicating that, while there is a technical component of effort that is independent of the organizational breadth of the project, each user adds an organizational component of costs (Francalanci 2001).

3.1 Ordering
Once requirements are collected, whether from the MRP as planned requirements or as non-planned requirements, the purchasing organization receives this information. It is not limited to materials; the ordering can include services. In the ERP this is called a Purchase Requisition, a document that contains a list of necessities. Most of the time this document is only managed by the company and not shared with the vendor. This part can be done in SAP Ariba or in the ERP SAP S/4HANA, with all the master data built beforehand. The reduction of extra activities starts here: all the materials, the vendor, and the definition of the company are already in the system, and even when the supplier is using a different solution, SAP Ariba is able to join both systems. Purchases of products that are not going to be stored in the stock room can also be made. This document then needs to be approved.

3.2 Approvals
A purchase requisition needs approval from the purchasing organization; it can be the next level above, or it can trigger an authorization workflow depending on the actual amount to be purchased. The approval converts the purchase requisition into a purchase order; this can be part of the configuration of the ERP. As commented before, in this cloud-based ERP the approval can be given on almost any device with a browser and an internet connection; it can be approved on a smartphone with a user and password. This makes things visible to the person responsible for approving the purchase, with all the relevant information: what is being approved, what the total amount is, where the product or services will be received, and who the vendor is, among others. The approval is linked with the ordering, that is, the purchase requisition. Activities are reduced by avoiding emails, extra systems, or paper where the approval would otherwise need to be done: all the information is under one application.

3.3 Receiving
After the purchase order is approved, the vendor will be able to see the request, including the material or service that needs to be shipped, and the vendor can choose a date on which the request will be delivered. When the merchandise or services are received at the plant, the receipt is registered in the ERP system; this activity must be done in SAP S/4HANA.

3.4 Invoicing
Once the merchandise is received at the plant, the vendor will be able to create the invoice for the products shipped. This step must be done with the information from the purchase orders, which come from the purchase requisitions, making all the links from the beginning and avoiding any possible confusion. With this, errors are expected to be fewer, reducing time and effort on both sides, vendor and buyer, and making payment clean and easy.
After the invoice is generated and validated by the vendor and the buyer, the payment can be made, avoiding the common mistakes made on both sides and making the process quick and efficient, with all the links from the purchase requisition through to the payment to the vendor.
So far we have only covered the execution of the purchase, but SAP Ariba can also include the planning, and not only the planning for the MRP execution; it can also come from the strategic planning that companies do at the beginning of the year. These steps are:

3.5 Contract Management


Planning enters through contract management: the price of the materials can be lower when purchasing more from the same vendor. Contract management can be handled in SAP Ariba, adding documents and renewal dates.

3.6 Request for Proposal


Among a pool of vendors that have been approved and created, SAP Ariba has the ability to compare the same article among vendors. Even if the prices are similar, the system can store supplier evaluations, so a grade can be assigned to each vendor.

3.7 Discover
SAP Ariba has its own network of suppliers, where other vendors for the same article or service can be found and easily added to your vendors, yielding better prices and more requests for proposal.

3.8 Register
The whole process of registering a new vendor can be done in SAP Ariba.

Fig. 4. Procurement process including strategic planning

3.9 Quality

Vendor qualifications, where they exist prior to registration, will be shown, with all the information in one place. This information can be used to decide not to buy from a certain vendor that has failed in the past.

3.10 Plan

Review the budget: what are your numbers, how much has been spent, and how much is remaining. Fast and easy, and included in the solution.

3.11 Analyze
Reports from different sources, all under just one application. These reports can be customized to obtain the information that is relevant to the strategic actions. The whole process is illustrated in Fig. 4.
The SAP Ariba Cloud Integration Gateway, enabled by SAP Cloud Platform Integration, creates new customer integrations with the world's largest business-to-business (B2B) network, delivering a simple and well-organized way to integrate with Ariba Network via standards other than cXML. It decreases the deployment time for providers to integrate with Ariba Network. It is currently only available for Ariba Network supplier-side integration (SAP 2017). A diagram with this explanation can be found in Fig. 5.

4 Success Stories

4.1 British Columbia


BC Hydro is a crown corporation, owned by the government and people of British Columbia and headquartered in Vancouver, Canada. Its mission is to provide its customers with reasonably priced power. To do that, BC Hydro relies on a wide supplier network. Among the issues the company was facing were many non-compliant invoices and many suppliers that were not paid on time, which affected the supplier relationships. BC Hydro started working with SAP Ariba to provide the suppliers with a cloud solution to manage orders and invoices centrally. With SAP Ariba solutions, the company was successful in improving the supplier relationships, as 93% of invoices were paid on time, and it saved $1 million annually in accounts payable costs as a result of early-payment offers. Some major benefits to BC Hydro of using SAP Ariba are: improved supplier relationships; more visibility for suppliers into invoice and payment status; accounts payable costs reduced by $1 million annually; compliance with internal and external business rules; and fewer errors (Methi and Tran 2019).

Fig. 5. Process overview: production planning and control.

4.2 Grainger

Grainger is one of the world's leading industrial suppliers, always seeking to improve the customer experience, working to reduce waste in all its processes, and trying to achieve a better way of relating to its customers. It is for this customer centricity that Grainger unified and integrated its procurement platform. Grainger was able to automate 2.7 million documents per year (SAP 2021).
"When our customers leverage SAP Ariba solutions to connect with Grainger, they see improved efficiency and productivity in their operations, streamlined processes, better visibility into data, and improved control. All of this results in real cost savings for both organizations." James Finn, Senior Director of E-Commerce, Grainger.

4.3 HH Global

Innovation means adapting and responding to challenges every time they appear, and HH Global is a marketing services company doing exactly that, improving its relationship with the environment by using sustainable product alternatives. HH Global is an important supplier and serves clients in over 54 countries. Associated with the world's most influential brands, the company offers creative services, marketing technology, and marketing production across print, digital, and retail media. Six years ago, seeing the benefits of managing procurement processes and the documentation for purchase orders and invoices electronically, HH Global committed to embracing electronic processes in its supply chain using SAP Ariba, developing a successful electronic transaction experience. After the implementation, SAP Ariba automated manual procurement processes, creating process improvements for the operations team by moving to electronic purchase orders, e-invoicing, and published and punch-out catalogs, and improved payment cycles, with around 25% of purchasing volume managed through SAP Ariba solutions (SAP 2021).

5 Conclusions
– Lean procurement is the process of procurement without any extra activities or waste, and this waste is the target to be reduced
– Five key elements are included in the lean culture: value, the value stream, flow, pull, and perfection
– Planning needs to be involved in the procurement process; using tools like S&OP and MRP will help to concentrate all the purchasing
– Using SAP S/4HANA to create the whole structure of the company, and all the master data that is required, is necessary; with this data, errors can be reduced, and a material that is needed in the production area will be planned and required on time
– With SAP Ariba, the procurement process can be simplified, under one platform with all the information from buyers and suppliers, making the process of ordering, approval, receiving, and invoicing easy, under just one program that can be accessed from multiple platforms
– Not only the execution of the procurement process but also the strategic planning can be included in SAP Ariba
– When a company has a lean procurement process, it can start to think about how to improve its own processes and not only spend its time executing purchase orders
– SAP Ariba helped BC Hydro to reduce accounts payable costs by $1 million annually and reduce errors in the procurement process
– Grainger was able to automate 2.7 million documents per year by using SAP Ariba
– HH Global improved payment cycles, with around 25% of purchasing volume managed through SAP Ariba

References
Afonso, T., Alves, A.C., Carneiro, P.: Lean thinking, logistic and ergonomics: synergetic
triad to prepare shop floor work systems to face pandemic situations. Int. J. Glob.
Bus. Competitiveness, 1–15 (2021)
Allal-Chérif, O., Simón-Moya, V., Ballester, A.C.C.: Intelligent purchasing: how artifi-
cial intelligence can redefine the purchasing function. J. Bus. Res. 124, 69–76 (2021)
Al-Mashari, M., Zairi, M.: The effective application of SAP R/3: a proposed model of
best practice. Logist. Inf. Manag. 13, 156–166 (2000)
Baecker, M.: Improving lean manufacturing with digital technologies. Auto Tech. Rev.
1, 12–13 (2012)

Chowdary, R.: SAP Ariba introduction (2019). https://blogs.sap.com/2019/11/24/sap-ariba-introduction/. Accessed 23 Sept 2021
Cooke, D.P., Peterson, W.J.: SAP Implementation: Strategies and Results (1998)
Decker, W.W., Stead, L.G.: Application of lean thinking in health care: a role in emer-
gency departments globally. BioMed. Central 1, 161–162 (2008)
Dickersbach, J.T., Keller, G.: Production Planning and Control with SAP ERP. Galileo
Press (2011)
Francalanci, C.: Predicting the implementation effort of ERP projects: empirical evi-
dence on SAP/R3. J. Inf. Technol. 16(1), 33–48 (2001)
Hnaien, F., Dolgui, A., Louly, M.A.O.: Planned lead time optimization in material
requirement planning environment for multilevel production systems. J. Syst. Sci.
Syst. Eng. 17(2), 132 (2008)
Kappauf, J., Lauterbach, B., Koch, M.: Procurement logistics. In: Logistic Core Oper-
ations with SAP, pp. 59–138. Springer, Heidelberg (2011). https://doi.org/10.1007/
978-3-642-18204-4_4
Lestari, L.L.Y.D.: Application value stream mapping to minimize waste in aircraft
industry, 4 (2015)
Luzzini, D., Ronchi, S.: Organizing the purchasing department for innovation. Oper.
Manag. Res. 4(1), 14–27 (2011)
Methi, G., Tran, L.: Implementing digital transformation of procurement process (2019)
Rolia, J., Casale, G., Krishnamurthy, D., Dawson, S., Kraft, S.: Predictive modelling
of SAP ERP applications: challenges and solutions. In: Proceedings of the Fourth
International ICST Conference on Performance Evaluation Methodologies and Tools,
pp. 1–9 (2009)
Rother, M., Shook, J.: Learning to See: Value Stream Mapping to Add Value and
Eliminate Muda. Lean Enterprise Institute, Brookline (2003)
SAP: Integrating SAP® Ariba® cloud solutions with SAP ERP and SAP S/4HANA® (2017)
SAP: How can the right procurement technology make buying simpler for customers?
(2021). https://www.ariba.com/-/media/aribacom/assets/pdf-assets/grainger-bts.pdf
Stevens, T.: Proof positive. Ind. Week 247(15), 22–28 (1998)
Susanto, R.: Raw material inventory control analysis with economic order quantity method. IOP Conf. Ser. Mater. Sci. Eng. 407, 012070 (2018). https://doi.org/10.1088/1757-899x/407/1/012070
Tuomikangas, N., Kaipia, R.: A coordination framework for sales and operations plan-
ning (S&OP): synthesis from the literature. Int. J. Prod. Econ. 154, 243–262 (2014)
Womack, J., Jones, D.: Lean Thinking: Banish Waste and Create Wealth in Your
Corporation, vol. 48 (1996). https://doi.org/10.1038/sj.jors.2600967
Yarramalli, S.S., Ponnam, R.S.M., Rao, G.R.K., Fathimabi, S., Madasu, P.: Digital
procurement on systems applications and products (SAP) cloud solutions. In: 2020
Second International Conference on Inventive Research in Computing Applications
(ICIRCA), pp. 473–477 (2020)
An Approximate Solution Proposal
to the Vehicle Routing Problem Through
Simulation-Optimization Approach

Jose Antonio Marmolejo-Saucedo(B) and Armando Calderon Osornio

Facultad de Ingeniería, Universidad Panamericana, Augusto Rodin 498,
03920 Ciudad de México, Mexico
jmarmolejo@up.edu.mx

Abstract. This study presents a proposal to design a resilient supply


chain for a company in the fuel sector in Mexico. The constant variation
in the geographic location and order quantities of the demand points
increases the operating and distribution costs of the supply network. For
this reason, the use of a dynamic simulation-optimization technique using
specialized software is proposed to test various network configuration
proposals. The location of a new distribution center and a vehicle routing
proposal are analyzed to optimize fuel distribution.

Keywords: Simulation-optimization · Distribution systems · Vehicle


routing problem · Supply chain resilience

1 Introduction and Literature Review

The distribution network, in the field of supply chain management, refers to the
steps that a product follows, from when it is received from the supplier until it is
made available to the customer. Distribution is a key factor in the profitability
of a company, as it has a direct impact on the cost and the consumer experience.
In this sense, many companies base their business model on the design of the
logistics network. It seeks to optimally configure all physical resources, which can
include a distribution center, a transport fleet, warehouses, platform crossings,
production plants or coordination with suppliers.
This study is carried out for a company in the fuel sector in Mexico. The
company distributes fuel to more than 50 demand points throughout the country.
The company has 28 distribution trucks with capacities of 20,000, 31,000 and
43,000 liters. The problem focuses on designing a resilient supply chain that,
due to the constant changes in demand points, can adapt its vehicular routing
to reduce costs. The supply chain design problem has been extensively discussed
by various authors. However, resilience as a criterion for designing supply chains
has been on the rise in recent years. With the Covid-19 pandemic, many studies
on supply chain resilience have been published.


Sun et al. (2021) address the complexity of the supply chain necessary for the distribution of the COVID-19 vaccine, which needs to be transported through a reliable cold-chain logistics network. The authors used the software anyLogistix to perform a real-world case study in Norway, specifically in the Oslo area and Viken county, because these areas have higher infection rates and a large number of municipalities where the COVID-19 vaccines are received and cold-stored. The authors had to consider the vaccine supply chain needs, limitations, and requirements to simulate the vehicle routing problem. They based the model development on a dynamic, realistic simulation environment and simulated a total of 12 scenarios in which different types of vehicles were used, including small box vans, small refrigerated trucks, and unmanned aerial vehicles. As parameters, they compiled information such as vehicle capacity, vehicle speed, transportation cost, CO2 emissions, shipping time, and processing time (inbound and outbound shipment), and specified the experiment duration for each period of time. After all data was processed and simulated, the authors' conclusions raised the importance of optimizing the fleet configuration and the individual vehicles' routing and scheduling in the cold-chain logistics system to improve responsiveness, cost-effectiveness, environmental impact, and equity in vaccine distribution.
Muravev et al. (2021) proposed a two-stage optimization of the main parameters of intermodal terminals using the AnyLogic simulation platform. They developed a set of hybrid simulation models to optimize the main parameters of intermodal terminals: an agent-based system-dynamics simulation model to evaluate the implementation of dry ports and achieve a stable state of the main parameters of intermodal terminals, and an agent-based discrete-event simulation model of the seaport-dry port system to clarify the obtained average benefits of the main dry port parameters as port managers make key decisions on investments in intermodal terminals. As part of the study discussion, and after a search of the field, the authors state that this research is the first study in terminal planning in the field of logistics and transportation. By proposing a system of ten main parameters to evaluate the operational performance of intermodal terminals, the authors achieved a sustainable state of the intermodal terminal. Studying the interaction of the parameters dynamically, by applying the established functional dependencies between the main parameters of the dry port and exploiting the capability to scale the simulation model through the proposed hybrid agent-based system-dynamics model, the authors conclude that, at the stages of real-time and operational management, the discrete-event simulation model is the most rational way to refine the values of the proposed parameters.
Another study is based on the optimization of the immunization supply chains (iSC) in Guinea, Madagascar, and Niger, since these three countries face different challenges including unused facilities, high logistics costs, and logistics constraints such as road inaccessibility, absence of roads, insecurity, facilities that have not been upgraded, insufficient human resources, et cetera; see Prosser et al. (2021). The main purpose of that paper is the optimization of the cold supply chain, reducing operational costs and the vaccine cost per dose, reviewing the findings and scenarios identified by stakeholders in each country, and highlighting common design ideas and differences in redesign options. As part of the methodology, the proposed scenarios include aspects of integration, changing supply chain levels and delivery frequency, ignoring administrative boundaries, and direct delivery. After primary and secondary data were collected and cleaned, the authors used the Supply Chain Guru software in Madagascar and Niger, and anyLogistix in Guinea, for modeling purposes, to build a virtual representation of the iSC physical components and operating policies. The results from these two modeling packages were duly reviewed and compared, finding that in Madagascar significant savings of $390,000 USD could be achieved on operational costs. In Guinea the best proposed scenario resulted in a minimal operational cost reduction of $16,500 USD and a reduced cost of $0.003 per dose. Finally, in Niger, by eliminating regional tiers, operational costs could be reduced by a maximum amount of $1.07 million USD; however, this scenario directly increases operational costs at the regional level, while scenario 4 reduces operational costs by $220,000 USD by eliminating regional tiers, establishing direct delivery from districts to health facilities, and integrating oxytocin. In the conclusions, the authors explain that there are similarities in the scenarios that stakeholders selected, but the results are directly affected by the different parameters and limitations each country faces. Efficiencies can be found through changes to the iSC design (Prosser et al. 2021).
Using the anyLogistix digital supply chain twin software, Burgos and Ivanov (2021) examined the impact of the COVID-19 pandemic on food retail supply chains and their resilience in Germany, considering lockdown/shutdown governmental measures, border controls, inventory-ordering dynamics in the SC, and customer behaviors; in total, 5 operational supply chain scenarios were simulated, duly analyzed, and compared with each other. The anyLogistix SC simulation and optimization software has been used for SC resilience analysis in the past with effective results, and in this specific case the development of discrete-event simulation models based on the timeline of the COVID-19 outbreak in Germany helped in understanding the challenges, parameters, needs, and requirements food retailers were facing in 2020. The simulation results allowed the authors to recommend evaluating the impact of disruptions, identifying worst-case scenarios from the outbreak, and analyzing real-time reports to develop measures to stabilize these situations, as well as elaborating a process plan to respond to SC challenges and involving key suppliers to enhance end-to-end visibility, enabling specific actions based on existing priorities, additional stock security, and redefined inventory strategies along with alternative supply sources and logistics improvements or different transportation options. An example of other applications can be seen in Timperio et al. (2020). Humanitarian logistics and logistics planning are fundamental elements for the effective provision of humanitarian assistance in response to natural disasters all around the world. The survival rate in disaster-hit areas is directly affected by both the availability of relief supplies and the speed of distribution operations. That paper refers to real-life
natural disasters in Indonesia, where high-risk exposure is latent and humanitarian assistance is required. As part of the methodology, the authors' main goal was to design a robust and resilient supply network for disaster preparedness, identifying three phases to be developed and analyzed as part of an integrated decision-support framework. By applying an integration of multi-criteria decision-making (MCDM), network optimization, stress-testing, and discrete-event simulation using the anyLogistix software, the authors obtained the optimum network configuration, proving that a total of 6 nodes were required to fulfill the emergency response while covering inventory needs at an optimal material transportation cost, achieving network relief efficiently and effectively. In the conclusions, and for future work, the authors invite the exploration of expanding the geographic scope to other ASEAN countries to take a cross-border logistics perspective, which could provide a different point of view and bring new challenges to be examined, analyzed, and resolved by these or other methods.
The humanitarian SC strategy is a branch of the SC not well known by everyone; the lack of studies and knowledge of the problems faced by NGOs in their logistics clusters, which have many limitations on the correct performance of their operations, constitutes a new and challenging opportunity to analyze humanitarian SC operations. Humanitarian SCs confront issues such as logistics disruptions, the procurement of enough food, fuel, shelter, and medical aid for survival purposes, and delivery limitations. In Stewart and Ivanov (2019), the authors proposed a design redundancy framework dealing with humanitarian SC design and risk management, analyzing the logistics cluster in Yemen using system-dynamics simulation and network optimization. By using the anyLogistix software for the implementation of design redundancy and the modeling of five SC designs, the authors give humanitarian SC managers a new opportunity to analyze multiple alternatives for operational continuity and resilience of the humanitarian SC strategy, ensuring that each alternative or contingency design sufficiently meets the KPIs of service level, cost-effectiveness, agility, and lead time well before those plans must be implemented. In the conclusions, the authors highlight the importance of implementing other simulation techniques to help decision-makers understand the impact of individual behaviors in the SC, as well as the impact of different critical events on performance.
In Monostori (2021), the author exalts the importance of the interrelationships of the robustness, complexity, and efficiency aspects of supply chains to support decisions related to their design and management. Their applicability is illustrated by the results of a case study on distribution networks, indicating that the three aspects can be balanced while mitigating the ripple effect. These aspects are analyzed with the intention of finding a way to balance them and of comparing different supply chain settings in a reliable way.
As part of the developed methodology, the author proposes a holistic evaluation of supply chains, considering disruptions, KPIs, a description of the entire SC, the quantitative characterization of structural and operational properties, the determination of efficiency measures relying on analytical computations or simulation, and the investigation of the appropriateness of the achieved performance. The NodeXL network analysis tool was used to compute the structural measures, while the operational and efficiency measures were determined using the anyLogistix software. After the data was input, run, and analyzed, the author determined that the proposed approach has clear practical relevance; by using production network research on natural disasters and pandemics, the methodology and the framework can be used in the (re)design, analysis, and management of supply chains and can be made capable of acting as a digital twin of them.
Finally, some studies focus on e-commerce. For example, the Amazon Effect is the powerful disruption that e-commerce has made on the retail market; the term came about as a result of Amazon's dominant role in the e-commerce marketplace, leading the disruptive impact on the industry (Wikipedia). Because of this effect, supply chain management has become more complicated, and the daily interaction with consumers has demanded that the supply chain become more agile in responding to market demand, more flexible in using different distribution and transportation channels, and technically up to date, upgrading its technical knowledge with new applications and technologies that can help the company be more nimble and not be left behind in the fierce competition between enterprises.
In Chakraborty and Das (2021), the authors suggest combining different technologies to help the supply chain management system by coupling TOPSIS (identifying consumer preferences), RFID or IoT (collecting and sharing event data), and cloud computing (managing big data), where consumer preferences receive prime importance, letting companies quickly adapt to demand and respond to volatility in the business and in consumer requests. The use of the anyLogistix software is proposed by the authors with the intention of optimizing the four tracking metrics mentioned in the paper (ELT, number of vehicles, vehicle rates, and safety stock) on a case-by-case basis.

2 Methodology
Procedures within the supply chain can be carried out, but by themselves they do not ensure the fulfillment of the desired objectives. Another major function in supply chain management needs to be considered: control, the process by which the planned performance is regulated, or remains regulated, with respect to the desired objectives. The control process is one in which actual performance is compared with planned performance and corrective action is initiated to bring them closer together whenever required.
The control process consists, in part, of monitoring changing conditions in anticipation that corrective action may be needed to align actual performance with the plan. The basic need for a control activity in supply chain management arises from future uncertainties that alter the performance of the established parameters. Variations in the design parameters will occur, as the multiple forces acting on the conditions of any plan cannot be predicted with certainty.
This study proposes the use of specialized software that allows designing and locating a new distribution center and subsequently generating the vehicle routing. The specialized software used was anyLogistix, which considers three methods to optimize the supply chain: greenfield analysis, network optimization, and simulation. These stages are described below.

2.1 Greenfield Analysis


The steps to follow in this model are the following:
1. Find the number of locations/facilities in the supply chain and their suggested location, with basic input data: customer locations, demand, and products.
2. Perform a multi-level greenfield analysis.
3. Perform an accurate analysis of the supply chain by converting the GFA results into a network optimization or dynamic simulation model.
In this stage, the optimal location of a new distribution center is identified that will reduce the costs of vehicle routing (a minimal sketch of the underlying calculation follows). The current location of the distribution centers is shown in Fig. 1.
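Although the study relies on anyLogistix for the greenfield analysis, its core idea for a single new facility can be sketched as a demand-weighted center of gravity over the customer coordinates. The sample points below are hypothetical, not the company's actual customers:

# Hypothetical customers: (latitude, longitude, annual demand in liters).
customers = [
    (25.66, -103.53, 1_200_000),
    (24.11, -104.56, 900_000),
    (23.20, -102.80, 600_000),
]

# Demand-weighted center of gravity as a first greenfield estimate.
total = sum(d for _, _, d in customers)
lat = sum(la * d for la, _, d in customers) / total
lon = sum(lo * d for _, lo, d in customers) / total
print(f"Suggested new DC location: ({lat:.4f}, {lon:.4f})")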

Fig. 1. Current distribution centers location

2.2 Network Optimization


The steps that this model follows are the following:
1. Determine the best supply chain network configuration (a toy sketch of this step follows the list).
2. Establish constraints to find a feasible solution.
3. Plan the supply chain by period to optimize where and how much to produce, store, and ship.
4. Feed the results of the analytical optimization into a dynamic simulation model for further analysis and a better understanding of the dynamics of the internal supply chain.
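As a toy illustration of step 1, assigning customers to distribution centers at minimum transport cost can be posed as a small linear program. The sketch below uses scipy's linprog with hypothetical unit costs, capacities, and demands:

import numpy as np
from scipy.optimize import linprog

# Hypothetical data: 2 distribution centers x 3 customers.
cost = np.array([[4.0, 6.0, 9.0],    # unit transport cost, DC1 -> customers
                 [7.0, 3.0, 5.0]])   # DC2 -> customers
capacity = [800.0, 700.0]            # DC throughput limits
demand = [300.0, 400.0, 500.0]       # customer demand

n_dc, n_cu = cost.shape
c = cost.ravel()                     # decision variables: flow[i, j], row-major

# Capacity rows: sum_j flow[i, j] <= capacity[i]
A_ub = np.zeros((n_dc, n_dc * n_cu))
for i in range(n_dc):
    A_ub[i, i * n_cu:(i + 1) * n_cu] = 1.0

# Demand rows: sum_i flow[i, j] == demand[j]
A_eq = np.zeros((n_cu, n_dc * n_cu))
for j in range(n_cu):
    A_eq[j, j::n_cu] = 1.0

res = linprog(c, A_ub=A_ub, b_ub=capacity, A_eq=A_eq, b_eq=demand,
              bounds=(0, None), method="highs")
print(res.x.reshape(n_dc, n_cu))     # optimal flows
print("minimum transport cost:", res.fun)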

2.3 Simulation

Time-dependent factors, random events, actual system behavior, and the dynamic interactions between the elements of the supply chain are analyzed. Processes are simulated within their locations: manufacturing processes, resources, scheduling, distribution center processes, designs, and costs. The key service-level process indicators, as well as the simulation of the operations, are shown in Fig. 2.

Fig. 2. Key performance indicators

2.4 Transportation Costs

The analysis presented below was performed using the aforementioned software. A relevant fact to take into account is that, thanks to the robustness of this software, it is possible to perform routing analysis using not only Euclidean distances but also real distances.
Figure 1 shows the current situation of the company; at first instance, from a visual analysis, it can be observed that the distribution centers the company currently has are too close to each other. The second part of the process is the network optimization, which will yield the company's annual net profit. Transportation costs are taken into account, including vehicle capacities and restrictions.
After performing the greenfield analysis, the optimal location of the company's DC is shown in Fig. 5.

2.5 Inventory Policy

There are various policies whose function is to regulate inventory, taking into account supply, demand, installed capacity, and the reorder point. In this study, the policy called min-max will be used, which consists of a minimum stock value at which an order is generated and a maximum value that represents the target stock level to which the reorder replenishes (a minimal sketch of this rule follows).
For the scenario analysis, the inventory policy mentioned above was considered, in which each scenario is a proposal of a different combination of the current distribution centers and the proposed one.
In each case, the storage capacity of the proposed distribution center is modified so that the sum of the capacities of all the distribution centers in each scenario totals 1,200 cubic meters. The resulting values are shown in the table of simulation statistics, where the profits, costs, and income are reported.
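A minimal sketch of the min-max rule described above, using the DC 1/DC 2 levels later listed in Table 4 as example parameters (in cubic meters):

def min_max_order(on_hand: float, s_min: float, s_max: float) -> float:
    """Min-max policy: when stock is at or below the minimum,
    reorder up to the maximum target level; otherwise order nothing."""
    return s_max - on_hand if on_hand <= s_min else 0.0

# DC 1 / DC 2 levels from Table 4: min 50 m3, max 300 m3.
for stock in (40.0, 50.0, 120.0):
    print(stock, "->", min_max_order(stock, s_min=50.0, s_max=300.0))
# 40.0 -> 260.0, 50.0 -> 250.0, 120.0 -> 0.0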

2.6 Data Collected


Currently, the vehicle fleet of the company is shown in Table 1.

Table 1. Vehicle fleet

Number of vehicles Capacity [liters]


8 43,000
7 31,000
6 20,000
7 66,000

The geographical location of the current distribution centers is shown in Table 2.

Table 2. Distribution centers location

No. Location Latitude Longitude


1 Current DC1 25.661795 −103.52851
2 Current DC2 24.109798 −104.55611

The capacities of the two current distribution centers, as well as the type of
product they sell are shown in Table 3.

Table 3. Distribution centers capacity

Distribution centers Storage tanks Capacity (liters) Product


DC 1 1 40,000 92 octanes
1 60,000 98 octanes
3 100,000 Diesel
DC 2 3 100,000 Diesel

Fig. 3. Customers

Fig. 4. Storage tank

Figures 3 and 4 show the distribution in the Mexican territory of the com-
pany’s clients and the physical image of a delivery terminal, respectively.

3 Results

After performing all the stages described in the methodology section, the results
are as follows: The greenfield analysis stage resulted in the location of a new
distribution center shown in Fig. 5.

3.1 Scenarios Proposed for the Simulation

Different scenarios were tested; however, the one that obtained the best results considers three distribution centers: the two the company already had and the new distribution center proposed by the greenfield analysis.
In each distribution center, min-max inventory policies were simulated, using the levels specified in Table 4.

Fig. 5. Distribution center proposal

Table 4. Proposed distribution centers capacity

Distribution center Max Min


DC 1 and DC 2 300 m3 50 m3
New DC 900 m3 200 m3

Figure 6 shows the inventory level of the warehouses, where the y-axis is the amount of fuel and the x-axis is the time period.
Figure 7 plots the service level, which is the proportion of demand satisfied from the available stock, and the service level by revenue, which is the proportion of demand satisfied from the available stock with respect to the total demand measured in revenue. The y-axis shows how close each is to the demand.

Fig. 6. Inventory levels of the proposal scenario



Fig. 7. Service level of the proposal scenario

3.2 Vehicle Routing Analysis Scenarios

Two vehicle routing scenarios are developed: the current one and a proposed one. The objective is to compare the distances and transport costs, and the benefits of opening a new distribution center are discussed.
Once the vehicle routing problem is solved, different routes can be observed at each distribution center. Each transport will have different destinations depending on the capacity of the vehicle. The purpose is to deliver the product to the customers in the area through a route, in order to save time and cost. Figure 8 shows the 3 current routes; type T31 vehicles with a capacity of 31,000 liters are used.
Figure 9 shows the proposed routes, also with T31 vehicles of 31,000 liters. In this case, the vehicle is considered to leave from and return to the distribution center obtained in the greenfield analysis.
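The routes in Figs. 8 and 9 were produced by anyLogistix; as an illustrative stand-in for the underlying vehicle routing step, a capacity-constrained nearest-neighbor heuristic for a 31,000-liter T31 vehicle can be sketched as follows, with hypothetical depot and demand-point data:

import math

def dist(a, b):
    """Euclidean distance (the study's software also supports real road distances)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def nearest_neighbor_routes(depot, orders, capacity=31_000):
    """Greedy VRP heuristic: keep visiting the nearest customer whose order
    still fits in the tank; return to the depot when nothing more fits."""
    pending = dict(orders)                      # name -> ((x, y), liters)
    routes = []
    while pending:
        pos, load, route = depot, 0.0, []
        while True:
            feasible = [(n, v) for n, v in pending.items() if load + v[1] <= capacity]
            if not feasible:
                break
            name, (xy, liters) = min(feasible, key=lambda kv: dist(pos, kv[1][0]))
            route.append(name)
            load += liters
            pos = xy
            del pending[name]
        if not route:                           # an order larger than the vehicle
            raise ValueError("order exceeds vehicle capacity")
        routes.append(route)
    return routes

# Hypothetical demand points: name -> ((x, y), order size in liters).
orders = {"A": ((1, 2), 12_000), "B": ((2, 1), 15_000),
          "C": ((5, 5), 20_000), "D": ((6, 4), 9_000)}
print(nearest_neighbor_routes(depot=(0, 0), orders=orders))   # [['A', 'B'], ['C', 'D']]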

Fig. 8. Current vehicle routing



Fig. 9. Vehicle routing of the proposal

Tables 5, 6 and 7 show the results obtained for the service level, profit and
revenue respectively. These results corroborate that the metrics with which the
company currently works have been exceeded.

Table 5. Service level of the proposal

Statistics name Value Unit


1 Service level 0.804 Ratio

Table 6. Benefits of the proposal

Statistics Value
1 Profit 231,034,134.825
2 Revenue 260,643,000

Table 7. Benefits of the proposed scenarios

Scenario Revenue Profit


1 $ 260,643,000.00 $ 231,034,134.83
2 $ 209,748,000.00 $ 180,762,184.25
3 $ 273,003,000.00 $ 240,515,458.80
4 $ 254,841,000.00 $ 226,931,365.07

4 Conclusions
The present study proposes the design of a resilient distribution network. The location of the current distribution centers and the fluctuating demand are considered for the proposed design. A vehicle routing is proposed that reduces operating costs and increases the service level and the expected benefits. The use of a tool specialized in supply network design allows solving a company's problems through a simulation-optimization approach.

References
Burgos, D., Ivanov, D.: Food retail supply chain resilience and the COVID-19 pandemic: a digital twin-based impact analysis and improvement directions. Transp. Res. Part E
Logistics Transp. Rev. 152, 102412 (2021). https://www.sciencedirect.com/science/
article/pii/S1366554521001794. https://doi.org/10.1016/j.tre.2021.102412
Chakraborty, B., Das, S.: Introducing a new supply chain management concept by
hybridizing topsis, IoT and cloud computing. J. Inst. Eng. (India) Ser. C 102(1),
109–119 (2021)
Monostori, J.: Mitigation of the ripple effect in supply chains: balancing the aspects
of robustness, complexity and efficiency. CIRP J. Manuf. Sci. Technol. 32, 370–
381 (2021). https://www.sciencedirect.com/science/article/pii/S1755581721000134.
https://doi.org/10.1016/j.cirpj.2021.01.013
Muravev, D., Hu, H., Rakhmangulov, A., Mishkurov, P.: Multi-agent optimization of
the intermodal terminal main parameters by using anylogic simulation platform: case
study on the Ningbo-Zhoushan port. Int. J. Inf. Manag. 57, 102133 (2021). https://
www.sciencedirect.com/science/article/pii/S026840121931789X. https://doi.org/
10.1016/j.ijinfomgt.2020.102133
Prosser, W., et al.: Redesigning immunization supply chains: results from three coun-
try analyses. Vaccine 39(16), 2246–2254 (2021). https://www.sciencedirect.com/
science/article/pii/S0264410X21003182. https://doi.org/10.1016/j.vaccine.2021.03.
037
Stewart, M., Ivanov, D.: Design redundancy in agile and resilient humanitarian supply
chains. Ann. Oper. Res. 1–27 (2019)
Sun, X., Andoh, E.A., Yu, H.: A simulation-based analysis for effective distribution
of COVID-19 vaccines: a case study in Norway. Transp. Res. Interdisc. Perspect.
11, 100453 (2021). https://www.sciencedirect.com/science/article/pii/S2590198221
001585. https://doi.org/10.1016/j.trip.2021.100453
Timperio, G., Tiwari, S., Lee, C.K., Samvedi, A., de Souza, R.: Integrated decision sup-
port framework for enhancing disaster preparedness: a pilot application in Indonesia.
Int. J. Disaster Risk Reduction 51, 101773 (2020). https://www.sciencedirect.com/
science/article/pii/S2212420920312759. https://doi.org/10.1016/j.ijdrr.2020.101773
Hybrid Connectionist Models to Investigate
the Effects on Petrophysical Variables
for Permeability Prediction

Mohammad Islam Miah and Mohammed Adnan Noor Abir

Department of Petroleum and Mining Engineering, Chittagong University


of Engineering and Technology, Chittagong 4349, Bangladesh
islam.m@cuet.ac.bd

Abstract. An accurate model of reservoir rock permeability is a vital task for reservoir characterization, geo-modeling, and simulation studies during gas/oil field development. The research objective is to investigate the effects of log variables and to develop a correlation for predicting permeability (K) in clastic sedimentary rocks using machine learning tools. The log data-driven hybrid models are developed by coupling petrophysical log variables with the least-squares support vector machine. In the model development, the global approach of coupled simulated annealing is adopted to optimize the kernel parameters, and the performance of the predictive models is evaluated using statistical indicators. According to the research findings, the improved model presented to estimate K captures the most influential log variables, such as resistivity and gamma ray, for shaly sand formations. The developed correlation is compared with other models and demonstrates a high correlation coefficient and minimal error. The connectionist model development scheme can be employed for further assessment to obtain reservoir properties in reservoir characterization, and for field development decisions, with minimal cost and in a timely manner.

Keywords: Machine learning · LSSVM-CSA · Log variables · Permeability modeling · Reservoir characterization

1 Introduction

The ability of fluid to flow in porous media, named permeability, is one of the most important rock properties in reservoir characterization and in the quality assessment of hydrocarbon-bearing reservoirs. An accurate absolute permeability profile permits a precise simulation study for effective oil and gas reservoir management. To overcome the challenging problems in petrophysics, the most reliable and accurate ways to estimate permeability are surface laboratory core analysis, downhole well testing, and well-logging analysis. However, the first technique is costly and normally applied to reservoir core samples from only a few wells of a field, while well logging is performed downhole in all wells [1, 2]. Additionally, rock permeability can be obtained from correlations using petrophysical rock properties. Owing to this research attention, many scholars have attempted to develop empirical

correlations/models using a downhole testing log and core data for predicting rock
permeability in reservoir characterization [1–6]. In the last few decades, several studies have presented the advantages of machine learning (ML) techniques with different algorithms, such as the artificial neural network, adaptive neuro-fuzzy inference system, dynamic neural network, decision trees, random forest and least-squares support vector machine (LSSVM), for engineering and science disciplines [7–10]. A rising tendency is found among researchers to implement ML algorithms in petroleum engineering to solve problems in numerous fields, including reservoir engineering, petroleum production optimization, reservoir characterization and enhanced oil/gas recovery techniques [11, 12]. Machine learning and intelligent computing-based studies have been surveyed for permeability estimation [13]; considering the literature review, there is still scope for a detailed investigation of rock permeability estimation and variable ranking by coupling LSSVM and log parameters.
The key objectives of this research are: i) to investigate the effects of petrophysical log variables on rock permeability prediction and to find their relative impact in the LSSVM-based connectionist models, and ii) to develop an improved correlation to estimate permeability for clastic sedimentary rocks.

2 Research Methodology

2.1 Development of LSSVM-Based Hybrid Connectionist Model


The LSSVM is a modified version of the support vector machine (SVM) algorithm; it is an effective intelligent computing method that can be applied to data classification with high-dimensional variables and to prediction [11, 14]. Detailed information related to the model formulation, advantages and disadvantages can be found in the literature [14, 15]. The following LSSVM function can be defined by considering the classic SVM mathematical formulations [15, 16]:

$$y(x_i) = \omega^{T} \varphi(x_i) + b, \quad \text{where } x_i \in \mathbb{R}^{n},\ y_i \in \mathbb{R} \qquad (1)$$

Finally, the following equation can be adopted for the LSSVM function estimation with the weight factors (α) and bias term (b):

$$y = \sum_{i=1}^{n} \alpha_i\, K(x, x_i) + b \qquad (2)$$

For computational simplicity and the capability of solving nonlinear problems, the Gaussian radial basis function (RBF) kernel is utilized in the hybrid LSSVM scheme [17]. In the hybrid model, coupled simulated annealing (CSA), a global optimization technique, is applied with LSSVM to optimize the tuning parameters σ² and γ [18–20]. A generalized configuration of the kernel function-based LSSVM structure applied to obtain rock permeability (K) in this study is shown in Fig. 1. A flow chart demonstrating the major steps in developing the hybrid model by coupling field log data and the LSSVM-CSA machine learning tool is given in Fig. 2.

Fig. 1. A generalized structure for RBF-based LSSVM to predict K.

The total set of 265 log data samples is randomly divided into two major groups, training (65%) and testing (35%), in the Matlab programming environment by a trial and error process to construct the hybrid models with LSSVM-CSA. The data samples for each variable (gamma-ray (GR), formation bulk density (RHOB), rock resistivity (RT), neutron porosity (NPHI), and sonic compressional travel time (DT)) are obtained from an anonymous gas field of the Bengal Basin. All data samples are applied to investigate the hybrid model performance in estimating K by coupling log data and the machine learning approach.
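To make the training step concrete, the following is a minimal sketch of LSSVM fitting and prediction in Python, assuming the standard LSSVM dual formulation with a Gaussian RBF kernel. The random arrays, the fixed 65/35 split and the reuse of the paper's optimized γ and σ² values as constants are placeholders only; the CSA tuning and the Matlab tooling used in the study are not reproduced here.

import numpy as np

def rbf_kernel(A, B, sigma2):
    # Gaussian RBF kernel, K_ij = exp(-||a_i - b_j||^2 / sigma2)
    # (some formulations divide by 2*sigma2; the scaling here is an assumption)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / sigma2)

def lssvm_fit(X, y, gamma, sigma2):
    # Solve the LSSVM linear system [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]
    n = X.shape[0]
    K = rbf_kernel(X, X, sigma2)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]          # alpha, b

def lssvm_predict(Xtr, alpha, b, sigma2, Xnew):
    # y(x) = sum_i alpha_i K(x, x_i) + b, i.e. Eq. 2
    return rbf_kernel(Xnew, Xtr, sigma2) @ alpha + b

rng = np.random.default_rng(0)
X = rng.normal(size=(265, 5))       # stand-ins for GR, RHOB, RT, NPHI, DT
y = rng.normal(size=265)            # stand-in for permeability K
idx = rng.permutation(265)
tr, te = idx[:172], idx[172:]       # roughly 65% training / 35% testing
alpha, b = lssvm_fit(X[tr], y[tr], gamma=6.14e8, sigma2=79.11)
K_pred = lssvm_predict(X[tr], alpha, b, 79.11, X[te])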

2.2 Assessment of Hybrid Model Performance


Three major statistical performance indicators, namely the correlation coefficient (CC), root mean square error (RMSE) and average absolute percentage relative error (AAPE), are adopted to examine the predictive model performance. The mathematical equations of these indicators are listed below [12]:
$$CC = 1 - \frac{\sum_{i=1}^{n}\left(K_{a,i} - K_{p,i}\right)^{2}}{\sum_{i=1}^{n}\left(K_{a,i} - K_{a,mean}\right)^{2}} \qquad (3)$$

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(K_{a,i} - K_{p,i}\right)^{2}} \qquad (4)$$

Fig. 2. A flowchart to depict the major steps for developing the LSSVM-CSA models [12].
$$AAPE = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{K_{a,i} - K_{p,i}}{K_{a,i}}\right| \times 100 \qquad (5)$$

In Eqs. 3 through 5 above, n indicates the total number of samples; $K_{a,i}$ represents the actually measured rock permeability (K), whereas $K_{a,mean}$ and $K_{p,i}$ are the mean value of $K_a$ and the predicted permeability, respectively. The predictive model performance, accuracy and output reliability are examined through the magnitudes of these statistical performance parameters. The model performs best when it has a high CC (close to one) and low statistical errors.
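The three indicators translate directly into code; a short sketch following the paper's formulas (note that Eq. 3, as written, has the form of a coefficient of determination rather than Pearson's r):

import numpy as np

def cc(k_actual, k_pred):
    # Eq. 3: 1 - sum((Ka - Kp)^2) / sum((Ka - mean(Ka))^2)
    ss_res = np.sum((k_actual - k_pred) ** 2)
    ss_tot = np.sum((k_actual - np.mean(k_actual)) ** 2)
    return 1.0 - ss_res / ss_tot

def rmse(k_actual, k_pred):
    # Eq. 4: root mean square error
    return np.sqrt(np.mean((k_actual - k_pred) ** 2))

def aape(k_actual, k_pred):
    # Eq. 5: average absolute percentage relative error, in percent
    return np.mean(np.abs((k_actual - k_pred) / k_actual)) * 100.0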

2.3 Predictor Variables Sensitivity Evaluation


A systematic approach is employed to investigate the hybrid model with the petrophysical parameters and their effects on permeability estimation, following Figs. 1 and 2 above. Furthermore, the effects of the predictor variables on permeability are assessed through their relative performance, measured with the statistical indicators, in the connectionist models in order to rank the variables. In this LSSVM-CSA based ranking of petrophysical predictor variables, a model outcome with high CC and low AAPE and RMSE indicates that the selected variable has a high impact on permeability prediction. Finally, only the most prominent variables are used to build an improved correlation for determining rock permeability by coupling field data and multivariable regression analysis [1].
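The ranking loop of the single-variable model schemes (schemes 1 through 5 in Sect. 3.2) can be sketched as follows, continuing the LSSVM and metric sketches above (X, y, tr, te, lssvm_fit, lssvm_predict, cc and aape are reused); the tuning constants are placeholders, since the paper optimizes γ, σ² and b per scheme with CSA (Table 4).

variables = ["GR", "RHOB", "RT", "NPHI", "DT"]
ranking = []
for j, name in enumerate(variables):
    Xj = X[:, [j]]                                   # one predictor per model scheme
    a_j, b_j = lssvm_fit(Xj[tr], y[tr], gamma=1.0, sigma2=1.0)  # placeholder tuning
    pred = lssvm_predict(Xj[tr], a_j, b_j, 1.0, Xj[te])
    ranking.append((name, cc(y[te], pred), aape(y[te], pred)))

# Higher CC with lower AAPE indicates a higher impact of that predictor on K
ranking.sort(key=lambda r: r[1], reverse=True)
for name, c, e in ranking:
    print(f"{name}: CC = {c:.3f}, AAPE = {e:.1f}%")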

3 Results and Discussions

3.1 Data Analysis


The studied samples are obtained from the shaly sand and tight sandstones of the Bengal Basin. The descriptive statistics of the predictor variables, namely true resistivity (Rt), gamma-ray (GR), bulk density (RHOB), sonic travel time (DT) and neutron porosity (NPHI), as well as the target variable, permeability (K), are summarized in Table 2. The magnitudes of the 265 studied samples vary considerably from sample to sample owing to burial depth as well as the composition and diagenesis of the reservoir rocks.

Table 2. Descriptive statistics of the studied samples.


Variables Max. Min. Mean St. Dev. Sample variance
DT (µs/ft) 97.40 73.89 90.91 4.35 18.96
GR (API) 126.12 76.28 98.31 9.25 85.50
Rt (ohm-m) 39.70 10.68 21.31 4.93 24.34
RHOB (g/cm³) 2.52 2.30 2.38 0.04 0.002
NPHI (v/v) 20.81 11.82 16.52 1.54 2.39
K (mD) 176.70 5.68 70.18 28.1 789.56

3.2 Hybrid Connectionist Model and Effects of Petrophysical Log Variables
The Gaussian RBF kernel is adopted in the model development using LSSVM with the global CSA optimization approach for predicting reservoir permeability. By adopting the CSA algorithm, the optimized annealing parameters γ, b and σ² are 6.14e+08, 9.01 and 79.11, respectively. The optimized LSSVM-CSA model reveals a CC of 99% with nominal statistical error. To capture the variable characteristics, the correlation matrix between the petrophysical log variables and reservoir permeability is recorded in Table 3.

Table 3. Correlation between log variables and rock permeability.


Variables Rt GR RHOB NPHI DT K
Rt 1
GR −0.261 1
RHOB −0.369 0.525 1
NPHI −0.328 0.068 −0.229 1
DT 0.258 −0.110 0.422 −0.598 1
K 0.269 −0.349 0.316 −0.835 0.618 1

Based on the tabulated results, permeability has a positive relationship with RHOB, DT and Rt, whereas K has a negative relationship with GR and NPHI. Using linear regression with a logarithmic trend, rock permeability shows a significant correlation with the formation bulk density and sonic travel time of the studied log data; for bulk density the coefficient of determination is about 70% (R² = 0.6995), as shown in Fig. 3.
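A matrix such as Table 3 is a standard Pearson correlation matrix; a minimal sketch with random stand-in data in place of the field logs:

import numpy as np
import pandas as pd

# Hypothetical data frame; the columns mirror Table 3
df = pd.DataFrame(np.random.default_rng(1).normal(size=(265, 6)),
                  columns=["Rt", "GR", "RHOB", "NPHI", "DT", "K"])
print(df.corr().round(3))       # pairwise Pearson correlations, as in Table 3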

Fig. 3. A cross plot of rock permeability, K (mD), versus formation bulk density, RHOB (g/cm³), with R² = 0.6995.

To obtain permeability using classic regression analysis, the gamma-ray and rock resistivity make only minor contributions, with coefficients of determination of about 10% and 1%, respectively. Additionally, the optimized LSSVM-CSA model is implemented to perform a petrophysical parametric sensitivity evaluation for K using the same samples. To investigate the effect of each predictor variable in estimating K, model schemes 1 through 5 each use a single variable, while model scheme 6 investigates the combined effect of the two major predictor variables, RHOB and DT. The performance of the data-driven predictive models in terms of RMSE, AAPE (%) and CC (%) is listed in Table 4, and a graphical comparison of the models is shown in Fig. 4.

Table 4. Relative impact of predictor variables to obtain K with LSSVM-CSA.

Model scheme | Predictor variable(s) | Tuning parameters γ, σ² and b (iterations) | AAPE training (testing) | RMSE training (testing) | CC training (testing)
1 | GR | 664100, 5.43, −309.41 (14) | 52.93 (53.94) | 26.22 (26.16) | 16.93 (3.50)
2 | RHOB | 6710000, 18.26, 139.55 (12) | 18.97 (14.87) | 15.74 (13.22) | 70.08 (75.27)
3 | NPHI | 48.81, 1.3562, 0.18 (12) | 38.61 (45.15) | 22.24 (22.74) | 33.98 (40.35)
4 | DT | 2.84, 13.95, −0.57 (11) | 31.58 (31.77) | 21.77 (22.95) | 37.96 (39.11)
5 | Rt | 785.61, 4.23, 6.12 (11) | 60.24 (47.81) | 27.22 (22.82) | 15.11 (14.36)
6 | RHOB, DT | 6520, 72.42, 13.96 (8) | 17.65 (13.72) | 14.24 (11.79) | 76.55 (78.02)

Model schemes 1 and 5 show poorer performance, with higher statistical errors and lower CC, when using the GR and Rt variables to obtain K, whereas RHOB and DT have a high impact on permeability prediction using the LSSVM-CSA approach. The NPHI variable has an intermediate impact on estimating K for the studied sedimentary rocks. In the LSSVM-CSA model development strategy, the most significant predictor variables (in order from higher to lower) are RHOB, DT, NPHI, Rt and GR for predicting the rock permeability of shaly sand reservoir rocks.
By adopting only the two significant log variables RHOB and DT to estimate K, model scheme 6 reveals excellent performance, with low errors and high CC values of 0.9550 and 0.9113 for the training and testing phases, respectively, compared with model schemes 1 through 5. Moreover, these two petrophysical log variables are essential in capturing the actual behaviour of rock bulk density, which relates to the rock void space, and of compressional wave propagation in the porous formation, when predicting log-based reservoir permeability.

Fig. 4. A comparison of statistical error (AAPE) and CC performance (%), for training and testing, across model schemes 1 through 6.

The following log data-based model, with the corresponding CC and RMSE magnitudes, is fitted using 256 reliable data samples to obtain K through power-law based multivariate regression analysis:

$$K = 10.79\,\frac{DT^{3.49}}{RHOB^{16.13}} \qquad (\text{Power law: } RMSE = 10.69;\ CC = 82.47\%) \qquad (6)$$

Furthermore, the obtained rock permeability correlation can be employed in reservoir quality evaluation and in geomodelling for simulation studies during reservoir characterization, in oil/gas field development as well as in the petroleum exploration stages.
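A power-law correlation of this form can be fitted by linearizing with logarithms and applying ordinary least squares. A hedged sketch with synthetic stand-in data follows (the study's actual samples and regression tooling are not reproduced; the synthetic K simply echoes Eq. 6 with noise):

import numpy as np

rng = np.random.default_rng(2)
DT = rng.uniform(74, 97, 256)                        # sonic travel time, us/ft
RHOB = rng.uniform(2.30, 2.52, 256)                  # bulk density, g/cm^3
K = 10.79 * DT**3.49 / RHOB**16.13 * rng.lognormal(0.0, 0.1, 256)

# Linearize: ln K = ln a + b ln DT + c ln RHOB, then solve by least squares;
# c comes out negative, matching the RHOB denominator in Eq. 6
M = np.column_stack([np.ones_like(DT), np.log(DT), np.log(RHOB)])
coef, *_ = np.linalg.lstsq(M, np.log(K), rcond=None)
a, b, c = np.exp(coef[0]), coef[1], coef[2]
print(f"K = {a:.2f} * DT^{b:.2f} * RHOB^{c:.2f}")    # cf. Eq. 6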

4 Conclusions

Data-driven hybrid models are developed to predict the vital rock property of permeability by coupling petrophysical log variables (rock resistivity together with gamma-ray, neutron porosity, formation bulk density and sonic travel time) with the radial basis kernel function-based LSSVM-CSA. In the study, the statistical parameters AAPE, RMSE and CC are applied to investigate model performance and also to assess the effects of the variables on rock permeability. The major research outcomes of this investigation are listed as follows:

• Based on the simulation study with the data-driven hybrid models, rock bulk density and sonic compressional travel time are the most influential predictor variables for estimating the permeability of reservoir rocks.
• A log-based correlation is presented to estimate rock permeability by adopting the rock bulk density, which relates to the void space, and/or the sonic compressional travel time; it exhibits a high correlation coefficient (CC of 80–90%).
• Researchers can utilize the proposed model for rock simulation studies in reservoir characterization. It can also be compared and verified against ensemble machine learning techniques using big data samples.

Acknowledgements. The authors would like to thank the Department of Petroleum and Mining Engineering (PME) and the Directorate of Research & Extension (DRE), Chittagong University of Engineering & Technology (CUET), Bangladesh for providing research grants (Project No. CUET/DRE/2020-2021/PME-001) and experimental facilities to accomplish the project study.

References
1. Balan, B., Mohaghegh, S., Ameri, S.: State-of-the-art in permeability determination from
well log data: Part 1-A comparative study, model development. In: SPE Eastern Regional
Meeting 1995 Sep 17. Society of Petroleum Engineers (1995)
2. Hamada, G.M., Elshafei, M.A.: Neural network prediction of porosity and permeability of
heterogeneous gas sand reservoirs using NMR and conventional logs. Nafta 61(10), 451–
465 (2010)
3. Lim, J.-S, Kim, J.: Reservoir porosity and permeability estimation from well logs using
fuzzy logic and neural networks. In: SPE Asia Pacific Oil and Gas Conference and
Exhibition. OnePetro (2004)
4. Naseryan Moghadam, J., Salahshoor, K., Kharrat, R.: Intelligent prediction of porosity and permeability from well logs for an Iranian fractured carbonate reservoir. Petrol. Sci. Technol. 29(20), 2095–2112 (2011)
5. Alobaidi, D.A.: Permeability prediction in one of Iraqi carbonate reservoir using hydraulic flow units and neural networks. Iraqi J. Chem. Petrol. Eng. 17(1), 1–11 (2016)
6. Miah, M., Sohrab, Z., Ahmed, S.: Log data-driven model and feature ranking for water
saturation prediction using machine learning approach. J. Petrol. Sci. Eng. 194, 107291
(2020)
7. Vardian, M., et al.: Porosity and permeability prediction from well logs using an adaptive
neuro-fuzzy inference system in a naturally fractured gas-condensate reservoir. Energy
Sourc. Rec. Util. Environ. Eff. 38(3), 435–441 (2016)
8. Alhendawi, K.M., Al-Janabi, A.A., Jehad, B.: Predicting the quality of MIS characteristics
and end-users’ perceptions using artificial intelligence tools: expert systems and neural
network. In: International Conference on Intelligent Computing & Optimization. Springer,
Cham (2019). https://doi.org/10.1007/978-3-030-33585-4_3
9. Basnin, N., Lutfun, N., Hossain, M.S.: An integrated CNN-LSTM model for micro hand
gesture recognition. In: International Conference on Intelligent Computing & Optimization.
Springer, Cham (2020). https://doi.org/10.1007/978-3-030-68154-8_35
10. Miah, M., Ahmed, S., Sohrab, Z.: Connectionist and mutual information tools to determine
water saturation and rank input log variables. J. Petrol. Sci. Eng. 190, 106741 (2020)

11. Miah, M.I.: Predictive models and feature ranking in reservoir geomechanics: a critical
review and research guidelines. J. Nat. Gas Sci. Eng. 82, 103493 (2020)
12. Miah, M.I.: Improved prediction of shear wave velocity for clastic sedimentary rocks using
hybrid model with core data. J. Rock Mech. Geotech. Eng. 13(6), 1466–1477 (2021)
13. Okon, A.N., Adewole, S.E., Uguma, E.M.: Artificial neural network model for reservoir
petrophysical properties: porosity, permeability and water saturation prediction. Model. Earth Syst. Environ. 7(4), 2373–2390 (2020). https://doi.org/10.1007/s40808-020-01012-4
14. Smola, A.J., Schölkopf, B.: A tutorial on support vector regression. Stat. Comput. 14(3),
199–222 (2004)
15. Suykens, J.A.K., Van Gestel, T., De Brabanter, J., De Moor, B., Vandewalle, J.: Least Squares Support Vector Machines. World Scientific Publishing, Singapore (2002)
16. Vapnik, V.N.: An overview of statistical learning theory. IEEE Trans. Neural Networks 10
(5), 988–999 (1999)
17. Kondori, J., et al.: Hybrid connectionist models to assess recovery performance of low
salinity water injection. J. Petrol. Sci. Eng. 197, 107833 (2021)
18. Pelckmans, K., et al.: LS-SVMlab: a Matlab/C toolbox for least squares support vector machines. Tutorial, KU Leuven, ESAT, Leuven, Belgium (2002)
19. Parvizi, S., Kharrat, R., Asef, M.R., Jahangiry, B., Hashemi, A.: Prediction of the shear wave
velocity from compressional wave velocity for Gachsaran formation. Acta Geophys. 63(5),
1231–1243 (2015)
20. Xavier-de-Souza, S., Suykens, J.A.K., Vandewalle, J., Bollé, D.: Coupled simulated annealing. IEEE Trans. Syst. Man Cybern. Part B (Cybernetics) 40(2), 320–335 (2010)
Sustainable Environmental, Social
and Economic Development
Application of Combined SWOT and AHP
Analysis to Assess the Reality and Select
the Priority Factors for Social and Economic
Development (a Case Study for Soc Trang
City)

Dang Trung Thanh and Nguyen Huynh Anh Tuyet

Thu Dau Mot University, Thu Dau Mot, Binh Duong, Vietnam
thanhdt@tdmu.edu.vn

Abstract. In this study, an analysis of Strengths, Weaknesses, Opportunities and Threats (SWOT) based on the quantitative Analytic Hierarchy Process (AHP) has been conducted systematically to propose priorities among SWOT factors. This method compares the identified SWOT factors in pairs. After that, the eigenvalue method applied in the AHP is used to analyze the comparison matrices to identify priorities and assign the importance of the SWOT factors. The results select the proposed priority factors, based on their weights, for the overall socio-economic development goal of Soc Trang city in the period from 2021 to 2030, including the following factors: taking advantage of investment capital (16.8%), promoting the advantages of water resources (14.6%), paying attention to future challenges of changing watershed flows (5.3%), overcoming the shortage of water in some places (4.5%) and low cultivation techniques (4.5%). The application of SWOT and AHP techniques is useful in studying overall socio-economic development master plans and helps planners to better analyze and forecast the factors affecting development requirements.

Keywords: SWOT · AHP · Socio-economic development · Soc Trang city

1 Introduction

A good socio-economic development plan for a region, province or district is the result of accurate calculation and prediction of internal and external resource factors. SWOT (Strengths, Weaknesses, Opportunities, and Threats) analysis is a well-known technique for analyzing both internal strengths and weaknesses and external opportunities and threats [1, 2].
SWOT analysis is commonly used to attain a systematic approach and support for making decisions in consideration of internal and external factors. However, SWOT does not determine the importance of factors or evaluate alternatives among them [3].
Soc Trang city was established under Decree No. 22/2007/ND-CP dated February
8, 2007 of the Government based on the whole area and the population of Soc Trang
town established in April 1992 when Soc Trang province was re-established [4].


With advantages of its location, in the center of road traffic hubs such as National
Highway 1, Highway 60, two national highways of Southern Hau River and Quan Lo -
Phung Hiep, it is easy for Soc Trang city to connect with major economic centers (Can
Tho city, Ho Chi Minh city) and the southwestern provinces. Besides, the waterways of the Maspero and Saintard Rivers run to Dai Ngai, so it is very easy to access Cai Con and Cai Cui ports in the north and Tran De seaport in the south. Soc Trang City is a
political, economic, cultural, scientific and technical center, an important economic
exchange hub of the province [4].
In order to explore the benefits of SWOT-AHP techniques in studying strategic
decisions for socio-economic development master planning, the study of “Application
of combined SWOT and AHP analysis to assess the reality and select the priority factors
for social and economic development (a case study for Soc Trang city)” is conducted.

2 Research Methods
2.1 Data Collection Method
Secondary information was collected from the Office of the People's Committee of Soc Trang City and other departments and sectors such as Planning - Finance, Natural Resources and Environment, Agriculture and Rural Development, and Infrastructure Economics. It includes the natural and socio-economic conditions that impact management and the socio-economic development status over the past 10 years.
A field survey was conducted from March 20 to April 4, 2020 in 9 of the 10 wards of Soc Trang city, specifically wards 2, 3, 4, 5, 6, 7, 8, 9 and 10. Ward 1 was not surveyed due to its small total natural area (only 29.32 of 7,606.86 ha, accounting for 0.39% of the natural area of Soc Trang city). This is the only ward with no agricultural land; the entire natural area of the ward is used for residential land, infrastructure land and other non-agricultural purposes.
Opinions were also exchanged with managers, experts and scientists with expertise in economics, natural resources, urban management and construction.

2.2 SWOT Analysis


SWOT is an acronym formed from the first letters of the words Strengths (S), Weaknesses (W), Opportunities (O) and Threats (T).
The SWOT technique is used as a systematic approach to decision making by analyzing the external and internal environment [1, 5, 6].
Strategic factors are the most important internal and external factors for the future of a research project. These factors are grouped into four SWOT groups: strengths, weaknesses, opportunities, and threats. The best fit between the internal and external factors is selected by SWOT analysis [6]. Furthermore, the chosen strategy must also be consistent with the decision maker's current and future development goals [7]. SWOT systematically and comprehensively considers factors relating to a new technology, product, management approach or plan. The sequence of the SWOT analysis is shown in Fig. 1.

Fig. 1. Implementation progress diagram [8]: an environmental scan is split into internal analysis (strengths, weaknesses) and external analysis (opportunities, threats), which together form the SWOT matrix.

2.3 Analytic Hierarchy Process (AHP)

Concept. AHP is a decision-making method; based on the ordering of the criteria that it produces, the decision maker can make the most reasonable final decision [9–11]. AHP (Analytic Hierarchy Process) is a technique that supports decision making by providing a ranking of solutions and recommending the best decision. By applying AHP, decision makers can better understand their problems and discover what is best for them. AHP is closely associated with decision criteria, and decision makers apply the pairwise comparison method to identify the trade-offs between goals [12].
Steps of the Procedure
Analysis: Select the criteria to be studied, arrange them hierarchically and remove the less important ones. Each criterion is assigned to an appropriate level and analyzed based on its importance. The process is then repeated so that the problem is treated objectively. The criteria are then placed into a matrix, which manages the problem vertically and horizontally under the standard hierarchy of weights. As the number of criteria increases, the importance of each individual indicator decreases, which makes the analysis of the research problem more precise.
Weight: Each criterion carries a weight based on its importance in the whole system; the weight of each criterion can be determined through the expert system. The weights of all criteria must sum to 100%, or equal 1. This weight expresses the importance of each criterion, i.e. how much it affects the research problem.
Evaluate: Select and compare each criterion with the other criteria to evaluate how they affect the research problem.
Selection: After evaluating the research criteria, compare, select and eliminate the criteria that have little influence on the research problem, in order to best fit the requirements [12].

How AHP is Calculated: The questions asked are of the form: how many times more profitable, satisfying, contributing or surpassing is X1 than X2, X3, …, Xn? Here X1, X2, X3, …, Xn are the factors affecting the object. The questions are very important; they must reflect the relationship between the components of one level and the properties of the next higher level [12].
A rating scale from 1 to 9 is used, as shown in Table 1.

Table 1. Saaty's relative importance classification

Scale | Concept | Explanation
1 | Equal importance | Criteria contribute equally to the objective
3 | Weak importance | Experience and judgement slightly favor one criterion over another
5 | Strong importance | Experience and judgment strongly favor one criterion over another
7 | Demonstrated importance | Criterion is strongly favored and its dominance is demonstrated in practice
9 | Absolute importance | Importance of one criterion over another affirmed on the highest possible order
2, 4, 6, 8 | Intermediate levels between the above levels | Used to represent compromise between the priorities listed above

Expert Opinion Matrix

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \cdots & \cdots & \cdots & \cdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \qquad (1)$$

where the rows and columns correspond to the factors X1, X2, …, Xn, and $a_{ij}$ is the level of assessment between the i-th criterion and the j-th criterion, with $a_{ij} > 0$, $a_{ji} = 1/a_{ij}$ and $a_{ii} = 1$.
Let $w_{ij}$ be the normalized weight of the i-th factor in the j-th column; it is calculated according to the following formula:

$$w_{ij} = \frac{a_{ij}}{\sum_{k=1}^{n} a_{kj}} \qquad (2)$$
Then we get matrix 2 as follows (Table 2):



Table 2. Weight matrix


X1 X2 X3 X4 X5 X6 X7
X1 w11 w12 w13 w14 w15 w16 w17
X2 w21 w22 w23 w24 w25 w26 w27
X3 w31 w32 w33 w34 w35 w36 w37
X4 w41 w42 w43 w44 w45 w46 w47
X5 w51 w52 w53 w54 w55 w56 w57
X6 w61 w62 w63 w64 w65 w66 w67
X7 w71 w72 w73 w74 w75 w76 w77

Table 3. Average weight matrix


Factors Weights
X1 w1
X2 w2
X3 w3
…… ……

To calculate the weights of the factors, we build the matrix of averages of the vector weights from matrix 2; this gives matrix 3, as in Table 3.
For matrix 3 to be reliable, we need to calculate the consistency ratio (CR):

$$CR = \frac{CI}{RI} \qquad (3)$$

where CI is the consistency index and RI is the random index, determined from a given table (Table 4).

$$CI = \frac{\lambda_{max} - n}{n - 1} \qquad (4)$$

where $\lambda_{max}$ is the maximum eigenvalue of the comparison matrix and n is the number of factors. The maximum eigenvalue of the comparison matrix is calculated according to the following formula:

$$\lambda_{max} = \frac{1}{n}\sum_{i=1}^{n} \frac{w'_i}{w_i} \qquad (5)$$

Table 4. The random index corresponds to the number of factors (RI)


n 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
RI 0.00 0.00 0.58 0.90 1.12 1.24 1.32 1.41 1.45 1.49 1.51 1.48 1.56 1.57 1.59
(Source: [13])

where $w'_i$ is calculated by multiplying the weight matrix (matrix 2) by the weight vector of matrix 3 (assuming there are 7 factors):

$$\begin{pmatrix} w'_1 \\ w'_2 \\ \vdots \\ w'_7 \end{pmatrix} = \begin{pmatrix} w_{11} & w_{12} & \cdots & w_{17} \\ w_{21} & w_{22} & \cdots & w_{27} \\ \vdots & \vdots & \ddots & \vdots \\ w_{71} & w_{72} & \cdots & w_{77} \end{pmatrix} \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_7 \end{pmatrix}$$
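The AHP computations of Eqs. 1 through 5 condense into a short routine. The sketch below uses the simple column-normalization and row-averaging scheme of Eq. 2 (an approximation to the full eigenvector solution) and, when fed the SWOT-group matrix of Table 6, approximately reproduces the reported weights (0.366, 0.143, 0.371, 0.120) and CR = 6.32%. Function and variable names are illustrative.

import numpy as np

RI = {1: 0.00, 2: 0.00, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32}  # Table 4

def ahp_weights(A):
    # A: positive reciprocal pairwise comparison matrix on the 1-9 scale (Table 1)
    n = A.shape[0]
    W = A / A.sum(axis=0)            # column-normalize each entry (Eq. 2)
    w = W.mean(axis=1)               # average across rows -> priority vector (Table 3)
    lam_max = np.mean((A @ w) / w)   # Eq. 5 with w' = A w
    CI = (lam_max - n) / (n - 1)     # Eq. 4
    CR = CI / RI[n]                  # Eq. 3
    return w, CR

# SWOT-group comparison matrix from Table 6 (order: S, W, O, T)
A = np.array([[1.0, 3.0, 1.0, 3.0],
              [1/3, 1.0, 0.25, 2.0],
              [1.0, 4.0, 1.0, 2.0],
              [1/3, 0.5, 0.5, 1.0]])
w, CR = ahp_weights(A)
print(w.round(3), f"CR = {100 * CR:.2f}%")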

2.4 Combined Model SWOT - AHP


The proposed method comprises three steps:
Firstly, listing the internal factors (strengths and weaknesses) and external factors (opportunities and threats) for strategic planning.
Secondly, comparing the factors in pairs to identify the weights of each SWOT group.
Finally, applying the AHP to determine the relative priorities of each factor within the SWOT groups, then multiplying the local weights by the specific group weight to obtain the overall weights.
SWOT analysis combined with AHP has been researched as a way to rank the factors and to provide quantitative information in strategic planning [5, 14]. This method has been widely studied and applied in different areas, for example evaluating management strategies for a forestland estate [3], strategic marketing planning for tourism revival in Sri Lanka [15], strategic planning for natural resource management [16], and an integrated water resource management strategy in Mozambique [17].

3 Results and Discussions


3.1 Evaluation and Selection of Factors
The main objective of using AHP in combination with the SWOT framework is to evaluate all the SWOT factors by assigning weights corresponding to their importance [3]. In this research, the AHP structure results from the SWOT matrix and is organized into three parts: (a) the goal to be achieved, (b) the four SWOT groups, and (c) the factors of each SWOT group (sub-criteria). The hierarchical structure of the SWOT matrix is shown in Fig. 2.
In this study, we carried out the SWOT analysis combined with AHP for the study area, covering the socio-economic situation of Soc Trang city. The comparison scale is used to compare each pair of SWOT factors and determine their relative importance. By digitizing the SWOT frame via AHP and the obtained aggregated matrix, the weights of the analyzed groups and factors are derived.

Fig. 2. Hierarchical structure of the SWOT matrix [17]: the goal at the top level, the four SWOT groups (S, W, O, T) at the middle level, and the factors of each group (S1…Sn, W1…Wn, O1…On, T1…Tn) at the bottom level.

To create a strategic management model based on SWOT-AHP, we established a three-step model: determining the initial task, modifying the SWOT model, and building the evaluation model.
Firstly, a SWOT analysis is conducted and the matrix is structured. The external and internal factors of the research object are identified to establish the SWOT matrix (Table 5).
The AHP technique is then applied to the SWOT matrix. First, pairwise comparisons of the SWOT groups are made using the comparison scale of 1–9 (Table 1); Table 6 shows the results of this comparison. Second, all factors within each SWOT group are compared. All pairwise comparisons were done by experts (Tables 6, 7, 8, 9 and 10).

Table 5. SWOT matrix


Strengths (S) Opportunities (O) Weaknesses (W) Threats (T)
(S1) (O1) (W1) (T1)
(S2) (O2) (W2) (T2)
(S3) (O3) (W3) (T3)
(S4) (O4) (W4) (T4)
(W5) (T5)

Table 6. Pairwise comparison of the SWOT groups


SWOT S W O T Importance degrees
S 1.000 3.000 1.000 3.000 0.366
W 0.333 1.000 0.250 2.000 0.143
O 1.000 4.000 1.000 2.000 0.371
T 0.333 0.500 0.500 1.000 0.120
CR = 6.32%

Table 7. Comparison matrix of strengths group

Strengths | S1 | S2 | S3 | S4 | S5 | Weights
(S1) Location, convenient for transportation | 1.000 | 0.700 | 0.200 | 0.500 | 0.167 | 0.056
(S2) Good quality land, potential for production conversion | 1.429 | 1.000 | 0.167 | 0.200 | 0.167 | 0.054
(S3) Abundant water resources | 5.000 | 6.000 | 1.000 | 3.000 | 2.000 | 0.398
(S4) Production diversity: rice, fruit trees, aquatic products | 2.000 | 5.000 | 0.333 | 1.000 | 0.200 | 0.136
(S5) Abundant labor | 6.000 | 6.000 | 0.500 | 5.000 | 1.000 | 0.356
CR = 7.32%

Table 8. Comparison matrix of weaknesses group

Weaknesses | W1 | W2 | W3 | W4 | W5 | W6 | Weights
(W1) Soft ground | 1.000 | 2.000 | 0.200 | 0.200 | 0.500 | 0.250 | 0.056
(W2) Acidic soil in some areas | 0.500 | 1.000 | 0.167 | 0.167 | 0.500 | 0.200 | 0.040
(W3) Lack of fresh water in the dry season | 5.000 | 6.000 | 1.000 | 1.000 | 6.000 | 2.000 | 0.316
(W4) Farming techniques are not high | 5.000 | 6.000 | 1.000 | 1.000 | 6.000 | 2.000 | 0.316
(W5) Untrained labor accounts for a high percentage | 2.000 | 2.000 | 0.167 | 0.167 | 1.000 | 0.200 | 0.065
(W6) Labor cost is quite high | 4.000 | 5.000 | 0.500 | 0.500 | 5.000 | 1.000 | 0.206
CR = 2.98%

Table 9. Comparison matrix of opportunities group

Opportunities | O1 | O2 | O3 | O4 | Weights
(O1) Investment from the Government (Resolution 120) | 1.000 | 2.000 | 3.000 | 3.000 | 0.451
(O2) Deep international integration | 0.500 | 1.000 | 2.000 | 2.000 | 0.261
(O3) Science and Technology Development | 0.333 | 0.500 | 1.000 | 2.000 | 0.169
(O4) Expanded export market | 0.333 | 0.500 | 0.500 | 1.000 | 0.119
CR = 2.63%

Finally, the overall priority scores of the SWOT factors are calculated and shown in Table 11.
Based on the analysis, "Investment from the Government (according to Resolution 120)" from the opportunities group is the most important factor in the SWOT, with an overall priority value of 0.168 (16.8%). The other priority factors are: abundant water resources (since freshwater resources for the Mekong Delta in general are forecast to be extremely important), accounting for 0.146 (14.6%); the change in upstream flow, accounting for 0.053 (5.3%); lack of fresh water in the dry season, making up 0.045 (4.5%); and low farming techniques, making up 0.045 (4.5%).

Table 10. Comparison matrix of threats group


Threats T1 T2 T3 T4 Weights
(T1) Change upstream flow 1.000 2.000 2.500 3.000 0.440
(T2) Climate change 0.500 1.000 1.500 2.000 0.247
(T3) Diseases tend to increase 0.400 0.667 1.000 2.000 0.192
(T4) Higher and higher product quality requirements 0.333 0.500 0.500 1.000 0.121
CR = 1.26%

Table 11. Overall priority scores of SWOT factors

SWOT group | Group priority | SWOT factor | Factor priority | Overall priority of factor
S | 0.366 | (S1) Location, convenient for transportation | 0.056 | 0.021
  |       | (S2) Good quality land, potential for production conversion | 0.054 | 0.020
  |       | (S3) Abundant water resources | 0.398 | 0.146
  |       | (S4) Production diversity: rice, fruit trees, aquatic products | 0.136 | 0.050
  |       | (S5) Abundant labor | 0.356 | 0.130
W | 0.143 | (W1) Soft ground | 0.056 | 0.008
  |       | (W2) Acidic soil in some areas | 0.040 | 0.006
  |       | (W3) Lack of fresh water in the dry season | 0.316 | 0.045
  |       | (W4) Farming techniques are not high | 0.316 | 0.045
  |       | (W5) Untrained labor accounts for a high percentage | 0.065 | 0.009
  |       | (W6) Labor cost is quite high | 0.206 | 0.029
O | 0.371 | (O1) Investment from the Government (Resolution 120) | 0.451 | 0.168
  |       | (O2) Deep international integration | 0.261 | 0.097
  |       | (O3) Science and Technology Development | 0.169 | 0.063
  |       | (O4) Expanded export market | 0.119 | 0.044
T | 0.120 | (T1) Change upstream flow | 0.440 | 0.053
  |       | (T2) Climate change | 0.247 | 0.030
  |       | (T3) Diseases tend to increase | 0.192 | 0.023
  |       | (T4) Higher and higher product quality requirements | 0.121 | 0.014
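The overall priorities of Table 11 are simply the products of group weight and local factor weight; a minimal sketch for the five leading factors (the dictionary names are illustrative):

# Group weights from Table 6 and local factor weights from Tables 7-10
group_w = {"S": 0.366, "W": 0.143, "O": 0.371, "T": 0.120}
local_w = {("O", "O1"): 0.451, ("S", "S3"): 0.398, ("T", "T1"): 0.440,
           ("W", "W3"): 0.316, ("W", "W4"): 0.316}

overall = {f: group_w[g] * w for (g, f), w in local_w.items()}
for factor, score in sorted(overall.items(), key=lambda kv: -kv[1]):
    # O1 ~ 0.167 (rounded to 0.168 in Table 11), S3 0.146, T1 0.053, W3/W4 0.045
    print(factor, round(score, 3))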

3.2 Synthesize and Propose Priority Factors

Regarding Investment Capital (Effectively Promoting Opportunity Factor O1)
Take advantage of and effectively utilize ODA and NGO capital resources for the Mekong Delta in terms of agricultural development and drought and saltwater resistance. Socialize investment in the construction of infrastructure works, combine many investment forms, and implement the motto "the State and the people work together" to encourage investment in infrastructure construction in the forms of Build-Operate-Transfer (BOT) and Build-Transfer (BT).
Carry out administrative reform, implement investment incentive policies well, create a favorable and open investment environment, and actively promote investment to attract domestic and foreign investors.
Regarding Administrative Management (Promoting Strength S3 While Overcoming Weakness W3)
Administrative management, governance and reform are among the motivational forces for economic development, with the goal of promoting the strength of abundant water resources (S3) and limiting the weakness of the lack of fresh water in the dry season in some places (W3).
Strengthen measures to control environmental pollution, raising the awareness and responsibility of all levels, sectors, organizations and the community in the management and protection of natural resources and the environment, reducing pollution and promoting sustainable development.

Regarding Science and Technology (Overcoming Weakness W4)
Encourage enterprises to invest in technological innovation and use advanced techniques to improve farming techniques (addressing the weakness (W4) of low farming techniques) and product quality, and to increase competitiveness in the market.
Guide enterprises to build a system of product quality standards according to international standards (ISO).
Regarding Development Cooperation (Limiting the External Challenge of Changes in Upstream Flows, T1)
Provinces and cities in the region and the country should coordinate to solve the problem of economical exploitation and use of water sources to serve daily life and to directly serve agricultural production and aquaculture, and should build a system of canals and freshwater reservoirs to regulate and limit floods in the rainy season while reserving water to combat drought and salinity in the dry season.
Strengthen international cooperation, especially with countries in the upstream of the Mekong River, in the strategy of exploiting and protecting water sources, the environment and the livelihoods of people upstream, along both sides of the stream and downstream, under international environmental and resource protection programs.

4 Conclusion

Applying the SWOT technique in combination with AHP has provided the basis to help us select the priority factors to consider in planning the socio-economic development of Soc Trang City in the period 2021 to 2030. These include, in order: enlisting investment capital; bringing into play the advantages in terms of water resources; paying attention to future challenges posed by changes in watershed flows; and overcoming the water shortage in some places and low farming techniques.
The priority weights calculated with SWOT and AHP can be used as a management or support approach for important decisions. The results can be used to form a set of target options suitable for strategic decisions. In future, the use of fuzzy logic frameworks together with the AHP method may be researched to obtain a more efficient analysis of uncertain cases. Besides, other multi-criteria decision-making techniques can be applied and compared with the results of this study.

References
1. Shinno, H., Yoshioka, H., Marpaung, S., Hachiga, S.: Quantitative SWOT analysis on global
competitiveness of machine tool industry. J. Eng. Des. 17, 251–258 (2006). https://doi.org/
10.1080/09544820500275180
2. Houben, G., Lenie, K., Vanhoof, K.: A knowledge-based SWOT-analysis system as an
instrument for strategic planning in small and medium sized enterprises. Decis. Support Syst.
26, 125–135 (1999)
3. Kangas, J., Kurttila, M., Kajanus, M., Kangas, A.: Evaluating the management strategies of a
forestland estate-the S-O-S approach. J. Environ. Manag. 69, 349–358 (2003). https://doi.
org/10.1016/j.jenvman.2003.09.010

4. Soc Trang City People’s Committee: Summary report on adjustment of land use planning in
Soc Trang city up to 2020, approved by the Provincial People’s Committee under Decision
No. 2023/QD-UBND dated 23 July 2019
5. Kurttila, M., Pesonen, J., Kangas, M., Kajanus, M.: Utilizing the analytic hierarchy process
(AHP) in SWOT analysis-a hybrid method and its application to a forest-certification case.
For. Policy Econ. 1, 41–52 (2000). https://doi.org/10.1016/S1389-9341(99)00004-0
6. Kangas, J., Pesonen, M., Kurttila, M., Kajanus, M.: A’WOT: integrating the AHP with
SWOT analysis. In: 6th ISAHP 2001 Proceedings, pp. 189–198. Berne, Switzerland (2001)
7. Kajanusa, M., Kangas, J., Kurttila, M.: The use of value focused thinking and the A’WOT
hybrid method in tourism management. Tour. Manag. 25, 499–506 (2004)
8. Kahraman, C., Demirel, N.Ç., Demirel, T., Ateş, N.Y.: A SWOT-AHP application using
fuzzy concept: e-government in Turkey. In: Kahraman, C. (eds.) Fuzzy Multi-Criteria
Decision Making. Springer Optimization and Its Applications, vol. 16. Springer, Boston
(2008). https://doi.org/10.1007/978-0-387-76813-7_4
9. Saaty, T.L.: The Analytic Hierarchy Process. McGraw-Hill, New York (1980)
10. Saaty, R.W.: The analytic hierarchy process - what it is and how it is used. Math. Model. 9
(3–5), 161–176 (1987). https://doi.org/10.1016/0270-0255(87)90473-8
11. Saaty, T.L., Vargas, L.G.: Models, Methods, Concepts & Applications of the Analytic
Hierarchy Process, p. 343. International Series in Operations Research & Management
Science. Springer, Heidelberg (2012). ISBN: 978-1-4614-3596-9
12. Loi, N.K., Hoang, T.T., Van Trai, N., Truong, H., Khoa, N.M.: GIS & AHP application
builds a map of adaptation to brackish water shrimp farming in Tuy Phong district, Binh
Thuan province (2010)
13. Berrittella, M., Certa, A., Enea, M., Zito, P.: An analytic hierarchy process for the evaluation
of transport policies to reduce climate change impacts. Fondazione Eni Enrico Mattei
(Milano), p. 26 (2007). http://hdl.handle.net/10419/74293
14. Gao, C., Peng, D.: Consolidating SWOT analysis with nonhomogeneous uncertain
preference information. Knowl.-Based Syst. 24, 796–808 (2011). https://doi.org/10.1016/j.
knosys.2011.03.001
15. Wickramasinghe, V., Takano, S.: Application of combined SWOT and analytic hierarchy
process (AHP) for tourism revival strategic marketing planning: a case of Sri Lanka tourism.
J. East. Asia Soc. Transp. Stud. 8, 954–969 (2010). https://doi.org/10.11175/EASTS.8.954
16. Pesonen, M., Kurttila, M., Kangas, J., Kajanus, M., Heinonen, P.: Assessing the priorities
using A’WOT among resource management strategies at the Finnish forest and park service.
For. Sci. 47, 534–541 (2001). https://doi.org/10.1093/forestscience/47.4.534
17. Gallego-Ayala, J., Juizo, D.: Strategic implementation of integrated water resources
management in Mozambique: an A’WOT analysis. Phys. Chem. Earth 36, 1103–1111
(2011). https://doi.org/10.1016/j.pce.2011.07.040
Design and Analysis of Water Distribution
Network Using Epanet 2.0 and Loop 4.0 –
A Case Study of Narangi Village

Usman Mohseni1, Azazkhan I. Pathan1, P. G. Agnihotri2, Nilesh Patidar1, Shabir Ahmad Zareer3, D. Kalyan3, V. Saran3, Dhruvesh Patel3, and Cristina Prieto3

1 Sardar Vallabhbhai National Institute of Technology, Surat, Gujarat 395007, India
usmanmohsenialiakbar@gmail.com, pathanazaz02@gmail.com, nileshpatidar2671996@gmail.com
2 Department of Civil Engineering, School of Technology, Pandit Deendayal Energy University (formerly PDPU), Gandhinagar, Gujarat, India
pgasvnit12@gmail.com
3 IHCantabria – Instituto de Hidráulica Ambiental de la Universidad de Cantabria, Santander, Spain
shabir.zareer@gmail.com, kalyandummuiiit@gmail.com, saran29raaj@gmail.com, dhruvesh1301@gmail.com, prietoc@unican.es

Abstract. The water distribution network plays a key role in ensuring and providing a high quality of life for the public, with supply dependability being the most crucial component. A community's wellbeing depends on the availability of safe drinking water for its members. As the population of any community increases, the demand for water also increases, which imposes an additional load on the existing water distribution system. As a result, the present water distribution system may become unreliable in meeting the demand of the increased population. It is vital to provide an adequate and equitable quantity of water through a well-designed network of pipes in order to meet the ever-increasing water demand of the population. LOOP (Version 4.0) can be used to develop and simulate new, partially or completely functional gravity and pumped water distribution systems; LOOP 4.0 is used here to determine the cost of the pipe network. EPANET is a computer program that models hydraulic behaviour in a pressurised pipe network over an extended period. This study presents the hydraulic analysis of the pipe network of the Narangi area in Virar city using LOOP 4.0 and EPANET 2.0. The results confirmed that the pressures at all junctions and the velocities in all pipes are acceptable for delivering appropriate water to the study area's network, and the analysis aids a better understanding of the study area's pipeline system. The study also addresses the cost of the distribution network.

Keywords: LOOP 4.0 · EPANET 2.0 · ArcGIS · Elevation · Nodes · Pipe network · Pressure · Water supply


1 Introduction
1.1 Water Distribution System
Pipes, tanks, pumps, and valves are among the elements of a water distribution system. It is the system that links the water supply sources to the end user: a carefully engineered delivery system that allows water to flow through pipes before reaching the user's tap. Local governments normally own and maintain water distribution systems, but corporate entities occasionally operate them. When planning a water distribution network, city planners and engineers should consider numerous variables such as area, flow demand, future development, pipe sizes, head loss, firefighting and leakages, using pipe network analysis methods and other tools [1].
A water system has two essential prerequisites. First and foremost, it needs to convey adequate amounts of water to meet consumption requirements. Second, the water system should be reliable; the required amount of water should be available 24 h per day [2]. Remarkable progress has been made recently in various aspects of water supply and distribution. The sustainable management of water resources also plays a key role in the development of human societies. One of the ways to effectively manage a water distribution system is by using a model. A water distribution system model aids the expert in the administration, maintenance, and expansion of the local water supply system. A model can help the authority understand when the current water distribution system needs expansion and what adjustments ought to be made to fulfil the needs of the population [3].
This study aimed to design and analyse the water distribution network for the Narangi area, Virar, using the LOOP 4.0 and EPANET 2.0 software.

1.2 Introduction to LOOP 4.0


LOOP 4.0 is an entirely new version of the previous application LOOP 3.0 (written in IBM BASIC), which was designed and adopted through UNDP/World Bank cooperation. Apart from LOOP, the UNDP/World Bank also distributed FLOW (written in Microsoft FORTRAN 4.0). FLOW (Version 3.0) offers more features and capabilities than LOOP (Version 3.0), but it is considerably less user friendly. In addition to certain other technical features, LOOP Version 4.0 has leveraged a portion of the FLOW code while improving the user interface, resulting in an even more capable and successful piece of software. LOOP (Version 4.0), often known simply as LOOP, is a software programme that may be used to build and simulate new, partially or fully established gravity and pumped water distribution systems. Reservoirs (fixed or variable head), pumps, valves (pressure reducing or check valves), and online booster pumps can all be modelled [4].

1.3 Introduction to EPANET 2.0


The US Environmental Protection Agency's (EPA) Water Supply and Water Resources Division developed EPANET, a public-domain water distribution modelling programme. It is designed to be a learning tool that improves our understanding of the movement and fate of drinking-water constituents within distribution systems. It is used to perform extended-period simulations of hydraulic and water-quality behaviour within pressurised pipe networks. EPANET 2 is available both as a standalone programme and as an open-source toolkit, and it uses the ".inp" input file format [5].

2 Literature Review

The major objectives of this work are to use EPANET software to investigate and
develop SAVEETHA University’s water distribution network. The SAVEETHA
University’s water distribution system consists of 14 pipelines, 14 nodes, and one main
overhead tank. The pipe diameter for the whole network is 250 mm. The pipes are
composed of cast iron with a roughness coefficient of 110, and they were utilised
throughout the network system. The total amount of water available in one day is
1600 m3, while the needed demand is about 1200 m3. As a result, there was no water
scarcity, and extra water was kept in sumps and tanks, which came in handy at peak
hours [6]. The current research depicts the renovation of an existing network as well as
the construction of a water distribution network with the aid of a programming tool.
Pipes, nodes, pumps, valves, and storage tanks or reservoirs make up a network.
EPANET calculates the water flow in each pipe as well as the pressure at each node.
The current network paradigm is being redesigned for the next 30 years. As per the
standard of the Central Public Health and Environmental Engineering Organisation (CPHEEO) handbook, the appropriate residual pressure and flow are reached at all
nodes and connections [7]. This study focuses solely on the use of the EPANET tool to
design and lay out a pipe distribution network. Chowduru of Proddatur mandal in the YSR Kadapa District of Andhra Pradesh, India, is studied to build an effective water distribution network. The residual head at each node was calculated using the
elevation as an input, and the related flow characteristics such as residual head,
velocity, and nodal demand were obtained as a result [8]. Another study examined an existing network and evaluated its reliability using EPANET software. Various data are necessary for the analysis of an existing water distribution system, including the main water supply, the population of the region, water demand, pump requirements, water tanks and the distribution network; on this basis, the water distribution network of Olpad village was analysed using EPANET software. Pressures and elevations at various nodes, as well as head losses in various pipes, were the results of this research [9].
region of Nagpur city (India) is assessed using EPANET software in this study.
Simulations of both continuous and intermittent water supplies are carried out. The
maps of the water supply network, node network and elevation were created using ArcGIS 10. The population data are generated from a remote-sensing image. The recommendations for reducing leakage and maintaining the system's hydraulic integrity
will be beneficial to the local authorities in adopting a 24-h water delivery plan [10].
This study examines how EPANET treats the efficiency of variable-speed pumps that operate at speeds other than the nominal speed. An experimental setup at the Technical University of Civil Engineering in Bucharest was modelled in EPANET 2. The results demonstrate that when the pump is operated at varied rotational speeds, EPANET leaves the efficiency curve unchanged [11]. This paper describes a modification to the original EPANET concept, including a new graphical user interface (GUI) menu.
A new framework for information exchange between EPANET and third-party pro-
grammes was developed as part of this project. This new connection allows you to use
EPANET’s GUI while concurrently extending its editing, calculating, and processing
capabilities [12]. This study involved a comparison of head-loss equations using the
EPANET computer software and the Hardy Cross iterative technique. The Darcy-Weisbach and Hazen-Williams head loss equations were used to calculate the frictional
head losses in each technique. The findings were subjected to a t-test with a signifi-
cance threshold of 5%, revealing that there is no significant difference in the usage of
either of the head-loss equations for computations in any of the techniques. As a result,
any of the head-loss equations may be utilised to analyse pipe networks effectively [3].
The best design of a water distribution system may be determined using a method
called linear programming gradient (LPG). The system consists of a pipeline network
that transports known quantities from suppliers to consumers and may include pumps,
valves, and reservoirs. The optimization takes into account the system’s operation
under each of a set of demand loadings. The information needed to determine the
gradient of the total cost with respect to changes in the flow distribution comes from post-optimality analysis of the linear programme. The gradient is employed to alter the flow patterns in order to reach a (local) optimum. The approach was implemented in a computer programme, and solved examples are presented [13]. Using
WaterCAD and Epanet, this research examined the performance of the Wadata sub-
zone water distribution system in terms of pressure, velocity, hydraulic head loss, and
nodal demands. Although there was no statistical difference between Epanet and
WaterCAD findings, Epanet generated somewhat higher pressure and velocity values
in roughly 60% of the cases studied. The findings of this investigation indicated that the
Wadata sub-zone's water distribution infrastructure is inefficient under present demand
[14]. The goal of this study was to use the WaterGEMS model to optimise the planned
water distribution system in Wukro. The Darwin Designer in WaterGEMS was used to
determine the best pipe diameter for supplying an appropriate quantity of water to end
consumers at acceptable pressures. The WaterGEMS model was used in water distri-
bution networks with 117 pipes (40.67 km) and 99 demand nodes (corresponding to
50480 end customers) distributed throughout a mountainous terrain with an elevation
gradient of 1989 m to 2046 m. The highest pressure was 31.1 m before optimization and climbed to 38.1 m after optimization, while the minimum pressure rose from 7.9 m to 16 m under peak-hour demand. According to the results of this
study, the WaterGEMS model is a viable technique for optimal pipe sizing in water
distribution network design and pumping operational schedules [15].

3 Case Study

The Narangi region of Virar City (Taluka: Vasai, District: Palghar) was chosen for this study. Narangi is situated at 19.4742°N, 72.8107°E and has an average elevation of 11 m. The village has a humid subtropical climate: the level of moisture in the air is high in the summer, and the air is generally dry in the winter. Every year, during the monsoon season from June to September, Narangi receives high to exceptionally high rainfall, which can be devastating. The ward of Narangi is administered by the Vasai-Virar Municipal Corporation, a recently constituted local authority in the area.
The existing Elevated Service Reservoir (ESR) has a capacity of 20,00,000 litres, with a Full Supply Level (FSL) of 36.04 m, a Low Supply Level (LSL) of 33.04 m and a Ground Level (GL) of 27.04 m (Figs. 1 and 2).

Fig. 1. Google map image of study area (Google Maps)

Fig. 2. Google Earth image of study area (Google Earth)



4 Methodology

Using EPANET 2.0

1. Steps required for simulation (a minimal scripted sketch follows this list):
a) Draw the water distribution network over the backdrop map.
b) Edit the EPANET element properties: length, diameter and roughness coefficient for each pipe; water demand, demand pattern and elevation for each node.
c) Allow 24 h for the simulation.
d) Run the analysis.
e) Finally, check the results.
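The simulation steps above can equivalently be scripted. The following is a minimal sketch using the open-source wntr Python package (an assumption; the study itself worked in the EPANET 2.0 interface), with a reservoir standing in for the ESR at its FSL of 36.04 m and values echoing pipe 1 of Table 2; the demand and roughness figures are placeholders.

import wntr

# Build a toy two-node network: ESR reservoir feeding one junction
wn = wntr.network.WaterNetworkModel()
wn.add_reservoir('ESR', base_head=36.04)                      # full supply level, m
wn.add_junction('J2', base_demand=0.001, elevation=11.0)      # demand in m^3/s (placeholder)
wn.add_pipe('P1', 'ESR', 'J2', length=478.0, diameter=0.600,  # pipe 1 of Table 2, in m
            roughness=100)                                    # Hazen-Williams C (assumed)

# 24-hour extended-period simulation, as in step (c)
wn.options.time.duration = 24 * 3600
sim = wntr.sim.EpanetSimulator(wn)
results = sim.run_sim()

pressure = results.node['pressure']     # m, per node and time step
velocity = results.link['velocity']     # m/s, per pipe and time step
print(pressure['J2'].max(), velocity['P1'].max())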
2. Steps in designing the water distribution network:
a) Preparation of maps by conducting surveys.
b) Preparation of the preliminary layout.
c) Computation of pipe discharges.
d) Computation of pipe diameters.
e) Computation of pipe pressures.
3. Population Forecasting
Population data were taken from the Vasai Virar City Municipal Corporation.
In 2010 the population was 7085 and in 2020 the population was 9902.
The Arithmetical Increase Method was adopted to find the population for the years
2030, 2040 and 2050 (Table 1; a short computational sketch follows the table).

Table 1. Population forecasting by arithmetical increase method.


Year Population
2030 12719
2040 15536
2050 18353
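As a worked illustration of the Arithmetical Increase Method used above (not part of the original paper's tooling), the forecast in Table 1 can be reproduced with a few lines of Python:

```python
# Arithmetical Increase Method: P_n = P0 + n * x_bar, where x_bar is the
# average population increase per decade observed in past censuses.
census = {2010: 7085, 2020: 9902}

years = sorted(census)
increments = [census[b] - census[a] for a, b in zip(years, years[1:])]
x_bar = sum(increments) / len(increments)   # 2817 persons per decade here

base_year, base_pop = years[-1], census[years[-1]]
for year in (2030, 2040, 2050):
    n = (year - base_year) // 10            # decades ahead of the last census
    print(year, round(base_pop + n * x_bar))  # -> 12719, 15536, 18353
```

The printed values match Table 1 exactly, since the method simply extrapolates the 2010–2020 decadal increment.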

4. Input Data
See Table 2.

Table 2. Input parameters


Pipe no. From node To node Dia (mm) Length (m)
1 1 2 600 478
2 2 3 600 21
3 3 4 500 45
4 4 5 500 142
5 5 6 450 89
6 6 7 450 166
7 7 8 450 118
8 8 9 450 152
9 9 10 150 110
10 10 11 150 63
11 4 12 150 112
12 12 13 150 42
13 13 14 150 98
14 3 15 150 123
15 15 16 150 178
16 16 17 150 87
17 8 18 150 164
18 18 19 150 71
19 20 20 150 46
20 20 21 150 71
21 22 22 150 234
22 9 23 150 193
23 10 24 150 224
24 24 25 150 143
25 24 26 150 63
26 27 28 300 68
27 28 29 150 88
28 28 30 250 112
29 30 31 150 98
30 30 32 150 56
31 32 33 150 51
32 32 38 150 44
33 38 39 150 24
34 39 40 150 24
35 39 41 150 50
36 34 42 150 27
37 41 43 150 57
38 38 44 150 52
39 44 45 150 51
40 45 46 150 85
41 45 47 150 44
42 44 48 150 98
43 36 49 150 90
44 49 50 150 36
45 50 51 150 41
46 50 52 150 64
47 49 53 150 171
48 53 54 150 217
49 53 55 150 64
50 6 56 150 132
51 7 57 300 106
52 57 58 150 138
53 57 59 300 102
54 59 60 150 104
55 60 61 150 76
56 60 62 150 87
57 59 63 150 146
58 63 64 150 86
59 64 65 150 98
60 64 66 150 152
61 63 67 160 166
62 67 68 150 48
63 68 69 150 96
64 68 70 150 84
65 67 71 150 115
66 71 72 200 45
67 3 73 250 547
68 73 74 150 94
69 73 75 150 93
70 75 76 150 71
71 2 77 200 122
72 77 78 150 132
73 77 79 200 127
74 79 80 150 122
75 79 81 150 512
76 75 82 150 106
77 83 83 150 122
78 5 16 150 156
79 27 34 300 146
80 34 35 150 76
81 34 36 150 29
82 36 37 150 61
83 7 27 300 20
84 27 83 150 62
85 81 85 150 85
86 85 87 150 87
87 85 86 150 86
88 72 101 200 101
89 72 88 150 88
90 88 89 150 89
91 88 90 150 90
92 11 99 150 99
93 11 95 150 96
94 95 96 150 96
95 95 97 150 97
96 55 91 150 91
97 91 100 150 100
98 91 92 150 92
99 92 93 150 93
100 92 94 150 94
101 53 98 150 98

5 Results
5.1 Results from EPANET 2.0
The results of EPANET 2.0 are depicted in the graphs below. From these graphs we
can determine whether the pressures at all junctions and the velocities in all pipes are
acceptable for delivering adequate water to the study area's network. The maximum
water demand in pipe 67, according to Fig. 3, is 68.39 lpm. The highest velocity,
0.22 m/s, occurs in pipe 57, as seen in Fig. 4. The maximum water demand at
junction 67 is 69.86 lpm, according to Fig. 6, while the maximum pressure, 26.93 m,
occurs at junction 16, as shown in Fig. 5 (Fig. 7).
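The study itself reads these maxima from the EPANET 2.0 graphs; as a purely illustrative aside, the same post-processing can be scripted with the open-source WNTR package, which runs EPANET models from Python. The file name narangi.inp below is a hypothetical export of the network, not a file provided with the paper:

```python
import wntr  # EPA's Water Network Tool for Resilience (pip install wntr)

# Load the EPANET input file and run the extended-period simulation
# defined in its time options (24 h in this study).
wn = wntr.network.WaterNetworkModel('narangi.inp')  # hypothetical file name
results = wntr.sim.EpanetSimulator(wn).run_sim()

pressure = results.node['pressure']  # metres; rows = time, columns = nodes
velocity = results.link['velocity']  # m/s; rows = time, columns = pipes

print('max pressure:', pressure.max().max(), 'at node', pressure.max().idxmax())
print('max velocity:', velocity.max().max(), 'in pipe', velocity.max().idxmax())
```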

Fig. 3. Line graph of demand through pipes.

Fig. 4. Line graph of velocity through pipes.



Fig. 5. Line graph of pressure through junctions.

Fig. 6. Line graph of demand through junctions.



Fig. 7. Water distribution network in EPANET

5.2 Results from LOOP 4.0

Fig. 8. Pipe cost summary from LOOP 4.0



6 Conclusion

The main goal of this project is to examine the flow of water in the water distribution
network throughout the chosen area, to determine whether there is any water scarcity
at any given node, and to clarify the day-to-day utilization of water in the selected
area. This study will aid water supply engineers in planning ahead, because the
process is simple and rapid. In anticipation of the town's future expansion, the newly
created network is laid out along the road layout according to the project proposal. At
the conclusion of the analysis, it was found that the resulting pressures and velocities
at all nodes are sufficient to supply water to the study region. The graphs indicate that
pipe 67 and junction 67 carry maximum demands of 68.39 lpm and 69.86 lpm
respectively. Also, a maximum velocity of 0.22 m/s and a maximum pressure of
26.93 m are observed in pipe 57 and at junction 16 respectively. The results from
LOOP 4.0, shown in Fig. 8, give the cost summary of the pipes.

References
1. Anisha, G., Kumar, A., Kumar, J., Raju, P.: Analysis and design of water distribution
network using EPANET for Chirala municipality in Prakasam District of Andhra Pradesh.
Int. J. Eng. Appl. Sci. 3, 257682 (2016)
2. Jain, A., Bhavani, D., Gamit, M., Kahar, A.: 24×7 water distribution network (Sarsana)
using. 5, 675–679 (2019)
3. Iglesias-Rey, P.L., Martínez-Solano, F.J., Ribelles-Aquilar, J.V.: Extending EPANET
capabilities with add-in tools. Procedia Eng. 186, 626–634 (2017)
4. Sumithra, R.P., Amaranath, J.: Feasibility analysis and design of water distribution system
for Tirunelveli corporation using loop and water gems software. Int. J. Appl. Bioeng. 7, 1
(2013)
5. Sonaje, N.P., Joshi, M.G.: A review of modeling and application of water distribution
networks (WDN), vol. 3, pp. 174–178 (2015)
6. Nallanathel, M., Ramesh, B., Santhosh, P.A.: Water distribution network design using
EPANET A case study. Int. J. Pure Appl. Math. 119, 1165–1172 (2018)
7. Jumanalmath, S.G., Shivapur, A.V.: Analysis of 24×7 water distribution network of
Gabbur zone in Hubballi city, Karnataka state, India using EPANET software. Int. Res.
J. Eng. Technol. 4, 478–485 (2017)
8. Venkata Ramana, G., Sudheer, C.V.S.S., Rajasekhar, B.: Network analysis of water
distribution system in rural areas using EPANET. Procedia Eng. 119, 496–505 (2015)
9. Parmar Ankita, P.N.: Water distribution network using EPANET: a case study of Olpad
Village. In: Emerging Research and Innovations in Civil Engineering, pp. 25–30 (2019)
10. Zolapara, B.: Case study on designing water supply distribution network using EPANET
for Zone-I of Village Kherali. Indian J. Res. 4, 281–284 (2015)
11. Mohapatra, S., Kamble, S., Sargaonkar, A., Labhasetwar, P.K., Watpade, S.R.: Efficiency
study of a pilot water distribution system using EPANET and ArcGIS10. In: Conference on
CSIR-NEERI (2012)
12. Georgescu, A.M., et al.: Estimation of the efficiency for variable speed pumps in EPANET
compared with experimental data. Procedia Eng. 89, 1404–1411 (2014)

13. Nwajuaku, I.I., Wakawa, Y.M., Adibeli, O.J., Ijeoma, N.: Analysis of head-loss equations
under EPANET and hardy cross method, vol. 2, pp. 125–134 (2017)
14. Agunwamba, J.C., Ekwule, O.R., Nnaji, C.C.: Performance evaluation of a municipal water
distribution system using WaterCAD and Epanet. J. Water Sanit. Hyg. Dev. 8, 459–467
(2018)
15. Berhane, T.G.: Optimization of water distribution system using WaterGEMS: the case of
Wukro Town, Ethiopia. Civ. Environ. Res. 12, 1–14 (2020)
Effect of Climate Change on Sea Level Rise
with Special Reference to Indian Coastline

Dummu Kalyan1, Azazkhan Ibrahimkhan Pathan1(&), P. G. Agnihotri1,
Mohammad Yasin Azimi1, Daryosh Frozan2, Joseph Sebastian3,
Usman Mohseni1, Dhruvesh Patel4, and Cristina Prieto5
1 Sardar Vallabhbhai National Institute of Technology, Surat 395007, Gujarat, India
2 Dr. S.&S.S. Ghandhy College of Engineering and Technology, Gujarat Technological University, Surat, India
3 S.P.B Patel Engineering College, Mehsana, India
4 Department of Civil Engineering, School of Technology, Pandit Deendayal Energy University - Formerly PDPU, Gandhinagar, Gujarat, India
5 IHCantabria – Instituto de Hidraulica Ambiental De la Universidad de Cantabria, Santander, Spain

Abstract. This paper provides a brief examination of how climate change has
impacted sea level throughout the world, focusing on variables such as increased
frequency of extreme weather events, glacier melting, precipitation changes, and
ocean currents. Statistics on the effect of each of these factors, as reported in different
papers and by the Intergovernmental Panel on Climate Change (IPCC), are also
compiled.
Climate warming owing to an increasing greenhouse effect might trigger drastic
climatic changes and hasten sea level rise over the next century. These might
have a severe impact on the Indian coastline’s coastal regions. These are heavily
inhabited places that support a diverse range of economic activity. The region's
physical environment, as well as the different socioeconomic activity in the
coastal areas, are briefly discussed. The inundation of low-lying
regions, salt intrusion, flooding owing to storm surges and high tides, and
habitat destruction are all physical effects of rising seas. In addition, a literature
evaluation of certain research articles is conducted in which the aforementioned
issue, namely the influence of climate change on sea level, is investigated. In the
later sections of this report, an analysis of how sea level rise has occurred along
the Indian coastline is carried out by selecting tide gauges with a consistent and
large number of data, and this data is compared with data obtained from satellite
altimetry, which can provide us with better results and aid in better sea level rise
prediction in those regions.

Keywords: IPCC · Climate change · Sea level rise · Indian coastline

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022


P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 685–694, 2022.
https://doi.org/10.1007/978-3-030-93247-3_66

1 Introduction

Anthropogenic as well as natural events have caused observable climatic changes in
the past few decades. Owing to the industrial revolution, the emission of various
greenhouse gases has increased to a large extent, causing the temperature at the earth's
surface, both on land and in water, to rise drastically. Because the bulk of the excess
heat collected in the Earth's system is retained in the ocean waters, this rise in
temperature has created concerns such as an increase in sea surface temperature and a
thermal expansion of sea water that might raise sea level by 30–50 cm (12–20 in.)
over the next couple of centuries. A study has revealed that the oceans absorb nearly
90% of the radiation emitted by the sun onto the earth's surface. The warming has also
caused the glaciers of Antarctica and Greenland to melt, which can add around
20–40 cm to the sea level in the next century [1]. It is said that global warming will
increase precipitation all across the world, because a warmer atmosphere can retain
more water vapor, further contributing to the rise of water levels in the oceans.
According to the Intergovernmental Panel on Climate Change (IPCC, 2013), three
factors are responsible for sea level rise: sea water expansion (0.11 m to 0.43 m),
Antarctic ice melting (−0.17 m to 0.02 m), and Greenland ice melting (−0.02 m to
0.09 m). The rising sea level will inevitably drown a large portion of each continent's
shoreline, resulting in coastal retreat and economic losses, as major cities of the world
such as New York, Mumbai, and Tokyo are on the sea
coast. Hence, it has become the need of the hour to find a solution to this disastrous
phenomenon of sea level rise by using structural measures like construction of dikes
along the sea coast as in the case of Netherlands, installation of pumps to pump out
water overtopping these detention structures, etc. as well as non-structural measures
like prediction models to determine the effect of this global phenomenon on the
coastlines across the globe. In the current Arctic coast scenario, ice is the major factor
influencing hydraulic conditions by damping wave currents, intensifying lower
currents, transporting sediments, and supplying rivers and beaches with sediments.
However, owing to rising temperatures, mechanical erosion by wave currents has
attacked the ice-rich coasts, resulting in fast coastal erosion. Because of ice push,
currents, and waves, small islands were in constant change. Earlier, the coastal
systems had a balance between the impacts induced by waves and currents along with
ice, but climate warming caused the sea level to increase, which reduced the impact of
ice processes relative to the role of wind and waves on the shoreline [2].
The study's goal is to provide a quick overview of how climate change has
impacted sea level throughout the world by examining variables such as rising global
temperatures, melting glaciers, increased precipitation, and ocean currents, among
others, together with the statistics on the effect of each factor reported in different
papers and by the Intergovernmental Panel on Climate Change (IPCC). To monitor
the sea level variation in the Bay of Bengal and the Arabian Sea, a preliminary survey
was carried out to gather tide gauge data across all stations along the Indian coastline.
Not only tide gauge records from the Permanent Service for Mean Sea Level (PSMSL)
but also satellite altimetry data from TOPEX/Poseidon, Jason 1 and Jason 2 (available
since 1993) were utilised to enhance the precision of the analysis.
Coastal drainage systems that keep roads and dwellings from flooding have
developed significantly in recent decades, to the point where flooding from rainfall is
now rarely more than a minor annoyance in most regions. Carbon dioxide and water
vapour in the earth’s atmosphere have been recognised for over a century to warm our
planet by absorbing outgoing infrared radiation [7]. Under the various climate change
scenarios estimated by Global Circulation Models (GCMs), the rate and extent of sea
level rise, as well as the influence on saltwater intrusion in a coastal aquifer, have been
evaluated for a hypothetical problem [8]. One of the most dramatic repercussions of
climate change is sea-level rise, and the world's attention has been drawn to the high
estimated rates of future sea level rise [9]. Due to increased ocean warming and mass
loss from glaciers and ice sheets, the rate of global mean sea level rise during the
twenty-first century is very likely to exceed the pace observed between 1971 and 2010
for all Representative Concentration Pathway (RCP) scenarios [10].

2 Literature Review

To gather all of the Indian coastal tide gauge data and to observe the changes in sea
level seen in the Arabian Sea and Bay of Bengal, a rough survey was conducted. The
variation in mean sea level in some North Indian Ocean areas was analyzed using both
satellite altimetry information and tide gauge records. Out of 28 tide gauges on the
Indian coastline, 17 are dysfunctional and 5 have severe gaps in data of many years.
Only 6 tide gauges proved beneficial as they had better consistency in data. Based on
the location of tidal gauges and geology, the coastline was split into four zones. The
master station was chosen as the one with the longest tide data (in this case the stations
were Kandla, Mumbai and Cochin in the Arabian Sea belt, and Hiron Point in the Bay of
Bengal belt). In zone A, the master station exhibited a positive trend, in contrast to the
negative trend reported by the subsidiary station Okha, which might be attributable to a
variation in bathymetry. Zones B and C were more consistent with the known 1–2 mm
per year increase in global sea level (IPCC, 2013). In contrast to the worldwide
estimate, Zone D showed a very high positive trend, as it is an extremely flood-prone
region [3].
The study used sea level and tide gauge data as well as sophisticated geospatial
tools to highlight various local coastal hazards in the face of climate change as a result
of the present rise in sea level along India’s eastern shore. Coastal elevation and flood
risk regions along India’s eastern shore were obtained using SRTM’s 90 m resolution
global DEM. The Coastal mangrove area is the most vulnerable area owing to lower
altitude (ranges 0 to 20 m) and higher tidal impact, according to the findings. The
southern region of the shoreline (Ganga-Brahmaputra delta area) is mostly impacted by
the increase in sea level (4.7 mm/year), with the Sundarban area being the most
vulnerable area due to decreased average elevation (ranges 0 to 20 m) and greater high
tide effect. The sea surface rise is also higher at Visakhapatnam and Bhubaneswar, at
0.73 mm and 0.43 mm annually, respectively. The final findings of the spatial
categorization support and propose future initiatives for researchers and decision
makers [4].

Three different models were employed to simulate the influence of rising temper-
atures on rising sea levels: the process-based model, the semi-empirical model, and the
dynamic system model. The worldwide mean shift in the process-based model was split
into ocean thermal expansion, ice caps, mountain glaciers and ice sheets, and calcu-
lations of each component were performed individually by connecting each particular
physical mechanism with mathematical equations. Statistical and numerical techniques
were used in the semi-empirical model to create a connection between global tem-
perature and worldwide sea level that was used to forecast the future. The above two
components were treated as an interactive dynamic system in the dynamic system
model by taking into consideration the possible synergy and response between them.
A linear dynamic equilibrium model was used to forecast both sea level rise and global
mean temperature change at the same time. The model was built using data collected to
raise sea level and the system’s behaviour as a result of temperature changes. The
model using the available information confirmed the conclusion that the increase rate at
sea level depends not only on temperature, but also on the present state. The tem-
perature increase, on the contrary, is extremely influenced by its present state and is
slightly influenced by the sea level state [5].
An estimate of the increase in sea level along India's shoreline was made using
tide gauge records acquired in the past; among the four tidal stations, Mumbai (unlike
Kochi, Vishakhapatnam and Chennai) had 129 years of data, useful for simulating a
better and more coherent model for predicting the increase in sea level at that city.
The study was carried out up to 2100, which revealed that Kochi,
Mumbai and Vishakhapatnam stations showed an increase in sea level of about 1
mm/year, while Chennai had −0.69 mm/year. The above estimates had to be corrected
because the impact of vertical ground movements was not considered. As it was
observed that a major contributor to sea level rise in the coastal regions was the
storm events caused by cyclones, two conditions, namely controlled (CTRL)
and increased greenhouse gas (GHG), were taken for simulation with the Hadley
Centre regional climate model HadRM2. According to the report, cyclones occur more
frequently in the Bay of Bengal than in the Arabian Sea, and this is related to the
increased GHG scenario [6].

3 Research Study Area

The entire length of India’s shoreline is 7516.6 km, comprising 6100 km of continental
shoreline and 1196.6 km of island border. Gujarat, Maharashtra, Goa, Karnataka,
Kerala, Tamil Nadu, Andhra Pradesh, Odisha, and West Bengal, as well as four Union
Territories, Daman and Diu, Puducherry, Andaman & Nicobar Islands, and Lakshad-
weep, are all impacted by the Indian coastline. The continental drift from Gond-
wanaland was considered to be responsible for the creation of the Indian shoreline.
Large areas of the Indian coastal plains are covered with fertile soils, which are used to
grow a variety of crops, the most important of which being rice. The people who live
on the coastal plains make their living by fishing. The Indian coastline is divided into
two sections: the western and eastern shorelines. Figure 1 shows a research study area
map of India’s coastline.

Fig. 1. Coastline of India

• The Western coastal plains of India are divided into 5 regions:


1. Kutch and Kathiawar region
2. Gujarat plain
3. Konkan plain
4. Karnataka plain
5. Kerala or Malabar plain
The western coastal plains, which stretch from the Rann of Kutch in the north to
Kanyakumari in the south, are home to several river estuaries, the most famous of
which are the Narmada and Tapi. There are some lagoons, backwaters and lakes on the
Kerala coast, the largest being the Vembanad lake.
The Eastern coastal plains of India are divided into 3 regions:
1. Utkal plain
2. Andhra plain
3. Tamil Nadu or Coromandal plain
It has deltas of many rivers, the main ones being Godavari, Mahanadi and Krishna.
The coastline length of each state is shown in the table below, with Gujarat having the
longest coastline at 1214.7 km (Table 1).

Table 1. States/UTs along with their coastline lengths. Source: https://quickgs.com/coastal-length-of-indian-states/
State/UT Coastal length (km)/Number of islands
Gujarat 1214.7
Maharashtra and Goa 652.6
Karnataka 280
Kerala 569.7
Tamil Nadu and Puducherry 937.5
Andhra Pradesh 973.7
Odisha 476.4
West Bengal 157.5
Lakshadweep Island 132 km/37 Islands
Andaman and Nicobar Islands 1962 km/348 Islands

4 Analysis

In this approach, first of all, a preliminary survey was carried out to collect the tide
gauge data across all the stations of the Indian coastline, to observe the sea level
variation in the Bay of Bengal and the Arabian Sea.
Not only tide gauge data from the Permanent Service for Mean Sea Level (PSMSL)
but also satellite altimetry data from TOPEX/Poseidon, Jason 1 and Jason 2 (available
since 1993) were utilised to enhance the precision of the analysis.
Of the 28 tide gauges along the Indian coastline, 17 are broken and 5 have
significant data gaps spanning many years. For example, Garden Reach had a long
data record from 1932 to 2010; however, a ten-year gap within it made the record
discontinuous. Only six tide gauges were found to be useful, since their
data was more consistent. Based on the location of tidal gauges and geology in the
region, the analysis area was split into four areas: A, B, C, and D. The master station
among all tide gauge stations is chosen based on the best available information; this
technique implies that the regions of such master stations are homogenous in character.
The master stations were:
• Zone A – Kandla (Gulf region)
• Zone B – Mumbai (open sea)
• Zone C – Cochin (open sea)
• Zone D – Hiron Point (deltaic region)
To discover the global sea level tendency, nearly all the data was chosen and split;
this contrast made it simple to locate tide gauges with an exceptional upward or
downward trend. Annual MSL information from satellite altimetry (TOPEX/Poseidon,
Jason 1 and Jason 2) for both the Arabian Sea and the Bay of Bengal was compared
with the trend provided by the IPCC to detect any gaps in the estimations (Fig. 2).
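The relative trends reported below (mm/year with a standard error, as in Table 2) are ordinarily obtained by fitting a straight line to an annual MSL series. The sketch below shows such a fit with SciPy on invented illustrative numbers, not on the actual PSMSL records used in the study:

```python
import numpy as np
from scipy import stats

# Hypothetical annual mean sea level series (mm), standing in for a PSMSL record.
years = np.arange(1980, 2011)
rng = np.random.default_rng(0)
msl = 7000 + 1.5 * (years - 1980) + rng.normal(0, 8, years.size)

# Ordinary least-squares fit: the slope is the sea level trend in mm/year.
fit = stats.linregress(years, msl)
print(f"trend = {fit.slope:.2f} ± {fit.stderr:.2f} mm/year")
```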

Fig. 2. (a) Master tide gauge stations (b) India's existing tide gauge stations

5 Results and Discussion

To illustrate the influence of global warming, a tide gauge should have more than
50–60 years of data; however, such records are difficult to come by. Only Mumbai in
India has a history of more than a century (1878–2010) and has therefore established
itself as a key point of comparison for the Arabian Sea (Table 2 and Fig. 3).

Table 2. Relative patterns with their respective master stations at various sites [3].

Tide gauge stations Comparison period (Years) Relative variability (mm/year)


Master station A – Kandla
Okha 1975–2007 −2.076 ± 0.47
Master station B – Mumbai
Marmagao 1969–2010 0.032 ± 0.01
Karwar 1971–2010 0.734 ± 0.16
Master station C – Cochin
Mangalore (Panamburu) 1977–97 0.646 ± 0.03
Mangalore 1953–76 1.354 ± 0.06
Chennai 1953–2007 0.505 ± 0.06
Master station D – Hiron Point
Cox Bazar 1979–2000 4.30 ± 0.04
Khepupara 1979–2000 15.50 ± 0.90
Charchanga 1979–2000 5.84 ± 0.15

Fig. 3. AMSL (mm) at master stations in Mumbai, Kandla, Cochin, and Hiron Point.

Only one site is available for comparison in zone A, and it shows a declining trend
while the master station exhibited a rising trend, indicating either that MSL at Okha is
lower than at Kandla, that wave currents are flowing from the Gulf of Kutch towards
Kandla, or that there is a recording error. While zones B and C are in line with the
worldwide average of 1–2 mm per year (IPCC, 2013), there was still some variation,
and more study is needed to determine its exact source.
Zone D had strongly positive trends that differed from the global statistics,
reflecting that it is a heavily flooded area owing to the high drainage of the
Ganga-Brahmaputra delta, the world's largest and most populated delta, in this
region. High groundwater extraction also occurs in those areas, causing land
subsidence.
The next step was to use satellite altimetry data from TOPEX/Poseidon (1993–2002),
Jason 1 (2001–2013) and Jason 2 (2008–present), available from 1993 at
a resolution of 0.25° × 0.25° (Fig. 4).
For the period 1993–2012, the data showed a rise of 3.2 mm/year in the Bay of
Bengal and 2.15 mm/year in the Arabian Sea, against the global increase of
3.2 mm per year reported by the IPCC. The satellite data from the Bay of Bengal thus
indicated a trend more or less similar to the worldwide estimate, but the Arabian Sea
did not, implying that a comprehensive hydrodynamic coastline survey of these areas
might reveal deviations from global trend patterns.

Fig. 4. (a) Bay of Bengal AMSL (mm) and (b) Arabian Sea AMSL (mm)

6 Conclusion

A rising trend in sea level was verified by tide gauge and satellite data in the North
India Ocean region. The pattern was similar to global forecasts in a few places, but
there were notable differences in others. Because there are fewer long-term tidal gauge
records in this location, it’s been challenging to compare historical trends with other
lengthier tide gauge data from across the world. This research revealed that regional
and worldwide trends in some regions are not the same. Zone D showed a trend that
differed from the other sites and the IPCC’s worldwide forecast. Similar variations were
discovered at the Okha and Kandla tidal gauge sites, where each tide gauge reacted
differently from the zonal pattern. Regional and worldwide patterns were also revealed
by the satellite’s data. In order to understand the particular behaviour of each region in
response to climate change, it was necessary to examine the local and regional variance of
MSL. Integrated coastal area management and planning can benefit from a compre-
hensive regional study of MSL variation. However, analysing current data alone is
insufficient to comprehend the reasons behind MSL regional heterogeneity.

References
1. Revelle, R.: Probable Future Changes in Sea Level Resulting from Increased Atmospheric
Carbon Dioxide, Changing Climate. National Academy Press, Washington, D.C (1983)
2. Barnes, P.W.: Effects of elevated temperatures and rising sea level on Arctic coast. J. Cold
Regions Eng. 4(1), 21–28 (1990)
3. Chowdhury, P., Behera, M.R.: A study on regional sea level variation along the Indian coast.
Procedia Eng. 116(1), 1078–1084 (2015)

4. Pramanik, M.K., Biswas, S.S., Mukherjee, T., Kumar Roy, A.: Sea level rise and coastal
vulnerability along the Eastern Coast of India through geo-spatial technologies. J. Remote
Sens. GIS 4(2), 145 (2016). https://doi.org/10.4172/2469-4134.1000145
5. Aral, M.M., Guan, J., Chang, B.: Dynamic system model to predict global sea-level rise and
temperature change. J. Hydrol. Eng. 17(2), 237–242 (2012)
6. Unnikrishnan, A.S., Rupa Kumar, K., Fernandes, S.E., Michael, G.S., Patwardhan, S.K.: Sea
level changes along the Indian coast: observations and projections. Curr. Sci. 90(3), 362–368
(2006)
7. Titus, J.G.: Greenhouse effect, sea level rise, and coastal zone management. J. Coast. Zone
Manage. 14(3), 147–171 (1986). https://doi.org/10.1080/08920758609362000
8. Tiruneh, N.D., Motz, L.H.: Climate change, sea level rise, and saltwater intrusion. J. Hydrol.
Eng. 5(2), 229–237 (2001)
9. Mimura, N.: Sea-level rise caused by climate change and its implications for society. Proc.
Jpn. Acad. Ser. B Phys. Biol. Sci. 89(7), 281–301 (2013)
10. Church, J.A., Gregory, J.M.: Sea level change. In: Encyclopedia Ocean Sciences, pp. 493–
499 (2019). https://doi.org/10.1016/B978-0-12-409548-9.10820-6
Design and Analysis of Water Distribution
Network Using Watergems – A Case Study
of Narangi Village

Usman Mohseni, Azazkhan I. Pathan(&), P. G. Agnihotri, Nilesh Patidar,
Shabir Ahmad Zareer, V. Saran, and Vaishali Rana
Sardar Vallabhbhai National Institute of Technology, Surat, Gujarat 395007, India

Abstract. The water distribution network for potable water supply is necessary for a
well-planned city. The water distribution system ensures that water is supplied from
the distribution centre to the end consumer. Water distribution systems are designed
in such a manner that the demand is fulfilled at adequate supply pressure and at
minimal cost. Due to rapid urbanisation, the demand for water supply and the
pressure to deliver it increase. This leads to damage and leakages in the existing
pipelines and also requires additional pipelines to meet the demand. This paper
investigates the operation of Narangi village's water distribution system in
Maharashtra using Bentley WATERGEMS software. The WATERGEMS software
performs the analysis of the flow of water in each pipe, the height of water in each
tank, the maximization of water flow velocity, and the pressure at each node during
simulation. The software also reveals the crucial locations in the study area, with an
explanation of the causes and their impacts on various aspects of life. This research
is carried out for the years 2020, 2030, 2040, and 2050. According to the findings, as
the population grows from 2020 to 2050, demand, flow, head loss gradient, and
pressure development all increase.

Keywords: WATERGEMS · Water distribution system · Pressure · Water supply

1 Introduction

There is no formula or expression that can adequately express the importance of water
in human, plant, and animal life. Life on our planet cannot continue in the absence
of water [1]. Water is one of the basic demands of all life forms. Without water, there
can be no life [2]. Water has always been acknowledged as a primary good and an
indispensable natural resource. As the standard of living increases, so does the need for
consumption of water for various anthropogenic activities [3]. Water is an important
component and a key element for the socio-economic development of a country. As
such, close attention must be paid to the mode in which water is conveyed to the end
users at their various stops. In many rural areas, wells and ponds are the sources of
domestic water supply, with water predominantly carried in buckets or cans for
day-to-day activities. Over the past two decades, major efforts have taken place to improve

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022


P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 695–706, 2022.
https://doi.org/10.1007/978-3-030-93247-3_67

water treatment and also the water distribution system. With modern technology, water
can be conveniently supplied from the source to every household through pipes. This
mode of transporting water should be designed carefully so that no problems arise
when demand increases with population, or when the supply pressure at the source
must be raised. These considerations are therefore crucial in design. Nowadays,
various software tools and modern data are used to address these issues.

1.1 Water Distribution System


A typical water distribution system consists of network of pipes, nodes linking the
pipes, storage tanks, reservoirs, pumps, additional appurtenances like valve. Water
distributions systems (WDSs) are vital parts of water supply networks with components
which carry potable water from the source or centralized treatment plant to the end
users. A WDS is commonly designed to fulfil the demand of industrial, residential,
institutional, and other commercial purposes. However, the performance of a WDS is
related to its design, its layout, and how it supplies water to its users. The water
supply can be by mechanical pumping, gravity flow, or both. Further, a WDS should
be able to withstand abnormal conditions such as mechanical failure of pipes and
valves, pipe breakage, and control system failures [4].

1.2 Introduction to Watergems


WaterGEMS was initially developed by the company Haestad Methods, Inc., based in
Watertown (USA). This company was acquired by Bentley Systems in 2004, after
which the product became known commercially as Bentley WATERGEMS V8i. It is
an evolution of WaterCAD, the same software launched in the 90s; since both share
the same structure, a model created in WaterCAD can be read in WaterGEMS and
vice versa. For hydraulic simulation of water distribution systems, software such as
WaterCAD, LOOP and EPANET is already in use, yet WaterGEMS is the most
advanced and powerful tool, and it is used in this analysis [5].
WaterGEMS is a multi-platform hydraulic modelling solution for WDS with
geospatial model building, optimization, advanced interoperability and asset manage-
ment tools. WaterGEMS provides the best environment for modern engineers to
design, analyze and optimize a WDS. The software is also useful for managing
water system data, current and future scenarios, and time-series hydraulic results [6].
The objective of this study is to analyze the water supply network for the selected
area and check whether there is any shortage of water at any particular node. With the help
of this study the effect of forecasted resident population and floating population on flow
in pipe and demand at junction can be analyzed.

2 Study Area

The Narangi region has the coordinates 19.47° N, 72.8° E. Narangi is a village in the
city of Virar, in Vasai taluka of Palghar district, Maharashtra. The climate of Narangi
village is tropical, and the village has an average elevation of about 11 m. During
winter the environment is often dry, while in summer the moisture level is quite high.
Narangi experiences devastating rainfall during the monsoon period of June to
September every year. Narangi comes under the jurisdiction of the Vasai-Virar
Municipal Corporation, a newly formed civic body.
There is an existing Elevated Service Reservoir (ESR) with a capacity of
20,00,000 L; the Full Supply Level (FSL) is at an elevation of 36 m, the Low Supply
Level (LSL) at 33 m, and the Ground Level (GL) of the ESR is at 27 m (Fig. 1)

Fig. 1. Location map of study area



3 Literature Review

In this study, the WaterGEMS software was found to be the most suited, easy to use, and
accurate for the design and analysis of major water supply networks. According to the
competent authorities for water supply arrangements in the area, Mangalnath Zone is
one of the most difficult zones for water distribution [7]. In this research, the hydraulic
modelling of the network displays the viability of the proposed network,
allowing for proper implementation of the data and observations in real applications.
The study is based on a research project entitled “Energy-efficient, community-based
water- and wastewater-treatment systems for deployment in India” funded by
Department of Science & Technology (DST) and European Commission (EC) [8]. In
this study there is a pressure fluctuation, which is caused by elevation variations and
draws out at the nodes. At all of the junctions, the system recorded a pressure shortage
in order to satisfy the required demand [9]. In this research the present water distri-
bution system is evaluated in this study using Bentley WaterGEMS to build a model. It
aided in the analysis of the overall network system as well as the visualization of the
effects of individual components and factors [10]. In this study, the WaterGEMS model
was introduced and used to develop the most cost-effective water distribution networks
in Wukro town. The results revealed that the least expensive options were obtained that
consistently met the requisite flow and pressure at the node [11]. According to the
findings of this study, the resulting pressures at all connections and flows with their
velocities at all pipelines are sufficient to provide water to the study region in accor-
dance with consumer needs [6]. The main outcome of this analysis is the water
company’s action scenario, which outlines many phases for connecting and optimizing
maintained water distribution infrastructure [12]. The use of software for distribution
system analysis results in ease of material selection for the distribution system, as
illustrated in the graph. In the future, this will be useful for distribution system analysis.
The graph below depicts the numerous optimum parameters with various materials,
which can be difficult to analyze manually [13]. The contour map production using
Google Earth is a faster and more cost-effective alternative to surveying. The
contour map created for elevation extraction can also be utilized to find a good location
for water supply components. It also aids in determining the location of customers and
taking immediate action in the event of a customer complaint [14]. The best design of a
water distribution system may be determined using a method called linear program-
ming gradient (LPG). The system consists of a pipeline network that transports known
quantities from suppliers to consumers and may include pumps, valves, and reservoirs.
The optimization takes into account the system’s operation under each of a set of
demand loadings. The information needed to determine the gradient of the total cost
with respect to changes in the flow distribution comes from post-optimality analysis of
the linear programme. The gradient is employed to alter the flow patterns in order to

reach a (local) optimum. The approach was implemented in a computer programme,
and several worked examples are reported [15]. Using WaterCAD and Epanet, this
research examined the performance of the Wadata sub-zone water distribution system
in terms of pressure, velocity, hydraulic head loss, and nodal demands. Although there
was no statistical difference between Epanet and WaterCAD findings, Epanet generated
somewhat higher pressure and velocity values in roughly 60% of the cases studied. The
findings of this investigation indicated that the Wadata sub-zone's water distribution
infrastructure is inefficient under present demand [4].

4 Methodology

The following steps have been carried out to analyze the existing water distribution
network using WATERGEMS V8i:
1. Population forecasting: Population data were taken from the Vasai Virar City
Municipal Corporation. In 2010 the population was 7085 and in 2020 the
population was 9902.
The Arithmetical Increase Method was adopted to find the population for the years
2030, 2040 and 2050 (Table 1; a worked check of the forecast follows the table).

Table 1. Population forecasting by arithmetical increase method.


Year Population
2030 12719
2040 15536
2050 18353
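The figures in Table 1 follow directly from the method's standard formula; the lines below are a worked check, not new data:

```latex
P_n = P_0 + n\,\bar{x}, \qquad \bar{x} = 9902 - 7085 = 2817 \text{ per decade}
P_{2030} = 9902 + 1(2817) = 12719,\quad
P_{2040} = 9902 + 2(2817) = 15536,\quad
P_{2050} = 9902 + 3(2817) = 18353
```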

2. Encode the input data: The majority of hydraulic analysis software has similar data
entry requirements. These data are separated into two categories: pipe data and node
data. Pipe data comprise the pipe number, diameter (mm), Hazen-Williams C-value,
and length (m). Node data comprise the node number, elevation (m), and the water
demand allocated to it (lps).
3. Hydraulic network simulation: This step is done by WaterGEMS itself; if all the input
data are entered correctly, the software proceeds with its hydraulic run. The head loss
in each pipe, the rate of head loss in each pipe, the flow velocities, and the pressure at
each node are all computed by the software (Fig. 2); a sketch of the underlying
head-loss relation is given after this list.

Fig. 2. Flow chart of methodology

4. Validation of results: The output computed by WaterGEMS from the input data
needs to be validated to reflect the actual scenario of the water distribution system.
The computed results usually show all conceivable hydraulic parameters.
5. Selecting the network configuration: Simulation is repeated until an acceptable
network configuration is obtained.
6. Result and analysis: The software generates the results in table and graph format.
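To make step 3 concrete: WaterGEMS, like EPANET, typically computes friction head loss from the pipe's C-value (entered in step 2) with the Hazen-Williams equation. The sketch below shows this in SI units; the pipe values echo the results in Sect. 5, but the C-value of 130 is an assumption for illustration, not a parameter reported by the paper:

```python
import math

def hazen_williams_headloss(q_m3s: float, length_m: float, dia_m: float, c: float) -> float:
    """Friction head loss (m) over a circular pipe, Hazen-Williams formula, SI form."""
    return 10.67 * length_m * q_m3s ** 1.852 / (c ** 1.852 * dia_m ** 4.87)

# Illustrative pipe: 478 m of 600 mm main carrying 1036 L/min (pipe 1 in 2020).
q = 1036 / 1000 / 60                           # L/min -> m^3/s
head_loss = hazen_williams_headloss(q, length_m=478, dia_m=0.600, c=130)  # C assumed
velocity = q / (math.pi * 0.600 ** 2 / 4)      # mean velocity (m/s)
print(f"head loss = {head_loss:.4f} m, velocity = {velocity:.3f} m/s")
```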

5 Results and Discussion

In the current study, the water distribution system is simulated through construction of
a model using Bentley WaterGEMS. The study is carried out for the years 2020, 2030,
2040, and 2050. According to the findings, as the population increases from 2020 to
2050, the flow in pipes and the demand at junctions increase. In 2020 the population was
9902, which increases to 18353 by the year 2050. Pipe 1 had a flow rate of 1036 L/min
in 2020, but by 2050, it had increased to 1655 L/min. In 2020, demand at junction 67
was 54 L/min, and by 2050, it had increased to 86 L/min. With the help of this study
the effect of forecasted resident population and floating population on flow in pipe and
demand at junction is analyzed (Table 2).

Table 2. Increase in flow in pipe and demand at junction with increase in population.
Year Population Flow in pipe 1 (l/min) Demand at junction 67 (l/min)
2020 9902 1036 54
2030 12719 1145 59
2040 15536 1400 72
2050 18353 1655 86

A. For the year 2020

Fig. 3. Flow through pipes for year 2020

Fig. 4. Demand at junctions for year 2020



Figures 3 and 4 show a maximum flow of 1036 L/min in pipe 1 and a maximum
demand of 54 L/min at junction 67 for the year 2020.

B. For the year 2030

Fig. 5. Flow through pipes for year 2030

Fig. 6. Demand through junctions for year 2030

Figures 5 and 6 show a maximum flow of 1145 L/min at pipe 1 and a maximum
demand of 59 L/min at junction 67 for the year 2030.

C. For the year 2040

Fig. 7. Flow through pipes for year 2040

Fig. 8. Demand through junctions for year 2040

Figures 7 and 8 show a maximum flow of 1400 L/min at pipe 1 and a maximum
demand of 72 L/min at junction 67 for the year 2040.

D. For the year 2050

Fig. 9. Flow through pipes for year 2050

Fig. 10. Demand through junctions for year 2050

Figures 9 and 10 show a maximum flow of 1655 L/min at pipe 1 and a maximum
demand of 86 L/min at junction 67 for the year 2050.

Fig. 11. Plot showing an increase in pipe flow as the population increases. Fig. 12. Plot
showing an increase in junction demand as the population increases.

6 Conclusion

The current study simulates the existing water distribution system by building a model
with Bentley WaterGEMS. It aided in the analysis of the overall network system as
well as the visualisation of the effects of individual components and factors. Almost all
towns and cities use an intermittent water delivery system, which has major flaws that
contribute to poor water quality and pressures, insufficient volumes, discomfort and
inconvenience, contamination, and other issues. This will be handled appropriately by a
continuous water delivery system. The current water supply method used by municipal
bodies for cities with shortened supply hours not only fails to meet designed hydraulic
requirements, but it is also severely hampered by adverse hydraulics, which causes
many of the key issues affecting local governments to spiral into an avoidable vicious
circle. From Figs. 11 and 12 it is clear that as population increases flow in pipe and
demand at junctions increases. Pipe 1 had a flow of 1036 L/min in 2020, which
increased to 1655 L/min in 2050. In 2020, demand at junction 67 was 54 L/min, rising
to 86 L/min in 2050.

References
1. Kawathe, L.N., Thorvat, A.R.: Analysis and design of continuous water distribution system
against existing intermittent distribution system for selected area in Pandharpur, M.S.,
INDIA. Aquademia 4(2), ep20028 (2020)
2. Masum, M.H., Ahmed, N., Pal, S.K.: Water distribution system modeling by using
EPANET 2.0, a case. Civ. Eng. Sustain. Dev. 0–11 (2020)
3. Georgescu, A.M., et al.: Estimation of the efficiency for variable speed pumps in EPANET
compared with experimental data. Procedia Eng. 89, 1404–1411 (2014)
4. Agunwamba, J.C., Ekwule, O.R., Nnaji, C.C.: Performance evaluation of a municipal water
distribution system using waterCAD and Epanet. J. Water Sanit. Hyg. Dev. 8, 459–467
(2018)

5. Sonaje, N.P., Joshi, M.G.: A review of modeling and application of water distribution
networks (WDN). Int. J. Tech. Res. Appl. 3, 174–178 (2015)
6. Mehta, D., Yadav, V., Prajapati, K., Waikhom, S.: Design of optimal water distribution
systems using WaterGEMS: a case study of Surat city. In: E-proceedings 37th IAHR World
Congr. 1–8 (2017)
7. Rai, R., Dohare, D.: A review on application of water-gems in hydraulic modeling and
designing of water distribution network for Simhastha Mela area in Ujjain. Glob. J. Eng. Sci.
Res. India 6, 400–405 (2019)
8. Roy, P.K., Banerjee, G., Mazumdar, A.: Development and hydraulic analysis of a proposed
drinking water distribution network using WaterGEMS and GIS pollution research, West
Bengal, India. Pollut. Res. 34, 371–379 (2015)
9. Terlumun, U., Robert, E.: Evaluation of municipal water distribution network using
Watercard and Watergems. J. Eng. Sci. 5, 147–156 (2019)
10. Paneria, D.B., Bhatt, B.V.: Analyzing the existing water distribution system of Surat using
Bentley Water GEMS. J. Emerg. Technol. Innov. Res. 4, 19–23 (2017)
11. Berhane, T.G.: Optimization of water distribution system using WaterGEMS: the case of
Wukro Town, Ethiopia. Civ. Environ. Res. 12, 1–14 (2020)
12. Świtnicka, K., Suchorab, P., Kowalska, B.: The optimisation of a water distribution system
using Bentley WaterGEMS software. ITM Web Conf. 15, 03009 (2017)
13. Ekhande, N.A., Hangargekar, P.A.: Optimization water distribution network by using
software. IJNRD 2, 50–59 (2017)
14. Yuvaraj, S.: Data extraction from google earth for modeling a water supply distribution
network in water GEMS. J. Sci. Eng. Technol. 4, 122–129 (2017)
15. Nwajuaku, I.I., Wakawa, Y.M., Adibeli, O.J., Ijeoma, N.: Analysis of head-loss equations
under EPANET and hardy cross method. Saudi J. Eng. Tech. 2, 125–134 (2017)
Weight of Factors Affecting Sustainable Urban
Agriculture Development (Case Study in Thu
Dau Mot Smart City)

Trung Thanh Dang1,2(&), Quang Minh Vo2, and Thanh Vu Pham2
1 Thu Dau Mot University, Thu Dau Mot, Binh Duong, Vietnam
thanhdt@tdmu.edu.vn
2 Can Tho University, Can Tho, Vietnam

Abstract. Agricultural production in urban areas has a direct role in providing
fresh food sources, reducing transportation costs and loss rates during storage
and transportation. This study was conducted with the following objectives:
survey the current status of agricultural production in Thu Dau Mot city
(Vietnam); identify factors affecting sustainable urban agriculture development,
and evaluate the weight of each factor. Implementation methods include a sec-
ondary survey of information and documents; a primary survey of households
through 200 paper questionnaires; and aggregation of the data using Excel, with
the multi-criteria evaluation (MCE) technique applied to determine the weights of
the factors. The research results summarize the actual situation of urban
agricultural production in the area. Four level-1 factors have been identified, with
the following weights: Technology and technique (0.48), Economy (0.24),
Environment (0.16) and Society (0.13). At the same time, four limiting factors to
sustainable urban agriculture development were found from 26 secondary fac-
tors. The global weights of these factors are respectively: Product sales (0.10);
Product preservation and processing (0.09); Solid waste (0.08) and Health and
spirit (0.04). Through identified limiting factors, the study proposes solutions
on: technology and engineering, economy, society and environment. These
findings have practical value in Thu Dau Mot urban development planning in
particular and reference for other similar cities in Vietnam.

Keywords: Factors · Sustainability · Urban agriculture · Thu Dau Mot city · Vietnam

1 Introduction

Urban and peri-urban agriculture is an industry located within ('intra-urban') or on the
fringe ('peri-urban') of a town, a city, or a metropolis, where it develops and enhances,
processes and distributes the diverse products of agriculture, which are derived from both
plants and animals, using human, land and water resources, products, and services
found in and around that urban area [1].
Urban agriculture (UA) is generally accepted by governments, but rarely encour-
aged despite its important contribution to jobs and livelihoods, although this is reported
to be changing [2]. UA in addition to providing food, also includes a multitude of

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022


P. Vasant et al. (Eds.): ICO 2021, LNNS 371, pp. 707–717, 2022.
https://doi.org/10.1007/978-3-030-93247-3_68

social elements and economic services and activities related to play, tourism, health
care and maintenance [3]. Urban poverty is a big problem and UA is one of the
activities that can play an important role in reducing poverty and enhancing wealth
creation opportunities [4]. UA is often assessed as an appropriate solution to address
shortages in local food systems, and submitters often highlight multiple benefits in
terms of: economic, social, and environmental [5]. However, UA can also have neg-
ative ecological impacts, such as increased nutrient content in wastewater [6]. There-
fore, the sustainability of UA may be uncertain, and a bottom-up approach is needed to
assess conditions affecting production and people’s livelihoods [5].
In Vietnam, experts say that it is necessary to have a policy to develop sustainable
UA in order to take advantage of benefits such as reduced packaging, storage and
transportation costs; fresh produce; and the creation of jobs and income [7]. In a case
study, based on the theory of UA and UA systems, author Le Van Truong (2012)
selected 7 criteria to identify UA production systems, including: production
distribution area, subject, purpose of production, product, degree of commercialization,
production technique and technology used, and production scale [8].
Binh Duong province is located in the southern key economic region of Vietnam.
In recent years, Binh Duong has been evaluated as a dynamic and modern developing
locality according to smart city criteria, and for 2 consecutive years (2018 and 2019)
it was recognized by the World Intelligent Community Forum as one of the localities
and regions with exemplary smart city development strategies in the world [9].
Thu Dau Mot city (TDMC) is the political, economic and social center of Binh
Duong province. The natural area of the city is 118.9 km2, the average population
density is 2,738 people/km2, and TDMC has 14 commune-level administrative units [10].
Agricultural production in TDMC makes an important contribution to providing a part
of the food demand of city people, creating jobs for workers and developing urban green
areas. Given the above assessments, it is necessary to carry out a quantitative study on
the factors affecting the sustainable development of UA, and the selected study area is
TDMC.

2 Materials and Methods

2.1 Secondary Data Collection Method


Available documents were collected to support the implementation of the research
content, specifically: scientific articles published in journals, specialized books, and
reports on the results of scientific research topics and projects related to this study.

2.2 Primary Data Collection Method


The purpose is to collect factual information about agricultural production in TDMC.
In this study, the household survey was conducted directly using a printed question-
naire; the main information investigated for the study included:

• General information: Name of household owner, Address, Number of people,


Gender, Type of production.
• Technology and engineering factors: Basic construction, Soil farming, Growing on
substrates, Hydroponics or aeroponics, Automatic irrigation technology, Fertiliza-
tion, Environmental sensors, Production management, Product storage and
processing.
• Economic factors: Production investment capital, Initial costs for capital con-
struction, Production costs, Profit, Product sales.
• Social factors: Education level, Labor, Infrastructure, Consultant, Brand, Land use
right, Health and spirit, State support policy.
• Environmental factors: Ecological landscape, Air pollution, Wastewater, Solid
waste.
Number of survey samples: The Slovin (1984) sampling method for a small
sample size and known population was applied:
n = N/[1 + N·e²], where:
n is the survey sample size; N is the size of the population;
e is the allowable error (reference error levels are 1%, 5% and 10%).
According to data from the local agricultural management agency, the total number
of agricultural production households in the TDMC area is about 400 households, so
with e = 5% the above formula gives a required sample of 200. The survey period ran
from June to December 2020.
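The substitution behind the 200-sample figure is a one-line worked check (arithmetic only, no new data):

```latex
n = \frac{N}{1 + N e^{2}}
  = \frac{400}{1 + 400 \times (0.05)^{2}}
  = \frac{400}{2} = 200
```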
Subjects of the Survey: Households and establishments that are engaged in agricultural
production in the area and have at least 3 years of experience.

2.3 Data Analysis and Processing Methods


The information on factors affecting sustainable agricultural production in
TDMC obtained through the surveys is synthesized using Excel software. At the same
time, the Analytic Hierarchy Process (AHP) is applied within Multi Criteria
Evaluation (MCE) to rank the important factors affecting sustainable agricultural
production. AHP is applied to support a more quantitative basis in strategic planning
[11]. This method has been applied in the following areas: management of forest land
[12], evaluation of strategic marketing plans for tourism development in Sri Lanka
[13], strategic planning of natural resource management [14], and implementation of
an integrated water resource management strategy in Mozambique [15].
The MCE method, developed by Saaty (1980, 2001), provides decision makers
with varying degrees of importance for the criteria.
In the MCE method, the values of the weights (W) determine the importance
of the factors: the more important the criterion, the higher the weight. W has a value
from 0 to 1, and the importance level is divided into 5 categories (Table 1). An AHP
weighting sketch follows the table.

Table 1. Classification of the importance of factors affecting the sustainable UA of TDMC


No. Weights (W) Type
1 0–0.2 Very low
2 0.2< - 0.4 Low
3 0.4< - 0.6 Medium
4 0.6< - 0.8 High
5 0.8< - 1 Very high
(Source: Thornley J. et al. 2007)
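To illustrate how AHP produces weights of the kind reported in this paper, the sketch below derives priority weights and a consistency ratio from a pairwise comparison matrix. The 4 × 4 matrix here is invented for illustration and is not the study's actual expert judgments:

```python
import numpy as np

# Hypothetical Saaty pairwise matrix for four level-1 factors
# (Technology, Economy, Environment, Society); A[i, j] = importance of i over j.
A = np.array([
    [1.0, 3.0, 4.0, 4.0],
    [1/3, 1.0, 2.0, 2.0],
    [1/4, 1/2, 1.0, 1.0],
    [1/4, 1/2, 1.0, 1.0],
])

eigvals, eigvecs = np.linalg.eig(A)
k = np.argmax(eigvals.real)              # index of the principal eigenvalue
w = np.abs(eigvecs[:, k].real)
w /= w.sum()                             # normalized priority weights

n = A.shape[0]
ci = (eigvals[k].real - n) / (n - 1)     # consistency index
cr = ci / 0.90                           # random index RI = 0.90 for n = 4
print('weights:', np.round(w, 3), '| CR:', round(cr, 3))  # CR < 0.1 acceptable
```

For this illustrative matrix the weights come out around 0.54, 0.22, 0.12 and 0.12, ordered like the level-1 weights reported in the abstract.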

Based on the background information obtained through the surveys, the factors
affecting the development of sustainable UA in TDMC are summarized in Table 2.

Table 2. Level 1 and Level 2 factors affecting sustainable agricultural production in TDMC
No. Level 1 element Level 2 element
1 Technology and engineering Membrane house
2 Farming on the land
3 By means of scaffolding
4 Hydroponics, aeroponics
5 Automatic watering
6 Automatic fertilizer
7 Environmental sensor
8 Product storage and processing
9 Production manager
10 Economic Investment
11 Basic construction cost
12 Production cost
13 Profit
14 Sell products
15 Social Educational level
16 Labor
17 Infrastructure
18 Consultants
19 Trademark
20 Land use rights
21 Health and spirit
22 Supporting policies
23 Environmental Ecological landscape
24 Air pollution
25 Wastewater
26 Solid waste
(Source: compiled from survey results)

The data in Table 2 show that there are 4 factors at level 1 and 26 factors at level 2 affecting the development of sustainable UA production in TDMC. To evaluate the influence of these factors, their weights are determined through AHP hierarchical analysis and MCE; the results are presented in Sect. 3.2.

3 Results and Discussions

The presented research results include: the current status of agricultural production in TDMC; the factors affecting sustainable UA development; and proposed solutions for sustainable UA development in TDMC.

3.1 Current Status of UA Development in TDMC


The agricultural land area of TDMC is 2,937.05 ha, accounting for 24.70% of the city's total natural area (Table 3 and Fig. 1).

Table 3. Status of land use in 2020 of TDMC


No.      Soil type                          Land code   Area (ha)    Rate (%)
         Total area                                     11,890.58    100.00
1        Agricultural land                  NNP         2,937.04     24.70
1.1      Land for agricultural production   SXN         2,914.32     24.51
1.1.1    Annual crop land                   CHN         679.66       5.72
1.1.1.1  Land for rice cultivation          LUA         –            –
1.1.1.2  Land for other annual crops        HNK         679.66       5.72
1.1.2    Land for perennial crops           CLN         2,234.66     18.79
1.2      Forestry land                      LNP         –            –
1.3      Aquaculture land                   NTS         11.94        0.10
1.4      Other agricultural land            NKH         10.78        0.09
2        Non-agricultural land              PNN         8,953.54     75.30
(Source: compiled from [10])

Fig. 1. Structure of land types according to the purpose of use of TDMC



The data in Table 3 show that land for perennial crops occupies most of the agricultural area, because it suits the soil characteristics and the low-hill topography of the area (the transition between the plateau and the plains), as evaluated under natural conditions. The main crops here are perennial industrial crops such as rubber, and perennial fruit trees.
The current status of agricultural production in TDMC is described for the following fields: cultivation, animal husbandry and aquaculture.
Cultivation: Seed crops have a cultivated area of 31.60 ha, with an output of 112.70 tons. The remaining annual crops have a planted area of 620.16 ha (Table 4), mainly vegetables of all kinds (383.20 ha) and flowers (2.50 ha); other crops, such as peanuts and beans of all kinds, are also grown [10].
Perennial Trees: The rubber planting area is 108.50 ha, of which 85.20 ha are harvested, with an output of 136.70 tons. The area of fruit trees is 241.80 ha, of which mangosteen occupies the main area due to its high economic value and the suitability of the local soil and climate for this tree.

Table 4. Area, yield and output of some annual crops


No.  Crop type                      Area (ha)  Productivity (ton/ha/crop)  Output (ton/year)
1    Groundnut                      4.40       1.64                        21.60
2    Beans of all kinds             8.20       1.95                        48.00
3    Fruit vegetables, tubers       86.91      25.00                       15,209.30
4    Leafy vegetables of all kinds  272.65     30.00                       81,795.00
5    Flowers of all kinds           2.50       –                           –
6    Other annual plants            245.50     –                           –
     Total                          620.16
(Source: [10] and investigation)

Livestock: The total herd of cattle (buffalo, cows, pigs, goats) in TDMC is 3,249 heads, accounting for 0.48% of the province's total cattle herd. The output of live buffalo sold for slaughter is 8.0 tons, accounting for 1.52% of the province's total output; live beef sold for slaughter is 79 tons (2.56% of the province's output); and live pork for slaughter is 101 tons (0.07% of the province's output) [10].
The data in Table 5 and the field survey show that livestock is not a strength of TDMC, mainly because the limited land area restricts grazing, labor costs are high, and wastewater affects the environment.

Table 5. The current situation of livestock and poultry raising in the area of TDMC
No.  Herd     Quantity (head)  Whole province (head)  Percentage of province (%)
1    Buffalo  289              5,178                  5.58
2    Cow      2,285            25,044                 9.12
3    Pig      633              640,984                0.10
4    Goat     42               2,845                  1.48
5    Poultry  59,000           11,858,000             0.50
(Source: [10] and investigation)

Aquaculture: The aquaculture water surface area is 11.94 ha, of which 3.10 ha are devoted to production aquaculture with an output of 85 tons. The remaining area combines aquaculture with entertainment and resort services serving the needs of urban residents and surrounding areas.

3.2 Factors Affecting the Sustainable Development of UA

Level 1 Elements: According to the analysis results, the most important factor is Technology and engineering (W = 0.477) (Table 6 and Fig. 2), likely because production is constrained by land area and labor, so automation technologies and techniques are needed to improve production efficiency. Economy is the second most important factor: once UA develops into a manufacturing industry, economic efficiency must be prioritized to generate income and reinvest in production. Environment ranks third, because as the economy and people's living standards improve, the environment receives more attention; Society is the factor with the lowest weight.
Level 2 Elements:
Regarding Technology and Engineering: The most important level 2 factor is product storage and processing (0.185), because the products grown in urban areas are mainly vegetables and fruit, which must be preserved or processed to reduce the loss rate (Fig. 3a).
Regarding Economy: The factor of greatest concern is product sales (0.405); farmers need solutions for the purchase of agricultural products from production facilities and households, because many producer households are unable to bring their products to market themselves (Fig. 3b).
Regarding Society: The health and spirit of the people has the highest importance (0.343), reflecting urban residents' focus on quality of life, specifically the health and well-being of themselves, their loved ones, and the community (Fig. 3c).
Regarding the Environment: The limiting factor for sustainable UA development in TDMC was identified as solid waste (0.534), because production generates large quantities of agricultural by-products and solid waste, which affect the environment and are expensive to treat (Fig. 3d).

Table 6. Level and global weights of factors affecting sustainable UA development in TDMC

No.  Level 1 element (weight W1)          Level 2 element                 W2     Global weight (W = W1 * W2)
1    Technology and engineering (0.477)   Membrane house                  0.054  0.026
2                                         Farming on the land             0.048  0.023
3                                         By means of scaffolding         0.064  0.030
4                                         Hydroponics, aeroponics         0.086  0.041
5                                         Automatic watering              0.158  0.076
6                                         Automatic fertilizer            0.103  0.049
7                                         Environmental sensor            0.153  0.073
8                                         Product storage and processing  0.185  0.088
9                                         Production manager              0.148  0.071
10   Economic (0.238)                     Investment                      0.156  0.037
11                                        Basic construction cost         0.172  0.041
12                                        Production cost                 0.127  0.030
13                                        Profit                          0.140  0.033
14                                        Sell products                   0.405  0.096
15   Social (0.127)                       Educational level               0.017  0.002
16                                        Labor                           0.029  0.004
17                                        Infrastructure                  0.026  0.003
18                                        Consultants                     0.039  0.005
19                                        Trademark                       0.094  0.012
20                                        Land use rights                 0.175  0.022
21                                        Health and spirit               0.343  0.044
22                                        Supporting policies             0.277  0.035
23   Environmental (0.158)                Ecological landscape            0.101  0.016
24                                        Air pollution                   0.098  0.015
25                                        Wastewater                      0.267  0.042
26                                        Solid waste                     0.534  0.084

Fig. 2. Factors affecting sustainable UA development in TDMC


Fig. 3. Level 2 factor weighting on Technology and farming techniques (a), Economy (b),
Society (c) and Environment (d)

The data in Table 6 show that the four highest global weights among the 26 factors are, from top to bottom: product sales, product storage and processing, solid waste, and health and spirit. This result shows that, to develop sustainable UA in TDMC, it is necessary to address the limiting factors that people care most about: selling products, preserving and processing products, solid waste, and health and spirit. The results of this assessment serve as the basis for building solutions for sustainable UA development in TDMC (presented in Sect. 3.3).
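As a hedged illustration of how Table 6 yields this ranking, the global weights follow directly from W = W1 × W2; the snippet below recomputes the four leading factors from the table's values (Python, with data copied from Table 6):

```python
# Global weight of each level 2 factor is the product of its level 1
# and level 2 weights (W = W1 * W2), using four rows from Table 6.
level1 = {"Technology and engineering": 0.477, "Economic": 0.238,
          "Social": 0.127, "Environmental": 0.158}
level2 = [("Economic", "Sell products", 0.405),
          ("Technology and engineering", "Product storage and processing", 0.185),
          ("Environmental", "Solid waste", 0.534),
          ("Social", "Health and spirit", 0.343)]

# Sort by global weight, descending; reproduces 0.096, 0.088, 0.084, 0.044.
for group, factor, w2 in sorted(level2, key=lambda t: level1[t[0]] * t[2],
                                reverse=True):
    print(f"{factor}: {level1[group] * w2:.3f}")
```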

3.3 Solutions for Sustainable UA Development in TDMC

Technology and Engineering: The Industrial Revolution 4.0 is opening up many opportunities for developing countries like Vietnam, which should grasp this trend and inherit the technological achievements of advanced countries, applying them to each locality and field of agricultural production. According to the weight analysis in Sect. 3.2, technology and engineering has the highest level 1 weight, and at level 2 the storage and processing of products carries a large weight, so development investment should be concentrated there. The specific solutions proposed for sustainable UA development in TDMC are: the Department of Science and Technology cooperates with the Department of Agriculture and Rural Development to deploy production models applying modern technologies and techniques, transferring and replicating them for farmers; establishing a group of technical consultants for each production field to help farmers confidently apply technologies and techniques in production; and disseminating modern, smart production technologies and techniques for each production field on local thematic television channels at times convenient for people to watch.
Economic: The most heavily weighted level 2 factor is product sales. The specific solutions proposed are: the Department of Industry and Trade surveys the demand for agricultural products to be purchased and consumed by supermarkets, trade centers, wholesalers and retailers of food and foodstuffs in the area (such as AEon, BigC, Coop mark, Mega mall, Vinmark, Bach Hoa Xanh, and convenience stores in residential areas), while acting as a bridge helping contract farmers supply agricultural products to these business systems; and promptly providing market price information for agricultural products and materials on local television and radio channels.
Social Factors: The most heavily weighted level 2 factor is health and spirit. The specific solutions proposed are: select production models suitable for the urban core and each suburban area; choose models that need less labor and can take advantage of part-time workers and people over working age; develop models with landscape value that create urban green space and entertainment space; and use input materials that are less harmful to the health of both the producers and the users of the products.
Environment: The most heavily weighted level 2 factor is solid waste. The specific solutions proposed are: the Department of Natural Resources and Environment expands the waste separation-at-source program; provides free trash bins according to the classification regulations; and disseminates measures to recycle and reuse agricultural by-products.

4 Conclusion

The study identified 4 level 1 factors and, from 26 level 2 factors, found 4 criteria limiting sustainable UA development. From these limiting factors, the study proposes solutions on technology and engineering, economy, society and environment. The results provide a scientific and practical basis for planning smart urban construction in TDMC to 2030 with a vision to 2050.

The results also have reference value for other similar cities in Vietnam. Future research directions include: clean and safe agricultural production processes in urban areas; smart agricultural production models in urban areas; closed production and processing; and production-consumption chains.

References
1. World Bank: Urban Agricultural Findings from Four City Case Studies. Urban Development
Series, 80759, vol. 18, p. 88, July 2013
2. De Bon, H., Parrot, L., Moustier, P.: Sustainable urban agriculture in developing countries.
A review. Agron. Sustain. Dev. 30, 21–32 (2010). https://doi.org/10.1051/agro:2008062
3. Butler, L., Moronek, D. (eds.): Urban and Agriculture Communities: Opportunities for
Common Ground, p. 124. Council for Agricultural Science and Technology, Ames (2002).
ISBN 1-887383-20-4 (paper)
4. Carsan, S., Osino, D., Opanga, P., Simons, A.J.: Urban agroforestry products in Kisumu,
Kenya: a rapid market assessment. In: Prain, G., Lee-Smith, D., Karanja, N. (eds.) African
Urban Harvest, pp. 249–266. Springer, New York (2010). ISBN 978-1-4419-6249-2
(hardcover), https://doi.org/10.1007/978-1-4419-6250-8_13
5. Cook, J., Oviatt, K., Main, D.S., Kaur, H., Brett, J.: Re-conceptualizing urban agriculture: an
exploration of farming along the banks of the Yamuna River in Delhi, India. Agric. Hum.
Values 32(2), 265–279 (2014). https://doi.org/10.1007/s10460-014-9545-z
6. Taylor, J.R., Lovell, S.T.: Urban home food gardens in the Global North: research traditions
and future directions. Agric. Hum. Values 31(2), 285–305 (2013). https://doi.org/10.1007/
s10460-013-9475-1
7. Ánh, H.T.N.: The bright spot of urban agriculture development in Hai Phong. Finan. Mag.
(Vie), 107–108 (2016)
8. Trưởng, L.V.: Develop criteria to identify urban agricultural production systems in Thanh
Hoa city. Econ. Dev. Mag. (Vie) 182(II), 68–74 (2012)
9. People's Committee of Binh Duong Province: Project of Binh Duong Smart City in 2020.
Plan No. 09/KH-UBND, p. 18, 22 January 2020
10. Binh Duong Statistical Office: Statistical Yearbook 2020, p. 546. Statistical Publication (2020)
11. Gao, C., Peng, D.: Consolidating SWOT analysis with nonhomogeneous uncertain
preference information. Knowl. Based Syst. 24, 796–808 (2011). https://doi.org/10.1016/j.
knosys.2011.03.001
12. Kangas, J., Kurttila, M., Kajanus, M., Kangas, A.: Evaluating the management strategies of a
forestland estate-the SOS approach. J. Environ. Manag. 69, 349–358 (2003). https://doi.org/
10.1016/j.jenvman.2003.09.010
13. Wickramasinghe, V., Takano, S.: Application of combined SWOT and analytic hierarchy
process (AHP) for tourism revival strategic marketing planning: a case of Sri Lanka tourism.
J. Eastern Asia Soc. Transp. Stud. 8, 954–969 (2010). https://doi.org/10.11175/EASTS.8.
954
14. Pesonen, M., Kurttila, M., Kangas, J., Kajanus, M., Heinonen, P.: Assessing the priorities
using A’WOT among resource management strategies at the Finnish forest and park service.
Forest Sci. 47, 534–541 (2001). https://doi.org/10.1093/forestscience/47.4.534
15. Gallego-Ayala, J., Juizo, D.: Strategic implementation of integrated water resources
management in Mozambique: an A’WOT analysis. Phys. Chem. Earth 36, 1103–1111
(2011). https://doi.org/10.1016/j.pce.2011.07.040
Factors Behind the World Crime Index:
Some Parametric Observations Using
DBSCAN and Linear Regression

Shahadat Hossain1(B), Md. Manzurul Hasan2, Md. Mahmudur Rahman3, and Mimun Barid4

1 City University, Dhaka, Bangladesh
shahadat.cse@cityuniversity.edu.bd
2 American International University-Bangladesh (AIUB), Dhaka, Bangladesh
manzurul@aiub.edu
3 Bangabandhu Sheikh Mujibur Rahman Aviation and Aerospace University (BSMRAAU), Dhaka, Bangladesh
mahmud@bsmraau.edu.bd
4 University of South Asia, Dhaka, Bangladesh

Abstract. Escalating crime rates are a globally concerning problem. Various components (such as happiness factors, education index, GDP, and population density) impact countries' crime indexes positively or negatively. This study sheds insight, through parametric analysis, into the elements that influence a country's increasing crime index, and the analysis of the crime index provides some evidence that these elements are related. First, we build clusters using density-based spatial clustering of applications with noise (DBSCAN) and discover commonalities among those countries. We then use linear regression to link other key characteristics with those countries' respective crime indexes, and study the trends of those elements in different countries to see how decreasing happiness factors affect the crime indexes. Finally, additional relative analyses reveal some significant undulations within the components underlying the crime indexes.

Keywords: Crime index · DBSCAN · Linear regression · Data visualization · Happiness factors

1 Introduction
An increasing crime index is a global issue that significantly degrades a society's quality of life. Although the definition of crime varies by country, extreme crimes always pose severe dangers to quality of life. As the literature has grown, the variables impacting a country's crime index have expanded to include education, inequality, unemployment, human development, and urbanization. We aim to investigate
the impacts of these socioeconomic factors on the crime index. Unlike the existing literature, this paper focuses on parametric analyses of the world crime index using different types of parameters and socioeconomic variables that impact the crime index. Although statistical approaches are widely used to study crime and its undulations, variations, patterns, reasons, prevention, and avoidance, parametric analyses of crime remain largely unexplored; therefore, parametric analyses of the crime index across multiple dimensions need to be carried out. Specific attributes from the world happiness report datasets motivate us to analyze the involvement of happiness factors in the crime index: multiple factors contribute to the happiness measurement, and some of them impact a country's crime index. These parametric analyses illuminate those impacts on a country's crime index.
Terrorist attacks also play a role in the crime index of any country. We observe the impacts of various parameters on the crime indexes of countries derived from multiple datasets. Since the eighteenth century, criminologists have examined regional differences in recorded crime rates. The crime rate of any country depends on the socioeconomic and social conditions of its people [4], who play a vital role in the country's factors. Terror attacks are another danger for any country, and the happiness and social conditions of a country's people influence these types of acts. If the economic standing of the population is low, criminal activity becomes more likely. This study concentrates on the point that decreasing happiness factors increase the crime rate of a country.
All the observations and findings from the collective datasets are recorded and investigated in this paper. Some clusters reveal similarities among the countries where terror attacks have happened at least once. The influences of countries' GDP, perceptions of corruption, education indexes, freedom to make choices, population densities, and life ladder scores are examined through parametric analyses. The rest of this article is organized as follows.
Section 2 describes the related works. In Sect. 3, we go into the study’s tech-
niques. Section 4 contains a description of our datasets’ dimensions. In Sect. 5,
we present some of our findings from the datasets. Section 6 describes our results
and findings before concluding in Sect. 7.

2 Related Works

The crime indexes of different countries have been investigated in various contexts. Shabbir et al. [19] examined countries with high, moderate, and minor crime indexes to find out the reasons, using three groups of variables (socioeconomic, demographic, and deterrence) to validate the result. Their findings show that the GDP growth rate and urbanization have a significant negative impact on the crime indexes of countries at moderate scales. Shepley et al. [20] reviewed 45 major US English-language articles on the relation between green space and the frequency of violence, and found that green space has a mitigating impact on violence in urban areas. Ghani et al. [12] conducted a qualitative comparison of urban crime in Malaysia and Nigeria. The authors discussed
four impact factors (unemployment, poverty, bad governance, and weaknesses in law enforcement) that influence crime in urban areas, and showed that poverty is the primary cause of crime. An empirical study of the relationship between education and crime was conducted by Rakshit et al. [17]: using data collected from 33 Indian states between 2001 and 2013, they showed that a 1% increase in educational enrollment reduces crime rates by 8%.
Richmond et al. [18] analyzed a dataset of 1.7 million New Zealanders to find
out the relations among health, crimes, and social welfare. They also studied
whether poor health, crime, and social welfare dependency aggregate within the
same individuals. They found that more investment in education and training
could mitigate crime rates, and thus it will increase well-being. Brown et al. [2]
discussed the role of education in reintegrating into society after prison. They
researched the female prisoners who finished their prison lives and tried to rein-
tegrate into society. They suggested that higher education plays a vital role in
females’ successful reintegration into society after prison.
Coccia [6] conducted a study of 191 countries on the relation between crime and economic inequality, exploring the socioeconomic reasons that lead to violence and crime; overall, the results show that economic inequality is a significant factor generating more crime worldwide. Brown et al. [3] explored a dataset of homicides of Mexican families to determine the effect of crime on education and the relation of crime to economic stability, considering homicide as a measure of violence; they show that young people are leaving education to join drug-related crime in order to cope with economic hardship, so the education rate is decreasing and crime is increasing. Bondy et al. [1] conducted an empirical study to find
and crimes are increasing. Bondy et al. [1] conducted an empirical study to find
out the relationship between air pollution and crime using a one-year dataset of
London city. They used two different identification strategies and found a pos-
itive link between air pollution and crime rates. The analysis also shows that
crimes caused by air pollution have more impact in less wealthy areas. Locher
[14] explored different current surveys of education and crime and tried to find
out a relation between these two. The author conducted this study from an eco-
nomic perspective and found a strong relationship between education and crime,
and also suggested that more social benefits can reduce the crime rate.
Nadai et al. [8] propose a Bayesian model to depict the socioeconomic relationships with crime in different cities; the authors analyzed crime in small areas, integrated it with different open-source data, and found a positive relationship between socioeconomic conditions and crime rates. Cheteni et al. [5] investigated a dataset of South Africa between 1995 and 2016 to find the link between poverty and drug-related crime, applying the ARDL-ECM approach, which examines the cointegration relationships of the variables; the result shows that poverty plays the most vital role in drug-related crime in both the long and short term. Giménez-Santana et al. [13] researched the link between socioeconomic segregation and crime in Colombia using the Risk Terrain Modeling (RTM) technique, which is used to find a relationship between
place and crime; their analysis shows that the most impoverished areas have more crime than affluent areas. Chaudhuri et al. [4] addressed the relationship between socioeconomic factors and crime in different states of India based on Indian government census data, using an aggregate crime index function to determine the impact of the socioeconomic environment on crime. The result shows that greater socioeconomic development leads to more jobs, which decreases crime rates significantly.

3 Methodology

This section outlines the methodologies and processes of our analyses, including the data gathering process and the other strategies used in this research.

3.1 Data Collection

We collect information from various credible sources. For each country, there are 13 columns combining values from different datasets. First, the overall number of terror attacks is gathered from Kaggle [7], an open-source data repository; we also get the data for the happiness features from the same repository [21]. The happiness index in the dataset is measured with the following parameters: household, income, work, communities, civic involvement, education, health, life, safety, and work-life balance in general. Next, for the education, crime index, population, and population density statistics, we use the United Nations Office on Drugs and Crime (UNODC), United Nations (UN), and United Nations Development Programme (UNDP) data repositories [9,15,16]. Finally, we choose 120 countries based on the number of terror attacks; no null values are accepted during data collection. We then sort, clean, and store the new dataset for our analyses using MS Excel and Google Sheets.

3.2 Density-Based Spatial Clustering of Applications with Noise (DBSCAN)

DBSCAN is a clustering algorithm that forms clusters based on density properties [10]. It relies on the density of points that lie close to each other: whenever it selects a point, it considers a circular region around it with radius eps and checks whether that region contains at least min_samples points, both values being specified initially. A cluster is formed when a point satisfies two properties: each point is density-connected to every other point, and every point is density-reachable from some other point in the cluster (Fig. 1). Outliers are reduced to noise points, since only points in high-density regions are considered.
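As a minimal sketch, this step can be reproduced with scikit-learn, which the paper reports using in Sect. 6 with eps = 0.35 and min_samples = 6. The feature matrix below is a random placeholder for the 120-country dataset, and note that scikit-learn labels noise points as -1 (the paper refers to the outlier cluster as '0'):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import DBSCAN

# X: rows = countries, columns = features such as GDP and crime index
# (random placeholder standing in for the paper's dataset).
X = np.random.rand(120, 2)

# Scale features to a common range, then cluster.
X_scaled = MinMaxScaler().fit_transform(X)
labels = DBSCAN(eps=0.35, min_samples=6).fit_predict(X_scaled)

# scikit-learn marks outliers with the label -1; the rest are clusters.
print("clusters:", set(labels) - {-1}, "noise points:", (labels == -1).sum())
```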

Fig. 1. Density-based spatial clustering of applications with noise (DBSCAN)

3.3 Linear Regression


Linear regression is a modeling process that builds a relationship between a scalar dependent variable and one or more independent variables. With one independent variable it is called simple linear regression; with more than one, multiple linear regression [11]. In the following equation, y is the vector of dependent values and X is the matrix of regressors, with observations indexed i = 1, \ldots, n; \beta is the parameter vector and \varepsilon is the error term:

y = X\beta + \varepsilon
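A minimal sketch of fitting such a model with sklearn.linear_model, the library the paper reports using in Sect. 6; the data here are synthetic placeholders, with a negative slope built in to mirror the reported inverse education-crime relationship:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical vectors: education index (regressor) and crime index (response).
edu_index = np.random.rand(120, 1)
crime_index = 80 - 30 * edu_index.ravel() + np.random.randn(120) * 5

model = LinearRegression().fit(edu_index, crime_index)
# A negative slope indicates the inverse relationship reported in Sect. 6.
print("slope:", model.coef_[0], "intercept:", model.intercept_)
```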

4 Dimension of the Dataset


Since we use a collective dataset for this analysis, we have to reconcile its different sources. The first dataset is the world happiness report from Kaggle, which contains 1,949 records with 11 attributes; from it we gather the happiness traits of the world's countries in different years, averaging over the years to obtain one value per attribute. Next, we choose the countries based on terror attacks, using the dataset titled 'globalterrorismdb 0718dist', from which we take only the number of terror attacks around the world. After that, we gather each country's crime index, education index, and population from the UNODC, UNDP, and UN data repositories. Finally, we accumulate all the attributes according to our requirements in an MS Excel datasheet for further analysis; the final dataset contains 13 attribute values for 120 countries. We compute the correlations in the collected data: Fig. 2 shows a heatmap of the correlations among the features. We observe a noticeable correlation among 'gdp', 'crime index' and 'edu index', and an effective correlation between 'social support' and 'ladder score'.

Fig. 2. Heatmap of the features

5 Some Observations over the Dataset

We describe some insights from our dataset using data visualization tools, plotting the data from our collective dataset in several map visualizations (Fig. 3). Greener regions indicate the highest values and reddish regions the lowest feature values in the maps; we choose the primary and most practical features for this plotting. Figure 3a shows the total population around the world, with the highest populations in the South Asian subcontinent (India and China). From Fig. 3b we find that Norway has the highest GDP, whereas Uganda has the lowest. Most South Asian countries have a poor education index compared to the rest of the world (Fig. 3c), and we observe some relation between GDP and education index around the world. We then plot the number of terrorist attacks for each country on the map (Fig. 3d).

6 Results and Findings

We obtain several findings after applying linear regression to our dataset using sklearn.linear_model, a Python library. For example, Fig. 4 indicates that the crime index and education index are negatively interrelated. Furthermore, the attributes GDP, perception of corruption, and ladder score are inversely proportional to the crime index: as any of GDP, education index, perception of corruption, or ladder score increases, the crime index of a country decreases.
Fig. 3. Different world index parameter visualization: (a) population around the world, (b) GDP around the world, (c) education index around the world, (d) number of terror attacks around the world.

Fig. 4. Linear regression models from the dataset: (a) education index vs crime index, (b) GDP vs crime index.

We use the DBSCAN algorithm from sklearn.cluster in Python for clustering, taking min_samples = 6 and eps = 0.35. After applying DBSCAN, we obtain 6 clusters from the dataset, with cluster '0' for the outliers. In the clusters, we observe that the characteristics of the countries remain

identical in the same cluster. Most clusters hold countries with high crime index values (such as Afghanistan, Argentina, and Bangladesh).

Fig. 5. DBSCAN clusters.

Within the clusters, we discover that all countries hold corruption perception values between 0.70 and 0.84, indicating that countries with higher perceptions of corruption have similar crime indexes. Besides, the ladder scores of the countries vary between 3.59 and 4.75; the life ladder scores of countries in the same cluster are not at the highest level. The clusters have an average GDP of 8.92, with Argentina holding the highest GDP value among the clusters (10.03) and Afghanistan the lowest (7.65). For example, 6 DBSCAN clusters based on crime index and GDP are shown in Fig. 5.
After clustering, we plot each country's attribute values in a polyline chart (Fig. 6). We discover a relationship between the attributes freedom to make choices and ladder score of each country. We scale the attribute values down to between 0 and 1 before plotting them, and notice that these two values undulate and oscillate in the same way. All countries where terror attacks have occurred show similar freedom to make choices and similar ladder scores; whenever freedom to make choices decreases, the ladder score decreases as well.
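The 0-to-1 scaling mentioned above is plain min-max normalization; a small sketch follows (column names and values are illustrative, not the paper's data):

```python
import pandas as pd

def min_max(series: pd.Series) -> pd.Series:
    """Rescale a column to the [0, 1] range before plotting."""
    return (series - series.min()) / (series.max() - series.min())

# Illustrative stand-ins for the two plotted attributes.
df = pd.DataFrame({"freedom_to_make_choices": [0.4, 0.7, 0.9],
                   "ladder_score": [3.6, 4.2, 4.7]})
scaled = df.apply(min_max)
print(scaled)
```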

Fig. 6. Undulation in life ladder score and freedom to make choices index

7 Future Research Directions and Conclusion


Positive and negative correlations exist between numerous happiness measures, yet happiness and the crime index do not change simultaneously. Another aspect that influences a country's crime index is its population. This study analyzes the correlations among the factors that raise the crime index; by focusing on these related factors, any country can aim to lower its crime index. The factors have been analyzed, and substantial correlations have been found among them. As a result of this parametric analysis, countries can see which components need to be enhanced or reduced to lower the crime index. Research is an ongoing process, and no research effort is final. In the future, we will examine the changes in the parameters and their influences on the crime index over time. We believe that our study will encourage fellow researchers to look into crime index growth and its other aspects.

References
1. Bondy, M., Roth, S., Sager, L.: Crime is in the air: the contemporaneous rela-
tionship between air pollution and crime. J. Assoc. Environ. Resour. Econ. 7(3),
555–585 (2020)
2. Brown, M., Bloom, B.E.: Women’s desistance from crime: a review of theory and
the role higher education can play. Sociol. Compass 12(5), e12580 (2018)
3. Brown, R., Velásquez, A.: The effect of violent crime on the human capital accu-
mulation of young adults. J. Dev. Econ. 127, 1–12 (2017)
4. Chaudhuri, K., Chowdhury, P., Kumbhakar, S.C.: Crime in India: specification and
estimation of violent crime index. J. Prod. Anal. 43(1), 13–28 (2015)

5. Cheteni, P., Mah, G., Yohane, Y.K.: Drug-related crime and poverty in South
Africa. Cogent Econ. Financ. 6(1), 1534528 (2018)
6. Coccia, M.: Violent crime driven by income inequality between countries. Turkish
Econ. Rev. 5(1), 33–55 (2018)
7. National Consortium for the Study of Terrorism: Global terrorism database.
https://www.kaggle.com/START-UMD/gtd. Accessed 20 July 2021
8. De Nadai, M., Xu, Y., Letouzé, E., González, M.C., Lepri, B.: Socio-economic, built
environment, and mobility conditions associated with crime: a study of multiple
cities. Sci. Rep. 10(1), 1–12 (2020)
9. United Nations Office on Drugs and Crime: Office on drugs and crime. https://
dataunodc.un.org/. Accessed 12 July 2021
10. Ester, M., Kriegel, H., Sander, J., Xu, X.: A density-based algorithm for discovering
clusters in large spatial databases with noise. In: Simoudis, E., Han, J., Fayyad,
U.M. (eds.) Proceedings of the Second International Conference on Knowledge
Discovery and Data Mining (KDD-96), Portland, Oregon, USA, pp. 226–231. AAAI
Press (1996)
11. Freedman, D.A.: Statistical Models: Theory and Practice. Cambridge University
Press, Berkeley (2009)
12. Ghani, Z.A.: A comparative study of urban crime between Malaysia and Nigeria.
J. Urban Manag. 6(1), 19–29 (2017)
13. Giménez-Santana, A., Caplan, J.M., Drawve, G.: Risk terrain modeling and socio-
economic stratification: identifying risky places for violent crime victimization in
Bogotá, Colombia. Eur. J. Crim. Policy Res. 24(4), 417 (2018)
14. Lochner, L.: Education and crime. In: The Economics of Education, pp. 109–117.
Elsevier (2020)
15. The United Nations: Department of economic and social affairs population dynam-
ics. https://population.un.org/wpp/Download/Standard/Population/. Accessed
24 June 2021
16. United Nations Development Programme: Human development report. http://hdr.
undp.org/en/indicators/103706. Accessed 24 June 2021
17. Rakshit, B., Neog, Y.: Does higher educational attainment imply less crime? Evi-
dence from the Indian states. J. Econ. Stud. (2020)
18. Richmond-Rakerd, L.S., et al.: Clustering of health, crime and social-welfare
inequality in 4 million citizens from two nations. Nat. Hum. Behav. 4(3), 255–
264 (2020)
19. Shabbir, S., Ali, Q., Yaseen, M.R.: Crime and labor market: a panel data analysis.
Eur. Online J. Nat. Soc. Sci. 6(3), 343 (2017)
20. Shepley, M., Sachs, N., Sadatsafavi, H., Fournier, C., Peditto, K.: The impact of
green space on violent crime in urban environments: an evidence synthesis. Int. J.
Environ. Res. Public Health 16(24), 5119 (2019)
21. Singh, A.: Education index. https://www.kaggle.com/ajaypalsinghlo/world-happiness-report-2021. Accessed 21 June 2021
Object Detection in Foggy Weather
Conditions

Prithwish Sen(B), Anindita Das, and Nilkanta Sahu

Indian Institute of Information Technology, Guwahati, India
prithwish.sen@iiitg.ac.in

Abstract. In this paper, we address the problem of object detection under foggy weather in an outdoor environment. State-of-the-art object detection schemes perform very well in normal weather conditions, but many of them fail in adverse weather. We propose a novel approach to remove fog, followed by object detection using PP-YOLO. Fog removal is achieved using a U-shaped network with residual feedback from one layer to another; the network resembles an encoder-decoder scheme, which enables us to reduce noise in the form of fog. Defogged images are then fed to the object detection network, i.e. PP-YOLO. For training, we synthesized a dataset of foggy images and also used other existing datasets. Experimental results show the efficiency of the proposed scheme compared with many existing fog removal methods, and ablation studies show that the inclusion of the fog removal method certainly improves detection performance.

Keywords: PP-YOLO · Encoder-decoder · U-shaped network · Loss · Residual block · Fog removal · Object detection

1 Introduction
Computer vision has experienced exponential growth in the past few years with the progressive advancement of deep learning. Visual inspection systems based on different algorithms serve a range of applications from intrusion detection to vehicle detection. In such systems, object detection is a prime computer vision problem for illegal migration monitoring, self-driving vehicles and unmanned surveillance; a typical object detector aims to localise and identify objects in a given scenario. Since 2012, with the help of deep learning approaches, object detection schemes have become more and more accurate. Most of these schemes [4,13,15] work well when the scene under consideration is well illuminated and noise-free, but there is still scope for improvement in adverse conditions.
In this paper, we present a novel scheme to solve the problem of object detection in foggy weather. The goal is achieved in two steps: first, a fog removal scheme based on a U-shaped network; second, obtaining confidence scores and localising objects using the PP-YOLO object detection model.

2 Related Work
As object detection in normal conditions has already achieved great success, prepending an algorithm that defogs foggy images enables existing schemes to detect objects in a foggy environment. Image dehazing recovers the original content from a hazy or foggy image. In 2018, Chen et al. [6] proposed an end-to-end gated context aggregation network, adapting a dilation method to reduce the artifacts caused by dilated convolution for image dehazing and deraining tasks. In 2019 [21], a two-stage method with two properties was proposed: first, in place of down-sampling and up-sampling, a wavelet U-Net computes the discrete wavelet transform (DWT) and inverse DWT for edge feature extraction; second, convolutional layers use a chromatic adaptation transform for image enhancement. About a year later, Qin et al. [16] proposed an end-to-end feature fusion attention-based network to reconstruct fog-free images. However, these methods do not produce promising results in restoring image details beyond low-level vision tasks such as denoising. In 2020, a new single-image dehazing method was proposed by Zheng et al. [23]; their solution depends on adaptive structure decomposition integrated multi-exposure image fusion (PADMEF). In [20], a degradation model and a group-based sparse representation (GSR) method for fog removal were introduced, built on the traditional dichromatic atmospheric scattering physical model. Fahim and his group [7] designed an architecture that uses four feature extraction methods to extract nonlinear features and introduced a spatial-edge loss function to achieve efficient results.
Object detection is also affected by various environmental conditions outdoors, generally termed object detection in adverse weather conditions, and many methods have been proposed to address this issue. Nguyen et al. [15] investigated an auto-encoder feature to increase the throughput of existing object (vehicle) detection in adverse weather conditions. In another work [13], Maanpää et al. designed multimodal end-to-end learning for autonomous vehicles under adverse weather conditions. Adverse weather is a critical problem where annotated datasets are sparse and difficult to acquire due to the natural weather bias. Recently [14], Mirza et al. addressed the issue of degrading performance using single and dual modality architectures in adverse weather conditions, employing PointPillars and AVOD to estimate the performance of single and multi-modal 3D detection. Krišto et al. [10] compared the results of detecting objects in thermal images using different models such as Faster R-CNN, SSD, Cascade R-CNN, and YOLOv3, finding YOLOv3 faster in achieving good performance across different datasets. In [8], object detection in foggy conditions is handled using a dual-subnet network (DSNet). Boukhriss et al. [5] detect moving objects in different weather conditions with an algorithm based on background modelling with full-spectrum light sources (FSLS-MOD).

3 Proposed Scheme
The proposed scheme consists of two sub-problems: fog removal and object detection.

Fog Removal. Fog is removed by reducing the features that constitute noise in the form of fog. A U-shaped deep neural network with an encoder-decoder structure is used for this purpose. The U-shaped architecture helps preserve the structural integrity and visual properties of the ground truth, which reduces distortion in the generated result. Given the limited dataset [2], a U-shaped architecture performs more efficiently than other conventional networks. The method is not based on dark channel prior algorithms.

Fig. 1. Fog removal pipeline

The system diagram is presented in Fig. 1. Residual blocks at the bottom help to learn [9,17,19,22] the structure of haze. The left side (Fig. 1) consists of convolutional blocks with 64 and 128 channels respectively from the top; these blocks perform feature extraction with a kernel size of 3 × 3, after which the input features are downsampled. The downsampling serves as representation learning: the features extracted by the convolutional blocks are passed to the representation learning phase, where they are encoded with 512 and 256 neurons at each level respectively. The encoded features are fed to 4 residual blocks with skip connections. The residual layers represent the transition from encoding to decoding (downsampled to upsampled) and are followed by deconvolutional layers on the right side, which decode the residual layer output and reconstruct a new volume for another round of convolutional operations. To minimize feature loss, features are also transferred from the convolutional blocks to the deconvolutional blocks; the residual connection from start to end thus helps to better capture boundary details at many depth levels in the scene. Fog can be viewed as noise, so the encoder-decoder network squeezes out the features while discarding the noise information. The last two blocks reconstruct the output feature maps into an RGB image; these features are added to the original image and passed through a ReLU activation function to generate a fog-free version. Contrast enhancement is also applied for better results. The decoder learns and regenerates the lost information of the fog-free image.
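To make the architecture concrete, the following is a minimal PyTorch sketch of such a U-shaped residual encoder-decoder. It is our simplified reading of the pipeline in Fig. 1, not the authors' exact network: channel counts, the number of residual blocks and layer sizes are illustrative, and the 512/256-neuron encoding stage is collapsed into a single downsampling step.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Conv -> ReLU -> Conv with an identity skip connection."""
    def __init__(self, ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return torch.relu(x + self.body(x))

class TinyDefogNet(nn.Module):
    """Encoder -> residual bottleneck -> decoder, with a feature skip from
    encoder to decoder and a global skip adding the predicted correction
    back onto the foggy input, followed by ReLU as described above."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1),
                                 nn.ReLU(inplace=True))
        self.down = nn.Conv2d(64, 128, 3, stride=2, padding=1)        # downsample
        self.res = nn.Sequential(*[ResidualBlock(128) for _ in range(4)])
        self.up = nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1)  # upsample
        self.out = nn.Conv2d(64, 3, 3, padding=1)                      # back to RGB

    def forward(self, x):
        e = self.enc(x)
        b = self.res(self.down(e))
        d = self.up(b) + e                   # encoder-to-decoder feature transfer
        return torch.relu(x + self.out(d))   # residual added to the input image

y = TinyDefogNet()(torch.rand(1, 3, 64, 64))  # output shape: (1, 3, 64, 64)
```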
The loss functions responsible for this regeneration are the Mean Squared Error (MSE) loss, the Structural Similarity Index (SSIM) loss and the Perceptual loss. The MSE loss measures the difference between the resultant image and the ground truth; minimizing MSE at the pixel level produces an optimal PSNR value. L_{MSE} is defined as in Eq. 1:

L_{MSE} = \frac{1}{N} \sum_{y=1}^{N} \sum_{j=1}^{3} \| \hat{I}(y_j) - I(y_j) \|^2    (1)

Here, \hat{I}(y_j) is the output image, I(y_j) is the ground truth, j is the channel index, and N is the total number of pixels.
MSE captures distinguishable errors, but to deal with structural change we compute SSIM, which models the distinguishable change in the structure of the image. SSIM is defined as Eq. 2:

SSIM(x, y) = \frac{2\mu_x \mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1} \cdot \frac{2\sigma_{xy} + C_2}{\sigma_x^2 + \sigma_y^2 + C_2} = l_{x,y} \cdot cs_{x,y}    (2)

where x, y are the two images (ground truth (GT) and output), \mu is the mean of an image, \sigma is the standard deviation, \sigma_{xy} is the covariance, C_1, C_2 are constants to ensure stability, and l_{x,y}, cs_{x,y} are the luminance comparison function and contrast comparison function respectively. The SSIM loss L_{SSIM} follows from Eq. 3:

L_{SSIM} = 1 - SSIM(x, y)    (3)


However, the MSE loss or SSIM loss is not necessarily a good measure of visual effect. To generate a visually meaningful image, the Perceptual loss is used in addition to MSE and SSIM: it produces a discernibly meaningful image by generating finer details of the output using extracted features. Instead of relying on pixel-level losses only, the perceptual loss component enables the network to achieve the desired result. The ground truth and the output image are fed to VGG16, and the feature maps extracted from its layers are used to compute the perceptual loss L_p, as in Eq. 4:

L_p = \sum_{i=1}^{3} \frac{1}{C_i H_i W_i} \| \phi_i(\hat{I}) - \phi_i(I) \|^2    (4)

where \phi_i(\hat{I}) and \phi_i(I) are the feature maps of the output image and the ground truth extracted from each layer of VGG16, and C_i, H_i and W_i are the dimensions
of the feature map of layer i of VGG16. The gross loss L_{gross} is computed by combining all three loss components; to balance them, the perceptual loss is pre-multiplied by \lambda, resulting in Eq. 5:

L_{gross} = L_{MSE} + L_{SSIM} + \lambda L_p    (5)
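For concreteness, the sketch below shows one way to assemble this combined objective in Python with PyTorch. It is a minimal illustration, not the authors' training code: the third-party pytorch-msssim package is assumed for SSIM, a single VGG16 feature layer stands in for the three layers of Eq. 4, ImageNet input normalization is omitted, and the balance weight lam is an illustrative value (the paper does not report its \lambda).

```python
import torch
import torch.nn.functional as F
import torchvision
from pytorch_msssim import ssim  # third-party SSIM implementation, assumed installed

# Frozen VGG16 feature extractor for the perceptual term (one layer for brevity).
vgg = torchvision.models.vgg16(weights="IMAGENET1K_V1").features[:16].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def gross_loss(output: torch.Tensor, target: torch.Tensor,
               lam: float = 0.05) -> torch.Tensor:
    """L_gross = L_MSE + L_SSIM + lambda * L_p (Eq. 5); images are in [0, 1]."""
    l_mse = F.mse_loss(output, target)                   # Eq. 1
    l_ssim = 1.0 - ssim(output, target, data_range=1.0)  # Eq. 3
    l_p = F.mse_loss(vgg(output), vgg(target))           # Eq. 4, single layer
    return l_mse + l_ssim + lam * l_p

loss = gross_loss(torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64))
```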

Object Detection. In this section, the objective of object detection is accomplished with PP-YOLO [12]. Much prior work on object detection uses classifiers to perform detection, whereas this family of detectors treats object detection as a regression problem, predicting separate bounding boxes and corresponding confidence scores. A neural network predicts bounding boxes and confidence scores directly from full images in one evaluation. Since the detection scheme is a single network, it can be optimized end-to-end directly on detection throughput. This model processes images in real time at 75 frames per second (FPS). PP-YOLO generates fewer localization errors and is less likely to predict false positives in the background; it also learns generalized representations of objects, outperforming other object detection schemes, including DPM and R-CNN, when generalizing to artwork. The fog-free images are fed to this object detection algorithm, and the result is the detected objects.

4 Experimental Results
The whole experiment is carried out on an NVIDIA 1050 Ti GPU. The datasets used are ours [18], SOTS outdoor [1] and COCO [11]. SOTS outdoor contains 13,865 images with 2 classes; COCO, with 328k images and 91 classes, is used for detection. Various results of the proposed scheme, along with their descriptions, are given below.

Fog Removal. The proposed fog removal algorithm is tested on many input images. Some dehazed outputs for inputs with different amounts of fog are given in Fig. 2, where GT is the ground truth, Output 1 corresponds to Input 1, and Output 2 corresponds to Input 2.
Comparisons with existing schemes are shown in Fig. 3, which gives the outputs of different algorithms (ours, Qin [16], Chen [6], Bianco [3] and Yang [21]) when images with a high density of fog are given as input.
PSNR and SSIM values for all the above-mentioned schemes are tabulated in Table 1; the higher the PSNR and SSIM values, the higher the similarity between the synthesized dehazed image and the ground truth (GT). PSNR is defined as Eq. 6:

PSNR = 10 \log_{10}\left( (L - 1)^2 / MSE \right)    (6)

Here, L is the number of maximum possible intensity levels.
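A direct implementation of Eq. 6 in Python (a small sketch; the helper name and the default of 256 intensity levels for 8-bit images are our choices):

```python
import numpy as np

def psnr(ground_truth: np.ndarray, output: np.ndarray, levels: int = 256) -> float:
    """PSNR = 10 * log10((L - 1)^2 / MSE), with L intensity levels (Eq. 6)."""
    diff = ground_truth.astype(np.float64) - output.astype(np.float64)
    mse = np.mean(diff ** 2)
    return 10 * np.log10((levels - 1) ** 2 / mse)

# Toy example: an 8-bit image and a slightly perturbed copy of it.
a = np.random.randint(0, 256, (64, 64, 3))
b = np.clip(a + np.random.randint(-5, 6, a.shape), 0, 255)
print(f"{psnr(a, b):.2f} dB")
```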

Fig. 2. Some results of this approach (columns: GT, Input 1, Output 1, Input 2, Output 2)

Fig. 3. Comparisons with different approaches (columns: GT, input, ours, Qin [16], Chen [6], Bianco [3], Yang [21])

Table 1. Quantitative PSNR values (dB) and SSIM values for different approaches trained with the SOTS outdoor dataset.

Approaches   PSNR   SSIM
Ours         30.83  0.923
Yang [21]    24.39  0.901
Bianco [3]   16.22  0.691
Chen [6]     22.73  0.623
Qin [16]     28.22  0.550

Object Detection. The results of object detection are shown in Fig. 4. The dehazed images are input to the object detection network, and the detected objects are shown with bounding boxes. Detection is performed for the GT (clear original) images, for images without fog removal, and for images with fog removal. It can be seen visually (Fig. 4) that without fog removal the detection scheme performs poorly, while with fog removal the object detections are much more likely to match those on the GT.

Fig. 4. Results of object detection (columns: GT & its detection, without fog removal, target image, with fog removal)

Object Detection Performance Metrics. To compare performance between two or more models, we compute precision and recall values for the detected bounding boxes (for object detection with and without fog removal) using Eqs. 7 and 8:

Precision = \frac{TP}{TP + FP}    (7)

Recall = \frac{TP}{\text{number of actual ground truth instances}}    (8)

Finally, plotting those precision and recall values (Fig. 5) and computing the area under the curve gives the resultant average precision for the class of objects under consideration. To compare the performance of the object detection scheme against other detection schemes over all objects to be detected, the mean average precision is computed. Before finding the average precision, the interpolated precision at each recall level r is taken as the maximum precision over all recalls greater than or equal to r, as in Eq. 9:

Precision_{interp}(r) = \max_{\tilde{r} \ge r} Precision(\tilde{r})    (9)

The recall axis is also segmented into 11 equally spaced points from 0 to 1, so the average precision for an object is given by Eq. 10:

AP = \frac{1}{11} \sum_{r \in \{0, 0.1, \ldots, 1\}} Precision_{interp}(r)    (10)

Fig. 5. Performance evaluation plot for object detection with (a, b) and without (c, d)
fog removal.

Here TP denotes the true positives and FP the false positives, both accumulated over detections.
To analyse the performance of the object detection scheme, 100 sample images are taken. First, all false positives (FP) and true positives (TP) are calculated, followed by the computation of accumulated TP and FP. Precision and recall values are then calculated for each confidence value, and the interpolated precision is calculated over the 11-point segmented recall for further processing; the corresponding evaluated values are given in Table 2. A plot of interpolated precision vs segmented recall is shown in Fig. 5 for detection with and without fog removal; the area under the curve gives the average precision for a particular object. To find the mean average precision of the detection scheme, the average precision is computed for each class over confidence thresholds from 0.50 to 0.95 with step size 0.05. The PP-YOLO scheme with the COCO [11] dataset shows a mean average precision (mAP) of 45.2 at 72.9 FPS.
Finally, using the 11-point interpolation method, the average precision for the class Car is found to be 0.98 (98%) for objects detected after fog removal, and 40.62% when the object detection algorithm is applied to images without fog removal.

Table 2. Segmented recall and interpolated precision of object detection with and without fog removal

Segmented recall                             0      0.1    0.2    0.3    0.4    0.5  0.6    0.7    0.8    0.9    1
Interpolated precision with fog removal      1      1      1      1      1      1    0.965  0.966  0.925  0.933  1
Interpolated precision without fog removal   0.833  0.916  0.923  0.913  0.884  0    0      0      0      0      0

5 Conclusion
In this paper, a U-shaped network based fog removal scheme is proposed, and PP-YOLO object detection is used to show its efficacy. The fog removal comparisons are based on PSNR and SSIM, which evaluate the similarity between the ground truth and the dehazed image; the proposed fog removal scheme outperforms the other methods. The experimental results also show that the efficiency of object detection on foggy images improves by a significant margin when the proposed fog removal is applied before detection. A simple classification system to identify foggy and non-foggy scenes could be appended: depending on the outcome of the classification, the proposed system would be employed before feeding the input to the object detection scheme.

References
1. https://www.kaggle.com/wwwwwee/dehaze
2. Bardis, M., et al.: Deep learning with limited data: organ segmentation
performance by u-net. Electronics 9, 1199 (2020). https://doi.org/10.3390/
electronics9081199
3. Bianco, S., Celona, L., Piccoli, F., Schettini, R.: High-resolution single image dehaz-
ing using encoder-decoder architecture. In: Proceedings of the IEEE/CVF Confer-
ence on Computer Vision and Pattern Recognition Workshops, pp. 0–0 (2019)
4. Bijelic, M., Mannan, F., Gruber, T., Ritter, W., Dietmayer, K., Heide, F.: Seeing
through fog without seeing fog: Deep sensor fusion in the absence of labeled training
data. CoRR abs/1902.08913 (2019). http://arxiv.org/abs/1902.08913
5. Boukhriss, R.R., Fendri, E., Hammami, M.: Moving object detection under differ-
ent weather conditions using full-spectrum light sources. Pattern Recognit. Lett.
129, 205–212 (2020)
6. Chen, D., et al.: Gated context aggregation network for image dehazing and
deraining. In: 2019 IEEE Winter Conference on Applications of Computer Vision
(WACV), pp. 1375–1383. IEEE (2019)
7. Fahim, M.A.N.I., Jung, H.Y.: Single image dehazing using end-to-end deep-dehaze
network. Electronics 10, 817 (2021)
8. Huang, S.C., Le, T.H., Jaw, D.W.: DSNet: joint semantic learning for object detec-
tion in inclement weather conditions. IEEE Trans. Pattern Anal. Mach. Intell. 43,
2623–2633 (2020)
9. Kim, J., Lee, J.K., Lee, K.M.: Accurate image super-resolution using very deep
convolutional networks. In: Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, pp. 1646–1654 (2016)

10. Krišto, M., Ivasic-Kos, M., Pobar, M.: Thermal object detection in difficult weather
conditions using YOLO. IEEE Access 8, 125459–125476 (2020)
11. Lin, T.Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D.,
Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp.
740–755 (2014). Springer, Cham. https://doi.org/10.1007/978-3-319-10602-1_48
12. Long, X., et al.: PP-YOLO: an effective and efficient implementation of object detector. arXiv preprint arXiv:2007.12099 (2020)
13. Maanpää, J., Taher, J., Manninen, P., Pakola, L., Melekhov, I., Hyyppä, J.: Mul-
timodal end-to-end learning for autonomous steering in adverse road and weather
conditions. arXiv preprint arXiv:2010.14924 (2020)
14. Mirza, M.J., et al.: Robustness of object detectors in degrading weather conditions.
arXiv preprint arXiv:2106.08795 (2021)
15. Nguyen, V., Tran, D., Tran, M., Nguyen, N., Nguyen, V.: Robust vehicle detection
under adverse weather conditions using auto-encoder feature. Int. J. Mach. Learn.
Comput. 10(4), 549–555 (2020)
16. Qin, X., Wang, Z., Bai, Y., Xie, X., Jia, H.: FFA-Net: feature fusion attention
network for single image dehazing. In: Proceedings of the AAAI Conference on
Artificial Intelligence, vol. 34, pp. 11908–11915 (2020)
17. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object
detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell.
39(6), 1137–1149 (2016)
18. Sen, P., Das, A., Sahu, N.: Rendering scenes for simulating adverse weather condi-
tions. In: Rojas, I., Joya, G., Catalá, A. (eds.) IWANN 2021. LNCS, vol. 12861, pp.
347–358. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-85030-2_29
19. Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.: Inception-v4, inception-ResNet
and the impact of residual connections on learning. In: Proceedings of the AAAI
Conference on Artificial Intelligence, vol. 31 (2017)
20. Wang, X., Zhang, X., Zhu, H., Wang, Q., Ning, C.: An effective algorithm for single
image fog removal. Mob. Netw. Appl. 26, 1250–1258 (2019)
21. Yang, H.H., Fu, Y.: Wavelet U-Net and the chromatic adaptation transform for sin-
gle image dehazing. In: 2019 IEEE International Conference on Image Processing
(ICIP), pp. 2736–2740. IEEE (2019)
22. Zhang, K., Zuo, W., Chen, Y., Meng, D., Zhang, L.: Beyond a Gaussian denoiser:
residual learning of deep CNN for image denoising. IEEE Trans. Image Process.
26(7), 3142–3155 (2017)
23. Zheng, M., Qi, G., Zhu, Z., Li, Y., Wei, H., Liu, Y.: Image dehazing by an artificial
image fusion method based on adaptive structure decomposition. IEEE Sens. J.
20(14), 8062–8072 (2020). https://doi.org/10.1109/JSEN.2020.2981719
Analysis and Evaluation of TripAdvisor Data: A Case of Pokhara, Nepal

Tan Wenan1,2, Deepanjal Shrestha1(✉), Bijay Gaudel1, Neesha Rajkarnikar3, and Seung Ryul Jeong4

1 School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China
deepanjal@hotmail.com
2 College of Computer and Information Engineering, Shanghai Polytechnic University, Shanghai, China
3 School of Energy Science and Engineering, Nanjing Tech University, Pukou, Nanjing, China
4 Graduate School of Business IT, Kookmin University, Seoul, South Korea
srjeong@kookmin.ac.kr

Abstract. TripAdvisor is a famous social and travel site that helps people to arrange for different products and services in a destination. This work analyzes TripAdvisor data for the years 2018 to 2020 to find interesting features, reviews, and aspects embedded in it. The work employs an unsupervised machine learning approach to discover clusters in the data and correlate them with other important attributes, such as rating, reviews, ranking, and position, to find useful information about Pokhara city. Further, the work identifies popular places, activities, food, sports, and destinations in Pokhara based on the TripAdvisor data. The reviews obtained from users are analyzed to discover the sentiments of the visitors using the TextBlob API for sentiment analysis. A geo-plot of the locations is used to gain insight into the tourism hot spots in and around the city. This work gives knowledge about the brand image of Pokhara as a tourist destination and the different sets of activities tourists get involved in within the city. It is of great importance for the Nepalese tourism industry, and it can greatly benefit business houses, tourism organizations, and governing bodies in understanding the tourist perspective on Pokhara as a tourism destination.

Keywords: TripAdvisor · kMeans++ · Tourism clusters · Sentiment analysis · Pokhara · Nepal

1 Introduction

Travel sites and social media form a huge source of data for travelers around the world.
The content in these websites is rich, crowdsourced, and based on human inputs
directly from different electronic mediums [1]. The content of these websites includes
opinions, ratings, travel histories, blogs, and sharing of user experiences, etc. in dif-
ferent forms [1]. Moreover, in the current context, the governing bodies, business
houses, small and medium-sized enterprises as well as all the important tourism

business entities use social media and travel sites as a primary source to disseminate and collect information [2]. The role of these sites has become more and more important with time, as they play a crucial role in framing the image of a destination and providing a competitive advantage [2]. TripAdvisor is an important tourism website used by tourists who visit Nepal as one of their top sources of information and arrangements for destination management. According to Alexa.com, TripAdvisor is the first choice for travelers looking for tourism-related information in Nepal [3]. It provides extensive information on hotels, restaurants, things to do, shopping, etc., and its review and blog sections provide detailed impressions of and suggestions about a place. No earlier work has examined the data stored in TripAdvisor to discover meaningful insights about Pokhara as a tourism destination of Nepal [3]. This work explores the TripAdvisor data of Pokhara city to understand the tourism scenario and to discover important and meaningful information clusters that can be utilized for the promotion and growth of the tourism business.

2 Background and Literature Review

Tourism is an important industry of Nepal, which contributed US$0.8bn in 2018, representing 3.6% of GDP with 945,000 direct and indirect jobs [4]. Tourism is one of the most prioritized sectors and had seen upward growth in tourist arrivals before the COVID pandemic. Figure 1 presents the data on tourist arrivals in Nepal from 2010 to 2019, and it can be seen that the trend is upward. Further, almost all age groups have chosen Nepal as their tourism destination, with 31–45 as the most prominent group. This indicates that Nepal is a good choice as a tourism destination for young masses, followed by the 46–60 age group, as shown in Fig. 2 [5].

Fig. 1. Tourist arrival in Nepal. [5]



Fig. 2. Tourist arrival based on age group. [5]

Holiday pleasure, trekking and mountaineering, and pilgrimage are considered the top tourism activities in Nepal, as shown by the data in Fig. 3. Tourists are involved in different activities over the course of their journey in Nepal, which include trekking, hiking, adventure sports, religious sites, entertainment, shopping, excursions, etc. [6, 7]. Adventure tourism is a prominent category of tourism activity in Nepal, and most of the activities are concentrated in and around Pokhara due to its geographical location and its status as the tourism capital of Nepal [7]. Figure 4 presents the data on paragliding, an air sports activity in Pokhara, and depicts that it is popular with both domestic and inbound tourists. Similarly, many other major micro activities are prevalent in Nepal, and especially in Pokhara [5].

[Bar chart of purpose of visit (Holiday Pleasure, Trekking & Mountaineering, Business, Pilgrimage, Official, Conv./Conf., Others, Total) for the years 2016–2019]
Fig. 3. Tourist purpose of visit in Nepal [5].



[Bar chart of paragliding statistics for 2018 and 2019, comparing Nepali, foreigner, and total participants]

Fig. 4. Tourist participation in Air sports activity in Pokhara [5].

The role of digital technologies is very important in the current tourism business scenario [7]. Studies conducted by different scholars in Nepal concerning Information and Communication Technology have shown that it is an important driver of the tourism business in Nepal. Tourists visiting Nepal look for various forms of information on websites, social sites, and travel sites before they plan to visit the country [8]. Studies have shown that people look for authentic and verified information on accommodation, food, health and hygiene, and tourism spots before they come to Nepal [9]. The business houses, governing bodies, transportation industries, and all other stakeholders of the tourism business in Nepal are nowadays keen to have websites and emails and to advertise products and services on social sites and travel sites [9]. TripAdvisor is one of the oldest and most frequently used websites consulted by tourists when they plan to visit Nepal and arrange for tourism products and services. A study conducted on tourism websites concluded that travel sites are the first choice for visitors looking for information, and TripAdvisor is the most popular [9]. Besides travel sites, there are hundreds of other electronic mediums used by tourists when planning, arranging, and booking tourism destinations [10]. Websites and social sites have become storehouses of information and data. The data collected on these websites, travel logs, and social sites hold valuable information hidden within [11]. To date, no studies in Nepal have been conducted on these huge data sources to discover information clusters and patterns and extract useful information. This work explores the TripAdvisor data to analyze and evaluate the information clusters, patterns, and other related knowledge sources hidden in it.

3 Research Framework

This work uses an exploratory data analysis methodology on the data available from TripAdvisor, scraped using an open API and the Maxcopell TripAdvisor scraper API for Pokhara city of Nepal. The Maxcopell TripAdvisor scraper API [12] is suitable for scraping TripAdvisor reviews, emails, addresses, awards, and many other attributes of hotels and restaurants from the website. The API allows the user either to enter the location and download the dataset or to send an asynchronous request to the actor endpoint and crawl all the information about a single place. A variety of data extraction attributes can be set for a single place, or default settings can be used to download data for a complete location, including essential information attributes such as email, phone, price, and reviews. Data can be downloaded in various formats, such as JSON, CSV, XML, and others [12]. In this work, after the data was downloaded, it was preprocessed to select essential variables, as the downloaded data had 26 × 20 = 520 attributes, the majority of them with missing or incomplete values. The final data was divided into two sets, where the first part consisted of text (including titles, reviews, and review text) and the second part consisted of selected variables and data for analysis, evaluation, and interpretation, as shown in Fig. 5. The data was converted into CSV format for further programming and analysis using Python 3.6.
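
To illustrate this preprocessing step, a minimal sketch using pandas is given below; the file name and the column names (e.g., 'name', 'rating', 'title', 'text') are assumptions for illustration, not the exact schema of the scraped dataset.

```python
import pandas as pd

# Minimal preprocessing sketch (file and column names are assumed).
df = pd.read_csv("tripadvisor_pokhara_raw.csv")  # hypothetical export of the scraped data

# Drop duplicate records and rows missing essential values,
# mirroring the reduction from 4323 raw records to 1324 cleaned ones.
df = df.drop_duplicates().dropna(subset=["name", "rating"])

# Split into two working sets: review text vs. selected variables for analysis.
text_cols = ["name", "rating", "title", "text"]             # assumed text attributes
review_df = df[text_cols]
analysis_df = df.drop(columns=[c for c in text_cols if c not in ("name", "rating")])

review_df.to_csv("tripadvisor_reviews.csv", index=False)
analysis_df.to_csv("tripadvisor_analysis.csv", index=False)
```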

[Flowchart of the research framework: data collection via the Maxcopell TripAdvisor scraper API → variable selection → data pre-processing → standard/enriched data repository (title and review text; selected data for analysis) → sentiment analysis, word cloud, kMeans++-based clustering, and cluster-based correlation analysis → discovery of information and knowledge]
Fig. 5. Representing the research framework of the study.

The original data consisted of 4323 records with 50 attributes that included reviews, rating, ranking for each year, classification type, classification subtype, geo coordinates, certificate of excellence, userID, review language, country, region, etc., as shown in Table 1.

Table 1. Data samples of TripAdvisor data before and after processing

S.N  Data instances  Attributes  Total data items  Remarks
1    4323            50          216150            Uncleaned raw data
2    1324            50          66200             Processed cleaned data items; duplicates and missing values removed
3    1324            22          29128             Processed, enriched data with selected attributes
4    1324            28          37072             Text, titles, and review text data

The ratings were combined into an average rating, and attributes like country, region, and sub-region, which looked unimportant, were removed. The review text was preprocessed and cleaned to include attributes like name, rating, title, and text. The two data sets finally consisted of 29128 and 37072 data items, which were further analyzed and evaluated.

4 Data Analysis, Evaluation, and Interpretation


4.1 Identifying the Clusters and Silhouette Score
This work uses the kMeans++ algorithm, as shown in Algorithm 1, to identify clusters in the given data. kMeans++ is an improved k-means clustering algorithm in that it specifies a procedure to initialize the cluster centers before proceeding with the standard k-means optimization iterations [13]. The initial environment had the data columns normalized, with 100 re-runs and a maximum of 500 iterations set for execution on the data set.

Algorithm 1. The basic clustering (assignment) step for the TripAdvisor data

$$c^{(i)} := \arg\min_{j} \left\| x^{(i)} - \mu_j \right\|^2$$

The processing formed 7 centroid instances with 7 clusters and 83 variables, comprising 69 numeric features and 14 metadata fields (2 categorical, 1 numeric, and 11 string types). Figure 6 depicts seven distinct clusters concerning the two major destination types, hotels and attractions.
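
A minimal sketch of this clustering step with scikit-learn is shown below; the feature-matrix construction and input file are assumptions for illustration, while the k-means++ initialization, 7 clusters, 100 re-runs, and 500 iterations follow the settings described above.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import silhouette_score, silhouette_samples

# Assumed input: the cleaned analysis file with numeric features.
df = pd.read_csv("tripadvisor_analysis.csv")
X = MinMaxScaler().fit_transform(df.select_dtypes("number"))  # normalize columns

# k-means with k-means++ initialization, 100 re-runs, max 500 iterations.
km = KMeans(n_clusters=7, init="k-means++", n_init=100, max_iter=500, random_state=0)
labels = km.fit_predict(X)

# Silhouette: overall average and per-sample values for the cluster plots.
print("average silhouette:", silhouette_score(X, labels))
per_sample = silhouette_samples(X, labels)
```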
It can be seen that clusters C2 and C3 appear distinctly on the attraction side, while cluster C4 appears for the hotel type, and clusters C5 and C7 each have two centroids, forming clusters on both the attraction and hotel sides. To understand the formation of the clusters and to identify how well objects have been classified by the k-means clustering, silhouette values were calculated for each cluster. The silhouette score helped in the interpretation and validation of consistency within the clusters of data [14]. This technique gave an idea of how well each object has been classified in a cluster. The score ranges from −1 to 1, where 0 signifies overlapping clusters with samples very close to the decision boundary of neighboring clusters, −1 or another negative score signifies that samples are assigned to the wrong cluster, and values close to 1 signify that the clusters are dense and well classified [14]. The silhouette plot for the clusters of the TripAdvisor data shown in Fig. 7 depicts that all data samples are well marked and well classified, as the average silhouette score is above 0.58 and reaches up to 0.69.

Fig. 6. Representing clusters (Y-axis) vs silhouette (X-axis)

[Bar-and-line chart of cluster frequency (left axis, 0–500) and average silhouette score (right axis, 0.5–0.7) for clusters C1–C7]

Fig. 7. Representing average silhouette score for cluster with values and frequency.

Further analysis of the data based on sub-type category and silhouette score also depicts that subtypes like hotels (C6, C3), hiking (C2, C4), trekking (C5), spas (C5, C2), air sports including parasailing and paragliding (C5, C4), yoga and pilates (C5, C4), and religious sites have cluster scores above 0.6 and are well marked. This also depicts that these categories are the most popular tourism sub-types in Pokhara, Nepal, as shown in Fig. 8.

Fig. 8. Representing subtype (Y-Axis) vs silhouette score (X-Axis)

4.2 Data Interpretation and Visualization

The study of sub-type in comparison to average rating shows that 90% of the products and services in Pokhara are rated above 3 on average. Trekking is seen as a prominent subtype and has a rating of 3 to 5. Similarly, yoga, multiday tours, and some non-star hotels have an average rating in the range of 2.5 to 4. The star hotels, paragliding, and the sacred and religious sites, along with some individual categories, have a high rating, as shown in Fig. 9. Further, the visualization of the TripAdvisor data clusters using PCA also provides useful insight into the clusters with respect to three related attributes, silhouette, average rating, and ranking position, for hotels and attractions, as shown in Fig. 10.
The text reviews and titles scraped from TripAdvisor were also plotted in a word cloud to visualize the hot words and their occurrences in user reviews. It can be seen that ‘Pokhara’, ‘best’, ‘good’, ‘place’, ‘trek’, ‘trekking’, ‘experience’, ‘amazing’, ‘yoga’, ‘hotel’, ‘poon hill’, ‘lake’, ‘paragliding’, ‘massage’, ‘excellent’, ‘service’, ‘autumn’, ‘beautiful’, etc. are some of the frequent and popular words mentioned by visitors to Pokhara. Almost all important tourist categories, places, hotels, and attractions are seen in the word cloud, depicting the popularity of and tourism activities in this place, as shown in Fig. 11.
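
A minimal sketch of this visualization, assuming the widely used wordcloud package and the review file prepared earlier, could look as follows:

```python
import pandas as pd
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

# Assumed input: the review-text file prepared during preprocessing.
reviews = pd.read_csv("tripadvisor_reviews.csv")
text = " ".join(reviews["text"].dropna().astype(str))

# Build and display the word cloud, removing common English stop words.
wc = WordCloud(width=800, height=400, background_color="white",
               stopwords=set(STOPWORDS)).generate(text)
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()
```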
The geo-coordinate plotting shows the physical locations of popular and most-used tourism destinations. It can be seen in both Figs. 12 and 13 that all the major locations and activities are in and around the most popular destination of Pokhara, the Lakeside. Fewa Lake is seen as a popular tourism destination, and the plots also show that the majority of the tourism business exists here. Further, we can also see through their geo-locations that the users are scattered all around Pokhara city. This information helps us to understand the spread of tourists in Pokhara city. The geo-plots depict that the whole city is a viable tourism destination and that tourists find a lot of tourism-related activities in and around the city.

Fig. 9. Representing subtype (Y-Axis) vs average rating (X-Axis)

Fig. 10. Visualization of TripAdvisor data clusters based on PCA

Fig. 11. Visualization of word cloud based on user reviews and text

Fig. 12. Geo-coordinate plotting from the TripAdvisor data set. (Print view)

Fig. 13. Geo-coordinate plotting from the TripAdvisor data set. (Map view)
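
A minimal sketch of such a geo-plot (as in Figs. 12 and 13), assuming the analysis file carries 'latitude' and 'longitude' columns (assumed names), might be:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Assumed input file and column names for the geo-coordinates of each listing.
df = pd.read_csv("tripadvisor_analysis.csv")

# Simple scatter of listing locations; color could further encode cluster labels.
plt.scatter(df["longitude"], df["latitude"], s=10, alpha=0.6)
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.title("TripAdvisor listings in and around Pokhara")
plt.show()
```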

4.3 Sentiment Analysis

Sentiment analysis is a natural language processing technique that helps to identify the emotions hidden in a text about a particular subject under consideration [15]. It is an important tool for business houses, governing bodies, film critics, or social analytics to discover the hidden emotions, in terms of positive, negative, or neutral sentiment, toward them. In this work, sentiment analysis is performed using TextBlob for Python 3.6, a Python library that provides a simple API for processing textual data. Before being fed to TextBlob, the data is preprocessed to separate English text, words, and sentences from other languages and to remove stop words, regexp matches, and some customized words to make the text clean. It is then processed with TextBlob to categorize sentiments on 7 scales, as shown in Fig. 14. It can be seen that the reviews are strongly positive with 58.6%, positive with 37.93%, weakly positive with 1.38%, and weakly negative with 2.09%. The sentiment analysis data depicts that tourists have a strongly positive to positive outlook on Pokhara as a tourism destination.
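
A minimal sketch of this step with TextBlob is shown below; the polarity thresholds for the 7 scales are assumed for illustration, since the paper does not list its exact cut-offs.

```python
from textblob import TextBlob

def sentiment_scale(text):
    """Map TextBlob polarity (-1..1) onto 7 scales; the thresholds are assumed."""
    p = TextBlob(text).sentiment.polarity
    if p >= 0.6:   return "strongly positive"
    if p >= 0.3:   return "positive"
    if p > 0:      return "weakly positive"
    if p == 0:     return "neutral"
    if p > -0.3:   return "weakly negative"
    if p > -0.6:   return "negative"
    return "strongly negative"

print(sentiment_scale("The lakeside view in Pokhara was absolutely amazing!"))
```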

Fig. 14. Representing sentiment analysis of TripAdvisor review text

5 Conclusion

The study concludes that the people visiting Pokhara have well-defined tourism interests and a positive outlook on the city. It can be seen that the kMeans++ clustering forms well-defined clusters and has a good silhouette score, confirming the validity and consistency of the formed clusters. The linear projection using PCA shows that the clusters are dense and concentrated for high values of rating, silhouette, and ranking, confirming the good standard of tourism services. Further, the cluster plots reveal that tourist attractions and hotels are rated good to best by the tourists. This reveals that the hotel services are of good quality and that the tourist attractions also leave a good impression on tourists. Sporting activities, hiking, trekking, religious places, and health and beauty services are some of the most popular tourism activities. This is represented in both the cluster plots and the word cloud visualizations. The word cloud presents distinct and well-marked words and phrases that give an idea of the tourism activities, destinations, and events. The overall sentiments about the city are also seen to be positive to highly positive from the user reviews. Thus, it can be concluded that the analysis of TripAdvisor data confirms that Pokhara is a rich and popular tourist destination in Nepal.

Acknowledgement. The paper is supported in part by the National Natural Science Foundation
of China under Grant (No. 61672022 and No. U1904186), and the Collaborative Innovation
Platform of Electronic Information Master under Grant No. A10GY21F015 of Shanghai
Polytechnic University.

References
1. Devkota, B., Miyazaki, H., Witayangkurn, A., Kim, S.M.: Using volunteered geographic
information and nighttime light remote sensing data to identify tourism areas of interest.
Sustainability 11, 4718 (2019). https://doi.org/10.3390/su11174718
2. Editorial, Digital Economy Report 2019, United Nations Publications, ISBN 978-92-1-
112955-7, eISBN 978-92-1-004216-1. United Nations Publications, New York (2019).
https://unctad.org/system/files/official-document/der2019_en.pdf
3. Mahatara, T.B.: Tourism industry in Nepal: Make it backbone of economy, The Himalayan
Times, 6 September 2019. https://thehimalayantimes.com/opinion/tourism-industry-in-
nepal-make-it-backbone-of-economy
4. Shrestha, D., Wenan, T., Khadka, A., Jeong, S.R.: Digital tourism security system for Nepal.
KSII Trans. Internet Inform. Syst. 14(11), 4331–4354 (2020). https://doi.org/10.3837/tiis.
2020.11.005
5. Ministry of Culture: Tourism & Civil Aviation, “Nepal Tourism Statistics 2019”
Government of Nepal, Ministry of Culture, Tourism & Civil Aviation, Singhadurbar,
Kathmandu. Singha Durbar, Kathmandu (2020). www.tourism.gov.np
6. Wenan, T., Shrestha, D., Rajkarnikar, N., Adhikari, B., Jeong, S.R.: Digital reference model
system for religious tourism and its safeties. In: 2020 IEEE 7th International Conference on
Engineering Technologies and Applied Sciences (2020). https://doi.org/10.1109/
ICETAS51660.2020.9484189
7. Shrestha, D., Wenan, T., Gaudel, B., Maharjan, S., Jeong, S.R.: An exploratory study on the
role of ICT tools and technologies in tourism industry of Nepal. In: Raj, J.S. (ed.) ICMCSI
2020. EICC, pp. 93–110. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-49795-
8_9
8. Tripadvisor, Explore Pokhara (2020). https://www.tripadvisor.com/Tourism-g293891-
Pokhara_Gandaki_Zone_Western_Region-Vacations.html. Accessed 20 Oct 2020
9. Shrestha, D., Wenan, T., Rajkarnikar, N., Shrestha, D., Jeong, S.R.: Analysis and design
recommendations for nepal tourism website based on user perspective. In: Smys, S.,
Palanisamy, R., Rocha, Á., Beligiannis, G.N. (eds.) Computer Networks and Inventive
Communication Technologies. LNDECT, vol. 58, pp. 29–48. Springer, Singapore (2021).
https://doi.org/10.1007/978-981-15-9647-6_3
10. The World Travel & Tourism Council: WTTC Travel & Tourism Economic Impact, 2018
Nepal, p. 2018. The World Travel & Tourism Council, London (2018)
11. Liu, X., Mehraliyev, F., Liu, C., Schuckert, M.: The roles of social media in tourists’ choices
of travel components. Sage J. 20(1), 27–48 (2019). https://doi.org/10.1177/
1468797619873107
12. Copelli, M.: Tripadvisor Scrape, August 2021. https://github.com/maxCopell/tripadvisor-
scraper
13. Nielsen, F., Nock, R.: Total Jensen divergences: definition, properties and k-Means++
Clustering. arXiv:1309.7109, Bibcode: 2013 arXiv1309.7109N (2013)

14. Bhardwaj, A.: Silhouette coefficient validating clustering techniques, Published in Medium,
26 May 2020. https://towardsdatascience.com/silhouette-coefficient-validating-clustering-
techniques-e976bb81d10c
15. Norambuena, K.B., Lettura, E.F., Villegas, C.M.: Sentiment analysis and opinion mining
applied to scientific paper reviews. Intell. Data Anal. 23, 191–214 (2019). https://doi.org/10.
3233/IDA-173807
Simulation of the Heat and Mass Transfer Occurring During Convective Drying of Mango Slices

Ripa Muhury1(✉), Ferdusee Akter1,2, and Ujjwal Kumar Deb1

1 Department of Mathematics, Chittagong University of Engineering and Technology, Chattogram, Bangladesh
imferdusee@cvasu.ac.bd, ukdebmath@cuet.ac.bd
2 Department of Physical and Mathematical Sciences, Chattogram Veterinary and Animal Sciences University, Chattogram, Bangladesh

Abstract. Taste and flavor make mangoes one of the world’s most desired fruits. Owing to their perishability, mangoes have a short shelf-life, resulting in economic losses for farmers. To extend shelf-life, dehydration is an ancient and extensively used method. This study describes the modeling and simulation of the drying process of mango slices. The simulation considers an environment where hot, dry air flows around a quarter-elliptical mango slice with a height of 8 mm and a diameter of 30 mm. A model of the simultaneous heat and mass transfer with drying air at 1 m/s and 60 °C is developed by combining mass and energy balances, and the moisture diffusivity is taken as $8.5 \times 10^{-10}\ \mathrm{m^2\,s^{-1}}$. The simulation was run for 12 h, and it was observed that the moisture concentration decreases exponentially with rising temperature. Moisture was lost rapidly at first, with the surface concentration losing 32% of its moisture in the first hour, after which the rate was gradually reduced.

Keywords: Dehydration · Moisture concentration · Heat and mass transfer · Shelf-life · FEM · Simulation

1 Introduction

The basic process of dehydration is to remove water from a product so that microorganisms like yeast and bacteria cannot affect it. Since prehistoric times, drying has been one of the most convenient methods of preservation [1]. The application of heat and mass transfer to remove the moisture content of fruits and vegetables under suitably controlled conditions by the evaporation process is known as convective drying [2]. This mass transfer procedure consists of removing water or another solvent from a solid, semi-solid, or liquid through evaporation. By using this method, we can control and inhibit the growth of bacteria, yeasts, and mold



through the removal of water, which eventually contributes to tasty, nutritious, and healthy food. These foods are easy to store and can be preserved for a long time. Dried fruits and vegetables are a prime sector in the food ingredient market [3]. Nowadays, due to inadequate storage facilities and a lack of proper processing, 1.3 billion tons of food are wasted every year because of its low shelf-life [4]. So, we need to reduce wastage and preserve these foods for a long time by increasing their shelf-life and making them available for off-season consumption. There are various kinds of drying methods, like sun drying, freeze-drying, solar drying, cabinet air drying, spray drying, microwave drying, etc. Although sun-drying is the quickest and most convenient method, we have no control over it because of its climate dependency. Microwave drying is another drying process that is very suitable for high-quality food. Solar drying is a refined version of sun drying. The spray drying process is satisfactory for making powder and juice products. One of the most expensive processes is freeze-drying; it provides excellent quality products. Cabinet/tray dryers are used for drying small to medium and large (2000 to 20 000 kg) batches of solid food. Various methods and many theories have been proposed to investigate different drying processes. Currently, the development of technologies that allow for increasing productivity while using the least amount of energy is beneficial to a wide variety of agricultural producers [5]. Kumar, Karim, and their group developed a coupled heat and mass transfer model that can predict the temperature and moisture distribution inside fruit during convective drying [6]. Managuli and his study team investigated heat and mass transport, shrinkage deformation, and stress distribution in brinjal; they found that the distortion in brinjal depends on the temperature, air velocity, and moisture content [2]. Yuan and his team proposed a mathematical model for apple slices in 2019 which combined heat and mass transport with solid mechanics [7].
Innumerable varieties of mango are grown widely all over Bangladesh, and it is called the king of fruits. Mango is considered one of the most nutritious fruits and is famous for its taste and flavor. Due to its reasonably affordable price, individuals from all walks of life may readily consume mangoes. Mango is an essential and widely consumed fruit in many developing countries like Bangladesh. But about 25% of the mangoes produced in the country are wasted after being collected from the fields. Mainly, these mangoes are wasted during transport and storage, and the total loss is around 3,600 crore taka per year [8]. Many people make a living by producing and selling mangoes because there is such a high demand for them. As a result of the perishable nature of mango, physiological changes take place, and mangoes decompose in large quantities, causing extreme losses to farmers and traders. Formalin is widely used as a food preservative in countries like Bangladesh. The human body and the ecology are, however, severely harmed by formalin [9, 10]. A recent UC Berkeley health study has shown that this sort of food adulteration can induce even neurological conditions such as Parkinson’s disease, which affects the gait cycle [11]. In that case, drying plays a prime role in extending the shelf-life of mango and other fruits by reducing water and also making them available for off-season consumption.
Mathematical modeling is one of the most convenient tools for simulating and evaluating the performance of the drying process. It provides physical results for the whole scenario through a virtual laboratory, where the corresponding experiments would be very costly, conceptually complex, and time-consuming to execute [12]. Dehydrated foods are shelf-stable (safe to store at room temperature) when adequately dried and stored properly. This method of preserving food is easy to carry out, very safe, and can be used for most kinds of food (meat, fruits, and vegetables). The purpose of this study is to investigate the distribution of temperature and moisture in the food drying process and to develop a basic Multiphysics-based model for obtaining superior dried food with great taste, aroma, and proper nutrition. Modeling is required to understand the process, optimize the drying process, and enhance the efficiency of the process and the quality of the product [13]. To optimize the drying process and develop better control strategies, the model must be selected to function properly and enable safe storage by analyzing the temperature and relative humidity distribution. We execute the simulation of this work using COMSOL Multiphysics. This model can be quickly modified to accommodate advanced physics concepts, and it can also be easily replicated for a variety of shapes without much difficulty.

2 Methodology

The following Fig. 1 presents the simultaneous heat and mass transfer during the drying of food [6].

[Schematic: air flow supplies energy to the food as a heat flux; energy and moisture are convected away from the food surface as a mass flux]

Fig. 1. Schematic diagram of heat and mass transfer during drying of food material.

2.1 Governing Equations


Two different transport mechanisms occur concurrently during the drying of food products: heat transfer from the drying medium (or heat source) to the foodstuff, and transport of water from the inside of the solid product to the surface, from which a carrier gas eventually transports the water away [12]. If the shrinkage (compression) effect is considered, this transport phenomenon may simultaneously be influenced by structural changes in the mango sample. A schematic of the above-mentioned phenomena is shown in the following Fig. 2 [14].

[Schematic: heat convection at the top and side surfaces, water evaporation from the surface, heat conduction and water diffusion inside the slice, with axial and radial shrinkage]

Fig. 2. Schematic representation of simultaneous heat and mass transfer while considering radial
and axial shrinking during the evaporation process.

Here, both the heat and the mass transfer equations are considered for this research.
Heat transfer equation: the general form of the heat transfer equation is as follows:

$$\rho c_p \frac{\partial T}{\partial t} + \nabla \cdot \left( -k \nabla T \right) + \rho c_p\, \mathbf{u} \cdot \nabla T = Q \quad (1)$$

Mass transfer equation: Fick’s law is used for the expression of mass transfer as follows:

$$\frac{\partial c}{\partial t} + \nabla \cdot \left( -D \nabla c \right) + \mathbf{u} \cdot \nabla c = R \quad (2)$$

where $T$ is the temperature, $c_p$ is the specific heat, $\rho$ is the density of the food, $k$ is the thermal conductivity, $c$ is the moisture concentration, and $D$ is the effective diffusion coefficient.
For developing the model, the following assumptions were made:
(1) The movement of moisture and heat transfer is considered one-dimensional.
(2) No chemical reaction occurs during drying.
(3) Negligible heat is generated in the system (i.e., Q = 0 and R = 0).
(4) The domain is considered 2D axisymmetric while modeling, and a quarter of the sample was considered.
(5) The mango slice was considered an ellipsoidal slab, and mass transfer was considered only on the top and side surfaces.

2.2 Boundary Conditions

Depending on the assumptions, the following initial and boundary conditions are applied for both heat and mass transfer.
The initial conditions are

$$T = 28\,^{\circ}\mathrm{C} \quad \text{and} \quad c = c_0 \quad (3)$$

The heat transfer boundary condition at the transport boundaries is

$$\mathbf{n} \cdot \left( k \nabla T \right) = h_T \left( T_{\mathrm{air}} - T \right) - h_m\, \rho \left( M - M_e \right) h_{fg} \quad (4)$$

The heat transfer condition at the symmetry boundary is

$$\mathbf{n} \cdot \left( k \nabla T \right) = 0 \quad (5)$$

The mass transfer boundary condition at the transport boundaries is

$$\mathbf{n} \cdot \left( D \nabla c \right) = h_m \left( c_b - c \right) \quad (6)$$

The mass transfer boundary condition at the symmetry boundary is

$$\mathbf{n} \cdot \left( D \nabla c \right) = 0 \quad (7)$$

where $h_T$ is the heat transfer coefficient, $h_m$ is the mass transfer coefficient, $T_{\mathrm{air}}$ is the drying air temperature, $M_e$ is the equilibrium moisture content (dry basis), and $h_{fg}$ is the latent heat of evaporation.
The mass transfer coefficient $h_m$ of mango is calculated by the following equation proposed by Janjai and his research group [15]:

$$h_m = \frac{D_{\mathrm{air}}}{L} \left( 2.0 + 0.522\, Re^{0.5} Sc^{0.33} \right) \quad (8)$$

where $D_{\mathrm{air}}$ represents the effective diffusivity of vapor in air and $L$ represents the length of the domain.
The Reynolds number ($Re$) and Schmidt number ($Sc$) are defined by the following equations:

$$Re = \frac{u_{\mathrm{air}} L}{\nu_k} \quad (9)$$

$$Sc = \frac{\nu_k}{D_{\mathrm{air}}} \quad (10)$$

Here, $u_{\mathrm{air}}$ and $\nu_k$ are the velocity of the drying air and the kinematic viscosity, respectively.
In this study, the heat transfer coefficient $h_T$ of mango is calculated by the following equation proposed by Perry and Green in 1997:

$$h_T = 0.664\, Re^{1/2} Pr^{1/3}\, \frac{K_{\mathrm{air}}}{L} = Nu\, \frac{K_{\mathrm{air}}}{L} \quad (11)$$

where $Pr$, $Nu$, $K_{\mathrm{air}}$, and $L$ represent the Prandtl number, the Nusselt number, the air thermal conductivity, and the length of the food, respectively.
The Day and Nelson and the modified Smith models were chosen for application in this simulation for their comprehensiveness and temperature stability. The mean of the predictions of these two models was taken as the equilibrium moisture content of mango [15]. The two equilibrium moisture content models are as follows.
Modified Smith model:

$$M_e = (49.8255 - 0.7896\,T) - (157.7409 - 1.6234\,T) \ln(1 - Rh) \quad (12)$$

Day and Nelson model:

$$M_e = \left( \frac{-\ln(1 - Rh)}{0.000029\, T^{1.3261}} \right)^{1/(1.3855\, T^{0.0432})} \quad (13)$$
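
As an illustration of Eqs. (8)–(11), a minimal sketch computing the transfer coefficients is given below; the air-property values are assumptions chosen only to make the example runnable (roughly air at about 60 °C), not the exact inputs used in this study.

```python
# Minimal sketch of Eqs. (8)-(11); the air-property values below are assumed
# for illustration, not the paper's exact inputs.
u_air = 1.0        # drying air velocity [m/s] (from the paper)
L = 0.030          # characteristic length of the domain [m] (slice diameter)
nu_k = 1.9e-5      # kinematic viscosity of air [m^2/s] (assumed)
D_air = 2.6e-5     # diffusivity of vapor in air [m^2/s] (assumed)
K_air = 0.028      # thermal conductivity of air [W/m K] (assumed)
Pr = 0.71          # Prandtl number of air (assumed)

Re = u_air * L / nu_k                                   # Eq. (9)
Sc = nu_k / D_air                                       # Eq. (10)
h_m = (D_air / L) * (2.0 + 0.522 * Re**0.5 * Sc**0.33)  # Eq. (8)
h_T = 0.664 * Re**0.5 * Pr**(1 / 3) * K_air / L         # Eq. (11)

print(f"Re={Re:.0f}, Sc={Sc:.2f}, h_m={h_m:.4e} m/s, h_T={h_T:.2f} W/m^2 K")
```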

2.3 Domain Formation and Mesh Generation

In this study, a mango slice is considered as the domain. A mango is cut into pieces with a height of 8 mm and a diameter of 30 mm for this study. The geometry and the corresponding mesh generated by the COMSOL Multiphysics program are shown in Figs. 3 and 4.

Fig. 3. (a) Computational domain of sample slice; (b) Simplified 2D axisymmetric model domain

Fig. 4. Mesh design of the model domain

The model is developed for an air velocity of 1 m/s and an air temperature of 60 °C. The initial temperature of the domain in this study is 28 °C. Other properties and values are given in the following table:

Table 1. Physical properties of the model.

Properties                            Value
Initial air temperature               301.15 K [this work]
Initial moisture content (dry basis)  3.6511 kg water/kg dry mango [16]
Drying air temperature                333.15 K [this work]
Thermal conductivity                  0.5 W/m K [16]
Specific heat                         3240 J/kg K [16]
Density of food                       1359 kg/m3 [16]
Heat of vaporization                  2400 kJ/kg

3 Results and Discussion

In order to predict the temperature and moisture profile, a 2D axisymmetric finite element model that includes simultaneous heat and mass transport is constructed using the COMSOL Multiphysics program. The simulation study was run for 12 h. The temperature and moisture concentration changes after the simulation are discussed as follows:

Fig. 5. Temperature distributions after 10 h drying

Figure 5 (a and d) above indicates the distribution of temperature at 50 °C, Fig. 5 (b and e) shows the distribution at 60 °C, and Fig. 5 (c and f) shows the distribution at 70 °C after 36000 s of drying. It is clear that the temperature inside the product is lower than outside. The temperature gradient is negligible due to the thinness of the material, and, depending on the temperature applied to the food material, the material's moisture decreases at varying rates. The temperature reached its target level after one hour of drying.

Fig. 6. Different temperature profile over the time

Figure 6 above shows the combined temperature profiles over 36000 s of drying. Here, we observe the temperature reading obtained from the mango's center point. After drying for a while (more than an hour), we notice that the temperature graph increases, reaches the temperature set for the simulation, and remains constant until the simulation is finished. Although we set the simulation for 12 h, the temperature graph looks the same for all time settings.

Fig. 7. Distribution of moisture concentration inside the food at different times

Figure 7 above shows the moisture profiles after 3600 s, 18000 s, 36000 s, and 43200 s of drying at a temperature of 60 °C, where the moisture concentration decreases with increasing drying time. The concentration values are indicated by the color bar on the right. Starting with a uniform mass concentration, a blue-colored stripe, which shows that moisture is being lost from the product, grows as the simulation progresses. It is worth noting that the moisture is instantly removed from the surface. As soon as the surface moisture is removed, the interior moisture is released and moisture movement begins. From Fig. 7(a–d), we can see that the moisture removal process is very rapid, but from Fig. 7(e–h) we can see that it becomes slow. Moisture was initially abundant, but it began to reduce gradually, and the process slowed. As we know, this is a slow process because the moisture from the inside has a difficult time getting out, and the amount of moisture in the interior of the product is higher than in the outer region. The distribution of the moisture in dried items is important to know because spoilage can start where higher moisture remains at depth.

Fig. 8. The loss of concentration of moisture over time

The concentration at different cut-points of the domain at a temperature of 60 °C and a velocity of 1 m/s is shown in Fig. 8(a) above. Figure 8(b) shows the average surface concentration, and Fig. 8(c) shows the concentration at the center of the domain at a temperature of 60 °C and a velocity of 1 m/s. Studying these figures, we observe that the concentration decreases over time. The simulation was set for 12 h to complete. The moisture loss was rapid at first, with the surface concentration losing 32% of its moisture in 1 h, but it then steadily slowed. The simulation was set with a starting concentration of 59215 mol/m3, and finally the concentration became 16188 mol/m3, which means 73% of the moisture was lost during drying. The moisture concentration decreases exponentially with varied time constants, as seen in the figure above. In addition, as the drying temperature increases, the initial downward slope of the curves becomes steeper, requiring less drying time. The simulated outcome was found to be consistent with experimental data from the literature [16]. Subsequent FORTRAN simulations were performed using data from this literature, and this study also conformed to the study of Ambarita [17]. This study shows that utilizing the recommended technique may save substantial time without sacrificing accuracy and is capable of properly predicting moisture content and core temperature. In consequence, this method assures that food products are dried in the most optimal manner possible, without distortion, deterioration, or loss of nutrients.
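
To make the drying dynamics concrete, a minimal sketch of a one-dimensional explicit finite-difference solution of the moisture diffusion Eq. (2) with a convective surface boundary like Eq. (6) follows; the grid, time step, surface mass transfer coefficient, and bulk-air concentration are illustrative assumptions, not the COMSOL setup of this study.

```python
import numpy as np

# Minimal 1D explicit finite-difference sketch of moisture diffusion (Eq. 2)
# with a convective surface boundary (Eq. 6). D and the initial concentration
# follow the paper; the grid, time step, h_m, and c_b are assumed.
D = 8.5e-10               # moisture diffusivity [m^2/s] (from the paper)
Lz = 0.008                # slab thickness [m] (slice height)
N = 40                    # number of grid points (assumed)
dz = Lz / (N - 1)
dt = 0.4 * dz**2 / D      # time step within the explicit stability limit
h_m = 1e-7                # surface mass transfer coefficient [m/s] (assumed)
c_b = 5000.0              # bulk-air equivalent concentration [mol/m^3] (assumed)
c = np.full(N, 59215.0)   # initial concentration [mol/m^3] (from the paper)

t = 0.0
while t < 12 * 3600:      # simulate 12 h of drying
    c_new = c.copy()
    c_new[1:-1] = c[1:-1] + dt * D * (c[2:] - 2 * c[1:-1] + c[:-2]) / dz**2
    # Surface node: diffusive supply from the interior minus convective loss to air.
    c_new[-1] = c[-1] + dt * (D * (c[-2] - c[-1]) / dz**2 - h_m * (c[-1] - c_b) / dz)
    c_new[0] = c_new[1]   # symmetry plane: zero-flux condition
    c, t = c_new, t + dt

print(f"after 12 h: center {c[0]:.0f} mol/m^3, surface {c[-1]:.0f} mol/m^3")
```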

4 Conclusion

Using a simple numerical method, a simultaneous analysis of heat and mass transport during food drying has been accomplished in this study. A 2D axisymmetric finite element model incorporating simultaneous heat and mass transport is designed with the COMSOL Multiphysics program for the prediction of the temperature and humidity profile. The impacts of phase changes during dehydration are also measured. This study indicates that by using the suggested approach, significant time may be saved without losing accuracy, and moisture content and core temperature can be accurately predicted. It is noted that the moisture is instantly removed from the surface of the mango sample. It is also notable that as soon as the surface moisture is removed, the interior moisture starts to release. The results of this study were found to be compatible with the experimental results reported in the literature. We observed that the moisture concentration decreases exponentially with varied time constants. The temperature gradient of the material was found to be negligible due to its thinness, with the material's moisture decreasing at varying rates depending on the temperature used. Consequently, the model provided in this study may be used to represent a wide range of agricultural products, including cylindrical shapes, and heat and mass transfer procedures such as frying, roasting, etc. In addition, we may extend this work by adding shrinkage to our computations. This simulation ensures food products are dried optimally without distortion, degradation, or nutrient loss. The numerical findings also help industrial and academic users comprehend the drying process.

Acknowledgement. The authors gratefully acknowledge the technical support of the Centre of Excellence in Mathematics, Department of Mathematics, Mahidol University, Bangkok, Thailand.

References
1. Ahmed, N., Singh, J., Chauhan, H., Anjum, P.G.A., Kour, H.: Different drying methods their
applications and recent advances. Int. J. Food Nutr. Saf. 4, 34–42 (2013)
2. Managuli, S.C., Sathish, H.M., Seetharamu, K.N.: Numerical simulation of heat and mass
transfer along with shrinkages in Brinjal (Eggplant/Solanum melongena). Int. J. Recent.
Technol. Eng. 8, 2867–2872 (2019)
3. Torringa, E., Esveld, E., Scheewe, I., Van Den Berg, R., Bartels, P.: Osmotic dehydration as
pre-treatment before combined microwave-hot-air drying of mushrooms. J. Food Eng. 49,
185–191 (2001)
4. Gustavsson, J., Cederberg, C., Sonesson, U., Emanuelsson, A.: The methodology of the
FAO study: “Global Food Losses and Food Waste - Extent , causes and prevention” (2013)
5. Budnikov, D., Vasilyev, A.N.: Development of a laboratory unit for assessing the energy
intensity of grain drying using microwave. In: Vasant, P., Zelinka, I., Weber, G.-W. (eds.)
ICO 2019. AISC, vol. 1072, pp. 93–99. Springer, Cham (2020). https://doi.org/10.1007/978-
3-030-33585-4_9
6. Kumar, C., Karim, A., Joardder, M.U.H., Miller, G.J.: Modeling heat and mass transfer
process during convection drying of fruit. In: 4th International Conference on Computational
Methods, pp. 1–9 (2012)
7. Yuan, Y., Tan, L., Xu, Y., Yuan, Y., Dong, J.: Numerical and experimental study on drying
shrinkage-deformation of apple slices during process of heat-mass transfer. Int. J. Therm.
Sci. 136, 539–548 (2019)
8. The daily newspaper Samakal. http://www.samakal.com. Accessed 24 Aug 2021
9. Hasan, R.M., Dewanjee, A.N., Shemul, S.N.: Design and construction of a formalin detector
using the conductivity property. In: 1st National Conference on Electrical & Communication
Engineering and Renewable Energy (ECERE 2014), pp. 131–135 (2014)

10. Dewanjee, A.N., Dey, M., Rashedul Haq Rashed, M., Muhury, A., Prakash Dhar, J.: High
performance cost effective formalin detector using conductivity property. In: 4th Interna-
tional Conference on 4th International Conference on Advances in Electrical Engineering,
ICAEE 2017, pp. 635–640 (2017)
11. Dewanjee, A.N., Hossain, Q.D., Muhury, A.: Quantitative deviation of spatial parameters of
gait in parkinson’s disease. In: 2019 International Conference on Wireless Communications,
Signal Processing and Networking, WiSPNET 2019, pp. 304–309 (2019)
12. Sabarez, H.T.: Modelling of Drying Processes for Food Materials. Elsevier Ltd., London
(2015)
13. Karim, M.A., Hawlader, M.N.A.: Mathematical modelling and experimental investigation of
tropical fruits drying. Int. J. Heat Mass Transfer 48, 4914–4925 (2005)
14. Seyedabadi, E., Khojastehpour, M., Abbaspour-Fard, M.H.: Convective drying simulation of
banana slabs considering non-isotropic shrinkage using FEM with the Arbitrary Lagrangian-
Eulerian method. Int. J. Food Prop. 20, S36–S49 (2017)
15. Janjai, S., et al.: Finite element simulation of drying of mango. Biosyst. Eng. 99, 523–531
(2008)
16. Barati, E., Esfahani, J.A.: A new solution approach for simultaneous heat and mass transfer
during convective drying of mango. J. Food Eng. 102, 302–309 (2011)
17. Ambarita, H., Nasution, A.H.: A numerical solution to simultaneous heat and mass transfer
of convective drying of food. J. Phys. Conf. Ser. 1116 (2018)
A Literature Review on the MPPT Techniques
Applied in Wind Energy Harvesting System

Tigilu Mitiku1 and Mukhdeep Singh Manshahia2(✉)

1 Department of Mathematics, Bule Hora University, Bule Hora, Ethiopia
2 Department of Mathematics, Punjabi University Patiala, Patiala, Punjab, India

Abstract. The wind energy harvesting system (WEHS) is one of the promising renewable energy systems (RES) that generates clean energy to power the grid or stand-alone loads located in remote areas connected through power electronic devices. Wind turbines convert the kinetic energy created by the motion of wind into mechanical energy and then into electrical energy using a generator. The output of a permanent-magnet synchronous generator (PMSG) varies depending on the variation of the wind speed. A maximum power point tracking (MPPT) controller is used to drive the WEHS at the speed that corresponds to the optimum power at any wind speed. This paper presents a detailed literature review of the work carried out by several researchers on the modeling of WEHS using different MPPT techniques.

Keywords: Wind energy harvesting system · Permanent-magnet synchronous generator · Maximum power point tracking · Bidirectional DC-DC converters

1 Introduction

The world population, the advancement of industrial sectors, and the economic growth of many countries are increasing rapidly, which creates more demand for energy. Moreover, the fast growth of urbanization all over the world greatly increases energy consumption. For such reasons, the demand for energy is increasing worldwide and is expected to triple by 2050. In developing countries like Ethiopia, the energy sector assumes critical importance in view of the ever-increasing energy needs requiring huge investments to meet them [1].
The energy we use to power everything from our homes to our workplaces comes from a variety of different sources and can be classified into two broad categories, i.e., renewable and non-renewable energy sources. Renewable sources can greatly reduce the environmental impacts of traditional energy sources and decrease the dependency on fossil fuels. The burning question and main challenge of the world today is, in addition to producing sufficient energy for mankind, whether we can ensure a safe world for the next generation. Much research is focusing on the enhancement of technology that can efficiently convert RESs into useful electrical energy sources. The wind energy system is one of the rapidly growing technologies providing a sustainable supply of energy to the world, owing to its abundant, inexhaustible potential, cost-effectiveness, and environmental friendliness [2].


Remote areas such as islands, rural areas, and hill stations, which lack infrastructure, are mostly found very far from the main grid, so they need autonomous energy generation systems, such as solar and wind, for their local operation. In such areas, the power supply system should provide constant frequency and voltage to supply stable power to the consumers [3, 4]. For this reason, wind energy sources are getting attention in current research, as they are widely available all over the world.

2 Objectives of the Research

The main objective of the present work is the modelling of a WEHS with the help of soft computing techniques. Most researchers have applied a conventional modelling approach based on the available input-output data of the system. However, the results depend on the mathematical model of the system and its accuracy, and in the absence of a mathematical model the analysis becomes very difficult. Soft computing techniques, by contrast, do not require a mathematical model of the system to be controlled. Motivated by this advantageous feature of soft computing techniques, the present work focuses on building a model for a WEHS based on the obtained input-output data using ANFIS.

3 Review of Related Literature

The most effective control strategies are under research to develop reliable, cost-effective, and high-quality power from wind energy systems. Researchers classify MPPT techniques into conventional and artificial intelligence based approaches.

3.1 Review on Conventional Methods Based MPPT

Mohammad Hassan et al. [5] have presented optimal torque control (OTC) of a small-scale variable-speed WEHS to extract maximum power from wind within the whole range of wind speed variations. With the proposed controller, the power coefficient reached its maximum possible value. The error between the estimated DC current and the actual DC current is used to adjust the duty cycle of the boost converter switch to regulate the output voltage.
Pindoriya et al. [6] have developed pitch angle control of a grid-connected PMSG-based WEHS using the perturb and observe (P&O) algorithm for obtaining MPPT. The system was connected to the utility grid via a back-to-back power electronic converter. The aim of the control was to regulate the shaft speed of the PMSG and the DC link voltage and to minimize PMSG losses during the normal operation of the system. Several simulations were carried out for different wind conditions and pitch angles to test the performance of the proposed method.
Bouzid Mohamed Amine et al. [7] have proposed a PI control technique to control the generator torque of a stand-alone WEHS according to the variation of wind speed to produce maximum power. The system contains a controlled rectifier and an inverter linked by a DC link capacitor. A vector-control scheme was used for the control of the load-side PWM inverter to regulate the amplitude and frequency of the inverter output voltage.

Zebraoui and Bouzi [8] have presented a comparative study of different MPPT methods, such as hill-climb search (HCS), OTC, power signal feedback (PSF), and fuzzy logic control (FLC), for the control of a wind energy conversion system (WECS). The comparison of the simulation results was made based on system efficiency, response time, the maximum power achieved, and the system behavior during MPPT. They showed that FLC is more efficient and presents better performance compared to the other MPPT algorithms.
Ramadoni Syahputra and Indah Soesanti [9] have proposed an extended P&O-based MPPT control to improve the performance of generators in WECS. They combined a predictive method with the P&O algorithm to overcome the disadvantage of the P&O algorithm of producing oscillations under steady-state conditions due to constant duty-cycle changes. The predictive method is used to determine the magnitude of the step changes in the P&O algorithm. The principle of the controller is to increase or decrease the voltage by adjusting the duty cycle of the boost converter so that the power output from the generator can be optimized.
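
As background for the P&O variants reviewed here, a minimal sketch of the classical fixed-step P&O duty-cycle update follows; the toy power-vs-duty curve, step size, and the location of the maximum are illustrative assumptions, not part of any reviewed scheme.

```python
# Minimal sketch of classical fixed-step perturb-and-observe (P&O) MPPT.
# The quadratic power curve below stands in for a real power measurement.

def measure_power(duty):
    """Toy power-vs-duty curve with a maximum at duty = 0.62 (assumed)."""
    return 100.0 - 400.0 * (duty - 0.62) ** 2

def p_and_o_step(duty, prev_power, power, step):
    """One P&O iteration: keep perturbing in the direction that increased power."""
    if power < prev_power:
        step = -step                        # power fell: reverse perturbation direction
    duty = min(max(duty + step, 0.0), 1.0)  # clamp duty cycle to [0, 1]
    return duty, step

duty, step, prev_power = 0.5, 0.01, 0.0
for _ in range(200):
    power = measure_power(duty)
    duty, step = p_and_o_step(duty, prev_power, power, step)
    prev_power = power

print(f"duty settled near {duty:.2f} (true MPP at 0.62), oscillating by ±{abs(step):.2f}")
```

The residual oscillation around the maximum power point is exactly the steady-state drawback that the predictive and adaptive-step variants discussed here aim to reduce.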
Abdelghani Harrag and Sabir Messalti [10] have proposed a new modified P&O MPPT algorithm with an adaptive duty-cycle step using a PID controller based on a genetic algorithm (GA) to improve upon and overcome the drawbacks of the classical P&O MPPT. The GA was used to tune the PID controller gains in order to optimize the variable step size needed by the P&O MPPT generating the PWM duty cycle to drive the DC/DC boost converter. The efficiency of the proposed method was studied using a boost converter connected to a Solarex MSX-60 model. Analysis and comparison with the classical fixed-step-size P&O and the developed genetic variable-step-size version are presented. The new algorithm addresses the challenges associated with rapidly changing insolation levels. Other conventional techniques include the tip-speed ratio (TSR) method [11], the OTC strategy [12, 13], the PSF algorithm, the TPP method, the P&O algorithm [14, 15], the ORB control, the HCS control, and the MPPT method with incremental conductance (INC) [14]. The majority of these techniques require knowledge of the wind turbine characteristics to locate the optimal operating point. Indeed, these characteristics can be obtained by conducting simulation studies or experimental tests of the wind turbine under study, which are time-consuming and costly tasks.

3.2 Artificial Intelligence Based MPPT

Nowadays, AI methods are widely used in renewable energy systems due to the flexible nature of the control offered by such techniques. They are highly successful in nonlinear systems because, once properly trained, they can interpolate and extrapolate random data with high accuracy.
Fuzzy Logic Control Based MPPT
FLC is a suitable way to map an input space to an output space with the help of fuzzy logic theory. Fuzzy logic uses fuzzy set theory, in which a variable is a member of one or more sets with a specified degree of membership. In [16], an FLC was designed to keep the amplitude and frequency of the output voltage of 3-phase AC loads supplied from a wind/battery hybrid energy system at a constant value, and the obtained result was compared with a PI controller for performance validation.
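
To make the notion of membership degrees concrete, a minimal sketch of a triangular membership function and a single fuzzy rule evaluation is shown below; the breakpoints and the rule are illustrative assumptions, not those of any reviewed controller.

```python
# Minimal sketch of fuzzy membership and one rule evaluation.
# The breakpoints and the rule below are illustrative assumptions only.

def tri(x, a, b, c):
    """Triangular membership function peaking at b over the support [a, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Degrees to which a per-unit power error is "negative small" / "zero" / "positive small".
error = 0.12
mu_ns = tri(error, -0.5, -0.25, 0.0)
mu_ze = tri(error, -0.25, 0.0, 0.25)
mu_ps = tri(error, 0.0, 0.25, 0.5)

# One Mamdani-style rule: IF error is positive-small THEN increase the duty step,
# weighted by the firing strength of the antecedent.
duty_step = 0.02 * mu_ps
print(f"memberships: NS={mu_ns:.2f}, ZE={mu_ze:.2f}, PS={mu_ps:.2f}; step={duty_step:.4f}")
```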
Yassine Lkhal et al. [17] carried out an MPPT algorithm based on an FLC mechanism to improve the production efficiency of a variable-speed synchronous generator. The control strategy estimates the restoring torque of the generator by adjusting its tip-speed ratio. A comparative analysis was made between the controlled design and the uncontrolled model, and the result indicates that the controlled system performs better, with a 30% improvement.
Ndirangu et al. [3] have developed an FLC for MPPT to drive the WEHS at the speed that corresponds to maximum power at any wind speed. The system contains an uncontrolled rectifier, a DC-DC boost converter and a variable load resistance expressed as a function of the duty cycle. The proposed controller tracks the maximum power point curve of the system by varying the duty cycle of the DC-DC boost converter, using the rectifier output current error and the change in error as inputs to the controller. The simulations indicate that the designed system is able to extract maximum power for varying wind speeds. However, as the load-side inverter is not included in the system, the frequency and voltage are not stable.
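As an illustration of how such a two-input FLC can be realized, the sketch below implements a zero-order Sugeno-style controller in Python, with the power error and its change as inputs and a duty-cycle adjustment as output; the membership breakpoints and rule consequents are illustrative assumptions, not values taken from the reviewed papers.

def tri(x, a, b, c):
    """Triangular membership function with peak at b over support [a, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzify(x):
    # Degrees of membership in Negative / Zero / Positive fuzzy sets.
    return {"N": tri(x, -2.0, -1.0, 0.0),
            "Z": tri(x, -1.0, 0.0, 1.0),
            "P": tri(x, 0.0, 1.0, 2.0)}

# Rule table: (error set, change-of-error set) -> crisp duty-step consequent.
RULES = {("N", "N"): -0.02, ("N", "Z"): -0.01, ("N", "P"): 0.0,
         ("Z", "N"): -0.01, ("Z", "Z"): 0.0,   ("Z", "P"): 0.01,
         ("P", "N"): 0.0,   ("P", "Z"): 0.01,  ("P", "P"): 0.02}

def flc_duty_step(error, d_error):
    """Weighted-average defuzzification over the fired rules."""
    mu_e, mu_de = fuzzify(error), fuzzify(d_error)
    num = den = 0.0
    for (se, sde), step in RULES.items():
        w = min(mu_e[se], mu_de[sde])   # rule firing strength (min t-norm)
        num += w * step
        den += w
    return num / den if den else 0.0

print(flc_duty_step(0.5, 0.2))   # small positive duty-cycle increase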
Huynh Quang Minh et al. [18] have implemented two FLCs: one for MPPT, to optimize the efficiency of a variable-speed PMSG-based wind turbine, and the other for the management of the produced power and the storage of the autonomous system with respect to the load demand. The system contains a diode bridge rectifier, two boost converters, an inverter and a battery for backup purposes. FLC1 was applied to the first converter to increase the output voltage of the generator by controlling the rotor speed so as to deliver the optimum power to the load under variable conditions. FLC2 was applied to the second converter to adjust the DC output voltage to a value suitable for battery charging and for the proper operation of the PWM inverter. If the wind turbine power output exceeds the load demand, the surplus is stored in the battery, and if the battery is full, the surplus is dissipated in a resistor.
Mohamed Kesraoui [19] has investigated the control of the aerodynamic power in a variable-speed wind turbine at high wind speeds using FLC. The purpose of the control was to manage the excess power produced during high wind speeds, based on a PMSG connected to the grid through a back-to-back power converter. The aerodynamic power was limited through pitch angle control using an FLC, and the power on the dc bus through power converter control. Comparisons between fuzzy logic and conventional controllers were made, and satisfactory results were obtained in terms of pitch angle, dc bus voltage and grid power.
Chakraborty and Barma [20] designed and simulated an FLC to manage power production and storage according to the wind conditions and load demand for a variable-speed stand-alone WEHS. The system uses a battery for backup purposes. With the help of this controller, a smooth AC output voltage is supplied to a fixed load at any wind speed. Based on the battery state-of-charge and the error between the wind power and the load demand, the controller decides the duty cycle applied to the boost converter, the moment to switch the battery on and off, and the moment to discharge the surplus into a dump resistance.
Ali M. Eltamaly and Hassan M. Farh [21] have developed a modified MPPT control algorithm based on FLC to regulate the rotational speed, forcing the PMSG to work around its MPP at speeds below the rated speed and to produce the rated power at wind speeds above the rated speed. The controller uses two real measurements: the change of output power and the change of rotational speed between two consecutive iterations. The output is the required change in rotational speed. An indirect vector-controlled PMSG system has been used for this purpose. Electromagnetic torque control was performed to track maximum power using current control, while the control of active and reactive power was achieved by controlling the quadrature and direct current components of the grid current respectively. Two effective computer simulation software packages (PSIM and Simulink) were integrated to carry out the simulation of the modified system effectively.
According to [22], an FLC was developed for the control of both the rectifier boost converter and the inverter for a variable-speed direct-driven PMSG system connected to the grid. The FLC produces the reference signals from which gate pulses are generated using a hysteresis controller. The researchers used scaling factors to obtain the desired output response during transient and dynamic states. The selection of the scaling factors was done through trial and error, which is time consuming.
Marwan Rosyadi et al. [23] have developed an FLC to effectively control the pitch angle so as to regulate the rotational speed of the PMSG to its rated value. To control it below the rated level, THIPWM is used to utilize the voltage reference without over-modulation and to maximize the fundamental amplitude of the output voltage. The authors of [24] have developed an FLC to stabilize the voltage of load busses fed by a wind turbine/supercapacitor hybrid energy system. The output voltage of the inverter is controlled to keep the line-to-line voltage at 380 V at a fixed frequency of 50 Hz. The result is compared with a PI controller: with the PI controller the voltage on the load deviates considerably from 380 V, whereas with the FLC there is very little ripple and the voltage settles at 380 V.
Altan Gencer [25] has implemented an FLC in a variable-speed WEHS to analyze the power flow efficiency of a PMSG. The control and modeling of the whole system are designed for variable-speed PMSG operation in the WEHS. The proposed controller system shows very good settling time, peak value and drop value.
Baskar and Jamuna [26] have proposed a closed-loop FLC strategy for a variable-speed stand-alone wind-driven PMSG system connected to the grid to supply a constant voltage to the inverter. The system contains a buck-boost converter between the rectifier and the inverter to step the voltage up and down. The SPWM control technique was used to control the frequency of the AC output voltage so as to supply a constant voltage to the load. The FLC-based converter gives a quick dynamic response and accurate control compared with conventional controllers.
Chaicharoenaudomrung et al. [27] have presented a novel approach combining the conventional P&O method and FLC for a variable-speed PMSG-based wind energy system to achieve maximum dc output power from the three-phase rectifier. Simulations and experiments were conducted to demonstrate the performance of the proposed FLC-P&O MPPT method. They verified that the FLC-P&O method is able to reach the MPP with a fast response under rapid changes of wind speed.
Diana Petrila [28] has described the design of an MPPT strategy for variable-speed, small-scale wind turbine systems based on the FLC technique. The change in mechanical power, the change in rotor speed and the sign of their ratio were the input variables to the FLC; the change of the reference generator current is the output variable.
Amine et al. [29] tuned the MFs of the FLC by adapting their widths using GAs in order to achieve a better result, thereby increasing the energy produced. In [13] the AMPC technique was implemented to increase the effectiveness of the MPP control strategy, using a new AMPC to maximize the power delivered by the wind turbine system during partial-load operation regardless of the disturbances caused by variations in the wind profile. FLC was chosen as the adaptation algorithm so as to converge the adjustable model to the reference model by minimizing the error while maintaining the stability of the system.
Artificial Neural Network Based MPPT
Hassan H. El-Tamaly and Ayman Yousef Nassef [30] have proposed two ANN models: one performs TSR control for MPPT when the wind speed is below the rated speed, and the other controls the pitch angle to produce the rated power and prevent the system from damage due to wind gusts. The main goal was to produce maximum power by adjusting the turbine speed in such a way that the optimal TSR is maintained. The proposed model was found to be capable of MPPT for the variable-speed wind turbine.
Hui Li et al. [31] have developed ANN-based control of a directly driven PMSG-based small WEHS to maximize the power delivered to the load. The proposed control and the anemometer-free circuit configuration give a low-cost, lightweight and solid performance over many years. An ANN was used to implement novel mechanical-sensorless peak power extraction. The ANN has two applications in the proposed control system: (i) it estimates the actual wind speed, and (ii) based on the estimated wind speed, the optimum rotor speed profile is generated. A PI controller then drives the actual rotor speed to the desired value by varying the switching ratio of the PWM inverter. The mechanical power sample data were produced from the turbine power equation with pre-selected rotor speed and wind velocity samples. The rotor speed and power samples are then recombined as the input matrix of the neural network to estimate the wind speed, while the wind velocity samples are used as targets to train a three-layer network.
Thongam et al. [32] have proposed MPPT based on a Jordan recurrent multilayer ANN with one hidden layer for a variable-speed WEHS. The instantaneous output power, maximum output power, rotor speed and wind speed were inputs to the network, whereas the rotor speed was the output command signal of the network. When the obtained reference speed is applied to the speed control loop of the machine-side converter control, maximum power is produced by the WEHS. An online back-propagation training algorithm was used to train the Jordan recurrent ANN, continuously modifying the weights of the network during the operation of the WEHS. A vector control approach was used on the machine-side converter, where control is exercised in the rotor flux reference frame, and the SVPWM technique was used for the grid-side converter control scheme.
Ren and Bao [33] have designed an ANN-based MPPT controller for a small wind turbine directly driven PMSG system. The developed controller estimates the wind speed using the generator speed and the mechanical power of the turbine, providing fast and accurate velocity information and avoiding the use of anemometers. The system adopts AC-DC-DC-AC conversion, and the control process is achieved by controlling the DC-DC circuit and the controllable inverter. The main goal of the control system is to obtain the maximum input power. They employed a feed-forward BPNN and tested the performance of the control system under wind speed variation. Only the output voltage and current values of the rectifier were tested over the entire system. The method saves design cost and reduces the system failure rate.
Abbas Rezaei and Leila Noori [34] have applied an RBF network for the modelling and simulation of a turbo generator. Since experimental work for predicting the behaviour of turbo generators with all variables changing is expensive and time consuming, they used an RBF network with speed and excitation current as inputs, and voltage, active power and reactive power as the desired outputs. The obtained results were compared with the experimental values, showing good agreement between them. They also confirmed that RBF is more accurate in comparison with the MLP model.
Ahmet Serdar Yilmaz and Zafer Ozer [35] have proposed an ANN-based pitch angle controller for wind turbines to improve the quality of the power produced above the rated wind speed. MLP and RBF networks were used to control the pitch angle under variable wind speed conditions to reduce the aerodynamic efficiency and avoid overloading the turbine. The obtained results indicate that the output power was successfully regulated during high wind speeds, and overloading or outage of the wind turbine was well prevented.
Sanaz Sabzevari et al. [36] have developed a robust direct adaptive fuzzy-PI based MPPT control for a PMSG-based WEHS. This technique depends neither on the variable parameters of the PMSG nor on exact information about the plant. Information about the wind speed is needed to track the MPP in the moderate wind speed region based on tip-speed-ratio control. ANN-PSO was utilized to estimate the wind speed properly, both in transient and steady states, using the aerodynamic power and the generator speed, with small estimation errors. The system contains a rectifier, a boost converter and a resistive load. By controlling the duty cycle of the converter, the apparent load seen by the generator can be adjusted, and thus its shaft speed can be tuned. A sudden wind change was applied to evaluate the robustness of the adaptive fuzzy-PI controller, and it was compared with the conventional PI controller.
Adaptive Neuro-Fuzzy Based MPPT
Yuksel Oguz et al. [37] have developed the design and control of an AC/DC/AC IGBT-based PWM power converter using ANFIS for a variable-speed WEHS. The grid or load voltage is regulated at its RMS value by a neuro-fuzzy voltage regulator. In [37] an ANFIS controller was designed for the control of the blade pitch angle to keep the output voltage and frequency of a VSWEHS at desirable values and to improve the quality of the output power.
The dynamic model of a hybrid wind-gas power generation system, for meeting the electric energy demand of small settlement units far from city centers or energy distribution networks, was also designed and simulated by the authors. A wind power generation system and a gas power generation system were interconnected on an AC distribution network to meet the electric energy demands of consumers. ANFIS is used to keep the electrical output magnitudes of the hybrid power generation system at a desired operating performance.
Meharrar et al. [38] have proposed an ANFIS model to predict the optimal rotational speed, taking the variation of the wind speed into consideration. The controller was designed and adapted to track the maximum power of the wind and regulate the output voltage within a wide range of wind speed variation. The performance of the controller was tested for two case studies of fast wind speed variation. The results indicate the possibility of simultaneously achieving maximum power tracking for the wind and output voltage regulation for the DC bus with the ANFIS controller. The result was compared with an FLC and was found to be better.
Hafsi Slah et al. [39] have presented the pitch angle and DC bus control of a variable-speed PMSG WEHS using PI controllers, FLC and ANN control, and compared their simulation results. Jemaa Aymen et al. [40] have discussed and compared the advantages, efficiency and accuracy of two MPPT-based control methods, i.e., ANN and neuro-fuzzy controllers, applied to a variable-speed PMSG-based WEHS. The pitch angle of the turbine is synchronized according to the measured wind speed values in the neural network and neuro-fuzzy controls, which are applied to boost the performance.
Ali et al. [41] have proposed ANFIS based on field-orientation control for the control of the PMSG converter. The main aim was to maximize the output power and address the output control of a utility-connected VSPMSG for wind power generation systems. The control system allows extracting maximum energy below the rated wind speed by optimizing the turbine speed, while minimizing mechanical stresses on the turbine during gusts of wind. A back-to-back converter was used to control the output of the PMSG driven by the wind turbine.
Sitharthan and Geethanjali [42] have developed an ANFIS-based MPPT controller using the TSR technique to estimate the wind speed sensorlessly and track the MPP of the WEHS. The proposed sensorless wind speed technique requires only the instantaneous active power as its input to estimate the wind speed. Using the estimated wind speed and knowledge of the maximum TSR value, an optimum rotor speed command is executed. The optimum rotor speed command is applied to the PI-based speed control loop of the rotor-side converter for controlling the actual rotor speed of the system to the desired value by varying the switching ratio of the PWM inverter. This makes the turbine track maximum power points by dynamically changing the turbine torque to operate at the optimum TSR. The grid-side converter power flow is controlled consecutively to keep the dc-link voltage at the reference value and to control the active and reactive power using the SVPWM technique. Simulations were carried out to verify the performance of the proposed controller in a grid-connected variable-speed DFIG-based WEHS.
Rahman and Rahim [1] have proposed two ANFIS-based MPPT algorithms: one for wind speed estimation and the other for maximum power point tracking. Compared to conventional ANN-based MPPT techniques, the ANFIS-based controller has the ability to track the MPP and the corresponding rotor speed of the wind generator by estimating the wind speed with very little error.
Padmaja and Srikanth [43] have proposed MPPT of a PV system using ANFIS under variable solar irradiation conditions. The operating temperature and irradiance are taken as the input variables to predict the maximum output power of the PV module at that instant. At the same operating temperature and irradiance, the actual output power of the PV module is calculated by sensing the operating voltage and current. The error between the predicted and actual power is calculated and given to a PI controller to generate operating signals. The operating signal generated by the PI controller is fed to the PWM generator, which generates a carrier signal of high frequency compared to the operating signal.
Muthukumari et al. [43] have proposed an intelligently tuned PID controller-based SEPIC for MPPT operation of a WEHS. They implemented Fuzzy-PID and ANFIS-PID in the proposed system and compared the results. The inputs to the ANFIS controller are the error signal e(t) and the derivative of the error signal Δe(t) between the actual dc voltage of the converter and the reference dc voltage. The outputs of the ANFIS controller are the parameters of the PID controller: the proportional gain Kp, integral gain Ki and derivative gain Kd. The intelligent ANFIS-PID technique provides better performance than the conventional PID tuning methods in terms of rise time, settling time and overshoot.

4 Findings and Conclusion

After an extensive review of the literature on MPPT techniques applied in WEHS, the following gaps are observed: (i) the models developed so far do not focus on cost minimization but rather on power maximization; (ii) in some studies the performance of the proposed method has not been evaluated; and (iii) attempts to develop a model for a wind energy harvesting system that meets the energy needs of remote communities are also limited, as most of the works focus on urban power.

Acknowledgements. The authors wish to extend their great gratitude to Punjabi University, Patiala, and the Ministry of Science and Higher Education of Ethiopia.

References
1. Rahman, M.A., Rahim, A.H.M.A.: An efficient wind speed sensor-less MPPT controller
using adaptive neuro-fuzzy inference system. In: 2015 International Conference on
Advances in Electrical Engineering (ICAEE), Bangladesh (2015)
2. Bisoyi, S.K., Jarial, R.K., Gupta, R.A., Bisoyi, S.K., Jarial, R.K., Gupta, R.A.: Modeling and
analysis of variable speed wind turbine equipped with PMSG. Int. J. Curr. Eng. Technol. 2,
421–426 (2014)
3. Ndirangu, J.G., Nderu, J.N., Maina, C.M., Muhia, A.M.: Power output maximization of a
PMSG based standalone wind energy conversion system using fuzzy logic. IOSR J. Electr.
Electron. Eng. 11(1), 58–66 (2016)
4. Jafari Nadoushan, M.H., Akhbari, M.: Optimal torque control of PMSG-based stand-alone
wind turbine with energy storage system. J. Electr. Power Energy Convers. Syst. 1(2), 52–59
(2016)
5. Pindoriya, R.M., Usman, A., Rajpurohit, B.S., Srivastava, K.N.: PMSG based wind energy
generation system: energy maximization and its control. In: 7th International Conference on
Power Systems (ICPS) (2017)
6. Mohamed, A.B., Massoum, A., Allaoui, T., Zine, S.: Modelling and control of standalone
wind energy conversion system. Int. J. Adv. Eng. Technol. 6(6), 2382–2390 (2014)
7. Zebraoui, O., Bouzi, M.: Comparative study of different MPPT methods for wind energy
conversion system. In: IOP Conference Series: Earth and Environmental Science (2018)
8. Syahputra, R., Soesanti, I.: Performance improvement for small-scale wind turbine system
based on maximum power point tracking control. Energies 12(20), 3938 (2019)
9. Harrag, A., Messalti, S.: Variable step size modified P&O MPPT algorithm using GA-based
hybrid offline/online PID controller. Renew. Sustain. Energy Rev. 49, 1247–1260 (2015)
10. Kumar, D., Chatterjee, K.: A review of conventional and advanced MPPT algorithms for
wind energy systems. Renew. Sustain. Energy Rev. 55, 957–970 (2016)
11. Hannachi, M., Elbeji, O., Benhamed, M., Sbita, L.: Optimal torque maximum power point
technique for wind turbine: proportional–integral controller tuning based on particle swarm
optimization. Wind Eng. 45(2), 337–350 (2020)
12. Saidi, Y., Mezouar, A., Miloud, Y., Brahmi, B., Kerrouche, K.D.E., Benmahdjoub, M.A.:
Adaptive maximum power control based on optimum torque method for wind turbine by
using fuzzy-logic adaption mechanisms during partial load operation. Periodica Polytechnica
Electr. Eng. Comput. Sci. 64(2), 170–178 (2020)
13. Jha, K., Dahiya, R.: Comparative study of Perturb & Observe (P&O) and Incremental
Conductance (IC) MPPT technique of PV system. In: Dutta, D., Mahanty, B. (eds.)
Numerical Optimization in Engineering and Sciences. AISC, vol. 979, pp. 191–199.
Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-3215-3_18
14. Mousa, H.H., Youssef, A.R., Mohamed, E.E.: Variable step size P&O MPPT algorithm for
optimal power extraction of multi-phase PMSG based wind generation system. Int. J. Electr.
Power Energy Syst. 108, 218–231 (2019)
15. Mengi, O.O., Altas, I.H.: Fuzzy logic control for a wind/battery renewable energy
production system. Turk. J. Electr. Eng. Comput. Sci. 20(2), 187–206 (2012)
16. Lakhal, Y., Baghli, F.Z., El Bakkali, L.: Fuzzy Logic Control Strategy for tracking the
maximum power point of a horizontal axis wind turbine. In: 8th International Conference
Interdisciplinarity in Engineering, Romania, vol. 19 (2014)
17. Minh, H.Q., Frederic, N., Najib, E., Abdelaziz, H.: Power management of a variable speed
wind turbine for stand-alone system using fuzzy logic. In: 2011 IEEE International
Conference on Fuzzy Systems (FUZZ-IEEE), pp. 1404–1410. IEEE (2011)
18. Kesraoui, M., Lagraf, S.A., Chaib, A.: Aerodynamic power control of wind turbine using
fuzzy logic. In: 3rd International Renewable and Sustainable Energy Conference (IRSEC),
Algeria (2015)
20. Chakraborty, N., Barma, M.D.: Modelling of stand-alone wind energy conversion system
using fuzzy logic controller. Int. J. Innov. Res. Electr. Electron. Instrum. Control Eng. 2(1),
861–868 (2014)
20. Eltamaly, A.M., Farh, H.M.: Maximum power extraction from wind energy system based on
fuzzy logic control. Electr. Power Syst. Res. 97, 144–150 (2013)
21. Sekhar, V.: Modified fuzzy logic based control strategy for grid connected wind energy
conversion system. J. Green Eng. 6(4), 369–384 (2016)
22. Rosyadi, M., Muyeen, S.M., Takahashi, R., Tamura, J.: A design fuzzy logic controller for a
permanent magnet wind generator to enhance the dynamic stability of wind farms. Appl. Sci.
2, 780–800 (2012)
23. Altas, I.H., Mengi, O.O.: A Fuzzy Logic Voltage Controller for off-grid wind
turbine/supercapacitor renewable energy source. In: 8th International Conference on
Electrical and Electronics Engineering (ELECO) (2013)
24. Gencer, A.: Modelling of operation PMSG based on fuzzy logic control under different load
conditions. In: 10th International Symposium on Advanced Topics in Electrical Engineering
(ATEE), Turkey (2017)
25. Baskar, M., Jamuna, V.: Green energy generation using FLC based WECS with lithium ion
polymer batteries. Braz. Arch. Biol. Technol. 59(2), 1–15 (2016)
26. Chaicharoenaudomrung, K., Areerak, K., Areerak, K., Bozhko, S., Hill, C.I.: Maximum
power point tracking for stand-alone wind energy conversion system using FLC-P&O
method. IEEJ Trans. Electr. Electron. Eng. 15(12), 1723 (2020)
27. Petrila, D., Blaabjerg, F., Muntean, N., Lascu, C.: Fuzzy logic based MPPT controller for a
small wind turbine system. In: 13th International Conference on Optimization of Electrical
and Electronic Equipment (OPTIM) (2012)
28. Amine, H.M., Abdelaziz, H., Najib, E.: Wind turbine maximum power point tracking using
FLC tuned with GA. Energy Procedia 62, 364–373 (2014)
29. El-Tamaly, H.H., Nassef, A.Y.: Tip speed ratio and Pitch angle control based on ANN for
putting variable speed WTG on MPP. In: Eighteenth International Middle East Power
Systems Conference (MEPCON) (2016)
30. Li, H., Shi, K.L., McLaren, P.G.: Neural-network-based sensorless maximum wind energy
capture with compensated power coefficient. IEEE Trans. Ind. Appl. 41(6), 1548–1556
(2005)
31. Thongam, J.S., Bouchard, P., Ezzaidi, H., Ouhrouche, M.: Artificial neural network-based
maximum power point tracking control for variable speed wind energy conversion systems.
In: Control Applications, (CCA) & Intelligent Control, Saint Petersburg, Russia (2009)
32. Ren, Y.F., Bao, G.Q.: Control strategy of maximum wind energy capture of direct-drive
wind turbine generator based on neural-network. In: Asia-Pacific Power and Energy
Engineering Conference, China (2010)
33. Hayati, M., Rezaei, A., Noori, L.: Application of radial basis function network for the
modeling and simulation of turbogenerator. J. Adv. Inf. Technol. 4(2), 76–79 (2013)
34. Yilmaz, A.S., Ozer, Z.: Pitch angle control in wind turbines above the rated wind speed by
multi-layer perceptron and radial basis function neural networks. Expert Syst. Appl. 36(6),
9767–9775 (2009)
35. Sabzevari, S., Karimpour, A., Monfared, M., Naghibi Sistani, M.B.: MPPT control of wind
turbines by direct adaptive fuzzy-PI controller and using ANN-PSO wind speed estimator.
J. Renew. Sustain. Energy 9(1), 013302 (2017)
36. Oguz, Y., Guney, I.: Adaptive neuro-fuzzy inference system to improve the power quality of
variable-speed wind power generation system. Turk. J. Electr. Eng. Comput. Sci. 18(4),
625–645 (2010)
37. Meharrar, A., Tioursi, M., Hatti, M., Stambouli, A.B.: A variable speed wind generator
maximum power tracking based on adaptative neuro-fuzzy inference system. Expert Syst.
Appl. 38, 7659–7664 (2011)
38. Slah, H., Mehdi, D., Lassaad, S.: Advanced control of a PMSG wind turbine. Int. J. Mod.
Nonlinear Theory Appl. 5, 1–10 (2016)
39. Jemaa, A., Zarrad, O., Mansouri, M.: Performance assessment of a wind turbine with
variable speed wind using artificial neural network and neuro-fuzzy controllers. Int. J. Syst.
Appl. Eng. Dev. 11(3), 167–172 (2017)
40. Ali, A., Moussa, A., Abdelatif, K., Eissa, M., Wasfy, S., Malik, O.P.: ANFIS based
controller for rectifier of PMSG wind energy conversion system energy conversion system.
In: 2014 IEEE Electrical Power and Energy Conference IEEE (2014)
41. Sitharthan, R., Geethanjali, M.: ANFIS based wind speed sensor-less MPPT controller for
variable speed wind energy conversion systems. Aust. J. Basic Appl. Sci. 8(18), 14–23
(2014)
42. Padmaja, A., Srikanth, M.: Design of MPPT controller using ANFIS and HOMER based
sensitivity analysis for MXS 60 PV module. Int. J. Innov. Res. Adv. Eng. (IJIRAE) 11(2),
40–50 (2014)
43. Muthukumari, T., Raghavendiran, T.A., Kalaivani, R., Selvaraj, P.: Intelligent tuned PID
controller for wind energy conversion system with permanent magnet synchronous generator
and AC-DC-AC converters. IAES Int. J. Robot. Autom. 8(2), 133 (2019)
Developing a System to Analyze
Comments of Social Media and Identify
Friends Category

Tasfia Hyder1, Rezaul Karim2, and Mohammad Shamsul Arefin1(B)
1 Department of Computer Science and Engineering, Chittagong University of Engineering and Technology, Chattogram 4349, Bangladesh
u1504053@student.cuet.ac.bd, sarefin@cuet.ac.bd
2 Department of Computer Science and Engineering, University of Chittagong, Chittagong 4331, Bangladesh
rezaul.cse@cu.ac.bd

Abstract. Today, users on social platforms express their emotions, ideas, proposals and views. Opinions may be articulated in various ways and may carry different polarities such as positive, negative or neutral, and it is often a difficult and time-consuming challenge for people to appreciate the feeling behind each opinion. Analyzing the sentiment of each statement resolves this issue. This paper presents a framework to analyze social media comments, e.g. from Facebook, and identify categories of friends. First, some public profiles are selected and comments are retrieved from different posts of theirs. Secondly, those comments are stored as a dataset and pre-processed for sentiment analysis. After that, the pre-processed data is trained and tested in a sentiment analysis model developed by us. From the sentiment of the data, we then identify friend categories; for example, a friend whose comments carry positive sentiment can be considered a good friend. The performance of the system is evaluated and a decent accuracy is achieved.

Keywords: Social media · Sentiment analysis · Natural language processing · Facebook comments · Long Short Term Memory (LSTM) · Friends category

1 Introduction
Social media is an interactive computer-mediated technology that enables information, ideas, career interests and other forms of expression to be created or shared via virtual communities and networks. In today's world, social media plays an important role in our lives; it has become an important forum for expressing opinions and thoughts. Meanwhile, the number of internet users who use social media to express their opinions is rapidly increasing.
Sentiment analysis is the automatic mining of attitudes, opinions, and emotions from sources of text, speech, and databases using Natural Language Processing (NLP). The object of sentiment analysis is to digitally acknowledge and
express opinion. There is a huge amount of information on the web as well as in social media. Through sentiment analysis we can learn the public opinion about a particular object or a public figure. It is useful for product reviews, classification of feedback, opinion mining, etc.
The comment sections of social networking sites like Facebook and Twitter can be regarded as social media. Such social media capture the thoughts, or word of mouth, of millions of people. People now share facts about their lives, knowledge, experiences and opinions with the entire world through them; they express their opinions and post comments to participate in events. Opinions and comments differ from person to person: a certain thing can receive a positive opinion from one individual and a negative one from another, which is common on social media since every individual's thoughts and perspective vary. Social media has thus been seen as a medium for people to make positive or negative remarks.
Our proposed framework addresses and analyzes social media comments. At first, some Facebook profiles are chosen and the data with the required parameters is retrieved. Then the results are pre-processed and primed for the next stage, sentiment analysis. To carry out this function, an LSTM model is built with suitable parameters; the model has been trained and validated. The comments are graded as positive, negative or neutral according to the score received. The next step is to identify the followers: they are grouped into three categories based on the sentiment of their comments.

2 Related Works
In recent years, we have seen an enormous amount of work in the field of sentiment analysis of social media data. Each work has its own methodology and implementation idea.
In [1], the study aims at developing sentiment analysis using a lexicon-based approach and polarity multiplication. D. Gurkhe et al. [2] developed a model to train a machine to extract the polarity (positive, negative, or neutral) of a social media dataset in relation to a query keyword; the project proposes an approach for automatically classifying the sentiment of social media data using this technique.
Kaur et al. [3] did research on Facebook comments to review and explore the sentiments of users; the project demonstrates a new algorithm written in the Java programming language.
In [4] sentiment classification of Bangla textual content was performed. The researchers used both classical and deep learning algorithms to build classifiers for several publicly accessible labelled sentiment datasets.
Sarkar et al. [5] proposed a method that employs supervised machine learning algorithms for detecting sentiment polarity in Bengali tweets. Data cleaning and preprocessing, attribute extraction, and model creation and classification are the phases of the proposed method.
In [6] the researchers perform text sentiment analysis based on the LSTM technique. The paper proposes an improved RNN language model, LSTM, that successfully covers all background sequence information and outperforms a traditional RNN. It is used to accomplish multi-class classification of the emotional attributes of text, and it describes those attributes more reliably than a traditional RNN.
Hassan et al. [7] performed sentiment analysis on Bangla and Romanized Bangla texts using a deep learning approach. The paper presents a large textual dataset of both Bangla and Romanized Bangla texts that is the first of its kind; it has been post-processed, checked multiple times, and is ready for SA implementation and experiments.
In [8] the research deals with sentiment analysis, from a machine learning perspective, of Facebook comments written and posted in the Arabic (Modern Standard or Dialectal) language.
In [9] a method was proposed for analyzing data from social networks to classify human behaviour. By crawling public data obtained from online social network users, a framework was developed for the collection and analysis of large data.
In [10] the researchers suggest methods for correctly classifying the sentiment label, concentrating on the division of positive and negative feelings in the tweets.
Nabi et al. [11] use Tf.Idf (term frequency-inverse document frequency) to obtain a better response and achieve a more reliable outcome by extracting various features from positive, negative or neutral words, especially from the point of view of Bangla text.
In [12], Murthy et al. suggested an LSTM-based technique for sentiment classification of text data.
Wahid et al. [13] developed a Bangla dataset of cricket comments capturing the feelings of real people in three categories, i.e. positive, negative and neutral. They then utilized a word embedding approach to vectorize every word and used LSTM for long-term dependence.
In [14] Kandhro et al. presented an LSTM model to evaluate the performance of instructors using student feedback. This model overcomes numerous shortcomings of conventional techniques, for example bag-of-words, n-gram, Naïve Bayes and SVM models, in which the order and information of the text are discarded. The experimental findings demonstrated that state-of-the-art accuracy can be achieved on a student feedback dataset.
In essence, we may conclude that most of the preceding works only perform sentiment analysis on texts; social media comments have not yet been used to identify friend categories. Thus, in our suggested technique we analyze Facebook comments with an LSTM model and categorize friends or followers on the basis of sentiment analysis.

3 System Architecture and Design

Sentiment analysis is a natural language processing technique for determining the positive, negative, or neutral nature of a text. To date, various methodologies and techniques have been implemented for this purpose. Figure 1 shows our proposed framework, which has three main steps: (1) data pre-processing, (2) sentiment analysis of the data, and (3) categorization of friends or followers based on sentiment.

Fig. 1. System architecture of proposed framework

Figure 1 illustrates our system. First, we choose popular profiles at random, in this case celebrities and public figures. The comments on some of their posts are then retrieved; we collected comments in both English and Bangla from Facebook. Our work begins after the comments have been gathered: for the upcoming sentiment analysis, we need to preprocess the data.
After that, we build a sentiment analysis model that can be used for both languages; an LSTM model was created. Once the model had been developed, a procedure for testing and assessment was established. We obtain a labelled dataset from the model, in which we can find the sentiment score of each comment. We then list the friends or followers of that profile user based on the polarity scores of their remarks, and we receive a list of friends divided into three groups. On social media, their replies to a user's posts determine whether they are nice, supportive, or aggressive friends/followers.

3.1 Dataset Description

A proper dataset is needed for implementing our framework, but unfortunately no standard dataset exists that matches our requirements. On the other hand, we were unable to use the Facebook API to collect data due to its restrictions during COVID-19. So, we built our own dataset: we collected all the data from Facebook manually and labelled it ourselves. We collected 2617 comments in total, in the English and Bangla languages, from 30 Facebook user profiles. The dataset is divided 80:20 into train and test sets.
The dataset comprises six columns: (1) user profile, (2) post ID, (3) follower name, (4) follower ID, (5) comment and (6) label of the comment.

3.2 Data Preprocessing

Data preprocessing is a data mining step that entails converting raw data into a format that can be understood. Real-world data is often unreliable, inconsistent and/or incomplete in specific patterns or trends, and contains numerous errors. First, the dataset is split into two datasets based on language; this is done using Unicode and ASCII codes. Tokenization and the removal of mentions and unnecessary data are then performed. Stemming is done by applying the Porter stemming algorithm, and stop words are removed using the NLTK tool.
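A minimal sketch of these preprocessing steps for the English branch, using NLTK, might look as follows; the regular expressions and the Unicode-range rule used for the Bangla/English split are our assumptions about how the steps described above could be realized.

import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

nltk.download("punkt")
nltk.download("stopwords")

stemmer = PorterStemmer()
stop_words = set(stopwords.words("english"))

def is_bangla(text):
    # Bangla characters fall in the Unicode block U+0980-U+09FF (an assumption
    # about how the Unicode/ASCII language split mentioned above is realized).
    return any("\u0980" <= ch <= "\u09ff" for ch in text)

def preprocess_english(comment):
    comment = re.sub(r"@\w+", "", comment)          # remove @mentions
    comment = re.sub(r"[^a-zA-Z\s]", " ", comment)  # drop numbers/punctuation
    tokens = word_tokenize(comment.lower())         # tokenization
    tokens = [t for t in tokens if t not in stop_words]
    return [stemmer.stem(t) for t in tokens]        # Porter stemming

print(preprocess_english("@user This movie was absolutely amazing!!"))
# ['movi', 'absolut', 'amaz']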

3.3 Sentiment Analysis Model Using LSTM


Long-term dependency is captured by the LSTM (Long Short-Term Memory), a kind of RNN. LSTMs are commonly used for a wide range of tasks such as speech recognition, text classification, sentiment interpretation, and so on. The architecture can detect long and short patterns in data and removes the vanishing gradient problem encountered when training RNNs. LSTM has been adopted in numerous applications and also seems very promising for language modeling.
LSTM uses three gates, namely the input gate, the forget gate and the output gate, to control the use and updating of the text history. The memory cell and the three gates enable LSTM to read, store and update historical information from a longer distance. The tanh function is non-linear; it governs the values flowing through the network, keeping them between −1 and 1. The forget gate decides which facts should be kept and which should be discarded: the current input X(t) and the previous hidden state h(t − 1) are passed through the sigmoid function, which produces values between 0 and 1. The next step is to pass the same hidden-state and current-input information through the tanh mechanism; the tanh operator produces a candidate vector C̃(t) with values between −1 and 1 in order to regulate the network. The outputs of the two activation functions are then multiplied point by point. The next step is to determine and store the new cell state: the forget vector f(t) is multiplied by the previous cell state C(t − 1), and where the result is near 0 the corresponding cell-state values are lowered. The output gate decides the next hidden state, which includes details of past inputs (Fig. 2).

Fig. 2. Sample input dataset in English and Bangla language
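For reference, the gate operations sketched above correspond to the standard LSTM update equations, where σ is the sigmoid function, ⊙ denotes element-wise (point-by-point) multiplication, and W and b are the weights and biases of each gate:

\begin{aligned}
f_t &= \sigma(W_f\,[h_{t-1}, x_t] + b_f) && \text{(forget gate)}\\
i_t &= \sigma(W_i\,[h_{t-1}, x_t] + b_i) && \text{(input gate)}\\
\tilde{C}_t &= \tanh(W_C\,[h_{t-1}, x_t] + b_C) && \text{(candidate state)}\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{(cell state update)}\\
o_t &= \sigma(W_o\,[h_{t-1}, x_t] + b_o) && \text{(output gate)}\\
h_t &= o_t \odot \tanh(C_t) && \text{(hidden state)}
\end{aligned}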
In our work, for building the sentiment analysis model, we split our dataset into two parts, named train and test, in the ratio 80:20.
Fig. 3. A LSTM cell

We used an LSTM sequential model with one embedding layer, two LSTM layers and one dense layer. The input layer contains the tokens that are fed to the embedding layer. The embedding layer takes 128,000 parameters and generates an output of shape (51, 256). It enables us to convert each word into a fixed-length vector of a defined size; the fixed length of the word vectors helps us represent words in a better way with reduced dimensions. Then the dropout layer, which has no parameters, generates an output of shape (51, 256). The dropout rate is 0.3 (Fig. 3).
The 1st LSTM layer then operates with 525,312 parameters and generates an output of shape (51, 256). The 2nd LSTM layer has the same number of parameters but generates an output of size 256. The Adam optimizer is used to optimize the model, and categorical crossentropy is used as the loss function. A batch size of 32 is adopted with 20 epochs. Finally, in the output (dense) layer we get three classes of output for the input data, in the form of probabilities. The classes are 0 (positive), 1 (negative) and −1 (neutral); the target class is the one with the highest probability score (Fig. 4).

Fig. 4. LSTM architecture for sentiment analysis
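A minimal Keras sketch of the architecture described above is given below; the vocabulary size and sequence length are inferred from the reported parameter counts and output shapes (500 × 256 = 128,000 embedding parameters; (51, 256) layer outputs) and are therefore assumptions rather than values stated explicitly in the text.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Dropout, LSTM, Dense

VOCAB_SIZE = 500   # assumption: 500 x 256 = 128,000 embedding parameters
SEQ_LEN = 51       # tokens per comment, from the (51, 256) layer outputs
EMBED_DIM = 256

model = Sequential([
    Embedding(VOCAB_SIZE, EMBED_DIM, input_length=SEQ_LEN),
    Dropout(0.3),                          # dropout rate reported in the paper
    LSTM(256, return_sequences=True),      # 1st LSTM: (51, 256) output
    LSTM(256),                             # 2nd LSTM: 256-dim output
    Dense(3, activation="softmax"),        # positive / negative / neutral
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# Training as reported: batch size 32, 20 epochs.
# model.fit(X_train, y_train, batch_size=32, epochs=20,
#           validation_data=(X_test, y_test))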

3.4 Friends Categorization

The model is constructed and tested for assessment. The next task is to classify the friends/followers based on their comments. Our model classifies the comments into three categories: positive, negative and neutral. It assigns a score to each comment based on its sentiment. According to the polarity of the comments, the persons are grouped into three classes, namely good friends, hostile friends and neutral friends. Friends with the maximum number of positive comments are considered good friends, hostile friends are those with the maximum number of negative comments, and neutral friends fall in between. In the algorithm, the positive list, negative list and neutral list indicate the good friends, hostile friends and neutral friends lists respectively.
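A minimal sketch of this categorization step is shown below; the data layout (a list of follower/label pairs) and the majority-vote handling are our assumptions about how the described grouping could be realized.

from collections import Counter, defaultdict

def categorize_friends(labeled_comments):
    """labeled_comments: iterable of (follower_id, label) pairs, where the
    label is 'positive', 'negative' or 'neutral' from the sentiment model.
    Each follower goes to the list matching their most frequent label."""
    counts = defaultdict(Counter)
    for follower, label in labeled_comments:
        counts[follower][label] += 1

    positive_list, negative_list, neutral_list = [], [], []
    for follower, c in counts.items():
        majority = c.most_common(1)[0][0]
        if majority == "positive":
            positive_list.append(follower)   # good friends
        elif majority == "negative":
            negative_list.append(follower)   # hostile friends
        else:
            neutral_list.append(follower)    # neutral friends
    return positive_list, negative_list, neutral_list

good, hostile, neutral = categorize_friends([
    ("alice", "positive"), ("alice", "positive"),
    ("bob", "negative"), ("carol", "neutral"),
])
print(good, hostile, neutral)   # ['alice'] ['bob'] ['carol']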
4 Implementation and Experimental Result

4.1 Experimental Setup

This framework has been implemented on a machine with Windows 10, an Intel Core i3 processor and 8 GB RAM. Keras with the TensorFlow library is used, and Python 3.6.9 is used for development.

4.2 Implementation

A dataset is constructed for this work by gathering comments from the Facebook posts of various prominent people and celebrities. The datasets are then cleaned, and we start our data pre-processing (Fig. 5).
Fig. 5. Cleaned data

For both datasets, the removal of @mentions, numbers and stop words is performed. Tokenization and stemming are also accomplished to obtain clean text, which further helps the sentiment analysis. This is done for both languages (Fig. 6).

Fig. 6. Data pre-processing for English and Bangla

An LSTM-based sentiment analysis model is built. A sequential model is initialized; one embedding layer, two LSTM layers, one dropout layer and a dense layer with three outputs and a softmax activation function are added. We use the categorical crossentropy loss and the Adam optimizer to train the model. Also, we set accuracy as the metric for measuring the model's performance. For training the LSTM neural network we use a batch size of 32 and train the network for 20 epochs (Fig. 7).
Fig. 7. LSTM architecture

The last step is to categorize the followers based on their comments on the posts of those user profiles. We classify them into three groups: (1) good followers, (2) hostile followers and (3) neutral followers. A list of classified friends of a Facebook user is given in the following picture (Fig. 8).

Fig. 8. Classified friends list

4.3 Performance Evaluation

After establishing a system, an adequate assessment is required; it helps to find out whether the framework works correctly. To evaluate the efficacy of our framework we considered a few metrics, determined with the following equations.
\text{Precision} = \frac{TP}{TP + FP} \quad (1)

\text{Recall} = \frac{TP}{TP + FN} \quad (2)

\text{F1 Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \quad (3)

\text{Accuracy} = \frac{\text{No. of correct predictions}}{\text{Total no. of predictions}} \times 100\% \quad (4)
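In practice, these metrics can be computed directly from the model predictions; a minimal sketch using scikit-learn is shown below, where y_true and y_pred are placeholder label arrays following the class encoding above.

from sklearn.metrics import accuracy_score, classification_report

y_true = [0, 1, -1, 0, 1]   # gold labels: 0 positive, 1 negative, -1 neutral
y_pred = [0, 1, -1, 1, 1]   # model predictions (illustrative values)

print(accuracy_score(y_true, y_pred))   # Eq. (4), as a fraction
print(classification_report(
    y_true, y_pred,
    labels=[-1, 0, 1],
    target_names=["neutral", "positive", "negative"]))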
The accuracy achieved in both cases is 98%. The detailed per-class results are given in Tables 1 and 2, and the accuracy and loss curves we obtained are shown in Fig. 9.

Table 1. Performance measure of proposed framework for English

English    Precision   Recall   F1-score
Neutral    0.89        0.99     0.94
Positive   1.00        0.98     0.99
Negative   0.98        0.98     0.98

Table 2. Performance measure of proposed framework for Bangla

Bangla     Precision   Recall   F1-score
Neutral    0.97        0.95     0.96
Positive   0.97        1.00     0.98
Negative   0.99        0.95     0.99

4.4 Comparison with Other Existing Frameworks

The purpose of the proposed method is to perform sentiment analysis of Facebook comments and classify users based on the scores of their comments. It should be noted that classifying social media friends based on the sentiment of their Facebook comments is entirely new. Much sentiment analysis has been done on Twitter data (tweets), movie reviews, product reviews and Facebook posts, i.e. statuses. In [6] sentiment analysis was done using an LSTM-based RNN network and text was classified into positive, negative and neutral; the accuracy is 89%. In [7] a Bangla and Romanized Bangla dataset was developed and sentiment analysis was performed; the accuracy of that system is 78%. In comparison, our system performs better, with 98% accuracy (Fig. 9).
Fig. 9. Accuracy and loss curve for [a] English and [b] Bangla

4.5 Future Work

While implementing the system, we experienced many obstacles in achieving our goal. A proper dataset was missing; we had to build our own dataset, which was time consuming. Moreover, it was not labelled either, so it was really challenging for us. There are some ways in which the system could be improved in the future. Automatic crawling of data from Facebook would help to build a large dataset and save time. If the dataset is enlarged, the system will be more suitable for practical use and the accuracy will improve.
The proposed technique is limited to English and Bangla language data only. On social media, we see a large number of transliterated comments. In the future, this work can be extended to the analysis of transliterated comments with a good amount of data.

5 Conclusion

In this work, our objective is to develop a system that can analyze social media (i.e. Facebook) comments on a particular user's posts and classify the followers of that user. The main purpose is to help a user identify his/her good and hostile followers automatically; checking through thousands of comments every day is not very effective, so this system comes in handy. We retrieved the comments from different public figure profiles. Then a sentiment analysis model was created with our algorithm. After pre-processing the data, it is fed to the model, which generates the required scores and results. Another algorithm was introduced to classify the friends on the basis of the scores of their comments. Finally, we could classify the friends into our decided categories, and an adequate accuracy was reached for this proposed method.

References
1. Mashuri, M.: Sentiment analysis in Twitter using lexicon based and polarity mul-
tiplication. In: International Conference of Artificial Intelligence and Information
Technology (ICAIIT), pp. 365–368 (2019)
2. Gurkhe, D., Pal, N., Bhatia, R.: Effective sentiment analysis of social media
datasets using Naive Bayesian classification. Int. J. Comput. Appl. 975(8887),
99 (2014)
3. Kaur, R., Singh, H., Gupta, G.: Sentimental analysis on Facebook comments using
data mining technique. Int. J. Comput. Sci. Mob. Comput. 8(8), 17–21 (2019)
4. Hasan, M.A., Tajrin, J., Chowdhury, S.A., Alam, F.: Sentiment classification in
Bangla textual content: a comparative study. In: 23rd International Conference on
Computer and Information Technology (ICCIT), pp. 1–6. IEEE (2020)
5. Sarkar, K., Bhowmick, M.: Sentiment polarity detection in Bengali tweets using
multinomial Naïve Bayes and support vector machines. In: IEEE Calcutta Confer-
ence (CALCON), pp. 31–36. IEEE (2017)
6. Li, D., Qian, J.: Text sentiment analysis based on long short-term memory. In: First
IEEE International Conference on Computer Communication and the Internet
(ICCCI), pp. 471–475. IEEE (2016)
7. Hassan, A., Amin, M.R., Al Azad, A.K., Mohammed, N.: Sentiment analysis on
Bangla and romanized Bangla text using deep recurrent models. In: International
Workshop on Computational Intelligence (IWCI), pp. 51–56. IEEE (2016)
8. Elouardighi, A., Maghfour, M., Hammia, H., Aazi, F.Z.: A machine Learning app-
roach for sentiment analysis in the standard or dialectal Arabic Facebook com-
ments. In: 3rd International Conference of Cloud Computing Technologies and
Applications (CloudTech), pp. 1–8. IEEE (2017)
9. Alam, K.T., Hossain, S.M.M., Arefin, M.S.: Developing a framework for analyzing
social networks to identify human behaviours. In: 2nd International Conference
on Electrical, Computer and Telecommunication Engineering (ICECTE), pp.1–4.
IEEE (2016)
10. Huq, M.R., Ali, A., Rahman, A.: Sentiment analysis on Twitter data using KNN
and SVM. Int. J. Adv. Comput. Sci. Appl. 8(6), 19–25 (2017)
11. Nabi, M.M., Altaf, M.T., Ismail, S.: Detecting sentiment from Bangla text using
machine learning technique and feature analysis. Int. J. Comput. Appl. 153(11),
28–34 (2016)
12. Murthy, D., Allu, S., Andhavarapu, B., Bagadi, M., Belusont, M.: Text based
sentiment analysis using LSTM. Int. J. Eng. Res. Technol. Res. 9(05) (2020)
13. Wahid, M.F., Hasan, M.J., Alom, M.S.: Cricket sentiment analysis from Bangla
text using recurrent neural network with long short term memory model. In: Inter-
national Conference on Bangla Speech and Language Processing (ICBSLP), pp.
1–4. IEEE (2019)
14. Kandhro, I.A., Wasi, S., Kumar, K., Rind, M., Ameen, M.: Sentiment analysis
of students’ comment using long-short term model. Indian J. Sci. Technol. 12(8),
1–16 (2019)
Comparison of Watershed Delineation
and Drainage Network Using ASTER
and CARTOSAT DEM of Surat City, Gujarat

Arbaaz A. Shaikh1, Azazkhan I. Pathan2(&), Sahita I. Waikhom1, and Praveen Rathod2
1 Dr. S. & S. S. Ghandhy Government Engineering College, Surat, Gujarat, India
2 Sardar Vallabhbhai National Institute of Technology, Surat 395007, Gujarat, India

Abstract. Accurate delineation of watersheds and drainage networks is essential for hydrological and geomorphological models, water resource management, flood risk management, floodplain change analysis, and surface water mapping. This study aims to examine the accuracy of the watershed delineation and drainage network of the Tapi river in Surat city between the ASTER and CARTOSAT Digital Elevation Models (DEMs) in ArcGIS. In this study, free online data sources from the Earthdata Search (NASA) and Bhuvan (NRSC) websites were used to delineate watersheds from ASTER and CARTOSAT satellite imagery. Hydrologic information was extracted from both DEMs in ArcGIS using the Hydrology tools. The analysis revealed that the watershed extracted from the ASTER DEM had a total area of 6608 km2, while that from the CARTOSAT DEM was 6759 km2. A correlation analysis was carried out to evaluate the watershed areas delineated from the ASTER and CARTOSAT DEMs. The R2 and RMSE values indicated that the ASTER DEM provided a more accurate estimation of the watershed area compared to the CARTOSAT DEM. The results of this study have successfully shown that both the ASTER and CARTOSAT DEMs are suitable for watershed delineation of the Tapi river in Surat city using free and reliable data sources.

Keywords: Watershed delineation · ASTER · Cartosat DEM · GIS · Hydrology tool · Tapi

1 Introduction

1.1 General
Any surface area from which rainwater runoff is collected and drained through a
common point is referred to as a watershed. A watershed might be as tiny as a few
hectares in the case of small ponds or as large as hundreds of square kilometres in the
case of rivers. Sub-watersheds can be found within any watershed. Rainfall that falls
within the watershed divide runs down to the lowest point in the terrain, where it meets

a water body (such as a river or a lake) [14, 15]. The outlet, also known as the pour
point, is a point on the surface where water drains out of a region [1]. The outflow is
also the lowest point on a watershed’s boundary.
The estimation of the planimetric area of a watershed is a challenge in hydrology
studies since delineation of boundaries between watersheds is difficult. Delineating
precise watershed borders for the study watersheds is an important part of data
preparation for modelling [2, 16]. However, few studies on watershed delineation,
especially at large scales, have been conducted [3]. GIS is an effective tool for
managing vast and complicated databases, as well as providing a digital representation
of watershed features for hydrologic modelling [4]. The delineation of catchments
using contour lines and triangulated irregular networks is accurate, but it requires a
large amount of data storage and processing time [17]. The extent and accessibility of
data sources, the type and distinctive features of the modelled area, and the correctness
and competency of the GIS database are all aspects that go into a successful geographic
information system (GIS)-based automatic catchment delineation [5, 18].
The extraction of catchment and drainage networks is commonly done using a
digital elevation model (DEM) produced from remote sensing data. Grid DEMs are
digital files that store the elevation values of the landscape at the nodes of a regular
square grid [6]. They can be considered as two-dimensional monochrome photographs
in which the value of a node denotes elevation rather than reflectance. Digital elevation
models (DEMs) have been developed all over the world since the advent of Geographic
Information Systems (GISs) [19]. DEMs are commonly used in watershed modelling
because they provide accurate terrain representations. Using GIS technology, DEMs
may be utilized to derive flow networks and then automatically build watershed borders
for given outlet sites [2, 20].

1.2 Objective of the Study

• The main aim of the study is to examine the accuracy of watershed delineation and
drainage network between Digital Elevation Model (DEM) from ASTER and
CARTOSAT in ArcGIS.

1.3 Scope of Work

• To delineate the watersheds basin in Surat city using Digital Elevation Model
(DEM).
• To prepare watershed boundary, flow direction, flow accumulation, flow length, and
stream ordering using Hydrology tool and determine basin and sub-basin in the city
by using the watershed function in ArcGIS for watershed delineation.
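As a concrete illustration, the following is a minimal sketch of this ArcGIS Hydrology workflow written with the arcpy Spatial Analyst module; the workspace path, the pour-point dataset and the flow-accumulation threshold are placeholder assumptions, not values from this study.

import arcpy
from arcpy.sa import (Fill, FlowDirection, FlowAccumulation,
                      StreamOrder, Con, Watershed)

arcpy.CheckOutExtension("Spatial")
arcpy.env.workspace = r"C:\data\surat_tapi"   # hypothetical workspace

filled = Fill("dem_projected.tif")            # remove sinks/depressions
fdr = FlowDirection(filled)                   # D8 flow direction
fac = FlowAccumulation(fdr)                   # upslope cell counts

# Extract a stream raster with an assumed threshold of 1000 cells,
# then compute Strahler stream order.
streams = Con(fac > 1000, 1)
order = StreamOrder(streams, fdr, "STRAHLER")

# Delineate the watershed draining to snapped pour points.
basin = Watershed(fdr, "pour_points_snapped.tif")
basin.save("watershed.tif")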
2 Study Area

Fig. 1. Study area

Figure 1 given above shows the study area. Surat is a city in Gujarat, located near the
mouth of the Tapi River in western India. It is Gujarat's second-largest city, after
Ahmedabad, and India's eighth-largest city by population. Surat covers an area of
474.185 sq. km. Surat city falls within the geographical location of North Latitude
21°10′12.864″ and East Longitude 72°49′51.819″. Surat is one of the prime districts
included in the Tapi river basin. The basin extends over the states of Madhya Pradesh,
Maharashtra and Gujarat, having an area of 65,145 sq. km with a maximum length and
width of 534 and 196 km. The Tapi is the second largest westward draining river of the
Peninsula. It originates near Multai reserve forest in Betul district of Madhya Pradesh at
an elevation of 752 m [7].

3 Data Collection

In this study, Digital Elevation Model (DEM) data are mainly used for watershed
delineation. The DEM data were collected from two sources: the CARTOSAT-1 DEM
from Bhuvan (NRSC) and the ASTER DEM from NASA Earthdata.

3.1 Cartosat-1 DEM


Cartosat-1 was launched on May 5, 2005, by the Indian Space Research Organisation
(ISRO) with the primary goal of delivering 2.5 m in-track stereo high-resolution
satellite data. One of its missions is to create a Digital Elevation Model (DEM) and

ortho-image for the entire country to aid large-scale mapping and terrain modelling.
Cartosat-1 has completed 16 years of operation and has acquired imagery over India and
around the world. Each Cartosat-1 segment's DEM and orthoimages are split into tiles
with 7.5′ × 7.5′ extents. Approximately 500 Cartosat-1 segments, totalling around
20,000 tile pairs, cover the entire Indian landmass. To discover and demarcate
distortions in the Quality Verification (QV) system for further refinement, each tile is
exposed to a quality verification procedure that includes panning and 2.5D draped
viewing. The Tile Editing (TE) technique corrects problems including water-body
abnormalities, hill-top distortions, plain-area sinks, and residual mosaics that occur
during the automatic creation of DEM [8].

3.2 ASTER DEM


The Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) is a
high-resolution multispectral imager that was launched in December 1999 on NASA’s
Terra spacecraft. ASTER has 14 bands that cover a large spectrum range from visible
to thermal infrared, with great spatial, spectral, and radiometric precision. The spatial
resolution varies depending on the wavelength: 15 m in the visible and near-infrared
(VNIR), 30 m in the short wave infrared (SWIR), and 90 m in the thermal infrared
(TIR) [2]. Bands 3N (nadir-viewing) and 3B (backward-viewing) of an ASTER Level-
1A image collected by the Visible Near Infrared (VNIR) sensor are used to create the
ASTER Digital Elevation Model (DEM). Two independent telescope assemblies are
included in the VNIR subsystem, which aids in the creation of stereoscopic data. With
a base-to-height ratio of 0.6 and an intersection angle of around 27.7°, the Band-3
stereo pair is collected in the spectral range of 0.78 to 0.86 µm. There is around a
one-minute lag between the acquisition of the nadir and backward images. A graphic
of the ASTER VNIR nadir and backward-viewing sensors' along-track imaging
geometry is available in [9].

4 Methodology

4.1 Watershed Delineation


The watershed delineation and the drainage network were derived by following the
methodology described in the flowchart (Fig. 2). Several DEM files were imported into
ArcMap to cover the study area. The DEM raster files were merged into a single
continuous raster layer using the Mosaic tool in the toolbox and were then clipped to
the study area. The ArcGIS software (version 10.8) with the Hydrology tools extension
was used. The first step is to project the DEM into a coordinate system in which the
horizontal units of the x, y coordinates are in meters, not degrees. It is important to use
a DEM with no depressions or sinks, so all sinks in the elevation grid were removed
from the DEM layer using the Fill function of the Hydrology toolbox. Sinks in
elevation data are most commonly due to errors in the data, often caused by sampling
effects and the rounding of elevations to integer values; as the cell size increases, the
number of sinks in a dataset also often increases. The flow direction raster is then
generated from the filled DEM and shows the actual direction of water flow, with each
pixel assigned a flow-direction value. To create a stream network, the Flow
Accumulation tool is used to calculate the number of upslope cells flowing to each
location; the flow direction raster created in the previous step is used as input, and the
cells contributing to the catchment area of each cell are counted. The flow
accumulation defines a stream network, and the Stream to Feature tool in ArcGIS
converts the grid to a linear vector file (shapefile). Stream orders are then extracted,
and Snap Pour Point is run to snap the pour points to the nearest point of highest flow
accumulation. Finally, the watersheds are delineated using the Watershed function of
the Hydrology toolbox (Figs. 3 and 4).

Fig. 2. Flowchart for automatic extraction of watershed (ArcHydro Model)
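
For readers who prefer to script the same chain of tools, the following is a minimal
arcpy sketch of the steps above (assuming ArcGIS with the Spatial Analyst extension);
the workspace path, layer names, outlet shapefile, and snap distance are placeholders
rather than the authors' actual inputs, while the 25,000-cell threshold is the value
reported later in this study.

import arcpy
from arcpy.sa import (Fill, FlowDirection, FlowAccumulation, Con,
                      StreamOrder, StreamToFeature, SnapPourPoint, Watershed)

arcpy.CheckOutExtension("Spatial")         # Hydrology tools need Spatial Analyst
arcpy.env.workspace = r"C:\data\surat"     # placeholder; mosaicked, projected DEM here

filled = Fill("dem_mosaic_projected")                # remove sinks/depressions
fdir = FlowDirection(filled)                         # D8 flow direction grid
facc = FlowAccumulation(fdir)                        # upslope cells draining to each cell
streams = Con(facc > 25000, 1)                       # stream cells above the study threshold
order = StreamOrder(streams, fdir, "STRAHLER")       # Strahler stream ordering
StreamToFeature(order, fdir, "stream_network.shp")   # raster network to a line shapefile
pour = SnapPourPoint("outlets.shp", facc, 100)       # snap outlets to highest accumulation
basins = Watershed(fdir, pour)                       # delineate a watershed per outlet
basins.save("watersheds")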

Fig. 3. Automated methodology of watershed delineation using ASTER DEM



Fig. 4. Automated methodology of watershed delineation using CARTO DEM

4.2 Drainage Density Calculation


Morphometric parameters generated from the DEMs were used to evaluate the watershed
boundary and the accompanying hydrological parameters. The drainage channels for
both DEMs were calculated, order by order, using a threshold value in the ArcGIS raster
calculator. Any DEM's potential drainage density is determined by the applied
threshold value, which can be adjusted according to the DEM's pixel resolution. The
total length of channels per unit area of the watershed is referred to as the drainage
density [10]. A threshold value determines when a cell becomes a drainage pour point,
and the watershed's stream network was built from the DEM using this threshold.
A pixel with a flow-accumulation value less than the threshold is considered a flat pixel.
The threshold value used is therefore critical for approximating the actual shape of the
stream network. Drainages can be rated by size, since larger drainages have larger
catchment areas than smaller ones. The drainage density Dd is computed as follows:

Dd = Lu / A                                                        (1)

where Lu is the total stream length of all orders (km) and A is the area of the basin (km²).
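
As a quick illustration with the totals reported later in this paper, the ASTER DEM
gives Lu ≈ 2775.32 km (Table 2) over a total delineated area of about 6608.47 km²
(Table 4), so Dd ≈ 2775.32/6608.47 ≈ 0.42 km/km².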

4.3 Bifurcation Ratio Calculation


The bifurcation ratio is the number of streams of one order divided by the number of
streams of the next higher order. If the river network's bifurcation ratio is low, flooding
is more likely, since the water will be concentrated in one channel rather than spread
out. According to [11] and [12], the bifurcation ratio (Rb) should remain essentially
consistent between stream orders. Horton [11] discovered that stream order was
substantially linked with a variety of watershed and channel factors and that stream
order could be utilized to predict these variables.

Rb = Nu / Nu+1                                                     (2)

where Nu is the number of stream segments of a given order u and Nu+1 is the number
of segments of the next higher order.
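
For instance, with the stream counts reported later in Table 1, the ASTER DEM gives
Rb = 486/235 ≈ 2.07 between the 1st and 2nd orders, which matches the value listed
in Table 3.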

5 Results and Discussion

The linear parameters such as stream number, stream length, total stream length, and
bifurcation ratio were considered to conduct a comparison study. From Fig. 5 it can be
observed that the drainage vector file provided by ASTER DEM has a 5th Strahler
order while CARTOSAT DEM has a 4th Strahler order.

Fig. 5. Stream order network derived from ASTER and CARTO DEM


Fig. 6. Chart showing the variation in stream length with respect to stream number for each
order from the drainage delineated from ASTER and CARTO DEM

Figure 6 shows a line plot of stream length versus stream number for all the stream
orders in the basin derived from ASTER and CARTOSAT DEM. Towards the higher
orders, the stream number and the stream length are found to be higher for
CARTOSAT DEM than for ASTER DEM. Figure 7 shows the variation in stream
length for each stream order. The total stream length is found to be 2775.32 km for
ASTER DEM and 2777.81 km for CARTOSAT DEM. The number of streams in each
order varies depending on the threshold value: a smaller threshold value results in a
denser drainage network and therefore an increase in the stream length. Climate
conditions over the study area play an important role in defining the hydraulic
response of the watersheds existing in that region [13]. Therefore, a smaller threshold
will result in a denser stream network, usually with a greater number of delineated
catchments [13]. Drainage density was derived using a threshold value of 25,000 for
both DEMs. Drainage characteristics of the study area, viz. the stream orders and
stream lengths, were obtained from the potential drainage channels of the DEMs
(Tables 1 and 2). The difference in the total stream length obtained from the two
DEMs is about 0.1%.


Fig. 7. Chart showing the variation in the stream length for each order from the drainage derived
from ASTER and CARTO DEM

Table 1. Stream ordering


Stream order ASTER DEM CARTO DEM
1st 486 477
2nd 235 243
3rd 101 117
4th 37 49
5th 37 -
Total 896 886

Table 2. Stream length


Stream length details ASTER CARTOSAT
Last order stream length (km) 122.012575 153.960298
Total length of the stream (km) 2775.320953 2777.808033

ASTER Stream Network CARTO Stream Network

Fig. 8. Comparison of stream network derived from ASTER and CARTO DEM

Figure 8 shows the comparison of the spatial patterns of the two drainage networks
in the sub-watersheds of the study area delineated from ASTER and
CARTOSAT DEM. The bifurcation ratio highlights the variation in the drainage
density in different stream orders (Table 3). The bifurcation ratio varies from a mini-
mum of 2 in flat or rolling drainage basins to 3 or 4 in mountainous or highly dissected
drainage basins. Figure 9 shows the variation in bifurcation ratio for all the stream
orders in the basin derived from both the DEMs. The bifurcation ratios for the drainage
derived from the ASTER DEM and from the CARTOSAT DEM lie between 2 and 3,
as the drainage basin has a semi-humid tropical climate.

Table 3. Bifurcation ratio


Between stream order ASTER DEM CARTO DEM
1st and 2nd 2.07 1.96
2nd and 3rd 2.32 2.07
3rd and 4th 2.73 2.39
4th and 5th 1 -


Fig. 9. Chart showing the variation in bifurcation ratio for the drainage delineated from ASTER
and CARTO DEM

To compare the qualitative and quantitative changes in the watershed delineation,
different watershed parameters such as the number of watersheds delineated, the
minimum, maximum, and average watershed areas, and the standard deviation of the
watershed areas were calculated for both the DEMs (Table 4). The ASTER DEM
yielded 212 watersheds while the CARTOSAT DEM yielded 353 watersheds. The
minimum watershed area is around 0.000567 km² for ASTER DEM and 0.000568 km²
for CARTOSAT DEM, whereas the maximum area is around 148 km² for ASTER
DEM and 138 km² for CARTOSAT DEM. Similarly, the average areas of the
watersheds derived from ASTER and CARTOSAT DEMs are 31 km² and 19 km²,
respectively. After analysing the watershed characteristics, we found notable
differences in the watershed areas and boundaries delineated from the two DEMs.

Table 4. Watershed characteristics


Watershed area & Hierarchy ASTER CARTOSAT
Number of watersheds 212 353
Min. area of the watershed (sq.km.) 0.000567 0.000568
Max. area of the Watershed (sq.km.) 148.215643 138.049425
Total area (sq.km.) 6608.473093 6759.823047
Average area (sq.km.) 31.172043 19.14964
St. deviation of the watershed area 31.140577 27.862233

Figure 10 shows the variation in watershed boundaries extracted from ASTER and
CARTOSAT DEM. The western part of the study area shows the maximum variation
in the watershed boundaries derived from the two DEMs. The total areas of the
watersheds derived from ASTER DEM and CARTOSAT DEM are 6608 and
6759 km², respectively. The performance of the watershed borders drawn from the
ASTER and CARTOSAT data is examined using correlation analysis in this study.
The area of each watershed was calculated from the ASTER and CARTOSAT
datasets, and the correlation coefficient R² and a trendline graph analysis were
computed; the closer R² is to 1.00, the better the relationship between the chosen
parameters. The correlation value of the ASTER-delineated watershed areas is 0.8374,
whereas that of the CARTOSAT-delineated watershed areas is 0.7001. The RMSE
values were obtained as 12.55 and 15.25 for ASTER DEM and CARTOSAT DEM,
respectively. The R² values indicate that the delineated watershed areas are associated
at about 83% for ASTER and 70% for CARTOSAT. According to these findings, the
watersheds produced from ASTER and CARTOSAT have a moderate association.

ASTER DEM Derived Watersheds          CARTO DEM Derived Watersheds

Fig. 10. Comparison of watershed boundaries among ASTER and CARTO DEM
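
The computation behind these two statistics is straightforward; the sketch below is a
minimal illustration in Python, assuming the delineated watershed areas have already
been matched pairwise against a common reference (the matching procedure is not
detailed here, so the arrays are placeholders).

import numpy as np

def r2_and_rmse(area_dem, area_ref):
    """Compare matched watershed areas (sq. km) from a DEM against reference areas."""
    resid = area_ref - area_dem
    ss_res = np.sum(resid ** 2)                          # residual sum of squares
    ss_tot = np.sum((area_ref - area_ref.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                           # coefficient of determination
    rmse = np.sqrt(np.mean(resid ** 2))                  # root-mean-square error
    return r2, rmse

# placeholder arrays of matched watershed areas
aster_areas = np.array([12.4, 30.9, 148.2])
ref_areas = np.array([11.8, 33.0, 145.1])
print(r2_and_rmse(aster_areas, ref_areas))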

6 Conclusion

GIS has become an important tool for obtaining geographic information due to the
rapid growth in information technologies. The study highlights some of the most
commonly utilized features in water resources applications, such as DEM-based
watershed delineation and network generation. The accuracy of the watershed delin-
eation and stream network generation depends mainly on the quality and accuracy of
the raw DEM. The study explored the feasibility of using ASTER and
CARTOSAT DEM to delineate the watershed properties of the Tapi river in Surat city.
Based on the results of this study it can be concluded that both the DEMs are suitable
for watershed delineation of Tapi river. However, ASTER DEM provided a more
accurate estimation of the watershed area as compared to CARTOSAT DEM. The
selection of threshold is very important because it will directly affect the static char-
acteristics of the entire basin and river network; different thresholds will result in
different forms of river network and catchment. The methodology used in this study
facilitates efficient and consistent watershed delineation on DEMs of any size, which
can be used for water resources management. High-resolution DEMs could be used
instead of low-resolution DEMs to improve the derived hydrological characteristics
and parameters; the results improve as the DEM resolution increases, although the
accuracy of the DEM also matters. The automatic delineation of catchment and
drainage network, illustrated in this study, will allow new researchers in this field to
significantly speed up the process of generating initial catchment delineations.

References
1. Rana, V.K., Suryanarayana, T.M.V.: Visual and statistical comparison of ASTER, SRTM,
and cartosat digital elevation models for watershed. J. Geovisual. Spatial Anal. 3(2), 1–19
(2019). https://doi.org/10.1007/s41651-019-0036-z
2. Pryde, J.K., Osorio, J., Wolfe, M.L., Heatwole, C.D., Benham, B.L., Cardenas, A.:
Comparison of watershed boundaries derived from SRTM and ASTER digital elevation
datasets and from a digitized topographic map. Paper presented at ASABE Annual
International Meeting (Paper Number: 072093). American Society of Agriculture and
Biological Engineers, Minneapolis, Minnesota (2007)
3. Ahmadi, H., Das, A., Pourtaheri, M., Komaki, C.B., Khairy, H.: Redefining the border line
of the Neka river’s watershed with comparing ASTER, SRTM, digital topography DEM,
and topographic map by GIS and remote sensing techniques. Life Sci. J. 9(3), 2061–2068
(2012)

4. Shahimi, S.N.A.T., Halim, M.A., Khalid, N.: Comparison of watershed delineation accuracy
using open source DEM data in large area. In: IOP Conference Series: Earth and
Environmental Science, vol. 767, no. 1, p. 012029. IOP Publishing, May 2021. https://doi.
org/10.1088/1755-1315/767/1/012029
5. Gopinath, G., Swetha, T.V., Ashitha, M.K.: Automated extraction of watershed boundary
and drainage network from SRTM and comparison with Survey of India toposheet. Arab.
J. Geosci. 7(7), 2625–2632 (2014). https://doi.org/10.1007/s12517-013-0919-0
6. Lee, L., Huang, M., Shyue, S., Lin, C.: An adaptive filtering and terrain recovery approach
for airborne lidar data. Int. J. Innovative Comput. Inf. Control 4(7), 1783–1796 (2008)
7. India WRIS. River Basins of India. https://indiawris.gov.in/wiki/doku.php?id=tapi
8. ISRO: CartoDEM a national digital elevation model from Cartosat-1 stereo data. https://
www.nrsc.gov.in/sites/default/files/pdf/cartodem_bro_final.pdf
9. NASA: New Version of the ASTER GDEM | Earthdata. https://earthdata.nasa.gov/learn/
articles/new-aster-gdem
10. Yildiz, O.: An investigation of the effect of drainage density on hydrologic response.
Turkish J. Eng. Environ. Sci. 28(2), 85–94 (2004)
11. Horton, R.E.: Erosional development of streams and their drainage basins; hydrophysical
approach to quantitative morphology. Geol. Soc. Am. Bull. 56(3), 275–370 (1945). https://
doi.org/10.1177/030913339501900406
12. Strahler, A.N.: Quantitative analysis of watershed geomorphology. Eos Trans. Am.
Geophys. Union 38(6), 913–920 (1957). https://doi.org/10.1029/TR038i006p00913
13. Subyani, A.M.: Hydrologic behavior and flood probability for selected arid basins in
Makkah area, western Saudi Arabia. Arab. J. Geosci. 4(5), 817–824 (2011). https://doi.org/
10.1007/s12517-009-0098-1
14. Pathan, A.I., Agnihotri, P.G.: Application of new HEC-RAS version 5 for 1D hydrodynamic
flood modeling with special reference through geospatial techniques: a case of River Purna at
Navsari, Gujarat, India. Model. Earth Syst. Environ. 7(2), 1133–1144 (2021). https://doi.org/
10.1007/s40808-020-00961-0
15. Pathan, A.I., Agnihotri, P.G.: A combined approach for 1-D hydrodynamic flood modeling
by using Arc-Gis, Hec-Georas, Hec-Ras Interface-a case study on Purna River of Navsari
City, Gujarat. IJRTE 8(1), 1410–1417 (2019)
16. Pathan, A.K.I., Agnihotri, P.G.: 2-D unsteady flow modelling and inundation mapping for
Lower Region of Purna Basin using HEC-RAS. Nat. Environ. Pollut. Technol. 19(1), 277–
285 (2020)
17. Pathan, A.I., Agnihotri, P.G.: One dimensional floodplain modelling using soft computa-
tional techniques in HEC-RAS - a case study on Purna Basin, Navsari District. In: Vasant,
P., Zelinka, I., Weber, G.W. (eds.) ICO 2019. AISC, vol. 1072, pp. 541–548. Springer,
Cham (2020). https://doi.org/10.1007/978-3-030-33585-4_53
18. Pathan, A.I., Agnihotri, P.G.: Use of computing techniques for flood management in a
coastal region of South Gujarat–a case study of Navsari District. In: Vasant, P., Zelinka, I.,
Weber, G.W. (eds.) ICO 2019. AISC, vol. 1072, pp. 108–117. Springer, Cham (2019).
https://doi.org/10.1007/978-3-030-33585-4_11
19. Pathan, A.I., Agnihotri, P.G., Eslamian, S., Patel, D.: Comparative analysis of 1D
hydrodynamic flood model using globally available DEMs–a case of the coastal region. Int.
J. Hydrol. Sci. Technol. 13(1), 92–123 (2021). https://doi.org/10.1504/IJHST.2021.
10034760
20. Pathan, A.I., Agnihotri, P.G., Patel, D., Prieto, C.: Identifying the efficacy of tidal waves on
flood assessment study—a case of coastal urban flooding. Arab. J. Geosci. 14(20), 1–21
(2021). https://doi.org/10.1007/s12517-021-08538-6
Numerical Investigation of Natural Convection
Combined with Surface Radiation in a Divided
Cavity Containing Air and Water

Zouhair Charqui, Lahcen El Moutaouakil, Mohammed Boukendil,


Rachid Hidki, and Zaki Zrikem

LMFE, Department of Physics, Faculty of Sciences Semlalia, Cadi Ayyad


University, B.P. 2390, Marrakesh, Morocco
m.boukendil@uca.ac.ma

Abstract. A two-dimensional numerical study of coupled heat transfer by


laminar natural convection and surface radiation in a rectangular cavity is per-
formed. The cavity is divided by a rigid vertical partition into two sub-cavities,
one filled with water and the other with air. The left vertical wall (water side) is
heated uniformly with a temperature TH, while the right vertical wall (air side) is
cooled isothermally with a temperature TC. The horizontal walls are supposed to
be thermally insulated. The surface radiation is considered only in the air cavity.
The authors used the Finite Volume Method to solve the conservation equations,
and the Radiosity Method to determine the radiative transfer between the air
cavity surfaces. The main subject of the present study is to analyze the influence
of some critical control parameters on the convective and radiative heat transfer
by analyzing the streamlines, isotherms, and the mean Nusselt numbers.

Keywords: Numerical simulation · Natural convection · Surface radiation · Divided cavity · Different fluids · Air · Water

1 Introduction

Natural convection and surface radiation are two heat transfer modes that have been
studied extensively in the last decades. This is due to their occurrence in many engi-
neering applications, such as heat exchangers, nuclear reactors, storage tanks, cooling
of electronic components, etc. In the literature, one can find many papers investigating
these phenomena within enclosures for different geometric configurations and
boundary conditions. Particularly, there are studies that have focused on combined heat
transfer in divided cavities [1–5]. This type of configuration is very interesting because
the introduction of one or more impermeable rigid partitions allows to control (reduce
or intensify) the heat transfer by convection and/or radiation through the cavity in
question. On the other hand, divided cavities may also be used to exchange heat
between two different fluids without mixing them. This case is a generic configuration
representative of several potential applications like solar collectors, internal combustion
engine cooling, or electronic radiator cooling, etc.


The research works dealing with cavities confining air and water are the most
studied in the category of divided enclosures filled with different fluids. For example,
Amin et al. [6] analyzed conjugate heat transfer in a three-dimensional cavity divided
into two compartments. Their results indicate that the position of the partition signif-
icantly influences the overall heat transfer rate. On their side, Charqui et al. [7]
numerically studied the heat transfer in a tall cavity divided by a rigid partition
absorbing a solar flux. They found that a multicellular flow can develop in the air
and/or water regions. Pure natural convection occurring in a square enclosure divided
into two regions by a solid partition was investigated by Öztop et al. [8]. They indicated
that the heat transfer can be improved by bringing the water domain into contact with the
hot wall. In another work [9], the same authors revealed that the conductivity ratio of
the partition significantly alters the heat transfer and the flow intensity. Another
interesting study was published by Wang et al. [10]. They have numerically investi-
gated the instability mechanisms of conjugate thermal boundary layers in a vertically
divided rectangular enclosure. Their simulation results revealed that the air cavity
aspect ratio is more influential than that of the water cavity.
Other scientific papers have focused on the study of natural convection in divided
cavities confining different fluids other than air and water, namely the one published by
Ikram et al. [11]. They considered a divided cavity filled by two different nanofluids
subjected to a magnetic field. They indicated that the Rayleigh and Hartmann numbers
contribute significantly to the thermal performance in the cavity. In the same spirit,
Selimefendigil et al. [12–14] studied conjugate natural convection in an enclosure with
a solid partition. They found that heat transfer can be optimized by adding low-
conductivity nanoparticles to the cavity in contact with the cold wall. On their side, Nia
et al. [15] examined transient natural convection combined with surface radiation in a
double-space cavity with conducting walls. They showed that heat penetrates easily
when the optical thickness of the gas layer near the heat source is less than that of the
second layer.
Following this literature survey, one may notice that almost all mentioned studies
neglect the effect of surface radiation. This motivated the authors to conduct the present
numerical investigation to clarify the influence of such a heat transfer mode. The
present study considers a bidimensional vertically partitioned enclosure filled with air
and water. The effect of the cavity aspect ratio and the emissivity of the air enclosure
surfaces on the velocity and temperature fields will be analyzed in detail.

2 Mathematical Formulation

The physical model of the studied configuration is illustrated in Fig. 1. It is a rectan-


gular cavity of width L and height H, split into two domains by a rigid vertical partition
placed at X = 0.5. The water-filled domain (Pr = 7) is uniformly heated via its left wall
with a temperature TH, while the air-filled domain (Pr = 0.71) is subjected to a cold
temperature TC from its right wall. The horizontal walls are thermally insulated. Sur-
face radiation is only considered in the air cavity since water has a sufficiently high
opacity that it absorbs almost all the radiation issued from the cavity surfaces. A = H/L
denotes the aspect ratio of the enclosure.

The flow within the two domains is assumed to be laminar and two-dimensional.
The two fluids are supposed to be Newtonian and incompressible. Air is considered
perfectly transparent to surface radiation. The inner surfaces of the right cavity are grey
and diffuse with the same emissivity e. The thermophysical properties of air and water
are independent of temperature, except for the density in the buoyancy term, which is
assumed to satisfy Boussinesq’s approximation.

Fig. 1. Physical model of the divided cavity

The dimensionless equations expressing the conservation of mass, momentum, and


energy in the two mediums and the corresponding boundary conditions are given by:

∂U/∂X + ∂V/∂Y = 0                                                  (1)

∂U/∂τ + U ∂U/∂X + V ∂U/∂Y = −∂P/∂X + Pr α* (∂²U/∂X² + ∂²U/∂Y²)      (2)

∂V/∂τ + U ∂V/∂X + V ∂V/∂Y = −∂P/∂Y + Pr α* (∂²V/∂X² + ∂²V/∂Y²) + Pr α* Ra θ   (3)

∂θ/∂τ + U ∂θ/∂X + V ∂θ/∂Y = α* (∂²θ/∂X² + ∂²θ/∂Y²)                  (4)

• For all the cavity walls:

U = V = 0                                                          (5)

• For the adiabatic walls of the water domain:



∂θ/∂Y |Y=0,A = 0                                                   (6)

• For the adiabatic walls of the air domain:



∂θ/∂Y |Y=0,A = Nr Qr                                               (7)

• For the extreme left wall:

θ = 1                                                              (8)

• For the extreme right wall:

θ = 0                                                              (9)

• For the partition:

K ∂θw/∂X = ∂θa/∂X + Nr Qr                                          (10)

The dimensionless parameters appearing in Eqs. (1–10) are given by:


K = kw/ka,   Ra = gβ(TH − TC)L³/(νa αa),   α* = 1 for air, α* = αw/αa for water,

Nr = σTC⁴ L/[ka (TH − TC)],   Qr = qr/(σTC⁴)                        (11)

The radiosity method is used to determine surface radiation. It consists of solving


the following two systems of equations, giving respectively the radiosity and the net
radiative heat flux lost by the ith surface element:
Ji = ε[(Tr − 1)θi + 1]⁴ + (1 − ε) Σj=1..N Fij Jj                    (12)

Qr,i = Ji − ε Σj=1..N Fij Jj                                        (13)

where Tr is the temperature ratio defined by Tr = TH/TC.
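
Since Eq. (12) is linear in the radiosities, it can be solved directly as the linear system
(I − (1 − ε)F)J = E, with Ei = ε[(Tr − 1)θi + 1]⁴. The following is a minimal NumPy
sketch of Eqs. (12)–(13) under the assumption that the view-factor matrix F and the
surface temperatures θ are already available; it is an illustration, not the authors' solver.

import numpy as np

def radiosity(F, theta, eps, Tr):
    """Radiosities J and net radiative fluxes Qr for N surface elements."""
    N = theta.size
    E = eps * ((Tr - 1.0) * theta + 1.0) ** 4            # emissive part of Eq. (12)
    J = np.linalg.solve(np.eye(N) - (1.0 - eps) * F, E)  # Eq. (12) as a linear system
    Qr = J - eps * (F @ J)                               # net flux lost, Eq. (13)
    return J, Qr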



The mean convective and radiative Nusselt numbers on the right vertical wall are
given by:

Nuc = −(1/A) ∫₀^A (∂θ/∂X)|X=1 dY,   Nur = −(1/A) ∫₀^A Nr Qr dY      (14)

3 Numerical Procedure and Validation

The staggered finite volume method FVM is used to discretize the equations of the
mathematical formulation. The pressure-velocity coupling is handled using the SIM-
PLE algorithm. The convective and diffusive terms are interpolated by means of the
power-law and centered difference schemes, respectively. A numerical program written
in FORTRAN is developed. The tri-diagonal matrix algorithm TDMA is used to solve
the system of algebraic equations line by line with an iterative procedure.
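
For reference, the following is a minimal Python sketch of the Thomas algorithm
behind the TDMA step; it solves one tridiagonal line system
a_i x_{i-1} + b_i x_i + c_i x_{i+1} = d_i and is an illustration, not the authors'
FORTRAN routine.

import numpy as np

def tdma(a, b, c, d):
    """Thomas algorithm for a tridiagonal system; a[0] and c[-1] are unused."""
    n = len(d)
    cp = np.zeros(n)
    dp = np.zeros(n)
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):                    # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = np.zeros(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):           # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x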
A mesh refinement test was conducted to find the optimal grid giving accurate
results in a minimum of computational time. This test showed that a uniform grid of
100 × 100 nodes is a good compromise between computation time and accuracy for all
considered aspect ratios.
Multiple validations against published results in the literature have been conducted
to ensure the credibility of the numerical code outputs. In this study, and for limitation
reasons, only two validations will be presented, considering the following two cases:
(a) Natural convection coupled to surface radiation in an air-filled cavity [16]; (b) Pure
natural convection in a water-filled enclosure [17]. The results have been compared in
terms of mean convective and radiative Nusselt numbers. Table 1 shows that they are in
a good agreement with those reported by the mentioned authors.

Table 1. Comparison between the Nusselt numbers obtained by Wang et al. [16] and Lai and
Yang [17] and those of the present work
Ra Case (a) Case (b)
Nuc Nur Nuc
[16] Code [16] Code [17] Code
10⁴ 2.249 2.249 2.401 2.401 2.288 2.273
10⁵ 4.189 4.181 5.196 5.196 4.728 4.717
10⁶ 7.815 7.780 11.265 11.266 9.129 9.199

4 Results and Discussion

4.1 Streamlines and Isotherms


Figures 2 and 3 represent, respectively, the streamlines and isotherms obtained for
different values of the cavity aspect ratio (A = 1, 6, 10) without and with surface

radiation (e = 0, 1) for Ra = 5 × 10⁶. The form of the streamlines shows that generally
the air flow is more agitated than the water flow. Such agitation increases with A for the
two considered values of e. Nevertheless, it is noted that the intensification of the air
flow is less and less appreciable by increasing the aspect ratio of the cavity. This is due
to the lower effect of horizontal walls on the velocity field in the middle of the cavity
(attenuation of edge effects). One can also notice that increasing this parameter results
in a widening of the main cell constituting the air flow, thus giving rise to a weak
gradient of the velocity field along the vertical direction in the major part of the cavity.
The streamlines also show the development of two recirculation cells near the hori-
zontal walls, indicating that the flow regime tends towards turbulence as A increases.
On the other hand, the effect of surface radiation on the flow structure is only appre-
ciable for low values of the aspect ratio. This can also be explained by the attenuation
of the horizontal walls’ effects, as the coupling convection-radiation only occurs
through the adiabatic boundary conditions. Concerning the water flow, one may notice
that it is more stable, ordered and made up of a single cell that occupies almost the
entire cavity. By increasing the aspect ratio, the center of this cell moves downward,
resulting in a weak (strong) velocity field gradient near the top (bottom) wall. Such a
result can be explained by the increase of the partition temperature with Y. Although
surface radiation is only considered in the air cavity, the streamlines distribution in the
left cavity undergoes a notable change for A >> 1 by increasing the emissivity e from 0
to 1. Indeed, surface radiation tends to make the single-cell structure of the flow occupy
the whole domain. This is because such a mode of heat transfer tends to unify the
temperature of the partition.
The isotherms plotted in Fig. 3 show that a very strong temperature gradient occurs
near the vertical walls of the air cavity for the three considered values of A (A = 1, 6,
10). This indicates that the convection regime is fully developed in the air cavity due to
the significant buoyancy force (Ra = 5 × 10⁶) that drives the natural flow. In the
central part of this domain, it turns out that the isotherms are almost parallel to the
adiabatic walls for A = 1. This shows that the heat extracted from the partition and
released to the cold wall is mainly transported by the fluid in contact with the four
boundaries. For A = 6 and 10, this amount of heat decreases in favor of that carried by
the fluid in the central part. This can be seen clearly by examining the progressive tilt of
the isotherms which were parallel to the horizontal walls for A = 1. Such a result is
explained by the overlapping of the two thermal boundary layers that stand against the
partition and the cold wall. By comparing the isotherms without and with radiation, one
can see that the latter counteracts the effect of A on the distribution of isotherms in the
middle of the cavity. Indeed, the thermal stratification of air in this region decreases for
e = 1. This leads to an increase in the amount of heat transported by the fluid in contact
with the walls at the expense of the heat exchanged through the core of the cavity. This
effect diminishes with A, as the adiabatic walls responsible for the convection-radiation
coupling move away by increasing this parameter. Concerning the water flow, Fig. 3
shows that the isotherms occupy much more the bottom of the cavity in the absence of
surface radiation. This result is expected since the partition temperature increases with
Y, which favors the buoyancy force by approaching the lower adiabatic wall. On the
other hand, one can notice that by increasing e from 0 to 1, the isotherms become

ε=0

ε=1 ε=0 ε=1 ε=0 ε=1


A=1 A=6 A = 10

Fig. 2. Streamlines obtained for different values of the aspect ratio at Ra = 5 × 10⁶ without and
with surface radiation

parallel to the vertical walls and occupy the entire domain equally. In other words,
surface radiation favors the conduction regime in the left cavity.

4.2 Mean Heat Transfer


Figure 4 shows the variations of the convective Nusselt number, calculated on the right
wall, with the aspect ratio for (a) e = 0 and (b) e = 1 and different Rayleigh numbers
(Ra = 10⁵, 10⁶, 5 × 10⁶). It shows that by increasing A, the convective heat transfer
progressively decreases with an asymptotic behavior towards the following values: Nu
(Ra1) = 2.75, Nu(Ra2) = 5.06 and Nu(Ra3) = 7.18. Such a result is predictable because
increasing the cavity aspect ratio approaches the case of natural convection occurring
between two differentially heated infinite planes. This shows that in practice (e.g., in
the building industry), there is no interest in increasing or decreasing the aspect ratio of
the cavity in question (air layer, thermal insulation, double pane windows, Trombe
walls, etc.) since the Nusselt number is not sensitive to it (A >> 1). Considering surface

ε=0

ε=1 ε=0 ε=1 ε=0 ε=1


A=1 A=6 A = 10

Fig. 3. Isotherms obtained for different values of the aspect ratio at Ra = 5 × 10⁶ without and
with surface radiation

radiation (Fig. 4(b)), the curves of Nuc still keep the same asymptotic behavior, but this
time towards values lower than those found for e = 0. Indeed, the new limits found are
Nu(Ra1) = 1.73, Nu(Ra2) = 3.22, and Nu(Ra3) = 4.63.
The variations of the radiative and total Nusselt numbers for e = 1 are plotted in
Figs. 5(a) and (b), respectively. It is noted that the radiative Nusselt number undergoes
a significant increase by varying the aspect ratio from 1 to 4, indicating a strong
convection-radiation coupling in this range of A. Beyond A = 4, Nur is characterized
by an asymptotic behavior towards: Nur(Ra1) = 13.93, Nur(Ra2) = 13.75, and
Nur(Ra3) = 14.71, reflecting a weakening of the convection-radiation coupling. Such a
result is also predictable since we approach the case of radiative heat transfer between
two infinite planes when A >> 1. The analytical expression of this flux is given by
Nur = Nr[1 − (TM/TC)⁴], where TM is the mean temperature of the partition separating
the two fluids. Regarding the total Nusselt number, it appears that from A = 2, it
is almost independent of the cavity aspect ratio for Ra1 = 10⁵ and Ra2 = 10⁶ despite

(a) (b)

Fig. 4. Variations of the mean convective Nusselt number against the aspect ratio for different
values of the Rayleigh number: (a) e = 0 (b) e = 1

the significant variation of the radiative Nusselt number for 1 ≤ A ≤ 4. However, for
Ra3 = 5 × 10⁶, Nut undergoes a non-negligible decrease of about 7.82% before it
stabilizes at Nut = 19.35.

(a) (b)

Fig. 5. Variations of the mean radiative (a) and total (b) Nusselt numbers against the aspect ratio
for different values of the Rayleigh number at e = 1

5 Conclusion

A numerical study of combined natural convection and surface radiation in a divided


cavity filled with air and water was presented in this work. A mathematical model
expressing the principles of conservation of mass, momentum, and energy was
adopted. The related equations have been solved numerically by the finite volume
method using an in-house CFD solver. The objective was to study the effect of several
control parameters, namely the aspect ratio of the enclosure, the emissivity of the air
cavity surfaces, and the Rayleigh number. The numerical results revealed that
the distributions of the velocity and temperature fields in the two mediums are

influenced by the aspect ratio and the emissivity. On the other hand, the convective
(radiative) heat transfer decreases (increases) with the aspect ratio following an
asymptotic behavior, indicating the irrelevance of increasing or decreasing this
parameter in practical applications. In contrast, the emissivity radically influences the
overall heat transfer through the enclosure.

Acknowledgment. This research was supported by the National Center for Scientific and
Technical Research CNRST [grant number 6UCA2020].

References
1. Ferhi, M., Djebali, R., Abboudi, S., Kharroubi, H.: Conjugate natural heat transfer scrutiny
in differentially heated cavity partitioned with a conducting solid using the lattice Boltzmann
method. J. Therm. Anal. Calorim. 138(5), 3065–3088 (2019). https://doi.org/10.1007/
s10973-019-08276-8
2. Khatamifar, M., Lin, W., Armfield, S., Holmes, D., Kirkpatrick, M.: Conjugate natural
convection heat transfer in a partitioned differentially heated square cavity. Int. Commun.
Heat Mass Transfer 81, 92–103 (2017)
3. Khatamifar, M., Lin, W., Dong, L.: Transient conjugate natural convection heat transfer in a
differentially-heated square cavity with a partition of finite thickness and thermal
conductivity. Case Stud. Therm. Eng. 25, 100952 (2021)
4. Chordiya, J.S., Sharma, R.V.: Numerical study on effect of corrugated diathermal partition
on natural convection in a square porous cavity. J. Mech. Sci. Technol. 33(5), 2481–2491
(2019). https://doi.org/10.1007/s12206-019-0445-4
5. Saha, S., Barua, S., Kushwaha, B., Subedi, S., Hasan, M.N., Saha, S.C.: Conjugate natural
convection in a corrugated solid partitioned differentially heated square cavity. Numer. Heat
Transf. Part A Appl. 78(10), 541–559 (2020)
6. Amin, M., Tushar, F.A., Hossen, D., Saha, S.: Conjugate natural convection in three-
dimensional differentially heated cubic partitioned cavity filled with air and water. In:
Proceedings of the 13th International Conference on Mechanical Engineering AIP
Conference Proceedings (2021)
7. Charqui, Z., Moutaouakil, L.E., Boukendil, M., Hidki, R.: Numerical study of heat transfer
in a tall, partitioned cavity confining two different fluids: application to the water Trombe
wall. Int. J. Thermal Sci. 171, 107266 (2022)
8. Oztop, H.F., Varol, Y., Koca, A.: Natural convection in a vertically divided square enclosure
by a solid partition into air and water regions. Int. J. Heat Mass Transf. 52(25), 5909–5921
(2009)
9. Varol, Y., Oztop, H.F., Koca, A.: Effects of inclination angle on conduction-natural
convection in divided enclosures filled with different fluids. Int. Commun. Heat Mass Transf.
37(2), 182–191 (2010)
10. Wang, H., Lei, C.: A numerical investigation of conjugate thermal boundary layers in a
differentially heated partitioned cavity filled with different fluids. Phys. Fluids 32(7), 074107
(2020)
11. Ikram, M.M., Saha, S.: Conjugate natural convection and entropy generation under uniform
magnetic field in a partitioned square cavity filled with two different nanofluids. In:
Proceedings of the 13th International Conference on Mechanical Engineering, AIP
Conference Proceedings (2021)

12. Selimefendigil, F., Öztop, H.F.: Corrugated conductive partition effects on MHD free
convection of CNT-water nanofluid in a cavity. Int. J. Heat Mass Transf. 129, 265–277
(2019)
13. Selimefendigil, F., Öztop, H.F.: Conjugate natural convection in a cavity with a conductive
partition and filled with different nanofluids on different sides of the partition. J. Mol. Liq.
216, 67–77 (2016)
14. Selimefendigil, F., Öztop, H.F.: MHD natural convection and entropy generation in a
nanofluid-filled cavity with a conductive partition. In: Exergetic, Energetic and Environ-
mental Dimensions, pp. 763–778 (2018)
15. Nia, M.F., Nassab, S.A.G., Ansari, A.B.: Transient combined natural convection and
radiation in a double space cavity with conducting walls. Int. J. Therm. Sci. 128, 94–104
(2018)
16. Wang, H., Xin, S., Quéré, P.L.: Numerical study of natural convection-surface radiation
coupling in air-filled square cavities. C.R. Mec. 334(1), 48–57 (2006)
17. Lai, F.-H., Yang, Y.-T.: Lattice Boltzmann simulation of natural convection heat transfer of
Al2O3/water nanofluids in a square enclosure. Int. J. Therm. Sci. 50(10), 1930–1941 (2011)
Key Factors in the Successful Integration
of the Circular Economy Approach
in the Industry of Non-durable Goods:
A Literature Review

Marcos Jacinto-Cruz1, Román Rodríguez-Aguilar2,


and Jose-Antonio Marmolejo-Saucedo1
1
Facultad de Ingeniería, Universidad Panamericana, Augusto Rodin 498,
03920 Mexico City, Mexico
2
Facultad de Ciencias Económicas y Empresariales, Universidad Panamericana,
Augusto Rodin 498, 03920 Mexico City, Mexico
rrodrigueza@up.edu.mx

Abstract. Nowadays consumers are more informed about the characteristics of


the products they are buying and the services they are using as well as their
respective environmental impacts. The nondurable goods industry is the closest
to consumers in everyday life, therefore awareness of the environmental impacts
of these products has gained greater attention from consumers. In response to
increased consumer demand for environmental attributes, the nondurable goods
industry has begun to apply circular economy guidelines in its supply chain, in
addition to complying with new environmental regulations in various countries.
This research addresses a literature review to identify the key factors that allow
the correct implementation of the circular economy approach in the non-durable
goods industry. Among the main factors identified are the voice of the customer,
the traceability and collection of empty containers, as well as and efficient
international environmental regulation.

Keywords: Circular economy · Non-durable goods · Consumer behavior · Sustainability

1 Introduction

Consumer goods are formally considered as any tangible product produced and sub-
sequently purchased to satisfy the current wants and perceived needs of the buyer.
Consumer goods are divided into three categories: durable goods, nondurable goods,
and services. Non-durable consumer goods are products for everyday use in normal
daily activities, from mouthwash in the morning, soap, shampoo, to skincare products
at night. As they are products with close contact with the consumer, they allow the
consumer's environmental preferences to be indirectly captured. In recent years, a
greater preference has been identified for certain segments that promote care for the
environment or are associated with a lower environmental impact. The positioning of


the brands through showing social responsibility and concern for the environment has
made it possible to capture an important segment of consumers.
The segment of non-durable consumer goods represents a broad segment that
reflects the tastes and preferences of the population, and indirectly the demand for
environmental objectives in its production. That is why this research seeks to identify,
through a systematic review of the literature, which are the key factors that allow
solving these environmental demands through the adoption of the principles of the
circular economy in the segment of non-durable goods. To properly define Circular
economy (CE), we should define what Conventional or Linear Economy is (LE) in the
first place [1]. LE consists of extracting raw materials from nature for final con-
sumption or to be used as inputs. Then at the end of its useful life, that waste will be
disposed of, ideally in a regulated landfill (Cradle to Grave approach). On the other
hand, CE is the use of resources within closed-cycle systems, reducing pollution or
avoiding the loss of resources, the extracted resources are maintained in a use-recovery
cycle that seeks to have zero waste [2]. This is the main difference between LE and CE,
avoiding the generation of waste and making the most of the extracted natural
resources. Marketing campaigns and corporate communications have been primarily
responsible for making CE such a widespread concept: from advertising on television
and social media to eco-claims on product labels, the CE concept is closer to
consumers than ever. However, beyond the general acceptance of this approach, its
implementation requires overcoming great challenges [3].
The CE concept is usually used by various stakeholders in the supply chain and
with differentiated objectives [4]. Within the operation of the supply chain, the
adoption of the circular economy approach would imply different objectives depending
on the link in the supply chain, financial and operations areas may not have the same
perception and objectives as the Environmental Health and Safety areas within a
company. However, CE is a broader concept, CE seeks to rebuild capital, be it
financial, manufactured, human, social, or natural, and offers opportunities and solu-
tions for all organizations. It is closer to a sustainability model than to an environmental
approach [5]. There are many examples of CE approaches for different products and
services. However, studies on nondurable goods seem to be preferred by academia and
industry, precisely because of their marketing, prestige, and economic impact on large
corporations and in general the visibility of plastic waste in the streets, the ocean, etc.
In addition to the existence of environmental regulations in the production of non-
durable goods in several countries. Since 2010, the circular economy approach to
nondurable goods has been formally addressed by the Ellen McArthur Foundation and
adopted by large companies such as Nestlé, PepsiCo, The Coca-Cola Company, etc.
[6]. This was an industry response to consumer demands, as well as to adopt in advance
the expected changes in environmental legislation at the international level.
The main objective of this research is to identify through a systematic literature
review which are the key success factors for the implementation of the circular
economy approach in the non-durable goods industry. The work is structured as
follows: the first section describes the literature review and text selection process; the
next section presents the main results of the search; the following section details the
identified success factors; and finally, the conclusions, recommendations, and
references are presented.

2 Methodology
2.1 A Systematic Literature Review
This paper follows a methodology based on a systematic literature review, taking into
account a set of keywords and their respective synonyms, specialized databases, and
previously defined literature exclusion criteria [7]. The systematic literature search
focused on research where the circular economy concept was applied to real cases
(regardless of the industry). A search was carried out in a set of specialized databases
using “and” and “or” connectors (for example “circular economy” + “consumer
goods”) to identify the different possible combinations according to the defined key-
words. It is worth mentioning that a first search, using as selection criterion those
works that included the concept of circular economy, identified more than 2000 works
in 2021 alone; it is interesting to observe the significant growth in scientific production
related to the circular economy concept, since in 2011 there were fewer than 100
related research products (Fig. 1).

Fig. 1. Papers on the search string circular economy, 2011–2021.

In the first selection, papers that included the keywords in the article title, abstract,
and keywords were considered. From this first combination of keywords, a total of 101
results were obtained, all in English. As a second filter, it was determined to select
works related to non-durable goods, especially in the food, beverage, and cosmetic
industries. Of these 101 articles, those that did not contain the keyword “circular
economy” in their title were discarded (58 articles were excluded). To be more precise
in the selection of works, it was also decided to exclude cases related to the fashion
industry, discarding works that included the words “fashion” or “textile” and
prioritizing only the food and cosmetic industries. After applying this new exclusion,
39 works remained. Table 1 illustrates the inclusion/exclusion criteria followed in the
present systematic literature review (a sketch of this filtering logic is given after the
table).

Table 1. Inclusion-exclusion criteria


Inclusion criteria | Exclusion criteria
CE + CG related | Not having CE in the article title
CE in the article title | Including keyword “textile” or “fashion”
Consists of one or several elements that can help to explain the CE approach in the NDG industry | Durable Goods mentioned as the center of the study
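
As referenced above, the following is a minimal Python sketch of the filtering logic in
Table 1; the record structure and the sample entries are placeholders, not the actual
Scopus export.

def keep(record):
    """Apply the inclusion/exclusion criteria of Table 1 to one bibliographic record."""
    title = record["title"].lower()
    keywords = " ".join(record["keywords"]).lower()
    if "circular economy" not in title:            # exclusion: CE must appear in the title
        return False
    if any(word in title or word in keywords       # exclusion: fashion/textile studies
           for word in ("textile", "fashion")):
        return False
    return True

# placeholder records standing in for the 101 initial search hits
records = [
    {"title": "Circular economy in beverage packaging", "keywords": ["consumer goods"]},
    {"title": "Circular economy in the fashion industry", "keywords": ["textile"]},
]
selected = [r for r in records if keep(r)]         # keeps only the first record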

Of these 39 articles, 13 were not fully accessible through Scopus, so they were also
discarded. After this step, 26 articles remained. Of these articles, a complete review was
carried out, seeking to prioritize jobs related to non-durable goods and to present a real
case of the application of the circular economy approach in the industry. For all cases
found in the literature, those that were related to beverage and cosmetic packaging
waste were identified and selected, since they are the wastes generally identified as the
main pollutants in different ecosystems (especially in ocean ecosystems) [8].

3 Results

3.1 General Results


To structure the article discrimination process, the articles were classified according to
their level of agreement with the objective of the paper, where level 1 represents a
general level of agreement and level 5 a specific one (Fig. 2).

Fig. 2. Results in every level of analysis.

After reviewing the 26 articles, we discovered that some of them were not even related
to a CE approach for any specific industry or business case [9, 10]. Other works were
more related to durable consumer goods [11, 12]. Although these present excellent CE
approaches in their respective industries, it was decided to include only those works
where NDGs were the focus of the publication. There were some exceptions in this
final part of the systematic literature review: a couple of the final articles selected from the

original group of 26 do not deal exactly with a CE approach in an NDG study;
however, these articles include key factors for a successful implementation of the CE
approach in general. Examples of this exception are [13, 14]. The year 2016 was the
most prolific year for research products related to the circular economy. Therefore, it is
not surprising that the selected articles were published in the last three years, with 2018
being the oldest publication. The distribution of the final works across journals is
concentrated in a small set of outlets; 50% of the publications appear in the Journal of
Cleaner Production (Fig. 3). This finding could mean that cleaner production includes
being co-responsible for waste generation after NDG containers are empty, rather than
leaving all responsibility to consumers [15].

Fig. 3. Results distributions in journals.

The 26 case-study-based articles were selected for thematic analysis. Three of these
were actual business cases concerning waste generated by NDG packaging or related
to it; all articles in this category were published after 2018. Three other papers were
also selected since they could help provide a main framework for a Circular Economy
approach in the NDG industry [16].

3.2 Key Factors in the Successful Integration of the CE Approach


in the Industry of NDGs
The CE approach to the NDG is possible and urgent to implement. NDGs have a
relatively short life, less than three years according to [17]. However, the plastic
containers of these NDGs take a long time to degrade in nature, sometimes hundreds or
thousands of years. There are clear examples of the implementation of CE in the NDG
industry. [15] studied a CE approach in Mexico for empty plastic soft drink bottles.
Legislation related to co-responsibility for post-consumer waste is weak in Mexico;
nevertheless, a group of companies decided to start a business model to help
consumers recycle empty soft drink containers and, at the same time, improve their
reputation and avoid possible sanctions by the government. Another example of the
CE approach in the NDG industry has been studied by [18], which shows that in the
Netherlands recycled plastics made from post-consumer plastic packaging through a
conventional recycling process should be considered mixtures. The CE approach in the
NDG industry is fully feasible; success depends on companies investing in the

technology and capabilities needed to implement this paradigm in their supply chains.
It does not have to be perfect; there are very few examples where the CE approach
yields high returns. Even medium yields mean a high positive environmental impact,
since these actions reduce the amount of plastic packaging that reaches the sea and the
environmental damage that this implies.

According to [19], the key factors for a successful integration of the CE in the
plastics industry include the concerns expressed by consumers and the opportunity for
a positive reputation. Therefore, a positive reputation should be a key factor to
incentivize companies to adopt the circular economy approach in their supply chains.
For their part, [17] argue that for a CE approach in the industry to be successful, it
must overcome the following barriers and challenges: a) Technical and Technological,
b) Legislative, c) Economic, and d) Sociocultural. Based on the evidence collected in
this literature review, we develop the barriers listed by [17] and cite them as the main
key factors for implementing a CE approach in the NDG industry. These identified
criteria coincide with several of the articles analyzed.
a) Technical and Technological: A key factor for the implementation of a CE approach
in the NDG industry is technology: the capacity to collect empty containers, sort
them, and transport them to recycling or co-generation plants. In Mexico, there are
only a few co-generation and recycling plants owned by the Government; many of
the CE initiatives in the NDG industry have been carried out by the beverage
companies themselves [18]. Without the technical and technological skills and
resources, no CE approach is possible for any industry.
b) Legislative: Many of the changes made by the NDG industry are driven by changes
in legislation. For example, new legislation in Mexico City banned plastic cotton
buds; this ban made a transnational company switch its famous plastic cotton bud
sticks to paper. There are other cases in which clusters of companies make the first
move and anticipate changes in legislation, as is the case of CANIPEC (National
Chamber of the Cosmetic Products Industry) in Mexico City, which presented a
business model to collect empty plastic containers of cosmetic products and
reintroduce up to 20% of PCR (post-consumer resins) into new cosmetic containers
(body lotion containers, shampoo containers, etc.).
c) Economic: Avoiding legal fines does protect a company's finances, but it is not the
strongest economic argument for implementing a CE approach in the NDG industry.
For the board of a company to adopt the idea of a CE approach, it is necessary to
present a very detailed business case showing how the investment in technology and
new raw materials can become a long-term economic enabler by saving costs and
creating new markets [14].
d) Socio-cultural: Now more than ever, consumers are empowered to choose among
plenty of NDGs in the market; there are hundreds of options when selecting a
product. Cultural factors affect the way consumers dress, eat, and buy products,
which in turn has an impact on the business economy [20]. Thus, analyzing new
socio-cultural trends and the way organizations communicate the CE approach in
the NDGs that different communities use is key to an effective implementation of
this business model.
4 Conclusions and Recommendations

The main objective of this article was to provide a review of the literature on CE
approaches in the NDG industry. In the early stages of the systematic review, it was
noticeable that almost half of the results were related to the definition of CE itself
rather than to business cases. All the empirical cases found in the last six selected
articles were published after 2017; applied CE is becoming a prominent topic in both
academia and industry. There are case studies of the application of the CE approach in
the NDG industry [18]. However, each industry and each country is a different universe
that must be approached and studied in detail for the transition from LE to CE to be
effective. The barriers found by [17] are the key factors for successful integration of the
CE approach in the NDG industry: Technical and Technological; Legislative; Economic;
Sociocultural. [14] also identify key factors such as government policies and consumer
behavior; however, these two fit into the legislative and socio-cultural key factors
respectively.
While there is a real concern and willingness to lessen the environmental impacts of
NDG packaging and operations, a lack of understanding of the CE concept sometimes
leads corporate strategies to greenwash a label, product, or operation instead of
applying true CE models [10]. The main limitation of this study is that the great
diversity of the NDG industry could limit the external validity of the conclusions: what
might work for a soft drink company might not work for a brand of body lotions, and
the types of plastic that contain those NDGs are also extremely different. Therefore,
trying to derive a general picture of the key factors in the successful integration of the
CE approach in the NDG industry may seem overly ambitious. Another limitation is
the small number of articles that apply CE to the NDG industry.
Academia and industry are encouraged to continue building empirical research
based on the barriers they encounter in the transition from LE to CE. We recommend
that Environment and Sustainability professionals working in consumer goods companies
get involved in developing a CE strategy for a successful transition from LE to CE, and
lead and advise their organizations on the importance of this transition for their
business. Grounding their advice in new legislation on environmental protection or
plastic bans could be a catalyst for this change. Finally, we recommend a truly
sustainable approach to CE in the NDG industry: this document explores legal,
environmental, and economic factors, but Social Responsibility and Community
Impact in the transition from LE to CE remain to be explored.

References
1. Korhonen, J., Honkasalo, A., Seppala, J.: Circular economy: the concept and its limitations.
Ecol. Econ. 143, 37–46 (2018)
2. Winans, K., Kendall, A., Deng, H.: The history and current applications of the circular
economy concept. Renew. Sustain. Energy Rev. 68, 825–833 (2017)
3. Kopnina, H.: Teaching circular economy: overcoming the challenge of green washing. In:
Handbook of Engaged Sustainability, pp. 1–25 (2017)
4. Kirchherr, J., Reike, D., Hekkert, M.: Conceptualizing the circular economy: an analysis of
114 definitions. Resour. Conserv. Recycl. 127, 221–232 (2017)
5. Stahel, W.R., MacArthur, E.: The Circular Economy: A User’s Guide. Routledge, New York
(2019)
6. Stahel, W.R.: The circular economy. Nature News 531(7595), 435 (2016)
7. Bjørnbet, M.M., Skaar, C., Fet, A.M., Schulte, K.Ø.: Circular economy in manufacturing
companies: a review of case study literature. J. Clean. Prod., 126268 (2021)
8. Wabnitz, C., Nichols, W.J.: Plastic pollution: an ocean emergency. Mar. Turt. Newsl. 129, 1
(2010)
9. Hobson, K., Holmes, H., Welch, D., Wheeler, K., Wieser, H.: Consumption work in the
circular economy: a research agenda. J. Clean. Prod., 128969 (2021)
10. Laurenti, R., Martin, M., Stenmarck, A.: Developing adequate communication of waste
footprints of products for a circular economy—a stakeholder consultation. Resources 7(4),
78 (2018)
11. Bressanelli, G., Perona, M., Saccani, N.: Reshaping the washing machine industry through
circular economy and product-service system business models. Procedia CIRP 64, 43–48
(2017)
12. Onete, C.B., Albastroiu, I., Dina, R.: Reuse of electronic equipment and software installed
on them–an exploratory analysis in the context of circular economy. Amfiteatru Econ.
20(48), 325–339 (2018)
13. Kuzmina, K., Prendeville, S., Walker, D., Charnley, F.: Future scenarios for fast-moving
consumer goods in a circular economy. Futures 1(12), 1–15 (2018)
14. Patwa, N., Sivarajah, U., Seetharaman, A., Sarkar, S., Maiti, K., Hingorani, K.: Towards a
circular economy: an emerging economies context. J. Bus. Res. 122, 725–735 (2021)
15. Schwanse, E.: Recycling policies and programs for PET drink bottles in Mexico. Waste
Manage. Res. 29(9), 973–981 (2011)
16. Kuah, A.T., Wang, P.: Circular economy and consumer acceptance: an exploratory study in
the east and southeast Asia. J. Clean. Prod. 247, 119097 (2020)
17. Paletta, A., Leal Filho, W., Balogun, A.-L., Foschi, E., Bonoli, A.: Barriers and challenges to
plastics valorization in the context of a circular economy: case studies from Italy. J. Clean.
Prod. 241, 118149 (2019)
18. Brouwer, M.T., van Velzen, E.U.T., Augustinus, A., Soethoudt, H., De Meester, S., Ragaert,
K.: Predictive model for the Dutch post-consumer plastic packaging recycling system and
implications for the circular economy. Waste Manage. 71, 62–85 (2018)
19. Gong, Y., Putnam, E., You, W., Zhao, C.: Investigation into circular economy of plastics:
the case of the UK fast moving consumer goods industry. J. Clean. Prod. 244, 118941 (2020)
20. Nair, S.S., Gulati, M.G.: Understanding the effect of cultural factors on consumers moods
while purchasing gold jewelry: with reference to brand Tanishq. In: Optimizing Millennial
Consumer Engagement with Mood Analysis, pp. 298–318. IGI Global (2019)
Profile of the Business Science Professional
for the Industry 4.0

Antonia Paola Salgado-Reyes and Roman Rodríguez-Aguilar(&)

Facultad de Ciencias Económicas y Empresariales, Universidad Panamericana,


Augusto Rodin 498, 03920 Mexico City, Mexico
rrodrigueza@up.edu.mx

Abstract. The development of the fourth industrial revolution, called Industry
4.0, has generated a significant boost in areas related to information technology.
However, this development has also permeated other areas such as the business
sciences. Based on a systematic literature review, the main areas of development
of the business sciences were identified within the framework of Industry 4.0;
this, in turn, generates the need to update professional profiles and capacities.
The large areas identified are auditing, finance, accounting, and planning, among
others. The need for the comprehensive development of all areas of knowledge
in the digital age is evident. Changes in the mode of production, trade, and
interaction between individuals have permeated all areas, and business science is
no exception.

Keywords: Industry 4.0 · Human resources · Business science · Digital economy · Education · Professional profile

1 Introduction

The process of digital transformation in the world is continuous; technological
development and massive connectivity have allowed the implementation of the set of
technologies that make up so-called Industry 4.0. The general framework of
Industry 4.0 consists of the evolution of the mode of production, commercialization,
and interaction between economic agents. The integration of these technologies into
the economy has triggered a transformation process tied to the technical capabilities
of professionals, and not only of those who participate in the development and
implementation of the technological innovations of Industry 4.0: professionals in
general now need to acquire knowledge beyond their area of concentration in order to
make correct use of the new technologies in their discipline.
In recent years the area called the digital economy has developed, which
integrates the effect of digital technology on production and consumption into
traditional economic analysis, as well as the consideration of assets with zero marginal
cost, such as the internet. The integration of digital technologies into the economic
model has triggered organizational, economic, and even political changes in different
countries. According to [1], the digital economy requires an adequate ecosystem for its
development, which depends on the development of three fundamental pillars: business
infrastructure, digital business, and digital commerce. The Economic Commission for
Latin America and the Caribbean (ECLAC), for its part, includes within the digital
economy ecosystem the telecommunications infrastructure, the Information and
Communications Technology (ICT) industries, and the networks of economic and
social activities facilitated by the internet, cloud computing, mobile and social
networks, and remote sensors [2].
The final link in the value chain lies with the users (people, companies, and
government) who demand services and applications, and it is precisely at this level
that the development of digital capacities is necessary in professionals, especially
those related to the business sciences. Knowledge has always been considered, from
an economic point of view, a fundamental asset for the economic development of any
country. In the digital economy, the real-time processing of information and tracking
of transfers and transactions allow companies to optimize their operations. However,
not all countries have the same level of development of and access to digital
technologies; in the same way, there is a generation gap between the population born
with the new technologies and those who have had to adapt to change, and the gaps
widen depending on age, educational level, or even gender [3]. The social impact of
this transformation process on education is a challenge for all countries; universities
and companies must collaborate to develop professional profiles that evolve with the
new reality.
One of the biggest challenges is the intersection of areas traditionally linked to
certain professions, such as ICT and the economic-administrative areas. The increased
skills required will also affect jobs linked to routine activities that could be automated.
The challenges posed by the fourth industrial revolution require the joint work of
government, companies, and educational centers to achieve a correct technological
and social integration. In view of this problem, this work seeks to identify, through a
systematic literature review, the need to strengthen or transform the profiles of
professionals specialized in the business sciences. The work is structured as follows:
the first section presents the systematic literature search strategy carried out in
specialized databases; the main results of the search process and the selection of
articles for in-depth review are then presented, followed by the main findings
identified from the reviewed articles; finally, the conclusions are presented.

2 Materials and Methods

2.1 Search Process and Selection Criteria


To explore the scientific literature, a set of keywords, synonyms and descriptors
was defined as the basis for constructing the different search strategies. Table 1
presents the selected concepts as well as the synonyms used in the literature search.
It is worth mentioning that theses and degree works were also considered in the
search of the specialized databases.
Table 1. Keywords used in the search criteria


Concept Synonyms
Industry 4.0 Smart industry, Smart manufacturing, Digital Industry
Education Training, Study, Teaching, Learning
Professionals Graduate, Specialist, Expert
Employment Job, Recruitment, Enrollment, Employ
Internet of things Industrial Internet, Internet of Everything, Web of Things
Digital twin Digital copy, Digital replica, Mathematical counterpart
Data science Data analytics, Data-driven science, Data mining
Fourth industrial revolution Industry 4.0, Smart industry, Smart manufacturing
Job profile Professional-profile, Professional-outline, Job specification
Requirements business Business needs, Performance requirements
Blockchain Blockchains, Digital ledger, Distributed ledger
Human resources Staff, Forces, HR, Personnel
University College, Academy, Institute
Artificial intelligence Robotics, AI, Expert systems, Machine Learning
Automation Mechanization, Machine control, Systematization
Business Trade, Dealing, Commerce
Accounting Analyst, Auditor, Bookkeeper
Administration Management, Managing, Direction
Finance Business, Investment, Banking

Once the keywords were identified, the different search combinations were run
against databases of citations and abstracts of articles, theses, journals, and
conference papers within the framework of Industry 4.0, education, etc., using "AND"
and "OR" logical connectors between classifiers. In total, 39 different word
combinations were built for each of the proposed databases; as an example of a search
performed on Google Scholar, Search No. 1 was ("Industry 4.0") AND ("data
management"). For each search, inclusion and exclusion criteria were applied to the
title to eliminate articles not related to the topic of interest; the inclusion criteria
corresponded to words related to the subject of analysis. Figure 1 shows the process
and the result obtained after deleting documents, leading to the second phase of
review and selection, which consisted of reading the abstracts and again eliminating
articles not useful for the topic in question.
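As an illustration of how such combinations can be produced, the sketch below generates boolean search strings of the form used in Search No. 1. The concept lists are abridged from Table 1; the exact strings and tooling the authors used are not specified in the text, so this is only a hedged reconstruction in Python.

```python
from itertools import product

# Abridged concept groups taken from Table 1 (assumed subsets, for illustration).
industry_terms = ['"Industry 4.0"', '"Smart industry"', '"Smart manufacturing"']
topic_terms = ['"data management"', '"job profile"', '"professional profile"']

# Build search strings of the form (A) AND (B), as in Search No. 1.
queries = [f"({a}) AND ({b})" for a, b in product(industry_terms, topic_terms)]
for query in queries:
    print(query)
```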
Fig. 1. Search process and selection criteria.

2.2 Data Description


To obtain a current perspective on the state of the art regarding the profile of the
professional in Industry 4.0, a systematic literature review was carried out. The review
considered the following aspects related to Industry 4.0: a) evaluation of its current and
future implementation, b) standards for professionals and education, and c) economic
impact. With the information generated by the search, a database was built from the
articles retrieved by each search, from which the articles to be fully reviewed were
filtered. The filtering was based on the mentions and frequencies of the same keywords
within the identified documents. The database included a set of variables, and the full
texts correspond to the corpus (Table 2).

Table 2. Database description


Variable Description Kind of variable
Title Paper title String
Author Authors String
Year Year of publication Number
Search keywords Keywords used in the search String
Keywords in text Frequency of keywords in each text Number
Citations Number of citations Number
Kind Kind of paper String
Abstract Abstract String

Since the professional profile in the business sciences is of particular interest, this aspect
was given priority in the selection of articles for full reading. The 60 selected articles
were analyzed in detail to identify the main trends in the state of the art, and this database
was used for their detailed analysis. It is important to mention that, in addition to the
systematic review, machine learning techniques were applied to the database to perform
the pertinent filtering of the information.
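The text does not specify which machine learning techniques were applied; as one plausible sketch, the keyword-frequency filter described above could be implemented as follows, with field names taken from Table 2 and the sample records being purely hypothetical.

```python
import re

# Hypothetical records mirroring the database fields of Table 2.
records = [
    {"Title": "AI in auditing", "Abstract": "Industry 4.0 and data analytics reshape auditing."},
    {"Title": "Retail history", "Abstract": "A narrative history of department stores."},
]
keywords = ["industry 4.0", "data analytics", "artificial intelligence"]

def keyword_frequency(text, terms):
    """Count the total occurrences of the search terms in a text."""
    lowered = text.lower()
    return sum(len(re.findall(re.escape(term), lowered)) for term in terms)

# Keep only records whose abstract mentions at least one keyword,
# ranked by how often the keywords appear.
selected = sorted(
    (r for r in records if keyword_frequency(r["Abstract"], keywords) > 0),
    key=lambda r: keyword_frequency(r["Abstract"], keywords),
    reverse=True,
)
print([r["Title"] for r in selected])
```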
3 Results
3.1 Summary Statistics
The results show an increase in the total number of articles generated over time. The
selection made to identify articles related to the professional profile in the business
sciences significantly reduced the number of results: a total of 60 articles related to the
topic of interest were identified over the last 20 years, distributed within the period
2000–2020 (Fig. 2).

Fig. 2. Generation of publications in the last 20 years.

Analyzing the distribution of products by type, it is observed that the vast majority
of publications are scientific articles, followed by graduate theses and conference
proceedings (Fig. 3).

[Fig. 3 data: Paper 43, Thesis 7, Conference 6, Journal 3, Chapter 1; Total 60]

Fig. 3. Distribution by type of publication.

The distribution of the articles identified in the analysis period mainly shows the
growth in article production: in 2000 there was only one publication addressing the
subject, while 2019 saw 12 scientific articles and 3 theses. The pace of scientific
production has increased gradually, starting from the dissemination of the
concept of Industry 4.0, which, as it has been developed and implemented, has
permeated other areas of knowledge. Although the total number of products in the
analysis period may seem very low, this is a transformation process we are currently
living through, so the topic is expected to become more prolific in research output in
the short term.
Analyzing the citations generated by the selected publications, we can see that the
increase in citations has been significant, which positions the topic within the related
scientific production. In total, the articles accumulate 868 citations and the conference
proceedings 1762 citations (Table 3).

Table 3. Number of citations by type of text


Type of text 2000 2016 2017 2018 2019 2020 Total
Paper 7 134 413 268 46 868
Journal 43 1 44
Thesis 3 3
Chapter 18 18
Conference 1229 492 41 1762

3.2 Main Findings


According to the detailed analysis of the articles selected for full reading, three large
areas were identified related to the profile of business science graduates within the
framework of the implementation of Industry 4.0: 1) professional competencies, 2)
digital economy and business management, and 3) implications of artificial intelligence
in education.
Professional Competences
The development of technical skills and new professional profiles demanded within the
framework of Industry 4.0 is necessary to face the threat of technological
unemployment. [4] described 21 digital competencies encompassed in 5 areas, which
are as follows: information and data literacy, communication and collaboration, digital
content creation, security, and problem solving. To face the digital transformation
process, the competencies that the new professional profiles must have include
knowledge of the Internet of Things, cybersecurity, cloud computing, additive printing,
augmented reality, data analytics, and artificial intelligence; with the digital
transformation we face the debate over the creation and destruction of different jobs. In
the student environment there is a lack of degree programs that can offer such
knowledge, which is why industry professionals are trained in graduate programs [5].
Industry 4.0 assumes as the foundation of its operation the existence and
development of the following digital technologies: cloud computing, big data, mobile
applications, geolocation, the Internet of Things, and mobile robots. Virtual work has
given rise to new forms of employment, which combine non-traditional workplaces
and schedules with the use of digital technologies and new contractual forms other
than traditional labor contracting [6]. Twenty-five skills were determined in five broad
categories: physical and manual, basic cognitive, higher cognitive, social and
emotional, and technological. Within each category are more specific skills; lifelong
learning, teaching, and training skills were separated from the higher cognitive skills,
although some of the former require higher cognitive abilities [7–10].
Accounting and auditing are progressing towards digitization. Blockchain is the
technology on which the reliable status of systems is built; it is useful where
information must be stored in an orderly manner, it serves to verify the identity of
those who interact in the networks and to track transaction systems, and as a
technological device it brings research opportunities for teachers and also for students
[11, 12]. Technology is constantly evolving, and it is up to workers and society, in
general, to adapt to it. Future employment will be based on the intellectual and
cognitive capacity of individuals and their creative and social skills; it is not so much a
question of replacing personnel as of supplementing and improving techniques
[13]. Nowadays many companies have great problems finding professionals
qualified in digital areas, and many jobs and positions will be lost due to their
feasibility of being automated. The human being must take this new scenario as a
challenge to respond to the demands posed by the digital world; the management of
big data becomes a unique tool for understanding the needs of human beings and their
behavior [14]. With the advent of the fourth industrial revolution, it is vital that
technological developments exist; for the business profession they help to optimize
and facilitate various processes, freeing up time to perform other activities and
provide added value to the position and service provided [15].
Digital Economy and Business Management
The digital economy and business management have been strongly affected by digital
transformation; global innovation and digital integration processes have been generated
that have influenced the way business is done and managed. The application of
these approaches allows the solution of many types of specific problems and the
generation of new business models, which is becoming a competitive advantage for
companies. However, as with any technology, their application presents equally profound
and far-reaching challenges [16]. Industry 4.0 will make it possible to improve
efficiency in factories, achieving greater productivity and optimizing time, costs, and
materials. In an increasingly globalized world, with a changing market and more
stringent customer requirements, this system makes it possible to manufacture different
individual products for each customer in the same production run [17, 18]. The
digitization or hybridization of products and services allows the expansion and
improvement of existing products: intelligent sensors are added to collect new
information and are integrated with data analysis tools, and the massive analysis of
data from these sensors makes it possible to detect habits, patterns, and needs that are
not yet covered by the market [19–21]. Supply chain management has been widely
influenced by new digital trends; aspects such as the use of digital twins in management,
logistics 4.0, custom product design, and smart manufacturing are examples of
the new trends in production and service [22–26].
AI has a strong influence on digital financial business in areas related to risk
detection, measurement, and management, addressing the problem of information
asymmetry, providing customer support and helpdesk services through chatbots, and fraud
detection and cybersecurity. AI is influencing wealth management through
robo-advisors that provide automated financial planning services such as tax planning
advice, insurance advice, and health and investment advice [27]. Expert systems were
the first artificial intelligence technique used in credit risk analysis systems. Most
expert systems in the field of creditworthiness analysis consist of two modules and are
based on classification rules obtained from the experience accumulated by one or
more human experts [28]. Innovation is a task to be developed daily; it is a
continuous and dynamic process, not a seasonal one. Innovation must go hand in hand
with corporate social responsibility; it must strive to be an instrument that provides a
competitive advantage for the company and, at the same time, a real benefit for all
stakeholders and the environment, and it must also shape the business strategy for the
generation of value in the supply chain [29–31].
Implications of Artificial Intelligence in Education
Industry 4.0 brings together at least nine integrated technologies that require new
professional skills, among them Big Data: the ability to collect, store, and analyze
large amounts of data to identify inefficiencies and bottlenecks in production. New
teaching/learning methodologies will be developed; computer scientists and networks
of professionals in various disciplines will be required; remote, virtual, and interactive
teaching and research will be implemented; and new computer programming skills
will be introduced at technical, technological, and professional levels, not only for
computer science experts but for all related disciplines that make use of these
technologies [32]. One way to carry out this change in education is to include
project-based educational models, which develop in the student the capacity for
critical thinking needed to apply knowledge to the real situations that occur
day-to-day in any organization. The student is responsible for forming new logical
processes that adapt to the generation of responses to everyday problems [33].
The main challenge of the future in organizations is not the lack of job opportunities
but rather the skills and competencies that will be needed in new jobs. New
professions are appearing that require preparation in the physical sciences, systems,
analytics, and statistics, and whose job will be to analyze the large amounts of
information, or Big Data, that allow effective decision-making and the achievement of
company objectives [34]. Education is the most powerful tool that can be used to
respond to the need to update and improve the skills of an increasing number of
people throughout their lives; digital technologies provide new possibilities regarding
where, how, and when to learn and teach [35]. In traditional models one learns from
what has already been done and acts on what has been done; in the new models,
learning takes place through presence in emerging futures [36].
Academic life is being renewed along with its productions, processes, and tasks:
training, teaching, learning, research, curriculum, etc.; everything is being disrupted by
innovation, and university education in the fourth industrial revolution points towards
the innovative, research-based university. The future of the university will then lie, in a
short time, in the formula (F+I+D+i), preparing over three or perhaps two generations
for the formula ia+I+D+ii (intelligence + research + development + intelligent innovation)
[37]. Education must innovate to support change based on the principles of the circular
model, preserving natural capital by balancing resources. Education must also innovate to face
the threat of automation and the gap in competencies required by the irruption of
Artificial Intelligence, which shape an uncertain labor horizon marked by the
disappearance of multiple jobs. For this, education must provide the human
competencies that machines do not have, with dynamic strategies in a process of
continuous empowerment that transcends teaching and learning, in which academic
degrees cease to be evidence of employability [38].
The fourth industrial revolution needs people with extraordinary knowledge and
creative ideas. In the same way, the future of education emphasizes the need to look
ahead to using internet sources to prepare the workforce for a challenging environment.
These kinds of technologies powered by artificial intelligence are transforming
the world to such an extent that social concepts such as "post-work" increasingly
define the present period. This period requires certain skills: critical thinking, people
management, emotional intelligence, judgment, negotiation, cognitive flexibility, as
well as knowledge production and management [39]. From the earliest levels, such as
infant education, to the highest graduate standards, one of the key mechanisms by
which AI will impact education will be through applications related to individualized
learning, including its ethical aspects. The application of AI can, in some ways, be seen
as a viable solution, as automated student support allows for a new and compelling
perspective on the dynamics of learning [40–47].

4 Conclusions

From the literature review carried out, it is possible to highlight the significant increase
in recent years in the number of articles and conference proceedings related to the
concept of Industry 4.0. The change in the production, commercialization, and
consumption model has permeated various areas. A growing interest is identified in the
technical skills required by professionals in various disciplines and their relationship
with Industry 4.0. In the special case of the business sciences, the need for
technological skills related to the concept of the digital economy is identified, skills
that have not been fully integrated into the curricula of the administrative sciences, at
least in undergraduate degrees. In the same way, the work areas of business science
professionals have been highly influenced by the technological innovation observed in
recent years; areas such as auditing, finance, accounting, and business require a
comprehensive profile that considers the general management of the new technologies
related to Industry 4.0. The influence of these technologies has impacted the education
and updating of business science professionals. It is necessary to take into account that
this is a dynamic and growing process, so the adaptability of companies, governments,
and universities is highly important to avoid the generation of a gap in capacities and
human resources within and between countries. A key aspect of this transformation
process is the necessary interaction between the stakeholders interested in the
implementation of this new paradigm.
References
1. Mesenbourg, T.L.: Measuring Electronic Business: Definitions, Underlying Concepts, and
Measurement Plans. US Census Bureau, Washington DC (2000)
2. Economic Commission for Latin America and the Caribbean (ECLAC). Digital economy for
structural change and equality. CEPAL, Santiago de Chile (2013)
3. Tapscott, D.: The Digital Economy: Promise and Peril in the Age of Networked Intelligence.
McGraw-Hill, New York (1995)
4. Naji, M.J.: Industria 4.0, competencia digital y el nuevo Sistema de Formación Profesional
para el empleo. Relaciones Laborales y Derecho del Empleo 6(1) (2018)
5. Palomés, X.P., Tuset-Peiró, P.: Los nuevos perfiles profesionales en el marco de la Industria
4.0. Oikonomics Revista de economía, empresa y sociedad 12, 5 (2019)
6. Cedrola Spremolla, G.: Economía digital e Industria 4.0: reflexiones desde el mundo del
trabajo para una sociedad del futuro. Revista Internacional y Comparada de Relaciones
laborales y derecho del empleo 6(1), 262–297 (2018)
7. Bughin, J., Hazan, E., Lund, S., Dahlström, P., Wiesinger, A., Subramaniam, A.: Skill shift:
automation and the future of the workforce. McKinsey Glob. Inst. 1, 3–84 (2018)
8. Rivero, P., Mota, M.: Evolución de las Habilidades Laborales en la Industria 4.0 y su
Impacto Financiero. Revista Innova ITFIP 6(1), 106–119 (2020)
9. Arranz, F.G., Blanco, S.R., San Miguel, F.J.: Competencias digitales ante la irrupción de la
Cuarta Revolución industrial. Estudos em Comunicação 1(25) (2017)
10. Piwowar-Sulej, K.: Human resource management in the context of Industry 4.0. Organizacja
i Zarządzanie: kwartalnik naukowy (2020)
11. Macias, H.A., Farfán, M.A., Rodríguez, B.A.: Contabilidad digital: los retos del blockchain
para académicos y profesionales. Revista Activos 18(1) (2020)
12. Hong, S., Seo, C.R.: Developing a blockchain-based accounting and tax information in the
4th industrial revolution. J. Korea Convergence Soc. 9(3), 45–51 (2018)
13. Rodríguez Sanz, S.: Los posibles efectos de la inteligencia artificial sobre el empleo [Tesis
de grado en Economía, Universidad de Valladolid]. Repositorio Institucional-Universidad de
Valladolid. Facultad de Ciencias Económicas y Empresariales (2019)
14. Botache, L.P.C., Osorio, N.M.H., Castillo, E.S.: Professional Competences of the Economic,
Administrative, and Accounting Sciences in the Framework of Industry 4.0. EasyChair
Preprint (2020)
15. Montes Buriticá, M., Marín Giraldo, K.: ¿Qué impacto tiene la cuarta Revolución Industrial
en la profesión contable en Colombia? [Tesis de grado en Contaduría Pública, Tecnológico
de Antioquia]. Repositorio Institucional-Tecnológico de Antioquia. Facultad de Ciencias
Administrativas y Económicas (2020)
16. Rego, A.Z., López, I.P., Bringas, P.G.: Inteligencia artificial: una aproximación desde las
finanzas. Boletín de Estudios Económicos 75(229), 99–117 (2020)
17. López, P.: Análisis de Casos de Estudio sobre Industria 4.0 y Clasificación según Sectores de
Actividad y Departamentos Empresariales. Repositorio Institucional UPV (2016)
18. Jacquez-Hernández, M., Torres, V.: Modelos de evaluación de la madurez y preparación
hacia la Industria 4.0: una revisión de literatura. Ingeniería industrial. Actualidad y Nuevas
Tendencias 6(20), 61–78 (2018)
19. Ruiz Cófreces, J., Dejo Oricain, N.: Industria 4.0: aplicaciones en la gestión empresarial
[Tesis de grado en Administración y Dirección de Empresas, Universidad de Zaragoza].
Repositorio Institucional-Universidad de Zaragoza. Facultad de Economía y Empresa (2017)
20. Bearzotti, L.A.: Industria 4.0 y la Gestión de la Cadena de Suministro: el desafío de la nueva
revolución industrial. Gaceta Sansana 3(8) (2018)
21. Morales, P.G., España, J.A., Zárate, J., González, C., Frías, T.: La nube al servicio de las
pymes en dirección a la industria 4.0. Pistas Educativas 39(126) (2017)
22. Uhlemann, T.H.J., Lehmann, C., Steinhilper, R.: The digital twin: realizing the cyber-
physical production system for Industry 4.0. Procedia CIRP 61, 335–340 (2017)
23. Witkowski, K.: Internet of Things, big data, Industry 4.0–innovative solutions in logistics
and supply chains management. Procedia Eng. 182, 763–769 (2017)
24. Wagner, R., Schleich, B., Haefner, B., Kuhnle, A., Wartzack, S., Lanza, G.: Challenges and
potentials of digital twins and Industry 4.0 in product design and production for high-
performance products. Procedia CIRP 84, 88–93 (2019)
25. Qi, Q., Tao, F.: Digital twin and big data towards smart manufacturing and industry 4.0: a
360-degree comparison. IEEE Access 6, 3585–3593 (2018)
26. Buisán, M., Valdés, F.: La industria conectada 4.0. ICE Revista de Economía 898, 89–100
(2017)
27. Mhlanga, D.: Industry 4.0 in finance: the impact of artificial intelligence (AI) on digital
financial inclusion. Int. J. Financ. Stud. 8(3), 45 (2020)
28. Suárez, J.D.A.: Técnicas de inteligencia artificial aplicadas al análisis de la solvencia
empresarial. Documentos de trabajo. Universidad de Oviedo. Facultad de Ciencias
Económicas (2000)
29. Muñoz, L.D.C.: Elementos clave de la innovación empresarial. Una revisión desde las
tendencias contemporáneas. Revista Innova ITFIP 6(1), 50–69 (2020)
30. Scavarda, A., Daú, G., Scavarda, L.F., Goyannes Gusmão Caiado, R.: An analysis of the
corporate social responsibility and the Industry 4.0 with focus on the youth generation: a
sustainable human resource management framework. Sustainability 11(18), 5130 (2019)
31. Nagy, J., Oláh, J., Erdei, E., Máté, D., Popp, J.: The role and impact of Industry 4.0 and the
internet of things on the business strategy of the value chain—the case of Hungary.
Sustainability 10(10), 3491 (2018)
32. Carvajal, J.: La cuarta Revolución Industrial o Industria 4.0 y su impacto en la educación
superior en ingeniería en Latinoamérica y el Caribe. In: 15th LACCEI International Multi-
Conference for Engineering, Education, and Technology: Global Partnerships for Devel-
opment and Engineering Education. Boca Ratón, EE UU (2017)
33. Martinez Bahena, E., Campos Perez, A., Escamilla Regis, D.: Industry 4.0 and the digital
transformation... A new challenge for high education. Working paper (2019)
34. Martínez Valdez, R.I., Catache Mendoza, M.D.C., Huerta Cerda, Z.M.: La Cuarta
Revolución Industrial (4RI) y la Educación de Negocios: Un estudio comparativo de
programas de posgrado en México y Estados Unidos de América. VinculaTégica-EFAN
(2018)
35. Willcox, K., Sarma, S., Lippel, P.: Online Education: A Catalyst for Higher Education
Reforms. MIT, Cambridge (2016)
36. Echeverría Samanes, B., Martínez Clares, P.: Revolución 4.0, competencias, educación y
orientación. Revista digital de investigación en docencia universitaria 12(2), 4–34 (2018)
37. Pedroza Flores, R.: La universidad 4.0 con currículo inteligente 1.0 en la cuarta revolución
industrial. RIDE. Revista Iberoamericana para la Investigación y el Desarrollo Educativo 9
(17), 168–194 (2018)
38. Cardona, D.: Implicaciones de la Cuarta Revolución Industrial en la Educación. Congreso
Iberoamericano: La educación ante el nuevo entorno digital (2018)
39. Suganya, G.: A study on challenges before higher education in the emerging fourth industrial
revolution. Int. J. Eng. Technol. Sci. Res. 4(10), 1–3 (2017)
40. Ocaña-Fernández, Y., Valenzuela-Fernández, L.A., Garro-Aburto, L.L.: Inteligencia artifi-
cial y sus implicaciones en la educación superior. Propósitos y Representaciones 7(2), 536–
568 (2019)
41. Cortés, C.B.Y., Landeta, J.M.I., Chacón, J.G.B.: El entorno de la industria 4.0:
implicaciones y perspectivas futuras. Conciencia tecnológica 54, 33–45 (2017)
42. Guzmán, D.S.: Industria y educación 4.0 en México: un estudio exploratorio. Implicaciones
de la industria 4.0 en la educación superior, vol. 39 (2019)
43. Benešová, A., Tupa, J.: Requirements for education and qualification of people in Industry
4.0. Procedia Manuf. 11, 2195–2202 (2017)
44. Mahomed, S.: Healthcare, artificial intelligence, and the fourth industrial revolution: ethical,
social and legal considerations. S. Afr. J. Bioethics Law 11(2), 93–95 (2018)
45. Intelligent Computing and Optimization. Conference Proceedings ICO 2018. Springer,
Cham. ISBN 978-3-030-00978-6 (2018). https://www.springer.com/gp/book/9783030009786
46. Intelligent Computing and Optimization. Proceedings of the 2nd International Conference on
Intelligent Computing and Optimization 2019 (ICO 2019). Springer International Publishing.
ISBN 978-3-030-33585 (2019). https://www.springer.com/gp/book/9783030335847
47. Vasant, P., Zelinka, I., Weber, G.-W. (eds.): ICO 2020. AISC, vol. 1324. Springer, Cham
(2021). https://doi.org/10.1007/978-3-030-68154-8
Rainfall-Runoff Simulation and Storm Water
Management Model for SVNIT Campus
Using EPA SWMM 5.1

Nitin Singh Kachhawa(&), Prasit Girish Agnihotri,


and Azazkhan Ibrahimkhan Pathan

Department of Civil Engineering, Sardar Vallabhbhai National Institute


of Technology, Surat, India

Abstract. To prevent flooding in urban areas, drainage systems should be
designed with adequate capacity. A dynamic rainfall-runoff model and SWMM
are developed for the SVNIT campus using EPA SWMM version 5.1 to assess
the capacity of the existing drainage system. This model is useful for
understanding the response of the catchment to a given rainfall event. The
catchment area was divided into 11 different sub-catchments by careful
observation of homogeneous blocks. In this study, the Modified Green-Ampt
infiltration model was used to compute the infiltration losses, and the kinematic
wave routing method was used for flow routing analysis. The locations of the
overflow sections and flooded nodes were identified for a single rainfall event.

Keywords: EPA SWMM 5.1 · Modified Green-Ampt infiltration model · Kinematic wave routing · Rainfall-runoff simulation

1 Introduction

The problem of urban flooding is becoming a global trend due to variations in rainfall
patterns with frequent extreme events, while land-use change under rapid development
increases the impermeable surface area. Urban flooding is one of the major man-made
disasters in urban areas. During monsoon periods, metropolitan cities like Mumbai,
Hyderabad, Chennai and Kolkata are frequently affected by urban floods. Urban floods
are caused by various reasons such as inadequate carrying capacity of storm drains,
blockage of storm drains, improper planning of the storm network, encroachment of
water bodies, and changes in rainfall patterns. Thus, it is important to perform a
capacity analysis of the existing storm network to identify the locations of overflow
sections and devise prevention measures. The storm water drainage system is
infrastructure that controls water to prevent flooding through the quick disposal of
surface water.
Ekmekcioglu et al. (2021) used the EPA-SWMM model to investigate the
effectiveness of low impact development (LID) for sustainable urban storm water
management [1]. The developed model was calibrated using the Parameter ESTimation
(PEST) tool with the Nash-Sutcliffe efficiency coefficient (NSE) as the objective
function; after optimization the NSE was computed as 0.809. Harianti and Sulaiman (2021)
assessed the Universitas Gadjah Mada Yogyakarta campus drainage system using EPA
SWMM 5.1 software, covering hydrological and hydraulic analysis, maintenance and
management analysis, and land-use analysis [2]. Kian et al. (2021) studied the academic
complex of Universiti Teknologi PETRONAS (UTP), which was facing flooding
problems [3]. They designed the Bio-Ecological Drainage System (BIOECODS) using
EPA SWMM, and suggested mitigation measures such as providing a new diversion
channel and realigning the drainage slope based on the location of flooded nodes.
Dell et al. (2021) focused on Green Stormwater Infrastructure (GSI) and developed the
Storm Water Management Model for Low Impact Technology Evaluation
(SWMM-LITE), which enables municipal-scale stormwater assessment with minimum
data and processing time [4]. Mhapsekar et al. (2021) determined adequate dimensions
of the stormwater drainage system for the Viva Institute of Technology using SWMM
software, located the flooded areas, and identified aerial topography and land use as the
causes [5]. Vidya (2021) compared the runoff volume from the storm water
management model (SWMM) and the soil conservation service (SCS-CN) model, and
concluded that the SCS-CN and SWMM models give better results for large and small
rainfall depths, respectively [6]. Choo et al. (2021) developed a virtual model using
EPA SWMM and analysed the effect of dam height on flood reduction for the Hoedong
dam, situated on the Suyeong river [7]. They found that increasing the height of the
dam by 3 m, 4 m and 6 m reduces flooding by 27%, 37% and 48%, respectively.
Metto et al. (2021) developed a rainfall-runoff model for Eldoret town using EPA
SWMM5 [8]. The model was developed using the catchment characteristics and
observed rainfall and runoff data, and was calibrated and validated against observed
runoff; the optimized model attained values of 0.97 and 0.99 for calibration and
validation, which are promising results. Parnas et al. (2021) evaluated the performance
of three infiltration models (Green-Ampt, Horton, and Holtan) in urban sandy soils
under the urban hydrological models SWMM and STORM [9]. They suggested that
field measurements of infiltration rate are required to develop a robust SWMM model,
as these models are sensitive to the maximum and minimum infiltration capacities for
dry and wet conditions, respectively. Pachaly et al. (2021) used SWMM version
5.1.013 to analyse closed-pipe fast transient flows using the Preissmann slot
pressurization algorithm, and concluded that SWMM is capable of handling closed-pipe
transient simulations [10]. Campisano et al. (2018) tested the suitability of
EPA SWMM for simulating the filling phase in intermittent water distribution
networks [11], thus opening a new perspective of analysis for intermittent water
distribution networks.
Wanniarachchi and Wijesekera (2012) used SWMM5 software for the ungauged
area of the Matara Municipal Council and concluded that channel roughness is the most
sensitive parameter; optimizing the channel roughness along with the provision of
detention storage reduces the peak discharge by up to 30% [12]. Warsta et al.
(2017) developed a sub-catchment generator program to automate the tedious model
development process in SWMM [13]; the peak discharge values were the same for the
manually developed and the automatically generated models. Barco et al.
(2008) developed a Storm Water Management Model for the Ballona Creek watershed
in Southern California by integrating a geographic information system with a
constrained optimization method [14]. The optimized roughness value was higher than that
obtained from land-use information using GIS. Hossain et al. (2019) performed a
comparative study of single-event and continuous-event modelling in EPA SWMM
with reference to the direct runoff hydrograph and total runoff hydrograph, and
concluded that single-event-based modelling outperforms continuous-event-based
modelling [15].

2 Study Area

Sardar Vallabhbhai National Institute of Technology (SVNIT) is situated in Surat,
Gujarat, India, and its campus is selected as the study area. The satellite image of the
SVNIT campus was obtained from Google Earth, as shown in Fig. 1. The horizontal
and vertical dimensions of the image were measured using the 'measure distance and
area' tool available in Google Earth and found to be 3747 feet and 2371 feet,
respectively.
The backdrop image selector dialog available in EPA SWMM 5.1 was used to set
the Google image as a backdrop, and the storm water network model was drawn over it.

Fig. 1. Google Earth image of SVNIT campus [17]

3 EPA SWMM Software

The EPA SWMM (Environmental Protection Agency Storm Water Management
Model) is a dynamic rainfall-runoff model that can be used for single-event or
continuous simulation of the runoff quantity and quality generated from catchments.
This kind of modelling is used for planning, analysis, and design related to storm water
drainage systems, especially in urban areas.
EPA SWMM is open-source public software developed as a Windows-based
desktop program. It offers an integrated environment for editing input data and running
hydraulic, hydrologic, and water quality simulations, and it displays the results in a
variety of ways, such as color-coded catchment areas, time series tables and graphs,
profile plots, and statistical frequency analyses. SWMM can be applied in many ways,
including designing drainage system components, sizing detention facilities, mapping
flood plains, and controlling runoff using green infrastructure facilities.

4 Methodology

The development of the SWMM model comprises the following steps, described in
the flow chart in Fig. 2.

Fig. 2. Flow chart of methodology

5 SWMM Inputs

To develop the model on the EPA SWMM platform, the following data were imported:
map dimensions, sub-catchments, sub-catchment characteristics, elevations,
sub-catchment outlets, conduits, and rainfall. Each of these components is discussed in
detail below.
5.1 Map Dimensions


The Google Earth image was geo-referenced using a rectangular coordinate system:
the lower left point of the image is (0, 0) and the upper right point is (3747, 2371).
These map distances are in feet.

5.2 Sub-catchments
The study area was divided into 11 sub-catchments, labelled S1, S2, S3, S4, S5, S6,
S7, S8, S9, S10 and S11 as shown in Fig. 3. The sub-catchments were drawn such that
each one produces runoff at an outlet, determined by the catchment slope and
connected to the storm water network.

Fig. 3. Catchment visualization in SWMM

5.3 Sub-catchment Characteristics


The Infiltration Editor dialog is used to specify values for the parameters that describe
the rate at which rainfall infiltrates into the upper soil zone of a sub-catchment's
pervious area; it is invoked when editing the Infiltration property of a sub-catchment.
The Modified Green-Ampt infiltration model was selected for this project, as the
SVNIT campus mainly has silty clay soil. All the parameters of the Modified
Green-Ampt infiltration model are taken from Table 1, where K, Ψ, φ, FC and WP are
the hydraulic conductivity (in/hr), suction head (in), porosity (fraction), field capacity
(fraction) and wilting point (fraction), respectively.
Table 1. Modified Green-Ampt infiltration model parameters [16]


Soil texture class K Ψ φ FC WP
Sand 4.74 1.93 0.437 0.062 0.024
Loamy Sand 1.18 2.40 0.437 0.105 0.047
Sandy Loam 0.43 4.33 0.453 0.19 0.085
Loam 0.13 3.5 0.463 0.232 0.116
Silt Loam 0.26 6.69 0.501 0.284 0.135
Sandy Clay Loam 0.06 8.66 0.398 0.244 0.136
Clay Loam 0.04 8.27 0.464 0.31 0.187
Silty Clay Loam 0.04 10.63 0.471 0.342 0.21
Sandy Clay 0.02 9.45 0.43 0.321 0.221
Silty Clay 0.02 11.42 0.479 0.371 0.251
Clay 0.01 12.6 0.475 0.378 0.265
(Source-SWMM Reference Manual. http://www.epa.gov/water-research/storm-water-management-
model-swmm. Accessed on 5.2.2019)
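SWMM's Modified Green-Ampt variant adds extra bookkeeping for low-intensity rainfall, but the way the parameters of Table 1 interact can be illustrated with the classic Green-Ampt capacity equation f = K(1 + ΨΔθ/F). The sketch below uses the Silty Clay row, with the initial moisture assumed to be at field capacity; it is an illustration, not the SWMM implementation itself.

```python
def green_ampt_capacity(K, psi, delta_theta, F):
    """Classic Green-Ampt infiltration capacity f = K * (1 + psi * delta_theta / F).

    K           -- saturated hydraulic conductivity (in/hr)
    psi         -- suction head at the wetting front (in)
    delta_theta -- moisture deficit (porosity minus initial moisture content)
    F           -- cumulative infiltration so far (in); must be positive
    """
    return K * (1.0 + psi * delta_theta / F)

# Silty Clay parameters from Table 1; initial moisture assumed at field capacity.
K, psi, phi, FC = 0.02, 11.42, 0.479, 0.371
delta_theta = phi - FC

for F in (0.05, 0.1, 0.5, 1.0):
    f = green_ampt_capacity(K, psi, delta_theta, F)
    print(f"F = {F:4.2f} in  ->  f = {f:.3f} in/hr")
```

As cumulative infiltration F grows, the capacity decays towards K, which is consistent with the low infiltration and high runoff coefficients reported for this silty clay catchment in Table 3.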

5.4 Elevation
The elevation finder tool was used to estimate the elevation of different points on the
map. The procedure is as follows:
(1) Select the desired location on the map.
(2) Click on the map to place a marker.
(3) The estimated elevation will be displayed below the map.
(4) Click again to place further markers and find their elevations.
These elevation values are used to compute the catchment slope and pipe slope, and
to decide the outlet of each sub-catchment. The elevations of various points on the map
are shown in Fig. 4.

Fig. 4. Map showing elevation of various points
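The same point elevations could also be fetched programmatically. The sketch below queries the public Open-Elevation API; this is an assumed alternative to the interactive elevation finder tool used above, and the coordinates are only approximate values near Surat.

```python
import json
import urllib.request

def get_elevation(lat, lon):
    """Look up the elevation (in metres) of a point via the Open-Elevation API."""
    url = f"https://api.open-elevation.com/api/v1/lookup?locations={lat},{lon}"
    with urllib.request.urlopen(url) as response:
        data = json.load(response)
    return data["results"][0]["elevation"]

# Approximate coordinates near the SVNIT campus, Surat (illustrative only).
print(get_elevation(21.1667, 72.7833), "m")
```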


5.5 Outlet of Sub-catchment


Based on the estimated elevations, each sub-catchment is allocated an outlet (denoted
as a junction on the map). Since the elevation of every corner of a sub-catchment is
known, the corner with the least elevation is taken as its outlet; a small selection sketch
follows Table 2. The runoff volume generated by each sub-catchment is discharged at
its outlet as surface runoff. The outlet of each sub-catchment is tabulated in Table 2.

Table 2. Sub-catchment associated with outlet


Sub-catchment Outlet
S1 J1
S2 J2
S3 J3
S4 J4
S5 J5
S6 J6
S7 J7
S8 J8
S9 J9
S10 J10
S11 J11
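A minimal sketch of the outlet-selection rule follows, as promised above; the corner elevations are hypothetical, since the measured values appear only in Fig. 4.

```python
# Hypothetical corner elevations (ft) for one sub-catchment. The outlet is
# simply the corner with the lowest elevation, per Sect. 5.5.
corner_elevations = {"NW": 41.0, "NE": 40.0, "SW": 39.5, "SE": 38.0}

outlet = min(corner_elevations, key=corner_elevations.get)
print(f"Outlet corner: {outlet} ({corner_elevations[outlet]} ft)")
```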

5.6 Conduit
The outlets of the sub-catchments are connected by conduits, creating a conduit
network over the study area map. The outlet of the overall catchment is provided at the
entrance gate of SVNIT, which can then be connected to the main storm sewer of
the city.
Each conduit is assigned the following properties:
(i) Inlet Node
(ii) Outlet Node
(iii) Shape
(iv) Max. Depth
(v) Manning's roughness coefficient.
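For reference, these conduit properties map onto the [CONDUITS] and [XSECTIONS] sections of a SWMM 5 input (.inp) file. The fragment below is a minimal sketch with assumed values (node names, length, roughness, and depth are illustrative, not the actual campus data), written from Python so it could be loaded into EPA SWMM.

```python
# Minimal illustrative SWMM 5 .inp fragment; all values are assumptions,
# not the surveyed campus network data.
inp_fragment = """\
[CONDUITS]
;;Name  FromNode  ToNode  Length  Roughness  InOffset  OutOffset  InitFlow
C55     J5        J12     400     0.013      0         0          0

[XSECTIONS]
;;Link  Shape     Geom1  Geom2  Geom3  Geom4  Barrels
C55     CIRCULAR  4.0    0      0      0      1
"""
with open("conduits_fragment.inp", "w") as f:
    f.write(inp_fragment)
```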

5.7 Rainfall
A rain gauge, named R1, is created on the map and all sub-catchments are linked to it.
A uniform rainfall over the catchment is considered; as the study area is relatively
small, the spatial variation of rainfall is neglected. The rainfall and time units are
inches and hours, respectively. The cumulative rainfall-time series plot is shown in
Fig. 5.
Fig. 5. Cumulative rainfall-time series plot

6 Results and Discussion

The rainfall-runoff simulation ran successfully, with a continuity error of −0.86% in
surface runoff and −1.29% in flow routing. The run also produced warnings because
some conduits were assigned zero slope; in this situation SWMM automatically
applies a slope of 0.001 to run the simulation. The results obtained after the simulation
are tabulated below: sub-catchment, conduit, junction, and outfall parameters are given
in Tables 3, 4, 5 and 6, respectively. The peak runoff was highest at 20.87 CFS for
sub-catchment S6, and the maximum flow was highest at 22.43 CFS for conduit C55.
The maximum and average flows at the outfall were 2.22 CFS and 1.85 CFS,
respectively.

Table 3. Results of sub-catchment parameters


Sub- Total Total Total runoff Peak runoff Runoff
catchment precipitation (in) infiltration (in) (in) (CFS) coefficient
S1 10 0.39 9.67 17.36 0.967
S2 10 0.47 9.6 3.72 0.96
S3 10 0.39 9.68 15.44 0.968
S4 10 0.38 9.7 11.79 0.97
S5 10 1.79 8.27 11.87 0.827
S6 10 0.4 9.65 20.87 0.965
S7 10 2.79 7.27 4.18 0.727
S8 10 0.47 9.6 3.28 0.96
S9 10 0.39 9.67 18.2 0.967
S10 10 2.23 7.82 14.63 0.782
S11 10 3.02 7.01 10.17 0.701
Table 4. Results of conduit parameters


Conduit Maximum flow (CFS) Maximum velocity (ft/sec)
C52 2.22 0.4
C55 22.43 5.41
C60 2.88 0.51
C66 11.88 6.91
C68 6.27 1.01
C69 2.03 0.53
C71 2.21 0.39
C72 4.09 0.68
C73 1.83 0.36
C74 1.2 0.26
C75 1.96 0.33
C76 6.16 7.76
C77 0.32 0.41
C78 0.06 0.19
C79 2.97 0.55
C80 14.99 6.17

Table 5. Results of node/junction parameters


Junction Average depth (feet) Maximum depth (feet)
J1 1.91 4
J2 2.14 4
J3 2.3 4
J4 0.34 0.9
J5 2.26 4
J6 0.92 3
J7 1.27 4
J8 2.66 4
J9 1.64 4
J10 0.43 1.19
J11 1.69 4
J12 1.3 4
J13 2.16 4
J14 2.22 4
J15 1.41 3
J16 1.91 4

Table 6. Results of outfall parameters


Average depth Maximum depth Average flow Maximum flow
(feet) (feet) (CFS) (CFS)
Outfall 1.73 3 1.85 2.22
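Runs like the one reported above can also be scripted. The sketch below uses the third-party pyswmm package to execute an input file and read back the continuity errors; the package choice and the file name "svnit_campus.inp" are assumptions, since the authors worked in the SWMM 5.1 desktop interface.

```python
from pyswmm import Simulation  # pip install pyswmm

# Step a SWMM model to completion and report the mass-balance errors,
# analogous to the -0.86% (runoff) and -1.29% (routing) values above.
with Simulation("svnit_campus.inp") as sim:
    for _ in sim:
        pass  # advance the simulation step by step
    print(f"Runoff continuity error:       {sim.runoff_error:.2f}%")
    print(f"Flow routing continuity error: {sim.flow_routing_error:.2f}%")
```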
7 Conclusion

In this project a storm water management model was developed for the SVNIT campus
using EPA SWMM 5.1 software. The model helps in understanding the complex
hydrological and hydraulic processes of the catchment, and it can be further validated
by collecting actual rainfall-runoff data in the catchment. SWMM helps determine the
storm water carrying capacity of the existing storm water network, which can then be
modified to minimize future flooding risk. The locations of flooding can be identified
and suitable measures implemented.

Emerging Smart Technology
Applications
Evaluation and Customized Support
of Dynamic Query Form Through Web Search

B. Bazeer Ahamed1(&) and Murugan Krishnamurthy2


1 University of Technology and Applied Sciences - Al Musanna, Muladhha, Oman
bazeer@act.edu.com
2 School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
murugan.k@vit.ac.in

Abstract. Modern structured web databases maintain large and diverse data, and these real-world databases comprise hundreds or even thousands of relations and attributes. Conventional predefined query forms cannot satisfy the varied ad-hoc queries that users pose on those databases. Dynamic Query Form (DQF) is a novel database query form interface which can dynamically generate query forms. The central idea of DQF is to capture a user's preferences and rank query form components, helping the user to make choices. The generation of a query form is an iterative process guided by the user. At each iteration, the system produces a ranked list of form components, and the user then adds the preferred components into the query form. The ranking is based on the captured user preferences. A user can also fill out the query form and submit queries to view the query results at each iteration. In this way, a query form can be progressively refined until the user is satisfied with the query results. A probabilistic model is developed to measure the goodness of a query form in DQF. An experimental evaluation and a user study demonstrate the effectiveness and efficiency of the approach.

Keywords: Query form creation · User interaction · Database · Query form · Random query

1 Introduction

The query form is one of the most widely used user interfaces for querying databases. Traditional query forms are designed and predefined by developers or database administrators in various data management systems. With the rapid growth of web data and scientific databases, modern databases become extremely large and complex. In the natural sciences, such as genomics and diseases, the databases have many entities for chemical and biological data resources [11]. Some web databases, such as Freebase and DBPedia, typically have a huge number of structured web entities.


Therefore, it is difficult to design a set of static query forms to support the diverse ad-hoc database queries on such complex databases. Many existing database management and development tools, such as EasyQuery, Cold Fusion, SAP and Microsoft Access, offer a few mechanisms to allow users to create customized queries on databases. However, the creation of customized queries depends entirely on the users' manual editing. The essence of DQF is to capture user interests during user interactions and to adapt the query form iteratively. Each cycle involves two kinds of user interaction: the process starts with a basic query form which contains only a few primary attributes of the database. The basic query form is then enriched iteratively through interactions between the user and the system, until the user is satisfied with the query results. In this paper the ranking of query form components and the dynamic generation of query forms are implemented [12].
Aggregate functions can be used to build data sets in a horizontal layout (denormalized with aggregations), automating Structured Query Language (SQL) query writing and extending SQL capabilities [17]. Evaluating horizontal aggregations is a difficult and interesting problem. The proposed horizontal aggregations provide several unique features and benefits. First, they represent a template to generate SQL code from a data mining tool. Such SQL code automates writing SQL queries, optimizing them, and testing them for correctness. This SQL code reduces manual work in the data preparation phase of a data mining task. Second, since the SQL code is automatically generated, it is likely to be more efficient than SQL code written by an end user, for example, a person who does not know SQL well or someone who is not familiar with the database schema (e.g., a data mining practitioner). Thus, data sets can be created in a smaller amount of time. Third, the data set can be created entirely inside the Database Management System (DBMS).
This approach results in a significant increase in performance without requiring any changes to the physical layout of the data. The task normally requires writing long SQL statements or customizing SQL code if it is repeatedly generated by some tool. There are two main components in such SQL code, namely joins and aggregations; the proposed experiments focus on the aggregation procedure.

2 Related Work

2.1 Similarity Measures for Categorical Data: A Comparative Evaluation

Measuring similarity or distance between two entities is a key step for several data mining and knowledge discovery tasks. The notion of similarity for continuous data is relatively well understood, but for categorical data the similarity computation is not straightforward [14]. Several data-driven similarity measures have been proposed in the literature to compute the similarity between two categorical data instances, yet their relative performance has not been assessed.

The performance of a variety of similarity measures is studied in the context of a specific data mining task: outlier detection. Results on a selection of data sets show that while no one measure dominates the others for all kinds of problems, some measures are able to deliver consistently high performance. Measuring similarity or distance between two data points is a core requirement for several data mining and knowledge discovery tasks that involve distance computation. Examples include clustering (k-means), distance-based outlier detection, classification and several other data mining tasks. The key characteristic of categorical data is that the different values that a categorical attribute takes are not inherently ordered. Thus, it is not possible to directly compare two different categorical values [10]. The simplest way to find similarity between two categorical attributes is to assign a similarity of 1 if the values are equal and a similarity of 0 if the values are not equal. For two multivariate categorical data points, the similarity between them is directly proportional to the number of attributes in which they match. This simple measure is also known as the overlap measure in the literature. There are fourteen different categorical measures from various fields, studied together in a single framework. Many of these measures have not been studied outside the field in which they were introduced, and have not been evaluated on other occasions. The categorical measures can be grouped in three different ways based on how they use information in the data. The various similarity measures for categorical data are evaluated on a wide variety of benchmark data sets [16], demonstrating the effectiveness of data-driven techniques for the determination of meaningful similarity with categorical data.

2.2 A Case for a Collaborative Query Management System

Data management systems must provide powerful query management capabilities, from query browsing to automatic query recommendations. This subsection discusses the requirements of a general query management system, outlines an early system architecture, and reviews the many research challenges related to building such an engine. Modern DBMSs provide sophisticated features to assist users in organizing, storing, managing, and retrieving data in a database [1], yet only insufficient capabilities for managing the queries that users pose on the data. These capabilities are limited to query-by-example graphical tools for composing queries and to query logging aimed at physical tuning. Traditionally, more sophisticated query management was not necessary, since applications would only issue canned queries over the data (e.g., accounting or stock management applications). These queries were developed once and used repeatedly. The authors propose to build a Collaborative Query Management System (CQMS) targeted at these new, large-scale, shared-data environments [8]. A CQMS should allow the user to perform basic tasks, for example, browsing the log of all queries they submitted to the repository and annotating their queries. In this way, the users will be able to quickly find, edit and re-execute previous queries. An enhanced system should also maintain advanced search capabilities enabling users to identify queries that operate on specific input data, have desired properties (for example, a small result set or fast execution time), or produce specific results [15]. Building such a CQMS raises several significant technological challenges. To start with, managing a collection of queries is more like managing an evolving set of source code fragments than managing traditional data. Queries have rich semantics and complex relationships with one another.

2.3 The Design and Construction of Query Forms

One of the most common ways to query a web database is through a form, where a user can fill in relevant information and find the desired results by submitting the form. Designing good static forms is a non-trivial manual effort, and the designer needs a sound understanding of both the data organization and the querying needs [5]. Moreover, form design has two conflicting goals: forms should be easy to understand, and at the same time should expose the broadest possible querying capability to the user. This work presents a framework for creating forms in an automated and principled way, given the database schema and a sample query workload [2]. The authors design a tunable clustering algorithm for building forms based on multiple "similar" queries, which includes a mechanism for extending forms to support other "similar" queries the system may encounter later.

2.4 An Effective Data Clustering Method for VLDB

Finding useful patterns in large datasets has attracted considerable interest recently, and one of the most widely studied problems is the identification of clusters, or densely populated regions, in a multi-dimensional dataset [3]. Prior work does not adequately address the problem of very large datasets and the minimization of I/O costs. The authors present a data clustering method named BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies), and show that it is especially suitable for very large databases. BIRCH incrementally and dynamically clusters incoming multi-dimensional metric data points to try to produce the best quality clustering with the available resources [6]. BIRCH can typically find a good clustering with a single scan of the data, and improve the quality further with a few additional scans. BIRCH is also the first clustering algorithm proposed in the database area to handle "noise" effectively. The authors evaluate BIRCH's time/space efficiency, data input order sensitivity, and clustering quality through several experiments. A performance comparison of BIRCH versus CLARANS, a clustering method designed specifically for large datasets, shows that BIRCH is consistently superior. In this subsection, analytical data clustering, which is a specific kind of data mining problem, is introduced.

2.5 Construction of Dynamic Faceted Search Systems Over Web Databases: DYNACET

Extracting records of interest from huge databases is a tedious activity and has received significant research attention recently. In this demo, DynaCet, a domain-independent system, provides effective minimum-effort based dynamic faceted search solutions over enterprise databases [7]. At each step, the system suggests an attribute depending on the user response at the previous step. Facets are selected based on their ability to rapidly drill down to the most promising tuples, as well as on the ability of the user to provide the desired values for them. The benefits provided include faster access to information stored in databases while accounting for variations in user knowledge and preferences [9]. Facilitating meaningful search for data records within huge data warehouses is one of the main challenges today. However, in most real applications the user only has partial information about the tuple, and hence it is important to enable an effective search technique. A very simple faceted search interface is one where the user is prompted with an attribute (e.g., Actor), to which the user responds with a desired value, after which the next suitable attribute is suggested, to which the user again responds with a desired value (e.g., "Action"), and so on. Thus the main challenge is to judiciously select the facets to be recommended, so that the user reaches the desired tuple(s) with minimum effort [18]. Top-k algorithms are extended with early termination techniques that avoid scanning the complete database when determining the next most promising facet (Table 1).

Table 1. Various query streams

Query stream | Feature | Problems
Query-by-example | Affords a basic interface for the user to formulate queries | 1) Relational completeness; 2) Ordering issue
Automated ranking of database query results | To build a generic automated ranking infrastructure for SQL databases | 1) The ranking function may fail to achieve the intended ordering
Instant-response interfaces | Interface developed to help the user type database queries | 1) The user's information need must be explicit; 2) Attempts to apply efficient index structures for basic keyword querying
Forms-based database and query interface | Automatic approaches to the creation of database query forms without user participation | 1) Not suitable when the user does not have concrete keywords to describe the queries
Form customization | A framework which permits end-users to tweak the present queries during run time | 1) The database schema is huge, so it is hard to create the desired query forms

From all the above literature review it is noted that: (i) the web database schema is extremely large; (ii) it is hard for the user to discover the proper web database entities and attributes and to create the desired query forms; (iii) the tools expect professional designers to design the databases, and are not meant for end-users; (iv) constructing a great number of queries and forms ahead of time overwhelms the user; (v) new terms related to the query must be stipulated, or the terms modified as specified by the user in the web directory; (vi) projection components contribute to the delay of query composition, which cannot be ignored; (vii) a database query is prepared in terms of relational queries; (viii) data entry forms have to be dealt with.

3 Dynamic Generation of Query Form System

DQF is a query interface that is capable of dynamically generating query forms for users. Unlike traditional document retrieval, users in database retrieval are often willing to perform many rounds of actions (i.e., refining query conditions) before identifying the final candidates [4]. The focus of DQF is to capture user interests during user interactions and to adapt the query form iteratively, through the ranking of query form components and the dynamic generation of query forms. Earlier work proposed automated techniques to generate database query forms without user interaction, followed by a data-driven approach: it first finds a set of data attributes which are most likely to be queried, based on the database schema and data instances [13]. Then, the query forms are generated based on the selected attributes. Another approach applies clustering on historical queries to find representative queries, and generates query forms from them. One problem with the aforementioned approaches is that, if the database schema is large and complex, user queries can be quite diverse; even if many query forms are generated in advance, there may still be user queries that cannot be satisfied by any of the query forms. Another problem is that, when so many query forms are generated, finding an appropriate and desired query form becomes challenging for users. A solution that combines keyword search with query form generation has been proposed: it generates a collection of query forms offline, and the user inputs several keywords to find relevant query forms from a large number of pre-generated query forms. It works well for databases which have rich textual information in data tuples and schemas.
The dynamic query form system generates query forms according to the user's desire at run time. The system supplies a solution for the query interface in large and complex databases. The F-measure is used to estimate the goodness of a query form; it is a typical metric for evaluating query results. This metric is also appropriate for query forms, because query forms are designed to help users query the databases, and the goodness of a query form is determined by the query results generated from the query form. Based on this ranking, the potential query form components are recommended so that users can refine the query form easily. Based on the proposed metric, efficient algorithms are developed to estimate the goodness of the projection and selection form components. Here efficiency is important because DQF is an online system where users expect quick responses. Query forms enable users to fill in parameters to generate different queries. Ad-hoc join predicates are not handled by our dynamic query form, because a join is not part of the query form and is unintelligible for most users. As for "Group by" and "Order by" in SQL, there are only limited options for users. To decide whether a query form is desired or not, a user does not have time to go over every data instance in the query results. Moreover, many database queries output a huge number of data instances. In order to avoid this "Many-Answer" problem, only a compressed result table is output first, to show a high-level view of the query results. Each instance in the compressed table represents a cluster of actual data instances. Then, the user can click through the interesting clusters to view the detailed data instances. Query forms are designed to return the user's desired results. There are two established measures to evaluate the relevance of query results: precision and recall. Query forms can produce different queries with different inputs, and different queries can yield different query results with different precisions and recalls, so expected precision and expected recall are used to evaluate the expected performance of the query form.
The horizontal aggregation approach discussed above:
- represents a template to generate SQL code from a data mining tool;
- automates testing the generated queries for correctness;
- produces SQL code more efficient than SQL code written by an end user;
- allows the data set to be created entirely inside the DBMS.

4 Experimental Result

Design is a multi-step process that focuses on data structures, software architecture, procedural details (algorithms and so on) and interfaces among modules. The design process also translates the requirements into a software design that can be assessed for quality before coding starts.
Input design is the process of converting user-originated inputs to a computer-based format. Input design is one of the most expensive phases of building a computerized system and is frequently the main problem area of a system. Input design is the most significant part of the overall system design, and it requires extremely careful attention. Often the collection of input data is the most expensive part of the system.
Query Creation:
Input: Q = {Q1, Q2, …} is the set of past queries executed on Fi
Output: Qone is the One-Query

Algorithm 1: Qone creation
Input: α is the fraction of instances desired by the user; DQone is the query result of Qone; As is the selection attribute.
Output: s* is the best query condition of As.
begin
  // sort DQone by As into an ordered set Dsorted
  Dsorted ← Sort(DQone, As)
  s* ← ∅, fscore* ← 0
  n ← 0, d ← α·β²
  for i ← 1 to |Dsorted| do
    di ← Dsorted[i]
    s ← "As ≤ di.As"   // compute fscore of the condition "As ≤ di.As"
    n ← n + Pu(di | AFi) · P(di | AFi) · P(σFi | di) · P(s | di)
    d ← d + P(di | AFi) · P(σFi | di) · P(s | di)
    fscore ← (1 + β²) · n / d
    if fscore ≥ fscore* then
      s* ← s
      fscore* ← fscore
  foreach form-group g ∈ T do
    Label g relative to its parent group (use the absolute path if g is the root);
    foreach form-element e ∈ g do
      Label e relative to g;
    end
  end
end
The user interest is estimated from the user's click-through on the query results displayed by the query form. From precision and recall, the F-score can be derived as follows (Fig. 1 and Table 2):

F-ScoreE(F) = (1 + β²) · [PrecisionE(F) · RecallE(F)] / [β² · PrecisionE(F) + RecallE(F)]    (1)
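As a quick sanity check of Eq. (1), here is a minimal Python helper (our own sketch, not part of the paper's Java implementation) that evaluates the F-score for a given precision, recall and β:

def f_score(precision, recall, beta=1.0):
    # F-Score_E(F) from Eq. (1); returns 0 when both inputs are 0
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(f_score(0.8, 0.6))  # 0.6857...

With β = 1 the measure reduces to the familiar harmonic mean of precision and recall; β > 1 weights recall more heavily.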

Fig. 1. Selecting database and table name



Table 2. Interactions between users and DQF

1 | Query form enrichment | 1) DQF recommends a ranked list of query form components to the user. 2) The user selects the desired form components into the current query form.
2 | Query execution | 1) The user fills out the current query form and submits a query. 2) DQF executes the query and shows the results. 3) The user provides feedback about the query results.

4.1 Pattern-Based Correction

The pattern approach builds a reachability index for checking the validity of a query. An n × n matrix is built, where entry (i, j) is true if element name 'i' occurs at least once as an n-th level descendant of element name 'j' in any data item in the database, where 'n' is an upper bound set during database setup. For instance, an element "employee" could contain attributes "name" and "country" in its data, while "designation" (an employee attribute) and "building type" (an office building's attribute) will not have any common first or second level ancestors in a staff database. A query form corresponds to a SQL query template. A query form F is defined as a tuple (BF, TF, σF, ⋈(TF)), which represents a database query template as follows: F = (SELECT B1, B2, …, Bk FROM ⋈(TF) WHERE σF), where BF = {B1, B2, …, Bk} is a set of k attributes for projection, k > 0. TF = {T1, T2, …, Tn} is the set of n relations (or entities) involved in this query, n > 0. Each attribute in BF belongs to one relation in TF. σF is a combination of expressions for selections on relations in TF. ⋈(TF) is a join function that generates a combination of expressions for joining the relations of TF. In the UI of a query form F, BF is the set of columns of the result table, and σF is the set of input fields for users to fill. Query forms enable users to fill in parameters to produce different queries. For a query form F, ⋈(TF) is automatically constructed from the foreign keys among the relations in TF. Meanwhile, TF is determined by BF and σF: TF is the union set of relations which contain at least one attribute of BF or σF. Thus, the design of a query form F is essentially determined by BF and σF. As mentioned, only BF and σF are visible to the user in the UI.
Construction of queries for the preferred attributes:
1. Select the attributes from the table name
2. String s = "select "
3. String[] segment
4. for (i = 0; i < segment.length; i++)
5.   segment[i] = column
6. String columns = join(segment, ",")
7. String tableName
8. String query = s + columns + " from " + tableName
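A minimal Python sketch of these steps (the names build_select, columns and table_name are our own illustration; the actual system is implemented in Java):

def build_select(columns, table_name):
    # join the projection attributes with commas (steps 4-6)
    column_list = ", ".join(columns)
    # concatenate the SELECT keyword, columns and table name (step 8)
    return "select " + column_list + " from " + table_name

print(build_select(["name", "country"], "employee"))
# select name, country from employee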
The recommendation feature provides interactive guidance over longer learning and training periods. Its aim is summative and collective learner assessment: diagnosis-based, i.e., personalized according to individual student weaknesses derived from the performance model, and aggregated, i.e., based on weighted and ranked learner assessments considering the performance data from the student model.
Algorithm 2: One-Query construction
Input: I = {I1, I2, …} is the set of past queries executed on Fi.
Output: Ione is the One-Query.
begin
  σone ← ∅
  foreach I ∈ I do
    σone ← σone ∨ σI
  Aone ← AFi ∪ Aσ(Fi)
  Ione ← GenerateQuery(Aone, σone)
end
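As a sketch of what Algorithm 2 computes, the following Python fragment (our own illustration; it assumes each past query's selection condition is available as a SQL string) forms the disjunction σone of all past conditions and embeds it in a single one-query:

def build_one_query(projection_attrs, past_conditions, table_expr):
    # sigma_one: OR of all past selection conditions
    sigma_one = " OR ".join("(%s)" % c for c in past_conditions)
    return "select %s from %s where %s" % (
        ", ".join(projection_attrs), table_expr, sigma_one)

print(build_one_query(["name"], ["salary > 50000", "country = 'Oman'"], "employee"))
# select name from employee where (salary > 50000) OR (country = 'Oman')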
Evaluation
The objective of our evaluation is to verify the following hypotheses:
H1: Is DQF more useful than existing approaches, for example, customized query forms and static query forms?
H2: Is DQF more effective at ranking projection and selection components than the baseline and the random strategy?
H3: Is DQF effective at ranking the related query form components in an online UI?
H4: Is DQF more practical for making a good choice among the resulting attributes?

Fig. 2. Selecting query condition

Writing well-formed queries in languages such as SQL and XQuery can be difficult due to diverse interpretations, the user's lack of familiarity with the query language and the user's ignorance of the underlying schema (Fig. 2). The system has been implemented as a standalone application using java-5.1.12. The system uses SQL Server as the database engine. All the experiments

are run utilizing machine with Intel(R) Center (1M) i3-4005U CPU @ 1.70 GHz,
(Approx.) and running on Windows XP. Representative informational collection is
utilized for trial. Representative Dataset comprises of 4 databases, 15 tables, 25
characteristics, and close around 500 records from every one of the tables. In result
examination the F-score is determined with the assistance of accuracy and review. F-
score is utilized to quantify the integrity of inquiry shapes.

Fig. 3. F-score result

Note that the user's feedback for a specific form is likewise given by clicking on the form name shown in Fig. 3. The F-score result demonstrates that one can edit the form again and also see the outcome whenever required.

5 Conclusion

The dynamic query form approach helps users easily create query forms. The key idea is to use a probabilistic model to rank form components based on user preferences. Using both historical queries and run-time feedback, such as click-through, user preferences can be captured. Experimental results show that the dynamic approach often leads to a higher success rate and simpler query forms compared with a static approach. The ranking of form components likewise makes it simpler for users to customize query forms. This relational data is combined with non-relational data for better results and performance.

References
1. Balazinska, M., Khoussainova, N., Gatterbauer, W., Kwon, Y., Suciu, D.: A case for a
collaborative query management system. In: Proceedings of CIDR (2009)
2. Ahamed, B., Ramkumar, T.: An intelligent web search framework for performing efficient
retrieval of data. Comput. Electr. Eng. 56, 289–299 (2016)
3. Jayapandian, M., Jagadish, H.V.: Automating the design and construction of query forms.
IEEE TKDE 21(10), 1389–1402 (2009)
4. Ahamed, B., Ramkumar, B.: Predict keyword based search process using semantic method. Int. J. Control Theory Appl. 10(16) (2017)
5. Roy, S.B., Wang, H., Nambiar, U., Das, G., Mohania, M.K.: Dynacet: building dynamic
faceted search systems over databases. In: Proceedings of ICDE, pp. 1463–1466 (2009)
6. Seffah, A., Donyaee, M., Kline, R.B., Padda, H.K.: Usability measurement and metrics: a
consolidated model. Softw. Qual. J. 14(2), 159–178 (2006)
7. Ahamed, B., Ramkumar, B.: Deduce user search progression with feedback session. Adv. Syst. Sci. Appl. 15(4) (2015)
8. Chen, K., Chen, H., Conway, N., Hellerstein, J.M., Parikh, T.S.: Usher: Improving data
quality with dynamic forms. In: Proceedings of ICDE Conference, pp. 321–332, Long
Beach, March 2010
9. Tang, L., Li, T., Jiang, Y., Chen, Z.: Dynamic query forms for database queries. IEEE Trans. Knowl. Data Eng. 26, 2166–2178 (2013)
10. Tang, L., Li, T., Chen, Z.: Dynamic query forms for database queries. IEEE Trans. Knowl.
Data Eng. 26(9) (2014)
11. Chen, K., Chen, H., Conway, N., Hellerstein, J.M., Parikh, T.S.: Usher: improving data quality with dynamic forms. In: Proceedings of ICDE Conference, pp. 321–332, Long Beach, March 2010
12. Naeem, M., et al.: Trends and future perspective challenges in big data. In: Pan, J.S., Balas,
V.E., Chen, C.M. (eds.) Advances in Intelligent Data Analysis and Applications. SIST, vol.
253, pp. 309–325. Springer, Singapore (2022). https://doi.org/10.1007/978-981-16-5036-9_
30
13. Rivera Rios, E.J., Medina-Pérez, M.A., Lazo-Cortés, M.S., Monroy, R.: Learning-based
dissimilarity for clustering categorical data. Appl. Sci. 11(8), 3509 (2021)
14. Ahamed, B.B., Ramkumar, T., Hariharan, S.: Data integration progression in large data
source using mapping affinity. In: 2014 7th International Conference on Advanced Software
Engineering and Its Applications, pp. 16–21. IEEE, December 2014
15. Zhang, T., Ramakrishnan, R., Livny, M.: Birch: an efficient data clustering method for very
large databases. ACM Sigmod Rec. 25(2), 103–114 (1996)
16. Nandi, A., Jagadish, H.V.: Assisted querying using instant-response interfaces. In:
Proceedings of the 2007 ACM SIGMOD International Conference on Management of
Data, pp. 1156–1158, June 2007
17. Ahamed, B.B., Ramkumar, T.: Uncertainty relations system in semantic web search engine.
Int. J. Appl. Eng. Res. 10(20), 15456–15459 (2015)
18. Vasant, P., Zelinka, I., Weber, G.W. (eds.): Intelligent Computing & Optimization, vol. 866. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00979-3
Enhancing Student Learning Productivity
with Gamification-Based E-learning Platform:
Empirical Study and Best Practices

Danijel Kučak, Adriana Biuk, and Leo Mršić(&)

Algebra University College, Ilica 242, 10000 Zagreb, Croatia


{danijel.kucak,adriana.biuk,leo.mrsic}@algebra.hr

Abstract. Gamification in education includes the use of game mechanics, dynamics, aesthetics and game-like thinking in the educational process. Its main objective is to increase student motivation, engagement and overall experience. The topic of this research is to present the possibilities of applying gamification by implementing a personalized gamified e-learning platform that supports selected courses as a supporting tool for a higher education institution. The hypothesis of this research is that a web platform specified in this way is able to increase the motivation for greater engagement in learning, and thus increase the understanding of the educational material. The paper consists of an introductory part that deals with the description of gamification as well as e-learning and the advantages of using it. The introduction is followed by an analysis of the survey conducted, which examines the attitudes and opinions of users before using the developed and implemented e-learning platform. The survey was conducted on a sample of 50 students. Selected students used the platform through two semesters, in a total of two different courses (Object Oriented Programming and Entrepreneurship). After completing the courses, students were re-surveyed, and the acquired results show that motivation increased using the proposed approach.

Keywords: Gamification · e-learning · Personalized learning · e-learning platforms · HEI

1 Introduction

Gamification can be defined as the use of mechanics, aesthetics, and thinking used in
games to engage people, motivate action, and solve problems. It’s a relatively new
term, but the very idea of using elements of the game as well as in-game mechanics to
solve problems and engage the audience is not entirely new. Properly implemented,
gamification helps maintain our interest through intrinsic player motivation that
increases with the mechanics and rewards that attract them. A simple idea like a game
can represent some of life’s strongest memories. In other words, turning an experience
into a game – using some kind of reward for a certain achievement [1, 4].
Various organizations have used elements of games such as points, scoreboards and
badges, to use them to motivate people in various ways. In previous years, this
engagement was limited to the physical world, while in recent years the focus has been


on the digital environment. What’s new about gamification is that using a digital
engagement model, motivation can be packaged into an app or device, designed to
engage audiences of any size at very little cost. Raising people’s motivation for
physical activity is one of the areas where it is possible to use elements of games for
this purpose. Lack of physical activity is an increasing challenge that puts millions of
people at risk, but not because of a lack of information or knowledge. Many people know that they should exercise and what its benefits are, but unfortunately such knowledge is not reflected in their behaviour. Encouraging sports is one approach to promoting physical activity [5].

2 Literature Review

Gamification uses primarily intrinsic rather than extrinsic rewards. The difference
between intrinsic and extrinsic rewards is one way we can distinguish gamification
from reward programs. Intrinsic rewards maintain engagement because they deal with
people on an emotional level. Extrinsic rewards can be used to motivate people, but
that motivation happens at the transactional level. It is possible to distinguish three
elements of motivation (autonomy, mastery and purpose) through the lens of gamifi-
cation [2, 15]: (i) Autonomy – the desire to direct one's own life. In effective gamified
solutions, players choose to participate, and once they make their decisions, they make
decisions about how they will progress through the challenges to achieve their goals.
Players are enabled to discover and learn through a variety of pathways to solutions.
Players are given tools, goals and rules and space to ‘play’ without being directed to the
next steps to be taken; (ii) Mastery – the desire to progress and become better at
something is important. We all have a deep need to improve in aspects of our lives.
Mastery is not an accessible goal; it is a journey. There are many signposts on the road
that show progress, but it is never the end point. In almost all life endeavors – running,
painting or learning a new language – there is always another level. Gamification is
about achieving something better; (iii) Purpose – longing to act. By definition, gamified
solutions differ from traditional games by purpose. Gamification is focused on one or
more goals: changing behaviour, developing skills, or encouraging innovation. Gam-
ification must begin and end with a purpose aimed at achieving significant player goals
[10–12].

3 Methodology

E-learning can be considered something learned from a YouTube video, an answer to a


question found on Wikipedia or on a forum. Every time we learn something from
an electronic source is considered e-learning. There are three very important concepts
of e-learning and they are: (i) enabling technology, (ii) learning content and
(iii) learning design. People usually focus on the first, on technology, because it is a
new and perhaps unknown component, but the other two are equally important. The
concept of learning is the process of transferring knowledge from point A to point B.
Effective e-learning is not just about merging technology and content. While this is a good formula for providing information online, learning is much more than that [6].
Perhaps the most important thing to keep in mind is that e-learning is more than just
another method or technique, like distance learning. It is an approach – an aggregation
of different methods that technologies enable. Organizations have looked to e-learning
to save training time and travel costs that are otherwise associated with face-to-face
learning. However, cost savings are only an illusion if e-learning does not build
knowledge and skills effectively [13]. Part of it depends on the quality content that is
embedded within the e-learning. Below are five unique features that encourage learning
through an e-learning app [7].
Tailored training: independent learning on an e-learning platform has the potential
to tailor learning to the unique needs of each student. These unique needs do not only
mean learning styles, but also tailoring content, teaching methods and navigation based
on the needs of individual students.
Engagement in learning: regardless of the medium delivered, learning requires
engagement. By behavioural engagement we mean any action that a student takes while
learning on an application. Some examples of behaviour in e-learning include pressing
the forward arrow, typing the answer in the answer box, clicking an option from the
menu, and the like. By psychological engagement we mean cognitive processing of
content in a way that leads to the acquisition of new knowledge and skills. Some
cognitive processes that lead to learning include paying attention to relevant material,
mental organization, and integration with appropriate prior knowledge. Some examples
of methods in e-learning that aim to enhance psychological engagement include adding
relevant visuals and on-screen displays.
Multimedia: in e-learning we can use a combination of text, sound as well as
various visuals to create content and help students gain adequate knowledge and skills.
The effect of expertise through scenarios: studies by experts from a wide range of
domains prove that it takes ten years of experience to achieve a high level of expertise.
In some work settings, this can take years because situations that require certain skills
are rarely presented. E-learning provides an opportunity to put students in a realistic
work environment that requires them to solve fewer common problems or complete a
task in minutes, which could take hours or even days in the real world. A computer
simulation, for example, can show such problems and give students the opportunity to
solve them in a real work environment.
Learning through digital play: an approach known as gamification. These are
simultaneous rules-based systems, responsive to players, challenging, cumulative,
allowing progress to be assessed against goals and the like. The goal of gamification is
to provide a learning experience that is motivating, interesting, and effective.
E-learning can be divided into three main types. All three types are based on the use
of instructors, course time, and involvement with others. The selection of the appro-
priate type considers the prior knowledge of the students, the speed of learning as well
as the available time and geographical separation.
These three main types are: (i) synchronous learning, (ii) asynchronous learning, and (iii) cohort learning [8].
Synchronous learning occurs when the instructor and the student are together at the
same time, but not necessarily in the same space. Traditional classrooms are a typical

example of synchronous learning – students meet at a given time, having a conversation and learning together. Synchronous e-learning uses a similar approach. During a given period,
the instructor and one or more students participate through a platform such as
GoToMeeting. This format can be called a webcast, webinar or virtual classroom.
Asynchronous learning is considered the opposite of synchronous e-learning in that the student sets their own learning rhythm; it happens when the instructor and the student do not participate at the same time. In terms of traditional education, homework can be considered asynchronous learning. Students are given
a specific activity that they can solve at any time.
Cohort learning has instructors and students who perform activities such as reading,
projects and assignments. The start and end times are set, but within that time frame,
students learn and communicate at their own pace. For example, in a synchronous
webinar, all participants hear at the same time, let’s say it’s at 2 pm and participate in
the presentation until it’s done by 4 pm. In the cohort model, students register at the
beginning of the week and can then read materials, determine activities, or talk to other
participants at any time during the week [3, 14].
Today’s students want relevant, mobile, independent and customized content. This
need developed into the idea of e-learning (Table 1).

Table 1. Advantages of e-learning.


E-learning advantages
Online learning adapts to all needs
Lessons can be passed countless times
Offers access to updated content
Fast delivery of lessons
Scalability
Consistency
Reduced costs
Efficiency
Environmental impact

And some of these advantages are the following [8]:


Online learning adapts to all needs: online learning is the most suitable for
everyone. The digital revolution has led to significant changes in access to, con-
sumption, discussion and content sharing. Online educational courses can be taken by
people of all profiles, at a time that suits them.

Lessons can be passed countless times: unlike classroom teaching, with online
learning we can access content an unlimited number of times. This is especially nec-
essary at the time of revision when preparing for the exam. In the traditional form of
learning, if you cannot attend a lecture, you have to prepare the topic on your own.
Offers access to updated content: The main advantage of learning online is that it
ensures that you are in sync with the content. This allows the student to access updated
content.
Fast delivery of lessons: e-learning is a way to provide fast delivery of lessons.
Compared to the traditional classroom teaching method, this way of working has
relatively fast delivery cycles, and indicates that the learning time in this way is reduced
to 25%–60%. The reasons why learning time is reduced by e-learning are as follows:
(i) lessons start quickly and are in one learning session. This allows for training
programs within weeks, or even days; (ii) students can define their own learning speed
instead of following the speed of the whole group; (iii) it saves time because the student
does not have to travel to the place of study; (iv) students can choose to study specific
and relevant areas that interest them, without focusing on each area.
Scalability: e-learning helps to create and communicate new trainings, concepts
and ideas. Whether it’s formal education or entertainment, e-learning is a quick way to
learn.
Consistency: e-learning allows teachers to gain a greater degree of coverage and to
communicate consistently for their target audience. This ensures that all students
receive the same type of training with this way of learning.
Reduced costs: e-learning is cost-effective compared to traditional forms of
learning. The reason for the reduced prices is that learning in this way is quick and
easy. A lot of training time is reduced due to instructors, travel, course materials and
the like. This cost-effectiveness also helps improve an organization’s profitability.
Efficiency: e-learning has a positive impact on the profitability of the organization.
It makes it easier to understand the content.
Environmental impact: since e-learning does not use paper-based learning, it
protects the environment to a great extent. According to some research, distance
learning programs consume about 90% less power and achieve 85% less CO2 emis-
sions compared to traditional learning. Applications that use e-learning can come in a
variety of forms and formats. But there are a certain number of elements that are
common in most applications. Understanding such elements will help in planning and
analysing the application [8].
Interface: The interface is the visual frame for each screen. This includes the brand
identity, titles, buttons, features and navigation used while using the application.
Text: text can be the primary way to communicate content, or it can serve as support for audio narration.
Navigation: course navigation allows the student to navigate through the appli-
cation. Navigation buttons, such as arrows, hyperlinks, and menus, guide students
through the course. Navigation can be fixed (where the student must continue linearly
from start to finish) or flexible (where the student can choose where he wants to move).
Interaction: We consider interaction to be any action that requires a student to
respond in some way. An example of this might be where a student clicks to be shown

additional information, a question to be answered, or the like. Interactions help


strengthen key points and keep the student interested and engaged.
Testing: test questions can be in various formats such as multiple choice, correct/incorrect, fill in the blank, essay and the like. Some of these formats (such as
multiple-choice) can be graded directly within the application, while others, such as an
essay, cannot. The test can be used at the beginning of the course, at the end of the
course, at the end of the individual module or it can be dispersed during the course.
Media: technically, an application that uses e-learning can only consist of text on
the screen. However, a slightly more interesting application would be if a number of
media elements such as sound, video, graphics and animation were used.

4 Results and Discussion

4.1 Application and UX


The JavaScript framework Angular 6 and the library NgRx were used for the client part
of the system (frontend). Their functionality as well as examples of use from the
created application are visible below.
Google’s Firebase solution was used for the server part of the application. It allows
us, among other things, to store and retrieve data as well as authorize users.
The application is designed in a way that follows the courses taught at the
University of Algebra. Currently, the courses object-oriented programming and
entrepreneurship have been implemented. The gamification elements used in the
development of this application are the collection of points, display of the best results
and return results, which were also among the 4 most common elements of gamifi-
cation that students prefer. The student has the opportunity to review the course by
chapters that are performed in the curriculum as well as the opportunity to take the test.
In this part, e-learning is manifested, i.e. learning the content of the course, where
through feedback we get answers to the questions of the test with an explanation.
Going to the application called EduApp, we find the home screen which offers the
option Login and Sign Up.
The Find courses button is disabled until the user registers with the application or logs in.
Before using the application, it is necessary to enter an email and password in order
to be authorized to use the application. To do this, go to the Sign-Up link where we
enter the required data.
After successful registration, we can use the same information with each login
when using the application.
After successful registration, the site automatically redirects you to courses. Here is
a list of available items.
By selecting one of the subjects, let’s say we have selected Object-Oriented Pro-
gramming, we come to the URL /course/OOP.
The next screen shows the chapters taught in that course. Each chapter on the left has a
list of the most important terms mentioned in it, and on the right is a short description
of the title and what awaits us by selecting each chapter. By selecting some of the

above chapters, we come to a quiz that has two possible forms of questions – filling in the answer and selecting among the options offered.
At the end of the test, we click on the Submit Test button located at the bottom of the page, which leads us to the points earned and the test results.
The results page shows the questions we answered, the correct answer and the
user’s answer, which, if not correct, is printed in red text. The points won and the
explanation of the correct answer in the blue box are also displayed.
Each correctly answered question brings a certain number of points defined in the
database, which range from 1 to 15 depending on the difficulty of the question. An
incorrectly answered question yields zero points.
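The scoring rule just described is simple enough to capture in a few lines; the following Python sketch is our own illustration (the platform itself stores per-question points in Firebase and the client is written in Angular):

def test_score(questions, answers):
    # questions: list of (correct_answer, points) pairs, points in 1..15
    # answers: the user's answers, in the same order; wrong answers yield 0
    return sum(points
               for (correct, points), given in zip(questions, answers)
               if given == correct)

def leaderboard(total_points_by_user):
    # users sorted by total points, best first
    return sorted(total_points_by_user.items(), key=lambda kv: kv[1], reverse=True)

print(test_score([("B", 5), ("A", 15)], ["B", "C"]))   # 5
print(leaderboard({"ana": 120, "ivan": 95, "lea": 200}))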
In the navigation menu we have a link for Previous Tests.
It shows which tests the user took with the exact date and time and how many
points he won in that attempt.
Also, in the navigation menu we have a link to the Leader board which is the panel
with the best results. Here are all registered users of the application sorted by points,
from the most collected to the least.
The high-score board or leaderboard allows users to compare themselves with other participants. The board with the best results is a recognized form of competition where the
goal is to continuously improve in order to reach the top of the list.

4.2 Entry Survey


There are a large number of gamified elements, and the focus here is to find those that
will encourage a positive effect on the e-learning application. In order to investigate the
perspective of users, i.e. their needs and attitudes, and whether there is a need for such
an application at all, a survey was used. The survey began with a brief introduction to
the app to help users with more understanding fill in the required answers.
The survey consists of six questions:
1. What gender are you?
2. Level of study?
3. Are you familiar with the term gamification?
4. Have you ever used sites like HackerRank, Pluralsight, Udemy and the like? If so,
did elements like scoring, tracking progress, and the like have had any impact?
5. What elements of gamification do you prefer?
6. Would you be more productive if your college had a gamified learning app?
The first and second questions are not mandatory; they are there to probe the potential link suggested by the common opinion that younger men play more games and thus find the idea of gamification more attractive. Possible answers to the question
about the level of study are ‘Undergraduate’ or ‘Graduate study’.
The third question has possible answers ‘Yes’ or ‘No’. It tells us how familiar
students are with the term gamification, and the results will greatly help, if the answer is
‘Yes’, in navigating the application itself.
The fourth question concerns whether large applications, such as those listed above,
had any impact on the user himself. The assumption is that just such large applications

applied gamification extremely well. This question is open-ended, so the user can write a short answer.
The fifth question tells us which elements of gamification users like to see and use
the most. Possible answers are: ‘progress to a higher level’, ‘points’, real-time feed-
back’, ‘progress bar’, ‘badges’,’ scoreboard ‘,’ avatar ‘,’ time pressure ‘,’ score ‘,
‘being part of a story/narrative way’. And as the last sixth question, you need to answer
‘Yes’, ‘No’ or ‘Maybe’.
The research was conducted in the period September 2019 – September 2020 among students, 54 in total. The results of the research are as follows (*HackerRank, Pluralsight, Udemy or similar) (Table 2):

Table 2. Entry survey results


Question | Answers
Gender? | Male: 62% | Female: 38% | No answer: 0%
Graduate level? | Undergraduate: 17% | Graduate: 83% | No answer: 0%
Have you ever used EDU platforms*? | Yes: 82% | No: 18% | No answer: 0%
Do you believe that gamification will improve your learning productivity? | Yes: 82% | No: 0% | Maybe: 8%

4.3 Exit Survey


After the application was fully developed, an exit survey was conducted to tell us about the success of the implemented solution. The questions asked were:
1. Do you agree that the offered web application can increase productivity?
2. Do you agree that gamified elements have increased the overall satisfaction of using
the app?
3. Would you recommend this application to colleagues?
4. Do you think that the concepts taught in schools are better understood in this way?
5. Do you think that learning through such an application has given you under-
standable feedback?
The exit survey that the students filled out after using the application was structured so that the possible answers were 'Yes' or 'No'. Informed by the experience of the previous survey, where an open answer field was largely filled in with just 'Yes' or 'No', this type of exit survey with only two possible answers was chosen, making the views of the users very easy to see.
This survey was also conducted in September, among a total of 43 fellow students.

Table 3. Exit survey results


Question                                                                        Yes   No
Do you agree that the web application offered can increase productivity?       79%   21%
Do you agree that gamified elements have increased the overall
satisfaction of using the app?                                                  77%   23%
Would you recommend the app to colleagues?                                      81%   19%
Do you think that the concepts taught in the classroom are better
understood in this way?                                                         77%   23%
Do you think that learning through such an application gave
understandable feedback?                                                        79%   21%

The results of the survey show that participants are generally satisfied with the application. 79.1% answered that the offered web application can increase their productivity. When asked if they agreed that the gamified elements increased the overall satisfaction of using the application, 76.7% answered positively. This indicates that the gamified elements were combined meaningfully and that they are what increase productivity and motivation when learning for a particular course. 81.4% of students would recommend this app to colleagues. Through the created table with the best results, students can compete with each other in gaining points. 76.7% of students believe that the concepts taught at universities are better understood in this way. This method is well suited for testing knowledge after attending a course, and even before, so that students can prepare for class in time. 79.1% of students believe that learning has provided them with understandable feedback. This part is very important because the feedback within the application serves the learning, i.e. e-learning, purpose (Table 3).

5 Conclusion

This paper deals with the topics of gamification and e-learning. Motivation is examined, i.e. what encourages people to perform a certain activity. It stems from intrinsic feelings that lead a person to act for fun or challenge instead of external pressures or rewards. E-learning fits in here as a way of learning using electronic media in which, by applying gamification, the learning process can be facilitated.
Engagement, involvement, and motivation may not have been achieved so much with gamification (although the survey results suggest the opposite) but with the enthusiasm and fun that students had as they tested the application. In doing so, it is important to keep in mind that all respondents are IT literate. Many of them are familiar with the concept of gamified e-learning, and many use it regularly for learning. The application is designed to be used as a supplement to learning, which results in a major limitation of this solution: student obligations. The use of such an application is voluntary. Individual tasks and obligations at the level of the study program are not included in the application.
We believe that this approach represents the future of education and learning in general. Students will have more opportunities and personalized content, and there will be less pressure on teachers. The created application is easily upgradable should the need arise. With minor modifications, this application could also provide, for example, the necessary training for employees within a company.

Development of Distributed Data Acquisition System

Bertram Losper1,2, Vipin Balyan1,2, and B. Groenewald1,2

1 Department of Electrical, Electronics and Computer Engineering, Cape Peninsula University of Technology, Cape Town, South Africa
2 iThemba LABS, PO Box 722, Somerset West 7129, South Africa

Abstract. This paper discusses the evaluation of using streaming technologies


as an external memory buffer for use in Physics data acquisition systems. It will
show the development of a prototype distributed readout software system for an
experimental facility at iThemba LABS. This work builds on an ongoing project
known as the Dolosse Data Acquisition System and will focus on the devel-
opment of a new distributed data acquisition readout software on the readout
computer that is based on the project’s architecture. A new fragmented readout
method is proposed to allow efficient data transmission using the open-source
stream-processing software platform that can be used as an external memory
buffer while still maintaining the real-time requirements of the system.

Keywords: Python · Confluent · Kafka · DAQ · VME · Dolosse · FPGA

1 Introduction

This paper will discuss the development of a distributed software system for a Data
Acquisition System (DAQ) for use in High Energy Physics (HEP) experiments at
iThemba Laboratory for Accelerator-Based Sciences (iThemba LABS) to capture
experimental data and archive it for later analysis. The new prototype DAQ uses open-
source streaming technologies that run on a high-performance computing cluster to
provide a reliable and robust data readout path. With the latest developments in
streaming platforms like Kafka, the throughput of data processing has increased and it
has become well suited to use with HEP experiments.
The K600 Spectrometer facility is an experimental facility used to measure inelastically scattered particles and reactions at extreme forward angles, including zero degrees [2]. The K600 facility's DAQ is a Versa Module Eurocard (VME) [3] based data acquisition system.
As a result, given the need for an extensible, scalable and reliable DAQ that still maintains the stringent data readout requirements needed to ensure accurate computational analysis of nuclear reactions, an investigation was launched into which of the latest available technologies could be used to develop such a system. The real-time DAQ presented in this paper is introduced to address the need for these improvements. A new readout method is introduced that consists of a run-time process running on the DAQ's Single Board Computer (SBC), which moves the data from the detector electronics captured by the readout cards (e.g. ADC) into the memory of the SBC. This data is produced using a streaming platform to downstream processes in the readout chain (e.g. event processing and data archiving).

2 Data Acquisition and Communication System Design

The new software system for the spectrometer is based on the Dolosse DAQ archi-
tecture as seen in Fig. 1.

2.1 System Design


The readout chain consists of an existing data source which is the current VME
frontend electronics for the data acquisition system, a Kafka computer cluster, and
downstream data consumers as seen in Fig. 1.

Fig. 1. Dolosse high level architecture [4]

Dolosse DAQ is a distributed DAQ architecture whose main objective is to create a


new DAQ using open, supported software solutions as a framework [4]. The Dolosse
project is an Open Source project aimed at the development of a physics data acqui-
sition and management system using the Open Source Apache Kafka framework [5]. It
is the brainchild of Dr. Stanley Paulauskas from Project Science who subsequently
collaborated with iThemba LABS and some universities to develop it.

2.2 DAQ Design


The overall functional block diagram of the prototype DAQ for the Spectrometer
facility is shown in Fig. 2.

Fig. 2. High level architecture of the Dolosse DAQ for the K600 Spectrometer

The DAQ can be divided into two specialized groups:

i. Digitiser Devices – the frontend VME electronics; these devices provide services such as digitization logic. They are VMEbus slave devices and perform their tasks according to the orchestration of the control and management messages passed to them.
ii. System Computers – these devices perform user services, such as the control and monitoring of the instrumentation, providing the external memory buffer, event building, real-time data analysis, and the storage of analyzed data. All systems, which include the ReadOut Computer (ROC) and the Linux workstations, are implemented on the same network.
The different systems that make up the DAQ communicate using instructions structured in a straightforward manner using JavaScript Object Notation (JSON), as seen in Fig. 3. The data contained in these messages can be non-formatted (raw) transducer data, feedback, control, or configuration information passed between the different subsystems using Kafka topics, as seen in Fig. 3.
The message consists of the following fields (a sketch of such a message follows the list):
• Category – describes the type of message being produced
• Technique – describes the type of system being communicated with
• Run ID (optional) – the current run number being executed
• Data – the information of interest being transmitted; this data can be of type list or dictionary.
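As an illustration only, a message with these fields might look as follows; the concrete values, key casing and topic layout are assumptions for the sketch, not taken from the project source.

```python
import json

# Hypothetical Dolosse-style message with the four fields described above.
# All values are illustrative assumptions, not taken from the project source.
message = {
    "category": "control",         # type of message being produced
    "technique": "vme",            # type of system being communicated with
    "run_id": 42,                  # optional: current run number
    "data": {"command": "start"},  # information of interest (list or dict)
}

payload = json.dumps(message)      # serialized string sent over a Kafka topic
print(payload)
```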

Fig. 3. Dolosse VME communication message

The communication message topics can be divided into four types:


i. Management topics
The acquisition software on the ROC is written in such a way that it is able to receive configuration data by polling the Kafka topic used to send configuration data. The management topic is used to transfer the configuration data that sets up the different modules used in the VME crate.
ii. Feedback topics
Feedback topics are used to convey status and error information regarding the Frontend Electronics. This data message contains the status of the experimental run (start, stop or pause), warnings, and any errors that might occur during a run, such as event misalignment.
iii. Data and event topics
Equipment (data or event) topics are used to send digitised transducer data to the external memory buffer, which is a topic in the Kafka cluster. This external memory buffer contains the data from the different modules in raw binary format (fragments), as described in [9–11].
iv. Control topics
The control topics are used to transmit the run control of an experiment. Data is read from the control topics by the ROC, which then reacts to the command specified.
The implementation of the topic structure for the spectrometer is shown in Fig. 4. The ROC is a data source, i.e. it produces information of interest; it subscribes to the management and control topics and provides the equipment and feedback topics, to which the Linux workstation and analysis computer subscribe.

Fig. 4. Message interface for the new DAQ

3 Software System Design

The K600 implementation of the DAQ comprises a single 9U VME crate that houses a single board computer running the Linux Operating System (OS), acting as the Read Out Master (ROM). The ROM contains all the software and libraries needed to run the Dolosse readout software, communicate with the Kafka cluster and transfer data from the VMEbus modules to the external memory buffer. For the prototype software system, a single-producer-to-multiple-topics design was selected for readout purposes because it is much faster when used across multiple threads compared to using multiple producer instances, as described in [8].
To enable the ROM to communicate with the Kafka cluster, VME producer and VME consumer classes were developed using librdkafka, as described in [12]. This software interface is developed in C/C++ and is used to interface with the Kafka cluster, as shown in Fig. 5.

Fig. 5. Architecture of VME Kafka manager

Since all communication in Dolosse uses JSON messages, a JSON parser was developed using the JSONCPP library together with the VME consumer class, so that the readout software application is able to consume and extract the data intended for it.

Using the C++ consumer and producer classes with the external memory buffer, the readout software for the prototype DAQ was developed to be flexible and able to expand with the arrival of new technologies or new requirements.

3.1 DAQ Configuration

Fig. 6. Configuration and handshaking of VME DAQ

The new DAQ is configured by transmitting configuration data on the management topic, from the user's workstation through the Kafka messaging system to the ROC. Figure 6 shows the startup handshaking between the data acquisition system and the backend server system, in which configuration information is sent to the DAQ. A custom JSON data structure is used to hold the configuration information sent by the backend server system to Kafka, which is consumed by the readout computer. When the system has been configured successfully, it returns a simple JSON string stating that the system configuration was successful and the DAQ is in a ready state. In this state, the DAQ is ready to receive control information to start acquiring data. A sketch of this handshake is given below.
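The following is a minimal sketch of what this handshake could look like using the Python confluent-kafka client; the topic names ('management', 'feedback'), the configuration schema and the feedback payload are assumptions, and the paper's actual implementation uses librdkafka in C/C++ on the ROC.

```python
import json
from confluent_kafka import Consumer, Producer

# Hypothetical topic names; the paper does not specify them.
MANAGEMENT_TOPIC = "management"
FEEDBACK_TOPIC = "feedback"

consumer = Consumer({"bootstrap.servers": "localhost:9092",
                     "group.id": "roc", "auto.offset.reset": "earliest"})
producer = Producer({"bootstrap.servers": "localhost:9092"})
consumer.subscribe([MANAGEMENT_TOPIC])

while True:
    msg = consumer.poll(timeout=0.5)       # poll the management topic
    if msg is None or msg.error():
        continue
    config = json.loads(msg.value())       # custom JSON configuration message
    if config.get("category") == "management":
        # ... apply module settings from config["data"] to the VME crate ...
        reply = {"category": "feedback", "technique": "vme",
                 "data": {"status": "ready"}}   # assumed feedback payload
        producer.produce(FEEDBACK_TOPIC, json.dumps(reply))
        producer.flush()
        break  # configured; the DAQ now waits for run control commands
```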

3.2 Readout Algorithm Design


For the new DAQ software to accommodate the external memory buffer, the software is designed to produce data as it is being read out from the signal digitizers, as seen in Fig. 7. Because of this fragmented data readout, the algorithm selected for readout is a modified and simplified version of the multi-sensor readout described in [1], as seen in Eq. 1. Using fragmented readout frees up processing resources on the ROM because no event formatting is done on the ROM.

Fig. 7. Readout algorithm diagram

The readout algorithm is described in Fig. 7 and was designed to operate as follows: S1 to S4 in the readout process represent the modules to be read out. The four readings to be processed are carried out sequentially in four phases: $F_1 = (s_1)$, $F_2 = (s_2)$, $F_3 = s_3 \cdot N$ and $F_4 = s_4 \cdot N$, where $x$ represents the total data readout, $F$ represents the function of reading and producing data from the ROC to Kafka, and $N$ is the number of modules to be read out. The final output is $x = \{F_1, F_2, F_3, F_4\}$, as described in Eq. (1).

$$F(x) = \{F_1, F_2, F_3, F_4\} \qquad (1)$$

The number of events to be read out from each module is specified when data transfer is initiated.
Upon completion of data transfer from the signal modules to the ROC, the data is parsed and validated by checking that no errors occurred during readout. After the data for each module has been validated, it is extracted from the readout buffer and produced from the ROC to the Kafka cluster, where the Kafka cluster functions as a First-In-First-Out (FIFO) external memory buffer for the event fragments using the data topics, as shown in Fig. 8. A sketch of this fragmented readout loop follows.
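As a minimal sketch, assuming a hypothetical read_module() helper that returns one module's validated raw fragment (the real readout runs over the VMEbus in C/C++), and an assumed topic name and keying scheme:

```python
from confluent_kafka import Producer

DATA_TOPIC = "equipment"  # hypothetical data/event topic name

def read_module(module_id: int) -> bytes:
    """Placeholder for the VMEbus readout of one module (QDC, TDC, ...).
    Returns the raw binary fragment after parsing and validation."""
    raise NotImplementedError

producer = Producer({"bootstrap.servers": "localhost:9092"})

def readout_cycle(module_ids):
    # Each module's fragment is produced individually (fragmented readout),
    # so no event formatting is performed on the ROM.
    for module_id in module_ids:
        fragment = read_module(module_id)
        producer.produce(DATA_TOPIC, value=fragment,
                         key=str(module_id))  # key identifies the source module
    producer.flush()  # Kafka now acts as the FIFO external memory buffer
```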

Fig. 8. Data sources producing to Kafka topic



3.3 Collated Event Builder Design


The event builder is a software module written in Python which, as the name suggests, builds events from the event fragments produced from the ROM. The newly developed event builder is designed to process either a single event readout from a module or buffered (multi-)events.
The event builder is the next essential step in the readout process because it is responsible for creating a presentable JSON event, as seen in Fig. 9.
Though the event fragments buffered in the Kafka topic are usable, it would be very difficult for the user to work with the data in this binary format. To assist the user with analysis, the event builder consumes the raw binary data from the cluster, extracts the event number from each data fragment and stores the fragment in a Python dictionary [6] with the event number as the key.

Fig. 9. Event Builder functional diagram

The data in the dictionary is used to compare the event numbers of the fragments from the different modules; when event numbers match, those fragments are collated and reformatted into a usable JSON string. This structured event is then produced back into the cluster and buffered on a different Kafka topic, where it can be accessed by any software module that consumes from that topic. A sketch of this collation step follows.
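A minimal sketch of the collation logic, assuming each fragment arrives already parsed into an event number, a module identifier and a payload; the fragment format, module count and message fields are assumptions for illustration:

```python
import json
from collections import defaultdict

N_MODULES = 2  # e.g. one QDC and one TDC

# events[event_number] maps module_id -> decoded fragment data
events = defaultdict(dict)

def on_fragment(event_number, module_id, payload):
    """Collate fragments by event number; emit a JSON event when complete."""
    events[event_number][module_id] = payload
    if len(events[event_number]) == N_MODULES:
        fragments = events.pop(event_number)  # all modules matched
        event = {"category": "data", "technique": "vme",
                 "data": {"event": event_number, "fragments": fragments}}
        return json.dumps(event)  # produced back to a different Kafka topic
    return None
```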

4 Experimental Results

4.1 DAQ Readout Results


To verify the operation of the new data acquisition system, the DAQ was set up with the ROC and one QDC and one TDC VMEbus module. The system was evaluated using two uncorrelated pulsers, as seen in Fig. 10; these were used to simulate detector output signals above threshold that generate triggers from the detector electronics. Fixed event sizes were read out from the VMEbus modules to test that the new readout software operates according to the functional requirements.
The system tests conducted measured the following metrics:

1. Readout integration tests using DMA direct transfer and Multi-event (Block) DMA
direct transfer.
2. Event builder evaluation.

Fig. 10. Block diagram of DAQ test setup

A series of system-level tests were performed that were repeated multiple times to
establish that the readout software system using Kafka meets the operational require-
ments. Preliminary readout metrics obtained are shown in Table 1.

Table 1. Statistical results achieved


Metric                                                 Single event readout   Multi-event readout
Wirechamber events (VME crate)                         52318                  24471
Total run length (s)                                   30                     14
Average QDC readout and data transmission time (µs)    29.5                   51.6
Average TDC readout and data transmission time (µs)    44.2                   144.4
Average readout time (µs)                              76.4                   198.6
Data Tx rate (kB/s)                                    132                    124

The results from these experimental runs, shown in Table 1, determined that Kafka messaging is more than suitable as a transport medium, because the readout rate of the current DAQ is 132 and 124 kB/s for single and multi-event readout respectively, using 1 QDC and 1 TDC. These results provide evidence that using Kafka as the external event memory is more than capable of handling the data rate of the ROC, because Kafka was benchmarked at 605 MB/s, as demonstrated in [7].

4.2 Event Builder Readout Results


To measure the performance of the event builder, the Python class threading.Timer was used to create a periodic time-out thread. When the timer expires, it calculates and sends the status and performance metrics to Kafka, where they can be used by the downstream data consumers. The event builder performance is measured in events per second collated and parsed in a run, which is given by the following formula:

$$x = \frac{1}{T}\left(\sum E_t - E_p\right) \qquad (2)$$

Here the number of events per period is the difference between the running total of events in the run, $\sum E_t$, and the event count in the previous period, $E_p$, divided by the period $T$ of the thread timeout. A minimal sketch of this periodic measurement follows.
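A minimal sketch of this periodic metric thread, assuming a shared running event counter and a period T in seconds; the Kafka destination is only indicated in a comment:

```python
import threading

T = 0.5            # timer period in seconds (assumed)
total_events = 0   # running total, incremented by the event builder
prev_count = 0     # event count at the previous timeout

def report_metrics():
    global prev_count
    rate = (total_events - prev_count) / T  # events per second, Eq. (2)
    prev_count = total_events
    # ... produce {"events_per_second": rate} to a Kafka status topic ...
    threading.Timer(T, report_metrics).start()  # re-arm the periodic timer

threading.Timer(T, report_metrics).start()
```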
Figure 11a and b show the distribution of the number of events per second/period
that was created with the event builder. The discrete nature of the histogram is due to
the way the metrics were collected when running the event builder application.

Fig. 11. Event builder results histogram for single and multi-event readout.

To verify the integrity of the collated events created with the event builder, a downstream consumer was created and used with a modified version of the K600RootAna analyser [13]. The analyser was used to consume the data from the cluster every 100 ms. Using the consumed data from the cluster, the following plots were created with the RootAna analyser, showing the raw ADC data from the QDC (see Fig. 12a) and the energy loss through paddle 1 vs paddle 2 of the focal plane detector (see Fig. 12b).

Fig. 12. Histograms visualized using the collated data

5 Conclusion and Discussions

A distributed readout software system (DRSS) was developed and implemented using the Dolosse DAQ architecture. The DRSS met the requirements of data readout and event building. Due to its distributed design, new modules can be added to meet new requirements of the readout software simply by adding a new consumer/producer to the readout pipeline. This new design of readout software provides a simple and cost-effective method, using open-source streaming technologies, to develop a data acquisition system for small experiments. The current version of the DRSS with event timestamps is being tested using two uncorrelated pulsers and is meeting the functional requirements.
Future work will evaluate the new readout system under beam-on conditions.

Acknowledgment. The authors would like to acknowledge the National Research Foundation
for their financial assistance towards this research. The authors would also like to thank and
acknowledge the Software Engineering Division at iThemba LABS for their contribution to the
work presented in this paper.

References
1. González, V., Barrientos, D., Blasco, J.M., Carrió, F., Egea, X., Sanchis, E.: Data
Acquisition in particle physics experiments. In: Data Acquisition Applications. IntechOpen
(2012). https://www.intechopen.com/books/data-acquisition-applications/data-acquisition-
in-particle-physics-experiments. 8 Aug 2020
2. iThemba LABS. Subatomic Physics – K = 600 Magnetic Spectrometer https://tlabs.ac.za/
subatomic-physics/k600-magnetic-spectrometer/ (n.d.). 18 Aug 2020
3. VITA. American National Standard for VME64. https://www.ge.infn.it/~musico/Vme/Vme64.pdf (1995). 18 Aug 2020
4. Dolosse. Modernizing Nuclear Physics Data Processing. https://dolosse.org/modernizing-
nuclear-physics-data-processing/ (n.d.). 24 Aug 2020
5. Kafka: Introduction, Everything You Need to Know About Kafka in 10 Minutes https://
kafka.apache.org/intro (n.d.). 2 Sep 2020

6. Real Python. Dictionaries in Python. https://realpython.com/python-dicts/ (n.d.). 5 June 2021
7. Confluent. Benchmarking Apache Kafka, Apache Pulsar, and RabbitMQ. https://www.
confluent.io/blog/kafka-fastest-messaging-system/ (n.d.). 5 Sep 2020
8. Kafka. Class KafkaProducer. https://kafka.apache.org/20/javadoc/org/apache/kafka/clients/
producer/KafkaProducer.html (n.d.). 7 Mar 2021
9. CAEN. V792 Technical Information Manual. https://www.caen.it/?downloadfile=4021
(2010). 9 Mar 2021
10. CAEN. V785 Technical Information Manual. https://www.caen.it/?downloadfile=3976
(2012). 9 Mar 2021
11. CAEN. V1190A/B-2eSST 128 Channel Multihit TDC Technical Information Manual.
https://www.caen.it/?downloadzip=4024-4660 (2012). 9 Mar 2021
12. librdkafka. The Apache Kafka C/C++ Client Library. https://docs.confluent.io/platform/
current/clients/librdkafka/html/index.html (n.d.). 6 Aug 2020
13. RootAna. MIDAS Documentation. https://midas.triumf.ca/MidasWiki/index.php/ROOTANA (n.d.). 10 Jan 2021
Images Within Images? A Multi-image Paradigm with Novel Key-Value Graph Oriented Steganography

Subhrangshu Adhikary

Department of Computer Science and Engineering, Dr. B.C. Roy Engineering College, Durgapur 713206, West Bengal, India
subhrangshu.adhikary@spiraldevs.com

Abstract. Steganographic methods have been in the limelight of research and development for concealing secret data within a cover medium without being noticed through general visualization. The Least Significant Bits (LSBs) of the 8-bit color code of an RGB image raise the possibility of replacing the last two bits with the bits of the encrypted message. Several procedures have been developed to hide an image within another image; however, in most cases the payload image has to be within the accommodatable range of the cover image, and very little literature has shown methods to hide multiple images within multiple images. This paper presents a novel approach that splits the images into a JSON-styled dictionary of key-value pairs and uses a metadata graph to locate the different parts and positions of the payload images in the entire cluster of cover images. The model could easily be used in real-world scenarios for privately sharing secret data over public communication channels without being noticed.

Keywords: Image steganography · Key-value mapping graph · Least significant bit insertion · Privacy protection · Image processing

1 Introduction

The quest for developing methods to ensure privacy for data transmission over public
communication channels has given rise to different steganographic techniques [1].
Steganography is the method of hiding data in a host file without being noticed and is
particularly used in environments where directly applying cryptographic encryption
arise suspicion [2]. Based on the host medium, different types of steganography include
hiding data in images, sounds, text, video or other computer files. Images are one of the
most widely used steganographic cover media [3]. Different methods can be applied for
hiding data within images and these are generally Least Significant Bit (LSB) Insertion,
Masking and Filtering techniques, Redundant Pattern Encoding, Encrypt and Scatter
and finally Coding and Cosine transformation [4]. Among all these, LSB Insertion is
the simplest and most widely used technique [5]. This method does not invoke sus-
picion as a significant amount of data can be hidden within an image and the size of the
stego image is always close to the original image [6]. In this method, stego image is
created by replacing the least significant bits of the image with the secret data. Using 1 bit generally alters the pixel color intensity by approximately ±1 unit and therefore the
stego image is almost indistinguishable; however, this gives very little capacity for storing data [7]. Using 3 bits alters the pixel color intensity by up to ±5 units, which gives a large capacity to store data, but the stego image shows visible differences compared to the original image [8]. Therefore, generally 2 bits are used to store the data, which alters the color intensity by up to ±3 units; the differences between the stego image and the original image are then very small and indistinguishable by human eyes, and besides this, using 2 bits gives a decent capacity to store the data [9].
JavaScript Object Notation (JSON) is a data communication format based on dictionaries of key-value pairs, where '{' and '}' are used to start and end a block, keys are surrounded by quotation marks, and values are written according to their corresponding data types [10]. The data can be nested, and values can be of array type as well. This format can be used to store the metadata of the dataset as well as different values associated with the intermediate stages of the secret data [11].

1.1 Motivation and Contribution of the Work


Most LSB-based steganographic works for hiding data within images have been performed to hide text data. Recently, different techniques of flattening the image matrix have become popular for storing image data within an image; however, when the payload file size is large, the cover medium fails to accommodate the secret image [12, 13]. Other techniques have evolved to accommodate the data of a single image over multiple cover images, but far fewer studies have been performed on storing multiple secret images spread across multiple cover images [14, 15].
This motivated us to develop this novel method to store multiple secret images across multiple cover images by flatmapping the images, splitting them into chunks and arranging a JSON with metadata and graph-based mapping. The details of the technique and its performance are discussed later in the text.

2 Methodology

The proposed methodology for hiding and retrieving multiple images within multiple images involves several intermediate steps. The details are explained in the following text (Fig. 1).

Fig. 1. The representation of the proposed multi-image paradigm method, where the RGB matrix of the image is first flattened and divided into chunks of n bytes, which, along with the metadata properties of the image, are then converted into JSON objects. Following this, the objects are used to create the JSON mapping graph in the form of a string, and the string is finally embedded into the cover image with the LSB insertion method

2.1 Processing the Payload Images


The most important part of the proposed algorithm is processing the payload images and embedding them in such a way that they are easily recovered even after distributing the data over multiple images [16]. For this purpose, the flatmap technique has been used to convert a single image matrix into a 1D array. In this, the matrix for each of the colors, namely Red, Green and Blue, is individually converted into a 1D array by appending each row one after another, and finally appending the 1D array of each color one after another. The same process is repeated for each payload image.
After this step, each of these 1D arrays is split into chunks whose size is based on the cover size of the cover images. If the chunk size is too small, a lot of data is consumed by the JSON graph metadata, but very few pixels are left vacant. When the chunk size is too large, the size of the JSON graph metadata is small, but a large number of pixels in the cover image are left vacant. Keeping this trade-off in mind, 512-byte chunks have been used for the experiment; a sketch of this step follows.
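A minimal sketch of the flattening and chunking step using NumPy, assuming the payload is an H × W × 3 RGB array; the 512-byte chunk size follows the experiment:

```python
import numpy as np

CHUNK_SIZE = 512  # bytes per chunk, as used in the experiment

def flatten_and_chunk(image: np.ndarray):
    """Flatten an H x W x 3 RGB image channel by channel, then split
    the resulting 1D byte array into CHUNK_SIZE-byte chunks."""
    # Append each color plane (R, G, B) one after another.
    flat = np.concatenate([image[:, :, c].ravel() for c in range(3)])
    data = flat.astype(np.uint8).tobytes()
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
```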

2.2 The Mapping Graph and JSON Key-Value Pair Dictionary


After flattening the payload images and dividing them into chunks of suitable size, the mapping graph is created using JSON. For this purpose, the first two key-value pairs of the JSON contain the number of payload and cover images. Following this, the children of the nested JSON objects contain the details of each image [17, 18]. In these, the metadata contains the shape of the original payload image, including height, width, colormaps, number of chunks, etc., and the corresponding image-data key contains an array of tuples containing the position of the chunk in the graph, a number identifying the payload image, and the processed chunks of 512 bytes each. This JSON is then used to prepare the final string to be embedded within the cover images [19, 20]. An illustrative structure is sketched below.
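As an illustration, a mapping graph with the fields described above might be structured as follows; the exact key names are assumptions, since the paper does not publish its schema:

```python
# Hypothetical JSON mapping graph structure (key names are assumed).
mapping_graph = {
    "n_payload_images": 2,
    "n_cover_images": 8,
    "images": [
        {
            "metadata": {"height": 1080, "width": 1920,
                         "colormap": "RGB", "n_chunks": 12150},
            # Each tuple: (position in graph, payload image number, chunk bytes)
            "image-data": [(0, 0, "<512-byte chunk>"),
                           (1, 0, "<512-byte chunk>")],
        },
    ],
}
```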

2.3 LSB Insertion to the Cover Images


Following the processing of the payload images and the preparation of the JSON graph string for each payload image, the prepared payload strings are inserted into the two least significant bits of each of the image pixels [21, 22]. Each bit of the payload strings is serially incorporated into the cover images, one after another, based on their encodable capacity, according to the general steganographic LSB insertion method. Hence the stego images are generated [23, 24]. A sketch of the embedding step follows.
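A minimal sketch of the 2-bit LSB embedding over a flattened cover image, assuming the payload string has already been converted to a sequence of bits:

```python
import numpy as np

def embed_lsb2(cover: np.ndarray, bits: list) -> np.ndarray:
    """Embed pairs of payload bits into the two least significant bits
    of each pixel value of the flattened cover image."""
    flat = cover.ravel().astype(np.uint8).copy()
    n_pairs = len(bits) // 2
    assert n_pairs <= flat.size, "payload exceeds cover capacity"
    for i in range(n_pairs):
        two_bits = (bits[2 * i] << 1) | bits[2 * i + 1]
        flat[i] = (flat[i] & 0b11111100) | two_bits  # replace the 2 LSBs
    return flat.reshape(cover.shape)
```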

2.4 Retrieval of Payload Images


Once the stego images have been generated, they must also be decoded to retrieve the payload. For this, the LSBs of all the stego images are individually decoded and combined according to the graph mapping within the JSON strings of each stego image. The combined string is then arranged to merge the split chunks for each image separately; based on the metadata of each image, the flatmapped 1D array is reshaped to form the image matrix, and finally the hidden images are retrieved [25, 26]. The complementary extraction step is sketched below.
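The complementary extraction of the 2-bit LSBs, a sketch under the same assumptions as the embedding sketch above:

```python
import numpy as np

def extract_lsb2(stego: np.ndarray, n_bits: int) -> list:
    """Read back n_bits payload bits from the two least significant bits
    of each pixel of the flattened stego image."""
    flat = stego.ravel().astype(np.uint8)
    bits = []
    for value in flat[: (n_bits + 1) // 2]:
        bits.append(int(value >> 1) & 1)  # higher of the two LSBs first
        bits.append(int(value) & 1)
    return bits[:n_bits]
```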

3 Results and Discussion

The LSB insertion payload string was prepared by combining the JSON objects of the chunks of the flattened RGB matrices, forming the mapping graph. The test was performed on 8 cover images and 2 payload images. The 8 cover images are photographs of a dog, whale, giraffe, horse, squirrel, camel, tiger and fish, and the 2 payload images are photographs of a cat and a parrot, as shown in Fig. 2. Combining these 8 images, a total of 38115048 slots of 2 bits each are available to store data when using 2 LSBs per point. The 2 payload images combined require 35047586 2-bit slots; however, because of the creation of the JSON mapping graph and padding, approximately an additional 5% of space is required, making the payload string size 37228972 slots. This corresponds to a utilization of 97.67% of the available space. On a computer with an Intel i3 6th Gen CPU and 12 gigabytes of RAM, the method required 74.7 s for encoding and 19.1 s for decoding the images.
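The quoted utilization follows directly from the two slot counts:

$$\frac{37228972}{38115048} \approx 0.9767 = 97.67\%$$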

Fig. 2. The figure demonstrates the 8 stego images on the left, which hide the two payload images on the right without being visibly noticeable

The stego images are visually indistinguishable from the original images. However, the differences are clear when observing their color intensity histograms. The histogram of one image, for example the whale, compared with its steganographed counterpart is shown in Fig. 3. It can be observed that, although there are undetectable visual differences between the original image and the stego image, their histograms show significant differences denoting the modification of the LSBs. The maximum of the counts over specific intensity values of all pixels combined was around 110000 for the original image, but around 175000 for the stego image. Observing the color bands individually, the red color has its maximum occurrence near intensity value 110 for both the original and the stego image; however, the maximum number of times a red value appears is around 78000 for the original image and around 123000 for the stego image. Similarly for green, the intensity maxima occur for both images around 140, but the highest count is around 76000 for the original image and around 125000 for the stego image. Finally, the blue color maxima occur around intensity value 165 for both images, but the maximum is approximately 80000 for the original image and around 124000 for the stego image.

Fig. 3. The side by side comparison of original image and stego image with their corresponding
histogram of color intensity

On closer inspection, it can be observed that the histogram of the original image follows an evenly distributed pattern without sudden spikes, whereas the histogram of the stego image appears discrete, with gaps at almost regular intervals. This is because, in a natural image, neighbouring pixel values change smoothly except at sharp boundaries of different colors, textures or intensities. On the other hand, the last two bits of the stego image have been altered, causing a difference of up to ±3 intensities for every neighbouring pixel in all three color matrices. This breaks the natural continuity of the changing color and makes the histogram discrete. This is also a limitation of the LSB method: although the differences between the images cannot be distinguished visually, a simple histogram can easily be used to detect whether an image has been steganographed, which is why encrypting the payload with cryptographic techniques is also recommended along with steganography to completely safeguard the data. This phenomenon also explains why the maxima of the color intensities are much higher for the stego image than for the original image, as the counts from the gaps within the intensity values have been appended to the spikes of the stego image's histogram.

4 Conclusion

Steganographic methods are used to hide messages within another cover medium without being noticed. The Least Significant Bit (LSB) insertion method is a popular steganographic method for hiding data within an image by making use of its least significant bits. Different approaches have been used to hide an image within another image; however, they are limited in encodable capacity. This paper presents a method to solve this issue by introducing a novel JSON mapping graph based approach to encode multiple images within multiple images.
The method uses the flatmap technique on the RGB image matrices, splits the data into chunks of n bytes each, and then creates JSON objects from those chunks and the metadata of the images. Finally, the objects thus created are used to build the mapping graph, which yields the encodable payload string. These strings are then embedded into the cover images by the LSB insertion method. The result is visually indistinguishable but can be detected with a histogram; hence cryptographic encryption of the payload string is also suggested for added safety.
The method can easily be used to embed large payload image files over multiple cover images in any order, which can then be combined to get back the payload files. The stego images can be shared over public communication channels without being visually noticed, maintaining privacy. Further, the model could be improved to fit very large data files within images without being noticed and to reduce the discrete histogram spikes, producing more evenly spread spikes.

Acknowledgments. The work is a part of The Gyanam Project as a joint collaboration between
Spiraldevs Automation Industries Pvt. Ltd., India and Wingbotics, India.

References
1. Gutub, A., Al-Ghamdi, M.: Hiding shares by multimedia image steganography for optimized
counting-based secret sharing. Multimedia Tools and Applications 79(11), 7951–7985 (2020)
2. Duan, X., Guo, D., Liu, N., Li, B., Gou, M., Qin, C.: A new high capacity image
steganography method combined with image elliptic curve cryptography and deep neural
network. IEEE Access 8, 25777–25788 (2020)
3. Gutub, A., Al-Shaarani, F.: Efficient implementation of multi-image secret hiding based on
LSB and DWT steganography comparisons. Arab. J. Sci. Eng. 45(4), 2631–2644 (2020)
4. Adhikary, S., Ghosh, R., Ghosh, A.: Gait abnormality detection without clinical intervention
using wearable sensors and machine learning. In: Muthukumar, P., Sarkar, D.K., De, D., De,
C.K. (eds.) Innovations in Sustainable Energy and Technology. ASST, pp. 359–368.
Springer, Singapore (2021). https://doi.org/10.1007/978-981-16-1119-3_31
5. Almazaydeh, L.: Secure RGB image steganography based on modified LSB substitution. Int.
J. Embedded Syst. 12(4), 453–457 (2020)
6. Islam, M.A., Riad, M.A.A.K., Pias, T.S.: Enhancing security of image steganography using visual cryptography. In: 2021 2nd International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST), IEEE, pp. 694–698 (2021)
7. AlKhodaidi, T., Gutub, A.: Refining image steganography distribution for higher security
multimedia counting-based secret-sharing. Multimedia Tools Appl. 80(1), 1143–1173
(2020). https://doi.org/10.1007/s11042-020-09720-w
8. Adhikary, S., Chaturvedi, S., Chaturvedi, S.K., Banerjee, S.: COVID-19 spreading
prediction and impact analysis by using artificial intelligence for sustainable global health
assessment. In: Siddiqui, N.A., Bahukhandi, K.D., Tauseef, S.M., Koranga, N. (eds.)
Advances in Environment Engineering and Management. SPEES, pp. 375–386. Springer,
Cham (2021). https://doi.org/10.1007/978-3-030-79065-3_30

9. Liao, X., Yin, J., Chen, M., Qin, Z.: Adaptive payload distribution in multiple images
steganography based on image texture features. IEEE Trans. Depend. Secure Comput.
(2020)
10. Hureib, E.S., Gutub, A.A.: Enhancing medical data security via combining elliptic curve
cryptography and image steganography. Int. J. Comput. Sci. Netw. Secur. (IJCSNS) 20(8),
1–8 (2020)
11. Sukumar, A., Subramaniyaswamy, V., Ravi, L., Vijayakumar, V., Indragandhi, V.: Robust
image steganography approach based on RIWT-Laplacian pyramid and histogram shifting
using deep learning. Multimedia Syst. 27(4), 651–666 (2020). https://doi.org/10.1007/
s00530-020-00665-6
12. Shah, P.D., Bichkar, R.: Genetic algorithm-based imperceptible image steganography
technique with histogram distortion minimization. In: Balas, V.E., Hassanien, A.E.,
Chakrabarti, S., Mandal, L. (eds.) Proceedings of International Conference on Computa-
tional Intelligence, Data Science and Cloud Computing. LNDECT, vol. 62, pp. 267–278.
Springer, Singapore (2021). https://doi.org/10.1007/978-981-33-4968-1_21
13. Kumar, S., Kumar, S., Singh, N.K., Majumder, A., Changder, S.: A novel approach to hide
text data in colour image. In: 2018 7th International Conference on Reliability, Infocom
Technologies and Optimization (Trends and Future Directions) (ICRITO), IEEE, pp. 577–
581 (2018)
14. Mukherjee, S., Sanyal, G.: Image steganography with N-puzzle encryption. Multimedia
Tools Appl. 79(39–40), 29951–29975 (2020). https://doi.org/10.1007/s11042-020-09522-0
15. Alexan, W., El Beheiry, M., Gamal-Eldin, O.: A comparative study among different
mathematical sequences in 3d image steganography. Int. J. Comput. Digit. Syst. 9(4), 545–
552 (2020)
16. Lu, S.P., Wang, R., Zhong, T., Rosin, P.L.: Large-capacity image steganography based on
invertible neural networks. In: Proceedings of the IEEE/CVF Conference on Computer
Vision and Pattern Recognition, pp. 10816–10825
17. Pramanik, S., Singh, R.P., Ghosh, R.: Application of bi-orthogonal wavelet transform and
genetic algorithm in image steganography. Multimedia Tools Appl. 79(25–26), 17463–
17482 (2020). https://doi.org/10.1007/s11042-020-08676-1
18. Cogranne, R., Giboulot, Q., Bas, P.: Steganography by minimizing statistical detectability:
the cases of JPEG and color images. In: Proceedings of the 2020 ACM Workshop on
Information Hiding and Multimedia Security, pp. 161–167 (2020)
19. Vishnu, B., Namboothiri, L.V., Sajeesh, S.R.: Enhanced image steganography with PVD
and edge detection. In: 2020 Fourth International Conference on Computing Methodologies
and Communication (ICCMC), IEEE, pp. 827–832 (2020)
20. Adhikary, S., Chaturvedi, S.K., Banerjee, S., Basu, S.: Dependence of physiochemical
features on marine chlorophyll analysis with learning techniques. In: Siddiqui, N.A.,
Bahukhandi, K.D., Tauseef, S.M., Koranga, N. (eds.) Advances in Environment Engineering
and Management. SPEES, pp. 361–373. Springer, Cham (2021). https://doi.org/10.1007/
978-3-030-79065-3_29
21. Kweon, H., Park, J., Woo, S., Cho, D.: Deep multi-image steganography with private keys.
Electronics 10(16), 1906 (2021)
22. Das, A., Wahi, J.S., Anand, M., Rana, Y.: Multi-Image Steganography Using Deep Neural
Networks. arXiv preprint arXiv:2101.00350 (2021)
23. Hong, T.V.T., Do, P.: SAR: a graph-based system with text stream burst detection and
visualization. In: Vasant, P., Zelinka, I., Weber, G.-W. (eds.) ICO 2018. AISC, vol. 866,
pp. 35–45. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-00979-3_4
24. Intelligent Computing and Optimization. In: Conference Proceedings ICO 2018, Springer,
Cham, ISBN: 978-3-030-00978-6. https://www.springer.com/gp/book/9783030009786

25. Intelligent Computing and Optimization. In: Proceedings of the 2nd International
Conference on Intelligent Computing and Optimization 2019 (ICO 2019), Springer
International Publishing, ISBN: 978-3-030-33585-4. https://www.springer.com/gp/book/
9783030335847
26. Intelligent Computing and Optimization. In: Proceedings of the 3rd International Conference
on Intelligent Computing and Optimization 2020 (ICO 2020). https://link.springer.com/
book/10.1007/978-3-030-68154-8
Application of Queuing Theory to Analyse an ATM Queuing System

Kolentino N. Mpeta and Otsile R. Selaotswe

North West University, Mmabatho, South Africa
kolentino.mpeta@nwu.ac.za

Abstract. This study determined the current queueing characteristics for 2 ATMs at a bank in Mmabatho, South Africa, as output of an M/M/s model. Time between arrivals (TBA) and service times served as the basic inputs to the system. Data was collected by observation for both busy and non-busy periods. Results showed that during the busy period the capacity utilization was 60%, with 40% idle time. Customers spent an average of about 2 min in the system, with a 25% probability of having no customers in the system. Furthermore, the status quo of a single queue leading to the 2 ATMs was found to be efficient. The study concluded that the two ATMs currently available to customers were sufficient to handle customer demand optimally; thus there was no need to add another machine.

Keywords: Queuing · Time between arrivals · Capacity utilization

1 Introduction and Background

Banks use automated teller machines (ATMs) as a way of limiting customers that need
to seek services inside the banking halls while at the same time making banking
services accessible to customers outside conventional working hours. The ATMs have
as a result become subject to large service demands which directly turn to queues for
service when these demands cannot be swiftly satisfied especially during weekends,
festive periods and month-ends where the demand for cash is high.
According to Stevenson (2018), queues or waiting lines neither add to customers' pleasure nor generate extra returns for business enterprises, hence the need to reduce queues. The experience of waiting long hours at ATMs is not only bothersome to customers but also unprofessional on the part of the providers who manage this process (Yakubu and Najim 2014). Long waiting times in queues account for the loss of customers and associated losses such as loss of goodwill and a decline in customer satisfaction. Olu (2019) concurs that long queues are a source of anxiety and dissatisfaction among customers and, in many cases, bankers, and may result in the loss of customers.


1.1 Problem Statement


Long queues at ATMs during certain periods are a problem for South African banks. Long periods of waiting often lead customers to leave a queue without performing their desired transaction. Over time, impatient and dissatisfied customers may decide to withdraw from banks that always experience long queues and switch to others offering better service delivery. This switch of customers away from customer-insensitive banks results in a slow but sure loss of customer patronage (Adedoyin et al. 2014). Banks, on the other hand, may not see the necessity of increasing the number of ATMs, since at other times the machines are not utilised, with no customers queueing. Striking a balance between having adequate machines and reducing customer queues is therefore something banks need to achieve.
This study thus sought to:
1. Determine the queuing characteristics for the present scenario.
2. Determine whether the number of ATM servers is adequate.

1.2 Literature Review


A study evaluating an ATM service using queuing theory, conducted by Ajiboye (2014), focused on three periods, namely 7 am – 10 am, 10 am – 1 pm and 1 pm – 4 pm, corresponding to busy periods of the day. According to Ajiboye (2014), if a satisfactory quality of service could be documented during these busy periods, then the system could be presumed to be performing well. The arrival time, time at commencement of service and time of departure from service were recorded. The combined three periods gave an arrival rate of 0.587 and a service rate of 0.6982. The average utilization of the system was 84.07%, indicating that the system was well utilized. Customers spent approximately 9 min in the system, with an average of about 8 min spent in the queue.
More recently, Burodo et al. (2019) conducted a study that analyzed queuing
characteristics at a branch of First Bank Ltd in Nigeria. Queuing data in this study was
collected through observation from Monday to Wednesday before an optimal queuing
model was developed. Service efficiency parameters for single, two and three servers
were compared. Findings provided evidence for the support of the multiple-server
model over the single server model. For example, whereas customers in the two or
three server model spent on average about 3 min and 2 min in the system respectively,
the same customer spent an average of 12 min in the single server model.
Furthermore, Olu (2019) carried out a study to explore the number of servers required for optimal service delivery at Heritage Bank in Nigeria, which had two ATMs. 8-h observation periods during weekdays revealed that, on a daily basis, an average of 90 customers sought service at the ATMs. The total time between arrivals for 191 customers was T = 1315 min, while the total time taken to serve these customers was 1585 min. The arrival rate was λ = 191/1315 = 0.1452 and the service rate was µ = 191/1585 = 0.1205. This corresponded to a traffic intensity of 60%.

1.3 Methods
Secondary data used in this study was obtained from the bank. The data had been collected for three different time periods covering both busy and non-busy situations: on weekdays, at month-end, and on a Sunday, respectively. The multiple-server queuing model (M/M/s), which allows for two or more servers to process arriving customers, was adopted in this study. The data focused on customer arrival rates, service start times, time availability for service, ATM idle time and customer waiting time.
The observed time between arrivals (TBA) of the customers was recorded in one column, while the time taken by a customer getting service from the machine was recorded as the "service time" (ST). The arrival time (AT) of the first customer was taken as the start time (00:00:00) of the observations. The other quantities were obtained in the following manner:
Arrival time for customer 2 onwards:

$$AT_{n+1} = AT_n + TBA_{n+1}, \quad \text{with } AT_1 = 00{:}00{:}00 \qquad (1)$$

The Service Start Time (SST). Two scenarios were considered.

Scenario 1. If a customer arrived when one or both of the ATMs were not in use (idle), then service would start immediately, that is,

$$SST_n = AT_n \quad \text{for } n \geq 2 \qquad (2)$$

Scenario 2. If a customer arrived when both ATMs were in use, they would wait in
the queue and start service on the ATM where a customer finished first.
Time in Queue (TIQ). The time that a customer spent waiting in the queue is given by

$$TIQ_n = SST_n - AT_n \qquad (3)$$

Finish Service Time (FST). The time at which a customer finished service is given by

$$FST_n = SST_n + ST_n \qquad (4)$$

Time in System (TIS). The time that a customer spent in the system is given by

$$TIS_n = TIQ_n + ST_n \qquad (5)$$

The arrival rate, $\lambda$, is determined using the formula:

$$\lambda = \frac{\text{Total number of customers that arrived in the period}}{\sum \text{TBA}} \qquad (6)$$

while the service rate, $\mu$, is given by

$$\mu = \frac{\text{Total number of customers served in the period}}{\sum \text{ST}} \qquad (7)$$
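A minimal Python sketch of this bookkeeping for the two-server case, under the definitions above; the function name and list-based layout are illustrative, not from the study:

```python
def queue_stats(tba, st):
    """Compute AT, SST, TIQ, FST and TIS per customer (Eqs. 1-5) for two
    ATMs; tba and st are per-customer times in seconds."""
    free_at = [0.0, 0.0]          # time at which each ATM becomes free
    at = 0.0
    rows = []
    for i, (gap, service) in enumerate(zip(tba, st)):
        at = 0.0 if i == 0 else at + gap      # Eq. (1), AT1 = 00:00:00
        server = min(range(2), key=lambda s: free_at[s])
        sst = max(at, free_at[server])        # Scenario 1 or Scenario 2
        tiq = sst - at                        # Eq. (3)
        fst = sst + service                   # Eq. (4)
        tis = tiq + service                   # Eq. (5)
        free_at[server] = fst
        rows.append((at, sst, tiq, fst, tis))
    lam = len(tba) / sum(tba)                 # arrival rate, Eq. (6)
    mu = len(st) / sum(st)                    # service rate, Eq. (7)
    return rows, lam, mu
```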

1.4 Results
From the weekday sample, the arrival rate was $\lambda = \frac{144}{10899} = 0.0132$ customers/sec, while the service rate was $\mu = \frac{144}{13545} = 0.0106$ customers/sec. These rates translate to an arrival rate of approximately 48 customers per hour and a service rate of approximately 39 customers per hour. The non-busy period (Sunday) gave an arrival rate of 15 customers per hour, while the service rate was 30 customers per hour. Lastly, the month-end sample gave an arrival rate of approximately 49 customers per hour with a service rate of almost 41 customers per hour.
The arrival rates and service rates in the foregoing paragraph were entered as input into Excel QM in order to generate the queue characteristics shown in Table 1 below.

Table 1. Comparison of queue characteristics


Characteristic                                  Weekday   Month-end   Sunday
Average server utilization                      0.6154    0.5976      0.2500
Average number of customers in the queue        0.7502    0.6638      0.0333
Average number of customers in the system       1.981     1.5889      0.5333
Average waiting time in the queue (h)           0.0156    0.0135      0.0022
Average time in the system (h)                  0.0413    0.0379      0.0356
Probability (% of time) system is empty         0.2381    0.2519      0.6000

Results displayed in Table 1 show that the average server utilisation for the weekday sample was 61.54%, that is, the two ATMs were busy 61.54% of the time. There were on average two customers either in the queue or being served at any given time. Customers were expected to spend 0.0156 h (56 s) and 0.0413 h (149 s) on average in the queue and in the system respectively. On the other hand, the average server utilization for the month-end sample was 0.5976, implying that the two ATMs were busy approximately 60% of the time. Customers were expected to spend an average of 0.0135 h (49 s) and 0.0379 h (136 s) in the queue and in the system respectively. The average number of customers in the system was two. The results also show that there was a 25% chance of the system being empty. Lastly, for the non-busy period, represented by Sunday in Table 1, the average server utilisation was 0.25, implying that the two ATMs were busy 25% of the time. In other words, the ATMs were idle 75% of the time. Customers were expected to spend an average of 0.0022 h (7.9 s) and 0.0356 h (128 s) in the queue and in the system respectively. The results also show that there was a 60% chance of the system being empty. The sketch below reproduces these M/M/s characteristics for the weekday rates.

1.5 Discussion
Since during the busy period the ATMs were busy approximately 60% of the time,
based on the M/M/s model, it is not necessary to increase the number of ATM servers.
The current system is operating efficiently. The two ATMs that the bank is currently
using are adequate. The expected customer waiting time is under a minute and
thus very reasonable. During the non-busy period, the ATMs are mainly idle, indicating
that there is no need to incur the extra cost of installing and operating additional ATMs.

1.6 Limitations of the Study


This study only considered the queue characteristics for comparison. No queuing costs
were included, and thus a comparison based on minimising costs was not possible.

1.7 Recommendations
Based on the observed data and calculations done via Excel QM, this study proposes
that the status quo should be maintained. The bank should however make sure that once
a machine is out of service it gets repaired as soon as possible to avoid inconveniencing
the customers.

A Novel Prevention Technique Using Deep
Analysis Intruder Tracing with a Bottom-Up
Approach Against Flood Attacks in VoIP
Systems

Sheeba Armoogum1 and Nawaz Mohamudally2

1 Faculty of Information, Communication and Digital Technologies, Port Louis, Mauritius
s.armoogum@uom.ac.mu
2 School of Innovative Technologies and Engineering, University of Technology Mauritius, La Tour Koenig, Port Louis, Mauritius
alimohamudally@umail.utm.ac.mu

Abstract. The Voice over Internet Protocol (VoIP), which is the least expen-
sive system used for voice communication, has taken another direction as a
layer of vulnerabilities is created to detain its progress. Amongst the many
threats described by the Security Alliance official document, the flood attacks
are the most difficult ones for end-point servers. In this paper, a deep analysis
method for tracing intruders to VoIP networks using a bottom-up approach is
been considered. The model can address small rate and high rate attacks with
different behaviour. For a 16-s timeframe set by the firewall for the broadcast of
each packet to the Session Initiation Protocol (SIP) server in a time interval of
500 ms each, a false alarm rate of zero and a detection rate of 100% for the
entire timeframe are observed, irrespective of the attack rates. The proposed
model has a performance accuracy of 98.7% in each timeframe of 500 ms,
which is better than the value prescribed in previous work from the literature.

Keywords: Denial of Service (DoS) · Deep analysis · Deep learning · Flood
attacks · Intrusion Detection and Prevention System (IDPS) · Session Initiation
Protocol (SIP) · Voice over Internet Protocol (VoIP)

1 Introduction

Internet Protocol Telephony has evolved since its inception and has now reached a
level of maturity. The growth of VoIP technology has escalated rapidly for developing
countries [1]. With the rise of VoIP technology, new capabilities get embedded into our
modern communications [2, 3]. The Global Market Insights Inc [4], a global market
research and management consulting organization, forecasted a penetration of 12%
between 2019 and 2025, while the global share will rise to USD 55 billion [5, 6]. Over
the last ten years, especially during the Covid-19 pandemic situation, frequent cyber-
attacks have been recorded. Unfortunately, this innovative technology has raised concerns
among many industries, service providers, and the research community. Indeed, the rise of


VoIP has given rise to negative impacts as scammers and phishers realize how useful it
is right alongside honest businesses. According to MyBroadband [7], the largest South
African technology news website, around 46% of all illegally made phone calls used VoIP
technologies. Moreover, these statistics are unlikely to decrease, given the growing smartness of
attackers.
Despite the above-mentioned positive discussion, reports have indicated VoIP
security raised a challenge over the years. Security has always been an issue when it
surfaces the transmission of text, voice, video, and other real-time media. It is a
common practice for an organization to design and implement two or three network
systems (IP telephony, wired, and wireless) using the same network backbone to
reduce the implementation cost. However, these types of integrated network systems
produce more security weaknesses and threats such as intentional flood attacks, message
tampering, spoofing, and integrity violations [8]. Attacks pose a high risk when massive
amounts of fake messages are injected into the system to target either the SIP
server or its nodes [9], thereby creating network congestion. This kind of issue leads to
freezing or stopping the system from operating by causing some disruption to the
logistic resources [29]. The main component of this technology is the protocol, namely,
the Session Initiation Protocol (SIP) used for message transmission. Since the SIP is a
signaling protocol working at layer 7 of the OSI model, illegitimate users find it very
easy to attack at this layer. Furthermore, only a low rate of messages needs to be sent to the
server [8]. An attacker grabs this opportunity to send an enormous pool of fake packets
in a short lapse of time to exhaust the network system [29]. The system will be unable to
detect such attacks as the latter are not malformed ones and therefore they penetrate the
system without being noticed [11–13].
Since the inception of VoIP technology, researchers and industries have strived
hard to find solutions to combat intruders and attackers. Moreover, to develop the IDS,
researchers adopted artificial intelligence and other innovative algorithmic techniques.
In addition, to complement the work of firewalls and the SIP servers as a second tier to
the defense system, some designers [14–16] used rule-based and statistical techniques
to build concrete IDPS. The authors in [29] state that signature-based or anomaly-based
prevention models can further be enhanced using statistical methods. Although the
literature shows that many prevention tools have originated in the past 15 years,
hackers still succeed in conducting malicious activities on private and public net-
works. Flood-based attacks remain one of the significant threats to the VoIP envi-
ronment. However, research gaps remain in the security of VoIP technology due to the
disparity in the effectiveness, efficacy, and security measures of the defense systems.
The contemporary critical situation of the COVID-19 pandemic has elevated the
enormous usage of the VoIP system. The fundamental significance of the study is to
mitigate flood attacks on VoIP systems besides enhancing the existing security
systems.
The purpose of this research is to develop an IDPS using deep analysis adopting the
bottom-up approach to detect and remove illegitimate SIP INVITE messages entering a
VoIP system. Accordingly, this method is different from other existing methods since it
can work with an unsorted list of message information. The new model, which is
termed as the Deep Analysis Intruder Tracing (DAIT), has two objectives: firstly, to

address flood issues by considering different kinds of behaviours, and secondly, an


attempt to reduce the false positive rate or improve the detection rate.
The remainder of the paper starts with a review of the literature on IDPS
(Sect. 2). The research methodology and the walkthrough to devise the proposed model
are in Sect. 3. Section 4 discusses the results and performance analysis, and finally a
conclusion is presented in the last section.

2 Literature Study

This section summarises the efforts conducted by the research community on flood
attacks to fight against intruders and on the attempt to mitigate network issues.
Among many threats discussed in the VOIPSA Taxonomy report [17] and the
Federal Communication Commission report [18], flooding attacks are still the most
difficult to mitigate, as they require several settings at the level of the SIP server. Fur-
thermore, single flood attacks are more diverse, and hence defense system
designers find them a challenge to defeat. This is why they are known as the
DoS attacks with the most destructive impact [19]. Using this attack type, illegitimate users target both
the user clients and the SIP server. In the early days of VoIP systems, several
researchers used the recognized work conducted by Iancu [20] and later adopted by
other researchers [21, 22]. For a long time, this method has been cited as a standard
model and thereby implemented in SIP servers for business enterprises. The algorithm
is known for its simplicity and efficacy. It counts the number of messages irrespective
of their source addresses and forbids those messages to enter the system based on a
defined number within a defined timeframe. One identified drawback is that the
algorithm requires heavy processing, as it has to maintain separate counters for each
source IP address.
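As a rough illustration of such a counting scheme only (the class and parameter names below are invented, and this is not Iancu's actual module), a sliding-window message counter can be sketched as:

    import time
    from collections import deque

    class SlidingWindowLimiter:
        """Deny traffic once more than `limit` messages arrive within
        `window` seconds, irrespective of the source address."""
        def __init__(self, limit, window):
            self.limit, self.window = limit, window
            self.stamps = deque()

        def allow(self, now=None):
            now = time.monotonic() if now is None else now
            while self.stamps and now - self.stamps[0] > self.window:
                self.stamps.popleft()     # drop arrivals outside the window
            self.stamps.append(now)
            return len(self.stamps) <= self.limit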
Markl et al. [23] proposed a reliable security framework using a DoS flooding
preventing algorithm based on the Snort IDS [24]. This algorithm is again
inspired by Iancu [20], where a single source flooding attack is detected by counting an
upper defined limit and, above this value, all messages are then denied. Similarly,
Ehlert et al. [11] proposed a laboratory-tested two-layer defense system to handle DoS
and DDoS flooding attacks. The double-level security architecture consists of a first-
line Bastion host to provide essential security checks against well-known TCP/IP-
related attacks and to detect and prevent SIP message flooding against the host. In the
second line of defense, security algorithms that provide specialized SIP-related security
features enhanced the security of the proxy server.
Further to the above-mentioned distressful problem, the situation can deteriorate
more if attackers use scattered methods to bombard the target nodes. As per discussions
in many research studies (Akbar & Farooq [25] Armoogum & Mohamudally [26]), the
DDoS flooding attack, as specified in the VOIPSA and FCC reports [17, 18], is the
most significant threat to address. The DDoS concept of attack is the same as the simple
DoS attacks (single) but in a more complex setup. In this case, many sources attack a
server or SIP user(s) (Ahmad & Singh [1] & Chauhan et al. [27]), whereby mitigation
algorithms face difficulties in capturing the illegitimate messages. Indeed, attackers use
several tricky ways to penetrate a system and exhaust the internal nodes mainly due to

the cheap and easy implementation. Results from [1] confirmed that a detection rate of
95.5% was obtained. The destructive impact as mentioned by Hussain et al. [19] for a
single flood attack is then multiplied here. The authors considered algorithms that
catered for both single and distributed modes of attack using the INVITE messages.
Moreover, while developing a prevention model using the sorted galloping technique,
the authors in [6] supported the same ideas and argued that illegitimate users used
INVITE messages to extract strategic features like IP addresses and to execute
various behavioral patterns.
Lee et al. [28] presented a DoS flood attack detection algorithm based on the statistics
and behavior of SIP traffic, designed for a low false-negative rate. The algorithm could also
recognize SPAM with a low false-positive using a caller behavior-based detection
algorithm. The testing was accomplished by generating enormous messages using an
automated machine. The statistics-based detection algorithm analyses four fields of
incoming messages (IP, URI, Call-ID, and Method) to identify abnormal traffic.
A threshold value was determined for each detection rule and was further learned by
machine-learning code for each hour of the day over several days. Likewise, the pre-
vention system developed by Semerci et al. [15] could detect and monitor illegitimate
packets in real-time. The authors proposed a system that used a smart algorithm to
separate genuine users from illegitimate ones and hence, the obtained fake messages
are then sent to the server for further pattern analysis.
A recent effort was conducted by a group of researchers to detect flooding attacks
using a mathematical linear Support Vector Machine with regularisation (i.e. l-SVM)
classifier [16]. The machine learning classifier was trained to mitigate flood attacks.
Finally, a new model for mitigation, known as extended-genetic algorithm prevention
system (e-GAP), was proposed by Armoogum and Mohamudally [29] by modifying an
existing genetic algorithm. Similar to the work in [13], the system is a two-tier
architecture, capable of detecting and preventing single and distributed potential attacks on
the server. The list of attack messages was sent to a monitoring system for further
analysis [29]. The model could detect two unexpected behaviors of attackers. Results
showed that for low attack rates, the accuracy is 98.4% while for high attack rates, it
was 98.7%. Moreover, the false alarm rate was almost zero and a detection rate of
100% was obtained for attack rates between 20 messages per second (mps) and
500 mps.
The review indicates that the previous research on VoIP defense systems still has a
gap in the disparity of efficacy, effectiveness, and security measures. Although the
researchers used different techniques to mitigate attacks, there was a lack of bench-
marking of their results against other works. This current study is an additional work to
complement the above literature on different challenges confronted by organizations
against attackers since the introduction of VoIP. Our target hence is to propose a new
technique to keep the detection rate above 95.5%, obtained by Ahmad and Singh [1], or to
challenge the results attained in our previous study [29].

3 Methodology and Proposed Model

This section covers the research methods used for collecting the data and the testbed set
up for simulating attacks on a VoIP network. Finally, a walk-
through of the development of the Deep Analysis Intruder Tracing (DAIT) model using
a bottom-up approach is given.

3.1 Experimental Research Method


The experiment was performed over five days, where each model was tested several times
a day. The live data were captured from various instances (legitimate and illegitimate
users, etc.) in a CSV file by the firewall for later analysis. Other messages were added to the
file using the information provided by Bad packets [30]. Additionally, messages were
created and added to the CSV file to make the dataset look real. Data were collected via
experiments using IP softphones during communication (real network traffic), using
SIP generators, or penetrating testing tools (fake messages), from the Bad Packets
database [30] and from the authors’ created data lists which had real and fake mes-
sages. Various attack rates were injected into the VoIP system. The proposed system is
depicted in Fig. 1.
Due to security reasons, the experiment was conducted in a private network,
whereby the IDPS and an application layer firewall run in one Intel Core i5 processor
4 GHz device. The MiniSIPServer, which can connect up to 500 clients and targets medium-sized
enterprises, was integrated to build the isolated server, mounted in an i7 device of
3.3 GHz-frequency, supporting Gigabit Ethernet connection using an updated Linux
operating system. In addition, during the experiment, the authors used smartphones and
laptops to install Zoiper, LinPhone, and MiniSipPhone to launch call messages. An
attack tool was used to generate INVITE messages from different devices for 16 s.
Finally, a router of 20 MBps was used as a gateway to connect all backbone network
devices. SIP packets from internal or external networks could reach the server after
successfully passing a verification test. The firewall filtered the incoming packets using
the existing lists of quarantined illegitimate SIP packets/addresses and delivered them
in CSV format to the prevention system. Based on the configuration rules, the firewall
waits for the decision from the prevention system before taking any further action. The
prevention system scans the list to remove the fake messages from the legitimate ones.
Henceforth, the two lists (The quarantined list and the legitimate list) were returned to
the firewall for necessary action based on the configuration settings of rules. The
quarantine list is further appended to the existing list (log file) at the firewall for deeper
filtering. The firewall uses the legitimate list to allow genuine packets to communicate
with the SIP server.
The experiments were conducted by taking measurements every 500 ms for 16 s
(32 slots) for varying time frames, computed by the models. This experiment was
performed for six attack rates (20 mps to 500 mps). For our research design, we use the
repetitive process to validate the data collected. That is, the same dataset was injected
several times at different times to verify the reliability of the proposed prevention tool
(before-and-after, cross-sectional). To test the efficacy of the proposed model, different
populations of messages containing both fake and genuine messages were injected.

This experiment was conducted at different times as well. To improve the effectiveness,
the model was trained within a cycle of 32 window sizes to allow it to analyse the
number of false-negative and false-positive cases together with the list of non-genuine
messages entering the system.

3.2 The DAIT Novel Prevention Model


Searching a large dataset is continually a challenge when it comes to processing time,
effectiveness and efficiency. The accuracy and reliability of the system highly depend
on the method or technique adopted to conduct such an experiment. One efficient
approach to identify the accurate search component is a bottom-up approach on the
dataset, as compared to previously proposed models (e.g. the sorted gal-
loping model [6]) which used a top-down approach. More explicitly, for example, in a
deep analysis security system using a top-down approach developed in [24], the
experiment is performed on a sorted dataset combined with a search for each source IP
address on the same dataset in intervals of 500 ms. The bottom-up approach
works by determining the final count of each source IP address in the initial
layer and processing backward for further analysis. In contrast to the top-down approach
of the sorted galloping model [6], which eliminated attack mes-
sages at the initial level using the total source IP address count from an unsorted list, this
technique is better due to an improvement in time complexity. Deep analysis at dif-
ferent WindowSizes, each with an interval of 500 ms, was done in this approach to clip
the intruders and suspected intruders.

Fig. 1. Security testbed system.



This is a novel method to prevent intruders from accessing a system without authoriza-
tion. An Intrusion Detection and Prevention System (IDPS) aims at detecting intruders to
the server and minimizing the attacks on the server. The DAIT prevention model
introduces a bottom-up approach that processes the data by computing the total legitimate
count of source IP addresses sent by the firewall in the initial layer and then
classifies the identified source IPs as genuine users or intruders. To start with, the dataset
sent by the firewall in CSV format is cross-verified with the existing source IP
addresses list from the IDPS. This method avoids any anomaly filtering at the level of
the firewall. To create the dataset, features of the INVITE messages are extracted and
the resultant dataset list is then cleansed on the features Source, Time, Destination,
Length, Info, To, From, CSeq, Call-ID, Max-Forwards, Via, and Contact and further
sent to the IDPS in CSV format. Indeed, the number of accepted INVITE messages
from one source IP address depends on the WindowSize with the following conditions
shown in Table 1 set by the firewall:

Table 1. Firewall conditions for WindowSize.

WindowSize (ms)   Legitimate source IP count
<500              1
<1000             2
<2000             3
<4000             4
<8000             5
<16000            6
<20000            7
<24000            8
<28000            9
<32000            10

A two-dimensional list is created from the original dataset using the features
Source, Destination, and Info. The resultant dataset is then processed to identify the
unique source IP addresses of the dataset and its count of occurrence in the same
dataset. The identified source IP addresses and their count are then added to the dataset.
If the count satisfies the WindowSize condition, then the respective element from the
list is added to the legitimate list; otherwise, the element is added
to the quarantine list. The unique list of source IP addresses from the quarantine list is
then sent to the firewall for further filtering, and the list of internal attackers (possibly
zombies) is sent to the administrator for action to be taken. A deep analysis is done on
the legitimate list for identifying false positives and false negatives. The resultant data
are further analysed using the Statistical Package for the Social Sciences (SPSS) tool.
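A minimal sketch of this bottom-up pass is given below. It illustrates the idea rather than the authors' implementation; the file name invites.csv and the Source column header are assumptions based on the feature list above.

    import csv
    from collections import Counter

    # Table 1 bounds: WindowSize (ms) -> allowed INVITEs per source IP
    LIMITS = [(500, 1), (1000, 2), (2000, 3), (4000, 4), (8000, 5),
              (16000, 6), (20000, 7), (24000, 8), (28000, 9), (32000, 10)]

    def allowed(window_ms):
        """Legitimate source IP count for the current WindowSize."""
        for bound, count in LIMITS:
            if window_ms < bound:
                return count
        return LIMITS[-1][1]

    def classify(rows, window_ms):
        """Bottom-up pass: tally the final per-source counts first on the
        unsorted list, then split messages into legitimate and quarantine."""
        counts = Counter(r["Source"] for r in rows)
        legit, quarantine = [], []
        for r in rows:
            (legit if counts[r["Source"]] <= allowed(window_ms)
             else quarantine).append(r)
        return legit, quarantine

    with open("invites.csv", newline="") as f:    # hypothetical firewall export
        legit, quarantine = classify(list(csv.DictReader(f)), window_ms=500)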

4 Results and Discussion

The proposed prevention tool was assessed using nine parameters of importance,
which are calculated using the False Positive (FP), False Negative (FN), True
Positive (TP), and True Negative (TN) values. These four variables were collected
during the 32 WindowSizes, each spanning an interval of 500 ms. Except for the
measurement of memory usage and processor utilization, the remaining seven
KPIs were computed as explained below:

a) True Positive (TP): a situation where the IDPS detects an illegitimate message and
alerts the Monitoring and Analysing Station (MAS) and the firewall. The detected
message is quarantined.
b) False Positive (FP): a situation where the IDPS falsely triggers an alarm to the MAS and
the firewall. In this case, a genuine message is quarantined.
c) True Negative (TN): a situation where the IDPS does not raise an alarm since no
illegitimate messages are passing through the system.
d) False Negative (FN): a situation where the IDPS fails to detect an illegitimate
message penetrating the system, and hence no action is taken.
Therefore, to simplify the performance analysis of the mitigation mechanisms, the
following parameters are taken into consideration (a sketch of their computation follows this list):
(i) The System Recall, or System Sensitivity, measures the percentage of actual
positives that are correctly identified by the IDPS. A high Sensitivity means
high accuracy of the IDPS.
(ii) The System Specificity measures the percentage of actual negatives that are
correctly identified by the system. A high Specificity corresponds to high
accuracy of the IDPS.
(iii) The System Precision determines the performance of the statistical measure,
i.e., the proportion of alarms that correspond to actual attacks.
(iv) The System F-measure score reflects the accuracy of the system, where a score
near one indicates a near-perfect-precision IDPS and a score near zero indi-
cates a weak IDPS.
(v) The False Positive Rate, also termed the False Alarm Rate, measures how often
the IDPS falsely flags legitimate messages as attacks.
(vi) The System Accuracy measures the ratio of cases for which the IDPS makes correct
predictions to the total number of cases examined.
(vii) The Attack Detection Rate measures the ratio of illegitimate messages
detected by the IDPS to the total number of illegitimate messages injected into
the IDPS for testing.
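Under the textbook definitions, these parameters reduce to simple functions of the four counts, as in the sketch below (note that with these formulas the detection rate coincides with the sensitivity; the paper reports it separately over the full injected attack load):

    def idps_kpis(tp, fp, tn, fn):
        """KPIs (i)-(vii) from the four confusion-matrix counts."""
        sensitivity = tp / (tp + fn)                  # recall
        specificity = tn / (tn + fp)
        precision = tp / (tp + fp)
        f_measure = 2 * precision * sensitivity / (precision + sensitivity)
        false_alarm_rate = fp / (fp + tn)             # false positive rate
        accuracy = (tp + tn) / (tp + fp + tn + fn)
        detection_rate = tp / (tp + fn)               # attacks caught / injected
        return {"sensitivity": sensitivity, "specificity": specificity,
                "precision": precision, "f_measure": f_measure,
                "false_alarm_rate": false_alarm_rate,
                "accuracy": accuracy, "detection_rate": detection_rate}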
The data in Table 2 and the results in Table 3 and Fig. 2 show that there is a small
variation between the lower attack rates and the highest attack rate. It is also observed
that for an average attack rate of 80 mps, the IDPS consumes slightly more memory
than in the other experiments conducted. However, the findings demonstrate that the DAIT
model uses on average 50.3 bytes, which is a very small amount of memory for processing
both very small and high rates of flood attacks.

Table 2. Attack scenario

Attack rate (mps)   Genuine messages   Attack messages injected   Total messages
20                  28                 320                        348
40                  28                 640                        668
80                  29                 1,280                      1,309
200                 30                 3,200                      3,230
350                 29                 5,600                      5,629
500                 30                 8,000                      8,030

Table 3. Performance of the proposed deep analysis intruder tracing method

                       Attack rate (mps)
Parameters             20       40       80       200      350      500
Memory usage (B)       49.76    49.76    50.42    50.23    50.05    49.96
Processing time (ms)   734.38   656.25   765.62   750.00   761.72   765.62
Sensitivity (%)        98.00    98.00    98.00    98.00    98.00    98.00
Specificity (%)        100.00   100.00   100.00   100.00   100.00   100.00
Precision (%)          100.00   100.00   100.00   100.00   100.00   100.00
F-measure (%)          98.40    99.00    99.00    99.00    99.00    99.00
False alarm rate (%)   0.00     0.00     0.00     0.00     0.00     0.00
Accuracy (%)           98.44    98.73    98.75    98.75    98.74    98.74
Detection rate (%)     100.00   100.00   100.00   100.00   100.00   100.00

Fig. 2. Analysis of memory consumption

Regarding the CPU utilization, except for the attack rate of 40 mps, the time taken
to process messages is above 740 ms (Fig. 3). The mean value of 0.73 s indicates that
very little time is needed to filter out illegitimate messages from a pool of messages. The
method achieves a mean accuracy of 98.7%. Finally, it was observed that
the model gives a false alarm rate of zero and a 100% detection rate.

Fig. 3. Analysis of CPU utilization

The results of this study are compared with recent work conducted by Ahmad and Singh
[1], Nazih et al. [16] and recent studies in [6, 29]. These works argue that, based on the existing
literature, reports and publications indicate that most mitigation tools are inefficient,
since their detection rates are below 95.5% with false-positive alarm rates of around 1.8%.
This study shows that the DAIT is better in terms of efficacy, accuracy, false-positive
rate and detection rate.

5 Conclusion

An efficient deep analysis mechanism using a bottom-up approach is introduced to


mitigate attacks on VoIP systems. The novel approach operates differently compared to
other machine learning or deep learning algorithmic methods available in the literature.
This IDPS can distinguish the final count of each source IP address in the initial layer
and then processes it backward for further analysis. A deep analysis of different
window sizes for a period of time is carried out in this approach to pin the intruders and
even suspected attackers. The model is tested in an isolated network due to security
reasons and simulation is performed using various metrics. Result analysis clearly
shows that the DAIT prevention model improves reliable voice transmission. This
model has a better detection rate as compared to the latest work conducted.
In our future work, the model will further be tested using higher attack rates. We intend to
modify the algorithm by using the Object-Oriented strategies whereby various DAIT
objects will be created to address chunks of INVITE messages, instead of one algo-
rithm addressing the entire list. In this way, we assume that the memory consumption
and the processing time might be further reduced.

References
1. Ahmad, W., Singh, D.: VoIP Security: A Model Proposed to Mitigate DDoS Attacks on SIP
Based VoIP Network. A Multi-Disciplinary Research Book, pp. 37–48 (2018)
2. Chen, Y., Hwang, K.: Collaborative change detection of DDoS attacks on community and
ISP networks. In: Proceedings of IEEE International Symposium on Collaboration
Technologies and Systems (2006)
3. Azad, A.M., Morla, R., Salah, K.: Systems and methods for SPIT detection in VoIP: survey
and future directions. J. Comput. Secur. 77, 1–20 (2018)
4. Global Insights: Insights to Innovation. https://www.gminsights.com/industry-analysis/
voice-over-internet-protocol-voip-market (2021). Accessed 27 May 2021
5. Bhutani, A., Wadhwani, P.: Voice over Internet Protocol (VoIP) Market Size By Type
(Integrated Access/Session Initiation Protoc. Selbyville (2019)
6. Armoogum, S., Mohamudally, N.: Sorted Galloping prevention mechanisms against denial
of service attacks in SIP-based systems. In: Panigrahi, C.R., Pati, B., Pattanayak, B.K.,
Amic, S., Li, K.-C. (eds.) Progress in Advanced Computing and Intelligent Engineering.
AISC, vol. 1299, pp. 571–583. Springer, Singapore (2021). https://doi.org/10.1007/978-981-
33-4299-6_47
7. Tech.Co Ltd: VoIP Statistics That Prove the Importance of the Business Tech. https://tech.
co/business-phone-systems/voip-statistics (2021). Accessed 15 May 2021
8. Gupta, B.B.: Predicting number of zombies in DDoS attacks using pace regression model.
J. Comput. Inf. Technol. 20(1), 33–39 (2012)
9. Xin, L., Hu, L., Hongbin, L., Xiongwei, X.: Distributed intrusion prevention system for SIP
DDoS attack. J. Chin. Comput. Syst. 34, 2095–2099 (2013)
10. Guo, S., Ran, L., Jing, Y.W.: SIP flood attack detection method based on convolutional
neural network. In: Proceedings of IEEE International Computers, Signals and Systems Conference
(ICOMSSC) (2018)
11. Ehlert, S., Zhang, G., Geneiatakis, D., Kambourakis, G., Dagiuklas, T.: Two layer denial of
service prevention on SIP VoIP infrastructure. J. Comput. Commun. 31, 2443–2456 (2008)
12. Lahmadi, A., Festor, O.: A framework for automated exploit prevention from known
vulnerabilities in voice over IP services. IEEE Trans. Netw. Serv. Manage. 9, 114–127
(2012)
13. Tang, J., Cheng, Y., Yong, H.: Detection and prevention of SIP flooding attacks in voice
over IP networks. In: Proceedings of IEEE INFOCOM, pp 1161–1169 (2012)
14. Azeez, N.A., Bada, T.M., Misra, S., Adewumi, A., Van der Vyver, C., Ahuja, R.: Intrusion
detection and prevention systems: an updated review. In: Sharma, N., Chakrabarti, A., Balas,
V.E. (eds.) Data Management, Analytics and Innovation: Proceedings of ICDMAI 2019,
Volume 1, pp. 685–696. Springer Singapore, Singapore (2020). https://doi.org/10.1007/978-
981-32-9949-8_48
15. Semerci, M., Cemgil, A.T., Sankur, B.: An intelligent cybersecurity system against DDoS
attacks in SIP networks. J. Comput. Netw. 136, 137–154 (2018)
16. Waleed, N., Yasser, H., Wail, E., Tamer, A., Hossam, F.: Efficient detection of attacks in SIP
based VoIP networks using linear L1-SVM classifier. Int. J. Comput. Commun. Control 14
(4), 518–529 (2019)
17. VOIPSA: VoIP Security and Privacy Threat Taxonomy (2005)
18. Federal Communication Commission, Communications Security, Reliability, and Interop-
erability Reports. https://www.fcc.gov/CSRICReports (2021). Accessed 17 May 2021

19. Hussain, I., Djahel, S., Zhang, Z., Naït-Abdesselam, F.: A comprehensive study of flooding
attack consequences and countermeasures in the Session Initiation Protocol (SIP). In:
Security and Communication Networks, vol. 8, no. 18, pp. 4436–4451. ACM & Wiley
(2015)
20. Iancu, B., Babu, M.: SER PIKE excessive traffic monitoring module. http://www.iptel.org/
ser/doc/modules/pike (2003). Accessed 2021
21. Bouzida, Y., Mangin, C.: A framework for detecting anomalies in VoIP networks, pp. 204–
211 (2008)
22. Gaston, O., Nagpal, S., Eilon, Y., Henning, S.: Secure SIP: a scalable prevention mechanism
for DoS attacks on SIP-based VoIP systems. In: Principles, Systems and Applications of IP
Telecommunications, pp. 107–132. Springer, Berlin, Heidelberg (2008)
23. Markl, J., Sisalem, D., Ehlert, S., Geneiatakis, D., Kambourakis, G., Dagiuklas, T.: General
reliability and security framework for VoIP infrastructures. SNOCER-D2.2 (2005)
24. Roesch, M.: Snort – lightweight intrusion detection for networks. In: 13th USENIX Large
Installation System Administration Conference (1999)
25. Ali, A.M., Farooq, M.: Application of evolutionary algorithms in detection of SIP-based
flooding attacks. In: Annual Conference on Genetic and Evolutionary Computation,
pp. 1419–1426. ACM (2009)
26. Armoogum, S., Mohamudally, N.: Survey of Practical Security Frameworks for Defend-
ing SIP Based VoIP Systems against DoS/DDoS Attacks. IEEE Xplore Digital Library
(2014)
27. Chauhan, A., Mahajan, N., Kumar, H., Kaushal, S.: Analysis of DDoS attacks in
heterogeneous VoIP networks: a survey. Int. J. Innovative Technol. Exploring Eng. 8(6),
242–246 (2019)
28. Lee, J., Cho, K., Lee, C.Y., Kim, S.: VoIP-aware network attack detection based on statistics
and behavior of SIP traffic. Peer-to-Peer Netw. Appl. 8, 872–880 (2014)
29. Armoogum, S., Mohamudally, N.: An extended genetic algorithm-based prevention system
against DoS/DDoS flood attacks in VoIP systems. In: Panigrahi, C.R., Pati, B., Pattanayak,
B.K., Amic, S., Li, K.-C. (eds.) Progress in Advanced Computing and Intelligent
Engineering. AISC, vol. 1299, pp. 301–312. Springer, Singapore (2021). https://doi.org/
10.1007/978-981-33-4299-6_25
30. Bad Packets: Meaningful Intelligence for an Evolving Cybersecurity Landscape. https://
badpackets.net/ (2020). Accessed 4 June 2017
Data Mining for Software Engineering:
A Survey

Maisha Maimuna1, Nafiza Rahman1, Razu Ahmed2, and Mohammad Shamsul Arefin1

1 Department of Computer Science and Engineering, Chittagong University of Engineering and Technology, Chattogram 4349, Bangladesh
sarefin@cuet.ac.bd
2 Department of Electronic and Telecommunication Engineering, International Islamic University Chittagong, Chattogram 4318, Bangladesh
razu17@iiuc.ac.bd

Abstract. In this present world of technology, software engineering is needed


almost in every industry, institution etc. On the other hand, data mining pro-
cesses raw data to obtain useful information. By implementing data mining in
software engineering, software quality and productivity can be improved. This
paper examines this fascinating and still advancing research area, so that readers
can easily get an elaborate outline. We review in detail existing techniques of
data mining for software engineering research and provide a comparative
evaluation.

Keywords: Software engineering · Data mining

1 Introduction

Software engineering is one of the buzzwords in the present world. It is required in


almost every industry for more efficient, quicker and easier processing. Over these
years researchers are trying hard to improve the quality of processing of software and
encountering many difficulties. In today’s world the engineers have to deal with larger,
more complex datasets. These datasets are so voluminous that traditional software
engineering algorithms cannot handle them efficiently. For solving these problems, data
mining is introduced in the field of software engineering. Data mining can turn large
data into useful information, can find patterns in data and can handle errors very
efficiently. This paper reviews research works that implemented data mining techniques
in software engineering approaches. A variety of algorithms are reviewed that include
software defect prediction [2, 4, 6, 8, 10, 12, 14], software pattern recognition [11, 13,
15], software development [5, 7] using data mining algorithms. In [2], a classification
algorithm based on Ant Colony Optimization is introduced to detect fault prone soft-
ware modules. The other algorithm for software defect prediction include Hierarchical
Clustering, K-means Clustering and Kohonen’s Neural Network [4], AdaBoost.NC [8],
Multilayer Perceptron Neural Network [10], Graph based Semi-supervised Learning
[12], Ensemble Oversampling [14]. The reviewed algorithms for software pattern
recognition include Recurrent Neural network and decision Tree [11], Rule Learning


on Metric based Data [15]. Other types of algorithms for software development
comprise SVM with Tabu Search [5] and API Usage Mining [7]. A total of 15 papers are
surveyed, as chronicled in Table 1. The major contributions, dataset description and
implementation and evaluation of each paper are described briefly.
The organization of this paper is as follows: Sect. 2 describes the prelimi-
naries of software mining, Sect. 3 discusses the related works in terms of
contributions, datasets, and implementation and evaluation, and lastly Sect. 4 gives a brief
conclusion.

Table 1. Evolution of Data Mining in Software Engineering research

2006-2008:
  2006 Software failure probability predictor [1]
  2007 Software mining using ant colony optimization [2]
  2008 Software engineering using ontological text mining [3]
2009-2011:
  2010 Software defect prediction using clustering and neural network [4]
  2011 Software development using SVM with Tabu search [5]
2012-2014:
  2012 Software defect prediction using multi-classifier modeling [6]
  2013 Software development using API usage mining [7]
  2013 Software defect prediction using AdaBoost.NC [8]
  2014 Software vulnerability detection using probabilistic rule classifier [9]
  2014 Software defect prediction using multilayer neural network [10]
2015-2016:
  2016 Software pattern detection using recurrent neural network and decision tree [11]
  2016 Software defect prediction using graph based semi-supervised learning [12]
2017-2019:
  2017 Software defect prediction based on class-association rules [13]
  2018 Software defect prediction using ensemble oversampling model [14]
  2018 Software pattern detection using rule learning [15]

2 Preliminaries
2.1 Definition
Software Mining incorporates data mining techniques to explore useful knowledge
from big and raw datasets for software pattern recognition, defect prediction and,
ultimately, software development. There are many algorithms for this purpose, for
instance classification, clustering, association rule mining, prediction, pattern detection,
etc.

2.2 Applications
Software development comprises different stages such as requirement specification,
finding design patterns, analyzing source code, etc. Each stage is complex and requires
voluminous data. Data mining algorithms make these steps less complex by
minimizing human intervention and enabling the discovery of useful patterns and
knowledge from software engineering data.

2.3 Challenges
In recent years, employing data mining algorithms for software engineering has
evolved to a great extent. This has also introduced some challenges. The main challenge
is the maintainability and preprocessing of such huge raw data. The success of
any algorithm greatly depends on the proper preprocessing of data. The domain and
background knowledge, constraints etc. should be integrated properly in this process.

3 Review Details

In this section, we discuss the various data mining techniques implemented
for software defect prediction, pattern recognition, etc. A total of 15 papers are reviewed in
this regard, covering the major contributions, dataset properties and implemen-
tation and evaluation techniques for each of these papers.

3.1 Software Failure Probability Predictor [1]

Contributions. The proposed method in this paper identifies those pieces of software
that tend to fail most. The major contributions of this paper include developing failure
predictors for post-release defects. For this purpose the predictors look over the object-
oriented metrics. Moreover, it examines if predictors derived from one project can be
applied to other ones.
Dataset. The projects used in this paper have large team size of 250 engineers as well
as large user base. For example, DirectX, a segment of Windows operating system, has
600 million users approximately.

Implementation and Evaluation. In this paper an algorithm is developed to deter-
mine failure-prone software entities from pieces that failed in the past. Bug databases
contain this type of information. The implementation process consists of collecting
input data, mapping post-release failures in entities to defects and predicting failure
probability for new entities. The authors used a regression model for the evaluation process.
R², adjusted R² and the F-test were used for measuring prediction accuracy.
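For reference, the first two of those accuracy measures can be computed as below (a generic sketch, not the authors' code; y are observed values, y_hat the regression predictions, and p the number of predictors; the F-test is omitted):

    import numpy as np

    def r2_scores(y, y_hat, p):
        """R^2 and adjusted R^2 for a fitted regression with p predictors."""
        y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
        ss_res = ((y - y_hat) ** 2).sum()       # residual sum of squares
        ss_tot = ((y - y.mean()) ** 2).sum()    # total sum of squares
        r2 = 1 - ss_res / ss_tot
        n = len(y)
        return r2, 1 - (1 - r2) * (n - 1) / (n - p - 1)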

3.2 Software Mining Using Ant Colony Optimization [2]

Contributions. In this paper, a classification technique AntMiner+ based on Ant


Colony is developed to predict faulty software modules. The predictive accuracy of
the AntMiner+ algorithm is greater than that of other classification models like logistic
regression, C4.5 and support vector machines. Moreover, its intuitiveness and com-
prehensibility are higher than those of the latter models.
Dataset. To implement the AntMiner+ algorithm, three publicly accessible datasets of
NASA software projects [16] are used. These are PC1, PC4 and KC1. The first two
projects are coded in C but the latter one is coded in C++. PC1, PC4 and KC1 contain
40000, 36000 and 43000 lines of code (LOC) respectively. Both PC1 and PC4 are
flight software projects intended for an earth orbiting satellite whereas KC1 is a soft-
ware project used as a subsystem for a large ground control system.
Implementation and Evaluation. The original datasets are split up into training set
and testing set in a stratified proportion of 70%/30%. For validation purpose one third
of the training dataset is used. During implementation, data preprocessing is imple-
mented for input selection, discretization and oversampling. The results of AntMiner+
are compared with other commonly used classification algorithms – RIPPER, C4.5,
logistic regression, 1-nearest neighbor and support vector machine. C4.5 extracts an
unordered rule set with a confidence factor of 0.25 while AntMiner+ extracts an
ordered rule set.

3.3 Software Engineering Using Ontological Text Mining [3]

Contributions. In this research, a text mining algorithm is proposed to analyze the


software documents at semantic level. This paper presents an advanced strategy that
derives knowledge from the analysis of source code to implement the knowledge in
practical use. Using ontology queries and automated reasoning the software engineers
can derive applicative knowledge from various resources.
Dataset. This ontological text mining system proposed in this paper uses the
component-based GATE (General Architecture for Text Engineering) framework. This
system utilizes the GATE’s standard tools and custom components which is developed
particularly for mining software text. Moreover, information is extracted from source
code and document to be used in source code ontology and document ontology
respectively.

Implementation and Evaluation. In this paper, the authors developed ontology based
program for utilizing the semantic and structural information in various software
artifacts. The implementation steps are – preprocessing, ontology initialization, name
entity detection, coreference resolution, normalization, relation detection and ontology
export. The researchers evaluated the developed approach on two groups of texts. The
first one consists of five documents containing 7743 words from the Java 1.5 docu-
mentation and the second one consists of a set of seven documents having 3656 words from
the documentation of the uDig geographic information system (GIS).

3.4 Software Defect Prediction Using Clustering and Neural Network [4]

Contributions. The major contribution of this research is that the authors performed
clustering on software projects for identifying project groups that hold similar software
defect characteristics. Hierarchical, k-means clustering and Kohonen’s neural network
algorithms are implemented for this purpose. The derived clusters are evaluated with
discriminant analysis.
Dataset. Mainly 27 released versions of 6 software projects were used as dataset. Five
of the projects were custom-built solutions installed successfully in the customer
environment. The 6th project was developed for software quality assurances. In
addition, 17 academic software projects were also studied.
Implementation and Evaluation. In implementation, hierarchical and k-means
clustering were used in this research to obtain the project clusters. Kohonen’s neural
network was also developed with different numbers of output neurons. A defect
prediction model was created for each of the identified clusters. For training, the
researchers built a general defect prediction model using the data of all the mentioned
released projects. The obtained results of this research are not fully satisfactory, as the
identified clusters are outlying, i.e., they do not cover all software projects.

3.5 Software Development Using SVM with Tabu Search [5]

Contributions. In this paper a method is proposed for software development effort


estimation. The researchers obtained the parameters of the Support Vector Regression
(SVR) algorithm and of the RBF kernel function by designing a Tabu Search (TS). The
optimal SVR parameters were derived successfully by using TS. The combined TS and
SVR algorithm notably surpassed the other algorithms.
Dataset. In this research, different types of public datasets were used from PROMISE
repository [17] and Tukutuku database. The PROMISE included 13 datasets from
single as well as cross-companies and the Tukutuku datasets included 13 Web projects.
Implementation and Evaluation. The algorithm has two main steps: SVR parameter
selection and effort estimation. In the first step, datasets of past projects were taken as input
to the Tabu search. In the second step the SVR model was implemented [18]. The model had
two inputs: data on a new software project and the output of TS. Finally, we obtain the

effort estimation. To examine the effectiveness of the proposed effort prediction


algorithm, 10-fold cross-validation was performed in a total of 21 datasets. The pro-
posed SVR+TS approach performed remarkably better than other existing ones like
SVR with random configuration, Grid-search, Case-based reasoning (CBR) etc.
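The Tabu Search itself is beyond a short sketch, but the inner scoring step can be illustrated as follows, assuming a feature matrix X and effort vector y (scikit-learn is used purely for illustration; it is not necessarily the authors' toolchain):

    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVR

    def score_candidate(X, y, params):
        """Mean absolute error of an RBF-kernel SVR under 10-fold CV; a
        meta-heuristic such as Tabu Search would minimise this value over
        the (C, epsilon, gamma) parameter space."""
        model = SVR(kernel="rbf", **params)
        return -cross_val_score(model, X, y, cv=10,
                                scoring="neg_mean_absolute_error").mean()

    # e.g. one candidate configuration proposed by a search move:
    # score_candidate(X, y, {"C": 10.0, "epsilon": 0.1, "gamma": 0.05})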

3.6 Software Defect Prediction Using Multi-classifier Modeling [6]

Contributions. This study presents an advanced method to predict faulty software


modules by addressing the binary class-imbalance problem. At first, the skewness
was removed by converting the class-imbalance problem into a multi-class classification. In addition, for
solving the multi-class classification problem three different coding schemes are imple-
mented. One of the major contributions of this paper is that the proposed method can
handle highly imbalanced data effectively.
Dataset. The datasets for this study are 14 publicly available NASA datasets that are
used widely for predicting software defects. Each dataset includes the number of
defects in software modules and the static code metrics such as LOC (lines of code)
counts, Halstead attributes, McCabe complexity measures etc. In addition, the maxi-
mum and minimum numbers of total software components of these 14 datasets are
17186 and 125 respectively.
Implementation and Evaluation. The researchers used four different classification
algorithms: Random Forest, C4.5, Ripper and Naive Bayes in this study. Three coding
schemes including 1-against-1, 1-against-all and Random Correction Code were
implemented to create binary-class data. The proposed method was compared with
some existing methods for instance, sampling, bagging, boosting and cost sensitive
learning. It significantly outperforms random under-sampling, random over-sampling,
synthetic minority over-sampling, cost-sensitive learning and boosting methods and
attains similar results to the bagging method.

3.7 Software Development Using API Usage Mining [7]

Contributions. The researchers proposed two standard metrics (succinctness and


coverage) for mining usage patterns of Application Programming Interface
(API) methods. In addition, an inventive approach named Usage Pattern Miner (UP-
Miner) was also developed that mines succinct and high-coverage API usage patterns
from source code.
Dataset. The large Microsoft codebase is used as the dataset in this paper. 20 widely
used .NET API methods are also selected, which are used in enterprise applications and
online service systems at present.
Implementation and Evaluation. The model is implemented in three main steps:
mining frequent API usage patterns, determination of the optimal number of patterns
and tool implementation. This approach is evaluated on Microsoft codebase. The
results reported in the paper indicate that the proposed algorithm performs better than the
existing method MAPO. The proposed method is proven to be effective in practice, as
confirmed by user studies supervised by Microsoft developers.
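UP-Miner itself clusters whole API call sequences and scores patterns by succinctness and coverage; the toy sketch below (with made-up method names) only illustrates the underlying frequent-usage idea on adjacent call pairs:

    from collections import Counter

    def frequent_call_pairs(sequences, min_support=2):
        """Count adjacent API-call pairs across usage sequences and keep
        those meeting a support threshold (a crude stand-in for the
        pattern-mining step)."""
        pairs = Counter(p for seq in sequences for p in zip(seq, seq[1:]))
        return {p: c for p, c in pairs.items() if c >= min_support}

    print(frequent_call_pairs([["Open", "Read", "Close"],
                               ["Open", "Read", "Read", "Close"]]))
    # {('Open', 'Read'): 2, ('Read', 'Close'): 2}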

3.8 Software Defect Prediction Using AdaBoost.NC [8]

Contributions. In this paper, the researchers showed how class imbalance learning
methods can find better solutions in the field of software defect prediction. Different
class imbalance learning methods, for instance threshold moving, ensemble algorithms
and resampling techniques, were explored. Among the methods, AdaBoost.NC had the
highest performance with respect to measures like G-mean, balance, and Area Under
the Curve (AUC). Moreover, a dynamic version of AdaBoost.NC was proposed for
improved performance. This improved version has the advantage that it updates its
parameters automatically during the training period and was found to be more efficient and
effective than the initial algorithm.
Dataset. In this paper ten SDP datasets were used that can be easily obtained from
the PROMISE [19] repository. Moreover, the datasets have defective modules ranging
from 6.94% to 32.29%.
Implementation and Evaluation. For implementation, a 10-fold split was applied to
the dataset, with 9 folds used as training data and 1 fold used as
testing data. A performance report was obtained from the training data. To evaluate the
proposed method, a total of five performance metrics including PD, PF, balance, G-mean,
and AUC were used. The results illustrate that AdaBoost.NC has the highest per-
formance overall. The proposed dynamic version of AdaBoost.NC has the added advan-
tage of reduced training time.
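For reference, G-mean and balance are commonly defined from the probability of detection (PD) and the probability of false alarm (PF); a sketch under those common definitions (which may differ in detail from the paper's):

    from math import sqrt

    def gmean_balance(pd, pf):
        """G-mean = sqrt(PD * (1 - PF)); balance = 1 minus the normalised
        Euclidean distance to the ideal point (PF, PD) = (0, 1)."""
        return sqrt(pd * (1 - pf)), 1 - sqrt(pf ** 2 + (1 - pd) ** 2) / sqrt(2)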

3.9 Software Vulnerability Detection Using Probabilistic Rule Classifier


[9]

Contributions. In recent times many software systems are developed using application
development frameworks (ADF). The authors used this information to develop an
approach in this study, which allows researchers to raise the abstraction level for
detecting software vulnerabilities. The prime contribution of this paper is that the
researchers developed a probabilistic rule classifier to efficiently find security vulner-
abilities. Moreover, the advantages of ADFs for security vulnerability detection are illustrated
experimentally.
Dataset. This paper uses open-source software repositories as its dataset. These include
more than 7 million lines of code. Moreover, the researchers used Java and Android
applications for evaluating their proposed approach.
Implementation and Evaluation. The proposed method was implemented in two
main steps: a) rule ranking and b) efficient vulnerability analysis. In step (a) the static analysis
tool set takes two inputs – the categorized repository and the vulnerability decision rules. The
output of this step is a rule ranking, which is given as input to the rule selector. Then from
step (b) we obtain the detected vulnerabilities. The efficiency of static analysis for Android
apps was improved by 68%, while maintaining the rate of vulnerability detection at
100%. But for plain Java applications the results were not as significant as the former,
as plain Java applications do not require any ADF. This resulted in an overall
efficiency improvement of 37%, while the rate of vulnerability detection was 96%.

3.10 Software Defect Prediction Using Multilayer Neural Network [10]

Contributions. In this study, an advanced Multilayer Perceptron Neural Network is


proposed for predicting software defects. It is based on machine learning techniques.
The researchers claimed that their proposed MLP neural network (MLP-NN) algorithm
is the most efficient one for predicting software defects.
Dataset. KC1 data set is used in this research which is collected from the NASA’s
Metric Data Program (MDP) data repository. KC1 includes logical groups of computer
software components (CSCs) containing 43,000 LOC (lines of code) written in C++.
The dataset is made up of 2,107 instances (modules) among which 1,782 have no faults
while 325 have one or more faults.
Implementation and Evaluation. The proposed MLP-NN model is a modified ver-
sion of the existing one. It introduces a bell activation function in the hidden layers: the
first hidden layer uses a tanh activation function while the second one uses a bell activation
function. In evaluation, the researchers used 10-fold cross-validation for generating the
training sets and testing sets. 65% of the overall data were used as the training set and
the remaining 35% as the test set. The results illustrate that the proposed MLP-NN
model has greater classification accuracy (98.22%) than Random Tree (94.55%), Logistic
Regression (95.67%), CART (96.79%) and the existing MLP-NN model (94.28%).
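The exact bell function is not spelled out in the summary above; assuming a Gaussian bell for illustration, the forward pass of such a two-hidden-layer network can be sketched as:

    import numpy as np

    def bell(x):
        """Gaussian 'bell' activation (an assumption for illustration)."""
        return np.exp(-x ** 2)

    def mlp_forward(x, w1, b1, w2, b2, w3, b3):
        """tanh in the first hidden layer, bell in the second, and a
        sigmoid output read as the probability that a module is faulty."""
        h1 = np.tanh(x @ w1 + b1)
        h2 = bell(h1 @ w2 + b2)
        return 1.0 / (1.0 + np.exp(-(h2 @ w3 + b3)))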

3.11 Software Pattern Detection Using Recurrent Neural Network


and Decision Tree [11]

Contributions. In this research, software pattern recognition is implemented using


machine learning algorithms like Layer Recurrent Neural Network (LRNN) and
Decision Tree. The higher accuracy of the proposed method is the result of prepro-
cessing the dataset to cut down the candidate patterns. The learning models
developed in this study also improve the classification parameters for the recognition of
design patterns.
Dataset. In order to perform the detection of software patterns, a total of 67 software
metrics were extracted using the JBuilder tool. The source code of JHotDraw is given as input
to pattern detection tools for extracting pattern instances.
Implementation and Evaluation. After preparing dataset the researchers imple-
mented the proposed method. The steps are – recognition of software design pattern,
learning of metric-based feature vector, design pattern recognition process and vali-
dation for result conformance. For evaluation, the authors used a pattern repository

P-MARt, which contains pattern instances for 9 open-source software systems like JHotDraw.
In this study JHotDraw 7.0.6 is used as it is rich in classes. Moreover, for
measuring the accuracy of the detected patterns, precision, recall and F-measure were used. The test
accuracy of LRNN was 100% while the decision tree showed a test accuracy of 97.7% for a
particular software pattern.

3.12 Software Defect Prediction Using Graph Based Semi-supervised


Learning [12]

Contributions. In this study, an inventive algorithm is proposed for predicting soft-


ware defects. It implements label-propagation-based semi-supervised learning. The
authors are among the first to introduce the sparse representation technique in this research area.
For improving the prediction ability, a nonnegative sparse graph based label propa-
gation (NSGLP) method is designed. NSGLP makes use of the Laplacian score sam-
pling technique to develop a class balance labeled training dataset.
Dataset. In this experiment, the authors took ten datasets as the test data from NASA
Metrics Data Program (MDP). NASA benchmark datasets were used extensively as
they are publicly available.
Implementation and Evaluation. At first, the researchers constructed the dataset by
using the Laplacian score sampling technique. Then they identified clusters by imple-
menting a nonnegative sparse algorithm which computed the nonnegative sparse
weights of a relationship graph. In the end, a label propagation approach was imple-
mented on the nonnegative sparse graph to predict the labels of unlabeled software
modules. For evaluation, the true positive (TP), false negative (FN), false positive (FP),
and true negative (TN) counts are used to compare performance with other methods. The
evaluations on 10 NASA datasets illustrate that the proposed NSGLP method has
better performance.
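As an illustration of the final step, here is a generic label-propagation sketch in Python, not the authors' NSGLP code: labels of a few known modules are spread over a nonnegative similarity graph via the classic iteration F ← αSF + (1 − α)Y. The toy graph, α, and iteration count are all assumptions.

```python
# Generic graph-based label propagation (illustrative, not NSGLP itself):
# known module labels are spread over a nonnegative weight graph.
import numpy as np

def label_propagation(W, Y, alpha=0.9, n_iter=100):
    """W: (n, n) nonnegative similarity graph; Y: (n, 2) one-hot labels
    (all-zero rows for unlabeled modules)."""
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))      # symmetric normalisation
    F = Y.astype(float).copy()
    for _ in range(n_iter):
        F = alpha * S @ F + (1 - alpha) * Y
    return F.argmax(axis=1)              # predicted class per module

# Toy graph: 4 modules; modules 0 and 3 are labeled (non-defective/defective).
W = np.array([[0.0, 1.0, 0.2, 0.0],
              [1.0, 0.0, 0.3, 0.0],
              [0.2, 0.3, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])
Y = np.array([[1, 0], [0, 0], [0, 0], [0, 1]])
print(label_propagation(W, Y))           # e.g. [0 0 1 1]
```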

3.13 Software Defect Prediction Using Class-Association Rules [13]

Contributions. This paper introduces an inventive approach for identifying defective
software by using Class-Association Rules (SDP-CAR). The performance of SDP-
CAR is satisfactory compared to other existing algorithms, but more work is needed
to widen the performance margin.
Dataset. Four NASA datasets from the PROMISE repository were used in this experiment,
namely CM1, KC1, KC2 and PC1. All four datasets come from projects written in C or C++.
Implementation and Evaluation. The proposed algorithm is implemented in three
main steps: data preprocessing, predictor development and performance evaluation.
Data preprocessing is divided into three basic steps: normalization, partitioning and
discretization. Similarly, predictor development also has three sub-steps: rule gener-
ation, rule pruning and rule scoring. For evaluation, the authors used accuracy, AUC,
sensitivity and specificity. The proposed algorithm was compared with the C4.5, NB, Ripper
and OneR algorithms and showed the best results for the CM1, PC1 and AR datasets.

3.14 Software Defect Prediction Using Ensemble Oversampling Model [14]

Contributions. An ensemble model for predicting software defects is proposed in this
paper which takes into account the class imbalance problem in practical software
datasets. This model can help to restore imbalanced data by integrating methods such as
Random Oversampling, the Majority Weighted Minority Oversampling Technique and
Fuzzy-Based Feature and Instance Recovery into a combined approach.
Dataset. The datasets used in this research were collected from the PROMISE
repository of software engineering databases [20]. The minimum imbalance ratio is 3.50,
indicating slightly imbalanced data, while the maximum ratio is 45.56, indicating highly
imbalanced data. Moreover, the number of samples varies from 36 to 17,186 between the
smallest and the largest datasets.
Implementation and Evaluation. A hybrid of oversampling techniques is proposed
in this paper, implemented in two steps: (i) multiple oversampling and generation of
training data for the ensemble classifier, and (ii) training the individual base
learners for ensemble classification. The proposed ensemble approach is compared
with existing ones, including the original datasets without sampling (ORI), Random
Oversampling (ROS), FIDos, MWMOTE/MWM and the fuzzy information decomposi-
tion based sampling technique. The results illustrate that the proposed approach can
reduce the false negative (FN) rate compared to the mentioned existing approaches.
Moreover, it can detect faulty software components more precisely.
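As a hedged illustration of the simplest ingredient above, Random Oversampling, the following numpy sketch resamples minority-class modules with replacement until the classes are balanced; the feature matrix and labels are synthetic stand-ins.

```python
# Minimal random-oversampling step: defective (minority) modules are
# resampled with replacement until the training set is balanced.
import numpy as np

def random_oversample(X, y, rng=np.random.default_rng(0)):
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    X_parts, y_parts = [], []
    for c in classes:
        idx = np.flatnonzero(y == c)
        extra = rng.choice(idx, size=n_max - idx.size, replace=True)
        keep = np.concatenate([idx, extra])
        X_parts.append(X[keep])
        y_parts.append(y[keep])
    return np.vstack(X_parts), np.concatenate(y_parts)

X = np.arange(20).reshape(10, 2)         # 10 modules, 2 metrics (synthetic)
y = np.array([0] * 8 + [1] * 2)          # imbalance ratio 4:1
X_bal, y_bal = random_oversample(X, y)
print(np.bincount(y_bal))                # [8 8]
```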

3.15 Software Pattern Detection Using Rule Learning [15]

Contributions. In this study, software design patterns are detected by using software
metrics and classification-based techniques. The major contribution of this paper is that
it aims to overcome the problems with variants by developing a method that maps the
software pattern detection process into a learning process.
Dataset. Three open source projects were used in this paper, QuickUML, JHotDraw
and JUnit, having sizes of 3956 × 269, 79 × 269, and 64 × 269 respectively. The
ratio of training to testing data is 80%/20%. This partition ratio contributed to
achieving greater prediction accuracy. Moreover, 67 object-oriented metrics are used in
developing the metrics-based dataset.
Implementation and Evaluation. The study in this paper is conducted in two main
steps: creation of a metrics-oriented dataset and detection of software design patterns.
The first step has three sub-steps: software pattern definition, identifying pattern par-
ticipants and formation of the dataset. The second step has three sub-steps: preprocessing
of the dataset, rule learning on metrics-based data and design pattern mining. For
evaluation, the authors applied 10-fold cross-validation with 70% training sets and
30% validation sets, as sketched below. A comparative analysis presented in the paper
shows that the proposed approach performs better than two previously developed methods.
It was also discovered that the proposed approach had notably better precision, recall, and
F-measure outputs.
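Below is a brief, hypothetical sketch of that evaluation protocol in Python: 10-fold cross-validation of a classifier on a 67-metric dataset, with scikit-learn's DecisionTreeClassifier standing in for the paper's rule learner and a synthetic feature matrix standing in for the real data.

```python
# Sketch of the evaluation protocol: 10-fold CV of a classifier on a
# metrics-based dataset. Data and the choice of learner are illustrative.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 67))           # 200 candidate roles x 67 OO metrics
y = rng.integers(0, 2, size=200)         # pattern participant: yes/no

scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y,
                         cv=10, scoring="f1")
print(scores.mean())
```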

4 Conclusion

Less than a decade ago, developers began to pay increasing attention to improving
software quality using data mining. The popularity of this field is mainly due to its
ability to identify software patterns, detect software faults etc. in a large database. This
paper provides a survey of existing approaches for implementing data mining algo-
rithms in software engineering. We outlined the major contributions, dataset properties,
and implementation and evaluation for a total of 15 papers. The surveyed techniques will
give researchers a clear direction regarding data mining approaches for software
defect detection, software pattern recognition and software development.

References
1. Nagappan, N., Ball, T., Zeller, A.: Mining metrics to predict component failures. In: 28th
International Conference on Software Engineering, pp. 452–461. Association for Computing
Machinery, New York (2006)
2. Vandecruys, O., Martens, D., Baesens, B., Mue, C., Backer, M.D., Haesen, R.: Mining
software repositories for comprehensible software fault prediction models. J. Syst. Softw. 81
(5), 832–839 (2008)
3. Witte, R., Li, Q., Zhang, Y., Rilling, J.: Text mining and software engineering: an integrated
source code and document analysis approach. J. Eng. 2(1), 3–16 (2008)
4. Jureczko, M., Madeyski, L.: Towards identifying software project clusters with regard to
defect prediction. In: 6th International Conference on Predictive Models in Software
Engineering, pp. 1–10 (2010)
5. Corazza, A., Martino, S.D., Ferrucci, F., Gravino, C., Sarro, F., Mendes, E.: Using Tabu
search to configure support vector regression for effort estimation. Empir. Software Eng. 18,
506–546 (2011)
6. Sun, Z., Song, Q., Zhu, X.: Using coding based ensemble learning to improve software
defect prediction. IEEE Trans. Syst. Man Cybern. 42(6), 1806–1817 (2012)
7. Wang, J., Dang, Y., Zhang, H., Chen, K., Xie, T., Zhang, D.: Mining succinct and high-
coverage API usage patterns from source code. In: 10th Working Conference on Mining
Software Repositories, pp. 319–328. San Francisco (2013)
8. Wang, S., Yao, X.: Using class imbalance learning for software defect prediction. IEEE
Trans. Reliab. 62(2), 434–443 (2013)
9. Sadeghi, A., Esfahani, N., Malek, S.: Mining the categorized software repositories to
improve the analysis of security vulnerabilities. In: Gnesi, S., Rensink, A. (eds.) FASE 2014.
LNCS, vol. 8411, pp. 155–169. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-
642-54804-8_11
10. Gayathri, M., Sudha, A.: Software defect prediction system using multilayer perceptron
neural network with data mining. Int. J. Recent Technol. Eng. 3(2) (2014)

11. Dwivedi, A.K., Tirkey, A., Ray, R.B., Rath, S.K.: Software design pattern recognition using
machine learning techniques. In: IEEE Region 10 Conference (TENCON), pp. 222–227.
IEEE Press, Singapore (2016)
12. Zhang, Z., Jing, X., Wang, T.: Label propagation based semi-supervised learning for
software defect prediction. Autom. Softw. Eng. 24, 47–69 (2016)
13. Shao, Y., Liu, B., Li, G., Wang, S.: Software defect prediction based on class-association
rules. In: 2nd International Conference on Reliability Systems Engineering (ICRSE), pp. 1–
5. IEEE, China (2017)
14. Huda, S., et al.: An ensemble oversampling model for class imbalance problem in software
defect prediction. IEEE Access. 6, 24184–24195 (2018)
15. Dwivedi, A.K., Tirkey, A., Rath, S.K.: Software design pattern mining using classification-
based techniques. Front. Comput. Sci. 12, 908–922 (2018)
16. NASA Software Project Dataset. http://mdp.ivv.nasa.gov. Accessed 2 Dec 2004
17. PROMISE: Repository of Empirical Software Engineering Data. http://promisedata.org/
repository (2011)
18. Hsu, C., Chang, C., Lin, C.: A practical guide to support vector classification. http://www.
csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf (2010)
19. Boetticher, G., Menzies, T., Ostrand, T.J.: Promise Repository of Empirical Software
Engineering Data. http://promisedata.org/repository
20. Shirabad, J.S., Menzies, T.J.: The PROMISE repository of software engineering databases.
School Inf. Technol. Eng. http://promise.site.uottawa.ca/SERepository. Accessed 12 Apr
2018
21. Intelligent computing optimization. In: Conference Proceedings ICO. Springer, Cham
(2018). ISBN 978-3-030-00978-6 https://www.springer.com/gp/book/9783030009786
22. Intelligent computing and optimization. In: Proceedings of the 2nd International Conference
on Intelligent Computing and Optimization 2019 (ICO 2019), Springer International
Publishing (2019). ISBN 978-3-030-33585-4 https://www.springer.com/gp/book/
9783030335847
23. Intelligent computing and optimization. In: Proceedings of the 3rd International Conference
on Intelligent Computing and Optimization 2020 (ICO 2020). https://link.springer.com/
book/10.1007/978-3-030-68154-8
Simulation of Load Absorption and Deflection
of Helical Suspension Spring: A Case of Finite
Element Method

Rajib Karmaker1(&), Shipan Chandra Deb Nath1, and Ujjwal Kumar Deb2
1 Department of Mathematics, Premier University, Chattogram, Bangladesh
rajib.math@puc.ac.bd
2 Department of Mathematics, Chittagong University of Engineering and
Technology, Chattogram, Bangladesh
ukdebmath@cuet.ac.bd

Abstract. Suspension is an essential component of automobiles, which is
responsible for ensuring vehicle stability and driving comfort. The spring's
rigidity and the presence of a suspension system have a major impact on wheel
load. Helical springs are the most common form of spring found in wheeled
vehicles' suspension systems. In this study, a helix curve is used to design the
geometry of the helical spring, while a harmonic oscillator estimation is used
to design the geometry of the wave spring. To compare their performance and
minimize the suspension's drawbacks, a static and dynamic analysis is per-
formed using the Finite Element Method. The analysis compares the stresses
and deformations of helical torsion springs made of iron, silicon steel, and
titanium. In each loading condition, the final values of total deformation and
von Mises stress are recorded and compared. Finally, silicon steel springs can
replace conventional helical springs and shock absorbers for both static and
dynamic load situations.

Keywords: Helical suspension springs · Finite Element Analysis · Static
analysis · Crack detection

1 Introduction

Springs are mechanical dampers that store and transmit energy; in suspension systems
they are typically helical compression springs. They are used in a variety of applications
that demand dependable flexibility. A helical spring is an elastic body that deforms under
load and returns to its original shape once the load is released. Springs are flexible machine
components that deform greatly when loaded, allowing them to store and recover
kinetic energy. When a wheel in a vehicle suspension collides with an impediment, the
spring allows the wheel to move past the barrier and then returns it to its original position



[1]. In terms of passenger comfort, active suspension system mechanisms are inadequate.
The current trend in the business is to lower the weight of every component of an electric
vehicle, which is crucial for enhancing an electric wheeler’s battery performance. Com-
pression springs are often made of round wire twisted into a helix and can be cylindrical,
conical, tapered, curved, or flat in shape. Cracks are a significant indicator of an infras-
tructure’s safety state. Crack was among the most common damages that structural com-
ponents experience during their operational lifetime, and it can lead to a huge reduction in
flexibility and, in rare cases, even catastrophic failure. As a result, crack identification and
analysis of broken structures are of interest to researchers. Modifications in local stiffness,
which have a substantial impact on natural frequencies and dynamic model, can be used to
indicate the existence of cracks in material structures [2]. When a helical compression spring
in a suspension system is exposed to axially compression load, it absorbs the energy while
also recovering the starting spot of a part once the force is released.

1.1 Literature Review


According to Anderson’s [3] solution, utilization of energy absorption and composites,
are needed to improve crash protection in vehicles, the design and material properties
of a component determine its ability to absorb energy. Evans and Morgan [4] believed
that bumper system developments will be necessary to create creative goods to satisfy
into the reduced packing areas while still exceeding vehicle effectiveness and cost
standards. It was advised that new Expanded Polypropylene (EPP) foam technologies
and techniques can be develop. L. Del Liano Vizcaya [5] studied the production
process of structural springs and found that tensile residual strains form on the internal
coil surface, resulting in significant reductions in spring strength and service life. Youli
Zhu, Yanli Wang et al. [6] analyzed why a compressive coil spring fractured in service at the
transition position from the bearing coil to the first active coil, even though the
minimal stress there should always be lower than at a fully active coil's inner coil
position. R. Puff et al. [7] investigated the effect of non-metallic contaminants on the
inevitable rejection of a helical spring under existing design stresses during service.
Different damage criteria were used to forecast the crack initiation life. Sid Ali Kaoua,
Kamel Taibi [8] applied 3D geometric modeling and Finite Element Analysis (FEA) to
examine the mechanical behavior of a dual helical spring under tensile axial force. The
helical shape geometry, on which the FEM model was built, is created using computer-aided
design (CAD) tools. Stefanie Stanzl-Tschegg [9] studied metals' and alloys' very
high cycle fatigue characteristics. The findings in the areas of crack development, tiny
crack growth, long crack propagation, and thresholds are described in their review, as
well as the concepts and testing procedures for very-high-cycle fatigue tests. The elastic
device proposed by J.M. Chacón and A. González Rodríguez [10] comprises two
antagonistic quasi-springs that work in bending under significant dis-
placements. Due to this geometric non-linearity, the global rigidity of the actuation was
adjusted by altering the shape of the leaf springs. A finite element optimization pro-
gram is used in the study of S. Kilian, U. Zander, and F.E. Talke [11] to improve the
design of suspensions in hard disk drives. They validated the model by comparing
modal analysis to experimental resonance data. Because the shot peening procedure
utilized to impose residual compressive forces on the surface was insufficient, the

spring failed early. Excessive oxide inclusions in the steel may have also exacerbated
the situation [12]. G. Harinath Gowd and E. Venugopal Goud [13]
proposed a "static analysis of flexure," which is employed in vehicle suspension systems.
The advantage of a leaf spring over a coil spring is that the spring's ends can be
guided along a certain path as it deflects, acting as a structural member as well as
an energy absorber. Suspension system coil springs, their essential stress distribution,
material characteristics, manufacture, and common failures are discussed by Y. Pra-
woto, M. Ikeda, S.K. Manville, and A. Nishikawa [14]. Gajendra Singh Rathore and
Upendra Kumar Joshi [15] reviewed the research to provide data on helical com-
pression springs and concluded that the Finite Element Method is the best approach for
the numerical solution and calculation of helical compression springs.
A helical torsion spring utilized in a vehicle was investigated in this study. Ana-
lytical and Finite Element Analysis are used to check the spring's deformation
and load behavior. Stress analysis is crucial in helical coil compression springs for the
shear force and deflection induced in the spring at peak loading conditions. Since
these springs are exposed to varying loads throughout their service life, their energy
limit has been calculated. The use of composite materials in a helical coil sus-
pension system is demonstrated in this research. Using a resonance model, COMSOL
Multiphysics software is used to simulate cracked and uncracked steel, iron, and
titanium composite materials. It was established that load rate affects fatigue life and
strain rate. Finally, we found that steel bodies deflect more than iron bodies, that
titanium can withstand more weight than iron, and that titanium can be bent without
breaking, for both cracked and uncracked bodies.

2 Methodology

A coil spring is designed and analyzed with COMSOL Multiphysics software. Here, the
spring behavior is observed by applying loads for different materials in order to optimize
stresses, and the results suggest the best material. Pitch is calculated as the free height of
the spring divided by the number of turns. We consider a spring with an outer coil diameter
of 87 mm and create a solid model of the helical spring as shown in Fig. 1.

2.1 Analytical Method


Initially, when designing a helical spring, various dimensions are examined. The
forces on the body are decomposed into a twisting moment PD/2 in the radial plane and a
direct axial shearing force P, where DO is the coil's outer diameter and D is the coil's
mean diameter. The stresses created by the twisting moment are considered first,
followed by the stresses created by the shear force. When subjected to a load, however,
the spring deforms in accordance with the load, so it is necessary to know the bending
moment and shear stress under the applied load:
$$L(l_H) = \sqrt{\frac{D^2 n^2}{46D + 102l}}, \qquad n = \frac{L(46D + 102l)}{D}, \qquad d = \frac{1}{n}$$

Fig. 1. Geometry and applied load on helical spring

2.2 Suspension of Spring Using Finite Element Method


We considered a helical coil cracked and intact spring of three different materials (Iron,
Silicon Steel and Titanium) then the model imported to analysis using FEM and the
governing equation can be written as,

$$\rho \frac{\partial^2 \mathbf{u}}{\partial t^2} = \nabla \cdot \mathbf{S} + \mathbf{F}_V \tag{1}$$

For linear buckling, u is constant in time, so

$$\nabla \cdot \mathbf{S} + \mathbf{F}_V = \mathbf{0} \tag{2}$$

where

$$\mathbf{S} = \mathbf{S}_{ad} + \mathbf{C}\left(\boldsymbol{\varepsilon} - \boldsymbol{\varepsilon}_{inel}\right), \qquad \mathbf{S}_{ad} = \mathbf{S}_0 + \mathbf{S}_{ext} + \mathbf{S}_q, \qquad \boldsymbol{\varepsilon}_{inel} = \boldsymbol{\varepsilon}_0 + \boldsymbol{\varepsilon}_{ext} + \boldsymbol{\varepsilon}_{th} + \boldsymbol{\varepsilon}_{hs} + \boldsymbol{\varepsilon}_{pl} + \boldsymbol{\varepsilon}_{cr}$$

$$\boldsymbol{\varepsilon} = \frac{1}{2}\left(\left(\nabla \mathbf{u}\right)^T + \nabla \mathbf{u}\right) \tag{3}$$

For the element stiffness matrix we have,

$$k \cdot x = F \;\Rightarrow\; \begin{bmatrix} k & -k \\ -k & k \end{bmatrix}\begin{bmatrix} x_i \\ x_j \end{bmatrix} = \begin{bmatrix} F_i \\ F_j \end{bmatrix}$$

For the first element the stiffness relation becomes,

$$\begin{bmatrix} k_1 & -k_1 \\ -k_1 & k_1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} F_1 \\ F_2 \end{bmatrix} \tag{4}$$

For the second element the stiffness relation becomes,

$$\begin{bmatrix} k_2 & -k_2 \\ -k_2 & k_2 \end{bmatrix}\begin{bmatrix} x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} F_2 \\ F_3 \end{bmatrix} \tag{5}$$

Combining (4) and (5) we get,

$$\begin{bmatrix} k_1 & -k_1 & 0 \\ -k_1 & k_1 & 0 \\ 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} + \begin{bmatrix} 0 & 0 & 0 \\ 0 & k_2 & -k_2 \\ 0 & -k_2 & k_2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} F_1 \\ F_2 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ F_2 \\ F_3 \end{bmatrix}$$

$$\Rightarrow \begin{bmatrix} k_1 & -k_1 & 0 \\ -k_1 & k_1 + k_2 & -k_2 \\ 0 & -k_2 & k_2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} F_1 \\ F_2 \\ F_3 \end{bmatrix}$$

So the global stiffness matrix for the full assembly becomes,

$$\begin{bmatrix} k_1 & -k_1 & 0 & 0 \\ -k_1 & k_1 + k_2 + k_3 + k_4 & -k_2 - k_3 & -k_4 \\ 0 & -k_2 - k_3 & k_2 + k_3 + k_5 & -k_5 \\ 0 & -k_4 & -k_5 & k_4 + k_5 \end{bmatrix}\begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} F_0 \\ F_1 \\ F_2 \\ F_3 \end{bmatrix} \tag{6}$$
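The assembly pattern behind Eqs. (4)–(6) can be sketched in a few lines of Python: each spring element contributes k·[[1, −1], [−1, 1]] to the rows and columns of its two end nodes. The five-element topology below is inferred from the off-diagonal pattern of Eq. (6), and the k values are arbitrary, so this is an illustrative sketch rather than the paper's COMSOL model.

```python
# Direct stiffness assembly for spring elements; topology and k values are
# assumed for illustration (inferred from the off-diagonals of Eq. (6)).
import numpy as np

def assemble(n_nodes, elements):
    """elements: list of (node_i, node_j, k) spring elements."""
    K = np.zeros((n_nodes, n_nodes))
    for i, j, k in elements:
        K[np.ix_([i, j], [i, j])] += k * np.array([[1.0, -1.0], [-1.0, 1.0]])
    return K

# Assumed topology: k1:(0,1), k2:(1,2), k3:(1,2), k4:(1,3), k5:(2,3).
elements = [(0, 1, 1.0), (1, 2, 2.0), (1, 2, 3.0), (1, 3, 4.0), (2, 3, 5.0)]
K = assemble(4, elements)
print(K)
# Note: K itself is singular; solving K x = F additionally requires imposing
# the fixed constraint (e.g. x0 = 0) of Eq. (7) before inversion.
```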

2.3 Boundary Conditions


The helical spring is directly fastened between the horn button and the casing with
a frame, allowing the spring to translate along its axis. The
spring's bottom end is fixed, while the other end is attached to the vehicle's horn
button. When a load is applied to the spring, the horn button can move along
the lateral axis, and under the applied load it moves in the Y-direction. As a result, the
nodes of the compression spring's bottom end are constrained in all translational degrees of
freedom.

$$\text{Boundary load: } \mathbf{S} \cdot \mathbf{n} = \mathbf{F}_A, \quad \mathbf{F}_A = \mathbf{F}_{A,tot}; \qquad \text{Fixed constraint: } \mathbf{u} = \mathbf{0} \tag{7}$$

3 Computational Domain and Mesh Generation

In this study, we construct a solid helical coil spring (Fig. 2) using the parameter
values given in Table 1 and Table 2.

Fig. 2. Solid helical coil spring

Table 1. Geometries of the computational domain

Description                 Value
Number of turns             8
Number of active coils      6
Free length of spring       210 mm
Outer diameter of spring    87 mm
Wire diameter (d)           12 mm
Axial pitch                 27
Chirality                   Right handed
Load on spring (F)          2500 N

Table 2. Properties of materials of the computational domain

Material property                    High carbon   Silicon steel   Titanium   Unit
Density                              7850          7350            4940       kg/m3
Young's modulus                      207           207             105        GPa
Poisson's ratio                      0.35          0.27            0.33       1
Shear modulus                        5.5e9         79.3e9          39.5e9     N/m2
Tensile strength                     635           420             434        MPa
Heat capacity at constant pressure   475           460.548         710        J/(kg*K)
Electrical conductivity              4.032e6       1.72117e6       7.407e5    S/m

3.1 Meshing
On the FE model, a mesh study is undertaken to verify that suitably fine element sizes
are used; the mesh carries the material and structural properties that define how the
structure will react to given loading conditions. COMSOL Multiphysics software was
used to mesh this spring using various elements and meshing types. Table 3 shows
the resulting mesh properties (Fig. 3).

Fig. 3. Mesh design of computational domain

Table 3. Mesh properties of the computational domain

Property                                   Uncracked value   Cracked value
Minimum element quality                    0.218             0.02947
Average element quality                    0.681             0.6999
Tetrahedra                                 78800             73544
Triangles                                  23842             24098
Edge elements                              2956              3007
Vertex elements                            8                 18
Number of degrees of freedom solved for    20476             370567

4 Numerical Results and Discussions

(a) Cracked Iron body (b) Cracked Steel body (c) Cracked Titanium body

(d) Uncracked Iron body (e) Uncracked Steel body (f) Uncracked Titanium body

Fig. 4. Absorbing shear stress in computational domain after simulation

Figure 4 [(a)-(c)] shows that, after loading a cracked body, the titanium and steel bodies
transferred the load to the end point more quickly than the iron body. The stress distribution
clearly demonstrates that the shear stress is greatest on the inner side of each coil. Every
coil has a consistent stress distribution. So, with the exception of the end turns, the
probability of the spring failing is the same in each coil.

(a) Cracked Iron body (b) Cracked Steel body (c) Cracked Titanium body

(d) Uncracked Iron body (e) Uncracked Steel body (f) Uncracked Titanium body

Fig. 5. Deflection of springs

Figure 5 depicts the deflection caused by load absorption at each fractured and
intact domain position. In Steel's cracked body, the load is transferred to the bottom of
the cracked point and continually propagates to the domain's boundary. Iron and
Titanium conduct the load like steel, but they do not distribute it as
evenly. Titanium is seen to deflect more than the other metals. For the uncracked
cases, the load absorption and deflection rates of iron and silicon steel are nearly
identical. However, Titanium deflects too much, and the entire domain absorbs too
much load at the cracked site.

(a) Iron Cracked Body (b) Steel Cracked Body (c) Titanium Cracked Body

(d) Iron uncracked Body (e) Steel uncracked Body (f) Titanium uncracked Body

Fig. 6. Frequency line graph for different materials



The frequency curves for the Iron and Silicon steel fractured bodies are nearly comparable,
as shown in Fig. 6 [(a), (b), (c)], although between them the trend favours the uncracked
body. Silicon steel has a higher frequency than Iron, yet its curve has a more
regular shape. However, for the cracked and intact Titanium structures, the
frequency curve fluctuates more across the body. So, we can state that the silicon steel
spring has produced a significant result in comparison to the others.

[Figure: line graphs of deflection versus position (0–300) for the Iron, Silicon and
Titanium bodies]

Fig. 7. Deflection of cracked body          Fig. 8. Deflection of uncracked body

From Fig. 7 we see that the deflection of the cracked Titanium body is higher than
that of the others. The deflections of Iron and Silicon steel after loading were virtually the
same, although Iron was found to deflect less than the others; because of the lower bending
of iron, the material retains a large amount of load at any particular location. The Titanium
body deflects considerably and therefore cannot keep its elasticity, so there
may be a significant risk of failure at that point. From Fig. 8, for the uncracked body, we
notice that the deflections of Iron and Steel are nearly the same, although the former has an
erratic shape, whereas Titanium follows a nearly regular pattern.

5 Conclusion

In this study, Finite Element Analysis of Iron, Silicon Steel, and Titanium springs was per-
formed using the COMSOL Multiphysics program. According to our findings, the
imposed stress was distributed uniformly across the helical spring and damaged the
structure without destroying it. The silicon steel structure can sustain a wide variety
of loads, withstand vibration, and bend properly, and its flexibility is maintained so that
the body stays intact. Iron, on the other hand, retains the load at any fixed location,
whereas Titanium deflected very much, resulting in excess resonance, vibration and
frequency at the affected region of the body and thus posing a risk of damage. Finally,
the Silicon steel spring architecture, with its good flexibility and deflection, can be judged
appropriate for every situation due to its adaptability. It would also be able to absorb a
vehicle's strain and suspension vibration while maintaining its suppleness. Our created
spring has a lower stress value than the original, which is a benefit of our design. By
comparing the data, we can determine that the weight of the Silicon Steel spring has
decreased and that it is safe.

6 Future Study

The proposed technology can be applied to components with variable cross sections,
dimensions, and boundary conditions, or might be extended to detect cracks in vehicle
springs of varied sizes and forms, making it more useful for vehicle development.

Acknowledgement. The authors would like to thank to The Centre of Excellence in Mathe-
matics, Department of Mathematics, Mahidol University, Bangkok, Thailand and The Simulation
Lab, Department of Mathematics, Chittagong University of Engineering and Technology,
Bangladesh for their technical assistance.

References
1. Pawar, H.B., Patil, A.R., Zope, S.B.: Analysis and optimization of a helical compression coil
spring used for TWV. Int. J. Adv. Res. Innov. Ideas Educ. 2, 524–529 (2016)
2. Karmaker, R., Deb, U.K.: Crack detection of iron and steel bar using natural frequencies: a
CFD approach. In: 3rd International Conference on Intelligent Computing and Optimization.
ICO 2020. Advances in Intelligent Systems and Computing, vol. 1324, pp. 224–236.
Springer (2021)
3. Burgul, S.: Literature review on design, analysis and fatigue life of a mechanical spring. Int.
J. Res. Aeronautical Mech. Eng. 2, 76–83 (2014)
4. Shevale, D.V., Khaire, N.D.: Review on failure analysis of helical compression spring. Int.
J. Sci. Eng. Technol. Res. (IJSETR) 5, 892–898 (2016)
5. Rao, G.S., Deshmukh, R.R.: Art of fatigue analysis of helical compression spring used in
two-wheeler horn. Int. J. Mech. Eng. Technol. 1, 1–14 (2019)
6. Dolas, D.R., Jagtap, K.K.: Analysis of coil spring used in shock absorber using CAE. Int.
J. Eng. Res. 11, 5163–5168 (2016)
7. Singh, N.: General review of mechanical springs used in automobiles suspension system.
Singh Int. J. Adv. Eng. Res. Stud. 3, 115–122 (2013)
8. Chavan, C., Kakandikar, G.M., Kulkarni, S.S.: Analysis for suspension spring to determine
and improve its fatigue life. Int. J. Sci. Res. Manag. Studies. 1, 352–362
9. Jain, A., Jindal, A., Lakhiani, P., Mishra, S.: Mathematical approach to helical spring used in
suspension system. Int. J. Mech. Prod. Eng. 5, 78–82 (2017)
10. Rodriguez, G.A., Chacón, J.M., Donoso, A.: Design of an adjustable-stiffness spring:
mathematical modeling and simulation, fabrication and experimental validation. Mechanism
and Machine Theory - MECH MACH THEOR. 46, 1970–1979 (2011)
11. Kilian, S., Zander, U., Talke, F.E.: Suspension modeling and optimization using finite
element analysis. Tribol. Int. 36, 317–332 (2003)
12. Das, S.K., Mukhopadhyay, N.K., Ravi, B.K., Bhattacharya, D.K.: Failure analysis of a
passenger car coil spring. Eng. Fail. Anal. 14, 158–163 (2007)
13. Gowd, G.H., Goud, E.V.: Static analysis of leaf spring. Int. J. Eng. Sci. Technol. 4, 3794–
3803 (2012)
14. Harale, S.G., Elango, M.: Design of helical coil suspension system by combination of
conventional steel and composite material. IJIRSET 3, 15144–15150 (2014)
15. Rathore, G.S., Joshi, U.K.: Fatigue stress analysis of helical compression spring: a review.
Int. J. Emerg. Trends Eng. Dev. 2, 512–520 (2013)
Prediction of Glucose Concentration
Hydrolysed from Oil Palm Trunks Using
a PLSR-Based Model

Wan Sieng Yeo1, Mieow Kee Chan2(&), and Nurul Adela Bukhari3
1 Department of Chemical Engineering, Faculty of Engineering and Sciences,
Curtin University Malaysia, CDT 250, 98000 Miri, Sarawak, Malaysia
2 Centre for Bioprocess Engineering, Faculty of Engineering and the Built
Environment, SEGi University, Jalan Teknologi, Kota Damansara,
47810 Petaling Jaya, Selangor Darul Ehsan, Malaysia
mkchan@segi.edu.my
3 Energy and Environment Unit, Engineering and Processing Research Division,
Malaysian Palm Oil Board (MPOB), 6, Persiaran Institusi, Bandar Baru Bangi,
43000 Kajang, Selangor, Malaysia

Abstract. Oil palm trunks are a biomass containing starch that can be used to
produce higher value-added glucose for bioethanol, lactic acid, food and bev-
erage production. Immobilised-enzyme hydrolysis, which does not require
high temperatures, strong acids, or an additional separation process, is preferable
for the conversion of starch to glucose compared to acid hydrolysis involving
hydrochloric acid or sulphuric acid. However, only limited studies have focused on
the utilisation of least-squares regression models to predict the glucose concentration
from immobilised-enzyme hydrolysis. Hence, this study developed a least-
squares model, namely a locally weighted kernel partial least squares regression
(LW-KPLSR) model, to forecast the glucose concentration produced from the
immobilised-enzyme hydrolysis of oil palm trunks. Its predictive perfor-
mance was determined, evaluated, and compared with its counterparts.
LW-KPLSR predicts glucose concentrations more accurately than the others,
since its Ea value is 103% to 195% lower.

Keywords: Oil palm trunks · Glucose concentration · Hydrolysis process ·
Prediction · Partial least square regression model

1 Introduction

Malaysia is the second-largest palm oil producer in the world after Indonesia [1], with
19.47 million tonnes of palm oil produced in the year 2020 [2, 3]. This implies the
generation of a large amount of biomass due to replantation and milling activities.
Biomass such as oil palm trunk (OPT) contains a high amount of starch [4] which
can be converted to glucose. Acid hydrolysis is the commonly used method to convert
starch to glucose with the help of hydrochloric acid and sulphuric acid [5, 6]. However,
an additional separation process is required to purify glucose from the by-products, for
instance furans, before glucose can be used as the substrate for the fermentation


process. Enzymatic hydrolysis produces a high yield of glucose from starch at mild
process conditions due to the selectivity of the enzyme. It can be done by adding
α-amylase and glucoamylase into the starch mixture at the same time to produce glucose.
Attempts have been made to optimise glucose production via enzymatic hydrolysis,
to reduce the operating cost and obtain a high-quality yield [7–9]. Studies also focus on
modeling, where the enzyme kinetic parameters are calculated [10] to describe the
reaction mechanism [11]. A prediction of yield can be obtained when these
parameters/constants are available. However, these constant values are experiment
dependent, and in-depth knowledge of biochemistry and microbiology is required. On
the other hand, machine learning makes use of massive data to develop models via a
mathematical approach. It recognises the pattern of the data distribution and can perform
predictions without explicit, rule-based programming [12].
Machine learning algorithms including partial least square regression (PLSR) based
models that involve mathematical approaches have been widely used in a variety of
applications [13–15]. PLSR based models are famous since they are dimension
reduction methods, simple, and can cope with collinearity between variables [16].
Recently, a PLSR-based model, namely locally weighted Kernel partial least square
regression (LW-KPLSR) has been developed by Yeo, Saptoro and Kumar [17] for
nonlinear processes. Its predictive capability has been investigated using different case
studies from the literature [17] and the experimental data for the bleaching of fabric
cotton. However, it has not been applied to the experimental data for a hydrolysis
process.
Besides, few studies have considered regression models, including
PLSR-based models, to estimate the glucose concentration from hydrolysis pro-
cesses. Hence, this study aims to develop an LW-KPLSR model using the experimental data
for the immobilised-enzyme hydrolysis of the oil palm trunks to predict the glucose con-
centration. The predictive performance of the LW-KPLSR model was then evaluated using
the root mean square error (RMSE), the error of approximation (Ea), and the coefficient of
determination (R2), and these predictive results were also compared with other existing
models such as locally weighted partial least squares (LW-PLSR), PLSR, and principal
component regression (PCR). The following sections present the research methodology,
results and discussions, as well as the conclusions.

2 Research Methodology

In this section, the hydrolysis of OPT is described, followed by descriptions of the
LW-KPLSR model development, the data splitting and parameter settings, as well as
the evaluation of the predictive performance of the regression models. Lastly, the
computer configurations and software used in this study are described.

2.1 Hydrolysis of OPT


Starch was extracted from OPT by a heating method [18]. The immobilised enzymes,
namely α-amylase and glucoamylase, were dispersed into the extract after it had cooled to
room temperature. The hydrolysis experiment was conducted at varied stirring speeds

(150 to 300 rpm), the mass of OPT (5 to 20 g), and hydrolysis time (8 to 24 h). The
concentration of glucose was measured by using the One Touch Select Simple Glu-
cometer [19].

2.2 Regression Model Development


Generally, the LW-KPLSR model is an improved model from LW-PLSR [17, 20], and
LW-PLSR is extended from PLSR [21]. The LW-KPLSR that was developed by
Yeo, Saptoro and Kumar [17] for highly nonlinear processes is utilised in this study.
The similarity measurement used in this LW-KPLSR model is the Euclidean distance-
based similarity index, ω_n, which is obtained from the distance, d_n, between a query,
x_q, and the historical input data, x_n. The ω_n and a similarity matrix, Ω, can be deter-
mined using Eqs. 1 and 2, respectively [22].

$$\omega_n = \exp\!\left(-\frac{d_n}{\varphi\,\sigma_d}\right) \tag{1}$$

$$d_n = \sqrt{\left(\mathbf{x}_n - \mathbf{x}_q\right)^T \left(\mathbf{x}_n - \mathbf{x}_q\right)} \tag{2}$$

where φ is a localisation parameter and σ_d is the standard deviation of d_n (n = 1, 2, …, N).
In the LW-KPLS model, the input and output variables, x and y, for the nth sample can be
denoted as in Eqs. 3 and 4, where M and L are the numbers of input and output variables,
respectively.

$$\mathbf{x}_n = \left[x_{n1}, x_{n2}, \ldots, x_{nM}\right]^T \tag{3}$$

$$\mathbf{y}_n = \left[y_{n1}, y_{n2}, \ldots, y_{nL}\right]^T \tag{4}$$

To obtain the predicted output, ŷ_q, from the LW-KPLSR model, the following steps
have to be followed and conducted [17].


1. Obtain both Kernel matrices, for the input variables, V, and for the query, V_q, in which
the input and output variables in Eqs. 3 and 4 are mapped into a higher-dimensional
feature space utilising the polynomial Kernel function shown in Eq. 5.

$$k(\mathbf{x}, \mathbf{y}) = \left(\mathbf{x}^T \mathbf{y} + 1\right)^{\beta} \tag{5}$$

2. Perform mean centering on the obtained Kernel matrices, V and V_q, using Eqs. 6 and 7.

$$\tilde{\mathbf{V}} = \left(\mathbf{I} - \frac{1}{n}\mathbf{1}_n \mathbf{1}_n^T\right) \mathbf{V} \left(\mathbf{I} - \frac{1}{n}\mathbf{1}_n \mathbf{1}_n^T\right) \tag{6}$$

$$\tilde{\mathbf{V}}_q = \left(\mathbf{V}_q - \frac{1}{n}\mathbf{1}_{N_2} \mathbf{1}_n^T \mathbf{V}\right)\left(\mathbf{I} - \frac{1}{n}\mathbf{1}_n \mathbf{1}_n^T\right) \tag{7}$$

3. Obtain a dual representation of a scaled version of the projection direction, B, via a
dual kernel partial least squares discrimination using Eq. 8.

$$\mathbf{B} = \mathbf{Y}\mathbf{Y}'\tilde{\mathbf{V}}\mathbf{b}, \quad \text{with the normalisation} \quad \mathbf{b} = \frac{\mathbf{b}}{\lVert \mathbf{b} \rVert} \tag{8}$$

4. Calculate the re-scaled query and input variable matrices, X_q and X, using Eqs. 9
and 10.

$$\mathbf{X}_q = \tilde{\mathbf{V}}_q \mathbf{B} \tag{9}$$

$$\mathbf{X} = \tilde{\mathbf{V}} \mathbf{B} \tag{10}$$

5. Choose the number of latent variables, K, and set k = 1.
6. Obtain the similarity matrix Ω using Eqs. 1, 2 and 11.

$$\boldsymbol{\Omega} = \mathrm{diag}\{\omega_1, \omega_2, \ldots, \omega_N\} \tag{11}$$

7. Determine X_k, Y_k, and X_{q,k} using Eqs. 12–16.

$$\mathbf{X}_k = \mathbf{X} - \mathbf{1}_N \left[\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_M\right] \tag{12}$$

$$\mathbf{Y}_k = \mathbf{Y} - \mathbf{1}_N \left[\bar{Y}_1, \bar{Y}_2, \ldots, \bar{Y}_L\right] \tag{13}$$

$$\mathbf{X}_{q,k} = \mathbf{X}_q - \left[\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_M\right]^T \tag{14}$$

$$\bar{X}_m = \frac{\sum_{n=1}^{N} \omega_n X_{nm}}{\sum_{n=1}^{N} \omega_n} \tag{15}$$

$$\bar{Y}_l = \frac{\sum_{n=1}^{N} \omega_n Y_{nl}}{\sum_{n=1}^{N} \omega_n} \tag{16}$$

8. Let ŷ_q = [Ȳ_1, Ȳ_2, …, Ȳ_L]^T.
9. Get the kth latent variable of X_k using Eqs. 17 and 18.

$$\mathbf{t}_k = \mathbf{X}_k \mathbf{w}_k \tag{17}$$

$$\mathbf{w}_k = \frac{\mathbf{X}_k^T \boldsymbol{\Omega} \mathbf{Y}_k}{\left\lVert \mathbf{X}_k^T \boldsymbol{\Omega} \mathbf{Y}_k \right\rVert} \tag{18}$$

10. Attain the kth loading vector of X_k and the kth regression coefficient vector using
Eqs. 19 and 20.

$$\mathbf{p}_k = \frac{\mathbf{X}_k^T \boldsymbol{\Omega} \mathbf{t}_k}{\mathbf{t}_k^T \boldsymbol{\Omega} \mathbf{t}_k} \tag{19}$$

$$\mathbf{q}_k = \frac{\mathbf{Y}_k^T \boldsymbol{\Omega} \mathbf{t}_k}{\mathbf{t}_k^T \boldsymbol{\Omega} \mathbf{t}_k} \tag{20}$$

11. Obtain the kth latent variable of X_q using Eq. 21.

$$t_{q,k} = \mathbf{X}_{q,k}^T \mathbf{w}_k \tag{21}$$

12. Substitute ŷ_q with ŷ_q + t_{q,k} q_k, where t_{q,k} is the kth latent variable of X_q.
13. If k = K, the prediction using the LW-KPLSR model is complete. Otherwise, apply
Eqs. 22–24.

$$\mathbf{X}_{k+1} = \mathbf{X}_k - \mathbf{t}_k \mathbf{p}_k^T \tag{22}$$

$$\mathbf{Y}_{k+1} = \mathbf{Y}_k - \mathbf{t}_k \mathbf{q}_k^T \tag{23}$$

$$\mathbf{X}_{q,k+1} = \mathbf{X}_{q,k} - t_{q,k} \mathbf{p}_k \tag{24}$$

14. Let k = k + 1 and go back to Step 9.
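For illustration, the following Python sketch implements the locally weighted PLS core of Steps 5–14 for a single query. The kernel feature mapping of Steps 1–4 is omitted for brevity, so it corresponds to LW-PLSR rather than the full LW-KPLSR, and the toy data at the end (inputs and glucose values) are invented, not the authors' measurements.

```python
# Locally weighted PLS prediction for one query (Steps 5-14, sketch only;
# the kernel mapping of Steps 1-4 is omitted). Toy data are illustrative.
import numpy as np

def lw_pls_predict(X, y, x_q, n_latent=1, phi=0.1):
    """X: (N, M) inputs, y: (N,) output, x_q: (M,) query point."""
    d = np.linalg.norm(X - x_q, axis=1)          # Eq. (2)
    omega = np.exp(-d / (phi * d.std()))         # Eq. (1)
    W = np.diag(omega)                           # Eq. (11)
    xbar = omega @ X / omega.sum()               # Eqs. (15)-(16)
    ybar = omega @ y / omega.sum()
    Xk = X - xbar                                # Eqs. (12)-(14)
    yk = (y - ybar)[:, None]
    xq_k = x_q - xbar
    y_hat = ybar                                 # Step 8
    for _ in range(n_latent):
        w = Xk.T @ W @ yk                        # Eq. (18)
        w = w / np.linalg.norm(w)
        t = Xk @ w                               # Eq. (17)
        denom = (t.T @ W @ t).item()
        p = Xk.T @ W @ t / denom                 # Eq. (19)
        q = (yk.T @ W @ t).item() / denom        # Eq. (20)
        t_q = (xq_k @ w).item()                  # Eq. (21)
        y_hat = y_hat + t_q * q                  # Step 12
        Xk = Xk - t @ p.T                        # Eqs. (22)-(24): deflation
        yk = yk - t * q
        xq_k = xq_k - t_q * p.ravel()
    return y_hat

# Hypothetical use with (speed rpm, OPT mass g, time h) inputs:
X = np.array([[150.0, 5, 8], [225, 30, 16], [300, 20, 24], [200, 10, 12]])
y = np.array([5.0, 30.1, 18.0, 9.0])             # invented glucose values
print(lw_pls_predict(X, y, np.array([220.0, 25, 15])))
```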

2.3 Data Splitting and Parameters Set for the Least Square Regression
Models
In this study, a total of N = 18 data sets were collected from the immobilised-
enzyme hydrolysis of the oil palm trunks. They were saved in a CSV file and were
divided in a ratio of 75:25, where the number of training data, N1, is 14 and the
number of testing data, N2, is 4. These data sets consist of the input or observed vari-
ables, namely stirring speed (rpm), mass of OPT (g) and hydrolysis time (h).
Meanwhile, the output or target variable is the glucose concentration in mmol per litre.
Both training and testing data involving the input and output variables were executed in
MATLAB software using the LW-KPLSR, LW-PLSR, PLSR, and PCR models. The
number of latent variables for all these least-squares regression models is set to 1.
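A minimal sketch of this 75:25 split is shown below, written in Python rather than the authors' MATLAB; the CSV file name and column names are assumptions, not the actual file.

```python
# Hypothetical 75:25 train/test split of the 18-sample hydrolysis dataset;
# file and column names are assumed for illustration.
import numpy as np
import pandas as pd

df = pd.read_csv("hydrolysis.csv")        # hypothetical file name
X = df[["stirring_speed_rpm", "mass_opt_g", "time_h"]].to_numpy()
y = df["glucose_mmol_per_L"].to_numpy()

rng = np.random.default_rng(0)
idx = rng.permutation(len(df))            # N = 18 samples
n_train = 14                              # N1 = 14, N2 = 4 (75:25)
X_train, y_train = X[idx[:n_train]], y[idx[:n_train]]
X_test, y_test = X[idx[n_train:]], y[idx[n_train:]]
```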

Besides, for the LW-KPLSR and LW-PLSR models, φ is set to 0.1 as it gives the best
results [17]. Meanwhile, since LW-KPLSR contains a kernel function, its kernel
parameter, β, has to be tuned. According to Mongillo [23] and Orr [24], a value of β
within the range of 0.01 to 10 provides a lower error. Usually, β equal to 1 gives the
lowest prediction error [20]; hence β is fixed as 1 in this study. The parameters used in
the least-squares regression models for this study are tabulated in Table 1.

Table 1. Parameters used for the LW-KPLSR, LW-PLSR, PLSR, and PCR models.

Parameter   N    N1   N2   LV   φ     β
Value       18   14   4    1    0.1   1

2.4 Evaluation of the Predictive Performance of the Regression Models


This study utilised RMSE, Ea, and R2 to evaluate the predictive performance of the LW-
KPLSR, LW-PLSR, PLSR, and PCR models. RMSE is a goodness‐of‐fit indicator that
shows the differences in observed and target values [25, 26]. The lower the RMSE, the
better the predictive performance of the model. RMSE can be calculated using Eq. 25
as shown below [25]:
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2}{N}} \tag{25}$$

where y_i and ŷ_i are the actual and predicted output values, respectively.
However, there could be cases in which the RMSE for the training dataset
is the lowest while the testing dataset has the highest RMSE [20, 27]. It is then
difficult to evaluate the overall predictive performance of the regression model. The Ea
shown in Eq. 26 was adopted from Saptoro, Vuthaluru and Tadé [28] and Yeo,
Saptoro and Kumar [20] to address this problem.

$$E_a = \left(\frac{N_1}{N}\right)\mathrm{RMSE}_1 + \left(\frac{N_2}{N}\right)\mathrm{RMSE}_2 + \left\lvert \mathrm{RMSE}_1 - \mathrm{RMSE}_2 \right\rvert \tag{26}$$

Moreover, R2 compares the sum of the squared errors to
the sum of the squared deviations about the mean. In this study, the R2 shown in
Eq. 27 [25] was employed to measure the goodness of fit between the actual and the
predicted values. The closer R2 is to 1, the better the predictive performance of the
model [29].

$$R^2 = 1 - \frac{\sum_i \left(y_i - \hat{y}_i\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2} \tag{27}$$

Besides, the percentage error (PE) displayed in Eq. 28 [25, 30] was also
adopted in this study. PE was used to quantify the differences in the RMSE,
Ea, and R2 values between two regression models.

$$\mathrm{PE} = \frac{\hat{y}_i - y_i}{y_i} \times 100\% \tag{28}$$
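Equations 25–28 translate directly into a few lines of code. The following Python sketch (illustrative, rather than the authors' MATLAB implementation) computes the four indicators for given arrays of actual and predicted values.

```python
# Direct transcription of Eqs. (25)-(28) for numpy arrays of actual and
# predicted outputs.
import numpy as np

def rmse(y, y_hat):                                   # Eq. (25)
    return np.sqrt(np.mean((y - y_hat) ** 2))

def ea(rmse1, rmse2, n1, n2):                         # Eq. (26)
    n = n1 + n2
    return (n1 / n) * rmse1 + (n2 / n) * rmse2 + abs(rmse1 - rmse2)

def r2(y, y_hat):                                     # Eq. (27)
    return 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def pe(y, y_hat):                                     # Eq. (28), per sample
    return (y_hat - y) / y * 100.0

y = np.array([10.0, 20.0, 30.0])                      # dummy values
y_hat = np.array([11.0, 19.0, 28.0])
print(rmse(y, y_hat), r2(y, y_hat))
```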

2.5 Computer Configurations and Software


The simulation work in this study was performed on an Acer Swift 5 thin-and-light laptop
with an 11th-generation Intel Core i7. The hardware and software computer
configuration specifications of this laptop are Windows 10 Home 64-bit, up to 4.2 GHz
Intel Core i7, 16.0 GB random-access memory, 512 GB solid-state drive storage, and
MATLAB version R2021a.

3 Results and Discussions

Figure 1 presents the combined effect of stirring speed, mass of OPT, and
hydrolysis time on glucose production. The size of the bubbles indicates the amount of
glucose produced in mmol/L. Big bubbles are observed in Fig. 1(a) in the region of
200–250 rpm and >20 g of OPT. This shows that a high stirring speed and a high amount
of OPT are desirable, as a high amount of glucose is produced. This is
because a high stirring speed increases the mass transfer, while a high mass of OPT
provides more substrate for the enzyme to catalyse the hydrolysis process. In terms of
hydrolysis time, 15–25 h is desirable, as illustrated in Fig. 1(b). Under the optimised
conditions, the experiment was repeated by increasing the mass of OPT up to 40 g. The
result showed that the highest concentration of glucose, 30.1 mmol/L, was produced by
using 30 g of OPT at 225 rpm for 16 h at 60 °C.

[Figure: bubble plots of glucose production, (a) mass of OPT (g) versus stirring speed
(rpm), (b) hydrolysis time versus stirring speed (rpm)]

Fig. 1. The effect of stirring speed and (a) mass of OPT (g), (b) hydrolysis time on glucose
production.

In this study, the experimental data for the immobilised-enzyme hydrolysis of the
oil palm trunks were utilised to build the LW-KPLSR, LW-PLSR, PLSR, and PCR models.
To evaluate the predictive performance of these regression models, the RMSE and R2 for
both training and testing data, as well as Ea, were calculated and tabulated in Table 2.
RMSE1 and RMSE2 represent the RMSE for the training and testing data, respectively.
Meanwhile, R12 and R22 are the R2 for the training and testing data, respectively. From
Table 2, it can be seen that LW-KPLSR has the lowest value of Ea, while both of its R2
values are above 0.87; its Ea is 195%, 120%,
and 103% lower than that of LW-PLSR, PLSR, and PCR, respectively.

Table 2. Predictive performance results of the LW-KPLSR, LW-PLSR, PLSR and PCR models.

Metric   LW-KPLSR   LW-PLSR   PE (%)   PLSR     PE (%)   PCR      PE (%)
RMSE1    2.1555     1.9713    9        3.8111   77       4.4279   48
R12      0.9055     0.9256    2        0.6509   28       0.4631   121
RMSE2    2.2505     5.8456    160      2.2891   2        2.1876   101
R22      0.8784     0.6932    21       0.8252   6        0.7334   97
Ea       2.2716     6.7065    195      4.9949   120      6.1703   103

Although LW-PLSR has slightly better RMSE1 and R12 values than LW-KPLSR, its
RMSE2 is 160% higher and its R22 is 21% lower than those of LW-KPLSR. This is
due to the polynomial Kernel in LW-KPLSR, which is able to map the datasets into a
higher dimension to obtain a better prediction [20]. The polynomial Kernel function is
also one of the popular Kernel functions in machine learning applications [31]. On the
other hand, LW-KPLSR and LW-PLSR also provide better results for their training
data compared to PLSR and PCR, in that their RMSE1 is lower and their R12 is higher.
These results could be due to the locally weighted nature of both LW-KPLSR
and LW-PLSR, which uses a weighted Euclidean distance-based approach to choose
the more relevant historical data for better prediction [32]. The absence of the polynomial
Kernel function in LW-PLSR causes its poorer prediction for the testing data compared to
PLSR and PCR.
Additionally, PLSR produces better predictive results than PCR: its Ea and
RMSE1 are lower and its R12 and R22 are higher than those of PCR. This is because PLSR
includes both input and output variables in its model development, while PCR only
involves the input variables [33]. Hence, the PLSR transformation can be done by
maximising the covariance between the input and output variables to better describe the
variance for prediction, whereas PCR can only maximise the block coverage within the
input variables [34, 35]. Nevertheless, the LW-KPLSR model is still the best predictive
model for estimating the glucose concentration from the immobilised-enzyme hydrolysis
of the oil palm trunks. As can be seen from Fig. 2(a) and Fig. 2(b), the predicted glucose
concentrations from LW-KPLSR are closer to the actual glucose concentrations than those
of LW-PLSR, PLSR, and PCR. In conclusion, LW-KPLSR is the more appropriate choice
for the prediction of glucose concentration from the immobilised-enzyme hydrolysis of
the oil palm trunks.


Fig. 2. Comparisons between the actual and predicted glucose concentrations from different
least square regression models (a) for training data, (b) for testing data.

4 Conclusions

The experimental work indicated that stirring speed, hydrolysis time, and mass of OPT
have significant impacts on the enzymatic process for glucose production. The
results showed that the highest concentration of glucose, 30.1 mmol/L, was produced by
using 30 g of OPT at 225 rpm for 16 h at 60 °C. In this study, an LW-KPLSR model
was developed to predict the glucose concentration from the immobilised-enzyme
hydrolysis of OPT. In terms of overall predictive performance, the LW-KPLSR model
predicted glucose concentrations more accurately than the LW-PLSR, PLSR, and PCR
models, since its Ea value is 103% to 195% lower. Moreover, its R2 values are above
0.8 and close to 1, indicating that the deviation between the
actual and predicted glucose concentrations is small. Hence, it can be concluded that
LW-KPLSR is suitable for estimating the glucose concentration of this
hydrolysis process. It is suggested that further experimental data on glucose
concentration hydrolysed from oil palm trunks be studied, so that more data can be used
to develop a PLSR-based model with even more accurate predictive performance.

Acknowledgments. The authors would like to thank the supports from Curtin University
Malaysia, SEGi University, and Malaysian Palm Oil Board (MPOB). Also, the authors declare no
conflict of interest.

References
1. Analytics, P.O.: Palm Oil Analytics. Singapore (2017)
2. Ooi, L.C.L., et al.: SureSawitTM True-To-Type—A high throughput universal single
nucleotide polymorphism panel for DNA fingerprinting, purity testing and origin verification
in oil palm. J. Oil Palm Res. 31, 561–571 (2019)
3. Kushairi, A., et al.: Oil palm economic performance in Malaysia and R&D progress in 2018.
J. Oil Palm Res. 31(2), 165–194 (2019)

4. Sulaiman, O., Salim, N., Nordin, N.A., Hashim, R., Ibrahim, M., Sato, M.: The potential of
oil palm trunk biomass as an alternative source for compressed wood. BioResources 7(2),
2688–2706 (2012)
5. Azmi, A., Malek, M., Puad, N.: A review on acid and enzymatic hydrolyses of sago starch.
Int. Food Res. J. 24(12), 265–273 (2017)
6. Bukhari, N.A., Loh, S.K., Bakar, N.A., Ismail, M.: Hydrolysis of residual starch from sago
pith residue and its fermentation to bioethanol. Biores. Technol. 46(8), 1269–1278 (2017)
7. Ude, M.U., Oluka, I., Eze, P.C.: Optimization and kinetics of glucose production via
enzymatic hydrolysis of mixed peels. J. Biosour. Bioprod. 5(4), 283–290 (2020)
8. Samaranayake, M.D., De Silva, A.B.: Optimization of liquefaction and saccharification for
production of glucose syrup from cassava under optimized conditions. J. Chem. Res. 7(7), 16–25 (2017)
9. Acosta-Pavas, J.C., Alzate-Blandon, L., Ruiz-Colorado, A.A.: Enzymatic hydrolysis of
wheat starch for glucose syrup production. Dyna 87(214), 173–182 (2020)
10. Choi, B., Rempala, G.A., Kim, J.K.: Beyond the Michaelis-Menten equation: accurate and
efficient estimation of enzyme kinetic parameters. Sci. Rep. 7(1), 1–11 (2017)
11. Boeckx, J., Hertog, M., Geeraerd, A., Nicolai, B.: Kinetic modelling: an integrated approach
to analyze enzyme activity assays. Plant Methods 13(1), 1–12 (2017)
12. Dobbelaere, M.R., Plehiers, P.P., Van de Vijver, R., Stevens, C.V., Van Geem, K.M.:
Machine learning in chemical engineering: strengths, weaknesses, opportunities, and threats.
Engineering (2021)
13. Purwanto, A.: Partial least squares structural equation modeling (PLS-SEM)
analysis for social and management research: a literature review. J. Ind. Eng. Manag. Res.
2(4), 114–123 (2021)
14. Khatri, P., Gupta, K.K., Gupta, R.K.: Environment: a review of partial least squares
modeling (PLSM) for water quality analysis. Model. Earth Syst. Environ. 7(2), 703–714
(2021)
15. Martínez, J.L., Leiva, V., Saulo, H., Liu, S.: Estimating the covariance matrix of the
coefficient estimator in multivariate partial least squares regression with chemical
applications. Chemom. Intell. Lab. Syst. 214, 104328 (2021)
16. Camminatiello, I., Lombardo, R., Durand, J.-F.: Robust partial least squares
regression for the evaluation of justice court delay. Qual. Quant. 51(2), 813–827 (2017)
17. Yeo, W.S., Saptoro, A., Kumar, P.: Development of adaptive soft sensor using locally
weighted Kernel partial least square model. Chem. Prod. Process. Model. 12(4), 1–13 (2017)
18. Eom, I.-Y., Yu, J.-H., Jung, C.-D., Hong, K.-S.: Efficient ethanol production from dried oil
palm trunk treated by hydrothermolysis and subsequent enzymatic hydrolysis. Biotechnol.
Biofuels 8(1), 1–11 (2015)
19. Philis-Tsimikas, A., Chang, A., Miller, L.: Precision, accuracy, and user acceptance
of the OneTouch SelectSimple blood glucose monitoring system. J. Diabetes Sci. Technol.
5(6), 1602–1609 (2011)
20. Yeo, W.S., Saptoro, A., Kumar, P.: Adaptive soft sensor development for
non-Gaussian and nonlinear processes. Ind. Eng. Chem. Res. 58(45), 20680–20691 (2019)
21. Ren, M., Song, Y., Chu, W.: An improved locally weighted PLS based on particle swarm
optimization for industrial soft sensor modeling. Sensors 19(19), 4099 (2019)
22. Ma, M., Khatibisepehr, S., Huang, B.: A Bayesian framework for real-time identification
of locally weighted partial least squares. AIChE J. 61(2), 518–529 (2015)
23. Mongillo, M.: Choosing basis functions and shape parameters for radial basis function
methods. SIAM Undergraduate Research Online 4(190–209), 2–6 (2011)
24. Orr, M.J.: Technical Report, Center for Cognitive Science. University of Edinburgh (1996)
25. Yeo, W.S., Lau, W.J.: Predicting the whiteness index of cotton fabric with a least squares
model. Cellulose 28(13), 8841–8854 (2021). https://doi.org/10.1007/s10570-021-04096-y

26. Harmel, R.D., Smith, P.K., Migliaccio, K.W.: Modifying goodness-of-fit indicators to
incorporate both measurement and model uncertainty in model calibration and validation.
Trans. ASABE 53(1), 55–63 (2010)
27. Dou, Y., Sun, Y., Ren, Y., Ren, Y.: Artificial neural network for simultaneous determination
of two components of compound paracetamol and diphenhydramine hydrochloride powder
on NIR spectroscopy. Anal. Chim. Acta 528(1), 55–61 (2005)
28. Saptoro, A., Vuthaluru, H., Tadé, M.: Presented at the Proceedings of the International
Conference on Modeling and Simulation (2006)
29. Yeo, W.S., Saptoro, A., Kumar, P.: Missing data treatment for locally weighted partial least
square-based modelling: a comparative study. Asia-Pac. J. Chem. Eng. 15(2), e2422 (2020)
30. Guang, W., Baraldo, M., Furlanut, M.: Calculating percentage prediction error: a user’s note.
Pharmacol. Res. 32(4), 241–248 (1995)
31. Zhou, J., Zeng, S., Zhang, B.: Kernel nonnegative representation-based classifier. Appl.
Intell. 1–21 (2021)
32. Kano, M., Fujiwara, K.: Virtual sensing technology in process industries: trends and
challenges revealed by recent industrial applications. J. Chem. Eng. Japan. 12we167 (2012)
33. Thien, T.F., Yeo, W.S.: A comparative study between PCR, PLSR, and LW-PLS on the
predictive performance at different data splitting ratios. Chem. Eng. Commun. 1–18 (2021)
34. Cramer, R.D.: Design: partial least squares (PLS): its strengths and limitations. Perspect.
Drug Discovery Des. 1(2), 269–278 (1993)
35. Carrascal, L.M., Galván, I., Gordo, O.: Partial least squares regression as an alternative to
current regression methods used in ecology. Oikos 118(5), 681–690 (2009)
Ontology of Lithography-Based Processes
in Additive Manufacturing with Focus
on Ceramic Materials

Marc Gmeiner1,2(B), Wilfried Lepuschitz1, Munir Merdan1, and Maximilian Lackner2
1 Practical Robotics Institute Austria, Wexstraße 19-23, 1200 Vienna, Austria
{gmeiner,lepuschitz,merdan}@pria.at
2 University of Applied Sciences Technikum Wien, Höchstädtplatz 6,
1200 Vienna, Austria

Abstract. Additive manufacturing (AM) technologies are widely used
to fabricate complex 3-dimensional objects more quickly and cost-effec-
tively than by subtractive manufacturing. Due to the manifold options, it
is important to select the best-suited process, technology, material and
parameters for a successful print, but the required knowledge is time-
consuming to gather, combine and conclude correctly. Currently, opera-
tors just analyse a CAD model using slicing software, with limited support
and knowledge provided as to whether a print is going to be successful.
Consequently, the challenge is to gather and combine information from
different fields of application into one knowledge source. Ontologies are
increasingly used for a machine-readable representation of domain
knowledge. This paper presents an ontology with a focus on lithography-
based ceramic manufacturing. In this context, multiple sources like experts'
knowledge, literature, guidelines and ontologies of other domains are
revised and combined into one general concept. In further work, we aim
to use the resulting ontology in a software tool for printability analysis
with an existing cloud manufacturing system.

Keywords: Ontology · Additive manufacturing · Ceramic ·


Lithography-based ceramic manufacturing · Vat photopolymerisation

1 Introduction

Additive Manufacturing (AM) is a rising field of application, redefining the thinking and designing process for CAD objects to be manufactured as products. The basic principle of AM is to manufacture a product by adding material layer-wise; with its various applications [1] and its increasing popularity since 20091, it is now a widely used and accepted technology [2]. While designing and printing with Fused Deposition Modeling (FDM)2 (a process where polymer filament is melted in a nozzle and deposited planarly, layer by layer) is fairly easy and cheap regarding the initial costs of the machine and the materials, it is quite the opposite for other AM processes. The wider usage of FDM by hobbyists and also in industry (especially in research and development departments) now starts to apply also to other technologies like Stereolithography (SLA) with polymer slurries. But not only resin can be used with SLA; other materials such as metal and ceramic can be utilised as well [3].

1 In 2009 major patents for Fused Deposition Modeling (FDM) expired [2].
2 FDM is the same as FFF (Fused Filament Fabrication).

1.1 Lithography-Based Manufacturing

Ceramics are the oldest synthetic materials humankind is able to manufacture, and therefore many techniques have been developed and much knowledge has been gathered throughout history [4]. With all the advantages of this material class, such as chemical resistance, good mechanical properties like high hardness and stiffness, or high temperature resistance, it is an ideal material for multiple fields of application, e.g. medical implants, dentistry, turbine blades, or bearings [5]. For a long time, the creation of small and complex ceramic parts was not possible, but with the rise of AM and material research, methods now exist to do so by mixing a ceramic powder with liquid additives. In this context, Lithography-based Ceramic Manufacturing (LCM) describes the process of creating so-called ceramic green parts from a slurry in a vat-photopolymerisation-based technique. A similar approach to LCM uses metal powders, likewise combined with liquid additives, to form a metal slurry. Lithography-based Metal Manufacturing (LMM) exploits the advantages of AM with metal materials and avoids the disadvantages of the hazardous metal dust used in other processes like powder bed fusion (PBF). The metal slurry is solidified through photopolymerisation, but depending on the viscosity of the slurry, a post-processing step (decaking) is needed before the green part is hardened in a sintering oven.

1.2 Knowledge Gathering

Although AM offers a high degree of freedom, there are still constraints limiting the possibilities of a print, like layer height, resolution/accuracy, materials, or the process itself [6]. These parameters and hyper-parameters, as well as material properties, affect not only the quality but also the time needed for a print. However, most of the knowledge regarding these parameters is not documented properly but only stored in the heads of the people working with it (mostly machine operators) or embedded in the slicing software. Users therefore need to search for design guidelines providing information about the printability of specific geometries. To gather this knowledge, which is the key for development, it is crucial to combine, connect and conclude correctly. For representing this knowledge in one place, creating an ontology is a suitable approach. An ontology is an explicit specification of a target conceptualization [7], meaning that information is gathered into abstract classes, with relations between the classes and with individuals specifying the content.

2 State of the Art

Ontologies are regarded as a suitable knowledge representation approach for different topics, and they have been developed for various Industry 4.0 domains [8]. As every AM material has its own properties, which need to be considered in the print preparation, a fully modelled ontology of an AM process requires a material database with the corresponding dependencies and properties.

2.1 Manufacturing Technologies

Witherell et al. use a general approach to AM in their ontology [10]. AM processes are divided into the categories of the standard ISO/ASTM 52900 [11], and for materials a simple classification into ceramics, hard metals, metal alloys and polymers exists. A different approach to modelling the manufacturing process is taken by Mohd Ali et al. [12] with their AM Ontology (AMO). They follow a pattern from the input as an STL file to the finished product via dependencies along the manufacturing process. With the integration of several top-level ontologies and a detailed class structure, they propose an ontology of the product life cycle in manufacturing. Ramírez-Durán et al. [13] focus in their ontology, which consists of five modules, on the Material Extrusion (MEX) process. The components of an extruding machine are represented, as well as their spatial relationships and features. Furthermore, a data capturing model of the sensors is included, as well as 3D models of the used components. This model is built in a generic way, making it possible to extend or replace the technology with other techniques.

2.2 Material Sciences

Various attempts have been made to combine the quite diverse topics of material sciences into semantic databases, describing in detail processes like thermochemical properties of materials [14] or providing general information [15]. Ashino [16] created a material ontology as a substance network, covering the concepts of material substance, property and environment. It is supposed to be used for data exchange among heterogeneous material databases using the concepts related to material sciences. Concerning metal materials, Zhang et al. [17] propose the Metal Materials Ontology based on Yago (MMOY). The MMOY is an attempt to derive knowledge of the metallic material domain from YAGO with a string matching algorithm. With feasibility examples and a graphical interface, they showed the relations between materials and the concepts behind them. Within the frame of several EU projects, the Elementary Multiperspective Material Ontology (EMMO) has been developed as a multidisciplinary materials and physics ontology for applied sciences, enabling a connection between the physical, experimental and simulation worlds [18].

3 Presentation of the ANALYTIC Ontology

Using the ontology editor Protégé [19], an existing ontology by Mayerhofer et al. [20] is analysed and adapted. A language specification is set up for a uniform labelling of entities. For this ontology, two modelling concepts are used. The first one is the conventional method of linking individuals together, which are restricted in the class descriptions. The second one is a pure knowledge transportation method, adding linked data in between the classes. The top level consists of reused terms with a more specific usage: Physical Object, Property, Requirement, Skill and Specification. Newly added is the class Classification, containing for example the material database or the manufacturing technologies.

Manufacturing technologies include subtractive, additive and electronic classes to represent a generic approach. The technologies which are used in AM processes are classified according to the categorisation of the standard ISO/ASTM 52900 [11]. As an example, vat photopolymerisation (VPP) contains various technology concepts, such as lithography-based metal and ceramic manufacturing, stereolithography, hot lithography or digital light processing, as seen in Fig. 1. Post-processing technologies are also included as a separate class, containing e.g. UV curing, decaking or sintering.

Fig. 1. The most important VPP-based processes as modelled in this work.

Depending on the point of view, materials can be classified differently [4, 21–23]. A classification different from the literature is used here, combining ceramics and polymers into a non-metal top class and adding a composite top class. The composite class is meant for materials that are a combination of different materials, such as carbon or other fibre materials. Composite materials can also include transition ceramics or metals3. Polymers are included with the non-metal materials, since there is no reason to treat them differently than ceramics. Ceramics are classified following Briehl [22], who takes the chemical view and divides the materials into cermets, oxide, non-oxide, and silicate ceramics, while also mentioning the possible categorization into high-tech and clay pottery ceramics. The chemical classification makes a detailed subdivision of the main categories possible. Therefore, the main classification splits up into oxide, non-oxide and silicate ceramics. Cermets are combined from ceramics and metals, making them a composite material with a greater percentage of ceramic than metal grains. Silicate ceramics contain at least 20% SiO2, while oxide and non-oxide ceramics are mostly free from SiO2. With this top-level ceramic classification, silicate and non-oxide ceramics can be divided further into subcategories, with their main ingredient as a mid-level class. In contrast to ceramics, metal materials are easier to categorize, as there is one major element influencing the material characteristics: iron [23, 24]. Therefore, there is a division into iron-based and non-iron-based metal materials. The correct labelling of polymers is a subject of discussion, owing to the labels given at their discovery and their actual use today [24]. For the sake of simplicity, the class of polymers contains materials which are created synthetically in chemical processes or are related to them, excluding metal and ceramic materials. Mid-level classes are Natural Polymers, Elastomers, Thermoplastics, Thermosets and Thermoplastic Elastomers.

3 Transition ceramics are ceramics with elements of transition metal materials, but belonging in the literature to ceramic materials [4].
Newly introduced to the physical objects are the different feedstock types, such as slurry, solid, filament or powder. Properties contain either a relationship of being a fixed or a variable property, with the latter containing elements from the deprecated class of parameters. Super classes of Requirements are files, guidelines, guideline sets and software, used to specify or fulfil a process. Guidelines in particular are an important feature, creating different sets of recommendations on how to use machines and materials. AM guidelines such as the VPP Guideline list parameters (variable properties) and machine properties used in those technologies. Skills describe capabilities of machines and overlap with the classification class, since some elements can be defined as both. Manufacturing properties are, like material and geometry properties, restructured to contain entities influencing the printing process, e.g. the air humidity or temperature. Some machines are able to control some of those properties; therefore, those elements are modelled as subclasses of both manufacturing and machine property. The new material layout contains six super classes, for example material, geometry, manufacturing or official certificate properties. Each of those super classes has a selection of commonly used material properties, some taken from Mayerhofer et al. [20] and from material data sheets of different slurry manufacturers4. Physical objects, especially machines and materials, can be described with certificates meeting specific standards and requirements, which is realized in the new Official Certificate Property class, containing the concepts of machine, medical and toxicity certificates.

4 Some manufacturers are the companies Lithoz, 3dCeram, Prodways, Admatec and Cubicure.

4 Evaluation

The evaluation of the ontology was performed with SPARQL queries and also involved a graphical analysis of different instances with GraphDB [25]. For achieving a practical database, multiple materials and machines from different LCM and LMM manufacturers were added to the ontology as individuals. Sources were data sets provided by the companies, including material folders and data sheets of materials, processes and machines. Listing 1 shows a SPARQL query for non-oxide ceramic materials with slurry as feedstock form. The result of this query is presented in Table 1.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX pria: <http://www.pria.at/ontologies/Manufacturing.ttl#>
SELECT * WHERE {
  ?Result rdf:type pria:NonOxideCeramic .
  ?Result rdf:type pria:Slurry . }

Listing 1. SPARQL query for ceramic materials

Table 1. Result of the SPARQL query in Listing 1

Result
#Material 3dCeram AluminiumNitride
#Material 3dCeram SiliconNitride
#Material Lithoz LithaNit 770
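
Such a query can also be reproduced programmatically. The following minimal Python sketch (an illustration under assumptions: the ontology is available as a local Turtle file named Manufacturing.ttl, and the inferred classifications are already materialised, since rdflib performs no OWL reasoning on its own) runs the query from Listing 1 with rdflib:

from rdflib import Graph

# Load the ontology; the file name is a hypothetical local copy.
g = Graph()
g.parse("Manufacturing.ttl", format="turtle")

query = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX pria: <http://www.pria.at/ontologies/Manufacturing.ttl#>
SELECT * WHERE {
  ?Result rdf:type pria:NonOxideCeramic .
  ?Result rdf:type pria:Slurry .
}
"""

# Each row binds ?Result to one non-oxide ceramic slurry individual.
for row in g.query(query):
    print(row.Result)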

Also, templates were created for materials and machines as a guide to model those instances correctly as individuals. Figure 2 shows the LCM machine template, including the material guides of some manufacturers as well as fixed machine properties. Every machine property is determined through data, as shown exemplarily on the right side of the figure, consisting of an object property (here: is unit of) and a linked individual representing a unit. From top to bottom on the right side, the first three instances were created for this ontology; only the fourth instance "UO_0000017" comes from the UO import, and its actual label is micrometer [26].

Fig. 2. Template of an LCM-capable machine, showing some units of fixed machine properties on the right side

For an automatic classification of properties into a predefined subclass of the desired property, instances need to be added correctly. Using the material property Hardness as an example, this class contains subclasses for different hardness determination methods, such as Vickers, Mohs or Shore. To classify the materials into different nuances, Very High, High, Medium, Low and Very Low are also used as subclasses of the material property. Figure 3 shows how the classification of the material property High Hardness is performed, using the commonly used Vickers method alongside the more ancient Mohs method. If an individual, classified as a hardness property of either Vickers or Mohs, contains a numeric value (data property), the reasoner checks the predefined ranges of the hardness nuance classes and suggests listing that individual in the more detailed subclass. This is restricted to the use of the unit class Newton per square Millimeter to guarantee a correct classification. The automatic classification only needs to be performed once in a subclass, as seen in Fig. 4. Figure 5 presents how parameter dependencies in the form of variable machine properties are connected to certain printing constructs. The class Manufacturing Process Result contains different artefacts such as blobbing, stringing, or over-polymerisation. With this class entity, it is possible to see all depending parameters and the guideline they belong to.
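
To make the range-based nuance classification concrete, the following minimal Python sketch mirrors its logic outside the ontology; the numeric thresholds are hypothetical placeholders, since the actual ranges are stored as class restrictions in the ontology (unit: Newton per square millimetre):

# Hedged sketch of the hardness-nuance logic; the thresholds below are
# hypothetical, the real ranges live in the ontology's class restrictions.
NUANCE_RANGES = [
    ("VeryLowHardness", 0.0, 200.0),
    ("LowHardness", 200.0, 500.0),
    ("MediumHardness", 500.0, 900.0),
    ("HighHardness", 900.0, 1500.0),
    ("VeryHighHardness", 1500.0, float("inf")),
]

def classify_vickers(value_n_per_mm2):
    """Map a Vickers hardness value (N/mm^2) to its nuance subclass."""
    if value_n_per_mm2 < 0:
        raise ValueError("hardness must be non-negative")
    for nuance, lower, upper in NUANCE_RANGES:
        if lower <= value_n_per_mm2 < upper:
            return nuance

print(classify_vickers(1200.0))  # -> HighHardness (under the assumed ranges)

A reasoner performs the same check declaratively: once an individual carries a numeric data property in the correct unit, it is suggested for the matching nuance subclass.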

Fig. 3. Vickers hardness automatic classification tool with the inferred instances
Fig. 4. All instances using the Vickers hardness class for their own definition

Fig. 5. Dependencies of the artefact over-polymerisation

5 Conclusion

This paper proposed an ontology model for representing additive manufacturing processes with a focus on ceramic materials. The ontology serves as a semantic database filled with multiple types of information, including commonly used annotations, data and object properties, additional literature, guidelines, and templates. A generic approach was used to easily add and represent manufacturing processes, machine properties and materials. For a general overview of ceramic materials, several subclasses were created with regard to their main ingredients, serving not only the purpose of a simple determination of materials but also of modelling those classes with regard to their chemical composition. For practical use, information from different machine manufacturers and feedstock material producers was used to fill the ontology, which also served to test the database's efficiency. Generally, simple queries make it possible to filter different aspects of the ontology, and depending on the search radius, more specific results can be obtained.

Future work will focus on extending some material concepts with relevant knowledge from chemical science, as well as on integrating missing concepts for the subtractive manufacturing subdomain. Moreover, the work presented in this paper will be integrated into a cloud manufacturing system, which encompasses ontologies for representing knowledge on various layers [27].

Acknowledgement. The authors acknowledge the financial support from the Vienna Business Agency in the frame of the "Research" program for the project ANALYTIC (proposal ID 2783347).

References
1. Bühler, P., Schlaich, P., Sinner, D., Stauss, A., Stauss, T.: Produktdesign: Konzep-
tion - Entwurf - Technologie. Springer, Heidelberg (2019). https://doi.org/10.1007/
978-3-662-55511-8
2. Attaran, M.: The rise of 3-D printing: the advantages of additive manufacturing
over traditional manufacturing. Bus. Horiz. 60, 677–688 (2017). https://doi.org/
10.1016/j.bushor.2017.05.011
3. Robles Martinez, P., Basit, A.W., Gaisford, S.: The history, developments and
opportunities of stereolithography. In: Basit, A., Gaisford, S. (eds.) 3D Printing of
Pharmaceuticals. AAPS Advances in the Pharmaceutical Sciences Series, vol. 31,
pp. 55–79. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-90755-0_4
4. Salmang, H., Scholze, H.: Keramik. Ed. by Rainer Telle. 7th edn. Springer, Heidel-
berg (2007). https://doi.org/10.1007/978-3-540-49469-0, ISBN 978-3-540-63273-3
5. Schwentenwein, M., Schneider, P., Homa, J.: Lithography-based ceramic manufac-
turing: a novel technique for additive manufacturing of high-performance ceramics.
Adv. Sci. Technol. 88, 60–64 (2014). https://doi.org/10.4028/www.scientific.net/
AST.88.60
6. Gao, W., et al.: The status, challenges, and future of additive manufacturing in
engineering. Comput.-Aided Des. 69 (2015). https://doi.org/10.1016/j.cad.2015.
04.001
7. Guarino, N., Oberle, D., Staab, S.: What is an ontology? In: Staab, S., Studer, R.
(eds.) Handbook on Ontologies. International Handbooks on Information Systems,
pp. 1–17. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-92673-3_0
8. Kumar, V.R.S., et al.: Ontologies for Industry 4.0. Knowl. Eng. Rev. 34, e17 (2019).
https://doi.org/10.1017/S0269888919000109
9. Witherell, P., Lopez, F., Assouroko, I., Thompson, K.: Systems integration for
additive manufacturing. NIST (2014). https://www.nist.gov/programs-projects/
systems-integration-additivemanufacturing. Accessed 30 July 2021
10. iassouroko: AMontology. https://github.com/iassouroko/AMontology (29 March
2021). Accessed 15 Apr 2021
11. ISO/ASTM 52900(En): Additive Manufacturing - General Principles - Terminol-
ogy, 11 May 2021. https://www.iso.org/obp/ui/#iso:std:iso-astm:52900:dis:ed-2:
v1:en. Accessed 11 May 2021
12. Ali, M.M., Rai, R., Otte, J.N., Smith, B.: A product life cycle ontology for addi-
tive manufacturing. Comput. Ind. 105, 191–203 (2019). https://doi.org/10.1016/
j.compind.2018.12.007
13. Ramírez-Durán, V.J., Berges, I., Illarramendi, A.: ExtruOnt: an ontology for
describing a type of manufacturing machine for Industry 4.0 systems. Semant.
Web 11(6), 887–909 (2020). https://doi.org/10.3233/SW-200376
14. Bale, C.W., Chartrand, P., Degterov, S.A.: FactSage thermochemical software
and databases. Calphad 26(2), 189–228 (2002). https://doi.org/10.1016/S0364-
5916(02)00035-4
15. An Ontology of Materials. http://www.dfki.uni-kl.de/~imcod/htdocs/Bernd/
Paper/paper/paper.html. Accessed 02 June 2021
16. Ashino, T.: Materials ontology: an infrastructure for exchanging materials infor-
mation and knowledge. Data Sci. J. 9 (2010). https://doi.org/10.2481/dsj.008-041

17. Zhang, X., Pan, D., Zhao, C., Li, K.: MMOY: towards deriving a metallic materials
ontology from Yago. Adv. Eng. Inform. 30(4), 687–702 (2016). https://doi.org/10.
1016/j.aei.2016.09.002
18. European Materials and Modelling Ontology (EMMO): Emmo-Repo/EMMO.
https://github.com/emmo-repo/EMMO. Accessed 06 Feb 2021
19. Musen, M.A.: The Protégé Project: a look back and a look forward. AI Mat-
ters 1(4), 4–12 (2015). https://doi.org/10.1145/2757001.2757003, ISSN 2372–3483,
PMID 27239556
20. Mayerhofer, M., Lepuschitz, W., Hoebert, T., Merdan, M., Schwentenwein, M.,
Strasser, T.I.: Knowledge-driven manufacturability analysis for additive manufac-
turing. IEEE Open J. Ind. Electron. Soc. (2021). https://doi.org/10.1109/OJIES.
2021.3061610
21. Hornbogen, E., Eggeler, G., Werner, E.: Werkstoffe: Aufbau und Eigenschaften von
Keramik-, Metall-, Polymer- und Verbundwerkstoffen. 11, aktualisierte Auflage.
Lehrbuch, vol. 596. Springer, Berlin (2017). ISBN 978-3-642-53867-4 978-3-642-
53866-7
22. Briehl, H.: Chemie der Werkstoffe. Springer, Heidelberg (2021). https://doi.org/
10.1007/978-3-662-63297-0
23. Weißbach, W., Dahms, M., Jaroschek, C.: Werkstoffe und ihre Anwendungen.
Springer, Wiesbaden (2018). https://doi.org/10.1007/978-3-658-19892-3, ISBN
978-3-658-19891-6 978-3-658-19892-3. Accessed 05 May 2021
24. Bargel, H.-J., Schulze, G. (eds.): Werkstoffkunde. 12., bearbeitete Auflage, kor-
rigierter Nachdruck, 531 pp. Springer, Berlin (2018). ISBN 978-3-662-48629-0,
978-3-662-48628-3
25. GraphDB by Ontotext. https://graphdb.ontotext.com/. Accessed 20 Apr 2021
26. Gkoutos, G.V., Schofield, P.N., Hoehndorf, R.: The units ontology: a tool for inte-
grating units of measurement in science. Datab. J. Biol. Datab. Curation (2012).
https://doi.org/10.1093/database/bas033, PMID 23060432
27. Lepuschitz, W., Trautner, T., Mayerhofer, M., Merdan, M.: Applying ontologies
in a cloud manufacturing system. In: IECON 2019–45th Annual Conference of the
IEEE Industrial Electronics Society, vol. 1, pp. 2928–2933 (2019). https://doi.org/
10.1109/IECON.2019.8927535
Natural Convection and Surface Radiation
in an Inclined Square Cavity with Two
Heat-Generating Blocks

Rachid Hidki, Lahcen El Moutaouakil, Mohammed Boukendil,
Zouhair Charqui, and Abdelhalim Abdelbaki

LMFE, Department of Physics, Faculty of Sciences Semlalia, Cadi Ayyad
University, B.P. 2390, Marrakesh, Morocco
m.boukendil@uca.ac.ma

Abstract. Heat transfer by natural convection and surface radiation in an inclined cavity containing two heat-generating blocks is studied. This configuration can be used for cooling electronic components. The differential equations are discretized and solved by the finite volume method and the SIMPLE algorithm. The Radiative Transfer Equation (RTE) is discretized by the Discrete Ordinate Method (DOM). The effects of the emissivity (ε = 0 or 1), the thermal conductivity ratio (0.1 ≤ K ≤ 1) and the tilt angle (−90° ≤ γ ≤ 90°) on the dynamic and thermal characteristics are studied. The results show that these parameters have a significant effect on the flow structure and isotherms. The study of the effect of γ showed that a 45° angle causes good cooling of both blocks.

Keywords: Natural convection · Thermal radiation · Inclined cavity · Finite
volume method · Heat-generating block · Numerical simulation

1 Introduction

In recent years, the model of a closed cavity with heating blocks has been used in various engineering fields, such as heating and cooling of buildings, cooling of electronic components, heat exchangers, etc.; hence the interest in this type of configuration [1, 2]. Moreover, surface radiation plays an important role in evacuating excess heat, especially in the case of blocks with internal heat generation [3, 4]. Several research works have been carried out on different configurations with blocks having different geometrical shapes and under different thermal conditions [5–7].

In the case of blocks with an imposed temperature, Rahmati and Tahery [8] performed numerical simulations to study the effect of a hot block placed in a closed cavity. The authors analyzed the impact of the aspect ratio of the cavity and of the obstacle on the flow and heat transfer. In the case of two hot blocks, Lam and Prakash [9] analyzed the effect of the position of two hot blocks attached to the horizontal walls of a porous closed cavity. The same configuration was reconsidered a few years later by Zhang et al. [10], considering the effect of the magnetic field and the tilt angle of the cavity. The authors showed that these last parameters have a significant influence on the heat
transfer in the cavity. The effect of the magnetic field was considered in another study by Zhang and Che [11], treating the case of four hot blocks in a tilted cavity. They showed that the average Nusselt number decreases with the tilt angle until reaching its minimum at γ = 75° and then increases. Recently, Zheng et al. [12] considered the effect of the geometric shape of a pair of hot and cold blocks on the natural convection in a closed square cavity. The authors concluded that triangular and circular block shapes improve the heat transfer in the cavity. The effect of surface radiation in the presence of isothermal blocks is widely considered in several research works [13–16].

In the case of heat-generating blocks, Raisi [17] considered a square heat-generating body placed in the center of a closed cavity filled with a nanofluid. The author showed the effect of the Rayleigh number, the nanoparticle fraction, and the conductivity ratio on the flow and heat transfer, and found that the Nusselt number and the block temperature increase with the conductivity ratio. At the same time, Umadevi and Nithyadevi [18] considered the same configuration with a square cavity under different boundary conditions. The configuration of Raisi [17] was treated recently by Sivaraj et al. [19], considering the effect of surface radiation. They showed that increasing the surface emissivity causes good cooling of the heat-generating body.

This literature review shows that natural convection coupled with surface radiation in a closed cavity with heating blocks has been studied extensively because of its importance in many practical applications. It can also be seen that most of these studies deal with the case of blocks at an imposed temperature (isothermal blocks). On the other hand, fewer studies have considered the case of heat-generating blocks with thermal radiation. Therefore, this work aims to study the effect of surface radiation on the natural convection induced by two heat-generating blocks placed inside an inclined cavity cooled by one of its vertical walls. For this purpose, the effects of the emissivity, the tilt angle, and the thermal conductivity ratio on the streamlines, isotherms, and maximum temperature profiles are studied.

2 Mathematical Formulation and Numerical Method

2.1 Mathematical Formulation

The geometry of the considered physical problem is represented in Fig. 1. It highlights two square electronic components (heat-generating blocks) placed inside a square cavity (L = 5 cm) filled with air (Pr = 0.71). The electronic components generate two identical and constant volumetric heat fluxes (Q ≈ 860 W/m³) and have the same dimensionless side length (W = 0.2). The coordinates of the first block (B1) are X1 = 0.25 and Y1 = 0.5, while the second body (B2) is located at X2 = 0.75 and Y2 = 0.5; it is thus worth noting that B2 is closer to the isothermal wall than B1. The cavity is tilted by an angle γ and cooled by its right vertical wall at a constant temperature T_C = 20 °C, while its other walls are thermally insulated.

Fig. 1. Studied configuration.

The physical properties of the air are constant with respect to temperature, except for the density in the buoyancy term, whose change with temperature is determined using the Boussinesq approximation. All walls (interior and exterior) are considered gray and diffuse with the same emissivity (ε). The air is considered entirely transparent.

By adapting these approximations to the equations of continuity, momentum, energy, and the radiative transfer equation, the following system of dimensionless equations is obtained:

@U @V
þ ¼0 ð1Þ
@X @Y
 2 
@U @U @U @P @ U @2U
þU þV ¼ þ Pr þ þ PrRahsinc ð2Þ
@s @X @Y @X @X 2 @Y 2
 2 
@V @V @V @P @ V @2V
þU þV ¼ þ Pr þ þ PrRahcosc ð3Þ
@s @X @Y @Y @X 2 @Y 2
2 3
 2  Z4p
@h @h @h @ h @2h s 4
þU þV ¼ þ  I Rb  I R dX5 ð4Þ
@s @X @Y @X 2 @Y 2 Pl
0

 2 
@h @ h @2h
¼K þ þ1 ð5Þ
@s @X 2 @Y 2
2 3
 Z4p
@I R @I R s 4ð1  xÞI Rb þ x I R /dX 5
0
l þg þ s I R ¼ ð6Þ
@X @Y 4p
0
Here $\Phi$ and $I_{Rb} = \left(\theta/T_{R} + 1\right)^{4}$ correspond to the phase function and the non-dimensional blackbody emission, respectively; $(\mu, \eta)$ are the direction cosines, and $\tau_{0}$ and $\omega$ refer to the optical thickness and the scattering albedo, respectively.
The governing equations were cast into dimensionless form using the following variables:

$$(X, Y) = \frac{(x, y)}{L}, \quad (U, V) = \frac{(u, v)\,L}{\alpha_{f}}, \quad \theta = \frac{T - T_{C}}{\Delta T}, \quad I_{R} = \frac{i_{R}}{\sigma T_{C}^{4}}, \quad \Delta T = \frac{Q L^{2}}{k_{f}} \qquad (7)$$

The control parameters that appear in Eqs. (1)–(5) are defined as follows:

$$Ra = \frac{g \beta L^{3} \Delta T}{\nu_{f}\,\alpha_{f}}, \quad \Pr = \frac{\nu_{f}}{\alpha_{f}}, \quad K = \frac{k_{s}}{k_{f}}, \quad Pl = \frac{k_{f}\,\Delta T}{\sigma T_{C}^{4} L}, \quad T_{R} = \frac{T_{C}}{\Delta T} \qquad (8)$$
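
As a quick plausibility check of these definitions, the following Python sketch evaluates the control parameters for the stated setup (L = 5 cm, Q ≈ 860 W/m³, T_C = 20 °C); the air properties used are typical textbook values at about 20 °C, assumed here and not taken from the paper:

# Back-of-the-envelope evaluation of the control parameters.
# Air properties near 20 °C are assumed textbook values (not from the paper).
g = 9.81            # m/s^2, gravitational acceleration
L = 0.05            # m, cavity side
Q = 860.0           # W/m^3, volumetric heat generation
T_C = 293.15        # K, cold-wall temperature (20 °C)
k_f = 0.026         # W/(m K), air thermal conductivity (assumed)
nu_f = 1.5e-5       # m^2/s, kinematic viscosity (assumed)
alpha_f = 2.1e-5    # m^2/s, thermal diffusivity (assumed)
beta = 1.0 / T_C    # 1/K, ideal-gas thermal expansion coefficient
sigma = 5.67e-8     # W/(m^2 K^4), Stefan-Boltzmann constant

dT = Q * L**2 / k_f                        # characteristic temperature scale
Ra = g * beta * L**3 * dT / (nu_f * alpha_f)
Pr = nu_f / alpha_f
Pl = k_f * dT / (sigma * T_C**4 * L)       # Planck number
T_R = T_C / dT

print(f"dT ~ {dT:.1f} K, Ra ~ {Ra:.2e}, Pr ~ {Pr:.2f}, Pl ~ {Pl:.3f}")

With these assumed properties, ΔT ≈ 83 K, Pr ≈ 0.71 and Ra ≈ 10⁶, which is consistent with the values Ra = 10⁶ and Pr = 0.71 fixed in the results section and explains the choice Q ≈ 860 W/m³.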

The dimensionless boundary conditions used to solve Eqs. (1)–(4) are:

• On all solid walls: $U = V = 0$
• On the right vertical wall: $\theta = 0$
• On all adiabatic walls: $\dfrac{\partial \theta}{\partial n} = \dfrac{\varepsilon}{Pl}\left(Q_{Rinc} - I_{Rb}\right)$
• On the surfaces of the blocks: $\dfrac{\partial \theta_{f}}{\partial n} = K\dfrac{\partial \theta_{s}}{\partial n} - \dfrac{\varepsilon}{Pl}\left(Q_{Rinc} - I_{Rb}\right)$ and $\theta_{f} = \theta_{s}$
For a given surface element, the incident radiative heat flux $Q_{Rinc}$ is expressed as follows:

$$Q_{Rinc} = \int_{\vec{n}\cdot\vec{s}\,'<0} I_{R}\,\left|\vec{n}\cdot\vec{s}\,'\right|\, d\Omega' \qquad (9)$$

2.2 Numerical Method and Validation

The governing equations of the present study, with the appropriate boundary conditions, have been discretized by the finite volume method. The discrete ordinate method is used to solve the RTE (Eq. 6). An in-house numerical code based on the SIMPLE algorithm was used to solve these equations. The convergence criterion imposed on all variables is 10⁻⁴, and the time step was set to 5 × 10⁻⁵. A mesh sensitivity test was performed to choose the mesh that gives a good compromise between the computation time and the desired accuracy; a mesh size of 150 × 150 was selected.

The present numerical code has been validated against the numerical results of Sivaraj et al. [19] for the case of a single square heat-generating block placed in the center of an inclined cavity. This validation has been performed by comparing the
streamlines and isotherms obtained for W = 1/3, ε = 1, Pr = 0.71, and Ra = 10⁶. The comparative results are presented in Fig. 2; they show good agreement between our results and those of Sivaraj et al. [19].

Fig. 2. Streamlines (left) and isotherms (right): comparison between Sivaraj et al. [19] and the present study.

3 Results and Discussion

The results are presented to study the effect of thermal radiation on the natural convection induced by two square heat-generating bodies placed in an inclined cavity. For this purpose, the effects of the cavity tilt angle (−90° ≤ γ ≤ 90°), the thermal conductivity ratio K (0.1 ≤ K ≤ 1), and the emissivity ε (ε = 0 or 1) were studied. The Rayleigh and Prandtl numbers are fixed at 10⁶ and 0.71, respectively.

3.1 Effect of (K, γ)

Figures 3 and 4 show the streamlines and isotherms, respectively, for different combinations of (K, γ) with ε = 0. For a given conductivity and γ = 0°, the streamlines (Fig. 3) illustrate that the flow motion consists of a large clockwise cell encircling the two heating bodies, with the flow more intense around B2. In addition, two small cooling cells appear in the vicinity of B2. When γ = 45°, the small cells merge below B2, but the overall flow structure does not change significantly. For γ = 90°, the flow structure is entirely different from the previous ones: it consists of two counter-rotating cells bounded by the passive vertical walls of the cavity. Indeed, for γ = 90°, the upper wall is cold, which is the origin of this new flow behavior. In addition, Fig. 3 also shows that the effect of the thermal conductivity ratio on the streamlines is weak.

Fig. 3. Streamlines for different combinations of (K, γ).

For a given tilt angle γ and K = 0.1, the isotherms (Fig. 4) are seen to be more concentrated in the solid region. This is due to the air conductivity, which is ten times higher than that of B1 and B2. Increasing the conductivity ratio from 0.1 to 1 lowers the maximum temperature in the cavity considerably, by up to 50%. When K is fixed and the inclination γ is varied from 0° to 45°, the isotherms do not show a significant variation, in contrast to the case where γ = 90°. Moreover, for γ = 90°, the isotherms are symmetrical with respect to the vertical median of the cavity. This is due to the new flow structure developed when γ = 90° (Fig. 3). We can also note from this figure that the minimum of the maximum temperature in the cavity occurs at γ = 45°, while it is maximum at γ = 90°.

Fig. 4. Isotherms for different combinations of (K, γ).

3.2 Effect of Emissivity ε

To study the effect of emissivity on the streamlines and isotherms, K and γ are fixed at 0.1 and 45°, respectively. The streamlines (a) and isotherms (b) are plotted in Fig. 5 for ε = 0 and ε = 1. By increasing the emissivity, a small counterclockwise cell appears in the upper left corner of the cavity, indicating that the adiabatic walls can play a role in removing the heat generated by the heating bodies. It can also be noted that the maximum stream function decreases by up to 56% when ε goes from 0 to 1. This is because the thermal radiation slows down the flow velocity in the cavity [19]. For the isotherms (Fig. 5b), the presence of surface radiation causes good cooling of both blocks, and the temperature gradients in the blocks are reduced compared to the case without radiation (ε = 0).

Fig. 5. Streamlines (a) and isotherms (b) for K = 0.1, γ = 45°, and ε = 0, 1.

3.3 Maximum Temperature

From a practical perspective, it is very important to control the temperature of electronic components. For this reason, our objective in this section is to study the effect of the previous parameters on the maximum temperature, in order to choose the optimal values that give the lowest temperature in the cavity. Figure 6 shows the maximum temperature θmax as a function of γ, ε, and K. From this figure, it can be seen that in the absence of surface radiation and whatever the value of K, increasing the inclination γ causes a sharp drop in the maximum temperature in the cavity. Indeed, when γ grows from −90° to 0°, θmax decreases by up to 33%. It continues to decrease until it reaches its minimum value at 45°, and then increases. Increasing the emissivity ε to 1 leads to a decrease of θmax, and the effect of γ on θmax is no longer felt; these results have already been reported by Sivaraj et al. [19]. A further reduction of θmax is noted when the conductivity K increases. So, to avoid overheating of the electronic components, it is essential to first set the emissivity and the conductivity at their maximum values and then select the appropriate inclination of the cavity.

Fig. 6. Maximum temperature for different values of K, γ, and ε.

4 Conclusion

This study provides a general overview of natural convection and surface radiation in an air-filled, inclined square cavity containing two square heat-generating blocks. The effects of the tilt angle, the conductivity ratio, and the emissivity are discussed for Ra = 10⁶ and Pr = 0.71. The results are presented in terms of the streamlines, the isotherms, and the maximum temperature profile. It is found that the flow and thermal characteristics are significantly affected by the inclination angle. It has been shown that increasing the conductivity ratio or the emissivity significantly reduces the temperature of the two heat-generating blocks. The results also indicate that the inclination angle has a more significant influence in the absence of surface radiation than in its presence. An inclination of γ = 45° allows good cooling of both heat-generating bodies.

References
1. Hidki, R., El Moutaouakil, L., Charqui, Z., Boukendil, M., Zrikem, Z.: Natural convection in
a square cavity containing two heat-generating cylinders with different geometries. Mater.
Today Proc. 45, 7415–7423 (2021)
2. Sheremet, M.A., Oztop, H.F., Pop, I., Abu-Hamdeh, N.: Analysis of entropy generation in
natural convection of nanofluid inside a square cavity having hot solid block: Tiwari and
Das’ model. Entropy 18, 1–15 (2016)
3. Nia, M.F., Ansari, A.B., Nassab, S.A.G.: Transient combined volumetric radiation and free
convection in a chamber with a hollow heat-generating solid body. Int. Commun. Heat Mass
Transf. 119, 104937 (2020)
4. Mezrhab, A., Bouali, H., Abid, C.: Modeling of combined radiative and convective heat
transfer in an enclosure with a heat-generating conducting body. Int. J. Comput. Methods 2,
431–450 (2005)
5. Pandey, S., Park, Y.G., Ha, M.Y.: An exhaustive review of studies on natural convection in
enclosures with and without internal bodies of various shapes. Int. J. Heat Mass Transf. 138,
762–795 (2019)
6. Mikhailenko, S.A., Sheremet, M.A., Mohamad, A.A.: Convective-radiative heat transfer in a
rotating square cavity with a local heat-generating source. Int. J. Mech. Sci. 142–143, 530–
540 (2018)
7. Martyushev, S.G., Sheremet, M.A.: Conjugate natural convection combined with surface
thermal radiation in an air filled cavity with internal heat source. Int. J. Therm. Sci. 76, 51–
67 (2014)
8. Rahmati, A.R., Tahery, A.A.: Numerical study of nanofluid natural convection in a square
cavity with a hot obstacle using lattice Boltzmann method. Alexandria Eng. J. 57, 1271–
1286 (2018)
9. Lam, P.A.K., Prakash, K.A.: A numerical study on natural convection and entropy
generation in a porous enclosure with heat sources. Int. J. Heat Mass Transf. 69, 390–407
(2014)
10. Zhang, T., Che, D., Zhu, Y., Shi, H., Chen, D.: Effects of Magnetic field and inclination on
natural convection in a cavity filled with nanofluids by a double multiple-relaxation-time
thermal lattice Boltzmann method. Heat Transf. Eng. 41, 252–270 (2020)
11. Zhang, T., Che, D.: Double MRT thermal lattice Boltzmann simulation for MHD natural
convection of nanofluids in an inclined cavity with four square heat sources. Int. J. Heat
Mass Transf. 94, 87–100 (2016)
12. Zheng, J., Zhang, L., Yu, H., Wang, Y., Zhao, T.: Study on natural convection heat transfer
in a closed cavity with hot and cold tubes. Sci. Prog. 104, 1–25 (2021)
13. El Moutaouakil, L., Boukendil, M., Zrikem, Z., Abdelbaki, A.: Natural convection and
surface radiation heat transfer in a square cavity with an inner wavy body. Int.
J. Thermophys. 41, 1–21 (2020)
14. Boukendil, M., El Moutaouakil, L., Zrikem, Z., Abdelbaki, A.: Coupled thermal radiation
and natural convection heat transfer in a cavity with a discretely heated inner body. Mater.
Today Proc. 27, 3065–3070 (2020)
15. Boukendil, M., El Moutaouakil, L., Zrikem, Z., Abdelbaki, A.: Natural convection and
surface radiation in an insulated square cavity with inner hot and cold square bodies. Mater.
Today Proc. 45, 7282–7289 (2021)
16. El Moutaouakil, L., Boukendil, M., Zrikem, Z., Abdelbaki, A.: Conjugate natural
convection-surface radiation in a square cavity with an inner elliptic body. In: Ben Ahmed,
M., Boudhir, A.A., Santos, D., El Aroussi, M., Karas, İR. (eds.) SCA 2019. LNITI,
pp. 1112–1127. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-37629-1_80
17. Raisi, A.: Heat transfer in an enclosure filled with a nanofluid and containing a heat-
generating conductive body. Appl. Therm. Eng. 110, 469–480 (2017)
18. Umadevi, P., Nithyadevi, N.: Convection in a sinusoidally heated square enclosure utilizing
Ag−water nanofluid with heat generating solid body. Int. J. Mech. Sci. 131–132, 712–721
(2017)
19. Sivaraj, C., Miroshnichenko, I.V., Sheremet, M.A.: Influence of thermal radiation on
thermogravitational convection in a tilted chamber having heat-producing solid body. Int.
Commun. Heat Mass Transf. 115, 104611 (2020)
Improving the Route Selection for Geographic
Routing Using Fuzzy-Logic in VANET

Israa A. Aljabry and Ghaida A. Al-Suhail

Department of Computer Engineering, University of Basrah, Basrah, Iraq
ghaida.suhail@uobasrah.edu.iq

Abstract. The ability to design a routing protocol capable of constructing adaptive and efficient channels for delivering data packets is an important factor in the successful evolution of VANET networks. Because vehicles contain a GPS unit, most VANETs use position-based routing protocols. Greedy Perimeter Stateless Routing (GPSR), which has been widely implemented, is one solution to VANET's difficulties. The FL-NS GPSR routing protocol is proposed in this study as an effective intelligent fuzzy logic control system. To select the appropriate next-hop node for packet forwarding, the proposed routing protocol integrates two metrics: the neighbor node and the vehicle speed. It also alters the format of the hello message by adding a direction field. The OMNeT++ and SUMO simulation tools are used in parallel to examine the VANET environment. The results, obtained in an urban environment, indicate substantial improvements in network performance compared to the traditional GPSR with respect to the QoS parameters.

Keywords: FL-NS GPSR · Fuzzy logic · OMNeT++ · SUMO · QoS ·
VANET

1 Introduction

VANET is a wireless network of automobiles that functions independently, while it may use network infrastructure access points such as roadside devices. These devices are often placed at fixed sites to permit long-distance communications between vehicles and transportation infrastructure or among vehicles, and they can also serve as an internet gateway [1–3]. In particular, VANET is classified as a category of Mobile Ad-Hoc Network (MANET). It offers a promising means of informing users about real-time information such as accident scenarios, weather updates, road conditions, and so on [4, 5].

Notably, position-based routing protocols route messages depending on the geographic location of the nodes. The position is normally known from the vehicle's Global Positioning System (GPS) information. In this class of protocols, the source node must be aware of its own position as well as that of the target node. The performance of a position-based routing protocol is affected by driving conditions. For VANETs, numerous position-based routing protocols have been suggested, such as Greedy Perimeter Stateless Routing (GPSR) [6, 7]. The GPSR routing protocol routes the packet to the node nearest to the destination. When it comes to real-time traffic, throughput, and higher mobility models, the position-based Greedy Perimeter Stateless
Routing protocol is said to be more suited for VANETs. Many routing protocols created recently take the core principle of GPSR and alter it as needed [8–10].

The local maximum issue, i.e. the problem of failing to find a suitable node for forwarding data, is one of the fundamental challenges of classical geographic routing methods. The local maximum arises in the basic greedy algorithm when the distance between the source and target nodes is less than the distance between the surrounding nodes and the destination, or when there is no node near the destination other than the source node. When the local maximum problem occurs, the state of the routing algorithm is altered; as a result, packet lifetime decreases, increasing the likelihood of packet loss [11, 12].

Many parameters can influence the GPSR performance, such as obstacles, the link quality, the network size, and so on. Therefore, the main contribution of this paper focuses on two parameters, namely the vehicle speed and the neighbor node. Specifically, the vehicle speed plays a very important role in the data routing among vehicles: moving too fast makes the probability of a link breakage very high, so it is crucial to select a vehicle with a low speed and thus ensure the arrival of the packet at the destination. We also need to determine the probability of a neighbor node being close to the source; this factor keeps the chosen node in proximity for good delivery of the data. These two factors are applied using a fuzzy logic approach to achieve the best next-hop selection and enhance performance. This is accomplished in Infrastructure-to-Vehicle (I2V) mode because it minimizes the packet loss and the delay as well. The fuzzy logic controller is integrated into all vehicles and also into the RSUs to pick the best next-hop based on the two metrics. The simulation is applied in an urban area.

The arrangement of the paper is as follows: Sect. 2 demonstrates the related work; Sect. 3 goes in depth to show the process of enhancing the GPSR using fuzzy logic; in Sect. 4, the proposed FL-NS GPSR algorithm is introduced; Sect. 5 reveals the simulation tools and the obtained results; finally, Sect. 6 summarizes the conclusions.

2 Related Work

Several strategies have recently been offered in the literature to mitigate GPSR problems; some works are devoted to mathematical models, while others are based on intelligent techniques. In this section, some well-known strategies are summarized. The authors in [13] offered an innovative greedy forwarding approach used to develop a novel routing protocol based on vehicle position, which prevents link breakages and provides a stable route that enhances PDR and throughput. The recommended Density and Velocity (Speed, Direction) Aware Greedy Perimeter Stateless Routing protocol (DVA-GPSR) is grounded on the proposed greedy forwarding technique, which uses vehicle density, speed, and direction to identify the most feasible relaying node candidate. On the other hand, in [14] the data congestion problem has been considered via a data transfer routing method for real-time data transmission. Such a problem is caused by heavy traffic on the main roads and consequently leads to an increase in the data stream and in packet loss. In addition, the connection partition problem is produced by insufficient traffic movement, and this issue leads to an increased transmission delay. The suggested protocol has two phases: next-hop selection on the chosen path between the current and future intersections, and next-intersection selection. Zhou et al. in [15] suggested a unique data delivery strategy for urban vehicle networks that can increase the performance of the route without depending on GPS. A fuzzy-rule-based wireless transmission method is proposed to improve relay choice while taking into account multiple factors such as hop count, driving direction, connection time, and vehicle speed. Both wired transmissions among RSUs and wireless V2V transmissions are used, and every RSU is outfitted with a machine learning system. In [16], the authors introduce a new routing protocol based on fuzzy logic systems that might aid in the coordination and analysis of contradictory metrics. To pick the best next-hop for packet forwarding, the proposed routing protocol integrates numerous variables such as achievable throughput, direction, vehicle position, and link quality.

Finally, in [17] the weight-aware greedy perimeter stateless routing protocol (WA-GPSR) is given. Based on several routing factors, the upgraded GPSR protocol computes the reliable communication area and determines the next forwarding vehicle.

3 The Enhancement of GPSR Using a Fuzzy Logic Controller

Fuzzy logic is a well-known scheme with a strong academic foundation that can elegantly incorporate approximate, vague, and ambiguous knowledge [18–20]. This section suggests a fuzzy logic-based enhancement to geographic routing. Our suggested scheme aims to integrate fuzzy logic decision-making in the selection of next-hop nodes by taking into account several metrics linked to the vehicle speed and the neighbor node.

The architecture of VANET may differ among areas, as may the protocols and interfaces. The presentation and session layers are omitted in VANET, and a specific layer can be additionally divided or separated into sub-layers in the VANET design, as shown in Fig. 1.

The parameters of vehicle speed and neighbor node are preprocessed and fed into the network layer, where the fuzzy system is installed; after that, the output is passed to the other layers for further processing. The position and direction are fed from the MAC and physical layers.

The Fuzzy Logic Decision System (FL-DS) is in charge of determining the fuzzy score of every candidate forwarder based on the vehicle speed and the neighbor node. These two factors work together to pick the best next-hop that is close to the destination and has a high link quality.

We use the minimum inference approach for the Mamdani system. Due to its simplicity, we employ triangular membership functions for the following inputs and output (Table 1).

Fig. 1. FL-NS GPSR system architecture.

Table 1. Input/output fuzzy rules.

Vehicle speed (input) | Neighbor node (input) | Fuzzy score (output)
Low | Low | Low
Low | Medium | High
Low | High | Very-high
Medium | Low | High
Medium | Medium | Medium
Medium | High | High
High | Low | Very-low
High | Medium | Low
High | High | Medium

In the proposed protocol, we consider the GPSR beacon frame, which includes the following extra fields: (a) the vehicle direction, (b) the vehicle speed, and (c) the neighbor node. Figure 2 depicts our suggested scheme's redesigned beacon structure.



The nodes use the hello packet data to generate a new item in the neighbor table or to update that table. By default, each neighbor has one entry in the GPSR neighbor table. Each item provides the neighbor's ID (IP address), the timestamp of the last hello packet received, and the X, Y coordinates. The neighbor table in our method now includes two new fields: vehicle speed and neighbor node. Each vehicle has a Neighbors' Table (NT) that stores the information received from the hello beacon, as illustrated in Table 2.
Now, to generate a crisp numerical value, the center of gravity (COG) method is chosen, because it is the most widely used defuzzification methodology in many real-world applications. Figure 3 shows the membership functions of the inputs and of the fuzzy score output and illustrates the relationship between the input and output variables.
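
In continuous form, the COG defuzzifier returns the centroid of the aggregated output membership function $\mu_{agg}(y)$ (a standard textbook formulation, stated here for completeness):

$$y^{*} = \frac{\int y\,\mu_{agg}(y)\,dy}{\int \mu_{agg}(y)\,dy}$$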

Table 2. Neighbor table format.


Neighbor’s ID
Position (X, Y)
Direction
Vehicle speed
Neighbor node
Last packet sequence number
Last HELLO message timestamp

Fig. 3. Fuzzy inference system of the proposed FLC model FL-NS GPSR: (a) MFs of the input variable vehicle speed; (b) MFs of the input variable neighbor node; (c) MFs of the output variable; (d) 3D graph of the FIS.
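
To make the inference pipeline concrete, the following minimal Python sketch implements a Mamdani min-inference with the nine rules of Table 1 and centroid (COG) defuzzification. It is an illustration, not the authors' implementation: the membership breakpoints on normalised [0, 1] universes are hypothetical, since the paper specifies only the triangular shape of the functions.

import numpy as np

def tri(x, a, b, c):
    """Triangular membership with feet a, c and peak b."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

# Hypothetical breakpoints (shoulders extended past [0, 1] to avoid
# degenerate triangles); not taken from the paper.
IN_MF = {"Low": (-0.5, 0.0, 0.5), "Medium": (0.0, 0.5, 1.0),
         "High": (0.5, 1.0, 1.5)}
OUT_MF = {"Very-low": (-0.25, 0.0, 0.25), "Low": (0.0, 0.25, 0.5),
          "Medium": (0.25, 0.5, 0.75), "High": (0.5, 0.75, 1.0),
          "Very-high": (0.75, 1.0, 1.25)}

# The nine rules of Table 1: (vehicle speed, neighbor node) -> fuzzy score.
RULES = {("Low", "Low"): "Low", ("Low", "Medium"): "High",
         ("Low", "High"): "Very-high", ("Medium", "Low"): "High",
         ("Medium", "Medium"): "Medium", ("Medium", "High"): "High",
         ("High", "Low"): "Very-low", ("High", "Medium"): "Low",
         ("High", "High"): "Medium"}

def fuzzy_score(speed, neighbor):
    """Mamdani min inference with max aggregation and COG defuzzification."""
    y = np.linspace(0.0, 1.0, 501)
    agg = np.zeros_like(y)
    for (s_lbl, n_lbl), out_lbl in RULES.items():
        # Rule firing strength: min of the two input memberships.
        w = min(tri(speed, *IN_MF[s_lbl]), tri(neighbor, *IN_MF[n_lbl]))
        # Clip the output set at w and aggregate with max.
        agg = np.maximum(agg, np.minimum(w, tri(y, *OUT_MF[out_lbl])))
    return float((agg * y).sum() / agg.sum())  # discrete centre of gravity

# A slow vehicle with many nearby neighbors should obtain a high score.
print(round(fuzzy_score(speed=0.2, neighbor=0.9), 3))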



4 The Proposed FL-NS GPSR Algorithm

The proposed algorithm, given below, includes the effect of the two parameters of vehicle speed and neighbor node. It is developed to enhance the routing performance when these parameters are taken into consideration during route selection. Note that this improvement increases the processing time, but it also has a positive impact on performance by controlling parameters which in the traditional protocol have an undesirable effect. The pseudo-steps of the proposed algorithm are as follows:

The Proposed FL-NS Algorithm

Assumption: all nodes have a GPS
Input: Nodes, Communication Range, Network Map
Output: Best Neighbor Node as next-hop

Stage 1: Characteristics Calculation
1- For each node Ni do
2-   Calculate the position of Ni: (Xi, Yi)
3-   Calculate the speed of Ni: (Si)
4-   Calculate the distance of each Ni: (Di)
5-   Calculate the direction of Ni: (Diri)
6- End for

Stage 2: Add the Fuzzy Logic Controller
7- If node i is the Destination
8-   Forward the packet to the Destination
9- Criteria 1: find the closest distance to the Destination
10-  Calculate and compare the distance between the Destination and all neighbors Ni (using the Euclidean formula)
11- Criteria 2: use the FLC to find and tune the next-hop using two parameters: vehicle speed and neighbor node
12-  Fuzzy_output = Calculate_Fuzzy_Score
13-  Establish the new Neighbors' Table <- add the Fuzzy_Score values of all Ni and the distances between the Destination and all neighbors Ni to the Neighbor Table
14-  Search: if Rank_i has the highest Fuzzy Score && the closest distance to the Destination
15-    Set node i as the next-hop
16- End if
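
Stage 2 can be read as the following Python sketch (an illustrative interpretation, reusing the fuzzy_score function sketched in Sect. 3; combining the two criteria lexicographically, score first and distance as tie-breaker, is our assumption, since the listing does not fix how the two are weighted):

import math
from dataclasses import dataclass

@dataclass
class Neighbor:
    node_id: int
    x: float
    y: float
    speed: float      # normalised vehicle speed in [0, 1]
    closeness: float  # normalised neighbor-node metric in [0, 1]

def select_next_hop(neighbors, dest_x, dest_y):
    """Rank neighbors by fuzzy score; break ties by distance to destination."""
    # fuzzy_score(speed, closeness) is assumed defined as in the Sect. 3 sketch.
    def dist(n):
        # Criterion 1: Euclidean distance to the destination.
        return math.hypot(dest_x - n.x, dest_y - n.y)
    # Criterion 2: highest fuzzy score wins; among equals, the closest node.
    return min(neighbors,
               key=lambda n: (-fuzzy_score(n.speed, n.closeness), dist(n)))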

5 Performance Evaluation

We analyze the effectiveness and efficiency of FL-NS GPSR and compare it to the traditional GPSR. The tools used to perform this simulation are the network simulator OMNeT++ together with two frameworks, INET and Veins. In addition, to make the simulation more realistic, the traffic simulator SUMO is used in conjunction with OMNeT++. Figure 4 depicts the road network for an urban environment constructed using a 3 × 6 Manhattan grid.

Fig. 4. 3 × 6 Manhattan grid.

The parameters implemented in this scenario are listed in Table 3.

Table 3. Simulation parameter.


Parameter Value or protocol
OMNeT++ version OMNeT++ V 5.5.1
SUMO version SUMO 1.6.0
INET version INET 4.2.1
Veins version Veins 5.0
Simulation area 2500 × 2500 m
MAC protocol IEEE 802.11p
Layer 3 addressing IPv4
Routing protocol GPSR & FL-NS GPSR
Communication mode I2V
Number of vehicles 10, 20, 30, 40, 50
Vehicle speed 40 km/h
Beacon interval 1 s
Simulation time 600 s
Transmission range 250 m

Figure 5 examines the scenario with different numbers of vehicles, i.e. different network sizes. Notice that Fig. 5(a) displays the packet delivery ratios of the two routing protocols, the traditional GPSR and the proposed FL-NS GPSR, in various density cases. It can be seen that the packet delivery ratio of FL-NS is higher in all density variations compared to the standard GPSR. In fact, it is well known that the degradation in GPSR performance is explained by the use of distance alone as a single measure for the routing process. Using only a distance metric cannot avoid unstable links that may break owing to congestion or outdated neighbors. Therefore, our FL-NS GPSR uses a more sophisticated routing decision process that allows the best and most stable next hop to be selected.

Figure 5(b) demonstrates the packet drop ratio: FL-NS has lower values than the
GPSR protocol since it considers several parameters, which help improve the
performance and reduce the amount of lost data. The network throughput is an
important indicator of the network's scalability, as shown in Fig. 5(c). For
scalability, a network's capacity should rise linearly with the number of nodes;
here, too, FL-NS outperforms the traditional routing protocol.
Figure 5(d) depicts the end-to-end delay results. As the number of nodes in the
network grows, so does the end-to-end delay. In the proposed protocol, data packets
are delivered to the destination in less time because nodes with a higher likelihood
of supporting greedy forwarding are chosen; as a result, the end-to-end delay is
decreased. The suggested FL-NS uses fuzzy theory to choose nodes with lower mobility
that travel toward the target node as the next forwarding nodes. The proposed fuzzy
GPSR provides good and reasonable results compared to existing works, such as the
mathematical model in [13] and the fuzzy logic-based model in [16].
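
For concreteness, the following sketch shows how the four reported metrics are conventionally derived from raw simulation counters; the counter names and sample values are illustrative and are not taken from the paper's simulations.

def evaluate(sent, received, bytes_received, delays, duration_s):
    pdr = 100.0 * received / sent                 # packet delivery ratio (%)
    drop = 100.0 * (sent - received) / sent       # packet drop ratio (%)
    throughput = 8 * bytes_received / duration_s  # throughput (bit/s)
    e2e_delay = sum(delays) / len(delays)         # mean end-to-end delay (s)
    return pdr, drop, throughput, e2e_delay

print(evaluate(sent=1000, received=874, bytes_received=437_000,
               delays=[0.8, 1.2, 1.0], duration_s=600))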

Fig. 5. GPSR and FL-NS GPSR vs. the network size: (a) packet delivery ratio (%), (b) packet drop ratio (%), (c) throughput, (d) end-to-end delay (s).



6 Conclusion

One of the greatest challenges in vehicular ad hoc networks is building routing protocols
for highly dynamic topologies. In position-based routing protocols, the use of GPS in vehicles
provides the facility to detect their own geographic location as well as the geographic
location of their neighbors. In this study, the GPSR routing protocol was developed to
produce a new FL-NS GPSR based on the fuzzy logic controller (FLC). To determine
the fittest next-hop node, the proposed FL-NS GPSR routing protocol relies on vehicle
speed and neighbor node in order to reduce the delay and enhance the packet delivery
ratio. In addition, to improve the network performance, a new field called “Direction
field” was also introduced within the beacon message. The simulation results reveal
that our proposed algorithm FL-NS GPSR outperforms the standard GPSR protocol
and other existing works due to the significant enhancements in the E2E delay and
network throughput. As a result, one can conclude that such an approach makes this
protocol resistant to the changes in the environment; in particular when the number of
vehicles increases. For future work, the proposed protocol can be extended using other
intelligent techniques of optimization algorithms like PSO or ABC.

References
1. Tripp-Barba, C., Zaldívar-Colado, A., Urquiza-Aguiar, L., Aguilar-Calderón, J.A.: Survey
on routing protocols for vehicular ad hoc networks based on multimetrics. Electronics 8(10),
1–32 (2019)
2. Aljabry, I.A., Al-Suhail, G.A.: A survey on network simulators for vehicular ad-hoc
networks (VANETs). Int. J. Comput. Appl. 174(11), 1–9 (2021)
3. Ayoob, A.A., Su, G., Mohammed, M.N.: Vehicular Ad Hoc network: an intensive review.
In: International Conference on Intelligent Computing and Optimization, pp. 158–167
(2018)
4. Yu, S., Choi, L., Cho, G.: A new recovery method for greedy routing protocols in high mobile
vehicular communications. In: Proceedings of the 2008 IEEE International Conference on
Vehicular Electronics and Safety, ICVES 2008, pp. 45–50 (2008)
5. Ghafoor, H., Koo, I., Gohar, N.U.D.: Neighboring and connectivity-aware routing in
VANETs. Sci. World J. 2014 (2014)
6. Kazi, A.K., Khan, M.: DyTE: an effective routing protocol for VANET in urban scenarios.
Eng. Technol. Appl. Sci. Res. 11(2), 6979–6985 (2021)
7. Aljabry, I.A., Al-Suhail, G.A.: A simulation of AODV and GPSR routing protocols in
VANET based on multimetrices. Iraqi J. Electric. Electr. Eng. 17(2), 66–72 (2021)
8. Rahimi, S., Jabraeil Jamali, M.A.: A hybrid geographic-DTN routing protocol based on
fuzzy logic in vehicular ad hoc networks. Peer-to-Peer Netw. Appl. 12(1), 88–101 (2018).
https://doi.org/10.1007/s12083-018-0642-4
9. Houssaini, Z.S., Zaimi, I., Drissi, M., Oumsis, M., Ouatik, A.: Trade-off between accuracy,
cost, and QoS using a beacon-on-demand strategy and Kalman filtering over a VANET.
Digital Commun. Netw. 4(1), 13–26 (2018)
10. Bala, R., Krishna, C.R.: Scenario based performance analysis of AODV and GPSR routing
protocols in a VANET. In: Proceedings - 2015 IEEE International Conference on
Computational Intelligence and Communication Technology, CICT 2015, pp. 432–437
(2015)

11. Houmer, M., Hasnaoui, M.L.: An enhancement of greedy perimeter stateless routing
protocol in VANET. Proc. Comput. Sci. 160, 101–108 (2019)
12. Kumar, S., Verma, A.K.: Position based routing protocols in VANET: a survey. Wireless
Pers. Commun. 83(4), 2747–2772 (2015). https://doi.org/10.1007/s11277-015-2567-z
13. Bengag, A., Bengag, A., Elboukhari, M.: A novel greedy forwarding mechanism based on
density, speed and direction parameters for VANETs. Int. J. Interactive Mob. Technol. 14(8),
196–204 (2020)
14. Wu, D., Li, H., Li, X., Zhang, J.: A geographic routing protocol based on trunk line in
VANETs. In: Ning, H. (ed.) CyberDI/CyberLife -2019. CCIS, vol. 1138, pp. 21–37.
Springer, Singapore (2019). https://doi.org/10.1007/978-981-15-1925-3_2
15. Zhou, Y., Li, H., Shi, C., Lu, N., Cheng, N.: A fuzzy-rule based data delivery scheme in
VANETs with intelligent speed prediction and relay selection. Wireless Commun. Mob.
Comput. 2018, 1–15 (2018)
16. Alzamzami, O., Mahgoub, I.: Fuzzy logic-based geographic routing for urban vehicular
networks using link quality and achievable throughput estimations. IEEE Trans. Intell.
Transp. Syst. 20(6), 2289–2300 (2019)
17. Smiri, S., Abbou, A.B., Boushaba, A., Zahi, A., Abbou, R.B.: WA-GPSR : weight-aware
GPSR-based routing protocol for VANET. Int. J. Interactive Mob. Technol. 15(17), 69–83
(2021)
18. Zaimi, I., Boushaba, A., Squalli Houssaini, Z., Oumsis, M.: A fuzzy geographical routing
approach to support real-time multimedia transmission for vehicular ad hoc networks.
Wireless Netw. 25(3), 1289–1311 (2018). https://doi.org/10.1007/s11276-018-1729-9
19. Mishra, P., Gandhi, C., Singh, B.: Link quality and energy aware geographical routing in
manets using fuzzy logics. J. Telecommun. Inf. Technol. (2016)
20. Limouchi, E., Mahgoub, I., Alwakeel, A.: Fuzzy logic-based broadcast in vehicular ad hoc
networks. In: 2016 IEEE 84th Vehicular Technology Conference (2016)
Trends and Techniques of Biomedical
Text Mining: A Review

Maliha Rashida1, Fariha Iffath1, Rezaul Karim2,
and Mohammad Shamsul Arefin1(B)
1 Department of Computer Science and Engineering, Chittagong University
of Engineering and Technology, Chittagong 4349, Bangladesh
sarefin@cuet.ac.bd
2 Department of Computer Science and Engineering, University of Chittagong,
Chittagong 4331, Bangladesh
rezaul.cse@cu.ac.bd

Abstract. Data mining is the technique of turning raw data into useful
information. This technique is used in many research fields for discovering
patterns in large datasets. This article discusses the application of the data
mining process in the field of biomedical text mining. Biomedical text mining
(BTM) aims at processing the enormous volume of biological literature to
extract useful information. This paper presents a review of the challenges and
contributions of research works on biomedical text mining conducted from 2003
to 2020. Furthermore, we discuss their methodology, the datasets they utilized
to evaluate their work, and their findings. Finally, we summarize the impact of
their works, followed by a discussion of limitations and difficulties.

Keywords: Biomedical text mining · BioTML · Gene information discovery · Drug-drug interaction

1 Introduction
Data mining refers to the retrieval or “mining” of information from vast data
volumes [2]. The method of discovering appropriate information from vast vol-
umes of data contained either in databases, data warehouses, repositories or
other collections of information is data mining [20].
By performing data mining, it is possible to retrieve interesting information,
regularities, or high-level data from a database and to view or explore it from
various angles. The discovered knowledge can be applied to decision making,
process monitoring, information management, and query analysis [12].
Nowadays, a large volume of biomedical data is generated every day.
Chemical-disease relational information, drug-drug interactions, adverse drug
events, bacteria biotope information, etc., are some examples of biomedical text

data. It is reported that these biomedical data are growing at an exponential
rate. Furthermore, the data are primarily in unstructured or semi-structured
formats, so it is incredibly difficult for researchers to keep track of the
information they need. Biomedical text mining (BTM) has emerged as a possible solution to
this problem. The purpose of biomedical text mining (BTM) is to provide meth-
ods for searching and organising knowledge retrieved from biomedical literature
utilizing Artificial Intelligence techniques such as Natural Language Processing
(NLP), Machine Learning (ML), and Data Mining to process large text collec-
tions. It collects implicit knowledge from unstructured text and delivers it in an
orderly format.
In this paper, a review of biomedical text mining research conducted from 2003
to 2020 is presented. The findings, challenges faced, limitations, and dataset
descriptions are discussed in detail. This study will assist future researchers
in finding research trends and gaps in the biomedical text mining context.
The rest of the paper is organized as follows: Sect. 2 contains the literature
review, the discussion is in Sect. 3, and the conclusion is in Sect. 4.

2 Literature Review

Several kinds of research work on biomedical text mining have been performed
so far, for example on drug-drug interaction (DDI) extraction, adverse drug event
extraction (ADE), the Bacteria Biotope (BB) task, drug named entity recognition
(DNER), and so on. The research works considered for this study are listed in
Table 1. Nguyen et al. [19] presented the development of a highly reliable automatic
DDI extraction method, since the accuracy of state-of-the-art DDI systems is
insufficient; their work focuses on improving that accuracy. They proposed three
models to detect and classify DDI from biomedical text and achieved F1 scores of
over 90%. Their proposed models (R-BERT and R-BioBERT variants) combine a
pre-trained BERT or BioBERT encoder with a relation classification layer. Their
overall results reveal the validity of their models.
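
As an illustration of this family of models, the sketch below pairs a publicly available BioBERT checkpoint with a sequence-classification head via the Hugging Face transformers library. The entity-marker convention and the five-class DDI label set are assumptions, and the classification head is untrained here; fine-tuning on the DDI corpus would precede any real use.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-v1.1")
model = AutoModelForSequenceClassification.from_pretrained(
    "dmis-lab/biobert-v1.1", num_labels=5)  # 4 DDI types + "no interaction"

# Candidate drug pair wrapped in marker tokens, a common convention.
text = "[E1] aspirin [/E1] may increase the effect of [E2] warfarin [/E2]."
inputs = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits
print("predicted DDI class:", logits.argmax(dim=-1).item())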
In [5], Chukwuocha et al. aimed to design and implement a machine learning
based system named DNER (Drug Named Entity Recognition). It can recognize drug
names in real time from PubMed biomedical abstracts. The system has several
phases: the first two involve data preparation tasks, the next extracts data
features and applies Naïve Bayes to these features, and the last two are for
implementation and generation of results. They achieved a satisfactory F1 score
of 95%.
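
A minimal sketch of such a feature-based Naïve Bayes tagger is given below; the hand-crafted token features and the toy training sample are assumptions for demonstration, not the DNER pipeline itself.

from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

def features(token):
    # A few simple orthographic features per token.
    return {"lower": token.lower(), "is_title": token.istitle(),
            "has_digit": any(c.isdigit() for c in token),
            "suffix3": token[-3:].lower()}

tokens = ["Aspirin", "reduces", "fever", "Ibuprofen", "helps", "pain"]
labels = ["DRUG", "O", "O", "DRUG", "O", "O"]

clf = make_pipeline(DictVectorizer(), BernoulliNB())
clf.fit([features(t) for t in tokens], labels)
print(clf.predict([features("Paracetamol")]))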
Despite their efficacy, machine learning classifiers require a large amount of
labeled data as a training set to build a system. To avoid this
issue, the authors in [32] built a knowledge-base (KB) based statistical method to
produce word-concept probabilities from the KB. They presented their model as

very useful for many text mining related tasks such as disambiguation and
document ranking. Word sense disambiguation is performed using a Naïve Bayes
classifier in this method. However, the proposed model suffers from poor
worst-case time and space complexity when larger numbers of words and concepts
are considered.
Based on disease concepts, Shah and Luo [24] proposed a document clustering
framework to make clustering and information extraction easier. This work
proposes a vector-based representation of disease concepts and a similarity
measurement between concepts. They used an unsupervised learning technique
called Self-Organizing Maps (SOM) for this aim, employed UMLS MetaMap to
identify disease concepts, and used the PubMed Central Open Access document set
as their dataset.
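
The sketch below illustrates SOM-based document clustering in this spirit, using the third-party minisom package; the random vectors stand in for the MetaMap-derived disease-concept representation.

import numpy as np
from minisom import MiniSom

# Each row: one document as a disease-concept frequency vector (toy data).
docs = np.random.rand(100, 20)

som = MiniSom(6, 6, input_len=20, sigma=1.0, learning_rate=0.5)
som.train_random(docs, num_iteration=1000)

# Map every document to its best-matching unit (a cell of the 6x6 grid);
# documents sharing a cell form one cluster.
clusters = [som.winner(d) for d in docs]
print(clusters[:5])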
Rodrigues et al. [22] proposed the development of BioTML, a framework
that includes a number of ML-based techniques for NER (Named Entity
Recognition) to bridge the gap between machine learning and the state of the art
in NER. Corpora, NLP, Features, and Machine Learning are the four modules of
this framework. The Corpora module includes the corpus, related data structures,
and annotations. The NLP module performs tokenization and creates features,
whereas the Feature module contains dictionaries, rules, and NLP components.
The Machine Learning module contains algorithms for training models and
applying them to corpora. Their model achieved a high precision value; however,
the recall value is low due to overfitting.
Another framework-based method was found in an earlier work [27]. This
paper presents BioLexicon, a wide-ranging lexical and terminological resource in
the biomedical context. It can support the retrieval of biomedical information
using text mining tools. Nouns, adjectives, verbs, and adverbs are the four parts
of speech covered in this work. They included both the grammatical and semantic
behavior of verbs found in text, which makes their system more realistic than
earlier works. They covered a wide range of biomedical-related vocabularies that
were missing from many well-known lexical resources.
The authors of [29] designed and implemented a unique hierarchical three-level
annotation approach to tag key terms, drug interaction sentences, and drug
interaction pairs. They used DrugBank as the corpus to form the drug name
dictionary. The work in [17] proposed a concept-based medical-document
clustering approach. It extracts concepts and evaluates concept weights based on
identity and synonymy relations using the Medical Subject Headings (MeSH)
ontology. The documents are grouped using the K-means clustering algorithm. In
this proposed system, an abstract and a keyword holding the concept to be found
in the document are given as input; the system can then successfully retrieve
the document even from a huge database. According to the authors, this system
outperforms traditional term-based clustering. The work utilized MEDLINE data of
three categories: cancer, virus, and eye infection.
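
A rough sketch of such concept-weighted K-means clustering is shown below; TF-IDF weights stand in for the MeSH-based concept weights, which would require the ontology itself.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = ["gene expression in lung cancer tissue",
        "viral replication inhibitors",
        "bacterial eye infection treatment",
        "tumor suppressor genes in cancer"]

# Weight terms, then group the weighted document vectors into 3 clusters.
X = TfidfVectorizer(stop_words="english").fit_transform(docs)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.labels_)  # one cluster id per document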

Table 1. Overall summary of literature review

Year Biomedical text mining


Before 2005 CCG [26], (Rohini et al., 2003),
Genescene [14], (Leroy et al., 2003),
GIS [4], (Jung et al., 2003),
DRPoS [28], (Yoshimasa et al., 2005)
2006–2008 ODBTM [47], (Rene et al., 2007),
IIDBTC [25], (Shatkay et al., 2006)
2009–2011 BTMA [23], (Raul et al., 2009),
NERBTSVM [9], (Zhenfei et al., 2011),
BioLexicon [27], (Thompson et al., 2011)
2012–2014 BDCOCW [17], (Logeswari et al., 2013),
BTMACR [33], (Feizhu et al., 2013),
RTMDSI [16], (Mei et al., 2014),
ATMBKE [18], (Amy et al., 2014),
MEBDD [21], (Deepak et al., 2014),
TMDDI [29], (Heng et al., 2014),
BTMSA [6], (Andreas et al., 2014)
2015–2017 MSRHBD [8], (Ming Ji, 2015),
CCBTM [7], (Chung et al., 2015),
KBWBTM [32], (Yepes et al., 2015),
DMLBTM [22], (Ruben Rodrigues, 2016),
SparkText [31], (Zhan et al., 2016),
EDBDC [24], (Setu et al., 2017),
ANGMBT [15], (Fei et al., 2017),
IGPBT [10], (Khorded et al., 2017)
2018–2020 DIBTMML [5], (Chukwuka et al., 2018),
ParaBTM [30], (Xing et al., 2018),
FamFlex [1], (John et al., 2018)
JEEBT [3], (Jizhi et al., 2019),
ANNERBTM [11], (Donghyeon et al., 2019),
DDI [19], (Dinh et al., 2020),
BioBERT [13], (Lee et al., 2020)

Ju et al. [9] proposed a highly efficient support vector machine (SVM) based
NER (Named Entity Recognition) system, using the GENIA corpus as their dataset.
Initially, the training data were translated into the format that libsvm
accepts. After the training phase, they validated that their system could
classify every input word, achieving a precision rate of 84.24% and a recall
rate of 80.76%.
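
The toy sketch below reproduces the shape of such an SVM-based NER step, with scikit-learn standing in for libsvm; the features and training sample are illustrative, not the GENIA setup.

from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

train = [("interleukin-2", "protein"), ("binds", "O"),
         ("T-cell", "cell_type"), ("receptor", "O")]

def feats(w):
    # libsvm-style feature vector built from simple word properties.
    return {"word": w.lower(), "hyphen": "-" in w,
            "upper_init": w[:1].isupper(), "suffix2": w[-2:]}

clf = make_pipeline(DictVectorizer(), LinearSVC())
clf.fit([feats(w) for w, _ in train], [y for _, y in train])
print(clf.predict([feats("interferon")]))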
Concept chain graphs (CCG), a new hybrid framework for information retrieval
(IR) systems and text mining, were introduced by Srihari et al. [26]. This is a
combination of a typical word-based information retrieval model and a data
extraction system. An extensible framework is required to serve the extracted
information in text mining; in this context they employed the CCG, which is a
strong tool for statistical analysis, hypothesis generation, and learning. Their
proposal is a first step in the field of hybrid CCGs.
In the field of entity recognition and relation extraction, joint extraction
approaches are unable to adequately address the multi-head problem

(i.e., the 1-N relation, where an entity is connected to several entities). Chen and
Gu [3] offered a unique tagging strategy and proposed a combined scheme for
producing unique tagging sequences based on deep neural networks. For better
management of the multi-head issue, they presented a two-step sequence tagging
pattern. In this study, two public datasets were used: the Bacteria Biotope task
of the BioNLP Shared Task 2016 and Adverse Drug Events (ADE). Their results
were 4.1% better than state-of-the-art techniques.
The work in [10] shows the first attempt to develop a dedicated method for
finding genotype-phenotype correlations in biomedical literature. They employed
a machine learning based semi-automatic method in their system. However, their
training set is small compared to other works. In addition, their system is
currently incapable of recognizing complex relationships, such as when a pronoun
refers to both a phenotype and a genotype in the same sentence.
Ji et al. [8] explored the problem of recognizing remarkable relations between
heterogeneous typed biological entities in large biomedical text data. EntityRel,
a new relevance metric for computing strong relevance between two heterogeneous
entities, was presented in their study, using an unstructured biomedical text
data corpus. They started by creating an entity correlation graph from the text
data, which shows the entity relationships in graphical form, and then applied
their EntityRel framework for the final processing. Although they developed this
for biomedical data, the framework can equally be used in other domains.
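
As a rough illustration of the first step, the sketch below builds a co-occurrence-weighted entity graph with networkx; the toy sentences and the normalized-weight relevance proxy are assumptions, not the EntityRel metric itself.

import itertools
import networkx as nx

sentences = [["aspirin", "headache"], ["aspirin", "fever"],
             ["ibuprofen", "fever"], ["aspirin", "fever"]]

G = nx.Graph()
for ents in sentences:
    for a, b in itertools.combinations(ents, 2):
        w = G.get_edge_data(a, b, {"weight": 0})["weight"]
        G.add_edge(a, b, weight=w + 1)  # edge weight = co-occurrence count

# A simple relevance proxy: edge weight normalized by total graph weight.
total = G.size(weight="weight")
print(G["aspirin"]["fever"]["weight"] / total)  # -> 0.5 on this toy data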
Chiang et al. [4] introduced a unique gene-information-finding biomedical text
mining system. Biological functions, connected disorders, related genes, and
gene-gene relationships were the four forms of gene-related information they
looked at. Their system has two modules. In the first module, documents are
extracted from PubMed abstracts, a sentence selection agent identifies the
important parts of the collected abstracts, and a lexicon analysis agent then
finds and counts the biological domain-specific lexicons and keywords for
biological function. In the second module, the extracted information and
relations are categorized into three types, and the final relationships are
presented as results. Their evaluation results reveal their success in gene
information discovery.
Another unique approach has been presented in [25]. This paper explored a
novel concept of biomedical text categorization using images. It introduced
image-based features extracted directly from the figures of biomedical content.
Document triage is performed here, where each document is labeled as either
'relevant' or 'irrelevant'. Mean, variance, skewness, and entropy obtained from
the histogram, contrast and correlation from the co-occurrence matrix, and many
other features were applied for classification purposes.
The massive amount of literature creates a big data issue for text mining pro-
cessing efficiency. Xing et al. [30] addressed this challenge by using a supercom-
puter to handle parallel processing. Using Tianhe-2 supercomputer in context of
biomedical literature, they developed a runnable framework named ParaBTM.
It can enable parallel text mining on the Tianhe-2 supercomputer.
All the earlier works discussed above are summarized in the tables below
(Tables 2, 3, 4, 5 and 6):

Table 2. Summary of biomedical text mining related papers.

Paper title Major contributions Dataset Evaluation


Drug-Drug The first study to look at Dataset: DDI extraction Performance
Interaction the possibility of Relation 2013 corpus Properties: evaluation is held
Extraction from BERT to increase DDI Contains 792 texts on test set after
Biomedical Texts extraction accuracy, extracted from doing some
via Relation suggested three models to DrugBank database and preprocessing
BERT [19] discover and label DDI 233 Medline abstracts tasks. Metric of
from biomedical text macro average F1
score is used for
evaluation. Got
over 90% of F1
score
Jointly Extract Multi-head problem is a Dataset: Two public The overall F1
Entities and Their challenging issue for entity datasets- Bacteria score on BB task
Relations From recognition and relation Biotope task of the is 52.81%, whereas
Biomedical Text extraction. Joint BioNLP Shared Task the overall F1
[3] extraction approaches 2016 (BB), Adverse Drug score on ADE is
cannot handle this Events (ADE). 84.59%. This
properly. This research Properties: BB has two result is higher
offered a collaborative way subtasks: recognizing than the
for building unique bacteria, habitats, and state-of-the-art
tagging sequences using geographical entity approach by 4.1%
deep neural networks, as references, and
well as a novel tagging extracting Lives-In
technique Relations between
bacterium entities and
their places. The ADE
dataset is made up of
two sorts of entities
(drugs and disorders)
Design of an The goal is to use machine Dataset: DDI corpus Recall 0.91, F1
Interactive learning methods to create 2013, MongoDB score 0.95
Biomedical Text and implement a DNER Properties: Drag Bank
Mining Framework (Drug Named Entity Database and Medline
to Recognize Recognition) system. It both are used to create
Real-Time Drug can recognize drug names this corpus. This corpus
Entities Using from PubMed biomedical contains 1025 documents
Machine Learning abstracts in real time
Algorithms [5]
Identifying It presents a machine Dataset: Own created Proposed model
Genotype- learning based semi dataset based on was evaluated by a
Phenotype automatic method to abstracts from PubMed graduate student.
Relationships in identify and corpus of two Precision 77.70,
Biomedical Text genotype-phenotype research papers Recall 77.84, F
[10] relationships. Their measure 77.77
approach is the first
attempt to develop a
dedicated algorithm for
detecting
genotype-phenotype
correlations in biomedical
literature

Table 3. Summary of biomedical text mining related papers (contd.)

Paper title Major contributions Dataset Evaluation


Exploring Diseases This research presented a Dataset: PubMed Central An internal
based Biomedical document clustering Open Access document set evaluation
Document framework based on Properties: PMC-OA has technique-
Clustering and disease-concept to facilitate nearly 1 million articles; Davies-Bouldin
Visualization using clustering visualization and for this study, 600 papers Index (DB index) is
Self-Organizing information retrieval. A starts with the letters ‘A’ used here to
Maps [24] vector representation of or ‘B’ were selected at identify the best
disease concepts and random partitions
similarity measurement
between concepts are
proposed in this paper.
Unsupervised learning
algorithm Self Organizing
Maps (SOM) is applied here
Development of a The construction of a Dataset: Several manually Validate using
Machine Learning framework, BioTML, is annotated corpora several corpora.
Framework for proposed to bridge the gap Results presents
Biomedical Text between machine learning satisfactory value of
Mining [22] and the state-of-the-art in precision but lower
NER (Named Entity recall. probable
Recognition). BioTML cause of this can be
contains a number of overfitting and
ML-based NER algorithms absence of post
processing step
Mining Strong In this paper, the problem Dataset: MEDLINE Standard precision,
Relevance between of determining substantial database recall, mean
Heterogeneous relevance between different average precision
Entities from types of biomedical entities (MAP)
Unstructured from large amounts of
Biomedical Data [8] biomedical text data was
discussed. They proposed
EntityRel, a new relevance
metric for calculating the
strong relevance between
two heterogeneous things
Text Mining for To tag key phrases, drug DrugBank Didn’t mention
Drug-Drug interaction sentences and
Interaction [29] drug interaction pairings,
this research designed and
applied a novel hierarchical
three-level annotation
approach
Biomedical This research proposes a MEDLINE (three Accuracy is
Document concept-based catagories data- cancer, evaluated using
Clustering Using medical-document clustering virus, eye infection) purity measure and
Ontology based technique It used the it achieved high
Concept Weight Medical Subject Headings purity value
[17] MeSH ontology to extract
concepts and calculate
concept weights based on
identity and synonymy
relationships. These
documents are clustered
using the K-means
algorithm

Table 4. Summary of biomedical text mining related papers (contd.)

Paper title Major contributions Dataset Evaluation


Named Entity Proposed support vector The GENIA corpus- a Precision rate =
Recognition From machine (SVM) based NER database of Medline 84.24 And recall
Biomedical Text system with higher abstracts rate = 80.76%
Using SVM [9] efficiency
Concept Chain In this study, concept chain Several existing Promising. Not
Graphs: A Hybrid graphs (CCG) are a new bioinformatics database mentioned
IR Framework for hybrid framework for
Biomedical Text information retrieval (IR)
Mining [26] systems along with text
mining
A Neural Named Traditional text mining PubMed abstracts Tested on several
Entity Recognition processes disregard corpora and the
and Multi-Type overlapping entities, which best F1 score was
Normalization Tool are typically found in near 90%
for Biomedical Text multi-type named entity
Mining [11] recognition results. BERN,
a neural biomedical named
entity recognition and
multi-type normalization
tool, is proposed in this
paper. The BERN
recognizes known entities
and discovers new ones
using high-performance
BioBERT named entity
recognition models
GIS: a biomedical This study detailed a Used a domain-specific Precision is 0.840
text-mining system biomedical text-mining lexicon and recall is 0.767
for gene technique focusing on four
information types of gene-related
discovery [4] information. This
information are- biological
functions, linked disorders,
related genes, and gene-gene
connections. The target of
this research is to provide
scholars with an easy-to-use
bio-information service that
allows them to browse the
continuously expanding
scientific literature swiftly
BioBERT: a They explored to find out BioASQ factoid dataset, a Precision 77.02
pre-trained how a pre-trained language gold standard dataset Recall 75.90 F1
biomedical model may be applied to score 76.46
language biological corpora.
representation BioBERT (Bidirectional
model for Encoder Representations
biomedical text from Transformers for
mining [13] Biomedical Text Mining) is
a domain-specific language
descriptive model that was
pretrained on vast
biomedical database

Table 5. Summary of biomedical text mining related papers (contd.)

Paper title Major contributions Dataset Evaluation


ParaBTM: A The huge amount of PMC-OA dataset Didn’t mention
Parallel Processing literature makes a big-data
Framework for problem for text mining
Biomedical Text processing efficiency. This
Mining on research tackled the
Supercomputers problem by utilizing parallel
processing performed on a
supercomputer. They
created ParaBTM, an
executable structure text
mining in biomedical
literature. They
implemented it on the
Tianhe-2 supercomputer.
This system can perform
parallel processing with
efficacy
A neural joint This research developed a PubMed abstracts Precision 95.5
model for entity neural joint model to Recall 62.1 F1
and relation simultaneously extract score 75.3
extraction from biomedical elements and
biomedical text their relationships, which
can tackle the error
propagation problem of
earlier works
FamPlex: a FamPlex is a useful tool for Several existing Precision 95.5
resource for entity increasing named entity bioinformatics database Recall 62.1 F1
recognition and recognition (NER), score 75.3
relationship grounding, and connection
resolution of human resolution while
protein families and autonomously reading
complexes in biological materials. Prefix
biomedical text and suffix patterns have
mining been selected in FamPlex to
improve named entity
recognition and event
extraction
Developing a This work describes a They built a corpus that Precision 97%,
Robust part-of-speech tagger that has newspaper articles, Accuracy 95.10%
Part-of-Speech has been implemented biomedical documents and
Tagger for specifically for biomedical MEDLINE articles
Biomedical Text text. They created the
tagger using maximum
entropy modeling and a
cutting-edge tagging
algorithm
SparkText: They created a text mining Pubmed abstracts Accuracy by SVM
Biomedical Text framework called is 93.81%
Mining on Big Data Spark-Text on a Big Data
Framework architecture that is made up
of Apache Spark data
streaming and machine
learning algorithms. It also
maintains a Cassandra
NoSQL database

Table 6. Summary of biomedical text mining related papers (contd.)

Paper title Major contributions Dataset Evaluation


Knowledge based Authors in [32] built a MEDLINE dataset This model has
word-concept model knowledge-base (KB) based been evaluated by
estimation and statistical method to produce estimating the beta
refinement for probabilities of word-concept from value of the
biomedical text KB. Disambiguation, document probability at each
mining ranking can be performed step using
efficiently by the system. Expectation
However, it suffers from worse Maximization (EM)
case time and space complexity method. Obtained
while considering higher amount beta values: b0 =
of words and concepts 0.6654, b1 =
0.0678, b2 =
0.2668. After
refinement with the
target corpora, the
model got a new set
of beta values: b0 =
0.8315, b1 =
0.0711, b2 = 0.0975
The BioLexicon: a They presents BioLexicon, a wide JNLPBA- 2004 data The evaluation of
large-scale ranging lexical and theoretical set this system resulted
terminological resource in biomedical text in an F-score of
resource for mining. It can support the 73.78
biomedical text retrieval of biomedical
mining information using text mining
tools. They consider both
grammatical and semantics
behavior of verbs, which makes
their system sophisticated than
earlier works. They covered a
wide range of biomedical-related
vocabularies that were missing in
many well-known lexical resources
Integrating image A novel concept of biomedical Mouse Genome Average result from
data into text categorization using image Database 59 runs: utility
biomedical text has been discussed. They 0.330 precision
categorization introduced image-based feature 0.138 recall 0.519 f
concept which are directly score 0.195
extracted from figures of
biomedical content. Mean,
variance, skewness, entropy,
contrast, correlation many other
features were applied in this work
for classification purpose

3 Discussion

Mining information from large databases has become a key research topic
nowadays, and researchers in many different fields have shown interest in data
mining. In response to such demand, this paper presents a review of biomedical
text mining from the perspective of data mining techniques. It provides an
overview of the rise of and progress in the research field of biomedical

text mining. The difficulties and limitations faced by the researchers are also
addressed here. The rapid growth of scientific literature in unstructured or
semi-structured formats makes handling it a challenging task. Nevertheless,
researchers keep contributing to this hard task and inventing new systems for
mining data. This study is an attempt to gain a clear insight into how this
research field is being enriched over time with numerous ideas and
implementations, and into which shortcomings remain to be overcome in the
future. Here we summarize the works done up to 2020; we expect that this
exploration will help researchers get a clear idea of the challenges and scopes
in the field of biomedical text mining.

4 Conclusion
In this study we analyzed and summarized the research works done on
biomedical text mining so far. We discussed the works done from 2003 to 2020,
almost two decades. All the important achievements, difficulties, challenges,
and scopes of this research field have been highlighted. We hope this work will
be helpful for researchers in getting a precise picture of the work done in the
field of biomedical text mining.

References
1. Bachman, J.A., Gyori, B.M., Sorger, P.K.: FamPlex: a resource for entity recog-
nition and relationship resolution of human protein families and complexes in
biomedical text mining. BMC Bioinf. 19(1), 248 (2018)
2. Bharati, M., Ramageri, B.: Data mining techniques and applications. Indian J.
Comput. Sci. Eng. 1, 12 (2010)
3. Chen, J., Junzhong, G.: Jointly extract entities and their relations from biomedical
text. IEEE Access 7, 162818–162827 (2019)
4. Chiang, J.-H., Hsu-Chun, Yu., Hsu, H.-J.: GIS: a biomedical text-mining system
for gene information discovery. Bioinformatics 20(1), 120–121 (2004)
5. Chukwuocha, C., Mathu, T., Raimond, K.: Design of an interactive biomedical
text mining framework to recognize real-time drug entities using machine learning
algorithms. Procedia Comput. Sci. 143, 181–188 (2018)
6. Holzinger, A., Schantl, J., Schroettner, M., Seifert, C., Verspoor, K.: Biomedical
text mining: state-of-the-art, open problems and future challenges. In: Holzinger,
A., Jurisica, I. (eds.) Interactive Knowledge Discovery and Data Mining in Biomed-
ical Informatics. LNCS, vol. 8401, pp. 271–300. Springer, Heidelberg (2014).
https://doi.org/10.1007/978-3-662-43968-5 16
7. Huang, C.-C., Zhiyong, L.: Community challenges in biomedical text mining over
10 years: success, failure and the future. Brief. Bioinform. 17(1), 132–144 (2016)
8. Ji, M., He, Q., Han, J., Spangler, S.: Mining strong relevance between heteroge-
neous entities from unstructured biomedical data. Data Min. Knowl. Disc. 29(4),
976–998 (2015). https://doi.org/10.1007/s10618-014-0396-4
9. Ju, Z., Wang, J., Zhu, F.: Named entity recognition from biomedical text using
SVM. In: 2011 5th International Conference on Bioinformatics and Biomedical
Engineering, pp. 1–4. IEEE (2011)

10. Khordad, M., Mercer, R.E.: Identifying genotype-phenotype relationships in


biomedical text. J. Biomed. Semant. 8(1), 57 (2017)
11. Kim, D., et al.: A neural named entity recognition and multi-type normalization
tool for biomedical text mining. IEEE Access 7, 73729–73740 (2019)
12. Kirshners, A., Parshutin, S., Leja, M.: Research on application of data mining
methods to diagnosing gastric cancer. In: Perner, P. (ed.) ICDM 2012. LNCS
(LNAI), vol. 7377, pp. 24–37. Springer, Heidelberg (2012). https://doi.org/10.
1007/978-3-642-31488-9 3
13. Lee, J., et al.: BioBERT: a pre-trained biomedical language representation model
for biomedical text mining. Bioinformatics 36(4), 1234–1240 (2020)
14. Leroy, G., et al.: Genescene: biomedical text and data mining. In: 2003 Joint Con-
ference on Digital Libraries, Proceedings, pp. 116–118. IEEE (2003)
15. Li, F., Zhang, M., Guohong, F., Ji, D.: A neural joint model for entity and relation
extraction from biomedical text. BMC Bioinf. 18(1), 1–11 (2017)
16. Liu, M., Hu, Y., Tang, B.: Role of text mining in early identification of potential
drug safety issues. In: Kumar, V.D., Tipney, H.J. (eds.) Biomedical Literature
Mining. MMB, vol. 1159, pp. 227–251. Springer, New York (2014). https://doi.
org/10.1007/978-1-4939-0709-0 13
17. Logeswari, S., Premalatha, K.: Biomedical document clustering using ontology
based concept weight. In: 2013 International Conference on Computer Communi-
cation and Informatics, pp. 1–4. IEEE (2013)
18. Neustein, A., Sagar Imambi, S., Rodrigues, M., Teixeira, A., Ferreira, L.: Applica-
tion of text mining to biomedical knowledge extraction: analyzing clinical narra-
tives and medical literature. In: Text Mining of Web-Based Medical Content, vol.
50. De Gruyter, Berlin (2014)
19. Nguyen, D.P., Ho, T.B.: Drug-drug interaction extraction from biomedical texts
via relation BERT. In: 2020 RIVF International Conference on Computing and
Communication Technologies (RIVF), pp. 1–7. IEEE (2020)
20. Prasdika, P., Sugiantoro, B.: A review paper on big data and data mining concepts
and techniques. IJID Int. J. Inf. Dev. 7, 33 (2018)
21. Rajpal, D.K., Qu, X.A., Freudenberg, J.M., Kumar, V.D.: Mining emerging
biomedical literature for understanding disease associations in drug discovery.
In: Kumar, V.D., Tipney, H.J. (eds.) Biomedical Literature Mining. MMB, vol.
1159, pp. 171–206. Springer, New York (2014). https://doi.org/10.1007/978-1-
4939-0709-0 11
22. Rodrigues, R., Costa, H., Rocha, M.: Development of a machine learning frame-
work for biomedical text mining. In: 10th International Conference on Practical
Applications of Computational Biology & Bioinformatics. AISC, vol. 477, pp. 41–
49. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-40126-3 5
23. Rodriguez-Esteban, R.: Biomedical text mining and its applications. PLoS Com-
put. Biol. 5(12), e1000597 (2009)
24. Shah, S., Luo, X.: Exploring diseases based biomedical document clustering and
visualization using self-organizing maps. In: 2017 IEEE 19th International Con-
ference on e-Health Networking, Applications and Services (Healthcom), pp. 1–6.
IEEE (2017)
25. Shatkay, H., Chen, N., Blostein, D.: Integrating image data into biomedical text
categorization. Bioinformatics 22(14), e446–e453 (2006)
26. Srihari, R., Ruiz, M.E., Srikanth, M.: Concept chain graphs: a hybrid IR framework
for biomedical text mining. In Proceedings of the SIGIR 2003 Workshop on Text
Analysis and Search for Bioinformatics. Citeseer (2003)

27. Thompson, P., et al.: The BioLexicon: a large-scale terminological resource for
biomedical text mining. BMC Bioinf. 12(1), 1–29 (2011)
28. Tsuruoka, Y., et al.: Developing a robust part-of-speech tagger for biomedical text.
In: Bozanis, P., Houstis, E.N. (eds.) PCI 2005. LNCS, vol. 3746, pp. 382–392.
Springer, Heidelberg (2005). https://doi.org/10.1007/11573036 36
29. Wu, H.-Y., Chiang, C.-W., Li, L.: Text mining for drug–drug interaction. In:
Kumar, V.D., Tipney, H.J. (eds.) Biomedical Literature Mining. MMB, vol. 1159,
pp. 47–75. Springer, New York (2014). https://doi.org/10.1007/978-1-4939-0709-0 4
30. Xing, Y., Chengkun, W., Yang, X., Wang, W., Zhu, E., Yin, J.: ParaBTM: a paral-
lel processing framework for biomedical text mining on supercomputers. Molecules
23(5), 1028 (2018)
31. Ye, Z., Tafti, A.P., He, K.Y., Wang, K., He, M.M.: SparkText: biomedical text
mining on big data framework. PloS One 11(9), e0162721 (2016)
32. Yepes, A.J., Berlanga, R.: Knowledge based word-concept model estimation and
refinement for biomedical text mining. J. Biomed. Inform. 53, 300–307 (2015)
33. Zhu, F., et al.: Biomedical text mining and its applications in cancer research. J.
Biomed. Inform. 46(2), 200–211 (2013)
Electric Vehicles as Distributed Micro
Generation Using Smart Grid for Decision
Making: Brief Literature Review

Julieta Sanchez-García1, Román Rodríguez-Aguilar2(B),
and Jose Antonio Marmolejo-Saucedo1
1 Facultad de Ingeniería, Universidad Panamericana, Augusto Rodin 498,
03920 Ciudad de México, Mexico
0097498@up.edu.mx
2 Facultad de Ciencias Económicas y Empresariales, Universidad Panamericana,
Augusto Rodin 498, 03920 Ciudad de México, Mexico
rrodrigueza@up.edu.mx

Abstract. This article presents a brief review of the literature on the
potential use of renewable energies through the integration of smart grids and
the use of electric vehicles as micro generators that allow energy exchange with
the grid. The main technical aspects are addressed, as well as the potential
benefits and the requirements necessary for this integration, highlighting key
aspects of the integration of smart grids, energy storage systems, prosumers,
and their interaction with electric vehicles on the grid.

Keywords: Smart grid · Vehicle to grid · Distributed energy · Smart cities · Batteries

1 Introduction

The global objective of reducing CO2 emissions, together with the growing demand
for electricity, has encouraged the use of renewable sources and efforts to
reduce the use of conventional energy sources (especially those dependent on
fossil fuels). Technological advancement as part of the development of
sustainable energy sources is linked to the design of smart and sustainable
products. The redesign of household appliances, industrial equipment, vehicles,
etc., is aligned with the objective of reducing CO2 emissions and with the
development of the green markets demanded by consumers. The requirement for
clean and sustainable energy, in addition to products aligned with better
environmental performance, has motivated the development of innovative schemes
for the integration of electrical networks. The importance of real-time
monitoring, the use of new digital technologies, and high connectivity have
allowed the development of a set of innovative proposals such as smart grids and
different distributed generation schemes.


The diversification of the energy matrix in the electricity sector is a primary
objective of many countries, and this has fostered the growing participation of
renewable sources (hydroelectric, wind, solar, etc.). Given their inherently
intermittent nature, it is necessary to establish greater control and monitoring
of the operation. The great potential of integrating renewable energy sources is
clear, but it also represents a great challenge to maintain the stability of the
electrical system and the correct integration of various technologies
simultaneously. The implementation of suitable methodologies and technologies is
necessary to make the most of all the energy sources considered, as well as to
optimize the operation of the electrical system technically and economically.
These new challenges have driven the development of smart grids as a source
of support for making optimal decisions, both for the network operator and for
consumers, who in these new schemes take on a more active role in the energy
sector as independent producers of energy from distributed micro generators.
This new role has been named Prosumer, as explained by Brown et al. (2020).
The objective of this research is to establish the state of the art in the
development and integration of smart grids with distributed generation,
especially highlighting the role of electric vehicles as micro generators. The
work is organized as follows: in the first section the document review and
information integration strategy is described; in the second section the main
findings are presented with reference to three fundamental concepts: 1) smart
grids, 2) energy storage systems and electric vehicles, and 3) the role of
prosumers in these new schemes. Finally, the conclusions and the references used
are presented.

2 Search Process and Methodology

A systematic review of the open-source literature in three main databases was
carried out: a) Science Direct, b) Springer Link, and c) Google Scholar. A
review period covering the last five years was considered; due to the large
number of publications identified, it was decided to limit the search to the
most recent publications, so the analysis period runs from 2017 to 2022. The
selection criteria were publications focused on the operation and integration of
renewable energies, distributed generators, and the smart grid. In this first
approach to the state of the art, aspects such as safety, reliability, and
technical requirements were not considered.
The keywords were combined to create 4 different search parameters in the
selected engines within the energy discipline; for the searches with the largest
numbers of results, the first 3 pages of titles and abstracts were reviewed to
decide whether or not to include each item in the database. Based on the
information collected, a database was structured considering variables such as
title, objective, methodology, results, abstract, and the complete document. The
integration of this

database allowed a systematic selection of the works using text mining tools.
The search results yielded a total of 17,440 papers, of which 135 had the
congruence of their title and abstract checked against the research objective.
It should be noted that the identified works were mainly papers, with a small
proportion corresponding to book chapters and conference proceedings. Similarly,
unpublished products (working papers) and theses were not considered in this
search.
A total of 34 papers related to the objective of the study were selected for
complete analysis (Fig. 1). Based on these articles, the state of the art was
developed along three fundamental axes: 1) smart grids, 2) energy storage
systems, and 3) electric vehicles.

Fig. 1. Search methodology process.

3 Results

3.1 Smart Grids

The development of Smart Grids has been very broad in recent years, due to
the growing integration of renewable energy sources in electrical systems, the
growing demand and the need to find an optimal dispatch and understand the
evolution of electricity supply and demand. Smart Grids are the key element
that allows all interested parties to take advantage of the benefits of diversifying
the energy matrix and, above all, guarantee the stability and correct operation
of the electrical system.
Traditionally, the grid was designed to meet consumer demand, and power
generation was strongly based on thermoelectric generators whose output is
controllable. Today, with industrial, commercial, and residential consumer
demand still growing, the inclusion of renewable sources in the generation
portfolio has created a great challenge for the operation of the network, taking
into account the particularities of each technology considered. One of the main
objectives of smart grids is to guarantee the optimal operation of the grid by
integrating renewable energies while guaranteeing its operation and stability in
real time.

Butt et al. (2021) explain the differences between modern smart grids, which are
capable of making decisions considering all the available information while
taking into account the efficiency, sustainability, economics, and security of
the grid, and conventional ones, showing the complexity and the different
stakeholders that take part in them (Fig. 2).

Fig. 2. Schematic representation of the Smart Grid.

Masera et al. (2018) show, through a cost-benefit analysis, the importance and
benefits of smart grids, taking into consideration not only the economic impacts
but also the social and environmental ones, and position smart grids as a
crucial component of smart cities, which include mobility, communities, and
neighborhoods that participate through distributed generation and storage.
The different sources and the data generated from the operation of the
generation portfolio have resulted in a challenging economic dispatch problem,
as has the integration of all the actors in the network without compromising the
reliability and stability of the electrical system. Among the main findings in
this regard, different methodologies have been considered to approach the
optimal dispatch, mainly stochastic optimization, simulation, digital twins, and
artificial intelligence (Table 1).
The main objective of the studies identified is the consideration and control
of the uncertainty in the operation of the network, as well as the measurement
and fulfillment of the related environmental objectives.

Table 1. Main studies identified related to the operation of smart grids.

Author Objective Method Relevant concepts


Domyshev et al. (2021) Optimize the operation of Control-oriented Due to the uncertainty
the grid stochastic model factors of the energy
Replicate and predict the Digital Twins production, the
operation computational and
technical resources
needed are more complex
Khalil et al. (2021) Interaction between the Optimization of Encourages the building
actors of the grid energy cost in real of more resources of the
time smart grids for control
and management
Hoang et al. (2021) Integrate a renewable Intelligent algorithms 60–70% of the greenhouse
energy system in smart gas emissions are
cities generated in consequence
of the cities
Mahmood et al. (2021) Optimal scheduling of the Model predictive Computational
renewables portfolio control requirements to execute
the expectations of the
smart grids. Importance
of back up or storage
systems
Muhanji et al. (2019) Aggregation and Use case comparison Transactive energy as a
transactions between way to manage the
participants of the same performance of the grid
microgrid to align the economics
with the real operation of
the system
Meng and Wang (2017) Optimal economical Optimization Considers the generation
dispatch and demand sides
Murty and Kumar (2020) Demonstrates that by Microgrid Energy Interaction and
including hybrid energy Management System participation of
sources the CO2 photovoltaic, wind
emissions are reduced in turbines, diesel
more than 50% generators, fuel cells,
micro turbines and
energy storage systems

3.2 Energy Storage Systems and Electrical Vehicles

Energy storage systems, or batteries, are crucial for autonomous systems, but
they are also a useful tool to control or reduce the energy that comes from
fossil generators and to take advantage of all the renewable energy generated
above the consumption curve. Similarly, in those territories where the distance
between the generator and the consumption node is large, it is economically more
efficient to use distributed generation instead of establishing transmission
lines over very long distances. According to the results identified in the works
related to distributed generation, the use of batteries is still a very
expensive scheme, and further optimization studies are required for this option
to become economically viable and operationally efficient.
Ramos et al. (2021) present a case study for Finland that includes the energy
storage system approach as a third-party service, along with the appropriate
regulatory basis, resulting in revenue and more reliability, considering that
batteries are still expensive to integrate. Georgakarakos et al. (2018) present
a study of battery storage systems in buildings that act as a microgrid capable

of interacting with the grid and controlling demand as needed, while excess
electricity can be exported to the grid at the optimal moment to take advantage
of the price difference and reduce the maximum demand of the system.
Currently, there is growing participation of electric vehicles in the transport
sector, which implies more electricity demand and fewer greenhouse gases, as
well as an opportunity to regulate and support the grid during demand peaks.
Most authors agree that flexible charging of EVs brings different benefits, such
as avoiding CO2 emissions, reducing demand peaks, and savings or profits for
owners. At the same time, EVs represent an increase in the demand for electrical
energy and have potential use as microgenerators for the regulation and
maintenance of grid stability (Table 2); a minimal optimization sketch of
flexible charging is given after the table.

Table 2. Main studies identified related energy storage systems and EVs.

Author Methodology Objective and contribution


Zahedi (2018) Flow chart of The EV batteries need to communicate with the utility
charging grid to determine if there is spare capacity in the system
management and start the charge balancing the demand
Brenna et al. Genetic Requirement of 50% of more PV systems installed to fulfil
(2017) algorithm the growing demand derived by the increase of EV to reach
the sustainable goals
Mwasilu et al. Optimization The aggregation of the EVs to act as a generator
(2014) algorithm contributes with auxiliary services such as voltage an
frequency regulation, peak power control, and reduction of
the power system operating cost
Tarroja and Smart charging The cost of the charge can be reduced as achieving the
Hittinger (2021) model zero emission goal by adopting a flexible charging scheme,
under a Californian case
Jafari et al. Linear model Proposes parking lots in the subway system for its
(2021) interaction and taking advantage of the usable regenerative
braking energy, and reducing the charging costs
Salehpour and Economic The EVs are part of the day ahead dispatch portfolio
Tafreshi (2020) stochastic connected to the microgrid, as a result the grid reduces its
model costs and the EV owners get revenues
Martinez et al. Four scenarios The use of the EV as a storage energy device and its
(2017) comparison capability of exchanging energy to the homes are being
attractive to the home owners, as a strategy to reduce its
dependence of the grid, its environmental impacts and
operational costs
Sami et al. Simulation The integration of the EV with the Smart Grid,
(2019) model contributes in supporting the network during the peak
loads, controlling the power factor, the reactive power and
integrating renewable energy to the generation portfolio
Wang et al. Mixed model of Optimal charging and discharging of the EV including an
(2017) control and aggregator entity in order to minimize the variability of the
communications power demand and interaction to the grid, can yield higher
revenues for the actors
(continued)

Table 2. (continued)

Author Methodology Objective and contribution


Longo et al. Case of study The use of EVs in an Italian district introduces
(2019) with charging considerable savings with respect to the Base Case, and
scenarios the impacts on the grid management
Alirezazadeh Scheduling Including the distributed generation from the power point
et al. (2021) model of view for the EV charge, involves a greater flexibility of
the system and finally successfully validates its findings in
a case of study
Ferro et al. Optimal The charging times are defined by the utility company
(2018) scheduling of using the lowest market prices as decision variables
EVs charging
Tostado-Véliz Mixed Integer Taking advantage of controllable home appliances the
et al. (2021) Linear energy availability and power requirements, using the EV
Programming as a microgenerator whenever it’s demanded, is achieved a
41% cost reduction
Ghatikar et al. Country policies By economical incentives that takes in account the grid
(2017) comparison demand, the users can adapt and be more flexible on their
charging behavior
Castellanos Three scenarios Vehicle-to-grid (V2G) technology can be used to reduce
et al. (2019) comparison peak demands, where the number of EVs and their stay
hours in a parking place represent the critical parameters
for the effectively peak reduction
Toniato et al. Optimization By applying larger time granularity and V2G have the
(2021) case of study greatest impact on peak reduction up to 80%; but that
implies a higher computational requirements
Cai et al. (2018) Four cases By optimal controlling the charge and discharge of EVs the
comparison costs, peak, and energy requirements are reduced
Murakami and Micro grid By adding Wireless Power Transfer (WPT), whose
Yamagata clusters concept is reviewed in Chhawchharia et al. (2018), and
(2017) optimization suggesting that the PV systems are able to charge driving
vehicles through this technology giving them sufficiency at
any time

3.3 Prosumers

The increasingly active role of consumers in the monitoring and fulfillment of environmental objectives has driven changes in the design of products and services, which seek to comply with environmental standards in order to achieve greater acceptance in society. Similarly, in the electricity sector, integrating generation through renewable sources allows consumers to play a more active role by generating energy independently. This has led to a newly recognized agent in the electricity sector, the so-called Prosumers: distributed microgenerators of renewable energy that are also consumers and exchange electricity with the grid. Considering this new agent in the operation of the electrical system requires new approaches (Table 3).

Table 3. Main studies identified related to the figure of Prosumers.

Author | Methodology | Contributions
Winther et al. (2018) | Norway case study | Identifies the motivations of the Prosumers: helping the system to ease the pressure, environmental concerns, social prestige and economic profitability
Kappner et al. (2019) | Total cost of the installation of PV systems and batteries in different households | The investment in the PV system is profitable, but the batteries show only a slight economic benefit, although they make the system more self-sufficient
Leal-Arcas et al. (2018) | European Union case study | The legislation must integrate the required considerations and support the participation and protection of the prosumers
Ruiz-Cortes et al. (2018) | Interaction of 2 prosumers with batteries and PV systems | Shows how, by aggregating both demands, generation and storage through a microgrid, they could reduce the energy exchanged with the grid by approximately 13%
Sha and Aiello (2020) | Monte Carlo simulation | Prosumers' participation leads to reducing the maximum load, energy losses, and energy costs

One of the main contributions of integrating the Prosumer figure is the positioning of environmental objectives in society, together with tangible benefits in the operation of the system, such as reductions in the maximum load and in energy losses, as well as economic benefits in energy costs.

4 Conclusions
The need and urgency to change the way the world produces energy is clear. The inclusion of renewable sources in the generation portfolio and the decentralization of the grid through the inclusion of more actors and generation sources are a great challenge for electricity systems worldwide. The integration of these new elements has changed the way electricity grids operate. The development of new technologies has made evident the need to use as much information as possible for the correct real-time operation of electrical systems. Smart grids are an example of this; their growing development in recent years marks a milestone in the design, operation and regulation of electricity grids.
The development of distributed generation technologies has been hampered by the high costs it entails; however, in recent years the option of integrating electric vehicles into the grid as microgenerators has been explored. So far the implementation of this scheme is incipient, but various alternatives are being explored, and potential benefits are envisioned for the regulation and maintenance of network stability. In addition, it represents a potentially viable alternative for extending the coverage of electrical energy to remote communities. On the other hand, the integration of the figure of prosumers as an active agent in the network can lead to a more responsible use of energy, besides reflecting the growing awareness of society regarding the care of environmental resources.

Much remains to be done to achieve optimal use of the potential benefits of these new schemes and to minimize the uncertainty in their operation linked to weather conditions. Considering the amount of energy and data exchanged between the interconnected networks, Smart Grids assume a leading role in the optimal use of facilities, resources and energy, with the objective of consuming energy whenever it is generated, avoiding losses and optimizing the use of resources.

References
Alirezazadeh, A., Rashidinejad, M., Afzali, P., Bakhshai, A.: A new flexible and resilient
model for a smart grid considering joint power and reserve scheduling, vehicle-to-grid
and demand response. Sustainable Energy Technol. Assess. 43, 100926 (2021)
Brenna, M., Foiadelli, F., Longo, M., Zaninelli, D.: A study of the integration of dis-
tributed generation and EVs in smart cities. In: 2017 International Conference of
Electrical and Electronic Technologies for Automotive, pp. 1–6 (2017)
Brown, D., Hall, S., Davis, M.E.: What is prosumerism for? Exploring the normative
dimensions of decentralised energy transitions. Energy Res. Soc. Sci. 66, 101475
(2020)
Butt, O.M., Zulqarnain, M., Butt, T.M.: Recent advancement in smart grid technology:
future prospects in the electrical power network. Ain Shams Eng. J. 12(1), 687–695
(2021)
Cai, H., Chen, Q., Guan, Z., Huang, J.: Day-ahead optimal charging/discharging
scheduling for electric vehicles in microgrids. Prot. Control Mod. Power Syst. 3(1),
1–15 (2018)
Castellanos, J.D.A., Rajan, H.D.V., Rohde, A.-K., Denhof, D., Freitag, M.: Design
and simulation of a control algorithm for peak-load shaving using vehicle to grid
technology. SN Appl. Sci. 1(9), 1–12 (2019)
Chhawchharia, S., Sahoo, S.K., Balamurugan, M., Sukchai, S., Yanine, F.: Investigation
of wireless power transfer applications with a focus on renewable energy. Renew.
Sustain. Energy Rev. 91, 888–902 (2018)
Domyshev, A., Häger, U., Panasetsky, D., Sidorov, D., Sopasakis, P.: Resilient future
energy systems: smart grids, vehicle-to-grid, and microgrids. In: Solving Urban
Infrastructure Problems Using Smart City Technologies, pp. 571–597. Elsevier (2021)
Ferro, G., Laureri, F., Minciardi, R., Robba, M.: An optimization model for electrical
vehicles scheduling in a smart grid. Sustain. Energy Grids Netw. 14, 62–70 (2018)
Georgakarakos, A.D., Mayfield, M., Hathway, E.A.: Battery storage systems in smart
grid optimised buildings. Energy Procedia 151, 23–30 (2018)
Ghatikar, G., Ahuja, A., Pillai, R.K.: Battery electric vehicle global adoption practices
and distribution grid impacts. Technol. Econ. Smart Grids Sustain. Energy 2(1),
1–10 (2017)
Hoang, A.T., Nguyen, X.P., et al.: Integrating renewable sources into energy system for
smart city as a sagacious strategy towards clean and sustainable process. J. Cleaner
Prod., 127161 (2021)
Jafari, M., Kavousi-Fard, A., Niknam, T., Avatefipour, O.: Stochastic synergies of
urban transportation system and smart grid in smart cities considering V2G and
V2S concepts. Energy 215, 119054 (2021)

Kappner, K., Letmathe, P., Weidinger, P.: Optimisation of photovoltaic and battery
systems from the prosumer-oriented total cost of ownership perspective. Energy
Sustain. Soc. 9(1), 1–24 (2019)
Khalil, M.I., Jhanjhi, N., Humayun, M., Sivanesan, S., Masud, M., Hossain, M.S.:
Hybrid smart grid with sustainable energy efficient resources for smart cities. Sustain.
Energy Technol. Assess. 46, 101211 (2021)
Leal-Arcas, R., Lesniewska, F., Proedrou, F.: Prosumers as new energy actors. In:
Africa-EU Renewable Energy Research and Innovation Symposium, pp. 139–151
(2018)
Longo, M., Foiadelli, F., Yaı̈ci, W.: Simulation and optimisation study of the integration
of distributed generation and electric vehicles in smart residential district. Int. J.
Energy Environ. Eng. 10(3), 271–285 (2019)
Mahmood, D., Javaid, N., Ahmed, G., Khan, S., Monteiro, V.: A review on optimization
strategies integrating renewable energy sources focusing uncertainty factor-paving
path to eco-friendly smart cities. Sustain. Comput. Inf. Syst. 30, 100559 (2021)
Martinez, I.J., Garcia-Villalobos, J., Zamora, I., Eguia, P.: Energy management of
micro renewable energy source and electric vehicles at home level. J. Mod. Power
Syst. Clean Energy 5(6), 979–990 (2017)
Masera, M., Bompard, E.F., Profumo, F., Hadjsaid, N.: Smart (electricity) grids for
smart cities: assessing roles and societal impacts. Proc. IEEE 106(4), 613–625 (2018)
Meng, W., Wang, X.: Distributed energy management in smart grid with wind power
and temporally coupled constraints. IEEE Trans. Industr. Electron. 64(8), 6052–
6062 (2017)
Muhanji, S.O., Flint, A.E., Farid, A.M.: Transactive energy applications of eIoT. In:
eIoT, pp. 91–113. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-10427-6_4
Murakami, D., Yamagata, Y.: Micro grids clustering for electricity sharing: an approach
considering micro urban structure. Energy Procedia 142, 2748–2753 (2017)
Murty, V., Kumar, A.: Multi-objective energy management in microgrids with hybrid
energy sources and battery energy storage systems. Prot. Control Mod. Power Syst.
5(1), 1–20 (2020)
Mwasilu, F., Justo, J.J., Kim, E.-K., Do, T.D., Jung, J.-W.: Electric vehicles and smart
grid interaction: a review on vehicle to grid and renewable energy sources integration.
Renew. Sustain. Energy Rev. 34, 501–516 (2014)
Ramos, A., Tuovinen, M., Ala-Juusela, M.: Battery energy storage system (BESS) as
a service in Finland: business model and regulatory challenges. J. Energy Storage
40, 102720 (2021)
Ruiz-Cortes, M., et al.: Optimal charge/discharge scheduling of batteries in microgrids
of prosumers. IEEE Trans. Energy Convers. 34(1), 468–477 (2018)
Salehpour, M.J., Tafreshi, S.M.: Contract-based utilization of plug-in electric vehicle
batteries for day-ahead optimal operation of a smart microgrid. J. Energy Storage
27, 101157 (2020)
Sami, I., et al.: A bidirectional interactive electric vehicles operation modes: vehicle-to-grid (V2G) and grid-to-vehicle (G2V) variations within smart grid. In: 2019 International Conference on Engineering and Emerging Technologies (ICEET), pp. 1–6
(2019)
Sha, A., Aiello, M.: Topological considerations on peer-to-peer energy exchange and
distributed energy generation in the smart grid. Energy Inf. 3(1), 1–26 (2020)
Tarroja, B., Hittinger, E.: The value of consumer acceptance of controlled electric
vehicle charging in a decarbonizing grid: the case of California. Energy 229, 120691
(2021)

Toniato, E., Mehta, P., Marinkovic, S., Tiefenbeck, V.: Peak load minimization of an
e-bus depot: impacts of user-set conditions in optimization algorithms. Energy Inf.
4(3), 1–18 (2021)
Tostado-Véliz, M., León-Japa, R.S., Jurado, F.: Optimal electrification of off-grid smart
homes considering flexible demand and vehicle-to-home capabilities. Appl. Energy
298, 117184 (2021)
Wang, K., et al.: Distributed energy management for vehicle-to-grid networks. IEEE
Netw. 31(2), 22–28 (2017)
Winther, T., Westskog, H., Sæle, H.: Like having an electric car on the roof: domesti-
cating PV solar panels in Norway (2018)
Zahedi, A.: Smart grids and the role of the electric vehicle to support the electricity
grid during peak demand. In: Application of Smart Grid Technologies, pp. 415–428.
Elsevier (2018)
A Secured Network Layer and Information
Security for Financial Institutions: A Case
Study

Md Rahat Ibne Sattar¹, Shrabonti Mitra¹, Sadia Sultana¹, Umme Salma Pushpa¹, Dhruba Bhattacharjee¹, Abhijit Pathak¹, and Mayeen Uddin Khandaker²
¹ Department of Computer Science and Engineering, BGC Trust University Bangladesh, Chattogram, Bangladesh
{rahat,abhijitpathak}@bgctub.ac.bd
² Centre for Applied Physics and Radiation Technologies, School of Engineering and Technology, Sunway University, 47500 Bandar Sunway, Selangor, Malaysia
mayeenk@sunway.edu.my

Abstract. Communication networks are used to transfer valuable and confidential information for a variety of purposes. As a result, they attract the attention of those who intend to steal or misuse information, or to disrupt or destroy storage or communication systems. In this paper, the authors explore some of the major issues in achieving a reasonable level of resilience against network attacks. Some attacks are planned and specifically targeted, while others may be opportunistic, resulting in thefts. Security plays a significant part in the contemporary world, and for the financial industry it becomes an emergency case; keeping proper protection is challenging. Though security spans various fields, the network layer is one of the most critical sections for online systems. Cyber security is changing and developing rapidly, and no system in this world can claim to be a hundred percent safe, but we can measure the major risks and overcome them. We offer mix nodes and a decentralized protocol to make transactions more secure. Lastly, we want to make sure that if the system is compromised through any other security section, our alarming system of honeypots will notify the administrator and keep the system up and running.

Keywords: Network security · Cryptography · Honeypot · Mix-nodes · Man in the middle attack · Decentralization

1 Introduction

Online banking is a very popular topic in the field of transactions and cashless payment. Nowadays it is very easy to deposit and cash out money through online banking in a short period, which is very beneficial. Online banking has reduced much human inconvenience: people now enjoy banking services from anywhere at any time, which has brought the whole world's internal transactions to the forefront.


With online banking, anyone can always stay alert about transactions and balances, and money can be transferred easily from one end of the world to the other. Within minutes, accounts can be registered online from anywhere in the world, which is never possible in our traditional banking system. It is a platform where all financial relationships are stored in one place and can be accessed from anywhere at any time. Unnecessary fees are avoided in online banking, an opportunity that is not available in the traditional banking system. Anyone can keep money online from one country in another to carry out tasks; for example, some people of Bangladesh deposit money in Swiss banks in Switzerland. People enjoy online banking facilities 24 hours a day, 7 days a week, and can always check their account history. The service can be accessed very quickly and efficiently online. Per-capita fees are reduced on this platform, and a lower per-capita fee means a higher interest rate [1].
The authors' discussion of this topic focuses on how to secure an online banking system in Bangladesh. Amid the ups and downs of the coronavirus epidemic, which started in the country in the middle of spring 2020, online banking was the only option people trusted for transactions, so securing the online banking system is now the priority on which we are working. Online banking is reasonably safe, but not completely safe at this time, so its safety needs to be increased. The goal of this paper is to strengthen security in online banking and reduce cyberspace hacking; it presents a propositional model that will greatly reduce cyberspace hijacking [2].
Despite its advantages, online banking has some disadvantages too. The most common problem is technical: if a bank's server shuts down because of a power outage, people may not be able to connect to it. This becomes a big problem when such a technical issue happens at an important moment for an account.
Another problem is that in online banking services customers have no direct contact with banking personnel, so they cannot talk to them in person. Customers also sometimes face difficulties during transactions, such as getting a confirmation message or e-mail, transferring money from one bank to another, or using the bank's e-check [3]. A further problem is security, which is a major issue in today's world. There is a risk of hijacking while transferring a large amount of money: with a lack of security, anybody who gains unauthorized access to a customer's account can do anything, and if an attacker can identify the medium through which a transaction will be made, the attacker can hijack that medium. That is why, to avoid identification of the medium, our model uses a technique of sending packets through different nodes, so that attackers cannot guess through which medium a packet will travel. Even if attackers do identify a node, the system will be notified and the hijacked medium will be shut down.

2 Case Study

This case is about a cyber-attack on a bank in which a hacking group attempted to steal almost a billion dollars. The victim was Bangladesh Bank, and the hack was attributed to North Korea. A US federal indictment is an official document in which somebody is formally charged with a crime; in this case, the crime has allegedly been committed by a North Korean criminal group. The Federal Bureau of Investigation (FBI) identified those involved in the hacking using threat intelligence techniques as well.
In cyber security there is a term called Indicators of Compromise, IOC for short. It is a phrase used to describe the signs of a malware infection. If anyone finds a specific malware action in a log file, it is an indicator that the system has been compromised, and the system should be cleaned up and investigated further. When analyzing malware, the goal is to find as many infections as possible. IOCs often take the form of URLs or IP addresses the malware communicates with, files written by the malware, or process names it creates.
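
To make the IOC idea concrete, the following minimal Python sketch scans a log file for known-bad addresses and reports the matching lines. The indicator values, file names, and helper names are hypothetical examples of ours, not indicators from this case.

KNOWN_BAD_IPS = {"203.0.113.7", "198.51.100.23"}            # example IOC values
KNOWN_BAD_URLS = {"http://malicious.example/update.bin"}     # example IOC values

def scan_log(path):
    """Return (line_number, line) pairs that contain a known indicator."""
    hits = []
    with open(path, encoding="utf-8", errors="replace") as log:
        for lineno, line in enumerate(log, start=1):
            if any(ip in line for ip in KNOWN_BAD_IPS) or \
               any(url in line for url in KNOWN_BAD_URLS):
                hits.append((lineno, line.strip()))
    return hits

if __name__ == "__main__":
    for lineno, line in scan_log("proxy_access.log"):        # hypothetical log file
        print(f"IOC hit at line {lineno}: {line}")

In practice an analyst would load such indicator lists from a threat intelligence feed rather than hard-coding them.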
In February 2016, Bangladesh Bank's computers connected to the SWIFT payment system were attacked by a hacker group. SWIFT is a mechanism that allows banks to transfer money between one another; there was no compromise of SWIFT itself, but the attackers were able to generate requests within SWIFT from inside Bangladesh Bank [4]. Those requests were sent to the Federal Reserve System in New York, where Bangladesh Bank keeps part of the country's reserves, requesting transfers of funds to accounts in the Philippines and Sri Lanka. The hackers succeeded in transferring about a hundred million dollars. One of the reasons the remaining transfers, which would have totaled a billion dollars, failed was that the street address of the receiving bank in the Philippines contained the word Jupiter. Jupiter was also the name of an oil tanker and a shipping company under US sanctions against Iran, so the payment raised a red flag in the New York Federal Reserve System simply because it contained that word. The adversary had used spear-phishing emails to target Bangladesh Bank employees: resume-themed emails that included a link to download a potential candidate's resume for employment. They even sent some employees messages on LinkedIn, a quite interesting technique because such messages do not pass through standard email security controls. FBI analysis shows that at least three employees clicked the link and opened the file, and this first intrusion took place almost a year before the billion-dollar attack. Once inside the network, it seems to have been fairly easy for the attackers to move around and find the SWIFT system they were looking for [5]. Interestingly, forensic evidence shows that once they found the SWIFT terminals and tried to log into them, they used usernames and passwords from a different bank that had nothing to do with Bangladesh Bank: these were related to a South American currency exchange that they had also targeted with malware, in a completely separate operation. So, for a short period, they seem to have confused which victim they were dealing with and left some traces of that malware behind.
Once they were in the SWIFT terminals, they deployed custom-written malware that executed the phony payment requests. Many key indicators of compromise can be extracted from this malware.
If anyone runs this malware in a normal environment, nothing happens, because it is specifically designed to run within a SWIFT environment: it looks for particular file structures and for a specific program to be present on the system before executing the attack. It used a packer, which makes it difficult to read, and it can remove itself and its history after completing the attack [6]. The malware was deployed with an encrypted config file protected with the RC4 algorithm; the key-scheduling algorithm (KSA), part of the RC4 routine, is recognizable in the binary. The malware also enumerated the victim's computer, specifically looking for a particular module to be loaded in the process (adb.dll, used in the SWIFT platform), so that it could overwrite its code and request a transaction.
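
Since the report points to RC4 and its key-scheduling algorithm (KSA), the sketch below shows the standard RC4 routine in Python, the kind of code an analyst might use to decode such a config file once the key is recovered. The key and config values are illustrative placeholders, not artifacts from the actual malware.

def rc4(key: bytes, data: bytes) -> bytes:
    """Standard RC4: the KSA builds the state table, the PRGA makes the keystream."""
    # Key-scheduling algorithm (KSA): the part analysts recognized in the binary.
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    # Pseudo-random generation algorithm (PRGA): XOR the keystream with the data.
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(byte ^ S[(S[i] + S[j]) % 256])
    return bytes(out)

# RC4 is symmetric, so the same call encrypts and decrypts a config blob.
config = b'{"c2": "203.0.113.7"}'                 # illustrative config content
ciphertext = rc4(b"example-key", config)
assert rc4(b"example-key", ciphertext) == config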

3 Cyber Threat

3.1 Viruses
A virus is a piece of code that can copy itself and is harmful to any system. It can destroy necessary data in files and corrupt a system. Viruses attach themselves to system files and executable files, cause unexpected behavior of a system or program, and sometimes cause system crashes. They undermine network security and obstruct data passing through the network layers, and sometimes delete data from a file while it is being downloaded. All of this creates hazards in the network security layers. A virus can spread through a network via email or text message attachments, through downloads of files over an internet connection, and through social media scam links [7].

3.2 System Infectors


Malicious programs, including viruses, that can affect a system or obstruct data in transit are called system infectors. Their attacks involve modifying, encrypting, and deleting data, among other actions. They create problems in the processing of the network layer, pursuing their objectives during its operation and thereby creating hazards in the network layer.

3.3 File Infectors


A file infector is a type of virus attached to executable code such as computer games, installation documents, and word processors. Once a file infector attacks a file, it can reproduce itself by copying and spreading to other programs and files, so it is very harmful to a network. Even networks used to share files and programs can face major problems from this infector.
Macro Viruses: This type of virus works by embedding malicious code in the macros associated with documents, spreadsheets, and other data files, ensuring that the malicious program runs as soon as the document is opened. Once an infected macro is executed on a user's computer, it will infect all other documents on that computer. Through any shady app download, this can also infect one's mobile devices or smartphones [8].

3.4 Worm
A worm is a type of malware that can replicate rapidly and spread across devices within a network. As it spreads, a worm consumes bandwidth, overloading infected systems and making them unreliable or unavailable. Worms spread over computer networks by exploiting operating system vulnerabilities, and they can also carry "payloads" that damage host computers.

3.5 Trojan
A Trojan horse, or Trojan, is a type of malware that is often disguised as legitimate software. Once activated, Trojans can enable cyber-criminals to spy on users, steal their sensitive data, and gain backdoor access to their systems. This makes them very dangerous: a Trojan can take control of one's computer and do whatever it wants.

3.6 Rootkit
Rootkits usually infect computers through a phishing email, fooling users with a legitimate-looking email that contains malware, but sometimes they are delivered through exploit kits. Since sending email requires an internet connection, rootkits attack through the network layer in this way.

4 Methodology

4.1 Conceptual Network Layer Framework


The infrastructure for sending data securely is shown in Fig. 1. This is an enhanced data communication system. Its indispensable components are a central computer, a gateway, honeypots, and several other workstations, referred to here as nodes. The necessary credential-sensitive information is stored in the central computer, the most treasured item of this system. The work procedure is as follows: first of all, the central computer provides data to the world wide web [9].
On the way, the gateway splits ("chips") the data received from the central computer, converts it, and sends it to the nodes. Several nodes are deployed, connected to honeypots. The nodes encrypt the received data and pass the encrypted data from one to another, so it is quite hard for anyone to identify which node holds the final data. In this way we can ensure maximum security in the online banking system. If a node is attacked by a third party, the honeypot informs the others through alarms so that the other nodes can be alerted. The secured data is then sent to the gateway, which chips and converts the data and sends it to its destination. In this process, data can securely reach its destination in the midst of online banking transactions. The description above covers one direction, such as client to bank; the other direction proceeds in the same way. For example, a bank-to-client transaction works like this: data from the bank is sent towards the nodes; on the way, the gateway chips and converts the data; it is then moved through the nodes, one to the next, sent again to the gateway, and the converted data is delivered to the central computer [10] (Fig. 1).

Fig. 1. Conceptual network framework
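
As a rough illustration of the "chip and convert" step, the following Python sketch splits an outgoing payload into fixed-size chunks, encodes each chunk, and spreads the chunks across the nodes. The chunk size, the Node stand-in, and its send interface are our own illustrative assumptions, not the authors' implementation.

import base64

CHUNK_SIZE = 64  # bytes per chunk; an illustrative choice

class Node:
    """Stand-in for a relay node; a real node would encrypt and forward the chunk."""
    def __init__(self, name):
        self.name = name
    def send(self, seq, chunk):
        print(f"{self.name} <- chunk {seq}: {chunk[:16]}...")

def chip_and_convert(payload: bytes):
    """Split ('chip') the payload, then 'convert' each chunk for transit."""
    for offset in range(0, len(payload), CHUNK_SIZE):
        yield base64.b64encode(payload[offset:offset + CHUNK_SIZE])

def dispatch(payload: bytes, nodes):
    """Spread the converted chunks across the available nodes."""
    for seq, chunk in enumerate(chip_and_convert(payload)):
        nodes[seq % len(nodes)].send(seq, chunk)

dispatch(b"transaction record " * 16, [Node("node-1"), Node("node-2"), Node("node-3")])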

4.2 Data Flow


Data passes inside the nodes, and several data-passing paths exist within each node. The reason for the diverse paths is to ensure the utmost security. If a fixed sequence were used for data transfer, such as always sending data along path a to b, then b to c, then c to d, we would be creating an opportunity for a third party to attack our system; this cannot be allowed to happen.
To avoid this type of interruption, several paths inside a node are indispensable. Figure 2 shows binary data passing from path to path inside the nodes: for example, 1001010 travels through the 1st path among many in the first node, and then through the 3rd path inside the second and third nodes. In this approach, a third party cannot identify the data and attack our system, which substantially increases security.

Fig. 2. Data flow
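
A minimal sketch of this per-hop path selection, assuming each node simply exposes a list of internal paths: a fresh path is drawn at random inside every node, so no fixed route ever exists. The node and path names below are illustrative.

import random

# Each node offers several internal paths; the labels are arbitrary.
NODES = {
    "node-1": ["path-1", "path-2", "path-3", "path-4"],
    "node-2": ["path-1", "path-2", "path-3"],
    "node-3": ["path-1", "path-2", "path-3", "path-4", "path-5"],
}

def route(packet: str):
    """Draw a fresh random path inside every node for this packet."""
    hops = [(node, random.choice(paths)) for node, paths in NODES.items()]
    return packet, hops

print(route("1001010"))   # e.g. ('1001010', [('node-1', 'path-1'), ('node-2', 'path-3'), ...])

Because the draw is repeated for every packet and every node, two identical packets will almost never share the same route.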

4.3 Decentralized Network


The main goal is to unlink origin and destination using multiple nodes, which can then be used to communicate data. As Fig. 3 shows, three layers are connected, and the traffic flows from the first layer forward to the second layer and then to the third. The main reason for selecting this topology is that research has shown it to be the most suitable for trustworthy privacy and security.
The cryptographic packet format ensures bitwise unlinkability, which means the packets are entirely changed at each network hop. So even if someone is observing the broadcasting node or the end node, it is difficult for them to trace the packets and recover the binary representation. In addition, this cryptographic packet format is compact and computationally efficient.

Fig. 3. Decentralized nodes

Each node knows only the information of the previous and the following nodes. Packets are routed along independent paths, and every packet goes through a different path, which makes them entirely untraceable.
We also want to make sure that a node does not reveal any information about other nodes. Therefore, we use a script to turn a node off and disconnect it from the other nodes, keeping the system secure even if that node is compromised.
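
To make the bitwise-unlinkability idea concrete, here is a toy Python sketch of three-layer wrapping: the sender adds one encryption layer per node, and each node peels exactly one layer, so the bytes on the wire differ at every hop. The XOR-with-keystream cipher is a stand-in chosen for brevity and is not cryptographically adequate; a real mix network would use an authenticated onion packet format.

import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Derive a deterministic keystream from a key (toy construction only)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def xor_layer(key: bytes, data: bytes) -> bytes:
    """XOR with a key-derived stream; applying it twice removes the layer."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

layer_keys = [b"layer1-key", b"layer2-key", b"layer3-key"]   # one illustrative key per layer

packet = b"transfer-request"
for key in reversed(layer_keys):      # sender wraps: layer 3 first, layer 1 outermost
    packet = xor_layer(key, packet)

for key in layer_keys:                # each hop peels exactly one layer
    packet = xor_layer(key, packet)   # the ciphertext differs at every hop
assert packet == b"transfer-request"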

4.4 Honeypots Workflow


Honeypots are usually used to track attacks and collect valuable information about them. Our system configures a honeypot to forward the captured sensitive information through an email to the primary system and the administrators, so that they are aware of the cyber-attack. In Fig. 4, we consider the intrusion to be a man-in-the-middle attack intercepting the packets and accessing the nodes.

Fig. 4. Honeypot alarm

After gaining access, the attacker will generally attempt to collect credentials, and this is where our honeypots come into play. In the broadcasting node, we have already configured a few honeypots posing as credentials. Those honeypots will automatically send logs to the administrators and the primary system whenever anyone tries to open or tamper with them.
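
A minimal sketch of this alarm, assuming decoy credential files on the node, a local SMTP relay, and a filesystem with access-time updates enabled; the file paths, addresses, and polling interval are hypothetical.

import os
import smtplib
import time
from email.message import EmailMessage

DECOY_FILES = ["/srv/node/creds_backup.txt", "/srv/node/admin_keys.db"]  # hypothetical decoys
ADMIN_ADDR = "admin@example.org"                                          # hypothetical address

def send_alert(path: str) -> None:
    """Email the administrator that a decoy credential file was touched."""
    msg = EmailMessage()
    msg["Subject"] = f"HONEYPOT ALERT: {path} was accessed"
    msg["From"] = "honeypot@example.org"
    msg["To"] = ADMIN_ADDR
    msg.set_content(f"Decoy file {path} accessed at {time.ctime()}; isolating the node.")
    with smtplib.SMTP("localhost") as smtp:   # assumes a local mail relay
        smtp.send_message(msg)

def watch_decoys(poll_seconds: int = 5) -> None:
    """Poll the decoys' access times; any change means someone opened them."""
    baseline = {p: os.stat(p).st_atime for p in DECOY_FILES}
    while True:
        time.sleep(poll_seconds)
        for path in DECOY_FILES:
            atime = os.stat(path).st_atime
            if atime != baseline[path]:
                send_alert(path)
                baseline[path] = atime

A production honeypot would hook file access events directly (for example through auditd) instead of polling, but the alarm flow is the same.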

4.5 Overview of Cyber Attack


In today's cyberspace, no one can guarantee a fully secure system that will stay secure for long, so it is always better to keep a backup plan that can save the system in critical situations. In the previous section we discussed the honeypots' workflow and concluded that the system will turn off the specific node that is compromised, so that a compromised node can do as little harm as possible. The next challenge is to keep the system active even after dropping one node. The decentralized network has to maintain a broadcasting node and an end node; therefore, the system promotes the next node to broadcasting node, while the end node remains as it is. Traffic flow maintains the same independent paths, and each route remains individual as well (Fig. 5).
Under these conditions, the system stays up and running until we identify the causes behind the honeypot alarms and the other connections.

Fig. 5. Cyber attack’s incident
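
The failover step can be sketched as simple bookkeeping: remove the compromised node and let the next one take over broadcasting duty while the end node stays fixed. The structure below is our illustration of the idea, not the authors' code.

class Network:
    """Toy bookkeeping for broadcasting/end-node failover after a compromise."""

    def __init__(self, nodes):
        self.nodes = list(nodes)       # nodes[0] broadcasts, nodes[-1] is the end node

    def compromise(self, bad_node):
        """Disconnect the compromised node and promote the next broadcaster."""
        self.nodes.remove(bad_node)    # turn the node off and disconnect it
        print(f"{bad_node} isolated; broadcasting node is now {self.nodes[0]}, "
              f"end node stays {self.nodes[-1]}")

net = Network(["node-1", "node-2", "node-3", "node-4"])
net.compromise("node-1")               # node-2 becomes the broadcasting node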

5 Limitations

As mentioned earlier, our system works only against network-based attacks, so it has some limitations. It handles threats that come directly over the network, but some other cyberattacks do not fall under it. The cyber threats that do not fall under this system are as follows.

5.1 Phishing
We should have a clear definition of phishing: it is one of the most well-known cyber-attacks, used to steal personal data such as user names, credit card numbers, login credentials, passwords, and PIN codes. Phishing attackers mainly target the users and bank employees connected with a bank. Phishing occurs when attackers send malicious emails, messages, or phone calls; if the user or employee replies to that email, message, or call, the attackers can gain access and the target becomes a victim of phishing. So it is not possible to prevent phishing with our system.

5.2 Spoofing
Spoofing is hacking by email: someone imitates the sender's information and pretends to be a trusted friend, colleague, or someone else in order to gain access to personal information or steal data. We should therefore not respond to that kind of email.

6 Conclusion

Since there is no immediate gateway for banking security in Bangladesh, the implementation of the combination of mix nodes and a decentralized protocol, as a quick fix to make transfers more secure, has to be considered to control cyber-attacks. An effective decentralized protocol must become more personalized, flexible, and on-demand if it is to serve as a means of locking down traditional banks, controlling nation-state hacking, and securing more and more transactions. In a preliminary attempt there might be some challenges in applying this approach at the state level of the banking system, such as servers going down because of power outages. But to strengthen security in online banking and reduce cyberspace hacking, this process is one of the best techniques. It is very difficult for a hacker to identify through which node a transaction is happening, because so many nodes play a flawless role here together; identifying a specific node among so many is practically impossible, leaving little opportunity for cyberspace hacking. For saving the system in a troublesome state, a backup plan is ready: if any packet is intercepted by a man-in-the-middle attacker, the honeypots will automatically send logs to the administrators, and the victim node will be turned off and disconnected from the other nodes. This makes the security system very hard to break, and it is undoubtedly an easily applicable system. This model intends to ensure maximum security of the banking transaction process; by implementing this concept it is possible to prevent a large number of cyber-attacks, which is why its significance is borderless.

References
1. Abe, M.: Mix-networks on permutation networks. In: Lam, K.-Y., Okamoto, E., Xing, C.
(eds.) ASIACRYPT 1999. LNCS, vol. 1716, pp. 258–273. Springer, Heidelberg (1999).
https://doi.org/10.1007/978-3-540-48000-6_21
2. Camenisch, J., Lehmann, A., Neven, G.: Electronic identities need private credentials. IEEE
Secur. Priv. 10(1), 80–83 (2012)
3. Cannell, J.S., et al.: Orchid: a decentralized network routing market. Technical report,
Orchid Labs, Technical Report (2019)
4. Chaum, D., et al.: cMix: mixing with minimal real-time asymmetric cryptographic
operations. In: Gollmann, D., Miyaji, A., Kikuchi, H. (eds.) ACNS 2017. LNCS, vol. 10355,
pp. 557–578. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-61204-1_28
5. Troncoso, C., Isaakidis, M., Danezis, G., Halpin, H.: Systematizing decentralization and
privacy: lessons from 15 years of research and deployments. Proc. Priv. Enhancing Technol.
2017(4), 404–426 (2017)
6. Greschbach, B., Kreitz, G., Buchegger, S.: The devil is in the metadata—new privacy
challenges in decentralised online social networks. In: 2012 IEEE International Conference
on Pervasive Computing and Communications Workshops, pp. 333–339 (2012)
7. Kastrenakes, J.: NordVPN reveals server breach that could have let attacker monitor traffic,
October 2019. https://www.theverge.com/2019/10/21/20925065/nordvpn-serverbreach-vpn-
traffic-exposed-encryption
8. Bora Keskin, N., Zeevi, A.J.: Dynamic pricing with an unknown demand model:
asymptotically optimal semi-myopic policies. Oper. Res. 62(5), 1142–1167 (2014)
9. Khan, M.T., DeBlasio, J., Voelker, G.M., Snoeren, A.C., Kanich, C., VallinaRodriguez, N.:
An empirical analysis of the commercial VPN ecosystem. In: Proceedings of the Internet
Measurement Conference 2018, pp. 443–456. ACM (2018)
10. Perta, V.C., Barbera, M.V., Tyson, G., Haddadi, H., Mei, A.: A glance through the VPN
looking glass: IPv6 leakage and DNS hijacking in commercial VPN clients. Proc. Priv.
Enhancing Technol. 2015(1), 77–91 (2015)
11. Vasant, P., Zelinka, I., Weber, G.-W. (eds.): ICO 2018. AISC, vol. 866. Springer, Cham
(2019). https://doi.org/10.1007/978-3-030-00979-3. ISBN 978-3-030-00978-6. https://www.
springer.com/gp/book/9783030009786
12. Vasant, P., Zelinka, I., Weber, G.-W. (eds.): ICO 2019. AISC, vol. 1072. Springer, Cham
(2020). https://doi.org/10.1007/978-3-030-33585-4. ISBN 978-3-030-33585-4. https://www.
springer.com/gp/book/9783030335847
13. Vasant, P., Zelinka, I., Weber, G.-W. (eds.): ICO 2020. AISC, vol. 1324. Springer, Cham
(2021). https://doi.org/10.1007/978-3-030-68154-8. https://link.springer.com/book/10.1007/
978-3-030-68154-8
Author Index

A
Aashiq Kamal, Khaleque Md., 341
Abdelbaki, Abdelhalim, 948
Abir, Mohammed Adnan Noor, 647
Abugabah, Ahed, 391
Adhikary, Subhrangshu, 879
Adouane, Nour ElHouda, 45
Agnihotri, P. G., 479, 671, 685, 695
Agnihotri, Prasit Girish, 832
Aguilar-Ramirez, Jose Eduardo, 450
Ahmed, Razu, 905
Akter, Ferdusee, 751
Akter, Laboni, 467
Akter, Yeasmin Ara, 108
Al-Ani, Ahmed Muayad Rashid, 167
Alejandrino, Jonnel, 417
Al-Hussain, Enaam A., 97, 407
Aljabry, Israa A., 958
Alkhawaldeh, Reyad, 391
Al-Suhail, Ghaida A., 97, 407, 958
Apu, Khalid Ibn Zinnah, 302
Arefin, Mohammad Shamsul, 427, 773, 905, 968
Armoogum, Sheeba, 893
Arshad, M., 479
Awal, Md. Abdul, 467
Azimi, Mohammad Yasin, 685

B
Balyan, Vipin, 867
Bandyopadhyay, Tarun Kanti, 503
Banharnsakun, Anan, 3
Banik, Anirban, 503
Barid, Mimun, 331, 718
Bazeer Ahamed, B., 845
Bekdaş, Gebrail, 35, 536, 592
Bhattacharjee, Dhruba, 992
bin Mohd Amiruddin, Ahmad Azharuddin Azhari, 207
Biswas, Avijit, 258, 268
Biuk, Adriana, 857
Boukendil, Mohammed, 801, 948
Bukhari, Nurul Adela, 55, 927
Burge, Gokhan, 167

C
Çamur, Hüseyin, 167
Castellino, Dion Trevor, 196
Chabira, Safia, 237
Chakma, Ratna, 248
Chan, Din Yuen, 185
Chan, Mieow Kee, 207, 927
Chang, Chien-I, 185
Chanza, Martin, 577
Charqui, Zouhair, 801, 948
Chen, Kaixuan, 555
Chiang, Chung Ching, 185
Chin, Wang Chan, 55
Chin-Hernandez, Adrian, 623
Chowdhury, Afsana Akther, 427
Chun, Tee Hoe, 55
Concepcion II, Ronnie, 417

D
Dadios, Elmer, 417
Ðambić, Goran, 291
Dang, Trung Thanh, 707
Daoden, Kanchana, 613
Das, Anindita, 352, 728


Das, Annesha, 457
Das, Utpol Kanti, 248
Deb, Kaushik, 341
Deb, Ujjwal Kumar, 751, 917
Demir, Binnur, 167
Deshmukh, Sujata, 196
Dey, Ashim, 457
Djambic, Goran, 370

E
El Moutaouakil, Lahcen, 801
Emon, Rahat Yeasin, 440

F
Feng, Zunlei, 555
Ferdous, Gazi Jannatul, 75
Fister Jr., Iztok, 65
Fister, Iztok, 65
Frozan, Daryosh, 479, 685

G
Gadri, Said, 45, 237
Gaudel, Bijay, 738
Ghosh, Promila, 467
Gmeiner, Marc, 938
Groenewald, B., 867

H
Hai, Daeniel Song Tze, 120
Han, Gengshi, 555
Haque, Monjurul, 227
Hasan, Md. Manzurul, 331, 718
Hasan, Md. Maruf, 457
Hasan, Sumayea Benta, 427
Hassan, Md. Mehedi, 467
Herizi, Khadidja, 237
Hidki, Rachid, 801, 948
Hooshmand, F., 566
Hoque, Mohammed Moshiul, 155, 302
Hossain, Khandkar Asif, 258, 268
Hossain, Md Shariar, 277
Hossain, Md. Azad, 75
Hossain, Monuar, 132
Hossain, Shahadat, 331, 718
Hossain, Syed Md. Minhaz, 341
Hung, Bui Thanh, 602
Hyder, Tasfia, 773

I
Iffath, Fariha, 968
Islam, Md. Moinul, 217, 227
Islam, Sabiha, 457
Islam, Saiful, 227
Islam, Sanzida, 397
Ivanov, Stefan, 24

J
Jacinto-Cruz, Marcos, 812
Jahan, Israt, 108
Jahan, Tajnim, 427
Jahara, Fatima, 155
Jeong, Seung Ryul, 738
Joseph, S., 479
Joshua Thomas, J., 120
Jothi, Justtina Anantha, 120
Juričić, Vedran, 291

K
Kachhawa, Nitin Singh, 832
Kalyan, Dummu, 479, 671, 685
Kaneshima, Kohei, 546
Karim, Rezaul, 773, 968
Karmaker, Rajib, 917
Kassem, Youssef, 167
Kayabekir, Aylin Ece, 35, 536
Kee, Chan Mieow, 55
Khandaker, Mayeen Uddin, 992
Khasgiwala, Yash, 196
Kibria, Hafsa Binte, 397
Krishnamurthy, Murugan, 845
Kučak, Danijel, 291, 857

L
Lackner, Maximilian, 938
Lepuschitz, Wilfried, 938
Liu, Shunyu, 555
Losper, Bertram, 867

M
Mahmud, Sarker Safat, 360
Mahmud, Sarker Shahriar, 360
Maimuna, Maisha, 905
Majumder, Nilanjana, 258, 268
Manshahia, Mukhdeep Singh, 762
Marmolejo-Saucedo, Jose Antonio, 450, 623, 634, 981
Marmolejo-Saucedo, Jose-Antonio, 812
Materum, Lawrence, 417
Matin, Abdul, 397
Matos, Telmo, 488, 495
Mazumder, Partha P., 132
Mehmood, Atif, 391
Mehmood, Komal, 145
Mehmood, Maryam, 145
Merdan, Munir, 938
Mia, Md. Zesun Ahmed, 227
Miah, Mohammad Islam, 647
MirHassani, S. A., 566
Mitiku, Tigilu, 762
Mitra, Shrabonti, 992
Mohamudally, Nawaz, 893

Mohseni, Usman, 671, 685, 695
Mokoena, Naledi Blessing, 577
Moutaouakil, Lahcen El, 948
Mpeta, Kolentino N., 888
Mršić, Leo, 370, 857
Muhury, Ripa, 751
Munapo, Elias, 513

N
Nafisa, Nuren, 427
Nakamura, Morikazu, 546
Nath, Shipan Chandra Deb, 917
Nawikavatan, Auttarat, 14
Netshimbupfe, Adivhaho Frene, 167
Ng, Sokchoo, 207
Nigdeli, Sinan Melih, 35, 536, 592
Niyomsat, Thitipong, 14

O
Osornio, Armando Calderon, 634
Ould Mehieddine, Sara, 237
Özceylan, Eren, 381

P
Palconit, Maria Gemel, 417
Panchenko, Vladimir, 503
Patel, D. P., 479
Patel, Dhruvesh, 671, 685
Pathak, Abhijit, 992
Pathan, Azazkhan Ibrahimkhan, 479, 671, 685, 695, 788, 832
Patidar, Nilesh, 671, 695
Pea-Assounga, Jean Baptiste Bernard, 314
Pham, Thanh Vu, 707
Podgorelec, Vili, 65
Prieto, Cristina, 671, 685
Prince, Md. Rakibul Islam, 360
Protulipac, Dario, 370
Puangdownreong, Deacha, 14
Pushpa, Umme Salma, 992

R
Rahat, Fazle, 217
Rahman, Md. Habibur, 108
Rahman, Md. Mahmudur, 331, 718
Rahman, Nafiza, 905
Rahman, S. M. A. Mohaiminur, 227
Raihan, M., 258, 268, 467
Rajkarnikar, Neesha, 738
Rana, Vaishali, 695
Rasalingam, Rasslenda-Rass, 120
Rashida, Maliha, 968
Rathod, Praveen, 788
Riaz, Md Hasnat, 132, 277
Ripan, Rony Chowdhury, 217
Rodríguez-Aguilar, Román, 450, 812, 820, 981
Romsai, Wattanawong, 14
Roy, Saralya, 217

S
Sahu, Nilkanta, 352, 728
Salgado-Reyes, Antonia Paola, 820
Salihi, Muqadar, 479
Sana, Kajal, 258, 268
Sanchez-García, Julieta, 981
Saran, V., 671, 695
Sarder, Keya, 258, 268
Sarkar, Juliet Polok, 258, 268
Sarker, Iqbal H., 302
Sathi, Khaleda Akhter, 75
Sattar, Md Rahat Ibne, 992
Saucedo-Martinez, Jania, 623
Sebastian, Joseph, 685
Selaotswe, Otsile R., 888
Sen, Anik, 341
Sen, Prithwish, 352, 728
Shaikh, Arbaaz A., 788
Shamim, Md., 360
Sharfi, Elhamam A. M., 167
Sharif, Omar, 155
Shiebler, Dan, 525
Shrestha, Deepanjal, 738
Sikder, Juel, 248
Şimşek, Şeyda, 381
Singh, Aditya, 87
Smadi, Ahmad A. L., 391
Song, Jie, 175
Song, Mingli, 175, 555
Stankov, Stanko, 24
Sultana, Sadia, 992
Sultana, Suriya, 258, 268
Sun, Li, 175
Sybingco, Edwin, 417

T
Thanh, Dang Trung, 659
Tista, Sharmistha Chanda, 440, 457
Toklu, Yusuf Cengiz, 536
Trinidad, Emmanuel, 417
Tsoku, Johannes Tshepiso, 577
Tuyet, Nguyen Huynh Anh, 659

U
Uddin, Raihan, 427

V
Vo, Quang Minh, 707
Vrbančič, Grega, 65

W
Waikhom, Sahita I., 788
Wenan, Tan, 738
Wu, Mengyun, 314
Wu, Pei Hung, 185

X
Xue, Mengqi, 175

Y
Yalçın, Neşe, 381
Yeo, Wan Sieng, 927
Yu, Na, 555
Yücel, Melda, 536, 592

Z
Zaman, Sadika, 467
Zareer, Shabir Ahmad, 479, 671, 695
Zrikem, Zaki, 801
