Intelligent Data Communication Technologies and Internet of Things
D. Jude Hemanth
Danilo Pelusi
Chandrasekar Vuppalapati Editors
Intelligent Data Communication Technologies and Internet of Things
Proceedings of ICICI 2021
Lecture Notes on Data Engineering
and Communications Technologies
Volume 101
Series Editor
Fatos Xhafa, Technical University of Catalonia, Barcelona, Spain
The aim of the book series is to present cutting-edge engineering approaches to data
technologies and communications. It will publish the latest advances on the engineering
task of building and deploying distributed, scalable and reliable data infrastructures
and communication systems.
The series has a prominent applied focus on data technologies and
communications, with the aim of promoting the bridge from fundamental research on
data science and networking to data engineering and communications that lead to
industry products, business knowledge and standardisation.
Indexed by SCOPUS, INSPEC, EI Compendex.
All books published in the series are submitted for consideration in Web of Science.
Editors
D. Jude Hemanth
Department of Electronics and Communication Engineering
Karunya Institute of Technology and Sciences
Coimbatore, India

Danilo Pelusi
Faculty of Communication Sciences
University of Teramo
Teramo, Italy
Chandrasekar Vuppalapati
Department of Computer Engineering
San Jose State University
San Jose, CA, USA
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Singapore Pte Ltd. 2022
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
We are honored to dedicate the proceedings
of ICICI 2021 to all the participants and
editors of ICICI 2021.
Foreword
Dr. K. Geetha
Conference Chair—ICICI 2021
Preface
It is with deep satisfaction that I write this Preface to the Proceedings of ICICI
2021, held at JCT College of Engineering and Technology, Coimbatore, Tamil Nadu,
from August 27 to 28, 2021.
This conference brought together researchers, academics and professionals
from all over the world, experts in Data Communication Technologies and the Internet
of Things.
This conference encouraged research students and early-career academics to
interact with the more established academic community in an informal setting to
present and discuss new and current work. The papers contributed the most up-to-date
scientific knowledge in the fields of data communication and computer networking,
communication technologies, and their applications such as IoT, big data, and cloud
computing. These contributions helped make the conference as successful as it
has been. The members of the local organizing committee and their assistants put
in a great deal of time and effort to ensure that the meeting ran smoothly each day.
We hope that this program will stimulate further research in intelligent data
communication technologies and the Internet of Things, as well as provide prac-
titioners with improved techniques, algorithms, and deployment tools. Through this
exciting program, we feel honored and privileged to bring you the most recent devel-
opments in the field of intelligent data communication technologies and the Internet
of Things.
We thank all authors and participants for their contributions.
Acknowledgements
ICICI 2021 would like to acknowledge the excellent work of our conference organizing
committee and keynote speakers for their presentations on August 27–28, 2021.
The organizers also wish to acknowledge publicly the valuable services provided by
the reviewers.
On behalf of the editors, organizers, authors and readers of this conference, we
wish to thank the keynote speakers and the reviewers for their time, hard work,
and dedication to this conference. The organizers wish to acknowledge Dr. D. Jude
Hemanth and Dr. K. Geetha for their discussion, suggestions, and cooperation in
organizing the keynote sessions of this conference. The organizers also wish to thank
the speakers and participants who attended this conference. Many thanks to everyone
who helped and supported this conference. ICICI 2021 would like to acknowledge
the contribution made to the organization by its many volunteers, who contribute
their time, energy, and knowledge at the local, regional, and international levels.
We also thank all the chair persons and conference committee members for their
support.
About the Editors
Dr. D. Jude Hemanth received his B.E. degree in ECE from Bharathiar University
in 2002, M.E. degree in communication systems from Anna University in 2006 and
Ph.D. from Karunya University in 2013. His research areas include Computational
Intelligence and Image processing. He has authored more than 120 research papers
in reputed SCIE indexed International Journals and Scopus indexed International
Conferences. His Cumulative Impact Factor is more than 150. He has published 33
edited books with reputed publishers such as Elsevier, Springer and IET.
Dr. Danilo Pelusi received the Ph.D. degree in Computational Astrophysics from the
University of Teramo, Italy. Associate Professor at the Faculty of Communication
Sciences, University of Teramo, he is an Associate Editor of IEEE Transactions on
Emerging Topics in Computational Intelligence, IEEE Access, International Journal
of Machine Learning and Cybernetics (Springer) and Array (Elsevier). Guest editor
for Elsevier, Springer and Inderscience journals, he served as program member of
many conferences and as editorial board member of many journals. His research inter-
ests include Fuzzy Logic, Neural Networks, Information Theory and Evolutionary
Algorithms.
Abstract Despite the fact that there are numerous methods for object identification,
these techniques under-perform in real-world conditions such as heavy rain and
fog at night. As a result, this research work has devised a new convolutional
neural network for identifying animals in low-light environments. In the proposed
system, images of different animals (both domestic and wild) are collected
from various resources in the form of images and videos. The overall
number of samples in the dataset is 2300; however, because convolutional neural
networks require more samples for training, a few data augmentation techniques
are employed to raise the number of samples in the dataset to 6700. Horizontal flip,
rotation, and padding are the data augmentation techniques. The proposed model
achieved an accuracy of 0.72 on the testing set and 0.88 on the training set without
applying edge detection techniques. After applying the Canny edge detection
technique on the animal dataset, the proposed model achieved an accuracy of 0.81,
outperforming the state-of-the-art ResNet-50 and EfficientNet-B7 models.
1 Introduction
There is a lot of research happening in the field of object detection. Deep learning
and computer vision have produced incredible results for detecting various
classes of objects in a given image. Recent advances in this domain have made it
possible to create bounding boxes around objects. Future developments in this sector
may benefit visually impaired people [1, 2]. The most frequent strategy in
computer vision techniques for object detection is to transform all color images into
grayscale format and subsequently into binary images. A region-based convolutional
neural network is later constructed to outperform standard computer vision algorithms
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 1
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_1
2 P. N. R. Bodavarapu et al.
in terms of accuracy [3, 4]. Object detection in remote sensing images is quite
challenging. The major difference between a natural image and a remote sensing image
is the size of the object, which is small in a remote sensing image compared
with the background. This makes it hard to detect objects in
remote sensing images. A further challenge is that remote sensing images offer only
a top-down view [5].
Feature pyramids can generate segment proposals for object detection. The
approaches that can be combined with feature pyramids for object detection are: (1) the
region proposal network and (2) Fast R-CNN [6, 7]. It is difficult to detect object
categories in images that do not include labeled detection data. However, YOLO9000
assists in detecting object types that have no labeled data, and it can function in a
real-time environment while performing these difficult tasks [8]. Unmanned aircraft
systems play an important role in wild animal surveys for managing a better ecosystem.
Going through all these aerial photographs to detect wild animals manually is
a difficult task. Employing deep learning methods helps detect wild animals more
accurately with reduced time consumption. The steps involved in detecting wild
animals in aerial images are: (1) image collection, (2) image splitting, (3) image
labeling, (4) dividing data into train and test sets, (5) training and validation, and (6)
testing [9]. The advantages of using the ReLU activation function are that the
computations are cheaper and convergence is faster. A major problem, the vanishing
gradient effect, may also be addressed with the ReLU activation function. The error
percentage of a deep convolutional neural network (DCNN) with the ReLU activation
function is 0.8% on the MNIST dataset, which is more advantageous than the sigmoid
and tanh activation functions, whose error percentages are 1.15% and 1.12%,
respectively [10, 11].
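The vanishing-gradient comparison above can be illustrated numerically: the sigmoid and tanh derivatives shrink toward zero for large inputs, while the ReLU derivative stays at exactly 1 for any positive input (a standalone sketch, not the paper's experiment):

```python
import math

def sigmoid_grad(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x)); it peaks at 0.25."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1 - s)

def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2; also vanishes for large |x|."""
    return 1.0 - math.tanh(x) ** 2

def relu_grad(x):
    """Derivative of ReLU: exactly 1 for any positive input."""
    return 1.0 if x > 0 else 0.0

# Gradient surviving after 10 stacked layers at a pre-activation of 2.0:
depth, x = 10, 2.0
print(sigmoid_grad(x) ** depth)  # ~1.6e-10: the gradient has all but vanished
print(tanh_grad(x) ** depth)     # ~3e-12: it vanishes here as well
print(relu_grad(x) ** depth)     # 1.0: the gradient passes through unchanged
```
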
Vehicle collisions with animals cause traffic accidents, resulting in injuries and
deaths for both humans and animals. When the vehicle's speed exceeds 35 km/h,
the driver has a more difficult time avoiding a collision with an animal, since the
distance between the car and the animal is shorter. The influence of humans in
road accidents is nearly 92%. Vehicle collisions with animals can be categorized into
direct and indirect collisions [12]. The performance of a convolutional neural network
can be significantly affected (increased or decreased) by various techniques, namely
(1) weighted residual connections, (2) cross-stage partial connections, (3) cross
mini-batch normalization, (4) self-adversarial training, (5) Mish activation, (6) mosaic
data augmentation, and (7) DropBlock regularization. The training of object detection
models can be improved by (1) activation functions, (2) data augmentation, (3)
regularization methods, and (4) skip connections [13, 14]. The steps involved in
classifying wild animals in video
footage are: (1) input video, (2) background subtraction, (3) difference clearing, (4)
calculate energy, (5) average variation, and (6) classification [15, 16].
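The first processing step above, background subtraction, can be sketched with simple frame differencing (frames as nested lists; the threshold value is an arbitrary assumption):

```python
def background_subtract(frame, background, threshold=30):
    """Mark pixels whose absolute difference from the background
    exceeds the threshold as foreground (1), else background (0)."""
    return [
        [1 if abs(p - b) > threshold else 0
         for p, b in zip(frow, brow)]
        for frow, brow in zip(frame, background)
    ]

background = [[10, 10, 10],
              [10, 10, 10],
              [10, 10, 10]]
frame = [[10, 200, 10],      # a bright "object" enters the middle column
         [10, 210, 10],
         [10, 205, 10]]

mask = background_subtract(frame, background)
print(mask)  # [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
```
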
The important contributions of this research paper can be outlined as follows: (1)
a novel convolutional neural network is devised, (2) various edge detection techniques
are applied, (3) different opacity levels are experimented with, and (4) all the results
are compared and valid conclusions are drawn. This research addresses animal detection
in images taken in low-light conditions; we collected different animals (both
domestic and wild) from many resources in the form of images and videos. These
videos include several animals; we split the recordings into frames using a Python
script, and relevant images were chosen and grouped into their respective directories.
The size of the dataset is 2300 samples; since convolutional neural networks
need a larger number of samples for training, a few data augmentation techniques
are used to increase the size of the dataset to 6700 samples. The data augmentation
contains four convolutional layers, two batch normalization layers, two maxpooling
layers, and two dropout layers. The activation function used in the convolutional layers
is the rectified linear unit (ReLU), and the activation function used in the output
layer is softmax. The learning rate and weight decay used in this work are 0.0001 and
1e−4, respectively. The proposed model is trained for 100 epochs with a batch
size of 32.
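The three augmentation operations named above (horizontal flip, rotation, and padding) can be sketched on a toy image, with nested lists standing in for a grayscale array (an illustrative sketch, not the authors' implementation):

```python
def horizontal_flip(img):
    """Mirror each row left-to-right."""
    return [row[::-1] for row in img]

def rotate_90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

def pad(img, width=1, value=0):
    """Surround the image with a border of constant pixels."""
    w = len(img[0]) + 2 * width
    top = [[value] * w for _ in range(width)]
    body = [[value] * width + list(row) + [value] * width for row in img]
    bottom = [[value] * w for _ in range(width)]
    return top + body + bottom

img = [[1, 2],
       [3, 4]]
print(horizontal_flip(img))  # [[2, 1], [4, 3]]
print(rotate_90(img))        # [[3, 1], [4, 2]]
print(pad(img))              # the 2x2 image inside a 4x4 zero border
```

Applying each operation to every original image is what grows the dataset from 2300 toward the 6700 augmented samples reported.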
2 Related Work
proposed method before adding the recognition algorithm is 43 fps, and the frame rate
after adding the recognition algorithm is 36 fps. Othman et al. [23] have proposed a
system for real-time object detection that can run at high frames per second (fps).
The authors used the MobileNet architecture combined with a single-shot detector,
trained on the Caffe framework. The model is implemented on a Raspberry Pi 3,
with a Movidius Neural Compute Stick used to obtain high frame rates. Data
augmentation is used, since convolutional neural networks need a large number of
samples for training. On the Raspberry Pi 3 CPU, this method obtained 0.5 frames
per second. Gasparovsky et al. [24] have discussed the importance of outdoor
lighting and the factors affecting it. Outdoor lighting depends on various conditions,
namely (1) the season, (2) the time of day, and (3) the number of buildings and the
population of the area.
Guo et al. [25] have proposed a neural network for two sub-tasks: (1) region
proposals and (2) object classification. RefineNet is included after the region proposal
network in the region proposals section to obtain the best region suggestions. The
proposed technique is tested on the PASCAL VOC dataset. After analyzing the results,
the authors explain that the fully connected layer with the softmax layer must
be fine-tuned. The proposed model achieved 71.6% mAP on the PASCAL VOC
dataset, while the state-of-the-art R-CNN model obtained 66.0% mAP on the same
dataset. The results clearly indicate that the proposed method performs significantly
better than the R-CNN. Guo et al. [26] have proposed a convolutional neural network
for object detection, which does not use region proposals for object detection. For
detecting the objects, DarkNet is transferred to a fully convolutional network, and
later, it is fine-tuned. Region proposal systems are not effective in real time, since
they incur more run time. The proposed model obtained 73.2% mAP, while Fast
R-CNN and Faster R-CNN obtained 68.4% and 70.4% mAP, respectively.
3 Proposed Work
The technique of finding object boundaries within an image is called edge detection.
This technique can be employed in various real-world applications like autonomous
cars and unmanned aerial vehicles. Edge detection techniques help decrease the
computation and processing time of data while training the deep learning model.
There are several edge detection approaches; in this study, we utilize the Canny,
Laplacian, Sobel, and Prewitt edge detection techniques.
Canny edge detection is used for finding many edges in images. The edges detected
generally have high local maxima of gradient magnitude. This technique decreases
the probability of missing an edge in the image. The steps involved in this technique
are: (1) smoothing, (2) finding gradients, (3) non-maximum suppression, (4)
thresholding, (5) edge tracking, and (6) output.
Canny edge detection equations:

Edge_Gradient(G) = √(Gx² + Gy²)  (1)

Angle(θ) = tan⁻¹(Gy / Gx)

Laplacian edge detection equation:

∇²f = ∂²f/∂x² + ∂²f/∂y²  (2)
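Equation (1) and the angle formula can be illustrated with central finite differences on a toy image (a simplified sketch; the full Canny pipeline also includes the smoothing, non-maximum suppression, and thresholding steps listed above):

```python
import math

def gradient_at(img, y, x):
    """Central-difference estimates of Gx, Gy and the resulting
    edge gradient magnitude (Eq. 1) and angle at pixel (y, x)."""
    gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
    gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
    magnitude = math.sqrt(gx ** 2 + gy ** 2)   # Eq. (1)
    angle = math.degrees(math.atan2(gy, gx))   # theta = tan^-1(Gy / Gx)
    return magnitude, angle

# A vertical edge: dark left half, bright right half.
img = [[0, 0, 255, 255],
       [0, 0, 255, 255],
       [0, 0, 255, 255]]
mag, ang = gradient_at(img, 1, 1)
print(mag, ang)  # 127.5 0.0 -> purely horizontal gradient, i.e., a vertical edge
```
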
Sobel edge detection finds edges where the gradient of the image is very high. Unlike
Canny edge detection, Sobel edge detection does not generate smooth edges, and
the number of edges produced by Sobel edge detection is smaller than that produced
by Canny edge detection.
Sobel edge detection equation:

M = √(Sx² + Sy²)  (3)
Prewitt edge detection is used for finding vertical and horizontal edges in images.
The Prewitt technique is fast compared with the Canny and Sobel techniques. For
determining the magnitude of edges, it is considered one of the best techniques.
Prewitt edge detection equation:

M = √(Sx² + Sy²)  (4)
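Sobel and Prewitt differ only in their convolution kernels: Sobel weights the centre row or column by 2, while Prewitt weights all rows equally. A sketch applying both horizontal-gradient kernels at a single pixel (the toy image and values are illustrative):

```python
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
PREWITT_X = [[-1, 0, 1],
             [-1, 0, 1],
             [-1, 0, 1]]

def apply_kernel(img, kernel, y, x):
    """Correlate a 3x3 kernel with the neighbourhood of pixel (y, x)."""
    return sum(
        kernel[j][i] * img[y + j - 1][x + i - 1]
        for j in range(3) for i in range(3)
    )

img = [[0, 0, 9],
       [0, 0, 9],
       [0, 0, 9]]   # a sharp vertical edge
print(apply_kernel(img, SOBEL_X, 1, 1))    # 36: Sobel response
print(apply_kernel(img, PREWITT_X, 1, 1))  # 27: Prewitt response
```
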
3.3 Algorithm
Step 5: Then, the same process is repeated with Laplacian, Sobel, and Prewitt edge
detection techniques.
Step 6: Later, all the datasets are divided in the ratio of 80:20 for training and
evaluation.
Step 7: The proposed model and different state-of-the-art models are trained on the
train set and tested on the test set.
Step 8: Lastly, the performance metrics of different models are displayed based on
the train and test sets.
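Steps 6–8 can be sketched as follows; the 80:20 split reproduces the counts reported later in the paper, while the model objects and their fit/accuracy methods are placeholders, not the authors' implementation:

```python
import random

def train_test_split(samples, train_ratio=0.8, seed=42):
    """Shuffle and split a dataset in the 80:20 ratio used in Step 6."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_ratio)
    return items[:cut], items[cut:]

dataset = list(range(6700))            # stands in for the 6700 augmented images
train_set, test_set = train_test_split(dataset)
print(len(train_set), len(test_set))   # 5360 1340

# Steps 7-8 skeleton: train each model, then report a metric per model.
# `models` maps a name to any object exposing fit() and accuracy().
def evaluate(models, train_set, test_set):
    results = {}
    for name, model in models.items():
        model.fit(train_set)
        results[name] = model.accuracy(test_set)
    return results
```
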
4 Experimental Results
or during heavy fog. Below are sample images in which the proposed model detects
animals in night and fog conditions (Fig. 4).
dataset without any edge detection techniques, but after using the Canny edge
technique, the proposed model achieved an accuracy of 0.81 on the animal dataset.
There is a significant increase in both the train and test accuracy of the proposed
model when Canny edge detection is applied; the model performs better when Canny
edge detection is applied to the animal dataset. The proposed model achieved the
lowest accuracy when Prewitt edge detection is used. When the four edge detection
techniques above are compared, Canny edge detection is the best, since it achieves
higher accuracy than the other edge detection techniques. The results clearly suggest
that the proposed model's accuracy on both train and test sets improves significantly
when Canny edge detection is used.
Fig. 4 Animals detected by proposed model during night and fog conditions
Table 2 Outline of accuracy and loss of proposed model after applying edge detection techniques
S. no. Technique Train accuracy Test accuracy Train loss Test loss
1 Canny 0.92 0.81 0.26 1.05
2 Laplacian 0.88 0.68 0.35 1.16
3 Sobel 0.90 0.68 0.31 1.60
4 Prewitt 0.81 0.65 1.13 2.01
An Optimized Convolutional Neural Network … 11
Table 3 Accuracy of proposed model for various opacity levels on animal dataset
S. no. Opacity level No. of actual animals No. of detected animals Accuracy (%)
1 1.0 7 7 100
2 0.9 7 6 85.7
3 0.7 7 5 71.4
4 0.5 7 3 42.8
5 0.3 7 0 0
6 0.1 7 0 0
Table 3 illustrates the accuracy of the proposed model at different opacity levels.
When the opacity level is 1.0, the proposed model detected every object in the image
and obtained 100% accuracy. We next reduced the opacity level to 0.9, and the model
obtained 85.7% accuracy, detecting 6 items out of 7 correctly. When the opacity level
is reduced to 0.5, the model's accuracy is 42.8%, meaning it detected only three items
out of seven. When the opacity levels are 0.3 and 0.1, the accuracy of the model is 0;
that is, it did not detect any of the 7 objects in the image. The accuracy of the model
thus decreases as the opacity level decreases, which shows that light is a very
important factor for object detection in an image. Future work on this research is to
develop a system that works better at opacity levels below 0.5; the drawback of both
the traditional models and the proposed model is that they do not perform well below
an opacity level of 0.5. Below is a sample of the number of objects detected in an
image by the proposed model when the opacity level is 1.0.
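The accuracy values in Table 3 are simply the detected/actual counts expressed as percentages (truncated to one decimal place), which can be reproduced directly:

```python
def detection_accuracy(detected, actual):
    """Percentage of actual animals detected, truncated to one
    decimal place as in Table 3 (3/7 -> 42.8, not 42.9)."""
    return int(1000 * detected / actual) / 10

# (opacity level, animals detected) pairs from Table 3, each out of 7 actual.
table3 = [(1.0, 7), (0.9, 6), (0.7, 5), (0.5, 3), (0.3, 0), (0.1, 0)]
for opacity, detected in table3:
    print(opacity, detection_accuracy(detected, 7))
# 1.0 -> 100.0, 0.9 -> 85.7, 0.7 -> 71.4, 0.5 -> 42.8, 0.3 and 0.1 -> 0.0
```
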
5 Conclusion
In the proposed system, images of various kinds of animals (both domestic and wild)
are collected from many resources in the form of images and videos. All the videos
are divided into frames using a Python script, and appropriate images are selected.
The size of the dataset is 2300 samples; since convolutional neural networks require
more samples for training, we used a few data augmentation techniques to increase
the size of the dataset to 6700 samples. The data augmentation techniques used are
horizontal flip, rotation, and padding. After data augmentation, the dataset is divided
in the ratio of 80:20, where 80% (5360 images) is used for training and 20% (1340
images) for evaluation (testing). The proposed model achieves an accuracy of 0.72 on
the testing set and 0.88 on the training set without applying edge detection
techniques. The
proposed model achieved an accuracy of 0.81 after using the Canny edge technique
on the animal dataset. Similarly, the proposed model achieved accuracies of 0.68,
0.68, and 0.65 when the Laplacian, Sobel, and Prewitt edge detection techniques were
applied, respectively. The results clearly suggest that the proposed model's accuracy
on both train and test sets improves significantly when Canny edge detection is used,
outperforming the state-of-the-art ResNet-50 and EfficientNet-B7 models.
References
1. Nasreen J, Arif W, Shaikh AA, Muhammad Y, Abdullah M (2019) Object detection and narrator
for visually impaired people. In: 2019 IEEE 6th international conference on engineering
technologies and applied sciences (ICETAS). IEEE, pp 1–4
2. Mandhala VN, Bhattacharyya D, Vamsi B, Thirupathi Rao N (2020) Object detection using
machine learning for visually ımpaired people. Int J Curr Res Rev 12(20):157–167
3. Zou X (2019) A review of object detection techniques. In: 2019 International conference on
smart grid and electrical automation (ICSGEA). IEEE, pp 251–254
4. Gullapelly A, Banik BG (2020) Exploring the techniques for object detection, classification,
and tracking in video surveillance for crowd analysis
5. Chen Z, Zhang T, Ouyang C (2018) End-to-end airplane detection using transfer learning in
remote sensing images. Remote Sens 10(1):139
6. Lin TY, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks
for object detection. In: Proceedings of the IEEE conference on computer vision and pattern
recognition, pp 2117–2125
7. Kumar KR, Prakash VB, Shyam V, Kumar MA (2016) Texture and shape based object detection
strategies. Indian J Sci Technol 9(30):1–4
8. Redmon J, Farhadi A (2017) YOLO9000: better, faster, stronger. In: Proceedings of the IEEE
conference on computer vision and pattern recognition, pp 7263–7271
9. Peng J, Wang D, Liao X, Shao Q, Sun Z, Yue H, Ye H (2020) Wild animal survey using UAS
imagery and deep learning: modified Faster R-CNN for kiang detection in Tibetan Plateau.
ISPRS J Photogramm Remote Sens 169:364–376
10. Ding B, Qian H, Zhou J (2018) Activation functions and their characteristics in deep neural
networks. In: 2018 Chinese control and decision conference (CCDC). https://doi.org/10.1109/
ccdc.2018.8407425
11. NarasingaRao MR, Venkatesh Prasad V, Sai Teja P, Zindavali Md, Phanindra Reddy O (2018)
A survey on prevention of overfitting in convolution neural networks using machine learning
techniques. Int J Eng Technol 7(2.32):177–180
12. Sharma SU, Shah DJ (2016) A practical animal detection and collision avoidance system using
computer vision technique. IEEE Access 5:347–358
13. Bochkovskiy A, Wang CY, Liao HY (2020) YOLOv4: optimal speed and accuracy of object
detection. arXiv preprint arXiv:2004.10934
14. Krishnaveni G, Bhavani BL, Lakshmi NV (2019) An enhanced approach for object detection
using wavelet based neural network. J Phys Conf Ser 1228(1):012032. IOP Publishing
15. Chen R, Little R, Mihaylova L, Delahay R, Cox R (2019) Wildlife surveillance using deep
learning methods. Ecol Evol 9(17):9453–9466
16. Chowdary MK, Babu SS, Babu SS, Khan H (2013) FPGA implementation of moving
object detection in frames by using background subtraction algorithm. In: 2013 International
conference on communication and signal processing. IEEE, pp 1032–1036
17. Fu K, Zhao Q, Gu IY, Yang J (2019) Deepside: a general deep framework for salient object
detection. Neurocomputing 356:69–82
18. Hou Q, Cheng MM, Hu X, Borji A, Tu Z, Torr PH (2017) Deeply supervised salient object
detection with short connections. In: Proceedings of the IEEE conference on computer vision
and pattern recognition, pp 3203–3212
19. Jia S, Bruce NDB (2019) Richer and deeper supervision network for salient object detection.
arXiv preprint arXiv:1901.02425
20. Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: towards real-time object detection with
region proposal networks. arXiv preprint arXiv:1506.01497
21. Liu G-H, Yang J-Y (2018) Exploiting color volume and color difference for salient region
detection. IEEE Trans Image Process 28(1):6–16
22. Yu L, Sun W, Wang H, Wang Q, Liu C (2018) The design of single moving object detection and
recognition system based on OpenCV. In: 2018 IEEE international conference on mechatronics
and automation (ICMA). IEEE, pp 1163–1168
23. Othman NA, Aydin I (2018) A new deep learning application based on movidius ncs for
embedded object detection and recognition. In: 2018 2nd international symposium on
multidisciplinary studies and innovative technologies (ISMSIT). IEEE, pp 1–5
24. Gasparovsky D (2018) Directions of research and standardization in the field of outdoor
lighting. In: 2018 VII. lighting conference of the Visegrad Countries (Lumen V4). IEEE,
pp 1–7
25. Guo Y, Guo X, Jiang Z, Zhou Y (2017) Cascaded convolutional neural networks for object
detection. In: 2017 IEEE visual communications and image processing (VCIP). IEEE, pp 1–4
26. Guo Y, Guo X, Jiang Z, Men A, Zhou Y (2017) Real-time object detection by a multi-feature
fully convolutional network. In: 2017 IEEE international conference on image processing
(ICIP). IEEE, pp 670–674
A Study on Current Research
and Challenges in Attribute-based Access
Control Model
Abstract Access control models are used to identify and detect anonymous users or
attacks when sharing big data or other resources in distributed environments such as
cloud, edge, and fog computing. The attribute-based access control (ABAC) model is
a promising model used in intrusion detection systems. Compared with the primary
access control models, namely the discretionary access control model (DAC), the
mandatory access control model (MAC), and the role-based access control model,
ABAC has received attention in current research due to its flexibility, efficiency, and
granularity. Although ABAC performs well in addressing the security requirements
of today's computing technologies, there are open challenges such as policy errors,
scalability, delegations, and policy representation with heterogeneous datasets. This
paper presents the fundamental concepts of ABAC and a review of current research
works toward framing efficient ABAC models. It also identifies and discusses the
current challenges in ABAC based on the study and analysis of the surveyed works.
1 Introduction
The intrusion detection system (IDS) is a software protection mechanism used
in a security system to monitor, identify, and detect anonymous users or attacks. The
primary roles of an IDS are monitoring all incidents, watching logging information,
and reporting illegal attempts [1]. The increasing quantity of malicious software poses
K. Vijayalakshmi (B)
Vels Institute of Science, Technology and Advanced Studies, Chennai, India
Arignar Anna Govt. Arts College, Cheyyar, India
V. Jayalakshmi
School of Computing Sciences, Vels Institute of Science, Technology and Advanced Studies,
VISTAS, Chennai, India
e-mail: jayasekar.scs@velsuniv.ac.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 17
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_2
dangerous challenges for researchers in designing an efficient IDS. There are also
further security threats, such as denial of service, data loss, data leakage, and loss of
data confidentiality, in connected information technology. Hence, security is an
important issue, and the design of an efficient IDS is a challenging task [2]. The
formal definition of IDS was introduced in 1980. IDS is mainly classified into misuse
IDS and anomaly-based IDS. The misuse IDS uses recognized patterns to detect
illegal access: possible harmful and suspicious activities are stored as patterns in a
database, and based on these recognized patterns, the misuse IDS monitors and
detects illegal activities. The anomaly-based IDS uses network behavior as a key to
detect anonymous users or attacks: if the network behavior conforms to the predefined
behavior, access is granted; otherwise, the anomaly-based IDS generates alerts [3].
An IDS uses an access control model as an agent and analysis engine to monitor
and identify signs of intrusion [4]. Traditional IDSs fail in addressing the security
attacks of today's computing technologies like cloud computing, edge computing,
fog computing, and the Internet of Things (IoT). With the development of the Internet
and the usage of social networks, resources and users are increasing exponentially,
and attacks and security threats also increase day by day. Developing an IDS that
meets all these security needs is a big challenge [5].
An IDS implements an access control model to monitor and detect malicious
intrusions. The implementation of a flexible and efficient access control model is an
important task for addressing today's complex security needs [6]. The access control
model is a software function established with a set of security policies. The three main
operations of an access control model are authentication, authorization, and
accountability. Authentication is the process of identifying legal users based on proof
of identity. The function of authorization is deciding whether to allow or deny the
request of a user. Accountability is the task of monitoring users' activity within
the system and logging the information of these activities for further auditing [7].
Thus, an access control model allows or denies the request of a user based on the
security policies. Many access control models have been proposed; some achieved
great success in addressing security needs, while others failed [8]. The discretionary
access control model (DAC) uses an access control list (ACL) for each shared
resource that specifies the access rights for users [9]. DAC is an owner-discretionary
model; it allows the owner of a resource to create the ACL for that resource. The
mandatory access control model (MAC) uses security labels for both the user and the
resource, and it identifies legal access or users based on these security labels [10].
Both DAC and MAC give better performance when the number of users and resources
is limited, but they fail in addressing the security issues of today's complex computing
technologies. The role-based access control model (RBAC) was proposed to address
security attacks in large-scale applications [11, 12]. RBAC establishes two mappings:
permissions-to-role and role-to-user. RBAC first assigns all feasible access rights
(permissions) to the role (job) of the user and then assigns the role to the user; hence,
the user gets access rights only up to the limit of his role. Many versions of RBAC
with new technical concepts have been proposed to refine and improve the efficiency
of the model [13, 14]. Although RBAC performs well, there are some limitations,
like the poor expressive power of its policies and its inability to address
A Study on Current Research and Challenges … 19
the dynamic and complex security needs of today's computing technologies. The
attribute-based access control model (ABAC) is promising in addressing well-developed
and complex security attacks and threats [15]. The most challenging security attacks
are denial of service, account hijacking, data breach, and data loss [16, 17].
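The permissions-role and role-user mappings that RBAC establishes can be sketched as two dictionaries. The roles and permission names below are hypothetical, chosen only to illustrate the two-step assignment:

```python
role_permissions = {                    # permissions-role mapping
    "doctor": {"read_report", "write_report"},
    "nurse": {"read_report"},
}
user_roles = {"bob": {"nurse"}}         # role-user mapping

def permitted(user, permission):
    """A user holds exactly the permissions assigned to his or her roles."""
    return any(permission in role_permissions[r]
               for r in user_roles.get(user, ()))

print(permitted("bob", "read_report"))   # True
print(permitted("bob", "write_report"))  # False: beyond the limit of the role
```

Because access rights never attach to users directly, revoking or reassigning a role updates every affected user at once, which is what makes RBAC manageable at large scale.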
ABAC identifies and allows legal activities based on security policies. A security
policy is a set of security rules, and a rule is constructed with the attributes of
the subject (who requests the access), the resource, and the environmental conditions [18,
19]. Although ABAC meets complex security needs, some open challenges affect
the performance and efficiency of the model [20, 21]. In this paper, we describe the
basic concepts of ABAC and present a review of current research works toward
framing efficient ABAC models. We also identify and analyze the important
challenges of policy errors, scalability, delegation, and policy representation with
heterogeneous datasets. Section 2 presents related research works toward developing
the ABAC model. Section 3 describes the fundamental concepts of the ABAC
model and presents a review of ABAC models. Section 4 categorizes and discusses
the current research works in ABAC. Section 5 discusses the current open challenges
in designing efficient ABAC models. Finally, we conclude in Sect. 6.
2 Literature Survey
Heitor Henrique Costa and his team designed and implemented an access control model to
overcome the security problems in federated (interconnected) clouds; they
experimented with bioinformatics applications [22]. Muhammad Umar Aftab proposed
a hybrid access control model combining the strengths and features of ABAC and
RBAC while removing the limitations of the two models. This hybrid model has
the policy-structuring ability of ABAC and the high security power
of RBAC [23]. Jian Shu and Lianghong Shi proposed an extended access control model
by introducing actions into ABAC; this model avoids the usage of multi-attributes
with complex structures and resolves issues in dynamic authorization and changes
of access rights [24]. Bai and Zheng surveyed access control models and provided
a detailed analysis through research on the access control matrix, access control
lists, and policies [25]. Xin Jin and Ram Krishnan proposed a hybrid ABAC model
called ABACα, which can easily be configured to cover the other dominating access
control models DAC, MAC, and RBAC; thus, ABACα combines the strengths and
features of DAC, MAC, RBAC, and ABAC [26]. Canh Ngo and his team proposed
a new version of ABAC incorporating complex security mechanisms for multi-tenant
cloud services and extended their model to inter-cloud platforms [27]. Riaz Ahmed
Shaikh proposed a data classification method for ABAC policy validation, with
algorithms for detecting and resolving rule inconsistency and rule redundancy
anomalies [28]. Daniel Servos and Sylvia L. Osborn gave a detailed review of ABAC
and discussed the current challenges in ABAC [29]. Maryem Ait El Hadj and his
team proposed a cluster-based approach for detecting and resolving anomalies in
ABAC security policies [30].
20 K. Vijayalakshmi and V. Jayalakshmi
The ABAC model is a software protection mechanism to monitor and identify the
intrusion of malicious users [42]. ABAC is established with a set of security policies;
the decision on a user's request is made based on the specified policy set. Each
ABAC policy is a set of security rules, and ABAC allows or denies a request for
accessing a shared resource based on these rules. Thus, the ABAC model admits
only legitimate users by checking their identity at two gates [43]. The first gate
is a traditional authentication process that verifies common identities of the users
such as username, password, and date of birth. The second gate is the ABAC model
itself, which verifies the users with further attributes such as department name,
designation, resource name, resource type, and time.

Table 1 (continued)
Reference: Qi et al. [6]
Technique: A hybrid model with the features of RBAC and ABAC
Efficiency: Efficient in managing static relationships and dynamic policies
Limitations: High computation time and complex ABAC implementation

[Figure: literature search process — articles related to ABAC models are searched
for in and downloaded from databases such as Springer, Elsevier, IEEE, and Google
Scholar]

The common jargons in ABAC models are as follows:
Subject: The user who requests access to a shared resource is called the subject. The
subject may be a person (user), process, application, or organization.
Subject attributes {S1, S2, …, Sn}: The important properties or characteristics used
to describe the subject are referred to as subject attributes.
Example: {S1, S2, S3} = {Department, Designation, Grade}.
Subject attribute values {VS1, VS2, …, VSn}: The possible set of values (domain)
assigned to the subject attributes {S1, S2, …, Sn}, such that VSk = {skv1, skv2, …,
skvn} is the value domain for attribute Sk, and Sk = {values ∈ VSk}.
Example: VDepartment = {Cardiology, Hematology, Urology, Neurology}.
Subject attribute value assignment: The values of a subject attribute are assigned
as Sk = {values ∈ VSk}.
Example: Department = {Hematology, Urology} ∧ Designation = {Nurse, Doctor}.
Object: The shared resource is called the object.
Object attributes {O1, O2, …, On}: The important properties or characteristics used
to describe the object are referred to as object attributes.
Example: {O1, O2, O3} = {ResourceName, ResourceType, LastUpdatedOn}.
Object attribute values {VO1, VO2, …, VOn}: The possible set of values (domain)
assigned to the object attributes {O1, O2, …, On}, such that VOk = {okv1, okv2, …,
okvn} is the value domain for attribute Ok, and Ok = {values ∈ VOk}.
Example: VResourceName = {Pat_007_Blood_Report, Pat_435_CBC_Report}.
Object attribute value assignment: The values of an object attribute are assigned
as Ok = {values ∈ VOk}.
Example: ResourceName = {Pat_435_CBC_Report} ∧ ResourceType = {DataFile}.
Environmental conditions: This category specifies information about the
environment in which the access request is made.
Environmental condition attributes {E1, E2, …, En}: The characteristics used
to describe the environment.
Example: {E1, E2, E3} = {DateOfRequest, Time, PurposeOfRequest}.
Environmental condition attribute values {VE1, VE2, …, VEn}: The possible set
of values (domain) assigned to the environment attributes {E1, E2, …, En}, such that
VEk = {ekv1, ekv2, …, ekvn} is the value domain for attribute Ek, and Ek = {values
∈ VEk}.
Example: VTime = {07:12, 12:05, 08:16}.
Environmental attribute value assignment: The values of an environmental
attribute are assigned as Ek = {values ∈ VEk}.
Example: Time = {07:12}.
An ABAC rule is expressed as R = {Xop | A1 ∈ VA1, A2 ∈ VA2, …, An ∈ VAn}.
X is the decision (allow or deny) for the requested operation op (read, write, print,
etc.). {A1, A2, …, An} is the list of attributes belonging to the categories {subject,
object, environmental conditions}, and VA1, VA2, …, VAn are the sets of permitted
values of the corresponding attributes.
As an example, a rule R1 may state that persons working as a surgeon
or chief doctor in the department of cardiology can read the file
Pat_567_CBC_Report.
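In the rule notation defined above, such a rule could be written as follows (a reconstruction for illustration; the exact attribute value names are assumed):

```text
R1 = {Allowread | Designation ∈ {Surgeon, ChiefDoctor},
                  Department ∈ {Cardiology},
                  ResourceName ∈ {Pat_567_CBC_Report}}
```

A request is granted under R1 only when every listed attribute of the requesting subject and the target object falls inside the corresponding permitted value set.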
ABAC policies can be written in access control policy languages. Most ABAC
implementations use the eXtensible Access Control Markup Language (XACML) to
express the ABAC policies [44]. The Organization for the Advancement of Structured
Information Standards (OASIS) created a standard for XACML based on
XML concepts in 2002, and also developed the Security Assertion Markup Language
(SAML) in 2005 for the specification of security policies [45]. The security policy
set of the ABAC model can also be expressed in JavaScript Object Notation (JSON).
In XACML, each attribute is expressed as a pair (attribute name, attribute
value) using a markup language. The ABAC policy set can be expressed in XACML
as follows:
<PolicySet>
  <Policy PolicyID="P1">
    <Rule RuleID="R1" Decision="Allow">
      <Operation>
        <Operation-1>read</Operation-1>
        <Operation-2>write</Operation-2>
      </Operation>
      <Subject>
        <Department>dermatology</Department>
        <Designation>chief doctor</Designation>
      </Subject>
      <Object>
        <ResourceName>PatID_005_CBC_Report</ResourceName>
      </Object>
      <EnvironmentalCondition>
        <Duration>07:12</Duration>
      </EnvironmentalCondition>
    </Rule> <!-- more rules can be specified -->
  </Policy> <!-- more policies can be specified -->
</PolicySet>
In the above example, rule R1 states that the security policy allows the
chief doctor of the department of dermatology to read and write the file
"PatID_005_CBC_Report" at 07:12 h.
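The same rule can be written in JSON, as mentioned above. The structure below is an illustrative sketch, not the official XACML JSON profile, and the naive evaluation function is only a minimal demonstration of attribute matching:

```python
import json

# Hypothetical JSON encoding of policy P1 / rule R1 from the XACML example.
policy_set = json.loads("""
{
  "policies": [{
    "policyId": "P1",
    "rules": [{
      "ruleId": "R1",
      "decision": "Allow",
      "operations": ["read", "write"],
      "subject": {"Department": "dermatology", "Designation": "chief doctor"},
      "object": {"ResourceName": "PatID_005_CBC_Report"},
      "environment": {"Duration": "07:12"}
    }]
  }]
}
""")

def evaluate(request):
    """Return the decision of the first rule whose attributes all match."""
    for policy in policy_set["policies"]:
        for rule in policy["rules"]:
            attrs = {**rule["subject"], **rule["object"], **rule["environment"]}
            if request["operation"] in rule["operations"] and \
               all(request.get(k) == v for k, v in attrs.items()):
                return rule["decision"]
    return "Deny"  # default-deny when no rule matches

req = {"operation": "read", "Department": "dermatology",
       "Designation": "chief doctor",
       "ResourceName": "PatID_005_CBC_Report", "Duration": "07:12"}
print(evaluate(req))  # Allow
```

Changing any single attribute in the request (for instance the department) causes the rule to fail and the default-deny branch to apply.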
The research on designing ABAC models, either original or hybrid models,
is receiving great attention. An original model is a purely new attribute-based
access control model, not an extension of any previous access control model,
and may be a general or a domain-specific model. Hybrid models are designed by
combining the features or strengths of two or more existing models. Table 2
shows the types of ABAC models.
Research toward the implementation of ABAC models also has great impact
and interest in today's communication technology; it is second only to the
research on designing ABAC models. The framework for implementing an
ABAC model comprises several functional components: representing ABAC
policies in an access control language (XACML, SAML, ORACLE,
MySQL, or others), establishing security policies, storing and managing policies
and metadata, and testing and maintaining the framework.
Research toward the development, testing, and validation of ABAC security
policies is also receiving great attention; researchers show an interest in
policy-related tasks equivalent to that in implementing the ABAC model. Previous
and current research contributions on policies include preserving the consistency and
confidentiality of policies, flexible and efficient policymaking, testing policies,
and detecting and validating policy anomalies.
The literature review shows that there are also many research contributions on
determining and specifying the attributes in policies. Research on policy attributes
involves preserving confidentiality, adding more attributes to improve the security
level, flexible attribute specification, and storing and managing attributes. Figure 3
shows the evolution of ABAC research, and Fig. 4 shows the research rate of each
category of ABAC.
5 Challenges in ABAC
5.1 Policy Errors

The most critical issues are anomalies or conflicts in the security policies. Policy
errors cause dangerous security issues such as denial of service, data loss, or data
breach. The primary policy errors are rule redundancy and rule discrepancy [49].
Rule redundancy errors consume storage space and increase the complexity
of updating security policies [50]. Rule discrepancy errors create confusion
in granting permissions to users; such an error causes unavailability of the shared
resource or illegal access.
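A redundant rule is one whose attribute value sets are all contained in another rule with the same decision. A simplified detection sketch (the rule model and attribute names are hypothetical, and real detectors must also handle rules with differing attribute sets):

```python
# Rules modelled as (decision, {attribute: set_of_allowed_values}).
rules = [
    ("allow", {"Department": {"Cardiology"}, "Designation": {"Surgeon"}}),
    ("allow", {"Department": {"Cardiology", "Urology"},
               "Designation": {"Surgeon", "Nurse"}}),
]

def subsumes(general, specific):
    """True if `general` grants everything `specific` grants."""
    g_dec, g_attrs = general
    s_dec, s_attrs = specific
    return g_dec == s_dec and all(
        s_attrs[a] <= g_attrs.get(a, set()) for a in s_attrs)

# A rule is redundant if some other rule subsumes it.
redundant = [i for i, r in enumerate(rules)
             if any(j != i and subsumes(rules[j], r)
                    for j in range(len(rules)))]
print(redundant)  # [0]: the first rule is subsumed by the second
```

Removing subsumed rules shrinks the policy set without changing any access decision, which addresses the storage and update costs noted above.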
5.2 Scalability
5.3 Delegations
One essential feature of access control models is delegation, which allows one
subject to grant (delegate) certain permissions (access rights) to other subjects.
Due to frequent and dynamic changes of attributes and policies, achieving dynamic
delegation is complex [52]: delegation requires stable policies with stable
attributes and role-user assignments. Researchers are still struggling to fulfill
the requirement of dynamic delegation.
5.4 Auditability
Another important and necessary aspect of all security systems and access control
models is auditing. The term auditing refers to the ability to determine how many
subjects have been granted particular access rights (read, write, or share) for a certain
object, or for how many objects a particular subject has been granted access rights.
ABAC never maintains the identity of its users [42]: users are unknown and obtain
access rights if their attributes satisfy the predefined ABAC policies.
Thus, it is difficult to determine the number of users allowed for a particular object
and the number of objects a particular user is allowed to access.
6 Conclusion
With the growth of the Internet, communication, and information technology, the
number of users and resources is growing rapidly. Hence, security is an essential,
critical, and challenging concern. Many access control models play a vital role
in addressing security threats and attacks such as denial of service, account hijacking,
and data loss. ABAC is getting more attention from researchers due to its flexibility
and efficiency. This paper has presented the fundamental concepts of the
ABAC model and the taxonomy of research in ABAC, has categorized
and described each category of ABAC research, and has discussed the challenges
in ABAC models. This review may help researchers and practitioners
attain knowledge of ABAC models, their implementation, policies, attributes,
and the open challenges in ABAC.
References
1. Kumar A, Maurya HC, Misra R (2013) A research paper on hybrid intrusion detection
system. Int J Eng Adv Technol 2(4):294–297
2. Khraisat A, Gondal I, Vamplew P, Kamruzzaman J (2019) Survey of intrusion detection
systems: techniques, datasets and challenges. Cybersecurity 2(1). https://doi.org/10.1186/s42
400-019-0038-7
3. Hydro C et al (2013) We are IntechOpen, the world's leading publisher of Open Access books
built by scientists, for scientists. INTECH 32(July):137–144
4. Liang C et al (2020) Intrusion detection system for the internet of things based on blockchain
and multi-agent systems. Electronics 9(7):1–27. https://doi.org/10.3390/electronics9071120
5. Varal AS, Wagh SK (2018) Misuse and anomaly intrusion detection system using ensemble
learning model. In: International conference on recent innovations in electrical, electronics &
communication engineering ICRIEECE 2018, pp. 1722–1727. https://doi.org/10.1109/ICRIEE
CE44171.2018.9009147
6. Qi H, Di X, Li J (2018) Formal definition and analysis of access control model based on role
and attribute. J Inf Secur Appl 43:53–60. https://doi.org/10.1016/j.jisa.2018.09.001
7. Suhendra V (2011) A survey on access control deployment. In: Communication in computer and
information science, vol 259 CCIS, pp 11–20. https://doi.org/10.1007/978-3-642-27189-2_2
8. Sahafizadeh E (2010) Survey on access control models, pp 1–3
9. Conrad E, Misenar S, Feldman J (2016) Domain 5: identity and access management (Control-
ling Access And Managing Identity). In: CISSP Study Guid, pp 293–327. https://doi.org/10.
1016/b978-0-12-802437-9.00006-0
10. Xu L, Zhang H, Du X, Wang C (2009) Research on mandatory access control model for
application system. In: Proceedings of international conference on networks security, wireless
communications and trusted computing NSWCTC 2009, vol 2, no 1, pp 159–163. https://doi.
org/10.1109/NSWCTC.2009.322
11. Sandhu RS et al (1996) Role based access control models. IEEE 6(2):21–29. https://doi.org/
10.1016/S1363-4127(01)00204-7
12. Sandhu R, Bhamidipati V, Munawer Q (1999) The ARBAC97 model for role-based admin-
istration of roles. ACM Trans Inf Syst Secur 2(1):105–135. https://doi.org/10.1145/300830.
300839
13. Sandhu R, Munawer Q (1999) The ARBAC99 model for administration of roles. In: Proceed-
ings 15th annual computer security applications conference, vol Part F1334, pp 229–238.
https://doi.org/10.1109/CSAC.1999.816032
14. Hutchison D (2011) Data and applications security and privacy XXV. In: Lecture notes
computer science, vol 1, pp 3–18. https://doi.org/10.1007/978-3-319-20810-7
15. Crampton J, Morisset C (2014) Monotonicity and completeness in attribute-based access
control. In: LNCS 8743, Springer International Publishing, pp 33–34
16. Prakash C, Dasgupta S (2016) Cloud computing security analysis: challenges and possible
solutions. In: International conference on electrical, electronics, and optimization techniques
ICEEOT 2016, pp 54–57. https://doi.org/10.1109/ICEEOT.2016.7755626
17. Markandey A, Dhamdhere P, Gajmal Y (2019) Data access security in cloud computing:
a review. In: 2018 International conference on computing, power and communication
technologies GUCON 2018, pp 633–636. https://doi.org/10.1109/GUCON.2018.8675033
18. Que Nguyet Tran Thi TKD, Si TT (2017) Fine grained attribute based access control model
for privacy protection. Springer International Publication A, vol 10018, pp 141–150. https://
doi.org/10.1007/978-3-319-48057-2
19. Vijayalakshmi K, Jayalakshmi V (2021) Analysis on data deduplication techniques of storage of
big data in cloud. In: Proceedings of 5th international conference on computing methodologies
and communication ICCMC 2021. IEEE, pp 976–983
20. Vijayalakshmi K, Jayalakshmi V (2021) Identifying considerable anomalies and conflicts
in ABAC security policies. In: Proceedings of 5th international conference on intelligent
computing and control systems ICICCS 2021. IEEE, pp 1286–1293
21. Vijayalakshmi K, Jayalakshmi V (2021) A similarity value measure of ABAC security rules.
In: Proceedings of 5th international conference on trends electronics and informatics ICOEI
2021, IEEE
22. Costa HH, de Araújo AP, Gondim JJ, de Holanda MT, Walter ME (2017) Attribute based
access control in federated clouds: a case study in bioinformatics. In: Iberian conference on
information systems and technologies CISTI. https://doi.org/10.23919/CISTI.2017.7975855
23. Aftab MU, Habib MA, Mehmood N, Aslam M, Irfan M (2016) Attributed role based access
control model. In: Proceedings of 2015 conference on information assurance and cyber security
CIACS 2015, pp 83–89. https://doi.org/10.1109/CIACS.2015.7395571
24. Shu J, Shi L, Xia B, Liu L (2009) Study on action and attribute-based access control model for
web services. In: 2nd International symposium on information science and engineering ISISE
2009, pp 213–216. https://doi.org/10.1109/ISISE.2009.80
25. Bai QH, Zheng Y (2011) Study on the access control model in information security. In: Proceed-
ings of 2011 cross strait quad-regional radio science wireless technology conference CSQRWC
2011, vol 1, pp 830–834. https://doi.org/10.1109/CSQRWC.2011.6037079
26. Jin X, Krishnan R, Sandhu R (2012) A unified attribute-based access control model covering
DAC, MAC and RBAC BT. In: Lecture notes in computer science, vol 7371, pp 41–55
27. Ngo C, Demchenko Y, De Laat C (2015) Multi-tenant attribute-based access control for cloud
infrastructure services. https://doi.org/10.1016/j.jisa.2015.11.005
28. Shaikh RA, Adi K, Logrippo L (2017) A data classification method for inconsistency and
incompleteness detection in access control policy sets. Int J Inf Secur 16(1):91–113. https://
doi.org/10.1007/s10207-016-0317-1
29. Servos D, Osborn SL (2017) Current research and open problems in attribute-based access
control. ACM Comput Surv (CSUR) 49(4):1–45. https://doi.org/10.1145/3007204
30. El Hadj MA, Ayache M, Benkaouz Y, Khoumsi A, Erradi M (2017) Clustering-based approach
for anomaly detection in xacml policies. In: ICETE 2017—proceedings of 14th international
joint conference on E-business telecommunication, vol 4, no Icete, pp 548–553. https://doi.
org/10.5220/0006471205480553
31. Pussewalage HSG, Oleshchuk VA (2017) Attribute based access control scheme with controlled
access delegation for collaborative E-health environments. J Inf Secur Appl 37:50–64. https://
doi.org/10.1016/j.jisa.2017.10.004
32. Afshar M, Samet S, Hu T (2018) An attribute based access control framework for healthcare
system. J Phys Conf Ser 933(1). https://doi.org/10.1088/1742-6596/933/1/012020
33. Fu X, Nie X, Wu T, Li F (2018) Large universe attribute based access control with efficient
decryption in cloud storage system. J Syst Softw 135:157–164. https://doi.org/10.1016/j.jss.
2017.10.020
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 33
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_3
34 S. J. Mohammed and N. Radhika
2 Background
In this section, some recent techniques for denoising speech using artificial
intelligence models and algorithms are surveyed.
The deep autoencoder model (DAE) described in paper [1] has been utilized for
dimensionality reduction, face recognition, and natural language processing.
The author studied using a linear regression function as the decoder of the DDAE
model (termed DAELD) and evaluated the DAELD model on two speech enhancement
tasks (the Aurora-4 and TIMIT datasets). The encoder and decoder layers of the
DAELD model transform speech signals to high-dimensional feature representations
and back to speech signals: the encoder consists of a nonlinear transformation,
and the decoder consists of linear regression. The author showed that using
linear regression in the decoder yields improved performance in terms of PESQ
and STOI scores.
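The DAELD idea of a nonlinear encoder paired with a plain linear-regression decoder can be sketched as a forward pass. The layer sizes and random weights below are illustrative assumptions, not the configuration reported in [1]:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 129, 512          # e.g. one spectral frame -> hidden features

W_enc = rng.standard_normal((n_hidden, n_in)) * 0.01
b_enc = np.zeros(n_hidden)
W_dec = rng.standard_normal((n_in, n_hidden)) * 0.01
b_dec = np.zeros(n_in)

def encode(x):
    return np.tanh(W_enc @ x + b_enc)   # nonlinear transformation (encoder)

def decode(h):
    return W_dec @ h + b_dec            # plain linear regression (decoder)

x = rng.standard_normal(n_in)           # one noisy feature frame
x_hat = decode(encode(x))               # reconstructed (enhanced) frame
print(x_hat.shape)  # (129,)
```

Training would fit the weights by minimizing the reconstruction error between x_hat and the corresponding clean frame; only the untrained forward pass is shown here.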
In paper [2], the author used an ideal binary mask (IBM) as a binary classification
target for voice improvement in complex noisy situations using deep neural networks
(DNNs). During training, the IBM is employed as the target function, and the
trained DNNs are used to estimate the IBM during the enhancement stage. The target
speech is then created by applying the predicted target function to the complex noisy
mixtures. The author showed that a deep neural network model with four hidden
layers and mean squared error as its loss function provides an average seven
percent improvement in speech quality.
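The IBM itself is simple to state: a time-frequency cell is kept (mask value 1) when speech energy exceeds noise energy, and discarded (0) otherwise. A toy sketch with assumed magnitude values, not real spectrograms:

```python
import numpy as np

# Toy magnitude spectrograms (rows: frequency bins, cols: time frames).
speech_mag = np.array([[3.0, 0.2], [1.5, 0.1]])
noise_mag  = np.array([[1.0, 1.0], [2.0, 0.05]])

# Ideal binary mask: 1 where speech dominates the cell.
ibm = (speech_mag > noise_mag).astype(float)   # training target for the DNN
print(ibm)

# Enhancement stage: apply the (estimated) mask to the noisy mixture.
mixture_mag = speech_mag + noise_mag
enhanced = ibm * mixture_mag
```

During training the DNN learns to predict this mask from the noisy mixture alone; at test time the predicted mask plays the role of `ibm` above.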
Paper [3] provides a deep learning-based method for improving voice
denoising in real-world audio situations, without the need for clean speech signals,
in a self-supervised manner. Two noisy realizations of the same speech signal, one as
the input and the other as the target, are used to train a fully convolutional neural
network. The author used LeakyReLU as the activation function, noting that it
helps speed up the training process; for this reason, LeakyReLU has also been
selected as the activation function for the model proposed in this paper.
In paper [4], the author presented a simple mean squared error (MSE)-based
loss function in the spectro-temporal modulation domain for supervised voice
enhancement. Because of its close relationship to the template-based STMI
(spectro-temporal modulation index), which correlates well with speech intelligibility,
this loss is termed the spectro-temporal modulation error (STME). For the
small-scale dataset, the author used 9.4 h and 35 min of noisy speech in the
training and test sets, respectively; for the large-scale dataset, the author used
the Interspeech 2020 deep noise suppression (DNS) dataset. The author's model
consists of four fully connected layers with two stacked gated recurrent units
between the first and second layers. Unlike the model proposed in this paper, the
model of paper [4] operates in the modulation domain.
In paper [5], the author developed deep neural networks to classify
spoken words or environmental sounds from audio, then trained
an audio transform to convert noisy speech to an audio waveform that minimized
the recognition network's "perceptual" losses. Training the recognition network
with perceptual loss as the loss function, the author utilized several Wave-U-Net
architectures and obtained a PESQ score of 1.585 and an STOI score of 0.773 as the
highest scores reached among the various proposed architectures, using an
architecture similar to the UNet model.
The Stanford paper [6] by Mike Kayser proposes two approaches to audio
denoising. In the first method, the noisy spectrogram is provided to a convolutional
neural network to obtain a clean output spectrogram, from which mel-frequency
cepstral coefficients (MFCCs) are generated. In the second method, the noisy
spectrogram is given as input to a multilayer perceptron network which is in turn
connected to a convolutional neural network; this combined network learns to
predict the MFCC features. The author also concluded from his experiments that,
across various architectures, the tanh activation function gives better results than
rectified linear units when training on audio spectrograms.
In paper [7], the author proposes Conv-TasNet, a fully convolutional time-domain
audio separation network for end-to-end time-domain speech separation.
Conv-TasNet generates a representation of the speech waveform, optimized for
separating distinct speakers, using a linear encoder. A collection of weighting
functions (masks) is applied to the encoder output to achieve speaker separation.
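The mask-application step can be sketched with toy dimensions (the shapes and random values below are illustrative, not Conv-TasNet's actual configuration):

```python
import numpy as np

rng = np.random.default_rng(1)
encoder_out = rng.random((16, 10))      # (features, time frames) from the encoder
masks = rng.random((2, 16, 10))         # one weighting function per speaker
masks = masks / masks.sum(axis=0)       # normalize so masks sum to 1 per cell

separated = masks * encoder_out         # broadcast: (2, 16, 10), one per speaker
print(separated.shape)
```

Because the masks sum to one in every cell, the per-speaker representations add back up to the original encoder output; a decoder then maps each masked representation back to a waveform.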
The above literature survey was conducted in order to propose a deep neural
network model and to improve the performance of the designed model. Paper
[2] shows that hidden layers can improve the performance of the model;
utilizing LeakyReLU as in paper [3] reduces the training time of the proposed
model; paper [6] shows that the tanh activation function can improve the
performance of the denoising model; and paper [5] shows how a UNet architecture
can be used to build an audio denoising model. Combining the information
extracted from this survey, a deep neural network model that is a hybrid of the
UNet model and dense layers has been proposed and is explained in the next
section.
3 Methodology
In this section, a description of the dataset and a detailed explanation of the model
architecture are provided.
Audio Denoising Using Deep Neural Networks 37
3.1 Dataset
The proposed neural network structure was constructed with the UNet model as the
base; this network architecture was modified to work with spectrograms, and its
last five layers are dense layers. The overall working of the entire system is shown
as a block diagram in Fig. 2.
The deep neural network model is similar to the UNet architecture, which was
chosen for this application because UNet architectures are normally used for image
segmentation problems, similar to audio denoising in that the network has to
identify and segment the clean audio out of the incoming noisy audio file. The
constructed model has two major portions: the first is known as the contracting
portion, and the second is called the expansive portion. The expansive portion
has five dense layers at the end of the architecture, as shown in Fig. 3.
Figure 3 shows the architecture diagram of the proposed model. The output of
the proposed model gives the value of the audio noise, which is then subtracted
from the noisy speech audio spectrogram to produce the denoised audio file.
4 Experimental Setup
Dataset Preprocessing. The audio files cannot be used as such for training; the
noisy speech data is synthesized by randomly combining the ESC-50 dataset and
the LibriSpeech dataset. The audio files are first converted into NumPy arrays,
which are then converted into spectrogram matrices. For converting an audio file
into a NumPy matrix, the following parameters were initially set:
1. sampling rate = 8000
2. frame length = 255
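The audio-to-spectrogram step with these parameters can be sketched as framing plus an FFT. The hop size of 63 samples and the Hann window here are assumptions for illustration, as the text does not state them:

```python
import numpy as np

sample_rate, frame_length, hop = 8000, 255, 63

# One second of a synthetic 440 Hz tone standing in for an audio file.
signal = np.sin(2 * np.pi * 440 * np.arange(sample_rate) / sample_rate)

# Slice the signal into overlapping frames, window each, take magnitudes.
n_frames = 1 + (len(signal) - frame_length) // hop
frames = np.stack([signal[i * hop:i * hop + frame_length]
                   for i in range(n_frames)])
spectrogram = np.abs(np.fft.rfft(frames * np.hanning(frame_length), axis=1))
print(spectrogram.shape)   # (n_frames, frame_length // 2 + 1)
```

With frame length 255, each frame yields 128 frequency bins, so the spectrogram matrix has shape (number of frames, 128).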
This subsection explains the evaluation metrics utilized for evaluating the performance
of the proposed model. In the literature survey, various authors [1–5]
utilized the same evaluation metrics for evaluating denoised audio.
PESQ Score. PESQ [10] refers to the perceptual evaluation of speech quality, which
is defined by International Telecommunication Union recommendation P.862. The
score ranges from −0.5 to 4.5, where a greater value indicates better audio
quality. The PESQ algorithm is normally used in the telecommunications industry
for objective voice quality testing by phone manufacturers and telecom operators.
Both the clean speech audio file and the noisy speech audio file are required
for computing the PESQ score of the model; the pesq library has been used for
this purpose.
STOI Score. STOI [11] refers to short-time objective intelligibility, a
metric used for predicting the intelligibility of noisy speech. It does not
evaluate the speech quality (as speech quality is normally evaluated in silence) of
the audio; instead, it returns a value between 0 and 1, where 1 is the highest score,
at which the noisy speech can be understood easily. The pystoi library has been used
to obtain the STOI scores of the models, and similar to the PESQ metric, the STOI
score computation also requires the clean speech audio and the noisy
speech audio file.
The synthesized noisy speech is converted into a spectrogram along with its paired
clean speech audio. These spectrograms are then given as input training data to
the proposed model. The input dataset is split into 80% training data and 20% testing
data for training the model. The Adam optimizer and the mean squared error
loss function are used in the proposed model. The model stops training once the
validation loss starts to increase; this is necessary in order to avoid overfitting of
the model. Once the model is trained, an input noisy speech audio file, in the form
of its spectrogram matrix, is given to the model, and the predicted output values
are obtained. The predicted output values are then subtracted from the noisy speech
audio spectrogram in order to obtain the clean speech audio spectrogram, which is
then converted to an audio file.
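The final subtraction step described above can be sketched with toy spectrogram values (illustrative magnitudes only; the zero floor is an assumption to keep magnitudes non-negative):

```python
import numpy as np

# Toy magnitude spectrograms (frequency bins x time frames).
noisy_spec = np.array([[0.9, 0.4], [0.7, 0.2]])
predicted_noise = np.array([[0.5, 0.3], [0.1, 0.2]])

# Subtract the model's predicted noise from the noisy-speech spectrogram,
# flooring at zero so no cell becomes a negative magnitude.
clean_spec = np.maximum(noisy_spec - predicted_noise, 0.0)
print(clean_spec)
```

The resulting clean spectrogram is then inverted back to a waveform to produce the denoised audio file.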
The base UNet model [12] has been implemented on the same dataset for comparison purposes. Figure 9 shows the training graph for the UNet model. The UNet model [12] has been chosen because its contracting path and symmetric expanding path help achieve precise localization while requiring less training data to construct the model. The UNet model performs best on image segmentation problems. The audio files are converted into spectrograms [13], which are later converted into NumPy arrays at training time. This process is similar to the way image files are handled, where images are converted into NumPy arrays when fed as training data to the model.
As seen in Fig. 4 above, there is a slight increase in validation loss after the third epoch, which shows that the model has stopped learning. On increasing the number of epochs above 4, the model starts overfitting, as the validation loss starts increasing.
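The stop-when-validation-loss-rises rule used here is the classic early-stopping heuristic; a framework-agnostic sketch, with the step callback standing in for one epoch of training:

```python
def train_with_early_stopping(step, max_epochs=20, patience=1):
    """Stop training when the validation loss stops improving.
    `step(epoch)` is assumed to run one epoch and return (train_loss, val_loss)."""
    best_val, best_epoch = float("inf"), 0
    for epoch in range(1, max_epochs + 1):
        train_loss, val_loss = step(epoch)
        if val_loss < best_val:
            best_val, best_epoch = val_loss, epoch
        elif epoch - best_epoch >= patience:
            break  # validation loss has stopped improving
    return best_epoch, best_val

# demo: validation loss falls for 3 epochs, then rises (mimicking the training graph)
losses = [(1.0, 0.9), (0.8, 0.7), (0.6, 0.65), (0.5, 0.75), (0.4, 0.9)]
best_epoch, best_val = train_with_early_stopping(lambda e: losses[e - 1], max_epochs=5)
```

In a deep learning framework this is normally delegated to a built-in early-stopping callback monitoring validation loss.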
The UNet model implementation was done using standard hyperparameters.
For this implementation, a separate external noisy speech audio file was generated and used for obtaining the evaluation metrics of the model. Figures 5 and 6 show the graphical representation of the input audio file.
Results. The following are the test results obtained from testing the UNet model on the above noisy voice file. Figures 7 and 8 show the graphical representation of the output audio file. The output spectrogram in Fig. 8 shows fewer and cleaner spikes of red with a deeper blue background compared to the input file's spectrogram in Fig. 6, which visually indicates the absence of noise in the output spectrogram.
The evaluation metrics obtained for the implementation of the UNet model are:
1. STOI score is 0.738.
2. PESQ score is 1.472.
Initially, the proposed model was trained on default hyperparameter values. The number of epochs was set to 4 because, if the number of epochs is increased, the proposed model starts overfitting and the validation loss starts increasing, denoting that the model has stopped learning. LeakyReLU was utilized as the activation function based on the results reported in [14] for the ESC-50 dataset [9]. The mean squared error loss function is utilized, since this is a prediction problem rather than a classification problem, along with the Adam [15] optimizer. The hyperparameter values are:
1. number of epochs = 4
2. activation function = LeakyReLU
3. optimizer = Adam
4. loss = mean squared error
5. sampling rate = 8000.
Results. For the above hyperparameter values, the model was trained, and the following evaluation metric results were obtained:
1. STOI score is 0.727.
2. PESQ score is 1.681.
The obtained evaluation metrics do not show a drastic change compared to those obtained from the UNet model implemented for comparison. The dense layers in the proposed architecture did not change the evaluation metrics except for a slight increase in the PESQ score. To boost the performance of the proposed model, the hyperparameter values must be tuned.
Audio Denoising Using Deep Neural Networks 43
Hyperparameter tuning. In his paper [6] "Denoising convolutional autoencoders for noisy speech recognition," Mike Kayser showed that the tanh activation function yields better results when a deep learning model is trained on audio data; hence, the activation function of the proposed architecture was changed. The visual representation of the input noisy speech audio file is shown in Figs. 9 and 10.
To obtain improved results from the proposed model, the sampling rate of the audio file was increased from 8000 to 16,000 Hz during training, as the sampling rate defines the number of samples per second taken from a continuous signal to make it discrete or digital. Thus, increasing the sampling rate increases the number of samples utilized for training the proposed model, which helps it learn the features of the audio file. The proposed model was trained for 3 epochs with the following hyperparameter values:
1. number of epochs = 3
2. activation function = LeakyReLU, tanh
3. optimizer = Adam
4. loss = mean squared error
5. sampling rate = 16,000.
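Doubling the sampling rate as described (8 kHz to 16 kHz) can be sketched with SciPy's polyphase resampler; in practice a loader such as librosa can also resample on read. A minimal sketch:

```python
import numpy as np
from scipy.signal import resample_poly

fs_in, fs_out = 8000, 16000
t = np.arange(fs_in) / fs_in                 # 1 second of audio at 8 kHz
audio_8k = np.sin(2 * np.pi * 440 * t)
audio_16k = resample_poly(audio_8k, fs_out, fs_in)  # upsample by a factor of 2
```

Note that upsampling an 8 kHz recording does not add new spectral content; the benefit described here comes from sampling the source audio at 16 kHz during training.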
For the above hyperparameter values, Fig. 11 shows the learning curve diagram.
From the training graph in Fig. 11, it is observed that the validation loss increases slightly after the second epoch, while the training loss continues to decrease at a faster rate. This shows that the model has slowly started to overfit on the given input training data.
If the model is trained beyond 3 epochs, it starts overfitting; this can be seen visually when the training loss keeps decreasing but the validation loss saturates. The following results were obtained by testing the model on a random noisy voice file. Figures 12 and 13 show the graphical representation of the output audio file.
Results. On visual comparison between the input spectrogram and the output spectrogram, the brightness of red is slightly reduced in areas where noise is present, but the noise is not completely removed. From this, it is inferred that the denoised audio file still contains noise, but at a lower magnitude than the original noisy file. The evaluation metrics obtained for this experiment are:
1. STOI score is 0.756.
2. PESQ score is 1.905.
From the obtained evaluation metrics, it is observed that increasing the sampling rate of the audio file at training time and changing the activation function to tanh have increased the performance of the proposed model. Comparing the spectrograms in Figs. 13 and 10, there is a drastic visual difference in the shade of red, showing that the magnitude of the noise has been reduced still further.
Table 1 shows a condensed form of all the results obtained; from it, we can clearly see that changing the hyperparameter values yields improved results. The table also compares the existing WaveNet and A1 + W1 models from [5]. It is observed that the proposed model has better evaluation metrics than the existing models and the variations of the UNet model.
6 Inference
From Table 1, we can infer that the proposed deep neural network model works better than the standard UNet architecture due to the dense layers present in the proposed model. Dense layers are normally utilized to identify unlabeled or untagged features, whereas a standard convolutional layer accurately learns marked or highlighted features. Moreover, the tanh activation function increases the performance of the proposed model. Further, the performance of the model can be increased still more by raising the sampling rate of the audio file. An increased sampling rate means more samples are obtained for each second of audio; thus, more detailed features are available for the proposed model to learn, hence the increased performance.
7 Conclusion
In this project, a deep neural network model has been proposed and experimented with, which enhances speech and removes multiple kinds of noise present in any given audio file. The proposed model shows significant improvement in terms of PESQ and STOI when audio spectrograms of clean speech audio files and synthesized noisy speech audio files are used as training data. The experimental results, a STOI value of 0.756 and a PESQ score of 1.905, show how the presence of dense layers with the tanh activation function and the increased sampling rate (from 8000 to 16,000 Hz) during training can significantly improve the results of the proposed model.
References
1. Zezario RE, Hussain T, Lu X, Wang H-M, Tsao Y (2020) Self-supervised denoising autoen-
coder with linear regression decoder for speech enhancement. In: ICASSP 2020—2020 IEEE
international conference on acoustics, speech and signal processing (ICASSP), pp 6669–6673
https://doi.org/10.1109/ICASSP40776.2020.9053925
2. Saleem N, Khattak MI (2019) Deep neural networks for speech enhancement in complex-noisy environments. Int J Interact Multimed Artif Intell, in press, p 1. https://doi.org/10.9781/ijimai.2019.06.001
3. Alamdari N, Azarang A, Kehtarnavaz N (2020) Improving deep speech denoising by noisy2noisy signal mapping. Appl Acoust. https://doi.org/10.1016/j.apacoust.2020.107631
4. Vuong T, Xia Y, Stern RM (2021) A modulation-domain loss for neural-network-based real-
time speech enhancement. In: ICASSP 2021—2021 IEEE international conference on acous-
tics, speech and signal processing (ICASSP), pp 6643–6647. https://doi.org/10.1109/ICASSP
39728.2021.9414965
5. Saddler M, Francl A, Feather J, Kaizhi A, Zhang Y, McDermott J (2020) Deep network perceptual losses for speech denoising
6. Kayser M, Zhong V (2015) Denoising convolutional autoencoders for noisy speech recognition.
CS231 Stanford Reports, 2015—cs231n.stanford.edu
7. Luo Y, Mesgarani N (2019) Conv-TasNet: surpassing ideal time–frequency magnitude masking for speech separation. IEEE/ACM Trans Audio Speech Lang Process 27(8):1256–1266
8. Panayotov V, Chen G, Povey D, Khudanpur S (2015) Librispeech: an ASR corpus based on
public domain audio books. In: 2015 IEEE international conference on acoustics, speech and
signal processing (ICASSP), pp 5206–5210. https://doi.org/10.1109/ICASSP.2015.7178964
9. Piczak KJ (2015) ESC: dataset for environmental sound classification. https://doi.org/10.7910/
DVN/YDEPUT, Harvard Dataverse, V2
10. Rix A (2003) Comparison between subjective listening quality and P.862 PESQ score
11. Taal CH, Hendriks RC, Heusdens R, Jensen J (2010) A short-time objective intelligibility
measure for time-frequency weighted noisy speech. In: ICASSP, IEEE international conference
on acoustics, speech and signal processing—proceedings, pp 4214–4217. https://doi.org/10.
1109/ICASSP.2010.5495701
12. Ronneberger O, Fischer P, Brox T (2015) U-Net: convolutional networks for biomedical image
segmentation. LNCS 9351:234–241. https://doi.org/10.1007/978-3-319-24574-4_28
13. French M, Handy R (2007) Spectrograms: turning signals into pictures. J Eng Technol 24:32–35
14. Zhang X, Zou Y, Shi W (2017) Dilated convolution neural network with LeakyReLU for
environmental sound classification, pp 1–5. https://doi.org/10.1109/ICDSP.2017.8096153
15. Kherdekar S (2021) Speech recognition of mathematical words using deep learning. In: Recent
trends in image processing and pattern recognition. Springer Singapore, pp 356–362
16. Pandey A, Wang DL (2019) A new framework for cnn-based speech enhancement in the time
domain. IEEE/ACM Trans Audio Speech Lang Process 27(7):1179–1188
17. Zhao Y, Xu B, Giri R, Zhang T (2018) Perceptually guided speech enhancement using deep
neural networks. In: 2018 IEEE international conference on acoustics, speech and signal
processing (ICASSP), IEEE, Calgary, AB, pp 5074–5078
18. Martin-Donas JM, Gomez AM, Gonzalez JA, Peinado AM (2018) A deep learning loss function
based on the perceptual evaluation of the speech quality. IEEE Signal Process Lett 25(11):1680–
1684
19. Mohanapriya SP, Sumesh EP, Karthika R (2014) Environmental sound recognition using
Gaussian mixture model and neural network classifier. In: International conference on green
computing communication and electrical engineering (ICGCCEE)
20. Kathirvel P, Manikandan MS, Senthilkumar S, Soman KP (2011) Noise robust zero-crossing rate computation for audio signal classification. In: TISC 2011—proceedings of the 3rd international conference on trendz in information sciences and computing, Chennai, pp 65–69
21. Manoj C, Magesh S, Sankaran AS, Manikandan MS (2011) Novel approach for detecting
applause in continuous meeting speech. In: ICECT 2011—2011 3rd international conference
on electronics computer technology, Kanyakumari, vol 3, pp 182–186
22. Bhaskar J, Sruthi K, Nedungadi P (2015) Hybrid approach for emotion classification of audio
conversation based on text and speech mining. In: Proceedings of the international conference
on information and communication technologies (ICICT), Procedia Computer Science
23. Raj JS (2020) Improved response time and energy management for mobile cloud computing
using computational offloading. J ISMAC 2(1):38–49
24. Suma V, Wang H (2020) Optimal key handover management for enhancing security in mobile
network. J Trends Comput Sci Smart Technol (TCSST) 2(4):181–187
Concept and Development of Triple
Encryption Lock System
Abstract The main aim of the triple encryption lock system is to strengthen security and to eliminate threats; it allows higher authorities to authorise the concerned person to access restricted areas. The issue of accessing highly authorised areas is paramount everywhere. This system is suitable for server rooms, examination cells, home security and highly secured places. It is designed in such a way that the door has three encryptions: DTMF, password security and fingerprint sensing. We have designed it so that the circuit is initially in an OFF condition. The user sends a signal via the audio jack frequency, the relay is triggered, and the system moves to the other two encryptions, keypad and fingerprint sensing. A key feature of our encryption system is that the microcontroller is turned on only when a signal is sent from the user, so that 24-hour heating issues are resolved. The real benefit is that it provides significant changes in accessing highly authorised areas and can bring a great change to the security system.
1 Introduction
Security being the main intent of the project, the most important application is to provide security in homes, examination cells, manufacturing units, etc. Shruti Jalapur and Afsha Maniyar published an article on "Door lock system using cryptographic algorithms based on IoT", in which secured locking is achieved with AES-128 (Advanced Encryption Standard) and SHA-512 (Secure Hashing Algorithm). Hardware such as an Arduino, a servo motor, a Wi-Fi module and a keypad has been used to build the proposed locking system [1]. Neelam Majgaonkar et al. proposed a door lock system based on Bluetooth technology, but
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 49
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_4
50 A. F. Ahamed et al.
the communication range of a Bluetooth module is very low in comparison with Wi-Fi or GSM communication [2]. Harshada B. More et al. made a deep technical survey on face detection, face recognition and door locking systems; the Eigenfaces algorithm is one of the algorithms mainly used for face recognition, and the recognised face is compared with the prefetched face in order to lock and unlock the door [3]. Varasiddhi Jayasuryaa Govindraj et al. proposed a smart door using biometric NFC-band and OTP-based methods, in which Arduino, biometric, OTP, NFC, RFID and GSM modules are used; the NFC band serves as the method for registered members, and OTP technology is used for guest users [4]. We have analysed and studied the locks of various manufacturing companies such as Godrej and Yale. On the basis of this overall study, the microcontrollers in all these systems remain switched on 24 h a day, which results in heating issues and reduces the lifetime of the system. We therefore suggest an efficient development over the current locking system, with high security and without heating issues. As the name defines, the triple encryption lock system primarily has three encryptions, i.e., a DTMF module, password security and a fingerprint sensor. As a secure and safe lock system, it consists of an electronic control assembly which ensures that safety is only in the hands of the authorities. Two requirements in authenticated places are to provide security and easy access for unlocking the door, accessible only by the specific person with user control. A dual-tone multiple frequency module, password security and a fingerprint sensor are attached to the door to achieve the proposed system. This paper gives a vivid idea of the mechanism of each encryption, the flow chart of the system and its working.
2 Objective
The main objective of this paper is to present a study of the three encryptions (dual-tone multiple frequency module, password security and fingerprint sensor) from a generalised perspective. Authorisation is the process of verifying the credentials of a person and granting permission to access. In such a case, our system is able to grant authorisation to a high degree. This system provides security not only for homes but also for other authenticated places. The paper supplies information about the techniques and working of each encryption.
3 Methodology
The work is structured by identifying the importance of and demand for security in door locking and unlocking. The system involves electrical work to achieve our idea. The design methodology of the system consists of various steps, and a single operator can use the system in minutes. First, the user's security problem is analysed so as to plan the desired system. Problems in the existing system are analysed; then the essential method is
Concept and Development of Triple … 51
Fig. 1 Workflow of
designing a triple encryption
lock system
the electrical part, from which the workflow and functional block diagram are obtained. To integrate the system into any existing structure, the microcontroller and motor are selected accordingly. The code is then tested. The final prototype is developed for effective access to highly authenticated places. This enables the user to enter highly authenticated areas (Figs. 1 and 2).
See Table 1.
This project uses DTMF technology for the opening and closing of doors. The positive terminal of the LED is connected to the output pin of the decoder, and the negative terminal of the LED is connected to the ground of the decoder. Similarly, the mobile is connected to the DTMF decoder by an auxiliary cable. Every numeric button on the keypad of the mobile phone generates a unique frequency when pressed. The user presses the keys, and the signal is sent via the audio jack of the mobile. The DTMF decoder decodes the audio signal: when the signal comes from the mobile, the corresponding frequency value selects its function and performs it. The positive and negative terminals of the 9 V battery are connected to Vcc and ground, respectively. When the number "1" is pressed by the authorised person from a faraway locked spot, the user mobile receives the frequency and the microcontroller is turned on (Fig. 3).
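As an illustration of how each key maps to a unique frequency pair, the decoding that a DTMF chip performs can be modelled in software with the Goertzel algorithm (a sketch of the decoder's logic, not of the hardware circuit described above):

```python
import numpy as np

# standard DTMF (row, column) frequency pairs for each key
DTMF = {'1': (697, 1209), '2': (697, 1336), '3': (697, 1477),
        '4': (770, 1209), '5': (770, 1336), '6': (770, 1477),
        '7': (852, 1209), '8': (852, 1336), '9': (852, 1477),
        '*': (941, 1209), '0': (941, 1336), '#': (941, 1477)}

def goertzel_power(x, freq, fs):
    """Signal power at `freq` via the Goertzel recurrence."""
    k = 2 * np.cos(2 * np.pi * freq / fs)
    s1 = s2 = 0.0
    for sample in x:
        s0 = sample + k * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - k * s1 * s2

def detect_key(tone, fs=8000):
    """Return the DTMF key whose two frequencies carry the most power."""
    freqs = sorted({f for pair in DTMF.values() for f in pair})
    power = {f: goertzel_power(tone, f, fs) for f in freqs}
    return max(DTMF, key=lambda key: power[DTMF[key][0]] + power[DTMF[key][1]])

# demo: synthesize the tone for key "1" (697 Hz + 1209 Hz) and decode it
fs = 8000
t = np.arange(int(0.05 * fs)) / fs
tone = np.sin(2 * np.pi * 697 * t) + np.sin(2 * np.pi * 1209 * t)
```

Pressing "1" on the dial pad produces exactly such a two-tone signal, which is what triggers the relay in the proposed system.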
The keypad consists of a set of push buttons. The encryption is made in such a way that the entered PIN is compared with the preprogrammed PIN. The keypad lock works with a 3-digit code, which is 888. Once the correct combination is entered on the keypad, the door is unlocked; the door remains closed upon entering a wrong PIN (Fig. 4).
The Vcc of the fingerprint sensor is connected to the 5 V pin of the Arduino, and the ground of the fingerprint sensor is connected to the ground of the Arduino. Similarly, the Rx of the fingerprint sensor is connected to pin 2 of the Arduino, and the Tx of the fingerprint sensor is connected to pin 3 of the Arduino. The door is unlocked when the user scans the right fingerprint, which is recorded in the system (Fig. 5).
This project is proposed to provide access to the system from far away from the locked spot. Sometimes, in examination cells and homes, it is necessary to provide access even if the authorised person is not present. The solution proposed is the triple encryption lock system. The proposed system is designed in such a way that the door has three encryptions: DTMF (dual-tone multi frequency), password security and fingerprint sensing. In this project, the circuit is initially in the OFF condition. The user sends a signal to the audio jack of the mobile to turn on the microcontroller. The relay gives true or false information; once triggered, the microcontroller, keypad and fingerprint sensor are turned on. If the password security result is true, the system moves on to fingerprint sensing. If both inputs result in true, the motor runs forward and the door lock opens. The relay gives the required supply via the DTMF to turn on the microcontroller. Once the task is completed, Relay 2 is triggered to lock the door. A key feature of our encryption system is that the microcontroller is turned on only when a signal is sent from the user, so that 24-hour heating issues are resolved. This can bring a great change to the security system. In certain cases, it is difficult for higher authorities to give authorisation to the concerned person to access restricted areas. The project is designed in such a way that access is given at that instant, and the triple encryption lock system can be made affordable to the people around. In today's fast-growing world, the proposed system has high security and gives convenient access with three encryptions (Fig. 6).
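The sequential gating just described can be summarised in a short sketch (the PIN 888 comes from the text; the DTMF key and fingerprint template values are hypothetical placeholders):

```python
STORED_PIN = "888"                    # preprogrammed keypad code (from the text)
ENROLLED_FINGERPRINT = "template-01"  # hypothetical stored fingerprint template

def triple_lock(dtmf_key, entered_pin, scanned_fingerprint):
    """Door unlocks only if all three encryptions pass, in order."""
    if dtmf_key != "1":               # stage 1: DTMF signal powers the system
        return False
    if entered_pin != STORED_PIN:     # stage 2: keypad PIN verification
        return False
    # stage 3: fingerprint match drives the motor forward
    return scanned_fingerprint == ENROLLED_FINGERPRINT
```

A wrong input at any stage stops the process, mirroring the flow in Fig. 6.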
Fig. 5 Fingerprint
encryption
The user has to successfully cross the three verification points described below.
The authorised person, who is far away from the locked spot, has to dial the mobile that is connected to the locking system and enter the number "1" from his/her dial pad, so that the DTMF decodes the audio frequency and powers the microcontroller, keypad and fingerprint sensor. Power to the microcontroller and other equipment is turned on only if the relay is closed.
The user then has to enter a number on the keypad; the entered PIN is verified against the preprogrammed PIN. If it is verified as true, the process moves on to the next, fingerprint, encryption. If either input is false, the process is stopped.
The third encryption is the fingerprint sensor. The user has to place a finger on the fingerprint sensor, and the captured fingerprint is verified against the recorded fingerprint. If the result is true, the motor runs forward and the door is unlocked (Fig. 7).
7 Conclusion
With the developments enumerated, we have developed expertise in the design, development, performance and modelling of lock system applications. This will be pivotal in ensuring access to the concerned areas. The designed system not only provides easy access to the user, it also resolves 24-hour heating issues. Thus, access can be given to the concerned person in highly authorised areas. In the market, a comparable unique lock costs close to ₹9,000–₹14,000, whereas our triple encryption lock system is modest in cost, with high security, and is suited for examination cells, server rooms, home security and highly authenticated areas. Therefore, when a solution is needed for easy access to highly concerned areas with high security, our triple encryption lock system will be the answer.
References
1. Jalapur S, Maniyar A (2020) Door lock system using cryptographic algorithms based on IOT.
Int Res J Eng Technol 7(7)
2. Majgaonkar N, Hodekar R, Bandagale P (2016) Automatic door locking system. IJEDR 4(1)
3. More HB, Bodkhe AR (2017) Survey paper on door level security using face recognition. Int
J Adv Res Comput Commun Eng 6(3)
4. Govindraj VJ, Yashwanth PV, Bhat SV, Ramesh TK (2019) Smart door using biometric NFC
band and OTP based methods. JETIR 6(6)
5. Nehete PR, Chaudhari J, Pachpande S, Rane K (2016) Literature survey on door lock security
systems. Int J Comput Appl 153:13–18
6. Delaney R (2019) The best smart locks for 2019. In: PCMag
7. Automatic door lock system using pin on android phone (2018)
8. Verma GK, Tripathi P (2010) A digital security system with door lock system using RFID technology. Int J Comput Appl (IJCA) (0975-8887) 5:6–8
9. Hassan H, Bakar RA, Mokhtar ATF (2012) Face recognition based on auto-switching
magnetic door lock system using microcontroller. In: 2012 International conference on system
engineering and technology (ICSET), pp 1–6
10. Jagdale R, Koli S, Kadam S, Gurav S (2016) Review on intelligent locker system based on
cryptography wireless & embedded technology. Int J Tech Res Appl pp 75–77
11. Johnson J, Dow C (2017) Intelligent door lock system with encryption. Google Patents
Partially Supervised Image Captioning
Model for Urban Road Views
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 59
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_5
60 K. Srihari and O. K. Sikha
The state-of-the-art image captioning algorithms [1] have separate preprocessing techniques for the image and the output text, and use different types of sequence modeling to map the input image to the output text. Supervised image captioning has a stronger impact but has a few important limitations and issues. Applications of supervised image captioning such as super-captioning [2] use two-dimensional word embeddings while mapping the image to its corresponding natural language description. Partially supervised image captioning [3] has applied its approach to existing neural captioning models using the COCO dataset; this approach uses the weakly annotated data that is available in object detection datasets. The primary objective of this paper is to derive a set of natural language captions from the partially available supervised data retrieved from the instance segmentation network trained on the Cityscapes dataset. Object detection and semantic segmentation results were involved in creating the partially supervised data. The major contributions of this work include:
• Training an end-to-end MRCNN with U-NET model for instance segmentation to obtain semantic information.
• Object classification and localization based on images of urban street views.
• Developing an inference-level captioning module without sequence modeling for generating meaningful captions based on the information produced by the instance segmentation layers.
The rest of the paper is organized as follows. Related works are explained in Sect. 2; Sect. 3 describes the dataset attributes, image details and the proposed model in brief; the captioning results obtained from the proposed model are detailed in Sect. 4. Finally, the paper concludes with Sect. 5.
2 Related Works
Object detection and semantic segmentation have become key research problems in computer vision, since most high-end vision-based tasks, such as indoor navigation [4], autonomous driving [5], facial part object detection [6] and human–computer interaction, require accurate and efficient segmentation. With the advent of deep learning models in the past few decades, the semantic segmentation problem has also witnessed great progress, especially using deep convolutional neural networks. Mask RCNN (MRCNN) [7] is an important breakthrough in the instance segmentation domain. Mask R-CNN uses Feature Pyramid Networks [8] and Region Proposal Networks, as in FASTER-RCNN [9], for object detection, and uses fully convolutional layers for semantic segmentation. Ronneberger et al. developed the U-NET [10] model, an inspiring semantic segmentation algorithm used mainly in medical AI applications. Brabander et al. [11] proposed a semantic instance segmentation model based on a discriminative loss function for autonomous driving applications. The recurrent neural network for semantic instance segmentation proposed by Salvador et al. [12] uses CNN, RNN and LSTM for semantic segmentation, object
Partially Supervised Image Captioning Model for Urban Road Views 61
This section describes the proposed partially supervised image captioning model in detail. An improved Mask-RCNN model [21] with the UNET architecture, proposed in our previous work, is used for generating instance segmentation labels. The bounding box and pixel-wise semantic information obtained from the hybrid MRCNN–U-NET model is used as the initial input to the image captioning model.
The annotations, masked outputs, localization results and object-level labels obtained from the instance segmentation model are used for generating meaningful captions. Figure 1 shows the MRCNN-UNET [21] hybrid model architecture for instance segmentation, and Fig. 2 shows the proposed image captioning model. The instance segmentation labels obtained from the MRCNN-UNET hybrid model are shown in Fig. 4, which has pixel-level annotations and corresponding confidence scores.
The region proposal network (RPN) is used to generate proposals for object detection in Faster-RCNN. The RPN does this by learning from feature maps obtained from a base network (VGG16, ResNet, etc.); it tells the R-CNN where to look. The input given to the RPN is the convolutional feature map obtained from a backbone network. The primary function of the RPN is to generate anchor boxes based on scale and aspect ratio: 5 varying scales and 3 different aspect ratios are initialized, creating 15 anchor boxes around each proposal in the feature map. The next immediate task of the RPN is to classify each box as foreground or background based on the IOU values of each anchor box compared with the ground truth; the metrics used at this level are the rpn_cls_score and rpn_bbox_pred values. In anchor target generation, we calculate the IOU of ground-truth boxes with anchor boxes to check whether each is foreground or background, and then the differences in the coordinates are calculated as targets to be learned by the regressor. These targets are then used as input for the cross-entropy loss and smooth L1 loss. The final proposals are propagated forward through the ROI pooling layer and fully connected layers.
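The 5 × 3 anchor generation and the IOU test against ground truth can be sketched as follows (the centre point and pixel scales are illustrative):

```python
import numpy as np

def generate_anchors(cx, cy, scales, ratios):
    """Anchor boxes (x1, y1, x2, y2) centred at (cx, cy) for every
    scale / aspect-ratio combination, as used by an RPN."""
    boxes = []
    for s in scales:
        for r in ratios:
            w, h = s * np.sqrt(r), s / np.sqrt(r)  # preserve area s*s at ratio r
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return np.array(boxes)

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    return inter / (area(a) + area(b) - inter)

anchors = generate_anchors(64, 64, scales=[32, 64, 128, 256, 512],
                           ratios=[0.5, 1.0, 2.0])   # 5 x 3 = 15 anchors
```

Each anchor whose IOU with a ground-truth box exceeds a threshold is labelled foreground; the coordinate offsets to that box become the regression targets.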
The feature pyramid network (FPN) is a feature extractor that generates multiple feature map layers (multi-scale feature maps) with better-quality information than a regular feature pyramid for object detection. As more high-level structures are detected, the semantic value of each layer increases. The FPN provides a top-down pathway to construct higher-resolution layers from a semantically rich layer. The FPN extracts feature maps and later feeds them into a detector, such as an RPN, for object detection. The RPN applies a sliding window over the feature maps to make predictions on the objectness (whether there is an object or not) and the object boundary box at each location.
U-NET is a fully convolutional neural network mainly used for training end-to-end image processing algorithms, where the input images can be of any domain and the corresponding output images are masked images of the primary objects present in the input image. The input and output images are of the same size. The U-NET model is essentially a convolutional autoencoder that maps input images to masked output images. One important modification in U-Net is the large number of feature channels in the upsampling part, which allow the network to propagate context information to higher resolution layers. As a consequence, the expansive path is more or less symmetric to the contracting part, yielding a u-shaped architecture. The network uses only the valid part of each convolution, without any fully connected layers. The output images are binary images in which only the primary objects appear as masked objects. The U-NET model consists of 10 layers, with the first 5 layers in the contracting phase and the last 5 layers in the expansive phase. The loss function used is binary cross-entropy and the optimizer used is Adam. The metrics used for validating the FCN model are the cross-entropy loss value and Mean IOU with 2 classes: one class for the background and one for foreground objects. There were 1,940,817 trainable parameters. Figure 3 shows a sample image and its corresponding output mask.
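The Mean IOU validation metric over the two classes can be computed as in the following sketch (a plain NumPy stand-in, not the authors' implementation):

```python
import numpy as np

def mean_iou_binary(pred, target):
    """Mean IOU over the two classes (background = 0, foreground = 1)
    for integer mask arrays of the same shape."""
    ious = []
    for cls in (0, 1):
        p = (pred == cls)
        t = (target == cls)
        union = np.logical_or(p, t).sum()
        if union == 0:          # class absent in both masks: skip it
            continue
        inter = np.logical_and(p, t).sum()
        ious.append(inter / union)
    return float(np.mean(ious))
```

A perfect prediction yields 1.0; partial overlap is averaged across both classes so that large backgrounds do not dominate the score.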
Table 1 Quantitative evaluation metrics analysis of MRCNN-UNET instance segmentation model

Metric name                        Value
Box-classifier loss                0.3874
Box-localization loss              0.3232
Box-mask loss                      2.5810
RPN localization loss              1.7640
RPN classification loss            1.8240
Total loss                         7.0620
Mean Average Precision (mAP)       0.0341
mAP at 0.5 IOU                     0.0401
mAP at 0.75 IOU                    0.0001
mAP (small)                        0.0029
mAP (medium)                       0.0017
mAP (large)                        0.0094
Average Recall (AR)                0.0017
AR (small)                         0.0012
AR (medium)                        0.0362
AR (large)                         0.0197
were fit into meaningful NLP descriptions that include semantic information such as object color, location and distance between objects. The skeleton structure of the output captions is fixed for all images, but the distance, color and region values of the objects differ in every image according to the three captioning modules. The captioning modules comprise size and distance estimation based on a reference-object method, color detection using k-means clustering and HSI color calculations, and image-region-wise captioning. This combination of instance segmentation labels and inference captioning modules is a novel approach to image captioning. The approach does not involve any sequence modeling for generating captions, making the inference part computationally simple and effective. The first level of the caption lists the important objects present in the image based on class label information. The second level is based on the estimated distance between vehicles, or between a vehicle and a traffic signal, with respect to a real-world reference object. Colors of the contour-detected objects are found by the color detection captioning module. Finally, each object's location in the image is found using the image-region-wise captioning module.
66 K. Srihari and O. K. Sikha
Instance segmentation is used because it provides the best scene understanding in real-time applications. The issues with existing traditional image captioning systems are a limited ability to generate captions and the generation of identical sentences for similar images. Sequence modeling, which is computationally expensive, is not used in the proposed model. Localization and segmentation results are used in the proposed approach, which is lacking in most state-of-the-art image captioning approaches. Therefore, the proposed model overcomes these research issues carried over by the usual image captioning algorithms and provides good results by using the inference-level captioning modules.
The Cityscapes Dataset is used for evaluating the proposed model. The dataset
contains images from 50 cities during several months (spring, summer and fall)
and the images are augmented with fog and rain, making it diverse. It has manually selected frames which include a large number of dynamic objects, varied scene layouts and varying backgrounds. It covers various vehicles such as car, truck, bus, motorcycle, bicycle and caravan, and also distinguishes a person walking on a sidewalk from a rider riding on the road. Table 2 shows the set of classes present in the dataset, and Fig. 6 shows a few sample images.
The cityscapes dataset contains 25,000 images across 30 classes of objects
covering 50 different cities. From each class, around 500–600 instances across different images are taken into consideration for the training process. The TensorFlow object detection and instance segmentation Mask-RCNN approach is used. The training and validation records used for modeling are created for the 30 classes by parsing the annotations into a single JSON file in which all the image and object details are present as the ground truth. Horizontal and vertical flips are the main image augmentation and preprocessing techniques used; contrast, saturation, brightness and hue adjustments are also applied in the augmentation process. Around 500 first-stage max proposals are given to the ROI stage for best detection during training, and dropout, weight decay and L2 regularization techniques are also used. Pipeline configuration files are used to fine-tune the hyperparameters of the MRCNN model. The training is connected to TensorBoard, where real-time graphs of all metrics can be seen and analyzed. After training and validation are completed, the saved hybrid MRCNN-UNET model is generated using TensorFlow inference graph session-based mechanisms.
The generated captions from the proposed model have information regarding the distance between vehicle instances, or the distance between a vehicle and a traffic signal. A reference-object-based algorithm is used for calculating the distance, which requires a reference object whose original size is known. The pixel-wise vehicle mask obtained in the segmentation result is mapped to the original size to get a relative pixel-wise size. The reference ratio is then calculated by dividing the original size by the pixel-wise size. Whether the corresponding length or the corresponding width is taken, the reference ratio will be approximately the same. Our objective is to find the original distance of any object, or the original distance between any two vehicles. The pixel-wise distance between two vehicles is calculated using the Euclidean formula and is then multiplied by the reference ratio to obtain the actual distance, as shown in Eqs. 1 and 2. Table 3 shows a sample vehicle mask obtained from the instance segmentation module and the corresponding distance calculated with the reference object algorithm.
(Table 3 sample distances: 11 meters and 2.45 meters.)

Reference_Ratio = Original_Size / Pixel_Wise_Size (1)

Original_Distance_Between_Cars = Reference_Ratio * Calculated_Distance_Between_Cars (2)
The basic information needed for calculating the distance between two cars is the region-of-interest (ROI) box detections of the two cars. Consider that the reference object in the image is another car whose original height and width are known. The reference object is also detected by the model, and its corresponding ROI box coordinates are known. Thus, we have both the original height of the reference object and its machine-result height calculated from the box coordinates. The reference ratio is calculated by dividing the original height by the machine-result height of the reference object. This reference ratio is the same for all objects present in the image, whereas it changes from image to image, as each image has different orientation and zoom attributes. The objective is to find the original distance between the cars. Since the ROIs of the two cars are known, the center coordinates of both cars can also be calculated. Using Euclidean distance, the machine result of the pixel-wise distance between the two center coordinates is calculated. When this distance value is multiplied by the reference ratio, the original distance between the cars is obtained. Using this algorithm, the distance between any two objects can be calculated, provided the objects are present in the training data and are well trained.
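The steps above can be sketched as follows; the box format [x1, y1, x2, y2] and the sample sizes in the usage note are illustrative assumptions:

```python
import math

def reference_ratio(original_height_m, pixel_height):
    """Ratio mapping pixel measurements to real-world metres (Eq. 1),
    derived from a reference object of known height."""
    return original_height_m / float(pixel_height)

def box_center(box):
    """Centre of an ROI box given as [x1, y1, x2, y2]."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def distance_between(box_a, box_b, ratio):
    """Real-world distance between two detected objects (Eq. 2):
    Euclidean pixel distance of the box centres, scaled by the ratio."""
    (ax, ay), (bx, by) = box_center(box_a), box_center(box_b)
    pixel_dist = math.hypot(ax - bx, ay - by)
    return ratio * pixel_dist
```

For example, if the reference car is 1.5 m tall and its detected box is 100 px high, the ratio is 0.015 m/px, and two boxes whose centres lie 400 px apart are estimated to be 6 m apart.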
The bounding box coordinates of the objects present in the image are further processed to calculate the location. The entire image of size 1024 × 2048 is divided into 5 regions, namely top-left, top-right, bottom-left, bottom-right and center, as in Fig. 7. The center pixel coordinates of each region were calculated as tabulated in Table 4.
Euclidean distances are calculated between center coordinates of the object and all
other region’s center coordinates to find the exact location. In Fig. 7, the object of
interest, i.e., car is located in the center part of the image.
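The region-assignment step can be sketched as follows; the region-centre coordinates below are illustrative assumptions, since the actual values of Table 4 are not reproduced here:

```python
import math

# Illustrative region centres for a 1024 x 2048 (height x width) image,
# with x as the column and y as the row (stand-ins for Table 4 values).
REGION_CENTERS = {
    "top-left":     (512, 256),
    "top-right":    (1536, 256),
    "bottom-left":  (512, 768),
    "bottom-right": (1536, 768),
    "center":       (1024, 512),
}

def locate(object_center):
    """Name of the region whose centre is nearest (by Euclidean
    distance) to the object's centre coordinates."""
    ox, oy = object_center
    return min(REGION_CENTERS,
               key=lambda r: math.hypot(REGION_CENTERS[r][0] - ox,
                                        REGION_CENTERS[r][1] - oy))
```

An object whose box centre falls near the middle of the image is thus captioned as being "in the center part of the image", matching the example in Fig. 7.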
The output set of captions derived from our image captioning algorithm provides a clear understanding of the instances present in the image. Semantic details such as the distance between vehicles, the distance between a traffic signal and the vehicles, the region in which the instances are present and the color of the instances are captured in the output captions. A few example images and their sample output captions are illustrated below.
Sample 1:
Output Caption: The objects present in this image are: a building, 2 cars on the
road and 2 cars in the parking, 2 traffic sign boards, 3 poles. A tree is present in the
top right part of the image. The 2 cars are present in the top left part of the image.
The distance between black car and white car is 11 m.
Sample 2:
Output Caption: The objects present in this image are: 3 cars, 3 bicycles, 1
motorcycle, 1 traffic light, 2 persons and 1 tree. The traffic light shows green. The
distance between white car and traffic light is 6 m. The 3 cars are present in the top
right part of the image. The distance between black car and traffic light is 2 m.
Figures 8 and 9 describe the module-wise output captions in detail. The initial level of captions lists the objects present in the image. The distance between the truck and the car is calculated in module 1 based on reference-object distance calculation. The color of the truck and the car is obtained in the second module based on k-means and CIE L*a*b space values. Object locations are found by the image-region-wise captioning part as the third module.
Fig. 8 Module wise output captions—Sample 1 a Input image. b Generated instance segmentation mask
Fig. 9 Module wise output captions—Sample 2 a Input image. b Generated instance segmentation mask
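The color-detection step can be sketched with a plain k-means over pixel colors; this NumPy version is a stand-in for the module described above, not the authors' implementation:

```python
import numpy as np

def dominant_colors(pixels, k=3, iters=10, seed=0):
    """Plain k-means over an (N, 3) array of pixel colours.

    Returns the k colour centroids and the per-pixel cluster labels.
    The dominant colour of a masked object is the centroid of its
    largest cluster."""
    rng = np.random.default_rng(seed)
    # initialise centroids with k distinct pixels
    centroids = pixels[rng.choice(len(pixels), size=k, replace=False)].astype(float)
    labels = np.zeros(len(pixels), dtype=int)
    for _ in range(iters):
        # assign every pixel to its nearest centroid
        dists = np.linalg.norm(pixels[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its assigned pixels
        for c in range(k):
            if np.any(labels == c):
                centroids[c] = pixels[labels == c].mean(axis=0)
    return centroids, labels
```

In the proposed pipeline, the pixels would be taken from inside an object's segmentation mask (converted to the chosen color space) so that the dominant cluster names the object's color in the caption.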
5 Conclusion
This paper proposes an image captioning model for the Cityscapes dataset using instance segmentation labels as the input. Mask-RCNN-UNET is used as the instance segmentation algorithm, with bounding box prediction values and pixel-segmented values available as partially supervised output data. The proposed image captioning system generates semantic descriptions including the distance between vehicles via the reference-object distance method and the colors of objects present in the image using k-means clustering and LAB color space values. The generated captions are meaningful and can be applied to many real-world applications. Captions generated from urban city images carry detailed information about traffic control and pedestrian safety, which can be useful for autonomous driving. Based on the model output, alerts can be issued for pedestrians crossing the road and for vehicles that disobey traffic rules. The captions can also be given as an audio aid for the blind, or used for automated captions on YouTube for videos containing urban street views. While driving, unusual behavior on the roads can be announced as an instruction in order to avoid accidents, and in the case of road accidents, an exact set of reports can be collected instantly.
References
Abstract Managing waste water is one of the important tasks directly connected to the entire water chain, and so it is essential to manage the water utility. This is the right time to start saving water, as the population increases drastically and with it the need for water. For instance, about 85 L/day of water is wasted on average by a family. The water we save today will serve tomorrow. Although there are many technologies for reducing water wastage, this project narrows them down into an IoT application which not only makes water management at home easy but also makes it handy, helping the householder to monitor and access household water even in the absence of physical presence.
1 Introduction
Here, we present a simple IoT device which helps the user manage household water in an efficient way. This handy and easily portable device will be especially helpful to elderly people in preventing water wastage [1]. An automatic water management system does not require a person's contribution to maintenance. All automated water systems are embedded with electronic appliances and specific sensors. The automatic irrigation system senses the soil moisture, and submersible pumps are then switched on or off using relays; as a result, the system functions without the presence of the householder. The main advantage of using this irrigation system is to reduce human interference and ensure proper irrigation [2]. The use of automatic controllers in faucets and water storage tanks
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 75
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_6
76 K. Priyadharsini et al.
help save a large amount of water from being wasted. This project shows a system where domestic use of water can be controlled using IoT devices from 200 m away. Household consumption of water is mainly in gardening, water taps in the kitchen and other areas, and refilling of overhead water tanks [3]. The system is also used to know the water level in the tank and to control overflow by switching off the pump when the tank is filled.
2 Existing System
In general, in agriculture, a drip irrigation system [4] is used to maintain moisture, but broadly there is no control over measuring the moisture content already present in the soil. People water the plants based on experience: if moisture is already present in the soil, the water goes to waste, and if the soil has already dried out before watering, watering afterwards is of little use [5]. The same happens in sprinkler systems: some areas are watered more and some less. There is a lot of wastage during the use of water in houses while operating taps. Taps must be closed tightly to avoid spillage of water, or they will drain the water from the tank. If a person opens a tap to fill a bucket of water, there is always a chance that the bucket will overflow before the tap is closed. Water tanks are not monitored, mainly because of their location. One only knows that there is no water when the taps run dry, and only then is the motor turned ON to fill the water tank [6]. While the tank is being filled, the motor should be switched off before it overflows, but if we forget, the water will flow out and go to waste down the drain. There are systems to monitor each of the above causes, but not together. If a half-closed tap drains the water from the tank while the water level indicator senses the level and turns ON the motor, the entire storage of water will be wasted.
Based on recent research, IoT-based household water management is used only for managing the tank water level through an application [7].
3 Proposed System
The proposed methodology introduces an application with IoT options which controls the water usage of the house. The proposed system overcomes the drawback of mono-purpose usage (supervising only the tank water level) by extending to multi-purpose usage: controlling the three main areas of water consumption [8]: gardening, taps in the house and the water tank. In the application, the first option is for gardening: the soil moisture is sensed, and if the moisture content is less than the required amount, we can switch on the motor for gardening directly from the application on the mobile phone. The taps in the house are sensor-based, so that water flows when hands are placed under them. If the user feels that a tap may be leaky while he is not in the house, he can close the water flow in all the taps through the application. During all of this, there should always be water in the tank; if the water level is low, the application sends out a notification to switch ON the motor and fill the tank. When the tank is full, it automatically switches OFF the motor without letting the water overflow. The retrieved data is sent to the cloud through the IoT module, and the user manages it through the application [9].
Ease and Handy Household Water Management System 77
3.1 Advantages
• Usage of smart IoT devices simplifies work and helps to plan efficiently.
• Minimum amount of water is used to satisfy the daily needs.
• Wastage of water can be controlled from anywhere through IoT.
4 Block Diagram
The working model of the system is briefly described in the given block diagram. All modules and sensors, including the moisture and temperature sensors, are directly connected to the Arduino, which serves as the controlling system [10]. These modules are connected via the cloud to the mobile phone on which the application runs (Fig. 1).
The application 'E-Water' aims at conserving water for future generations, starting from every individual home. The ecosystem should also be kept balanced by giving the plants, herbs and trees planted at home sufficient amounts of water (Fig. 2).
Initially, this application has control over the taps and the tank of the home. The following flowcharts represent the working procedure of each individual hardware setup.
Here, the tap is smartened with a servo motor that turns it on and off when the user is present or absent, respectively, by measuring a specific range of distance using an ultrasonic sensor [11] (Fig. 3).
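The presence check described above can be sketched as simple threshold logic (in Python rather than Arduino C, and with an illustrative sensing range):

```python
def faucet_state(distance_cm, min_cm=2, max_cm=30):
    """Open the tap only while the ultrasonic reading indicates a hand
    inside the sensing range; the range limits are illustrative."""
    return "OPEN" if min_cm <= distance_cm <= max_cm else "CLOSED"
```

On the actual hardware this decision would drive the servo motor position; on a reading outside the range the tap closes, preventing spillage.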
In addition, the water tank at home is smartened by mounting the setup on the lid of the tank, so that the ultrasonic sensor senses the distance of the water surface from the lid. The resulting status is updated instantly on a liquid crystal display (Fig. 4).
If the water is at the bottom, the LCD displays 'very low', which in turn makes the L293D IC drive the servo motor 'ON'; as subsequent levels of water rise, the corresponding notification is updated on the LCD. When the water level reaches the top of the lid, the IC changes the servo motor position again so as to turn the motor 'OFF' [12].
Then, to irrigate the plants, the soil moisture is tracked; if it drops below the threshold value set by the potentiometer, an LED glows, the LCD (liquid crystal display) shows 'water irrigation' and the motor is triggered to the 'ON' position [13]. Once the soil moisture is restored, the motor is turned 'OFF' (Fig. 5).
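The tank-level and irrigation decisions described above can be sketched as follows (in Python rather than Arduino C; the depth and moisture thresholds are illustrative assumptions, not values from the paper):

```python
def tank_action(distance_cm, tank_depth_cm=100):
    """Decide motor state from the ultrasonic reading taken at the lid.
    The reading is the distance from lid to water surface, so a large
    value means the tank is nearly empty. Thresholds are illustrative."""
    level = tank_depth_cm - distance_cm
    if level <= 10:
        return "very low", "MOTOR_ON"
    if level >= tank_depth_cm - 5:
        return "full", "MOTOR_OFF"
    return "filling", "NO_CHANGE"

def irrigation_action(moisture_pct, threshold_pct=40):
    """Trigger garden watering when soil moisture drops below the
    threshold (set by the potentiometer on the real hardware)."""
    return "MOTOR_ON" if moisture_pct < threshold_pct else "MOTOR_OFF"
```

The returned status string corresponds to what the LCD shows, and the motor command to what the driver IC applies to the pump.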
6 Concept Implementation
6.1 Sensors
The Arduino connects to ultrasonic sensors to detect the water level in the tank and to sense a hand (via ultrasonic waves) in order to open the faucet, and to a soil moisture sensor to measure the moisture in the garden for irrigation.
6.2 IoT Module
The interface between the sensors and the cloud is provided by the IoT module. The data collected from the hardware integration is stored in cloud memory (a NoSQL big database) through IoT [14].
6.3 Application
An if-then-else approach is used to keep it simple. The user receives the collected data in the application and can access and manage home water management through it (Fig. 6).
7 Simulation
8 Function of Application
Having the entire application as hardware cum software with three main functions, the ultimate aim is to preserve water through the individual circuit integrations. The proposed project integrates tank water level indication, an automatic on/off water tap (by sensing the hand) and smart water irrigation for home gardens [16].
When a user opens the application, the options garden, tap, tank, status and exit appear at the bottom. If the user chooses garden, the application checks the soil moisture, shows the moisture level and notifies the user to turn on water for irrigation when the moisture level is low. Likewise, the tap option manages tap leakage. By sensing the hand, water flow is activated in the hand-washing area, and as soon as the hands leave the sensing area the tap closes completely, preventing any wastage [17, 18]. Thus, the faucet automation can be implemented not only in newly constructed buildings but also in homes that have stood for many years with faulty, leaky pipes. If the user chooses the tank option, the application checks the water level in the water tank and indicates it with variant colors. When the tank is nearly filled or emptied, a buzzer sounds and the user is notified by the application. The status option shows whether the motor is turned ON or not, and choosing the exit option exits the application. This household water management system can be used even when the user is not at home, using automatic device technology [19]. The system even works on older homes' water systems and older leaking faucets (Figs. 10, 11, 12, 13, 14 and 15).
Fig. 11 Showing of moisture level
are triggered to send information to the control panel to indicate emptiness or complete filling before overflow.
Thus, the application aims to control water wastage according to the user's choice: when one of the three choices is selected, the principle behind that choice is accessed and its working commences.
9 Result
From the E-water application, we checked the following water wastage situations: a
drinking water tap can waste up to 75 L a day due to leakage. Of the total usage of
water 15% of water is wasted in leakage per day in the absence of inmates (Source
Google). We can avoid the wastage by shutting the flow from the tank remotely
also, if water level decrease is monitored in the tank after installing our product at
homes. It is estimated that 7% of the water supplied is wasted during refilling of water
tanks because of overflow (Source Google). With our application refilling of water
tanks can be monitored and the pump can be turned off before the tank overflows
by continuously monitoring the water level. This prevents the overflow spillage and
Ease and Handy Household Water Management System 87
wastage of water to Nil. Usually during gardening, twice the amount of water is
being watered to plants and leads to major water wastage in a household. This can
be controlled by checking the moisture content with the application, and watering
the plants when it is required (Fig. 16).
The graph above compares water usage before and after the implementation of the proposed system. The survey covers the water usage of a home over a period of approximately two months. The axis of abscissa is the time period, where one unit equals one week; the axis of ordinate is the amount of water used in kiloliters, where one unit equals one kiloliter. The existing-system curve shows the greater amount of water used before implementation of the proposed idea, while the proposed-system curve conveys the lower usage of water once wastage is controlled with the help of the application. The variation from week to week is due to the situations handled at home (say, occasional events, functional days or malfunction/repair of the user's mobile). Hence, the conclusion is that when the application is used in a home, water is preserved better than before (Table 1).
Table 1 Comparison between before and after implementation of the proposed system

Wastage of water in household: A flush of the toilet uses 6 L of water. On average, a person wastes about 0–45 L of water per day for flushing; to put it in perspective, that is 30% of the water requirement per person per day. Hence, wasted water amounts to 125 million liters per day.
After installing the application: Earlier this could only be checked manually; hereafter we can check for water leakage and have control over it using our handy IoT-embedded application, monitoring and controlling it from anywhere.

Wastage of water in household: A drinking water tap can waste up to 75 L a day due to leakage; of the total usage of water, 15% is wasted through leakage per day.
After installing the application: We can reduce at least 98% of the total wastage after installing our product at home.

Wastage of water in household: It is estimated that 7% of the water supplied is wasted during refilling of water tanks because of overflow.
After installing the application: With our application, the refilling of water tanks can be monitored and the pump can be turned off before the tank overflows, reducing overflow spillage and water wastage to nil.

Wastage of water in household: Usually during gardening, plants are given twice the amount of water they need, leading to major water wastage in a household.
After installing the application: This can be controlled by checking the moisture content and watering the plants only when required; the soil moisture is monitored and the application gives a notification when it is low.
10 Conclusion
The household water management system connects via IOT and brings into a single-
handy application. This application can be used effortlessly. This bring entire house-
hold water management into single-handy application. Hence, by this application
can conclude that to preserve water in the modern world is the need of the hour.
Starting to implement in homes and then extending to the entire country helps other
countries to take us as a role model and begin to save water. On an average a family
with 3 members could save 40% of water. This can also be used by a large family
which in turn makes them realize they would save upto 50% of water. When this is
used by a densely populated places like the hotels, hostels, halls, etc., so that India
could escape from water scarcity. This is the best way to save the water and prevent
from the wastage of water. The final outcome of the project is a single-handy appli-
cation controlling the IoT connected devices placed to manage household water. The
proposed system helps to completely save water in the upcoming busy world.
References
1. Robles T, Alcarria R, Martín D, Morales A (2014) An internet of things based model for smart water management. In: Proceedings of the 8th international conference on advanced information networking and applications workshops (WAINA), Victoria, Canada. IEEE, pp 821–826
2. Kumar S (2014) Ubiquitous smart home system using android application. Int J Comput Netw Commun 6(1)
3. Perumal T, Sulaiman M, Leon CY (2019) Internet of Things (IoT) enabled water monitoring system. In: IEEE 4th global conference on consumer electronics (GCCE)
4. Dinesh Kumar JR, Ganesh Babu C, Priyadharsini K (2021) An experimental investigation to spotting the weeds in rice field using deepnet. Mater Today Proc. ISSN 2214-7853. https://doi.org/10.1016/j.matpr.2021.01.086; Dinesh Kumar JR, Dakshinavarthini N (2015) Analysis and elegance of double tail dynamic comparator in analog to digital converter. IJPCSC 7(2)
5. Rawal S (2017) IOT based smart irrigation system. Int J Comput Appl 159(8):1–5
6. Kansara K, Zaveri V, Shah S, Delwadkar S, Jani K (2015) Sensor based automated irrigation system with IOT: a technical review. IJCSIT 6
7. Maqbool S, Chandra N. Real time wireless monitoring and control of water systems using Zigbee 802.15.4
8. Durham R, Fountain W (2003) Water management within the house landscape. Retrieved 2011
9. Kumar A, Rathod N, Jain P, Verma P. Towards an IoT based water management system for a campus. Department of Electronic Systems Engineering, Indian Institute of Science, Bangalore
10. Pandian AP, Smys S (2020) Effective fragmentation minimization by cloud enabled backup storage. J Ubiquit Comput Commun Technol (UCCT) 2(1):1–9
11. Dhaya R (2021) Analysis of adaptive image retrieval by transition Kalman filter approach based on intensity parameter. J Innov Image Process (JIIP) 3(01):7–20
12. Parvin JR, Kumar SG, Elakya A, Priyadharsini K, Sowmya R (2020) Nickel material based battery life and vehicle safety management system for automobiles. Mater Sci 2214:7853
13. Dinesh Kumar JR, Priyadharsini K, Srinithi K, Samprtiha RV, Ganesh Babu C (2021) An experimental analysis of lifi and deployment on localization based services & smart building. In: 2021 International conference on emerging smart computing and informatics (ESCI), pp 92–97. https://doi.org/10.1109/ESCI50559.2021.9396889
14. Priyadharsini K et al (2021) IOP Conf Ser Mater Sci Eng 1059:012071
15. Priyadharsini K, Kumar JD, Rao NU, Yogarajalakshmi S (2021) AI-ML based approach in plough to enhance the productivity. In: 2021 Third international conference on intelligent communication technologies and virtual mobile networks (ICICV), pp 1237–1243. https://doi.org/10.1109/ICICV50876.2021.9388634
16. Priyadharsini K, Kumar JD, Naren S, Ashwin M, Preethi S, Ahamed SB (2021) Intuitive and impulsive pet (IIP) feeder system for monitoring the farm using WoT. In: Proceedings of international conference on sustainable expert systems: ICSES 2020, vol 176. Springer Nature, p 125
17. Nanthini N, Soundari DV, Priyadharsini K (2018) Accident detection and alert system using arduino. J Adv Res Dyn Control Syst 10(12)
18. Kumar JD, Priyadharsini K, Vickram T, Ashwin S, Raja EG, Yogesh B, Babu CG (2021) A systematic ML based approach for quality analysis of fruits impudent. In: 2021 Third international conference on intelligent communication technologies and virtual mobile networks (ICICV). IEEE, pp 1–10
19. Priyadharsini K, Nanthini N, Soundari DV, Manikandan R (2018) Design and implementation of cardiac pacemaker using CMOS technology. J Adv Res Dyn Control Syst 10(12). ISSN 1943-023X
Novel Intelligent System for Medical
Diagnostic Applications Using Artificial
Neural Network
T. P. Anithaashri (B)
Institute of CSE, Saveetha School of Engineering, Saveetha Institute of Medical and Technical
Sciences, Chennai 602105, India
e-mail: anithaashritp.sse@saveetha.com
P. S. Rajendran
Department of CSE, Hindustan Institute of Technology and Science, Chennai, India
e-mail: selvir@hindustanuniv.ac.in
G. Ravichandran
AMET University, Chennai, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 93
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_7
94 T. P. Anithaashri et al.
1 Introduction
The lack of a highly efficient detection system for image recognition of chronic diseases such as cardiovascular diseases, cancers, respiratory diseases, pulmonary diseases, asthma and diabetes [1] becomes fatal in the later stages. The exigent factors for these types of chronic diseases are the continuous treatment, the pharmaceutical requirements and the medical electronic equipment required to diagnose the stage [2] of the damage to the organ on a periodic basis. Diagnosing these types of diseases and taking measures for treatment are becoming more challenging [3]. Analyses of various kinds of chronic diseases using factors such as periodic observation through tests and monitoring the adequacy of glucose, sucrose [4], etc., are a tedious process; thus, diagnosing a chronic disease from symptoms alone is less efficient. Hence, artificial intelligence techniques [3] can be used for better detection with high resolution and accuracy, to overcome disease diagnosis errors between scanned images [2] and X-ray images.
2 Existing System
Image analysis for chronic diseases using existing artificial intelligence techniques gives low efficiency in image recognition. A convolutional neural network (CNN) diagnosing diseased images yields only an approximation [5] in image recognition: its major function is to return a feature map for image identification using various parameters [6]. Comparing the CNN with other artificial intelligence techniques for image analysis [7], a recurrent convolutional neural network provides more accuracy than a plain CNN, but its time cost is greater. The CNN algorithm [2] can be used to perform image classification that predicts and differentiates diseased images from normal images. Therefore, to reduce errors and processing time, the artificial neural network algorithm is used here with chest X-ray images [8] of patients to identify the disease.
In many sectors, emerging trends in artificial intelligence have paved the way to enhance existing systems in image recognition, image analysis, image classification, and related tasks; AI has become an industrial revolution [6] in automated image recognition. In the medical field, many AI algorithms [9] and techniques are being implemented. Disease prediction has always been a challenge for doctors and is time consuming. To overcome these drawbacks, automating disease prediction with AI techniques [2] can make the process simple and feasible. Even with AI algorithms, however, a smart medical diagnosing [10] system for diagnosing [11] chronic diseases through image recognition [12] alone remains inefficient: a neural network helps analyze the trained datasets with validation but gives low accuracy when identifying the disease through image analysis.
3 Proposed System
Applying artificial intelligence techniques to any field enhances the efficiency of automation in emerging technology. The use of AI in medical applications [7] has been tremendous in automating manual work for real-time applications. Image recognition for the analysis of various diseases [6] has become a big challenge for the medical community because of parameters such as the time consumed in diagnosis and the feature extraction of scanned images. To overcome these drawbacks [8], AI techniques can be used to enhance the system's recognition of scanned images. In this novel system, evaluation is performed through test procedures and the classification [13] of patients' normal and abnormal scanned images from various clinical observations. Two artificial intelligence approaches, an ANN [14] and a fuzzy logic system, are compared on prediction performance. In the fuzzy logic system, the weights and biases [14] are assigned layer by layer, whereas in the ANN algorithm the first and last layers are connected, the last being treated as the output layer. To address the problem [4] of diagnosing disease through image recognition, a novel system has been proposed; its overview is depicted in Fig. 1.
In this framework, the input data is processed through a cloud application. Neural architecture search automates the design of the artificial neural network: it explores the search space and helps evaluate the ANN for the specific task. Processing the images in the neural network algorithm helps in extracting image features. The process starts with the identification of datasets; once the datasets are identified, image classification is carried out by the artificial neural network, paving the way to extract image accuracy for diagnosing diseases through image processing (Fig. 1).

Fig. 1 Novel framework for diagnosing chronic diseases through artificial neural network

The study setting of the proposed work is Saveetha University. Two groups were identified: group 1 is the fuzzy logic system, and group 2 is the artificial neural network algorithm. The artificial neural network and the fuzzy logic system were each iterated various numbers of times with a sample size of 200.
In this system, an image is treated as frames, and with data augmentation of the image pixels, complicated images can be classified in training. An image that is clear and precise to the human eye may not be accurate or detailed enough for analysis. Analyzing the scanned images through a different permutation for each layer provides clarity in the analysis of the images. This gives time efficiency, but in terms of data accuracy it is not efficient.
Step 1: Start.
Step 2: Load the datasets path through cloud application.
Step 3: Read images and resize them.
Step 4: Convert to grayscale.
Step 5: Train and test the images.
Step 6: Repeat the process for analysis.
Step 7: Predict the accuracy of the extracted images.
Step 8: Stop.
After this process, the number of samples in each class is found, and the test images are compared with the classified trained data to predict image recognition effectively. Data intensification helps enhance the performance of the algorithm in classifying the scanned images. After data intensification [15], the quality images and the classified images [5] are saved in random order. Classifying the patients' scanned images and modifying the dataset was a difficult process and hence yielded lower accuracy.
The artificial neural network is used to identify the feature values of samples of external data. It processes the inputs and analyzes the images to extract their features. Feature extraction from the classified scanned images through image analysis provides clarity and thus helps predict accuracy. The peak signal ratio, i.e., the noise or disturbances that form part of the image during classification, is called the loss. The use of neural architecture search improves the application of the algorithm and provides the novelty of the proposed system: it enables better, automated processing of the images through three layers, namely the input, hidden, and output layers. Hence, this novel artificial neural network architecture is a better model because of its smaller number of parameters and the reusability of the assigned weights, giving time efficiency with high accuracy.
The model is trained with the predicted datasets for 10 epochs on a total sample of 200 chest scan images. A total of 10 epochs and a batch size of 22 are used in the model, and the results are tabulated by epoch stage as shown

Table 1 Analysis of the training accuracy (0.9147) and loss (0.3262) of the image data for 10 epoch stages, with the model trained on various categories of scanned images of diseased patients
Epoch stage Training accuracy Training loss Validation accuracy Validation loss
1 0.61 0.58 0.71 0.63
2 0.82 0.62 0.65 0.71
3 0.81 0.51 0.62 0.54
4 0.79 0.51 0.60 0.36
5 0.87 0.68 0.66 0.45
6 0.79 0.31 0.52 0.71
7 0.69 0.48 0.63 0.67
8 0.72 0.52 0.58 0.74
9 0.93 0.54 0.58 0.62
10 0.72 0.61 0.55 0.31
in Table 1. Training with 10 epochs and the specified batch size thus provides an accuracy of 81% in disease prediction through the proposed system. As Table 1 shows, the data training is carried out by the novel diagnostic system; after classification of the training data, the system is trained to categorize the different kinds of viruses affecting the human body.
Figure 2 shows the variation between accuracy and loss when analyzing the trained datasets; the accuracy achieved is 0.93, which reflects the improvement obtained through the artificial neural network.

Fig. 2 Accuracy scores for image extraction across the epoch stages on the major axis, with the minor-axis range showing the accuracy (0.93) and loss (0.31), respectively

In Table 2, F refers to the F statistic, which is calculated by dividing the mean square regression by the mean square residual. T refers to the t score and depicts
Table 2 SPSS statistics depicting data reliability for the artificial neural network and the fuzzy logic system. An independent-samples T-test with a 95% confidence interval and a 0.05 level of significance is applied to the datasets for both algorithms; higher accuracy is achieved for the artificial neural network than for fuzzy logic

                                          F     Sig     T      df     Sig (2-tailed)   Mean diff.   Std. error diff.   95% CI lower   95% CI upper
Accuracy  Equal variances assumed         1.8   0.265   2.61   18.0   0.002            0.19         0.51               0.075          0.28
Accuracy  Equal variances not assumed     –     –       2.61   16.0   0.003            0.18         0.050              0.078          0.32
Loss      Equal variances assumed         4.5   0.040   0.97   18.0   0.360            16.31        17.0               −19.30         53.04
Loss      Equal variances not assumed     –     –       0.97   10.0   0.36             16.31        17.0               −23.14         54.88
the population variance; when the t value exceeds the critical value, the means are different. It is calculated by dividing the difference between the sample mean and a given number by the standard error. Sig (2-tailed) is the significance, compared against 0.05; it should lie within the level of significance. Figure 2 depicts the accuracy and loss for the compared algorithms. Compared to the fuzzy logic system, the artificial neural network algorithm shows more accuracy in the image recognition analysis.
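The t score described here can be computed directly for two independent samples with equal variances assumed; the sample values below are invented for illustration and are not the paper's data.

```python
# Independent-samples t statistic, equal variances assumed:
# t = (mean1 - mean2) / SE, where SE uses the pooled variance.
import math
from statistics import mean, variance

acc_ann = [0.81, 0.83, 0.79, 0.85, 0.82]     # illustrative accuracy samples
acc_fuzzy = [0.62, 0.65, 0.60, 0.66, 0.63]

n1, n2 = len(acc_ann), len(acc_fuzzy)
pooled_var = ((n1 - 1) * variance(acc_ann)
              + (n2 - 1) * variance(acc_fuzzy)) / (n1 + n2 - 2)
se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))   # standard error of the difference
t = (mean(acc_ann) - mean(acc_fuzzy)) / se

df = n1 + n2 - 2
print(f"t = {t:.2f}, df = {df}")  # a large |t| indicates the means differ
```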
The scanned chest images are considered for data classification. After classification, the trained data is tested and validated over 10 epochs, and the validation results are obtained. The graphical representation of loss and accuracy for the artificial neural network gives 81% accuracy with an assumed variance of 0.18, computed with the help of SPSS. The data reliability of the artificial neural network, with a mean difference between assumed and non-assumed variances of 0.02, provides more accuracy than the fuzzy logic algorithm's mean accuracy; thus, high accuracy is obtained in the extraction of images.
5 Conclusion
The validation results are obtained by classifying the images with the trained data over 10 epochs. The implementation results show an improved accuracy of 81% in image recognition with extraction of images. In the artificial neural network, the connections between layers help acquire more accuracy from image classification. Using the novel diagnostic system, grouping the images of affected and unaffected people helps classify and train models to diagnose the presence of disease through image recognition in a significant manner. The proposed system has considered scanned images, which is a limitation that could be overcome by using radiology images for more accuracy. A web application with bot interaction and the use of AI tools for disease prediction and treatment would be the future scope of this system.
References
1. Harmon SA, Sanford TH, Xu S, Turkbey EB, Roth H, Xu Z, Yang D et al (2020) Artificial intelligence for the detection of COVID-19 pneumonia on chest CT using multinational datasets. Nat Commun
2. Kılıc MC, Bayrakdar IS, Çelik Ö, Bilgir E, Orhan K, Aydın OB, Kaplan FA et al (2021) Artificial intelligence system for automatic deciduous tooth detection and numbering in panoramic radiographs. Dento Maxillo Facial Radiol
3. Grampurohit S, Sagarnal C (2020) Disease prediction using machine learning algorithms. https://doi.org/10.1109/incet49848.2020.9154130
4. Livingston MA, Garrett CR, Ai Z (2011) Image processing for human understanding in low-visibility. https://doi.org/10.21236/ada609988
Abstract In general, applications are built to serve certain business purposes. For example, a bank invests in application development to offer its customers online services such as online shopping, FD services, utility bill payments, etc. Within the application, every screen has its own purpose: the login page is used to authenticate a customer, and the dashboard screen gives a high-level view of the customer's activities. Similarly, every field appearing on a screen has a purpose; the username field of the login screen enables the customer to supply the username issued by the bank. However, the screens of an enterprise application are not always developed with these exact purposes in mind; often, a screen serves multiple purposes. While a multi-purpose screen and an application with fewer screens suit an enterprise by reducing development and maintenance expenditure, they are at loggerheads with privacy laws that demand purpose-based processing. Therefore, it is necessary first to build a repository of the purposes that a given application can serve; subsequent refactoring of the application can then be done (if required) to comply with privacy laws. To find the purposes, we use a keyword extraction method, K-means clustering on the application's text data, to obtain keywords and their respective purposes.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 103
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_8
1 Introduction
As data scientists say, "Data gives information, but to confess anything from data, we need to process it; we must torture it until we get our result." This is absolutely true when a business organization wants to make more profit from its current system by analyzing historical datasets and applying the knowledge gained to better, more efficient business decisions. Nowadays, organizations are also trying to make their applications and existing systems more secure and privacy-aware to gain customer trust, giving privacy to customers' personal data by taking their consent. Governments of all countries are also making rules and laws regarding data privacy, so organizations have to develop or update their systems according to the new rules made by the government of each country.
As we know, most applications currently are not developed with the exact purpose in mind. A screen of an enterprise application often serves multiple purposes, but so far this has not been given much importance, for several reasons. Chief among them is that a multi-purpose screen reduces development time and, hence, the resources subsequently required for testing and maintaining the application. This, however, is at loggerheads with privacy laws that demand purpose-based processing.
This is where the present research comes in. Data privacy and data security are often used interchangeably, but there is a difference between them. Data privacy regulates how data is collected, shared for any business purpose, and used after sharing; it is a part of data security, which safeguards data from intruders and malicious insiders. The main concept of data privacy is how exactly it deals with data consent, governing obligations, and notice. More specifically, practical data privacy concerns: (1) exactly how, or whether, information is shared with third parties; and (2) how information is legally collected or stored. In the digital age, PHI (personal health information) and PII (personally identifiable information) are simply the data privacy concept applied to critical personal information. This can include medical and health records, SSNs (Social Security numbers), financial information such as bank account and credit card numbers, and even basic but still sensitive information like addresses, full names, and birth dates.
The system architecture developed for this research work is shown in Fig. 1, with the major focus on how the inference engine is developed for purpose-based processing. For that, we first need to clarify: what is an inference engine? How does it work? Why is it important, and how does it make a system an expert system?

Inference engines: to derive new facts and relationships from data repositories or knowledge graphs, these units of an AI (artificial intelligence) methodology apply logical rules. The process of inferring relationships between units, using ML (machine learning), NLP (natural language processing), and MV (machine vision), has expanded the scale and value of relational databases and knowledge graphs exponentially in the past few years. There are two ways of building an inference engine: backward chaining and forward chaining.
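A forward-chaining engine can be illustrated in a few lines of Python; the facts and rules below are hypothetical examples, not entries from the actual purpose repository.

```python
# Minimal forward-chaining inference engine: repeatedly fire rules whose
# premises are all satisfied, adding their conclusions as new facts,
# until no rule produces anything new.
def forward_chain(facts, rules):
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in facts and premises <= facts:
                facts.add(conclusion)
                changed = True
    return facts

# Hypothetical rules: (set of premises, conclusion)
rules = [
    ({"has_username_field", "has_password_field"}, "screen_is_login"),
    ({"screen_is_login"}, "purpose_authentication"),
]

derived = forward_chain({"has_username_field", "has_password_field"}, rules)
print(derived)  # includes "purpose_authentication"
```

Backward chaining would instead start from a goal (a candidate purpose) and work back through the rules to see whether the known facts support it.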
Here, to find purposes for the given application data, we create a system we call the inference engine, which mainly performs the last four to five steps of the concrete system design (see Fig. 2).

Regarding the data used in this research, we consider web data: a set of websites from different domains, explained in detail in the next sections. The purpose repository used here is made manually, taking into consideration keywords and their purposes for the respective data domains. When data from a new domain is added, the purpose repository must be updated manually, which is somewhat time consuming. Finally, we get the purposes related to the keywords that match the keywords placed in the purpose repository: wherever a keyword or combination of keywords matches, its purpose is extracted from the repository and presented in the output.
2 Related Work
Most existing research concerns identity management and, more recently, privacy: privacy policy enforcement and privacy obligations. Marco Casassa Mont and Robert Thyne presented their work on automating privacy policy enforcement in enterprises [1].
3 Methodology
This section first presents the structure of our model, i.e., the system architecture shown in the figure below; we then describe the workflow of the architecture and what is performed in each of its phases.

As shown in Fig. 2, we first gather data from the web applications whose purposes we have to find, using "Crawljax" [16]. Crawljax is a Java-based web crawler used to extract whole web data in the form of HTML states, DOMs, a result .json file containing JSON data, states, and screenshots of web pages into one output folder, which is used in the next phase of the system.
Crawljax is an open-source tool, generally called a web crawler. Being open source, its jar file, code, or Maven artifact can be obtained and run in the specified framework. The jar file can be run from the command prompt by providing the parameters mentioned in its readme file, which accompanies the jar download. Many options can be provided as needed, such as states (-s), depth (-d), -waitAfterReload, -waitAfterEvent, override (-o), etc.; initial values for these options are mentioned in the readme file. The compulsory parameters are the URL of the page or website to be crawled and the path of the output folder where the result is stored.
The extracted files are parsed with the help of a Java-based parser named "Jsoup: Java HTML parser"; alternatively, the BeautifulSoup HTML parser may be used. From this we get the text content of the application, saved into a text file; this text data is where the actual work of the inference engine starts. Before moving further, some basics of the Jsoup parser: as the name indicates, it parses data from one form to another, and here we require data in text format for finding keywords or keyphrases from any document or file. This is obtained through the Document Object Model (DOM) object and its various methods, such as getElementById(), getElementsByTag(), etc.
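The extraction step can also be sketched without Jsoup or BeautifulSoup at all; the following stdlib-only Python sketch shows the same idea of pulling the visible text out of a page (the sample HTML is invented).

```python
# Stdlib-only text extraction from HTML (the paper uses Jsoup or
# BeautifulSoup; this sketch avoids external dependencies).
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping script/style content."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())

page = """<html><head><title>Login</title><style>p{}</style></head>
<body><form><p>Username</p><p>Password</p></form></body></html>"""
parser = TextExtractor()
parser.feed(page)
text = " ".join(parser.chunks)
print(text)  # "Login Username Password"
```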
The text data obtained needs to be pre-processed, and there are many such files from an application, one for each state or web page. The text data is pre-processed using libraries or APIs; many NLP (natural language processing) libraries exist for pre-processing text data. In pre-processing we perform tasks such as,
based keyword extraction technique). We also used more keyword extraction algorithms, such as TextRank, RAKE, YAKE, the Gensim summarization package, LDA, TF-IDF, etc. Using the keywords, we try to find their related purposes with the help of the purpose repository by applying a matching algorithm such as the flashtext API or regular expressions [17]. We get the purpose related to each keyword and combination of keywords present in the purpose repository, which is created manually.
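The matching step can be sketched with the regular-expression variant mentioned above (flashtext plays the same role, only faster); the repository entries below are invented examples, not the actual manual repository.

```python
# Regular-expression variant of the keyword-to-purpose matching step.
import re

# Hypothetical purpose repository: keyword -> purpose
purpose_repository = {
    "account balance": "View account status",
    "bill payment": "Pay utility bills",
    "admission form": "Apply for admission",
}

# One alternation pattern over all repository keywords
pattern = re.compile(
    r"\b(" + "|".join(map(re.escape, purpose_repository)) + r")\b",
    re.IGNORECASE,
)

def extract_purposes(text):
    """Return the purposes whose keywords appear in the text."""
    return {purpose_repository[m.lower()] for m in pattern.findall(text)}

purposes = extract_purposes("Check your account balance or make a bill payment online.")
print(purposes)
```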
Many clustering techniques and algorithms exist, but in this model we use the k-means clustering algorithm, which is based on the partitioning method of clustering and is unsupervised [11]. After this operation we get several clusters, each containing words that are similar with respect to the cluster's centroid word in the document. We then select keyphrases as the "n" words nearest (i.e., most similar) to the centroid of each cluster and assign a value to the variable "k", the number of clusters to be made; the elbow method can also be used to find an optimal number of clusters. From this we find the top "n" keyphrases for every cluster. The value of "k" can be decided by taking the length of the document into consideration and performing a number of experiments (on a trial basis; the default value is 2).
This method is generally used for unsupervised data. According to [11], this approach creates graphs from words and sentences by evaluating the similarity between them. Three graphs are constructed: sentence-to-sentence (s-s), word-to-word (w-w), and sentence-to-word (s-w). All graphs are built using the similarity between their elements, such as the cosine similarity between sentences and words.
• For the s-s graph, every sentence consists of several words, so we construct a word set for each sentence; the cosine between two sentence vectors gives the similarity between the sentences, which is taken as the weight of the edges, with the sentences as the nodes of the s-s graph.
• For the w-w graph, to find the similarity between words, each word is converted into numerical values using a word embedding (word vector); here we use fastText word embeddings. fastText is a library built by Facebook's AI Research lab, used for text classification as well as learning word embeddings; to obtain vector representations of words, the model allows unsupervised or supervised learning algorithms to be constructed.
• For the s-w graph, both graphs above are taken into consideration; the third graph is constructed using word frequency and inverse sentence frequency, formulated similarly to the TF-IDF (Term Frequency-Inverse Document Frequency) method. This gives the weight matrix for the s-w graph.
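The cosine similarity used as the edge weight in these graphs is straightforward to compute; the sentence vectors below are toy stand-ins for, e.g., averaged word embeddings.

```python
# Cosine similarity as the edge weight between two sentence vectors.
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

s1 = np.array([0.9, 0.1, 0.0])   # toy vector for sentence 1
s2 = np.array([0.8, 0.2, 0.1])   # toy vector for sentence 2
s3 = np.array([0.0, 0.1, 0.9])   # toy vector for sentence 3

w12, w13 = cosine(s1, s2), cosine(s1, s3)
print(w12, w13)  # s1 is far more similar to s2 than to s3
```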
For keyword extraction, some algorithms find keywords automatically, such as TextRank, RAKE, and TF-IDF.
• The TextRank algorithm generally works faster on small datasets than the other two. It gives proper keywords, as it is based on a graph-ranking criterion similar to the PageRank algorithm created by Google to rank websites.
• RAKE stands for Rapid Automatic Keyword Extraction. It finds the keyphrases in a document without considering any other context. It produces more keyphrases, which can be complicated but carry more information than single words.
• TF-IDF stands for Term Frequency-Inverse Document Frequency. Most of the time, this algorithm is used with a large number of documents in a dataset. As the name indicates, the inverse document frequency means that, for any given word in one document, the number of other documents containing that word is also considered when calculating the word's score or rank with respect to that document. The term frequency is the count of the word's occurrences in a particular document. Together they give one value showing how important a word is in a document, i.e., the rank of the word with respect to that document.
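TF-IDF can be computed directly from this definition with the standard library alone; the tiny corpus is illustrative, whereas real runs use the crawled page documents.

```python
# TF-IDF computed directly from its definition (stdlib only).
import math
from collections import Counter

corpus = [
    "online banking login and bill payment".split(),
    "hospital patient login and records".split(),
    "course admission and exam records".split(),
]

def tf_idf(term, doc, corpus):
    tf = Counter(doc)[term] / len(doc)        # term frequency in this document
    df = sum(1 for d in corpus if term in d)  # number of documents with the term
    idf = math.log(len(corpus) / df)          # inverse document frequency
    return tf * idf

# "banking" appears in one document only, so it scores higher there than
# the ubiquitous "and", whose IDF (and hence score) is zero everywhere.
score_banking = tf_idf("banking", corpus[0], corpus)
score_and = tf_idf("and", corpus[0], corpus)
print(score_banking, score_and)
```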
4 Results
As the research topic is very large, we restrict our data to web applications: websites from three different domains. In total, we collected more than 530 web pages from websites in the banking, education, and hospital and healthcare domains. A web page contains much data: text, images, audio, video, advertisements, etc. From this we extracted the text data necessary for our work, such as the text from the body, title, and form tags. We collected all this data into a text document and also converted the text data into Excel format.

Finding the purpose of a web page first requires the purpose repository, created manually for this work with three columns (Domain, Keywords, Purpose), in which we put the keywords and their purposes according to our data. If anyone adds more domains or new web pages, the purpose repository must be updated manually.
The next phase is the keyword extraction task. Before this, we first apply the clustering algorithm to group similar words into the same cluster, and then get the top 8-10 keywords for each cluster. Many keyword extraction algorithms exist, some automatic and others graph-based, statistical, unsupervised, or supervised; we tried several of them and found keywords and their purposes using the purpose repository.
The last phase finds the purposes related to the keywords from the purpose repository using a matching technique: manually with handcrafted rules, by similarity measurement, or by comparing the previously obtained keywords with the keywords and purposes present in the purpose repository. Another method uses VLOOKUP in an Excel sheet. We also use the flashtext library, which is mainly intended for search-and-replace tasks; in our case we adapt it to search for keywords and match them with their purposes. We use it because, between regular expressions and flashtext, flashtext works faster [17].
We can also obtain the word cloud and the word frequency graphs or plots shown in the following images; the final result or output is given at the end (Figs. 3 and 4).
5 Conclusion
Nowadays, privacy management plays an increasingly important role for enterprises. Its main objective is to address customers' privacy by considering their preferences and rights. It is important to consider the data subject's consent and the data requester's purpose whenever a specific person wants information from an organization.

In this research work, we conducted a survey of existing techniques and tools for text analysis and studied their drawbacks and limitations. To understand the purposes related to each field or screen of the given application data, we implemented a model that temporarily fulfills our requirement: it addresses the existing challenges in purpose extraction to enable purpose-based processing with the help of keyphrases, words, or keywords, and simultaneously provides an efficient purpose repository that makes finding the relationship between keywords and their purposes easier. We also try to get all matched purposes from the application screens to develop purpose-based processing. We further used clustering algorithms to find similar words or keywords by topic in a document and, by combining them, obtained their purposes too, as shown in the final result image in the results section.
There is much scope for future research in this area; so far, little research work specific to this topic has been published. Anyone wanting to do more research in this area can extend its scope by considering more different types of data (adding the corresponding purposes to the purpose repository). Another task is to make the purpose repository automatic in nature, to devise new approaches to the extraction, matching, and selection of keywords and their purposes for an application, and to make the system more scalable across different types of applications (here we consider web pages from websites, i.e., the web application type): desktop applications, mobile applications, gaming applications, etc., or, in other words, data from more domains. Finally, we encourage new researchers: research ideas can occur to anyone, so read more of the relevant work and develop new ideas for research in this or any domain.
References
1. Mont MC, Thyne R (2006) A systemic approach to automate privacy policy enforcement in
enterprises. In: Privacy enhancing technologies 6th international workshop, PET 2006, Cam-
bridge, UK, 28–30 June 2006. Revised Selected papers
Cotton Price Prediction and Cotton
Disease Detection Using Machine
Learning
1 Introduction
Agriculture, the main occupation in India, covers about 70% of business activity when the primary and secondary businesses that depend on agriculture are included. The market arrival time of any crop plays a prominent role in the price farmers receive. For the cotton crop specifically, a large number of people depend on cotton through one or another of the processes involved in its production. The increase in demand for cotton at both domestic and international levels has inclined productivity towards mission-oriented purposes in recent times [1]. However, it is not easy to predict the price and the leaf diseases of cotton, owing to high fluctuation caused by factors such as weather conditions, soil type and rainfall. This motivates price prediction and disease detection systems for cotton crops. Implementing such systems adds to the revenue of the farmers as
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_9
P. Tanwar et al.
well as the country. Thus, having a robust automated solution, especially in devel-
oping countries such as India, not only aids the government in taking decisions in a
timely manner but also helps in positively affecting the large demographics.
Various methodologies have been brought into use to forecast the retail/wholesale value of agricultural commodities, such as regression, time series and machine learning methods. The auto-regressive model and the vector auto-regressive moving-average model tend to predict agricultural commodity prices by taking into consideration the various factors affecting them [2]. The study
of changes in agricultural product prices is both intriguing and significant from the
standpoint of the government.
We know that manually recorded data is prone to human-caused errors, such as missing or incorrect data reported on a specific day. With new pricing data arriving every day, updating ML/DL-based models may raise stability concerns due to crop price data quality issues. The data for price prediction in the price prediction module is of the continuous type; hence, it falls under a regression model. The prices can be determined by recognizing different patterns in the training dataset, which is passed as input to the algorithm.
Diseases are, in the final analysis, a natural factor that severely affects plants; for example, they reduce overall productivity. The identification and accurate classification of leaf diseases are essential for preventing agricultural losses. Different plant leaves suffer from different diseases, and viruses, fungi and bacteria are the prominent categories of leaf disease [3]. To cope with this, plants are isolated from their normal environment and grown in special settings. Many important crops and plants are highly susceptible to illness. Plant diseases affect plant development and yield, which has a socio-biological and monetary impact on agriculture. Plant diseases are one of the ecological elements that shape the coexistence of living plants and animals. Plant cells primarily strengthen their defences against animals, insects and pathogens via the signalling pathways contained within them. With careful attention, humans have selected and cultivated plants for food, medicine, clothing, shelter, fibre and beauty for thousands of years. As a result, monitoring regional crop diseases is critical for improving food security.
Cotton is a drought-resistant crop that delivers a consistent income to farmers who
cultivate in climate-change-affected areas. To detect these cotton leaf diseases appropriately, prior knowledge and the utilization of several image processing methods and machine learning techniques are helpful. In leaf disease detection, we focus on the number of predictions that can be classified correctly; thus, it falls under the classification model. Various classification models can be used for detecting disease in a leaf, and these models are evaluated on the basis of their results. Thus, this system will be handy for farmers cultivating cotton, helping them identify diseased cotton plants and follow the price trends for the crop.
2 Literature Review
In this work [4], the approach focuses on developing a precise forecasting model for wheat production using an LSTM-NN, which is very precise for time-series forecasting. A comparison is also provided between the proposed mechanism and some existing models in the literature; the R value obtained for LSTM is 0.81. The results indicate that the system can achieve better forecasts and that, whilst grain production will accelerate over a decade, the output ratio will keep decelerating and pose a threat to the economy as a whole.
In this paper [5], the authors presented a comparative study of LSTM, SARIMA and the seasonal Holt-Winters method for predicting arecanut prices. Monthly arecanut price data for 14 districts of Kerala was taken from the Department of Economics and Statistics of Kerala. The RMSE value of LSTM was 146.86 for non-stationary data and 7.278 for stationary data; the SARIMA value was 16.5 and the Holt-Winters value 18.059. It was concluded that the LSTM neural network was the model that best fit the data.
In this article [6], the main aim of the researchers is to help farmers in Sri Lanka grow vegetables profitably through an Android application. The collected data set is divided into three parts in the ratio 8:1:1, meaning 80% of the data was used for training, 10% was kept for testing and the remaining 10% for validation. The model is then created using an LSTM RNN for vegetable forecasting and ARIMA for price forecasting.
In this study [7], the researcher suggests a prediction model for vegetable prices that uses season-trend decomposition using Loess (STL) as a pre-processing method together with long short-term memory (LSTM). To predict monthly vegetable prices, the model used vegetable price data, meteorological data from major producing districts and other data, and was applied to Chinese cabbage and radish on the Korean agricultural market. From the performance measurements, the suggested vegetable price forecast model achieved predicted accuracies of 92.06% and 88.74% for Chinese cabbage and radish, respectively.
In this article [8], the researchers suggest the STL-ATTLSTM model, which combines seasonal-trend decomposition using Loess (STL) as a pre-treatment method with an attention-based LSTM. In this system, the STL-ATTLSTM model is used for predicting vegetable prices from monthly data using different forms of data. The LSTM attention model improved predictive accuracy by about 4–5% compared to the plain LSTM model, and the combination of LSTM and STL (STL-LSTM) reached a predictive accuracy 12% higher than the LSTM attention model. The STL-ATTLSTM model outperforms the other models, with an RMSE value of 380 and a MAPE of 7%.
In this paper [9], the authors presented an artificial intelligence based solution to predict future market trends from time series data of cotton prices collected since 1972. The datasets are evaluated using various models such as moving average, KNN, auto-ARIMA, Prophet and LSTM. After comparison, the LSTM model was concluded to be the best fit, with an RMSE value of 0.017 and an accuracy of 97%.
In paper [10], the authors presented a user-friendly interface to predict crop prices and forecast prices for the next 12 months. Data containing the wholesale price index and rainfall of various Kharif and Rabi crops like wheat, barley, cotton and paddy was collected and trained on six different algorithms, out of which the supervised machine learning algorithm Decision Tree Regressor was the most accurate after comparison, with an RMSE value of 3.8.
In this paper [11], the researchers presented a comparative survey of different machine learning algorithms for predicting crop prices. Data consisting of prices of fruits, vegetables and cereals was collected from the website of the Agricultural Department of India. Random Forest Regressor was concluded to be the optimal algorithm, with an accuracy of 92%, as compared to other algorithms like Linear Regression, Decision Tree Regressor and Support Vector Machine.
In paper [12], the researchers proposed a web-based automated system to predict agricultural commodity prices. In two series of experiments, machine learning algorithms such as ARIMA, SVR, Prophet, XGBoost and LSTM were compared on large historical datasets from Malaysia, and the best-performing algorithm, an LSTM model with an average mean squared error of 0.304, was selected as the prediction engine of the proposed system.
In paper [13], the authors present techniques to build robust crop price prediction models considering various features, such as historical prices and market arrival quantities of crops, historical weather data that influence crop production and transportation, and data quality-related features obtained through statistical analysis, using the time series models ARIMA, SARIMA and Prophet.
In paper [14], the researchers proposed a model enhanced with deep learning techniques for crop prediction. The objective is to present a Python-based system that uses strategies smartly to anticipate the most productive harvest under given conditions at lower expense. In this paper, SVM is used as the machine learning algorithm, whilst LSTM and RNN are used as deep learning algorithms, and the accuracy is calculated as 97%.
Prajapati et al. [15] presented a survey on detecting and classifying diseases present in cotton, assisted by image processing and machine learning methodologies. They investigated segmentation and background removal techniques and found that RGB to HSV colour space conversion is effective for background removal, and that the thresholding technique is better to work with than the other background removal techniques. The data set included about 190 pictures of various disease types, captured by Anand Agricultural University, for classifying and detecting the type of infection. Colour segmentation was performed by masking the green pixels, the background was removed, and Otsu thresholding was applied to the masked image to obtain a binary image. It was concluded from the results that SVM provides quite good accuracy.
In Rothe et al. [16], a system was presented that identifies and classifies the diseases cotton crops generally suffer from, such as Alternaria, bacterial leaf blight and Myrothecium. The images were obtained from cotton fields in the Buldhana and Wardha districts and from ICRC Nagpur. The active contour model (snake segmentation algorithm) is used for image segmentation. The diseased cotton leaf images were classified using a back-propagation neural network, trained by extracting seven invariant moments from three types of diseased-leaf images. The mean classification accuracy was 85.52%.
In this article [17], the author developed an advanced processing system capable of identifying the infected portion of leaf spot on a cotton plant by implementing an image analysis method. The digital images were obtained with a mobile phone camera and enhanced after segmentation of the colour images using edge detection techniques such as Sobel and Canny. After a thorough study, a homogeneous pixel counting technique was used for image analysis and disease classification in the cotton disease detection algorithm.
In this article [18], the researchers carried out detection of leaf diseases assisted by a neural network classifier. Various diseases, like target leaf spot, cotton and tomato leaf fungal diseases and bacterial spot diseases, were detected. The segmentation procedure is performed by k-means clustering. Various characteristics were extracted and provided as inputs to the ANN. The average classification accuracy for the four types of diseases is 92.5%.
In this work [19], the researchers present an approach for accurate disease detection, diagnosis and timely management to avoid severe crop losses. In this proposal, the input image is first pre-processed using histogram equalization to increase contrast in low-contrast images; the k-means clustering algorithm is then used for segmentation, classifying objects into K classes based on a characteristic set; and classification finally occurs through a neural network. Imaging techniques are thus used to detect diseases in cotton leaves quickly and accurately.
In paper [20], the authors compared various algorithms such as SVM, KNN, NFC, ANN and CNN and found CNN to be 25% more precise than the rest, after which they compared two CNN models, GoogLeNet and ResNet50, for examining the lesions on cotton leaves. They finally concluded that ResNet50 has an edge over GoogLeNet, proving it to be more reliable.
In paper [21], the authors conduct cotton leaf disease detection and suggest a suitable pesticide for preventing the disease. The proposed system implemented a CNN algorithm and, using a Keras model with appropriate processing layers, built a precise system for disease detection.
Paper [22] is an extensive comparative analysis of methods for detecting organic and non-organic cotton diseases. It contains information about various diseases and an advisable method to detect each disease in its initial stage. Different algorithms are also surveyed, along with their efficiencies, pros and cons, to recognize the most apt one; it is essentially an in-depth analysis and comparison of quite a lot of techniques.
3 Price Prediction
3.1 Dataset
This system is based on statistical data obtained from the releases of the Agriculture Department, Government of India, published almost every year on its website data.gov.in [23]. The daily market prices of cotton include information about
the state, district, market in that district, variety of cotton grown, the arrival date of
cotton produce, minimum price, maximum price and modal price of cotton in the
market.
3.2 LSTM Model

LSTM is a deep learning model requiring a large data set. The architecture of the LSTM model is well suited to prediction systems because it can represent lags between important events in a time series of unknown duration. A unit cell of the LSTM model has an input gate, an output gate and a forget gate. The input gate controls the amount of information allowed to flow into the current cell state, through pointwise multiplication of sigmoid and tanh activations. The output gate decides which information is passed to the following hidden state. The information from the previous cell state that need not be remembered is discarded by the forget gate.
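As an illustration of these gate interactions, a single LSTM time step can be sketched in plain NumPy (an illustrative sketch, not the Keras layer used later; the weight names and stacked-parameter layout are assumptions made for clarity):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W (4n x d), U (4n x n) and b (4n,) hold the
    input-gate, forget-gate, output-gate and candidate parameters stacked."""
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:n])          # input gate: how much new information enters
    f = sigmoid(z[n:2 * n])      # forget gate: how much old state is kept
    o = sigmoid(z[2 * n:3 * n])  # output gate: how much state is exposed
    g = np.tanh(z[3 * n:4 * n])  # candidate cell values
    c = f * c_prev + i * g       # new cell state
    h = o * np.tanh(c)           # new hidden state
    return h, c
```

The pointwise product `i * g` is the sigmoid-times-tanh multiplication described above for the input path.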
3.3 Methodology
Firstly, in this system, the data set is loaded. Then, pre-processing of the data is done, where the necessary filtration is carried out. Boxplots for the min, max and modal prices are plotted in order to understand the outliers present in the data, and the outlying values are dropped to avoid data inconsistency. The sklearn module performs the pre-processing of data. The price columns are taken into consideration, and the arrival date column is converted to datetime format using the pandas framework and set as the index of the data. The dataset is then arranged in ascending order of the arrival date. This step is followed by visualizing the data set.
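The filtering and indexing steps above can be sketched with pandas (the column names `modal_price` and `arrival_date` and the date format are assumptions; the actual dataset headers may differ):

```python
import pandas as pd

def preprocess_prices(df, price_col="modal_price", date_col="arrival_date"):
    """Drop IQR outliers in the price column, convert the arrival date
    to datetime, set it as the index and sort in ascending order."""
    q1, q3 = df[price_col].quantile([0.25, 0.75])
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    kept = df[(df[price_col] >= lo) & (df[price_col] <= hi)].copy()
    kept[date_col] = pd.to_datetime(kept[date_col], format="%d/%m/%Y")
    return kept.set_index(date_col).sort_index()
```

The 1.5 x IQR rule matches the whisker convention of the boxplots used to inspect the outliers.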
This model is trained on the dataset, which is divided into training and testing data in the ratio 80:20. The train_test_split function of the sklearn module is used for splitting the dataset, and the data is scaled using the MinMaxScaler. Five hidden layers are used in training the model. The data set is divided into small batches, and the error is calculated per epoch. A Keras sequential model is used for evaluation. For this system, the model is trained for 200 epochs with a batch size of 32. The optimizers used are Adam, RMSProp and AdaDelta; since the objective is to predict the prices of cotton crops, the results of the various optimizers are compared in order to decide which one yields the best output. The training and validation graph is plotted for the data, and then the price prediction graph is plotted for the prediction model; the graphs increase the ease of understanding, and the Matplotlib library is used for visualization. Mean squared error is used as the loss function whilst working with Keras. Further, error metrics are calculated in order to understand the performance of the model: the math library together with mean_squared_error, mean_absolute_error, max_error, r2_score, explained_variance_score and median_absolute_error are imported, and each of these errors is calculated for the testing data and the predicted data.
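A plain-NumPy equivalent of the error metrics named above (matching the scikit-learn definitions of the same names) can be sketched as:

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """RMSE, MAE, max error, R^2, explained variance and median absolute
    error between actual and predicted prices."""
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    err = y_true - y_pred
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "rmse": np.sqrt(np.mean(err ** 2)),            # root mean squared error
        "mae": np.mean(np.abs(err)),                   # mean absolute error
        "max_error": np.max(np.abs(err)),              # worst-case deviation
        "r2": 1.0 - ss_res / ss_tot,                   # coefficient of determination
        "explained_variance": 1.0 - np.var(err) / np.var(y_true),
        "median_ae": np.median(np.abs(err)),           # median absolute error
    }
```

These are the quantities compared across the Adam, RMSProp and AdaDelta runs.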
3.4 Results
Figure 1 describes the results obtained on testing the dataset model for the modal
price prediction of cotton with Adam optimizer for which batch size was taken as
32 and to calculate error 200 epochs were considered. The green line represents the
actual price of cotton crop and the red line indicates the predicted price of the crop.
It is seen that the graph follows the trend throughout.
Accuracy. Figure 2 shows the curve for the training loss versus the validation loss
graph using the Adam optimizer for LSTM model. The training data is represented
by blue line, and the line graph in red is for the validation data. From the graph, it
can be seen that the values converge during training. The data is neither overfitting
nor underfitting for this model.
In Table 1, we can view the performance by taking into consideration various
accuracy parameters for the LSTM model for this system. We have taken 3 different
optimizers in order to check which one gives the best result. The values obtained
for each accuracy measure are compared, and the best values are considered. After
training and testing process of the models for the same dataset, these values were
calculated. It can be inferred from the comparison that LSTM that uses Adam opti-
mizer outperforms the LSTM model that uses other optimizers for all the values
obtained. Thus, it can be said that LSTM model that uses Adam optimizer is better
suited for the price prediction of cotton crop for this system.
4 Disease Detection
4.1 Dataset
The initial step is to collect data from the public database, considering an image as an input. The most popular image formats have been accommodated, so any common format can be used as batch input, for example .bmp, .jpg or .gif. The dataset comprises 1951 training images, 106 testing images and 253 validation images. Each split contains four kinds of images: diseased cotton leaf, diseased cotton plant, fresh cotton leaf and fresh cotton plant.
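Such datasets are commonly laid out as one folder per category under each split; assuming that layout (the folder names here are illustrative, not the dataset's actual ones), the per-category counts can be gathered as:

```python
from pathlib import Path

def count_images(split_dir, exts=(".bmp", ".jpg", ".jpeg", ".gif", ".png")):
    """Count image files per class folder under one dataset split."""
    root = Path(split_dir)
    return {d.name: sum(1 for p in d.rglob("*") if p.suffix.lower() in exts)
            for d in sorted(root.iterdir()) if d.is_dir()}
```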
4.3 Methodology
Table 2 Classification accuracy metrics

Category                Precision (%)   Recall (%)   F1-score (%)
Diseased cotton leaf    100             80           89
Diseased cotton plant   96              89           93
Fresh cotton leaf       81              100          90
Fresh cotton plant      93              96           95
4.4 Results
The classification accuracy metrics report for the CNN algorithm was generated with the values shown in Table 2. To find the accuracy metrics, the testing data is first converted into a NumPy array. A confusion matrix is most conveniently handled as a dataframe, so the NumPy array is then converted into a dataframe. A normalized confusion matrix was also computed and indicated well-computed accuracies, as shown in Fig. 3.
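Row-wise normalization of a confusion matrix, and the per-class precision/recall/F1 values of the kind reported in Table 2, can be sketched as follows (a generic NumPy version, not the authors' notebook code):

```python
import numpy as np

def normalize_rows(cm):
    """Scale each true-class row of a confusion matrix to sum to 1."""
    cm = np.asarray(cm, float)
    return cm / cm.sum(axis=1, keepdims=True)

def per_class_metrics(cm):
    """Precision, recall and F1 per class from a confusion matrix
    whose rows are true classes and columns are predicted classes."""
    cm = np.asarray(cm, float)
    tp = np.diag(cm)                       # correctly classified counts
    precision = tp / cm.sum(axis=0)        # column sums: predicted per class
    recall = tp / cm.sum(axis=1)           # row sums: actual per class
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```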
The training accuracy versus validation accuracy and training loss versus vali-
dation loss graphs were also plotted as shown in Figs. 4 and 5, respectively, which
depicted neither underfitting nor overfitting of data and showed optimal results.
The plot represents a good fit due to the following reasons:
• The training loss plot decreases until a point of stability.
• The validation loss plot decreases up to a point of stability and a small gap exists
with the training loss plot.
5 Conclusion
Agriculture contributes about 20% to India's GDP and plays an important role in India's economy and employment, so we need to make sure that this segment does not fall behind. Hence, a machine learning based system consisting of price prediction and disease detection modules was created with the ambition of benefiting society to the best possible capacity. The novelty of the proposed system is that it integrates price prediction with disease detection, a combination which, based on the research done, does not exist at present. Such a system is of immense utility and benefit to its actual users.
The price prediction module was implemented using the LSTM algorithm, and different optimizers were used to compare the results. LSTM with Adam is the best optimizer, with an RMSE value of 184.52, whilst RMSProp and AdaDelta have RMSE values of 209.12 and 323.74, respectively.
Also, the disease detection module used a CNN algorithm to classify the cotton plants and leaves as fresh or diseased, possessing an accuracy of 91.5%.
References
19. Warne PP, Ganorkar SR (2015) Detection of diseases on cotton leaves using K-mean clustering
method. Int Res J Eng Technol (IRJET) 2(4). e-ISSN: 2395-0056
20. Caldeira RF, Santiago WE, Teruel B (2021) Identification of cotton leaf lesions using deep
learning techniques. Sensors 21(9):3169
21. Suryawanshi V, Bhamare Y, Badgujar R, Chaudhary K, Nandwalkar B (2020) Disease detection
of cotton leaf. Int J Creat Res Thoughts (IJCRT) 8(11)
22. Kumar S, Jain A, Shukla AP, Singh S, Raja R, Rani S, Harshitha G, AlZain MA, Masud M
(2021) A comparative analysis of machine learning algorithms for detection of organic and
nonorganic cotton diseases. Hindawi Math Probl Eng 2021, Article ID 1790171
23. https://data.gov.in/
24. Saradhambal G, Dhivya R, Latha S, Rajesh R (2018) Plant disease detection and its solution
using image classification. Int J Pure Appl Math 119(14):879–884. ISSN: 1314-3395
Acute Leukemia Subtype Prediction
Using EODClassifier
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_10
S. K. Abdullah et al.
with blood slide images for detection of leukemia employing feature selection and
classification. They reported 93% accuracy with KNN. Subhan et al. [4] applied a
similar approach where they segment blood cell images, extract features and classify. A feature extraction approach inspired by visual cues from segmented cells is also followed in [5–7]. Neural network-based classification is followed in similar experiments [8, 9]. A comparative study of such classification algorithms is made in [10]. Recently, deep learning approaches are also being explored. Sahlol et al. [11] presented a hybrid approach for leukemia classification from white blood cells using VGGNet and the bio-inspired salp swarm algorithm. Besides blood cell images, bone marrow images have also been used with convolutional neural networks for leukemia prediction [12].
Since the introduction of the microarray gene expression leukemia dataset by Golub et al. [13], the genetic roots of leukemia have been increasingly investigated. Besides prediction of leukemia subtypes for a subject, identification of the associated gene(s) has become a prime interest. A recently developed classifier called the EODClassifier [14] integrates discriminant feature selection and classification following an ensemble approach. Using this classifier, one can select the top n discriminant features for training and prediction. In this paper, two acute leukemia subtypes, i.e., ALL and AML, are predicted using the EODClassifier from microarray gene expression data [13]. Multi-fold experiments revealed consistently high performance and robustness in leukemia prediction.
2 Overview of EODClassifier
By default, nof = ‘all’ and p = 1. Given a training set of samples X_train with labels y_train, training and prediction can be done as

eod = EODClassifier()  # nof = ‘all’, p = 1 by default
eod.fit(X_train, y_train)
y_pred = eod.predict(X_test)

where y_pred is an array of predicted classes for the test samples. Subsequently, standard evaluation measures can be applied to quantify the classification performance. As of now, the classifier supports binary classification only; multi-class support is not available.
3 Methodology
A brief introduction to the leukemia gene expression dataset [13] employed in this work is presented here. There are 72 gene expression samples from leukemia patients. Each sample contains the measured and quantified expression levels of 7129 genes. Gene expression levels of the samples are shown visually in Fig. 2; it may be noted that some genes are negatively expressed.
Usually, gene expression datasets such as the present leukemia dataset [13] contain a limited number of samples and a high number of features. Moreover, it may be realized from Fig. 2 that the gene expression levels of the AML and ALL types are not very distinct. Figure 3 shows the distributions of four sample genes for all 72 samples. It reflects that there are no distinct decision boundaries for most of the features.
Fig. 1 Block diagram of the leukemia subtype prediction method (prediction is done on the basis
of microarray gene expression data of different subjects)
Fig. 2 Expression levels of 7129 genes for 72 samples (The first 25 samples are of AML type and
the remaining 47 samples are of ALL type)
Hence, ensemble approaches such as the one followed in the EODClassifier are suitable for prediction on high-dimensional samples. It may be noted that this classifier predicts the final class based on the decision of each individual feature and its fitness measure. Thus, in this classifier, a discriminating feature contributes more in determining the final class of a recall sample.
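As a toy illustration of this idea (not the actual EODClassifier implementation; the midpoint threshold and the fitness weighting here are simplifying assumptions), each feature can cast a vote through its own decision boundary, weighted by how well that boundary separates the training classes:

```python
import numpy as np

class FeatureVoteClassifier:
    """Toy ensemble: every feature votes via a midpoint threshold,
    weighted by a simple fitness score measuring its separating power."""

    def fit(self, X, y):
        X, y = np.asarray(X, float), np.asarray(y)
        m0 = X[y == 0].mean(axis=0)
        m1 = X[y == 1].mean(axis=0)
        self.thr = (m0 + m1) / 2.0                  # per-feature boundary
        self.sign = np.where(m1 >= m0, 1.0, -1.0)   # which side means class 1
        votes = ((X - self.thr) * self.sign > 0).astype(float)
        # fitness: how far each feature's training accuracy is from chance
        self.w = np.abs((votes == y[:, None]).mean(axis=0) - 0.5)
        return self

    def predict(self, X):
        X = np.asarray(X, float)
        votes = ((X - self.thr) * self.sign > 0).astype(float)
        score = votes @ self.w / self.w.sum()       # fitness-weighted vote
        return (score >= 0.5).astype(int)
```

More discriminating features receive larger weights and therefore contribute more to the final class, mirroring the behaviour described above.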
Fig. 3 Expression levels of four sample genes for 25 AML (first class) and 47 ALL (second class)
samples
Experiments have been carried out on the leukemia gene expression dataset [13], which contains 72 instances and 7129 attributes. All attributes have numerical values, and the outcome or class contains binary values ‘1’ or ‘0’. Class 1 signifies that the subject has acute lymphocytic leukemia, and class 0 signifies acute myelocytic leukemia. The experimental setup is discussed in Sect. 4.1. Prediction performance along with a comparative performance analysis with respect to other classifiers is presented in Sect. 4.2. Finally, some observations are discussed in Sect. 4.3.
strategy is also adopted. Besides the experiments with the EODClassifier, similar experiments have been conducted with other well-known classifiers for comparative study. To report the obtained results, standard evaluation metrics such as precision, recall, f-score and accuracy have been adopted.
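The multi-fold protocol can be sketched with a plain NumPy fold generator (an illustrative stand-in for scikit-learn's KFold; leave-one-out is the special case k = n = 72):

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)          # k nearly equal test folds
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test
```

Each sample appears in exactly one test fold, so the mean of the per-fold metrics summarizes performance over the whole dataset.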
Classification models are trained with mostly default parameters; only a few parameter changes are required, and these are presented in Table 1. As leukemia subtype prediction is demonstrated with the EODClassifier, the confusion matrices obtained for the different cross-validation folds are also shown in Fig. 4. It may be seen that the classification performance of the said classifier is reasonably good and the misclassification rate is nominal.
Mean precision, recall, f-score, accuracy and RMSE over all folds are reported in Table 2 for threefold, fivefold, tenfold, 20-fold and leave-one-out (LOO) cross-validation experiments with multiple classifiers, along with the present classifier of interest, i.e., the EODClassifier. Default parameter values as available in scikit-learn are taken for naïve Bayes. In KNN, the number of neighbours k is 3. In SVM, a linear kernel with gamma = ‘auto’ and C = 1 is employed. For the multilayer perceptron (MLP), 100 neurons in the hidden layer with the ‘relu’ activation function are taken. In the random forest (RF) classifier, n_estimators = 10 and random_state = 0. Finally, for the EODClassifier, the parameters nof = ‘all’ and p = 5 are taken.
Table 1 Parameters of different classifiers for training and prediction of acute leukemia subtypes

Classifier      Parameters
GNB             priors = None
KNN             n_neighbors = 3, weights = ‘uniform’, p = 2, metric = ‘minkowski’
SVM             kernel = ‘linear’, gamma = ‘auto’, C = 1
MLP             random_state = 41, hidden_layer_sizes = 100, activation = ‘relu’,
                solver = ‘adam’, alpha = 0.0001, batch_size = ‘auto’,
                learning_rate = ‘constant’, learning_rate_init = 0.001,
                power_t = 0.5, max_iter = 200
Random forest   n_estimators = 10, random_state = 0, criterion = ‘gini’,
                max_depth = None, min_samples_split = 2
EODClassifier   nof = ‘all’, p = 5
Fig. 4 Confusion matrices obtained for different folds of cross-validation with the EODClassifier. The misclassification rate is very low (as reflected in the off-diagonal positions)
4.3 Discussion
As evident from Table 2, the EODClassifier achieves over 96% accuracy in all cross-validation experiments. The accuracies of the other classifiers are sometimes lower and sometimes close to that of the EODClassifier. Each method has its own merits and demerits, and no single classifier can be identified as the best for all problems: a classifier which struggles on one dataset may yield great results on another. However, consistency cannot be denied its importance. In that respect, one may observe that the performance of the EODClassifier has been consistent across all experiments conducted in the present work, which reflects its robustness besides its high classification performance.
136 S. K. Abdullah et al.
Table 2 Acute leukemia subtype prediction performance with multiple classifiers for threefold,
fivefold, tenfold, 20-fold and LOO cross-validation
#Fold Classifier P R F-score Accuracy RMSE
3 NB 0.9267 1.0 0.9610 0.9473 0.1846
KNN 0.8571 0.9743 0.9116 0.8771 0.3487
SVM 0.9743 0.9777 0.9733 0.9649 0.1529
MLP 0.8898 0.8944 0.9398 0.9122 0.2406
RF 0.9440 0.9583 0.9311 0.9123 0.229
EOD 0.9696 0.9761 0.9761 0.9649 0.1529
5 NB 0.975 1.0 0.9866 0.9818 0.0603
KNN 0.8683 1.0 0.9276 0.8954 0.2849
SVM 0.9666 0.975 0.9633 0.9500 0.1180
MLP 0.8955 0.9777 0.9411 0.9121 0.2199
RF 0.95 0.9355 0.9447 0.9303 0.2022
EOD 0.9666 0.9714 0.9664 0.9636 0.1206
10 NB 0.9800 1.0 0.9888 0.9833 0.0408
KNN 0.89 1.0 0.9377 0.9100 0.2119
SVM 0.9800 0.975 0.9746 0.9666 0.0816
MLP 0.9400 0.975 0.9638 0.9433 0.1040
RF 0.9600 0.9550 0.9292 0.9133 0.2080
EOD 0.975 0.975 0.9714 0.9666 0.0816
20 NB 0.9833 1.0 0.99 0.9833 0.0288
KNN 0.9083 0.975 0.9433 0.9083 0.1508
SVM 0.9666 0.975 0.9633 0.9500 0.0860
MLP 0.9416 1.0 0.9633 0.9416 0.0930
RF 0.9666 0.9166 0.9800 0.9666 0.0577
EOD 0.975 0.975 0.9666 0.9666 0.0577
72 NB 0.6491 0.6527 0.6491 0.9824 0.0175
(LOO) KNN 0.6491 0.6250 0.6491 0.9122 0.0877
SVM 0.6315 0.6388 0.6315 0.9473 0.0563
MLP 0.6491 0.6527 0.6491 0.8596 0.1403
RF 0.5789 0.6250 0.5789 0.8771 0.1280
EOD 0.6315 0.6315 0.6315 0.9649 0.3508
Bold values denote performance of the EODClassifier
5 Conclusion
References
1. Bullinger L, Dohner K, Dohner H (2017) Genomics of acute myeloid leukemia diagnosis and
pathways. J Clin Oncol 35(9):934–946
2. Maria IJ, Devi T, Ravi D (2020) Machine learning algorithms for diagnosis of Leukemia. Int
J Sci Technol Res 9(1):267–270
3. Joshi MD, Karode AH, Suralkar SR (2013) White blood cells segmentation and classification
to detect acute leukemia. Int J Emerg Trends Technol Comput Sci 2(3):147–151
4. Subhan MS, Kaur MP (2015) Significant analysis of leukemic cells extraction and detection
using KNN and hough transform algorithm. Int J Comput Sci Trends Technol 3(1):27–33
5. Laosai J, Chamnongthai K (2014) Acute leukemia classification by using SVM and K-Means
clustering. In: Proceedings of the international electrical engineering congress, pp 1–4
6. Supardi NZ, Mashor MY, Harun NH, Bakri FA, Hassan R (2012) Classification of blasts in
acute leukemia blood samples using k-nearest neighbor. In: International colloquium on signal
processing and its applications. IEEE, pp 461–465
7. Adjouadi M, Ayala M, Cabrerizo M, Zong N, Lizarraga G, Rossman M (2010) Classification
of Leukemia blood samples using neural networks. Ann Biomed Eng 38(4):1473–1482
8. Sewak MS, Reddy NP, Duan ZH (2009) Gene expression based leukemia sub-classification
using committee neural networks. Bioinform Biol Insights 3:BBI-S2908
9. Zong N, Adjouadi M, Ayala M (2006) Optimizing the classification of acute lymphoblastic
leukemia and acute myeloid leukemia samples using artificial neural networks. Biomed Sci
Instrum 42:261–266
10. Bakas J, Mahalat MH, Mollah AF (2016) A comparative study of various classifiers for
character recognition on multi-script databases. Int J Comput Appl 155(3):1–5
11. Sahlol AT, Kollmannsberger P, Ewees AA (2020) Efficient classification of white blood cell
leukemia with improved swarm optimization of deep features. Sci Rep 10(2536):1–11
12. Rehman A, Abbas N, Saba T, Rahman SIU, Mehmood Z, Kolivand H (2018) Classification of
acute lymphoblastic leukemia using deep learning. Microsc Res Tech 81(11):1310–1317
13. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh
ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES (1999) Molecular classification
of cancer: class discovery and class prediction by gene expression monitoring. Science
286(5439):531–537
14. Hasan SR, Mollah AF (2021) An ensemble approach to feature selection and pattern classifi-
cation. In: Proceedings of international conference on contemporary issues on engineering and
technology, pp 72–76
15. EODClassifier (2021) https://github.com/iilabau/EODClassifier. Accessed 15 June 2021
Intrusion Detection System Intensive
on Securing IoT Networking
Environment Based on Machine
Learning Strategy
Abstract The Internet of Things (IoT) is a technology whose use is exploding in day-to-
day life, from the home to large industrial environments. An IoT connects various
applications and services via the internet to make the environment more convenient. The
manner of communication among the devices exposes the network to various
attacks. To protect against the security vulnerabilities of the IoT, an Intrusion Detection
System (IDS) is employed at the network layer. The network packets from the
interconnected IoT applications and services are stored in a Linux server on the
end nodes. The packets are retrieved from the server using a crawler into the network
layer for attack prediction. Thus, the main objective of this work is to identify
and detect intrusions in the IoT environment based on machine learning (ML)
using the benchmark dataset NSL-KDD. The NSL-KDD dataset is pre-processed to
sanitize null values and to eliminate duplicate and unwanted columns. The cleaned
dataset is then assessed to construct novel custom features and basic features for
attack detection, which constitute the feature vector. The novel features are constructed
to reduce the learning confusion of the machine learning algorithms. The feature vector
with the novel and basic features is then processed with the feature selection
strategy LASSO to obtain the significant features and increase the prediction accuracy.
Owing to the strong performance of ensembled machine learning algorithms, HSDTKNN
(Hybrid Stacking Decision Tree with KNN), HSDTSVM (Hybrid Stacking Decision Tree
with SVM) and TCB (Tuned CatBoost) are used for classification. The Tuned CatBoost
(TCB) technique remarkably predicts the attacks that occur among the packets and
generates an alarm. The experimental outcomes established the suitability of the
proposed model for the IoT IDS environment, with an accuracy rate of 97.8313%,
an error rate of 0.021687, a sensitivity of 97.1001% and a specificity of 98.7052%
during prediction.
D. V. Jeyanthi (B)
Department of Computer Science, Sourashtra College, Madurai, Tamilnadu, India
B. Indrani
Department of Computer Science, DDE, MKU, Madurai, Tamilnadu, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 139
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_11
140 D. V. Jeyanthi and B. Indrani
1 Introduction
An IDS acts as a shield against attacks on computer systems. Since various techniques
can compromise the security and stability of a network connected to the internet,
an IDS is an essential part of the network security configuration. Generally, IDSs can
be classified into two categories: anomaly-based detection and signature-based
detection. An anomaly-based detection system builds a database of normal behavior
and generates an alarm when behavior deviates from that norm. A signature-based
detection system maintains a database of the known patterns of attacks [1, 2]. This
system verifies whether similar patterns or data exist in the current situation and
indicates whether or not it is an attack.
IoT rests on the network-layer design of intercommunication, which is responsible
for moving data packets among hosts. In IoT architecture, the network layer is a
vulnerable and heterogeneous phase that is exposed to various security concerns.
The main reason for the security vulnerability of the IoT is that it contains a large
number of linked nodes, so the failure of a single affected node may lead to the
failure of the entire system. The flaws of the IoT architecture lead to attacks such as
DDoS, remote recording, botnets, data leakage, and ransomware. Deploying a firewall
is a primary security measure against these vulnerabilities in IoT, but it is not a
complete solution owing to the variability of the issues in the IoT architecture.
This work proposes a framework that employs machine learning techniques to
predict security anomalies in an IoT environment. Thus, the work focuses on intrusions
occurring on IoT-connected devices, detected using machine learning techniques.
The work adopts the NSL-KDD dataset for attack prediction using machine
learning techniques, creating novel custom features from the given dataset to
increase the prediction accuracy and reduce the training time. The proposed
framework provides higher performance results for NSL-KDD_CF than the
employment of feature selection on the NSL-KDD dataset alone. The following
sections describe the attack prediction process.
2 Review of Literature
The scheme suggested by Soni et al. [3] makes use of two methodologies, C5.0
and ANN. To classify the information based on the performance of C5.0 and ANN,
a set of significant features must be selected. To detect unique attacks, Somwang et al.
[4] developed a hybrid clustering model combining PCA and FART. Using hierar-
chical clustering and SVM, Su et al. [5] improved detection accuracy by uniting
Intrusion Detection System Intensive … 141
IDS with hierarchical clustering and SVM. The KDD 99 dataset was utilized to conduct
the experiments; as a result, enhanced outcomes were observed for DoS and probe
attacks. El Boujnouni and Jedra [6] proposed an anomaly-detection-based
network intrusion identification system. This scheme encompasses data conversion,
normalization, relevant feature, and novelty discovery models based on a classifi-
cation and resolution scheme composed of SPPVSSVDs and SMDs for determining
whether traffic is normal or intrusive. Bhumgara et al. [7] proposed hybrid approaches
merging the J48 Decision Tree, SVM, and NB to discern different varieties of attacks,
with different levels of accuracy depending on the algorithm. Based on the
OPSO-PNN model, an anomaly-based IDS was proposed in Sree Kala and
Christy [8].
In the paper [9], an IDS is proposed with a minimal set of features that employs
the random forest classifier as a supervised machine learning method. These manually
selected features assist in training and detecting intrusions in the IoT environment
with a minimal selection of relevant features. The work [10] proposes constructing
an accurate model by employing various data pre-processing techniques that allow
the machine learning algorithm to classify the possible attacks for the given parameters
exactly using a cybersecurity dataset.
The work [11] identifies various kinds of IoT threats using a deep-learning-based
IDS for the IoT environment, evaluated on five benchmark datasets. The main objective of
the work [12] was to compare KDDCup99 and NSL-KDD through the performance
evaluation of various machine learning techniques with a large set of classification
metrics. The work [13] focuses on IoT threats by detecting and localizing infected IoT
devices and generating an alarm. The research [14] proposes an archi-
tectural design and implementation with a hybrid strategy based on multi-agent
systems and blockchain using deep learning algorithms. The researcher [15] proposed a novel
framework for inspecting and labeling suspected packet headers and payloads to
increase the prediction accuracy. The paper [16] proposed a system to monitor
soldiers who are wounded or lost on the front line by tracking data from
sensors. The paper [17] proposed and evaluated the performance of a system in wireless
networks for sustainable smart farming with blockchain technology. The paper [18]
designed a system to control devices that are far from the control system by sending
their status using sensors.
3 Proposed Scheme
This section describes the proposed IDS scheme for the IoT environment with
NSL-KDD. The proposed architecture depicts attack identification and recognition
based on the derived basic and novel custom features. The features are fed to
the ML techniques for attack prediction. The detailed architecture of the proposed
scheme is shown in Fig. 1. The proposed architecture for intrusion detection in the IoT
environment uses NSL-KDD. This architecture handles packet information, missing-
value imputation, duplicate detection, best feature selection and classification. The
proposed work mainly focuses on generating novel features to solve the learning confu-
sion problem for classifiers, and it helps the analyst to understand the features. The
proposed work comprises five layers: (i) Data Collection Layer, (ii) Pre-Processing Layer,
(iii) Construction Layer, (iv) Feature Selection Layer and (v) Detection Layer to detect
the attacks.
Table 1 shows the parameters that are used in this work.
3.1.1 Dataset
The NSL-KDD dataset has 41 features, categorized into basic, content, and traffic
features. Compared to the KDD-Cup dataset, the refined NSL-KDD does
not suffer from KDD-Cup's shortcomings [17]. In addition, the NSL-KDD (D)
training set contains a reasonable number of records. Owing to this benefit, it is possible
to execute the experiments on the entire dataset without manually choosing a small part.
The dataset D includes various attack groups in the ratio DOS (79%), PROBING
(1%), R2L (0.70%), U2R (0.30%) and Normal (19%).
The pre-processing layer of this work covers pre-processing of the raw dataset
to clean it and to prepare for the derivation of the novel custom features. The pre-processing
of the dataset processes the dataset (D) by eliminating duplicate columns and avoiding
missing values; removing redundant columns from the dataset reduces its size for
further processing.
In Fig. 2, the missing-value illustration is shown for the given dataset. The figure
depicts that the given dataset does not contain any missing values.
In this phase, the features in the dataset are encoded into a uniform format for
processing. The fields in the dataset come in various formats, so it is complex to compute
the custom features; thus, the work encodes the fields of the set
into a uniform format with encoding values. The sample encoding values for some
of the features are shown in Table 2.
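The cleaning and encoding steps above can be sketched with pandas on a toy frame standing in for NSL-KDD. The column names follow the NSL-KDD feature set, but the rows and the resulting integer codes below are illustrative, not the encoding values of Table 2.

```python
# Sketch of pre-processing: drop duplicates, verify no missing values,
# and integer-encode the categorical fields into a uniform format.
import pandas as pd

df = pd.DataFrame({
    "protocol_type": ["tcp", "udp", "tcp", "tcp"],
    "service": ["http", "domain_u", "http", "smtp"],
    "flag": ["SF", "SF", "S0", "SF"],
    "src_bytes": [181, 105, 0, 181],
})

df = df.drop_duplicates()                  # eliminate duplicate rows
assert df.isna().sum().sum() == 0          # the dataset has no missing values
for col in ["protocol_type", "service", "flag"]:
    # categorical codes give each string field a uniform integer encoding
    df[col] = df[col].astype("category").cat.codes
```

Any consistent mapping works here; what matters for the later layers is that every field is numeric so the custom features can be computed.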
The construction layer builds the proposed novel features, which are derived from
the dataset (D). These proposed novel features (DN) are extracted from the fields
in the dataset, are employed in the prediction to increase accuracy, and help to
avoid learning confusion for the ML techniques.
Total Bytes:
The total number of source and destination bytes among the packet transactions
is summed to derive the custom feature Total Bytes.
Byte Counter:
This custom feature is derived to evaluate the Byte Counter with respect to the
evaluated Total Bytes, to which it is proportional in total count.
Interval Counter:
This custom feature is derived to evaluate the Interval Counter with respect to the
evaluated Total Bytes, to which it is proportional in total count.
Unique ID:
The custom feature Unique ID is derived by concatenating the service, flag
and protocol type of the captured packet.
Average SYN Error = (SYN Error Rate + Destination Host SYN Error Rate)/2
REJ Error Mean = (REJ Error Rate + Destination Host REJ Error Rate)/2
Login State:
The login state feature is derived to identify whether the host login is enabled or not.
It is derived using the feature logged-in.
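The custom-feature constructions above can be sketched directly over the standard NSL-KDD column names (src_bytes, dst_bytes, service, flag, protocol_type, serror_rate, dst_host_serror_rate, rerror_rate, dst_host_rerror_rate, logged_in); the derived column names are our own choice, not the paper's.

```python
# Sketch of the custom-feature construction layer on NSL-KDD columns.
import pandas as pd

def add_custom_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Total Bytes: sum of source and destination bytes of the transaction
    out["total_bytes"] = out["src_bytes"] + out["dst_bytes"]
    # Unique ID: concatenation of service, flag and protocol type
    out["unique_id"] = (out["service"].astype(str) + "_"
                        + out["flag"].astype(str) + "_"
                        + out["protocol_type"].astype(str))
    # Average SYN Error = (SYN Error Rate + Destination Host SYN Error Rate)/2
    out["avg_syn_error"] = (out["serror_rate"]
                            + out["dst_host_serror_rate"]) / 2
    # REJ Error Mean = (REJ Error Rate + Destination Host REJ Error Rate)/2
    out["rej_error_mean"] = (out["rerror_rate"]
                             + out["dst_host_rerror_rate"]) / 2
    # Login State: whether the host login is enabled, from logged_in
    out["login_state"] = (out["logged_in"] == 1).astype(int)
    return out
```

The Byte Counter and Interval Counter features are omitted here because their exact formulas are not fully specified in the text.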
The purpose of the feature selection layer is to identify significant features among the
derived features of the given dataset in order to increase the prediction accuracy. This
work uses the following techniques for selecting the best features in the cleaned NSL-KDD
(DC). The selected best features help to improve the accuracy of the classifier.
3.4.1 PSO
When PSO is employed, the particles act together as a group, and as individual
experiences are learned, those experiences are consolidated. The optimal solution
found so far by a particle follows its traversed path; this is called the particular
best solution (pbest) of the particle, as it has been measured as the best position.
By analyzing its own experiences and its interactions with others, each particle
in the exploration space searches for the best solution. A better fitness value may
also be found by any particle in the group; the best such solution is denoted as
gbest. Each particle has an associated velocity for acceleration toward achieving
pbest and gbest. The basic idea of PSO is to attain the global optimal solution by
moving each particle toward pbest and gbest with random weights at every step.
Particle swarms are randomly generated
and then progress through the search space or primary space until they identify the
optimal set of features by keeping track of their position and velocity. As a result,
the particle’s current position (p) and velocity (v) are described as follows:
p_iD^(k+1) = p_iD^k + v_iD^(k+1)
v_iD^(k+1) = w ∗ v_iD^k + a1 ∗ r1 ∗ (p_id − p_iD^k) + a2 ∗ r2 ∗ (p_gd − p_iD^k)
where k denotes the kth iteration of the procedure. In the search space, the
dth dimension is represented as d ∈ D. The inertia weight, denoted by "w", is used to
regulate the influence of the preceding velocity on the present one. r1 and r2 denote
random values uniformly distributed in [0, 1]. The acceleration constants
are represented as a1 and a2. The elements of pbest and gbest in the dth dimension
are represented as p_id and p_gd. Particle positions and velocity values are updated
continuously until the stopping criterion is met, which can be either a maximum number
of iterations or a suitable fitness value.
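A single PSO update step, matching the velocity and position equations above, can be sketched numerically. All values here (inertia weight, acceleration constants, and the fixed r1, r2) are illustrative, not taken from the paper's experiments.

```python
# One velocity/position update for a single particle, per the PSO equations:
#   v^(k+1) = w*v^k + a1*r1*(pbest - p^k) + a2*r2*(gbest - p^k)
#   p^(k+1) = p^k + v^(k+1)
import numpy as np

def pso_step(p, v, pbest, gbest, w=0.7, a1=1.5, a2=1.5, r1=0.5, r2=0.5):
    v_new = w * v + a1 * r1 * (pbest - p) + a2 * r2 * (gbest - p)
    return p + v_new, v_new

p = np.array([0.0, 0.0])          # current position
v = np.array([0.1, -0.1])         # current velocity
p_new, v_new = pso_step(p, v,
                        pbest=np.array([1.0, 0.0]),
                        gbest=np.array([2.0, 2.0]))
# v_new == [2.32, 1.43]; p_new == [2.32, 1.43]
```

In a full feature-selection run, each particle's position would be a (binarized) mask over the feature vector, with classifier accuracy as the fitness value; r1 and r2 would be redrawn uniformly from [0, 1] at every step rather than fixed.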
3.4.2 LASSO
Through LASSO feature selection, regression coefficients are shrunk, and many of
them are driven to zero. This serves to regularize the model parameters. As a result
of shrinkage, during this phase, the model must be refitted over every non-zero
coefficient. In statistical models, this technique minimizes the related prediction errors.
LASSO models offer a high degree of accuracy: owing to the shrinkage of coefficients,
accuracy increases as the variance is reduced, and bias is kept low.
It relies heavily on the parameter "λ", which is the tuning factor in shrinkage: the
larger "λ" becomes, the more coefficients are forced to zero. Additionally,
it is useful for eliminating all variables that are not correlated with the response
variable. Thus, in linear regression (LR), this algorithm shrinks the error present
in the model by providing an upper bound on the sum of squares of the coefficients.
If "λ" is a parameter, then the LASSO estimator is conditional on it. The
"λ" influences shrinkage, with an increase in "λ" increasing shrinkage. An inverse
relation exists between the upper bound on the coefficients and "λ": whenever
the upper bound rises, "λ" diminishes, and whenever the upper bound
is decreased, "λ" grows instantaneously.
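LASSO-based feature selection as described above can be sketched with scikit-learn's `Lasso` wrapped in `SelectFromModel`. The data below is synthetic, and the `alpha` value (the "λ" of the text) is an illustrative choice, not the one used in the paper.

```python
# Sketch of LASSO feature selection: coefficients of uninformative features
# are shrunk to zero, and only features with non-zero coefficients are kept.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.feature_selection import SelectFromModel

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1]     # only the first two features matter

selector = SelectFromModel(Lasso(alpha=0.1)).fit(X, y)
mask = selector.get_support()          # True for features with non-zero weight
X_best = selector.transform(X)         # reduced feature matrix
```

Increasing `alpha` forces more coefficients to zero (stronger shrinkage), which is exactly the "λ" behavior described in the text.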
The detection layer is equipped with machine learning algorithms for detecting
attacks using classification techniques. The ML system is employed for attack
prediction in the intrusion detection system. The best features (S_B) obtained from the
feature selection phase are used as the training (X_n) and testing (y_n) sets for the prediction
model. The prediction models employed are as follows:
3.5.1 HSDTKNN
Stacking is an ensemble-based machine learning approach. The advantage
of stacking is that it can harness the abilities of a range of well-performing
models on a classification or regression task and create predictions with better
performance than any single model in the ensemble. The proposed algorithm is entitled
"Hybrid Stacking Decision Tree with K-Neighbors Classifier." Tree-based models
are a class of nonparametric algorithms that work by partitioning the feature
space into a number of smaller regions with similar response values using a
set of splitting rules. Predictions are made by fitting a simpler
model in each region. Given training data X_n = {t_1, …, t_n}, the training data
X_n encompasses the attributes {T_1, T_2, …, T_n}, and each attribute T_n comprises
the attribute values {T_1i, T_2i, …, T_ni}; each instance of the input specifies a record
for a network packet. Each instance in the training data X_n has a specific class "y_n",
the class tag, i.e., the observed output of every record. The algorithm first searches
for multiple copies of the same instance in the training data X_n.
Stacking with a Decision Tree (DT) is employed to predict attacks in the network.
The ensemble classifier is designed by stacking a DT (meta) and KNN
(base) together. The DT classifier is integrated with the KNN to enhance the overall
training-time performance. From the significant feature set S_FS, the training
set X_n is constructed; each of the n feature elements is stacked with a
different assigned value. Afterward, the DT model is fitted to n − 1 portions of
the setup, while the predictions of the network are prepared on the nth part of the
stack. To fit the entire set X_n, the same process is repeated for every part of the
training set X_n(i). The stacked classifier KNN is fitted to both y_n and X_n. There are
two sets for training: a training set and a validation set. The validation set is used to
construct the new model, with evaluations performed on the set y_n.
The stacking of meta learners is much like trying to find the best
combination of base learners. In this classifier (HSDTKNN) (Table 3), the base
learner is KNN, followed by the meta learner, a Decision Tree (DT). The present
algorithm begins by specifying the number of base algorithms. This algorithm uses a
single base algorithm, "KNN," with specific associated parameters:
ten neighbors, KD-Tree computation, and the Euclidean distance measure.
The DT meta learner has other parameters, including a max depth of five levels, no
Table 3 Parameters of HSDTKNN
Parameter Value(s)
Base learner K-Neighbors Classifier (KNN)
Meta learner Decision Tree (DT)
Cross validation 5
Max depth (DT) 5
Random state (DT) None
Max leaf node (DT) 20
K-Neighbors (KNN) 10
Algorithm (KNN) KD-Tree
Distance metric (KNN) Euclidean
random state, and 20 maximum leaf nodes. Next, it performs k-fold cross-validation
with k = 5 to obtain predictions from the base algorithm. Having received
the predictions from the base learner, the meta learner generates the ensemble
predictions.
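The HSDTKNN arrangement of Table 3 can be sketched with scikit-learn's `StackingClassifier`, with KNN as the base estimator and a Decision Tree as the final (meta) estimator. The synthetic data and the exact training loop are our own; the paper's implementation may differ in detail.

```python
# Sketch of HSDTKNN: KNN base learner stacked under a Decision Tree meta
# learner with 5-fold cross-validation, per Table 3.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)

hsdtknn = StackingClassifier(
    estimators=[("knn", KNeighborsClassifier(n_neighbors=10,
                                             algorithm="kd_tree",
                                             metric="euclidean"))],
    final_estimator=DecisionTreeClassifier(max_depth=5, max_leaf_nodes=20,
                                           random_state=None),
    cv=5,   # 5-fold CV to build the out-of-fold predictions for the meta learner
)
hsdtknn.fit(X, y)
train_acc = hsdtknn.score(X, y)
```

The `cv=5` argument implements the step described above: the base learner's out-of-fold predictions on each of the five partitions form the input on which the meta learner is trained.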
3.5.2 HSDTSVM
The Decision Tree is a tree structure in which internal nodes represent tests on
attributes, branches represent outcomes, and leaf nodes represent class labels.
Subtrees rooted at new nodes are then created using the same procedure as above.
An algorithm based on "Hybrid Stacking Decision Trees Using Support Vector
Machines" is proposed in this work. SVMs are essentially binary classifiers that
divide classes by boundaries. An SVM is capable of reducing the empirical
classification error and increasing class separability by applying several transformations
simultaneously. As the margin reaches its maximum range, the separation between
classes is maximized. Suppose that "y_n = {x_i, y_i}" is a testing sample
containing two classes y_i = 1/0, and each sample is composed of attributes "x_i,
where i = 1, …, m". Compared with solo decision-making or learning models, DT and SVM are
more performant as a stack. Although the SVM is an accurate classification method,
its slow processing makes it a very slow method to train when dealing with
large datasets; the training phase of SVM has this critical flaw. To train enormous
datasets, an efficient data selection method based on decision trees and
support vector classification is needed. During the training phase of the proposed technique,
the training dataset for the SVM is reduced by using a decision tree. This addresses the issue
of selecting and constructing features by reducing the number of dataset dimensions.
An SVM can be trained using the disjoint regions uncovered by an SVM decision
tree. A smaller dataset thus identifies a more refined region than a larger one obtained
from the entire training set. The complexity of decision trees is reduced with small
learning datasets, despite the fact that the decision rules are more complex.
Table 4 Parameters of HSDTSVM
Parameter Value(s)
Base learner Support Vector Machine (SVM)
Meta learner Decision Tree (DT)
Cross validation 5
Max depth (DT) 5
Random state (DT) None
Max leaf node (DT) 20
Kernel Sigmoid
Co-efficient Sigmoid
Verbose True
In this classifier (HSDTSVM) (Table 4), the base learner is the support vector machine
(SVM) and the meta learner is the Decision Tree (DT). The workflow of the present
algorithm begins with specifying the number of base algorithms. Here, the base
algorithm (SVM) has its own parameters, such as a sigmoid kernel, a sigmoid
coefficient, and verbose set to true. The meta learner DT also has its own parameters,
such as a max depth of five levels, no random state, and twenty maximum leaf nodes.
Next, it performs k-fold cross-validation with k = 5 to predict the values
from the base algorithm. After getting the predictions from the base learner, the meta
learner generates the ensemble predicted output.
generalizes on testing data as it should and prevents overfitting, offering the chance
to train the model with a pool of parameters and pick the ones that generalize better
on testing data.
The parameter tuning for the TCB classifier in this work is listed in Table 5.
The table depicts the tuned values of the tuned parameters used during prediction,
covering aspects such as iterations, learning rate, loss function, depth, etc. With these tuned
parameters implemented, the proposed classifier increases in accuracy and decreases
in error rate during attack prediction.
The proposed IDS architecture is implemented in Python with the Anaconda environ-
ment on Ubuntu Linux, on a 64-bit system with an Intel Xeon E5-2600,
16 GB RAM, a GTX 1050Ti 4 GB graphics card and a 6 TB hard disk on a rack server.
The training and testing sets contain the attack and normal packet information
collected from the NSL-KDD dataset. The process of this experiment begins with the
cleaning and encoding module; the extracted non-duplicate and useful features are stored
in the feature vector. To reduce the machine learning algorithms' learning confu-
sion (zero value) problem, novel features (NSLKDD_CF) are constructed. Feature
selection (PSO, LASSO) is applied to the NSLKDD dataset to discard undesirable features
and select only the best ones. In the next step, classification (HSDTSVM, HSDTKNN, and
TCB) is performed using the training and test sets. Based on the detection accuracy,
the results are compared between NSLKDD and NSLKDD_CF (the existing dataset
features and the proposed custom features).
This section presents the evaluation results for the feature selection and
classification strategies. The evaluation results are computed for the
datasets NSL-KDD and NSL-KDD_CF (novel features) and depicted with illustra-
tions. The elapsed time (in seconds) evaluated for the feature selection strategies during
selection is displayed in Fig. 3. It shows that the proposed method LASSO consumes
less elapsed time than PSO.
Fig. 3 Elapsed time (in seconds) of the feature selection algorithms PSO and LASSO
Figure 4 illustrates the accuracy and error rate for NSLKDD and NSLKDD_CF
with the classifiers' performance. The accuracy ratio is higher and the error rate lower
for the proposed method TCB than for the other methods. Table 6 shows the TP, TN, FP, and FN
Fig. 4 Accuracy and error rate of the classifiers HSDTKNN, HSDTSVM and TCB on NSLKDD and NSLKDD_CF
values for the classifiers HSDTKNN, HSDTSVM and TCB with the datasets NSLKDD
and NSLKDD_CF.
Table 7 compares the classification metric performance of the three classifiers
HSDTKNN, HSDTSVM and TCB on the two datasets NSLKDD and NSLKDD_CF,
including accuracy, error rate, sensitivity, specificity, and miss rate.
(a) Accuracy (ACC) and Error Rate (ER):
One way to measure a machine learning algorithm's accuracy is to determine how
many data points it classifies correctly. The accuracy is the number of correctly
predicted data points out of all data points.
ACC = (PT + NT) / (PT + PF + NT + NF)
ACC_HSDTKNN = (62,755 + 57,800) / (62,755 + 57,800 + 4587 + 830) = 120,555 / 125,972 = 0.9569984
The error rate (ERR) is calculated as the number of all incorrect predictions
divided by the number of data points that were analyzed. Error rates are best at 0.0
and worst at 1.0.
ER = 1 − ACC
Figure 4 shows the accuracy and error rate for NSLKDD and NSLKDD_CF.
TCB has higher accuracy and a lower error rate.
(b) Sensitivity and Miss Rate:
The sensitivity (SN) is calculated as the number of correct positive predictions
divided by the total number of positives. It is also known as recall (REC) or
true positive rate (TPR). Sensitivity is best at 1.0 and worst at 0.0.
SE(TPR) = PT / (PT + NF)
SE_HSDTKNN = 62,755 / (62,755 + 830) = 0.9869466
The miss rate, or false negative rate (FNR), is calculated by dividing the false negative
predictions by the total number of true positives and false negatives. In terms of false
negative rates, the best rate is 0.0 and the worst is 1.0.
MR(FNR) = 1 − SE(TPR)
Fig. 5 Sensitivity and miss rate of the classifiers HSDTKNN, HSDTSVM and TCB on NSLKDD and NSLKDD_CF
The sensitivity and miss rate for the three classifiers and the two datasets are shown
in Fig. 5. The proposed method TCB attains high sensitivity and a low miss rate for
both the NSLKDD and NSLKDD_CF datasets.
(c) Specificity and Fall Out:
Specificity (SP) is calculated as the number of correct negative predictions divided
by the total number of negatives. It is also known as the true negative rate (TNR).
Specificity is best at 1.0 and worst at 0.0.
SP(TNR) = TN / (TN + FP)
SP_HSDTKNN = 57,800 / (4587 + 57,800) = 0.9264751
FO(FPR) = 1 − SP(TNR)
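The metric definitions above can be expressed in plain Python and checked against the HSDTKNN counts quoted in the text (TP = 62,755, TN = 57,800, FP = 4587, FN = 830); the function and dictionary keys are our own naming.

```python
# Accuracy, error rate, sensitivity, miss rate, specificity and fall-out
# computed from confusion-matrix counts, per the formulas in the text.
def metrics(tp, tn, fp, fn):
    acc = (tp + tn) / (tp + tn + fp + fn)    # accuracy
    sens = tp / (tp + fn)                    # sensitivity / recall / TPR
    spec = tn / (tn + fp)                    # specificity / TNR
    return {
        "ACC": acc, "ER": 1 - acc,           # error rate
        "SE": sens, "MR": 1 - sens,          # miss rate / FNR
        "SP": spec, "FO": 1 - spec,          # fall-out / FPR
    }

m = metrics(62755, 57800, 4587, 830)
# round(m["ACC"], 7) == 0.9569984
# round(m["SE"], 7)  == 0.9869466
# round(m["SP"], 7)  == 0.9264751
```

These reproduce the worked HSDTKNN values given for accuracy, sensitivity, and specificity above.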
Fig. 6 Specificity and fall-out of the classifiers HSDTKNN, HSDTSVM and TCB on NSLKDD and NSLKDD_CF
As shown in Fig. 6, the proposed method TCB provides higher specificity and lower
fall-out than the other classifiers for both NSLKDD and NSLKDD_CF.
4 Conclusion
An IDS for the IoT networking environment is implemented using machine learning
techniques by constructing the custom-feature dataset NSLKDD_CF. The custom features
are constructed with the motive of reducing the prediction time and increasing the
accuracy of attack identification, which is shown to be achieved. PSO and LASSO are used
for feature selection to discard undesirable features. The ensembled hybrid machine
learning classification algorithms HSDTSVM, HSDTKNN, and TCB classified the
attacks in the two datasets NSLKDD and NSLKDD_CF, and their performance was measured.
TCB with tuned parameters outperformed the others on both datasets. This work is limited
to two benchmark datasets; it can be extended with various IoT real-time
datasets and can be implemented as a product and deployed.
Optimization of Patch Antenna
with Koch Fractal DGS Using PSO
Abstract An edge-fed patch antenna is designed to operate in the Wi-Fi band of 5.2
GHz. To improve the bandwidth and achieve multiband operation, a Koch
fractal DGS structure was incorporated into the ground plane. On introduction of
fractal DGS, the antenna exhibits dual-band operation at 3.9 and 6.8 GHz. In order to
obtain the originally designed frequency of 5.2 GHz, the antenna structure was opti-
mized using Particle Swarm Optimization (PSO). The optimized antenna resonates
at 5.2 GHz and also at 3.5 GHz, which is the proposed frequency for 5G operations.
So our optimized antenna exhibits dual-band operation and is suitable for Wi-Fi and
5G applications. Also, it provides good gain in the operating frequency bands. This
novel antenna design approach provides dual-band operation with enhanced band-
width in compact size. The antenna structure was simulated and its performance
parameters evaluated using OpenEMS.
1 Introduction
antenna design that utilizes fractal shapes to design new antennas with improved
features. Two of the most important properties of fractal antennas are space filling and
self-similarity. Fractals exhibit self-similarity because they consist of multiple
scaled-down versions of themselves at various iterations; hence, a fractal antenna can
resonate at a large number of resonant frequencies and exhibit multiband behavior
[5]. The space-filling property can be used to pack electrically large
antennas into small areas, leading to miniaturization of the antenna structures [4].
Fractal shapes have been used to improve the features of patch antennas [6–9].
Defected Ground Structures (DGS) refers to some compact geometry that is etched
out as a single defect or as a periodic structure on the ground plane of a microwave
printed circuit board. The DGS slots have a resonant nature. They can be of different
shapes and sizes. Also, their frequency responses can vary with different equivalent
circuit parameters [7]. The presence of DGS is found to exhibit a slow wave effect,
which increases the overall effective length of the antenna, thereby reducing its
resonant frequency leading to antenna miniaturization [10]. To achieve maximum
slow wave effect, fractal structures can be etched to the ground plane. In [11], Koch
curve fractal DGS structure has been etched in the ground plane of a circularly
polarized patch antenna, which resulted in considerable improvement in terms of
better radiation efficiency, optimal return loss bandwidth and size reduction. In [12],
a Sierpinski carpet fractal DGS structure has been incorporated into a microstrip patch
to improve its performance, and the structure was optimized using PSO to achieve the
desired performance characteristics.
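As a side illustration of the self-similar geometry these DGS shapes rely on, a Koch curve can be generated by recursively replacing each segment with four shorter ones (our sketch, not code from the cited works):

```python
def koch_segment(p1, p2):
    """Replace one segment with the four segments of a single Koch iteration."""
    from math import cos, sin, pi
    (x1, y1), (x2, y2) = p1, p2
    dx, dy = (x2 - x1) / 3.0, (y2 - y1) / 3.0
    a = (x1 + dx, y1 + dy)          # one-third point
    b = (x1 + 2 * dx, y1 + 2 * dy)  # two-thirds point
    ang = pi / 3                    # apex: middle third rotated by +60 degrees
    apex = (a[0] + dx * cos(ang) - dy * sin(ang),
            a[1] + dx * sin(ang) + dy * cos(ang))
    return [p1, a, apex, b]

def koch_curve(p1, p2, iterations):
    """Polyline of the Koch curve after the given number of iterations."""
    pts = [p1, p2]
    for _ in range(iterations):
        new = []
        for s, e in zip(pts, pts[1:]):
            new.extend(koch_segment(s, e))
        new.append(pts[-1])
        pts = new
    return pts

# Each iteration multiplies the segment count by 4 and the length by 4/3,
# which is why etching such a curve lengthens the effective current path.
pts = koch_curve((0.0, 0.0), (1.0, 0.0), 2)
```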
A microstrip patch antenna has been designed for an operating frequency of
5.2 GHz. The substrate material used for this design is FR4 glass epoxy having a
relative permittivity of 4.4. A Koch Snowflake fractal structure has been introduced
to the ground plane of the designed antenna for multiband operation and wide-
band behavior. The modified antenna with the DGS structure resonates at two new
frequencies—3.9 and 6.8 GHz. The antenna structure was further optimized using
Particle Swarm Optimization (PSO), and the optimized antenna resonates at 3.5 and 5.2
GHz, with reasonably good gain. OpenEMS [13] and Octave software were used for
the antenna simulation and analysis.
2 Antenna Design
This section discusses the methodology adopted to design the antenna. In this
proposed work, an edge-fed patch antenna has been designed to operate in the Wi-Fi
band of 5.2 GHz. The introduction of a Koch Snowflake fractal DGS structure in the
ground plane results in improved performance in terms of multiband behavior. How-
ever, the frequency of operation now shifts from the originally designed frequency
of 5.2 GHz. By using Particle Swarm Optimization, the patch antenna dimensions
and the DGS structure are optimized, so that the optimized antenna operates at
5.2 GHz together with a second resonant frequency of 3.5 GHz.
The patch antenna is a hugely popular antenna used for a wide array of applications,
like in satellite communication, mobile communication and aerospace application.
A basic rectangular patch antenna can be designed by the following equations [1]:
W = (c / (2 f_o)) · sqrt(2 / (ε_r + 1))    (1)

ε_reff = (ε_r + 1)/2 + ((ε_r − 1)/2) · (1 + 12h/W)^(−1/2)    (2)

L = c / (2 f_o · sqrt(ε_reff)) − 2ΔL    (4)
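The design equations can be evaluated numerically for the stated 5.2 GHz target on FR4 (ε_r = 4.4). Note the assumptions in this sketch: the substrate height h is not given in the text, so a typical FR4 value of 1.6 mm is assumed, and the length-extension ΔL (the paper's eq. (3), not reproduced above) is taken from the standard Hammerstad formula in [1]:

```python
from math import sqrt

C = 3.0e8       # speed of light, m/s
f0 = 5.2e9      # design frequency from the text, Hz
eps_r = 4.4     # FR4 relative permittivity from the text
h = 1.6e-3      # substrate height in m -- ASSUMED typical FR4 value

# Eq. (1): patch width
W = C / (2 * f0) * sqrt(2.0 / (eps_r + 1))

# Eq. (2): effective permittivity
eps_eff = (eps_r + 1) / 2 + (eps_r - 1) / 2 * (1 + 12 * h / W) ** -0.5

# Length extension (standard Hammerstad formula, standing in for eq. (3))
dL = 0.412 * h * ((eps_eff + 0.3) * (W / h + 0.264)) / \
     ((eps_eff - 0.258) * (W / h + 0.8))

# Eq. (4): patch length
L = C / (2 * f0 * sqrt(eps_eff)) - 2 * dL

print(f"W = {W * 1e3:.2f} mm, L = {L * 1e3:.2f} mm, eps_eff = {eps_eff:.3f}")
```

With these assumptions the script gives a patch of roughly 17.6 mm by 13.2 mm; the actual dimensions depend on the real substrate height and the subsequent PSO step.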
The microstrip patch antenna was designed and simulated using openEMS software
using Octave interface. OpenEMS is a free and open source electromagnetic field
solver which utilizes the FDTD (finite-difference time-domain) technique. It supports
both cylindrical and Cartesian coordinate systems. Octave provides a flexible
scripting interface for OpenEMS.
The reflection coefficient (S11 ), 2D and 3D radiation pattern of the patch antenna are
as shown in Figs. 4, 5 and 6.
The antenna resonates at 5.2 GHz and its reflection coefficient is −18 dB as shown
in Fig. 4. Its 2D and 3D pattern are as shown in Figs. 5 and 6. The gain of the antenna
is 5.14 as shown in Fig. 6.
For improving the performance of the patch antenna, a Koch Snowflake fractal DGS
structure was introduced to the ground plane. Now the antenna operates at two
frequencies—3.9 and 6.8 GHz. However the operating frequency of the antenna has
shifted from its originally designed resonant frequency of 5.2 GHz. As a means of
obtaining the original operating frequency of 5.2 GHz, Particle Swarm Optimization
has been applied to the antenna structure. Using PSO, the dimensions of the fractal
DGS and the patch dimensions has been optimized so that the antenna still resonates
at 5.2 GHz, along with a new operating frequency of 3.9 GHz.
The reflection coefficient (S11 ), 2D and 3D pattern of the patch antenna with
fractal DGS are as shown in Figs. 7, 8 and 9.
The antenna now resonates at two frequencies, 3.9 and 6.8 GHz, and its reflection
coefficient values are −15 dB and −10 dB, as shown in Fig. 7. Its 2D and 3D patterns
are shown in Figs. 8 and 9, respectively.
The antenna gain is 3.376 as shown in Fig. 9.
F(w, l, L) = 1 / Gain(f) + λ × S11(f)    (5)
4 Conclusion
The design and simulation of a microstrip patch antenna for an operating frequency
of 5.2 GHz are presented in this paper. A Koch Snowflake fractal DGS structure was
incorporated into the ground plane for improving the antenna performance. However,
its resonating frequency deviated from its originally designed frequency of 5.2 GHz.
So the antenna structure was optimized using PSO. The optimized antenna resonates
at 3.5 GHz and 5.2 GHz and shows a marked improvement in bandwidth, as shown in
Table 2. The antenna can be used for Wi-Fi and 5G mobile applications.
References
1. Balanis CA (2005) Antenna theory: analysis and design, 3rd edn. Wiley, New York
2. Pozar DM (1992) Microstrip antennas. Proc IEEE 80(1):79–81
3. Maci S, Biffi Gentili G (1997) Dual-frequency patch antennas. IEEE Antennas Propag Mag
39(6):13–20
4. Werner DH, Ganguly S (2003) An overview of fractal antenna engineering research. IEEE
Antennas Propag Mag 45(1)
5. Sindou M, Ablart G, Sourdois C (1999) Multiband and wideband properties of printed fractal
branched antennas. Electron Lett 35:181–182
6. Petko JS, Werner DH (2004) Miniature reconfigurable three-dimensional fractal tree antennas.
IEEE Trans Antennas Propag 52(8):1945–1956
7. Masroor I, Ansari JA, Saroj AK (2020) Inset-fed cantor set fractal multiband antenna design for
wireless applications. International Conference for Emerging Technology (INCET) 2020:1–4
8. Yu Z, Yu J, Ran X (2017) An improved koch snowflake fractal multiband antenna. In: 2017 IEEE
28th Annual international symposium on Personal, Indoor, and Mobile Radio Communications
(PIMRC), 2017, pp 1–5
9. Tiwari R (2019) A multiband fractal antenna for major wireless communication bands. In: 2019
IEEE International Conference on Electrical, Computer and Communication Technologies
(ICECCT), 2019, pp 1–6
10. Guha D, Antar YMM (2011) Microstrip and printed antennas–new trends, techniques and
applications, 1st edn. Wiley, UK
11. Ratilal PP, Krishna MGG, Patnaik A (2015) Design and testing of a compact circularly
polarised microstrip antenna with fractal defected ground structure for L-band applications.
IET Microwaves Antenna Propag 9(11):1179–1185
12. Kakkara S, Ranib S (2013) A novel antenna design with fractal-shaped DGS using PSO for
emergency management. Int J Electron Lett 1(3):108–117
13. Liebig T, Rennings A, Erni D (2012) OpenEMS a free and open source Cartesian and cylin-
drical EC-FDTD simulation platform supporting multi-pole drude/lorentz dispersive material
models for plasmonic nanostructures. In: 8th Workshop on numerical methods for optical
nanostructures
14. Pozar DM (2012) Microwave engineering, 4th edn. Wiley, New York
Artificial Intelligence-Based
Phonocardiogram: Classification Using
Cepstral Features
cepstral coefficients. The PhysioNet PCG training dataset is used in the experiments.
This work compares KNN with SVM classifiers, indicating that KNN is more
accurate. Furthermore, the results indicate that statistical features derived from PCG
Mel-frequency cepstral coefficients outperform both frequently used wavelet-based
features and conventional cepstral coefficients, including MFCCs.
1 Introduction
research work summarises our effort’s idea, including the hardware and software
phases, as well as the works that inspired us. We were able to solve the problem of
heart sound analysis by developing a low-cost and effective biotechnological system
for capturing and processing PCG signals.
2 Literature Review
Luisada et al. [2] performed a clinical and phonocardiographic study on 500 school-age
children, using the phonocardiogram technique to collect data. Three examiners
conducted clinical auscultation, and the phonocardiographic test revealed 114 (22.8%)
abnormal occurrences. The study found no correlation between the heart sounds and the
child's height or weight. Tae H. Joo et al. examined phonocardiograms
(PCGs) of aortic heart valves [3]. They identified frequency domain characteris-
tics by using a parametric signal modelling method that was designed specifically
for sound wave categorization purposes. According to the model, a high-resolution
spectral estimate is provided, from which the frequency domain characteristics may
be deduced. PCGs were classified using two stages of classification: feature selection
first, followed by classification. The classifiers are trained on the locations of the two
maximum spectral peaks.
The classifier successfully identified 17 patients out of a total of 20 cases in
the training set. A method for assessing children for murmur and adults for valve
abnormalities was established by Lukkarinen et al. [4] for situations where ultrasonography
exams are not readily accessible. The equipment comprises a stand-alone
electronic stethoscope, stethoscope-mounted headphones, a sound-capable personal
computer, and software applications for capturing and analysing heart sounds. It is
possible to perform a number of operations and research on heart beat and murmur
in the 20 Hz to 22 kHz range of frequencies thanks to the technology that has been
created. The authors of [5] highlighted the essential phases involved in the generation
and interpretation of PCG signals. Their article discusses how to filter and extract
characteristics from PCG signals using wavelet transforms, highlights the gaps between
existing methods of heart sound signal processing and their clinical application, notes
the limitations of current diagnostic methods, namely their complexity and expense, and
addresses the need for systems capable of correctly acquiring, analysing and interpreting
heart sound data to aid in clinical diagnosis.
B. Artificial Intelligence-Based Techniques
Shino et al. [6] present a technique for automatically categorising the phonocardiogram
using an Artificial Neural Network (ANN). A national phonocardiogram
screening of Japanese students was used to validate the method. 44 systolic murmurs,
61 innocent murmurs and 36 normal recordings are highlighted in the test findings.
The melodic murmur was effectively isolated from the potentially dangerous
systolic murmur via the use of frequency analysis. When it comes to making the
final decision, the procedure is very beneficial for medical professionals. Strunic et al.
[7] created an alternative method for PCG classification by using Artificial Neural
Phonocardiography (PCG) is a method for capturing and visualising the sounds
produced by the human heart during a cardiac cycle [9]; it is used in diagnosing heart
conditions. The technique produces a recording called a phonocardiogram. Various
dynamic processes occurring within the circulatory system, such as the relaxation of
the atria and ventricles, valve motion and blood flow, produce these sounds. When it
comes to screening and detecting heart rhythms in healthcare settings, the well-known
stethoscope method has long been the gold standard. Auscultation of the heart is the
practice of determining the acoustic properties of heart sounds and murmurs, including
their frequency, intensity, number of sounds and murmurs, duration and quality.
One significant disadvantage of conventional auscultation is that it relies on
subjective judgement on the part of the physician, which may lead to mistakes in
sound perception and interpretation, thus impairing the accuracy of the diagnosis. The
creation of four distinct heart sounds occurs throughout a cardiac cycle. During the
first heartbeat of systole, the first cardiac sound, often abbreviated S1, is generated by
the turbulence induced by the mitral and tricuspid valves closing simultaneously. The
aortic and pulmonic valves close, resulting in the production of the second cardiac
sound, which is represented by the term “dub”. When a stethoscope is put on the
chest, as doctors do, the first and second heart sounds are easily distinguishable in a
healthy heart, as are the third and fourth heart sounds (Fig. 1).
The low-frequency third heart sound (S3) is usually produced by the ventricular
walls vibrating in response to the abrupt distention caused by the pressure difference
between the ventricles and the atria. Under normal circumstances it is heard only
in youngsters and in individuals suffering from heart problems or ventricular
dilatation [10]. S4 is very seldom heard in a normal heart sound because it is produced
by vibrations in expanding ventricles caused by contracting atria, which makes it
difficult to detect. Each of the four heart sounds has a
distinct frequency range, with the first (S1) being [50–150] Hz, the second [50–200]
Hz, the third (S3) [50–90] Hz, and the fourth (S4) being [50–80] Hz. Moreover, the
S3 phase starts 120–180 ms after the S2 phase and the S4 phase begins 90 ms before
the S1 phase.
4 Existing System
The well-known stethoscope technique is the usual method of screening and diag-
nosing heart sounds in primary health care settings. Auscultation of the heart is the
study of determining the acoustic properties of heart sounds and murmurs, including
their frequency, intensity, number of sounds and murmurs, length of time and quality.
One significant disadvantage of this technique of auscultation is that it relies on
subjective judgement on the part of the physician, which may lead to mistakes in
sound perception, thus impairing the validity of the diagnostic information. Figure 2
depicts a block diagram representation of a common phonocardiogram configuration.
It is necessary to detect sound waves using a sensor, which is most often a high-fidelity
microphone. After the observed signal has been processed using a signal conditioner
such as a pre-filter or amplifier, the signal is shown or saved on a personal computer.
The CZN-15E electret microphone, two NE5534P type amplifiers, a transducer
block and connector (3.5 jack) for signal transmission to the computer, as well as a
12 V DC power supply, are the main hardware components of this workstation’s hard-
ware architecture. This particular sensor is an electret microphone, which does not
need an external power source to polarise the voltage since it is self-contained. Using
the CZN15E electret microphone in this system is something that’s being explored
[11, 12]. Table 1 has a listing of the suitable electret microphone (CZN15E) as well
as the microphone’s physical characteristics. In response to the heart’s vibrations,
vibrating air particles are sent to the diaphragm, which then regulates the distance
between the plates as a result of the vibrations transmitted to it. As air passes through
the condenser, the electret material slides over the rear plate, causing a voltage to be
generated. The voltage produced is very low, and it is necessary to provide it to the
amplifier in order for it to operate at its best. The amplifier needed for high-speed
audio should have a low-noise floor and use a minimal amount of electrical power.
A special operational amplifier, the NE5534P, was developed specifically for this
purpose.
5 Proposed System
The hardware component of this project is not used here; only the simulation part,
which consists of a few simple stages (for example, determining the cepstral
properties), is employed. To accurately analyse heart sounds, which are non-stationary
in nature, the wavelet
transform is the most appropriate technique for the task at hand. A wavelet trans-
form is a representation of data that is based on time and frequency. Using cepstrum
analysis has many advantages. First, the cepstrum is a representation used in homo-
morphic signal processing to convert convolutionally mixed signals (such as a source
and filter) into sums of their cepstra, which can then be used to linearly separate the
signals. The power cepstrum is a feature vector that may be used to model audio
signals and is very helpful. The method to feature extraction is the most impor-
tant component of the pattern recognition process since it is the most accurate. To
measure the features of each cardiac cycle, the complete cycle must be analysed using
cepstral coefficients, which efficiently determine the log-spectral distance between
two frames. In this part, we compare two models, K-Nearest Neighbour (KNN)
and Support Vector Machine (SVM) [13], to determine which is the better choice for
binary classification problems. KNN achieves exceptional accuracy.
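The cepstral idea behind the feature extraction can be sketched in a few lines. This is our illustration, not the authors' pipeline (a real implementation would use an FFT library, and MFCCs additionally apply a mel filterbank and a DCT):

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform of a real sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def real_cepstrum(x):
    """Real cepstrum: inverse DFT of the log magnitude spectrum."""
    spec = dft(x)
    log_mag = [math.log(abs(v) + 1e-12) for v in spec]  # floor avoids log(0)
    n = len(x)
    return [sum(log_mag[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n for t in range(n)]

# Toy "frame": a decaying sinusoid standing in for one cardiac-cycle segment
frame = [math.exp(-t / 32.0) * math.sin(2 * math.pi * t / 8.0) for t in range(64)]
ceps = real_cepstrum(frame)
```

Distances between such cepstral vectors approximate log-spectral distances between frames, which is the property the text relies on.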
Artificial Intelligence-Based Phonocardiogram: Classification … 179
Fig. 4 Diagram of
classification 1
• As we can see, the three nearest neighbours all belong to category A, indicating
that this new data point must belong to category A as well (Fig. 5).
The Support Vector Machine (SVM) is a frequently used supervised learning method
for classification and regression problems, although in machine learning it is mostly
applied to classification. The goal of the SVM method is to find the optimal line or
decision boundary that divides n-dimensional space into classes, enabling future data
points to be categorised with ease. A hyperplane is the mathematical term for this
optimal decision boundary.
The SVM algorithm determines the hyperplane from the extreme points/vectors of
the dataset. These extreme cases are referred to as support vectors, which is why the
technique is called a Support Vector Machine. Consider the diagram, which illustrates
two distinct categories divided by a decision boundary or hyperplane (Fig. 6).
Fig. 6 Diagram of classification 1
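The KNN side of the comparison can be sketched in a few lines of plain Python (our illustration of the majority-vote rule described above, not the authors' implementation; the toy 2-D points stand in for cepstral feature vectors):

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs.
    """
    neighbours = sorted(train, key=lambda fv_lab: dist(fv_lab[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Toy 2-D feature vectors standing in for cepstral features
train = [((0.0, 0.0), "normal"), ((0.2, 0.1), "normal"), ((0.1, 0.3), "normal"),
         ((2.0, 2.0), "abnormal"), ((2.2, 1.9), "abnormal"), ((1.8, 2.1), "abnormal")]

print(knn_predict(train, (0.1, 0.1)))  # -> normal
print(knn_predict(train, (2.1, 2.0)))  # -> abnormal
```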
6 Experimental Setup
Table 2 Hardware requirement
Processor: PC with a Core i3 processor (recommended)
RAM: 4 GB (recommended)
Hard disk: 320 GB (recommended)
7 Experimental Results
7.1 Training
The dataset is preprocessed prior to training. Randomised augmentation enlarges
the training dataset and makes the trained network insensitive to abnormalities in the
image data. Pre-processing includes resizing and grayscale conversion. Signals were
classified into two groups: “Normal” and “Abnormal”. Training progress can be
monitored through several metrics: when the “Plots” option in trainingOptions is set
to “training-progress”, trainNetwork generates a figure and shows training metrics
for each iteration. Each iteration computes the gradient and updates the network
parameters. If the training options include validation data, the figure also displays
validation metrics each time trainNetwork validates the network. The plot provides
information on training accuracy, validation accuracy and training loss (Fig. 7).
The confusion matrix and receiver operating characteristic curves illustrate the
system’s performance. The confusion matrix chart is constructed from the true and
predicted labels: its rows correspond to the true class and its columns to the predicted
class. Diagonal and off-diagonal cells represent correctly and incorrectly classified
observations, respectively (Fig. 8).
In this binary classification, the first green cell denotes the positive class (abnormal)
and the second green cell the negative class (normal). We picked a total of 19 abnormal
signals and 29 normal signals for testing. For the abnormal signals, the true positives
(TP) are 19 (green in the confusion matrix) and the false negatives (FN) are zero (pink).
That is, all 19 abnormal signals are correctly predicted as abnormal, so the false
negative count of zero corresponds to one hundred per cent accuracy for this class. For
the 29 normal signals, five false positives and twenty-four true negatives are displayed:
5 normal signals are incorrectly classified as abnormal, while 24 are correctly classified
as normal, which equates to an accuracy of 82.8%. Overall, the accuracy is 95%.
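The per-class figures quoted above follow directly from the confusion-matrix counts; a quick check (our sketch, helper name ours):

```python
def class_recall(correct: int, total: int) -> float:
    """Fraction of a class's signals that were classified correctly."""
    return correct / total

# Counts read from the confusion matrix in the text
abnormal_acc = class_recall(19, 19)  # all 19 abnormal signals correct -> 1.0
normal_acc = class_recall(24, 29)    # 24 of 29 normal signals correct -> ~0.828
```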
The receiver operating characteristic (ROC) curve is a graph that illustrates the
performance of a classification model over all classification thresholds. The curve
plots the true positive rate (y-axis) against the false positive rate (x-axis). “True
Positive Rate” is another term for recall.
It is defined as follows:
TPR = TP / (TP + FN)

FPR = FP / (FP + TN)
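The threshold sweep that produces the ROC points can be sketched directly from these definitions (our illustration with toy scores, not the authors' data):

```python
def roc_points(scores, labels, thresholds):
    """(FPR, TPR) at each threshold; predict positive when score >= threshold."""
    pts = []
    for th in thresholds:
        tp = sum(s >= th and y == 1 for s, y in zip(scores, labels))
        fp = sum(s >= th and y == 0 for s, y in zip(scores, labels))
        fn = sum(s < th and y == 1 for s, y in zip(scores, labels))
        tn = sum(s < th and y == 0 for s, y in zip(scores, labels))
        pts.append((fp / (fp + tn), tp / (tp + fn)))
    return pts

# Toy classifier scores (1 = abnormal, 0 = normal)
scores = [0.9, 0.8, 0.7, 0.55, 0.4, 0.3, 0.2, 0.1]
labels = [1,   1,   0,   1,    0,   1,   0,   0]
pts = roc_points(scores, labels, thresholds=[0.0, 0.35, 0.6, 1.1])
```

Lowering the threshold moves the operating point up and to the right, exactly as the text describes: both the true positive and false positive counts grow.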
The receiver operating characteristic (ROC) curve depicts the relationship between
TPR and FPR over a range of classification thresholds. Lowering the threshold for
positive classification labels more items as positive, increasing both false positives
and true positives. A typical ROC curve is shown in the accompanying figure. To
compute the points on a ROC curve, we could evaluate a logistic regression model
several times with different classification thresholds, but this would be inefficient;
fortunately, there is an efficient, sorting-based approach, associated with AUC, that
provides this information. In this case, the resulting curve is nonlinear (Fig. 9).
AUC stands for “Area Under the ROC Curve”: it measures the entire two-dimensional
area beneath the receiver operating characteristic curve (think integral calculus) from
(0,0) to (1,1).
Dataset Testing Results:
Abnormal Case
See Fig. 10.
Normal Case
See Fig. 11 and Table 4.
8 Future Scope
Similarly, electrocardiogram (ECG) data should be analysed using the artificial
intelligence (AI) approach. Rather than just classifying cardiovascular diseases as
normal or abnormal, this future work will identify them by name. Finally, it is
recommended that the PCG and ECG techniques be integrated and applied to heart
disease diagnostics in order to enhance the prediction of coronary artery disease.
9 Conclusion
This work comprises three stages: the first involves gathering phonocardiogram data;
the second involves creating an artificial-intelligence-based computer system for
automatically distinguishing normal from pathological heart sounds; and the third is
cardiovascular disease screening of a limited group of PCG recordings as part of
social responsibility. The procedure has been completed in its entirety. After acquiring
the PCG signal, features were extracted using cepstral coefficients and classification
was conducted using the KNN and Support Vector Machine (SVM) techniques; KNN
proved the better choice. We used PCG data from the well-known PhysioNet online
service for training and testing. The training procedure is significantly faster than with
previous feature extraction techniques.
Abstract Diabetic retinopathy is an issue that impacts the eyes due to diabetes. The
problem is caused by damage to the blood vessels in the light-sensitive tissue of the
eyeball. Early diagnosis is becoming extremely crucial to saving sight in many
patients. This work classifies retinal images to identify patients with diabetic
retinopathy. A convolutional neural network has been developed using the K-Fold
cross-validation technique to make the above diagnosis and give highly accurate
results. The image is put through convolution and max-pooling layers activated with
the ReLU function before being categorized; the softmax function then completes the
process by activating the neurons in the dense layers. As the system learns, accuracy
improves while the loss decreases. Image augmentation is applied before training to
reduce overfitting. The convolutional neural network gave a total validation accuracy
of 89.14%, recall of 82%, precision of 83%, and F1-score of 81%.
1 Introduction
Diabetes is a long-term condition in which blood sugar levels rise due to a lack of
insulin [1]. It impacts 425 million adults globally. Diabetes influences the retina,
nerves, heart, and kidneys [2].
Diabetic retinopathy (DR) is a common cause of eyesight loss. DR will impact
191 million individuals worldwide by 2030 [3]. It happens when diabetes harms
Fig. 1 Steps of DR
the blood vessels in the eyes. It can cause blind spots and blurred vision. It can
affect a person with diabetes, and it often affects both eyes.
DR is a progressive disease, and thus medical experts recommend that people with
diabetes be examined at least twice a year for indications of the sickness [4]. There
are four stages of DR: Mild Nonproliferative DR (Mild NPDR), Moderate
Nonproliferative DR (Moderate NPDR), Severe Nonproliferative DR (Severe NPDR),
and Proliferative DR (PDR). Figure 1 shows the stages of DR.
In the past, researchers have worked on improving the efficiency of DR
screening, which detects lesions such as microaneurysms, hemorrhages, and
exudates, and have established a range of models to do so. All the strategies presented
thus far are based on DR prediction and feature extraction.
Convolutional Neural Networks (CNN) or Deep CNN (DCNN) have been
frequently utilized to extract and classify information from fundus images. Recently,
Deep Learning has been widely used in DR detection and classification. It can
successfully learn the features of input data even when many heterogeneous sources
are integrated [5].
This paper’s main contribution is the establishment of an automatic DR detection
technique that depends on a small dataset. With the goal of achieving end-to-end
real-time classification from input images to patient conditions, we classify
fundus imagery based on the severity of DR. Image data preprocessing approaches
are applied to extract multiple significant features and then classify the images
into their corresponding categories. We assess the model’s Accuracy, Recall,
Precision, and F1-Score.
Severity Classification of Diabetic Retinopathy … 195
The rest of the article is organized as follows: Section 2 discusses some relevant
research. We explain our proposed methods in Sect. 3. Section 4 outlines the
experimental findings and assesses the system’s performance. The conclusion is provided
in the final section.
2 Related Work
more realistic. For that purpose, the customized CNN model with the K-Fold cross-validation
(K-Fold CV) technique is developed here. CNNs are simpler to train and
require far fewer parameters than fully connected networks with the same number
of hidden units. The K-Fold CV method can balance out the predicted feature classes
when dealing with an unbalanced dataset. This prevents the proposed model from
overfitting the training dataset.
3 Proposed Work
The deep learning CNN strategy is among the most important and adaptive strategies
in detection tasks. It works especially well in image data classification, particularly
in the study of retinal fundus images. CNN can extract useful information from
images, obviating the need for time-consuming manual image processing. Its huge
popularity is due to its architecture, which eliminates the necessity for manual
feature extraction. These features of CNN motivate us to use customized CNN models for our
work. Figure 2 shows the general workflow of the proposed system.
It consists of four main steps: preprocessing, data augmentation, model training,
and testing. The preprocessing step includes image resizing, image normalization, and
label encoding. The image augmentation step is used to reduce overfitting issues.
The model is trained using the CNN with the K-Fold cross-validation technique.
Finally, measures such as Precision, Recall, Accuracy, and F1-score are calculated
to evaluate the findings and compare them to commonly used methodologies.
3.1 Dataset
The collection of data is an important aspect of the experiments for the proposed
technique’s analysis. The dataset must be chosen carefully since it must contain a
diverse collection of images. For this work, we have obtained a dataset from APTOS
3.2 Preprocessing
The fundus photographs used in this method are collected from Kaggle in various
forms. So it is necessary to apply preprocessing stages. Here, we apply various
image preprocessing stages such as Image Resizing, Normalizing Image, and Label
Encoder.
Images are resized to 128 × 128 pixels to be made ready as input to the system.
The technique of adjusting a collection of pixel values to make an image more visible
or standard to the senses is known as image normalization. It is used to eliminate
noise from images. By dividing by 255, the pixel values are rescaled to the range 0 to 1.
Label encoding substitutes a numeric value ranging from zero to the number of
classes minus one for the categorical variable; since there are five distinct classes,
the labels 0, 1, 2, 3, and 4 are used.
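The three preprocessing steps above can be sketched as follows. The nearest-neighbour resize and the function names are illustrative assumptions; the paper does not specify which resizing routine was used, only that NumPy is involved:

```python
import numpy as np

def resize_nn(img, size=(128, 128)):
    """Nearest-neighbour resize to 128 x 128 (an illustrative stand-in
    for whatever resizing routine the authors actually used)."""
    h, w = img.shape[:2]
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return img[rows][:, cols]

def preprocess(images, labels, class_names):
    """Rescale pixel values to [0, 1] and label-encode class names as 0..n-1."""
    x = np.stack([resize_nn(np.asarray(im)) for im in images]).astype(np.float32) / 255.0
    lookup = {name: idx for idx, name in enumerate(class_names)}
    y = np.array([lookup[lbl] for lbl in labels])
    return x, y
```

With the five DR classes ordered as in the dataset description, "Moderate" would be encoded as 2.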
Here, the dataset is divided into two phases: a training phase and a testing phase.
80% of the data is used for training and 20% for testing. The training
phase comprises known outputs, and the model learns from them in order to handle new
data in the future. The testing dataset is used to evaluate our model’s predictions
on this subset. The full dataset consists of 3699 fundus images [11], which are divided into
2959 training and 740 testing images.
In the training set, a total of 2959 images of which 1590 images are labeled as
NO DR, 259 images are labeled as Mild, 751 images are labeled as Moderate, 132
images are labeled as Severe and 227 images are labeled as Proliferative.
In the testing set, a total of 740 images of which 384 images are labeled as NO
DR, 56 images are labeled as Mild, 208 images are labeled as Moderate, 36 images
are labeled as Severe and 56 images are labeled as Proliferative. Figure 4 shows the
number of images in each class for training and testing sets.
CNN (deep learning) models are used to classify the DR images. The efficiency of the
algorithms can be improved by data augmentation. Data augmentation is applied to
the training data to improve its quality, size, and variety. The parameters used
in data augmentation are shown in Table 1.
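A minimal augmentation sketch is shown below using NumPy flips and 90-degree rotations. These particular transforms and their probabilities are illustrative assumptions only; the paper's actual augmentation parameters are those listed in Table 1:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img):
    """Apply a random horizontal flip and a random 90-degree rotation
    (illustrative transforms; Table 1's settings are not reproduced here)."""
    if rng.random() < 0.5:
        img = np.fliplr(img)   # horizontal flip
    k = int(rng.integers(0, 4))
    return np.rot90(img, k)    # rotation in 90-degree steps

def augment_batch(images, copies=2):
    """Expand the training set by generating `copies` variants per image."""
    out = list(images)
    for img in images:
        out.extend(augment(img) for _ in range(copies))
    return out
```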
Fig. 4 Number of images in each class for training and testing sets
3.5 K-Fold CV
To segregate the training dataset, the K-Fold CV resampling method is used. This aids
in evaluating the CNN model’s capability. Here, the training dataset is separated
into 5 independent folds, with 4 folds utilized to train the algorithm
and the remaining one saved for validation. This procedure is repeated until each
fold has been used exactly once as a validation set. 5-Fold CV is depicted in Fig. 5.
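The fold rotation described above can be sketched as a small index generator. This is a plain NumPy sketch rather than the authors' implementation:

```python
import numpy as np

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices into k folds; each fold serves once as the
    validation set while the remaining k-1 folds train the model."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val
```

For the 2959 training images with k = 5, each iteration trains on roughly 2367 images and validates on roughly 592.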
The first and most significant layer in a CNN is the convolution layer. It is also
called the feature selection layer because it is where the image’s features are
extracted. The image is supplied to the filters in convolution, which is the pointwise
multiplication of two functions to build a third function. A convolved feature is a
matrix created by applying a kernel to an image and computing the convolution operation.
In the example illustrated in Fig. 7, there is a 5 × 5 input image with pixel values
of 0 or 1. A 3 × 3 filter matrix is also given. The filter matrix slides over the image
and computes the dot product to produce the convolved feature matrix.
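The sliding-window computation of Fig. 7 can be sketched as a plain "valid" convolution with stride 1 (as in most CNN literature, this is technically cross-correlation, with no kernel flip):

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide the kernel over the image and take the element-wise
    product sum at each position ('valid' padding, stride 1)."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out
```

A 5 × 5 input with a 3 × 3 kernel yields a 3 × 3 convolved feature matrix, matching the example in Fig. 7.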
A pooling layer is utilized to reduce the spatial dimensions of a source volume; the
depth of the source is not reduced by this layer. It is often used to reduce
the image’s dimensionality, minimizing the processing power needed to process it.
The highest value present in a specified kernel is retained in max pooling, while
all other values are discarded. In average pooling, the mean of all of the
values in a kernel is stored.
In the example, the max-pooling operation is applied to a 4 × 4 input, which is
easily divided into separate regions. The output is 2 × 2, and each of the outputs is
simply the maximum value from the corresponding shaded zone. In average pooling,
each output is instead the average of the values in its region (e.g., the average of
the numbers in green is 10). Figure 8 illustrates the pooling operation.
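The 4 × 4 example above can be sketched with a reshape-based pooling routine (a NumPy sketch, not the authors' code):

```python
import numpy as np

def pool2d(x, size=2, mode="max"):
    """Non-overlapping pooling: partition x into size x size regions and
    keep either the maximum or the mean of each region."""
    h, w = x.shape
    blocks = x[:h - h % size, :w - w % size].reshape(
        h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))
```

For a 4 × 4 input and a 2 × 2 window, both modes produce a 2 × 2 output, as in Fig. 8.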
The final outputs of the pooling and convolution layers are flattened and connected
to one or more fully connected layers, also termed Dense layers. When the layers are
fully connected, all neurons in the preceding layer are connected with all neurons in
the subsequent layer. Fully connected layers are also known as linear layers
[13]. Figure 9 depicts the FC layer.
The results obtained from the proposed method are compared with the existing
method. The experiments are performed in Jupyter Notebook. Keras is used with
the TensorFlow machine learning back-end library.
In the deep learning process, optimizers are crucial. Here, the Adam optimizer is
utilized; it is efficient and requires minimal memory. In this work, the learning rate
and the number of epochs are set to 0.0001 and 20, respectively.
The batch size is set to 32 images. With all these parameters, the
neural network has been designed to learn. The algorithm’s performance is measured with
different metrics: Accuracy, Confusion Matrix, Precision, Recall, and F1 Score.
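Given the stated settings (two convolutional layers, three dense layers, ReLU and softmax activations, Adam with learning rate 0.0001), a Keras sketch might look as follows. The filter counts (32, 64), kernel sizes, and dense-layer widths (128, 64) are illustrative assumptions, since the paper does not list them here:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(input_shape=(128, 128, 3), n_classes=5):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        # Two convolution + max-pooling stages activated with ReLU
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        # Three fully connected (dense) layers; softmax on the output
        layers.Dense(128, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])
    # Adam optimizer with the paper's learning rate of 0.0001
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Training would then call `model.fit(x, y, epochs=20, batch_size=32)` on each cross-validation fold.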
An image editing tool is employed for contrast adjustment, color balance adjustment,
rotation, and cropping. The NumPy package is used for image resizing.
Various evaluation measures are used to assess the quality of a model. We evaluated
our proposed model using Accuracy, Precision, Recall, and F1 Score, where TP =
True Positive, TN = True Negative, FP = False Positive, and FN = False Negative.
4.2.1 Accuracy
Accuracy = (TP + TN) / (TP + FP + FN + TN) (1)
4.2.2 Precision
Precision = TP / (TP + FP) (2)
4.2.3 Recall
The percentage of accurately predicted positive findings out of all findings in the
actual positive class is computed as recall [15].
Recall = TP / (TP + FN) (3)
4.2.4 F1-Score
F1-Score = 2 ∗ (Precision ∗ Recall) / (Precision + Recall) (4)
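Equations (1) through (4) can be computed directly from the four confusion-matrix counts. The counts used in the usage note are made-up illustration values, not results from the paper:

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute Accuracy, Precision, Recall, and F1 from the four counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)           # Eq. (1)
    precision = tp / (tp + fp)                            # Eq. (2)
    recall = tp / (tp + fn)                               # Eq. (3)
    f1 = 2 * precision * recall / (precision + recall)    # Eq. (4)
    return accuracy, precision, recall, f1
```

For example, with TP = 40, TN = 45, FP = 10, FN = 5, accuracy is 0.85 and precision is 0.8.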
The findings of the developed CNN for the APTOS dataset were compared to two
recent studies that employed the same K-Fold CV method. Table 5 shows a comparison of
the performance measures. As depicted in this table, the constructed CNN model
generated the highest performance metrics for detecting the five steps of DR. For the
comparison, we looked at four separate indicators:
(i) Method, (ii) Number of classes, (iii) Training Accuracy, and
(iv) Average CV Accuracy. In comparison with existing methodologies, the proposed
customized CNN strategy gives 93.39% training accuracy and 89.14% average CV
accuracy. This improvement is obtained due to the use of two convolutional layers
for feature extraction and three fully connected layers for classification.
5 Conclusion
Diabetes is an incurable disease that has spread throughout the world. The only
way to address this problem is to detect the disease early and take preventive action
to reduce its impact. In this study, a customized CNN model is established
for DR classification with the K-Fold CV technique. The customized CNN model has
5 layers, including 2 convolutional layers for feature extraction and 3 fully connected
layers for classification. The customized model shows more promising results than
pre-trained models. Experimental results show an average CV accuracy of 89.14%
with the K-Fold CV technique.
References
1. Taylor R, Batey D (2012) Handbook of retinal screening in diabetes: diagnosis and manage-
ment, 2nd edn. Wiley-Blackwell
2. International diabetes federation—what is diabetes. Available online at https://www.idf.org/aboutdiabetes/what-is-diabetes.html. Accessed on Aug 2020
3. Chen XW, Lin X (2014) Big data deep learning: challenges and perspectives. IEEE Access pp
514–525
4. Sungheetha A, Sharma R (2021) Design an early detection and classification for diabetic
retinopathy by deep feature extraction based convolution neural network. J Trends Comput Sci
Smart Technol (TCSST) 3(02):81–94
5. Zheng Y et al (2012) The worldwide epidemic of diabetic retinopathy. Indian J Ophthalmol pp
428–431
6. Chandrakumar T, Kathirvel R (2016) Classifying diabetic retinopathy using deep learning
architecture. Int J Eng Res Technol (IJERT), pp 19–24
7. Chen H et al (2018) Detection of DR using deep neural network. In: IEEE twenty third
international conference on digital signal processing (ICDSP)
8. Lands A et al (2020) Implementation of deep learning based algorithms for DR classification
from fundus images. In: Fourth international conference on trends in electronics and informatics
(ICTEI), pp 1028–1032
9. Xiaoliang et al (2018) Diabetic retinopathy stage classification using convolutional neural
networks. In: International conference on information reuse and integration for data science
(ICIRIDS), pp 465–471
10. Shaban et al (2020) A CNN for the screening and staging of diabetic retinopathy. Open Access
Research Article, pp 1–13
11. APTOS 2019 Blindness detection dataset. Available online at https://www.kaggle.com/c/aptos2019-blindness-detection. Accessed on Feb 2021
12. Alzubaidi L et al (2021) Review of deep learning: CNN architectures, concepts, applications,
challenges, future directions. Open Access J Big Data, pp 2–74
13. FC layer. Available online at https://docs.nvidia.com/deeplearning/performance/dl-performance-fully-connected/index.html. Accessed on Mar 2021
14. Precision and recall. Available online at https://blog.exsilio.com/all/accuracy-precision-recall-f1-score-interpretation-of-performance-measures/. Accessed on Mar 2021
15. F1-Score. Available online at https://deepai.org/machine-learning-glossary-and-terms/f-score.
Accessed on Mar 2021
Study on Class Imbalance Problem
with Modified KNN for Classification
1 Introduction
A data warehouse is a vast environment from which an enormous amount of data can
be obtained. Data mining gives the data scientist a good environment for getting the
needed data from the warehouse. Data mining environments are widely used to perform
evaluations and produce good results. Data imbalance, however, may
cause severe effects in any kind of sector. In this paper, the imbalance
problem is analysed, and suitable machine learning techniques to solve the imbalance
issue are considered for a brief study. The flow chart shows the classification of test
and training data in supervised and unsupervised learning. Before turning to the
imbalance problem, studying classification and clustering provides broad
background knowledge for the several handling mechanisms
for imbalanced data.
Before examining the handling mechanisms, understanding the imbalance problem
and how it occurs is the most important step toward solving the imbalance issue.
In the related work section, clarification about imbalance
208 R. Sasirekha et al.
Fig. 1 Flow chart for imbalanced classification in supervised and unsupervised environment
problems is carried out: first, the study of how imbalance problems occur; secondly,
what can be considered an imbalance problem; and finally, how the imbalance
problem can be handled. In many contexts, KNN is a strong algorithm for handling
imbalance problems. In the upcoming sections, various modified KNN algorithms are
discussed, along with how those algorithms are suited to classification (Fig. 1).
2 Related Works
There are a lot of classifiers used in the classification of supervised and unsupervised
data. Random forest, Naive Bayes, and KNN are the classifiers most widely used
in the field of class imbalance problem solving. In this paper, the related works are based
on the study of the imbalance problem, the handling methods that have been proposed,
and how they work.
When discussing imbalanced data problems, the main prerequisite is learning about
supervised and unsupervised environments. After collecting enough information about
these environments, it becomes easier to handle imbalance problems. Supervised
learning: Labelled data comes under supervised learning, and every classification
algorithm belongs to it. Classification algorithms are used to classify data based on
its nature, and the decision vector plays a vital role in classification-based
approaches. A supervised learning environment can be strong enough to give predicted
results. In the case of weakly supervised learning [1], semi-supervised learning,
domain adaptation, multi-instance learning, and label noise learning require more
intensive care to demonstrate their effectiveness. Unsupervised learning: Clustering
algorithms such as k-means come under the unsupervised environment. The study of the
unsupervised environment is quite difficult compared to the supervised environment.
Recently, unsupervised learning has been used to accurately extract information
from heterogeneous specimens; however, most of the methods used consume
huge amounts of time [2]. Clusters are formed based on the characteristics of the data
points, and a centroid is identified to group the data points that fall
within a particular range. The k-means algorithm is broadly used in the clustering
of imbalanced data. Imbalanced data is very hard to handle, but
such imbalanced data can be identified easily by means of the k-means algorithm. The
centroids in k-means are identified using the Euclidean distance metric:
the distances of the data points within a particular range are computed, and the cluster
is then formed easily. Many people are
not aware of what the KNN and k-means algorithms are, and some think that k-means
and KNN are the same. In this paper, the KNN classifier is studied carefully
and the major differences between KNN and k-means are described in the upcoming
sections (Fig. 2).
Fig. 2 Sample experiment in the Weka tool for supervised learning: soybean dataset using the Naive Bayes classifier
Before entering into any kind of research work, there is a need for rich analysis of
the topic. First, clarity about where the imbalance problem occurs: when specimens
are classified into several classes based on their behaviour, an imbalance
problem occurs [3] if the number of specimens in one group is not equal to the number
of specimens in another group. Secondly, clarity about what is considered
an imbalance problem: data in a group that is not sufficient to produce an exact
result for the evaluation is considered imbalanced data, and a situation that
cannot be handled at the moment is considered an imbalance problem. Finally,
clarity about the ways to handle imbalance problems: an imbalance problem can be
handled by various approaches; widely used approaches are resampling, ensembles,
clustering, and evaluation metrics. These approaches are discussed in depth in the
upcoming sections. The remaining sections cover the different handling methods for
imbalanced classification.
3 Resampling
Resampling techniques are the most widely used approaches to handle data imbalance
problems. There are several resampling techniques used to solve imbalance issues;
they are applied to resample the training data.
4 Ensemble
In the ensemble approach [9], specimens of the majority group are divided into several
equal sections to solve the imbalance problem. For example, if group A has 40,000
specimens but group B has only 4000 specimens, the working strategy
of the ensemble is to divide the 40,000 specimens into 10 sets of 4000
specimens each. After dividing the large group of specimens into equal groups,
the training data can be resampled to handle the imbalance problem.
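The majority-class splitting described above can be sketched as follows (a NumPy sketch under the stated 10:1 ratio; the function name is ours):

```python
import numpy as np

def balanced_subsets(majority, minority, seed=0):
    """Shuffle the majority class and split it into chunks the size of the
    minority class; each chunk paired with the full minority class forms
    one balanced training set for an ensemble member."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(majority)
    n_sets = len(majority) // len(minority)
    chunks = np.array_split(shuffled[: n_sets * len(minority)], n_sets)
    return [(chunk, np.asarray(minority)) for chunk in chunks]
```

With 40,000 majority and 4000 minority specimens, this yields 10 balanced sets of 8000 specimens each.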
4.1 Bagging
dataset with replacement. A CART model is trained on each sample. Given
a new dataset, the average forecast from each model is calculated. For example, if we
had 7 bagged decision trees that predicted the following classes for a given input
sample: read, write, read, write, write, read, and read, we would predict read as
the most frequent class. Bagging is unlikely to overfit the training data.
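The majority vote in the bagging example above reduces to picking the most frequent prediction:

```python
from collections import Counter

def majority_vote(predictions):
    """Aggregate bagged model outputs by taking the most frequent class label."""
    return Counter(predictions).most_common(1)[0][0]
```

For the 7 predictions in the example (four "read", three "write"), the vote returns "read".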
Clustering is also used for the undersampling of data: points far from the cluster
centroids are removed. K-means [12] is an algorithm which can
be applied readily by data scientists to handle imbalance problems. KNN comes
under supervised learning for handling imbalance problems. KNN predicts
neighbour behaviour: it checks samples with similar characteristics and groups
an item into the category to which its neighbours belong.
When using KNN in an application, it can produce balanced data to get an exact
result. Nowadays, we need to choose the best classifier to produce good
accuracy. KNN is a user-friendly algorithm for predicting the nearest points in a
network, and it is widely used in the medical field and in engineering technology.
In the KNN algorithm, K indicates the number of nearest neighbours. KNN is an
excellent algorithm for classification and for the prediction of nearest neighbours
[13] (Table 1).
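A minimal KNN classifier following the description above (Euclidean distance, majority vote among the k nearest training points) can be sketched as:

```python
import numpy as np
from collections import Counter

def knn_predict(train_x, train_y, query, k=4):
    """Classify `query` by the majority label among its k nearest
    training points, measured with Euclidean distance."""
    dists = np.linalg.norm(np.asarray(train_x) - np.asarray(query), axis=1)
    nearest = np.argsort(dists)[:k]
    labels = [train_y[i] for i in nearest]
    return Counter(labels).most_common(1)[0][0]
```

The sample data here is made up for illustration; in the Weka demonstration below, the weather nominal dataset plays this role.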
Sample application of KNN in the Weka tool: the centroid clusters items having
similar behaviour. As a sample application of KNN on a chosen training dataset,
the weather dataset has been selected to show the application of KNN (Fig. 3).
While pre-processing a training dataset in the Weka tool, a nominal dataset must be
chosen to apply the KNN algorithm, since classification is implemented on
nominal datasets only. Hence, the weather nominal dataset has been chosen. In this
example, the k value is assigned as 4, so a group is formed from the 4 nearest
neighbours based on the similarities of the data. The distance between points is
calculated by means of Euclidean distance. Now, let us discuss the various modified
KNN algorithms. Fuzzy k-NN: A theoretical analysis of KNN is the main route to
understanding fuzzy k-NN. The main rule of fuzzy k-NN [14] is the grouping of fuzzy
sets, which can be made more flexible for analysing the instances of a class in the
context of ensuring the membership of data in a class. Fuzzy set theory is adopted
rather than concentrating on the distances of points to centroids. Bayes’ decision rule
fails when the rule of fuzzy k-NN is not satisfied. D-KNN: D-KNN is used to improve
the efficiency of the k-nearest neighbour search over distributed storage, and distributed
calculations are done effectively. It reduces the main memory storage requirement by
providing distributed storage nodes [15]. This algorithm is used in cyber-physical
social networks.
6 Discussion
In this section, the important elements for handling imbalance problems have been
discussed. Imbalance problems may occur when trying to classify specimens. A
suitable classifier must be chosen to prevent these kinds of imbalance problems,
since data imbalance may cause severe effects in any kind of sector. Widely used
classifiers are Naive Bayes, Random forest, and KNN. Getting details about the
neighbouring nodes would be a big task, but it can be solved by means of the K-NN
classifier. The difference between the K-NN and k-means algorithms is very important
in the classification of specimens. KNN has been discussed in an earlier section.
As per the study, the evaluation can be done with the key classification metrics:
Accuracy, Recall, Precision, and F1-Score. Recall and Precision may differ in certain
cases (Fig. 4).
Other evaluation tools are decision thresholds and the Receiver Operating
Characteristic (ROC) curve. Whether the ROC curve is suitable is determined by
examining the AUC (Area Under the Curve); the other parameters are also known as the
confusion matrix. A confusion matrix is a data table used to describe the performance
of a classification model on test data for which the true values are already
known. Except for AUC, all the measures are calculated from the four leftmost
parameters.
The correctly predicted observations are true positives and true negatives. The
reduction of false positives and false negatives should also be considered.
True Positives (TP)—correctly predicted positive values, i.e., the value of the
original class is Yes and the value of the predicted class is also Yes.
True Negatives (TN)—correctly predicted negative values, i.e., the value of the
original class is No and the value of the predicted class is also No.
False Positives (FP)—the value of the original class is No but the predicted class
is Yes (Table 2).
False Negatives (FN)—the value of the original class is Yes but the predicted class
is No.
Understanding these four parameters is important for calculating Accuracy, Precision,
Recall, and F1 score.
1. Accuracy—the ratio of correctly predicted specimens to the total specimens
taken. Accuracy = (TP + TN) / (TP + FP + FN + TN)
2. Precision—the ratio of correctly predicted positive specimens to the total
predicted positive specimens. Precision = TP / (TP + FP)
3. Recall—the ratio of correctly predicted positive specimens to all specimens
whose original class is Yes. Recall = TP / (TP + FN)
4. F1 score—the weighted average of Precision and Recall; it takes both false
positive and false negative specimens into account. F1 Score = 2 ∗ (Recall ∗
Precision) / (Recall + Precision)
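The four parameters above can be obtained by pairwise comparison of actual and predicted labels, as sketched below (the Yes/No labels mirror the definitions above; the data is made up for illustration):

```python
def confusion_counts(actual, predicted, positive="Yes"):
    """Count TP, TN, FP, FN by comparing actual and predicted labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    return tp, tn, fp, fn
```

The resulting counts feed directly into the Accuracy, Precision, Recall, and F1 formulas above.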
8 Conclusion
In this paper, a proper study of supervised and unsupervised learning has been
carried out, and the imbalanced-problem handling mechanisms have also been discussed.
In this survey, several modified KNN classifiers are studied, outlier analysis and
classification metrics are discussed elaborately with the training samples, and a
suitable K-nearest-neighbour-based classifier is chosen to represent each specimen.
An intensive study of classification metrics has also been carried out.
9 Future Work
References
1. Li Y-F, Guo L-Z, Zhou Z-H (2021) Towards safe weakly supervised learning. IEEE Trans
Pattern Anal Mach Intell 43(1):334–346
2. Xiang L, Zhao G, Li Q, Hao W, Li F (2018) TUMK-ELM: a fast unsupervised heterogeneous
data learning approach. IEEE Access 6:35305–35315
3. Lu Y, Cheung Y, Tang YY (2020) Bayes imbalance impact index: a measure of class imbalanced
data set for classification problem. IEEE Trans Neural Networks Learn Syst 31(9):2020
4. Lin W-C (2017) Clustering-based undersampling in class-imbalanced data. Inf Sci 409:17–26
5. Yi H (2020) Imbalanced classification based on minority clustering smote with wind turbine
fault detection application. IEEE Trans Ind Inform 1551–3203
6. Zhang L, Zhang C, Quan S, Xiao H, Kuang G, Liu L (2020) A class imbalance loss for
imbalanced object recognition. IEEE J Sel Top Appl Earth Observ Rem Sens 13:2778–2792
7. Ng WW, Xu S, Zhang J, Tian X, Rong T, Kwong S (2020) Hashing-based undersampling
ensemble for imbalanced pattern classification problems. IEEE Trans Cybern
8. Lu Y, Cheung YM (2021) Self-adaptive multiprototype-based competitive learning approach: a
k-means-type algorithm for imbalanced data clustering. IEEE Trans Cybern 51(3):1598–1612
9. Yang Y, Jiang J (2016) Hybrid sampling-based clustering ensemble with global and local
constitutions. IEEE Trans Neural Networks Learn Syst 27(5):952–965
10. Chakraborty S, Phukan J, Roy M, Chaudhuri BB (2020) Handling the class imbalance in
land-cover classification using bagging-based semisupervised neural approach. IEEE Geosci
Remote Sens Lett 17(9):1493–1497
11. Yang W, Nam W (2020) Brainwave classification using covariance-based data augmentation.
IEEE Access 8:211714–211722
12. Zhang T (2019) Interval type-2 fuzzy local enhancement based rough k-means clustering
imbalanced clusters. IEEE Trans Fuzzy Syst 28(9)
13. Zhuang L, Gao S, Tang J, Wang J, Lin Z, Ma Y, Yu N (2015) Constructing a nonnegative low-
rank and sparse graph with data-adaptive features. IEEE Trans Image Process 24(11):3717–
3728
14. Banerjee I, Mullick SS, Das S (2019) On convergence of the class membership estimator in
fuzzy nearest neighbor classifier. IEEE Trans Fuzzy Syst 27(6):1226–1236
15. Zhang W, Chen X, Liu Y, Xi Q (2020) A distributed storage and computation k-nearest
neighbor algorithm based cloud-edge computing for cyber-physical-social systems. IEEE
Access 8:50118–50130
16. Chen D, Jacobs R, Morgan D, Booske J (2021) Impact of nonuniform thermionic emission on
the transition behavior between temperature-and space-charge-limited emission. IEEE Trans
Electron Devices 68(7):3576–3581
17. Ma H, Gou J, Wang X, Ke J, Zeng S (2017) Sparse coefficient-based k-nearest neighbor
classification. IEEE Access 5:16618–16634
18. Bezdek JC, Keller JM (2020) Streaming data analysis: clustering or classification. IEEE Trans
Syst Man Cybern: Syst
19. Chakrabarty N, Biswas S (2020) Navo minority over-sampling technique (NMOTe): a
consistent performance booster on imbalanced datasets. J Electron Inform 02(02):96–136
Analysis of (IoT)-Based Healthcare
Framework System Using Machine
Learning
Abstract In recent years, the Internet of things (IoT) has been applied in several fields
like smart healthcare, smart cities, and smart agriculture. IoT-based applications are
growing day by day. In the healthcare industry, wearable sensor devices are widely used
to track patients’ health status and their mobility. In this paper, an IoT-based framework
for healthcare using a suitable machine learning algorithm has been analysed
intensely. Transmission of data using various standards is reviewed. Secure storage
and retrieval of medical data in various ways are discussed. Machine learning
techniques and storage mechanisms are analysed to ensure quality of service in
patient care.
1 Introduction
B. Lalithadevi (B)
Department of Computer Science and Engineering, SRM Institute of Science and Technology,
Chennai, India
S. Krishnaveni
Department of Software Engineering, SRM Institute of Science and Technology, Chennai, India
e-mail: krishnas4@srmist.edu.in
This variety of management services makes patients’ lives easier and assists
the medical experts as well as hospitals in managing and delivering services within
short time intervals [2]. IoT is a dynamic community infrastructure that interconnects
different sensor networks via the Internet and accumulates sensor data, transmitting and
receiving records/facts for further processing. Given the structure and strategies
followed in an IoT environment, there are security challenges in keeping a
person’s clinical records and hospital records confidential.
1.1 Contributions
2 Background
IoT has encompassed a lot of research ideas over the years, and the possible areas
where it can be implemented successfully have been studied closely. Elderly people can
track their health levels through the use of IoT, which reduces the stress and
dependency of people on the healthcare system [2].
Mobile IoT (m-IoT) has come up in recent days following the fusion of contemporary
digital fitness standards with LTE connectivity. It has been combined with
high-speed 4G networks to raise the standard further, and 6LoWPAN is combined
with the 4G protocols.
The health-related dataset includes attributes such as lung disease,
severe headache, kidney disorder, liver disorder, LDL, TC, DBP, HDL, obesity,
BG, and HR [3]. Any one of these attributes can be a major cause of hypertension
disease [4]. Figure 1 represents the overview of the healthcare framework model.
In the healthcare industry, the basic building blocks of IoT are
various wearable sensors that collect relevant data from patients remotely.
These data are transmitted to a cloud server and stored there. An artificial intelligence or machine learning based prediction and detection model is incorporated to predict risk values. An alert or warning message is sent to the medical experts via the cloud server. In turn, the respective physician can prescribe the appropriate medicine and take the necessary action to protect persons in critical situations.
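The sensor-to-alert flow described above can be sketched minimally as a threshold-based risk check; the vital-sign names, normal ranges, and patient identifier below are illustrative assumptions, not values from the paper.

```python
# Hypothetical normal ranges for a few vital signs (illustrative only).
NORMAL_RANGES = {"hr": (60, 100), "spo2": (95, 100), "sbp": (90, 140)}

def assess(reading):
    """Return the vitals in a sensor reading that fall outside normal ranges."""
    return {k: v for k, v in reading.items()
            if k in NORMAL_RANGES and not NORMAL_RANGES[k][0] <= v <= NORMAL_RANGES[k][1]}

def notify_physician(patient, anomalies):
    """Stand-in for the cloud server pushing an alert message to the physician."""
    return f"ALERT {patient}: " + ", ".join(f"{k}={v}" for k, v in sorted(anomalies.items()))

reading = {"hr": 132, "spo2": 98, "sbp": 150}
anomalies = assess(reading)
if anomalies:
    print(notify_physician("P-17", anomalies))  # ALERT P-17: hr=132, sbp=150
```

In a real deployment the decision logic would be the trained prediction model rather than fixed thresholds, but the alert path from reading to physician is the same.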
Pulse Oximetry Sensors These estimate the level of oxygen in the blood. Oxygen saturation is not an essential measurement for every medical wearable device, but it can certainly provide an edge in certain cases [1].
The accuracy and precision of sensed data from wearable devices may be compromised by malicious intruders who alter it into erroneous data, which can misguide the end user's decision support in treatment. A smart IoT-based healthcare environment therefore needs a highly secure framework and an efficient model to maintain the privacy and confidentiality of patient medical data.
IoT architecture describes the flow of data from edge devices through interconnected networks to a cloud server for data analysis, storage, and retrieval. All sensed data are then processed further through our prediction model. Figure 3 represents the evolution of IoT architecture. Regarding IoT healthcare services and their applications, different fields play a significant part in the administration of private health and fitness, pediatric care, management of chronic illnesses, and care of elderly patients, among others [2].
The proposed model transmits data from wearable medical sensors to a cloud server via standard data transmission and communication protocols, as shown in Fig. 1. A smartphone acts as an intermediate agent between the sensors and web apps. A request-response protocol (XMPP) is applied to send the data payload and data-length code from the sensors to an Android listening port; finally, the web interface layer enables the physician, patients, and hospital to respond to requests in parallel. Figure 4 represents the list of healthcare services in the medical field. A robotic assistant is used for tracking senior citizens' status; this robot uses a ZigBee sensor device to uniquely identify the individuals it is tracking [7].
Doctors cannot attend to all patients at all times, so patients can check their own health progress through wearable IoT devices; the healthcare system makes this very efficient for doctors. A basic research challenge here is monitoring multiple reports from various patients at a time through mobile phones and giving appropriate medications on time, without delay.
The various fitness rates are useful for monitoring the overall condition of the patient. One research challenge is associated with the doctor: preventive medication should be given, and patients suffer less if the disease is diagnosed at an early stage.
IoT devices tagged with sensors are used for monitoring the real-time location of clinical equipment such as wheelchairs, nebulizers, oxygen pumps, and so on. Appointment fees and travel costs can be reduced by IoT devices [8]. An Azure Web application is used to store the data, perform analysis, and predict health conditions within the expected time [9]. The most important advantages and challenges of IoT in healthcare include:
• Cost and error reduction
• Improved and proactive treatment
• Faster disease diagnosis
• Drugs and equipment management.
The medical field has experienced innovation in disease diagnosis and medical data analysis since it began collaborating with machine learning research [10]. The volume of patient data generated every day is huge and cannot be surveyed by simple methods. This statistical data is fed to a trained model; the model might not achieve perfect accuracy, but it comes close.
To understand the naive Bayes classifier, let us first understand Bayes' theorem, which is based on conditional probability. Events are of two types: dependent and independent.
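As a worked illustration of Bayes' theorem (the basis of naive Bayes), consider a diagnostic test; the numbers below are invented for illustration: P(D) is the disease prevalence, P(pos|D) the test's sensitivity, and P(pos|not D) its false-positive rate.

```python
def posterior(p_d, p_pos_given_d, p_pos_given_not_d):
    """P(D | positive test) computed via Bayes' theorem:
    P(D|pos) = P(pos|D) P(D) / P(pos)."""
    p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)
    return p_pos_given_d * p_d / p_pos

p = posterior(p_d=0.01, p_pos_given_d=0.90, p_pos_given_not_d=0.05)
print(round(p, 4))  # 0.1538
```

Even with a sensitive test, a rare condition yields a modest posterior probability, which is why conditional probability, not the raw test result, drives the classification.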
Machine learning has found use in various fields, and medical science is one of them. Machine learning or deep learning models need very little human assistance to solve a problem.
Figure 5 shows the overview of the artificial neural network model. A machine learning model uses feature selection techniques to find the probable outcome.
Fig. 6 Representation of support vector machine
There are three kinds of learning techniques in artificial intelligence: supervised, unsupervised, and reinforcement learning. SVM is a supervised learning method used for classification and regression analysis. Figure 6 shows the support vector machine classifier. Classification datasets contain categorical target variables, and regression datasets contain continuous target variables [12]. SVM is a supervised learning technique that works on labeled data [12]. Given a dataset consisting of circles and quadrilaterals, it can predict whether a new data point is a circle or a quadrilateral. SVM creates a boundary between the two classes; this boundary is the decision boundary that helps the model predict.
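The decision boundary idea can be sketched with a minimal soft-margin linear SVM trained by sub-gradient descent on the hinge loss; the toy "circle"/"quadrilateral" coordinates and hyperparameters below are invented for illustration.

```python
import numpy as np

def train_linear_svm(X, y, lr=0.01, lam=0.01, epochs=200):
    """Soft-margin linear SVM via sub-gradient descent on the hinge loss.
    Labels y must be in {-1, +1}."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for i in range(len(X)):
            if y[i] * (X[i] @ w + b) < 1:          # point violates the margin
                w -= lr * (lam * w - y[i] * X[i])
                b += lr * y[i]
            else:                                   # only regularization shrinkage
                w -= lr * lam * w
    return w, b

def predict(X, w, b):
    """Side of the learned decision boundary for each point."""
    return np.sign(X @ w + b)

# Toy data: "circles" (class -1) vs "quadrilaterals" (class +1)
X = np.array([[1.0, 1.0], [1.5, 1.2], [1.2, 0.8],
              [4.0, 4.0], [4.5, 3.8], [3.8, 4.2]])
y = np.array([-1, -1, -1, 1, 1, 1])
w, b = train_linear_svm(X, y)
print(predict(X, w, b))
```

On this separable toy set the learned hyperplane classifies all six points correctly; a library implementation such as scikit-learn's SVC would be used in practice.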
Ensemble techniques in machine learning combine multiple models into one. There are two types of ensemble techniques: bagging and boosting. Random forest is a bagging technique. In bagging, many base models are created, and each is given a new random sample of the data [13]; this method is also known as row sampling with replacement. Figure 7 represents the random forest classifier model. For a particular test input, the outputs of the different models are observed. Because the outputs of all models may not agree, a voting classifier is used: all votes are combined, and the output with the highest frequency (majority vote) is chosen [5]. Multiple decision trees are used in a random forest. Individual decision trees have low bias and high variance; aggregating the trees across models and taking the majority vote yields a final output with low variance [5].
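The two ingredients just described, row sampling with replacement and majority voting, can be sketched with the standard library; the sample data and vote labels are invented for illustration.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Row sampling with replacement, as used in bagging."""
    return [rng.choice(data) for _ in data]

def majority_vote(predictions):
    """Combine base-model outputs; the most frequent label wins."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(0)
data = [("hr", 70), ("hr", 120), ("hr", 95)]
sample = bootstrap_sample(data, rng)          # training rows for one base model

votes = ["high risk", "low risk", "high risk"]  # outputs of three base models
print(majority_vote(votes))  # high risk
```

In a real random forest each bootstrap sample trains one decision tree, and the vote is taken over all trees.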
The short-range communication techniques presented in this paper are ZigBee and Bluetooth; both come under the home network. ZigBee is a technology created for control and sensor networks. The layers in ZigBee [15] are the application layer, security layer, networking layer, media access control (MAC) layer, and physical layer. As shown in Table 2, several communication standards are listed for short-distance coverage.
Cloud computing is a technology that emerged when the volume of data generated daily became impossible to store and handle locally. Patient data needs security and privacy. Resources such as storage and computational power, normally available on our own computers, are managed by the cloud services according to the data [15]. Figure 8 shows the impact of cloud computing in the healthcare field for instant data transmission.
The Storage Access Control Layer is the spine of the cloud-enabled environment; it accesses medical services by utilizing sensors such as BG meters and sphygmomanometers in everyday activities [16]. The Data Annotation Layer resolves the heterogeneity problems that normally occur during data processing. The Data Testing Layer analyzes the medical records stored on the cloud platform. Maintaining the portability and integrity of data transferred from end devices to the cloud server is a challenging task in an IoT healthcare platform.
Previously, capacity for medical services was actualized in 2017 by utilizing fog computing [16]. A healthcare system called HealthFog was launched in 2016 [15]. Figure 9 illustrates recent fog computing studies in the healthcare industry using machine learning and deep learning techniques. Later, to improve system reliability, cloud-based security functionality was added to HealthFog [15].
The current architecture exploits the benefits of edge computing for home and hospital control systems. Health technologies have been moving from cloud computing to fog computing in recent years [17]. Similar work has been performed by the authors of [18] as a four-layered healthcare model comprising a sensation layer, classification layer, mining layer, and application layer.
The sensation layer obtains data from various sensors located in the office room. In the classification layer, classification is done across five categories: data about health, data about the environment, data about meals, data about physical posture, and data about behavior. The mining layer extracts information from a cloud database. Finally, the application layer provides services such as a personal health recommender system, a remote medical care monitoring system, and a personal health maintenance system to the end user [19].
The primary goal of IoT security is to safeguard consumer privacy, data confidentiality, availability, and the transport infrastructure of an IoT platform. Blockchain innovation improves accountability between patients and doctors [23]. DDoS attacks are perhaps one of the best examples of the issues caused by shipping gadgets with default passwords and not telling customers to change them as soon as they receive them [16]. As the number of IoT-connected gadgets continues to rise, a wide assortment of malware and ransomware is being used to exploit them [24]. An IP camera can capture sensitive footage from a wide variety of places, such as your private home, work office, or even the nearby gas station [25]. There are various security problems in IoT; most commonly, the passwords of IoT devices are weak or hard-coded [26]. Various security challenges in IoT are:
• Insufficient testing and updating
• Brute-forcing and the issue of default passwords
• IoT malware and ransomware
• Data security, privacy, and untrustworthy communication.
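The default-password weakness in the list above is the kind of issue that can be audited mechanically; the device records and credential pairs in this sketch are entirely invented for illustration.

```python
# Hypothetical list of well-known factory-default credential pairs.
DEFAULT_CREDENTIALS = {("admin", "admin"), ("admin", "1234"), ("root", "root")}

def audit_devices(devices):
    """Return the IDs of devices whose (user, password) pair is a known default."""
    return [d["id"] for d in devices
            if (d["user"], d["password"]) in DEFAULT_CREDENTIALS]

devices = [
    {"id": "cam-01", "user": "admin", "password": "admin"},
    {"id": "pump-02", "user": "nurse", "password": "x9!pQ"},
]
print(audit_devices(devices))  # ['cam-01']
```

A deployment-time audit of this kind, combined with forced credential rotation, addresses the brute-forcing and default-password challenge directly.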
Investigators have applied various machine learning and deep learning models in healthcare frameworks for the detection of different diseases. Table 4 summarizes the methodologies used for disease prediction and detection.
Table 4 Summary of data analysis using machine learning and deep learning algorithms in healthcare (reference/attainment; objective; type of data; methodology; performance metrics)
• [27] Better decision support using an XGBoost classifier for further treatment; early prediction of treatment response and survival rate through MRI images for breast cancer; clinical data (MRI); logistic regression, support vector machine, XGBoost, linear discriminant analysis; AUC, ROC, precision
• [28] High-accuracy prediction model for the survival rate of HCC disease; development of a deep learning system to predict carcinoma and its risk level; clinical data; DNN, KNN, support vector machine; recall, F1 score, accuracy
• [29] Efficient feature selection model for classification of T2DM; implementation of a semi-automated framework based on genotype-phenotype association for diabetes; electronic health records; random forest, decision trees, KNN, Bayes classifier; AUC, precision, recall, sensitivity, specificity
• [30] Temporal prediction model for high risk factors of emergency admission; comparative analysis of traditional and machine learning models for early prediction of emergency cases; electronic health records; random forest, gradient boost, Cox model; confusion matrix, AUC
• [31] SVM-based multi-class classification of brain tumors; analysis of pattern classification models for a variety of brain tumors; clinical data (MRI); recursive feature elimination, linear discriminant analysis, KNN; confusion matrix, entropy
• [32] Monitoring early movement of infants through wearable sensors; monitoring early and delayed movement of infants based on a kinematic analysis model; wearable sensor data; AdaBoost, support vector machine, logistic regression; t-test, standard deviation
• [33] DBN-based activity recognition through PCA and LDA approaches; human activity recognition using a DBN model and dimensionality reduction through kernel principal component analysis; wearable body sensors; deep belief network, feature extraction; accuracy, precision, recall, F1 score
The table continues with rows on the detection and progressive analysis of Parkinson's disease using machine learning, deep-learning-based emotion classification through physiological, environmental, and location-based signals, and proteins associated with acute myeloid leukemia.
9 Discussion
This section discusses the research limitations and challenges of an IoT-based healthcare framework. Flexible wearable sensors are required to monitor the patient's health status. A secured framework must be designed for data transmission from the edge device to the control device and then to the cloud server, since various intruders may attempt to modify the data and break confidentiality. Analysis of ECG and EEG monitoring signals should be done using ML. An energy-efficient optimization algorithm is needed to protect against excessive consumption and reduce usage levels. Data privacy is especially important in the healthcare domain; it can be achieved through cryptographic models and standards.
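One standard cryptographic building block for the tampering threat mentioned above is a keyed hash over each transmitted reading; the sketch below uses Python's standard-library HMAC, with an invented shared key and illustrative field names.

```python
import hashlib
import hmac
import json

# Assumed to be shared between the wearable gateway and the cloud server
# (a real deployment would provision and rotate this key securely).
SHARED_KEY = b"demo-key-not-for-production"

def sign_reading(reading: dict) -> str:
    """Produce an HMAC-SHA256 tag over a canonical JSON encoding of the reading."""
    payload = json.dumps(reading, sort_keys=True).encode()
    return hmac.new(SHARED_KEY, payload, hashlib.sha256).hexdigest()

def verify_reading(reading: dict, tag: str) -> bool:
    """Constant-time check that the reading was not modified in transit."""
    return hmac.compare_digest(sign_reading(reading), tag)

reading = {"patient": "P-17", "hr": 88, "spo2": 97}
tag = sign_reading(reading)
print(verify_reading(reading, tag))                 # True
print(verify_reading({**reading, "hr": 40}, tag))   # False: tampering detected
```

HMAC provides integrity and authenticity; confidentiality would additionally require encryption of the payload.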
10 Conclusion
References
1. Baker SB, Xiang W, Atkinson I (2017) Internet of things for smart healthcare: technologies,
challenges, and opportunities. IEEE Access 5:26521–26544. https://doi.org/10.1109/ACCESS.2017.2775180
2. Carnaz GJF, Nogueira V (2019) An overview of IoT and healthcare. Universidade de Évora.
Available: https://www.researchgate.net/publication/330933788
3. Hussain S, Huh E, Kang BH, Lee S (2015) GUDM: automatic generation of unified datasets for
learning and reasoning in healthcare. Sensors 15(7):15772–15798. https://doi.org/10.3390/s150715772
4. Majumder AJA, Elsaadany YA, Young R, Ucci DR (2019) An energy efficient wearable smart
IoT system to predict cardiac arrest. Adv Hum-Comput Interact, vol 2019. https://doi.org/10.1155/2019/1507465
5. Ani R, Krishna S, Anju N, Sona AM, Deepa OS (2017) IoT based patient monitoring and
diagnostic prediction tool using ensemble classifier. In: 2017 International Conference on
Advanced Computing and Communication Informatics, ICACCI 2017, vol 2017-January, pp
1588–1593. https://doi.org/10.1109/ICACCI.2017.8126068
6. Joyia GJ, Liaqat RM, Farooq A, Rehman S (2017) Internet of Medical Things (IoMT): applications,
benefits and future challenges in healthcare domain. J Commun 12(4):240–247. https://doi.org/10.12720/jcm.12.4.240-247
7. Konstantinidis EI, Antoniou PE, Bamparopoulos G, Bamidis PD (2015) A lightweight frame-
work for transparent cross platform communication of controller data in ambient assisted living
environments. Inf Sci (NY) 300(1):124–139. https://doi.org/10.1016/j.ins.2014.10.070
8. Saba T, Haseeb K, Ahmed I, Rehman A (2020) Journal of Infection and Public Health Secure
and energy-efficient framework using Internet of Medical Things for e-healthcare. J Infect
Public Health 13(10):1567–1575. https://doi.org/10.1016/j.jiph.2020.06.027
9. Krishnaveni S, Prabakaran S, Sivamohan S (2016) Automated vulnerability detection and
prediction by security testing for cloud SAAS. Indian J Sci Technol 9(S1). https://doi.org/10.
17485/ijst/2016/v9is1/112288
10. Yang X, Wang X, Li X, Gu D, Liang C, Li K (2020) Exploring emerging IoT technologies in
smart health research: a knowledge graph analysis 9:1–12
11. Nashif S, Raihan MR, Islam MR, Imam MH (2018) Heart disease detection by using machine
learning algorithms and a real-time cardiovascular health monitoring system. World J Eng
Technol 06(04):854–873. https://doi.org/10.4236/wjet.2018.64057
12. Krishnaveni S, Vigneshwar P, Kishore S, Jothi B, Sivamohan S (2020) Anomaly-based intrusion
detection system using support vector machine. In: Artificial intelligence and evolutionary
computations in engineering systems, pp 723–731
13. Ram SS, Apduhan B, Shiratori N (2019) A machine learning framework for edge computing to
improve prediction accuracy in mobile health monitoring. In: Lecture notes in computer science
(including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics),
vol 11621 LNCS, pp 417–431. https://doi.org/10.1007/978-3-030-24302-9_30
14. Umar S, Alsulaiman M, Muhammad G (2019) Deep learning for EEG motor imagery classi-
fication based on multi-layer CNNs feature fusion. Futur Gener Comput Syst 101:542–554.
https://doi.org/10.1016/j.future.2019.06.027
15. Minh Dang L, Piran MJ, Han D, Min K, Moon H (2019) A survey on internet of things and
cloud computing for healthcare. Electronics 8(7). https://doi.org/10.3390/electronics8070768
16. Dewangan K, Mishra M (2018) Internet of things for healthcare: a review. Researchgate.Net
8(Iii):526–534. Available: http://ijamtes.org/
17. Sood SK, Mahajan I (2019) IoT-Fog-based healthcare framework to identify and control hypertension
attack. IEEE Internet Things J 6(2):1920–1927. https://doi.org/10.1109/JIOT.2018.2871630
18. Bhatia M, Sood SK (2019) Exploring temporal analytics in fog-cloud architecture for smart
office healthcare. Mob Networks Appl 24(4):1392–1410. https://doi.org/10.1007/s11036-018-0991-5
19. Raj JS (2021) Security enhanced blockchain based unmanned aerial vehicle health monitoring
system. J ISMAC 3(02):121–131
20. Nandyala CS, Kim HK (2016) From cloud to fog and IoT-based real-time U-healthcare monitoring
for smart homes and hospitals. Int J Smart Home 10(2):187–196. https://doi.org/10.14257/ijsh.2016.10.2.18
21. Dubey H, Yang J, Constant N, Amiri AM, Yang Q, Makodiya K (2016) Fog data: enhancing
Telehealth big data through fog computing. In: ACM international conference on proceeding
series, vol 07–09-October-2015. May 2016. https://doi.org/10.1145/2818869.2818889
22. He W, Yan G, Da Xu L (2017) Developing vehicular data cloud services in the IoT
environment. IEEE Trans Industr Inform. https://doi.org/10.1109/TII.2014.2299233
23. Suma V (2021) Wearable IoT based distributed framework for ubiquitous computing. J
Ubiquitous Comput Commun Technol (UCCT) 3(01):23–32
24. Hariharakrishnan J, Bhalaji N (2021) Adaptability analysis of 6LoWPAN and RPL for
healthcare applications of internet-of-things. J ISMAC 3(02):69–81
25. Pazienza A, Polimeno G, Vitulano F (2019) Towards a digital future: an innovative semantic
IoT integrated platform for Industry 4.0. In: Healthcare, and territorial control
Analysis of (IoT)-Based Healthcare Framework … 237
26. Aceto G, Persico V, Pescapé A (2018) The role of Information and Communication Technologies
in healthcare: taxonomies, perspectives, and challenges. J Netw Comput Appl
107:125–154. https://doi.org/10.1016/j.jnca.2018.02.008
27. Tahmassebi A et al (2019) Impact of machine learning with multiparametric magnetic resonance
imaging of the breast for early prediction of response to neo adjuvant chemotherapy and survival
outcomes in breast cancer patients. Invest Radiol 54(2):110–117. https://doi.org/10.1097/RLI.
0000000000000518
28. Kayal CK, Bagchi S, Dhar D, Maitra T, Chatterjee S (2019) Hepatocellular carcinoma
survival prediction using deep neural network. In: Proceedings of international ethical hacking
conference 2018, pp 349–358
29. Zheng T et al (2017) A machine learning-based framework to identify type 2 diabetes through
electronic health records. Int J Med Inform 97:120–127. https://doi.org/10.1016/j.ijmedinf.
2016.09.014
30. Rahimian F et al (2018) Predicting the risk of emergency admission with machine
learning: development and validation using linked electronic health records. PLoS Med
15(11):e1002695. https://doi.org/10.1371/journal.pmed.1002695
31. Zacharaki EI et al (2009) Classification of brain tumor type and grade using MRI texture and
shape in a machine learning scheme. Magn Reson Med 62(6):1609–1618. https://doi.org/10.
1002/mrm.22147
32. Goodfellow D, Zhi R, Funke R, Pulido JC, Mataric M, Smith BA (2018) Predicting infant
motor development status using day long movement data from wearable sensors. Available:
http://arxiv.org/abs/1807.02617
33. Hassan MM, Huda S, Uddin MZ, Almogren A, Alrubaian M (2018) Human activity recognition
from body sensor data using deep learning. J Med Syst 42(6):99. https://doi.org/10.1007/s10
916-018-0948-z
34. Lonini L et al (2018) Wearable sensors for Parkinson’s disease: which data are worth collecting
for training symptom detection models. npj Digit Med 1(1). https://doi.org/10.1038/s41746-
018-0071-z
35. Kanjo E, Younis EMG, Ang CS (2019) Deep learning analysis of mobile physiological, envi-
ronmental and location sensor data for emotion detection. Inf Fusion 49:46–56. https://doi.org/
10.1016/j.inffus.2018.09.001
36. Liang CA, Chen L, Wahed A, Nguyen AND (2019) Proteomics analysis of FLT3-ITD mutation
in acute myeloid leukemia using deep learning neural network. Ann Clin Lab Sci 49(1):119–
126. https://doi.org/10.1093/ajcp/aqx121.148
Hand Gesture Recognition for Disabled
Person with Speech Using CNN
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_17
E. P. Shadiya Febin and A. T. Nair
1 Introduction
2 Literature Survey
The researchers in [1] conducted a series of experiments to build a statistical model enabling deaf individuals to have speech translated into sign language. Additionally, they created a system for automatic speech recognition (ASR) with an animated presentation and a statistical translation module for a variety of sign sets. They translated the text using state-transducer and phrase-based methods. Various metrics were used during evaluation: WER, BLEU, and NIST. The article walks through the process of speech translation using an automatic recognizer in each of the three configurations. The research produced ASR output, employing a finite state transducer, with a word error rate between 28.21 and 29.27%.
A review of deaf-mute communication interpretation [2]: This article examines the several deaf-mute communication translator systems in use today. Wearable communication devices and online learning systems are the two major communication techniques used by deaf-mute persons. Wearable communication systems are classified into three categories: glove-based systems, keypad-based systems, and Handicom touch-screen systems. All three of these technologies make use of a number of sensors, an accelerometer, a suitable microcontroller, a text-to-speech conversion module, a keypad, and a touch screen. The second method, the online learning system, obviates the need for external equipment to interpret messages between deaf and hearing-impaired individuals. The online learning system makes use of a number of instructional techniques; its five subdivisions are the SLIM module, TESSA, Wi-See technology, the SWI PELE system, and Web-Sign technology.
An efficient framework for recognizing Indian sign language using the wavelet transform [3]: The suggested ISLR system is a pattern recognition technique with two key modules: feature extraction and classification. To recognize sign language, feature extraction using the Discrete Wavelet Transform (DWT) is combined with nearest-neighbor classification. According to the experimental results, the suggested hand gesture recognition system achieves a maximum classification accuracy of 99.23% when the cosine distance classifier is utilized.
Hand gesture recognition using principal component analysis in [4]: The authors proposed a database-driven hand gesture recognition technique that is useful for human robots and related applications. It is based on a skin color model and a thresholding approach, together with an effective template matching strategy. First, the hand area is segmented using the YCbCr color space skin color model. The subsequent stage uses thresholding to differentiate foreground from background. Finally, a recognition technique based on template matching is created using Principal Component Analysis (PCA).
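PCA-based template matching of the kind just described can be sketched in a few lines: flatten the gesture templates, project them onto the top principal components, and classify a query by its nearest template in the reduced space. The gesture names and synthetic "images" below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_pca(X, k):
    """X: (n_samples, n_pixels). Returns the mean and the top-k principal axes."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def project(X, mean, axes):
    """Coordinates of samples in the k-dimensional PCA subspace."""
    return (X - mean) @ axes.T

# Two toy "gesture" clusters of flattened 16-pixel images
templates = np.vstack([rng.normal(loc=0.0, size=(5, 16)),
                       rng.normal(loc=3.0, size=(5, 16))])
labels = ["open_palm"] * 5 + ["fist"] * 5
mean, axes = fit_pca(templates, k=2)
coords = project(templates, mean, axes)

query = rng.normal(loc=3.0, size=16)   # a new image resembling the "fist" cluster
q = project(query, mean, axes)
nearest = int(np.argmin(np.linalg.norm(coords - q, axis=1)))
print(labels[nearest])
```

The dimensionality reduction makes the nearest-template search cheap and robust to pixel-level noise, which is the practical motivation for using PCA before matching.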
Hand gesture recognition system for dumb people [5]: The authors presented a static hand gesture recognition system using digital image processing. The SIFT technique is used to construct the vector representing the hand gestures; SIFT features, which are invariant to scaling, rotation, and noise addition, are computed at the edges.
An automated system for recognizing Indian sign language in [6]: This article discusses an approach to automatic sign identification based on shape features. The hand region is separated from the pictures using Otsu's thresholding approach, which calculates the optimal threshold to minimize the within-class variance of thresholded black and white pixels. Hu's invariant moments are used to determine the segmented hand region's characteristics, which are then classified using an artificial neural network. The performance of the system is determined by its accuracy, sensitivity, and specificity.
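Otsu's criterion, mentioned above, can be sketched directly: exhaustively test every threshold and keep the one maximizing between-class variance, which is equivalent to minimizing within-class variance. The toy image below is invented for illustration.

```python
import numpy as np

def otsu_threshold(gray):
    """Pick the threshold maximizing between-class variance of a uint8 image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    best_t, best_between = 0, -1.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()     # class weights
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * prob[:t]).sum() / w0  # class means
        mu1 = (np.arange(t, 256) * prob[t:]).sum() / w1
        between = w0 * w1 * (mu0 - mu1) ** 2        # between-class variance
        if between > best_between:
            best_t, best_between = t, between
    return best_t

# Bimodal toy image: dark background around 30, bright "hand" region around 200
img = np.full((8, 8), 30, dtype=np.uint8)
img[2:6, 2:6] = 200
t = otsu_threshold(img)
print(30 < t <= 200)  # True: the chosen threshold separates the two modes
```

Production code would use an optimized library routine, but the criterion is exactly this exhaustive search over the histogram.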
Recognition of hand gestures for sign language recognition, a review in [7]: The authors examined a variety of previous scholarly proposals for hand gesture and sign language recognition. Sign language is the sole method of communication accessible to deaf and dumb persons, who use it to express their feelings and thoughts to others.
The design issues and proposed implementation of a communication aid for deaf and dumb persons in [8]: The author developed a technique to help deaf and dumb individuals interact with hearing people using Indian Sign Language (ISL), in which suitable hand gestures are converted to text messages. The major objective is to build an algorithm capable of instantly translating dynamic motions to text. After testing, the system will be incorporated into the Android platform and made accessible as a mobile application for smartphones and tablet computers.
Real-time detection and identification of Indian and American Sign Language using SIFT in [9]: The author demonstrated a real-time vision-based system for hand gesture detection that may be used in a range of human-computer interaction applications. The system recognizes 35 distinct hand gestures of Indian and American Sign Language (ISL and ASL). An RGB-to-grayscale segmentation method was used to reduce the chance of incorrect detection. The authors demonstrated feature extraction using an improvised Scale Invariant Feature Transform (SIFT). The system is modeled in MATLAB, and a graphical user interface (GUI) was created to produce an efficient and user-friendly hand gesture recognition system.
A review of feature extraction for Indian and American Sign Language in [10]: This article examined the current state of sign language research and development, which focuses on manual communication and body language. Sign language recognition generally proceeds in three steps: pre-processing, feature extraction, and classification. Neural networks (NN), support vector machines (SVM), hidden Markov models (HMM), and scale-invariant feature transforms (SIFT) are among the available classification methods (Table 1).
The present technique makes use of the orientation histogram, which has a number of disadvantages: similar motions may have distinct orientation histograms, while dissimilar gestures may have similar ones. Additionally, the technique responds to any object that takes up the majority of the image, even if it is not a hand gesture.
3 Proposed System
This article makes use of deep learning techniques to identify hand motions. Static image datasets are employed to train the proposed system. The network is built using convolutional neural networks rather than pre-trained models. The proposed vision-based solution requires no external hardware and is not restricted by dress-code constraints. The CNN deep learning algorithm is used to convert hand motions to numbers. A camera captures a gesture, which is then used as input to the motion recognition system. Conversion of sign language to numerical data and speech in real time proceeds, more precisely, as: (1) recognizing male and female sign gestures; (2) creating a model for image-to-text translation using a machine learning approach; (3) forming words; (4) composing sentences; (5) composing the complete text; (6) converting the text to audio. Figure 1 depicts the steps necessary to accomplish the project's objectives.
Gestures are captured by the web camera; an OpenCV video stream captures the whole signing period. Frames are extracted from the stream and transformed to grayscale pictures with a resolution of 50 × 50 pixels. Because the entire dataset is the same size, this dimension is consistent throughout the project. Hand movements are recognized in the gathered images; this is a preprocessing phase that occurs before submitting the picture to the model for prediction. The regions containing gestures are highlighted, which effectively improves the chance of correct prediction. The preprocessed images are fed into the Keras CNN model, and the trained model generates the predicted label. Each gesture label is associated with a probability, and the most likely label is taken as the prediction. The model transforms recognized movements into text, and the pyttsx3 package converts recognized words to the corresponding speech. Although text-to-speech output is a simple workaround, it is beneficial since it replicates a verbal dialog. The system architecture is depicted in Fig. 2.
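The frame preprocessing described above (grayscale conversion and resizing to 50 × 50) can be sketched with plain NumPy; the paper's pipeline uses OpenCV, so the nearest-neighbor resize below is a simplified stand-in, and the random frame is invented test data.

```python
import numpy as np

def preprocess(frame, size=50):
    """Convert an RGB frame to grayscale and resize to size x size
    with nearest-neighbor sampling."""
    gray = frame.mean(axis=2)                 # naive channel-average grayscale
    h, w = gray.shape
    rows = np.arange(size) * h // size        # source row for each output row
    cols = np.arange(size) * w // size        # source column for each output column
    return gray[np.ix_(rows, cols)].astype(np.uint8)

frame = np.random.randint(0, 256, size=(120, 160, 3), dtype=np.uint8)
small = preprocess(frame)
print(small.shape)  # (50, 50)
```

Fixing every input to the same 50 × 50 shape is what lets a single CNN input layer serve the whole dataset.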
Convolutional Neural Networks (CNNs) are used for detection. CNNs are a special
form of neural network that is highly effective for solving computer vision
problems. They take inspiration from the way images are perceived in the visual
cortex of the brain. They use a filter/kernel to scan the entire image's pixel
values and conduct computations with suitable weights to enable feature
detection [25, 26]. A CNN is made up of multiple layers, including a convolution
layer, a max-pooling layer, a flatten layer, a dense layer, and a dropout layer.
Combined, these layers form a very powerful tool for detecting characteristics
in images: the early layers detect low-level features and gradually give way to
higher-level features. AlexNet, a widely used deep learning architecture,
performs classification tasks on pictures, video, text, and sound. CNNs excel at
recognizing patterns in pictures, allowing the detection of hand movements,
faces, and other objects. A benefit of CNNs is that training the model does not
require manual feature extraction, and CNNs are largely invariant to scale and
rotation. In the proposed system, AlexNet is used for object detection. AlexNet
has eight layers with learnable parameters: five convolutional layers combined
with max pooling, followed by three fully connected layers, with ReLU activation
in every layer except the output layer.
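The layer structure described above can be sanity-checked with the standard convolution output-size formula, (n + 2p - k)/s + 1. The sketch below walks the classic 227 × 227 AlexNet input through the five conv/pool stages (the 227 × 227 size is an assumption taken from the standard AlexNet, not the project's own 50 × 50 frames):

```python
def conv_out(n, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling layer."""
    return (n + 2 * pad - kernel) // stride + 1

# AlexNet-style stack on a 227 x 227 input:
n = 227
n = conv_out(n, kernel=11, stride=4)   # conv1   -> 55
n = conv_out(n, kernel=3, stride=2)    # maxpool -> 27
n = conv_out(n, kernel=5, pad=2)       # conv2   -> 27
n = conv_out(n, kernel=3, stride=2)    # maxpool -> 13
n = conv_out(n, kernel=3, pad=1)       # conv3   -> 13
n = conv_out(n, kernel=3, pad=1)       # conv4   -> 13
n = conv_out(n, kernel=3, pad=1)       # conv5   -> 13
n = conv_out(n, kernel=3, stride=2)    # maxpool -> 6
print(n)  # 6: this 6 x 6 map is flattened into the three fully connected layers
```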
3.2 MATLAB
Commands are entered at the command window of the MATLAB interpreter. You may
store text files or scripts (with .m extensions) that contain collections of
commands and then execute them from the command line; users can also create
their own MATLAB routines. MATLAB's matrix algebra procedures [27] are heavily
optimized, so loops typically take longer to complete than equivalent
straight-line (vectorized) code. Functions can be written as compiled C
executables to increase efficiency (though you must have the compiler).
Additionally, you may utilize class structures to organize your code and create
applications with intricate graphical user interfaces. MATLAB comes pre-loaded
with various functions for importing and exporting audio files: audioread and
audiowrite read and write data to and from a variety of audio file types, and
the sound (unnormalized) or soundsc (normalized) functions in most versions of
MATLAB can send signals to the computer's audio hardware.
The dataset for sign-language numerals is carefully assembled into two distinct
sets of training and test data. These datasets were trained using the Adam
optimizer over a 20-epoch period, yielding the accuracy, validation accuracy,
loss, and validation loss for each epoch, as shown in Table 2, which indicates a
progressive rise in training accuracy. As demonstrated in Fig. 4, accurate
categorization requires a minimum of twenty epochs, and the accuracy value
obtained at the final epoch indicates the overall accuracy on the training
dataset. The categorical cross-entropy loss function is used to determine the
overall system performance. Between training and testing, the performance of the
CNN algorithm [28, 29] is compared using a range of parameters, including
execution time, i.e., the amount of time necessary for the program to accomplish
the task.

Fig. 3 The gesture symbols for numbers that will be in the training data

Sensitivity measures the fraction of positively
recognized (true) positives among all actual positives, i.e., recall.
Specificity measures the fraction of actual negatives that are correctly
identified, i.e., how rarely false positives occur. The graph displays the ROC
curve for class 5. To obtain the best results, investigations are done both
visually and in real time. The CNN algorithm is advantageous for this task for
several reasons: it can extract image characteristics without human
intervention, and it learns pictures or videos faster than an ANN, although it
executes more slowly than an ANN. The proposed system is implemented using
MATLAB 2019. The system can also be used in real time by adding cameras, which
helps address the complex-background problem and improves the robustness of hand
detection (Table 2).
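The evaluation quantities mentioned above (sensitivity, specificity, and categorical cross-entropy) reduce to a few lines of code; a minimal sketch with illustrative confusion-matrix counts (the numbers are assumptions, not the paper's results):

```python
import math

def sensitivity(tp, fn):
    """Fraction of actual positives correctly classified (recall)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of actual negatives correctly classified."""
    return tn / (tn + fp)

def categorical_cross_entropy(y_true, y_pred):
    """Loss used to train the CNN: -sum over classes of t * log(p)."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)

# Illustrative counts for one gesture class.
print(sensitivity(tp=45, fn=5))    # 0.9
print(specificity(tn=90, fp=10))   # 0.9
# One-hot target vs. softmax output for a three-class toy case.
print(round(categorical_cross_entropy([0, 1, 0], [0.2, 0.7, 0.1]), 4))  # 0.3567
```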
Hand gesture recognition was designed and developed in accordance with current
design and development methodologies and scopes. The system is very flexible,
allowing simple maintenance and modification in response to changing
surroundings and requirements simply by adding more information. Additional
modifications to bring the assessment tools up to date are also possible.
4 Conclusion
The major goal of the system was to advance hand gesture recognition. A
prototype of the system has been constructed and tested, with promising results.
The device is capable of recognizing hand motions and generating audio from
them. The features used combine image capture and image processing, enhancing
and identifying the image with built-in MATLAB techniques. The project is built
in MATLAB; this language selection is based on the user's needs statement and an
evaluation of the existing system, and it leaves room for future expansion.
References
1. Hegde B, Dayananda P, Hegde M, Chetan C (2019) Deep learning technique for detecting
NSCLC. Int J Recent Technol Eng (IJRTE) 8(3):7841–7843
2. Sunitha KA, Anitha Saraswathi P, Aarthi M, Jayapriya K, Sunny L (2016) Deaf mute
communication interpreter—a review. Int J Appl Eng Res 11:290–296
3. Anand MS, Kumar NM, Kumaresan A (2016) An efficient framework for Indian sign language
recognition using wavelet transform. Circuits Syst 7:1874–1883
4. Ahuja MK, Singh A (2015) Hand gesture recognition using PCA. Int J Comput Sci Eng Technol
(IJCSET) 5(7):267–27
5. More SP, Sattar A, Hand gesture recognition system for dumb people. Int J Sci Res (IJSR)
6. Kaur C, Gill N, An automated system for Indian sign language recognition. Int J Adv Res
Comput Sci Software Eng
7. Pandey P, Jain V (2015) Hand gesture recognition for sign language recognition: a review. Int
J Sci Eng Technol Res (IJSETR) 4(3)
8. Nagpal N, Mitra A, Agrawal P (2019) Design issue and proposed implementation of commu-
nication Aid for Deaf & Dumb People. Int J Recent Innov Trends Comput Commun
3(5):147–149
9. Gilorkar NK, Ingle MM (2015) Real time detection and recognition of Indian and American
sign language using sift. Int J Electron Commun Eng Technol (IJECET) 5(5):11–18
10. Shinde V, Bacchav T, Pawar J, Sanap M (2014) Hand gesture recognition system using camera
03(01)
11. Gebrera ME (2016) Glove-based gesture recognition system. In: IEEE international conference
on robotics and biomimetics 2016
12. Siam SM, Sakel JA (2016) Human computer interaction using marker based hand gesture
recognition
13. Shneiderman B, Plaisant C, Cohen M, Jacobs S, Elmqvist N, Diakopoulos N (2016) Designing
the user interface: strategies for effective human-computer interaction Pearson
14. Lawrence DO, Ashleigh M (2019) Impact of Human-Computer Interaction (HCI) on users
in higher educational system: Southampton University as a case study. Int J Manage Technol
6(3):1–12
15. Chu JU, Jung DH, Lee YJ (2008) Design and control of a multifunction myoelectric hand with
new adaptive grasping and self-locking mechanisms. In: 2008 IEEE international conference
on robotics and automation, pp 743–748, May 2008
16. Marques O (2011) Practical image and video processing using MATLAB. Wiley
17. McAndrew A (2004) An introduction to digital image processing with Matlab notes for
SCM2511 image processing, p 264
18. Khan TM, Bailey DG, Khan MA, Kong Y (2017) Efficient hardware implementation for
fingerprint image enhancement using anisotropic Gaussian filter. IEEE Trans Image Process
26(5):2116–2126
19. Nishad PM (2013) Various colour spaces and colour space conversion. J Global Res Comput
Sci 4(1):44–48
20. Abhishek B, Krishi K, Meghana M, Daaniyaal M, Anupama HS (2019) Hand gesture recog-
nition using machine learning algorithms. Int J Recent Technol Eng (IJRTE) 8(1) ISSN:
2277-3878
21. Ankita W, Parteek K (2020) Deep learning-based sign language recognition system for static
signs. Neural Comput Appl
22. Khan A, Sohail A, Zahoora U, Qureshi AS (2020) A survey of the recent architectures of deep
convolutional neural networks. Comput Vis Pattern Recogn
23. Chuan CH, Regina E, Guardino C (2014) American sign language recognition using leap motion
sensor. In: 13th International Conference on Machine Learning and Applications (ICMLA),
pp 541–544
24. Devineau G, Moutarde F, Xi W, Yang J (2018) Deep learning for hand gesture recognition on
skeletal data. In: 13th IEEE International conference on automatic face & gesture recognition,
pp 106–113
25. Nair AT, Muthuvel K (2021) Automated screening of diabetic retinopathy with optimized deep
convolutional neural network: enhanced moth flame model. J Mech Med Biol 21(1):2150005
(29 p). https://doi.org/10.1142/S0219519421500056
26. Nair AT, Muthuvel K (2019) Blood vessel segmentation and diabetic retinopathy recognition:
an intelligent approach. Comput Methods Biomech Biomed Eng: Imaging Visual. https://doi.
org/10.1080/21681163.2019.1647459
Hand Gesture Recognition for Disabled Person … 249
27. Nair AT, Muthuvel K (2020) Research contributions with algorithmic comparison on the diag-
nosis of diabetic retinopathy. Int J Image Graphics 20(4):2050030(29 p). https://doi.org/10.
1142/S0219467820500308
28. Nair AT, Muthuvel K, Haritha KS (2020) Effectual evaluation on diabetic retinopathy. In:
Publication in lecture notes. Springer, Berlin
29. Nair AT, Muthuvel K, Haritha KS (2021) Blood vessel segmentation for diabetic retinopathy.
In: Publication in the IOP: Journal of Physics Conference Series (JPCS)
Coronavirus Pandemic: A Review
of Different Machine Learning
Approaches
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 251
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_18
252 B. Singh and R. Agarwal
person. This virus, also called severe acute respiratory syndrome coronavirus 2
(SARS-CoV-2), affects the respiratory system. The virus provokes different
reactions in individual bodies [2]. The basic structure of the COVID-19 virus is
shown in Fig. 1. The spread of COVID-19 disease is divided into stages [4]:
STAGE-1, STAGE-2, STAGE-3, and STAGE-4.
STAGE-1 occurs when infected people travel from one country to another; in
particular, people who have travelled overseas are found to be infected with
respiratory ailments, but during that time the illness is not spreading
domestically. During STAGE-2, localized transmission occurs, and the source,
i.e., the sick individual who may have traveled to other already-affected
nations, can be recognized and tracked down. STAGE-3 is community transmission:
at this stage the virus spreads from one person to another or from one cluster
to another, and the infected person is hard to trace.
The most serious stage of a contagious disease propagating inside a country is
STAGE-4. During this stage, there are multiple spots of illness in various
sections of the population, and the disease has taken on pandemic proportions.
Because of the virus's quick and extensive spread, the World Health Organization
classified the 2019 coronavirus infection as a pandemic on March 11, 2020. The
outbreak started in China, with the first cases verified in Wuhan, Hubei
province [4]. The etiological agent of COVID-19 was identified as a novel
coronavirus, initially dubbed 2019-nCoV [5]. The viral genome was eventually
sequenced, and the virus was named SARS-CoV-2 (severe acute respiratory syndrome
coronavirus 2) by the International Committee on Taxonomy of Viruses because it
was genetically close to the coronavirus that triggered the SARS outbreak in
2003.
Different tests are available for COVID-19 detection, such as the PCR
(polymerase chain reaction) test [5], the COVID-19 antigen test [6], and the
COVID-19 antibody test. COVID-19 infection has resulted in a slew of devastating
illnesses with striking signs and symptoms [7]: nausea, a clogged nose,
tiredness, and breathing problems were among the flu-related symptoms mentioned
by participants. The composition of the COVID-19 viral infection is tough for
any researcher to grasp, and COVID-19 has several symptoms: certain signs are
frequent, some are less common, and some are unusual.
COVID-19 affects human health as well as mental status. The COVID-19 epidemic
has left many workers without jobs, and people go through anxiety and stress.
The various mental issues and symptoms of COVID-19 are depicted in Table 1.
Early detection is essential for COVID-19 identification, and it can improve
recovery rates. In the early stages of COVID-19, image processing is a vital
approach for exploring and identifying the disease. However, manually analyzing
a large range of healthcare images can be time-consuming and monotonous, and it
is susceptible to human error and bias [10]. Deep learning (DL) and artificial
intelligence (AI) make it simple to distinguish between infected and
non-infected patients, and there are various aspects in which AI and ML are
helpful for COVID-19 detection.

In routine practice, AI approaches are used to diagnose disease and anticipate
therapy outcomes. AI can give essential information regarding resource
allocation and judgment by prioritizing the requirement for mechanical
ventilation and breathing assistance for ICU (intensive care unit) patients,
drawing on questionnaires, supporting documents, and clinical factors. AI can
also be utilized to forecast recovery or mortality in COVID-19, as well as to
offer regular reports, preservation and predictive analytics, and therapy
tracking. AI is used to triage individuals into low, medium, and serious
divisions depending on their symptoms, predisposition, and clinical reports, so
that appropriate actions can be taken to treat individuals as quickly and
efficiently as possible [11].

In medical applications, deep learning has achieved significant enhancements.
Deep learning can find patterns in very massive, complicated datasets, and it
has been recognized as a viable tool for analyzing COVID-19-infected
individuals. Neural-network-based COVID-19 identification is a deep learning
approach that uses two-dimensional and three-dimensional data collected from
lung CT scans to
2 Literature Review
For the analysis and detection of CORONA virus, Jain [13] designed a novel approach
which was based on the DL concept. The testing and training of the DL based model
were done with CXR images. The images of infected and non-infected persons were
utilized for the training purpose of various DL models. The images of chest x-ray
were filtered out and data augmentation was applied to them. The three-DL based
approaches ResNeXt, Inception V3, and Xception were examined based on their
accuracy of COVID-19 detection. The collected dataset had 6432 images of CXR
which were collected from Kaggle site. The collection of 5467 images was used for
the training purpose of models and 965 used for testing purposes. The Xception model
provided the highest accuracy among other models. The performances of the models
were examined on three parameters: precision rate, F1-score, and recall rate.
Since the CNN approach has provided state-of-the-art results in the medical
field, Kamal et al. [14] evaluated pre-trained deep convolutional prototypes for
efficient COVID-19 classification of CXR frames. The Neural Architecture Search
Network (NASNet), ResNet, MobileNet, DenseNet, VGG-19, and InceptionV3
pre-trained models were examined. The comparison outcomes
of pre-trained models show that three class classifications had achieved the highest
accuracy. Ibrahim et al. [15] proposed a system that can classify three different classes
of COVID-19. The AlexNet pre-trained model was implemented for the classifica-
tion of patients. The model was used to predict the type of COVID-19 class as well as
predict the infected patient or non-infected patient. The CXR medical images were
composed from public datasets. The database of images contained bacterial pneu-
monia, COVID-19, pneumonia viral infected, and healthy or CXR normal images.
The classification outcomes of the proposed model were based on two-way,
three-way, and four-way classification. In two-way classification of normal
(non-infected) versus viral pneumonia images, the proposed model provided 94.43%
accuracy; in normal versus bacterial pneumonia classification, it provided
91.43% accuracy; and it achieved 93.42% accuracy in the four-way classification.
For the detection of COVID-19, there are two approaches: AI based and DL based.
Thanks to fast-tracking artificial intelligence technologies, AI is helpful in
lowering doctors' stress, because it can analyze radiographic findings using DL
(deep learning) and ML (machine learning) systems. AI's fast-tracking platforms
encourage cost- and time-effective operations by swiftly assessing a huge
proportion of images, leading to better patient care.
Driven by recent advances in computerized information collection, predictive
analytics, and computational technologies, artificial intelligence is expanding
into fields that were traditionally thought to be the domain of human
intelligence, and machine learning is influencing medical practice. It is still
difficult to develop prediction systems that can reliably predict and identify
such viruses. AI methods, also known as classification methods, take in
information, analyze it statistically, and forecast outcomes based on the
statistical architectural features. Many of these techniques have a variety of
uses, including image
[Figure: AI-based COVID-19 diagnosis and treatment workflow. Symptomatic patients' samples are analyzed by an AI diagnosis system; positive cases start therapy and are retested until negative (cured).]
To control the impacts of the illness, AI techniques are used across a variety
of domains; implementations include available treatments, image classification
connected to COVID-19, pharmacology investigations, and epidemiology. When
applied to the interpretation of multimodal images, DL-based models have the
potential to provide an effective and precise strategy for diagnosing and
classifying COVID-19 disease, with significant increases in image resolution.
Deep learning methods have advanced significantly in the recent two decades,
providing enormous prospects for application in a variety of sectors and
services. Figures 4 and 5 depict the core architecture-based approach for
detecting COVID-19. Deep learning-based methods with keypoints, layers,
advantages, and limitations are shown in Table 4. Several types of deep learning
techniques are:
• Convolutional Neural Network (CNN): Convolutional Neural Networks have
grown in popularity as a result of their improved frame classification performance.
The activation functions of the organization, in conjunction with classifiers, aid
in the retrieval of temporal and spatial features from frames. In the levels, a
weight-sharing system is implemented, which considerably reduces computing
time [27].
• Recurrent Neural Network (RNN): Due to its internal storage, the Recurrent
Neural Network (RNN) was one of the first algorithms able to preserve starting
data, making it excellent for computer vision problems involving sequential
Table 4 Approaches based on deep learning with keypoints, layers, advantages and limitations

Method | Keypoints | Layers | Advantages | Limitations
CNN based [13] | Works only on limited images | Multilayer | High accuracy | Data scarcity
DCNN [14] | Three-class differentiation | 50 layers | Low computational cost | Implementation is hard
ResNet-50 pre-trained model [16] | Provides highly accurate results | 50 layers | Fewer false results | High computational cost
Deep learning-based YOLO model [17] | More suitable for binary classification | Multilayer | Highly accurate results | Small dataset
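The weight-sharing advantage noted for CNNs above can be made concrete with a parameter count; the layer sizes in this comparison are hypothetical, chosen only to show the scale difference:

```python
def dense_params(in_features, out_features):
    """Fully connected layer: every input connects to every output."""
    return in_features * out_features + out_features   # weights + biases

def conv_params(kernel, in_channels, out_channels):
    """Convolution layer: one small shared kernel per output channel."""
    return kernel * kernel * in_channels * out_channels + out_channels

# Mapping a 64 x 64 grayscale image to 32 same-size feature maps:
print(dense_params(64 * 64, 64 * 64 * 32))   # 537001984 (over half a billion)
print(conv_params(3, 1, 32))                  # 320
```

The shared 3 × 3 kernels replace a per-pixel weight matrix, which is why the convolutional layer trains and runs so much faster.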
References
1. Sungheetha A (2021) COVID-19 risk minimization decision making strategy using data-driven
model. J Inf Technol 3(01):57–66
2. Pereira RM, Bertolini D, Teixeira LO, Silla Jr CN, Costa YM (2020) COVID-19 identification in
chest X-ray images on flat and hierarchical classification scenarios. Comput Methods Programs
Biomed 194:105532
3. Haque SM, Ashwaq O, Sarief A, Azad John Mohamed AK (2020) A comprehensive review
about SARS-CoV-2. Future Virol 15(9):625–648
4. COVID-19: The 4 Stages Of Disease Transmission Explained (2021). Retrieved 24 June
2021, from https://www.netmeds.com/health-library/post/covid-19-the-4-stages-of-disease-
transmission-explained
5. Cai Q, Du SY, Gao S, Huang GL, Zhang Z, Li S, Wang X, Li PL, Lv P, Hou G, Zhang LN
(2020) A model based on CT radiomic features for predicting RT-PCR becoming negative in
coronavirus disease 2019 (COVID-19) patients. BMC Med Imaging 20(1):1–10
6. Mohanty A, Kabi A, Kumar S, Hada V (2020) Role of rapid antigen test in the diagnosis of
COVID-19 in India. J Adv Med Med Res 77–80
7. Coronavirus disease (COVID-19)—World Health Organization. (2021). Retrieved 9 June
2021, from https://www.who.int/emergencies/diseases/novel-coronavirus-2019?gclid=Cj0
KCQjwzYGGBhCTARIsAHdMTQwyiiQqt3qEn89y0AL5wCEdGwk1bBViX2aoqA__F7M
aGeQEiuahTI4aAh4uEALw_wcB
8. Larsen JR, Martin MR, Martin JD, Kuhn P, Hicks JB (2020) Modeling the onset of symptoms
of COVID-19. Front Public Health 8:473
9. Shastri S, Singh K, Kumar S, Kour P, Mansotra V (2021) Deep-LSTM ensemble framework
to forecast Covid-19: an insight to the global pandemic. Int J Inform Technol, 1–11
10. Huang S, Yang J, Fong S, Zhao Q (2021) Artificial intelligence in the diagnosis of COVID-19:
challenges and perspectives. Int J Biol Sci 17(6):1581
11. Arora N, Banerjee AK, Narasu ML (2020) The role of artificial intelligence in tackling COVID-
19
12. Nayak J, Naik B, Dinesh P, Vakula K, Dash PB, Pelusi D (2021) Significance of deep learning
for Covid-19: state-of-the-art review. Res Biomed Eng, 1–24
13. Jain R, Gupta M, Taneja S, Hemanth DJ (2020) Deep learning-based detection and analysis of
COVID-19 on chest X-ray images. Appl Intell 51(3):1690–1700
14. Kamal KC, Yin Z, Wu M, Wu Z (2021) Evaluation of deep learning-based approaches for
COVID-19 classification based on chest X-ray images. Sign Image Video Process, 1–8
15. Ibrahim AU, Ozsoz M, Serte S, Al-Turjman F, Yakoi PS (2020) Pneumonia classification using
deep learning from chest X-ray images during COVID-19. Cogn Comput, 1–13
16. Annavarapu CSR (2021) Deep learning-based improved snapshot ensemble technique for
COVID-19 chest X-ray classification. Appl Intell, 1–17
17. Ozturk T, Talo M, Yildirim EA, Baloglu UB, Yildirim O, Acharya UR (2021) Automated
detection of COVID-19 cases using deep neural networks with X-ray images. Comput Biol
Med 121:103792
18. Al-antari MA, Hua CH, Bang J, Lee S (2020) Fast deep learning computer-aided diagnosis of
COVID-19 based on digital chest x-ray images. Appl Intell, 1–18
19. Eljamassi DF, Maghari AY (2020) COVID-19 detection from chest X-ray scans using machine
learning. In: 2020 International Conference on Promising Electronic Technologies (ICPET),
pp 1–4
20. Tayarani-N MH (2020) Applications of artificial intelligence in battling against Covid-19: a
literature review. Chaos, Solitons Fractals 110338
21. Feng C, Huang Z, Wang L, Chen X, Zhai Y, Chen H, Wang Y, Su X, Huang S, Zhu W, Sun W
(2020) A novel triage tool of artificial intelligence assisted diagnosis aid system for suspected
COVID-19 pneumonia in fever clinics. MedRxiv
22. Annavarapu CSR (2021) Deep learning-based improved snapshot ensemble technique for
COVID-19 chest X-ray classification. Appl Intell 51(5):3104–3120
23. Bharti U, Bajaj D, Batra H, Lalit S, Lalit S, Gangwani A (2020) Medbot: conversational
artificial intelligence powered Chatbot for delivering tele-health after Covid-19. In: 2020 5th
International Conference on Communication and Electronics Systems (ICCES), pp 870–875
24. de Moraes Batista AF, Miraglia JL, Donato THR, Chiavegatto Filho ADP (2020) COVID-19
diagnosis prediction in emergency care patients: a machine learning approach. medRxiv
25. Mukhtar AH, Hamdan A (2021) Artificial intelligence and coronavirus COVID-19: applica-
tions, impact and future implications. The importance of new technologies and entrepreneurship
in business development: in the context of economic diversity in developing countries, vol 194,
p 830
26. Burugupalli M (2020) Image classification using transfer learning and convolution neural
networks
27. Ganatra N, Patel A (2018) A Comprehensive study of deep learning architectures, applications
and tools. Int J Comput Sci Eng 6:701–705
28. Chen JIZ (2021) Design of accurate classification of COVID-19 disease in X-ray images using
deep learning approach. J ISMAC 3(02):132–148
29. Welch Medical Library Guides: Finding Datasets for Secondary Analysis: COVID-19 Datasets
(2021). Retrieved 30 July 2021, from https://browse.welch.jhmi.edu/datasets/Covid19
30. Aishwarya T, Kumar VR (2021) Machine learning and deep learning approaches to analyze
and detect COVID-19: a review. SN Comput Sci 2(3):1–9
High Spectrum and Efficiency Improved
Structured Compressive Sensing-Based
Channel Estimation Scheme for Massive
MIMO Systems
Abstract Due to its high spectrum and energy efficiency, massive MIMO is set to
become the most promising technique for future 5G communications. Accurate
channel estimation is essential to realize its potential performance gain. The
pilot overhead of conventional channel estimation schemes grows with the
enormous number of antennas used at the base station (BS) and becomes too
expensive; for frequency division duplex (FDD) massive MIMO it is unaffordable.
We introduce a structured compressive sensing (SCS)-based spatiotemporal joint
channel estimation scheme that reduces the pilot overhead where required by
leveraging the spatiotemporal common sparsity of delay-domain MIMO channels.
Accurate channel estimation is needed to fully exploit the massive array gain,
which relies on channel state information at the transmitter side. However, FDD
downlink channel estimation always requires more training and computation than
the TDD mode, because the uplink and downlink channels are not straightforwardly
reciprocal and the number of base station antennas is massive. At the base
station, we first introduce non-orthogonal pilots, under the framework of
compressive sensing theory, to reduce the pilot overhead where required. Then,
an SCS algorithm is introduced to approximate the channels associated with all
the other OFDM symbols from an inadequate number of pilots, exploiting the
spatiotemporal common sparsity of massive MIMO channels to recover the channel
estimate with precision. Furthermore, we recommend a space-time adaptive pilot
scheme that decreases the pilot overhead by making use of the spatiotemporal
channel correlation. Additionally, we discuss the proposed channel estimation
scheme in the multi-cell scenario. The spatial correlation present in most
wireless channels is exploited for outdoor communication scenarios. Meanwhile,
compared with the long signal transmission distance, the scale of the transmit
antenna array is negligible. Utilizing the greater number of spatial degrees of
freedom in massive MIMO can raise the system capacity and energy efficiency by
orders of magnitude. Simulation results show that the proposed system
outperforms the existing systems.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 265
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_19
266 V. Baranidharan et al.
1 Introduction
2 Related Work
Massive MIMO, which employs hundreds of antennas at the base station, is built
entirely on boosting spectral efficiency and multiplexing gain [1]. Channel
state information is required at the transmitter side for accuracy, and a large
amount of downlink channel estimation is consumed, especially in FDD systems. To
overcome this problem, distributed compressive sensing (DCS) is introduced: the
slow variation of the channel statistics is fully exploited in the frequency
domain, where sparsity is common across multiple sub-channels. Hybrid training
is proposed so that the channel matrices of previous frames can be used to
represent the downlink by three components; with these three components,
obtained through the uplink, real-time channel state information tracking is
achieved. This technique is widely used to optimize the convergence of the
channel estimation and to keep the complexity low.
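As background, sparse recovery of a delay-domain channel from pilot observations can be sketched with orthogonal matching pursuit (OMP), a standard greedy compressive sensing solver (not the chapter's exact SCS algorithm); the sizes, tap positions, and the orthonormal pilot matrix below are illustrative assumptions, with a square pilot matrix used so the demo recovers exactly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sparse delay-domain channel: length L with only K dominant taps.
L, K = 64, 3
h = np.zeros(L)
h[[5, 21, 40]] = [1.2, -0.9, 0.7]            # assumed tap positions/gains

# Orthonormal pilot matrix (square here for a provably exact demo;
# an SCS scheme would use far fewer pilot rows than L).
Phi = np.linalg.qr(rng.standard_normal((L, L)))[0]
y = Phi @ h                                   # pilot-domain observations

# OMP: greedily pick the most correlated column, then re-fit all
# chosen taps by least squares and update the residual.
residual, idx = y.copy(), []
for _ in range(K):
    idx.append(int(np.argmax(np.abs(Phi.T @ residual))))
    coef, *_ = np.linalg.lstsq(Phi[:, idx], y, rcond=None)
    residual = y - Phi[:, idx] @ coef

h_hat = np.zeros(L)
h_hat[idx] = coef
print(sorted(idx))              # [5, 21, 40]
print(np.allclose(h_hat, h))    # True
```

The point of the CS formulation is that, with a suitable measurement matrix, far fewer pilot observations than channel taps suffice when the channel is sparse.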
In a massive MIMO system, accurate channel estimation is important to ensure
good performance [2]. Compared with time division duplex (TDD), FDD offers a
higher data rate and wider area coverage. TDD does not require training and
computation as heavy as FDD's, whose uplink and downlink are not
straightforwardly reciprocal because of the greater number of antennas. In
real-time estimation, channel variation makes FDD massive MIMO more difficult,
and downlink channel estimation is performed here.
In FDD systems, estimation of the uplink and downlink with feedback has been
used together with cascaded pre-coding to reduce pilot overhead [3]. In this
way, low-dimensional channel estimates can be predicted accurately, and the
feedback is also estimated using cascaded pre-coding techniques. A parametric
model is used for downlink channel estimation in massive MIMO: the path delays
are first estimated through the dedicated forward link and then quantized at the
base station. Both downlink and uplink have identical path delays, where
parametric models lead to data-fitting errors.
With its high spectrum and energy efficiency, massive MIMO is the most promising
developing technology for wireless communications [4]. In FDD, downlink channel
estimation becomes unaffordable due to the greater number of base station
antennas. Perfect channel recovery with a minimum number of pilot symbols is
possible under a general channel model in which each channel vector follows a
Gaussian mixture distribution, with pilot symbols designed from the weighted sum
of the Shannon mutual information between the user and the corresponding channel
on the Grassmannian manifold for FDD. The NMSE level, however, is not good for
multi-user scenarios. The least squares (LS) method and the distributed
compressive sensing (DCS) method are combined in DCS techniques for better
estimation: among the different subcarriers, the channel vectors in the angular
domain are estimated in two parts. The overall problems are that the
computational complexity is high and the channel estimation is not accurate
while reducing the pilot overhead.
In order to obtain the channel state information accurately at the transmitter
side, we have to exploit and improve the multiplexing and array gain of
multiple-input multiple-output (MIMO) systems [5]. Due to overwhelming pilot and
feedback overhead, FDD does not support conventional channel estimation, so
compressive channel estimation is introduced to reduce the pilot overhead in FDD
massive MIMO systems. The beam space is maximized in beam-blocked massive MIMO,
and the pilot overhead of downlink training can be reduced through a
beam-blocked compressive channel estimation scheme. For acquiring reliable CSIT,
an optimal block orthogonal matching pursuit algorithm is proposed; at the limit
of the pilot overhead, an effective channel matrix algorithm is used to
represent the amplitude and phase of the received signal and to lighten the
feedback load.
The major problem of uplink and downlink channel estimation in FDD-based massive
MIMO systems is discussed in [6]. To reduce the pilots in the uplink/downlink, a
codebook- and dictionary-based channel model is presented for channel
estimation, and a robust channel representation is obtained by observing the
reciprocity of the AOA/AOD calculated for the uplink/downlink data
transmissions. The downlink training overhead, which is a bottleneck of FDD
massive MIMO systems, can be reduced by utilizing information from simple uplink
training.
A parametric channel estimation method for massive MIMO is proposed in [7], in which the spatial correlation of the wireless channel is estimated. The wireless channel is sparse, its spatially correlated values can be exploited, and the scale of the antenna array is negligible compared with the long signal transmission distance, so the channel impulse responses (CIRs) of the transmit antennas usually share similar path delays. The parametric channel estimation method exploits this spatially common sparsity of massive MIMO channels to reduce the pilot overhead significantly. The accuracy of the channel estimate increases gradually with the number of antennas, so the same accuracy can be achieved with fewer pilots; the limitation is that low-dimensional CSI is not supported.
Extensive experimental studies have shown that broadband channels in 5G wireless communication exhibit sparsity in the delay domain, because the channel delay spread can be large compared with the time of arrival of the earliest path. For the mth transmit antenna placed at the base station, the channel impulse response (CIR) is expressed as

h_{m,r} = [h_{m,r}[1], h_{m,r}[2], ..., h_{m,r}[L]]^T,  1 ≤ m ≤ M,

where r denotes the index of the OFDM symbol and L denotes the equivalent channel length in the delay domain. The support set of the CIR is D_{m,r} = supp{h_{m,r}} = {l : |h_{m,r}[l]| > p_th, 1 ≤ l ≤ L}, where p_th is a threshold determined by the noise level.
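The support set D_{m,r} = supp{h_{m,r}} defined above simply collects the delay taps whose magnitude exceeds the threshold p_th. A minimal sketch (the tap values are made up for illustration):

```python
import numpy as np

def channel_support(h, p_th):
    """Support set D = {l : |h[l]| > p_th} of a CIR vector h (0-based indices)."""
    return {int(l) for l in np.flatnonzero(np.abs(h) > p_th)}

# Toy CIR with L = 8 taps; three taps carry almost all of the energy.
h = np.array([0.01, 0.9, 0.02, 0.0, 0.7 - 0.3j, 0.0, 0.05, 0.4j])
print(channel_support(h, p_th=0.1))   # {1, 4, 7}
```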
High Spectrum and Efficiency Improved Structured … 269
This is referred to as the spatially common sparsity of wireless massive MIMO channels. Consider a working LTE-Advanced system with carrier frequency fc = 2 GHz and signal bandwidth fs = 10 MHz. For a uniform linear array (ULA) with antenna spacing λ/2, the maximum propagation delay difference across the array aperture is 8λ/(2c) = 4/fc = 0.002 µs, which is negligible compared with the sample period of the system Ts = 1/fs = 0.1 µs, where λ and c denote the carrier wavelength and the velocity of light, respectively. Hence, the path delays observed at the transmitting and receiving sides associated with the same scatterers can be regarded as identical, although the corresponding path gains may be uncorrelated due to non-isotropic antennas.
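The claim that the array-induced delay difference is negligible can be checked numerically with the values quoted above:

```python
# Numeric sanity check of the delay calculation quoted in the text.
c = 3e8                    # speed of light (m/s)
fc = 2e9                   # carrier frequency (Hz)
fs = 10e6                  # system bandwidth (Hz)

lam = c / fc                      # carrier wavelength: 0.15 m
delta_tau = 8 * lam / (2 * c)     # 8*lambda/(2c) = 4/fc = 2 ns = 0.002 us
Ts = 1 / fs                       # sample period: 0.1 us

print(f"delay difference across the array: {delta_tau * 1e6:.4f} us")
print(f"system sample period:              {Ts * 1e6:.4f} us")
```

Since 0.002 µs is far below 0.1 µs, the CIRs of different antennas indeed share a common support at this bandwidth.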
Owing to the temporal channel correlation, the support set also remains unchanged over R successive OFDM symbols, i.e., D_{m,r} = D_{m,r+1} = ··· = D_{m,r+R−1}, 1 ≤ m ≤ M.
For reliable channel estimation, an SCS-based algorithm is used at the user side. For a further reduction in the pilot overhead, a space–time adaptive pilot scheme is proposed. The extension of the proposed channel estimation scheme to the multi-cell scenario is also discussed.
The classical framework of orthogonal pilot design in conventional MIMO systems is completely based on the Nyquist sampling theorem: pilots of different transmit antennas occupy different subcarriers. In the proposed non-orthogonal pilot scheme, by contrast, the pilots of the different transmit antennas occupy exactly the same subcarriers; this scheme is fully based on CS theory, which supports the non-orthogonal pilots for effective data transmission, and the sparse nature of the channel is leveraged to reduce the pilot overhead substantially.
Consider the MIMO channel estimation over one OFDM symbol. For the proposed non-orthogonal pilot scheme, the pilot subcarrier index set ξ, with Np = |ξ|_c pilot subcarriers per OFDM symbol and elements taking values from 1 to N, is identical for all transmit antennas, where N denotes the number of subcarriers in the OFDM symbol. p_m ∈ C^{Np×1} denotes the pilot sequence of the mth transmit antenna.
The pilot sequence y_r ∈ C^{Np×1} received in the rth OFDM symbol, after removal of the guard interval and the discrete Fourier transform (DFT), can be expressed as

y_r = Σ_{m=1}^{M} diag{p_m} F|_ξ [h_{m,r}^T, 0_{(N−L)×1}^T]^T + w_r = Σ_{m=1}^{M} P_m F|_ξ h_{m,r} + w_r = Σ_{m=1}^{M} Φ_m h_{m,r} + w_r = Φ h_r + w_r,

where F|_ξ denotes the partial DFT matrix whose rows are selected according to ξ, and w_r denotes the additive noise.
Jointly considering R successive OFDM symbols, the received pilots can be written as

Y = ΨD + W,

where Y = [y_r, y_{r+1}, ..., y_{r+R−1}] ∈ C^{Np×R} and D = [d̃_r, d̃_{r+1}, ..., d̃_{r+R−1}] ∈ C^{ML×R}, which can be rearranged as D = [D_1^T, D_2^T, ..., D_L^T]^T. Here each sub-matrix D_l has size M × R for 1 ≤ l ≤ L, and the element in the mth row and rth column of D_l is the gain of the lth delay path associated with the mth transmit antenna in the rth OFDM symbol. The equivalent CIR matrix D therefore exhibits structured sparsity, and it is exactly this intrinsic structured sparsity of D that is exploited for the best channel estimation performance.
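To make the structure of Y = ΨD + W concrete, the following sketch builds a toy structured-sparse D (only a few blocks D_l are nonzero, and each nonzero block is dense across all M antennas and R symbols) and generates the corresponding measurements. All dimensions and the Gaussian sensing matrix are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
M, R, L, Np = 4, 3, 16, 32          # antennas, OFDM symbols, taps, pilots (toy)

# D = [D_1^T, ..., D_L^T]^T with each D_l of size M x R. Structured sparsity:
# a delay tap l is either active for ALL antennas and symbols or for none.
active = [2, 9, 13]
D = np.zeros((M * L, R), dtype=complex)
for l in active:
    D[l * M:(l + 1) * M, :] = rng.standard_normal((M, R)) + 1j * rng.standard_normal((M, R))

Psi = (rng.standard_normal((Np, M * L)) + 1j * rng.standard_normal((Np, M * L))) / np.sqrt(Np)
W = 0.01 * (rng.standard_normal((Np, R)) + 1j * rng.standard_normal((Np, R)))
Y = Psi @ D + W                     # measurement model Y = Psi D + W
print(Y.shape)                      # (32, 3)
```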
This subsection extends the proposed channel estimation scheme from a single-cell scenario to a multi-cell scenario. To solve the pilot contamination from the interfering cells, a frequency division multiplexing (FDM) scheme can be utilized; i.e., pilots of adjacent cells are orthogonal in the frequency domain. However, the channel estimation performance of users in the target cell may be degraded by the downlink pre-coded data from adjacent cells. Thus, owing to the slight performance loss of the FDM scheme and the reduction in pilot overhead, the TDM scheme can be considered the suitable approach to mitigate pilot contamination in multi-cell FDD massive MIMO systems.
In CS theory, the sensing matrix Ψ must be designed so that the high-dimensional sparse signal D can be compressed effectively and recovered reliably. Designing Ψ reduces to designing the pilot placement ξ and the pilot sequences {p_m}_{m=1}^{M}, since these two parameters fully determine the sensing matrix. Small cross-correlation between the columns of Ψ is desired for reliable sparse signal recovery in CS theory; hence ξ and {p_m}_{m=1}^{M} must be considered appropriately in the design. Small cross-correlation of {p_m}_{m=1}^{M} is desired in the specific pilot design, and for any given l, the cross-correlation between the columns of Ψ_l is determined only by {p_m}_{m=1}^{M}. Therefore,
(ψ_{m1,l})^H ψ_{m2,l} = (ψ_l^{(m1)})^H ψ_l^{(m2)} = (φ_{m1}^{(l)})^H φ_{m2}^{(l)} = (p_{m1} ∘ f^{(l)})^H (p_{m2} ∘ f^{(l)}) = (p_{m1})^H p_{m2},

lim_{Np→∞} |(ψ_{m1,l})^H ψ_{m2,l}| / (‖ψ_{m1,l}‖_2 ‖ψ_{m2,l}‖_2) = lim_{Np→∞} |(p_{m1})^H p_{m2}| / Np = 0,

lim_{Np→∞} |(ψ_{m1,l})^H ψ_{m2,l}| / Np = lim_{Np→∞} |Σ_{k=1}^{Np} exp(jθ_k)| / Np = 0,

where ∘ denotes the Hadamard product, f^{(l)} is the lth column of the partial DFT matrix F|_ξ, and θ_k are the phase differences of the unit-modulus pilot entries.
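The vanishing normalized cross-correlation can also be observed empirically: for unit-modulus pilots with i.i.d. random phases, |Σ_k exp(jθ_k)|/Np decays roughly as 1/√Np. The following is a numerical sketch under that random-phase assumption, not a proof:

```python
import numpy as np

rng = np.random.default_rng(2)

def norm_xcorr(Np):
    # Unit-modulus pilot entries with i.i.d. random phases.
    p1 = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, Np))
    p2 = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, Np))
    # |p1^H p2| / Np = |sum_k exp(j*theta_k)| / Np, theta_k the phase differences.
    return np.abs(np.vdot(p1, p2)) / Np

for Np in (16, 256, 4096, 65536):
    print(f"Np = {Np:6d}: normalized cross-correlation = {norm_xcorr(Np):.4f}")
```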
With the sparsity level s = P, the proposed ASSP algorithm converges accurately. For the case s = P, we provide the convergence guarantee of the proposed algorithm together with its stopping criteria. Whereas the conventional SP algorithm and the model-based SP algorithm recover a sparse vector, here we provide the convergence guarantee for the reconstruction of the structured sparse matrix.
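For intuition, the following is a minimal structured (block-wise) subspace pursuit iteration for a known sparsity level s — a simplified sketch in the spirit of the model-based SP mentioned above, not the adaptive ASSP algorithm analyzed in this section:

```python
import numpy as np

def structured_sp(Y, Psi, M, L, s, iters=10):
    """Recover D (ML x R) whose rows form L blocks D_l of size M x R,
    with only s blocks nonzero. Simplified block subspace pursuit sketch."""
    R = Y.shape[1]
    supp = set()
    D_hat = np.zeros((M * L, R), dtype=complex)
    for _ in range(iters):
        resid = Y - Psi @ D_hat
        corr = Psi.conj().T @ resid
        # Score each delay block by the F-norm of its correlation rows.
        scores = [np.linalg.norm(corr[l * M:(l + 1) * M, :]) for l in range(L)]
        cand = supp | {int(l) for l in np.argsort(scores)[-s:]}
        rows = sorted(i for l in cand for i in range(l * M, (l + 1) * M))
        Z = np.linalg.lstsq(Psi[:, rows], Y, rcond=None)[0]
        # Prune back to the s blocks with the largest F-norms.
        pos = {i: k for k, i in enumerate(rows)}
        norms = {l: np.linalg.norm(Z[[pos[i] for i in range(l * M, (l + 1) * M)], :])
                 for l in cand}
        supp = set(sorted(norms, key=norms.get)[-s:])
        rows_s = sorted(i for l in supp for i in range(l * M, (l + 1) * M))
        D_hat = np.zeros((M * L, R), dtype=complex)
        D_hat[rows_s, :] = np.linalg.lstsq(Psi[:, rows_s], Y, rcond=None)[0]
    return D_hat, supp
```

The ASSP algorithm additionally adapts the sparsity level s itself, reusing the support estimated at level s as prior information for level s + 1, which is exactly the role of the convergence discussion in this section.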
Theorem 1 Consider Y = ΨD + W, and let the ASSP algorithm use the sparsity level s = P. Then the estimate D̂^k in the kth iteration satisfies

‖D − D̂^k‖_F < c_P ‖D − D̂^{k−1}‖_F + c′_P ‖W‖_F,

where the constants c_P and c′_P are determined by the structured restricted isometry constants δ_P, δ_2P, and δ_3P. To investigate the convergence for the case s < P, we write D = D|_s + (D − D|_s), where D|_s denotes the matrix that retains the s sub-matrices of {D_l}_{l=1}^{L} with the largest F-norms and sets the remaining sub-matrices to 0. Then the effective noise becomes

W̃ = Ψ(D − D|_s) + W.
For the case s < P, D is the P-sparse signal, while the s-sparse signal D|_s is the one actually estimated. By the structured RIP (SRIP), the acquired support set of the estimated s-sparse matrix is at least partially correct: with Ω̂_s denoting the estimated support of the s-sparse matrix and Ω the true support of D (∅ denoting the null set), we have Ω̂_s ∩ Ω ≠ ∅. Hence, the support estimated with sparsity level s serves as prior information for the first iteration with sparsity level s + 1, which reduces the number of iterations required for convergence at sparsity level s + 1; this is the key point of the proof of the theorem.
In the simulations, the DFT size is N = 4096, the system carrier frequency fc = 2 GHz, the length of the guard interval Ng = 64, and the system bandwidth fs = 10 MHz, which can combat a maximum delay spread of 6.4 µs. We assume a 4 × 16 planar antenna array (M = 64), and MG = 32 is considered to guarantee the spatially common sparsity of channels in each antenna group. For SNR = 10, 15, 20, 25, and 30 dB, the threshold p_th is set to 0.1, 0.08, 0.06, 0.05, and 0.04, respectively (Table 1).
From the simulations, it is clear that the ASSP algorithm outperforms the oracle ASSP algorithm for ηp > 19.04%, and its performance is even better than the performance bound obtained by the oracle LS algorithm with Np_avg > 2P at SNR = 10 dB. This is because the ASSP algorithm adaptively acquires the effective channel sparsity level, denoted by Peff, instead of using P, to obtain better channel estimation performance. Considering ηp = 17.09% at SNR = 10 dB as an example, we find that Peff = 5 with high probability for the proposed ASSP algorithm. Therefore, the average pilot overhead obtained for each transmit antenna, Np_avg = Np/MG = 10.9, is still larger than 2Peff = 10. From this analysis, we can conclude that, when Np is insufficient to estimate channels with sparsity level P, the proposed ASSP algorithm can be utilized to estimate sparse channels with Peff < P, where the path gains accounting for the majority of the channel energy are determined, while those with small energy are discarded as noise. Also, the MSE performance fluctuation of the ASSP algorithm at SNR = 10 dB occurs because Peff increases from 5 to 6 as ηp increases, which causes some strong noise to be picked up as channel paths and thus degrades the MSE performance (Table 2).
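The pilot overhead arithmetic in this example can be reproduced as follows, under two assumptions that are not stated explicitly in the text: that ηp = Np/N, and that the per-antenna average divides the total pilots by the M = 64 antennas, which reproduces the quoted value of 10.9:

```python
# Reproducing the overhead arithmetic quoted in the text.
N, M = 4096, 64          # DFT size and number of transmit antennas (from the text)
eta_p = 0.1709           # pilot overhead ratio quoted at SNR = 10 dB
P_eff = 5                # effective sparsity level found by ASSP

Np = round(eta_p * N)    # total pilot subcarriers, assuming eta_p = Np / N
Np_avg = Np / M          # average pilots per transmit antenna (assumption)

print(Np, round(Np_avg, 1))      # 700 10.9
assert Np_avg > 2 * P_eff        # matches the comparison with 2*Peff = 10
```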
The channel sparsity level estimated by the proposed ASSP algorithm against the SNR and the pilot overhead ratio is depicted in the simulations, where the vertical axis and the horizontal axis represent the used pilot overhead ratio and the adaptively estimated channel sparsity level, respectively, and the color scale represents the probability of the estimated channel sparsity level. We consider R = 1 and fp = 1 without exploiting the temporal channel correlation in the simulations. Comparisons are made between the MSE performance of the introduced pilot placement scheme and the conventional random pilot placement scheme, where the introduced ASSP algorithm and the oracle LS algorithm are exploited (Fig. 1).
We consider R = 1, fp = 1, and ηp = 19.53% in the simulations. It is clear that
both the schemes yield a very similar performance.

Fig. 1 Sparsity

The proposed uniformly spaced pilot placement scheme can be more easily implemented in practical systems due to the regular pilot placement. Hence, the uniformly spaced pilot placement scheme is used in LTE-Advanced systems to facilitate massive MIMO being compatible with current cellular networks.
The MSE of the proposed ASSP algorithm with (R = 4) and without (R = 1) exploiting the temporal correlation is compared for time-varying channels of massive MIMO systems. The SCS algorithm does not function perfectly when the number of pilots is small. The downlink bit error rate (BER) performance and the average achievable throughput per user, respectively, are evaluated in simulations where the BS is assumed to use zero-forcing (ZF) pre-coding based on the estimated downlink channels. The BS with M = 64 antennas simultaneously serves K = 8 users using 16-QAM, and the ZF pre-coding is based on the estimated channels under the same setup. It can be noted that the proposed channel estimation scheme performs better than its counterparts (Table 3).
Comparisons between the average achievable throughput per user of different pilot decontamination schemes are made. We consider a multi-cell massive MIMO system with L = 7 cells, M = 64, and K = 8 users sharing the same bandwidth, and analyze the average achievable throughput per user in the central target cell suffering from pilot contamination. Meanwhile, we consider R = 1 and fd = 7, the path loss factor is 3.8, the cell radius is 1 km, the distance D between the BS and its users ranges from 100 m to 1 km, the SNR (the power of the un-pre-coded signal from the BS is considered in the SNR) for a cell-edge user is 10 dB, and the mobile speed of users is 3 km/h. The BSs using zero-forcing (ZF) pre-coding are assumed to know the estimated downlink channels achieved by the proposed ASSP algorithm. For the FDM scheme, pilots of the L = 7 cells are orthogonal in the frequency domain (Fig. 2).
In the TDM scheme, pilots of the L = 7 cells are transmitted in L = 7 successive different time slots, and the channel estimation of users in the central target cell suffers from the pre-coded downlink data transmission of the other cells, for which two cases are considered. The "cell-edge" case indicates that, when users in the central target cell estimate the channels, the pre-coded downlink data transmission in the other cells guarantees SNR = 10 dB for their cell-edge users. The "ergodic" case indicates that, when users in the central target cell estimate the channels, the pre-coded downlink data transmission in the other cells guarantees SNR = 10 dB for their users with the ergodic distance D from 100 m to 1 km.
4 Conclusion
In this paper, we have introduced a new SCS-based spatial–temporal joint channel estimation scheme for FDD massive MIMO systems. To decrease the pilot overhead, the spatial–temporal common sparsity of wireless MIMO channels is exploited. With the non-orthogonal pilot scheme at the BS and the ASSP algorithm, users can easily estimate channels with reduced pilot overhead. According to the mobility of the user, the space–time adaptive pilot scheme further reduces the pilot overhead. Additionally, to achieve accurate channel estimation under the framework of compressive sensing theory, the non-orthogonal pilot design and the proposed ASSP algorithm are discussed. The simulation results show that the modified SCS-based spatial channel estimation scheme gives better results than the existing channel estimation schemes.
References
1. Zhang R, Zhao H, Zhang J (2018) Distributed compressed sensing aided sparse channel esti-
mation in FDD massive MIMO system. IEEE Access 6:18383–18397. https://doi.org/10.1109/
ACCESS.2018.2818281
2. Peng W, Li W, Wang W, Wei X, Jiang T (2019) Downlink channel prediction for time-varying
FDD massive MIMO systems. IEEE J Sel Top Sign Process 13(5):1090–1102. https://doi.org/
10.1109/JSTSP.2019.2931671
3. Liu K, Tao C, Liu L, Lu Y, Zhou T, Qiu J (2018) Analysis of downlink channel estimation based
on parametric model in massive MIMO systems. In: 2018 12th International Symposium on
Antennas, Propagation and EM Theory (ISAPE), Hangzhou, China, 2018, pp 1–4. https://doi.
org/10.1109/ISAPE.2018.8634083
4. Gu Y, Zhang YD (2019) Information-theoretic pilot design for downlink channel estimation in
FDD massive MIMO systems. IEEE Trans Sign Process 67(9):2334–2346. https://doi.org/10.
1109/TSP.2019.2904018
5. Huang W, Huang Y, Xu W, Yang L (2017) Beam-blocked channel estimation for FDD massive
MIMO with compressed feedback. IEEE Access 5:11791–11804. https://doi.org/10.1109/ACC
ESS.2017.2715984
6. Chen J, Zhang X, Zhang P (2020) DDL-based sparse channel representation and estimation for
downlink FDD massive MIMO systems. In: ICC 2020—2020 IEEE International Conference
on Communications (ICC), Dublin, Ireland, 2020, pp 1–6. https://doi.org/10.1109/ICC40277.
2020.9148996
High Spectrum and Efficiency Improved Structured … 279
7. Gao Z, Zhang C, Dai C, Han Q (2014) Spectrum-efficiency parametric channel estimation scheme
for massive MIMO systems. In: 2014 IEEE international symposium on broadband multimedia
systems and broadcasting, Beijing, China, 2014, pp 1–4. https://doi.org/10.1109/BMSB.2014.
6873562
8. Gao Z, Dai L, Dai W, Shim B, Wang Z (2016) Structured compressive sensing-based spatio-
temporal joint channel estimation for FDD massive MIMO. IEEE Trans Commun 64(2):601–
617. https://doi.org/10.1109/TCOMM.2015.2508809
A Survey on Image Steganography
Techniques Using Least Significant Bit
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 281
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_20
282 Y. Bhavani et al.
2 Related Work
In this process, as shown in Fig. 1, a cover image and a message in the form of a binary image are taken as input. The output of this process is the stego image.
• First, the cover image and message image should be read.
• The pixel values of the images should be converted into binary format.
• An XOR operation should be performed between the seventh and sixth bits of the binary format of the cover image.
• Once again, an XOR operation is performed between the result obtained by the above operation and the eighth bit of the binary format of the cover image.
• Now an XOR operation should be performed between each message image bit and the result obtained from the three MSBs, i.e. the eighth, seventh and sixth bits.
• The obtained result is saved in the message bits. By converting this result into uint8, the pixel values of the stego image are obtained.
In this process, as shown in Fig. 2, the input is the stego image and the output is the recovered message image.
• First, the stego image should be read.
• The pixel values of the image should be converted into binary format.
• An XOR operation should be performed between the seventh bit and sixth bit of the binary format of the stego image.
• Once again, an XOR operation is performed between the result obtained by the above operation and the eighth bit of the binary format of the stego image.
• Now an XOR operation should be performed between the LSB and the result obtained from the three MSBs, i.e. the eighth, seventh and sixth bits.
• The obtained result is saved in the LSB. By converting this result into uint8, the pixel values of the message image are obtained.
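The embedding and extraction steps above can be sketched in a few lines. The key observation is that the three MSBs are untouched by LSB embedding, so the XOR key b8 ⊕ b7 ⊕ b6 can be regenerated from the stego image itself. The bit numbering, array shapes, and numpy style below are illustrative assumptions:

```python
import numpy as np

def msb_key(img):
    """XOR key from the three MSBs of each pixel: bit8 ^ bit7 ^ bit6."""
    return ((img >> 7) ^ (img >> 6) ^ (img >> 5)) & 1

def embed(cover, msg_bits):
    """Hide one message bit per pixel in the LSB, XOR-ed with the MSB key."""
    return ((cover & 0xFE) | (msg_bits ^ msb_key(cover))).astype(np.uint8)

def extract(stego):
    """Recover message bits; the MSBs (and hence the key) survive embedding."""
    return (stego & 1) ^ msb_key(stego)

rng = np.random.default_rng(3)
cover = rng.integers(0, 256, size=(4, 4), dtype=np.uint8)
msg = rng.integers(0, 2, size=(4, 4), dtype=np.uint8)

stego = embed(cover, msg)
print(np.array_equal(extract(stego), msg))   # True: lossless recovery
```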
This technique [2] is very safe and simple, and it gives high PSNR and low MSE values, so the hidden information is undetectable. The process is completed quickly and easily using the XOR operation. Secrecy is strictly maintained, since the inserted bits cannot be detected directly using the XOR operator. Furthermore, the XOR operation is performed three times, in which three keys are used. The stego file keeps the same size because the key is embedded in the cover image, which eliminates the need for key distribution to the recipient and increases the speed of communication without changing the size of the file.
3 Critical Analysis
The mean squared error between the cover image A_f and the stego image S_f of size H × G is

MSE = (1/(H · G)) Σ_{h=1}^{H} Σ_{g=1}^{G} (A_f(h, g) − S_f(h, g))²   (2)
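Equation (2) and the corresponding PSNR can be computed directly; here `A_f` is taken to be the cover image and `S_f` the stego image, and the single-LSB-flip example is an assumption for illustration:

```python
import numpy as np

def mse(cover, stego):
    a, s = cover.astype(np.float64), stego.astype(np.float64)
    return float(np.mean((a - s) ** 2))        # Eq. (2): average squared error

def psnr(cover, stego):
    m = mse(cover, stego)
    return float("inf") if m == 0 else 10.0 * np.log10(255.0 ** 2 / m)

cover = np.full((8, 8), 100, dtype=np.uint8)
stego = cover.copy()
stego[0, 0] ^= 1                               # flip a single LSB

print(f"MSE  = {mse(cover, stego):.6f}")       # 1/64 of a squared level
print(f"PSNR = {psnr(cover, stego):.1f} dB")
```

A single LSB flip in a 64-pixel image yields an MSE of 1/64 and a PSNR above 66 dB, which is why LSB methods are considered imperceptible.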
4 Conclusion
In this paper, different image steganography techniques are analysed on the basis of the characteristics of the image. The different methods currently being used in image steganography are highly secure, as they do not allow unauthorized users to detect the presence of the message or to retrieve it. The combination of steganographic and cryptographic techniques results in an accurate process for maintaining the secrecy of information. The majority of these techniques use the LSB algorithm to maintain the confidentiality and quality of the image, since it is very advantageous: it is simple and provides imperceptibility and robustness for the data.
References
1. Ardy RD, Indriani OR, Sari CA, Setiadi DRIM, Rachmawanto EH (2017) Digital image signa-
ture using triple protection cryptosystem (RSA, Vigenere, and MD5). In: IEEE International
conference on smart cities, automation & intelligent computing systems (ICON-SONICS), pp
87–92
2. Astuti YP, Setiadi DRIM, Rachmawanto EH, Sari CA (2018) Simple and secure image
steganography using LSB and triple XOR operation on MSB. In: International conference
on information and communications technology (ICOIACT), pp 191–195
3. Bhavani Y, Sai Srikar P, Spoorthy Shivani P, Kavya Sri K, Anvitha K (2020) Image segmentation
based hybrid watermarking algorithm for copyright protection. In: 11th IEEE international
conference on computing, communication and networking technologies (ICCCNT)
4. Bhuiyan T, Sarower AH, Karim R, Hassan M (2019) An image steganography algorithm using
LSB replacement through XOR substitution. In: IEEE international conference on information
and communications technology (ICOIACT), pp 44–49
5. Dumitrescu S, Wu X, Wang Z (2003) Detection of LSB steganography via sample pair analysis.
IEEE Trans Sign Process 51(7):1995–2007
6. Fridrich J, Goljan M (2004) On estimation of secret message length in LSB steganography
in spatial domain. In: Delp EJ, Wong PW (eds) IS&T/SPIE electronic imaging: security,
steganography, and watermarking of multimedia contents VI. SPIE, San Jose, pp 23–34
7. Islam MR, Siddiqa A, Uddin MP, Mandal AK, Hossain MD (2014) An efficient filtering based
approach improving LSB image steganography using status bit along with AES cryptography.
In: IEEE international conference on informatics, electronics & vision (ICIEV), pp 1–6
8. Ker AD (2005) Steganalysis of LSB matching in gray scale images. IEEE Sign Process Lett
12(6):441–444
9. Yang H, Sun X, Sun G (2009) A high-capacity image data hiding scheme using adaptive LSB
substitution. J. Radio Eng 18:509–516
10. Joshi K, Dhankhar P, Yadav R (2015) A new image steganography method in spatial domain
using XOR. In: Annual IEEE India conference (INDICON), pp 1–6, New Delhi
11. Irawan C, Setiadi DRIMC, Sari A, Rachmawanto EH (2017) Hiding and securing message on
edge areas of image using LSB steganography and OTP encryption. In: International conference
on informatics and computational sciences (ICICoS), Semarang
12. Swain G (2016) Digital image steganography using variable length group of bits substitution.
Proc Comput Sci 85:31–38
13. Channalli S, Jadhav A (2009) Steganography an art of hiding data. J Int J Comput Sci Eng
(IJCSE) 1(3)
14. Dhaya R (2021) Analysis of adaptive image retrieval by transition Kalman filter approach based
on intensity parameter. J Innov Image Process (JIIP), pp 7–20
15. Manoharan JS (2016) Enhancing robustness of embedded medical images with a 4 level
Contourlet transform. Int J Sci Res Sci Eng Technol pp 149–154
16. Mathew N, Manoharan JS (2012) A hybrid transform for robustness enhancement of
watermarking in medical images. Int J Digital Image Process 4(18):989–993
17. Bhardwaj R, Sharma V (2016) Image steganography based on complemented message and
inverted bit LSB substitution. Proc Comput Sci 93:832–838
Efficient Multi-platform Honeypot
for Capturing Real-time Cyber Attacks
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 291
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_21
292 S. Sivamohan et al.
1 Introduction
In this study, a honeynet system was used to learn more about attackers, their motivations, and their techniques. These systems let attackers engage with them while monitoring attacks, by posing as actual machines with sensitive information [6]. We set up an effective active protection architecture by integrating Docker container-based technologies with an enhanced honeynet-based IDS. The T-Pot platform is used to host a honeynet of different honeypots in a real-time AWS cloud environment [7].
A network attack benchmark should include a wide range of cyber-attacks created by various tools and methodologies, in addition to representative normal data. The outcomes of building and assessing detection models using representative data are realistic; this can bridge the gap between machine learning-based identification algorithms and their actual deployment in real-world cloud networks. Integrating honeypot-collected information with a firewall and the IDS can reduce the occurrence of false positives and improve security. There are two types of honeypots: research and production honeypots. A research honeypot collects information relating to blackhats [8]; this is done by giving full access to the system without any filtration. Production honeypots act as a filter between the state and the blackhat and are used to prevent malicious attacks [9]. Honeypots are characterized based on their design, deployment, and deception technology. Figure 2 illustrates the various types of honeypots.
The complexity of attacks and changes in attack patterns and techniques are all factors that should be considered in the cloud environment. The inability to resolve these security breaches has always had serious impacts and has made the environment more susceptible [10]. From the attacker's perspective, cyber-related events have increased in both number and complexity [11, 32]. To address these concerns, the study presented in this paper suggests the use of honeypots to analyze anomalies and patterns based on honeypot data. This study intends to create a prototype as a proof of concept to find appropriate attack detection approaches via honeypots, and the objectives of this work will be leveraged to further intrusion attack detection analytical tools and techniques. As a result, the main objectives of this work are:
• To detect attacks in a cloud environment by developing an intrusion detection system based on honeypots.
• To develop a prototype as a proof of concept.
• To learn from the attacker's actions in a virtual environment.
• To evaluate and interpret cyber-attacks.
The main goal of this paper is to deploy multiple honeypots in a cloud environment that capture attacker patterns, and then to evaluate the collected data for intrusion detection functionality. This paper presents attack detection approaches based on the use of honeypots in a cloud environment to create an intrusion detection system. The key contributions are the following:
• An improved honeynet-based IDS used to identify attacks and anomalies in a cloud environment; it provides a multi-honeypot platform for analyzing and understanding the behavior of attackers.
• Abnormalities and intrusion attempts identified by using anomaly detection.
• Anomalies in attacks in a cloud environment analyzed and recognized by learning from attackers' behaviors.
• The development of a honeynet data collection system, which is the major contribution of this study.
• A rapidly deployable, pre-configured network of honeypots able to detect active threats in the public cloud, which is a unique component of this system.
The rest of the paper is structured as follows: an overview of relevant honeypot work for intrusion detection systems is given in Sect. 2. Section 3 describes the proposed framework for detecting intrusion attacks in the cloud and offers a methodology for data collection, and Sect. 4 presents the findings of the data analysis and the experiments. Finally, Sect. 5 concludes the paper.
2 Related Works
This section presents the relevant honeypot work for intrusion detection systems.
Honeypots are a widely utilized type of network traffic and malware investigation tool. Lance Spitzner [12], the Honeynet Project's creator, defines a honeypot as "a security resource" whose usefulness is contingent on being targeted or compromised.
Efficient Multi-platform Honeypot for Capturing … 295
Majithia et al. [13] used the model of running honeypots of three types on a Docker server, with a logging management mechanism built on top of the ELK framework, and discussed issues and security concerns associated with each honeypot. The honeypots used were HoneySMB7; Honey WEB-SQLi, an HTTP protocol honeypot that includes an SQL injection vulnerability; and HoneyDB, a honeypot built for MySQL database vulnerabilities. The work displayed an analysis of the attacks using unique IPs and their distribution among the honeypots.
Adufu et al. [14] compared running the molecular modeling simulation software AutoDock on container-based and hypervisor-based virtualization systems, and concluded that container-based systems manage memory resources efficiently even when the memory allocated to instances exceeds the physical resources, in addition to reducing execution times for multiple containers running in parallel.
Seamus et al. [15] built a research honeypot aimed at attackers of Zigbee devices, which are typically used in MANETs. Since IoT devices are becoming more extensively used, their risks are more generally recognized, which motivated the development of this honeypot; a risk evaluation of these devices is therefore critical, and the authors used the honeypot in their implementation for that purpose.
To catch the hacker’s unethical behavior, Jiang et al. [16] used an open-source
honeynet setup. During the process of the study, nearly 200,000 hits were discovered.
This test explored ways for intruders to be notified of their goals, such as a web server,
FTP server, or database server.
Sokol et al. [24] created a honeypot and honeynet distribution based on OS-level virtualization, a method that was largely unexplored in research at the time. The research's most significant contribution is the automation of honeypots, with a remarkable technique for generating and evaluating their honeynet solution. According to the study, OS-level virtualization has very little performance or maintenance overhead compared with virtualization technologies or bare-metal systems. They also point out that utilizing containers to disguise honeypots adds an extra element of obfuscation: even though they are confined environments sharing the kernel of a legitimate operating system, they are more likely to appear as a valid system when fingerprinted [17, 30].
Alexander et al. [25] investigated the usage of Linux containers as an alternative to virtualization, to circumvent a variety of virtual-environment and monitoring-tool detection methods. The goal was to see whether container environments could host honeypots in the long run without being identified by malware [18].
Chin et al. [26] proposed HoneyLab, a public infrastructure for hosting and monitoring honeypots with a distributed computing resource structure. Its development was prompted by the discovery that combining data collected from honeypots in diverse deployment scenarios allows attack data to be correlated, enabling the detection of expanding outbreaks of related attacks. The system collects data from a huge number of honeypots throughout the globe in order to identify attack occurrences. Their approach, on the other hand, is based on two low-interaction honeypots, which restricts the amount of data acquired from an attack [20]. An improved system would gather more data on attack occurrences in order to gain a better knowledge of attacker motivations and techniques [28].
Table 1 summarizes the comparative analysis of five different honeypots in tabular form.
3 Methodology
This work proposes a new honeynet-based intelligent system for detecting cyber-attacks in the cloud environment. It demonstrates the system configuration of container-based multiple honeypots that can investigate and discover attacks on a cloud system. This section covers the complete implementation of all honeypots created and deployed throughout the investigation, as well as a centralized logging and monitoring server based on the Elasticsearch, Logstash, and Kibana (ELK) stack; this tracking system is also capable of monitoring live traffic. Elasticsearch, a scalable and distributed search engine [19], was chosen because it can provide quick search results by searching an index rather than searching the text directly. Kibana is a freely available client-side analytics and search dashboard that visualizes data for easier understanding; it is used to display logs from honeypots that have been attacked [21].
The information was acquired over a period of a month, during which time all
of the honeypots were placed in various locations across the globe. The honeypots’
capabilities can considerably assist in attaining the recommended approach to reducing
threats to critical service infrastructures. Containers have been recognized as
providing many of these capabilities. The simplicity with which identically configured
environments may be deployed is one of the major advantages of container technolo-
gies. Container technologies, however, cannot provide the same simplicity
of deployment for a fully networked system [18]. This motivated the development
of a deployment mechanism for the whole system, allowing it to be rebuilt in
a limited span of time. Figure 3 shows the detailed system overview of the model
framework.
Efficient Multi-platform Honeypot for Capturing … 297
Several existing techniques, in both research and development, have made significant
contributions to such a solution; however, no single method provides a viable,
workable way to deploy active network defense as a single networked deployable
unit [22].
The preliminary development approach used a network design that would help
the researchers achieve their main goal: a flexible honeynet in a cloud
scenario that can handle any illegal entry or untrusted connection and open a
separate Docker container for each attacker's remote IP address. Figure 4 describes
the dataflow of the proposed solution. The proposed system was designed to be
scalable and adaptable, allowing new features to be added rapidly and the platform
to adapt to the unique requirements of a given infrastructure. It is made up of three
primary components that were created independently using the three-tier application
approach as follows:
• DCM: It is a data collection module that gathers essential information from a
variety of physical and virtual data sources.
• DAM: It is a data analysis module that processes the stored raw data to provide
the user with a set of advanced analyses, producing physical, cyber, or mixed
intelligence information (for example, cyber threat evaluation, facility classification
by criticality, and pattern detection from social interactions).
• DVM: It is a data visualization module that provides true awareness of the physical
and cyber environments through a combined and geospatial representation of the
security information.
This experimental design tested two types of containers (SSH and HTTP). The
Suricata container was added as an example of the different types of honeypots
utilized in the model. In order to overcome the limits and difficult setup of the
honeypot network, virtualized systems on cloud infrastructures were used; the
AWS cloud provider was chosen for this purpose. The architecture
of the honeypot system is illustrated in Fig. 5. The route of the attackers in the attack
scenario is depicted in Fig. 6. Within the set-up, the following attack scenario was
carried out: using SSH or Telnet, an attacker was able to obtain access, and any root
credentials were accepted when the attacker was prompted to log in to the SSH
session. The attacker would then try to find further weaknesses on the machine and,
when satisfied, try to download and run malicious programs on it. In this
approach, a deliberate vulnerability is assumed, the goal of which is to fool the
attacker into believing the system has a flaw, essentially studying the attacker's path
and attack tactics.
The experimental setup comprises five honeypots and a supplementary system
for collecting the generated logs [23]. This experimental design tested two types
of scenarios (SSH and HTTP).
Testing Scenarios
In this experiment, we applied three test case scenarios to verify the functionality
of the model.
SSH Scenario
In the SSH scenario, an SSH connection was created from a simulated attacker, and
the following was observed:
• An instance of the Kippo container was created, and the traffic was forwarded to
it as shown in Fig. 7.
• The attacker was able to navigate through the Kippo interactive terminal with fake
file-system observed in Fig. 8.
• A fingerprinting attack easily detected a well-known fingerprint indicator for
Kippo honeypot using the command ‘vi’.
• Kippo honeypot container logs were saved and forwarded to syslog for recording
all the interactive communication with the attacking session.
HTTP Scenario
Creating an http request to the reverse proxy address results in the following:
• An http honeypot is created using the Glastopf Docker image with the specified
naming convention shown in Fig. 9.
• The attacker browses a fake web server page where he can attempt different
attacks while trying to authenticate, as shown in Fig. 10.
• Container logs are collected and sent to syslog, highlighting the source IP of the
original attacker.
The honeynet behaved as expected, creating a container per attacking session
(unique IP), with a naming convention combining the image name and the IP of
the originating source of the attack to make it exclusive to that session. The
attacker was directed to a fake website to apply different attacks, which were
recorded and contained inside the dedicated Glastopf container [24].
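The per-session naming convention (image name joined with the attacker's source IP) could be sketched as follows. The separator and the dot-to-underscore substitution are assumptions for readability, not taken from the paper:

```python
# Sketch of the honeynet's per-session container naming convention:
# one container per unique attacking IP, named <image>_<ip>.
# Separator and dot-to-underscore substitution are assumptions.

def container_name(image: str, attacker_ip: str) -> str:
    return f"{image}_{attacker_ip.replace('.', '_')}"

def assign_container(sessions: dict, image: str, attacker_ip: str) -> str:
    """Reuse the session's container if the IP was seen before,
    otherwise record a new one (where the real system would run
    `docker run --name <name> <image>`)."""
    name = container_name(image, attacker_ip)
    sessions.setdefault(attacker_ip, name)
    return sessions[attacker_ip]

sessions = {}
n1 = assign_container(sessions, "glastopf", "203.0.113.7")
n2 = assign_container(sessions, "glastopf", "203.0.113.7")  # same session
```

The mapping guarantees that repeated connections from one source IP land in the same dedicated container, keeping each attack session isolated.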
The experimental study was performed over a period of six months, during which
more than 5,195,499 log entries from attackers were acquired for further analysis.
The real-time data was gathered from August 19, 2020, to February 19, 2021, and
the findings were compiled from the dataset using the Kibana interface, which
allows for data aggregation across multiple fields of the whole database. The main
task is to find answers
to particular investigation queries, such as the source, target, and attack technique.
The observations confirm the legitimate implementation of the honeynet system. A
honeynet with a flexible and dynamic transition between honeypots can reveal some
of the future and potential attacks on a cloud environment by allowing attackers to
strike a fraudulent system with the same potential vulnerabilities [29]. The intrusion
data was investigated by examining the counts, the ratios, and the statistical
Chi-Squared (χ²) test with P-value < 0.0001 for each honeypot. The Chi-Squared
statistical test was used in this study to determine the statistical significance of the
results; it is a well-known statistical measure that analyzes the association between
two variables. The Chi-Squared (CS) statistic is computed as [26]:

χ² = Σ_{i,j} (O_{ij} − E_{ij})² / E_{ij}    (1)

where O_{ij} and E_{ij} denote the observed and expected counts, respectively.
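As a sanity check, Eq. (1) can be computed directly from a contingency table of observed counts, with expected counts derived from the row and column totals under the independence hypothesis. The small table below is illustrative, not the paper's data:

```python
# Pearson chi-squared statistic for a contingency table of observed
# counts, as in Eq. (1): chi2 = sum_ij (O_ij - E_ij)^2 / E_ij,
# with E_ij estimated from the row and column marginals.
# The example table is made up for illustration.

def chi_squared(observed):
    n = sum(sum(row) for row in observed)
    row_tot = [sum(row) for row in observed]
    col_tot = [sum(col) for col in zip(*observed)]
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_tot[i] * col_tot[j] / n  # expected count under independence
            chi2 += (o - e) ** 2 / e
    return chi2

# e.g. hit counts broken down by (country, honeypot)
stat = chi_squared([[10, 20], [30, 40]])
```

A large statistic relative to the chi-squared distribution's critical value (at the paper's 0.0001 level) rejects independence, which is how the country, port, username, and password results below are judged significant.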
A total of 5,195,499 attacker hits were discovered in the period from August 19,
2020, to February 19, 2021.
Table 2 displays the total counts and percentages of honeypot attacks from the
top 10 regions. The statistical χ² test value of 75,010.689 for examining the
independence of the measures of those attacks from these countries is significant
at the 0.0001 level. The results showed an excessive prevalence of attacks from
the Netherlands. Figure 5 in Chap. 10 depicts the graph of attacks on honeypots
from the leading ten countries. Overall, the Netherlands produced the most attacks
(23.68%) and Canada generated the least traffic (1.96%) on the honeypots;
accordingly, the Netherlands had the highest prevalence of attacks. A total of
177,615 hits originated from the Netherlands, while the fewest, 14,681 hits, came
from Canada.
Table 3 summarizes the overall counts and percentages of threats across all honey-
pots based on the top ten most visible source IP addresses. For analyzing the indepen-
dence of the measurements of those attacks from these IP addresses, the statistical χ²
test value of 42,143.1345 is significant at the 0.0001 level. The result shows a large
number of cyber-attacks from the IP address 37.49.231.70, while the IP address
119.32.3.66 was the source of the fewest attempts.
Figure 12(6) depicts the graphs of honeypot cyber-attacks from the top 10 source
IP addresses. The address 37.49.231.70 generated the most attacks (92.8%) and the
IP address 119.32.3.66 produced the least (0.02%). The results clearly showed that
the IP address 37.49.231.70 had an immense prevalence of attacks on honeypots
from different countries, such as the US, Germany, Russia, the Republic of Korea,
France, Australia, the Netherlands, and Canada. For investigating the independence
of the consequences of the attack from these IP addresses, the statistical χ² test
value of 42,143.1789 is significant at the 0.0001 level. Attacks from the IP address
37.49.231.70 have had a huge impact, as seen in Fig. 10.
Table 4 summarizes the total counts and ratios of attacks on all the honeypots
from the top ten source ports used by intruders from different countries. The
statistical χ² test value of 11,322.923 for studying the independence of the measures
of those attacks from the source ports is significant at the 0.0001 level. The results
showed an excessive prevalence of attacks using port 5900.
Figure 12(1) shows the trend of honeypot attacks from the top ten source ports.
Clearly, port 5900 was used for the most attacks (22.38%) and port 7070 for the
fewest (2.78%) from the United States, where a total of 3177 attempts were made.
Attacks on honeypots from China were also quite common on port 5900.
Figure 12(2–4) visualizes the usernames and passwords used against the Cowrie
honeypot from different countries; noticeably, root was the most attempted
username on Cowrie, with admin the second most attempted. Accordingly, root had
the highest occurrence in attacks on the honeypots. Table 5 gives details of the
overall counts and attack ratios across all honeypots for the top ten usernames.
Among the most frequently used usernames, such as root, admin, enable, user, and
shell, root was the most often used.
For analyzing the similarity of the totals of usernames used in the attacks, the
statistical χ² test result of 73,009.956 is significant at the 0.0001 level. The results
showed an excessive prevalence of attacks using the username root. Figure 12(2)
depicts the graph of attacks on the honeypots from the top ten usernames. Clearly,
root was used the most (24.68%) and superuser the least (1.96%); accordingly,
root had the highest incidence among attacks. Table 6 displays the total counts and
ratios of attacks on all the honeypots from the leading ten passwords. The statistical
χ² test value of 75,010.896 for examining the independence of the extents of
passwords used in the attacks is significant at the 0.0001 level. The findings reveal
an excessive occurrence of attacks with the password sh.
Figure 12(5) visualizes the attacks on the honeypots from the top ten passwords.
Evidently, sh was used the most (23.68%) and user the least (1.96%); subsequently,
sh had the highest prevalence among attacks on the honeypots. A total of 177,615
hits used the password sh, and the fewest, 14,681 hits, used the password user.
Figure 12(6) visualizes the attacks on the honeypots from the top ten IP
addresses. Honeypot source IP reputations are segregated into known attacker, bad
reputation, anonymizer, malware, form spammer, bot, crawler, mass scanner, bitcoin
node, and mining node. Hackers use various OS distributions such as Windows 7,
Linux 3.11, and Linux 3.x. Figure 11 depicts the per-day attack analysis on different
honeypots.
5 Conclusion
References
1. Grance T, Mell P (2009) The NIST definition of cloud computing. National Institute of
Standards & Technology (NIST). http://www.nist.gov/itl/cloud/upload/cloud-def-v15.pdf
2. Roschke S, Cheng F, Meinel C (2009) Intrusion detection in the cloud. Dependable, autonomic
and secure computing. In: IEEE international symposium on cloud computing, pp 729–734
3. Hoque MS, Bikas MA (2012) An implementation of intrusion detection system using genetic
algorithm. Int J Netw Sec Appl (IJNSA)
4. DTAG Community Honeypot Project (2016) T-Pot 16.10—multi-honeypot platform redefined.
http://dtag-dev-sec.github.io/mediator/feature/2016/10/31/t-pot-16.10.html. Accessed 2 June
2018
5. Amazon Web Services, Inc. (2018) What is AWS?—Amazon web services. https://aws.amazon.com/what-is-aws/. Accessed on 5 Apr 2018
6. Mohallel AA, Bass JM, Dehghantanha A (2016) Experimenting with Docker: Linux container and
base OS attack surfaces. In: 2016 international conference on information society (i-Society),
pp 17–21
7. Docker Inc. (2018) Docker security—docker documentation. https://docs.docker.com/engine/
security/security/. Accessed on 21 Apr 2018
8. Docker Inc. (2018) Docker hub. https://hub.docker.com/. Accessed on 06 May 2018
9. Elasticsearch BV (2018) Heap: sizing and swapping—elasticsearch: the definitive guide
[2.x]—elastic. https://www.elastic.co/guide/en/elasticsearch/guide/current/heap-sizing.html.
Accessed on 13 Apr 2018
10. Anicas M (2017) How to install elasticsearch, Logstash, and Kibana (ELK Stack) on
Ubuntu 14.04—DigitalOcean. https://www.digitalocean.com/community/tutorials/how-to-install-elasticsearch-logstash-and-kibana-elk-stack-on-ubuntu-14-04. Accessed on 15 Apr 2018
11. Amazon Web Services, Inc. (2018) AWS IP address ranges—Amazon web services. https://docs.aws.amazon.com/general/latest/gr/aws-ip-ranges.html. Accessed on 16 Apr 2018
12. Spitzner L (2003) Honeypots: catching the insider threat. In: Proceedings of the 19th annual
computer security applications conference, ACSAC ’03. IEEE Computer Society, Washington,
DC, p 170
13. Majithia N (2017) Honey-system: design, implementation & attack analysis. PhD thesis, Indian
Institute of Technology, Kanpur
14. Adufu T, Choi J, Kim Y (2015) Is container-based technology a winner for high performance
scientific applications. In: Network operations and management symposium (APNOMS).
IEEE, pp 507–510
15. Dowling S, Schukat M, Melvin H (2017) A ZigBee honeypot to assess IoT cyberattack behaviour.
In: 28th Irish signals and systems conference (ISSC), pp 1–6
16. Jiang X, Xu D, Wang Y-M (2006) Collapsar: a vm-based honeyfarm and reverse honeyfarm
architecture for network attack capture and detention. J Parallel Distrib Comput 66(9):1165–
1180
17. Han W, Zhao Z, Doupé A, Ahn G-J (2016) Honeymix: towards sdn-based intelligent honeynet.
In: Proceedings of the 2016 ACM international workshop on security in software defined
networks & network function virtualization. ACM, pp 1–6
18. Kaur T, Malhotra V, Singh DD (2014) Comparison of network security tools—firewall,
intrusion detection system and honeypot, p 202
19. Krawetz N (2004) Anti-honeypot technology, IEEE Security & Privacy, pp 76–79
20. Sahu N, Richhariya V (2012) Honeypot: a survey. Int J Comput Sci Technol
21. Vasilomanolakis E, Karuppayah S, Kikiras P, Mühlhäuser M (2015) A honeypot-driven cyber
incident monitor: lessons learned and steps ahead. In: Proceedings of the 8th international
conference on security of information and networks, SIN ’15. ACM, New York, NY, pp 158–164
22. Leung A, Spyker A, Bozarth T (2018) Titus: introducing containers to the netflix cloud.
Commun ACM 61:38–45
23. Combe T, Martin A, Pietro RD (2016) To docker or not to docker: a security perspective. IEEE
Cloud Comput 3:54–62
24. Pisarčík P, Sokol P (2014) Framework for distributed virtual honeynets. In: Proceedings of the
7th international conference on security of information and networks, SIN ’14. ACM, New
York, NY, pp 324:324–324:329
25. Kedrowitsch A, Yao DD, Wang G, Cameron K (2017) A first look: using linux containers for
deceptive honeypots. In: Proceedings of the 2017 workshop on automated decision making for
active cyber defense, SafeConfig@CCS 2017. Dallas, TX, USA, October 30–November 03,
2017, pp 15–22
26. Chin WY, Markatos EP, Antonatos S, Ioannidis S (2009) HoneyLab: large-scale honeypot
deployment and resource sharing. In: 2009 third international conference on network and
system security, pp 381–388
27. Kyung S, Han W, Tiwari N, Dixit VH, Srinivas L, Zhao Z, Doupé A, Ahn G-J (2017)
Honeyproxy: design and implementation of next-generation honeynet via SDN. In: IEEE
conference, on communications and network security (CNS)
28. Sokol P, Míšek J, Husák M (2017) Honeypots and honeynets: issues of privacy. EURASIP J
Inf Sec (1):4
29. Krishnaveni S, Prabakaran S, Sivamohan S (2018) A survey on honeypot and honeynet systems
for intrusion detection in cloud environment. J Comput Theoret Nanosci 10(15):2956–2960
30. Sivaganesan D (2021) A data driven trust mechanism based on blockchain in IoT sensor
networks for detection and mitigation of attacks. J Trends Comput Sci Smart Technol (TCSST)
3(01):59–69
31. Samuel MJ (2021) A novel user layer cloud security model based on chaotic Arnold
transformation using fingerprint biometric traits. J Innov Image Process (JIIP) 3(01):36–51
32. Shakya S, Pulchowk LN, Smys S (2020) Anomalies detection in fog computing architectures
using deep learning. J Trends Comput Sci Smart Technol 1:46–55
A Gender Recognition System
from Human Face Images Using VGG16
with SVM
Abstract Human beings have many distinct attributes, and facial features are a
significant part of them. Facial features help in distinguishing people. An auto-
matic gender recognition (AGR) system recognizes a person’s gender based on
these distinct features, using highly advanced cognitive skills built through suffi-
cient training. This paper proposes a system that combines a convolutional neural
network (CNN) and a support vector machine (SVM) to classify human gender. It
aims at achieving efficiency on a larger dataset that includes face images of all
human life stages. Initially, human face images are trained using a pre-trained CNN
model, VGG16. Then, the extracted features are fed into the SVM classifier, which
identifies the gender class and labels the input image as male or female. The system
performance is evaluated on a larger dataset, on which the proposed method
achieves good classification results.
1 Introduction
The recent technological advancement and increased popularity in areas like artificial
intelligence, machine learning, data analysis, etc., have opened a wide range of
opportunities for research. Image processing is also one of the popular research
areas. It is used for data manipulation and the study of image datasets, which helps
in image analysis and experimentation.
Face recognition technology has nowadays gained a wider base, and gender
identification from facial images plays a big role in many of these areas. In
previous years, many techniques have been developed for classifying gender from
images. Gender identification, or classification, is a binary classification technique
that assigns a face to the male or female class. This identity performs
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 309
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_22
310 S. Mandara and N. Manohar
2 Related Work
Prior to presenting the proposed technique, we review related work on gender clas-
sification and give a brief overview of the existing work in our area of research.
Extracting gender-related features from facial images has recently received a lot
of attention, and several techniques have been proposed for the task [1]. Although
some of this work addresses feature extraction and some addresses the gender
assessment itself [2], the overview below incorporates techniques intended for
either undertaking.
Early gender identification methodologies relied on computing ratios between
various measurements of facial features. Once the facial features (such as the eyes,
nose, mouth, and chin) are localized and their sizes and spacing evaluated, the
ratios between them can be used to classify faces of different genders according to
hand-crafted rules. Later strategies [3] use an equivalent approach to model gender
within each gender class. As these techniques require localization of actual facial
features, a difficult problem by itself, they are unsuitable for in-the-wild footage.
Alternative systems model facial appearance more directly, for example via
manifold-style representations [4] or support vector machines [5]. A disadvantage
of those systems is that they require input images to be near-frontal; they therefore
present preliminary results only on constrained datasets of near-frontal pictures
(for example, MORPH [6]). As a result, these methods again struggle with
unconstrained images.
A Gender Recognition System from Human Face … 311
on contains more test images than the images provided by LFW and uses a more
general structure suggested for all information to inform performance.
3 Proposed Work
The methodology proposed in this work aims to improve and enhance the
system's performance and efficiency on a larger dataset, outperforming earlier
works. Previously, machine learning methods used traditional hand-crafted features
such as local binary patterns, color histograms, and SIFT [16]. However, the
literature shows that CNN performance is better than these earlier approaches.
Figure 1 shows the architecture diagram of our proposed system, which includes
two stages: training and testing. The collected images are loaded into the pre-trained
VGG16 network model. A CNN automatically learns image attributes in a
hierarchical structure. The network comprises rich feature representations learned
from a wide set of images: the lower layers detect corners, edges, and attributes
such as color and shape, while the upper layers represent objects in the image [17].
Therefore, CNNs are well suited for image classification tasks. In traditional
networks, training the entire system from scratch suffers from insufficient data;
therefore, we use a pre-trained CNN for training the system. Pre-trained networks
have been trained on more than a million images and can classify images into 1000
object categories. A few of the most popular networks are VGG16, AlexNet,
VGG19, GoogLeNet, etc. A larger dataset of face images is used for training with
the VGG16 model. From the initial input layer to the final max pooling layer
(marked as 7 × 7 × 512), the feature extraction part is retained, and the remaining
layers, i.e., the last fully connected layers, are removed.
We use an SVM in this model for classification. As SVMs provide strong results
for two-class classification problems, merging the SVM with the CNN can yield
better results and works well on huge datasets as well. The feature matrix extracted
from the CNN is then fed into this classifier for the prediction of the gender class
labels (male or female), as shown in Fig. 2.
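Assuming the 4096-dimensional CNN activations have already been extracted, the hand-off to a linear SVM can be sketched with scikit-learn. Here the features are randomly simulated rather than taken from a real VGG16 network, so the example is illustrative only:

```python
# Sketch of the CNN-features -> linear SVM hand-off described above.
# Real VGG16 fc activations are replaced by simulated 4096-dim vectors
# (feature 0 carries the class signal); this is illustrative only.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n, dim = 200, 4096
labels = rng.integers(0, 2, size=n)              # 0 = female, 1 = male
feats = rng.normal(size=(n, dim)) * 0.1
feats[:, 0] += np.where(labels == 1, 1.0, -1.0)  # separable signal

clf = LinearSVC(C=1.0, random_state=0)  # linear kernel, as in the paper
clf.fit(feats[:150], labels[:150])      # "training phase"
pred = clf.predict(feats[150:])         # "testing phase"
accuracy = (pred == labels[150:]).mean()
```

In the real pipeline, `feats` would come from the retained VGG16 layers (up to the 7 × 7 × 512 max pooling stage) rather than a random generator.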
4 Experimental Details
This section briefly describes the dataset created and collected for the experiments
to test and evaluate our proposed methods.
4.1 Dataset
The dataset for our experimentation comprises 24,000 face images in two
classes. Since convolutional neural networks only work effectively when the dataset
is large, we created our own dataset for the experiments: some images were
captured, some were collected from individuals, and other images were downloaded
from online sources such as celebrity images, group photos of people, and publicly
available image sets. There are 12,000 images in each class, i.e., male and female,
covering human faces of all age groups. For analyzing the effectiveness of the
projected methodology, we considered images of children's faces and old-age
human faces, where gender prediction is difficult because the facial features vary
and analysis based on pre-defined learning becomes challenging. Some problems,
such as multi-faceted images with different lighting, occlusion, faces captured in
different positions, and different views, are common. Figure 4 shows some example
images randomly selected from the dataset (Table 1).
This implementation runs on the above-mentioned dataset, with training and testing
phases. For these steps, the input image is resized to 227 × 227. The implementation
is done on a 1 GB Radeon HD 6470M GPU. In the training phase, VGG16
pre-trained weights are used for training on the images, and 4096 features are
extracted. In the testing phase, the same set of features is extracted from the
unlabeled face image and classified by the SVM, which utilizes a linear kernel.
5 Results
Two significant conclusions can be drawn from our results. First, CNNs are useful
for obtaining strong gender classification results, even as the size of the image set
grows. Second, the effectiveness of our model shows that, using prepared data, more
complex structures can improve results and accuracy. Here, the SVM is used as the
classifier for the human face recognition problem. Figures 5, 6, and 7 show some
image samples used for gender classification.
These show that a significant part of the misclassifications made by our system
results from difficult viewing conditions in the reference images of the dataset. Most
conspicuous are mistakes caused by dark or low-resolution images and occlusions
(particularly from heavy cosmetics). Gender misclassifications regularly occur with
pictures of newborns or small children, where clear gender attributes are not yet
evident. Our system has tried to overcome this issue by using varied images that
cover all age groups.
The proposed model is evaluated using standard verification criteria such as preci-
sion, recall, and F-measure. These scores are calculated from the confusion matrix
obtained when classifying human gender images. To evaluate the performance of the
proposed system, we conducted experiments with our dataset and report the results
of training and testing on different sets of samples.
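The verification criteria mentioned above follow the standard confusion-matrix definitions; a minimal sketch, with made-up counts rather than the paper's results:

```python
# Standard evaluation metrics computed from a binary confusion matrix:
# precision, recall, and F-measure. The counts are made up for
# illustration; they are not the paper's results.

def scores(tp, fp, fn):
    precision = tp / (tp + fp)            # correct positives / predicted positives
    recall = tp / (tp + fn)               # correct positives / actual positives
    f_measure = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f_measure

p, r, f1 = scores(tp=90, fp=10, fn=20)
```

With tp=90, fp=10, fn=20 this gives a precision of 0.9, a recall of about 0.818, and an F-measure of about 0.857.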
In the first trial, out of 24,000 samples, we considered 7200 images (30% of 12,000
male and 30% of 12,000 female class samples) for training and 16,800 samples for
testing to evaluate the performance.
In the second trial, out of 24,000 samples, we considered 12,000 images (50%
of 12,000 male and 50% of 12,000 female class samples) for training and 12,000
samples for testing to evaluate the performance.
In the final trial, out of 24,000 samples, we considered 16,800 images (70% of
12,000 male and 70% of 12,000 female class samples) for training and 7200 samples
for testing to evaluate the performance.
The accuracy and performance of the system for each trial are shown in
Table 2.
6 Conclusion
In this work, we combined the deep learning CNN model VGG16 with an SVM
classifier to classify human gender. Initially, the images are fed into the pre-trained
network, VGG16, for feature extraction; from the extracted features, the SVM
classifies the gender. Although several previous strategies have addressed the issue
of gender identification, much of that work focused on constrained images. For
experimentation, we created and gathered images from different sources, yielding a
dataset of 24,000 face images that includes samples of all age groups in two classes.
By analyzing gender classification using a deep learning approach with an SVM, we
obtained a good accuracy of 92.34% on the larger dataset. The system classifies and
performs well even when varied and different face images are used.
References
1. Fu Y, Guo G, Huang TS (2010) Age synthesis and estimation via faces: a survey.
Trans Pattern Anal Mach Intell 32(11):1955–1976
2. Fu Y, Huang TS (2008) Human age estimation with regression on discriminative
aging manifold. Trans Multimed 10(4):578–584
3. Gao F, Ai H (2009) Face age, gender classification on consumer images with gabor feature and
fuzzy LDA method. In: Advances in biometrics. Springer, pp 132–141
4. Eidinger E, Enbar R, Hassner T (2014) Age and gender estimation of unfiltered faces. Trans
Inform Forensics Security
5. Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
6. Gallagher AC, Chen T (2009) Understanding images of groups of people. In: Proceedings
conference on computer vision pattern recognition. IEEE, pp 256–263
7. Geng X, Zhou ZH, Smith-Miles K (2007) Automatic age estimation based on facial
aging patterns. Trans Pattern Anal Mach Intell 29(12):2234–2240
8. Bar-Hillel A, Hertz T, Shental N, Weinshall D (2003) Learning distance functions using equivalence
relations. Int Conf Mach Learn 3
9. Chao WL, Liu JZ, Ding JJ (2013) Facial age estimation based on label-sensitive
learning and age-oriented regression. Pattern Recogn 46(3):628–641
10. Chen J, Shan S, He C, Zhao G, Pietikainen M, Chen X, Gao W (2010) Wld: a robust local
image descriptor. Trans Pattern Anal Mach Intell 32(9):1705–1720
11. Baluja S, Rowley HA (2007) Boosting gender identification performance. Int J Comput Vision
71(1):111–119
12. Ahonen T, Hadid A, Pietikainen M (2006) Face description with local binary patterns:
application to face recognition. Trans Pattern Anal Mach Intell 28(12):2037–2041
13. Choi SE, Lee YJ, Lee SJ, Park KR, Kim J (2011) Age estimation using a hierarchical
classifier based on global and local facial features. Pattern Recogn 44(6):1262–1281
14. Chatfield K, Simonyan K, Vedaldi A, Zisserman A (2014) Return of the devil in the details:
delving deep into convolutional nets. arXiv:1405.3531
15. Cootes TF, Edwards GJ, Taylor CJ (1998) Active appearance models. In: European conference
on computer vision. Springer, pp 484–498
16. Manohar N, Pranav MA, Akshay S, Mytravarun TK (2020) Classification of satellite images,
information and communication technology for intelligent systems. In: ICTIS 2020. Smart
innovation, systems and technologies, vol 195. Springer
17. Golomb BA, Lawrence DT, Sejnowski TJ (2000) Sexnet: a neural network identifies sex from
human faces. Neural Inform Process Syst
18. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional
neural networks. In: Proceedings of the 25th international conference on neural information
processing systems, vol 1, pp 1097–1105
19. Rafique I, Hamid A, Naseer S, Asad M (2019) Age and gender prediction using deep
convolutional neural networks. In: International conference on innovative computing
20. Akshay S, Apoorva P (2018) Segmentation and classification of FMM compressed retinal
images using watershed and canny segmentation and support vector machine. In: (ICCSP)
2017 international conference on communication and signal processing
21. Manohar N, Kumar YHS, Kumar GH (2020) Supervised and unsupervised learning in animal
classification. In: International conference on advances in computing, communications and
informatics (ICACCI). Jaipur, pp 156–161
22. Fei L, Yajie W, Hongkun Q, Linlin W (2014) Gender identification using SVM based on human
face images. In: International conference on virtual reality and visualization
Deep Learning Approach for RPL
Wormhole Attack
Abstract The network of smart devices and gadgets forms the Internet of things
(IoT). The IoT technology implemented in our day-to-day devices has shown many
advantages to its users. With this, the use of IoT devices has also increased, which in
turn increases the network traffic. The increase in network traffic has attracted many
hackers to inject more network attacks; the greater the usage, the more vulnerable the
network is to attacks. One such IoT attack is the wormhole attack on the RPL protocol.
Thus, there is a need for an intrusion detection system (IDS) to protect the network
data. The proposed work concentrates on generating real-time wormhole attacks in
the Cooja simulator and on using a recurrent neural network (RNN) deep learning
model to detect and classify the wormhole attack data from the normal data in the
IoT network traffic. The proposed work produced an accuracy of 94% and an F1
score of 96%.
1 Introduction
The increased usage of IoT devices in our day-to-day life has raised the importance of
research into the security of these devices [1]. This work records the dataset generation
of the RPL wormhole attack in the Cooja simulator. The attack data in the generated
dataset are then detected and classified using the RNN model.
Every aspect of our lives is now continuously tracked through data. The Internet of
things (IoT) is formed by the devices, controllers, and sensors that surround us.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 321
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_23
322 T. Thiyagu et al.
Several tools exist to model the IoT domain, and big data techniques make these
technologies easier to analyze [2]. The most common emulators are Cooja, GNS-3,
and MATLAB. Unfortunately, developing and maintaining effective IoT
communication is a challenging job.
As the amount of data produced grows, data protection has become increasingly
important [3]. In particular, the safety of sensitive records needs to follow the
principles of data safety (confidentiality, integrity, and availability) [4]. Since
there is a lack of stable routing rules, there are several kinds of IoT attacks,
such as car hacking, DDoS, and other physical attacks.
The security of IoT devices has become a major concern for users with the increase
in IoT network traffic. Users are affected by novel intrusions day by day. The devices
are vulnerable to denial of service, flooding, worm, and many other attacks. The RPL
protocol attacks pose a major challenge in detection and mitigation. Models built on
existing datasets produce high performance; however, they fail to perform
well when implemented practically. Thus, a new dataset is required that is
generated in a real-time IoT environment for every routing attack. Also, a suitable
deep learning model is required to efficiently detect and classify these attacks in
the normal data traffic.
1.2 Contribution
In this paper, Sect. 2 presents the background and related works about RPL attacks
and deep learning models. Section 3 shows the proposed methodology. The result is
discussed in Sect. 4, and the conclusion is discussed in Sect. 5. Future work is
presented in Sect. 6.
1.4 Background
Cooja Simulator:
Among the many available IoT network simulators, such as iFogSim, CloudSim,
OMNeT++, and others, the Cooja simulator suits the given problem
statement best [5]. The Cooja simulator provides wide scope for creating a large
number of nodes in a wireless network [6]. Figure 1 shows the Cooja interface.
The generation of network traffic in this simulator is easy.
2 Related Works
Much research on RPL attacks has been done in the IoT domain. The analysis of
different categories of RPL attacks, such as wormhole attacks, blackhole attacks,
flooding attacks, and sinkhole attacks, is extensively done in [7]. A survey about
the effects of RPL wormhole attacks and their detection methods is given in [8],
with detailed research findings about wormhole attacks. The attack detection is
done by packet leashes, with temporal leashes and geographical leashes as the two
main categories considered. The work provides both detection and mitigation of the attack.
The work in [9] presents the method of generating a dataset in the Cooja simulator
and injecting attacks into the normal traffic; it explains how the generated dataset
is captured through Wireshark. The proposed methodology in [10] shows
a deep learning approach for the generated dataset, using an ANN for RPL
rank attack detection and classifying the attack packets from the normal data
packets in the generated network traffic.
3 Methodology
Figure 2 explains the methodology used to generate the RPL wormhole attack dataset
injected into normal traffic through the Cooja simulator. Then, the generated dataset
is applied to the RNN model to classify the packets into malicious and normal data.
Simulation: The Cooja simulator is used to generate and record network traffic in a
WSN. In this research, the simulator is used to set up thousands of nodes and establish
communication between them. Then, the RPL wormhole attack is injected into the
network. Wireshark is the most popular application for monitoring traffic captured
in .pcap files [11]. Wireshark can be used on Windows, Mac OS X, and Linux. As long
as the appropriate programs are installed, these .pcap files can be accessed. Some
frequently used tools generate .pcap files, such as Wireshark, WinDump, tcpdump,
Packet Square-Capedit, and Ethereal.
A .CSV file is a simple text file that encloses a set of records separated
by commas. Such files are often used to transfer data between applications [8]. In
this proposed work, the network traffic generated from the Cooja simulator is captured
by Wireshark and stored as .pcap and .csv files.
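As a sketch of this step, the Wireshark CSV export can be parsed with Python's standard csv module. The column names below follow Wireshark's default export layout, and the two sample rows are purely illustrative, not data from the actual capture:

```python
import csv
import io

def load_traffic_csv(fp):
    """Parse a Wireshark CSV export into a list of per-packet feature dicts.

    Assumes the default Wireshark column layout:
    No., Time, Source, Destination, Protocol, Length, Info.
    """
    rows = []
    for rec in csv.DictReader(fp):
        rows.append({
            "time": float(rec["Time"]),
            "src": rec["Source"],
            "dst": rec["Destination"],
            "proto": rec["Protocol"],
            "length": int(rec["Length"]),
        })
    return rows

# Tiny in-memory sample standing in for a captured .csv file.
sample = io.StringIO(
    '"No.","Time","Source","Destination","Protocol","Length","Info"\n'
    '"1","0.000","aaaa::1","aaaa::2","ICMPv6","97","RPL DIO"\n'
    '"2","0.120","aaaa::2","aaaa::1","ICMPv6","64","RPL DAO"\n'
)
packets = load_traffic_csv(sample)
```

The resulting list of dictionaries is a convenient starting point for the feature extraction step described next.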
The captured network traffic is sent for feature extraction. Feature extraction is the
process of reducing the number of properties needed to represent a large amount
of data. Many deep learning practitioners claim that properly optimized feature
extraction is the secret to building successful models. It is a method for identifying
essential information components; pattern identification and recognizing common
patterns in a large number of documents are two examples of this approach. Spam
detection is another example of this process [12]. It is an effective data preprocessing
technique designed to reduce feature dimensionality and improve the efficiency of
deep learning in implementation.
Data preprocessing is a data mining step that translates raw data into an
understandable format. Real-world data are often incomplete, unreliable, and
deficient in specific behaviors and patterns, and may contain numerous errors.
Preprocessing the data is a proven way of addressing such problems.
The network traffic after data preprocessing is sent to the recurrent neural network
(RNN). The RNN is a supervised learning model that processes data in sequence. As
in Fig. 3, the three stages of RNN operation are:
1. First, the data are passed to the hidden layer, which predicts an output.
2. Then, the predicted value is compared with the actual value. The difference is
recorded as a loss function. The lower the loss function value, the better the RNN
prediction performance.
3. Finally, depending on the loss function, the error is propagated back to the
input layer through back-propagation, and the node weights are adjusted so that
the predictions match the actual values.
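The three stages above can be sketched with a minimal many-to-one RNN in NumPy. The dimensions, random weights, and toy packet sequence are all illustrative placeholders, not the network the paper trained:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: 4 input features per packet, hidden size 8.
n_in, n_hid = 4, 8
Wx = rng.normal(0, 0.1, (n_hid, n_in))   # input -> hidden
Wh = rng.normal(0, 0.1, (n_hid, n_hid))  # hidden -> hidden (recurrence)
Wo = rng.normal(0, 0.1, (1, n_hid))      # hidden -> output

def rnn_predict(seq):
    """Stage 1: feed the sequence through the hidden layer, one item at a time."""
    h = np.zeros(n_hid)
    for x in seq:
        h = np.tanh(Wx @ x + Wh @ h)       # recurrent hidden-state update
    return 1 / (1 + np.exp(-(Wo @ h)[0]))  # sigmoid -> P(wormhole)

def loss(y_pred, y_true):
    """Stage 2: binary cross-entropy between prediction and label."""
    return -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

seq = rng.normal(size=(5, n_in))  # a sequence of 5 preprocessed packets
p = rnn_predict(seq)
# Stage 3 (training) would back-propagate d(loss)/d(W*) through time
# and adjust Wx, Wh, Wo; omitted here for brevity.
```

A real implementation would train over many labeled sequences (stage 3), typically with a framework such as Keras or PyTorch rather than hand-written back-propagation.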
4 Results
This paper mainly focuses on the performance of the deep learning RNN model in
the classification of network traffic packets into RPL wormhole attack data and normal
data. The network traffic captured from the Cooja simulator forms the base for the
performance evaluation. Figures 4 and 5 show the output of the traffic generated in
the Cooja simulator.
The performance of the deep learning model is evaluated with the confusion matrix
shown in Fig. 6. The true positive (TP) and true negative (TN) counts indicate that the
predicted value matches the actual value; that is, the model predicts true for actual
true and false for actual false. On the other hand, the false negative (FN) and false
positive (FP) counts indicate that the predicted value does not match the actual value.
The higher the TP and TN values, the higher the accuracy of the detection model.
In this work, the value is true for wormhole attack data and false for
normal data.
Table 1 gives the confusion matrix of the work.
Accuracy:
Accuracy is given by the number of correct predictions divided by the total number
of predictions.
Accuracy = (TP + TN)/(TP + TN + FN + FP) = 0.94.
Precision:
Precision is given by the number of correctly predicted positive values divided by
the total number of predicted positive values.
Precision = TP/(TP + FP) = 0.94.
Recall:
Recall is given by the number of correctly predicted positive values divided by the
total number of actual positive values.
Recall = TP/(TP + FN) = 1.
F1 Score:
The F1 score gives the harmonic mean of the recall and precision values obtained.
F1 Score = (2 × Precision × Recall)/(Precision + Recall) = 0.96.
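The four scores can be computed directly from the confusion-matrix counts; the counts below are purely illustrative, since the paper reports only the resulting scores:

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts for a 100-packet test set (illustration only).
acc, prec, rec, f1 = metrics(tp=47, tn=47, fp=3, fn=3)
```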
The F1 score of the RNN model is high, which shows that the proposed RNN
detection model performs well for RPL wormhole attacks in IoT network traffic.
5 Conclusion
This work shows the importance of research on RPL wormhole attacks and their
severity in IoT networks. First, network traffic containing both normal traffic and the
wormhole attack is generated in the Cooja simulator and captured through Wireshark.
These data are sent to the RNN classifier, which classifies the dataset into normal
data and malicious data. The performance of the deep learning model is evaluated
through a confusion matrix. The F1 score achieved is 0.96, which shows that the
proposed method performs well for the classification of RPL wormhole attacks.
6 Future Work
This work has generated output for RNN deep learning classification of RPL
wormhole attacks. Similar work can be extended to various other RPL attacks, such
as blackhole attacks, sinkhole attacks, DoS attacks, flooding attacks, rank attacks,
version attacks, and other novel RPL attacks. Also, the work concentrates on detection
techniques; it can be extended by applying mitigation techniques to the network
traffic.
References
1. Pongle P, Chavan G (2015) Real-time intrusion and wormhole attack detection in internet of
things. Int J Comput Appl 121(9)
2. Tahboush M, Agoyi M (2021) A hybrid wormhole attack detection in mobile ad-hoc network
(MANET). IEEE Access 9:11872–11883
3. Cakir S, Toklu S, Yalcin N (2020) RPL attack detection and prevention in the Internet of Things
networks using a GRU based deep learning. IEEE Access 8:183678–183689
4. Morales-Molina CD, Hernandez-Suarez A, Sanchez-Perez G, Toscano-Medina LK, Perez-
Meana H, Olivares-Mercado J, Portillo-Portillo J, Sanchez V, Garcia-Villalba LJ (2021) A
dense neural network approach for detecting clone ID attacks on the RPL protocol of the IoT.
Sensors 21(9):3173
5. Mahmud A, Hossain F, Choity TA, Juhin F (2020) Simulation and comparison of RPL,
6LoWPAN, and CoAP protocols using Cooja simulator. In: Proceedings of international joint
conference on computational intelligence. Springer, Singapore, pp 317–326
6. Rana AK, Sharma S (2021) Contiki Cooja security solution (CCSS) with IPv6 routing protocol
for low-power and lossy networks (RPL) in Internet of Things applications. In: Mobile radio
communications and 5G networks. Springer, Singapore, pp 251–259
7. Dutta N, Singh MM (2019) Wormhole attack in wireless sensor networks: a critical review.
Adv Comput Commun Technol 147–161
8. Hu Y-C, Perrig A, Johnson DB (2006) Wormhole attacks in wireless networks. IEEE J Sel
Areas Commun 24(2):370–380
9. Malik M, Dutta M (2017) Contiki-based mitigation of UDP flooding attacks in the internet
of things. In: 2017 international conference on computing, communication and automation
(ICCCA). IEEE, pp 1296–1300
10. Choukri W, Lamaazi H, Benamar N (2020) RPL rank attack detection using deep learning. In:
2020 international conference on innovation and intelligence for informatics, computing and
technologies (3ICT). IEEE, pp 1–6
11. Singh U, Samvatsar M, Sharma A, Jain AK (2016) Detection and avoidance of unified attacks
on MANET using trusted secure AODV routing protocol. In: 2016 symposium on colossal data
analysis and networking (CDAN). IEEE, pp 1–6
12. Tun Z, Maw AH (2008) Wormhole attack detection in wireless sensor networks. World
Academy of Science, Engineering and Technology 46
13. Sivaganesan D (2021) A data driven trust mechanism based on blockchain in IoT sensor
networks for detection and mitigation of attacks. J Trends Comput Sci Smart Technol (TCSST)
3(01):59–69
Precision Agriculture Farming
by Monitoring and Controlling Irrigation
System Using Sensors
Abstract IoT facilitates the authorization of things and device activities that are
connected remotely across the cloud network interface. It makes a very significant
contribution toward revolutionary farming methods. This paper describes an
autonomous crop irrigation system. The ability to control and monitor plant
irrigation, which not only reduces human intervention but also senses and records
the system status in real time, makes our system more distinctive and simplified
than existing systems.
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 331
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_24
332 B. D. Kumar et al.
of crop they had lost. So, on account of scientific and technical innovation, the
future of farming is poised to be pushed to new heights.
In recent times, with high-speed, secure, and emerging IoT technology, everyone
can use their smart gadgets to monitor the devices present in the field and gain
real-time stats on their farms. This system can save both water and electricity. With
the advent of open-source Arduino boards along with low-cost sensors, it is viable
to create a device that can monitor the soil wetness and irrigate the fields or the
landscape accordingly [1]. This technique, in point of fact, minimizes the amount
of extra manpower needed for day-to-day operations.
Compared to existing systems, we wanted to improve the accuracy of the system to
an extent that provides sufficient and healthy growth of the crop. A soil humidity
sensor that is practically available can only sense a range of around 20–30 cm,
depending on the texture of the soil. To overcome this, we use multiple sensors in
the field, making the system capable of reading a number of values from these
different sensors, calculating their average, and taking the decision based on the
instructions. The threshold value given in the system is also not defined as static
or fixed; it is determined by the type of soil used and the temperature values. The
type of soil is set by the user while installing the device in a particular field.
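This dynamic threshold logic can be sketched as follows. The base thresholds per soil type, the reference temperature, and the slope are all hypothetical placeholders for values that would be calibrated for the installed field:

```python
# Hypothetical base moisture thresholds (%) per soil type; real values
# would be calibrated for the crop and field at installation time.
BASE_THRESHOLD = {"sandy": 30.0, "loamy": 40.0, "clay": 50.0}

def dynamic_threshold(soil_type, temperature_c, ref_temp_c=25.0, slope=0.5):
    """Raise the moisture threshold as temperature rises above a reference."""
    return BASE_THRESHOLD[soil_type] + slope * (temperature_c - ref_temp_c)

def needs_irrigation(readings, soil_type, temperature_c):
    """Average several moisture-sensor readings and compare to the threshold."""
    avg = sum(readings) / len(readings)
    return avg < dynamic_threshold(soil_type, temperature_c)

# Three sensors spread across a loamy field on a 30 degree C day.
decision = needs_irrigation([35.0, 38.0, 33.0], "loamy", 30.0)
```

Averaging over multiple sensors compensates for the limited 20–30 cm range of a single probe, which is the design motivation described above.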
There are many techniques used for irrigation, but the most basic and widely used
techniques in modern days are the drip irrigation system and the sprinkler irrigation
system.
In a drip irrigation system, the crop roots are irrigated by carrying water directly
to the root nodes using an array of pipes and valves. It is easy to apply liquid
fertilizer directly to the roots of the plants, which saves both irrigation water and
fertilizer. Figure 1 shows the drip irrigation method used in recent times [2].
Sprinklers are a great method for irrigating large fields and can even save a lot of
water. Small gardens can mostly use normal sprinklers, but larger sprinklers with
plenty of coverage need to be used if several acres of land are to be irrigated.
One of the best advantages of this method is that it can be automated on a timer.
One can set the timer for when the sprinkler system must turn on every single day.
Additionally, it can be allotted to run for a certain amount of time. Figure 2 shows
the modern sprinkler system [2].
2 Existing System
Jani et al. proposed to support aggressive water management for agricultural land
by implementing a smart irrigation system using IoT [3]. Bhanu et al. have designed
a system that considers a few parameters for data analysis using an IoT cloud
platform [4].
Bajwa et al. developed a smart solution for irrigating plants by using three different
modules with different sensors [5]. Rawal et al. proposed to build a monitoring
system for determining the humidity [1]. Even though these systems are utilized
to save irrigation water and reduce human labor, they are significantly hard to
implement and require a considerable amount of time to be established.
Koduru et al. proposed a framework-based cloud application for smart irriga-
tion by utilizing excessive rain water [6]. Krishnan et al. have implemented a smart
system using global system for mobile (GSM) communications to provide notifica-
tion messages about job’s statuses such as dampness level of soil [7]. Laksiri et al.
have proposed a smart irrigation system that was said to provide an effective method
to irrigate farmer’s cultivation by implementing both remote and manual irrigation
system and uploading the stats online via Internet [8]. Singh et al. developed a system
by using different sensors to analyze and respond to the different soil conditions [9].
Chapungo et al. have developed a system using specific sensing technologies and
sensor deployment using satellites and drones which irrigates and sprays pesticides
[10]. Amin et al. have developed a system that specifically uses drip irrigation to
maintain the soil moisture within a specific range on potted wheat plants proposing
it to be useful for the wheat production especially in drought areas [11]. Hadi et al.
have designed a system that used the application of IoT for irrigating the gardens
remotely by the owner that allows to both measure and detect the soil moisture [12].
Laabidi et al. have proposed a smart grid irrigation system by dividing the field into
grids, with each grid acting as a separate group for the IoT application [13]. These
systems require a larger number of sensors and have a high initial purchase cost.
Karar et al. have presented a design of a water pump control for the development
of smart irrigation in their system [14]. Karpagam et al. have proposed an IoT
enabled watering system for water management and distribution [15]. Rohith et al.
have designed a smart irrigation system using basic sensors to control the water
[16]. Stojanović et al. have presented an application of digital technologies in the
field of agriculture for supporting smart computers [17]. Murlidharan et al. have
developed an application of precision agriculture using IoT and ML on the basis of
the existing technology [18]. Present-day technology allows us to do things that
are imaginable in every aspect possible, but the side effects are excessive
investment cost and a lack of adequate knowledge of the complexity involved.
3 Proposed System
In this section, the functionality and working of the device model are explained.
With IoT as the main principle of the system, an Arduino UNO microcontroller is
used for applications such as automated control of the system, using the parameters
read from different payloads, each having its own unique characteristics.
The materials used in our project are an Arduino UNO, a temperature sensor, a soil
moisture sensor, a relay module, an irrigation motor, and a power adapter.
The Arduino UNO is the basic processor used here for the functioning of the system,
such as reading the inputs and providing the outputs as designed. Figure 3 shows
the image of the Arduino [4].
The function of this module is to connect the terminals based on the input condition
provided to the module. Based on the switching condition, either of the two terminals,
denoted NC (normally closed) and NO (normally open), will be powered up. Figure 4
shows the relay module [16].
The irrigation motor is a device that pumps water by pressurizing it from a water
source, either a pond or groundwater from a well. In other words, it is the heart of
irrigation. The horsepower of the pump used here can differ based on the capacity of
the field to be irrigated. This pump is controlled by the Arduino, which regulates
when it turns on and off based on the conditions in the program code. Figure 5 shows
a sample irrigation pump used for the prototype.
This sensor measures the temperature around the soil to study and understand the
environmental conditions and to determine the amount of irrigation required for the
plants in that specific condition, comparing it to the room temperature. The sensor
module used here is the LM35. Figure 6 shows the image of the LM35 temperature
sensor.
When placed in the land, this sensor detects the moisture present around its area
based on the percentage of current flow, which depends on its sensitivity. It returns
a percentage value as output. Figure 7 shows a sample soil moisture sensor [4].
The execution of the system starts by first reading the temperature and humidity
levels from the respective sensors. After the values are recorded, the Arduino
analyzes them and performs the designated function, which here is to power the
motor to irrigate the land, thus making this system completely autonomous.
Figure 8 describes the basic flow of the system and explains its working principle.
3.4 Implementation
The system's functioning can be simply put as the Arduino reading the inputs from
the installed sensors and controlling the relay module, which drives the power to the
irrigation motor. The sensor outputs go through a screening process in which the
Arduino compares the readings with the threshold values input by the user. These
threshold values are set based on the crop planted and the current season, which
decide whether the crop requires more irrigation or not. The reading from the
temperature sensor is used to either increase or decrease the threshold value: the
higher the temperature, the more moisture will be required for a healthy crop. The
sensor values are transmitted to the Arduino as digital binary signals, high and low.
We used an Arduino UNO board, which covers most of the processing done by the
system. There is an option to manually override the functioning of the system by
activating the payload manually with a switch rather than through the Arduino
board. The process of valuation and analysis is usually repeated every 1–2 h. The
practical implementation of our system is shown in Fig. 10.
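The valuation-and-analysis cycle described above can be sketched in Python. The sensor-reading function and the relay call are hypothetical stand-ins for the Arduino's analog reads and digital writes:

```python
def read_moisture_sensors():
    """Placeholder for the Arduino's analog reads from the field sensors."""
    return [34.0, 36.0]

def set_relay(on):
    """Placeholder for driving the relay pin that powers the motor."""
    print("motor", "ON" if on else "OFF")

def control_cycle(threshold, manual_override=False):
    """One valuation-and-analysis cycle: read, compare, switch the relay."""
    if manual_override:  # the physical switch bypasses the Arduino entirely
        return None
    readings = read_moisture_sensors()
    moisture = sum(readings) / len(readings)
    pump_on = moisture < threshold
    set_relay(pump_on)
    return pump_on

# One cycle against a threshold of 42.5 %; in deployment this would
# repeat every 1-2 hours on a timer.
state = control_cycle(threshold=42.5)
```

On the real board, the equivalent logic would run inside the Arduino loop() with a delay between cycles.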
Using the defined parameters and values, the system analyzes the practical values
saved in the memory of the Arduino to trace back the output function in a runtime
instance. The sample recorded readings can be observed from the serial port.
The snaps in Figs. 11 and 12 are the outputs recorded from the sensors by the
Arduino. Based on these values, the Arduino makes the decision to perform the
designated job, which, here, is the operation of the irrigation motor.
5 Conclusion
The designed system works efficiently using IoT on the basis of precision
agriculture to detect the dampness in the soil and analyze all the practical values
from the sensors used to control the irrigation flow when required. The system is
completely autonomous, making it easier for the farmer to reduce extra human effort
and save extra expenses. Our system uses multiple sensors at different parts of the
field, considering the fact that a moisture sensor can only work efficiently for a
range of around 20–30 cm, depending on the soil texture. The system analyzes
multiple readings, processes them, and then makes the decision to determine the
necessity of water, thus making the system dynamic to temperature, soil conditions,
and texture variations. This dynamic calculation of the threshold value exclusively
by the system itself not only saves time but also makes it portable and realistic for
environmental changes, and it is economically feasible.
References
1. Rawal S (2017) IoT based smart irrigation system. In: Proceedings of the international journal
of computer applications, vol 159
2. Prakash BR, Kulkarni SS (2020) Super smart irrigation system using internet of things. In:
2020 7th international conference on smart structures and systems (ICSSS), pp 1–5
3. Kansara K, Zaveri V, Shah S, Delwadkar S, Jani K (2015) Sensor based automated irrigation
system with IoT: a technical review. In: Proceedings of the international journal of computer
science and information technologies, vol 6, pp 5331–5333
4. Bhanu KN, Mahadevaswamy HS, Jasmine HJ (2020) IoT based smart system for enhanced
irrigation in agriculture. In: 2020 international conference on electronics and sustainable
communication systems (ICESC), pp 760–765
5. Safdar Munir M, Bajwa IS, Ashraf A, Anwar W, Rashid R (2021) Intelligent and smart irriga-
tion system using edge computing and IoT. In: Proceedings of the innovative and intelligent
technology-based services for smart environments
6. Koduru S et al (2018) Smart irrigation system using cloud and internet of things. In: Proceedings
of 2nd international conference on communication, computing and networking, pp 195–203
7. Krishnan RS et al (2020) Fuzzy logic based smart irrigation system using internet of things. J
Clean Prod 252:119902
8. Laksiri HGCR, Dharmagunawardhana HAC, Wijayakulasooriya JV (2019) Design and opti-
mization of IoT based smart irrigation system in Sri Lanka. In: 2019 14th conference on
industrial and information systems (ICIIS), pp 198–202
9. Singh R, Srivastava S, Mishra R (2020) AI and IoT based monitoring system for increasing
the yield in crop production. In: Proceedings of the international conference on electrical and
electronics engineering, pp 301–305
10. Chapungo NJ, Postolache O (2021) Sensors and communication protocols for precision agricul-
ture. In: Proceedings of the 2021 12th international symposium on advanced topics in electrical
engineering (ATEE)
11. Amin AB, Dubois GO, Thurel S, Danyluk J, Boukadoum M, Diallo AB (2021) Wireless sensor
network and irrigation system to monitor wheat growth under drought stress. In: 2021 IEEE
international symposium on circuits and systems (ISCAS), pp 1–4
12. Hadi MS, Nugraha PA, Wirawan IM, Zaeni IAE, Mizar MA, Irvan M (2020) IoT based smart
garden irrigation system. In: Proceedings of the 4th international conference on vocational
education and training
13. Laabidi K, Khayyat M, Almohamadi T (2021) Smart grid irrigation. In: Proceedings of the
innovative and intelligent technology-based services for smart environments
14. Karar ME et al (2020) IoT and neural network-based water pumping control system for smart
irrigation. In: Proceedings of the arXiv:2005.04158, pp 107–112
15. Karpagam J et al (2020) Smart irrigation system using IoT. In: Proceedings of the 2020 6th
international conference on advanced computing and communication systems (ICACCS), vol
6 (15). IEEE
16. Rohith M, Sainivedhana R, Sabiyath Fatima N (2021) IoT enabled smart farming and irrigation
system. In: Proceedings of the 2021 5th international conference on intelligent computing and
control systems (ICICCS), pp 545–552
17. Stojanović R, Maraš V, Radonjić S, Martić A, Durković J, Pavićević K, Mirović V, Cvetković
M (2021) A feasible IoT-based system for precision agriculture. In: Proceedings of the 2021
10th mediterranean conference on embedded computing (MECO)
18. Murlidharan S, Shukla VK, Chaubey A (2021) Application of machine learning in precision
agriculture using IoT. In: 2021 2nd international conference on intelligent engineering and
management (ICIEM), pp 34–39
Autonomous Driving Vehicle System
Using LiDAR Sensor
1 Introduction
Every year, around 1.35 million people die because of vehicle crashes throughout
the world. Over half of them are pedestrians, cyclists, and motorcyclists, and the
toll goes beyond fatalities: each year, nearly 50 million people are injured in
vehicle crashes worldwide [1]. The great majority of these accidents have a common
thread, which is human error and inattention. Additional factors include speeding,
distraction, drowsiness, and alcohol consumption. Autonomous vehicles can assist in
reducing risky behaviors and accidents. Autonomous driving vehicles, known as
driverless vehicles, combine sensors and control software to navigate themselves.
They depend on their perception systems and ability to gain information from the
nearby environment. For proper self-driving, it is important to identify the presence
of different
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 345
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_25
346 S. Islam et al.
the vehicle and nearby objects can be accurately calculated with the LiDAR sensor.
For interpreting the environment, LiDAR data can easily be transformed into 3D
maps. LiDAR performs well in low-light conditions, regardless of ambient light
variations. LiDAR enables direct measurement of distance, which does not require
decoding or interpretation, thereby enabling faster processing. With LiDAR, a large
number of measurements can be made instantaneously, precise to a centimeter.
2 Working Principle
s = (c ∗ t)/2 (1)
where s is the covered distance, c is the speed of light (3 × 10^8 m/s), and t is the
time taken by the laser beam between emission and reception. Various factors
contribute to finding the range for a pulse-based laser system, whose equation is as
follows:
Range = (P ∗ A ∗ Ta ∗ To )/(Ds ∗ π ∗ B) (2)
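As a quick numerical sketch of the time-of-flight relation (distance = speed of light × travel time / 2, assuming t is the round-trip time between emission and reception):

```python
C = 3.0e8  # speed of light (m/s)

def tof_distance(t_seconds):
    """Distance to a target from the round-trip travel time of a laser pulse."""
    return C * t_seconds / 2  # divide by 2: the pulse travels out and back

# A pulse returning after 200 ns corresponds to a 30 m target.
d = tof_distance(200e-9)
```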
LiDAR consists of three components, which are embedded in its package [3]. The
first one is a transmitter, usually a laser, which stands for light amplification
by stimulated emission of radiation. The laser is characterized by rapid sending
of laser light pulses, normally emitting 150,000 pulses per second. It is classified
according to its wavelength, between 600 and 1000 nm, toward the shorter wavelength
range, and it is not visible to the human eye, as can be seen in the electromagnetic
spectrum in Fig. 3. Methods are based on either continuous-wave or pulsed lasers.
The second component is the receiver, shown in Fig. 4, which determines the travel
time by measuring the incoming light beam. The light reflected from objects is used
to scan the surroundings and generate 3D coordinates. Low-intensity light demands a
highly sensitive receiver. The following apparatus is used for low-light detection:
• Silicon PIN detector
• Silicon avalanche photodiode (APD)
• Photomultiplier tube (PMT).
The last component is the position and navigation system, which determines the position and angle with respect to the current position. The object's position is determined using the global positioning system (GPS), which measures the object's longitude, latitude, and altitude. In contrast, the inertial measurement unit (IMU) allows precise measurement of the object's angle [7]. It is especially used in airborne sensors.
An active LiDAR sensor can be divided into two sections. Figure 5 shows the basic configuration of a LiDAR system. A transmitter sends the signal in the form of a laser beam, while a receiver captures the reflected radiation with an optical detector, such as a photodiode, whose electrical signal is analyzed in a computer. A beam expander can be included within the transmitter unit to decrease the divergence of the light beam before it is emitted into the atmosphere. At the receiving end, the reflected photons returning through the atmosphere are captured by a geometrical optical structure. This is followed by an optical analysis stage that, depending on the specific application, selects the wavelength from the accumulated light. The detector receives the selected optical wavelength and converts it into an electrical signal. The signal timing is defined by the time that elapses after the laser pulse is emitted. The object's distance is determined using electronic time measurements, and the data are stored in a computer. Figure 6 depicts the three-dimensional configuration of a LiDAR sensor. To scan at least one layer, these systems focus a single emitter and detector pair combined with a moving mirror. The mirror reflects the emitted light from the diode and also reflects the returning light to the detector. Such a device can swiftly measure the surface, at rates of more than 150 kHz. The wavelength of a LiDAR depends on the application and ranges between 250 nm and 11 µm. Some LiDARs use several beams to minimize the moving mechanism; the Velodyne series, for example, uses arrays of laser diodes to enhance point cloud density.
In automotive LiDAR scanning systems, the most popular solution is a spinning mechanism [5]. In general, two types of systems are used: polygonal mirror systems and nodding mirror systems, the latter illustrated by the mechanical spinning mechanism shown in Fig. 6. The lasers are tilted by an integrated nodding mirror to create a vertical field of view, and the LiDAR base is rotated to achieve a 360° horizontal field of view (FOV). State-of-the-art LiDARs use multiple beams to reduce the moving mechanism. Mechanical spinning offers a number of advantages, above all a large FOV; the Velodyne VLP series, for example, increases point cloud density with arrays of lasers and photodiodes. However, a rotating mechanism is bulky to implement inside a vehicle and is vulnerable to extreme circumstances like vibration, which are ubiquitous in automotive applications. The FOV is the angle captured by a sensor. When using a camera with a LiDAR sensor, the FOV should be selected carefully so that the LiDAR outputs match the region covered by the aerial photographs.
4 LiDAR Technologies
LiDAR operates with many laser beams scanning the field of view, accomplished by a delicately constructed beam-steering system. An amplitude-pulsed laser diode emitting at an infrared frequency generates the laser beam. The surroundings reflect the laser beam, which returns to the scanner, where a photodetector receives the returning signal. The signal is filtered by fast electronics, which measure the time difference between the transmitted and received signals; this difference is proportional to distance and is used to calculate the range from the sensor model. Almost all 3D points and
Autonomous Driving Vehicle System Using LiDAR Sensor 351
An object-ranging device that measures distance with a laser beam is called a laser rangefinder. Its operation depends on the type of modulation used on the laser beam. A direct-detection rangefinder uses pulsed lasers so that their time of flight (ToF) can be determined. A frequency-modulated continuous-wave (FMCW) rangefinder works indirectly, measuring velocity and distance using the Doppler effect [9]. The term coherent refers to these types of structures.
The laser signals must be generated and emitted, and the reflected signals must be received by the receiver electronics. Additionally, the rangefinder's performance and cost are determined by the reflected signals. The ToF LiDAR sensor requires the
pulsed (amplitude-modulated) laser signal. A fiber laser or pulsed laser diode is used to generate this type of signal. The laser diode oscillates as an electric current flows through the diode junction. There are two types of diode lasers: vertical-cavity surface-emitting lasers (VCSELs) and edge-emitting lasers (EELs). EELs have long been used in the telecommunications industry. A VCSEL outputs a circular beam, whereas an EEL emits an elliptical laser beam and needs extra optics. In automotive applications, pulsed laser diodes are hybrid devices: a capacitor is mounted on the laser chip and activated by a MOSFET transistor [10]. The third element is the photodetector, in which the photoelectric effect transforms light energy into electrical energy. One of its most important characteristics is photosensitivity, which specifies how the photodetector reacts when it receives photons. The photosensitivity depends on the wavelength of the incident laser beam. As a result, the laser wavelength is an important consideration when choosing a LiDAR system's photodetector.
A scanning system permits the lasers to cover a vast area rapidly. Mechanical spinning and solid-state scanning are the two most common scanning technologies. Rotating mirror systems, such as the HDL-64 from Velodyne, were typical in the early stages of autonomous driving history. The automotive industry, however, prefers to avoid moving parts; solid state refers to a scanning system without them.
Figure 8 shows an example of a common product, Velodyne's HDL-64. Although the Velodyne HDL-64E is a relatively expensive sensor, it is often used in the automobile sector. It offers a high-resolution picture and 3D information about the environment. This sensor has 64 lasers in groups of four, each with 16 laser emitters, and detectors in groups of two, each with 32 detectors. It is mounted on the car's roof and spins constantly at 5–20 Hz. It possesses a 360° horizontal and 26.8° vertical field of view. A very fine angular resolution of 0.08° allows for a clear and detailed view in which even very small objects can be distinguished. Sampling rates of up to 2.2 Mpoints/s are available [11].
ϕ, θ) provides a better method. Through distance and other criteria, the remaining non-ground objects can be grouped easily [8]. On the other hand, object recognition methods based on machine learning provide semantic information (e.g., pedestrian, vehicle, truck, plant, building, etc.). In the recognition procedure, feature extraction is employed to calculate compact object descriptors, followed by a modeling step based on pre-trained classification of objects. Another way to acquire generic shape features of 3D objects is through principal component analysis (PCA). By evaluating the eigenvalues generated by PCA, three salient characteristics (surfaceness, linearness, and scatterness) can be acquired [12]. The classification step that follows feature extraction is an example of supervised machine learning: a statistical model trained on a ground-truth dataset predicts the class of an input. A number of well-known datasets are available, among which KITTI provides an abundance of resources. Plenty of algorithms from the machine learning (ML) arsenal, such as naive Bayes, support vector machines (SVM), and k-nearest neighbors (KNN), may be applied [13]. The SVM with a radial basis function (RBF) kernel is the most popular method due to its efficiency and speed. Figure 6 illustrates the results of our implementation (SVM with RBF kernel) on the identified on-road items. Neural networks are also used to classify LiDAR objects. Often in practice, the classes are unknown, and a classification method must handle this situation well.
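The PCA saliency characteristics mentioned above can be sketched with NumPy, under one common formulation in which linearness, surfaceness, and scatterness are derived from the sorted covariance eigenvalues of a point neighborhood (an illustration, not the cited implementation):

```python
import numpy as np

def saliency_features(points: np.ndarray):
    """PCA-based shape saliency of a 3D point neighbourhood.

    points: (N, 3) array. Returns (linearness, surfaceness, scatterness)
    from the sorted covariance eigenvalues l1 >= l2 >= l3.
    """
    cov = np.cov(points.T)                 # 3x3 covariance matrix
    l3, l2, l1 = np.linalg.eigvalsh(cov)   # eigvalsh returns ascending order
    return l1 - l2, l2 - l3, l3            # linear, planar, scattered

# A thin line along x scores high on linearness, low on the other two.
rng = np.random.default_rng(0)
line = np.column_stack([np.linspace(0.0, 10.0, 100),
                        0.01 * rng.standard_normal(100),
                        0.01 * rng.standard_normal(100)])
lin, surf, scat = saliency_features(line)
print(lin > surf and lin > scat)  # True
```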
Objects are tracked using multiple algorithms that associate and locate the objects through spatial–temporal consistency. Modeling the movement in state space, a single-object tracker estimates the motion using Bayesian filters. By extending the single dynamic model to several motion models, the interacting multiple model (IMM) filter can deal with complicated cases. The particle filter (PF) is another frequent technique, meant for more general scenarios that do not fulfill the Gaussian linear assumption. Radar-based multiple object tracking (MOT) typically models all detectable objects as points, while LiDAR-based MOT is distinguished by tracking both the shape and the number of detected targets. A sophisticated method uses multiple shape models: polygons, lines, L-shapes, and points. The form of a moving item changes with variations in pose and sensor perspective while tracking it. A tracking method has been developed that estimates the states of both pose and movement simultaneously, with 2D polylines representing shapes. Unlike radar, which represents detections as points, LiDAR can capture object shape [14]. The distinctive feature of shape-based MOT is that the detections can be tracked along with their shapes.
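The Bayesian filtering used by single-object trackers can be illustrated with a minimal one-dimensional constant-velocity Kalman filter; the matrices are written out by hand and the noise parameters are illustrative, not taken from the cited systems:

```python
# Minimal 1-D constant-velocity Kalman filter, the simplest instance of the
# Bayesian motion filters described above. State = [position, velocity].

def kalman_track(measurements, dt=1.0, q=1e-3, r=1.0):
    x = [measurements[0], 0.0]            # initial state estimate
    P = [[1.0, 0.0], [0.0, 1.0]]          # state covariance
    for z in measurements[1:]:
        # Predict: x <- F x, P <- F P F^T + Q with F = [[1, dt], [0, 1]]
        x = [x[0] + dt * x[1], x[1]]
        P = [[P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1] + q,
              P[0][1] + dt * P[1][1]],
             [P[1][0] + dt * P[1][1],
              P[1][1] + q]]
        # Update with a scalar measurement z of position (H = [1, 0]).
        S = P[0][0] + r                   # innovation covariance
        K = [P[0][0] / S, P[1][0] / S]    # Kalman gain
        y = z - x[0]                      # innovation
        x = [x[0] + K[0] * y, x[1] + K[1] * y]
        P = [[(1 - K[0]) * P[0][0], (1 - K[0]) * P[0][1]],
             [P[1][0] - K[1] * P[0][0], P[1][1] - K[1] * P[0][1]]]
    return x

# Track an object moving at 1 unit per step; the filter recovers the velocity.
est = kalman_track([float(t) for t in range(21)])
print(est)  # position near 20, velocity near 1
```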
Following its enormous success in computer vision and speech recognition, the wave of deep learning (DL) has also reached LiDAR data processing. A deep learning algorithm is part of the machine learning field and works with multilayer neural networks. Whereas traditional machine learning methods such as SVM rely on features extracted from the raw input, DL systems are able to learn such features themselves. DL has also been applied to object tracking: a deep structured model for tracking has been proposed to replace the traditional tracking algorithm [15].
RADAR and SONAR are two associated technologies that exploit the same phenomenon as LiDAR: generating pulses and receiving the returned signals. RADAR is an acronym for radio detection and ranging. It employs longer-wavelength radio waves, using frequencies from 3 Hz to 3000 GHz for detecting an object in front of it. In comparison with RADAR, LiDAR provides more accurate results; the differences between LiDAR and RADAR are shown in Table 1. SONAR is an acronym for sound navigation and ranging. It detects objects using sound waves, with a frequency range extending from 10 kHz to 1 GHz. The differences between LiDAR and SONAR are shown in Table 2.
• Airborne LiDAR: Airborne LiDAR has made the most progress in recent years by
processing and delivering 3D point clouds. The drone industry is also developing
lightweight sensors and autonomous drones.
• Agriculture: Farming technologies (AgTech) can use LiDAR to identify areas that
get the best exposure to sunshine. A machine learning system can also use the
data to identify crops that need water and fertilizer.
• Robotics: LiDAR technology is utilized to provide mapping and navigation capabilities to robots. This technology is used in self-driving cars so that the vehicle can detect the distance between itself and other objects in its surroundings [17].
• Exploration for Oil and Gas: LiDAR can detect tiny molecules in the atmosphere since it has a smaller wavelength than other technologies. Gas and oil deposits can be traced using differential absorption LiDAR (DIAL).
• Land Management: Organizations that manage land resources can monitor them
in real time, which allows for an increased level of efficiency in mapping and less
time spent conducting aerial surveys.
• Renewable Energy: In order to harness solar energy properly, LiDAR can be
adapted to determine optimal panel positions. In addition to calculating wind
direction and speed, it can also be used to place wind turbines at wind farms [18].
• Military and Law Enforcement: In the military, LiDAR technology is used together with image processing to identify targets such as tanks and missiles and to make digital maps of the terrain and of objects in their path. The same principles have been applied to enforcing speed limits within cities: a laser speed gun calculates speed from the time of flight, and a camera is used to capture the images.
7 Conclusion
In this paper, we have provided a review of LiDAR sensor technology, which future safe roads may use as a companion. LiDAR is generally more precise than a camera or radar in terms of measuring distance. As a result, LiDAR-based algorithms are highly reliable in evaluating physical information (object positions, headings, shapes, etc.). We have also discussed that LiDAR-based detection systems for autonomous vehicles can be compromised by adversaries. Developing deep learning for 3D LiDAR data will be one of the most important future directions.
References
1. World Health Organization (WHO) (2018) Global status report on road safety 2018. https://
www.who.int/violence_injury_prevention/road_safety_status/2018/en/externalicon. Accessed
28 Oct 2020
2. Liu J, Sun Q, Fan Z, Jia Y (2018) TOF LiDAR development in autonomous vehicle. In: 2018
IEEE 3rd optoelectronics global conference. Shenzhen, pp 185–190
3. High Definition LiDAR Sensor for 3D Application, Velodyne’s HDL-64E, White Paper/Oct
2007
4. Fujii T, Fukuchi T (2005) Laser remote sensing. CRC Press, ISBN 10:0-8247-4256-7
5. Warrian P (2018) Mining: the inversion of industry 4.0, CDO conference, Vancouver
6. Wenzl K, Ruser H, Kargel C (2012) Decentralized multitarget-tracking using a LIDAR sensor
network, Graz
7. Weitkamp C (ed) (2006) LiDAR: range-resolved optical remote sensing of the atmosphere,
102, Springer Science & Business
8. Li Y, Ibanez-Guzman J (2020) LiDAR for autonomous driving: the principles, challenges, and
trends for automotive LiDAR and perception systems. IEEE Signal Process Mag 37:50–61
9. Horaud R et al (2016) An overview of depth cameras and range scanners based on time-of-flight
technologies. Mach Vis Appl 27:1005–1020
10. Baker WE et al (2014) LiDAR-measured wind profiles: the missing link in the global observing system. Bull Am Meteorol Soc 95:543–564
11. Velodyne LIDAR Key features. Available: https://velodyneLiDAR.com/products/hdl-64e/.
Accessed: 22 June 2021
12. Zermas D et al (2017) Fast segmentation of 3D point clouds: a paradigm on LiDAR data for
autonomous vehicle applications. In: IEEE international conference on robotics and automation
(ICRA), Singapore
13. Li Y, Ruichek Y (2012) Moving objects detection and recognition using sparse spatial information in urban environments. In: 2012 IEEE intelligent vehicles symposium. Madrid, pp 1060–1065
14. Chiu C, Fei L, Liu J, Wu M (2015) National airborne LiDAR mapping and examples for
applications in deep-seated landslides in Taiwan. In: 2015 IEEE international geoscience and
remote sensing symposium (IGARSS). Milan, pp 4688–4691
15. Rasshofer RH, Gresser K (2005) Automotive radar and LiDAR systems for next generation
driver assistance functions. Adv Radio Sci 3:205–209
16. Baras N et al (2019) Autonomous obstacle avoidance vehicle using LIDAR and an embedded
system. In: 2019 8th international conference on modern circuits and systems technologies
(MOCAST). Thessaloniki, pp 1–4
17. Kim JK et al (2015) Experimental studies of autonomous driving of a vehicle on the road using
LiDAR and DGPS. In: 2015 15th international conference on control, automation and systems.
Busan, pp 1366–1369
18. Duong HV et al (2012) The electronically steerable flash LiDAR: a full waveform scanning
system for topographic and ecosystem structure applications. IEEE Trans Geosci Remote Sens
50:4809–4820
Multiple Face Detection Tracking
and Recognition from Video Sequence
M. Athira (B)
Electronics and Communication Engineering, KMCT College of Engineering, Kozhikode, Kerala,
India
A. T. Nair · K. Namboothiri
KMCT College of Engineering, Kozhikode, Kerala, India
K. S. Haritha
Government Engineering College, Kannur, India
N. Gopinath
Rajadhani Institute of Science and Technology, Mankara, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 359
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_26
1 Introduction
Faces are critical components of human interactions. Nowadays, the face is employed
as a biometric identifier in a variety of commercial applications, including access
control for security, criminal identification and surveillance. Face detection is a tech-
nique for extracting the human facial region from photographs. Face detection is a
prerequisite for all of the above-mentioned applications and other face-related appli-
cations. There are numerous methods for detecting and recognizing faces. While
face recognition has received much attention in the literature, there have been few
attempts at visual detection of many faces concurrently in videos, which could have
practical uses in video monitoring.
The purpose of this paper is to discuss the detection, recognition and tracking of several faces in videos. Multiple faces are detected in videos, and the tasks of multiple face detection (MFD), multiple face tracking (MFT) and multiple face recognition (MFR) are performed concurrently. The Viola–Jones method and a neural network are used
in this example to detect several faces in a video stream. The KLT tracking algorithm
is used to track the detected faces, and the faces are recognized using the eigenface
approach.
2 Literature Survey
Face detection methods based solely on the Viola–Jones technique and the vision.CascadeObjectDetector code demonstrated a lower success rate. The failures were attributed to the following factors: insufficient lighting, partial occlusion of the face and a high false detection rate [1]. The proposed strategy for increasing the success rate is the development of an algorithm that addresses several of the disadvantages of the Viola–Jones algorithm and significantly minimizes the rate of erroneous detection. The success rate was thereby increased to 90%.
Face detection algorithms come in a variety of flavours. These algorithms are roughly categorized into two kinds [1]: feature-based techniques and learning-based methods. Feature-based approaches detect faces using a few simple traits found in the facial regions [2, 3]. They make no allowance for the effects of ambient light, rotation, or pose. The skin colour model is one of the most extensively used approaches in this category. Learning-based techniques are underpinned by statistical models and machine learning algorithms [4]. These techniques are more resilient but take more time to compute than feature-based methods. They produce excellent results in a variety of rotational poses, even in low-light situations. As a result, these methods are chosen over feature-based methods.
The Viola–Jones detector is an object detection technique based on learning; here, we use it to detect faces. It identifies objects using Haar features and a cascade of classifiers [2, 5]. The Haar features are computed by utilizing the integral image. The adaptive boosting (AdaBoost) method is used to select the best features [1]. This process occurs at each stage, and there is a cascade of these stages; at each stage, windows that are wrongly detected are discarded. Thus, the more stages, the more accurate the face detection.
Haar features are obtained by dividing the entire image into small windows or rectangular sections of size M × M. Each window's features are determined independently [6]. Haar-like features are rapidly computed using an intermediate representation of the image: the integral image.
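The integral-image trick can be sketched as follows; this is a minimal Python illustration of the idea, over which a real detector would evaluate Haar features for many windows:

```python
# Integral image: ii[y][x] holds the sum of all pixels above and to the
# left of (x, y) inclusive, so any rectangle sum needs only 4 lookups.

def integral_image(img):
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]   # 1-pixel zero border
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = (img[y][x] + ii[y][x + 1]
                                + ii[y + 1][x] - ii[y][x])
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle whose top-left corner is (x, y)."""
    return (ii[y + h][x + w] - ii[y][x + w]
            - ii[y + h][x] + ii[y][x])

# A two-rectangle Haar-like feature is just a difference of rect sums.
img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12]]
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 2, 2))  # 6 + 7 + 10 + 11 = 34
print(rect_sum(ii, 0, 0, 2, 3) - rect_sum(ii, 2, 0, 2, 3))  # left minus right half: -12
```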
For each window, a huge number of Haar features are computed (approximately 180,000). The majority of these features are superfluous, and the AdaBoost algorithm is used to minimize this redundancy. AdaBoost builds a classification function that eliminates redundant characteristics and condenses a big number of features [7]. It is, in essence, a classifier constructed from a weighted mixture of weak classifiers.
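The weighted mixture of weak classifiers can be made concrete with a minimal AdaBoost over threshold "decision stumps" on toy one-dimensional data; this is illustrative only, since Viola–Jones applies the same idea to Haar features:

```python
import math

# Minimal AdaBoost: each round picks the stump with the lowest weighted
# error, then re-weights the samples so mistakes count more next round.

def train_adaboost(xs, ys, rounds=3):
    n = len(xs)
    w = [1.0 / n] * n                      # uniform sample weights
    thresholds = [x - 0.5 for x in xs] + [max(xs) + 0.5]
    model = []                             # list of (alpha, threshold, polarity)
    for _ in range(rounds):
        best = None                        # (error, threshold, polarity)
        for t in thresholds:
            for p in (1, -1):
                err = sum(wi for wi, x, y in zip(w, xs, ys)
                          if (p if x < t else -p) != y)
                if best is None or err < best[0]:
                    best = (err, t, p)
        err, t, p = best
        err = max(err, 1e-10)              # guard against a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, t, p))
        # Re-weight: boost the samples this stump got wrong.
        w = [wi * math.exp(-alpha * y * (p if x < t else -p))
             for wi, x, y in zip(w, xs, ys)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, x):
    score = sum(a * (p if x < t else -p) for a, t, p in model)
    return 1 if score >= 0 else -1

# This interval pattern is not separable by any single stump,
# but three boosted stumps classify it perfectly.
xs, ys = [0, 1, 2, 3, 4, 5], [1, 1, -1, -1, 1, 1]
model = train_adaboost(xs, ys)
print([predict(model, x) for x in xs])  # [1, 1, -1, -1, 1, 1]
```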
After determining which windows contain the best features, we must determine which of these windows contain faces. On average, less than 0.01% of all windows in an image are positive, i.e., contain faces. The initially detected windows must travel through a series of cascaded stages in order to locate the positive ones.
The detection of objects in images is one of the most frequently utilized applications of neural networks [1]. The image is used as the input, and the network's job is to locate the required object(s) within the image (x-coordinate, y-coordinate, width and height of the rectangle around the object).
In MATLAB, the Computer Vision System Toolbox is often used to do object detection. It includes several cascade object detectors based on the Viola–Jones method. MATLAB has object detectors for identifying faces, eyes, mouths and noses, among other things [8, 9]. Additionally, one can design a customized detector to identify other objects [10]. A neural network must be trained to develop such a detector; to accomplish this, the MATLAB package includes the function trainCascadeObjectDetector. The network must be fed both positive and negative images during the training process: positive images contain the required object, while negative images do not [11]. Positive images have a border around the desired object; this highlighted area is referred to as the region of interest (ROI) [12]. After the network has been successfully trained, an .xml file is generated. This .xml file is used by the CascadeObjectDetector system object to detect objects in the test images.
The term “object tracking” refers to the process of maintaining a record of a certain type of object. Since the primary focus of this research is the face, we monitor human faces using the input features [5]. Continuous tracking teaches us to disregard issues like lighting, pose fluctuation and so on. Here, human face tracking is performed on a video sequence [13].
The so-called eigenface technique is one of the easiest and most direct PCA methods used in face recognition systems [1]. This technique reduces faces to a small collection of key components called eigenfaces, derived from the first set of learning images (training set). Recognition is accomplished by projecting a new image into the eigenface subspace [15] and then classifying the person by comparing its location in eigenface space to the positions of previously recognized individuals [8, 16]. Table 1 summarizes the results of the literature survey.
3 Proposed Method
To begin, the faces in an image are detected using the system's built-in object detector. This technique is based on the concept of the integral image and makes use of Haar-like features. The threshold is modified to decrease the rate of false detection.
This method first recognizes Harris corners in the initial frame. It then proceeds to track the points using optical flow by computing the velocity of the pixels in the picture. The optical flow of the image is computed for each translational motion. Harris corners are tracked by connecting successive frames' motion vectors to create a track for each Harris point. To ensure that we do not lose track of the video sequence, we apply the Harris detector every ten to fifteen frames; this is nothing more than verifying the frames on a periodic basis, and it allows new and existing Harris points to be tracked. In this paper, we consider solely two-dimensional motion, specifically translational movement.
Assume that the initial position of the corner is (x, y). If it is displaced by a certain vector (b1, b2) in the subsequent frame, the displaced corner point of the frame is equal to the sum of the initial point and the displacement vector. The new point's coordinates will be x′ = x + b1 and y′ = y + b2. As a result, the displacement is calculated in relation to each coordinate. This is accomplished using the warp function, which takes the coordinates and a parameter p. It is referred to in Eq. 1,

W(x; p) = (x + b1; y + b2). (1)
The transformation is estimated using the warp function. The initially identified points are used as a template image in the first frame. The difference between the displacement and the preceding point is used to calculate the subsequent tracking points in the following stages. Alignment is determined by Eq. 2,

Σx [I(W(x; p)) − T(x)]² (2)
The displacement is calculated by taking the Taylor series and then differentiating it with respect to p, as in Eq. 4,

Δp = H⁻¹ Σx [∇I (∂W/∂p)]ᵀ [T(x) − I(W(x; p))] (4)

where H denotes the Hessian matrix. This is how the displacement Δp is estimated and the next traceable point is located.
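Equations (2) and (4) can be illustrated with a one-dimensional simplification in which the warp is a pure translation and the Gauss–Newton update reduces to a scalar; this is a sketch, and all names are illustrative:

```python
import math

# 1-D Lucas-Kanade alignment with warp W(x; p) = x + p. The template T is
# the image I shifted by an unknown amount, which the iteration recovers.

def sample(signal, x):
    """Linear interpolation of a sampled signal at real-valued x."""
    i = int(math.floor(x))
    f = x - i
    return signal[i] * (1 - f) + signal[i + 1] * f

def lucas_kanade_1d(I, T, xs, iters=100):
    p = 0.0
    for _ in range(iters):
        num = den = 0.0
        for x in xs:
            g = sample(I, x + p + 0.5) - sample(I, x + p - 0.5)  # gradient of I at x+p
            err = T[x] - sample(I, x + p)                        # T(x) - I(W(x; p))
            num += g * err          # steepest-descent term
            den += g * g            # 1x1 "Hessian"
        p += num / den              # scalar Gauss-Newton update
    return p

# Build a smooth bump and a template shifted by 0.3 samples.
I = [math.exp(-((u - 10.0) ** 2) / 8.0) for u in range(21)]
T = [math.exp(-((u + 0.3 - 10.0) ** 2) / 8.0) for u in range(21)]
p = lucas_kanade_1d(I, T, xs=range(2, 18))
print(p)  # close to 0.3
```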
To begin recognizing faces, we built and loaded the dataset; the recognized faces cropped, resized and saved into a folder during runtime are also loaded. Following that, a random index was constructed using the random function, as well as a random index of the observed faces. Using the random index sequence, photographs from the database are also loaded into a separate variable. We then compute the mean of all the photographs and subtract it. These photographs were used to calculate the eigenvectors. After obtaining the eigenvalues, a matrix was built in which each row contained the signature of an individual image. This means that we now have the eigenvalues and the image signatures to identify them. The mean value is subtracted from the image to be recognized, which is then multiplied by the eigenvectors. Finally, the face is recognized depending on the discrepancy between the current picture signature and the known face signatures.
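The pipeline just described (mean subtraction, eigenvectors, signatures, nearest match) can be sketched with NumPy, using synthetic vectors in place of cropped face images; this is illustrative only, not the MATLAB implementation used in the paper:

```python
import numpy as np

# Eigenface sketch: mean-subtract the training faces, take PCA eigenvectors,
# project every face to a "signature", then recognize a query face by the
# nearest signature. Synthetic vectors stand in for flattened face images.

rng = np.random.default_rng(42)
faces = rng.random((4, 64))            # 4 training faces, 64 pixels each

mean_face = faces.mean(axis=0)
A = faces - mean_face                  # centered training set

# Small-sample PCA trick: eigenvectors of A A^T (4x4) instead of 64x64.
vals, vecs = np.linalg.eigh(A @ A.T)
eigenfaces = A.T @ vecs[:, vals > 1e-10]          # back-project to pixel space
eigenfaces /= np.linalg.norm(eigenfaces, axis=0)  # normalize each eigenface

signatures = A @ eigenfaces            # one signature row per training face

def recognize(query):
    sig = (query - mean_face) @ eigenfaces
    dists = np.linalg.norm(signatures - sig, axis=1)
    return int(np.argmin(dists))       # index of the closest training face

# A slightly perturbed copy of face 2 is matched back to face 2.
query = faces[2] + 0.01 * rng.standard_normal(64)
print(recognize(query))  # 2
```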
new points are estimated. This transformation is applied to the faces' bounding boxes. Figure 4 depicts the detected facial features and the detection process. Simultaneously, the cropped faces saved in a folder during run time are read, the random index of the photographs is determined, and the eigenvalues are calculated, yielding a matrix of all image signatures. The average value is subtracted from the image to be recognized, which is then multiplied by the eigenvectors. Finally, recognition is based on the difference between existing picture signatures and the current facial signature. Figure 5 illustrates the eigenface approach for face recognition. The system achieves 96% accuracy in face detection alone, owing to the combination of a neural network and the Viola–Jones algorithm, but drops to roughly 92% when detection is combined with recognition and tracking. That indicates that out of every 100 trials, it may make eight errors.
The dataset for face detection, tracking and recognition is collected mainly from two Web sites, “https://www.mathworks.com/matlabcentral/fileexchange/47105-detect-and-track-multiple-faces” and “https://www.nzfaruqui.com/face-recognition-using-matlab-implementation-and-code/”. The face database is created by including the needed faces, and further changes to the dataset are made for multiple face detection, tracking and recognition.
4 Conclusion
The combination of the Viola–Jones algorithm and the neural network results in a higher level of accuracy in face detection than the Viola–Jones algorithm alone: more than 90%. Multiple faces are detected with the help of Viola–Jones and the neural network. The facial features are extracted, and the Kanade–Lucas–Tomasi tracking system tracks the faces using those features; it may occasionally fail to recognize a face when the person rotates or tilts his head. Face recognition is accomplished using the eigenface approach and is demonstrated in isolation in the figures. Face recognition may occasionally fail due to lighting circumstances, although the error probability is much lower. Its accuracy rate is 92% in this case.
References
1. Singh G, Goel AK (2020) Face detection and recognition system using digital image processing, School of Computer Science and Engineering. IEEE Xplore
2. Nigam S, Singh R, Misra AK (2017) Efficient facial expression recognition using histogram of oriented gradients in wavelet domain. Springer Science
3. Lalitha SD, Thyagarajan KK (2018) Microfacial expression recognition based on deep rooted
learning algorithm
4. Tanvar S, Chawla P, Maadam R, Bhadrana P (2020) Authentication of face using MATLAB. IEEE Xplore, ISBN 978-1-7281-5371-1
5. Boda R, Jasmine Pemeena Priyadarsini M (2016) Face detection and tracking using KLT and
Viola Jones, School Electronics and Communication Engineering. ARPN J Eng Appl Sci
6. Singh N, Daniel N, Chaturvedi P (2017) Template matching for detection & recognition of
frontal view of human face through matlab, ICICES
7. Boda R, Jasmine Pemeena Priyadarsini M (2016) School of Electronics and Communication
Engineering, Face detection and tracking using KLT and Viola Jones. ARPN J Eng Appl Sci
11(23)
8. Chaudhari MN, Ramrakhiani G (2018) Face detection using Viola Jones algorithm and neural networks. IEEE, ISBN 978-1-5386-5257-2
9. Rizwan SA, Kim K, Jalal A (2017) An accurate facial expression detector using multi-
landmarks selection and local transform features. IEEE
10. Ranganathas YP, Gouram (2016) A novel fused algorithm for human face tracking in video
sequence. In: International conference on computational system and information system for
sustainable solution
11. Suresh D, Rohit Kumar K, Subin S, Shanbhag S, Naveena N (2020) Int Res J Modern Eng
Technol Sci 02(04)
12. Sripriya AV, Geethika M, Radhesyam V (2020) Real time detection and recognition of human
faces, ICICCS IEEE Xplore Part Number:CFP20K74-ART; ISBN: 978-1-7281-4876-2
13. Zhou X, Jin K, Chen Q, Xu M, Shang Y (2017) Multiple face tracking and recognition with
identity-specific localized metric learning
14. Lalitha SD, Thyagharajan KK (2019) Micro-facial expression recognition in video based on
optimal convolutional neural network (MFEOCNN) algorithm, journal
15. Nair AT, Muthuvel K (2020) Research contributions with algorithmic comparison on the diagnosis of diabetic retinopathy. Int J Image Graph 20(4):2050030 (29 pages). World Scientific Publishing Company. https://doi.org/10.1142/S0219467820500308
16. Sudheer Kumar T, Vishwanath N, Karthik K (2020) Face detection using matlab. IJSDR, vol 5
17. Nair AT, Muthuvel K (2021) Automated screening of diabetic retinopathy with optimized deep
convolutional neural network: enhanced moth flame model. J Mech Med Biol 21(1):2150005
(29 pages). World Scientific Publishing Company. https://doi.org/10.1142/S02195194215
00056
18. Nair AT, Muthuvel K Blood vessel segmentation and diabetic retinopathy recognition: an
intelligent approach. In: Computer methods in biomechanics and biomedical engineering:
imaging & visualization, Taylor & Francis. https://doi.org/10.1080/21681163.2019.1647459
19. Nair AT, Muthuvel K, Haritha KS (2020) Effectual evaluation on diabetic retinopathy, Lecture
Notes, Springer
20. Nair AT, Muthuvel K, Haritha KS (2021) Blood vessel segmentation for diabetic retinopathy,
publication in the IOP J Phys Conf Ser (JPCS)
Review Analysis Using Ensemble
Algorithm
Abstract Today, digital reviews play a vital part in influencing customers. E-commerce companies provide a platform for consumers to share their thoughts and comments, thereby giving both the company and prospective buyers insight into a product's performance. To make these reviews useful, they must be classified. Opinion mining, also known as sentiment analysis, is the process of extracting subjective information from collected data. Machine learning provides better insights by automatically analyzing product reviews and separating them into classes and labels. Opinion mining is an artificial intelligence tool, and research on it is very useful for determining the sentiment of comments. A feed-forward neural network classifier is used to determine the sentiment tendency of each comment. The proposed revamp of the sentiment analysis approach was compared with RNN and CNN approaches, and the results, displayed as a chart, show higher precision. Thus, this technique is helpful for comment analysis.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_27
374 V. B. Shalini et al.
1 Introduction
In recent years, a huge amount of data such as reviews and opinions has been collected through web sites and social networking sites. Because of the rapid growth of the Internet and social media [1], an increasing number of people are beginning to share their views openly on the Internet [2, 3]. This shows the importance of sentiment analysis in different fields. Every day, a large amount of feedback is shared on the Internet [4].
The proposed revamp of the sentiment analysis method is used to determine the
user’s opinion in a chunk of text.
Sentiment analysis aims at extracting emotion- or opinion-related knowledge, especially when the data received is large. Sentiment analysis is a text analysis tool involving natural language processing (NLP) [5], machine learning (ML), data mining, knowledge retrieval, and other research areas. The sentiment analysis of comments primarily focuses on the sentiment orientation of the comment corpus [5]. NLP is the best way to uncover and understand the emotion expressed in the text.
NLP is the preprocessing method performed before the actual implementation of the sentiment analysis model. The analysis of a review [6] indicates the user's emotions, classified as positive or negative. The authors of [7] set up a sentiment investigation by extracting a number of tweets with the assistance of prototyping, and the outcomes classified users' perspectives expressed in tweets into positive and negative. We can perform this sentiment analysis using various machine learning [8] and deep learning [9] algorithms; such algorithms have been used by numerous specialists for image classification [10] and tweet classification [11]. Our proposed method likewise uses two algorithms to make the classification effective.
In this paper, the standard term frequency–inverse document frequency (TF-IDF) algorithm incorporates the contribution of each word's sentiment to text sentiment classification, and a weighted word vector is created. The word vector has a low dimension and holds the semantic information of the word; however, distributed word vectors do not contain sentiment information about words. A sentiment analysis technique for comments using LSTM and naive Bayes classification (NBC) [3] is proposed. In this research, different sentiment analysis studies are reviewed, experiments are conducted, and finally the proposed system is summarized.
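As a concrete illustration of the term-weighting step, the sketch below computes TF-IDF weights in plain Python. The paper gives no implementation, so the function name, the normalisation choices, and the toy corpus are our own assumptions:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF weights for each term of each tokenized document.
    TF is the raw count normalised by document length; IDF is
    log(N / df), where df counts the documents containing the term."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # each document counts a term at most once
    weights = []
    for doc in docs:
        tf = Counter(doc)
        length = len(doc)
        weights.append({term: (count / length) * math.log(n / df[term])
                        for term, count in tf.items()})
    return weights

docs = [["good", "product", "good"],
        ["bad", "product"],
        ["good", "service"]]
w = tf_idf(docs)
```

Terms that appear in every document (here none) would get weight 0, while a term unique to one document, such as "bad", gets the largest IDF.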
The rest of the paper is organized into four sections. Section 2 covers various previous systems and their drawbacks. Section 3 explains the proposed method, and Sect. 4 includes results and discussion. Section 5 concludes the work.
2 Related Work
Soong et al. [12] accepted input data in various formats such as HTML, PDF, Word, and XML; the document corpus was converted into text and preprocessed. One of the most crucial stages in the development of successful classifiers is feature extraction: sentences that contain subjective expressions are kept, while sentences containing objective statements are rejected. There are three different types of sentiment classification: supervised, unsupervised, and semi-supervised. Supervised learning methods include support vector machine, artificial neural network, random forest, decision tree, naive Bayes, and K-nearest neighbor. Unsupervised methods utilize a dictionary-based approach. Semi-supervised or hybrid approaches are used to overcome the flaws of both supervised and unsupervised methods. The drawback is that the study does not determine which classification methods produce the best results.
Surya et al. [13] used an Amazon product review dataset with 600 records. A naive Bayes classifier, using prior and posterior probabilities, is applied to classify the data, which is separated into training and testing categories, the test data being used for evaluation. An R analysis tool is utilized with several packages installed, and the corpus cleaning commands are used for processing. The sentences are divided into words and the consistency of each word is examined before the algorithm is applied. Each word is scored for both positive and negative likelihood, and the higher-probability class becomes the result of the split. The output is in matrix form, and the accuracy, computed from a confusion matrix, is 80%. The disadvantage is word similarity; the negation handling of words could be improved for a better result.
Ullah et al. [14] developed a technique for analyzing emotions utilizing both text and emoticons, using machine learning and deep learning. The dataset consists of 14,460 Twitter-based flight-update reviews. The information was tokenized, and the stop words, URLs, and digits were eliminated. Next, the punctuation marks and emoticons are removed and the opinion is analyzed. Then, combining text and emoticon information, the sentiments were classified using ML and DL algorithms such as SVM, NBC, RF, LSTM, and CNN, with features such as TF-IDF, bag of words, n-grams, and emoticons. In each case, the ML and DL algorithms are applied and their scores recorded. With combined text and emoticon data, 89% accuracy is obtained.
Nithyashree et al. [15] focused on collecting data from tweets, downloaded using the Twitter API and the Java programming language. The authors retrieve tweets about a particular hotel from Twitter, and the comments are converted into a data frame. The SVM machine learning algorithm, used for classification and regression, is applied to separate the tweets. After collecting the data, each piece of information is labeled using an unsupervised algorithm: the words are compared to text files of positive and negative terms, and on a match the word is classified. The resulting accuracy is 61.11%, and this low accuracy is the drawback.
Junianto et al. [16] proposed a text mining model for emotion detection implemented using particle swarm optimization (PSO) and a naive Bayes classifier, with a dataset taken from Twitter. The input data is divided into three groups. The data is preprocessed by case transformation, tokenization, stop word removal, and stemming. After that, vectors are created using TF-IDF. The weighted vectors are optimized using the particle swarm optimization approach, and the naive Bayes classifier is then used for the final classification step, labeling the data as anger, fear, joy, or sadness. Nearly 7000 records were used in this methodology, and the output is represented as a confusion matrix. This classification method using PSO and NBC achieves an accuracy of 66.54%, but the drawback is that PSO and NBC together take a long time to run.
Wan et al. [17] compared six classification methods with the proposed ensemble approach, which combines five particular algorithms (NBC, SVM, Bayesian network classifier, decision tree, and random forest) into a multi-algorithm ensemble-based classifier. All methodologies were trained and evaluated on the same dataset of 12,864 tweets, with the classifiers validated using tenfold cross-validation. In terms of precision, recall, and F-measure, the ensemble classifier achieves the highest accuracy of 84.2% in the three-class experiment and the maximum accuracy of 91.7% in the two-class (positive/negative) test. The drawback is that real-time data analysis is not done.
Ramdhani et al. [18] used a naive Bayes algorithm. The data was gathered from YouTube comments on a KFC salted-egg video and from tweets about the KFC salted egg. The work is separated into two stages, before and after validation: the information is collected and preprocessed, and finally classification is done, with results represented as a confusion matrix. The methodology includes problem identification, data preprocessing, data handling, and evaluation. The accuracy achieved was 86.48%.
Xu et al. [19] elaborated that bidirectional long short-term memory (BiLSTM) is used for analysis, with the TF-IDF algorithm for word vectorization generating the weighted word vector used as input. The output of the BiLSTM is used to represent the text, and finally a neural network with softmax mapping is used to obtain the sentiment tendency of the text. They collected 15,000 hotel comment texts from web sites. The drawback of this paper is the long training period.
Perera et al. [20] noted that opinion mining, or sentiment analysis, is a fine method for examining comments, classifying their polarity as positive, negative, or neutral. Opinion mining usually operates at three distinct levels: document-based, sentence-based, and aspect-based; the authors focus specifically on the aspect-based level. The proposed system comprises preprocessing, aspect extraction, a dependency parser, and SentiWordNet. The data was collected from the Zomato application as reviews of 100 restaurants. The achievable accuracy of this system is 70%. The evaluation compares results obtained from "testing manually" and "testing systematically." As future work, the authors intend to improve the approach for discovering the opinion word associated with each aspect.
3 Proposed System
Sentiment analysis is a technique for analyzing subjective data within text. It is a method for extracting useful information from people's feelings, thoughts, and emotions about entities, events, and their attributes. Customers deciding about online purchases rely on the views of others, which have an enormous effect on the product. A method using BiLSTM and naive Bayes to analyze the sentiment of reviews is proposed. The proposed system, shown in Fig. 1, consists of four parts: input, preprocessing, classification, and output. The input data are the comments received from users, so a user interface is created, and the dataset for training and testing the model is uploaded to the database before implementation of this model.
The data provided through the user interface is stored in the database as well. Preprocessing is done once the data is loaded into the system: stop words, punctuation, and other supporting words that carry no value are removed, and the preprocessed data then enters the classification process. Preprocessing is accompanied by the BiLSTM algorithm, and classification is done using the NBC algorithm. The output of the model is illustrated as a pie chart.
Fig. 1 Proposed system architecture: database, preprocessing, and classification
Input design defines the relation between the system and the user. It entails creating data preparation specifications and procedures, as well as the steps required to convert transaction data into a functional format for processing, which can be accomplished by entering the data directly into the system. The input is designed so that it offers protection and convenience while maintaining privacy. The dataset documents were collected from the web site that we created. The collection can be in any format, for example HTML, CSV, or XML. The collected data is divided into phrases or tokens: tokenization converts sentences into words and filters out unnecessary ones. We also designed a user interface web application using Django to get real-time input.
3.2 Preprocessing
3.3 Classification
Naive Bayes assigns class labels using a language model, which can be represented using mathematical strategies [9]. Naive Bayes is a conditional probability model based on Bayes' theorem:

P(A|B) = P(B|A)P(A) / P(B)

where
• P(A|B) is the probability of event A occurring, given event B has occurred
• P(B|A) is the probability of event B occurring, given event A has occurred
• P(A) is the probability of event A
• P(B) is the probability of event B
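Applied to text, Bayes' theorem gives the multinomial naive Bayes classifier. The sketch below is our own minimal implementation with Laplace smoothing (the class name, the toy training data, and the labels 1 = positive, 0 = negative are illustrative; the paper does not provide code):

```python
import math
from collections import Counter

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = set(labels)
        # P(class): fraction of training documents per class
        self.priors = {c: labels.count(c) / len(labels) for c in self.classes}
        # word counts per class, for P(word | class)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            self.word_counts[c].update(doc)
        self.vocab = {w for doc in docs for w in doc}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        best, best_lp = None, -math.inf
        v = len(self.vocab)
        for c in self.classes:
            lp = math.log(self.priors[c])
            for w in doc:
                lp += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            if lp > best_lp:
                best, best_lp = c, lp
        return best

train_docs = [["good", "great"], ["great", "nice"], ["bad", "poor"], ["poor", "awful"]]
train_labels = [1, 1, 0, 0]
nb = NaiveBayes().fit(train_docs, train_labels)
```

For example, nb.predict(["good", "nice"]) picks the class whose smoothed log-probability sum is higher, here the positive class.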
Comments contain sentiment-bearing words such as "very bad," "poor," etc. Some special characters and invalid characters in the text need to be deleted.
Before feeding our data to the learning algorithms, we need to preprocess it. We removed all punctuation marks, converted all characters to lowercase, and split the database into a training and test setup. The product review database is collected from the web site and is intended for analysis purposes.
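A minimal sketch of this preprocessing pipeline in plain Python; the stop-word list, the split ratio, and the function names are illustrative assumptions, since the paper does not specify them:

```python
import random
import string

# Small illustrative stop-word list; a real run would use NLTK's list.
STOP_WORDS = {"the", "a", "an", "is", "it", "and", "or", "to", "of"}

def preprocess(text):
    """Lowercase, strip punctuation, tokenize, and drop stop words."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]

def train_test_split(records, test_ratio=0.2, seed=7):
    """Shuffle reproducibly and split into training and test sets."""
    records = records[:]
    random.Random(seed).shuffle(records)
    cut = int(len(records) * (1 - test_ratio))
    return records[:cut], records[cut:]

tokens = preprocess("The product is GREAT, and it works!")
```

Here the stop words "the", "is", "and", "it" are removed and "works!" loses its punctuation, leaving only content-bearing tokens.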
In this work, a method of analyzing the sentiment of comments based on the naive Bayes algorithm is proposed. LSTM is an effective neural network for sentence modeling thanks to its ability to capture long-term dependencies; a BiLSTM uses a forward and a backward LSTM to process sequences.
After this stage, the information prepared for classification is created. This stage utilizes the naive Bayes classifier to calculate the probability value of a document in order to decide its class. The prior procedure was intended to make classification easier while also improving accuracy. This is where machine learning comes in: a supervised learning method is applied. We can employ a lexicon (a pre-classified dictionary of words) or a bag of words to model the information. When it comes to classifying data, the algorithm is the most important choice. The supervised learning method is used for testing and classification of the collected data to attain good accuracy. The naive Bayes classifier was employed in this study as the classification algorithm [9]. This classification technique, based on Bayes' theorem, is very easy to build and useful for very large datasets; the NBC is simple and effective, and it is an often-used algorithm for text document classification. After the preprocessing stage, the algorithm is applied.
4 Results and Discussion
The best output is one that meets all the requirements of the end user and presents the data clearly. In any framework, results are conveyed to the clients and to other frameworks through outputs. The product comments are gathered from the web site, and that data is considered for analysis.
In this work, a revamp of sentiment analysis of reviews based on BiLSTM and naive Bayes classifiers is proposed. LSTM is an effective neural network for sentence modeling for its ability to capture long-term dependencies; a BiLSTM uses a forward and a backward LSTM to process sequences. We use Django to import models, forms, and application configurations, NLTK for natural language processing, and matplotlib for plotting. We try to predict the positive (label 1) or negative (label 0) sentiment of each sentence.
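The bidirectional idea can be sketched with a toy scalar recurrence standing in for the full LSTM cell (the weights w and u are arbitrary constants of our choosing; a real LSTM adds input, forget, and output gates):

```python
import math

def rnn_pass(xs, w=0.5, u=0.3):
    """Simple recurrence h_t = tanh(w * x_t + u * h_{t-1}) over a sequence."""
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w * x + u * h)
        states.append(h)
    return states

def bidirectional(xs):
    """Pair the forward state with the backward state at each time step,
    mirroring how a BiLSTM concatenates the two directions so every
    position sees both its left and right context."""
    fwd = rnn_pass(xs)
    bwd = list(reversed(rnn_pass(list(reversed(xs)))))
    return list(zip(fwd, bwd))

states = bidirectional([1.0, 2.0, 3.0])
```

Each element of `states` holds (forward, backward) context for one time step; a downstream classifier would consume this concatenation.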
The result is represented as a pie chart indicating the positive and negative shares. The chart also shows the percentage of both labels, and the analysis runs in real time, which is the biggest advantage of this work: every comment given by a customer is added to the analysis dataset, so those comments also count toward the latest analysis. We analyzed nearly 60,000 comments, including the training dataset. Table 1 shows the results of the analysis.
A column chart is also utilized for a better understanding of the outcome. In this representation, positive (1) and negative (0) are plotted along the horizontal axis, and the number of comments along the vertical axis.
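The percentages behind the pie and column charts can be computed as below (a sketch; the function name and the sample predictions are our own, and a real run would pass the values to matplotlib):

```python
from collections import Counter

def label_distribution(predictions):
    """Return the percentage of negative (0) and positive (1) predictions,
    the values shown in the pie and column charts."""
    counts = Counter(predictions)
    total = len(predictions)
    return {label: 100.0 * counts[label] / total for label in (0, 1)}

dist = label_distribution([1, 1, 1, 0, 1, 0, 1, 1, 0, 1])
```

With these ten sample predictions, seven positives and three negatives yield a 70/30 split.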
5 Conclusion
In this paper, a method of analyzing sentiment is suggested and applied to opinion analysis. To address the lack of sentiment information in word representations in current studies, the collected data is processed with the TF-IDF algorithm for term-weight calculation, and a new way of representing the word vector based on term weighting is proposed. The model also fully analyzes contextual information and can obtain a better textual representation of comments. Finally, with a feed-forward neural network and a softmax map, the sentiment tendency of the text is obtained. A comparative study against traditional sentiment analysis methods shows that the accuracy of the proposed method is improved. However, the comment analysis method takes longer in training. In the future, ways to speed up model training can be studied.
References
1. Alattar F, Shaalan K (2021) Using artificial ıntelligence to understand what causes sentiment
changes on social media. IEEE
2. Amara S, Balaji K, Subramanian R, Akshith N, Murthy GN, Vikas M (2021) A survey on
sentiment analysis. IEEE
3. Long F, Zhou K, Ou W (2019) Sentiment analysis of text based on bidirectional LSTM with
multi-head attention. IEEE
4. Wijayanto UW, Sarno R (2018) An experimental study of supervised sentiment analysis using
Gaussian Naive Bayes. IEEE
5. Verma B, Thakur RS (2018) Sentiment analysis using lexicon and machine learning-based
approaches: a survey. Springer
6. Kumar KLS, Desai J, Majumdar J (2016) Opinion mining and sentiment analysis on online
customer review. IEEE
7. Kariya C, Khodke P (2020) Twitter sentiment analysis. IEEE
8. Ramakrishnan J, Srinivasan K, Mubarakali A, Narmatha C, Malathi G (2020) Opinion mining using machine learning approaches: a critical study. IEEE
9. Dhola K, Saradva M (2021) A comparative evaluation of traditional machine learning and deep
learning classification techniques for sentiment analysis. IEEE
10. Umer M, Sadiq S, Ahmad M, Ullah S, Choi GS, Mehmood A (2020) A novel stacked CNN
for malarial parasite detection in thin blood smear ımages. IEEE
11. Sadiq S, Mehmood A, Ullah S, Ahmad M, Choi GS, On BW (2021) Aggression detection
through deep neural model on twitter. IEEE
12. Soong HC, Akbar R, Jalil NBA, Ayyasamy R (2019) The essential of sentiment analysis and opinion mining in social media. IEEE
13. Surya Prabha PM, Subbulakshmi B (2019) Sentiment analysis using Naive Bayes classifier.
IEEE
14. Ullah MA, Marium SM, Begum SA, Dipa ND (2020) An algorithm and method for sentiment
analysis using the text and emotion. Elsevier
15. Nithyashree T, Nirmala MB (2020) Analysis of the data from the twitter using machine learning.
IEEE
16. Junianto E, Rachman R (2019) Implementation of text mining model to emotion detection on social media comments using particle swarm optimization and Naive Bayes classifier. IEEE
17. Wan Y, Gao Q (2015) An ensemble sentiment classification system of twitter data for airline
services analysis. IEEE
18. Ramdhani SL, Andreswari R, Hasibuan MA (2018) Sentiment analysis of product reviews
using Naive Bayes algorithm: a case study. IEEE
19. Xu G, Meng Y, Qiu X, Yu Z, Wu X (2017) Sentiment analysis of comment texts based on
BiLSTM. IEEE
20. Perera IKCU, Caldera HA (2017) Aspect based opinion mining on restaurant reviews. IEEE
A Blockchain-Based Expectation
Solution for the Internet of Bogus Media
Abstract Fake media, also known as the Web of dishonest media, has emerged in a variety of areas of digital culture, including politics, media, and social networks. Because the media's credibility is threatened so frequently, radical measures are required to prevent further deterioration. IoFMT is becoming more common with today's artificial intelligence and deep learning developments; however, such learning-based countermeasures may be severely limited. In order to establish the ownership and integrity of all digital output, it is critical to present evidence of its authenticity. A blockchain is a distributed digital ledger, and a promising new decentralized safety platform is proposed to help deal with the problem. In a data-driven environment, the technical component of fake media is crucial, and although several blockchain-based solutions for authentication have been presented, the majority of existing studies rely on post-incident detection. This work proposes a preventative, blockchain-based approach for IoFMT; the suggested approach also incorporates a weighted-ranking algorithm to identify the truthfulness of information while providing an incentive feature to encourage the dissemination of truthful information. Although our approach focuses on fake news, the platform can also be used for other kinds of electronic information. A demonstration of the benefits of the proposed solution is presented.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_28
386 R. R. Singh et al.
1 Introduction
The phrase "fake information" was coined in 2001 and is currently used online 365 percent more often than at the time of its creation. It is the responsibility of media outlets to deal with misleading information, which is a sensitive and difficult topic to address. As technology continues to improve, the spread of fake news is becoming more prevalent in a number of areas, including economics, politics, and diplomacy. Thanks to the large number of free tools currently accessible, it is now simpler than ever for individuals to create and falsify data. Applications and technologies such as those created by Song, Kim, Hwang, and Lee are easily accessible to a large proportion of the public. Because of the proliferation of social media networks, individuals are more prone than ever to spread incorrect information, and through these networks false information travels more quickly, farther, and deeper than genuine news stories and information.
Although there are significant differences between countries, ethnic groups, and individuals, governments are now attempting to limit the misuse of publicly available information, both collectively and individually, and to prevent its widespread distribution. Human integrity cannot be compromised under any circumstances, including in the face of erroneous information from news media sources. Honesty, candor, and attentiveness are the terms that come to mind when thinking about the coming decade and a generation that will be more alert and aware of the issue than previous generations. Misleading media, amplified by technological advancements, works against those goals: false information may now travel around the globe as fast as an infectious virus. In recent years, technological advancements and human aspirations have joined forces to expand efforts to fight the creation and dissemination of "fake news," which has grown more common in society. New techniques for producing and distributing misleading content, such as text, have recently been developed and deployed on the Internet to deceive the public, and there has also been a rise in the use of images and videos to mislead the general public [1]. Fake news has been successfully combated with artificial intelligence, deep learning, and blockchain technologies, used in conjunction with one another to great effect. The area in question has been the subject of some scientific investigation to date, and blockchain has emerged as one of the most innovative technologies of recent years, a trend that is expected to continue.
In its most basic form, blockchain technology ensures the integrity of transaction data after it has been recorded on a distributed ledger network using cryptographic methods. This design makes blockchain technology well suited to act as the foundation for this kind of system. As nodes in the network work together to create blocks, they are also involved in block-related activities, both of which require consensus for the network to function properly. In a trustworthy environment, the suggested blockchain-based approach may be successful in identifying and preventing fake media material. The implementation of a consensus method to regulate operations is suggested in order to preserve openness while restricting the flow of funds. False information is spread in a variety of ways, including via fake media. As part of our assessment process, we are guided by a game-theoretic feature that specifies what constitutes good and bad behavior in a given situation [2].
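The tamper-evidence property that makes a ledger suitable here can be illustrated with a minimal hash chain (a sketch of the general mechanism only, not of any particular consensus protocol; the field names are our own):

```python
import hashlib
import json

def make_block(data, prev_hash):
    """Build a block that commits to the previous block's hash.
    Because each hash covers the prior hash, altering any recorded
    entry invalidates every block that follows it."""
    body = {"data": data, "prev": prev_hash}
    body["hash"] = hashlib.sha256(
        json.dumps({"data": data, "prev": prev_hash}, sort_keys=True).encode()
    ).hexdigest()
    return body

genesis = make_block("genesis", "0" * 64)
block1 = make_block("news item A", genesis["hash"])
```

Any node can recompute a block's hash from its contents and compare; a mismatch anywhere in the chain reveals tampering.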
The major contributions of the research are as follows:
• Preventing fake news rather than only detecting it
• A solid evidence protocol specific to the user
• Possible applications of blockchain technology outside the realm of finance [3]
2 Related Work
In this section, we provide an overview of the Web of fake news as defined in the paper, as well as related work in fake news detection, specifically in the blockchain space [4].
When dealing with a significant quantity of information and services, it may be challenging to preserve privacy and security at the same time. A false piece of information is any piece of information that is not true in any manner, shape, or form; unlike truthful statements, such claims are partially or completely false. Content shared on social media or promoted via advertising is a typical distribution method. False media information can cause physical harm, in addition to ambiguity, material incorrectness, opinion manipulation, bad judgment, and altered voting habits. In the 2016 presidential election, all of these factors were present [5].
When false news began spreading, new skills and ideas for fighting it were created, particularly in light of the effect fake news had on the political and military sectors, which resulted in unprecedented occurrences all over the globe. Using machine learning methods based on natural language processing, it is feasible to identify linguistic patterns and bias in a given language. The end result of the procedure is a machine learning vocabulary created by merging two classifier models with another classifier model [2]. Creating a technique of this kind was essential for detecting false news stories. In news organizations and social media platforms alike, fake news is quickly becoming the most often utilized, and most extensively disseminated, kind of propaganda [6].
Blockchains have the potential to be used to identify false information on the Internet. Our research distinguishes systems that utilize blockchain technology to trace the sources of news items from those that do not use blockchain technology at all. This framework, which incorporates a distributed structure as a component of its design, makes smart contracts as well as consensus techniques available to the public. With the use of blockchain technology and a consensus process, it is feasible to enhance data tracking, which makes it easier to verify information [4]. For media information chains to be effective, a set of rules and regulations must be followed. The bulk of the work on this subject has focused on the idea, supported by data, that social media platforms play a significant role in the dissemination of false information.
However, there has been some debate over whether this is true. We go into more detail on the idea of decentralization, as well as the concept of Ethereum smart contracts, below. News organizations have a strong presence on social media sites such as Facebook and Twitter, and auditor ratings are made public to the press and media at various times throughout the year. In addition to a user-accessible rating that indicates the correctness and authenticity of specific news, each item is validated using a weight-based validation technique, explained in more depth later. Validators located in the same geographical area as the reported event are given the highest weight. A non-profit organization was formed to fight the spread of false information. Smart contracts are used to streamline the process of registering and publishing news items, which ultimately saves time and money. Newspaper editors and authors register formally, and after passing an initial check, the publisher is issued a public and secret key pair under public-key cryptography [7]. In order to evaluate whether or not an author can be trusted, a credibility score is assigned at the moment of publication; this score increases over time as more information becomes available about the author. Breaking news is distributed over the P2P network [8].
3 Model of Architecture
The permissionless nature of cryptocurrency systems such as Bitcoin and Ethereum,
together with the difficulty of achieving consensus in the context of blockchain
technology, has been highlighted as a subject of debate. Anyone with an Internet
connection may participate in the mining process and earn rewards as a miner,
regardless of physical location. According to Nakamoto (2008), proof of work and
proof of stake are the most common consensus mechanisms for distributed networks,
but a number of other types exist as well. Protocols based on strong evidence are
preferable to those that are not. Proof of work is a hash-based mathematical argument
that is time-consuming to construct because it is computationally costly to compute.
It has therefore served as the backbone of every major bitcoin system now in use,
albeit at a high cost and with limited performance scalability owing to the protocol's
inherent constraints. Our situation necessitates such evidence, since it helps combat
the spread of false information.
Additionally, some proof-of-stake features are added, which usefully reduces the
energy spent verifying blocks. For this, we use an algorithm to identify the
blockchain members who have the greatest interest in ensuring that the blocks are
checked, thus further decreasing the verification energy. Byzantine fault tolerance
raises a family of related problems, and it has been shown that fault-tolerant
algorithms are feasible in certain situations. These algorithms operate in a cyclic
manner: in each round, a member of the governing set is in charge of proposing the
new blocks to be mined. A procedure can thus be developed to reach agreement on
important issues in a timely way. Because PoA requires fewer message exchanges than
BFT, overall performance improves as a consequence of the decrease in communication
overhead. Dinh et al. note, however, that it remains uncertain what the real
implications of such improved performance will be, particularly in terms of cost. In
a genuinely eventually synchronous network architecture (such as the Internet), the
reliability and consistency requirements are guaranteed. As part of our network's
development, we created our own variant of proof of authority, based on the concept
that participants with an economic stake in the network have an obligation to act in
the network's best interests. Their most urgent issues have been addressed, and as a
result, they have
390 R. R. Singh et al.
grown more motivated over time to see the system through to completion. Conventional
proof-of-stake methods are used in this case, but the stake is symbolic rather than
monetary. To complete the validation process, it is necessary to know who performed
each validation and when it was finished. "Identification" refers to the presence of
validators' identities on a platform where news organizations may participate, in a
decentralized and dynamic manner, in the capacity of validators [4].
(Figure: system components — Media Organization, Authorised Data, Solid Evidence
Protocol, and Lying Media evidence.)
1. Media Organization: Organizations without real-world news reporting experience
will not be considered news organizations, since we are building a well-established
network of reporters known for their dependability and professionalism. Once their
bids for inclusion in the blockchain have been filed, news organizations such as
CNN, the BBC, and France24 will want to be included as well [4].
2. Authentication Data: When applying for registration, organizations will be asked
to provide certain information about themselves as well as confirmation that they
are news organizations; business licenses are one example of the documentation
required. In addition to newspapers and television stations, the media business
comprises radio stations and other broadcasting enterprises. Once the content has
been verified, the database will be made available for public use, and participation
by news organizations is strongly welcomed. As news organizations are authenticated,
cryptocurrencies such as bitcoin and smart contracts are used to make the process
more convenient for everyone involved. Note that the classified nodes are
highlighted in yellow.
3. Solid Evidence Protocol: As previously indicated, the paper also describes in
detail the unique solid evidence protocol, a consensus method for determining the
credibility score of False Press Things. Confirmation of a node (news organization)
gives it the opportunity to request that its news be published; the ability to do so
depends on the organization's credibility score. Nodes selected to join are tasked
with determining whether or not a particular transaction is still in process. When
False Press Things are submitted for approval, they undergo a testing procedure in
which validators evaluate whether the transaction is genuine or fake. When
distinguishing "genuine" from "fake" transactions, if the Degree of Fakeness
criteria are met, the transaction's hash on the chain is altered, allowing more
freedom in choosing the next steps for publishing. If the criteria are not
fulfilled, the transaction is labeled as such, because we are building an open
network for news outlets. Even for a "fake," the source of the information remains
simple to track down.
4. Lying Media Things: False information may be spread via a variety of media,
including text, images, and videos. The most important characteristics of fake items
should be preserved and improved upon where feasible. To fight the issue
successfully, it is necessary to maintain information on the kind of fake being
dealt with, as well as sensitive information on the individuals involved; this will
make finding news sources much simpler in the future. Although we are implementing a
preventative approach, a backlog of risks remains to be addressed in the interim. To
enable the detection of fraud in the future,
validators will be required to honor this value if they arrive at an approval value
equal to or greater than the value already in use by an approving validator node.
The findings of our investigation are explored in more depth in the following
sections. A proof-of-concept (PoC) validation technique has been created, which
includes a credibility score as well as a method of calculating and validating
it [4].
Publishers of bogus news will see a significant drop in their credibility score.
Static factors:
• Geographical: If a news organization is closer to the area concerned by the news,
the news may be more credible, and interaction is simpler and more likely. As a
result, when other organizations rate the news, the evaluator's assessment is given
a weighted advantage.
• Media Truthfulness: The contract status will rise or fall depending on the
findings of the algorithm used as a guide. To validate the truth of the provided
news, an agreement must be reached [13].
Validators are given a credibility score, which is used to determine whether or not
they are competent to serve in this capacity. Participants in the validation process
receive an invitation, which they may accept or reject at their discretion. As soon
as the desired participant's choice is obtained, the algorithm is executed;
depending on the outcome, the system either elevates the user to the position of
validator or invites the next individual, then performs an update, as shown in the
image. Algorithm 1 works as follows. The anticipated number of invitations varies
with the overall reputation of the ecosystem, but it is always in the neighborhood
of 100. One guarantee is that the aggregate credibility score of all validators must
exceed 50% of the total rating. For example, to become members of the network, the
top X individuals are chosen in the order in which their trust ratings are
distributed across the network. When an invited individual declines, the invitation
is extended to the next person on the list by degree of trustworthiness. This method
is used to bring in new participants by inviting them via email. When the method is
invoked, it takes the inputs sumValCred and reqSumCred; sumValCred reflects the
overall trustworthiness of all validators in the environment.
The input reqSumCred specifies the total validator credibility required for the
request; the project manager sets it according to the use case. The loop continues
while sumValCred is less than the required sum and exits once it reaches the
required amount. While the validators' group is still accepting new participants,
the most eligible individual on the non-validator list is contacted with a new
invitation, and a variable called Decision keeps track of the invited user's choice.
If the user accepts the offer, the system will assign the user to perform validation
tasks,
and sumValCred is increased by the new validator's credibility score, which reflects
his or her experience. After the algorithm receives the decision from the most
recently invited individual, the pointer moves to the next person to be invited, and
the loop determines, based on the previous decision, whether to continue with the
next most eligible user in the non-validator queue. Consider the following
situation: our ecosystem has a total of 26 users, denoted by the letters A through
Z. In this example, User A has the greatest credibility score, 26, followed by User
B with 25 and User C with 24, and so on down the list.
User Z has the lowest score, 1, giving the system an overall credibility total of
351. Suppose we require the validators to honestly hold more than 50% of the total
trust in the whole network, i.e., more than 175.5. At the start of the first cycle,
the offer to act as a validator is sent to the most trusted organizations in order
of trust rating, and invitations continue until the total credibility score of the
accepting users reaches the required minimum of 175.5. If a candidate declines the
invitation to join the list of validators, the invitation is issued to the next
validator on the list until the minimum threshold is met.
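Under the stated assumptions, the invitation procedure just described (Algorithm 1), together with the add-and-remove refresh the text later presents as Algorithm 2, can be sketched as follows. All names (invite_validators, refresh_validators, sum_val_cred, req_sum_cred) are illustrative, not taken from the paper:

```python
# Sketch of Algorithm 1: invite candidates in descending credibility
# order until the accepted validators' combined credibility reaches
# the required threshold.

def invite_validators(candidates, req_sum_cred, accepts):
    """candidates: dict user -> credibility; accepts: user -> bool
    (models accepting or declining an invitation)."""
    validators, sum_val_cred = {}, 0
    for user, cred in sorted(candidates.items(), key=lambda kv: -kv[1]):
        if sum_val_cred >= req_sum_cred:
            break  # threshold met: stop inviting
        if accepts(user):  # a declined invite passes to the next user
            validators[user] = cred
            sum_val_cred += cred
    return validators, sum_val_cred

# Sketch of Algorithm 2: swap in outsiders more credible than the
# weakest validator, then drop redundant low-credibility validators
# while the combined credibility stays above the required sum.

def refresh_validators(validators, others, req_sum_cred):
    validators, others = dict(validators), dict(others)
    while others and validators:
        best_out = max(others, key=others.get)
        worst_in = min(validators, key=validators.get)
        if others[best_out] <= validators[worst_in]:
            break  # no outsider beats the weakest validator
        validators[best_out] = others.pop(best_out)
        others[worst_in] = validators.pop(worst_in)
    for user in sorted(validators, key=validators.get):
        if sum(validators.values()) - validators[user] >= req_sum_cred:
            others[user] = validators.pop(user)  # redundant validator
    return validators, others

# Worked example from the text: users A..Z with scores 26..1,
# total credibility 351, required threshold 50% of 351 = 175.5.
users = {chr(ord('A') + i): 26 - i for i in range(26)}
chosen, total = invite_validators(users, 175.5, accepts=lambda u: True)
```

With everyone accepting, the top eight users A–H (combined credibility 180) suffice to cross the 175.5 threshold.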
Validators may be removed at any moment for a number of reasons, chiefly a decline
in reputation: a validator may spread fake information or approve fabricated
information, or an association with a questionable non-validator may grow until it
undermines the credibility of the verifiers. In each of these situations, a new
invitation is sent out, as previously described. When the system is next refreshed,
it identifies and approves or rejects the eligible participants before passing the
information to the validators' group for final revision and approval. Using the
method outlined in Algorithm 1, Algorithm 2 chooses new members with higher
credibility scores and eliminates those with lower scores [4]. Algorithm 2 validates
the addition and removal of validators. It takes the same inputs as Algorithm 1,
including the required sum of credibility, and returns the same kind of result. A
first loop checks whether any non-validator with a higher credibility score is
available; if so, the algorithm proceeds as in Algorithm 1, swapping members until
no non-validator scores higher than the weakest validator. A second loop then
performs an additional check for redundant validators: if sumValCred would still
exceed reqSumCred without a given validator, that validator's identity is removed,
and sumValCred is reduced by that validator's credibility score. This novel
consensus approach takes as its foundation the Clique consensus method, as analyzed
by De Angelis et al. The consensus-based approach improves efficiency while reducing
the number of messages delivered. Newly submitted news items are sent to the
relevant recipients for distribution, and transactions placed on the waiting list
undergo authenticity checks to verify that they are genuine. The technical
specifications are as follows [4, 13, 14].
Epoch: The consensus method proceeds in time epochs; each decision on whether or not
to broadcast is made by that epoch's set of validators. A customized transition
block is sent to the system ahead of time to prepare the next epoch's collection of
validators.
Validators for each epoch: Each epoch has a total of N validators, and the validator
with the highest believability rating is regarded as the leader [15]. The
credibility score determines the likelihood of each validator being selected, so
those with a higher level of trustworthiness are much more likely to be chosen. Each
validator can introduce a new block only once every 1 + N/2 blocks; if a validator
approves a piece of news, it cannot decide on another item for the next 1 + N/2 news
items. With the information, the selected auditors can then propose a new block [2].
Judgment at each epoch: The validators with the highest perceived credibility are
given the content from the waiting line in order of delivery date. In other words,
the first news item received is delivered to the leader, who hands the next item to
the next validator, and so on for the rest of the team. In every epoch, a validator
may approve, reject, or abstain. Once the news has been validated [16] and the next
block containing the information has been released, everyone in the environment can
see the news. If the verifier instead rejects the proposed block, the news is kept
from the users. Each validator's choice is therefore recorded in order to prevent
malicious conduct, such as a deliberate move against another news organization. If
the validator doubts the authenticity of the item, the media item is returned, with
a block, to the top of the waiting line to be considered in the following epoch.
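As a rough sketch of the per-epoch judgment just described (function and variable names are ours, and the verdict logic is a stand-in for the validators' actual decisions):

```python
from collections import deque

def run_epoch(validators, news_queue, verdict_of):
    """One consensus epoch: queued items are handed to that epoch's
    validators in descending credibility order, the most credible
    validator (the leader) going first. Rejected or skipped items
    return to the queue for the following epoch.
    validators: dict id -> credibility; news_queue: deque of items;
    verdict_of: (validator, item) -> 'approve' | 'reject' | 'pass'."""
    decisions = []
    for v in sorted(validators, key=validators.get, reverse=True):
        if not news_queue:
            break
        item = news_queue.popleft()  # first come, first served
        verdict = verdict_of(v, item)
        decisions.append((item, v, verdict))  # every verdict is recorded
        if verdict != 'approve':
            news_queue.append(item)  # reconsidered in the next epoch
    return decisions
```

Recording every (item, validator, verdict) triple mirrors the text's point that each validator's choice is kept to deter malicious behavior.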
Fork: Because each ledger replica is physically separated from the others, there is
some delay, and a fork may occur during any epoch cycle. The rule that the verifier
with the most credibility has the highest importance on the chain, however, resolves
this issue: in each epoch, the majority of verifiers place the blocks with the most
confidence first. News organizations may engage in malevolent activities toward
other news organizations, but each validator's verdict is stored on the blockchain,
so users, benefiting from the blockchain's accountability, can start a vote against
validators who are acting maliciously.
References
1. Douglas A (2006) News consumption and the new electronic media. Int J Press/Polit 11(1):29–
52
2. View at: Publisher Site|Google Scholar
3. Wong J (2016) Almost all the traffic to fake news sites is from Facebook, new data show
4. Thakral M, Singh RR, Jain A, Chhabra G (2021) Rigid wrap ATM debit card fraud detec-
tion using multistage detection. In: 2021 6th international conference on signal processing,
computing and control (ISPCC), 2021, pp 774–778, https://doi.org/10.1109/ISPCC53510.
2021.9609521
5. Lazer DMJ, Baum MA, Benkler Y et al (2018) The science of fake news. Science
359(6380):1094–1096
6. García SA, García GG, Prieto MS, Guerrero AJM, Jiménez CR (2020) The impact of the term
fake news on the scientific community’s scientific performance and mapping in the web of
science. Social Sci 9(5)
7. Holan AD (2016) 2016 lie of the year: fake news, politifact, Washington, DC
8. Kogan S, Moskowitz TJ, Niessner M (2019) Fake news: evidence from financial markets.
https://ssrn.com/abstract=3237763
9. Robb A (2017) Anatomy of a fake news scandal. Rolling Stone 1301:28–33
10. Soll J (2016) The long and brutal history of fake news. Politico Magazine 18(12)
11. Hua J, Shaw R (2020) Coronavirus (covid-19) “infodemic” and emerging issues through a data
lens: the case of China. Int J Environ Res Public Health 17(7):2309
12. Conroy NK, Rubin VL, Chen Y (2015) Automatic deception detection: methods for finding
fake news. Proc Assoc Inform Sci Technol 52(1):1–4
13. Shu K, Sliva A, Wang S, Tang J, Liu H (2017) Fake news detection on social media. ACM
SIGKDD Explor Newsl 19(1):22–36
14. Vosoughi S, Roy D, Aral S (2018) The spread of true and false news online. Science
359(6380):1146–1151
15. Allcott H, Gentzkow M (2017) Social media and fake news in the 2016 election. J Econ Persp
31(2):211–236
16. Rubin VL, Conroy N, Chen Y, Cornwell S (2016) Fake news or truth? using satirical cues to
detect potentially misleading news. In: Proceedings of the second workshop on computational
approaches to deception detection. San Diego, CA, pp 7–17
Countering Blackhole Attacks in Mobile
Adhoc Networks by Establishing Trust
Among Participating Nodes
1 Introduction
M. Shukla (B)
Department of Information Technology, Shri G. S. Institute of Technology & Science, Indore,
India
e-mail: mukul@sgsits.ac.in
B. K. Joshi
Electronics & Telecommunication and Computer Engineering, Military College of
Telecommunication Engineering, MHOW, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 399
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_29
400 M. Shukla and B. K. Joshi
2 Related Works
Saddiki et al. [4] proposed a neighbors-trust-based scheme in which neighbor nodes
participate to detect misbehaving nodes. They also presented a study of security
issues related to routing in MANETs, specifically the OLSR protocol and its exposure
to cooperative blackhole attacks. The feasibility of the scheme was tested through
NS-2; as future work, it can be extended to other types of attacks.
Keerthika et al. [5] presented a hybrid weighted trust-based artificial bee colony
2-opt algorithm, using AODV to secure MANETs from blackhole attacks. The algorithm
finds a secure optimal path and uses 2-opt as a local search for improvement,
generating new solutions from current ones. PDR, hop sink, and end-to-end delay were
used to evaluate performance.
Mehetre et al. [6] proposed a secure and trusted routing scheme. For selecting nodes
and securing WSN data packets, they used a two-step security and dual-assurance
scheme. The cuckoo search algorithm plays an essential role, providing a secure
routing path over identified trusted paths; energy is used as the performance
parameter.
Countering Blackhole Attacks in Mobile Adhoc … 401
Arulkumaran et al. [7] focused on MANETs' role in the military, using a fuzzy logic
strategy to improve AODV performance. In the proposed approach, certificates issued
only to trusted nodes via fuzzy logic help detect malicious nodes. As future
enhancements, the scheme could be deployed in other fields, such as emergency
operations and PANs, and improvements in throughput and reductions in end-to-end
delay are expected.
Singh et al. [8] use mobile trust points along with a clustering technique. To
detect attacks, the trust points monitor the activities of cluster heads and, on
detecting a blackhole, generate an alert in the network. As future work, the scheme
could detect other attacks and improve accuracy.
Singh et al. [9] proposed a solution for attacks such as blackhole, wormhole, and
collaborative blackhole attacks. Trust values between 0 and 1 serve as the decision
factor: a value above 0.5 admits the node to the network; otherwise, the node is
blocked. Trust broadcasting and aggregation are needed as future enhancements.
Singh et al. [10] presented a solution that uses a trusted AODV routing protocol
against collaborative blackhole attacks. A trust value, calculated using a
hyperbolic tangent function, is used to find the malicious node; in future work, the
trust value could also be used to detect other attacks.
Hazra et al. [11] proposed a trust model with different levels of computation that
identifies and isolates blackhole attackers in the context of data forwarding. As a
future enhancement, the trust model could detect other types of attacks in ad hoc
networks.
3 Blackhole Effect
This section describes the blackhole effect and the AODV process in the network.
The blackhole attack is a type of data traffic attack in which one of the nodes
behaves as a suspicious node [12–14]. It works much like a black hole in the
universe: just as energy and matter disappear into a black hole, packets disappear
when they follow the malicious node's route. The malicious node attracts packets by
advertising a route to their destination.
In Fig. 1, a network consists of seven nodes (0–5 and 7), of which the six nodes 0–5
are genuine while node 7 is malicious. The network needs a minimum path to deliver
the packet to its destination. Passing through any of the nodes 0–5 gives a hop
count of 2, but traveling via node 7 gives a hop count of 1; here, hop count means
the distance a packet travels while moving from source to destination.
When AODV receives multiple REPLY messages, it prefers the route with the maximum
sequence number. With node 0 as the source and node 5 as the destination, node 7
will advertise the maximum sequence number, since it is a malicious node pretending
to be the destination. Hence, it needs to be detected.
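This route-selection behavior can be illustrated with a toy sketch. The data structure and the numbers are ours; only the rule of preferring the highest destination sequence number (then the fewest hops) comes from AODV:

```python
# AODV prefers the RREP with the highest destination sequence number,
# breaking ties by the fewest hops. A blackhole node simply fabricates
# a huge sequence number and a short route, so its reply wins.

def select_route(replies):
    """replies: list of dicts with 'node', 'seq_num', 'hop_count'."""
    return max(replies, key=lambda r: (r['seq_num'], -r['hop_count']))

replies = [
    {'node': 1, 'seq_num': 12, 'hop_count': 2},   # honest intermediate node
    {'node': 7, 'seq_num': 999, 'hop_count': 1},  # blackhole's forged reply
]
```

Here the forged reply from node 7 would be selected, which is exactly why sequence-number-based selection alone cannot be trusted.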
4 Proposed Scheme
This section presents a trusted AODV routing protocol that secures route selection
with trust. Neighbors play an essential role, since trust-value adjustment depends
on a node's experiences with its neighbors. The proposed approach is divided into
three mutually dependent algorithms, which together explain how the approach works.
A. Status of Trust
In the proposed algorithm, AODV and the trust estimation function are embedded
together. Trust between nodes and cooperation are the key factors responsible for
communication in a MANET.
Input: A network having mobile nodes.
Output: An efficient route search and a possible blackhole attack on a network
scenario.
Procedure:
(1) Consider a network with a number of random nodes, such as 20, 40, or 60.
(2) An RREQ request is generated, and the node waits for the RREP reply to establish
communication between source and destination.
(3) Once multiple replies are received, the best communication route is selected
based on sequence number and hop count.
(4) Nodes are classified as UnTrusted, Trusted, or MostTrusted based on neighbors'
trust and threshold values.
• UnTrusted: An UnTrusted node has a low trust value. When a naive node arrives in
the network, its association with other nodes is negligible, so it is treated as an
UnTrusted node.
• Trusted: When a node successfully receives some packets, its trust level
increases, and it is considered a Trusted node. Its trust value lies between
UnTrusted and MostTrusted.
• MostTrusted: MostTrusted nodes have the highest trust value and are considered the
most reliable. Here, high trust refers to the successful transmission of packets
with neighbors.
(5) The trust estimation function returns the trust status of all nodes based on
their reliability. The status of each node is recorded in a table known as the Trust
Table.
(6) The Trust Table works as follows: whenever a node receives a packet, it refers
to the table. A naive node joining the network at that time is measured as
UnTrusted, which increases the possibility of attack. A Trusted node is referred to
in the absence of a MostTrusted node, but an UnTrusted node is never chosen as an
option.
B. Trust Calculation
Trust value is the decision-maker in the network, helping to identify whether nodes
are reliable. Different threshold values determine whether a neighbor is classified
as UnTrusted, Trusted, or MostTrusted: Tut, Tt, and Tmt, respectively. The trust
calculation is described below.
Input: Route Request Success Rate (RREQS), Route Request Failure Rate (RREQF),
Route Reply Success Rate (RREPS), Route Reply Failure Rate (RREPF), Data Success
(DATAS), and Data Failure (DATAF).
Output: Trust value
Procedure:
(1) The values of t1, t2, and t3 are calculated as
t1 = (RREQS − RREQF)/(RREQS + RREQF)
t2 = (RREPS − RREPF)/(RREPS + RREPF)
t3 = (DATAS − DATAF)/(DATAS + DATAF)
(2) The intermediate values obtained in step 1 are used to calculate the final trust
value (FT):
FT = tanh(x), where x = t1 + t2 + t3
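The calculation above can be sketched directly. The threshold values used in classify below are illustrative placeholders, since the text names Tut, Tt, and Tmt but does not give concrete numbers:

```python
import math

def ratio(success, failure):
    """Normalized success/failure ratio in [-1, 1]; 0 if no events."""
    total = success + failure
    return (success - failure) / total if total else 0.0

def final_trust(rreq_s, rreq_f, rrep_s, rrep_f, data_s, data_f):
    t1 = ratio(rreq_s, rreq_f)  # route request success/failure
    t2 = ratio(rrep_s, rrep_f)  # route reply success/failure
    t3 = ratio(data_s, data_f)  # data success/failure
    return math.tanh(t1 + t2 + t3)  # FT = tanh(t1 + t2 + t3)

def classify(ft, t_ut=0.3, t_mt=0.8):
    # Threshold values are assumed for illustration, not from the paper.
    if ft < t_ut:
        return "UnTrusted"
    return "MostTrusted" if ft >= t_mt else "Trusted"
```

The tanh keeps the final trust bounded in (−1, 1), so a node with consistent successes saturates near 1 while a consistently failing node saturates near −1.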
5 Implementation
This section includes the implementation details of the proposed approach. We used
the NS-2 simulator to check the simulation results; a detailed description of NS-2
is given below.
A. Experimental Simulator
Network Simulator-2 (NS-2) is an object-oriented, readily available simulator. It is
an OTcl script interpreter whose main component is the simulation event scheduler,
which plays an essential role in tracking simulation time and is responsible for the
actions associated with the packet pointed to by each event. Other components are
the object and module libraries for network setup. Parts of NS-2 are also written in
C++ to reduce packet and event processing time; the two languages, C++ and OTcl, are
connected through the TclCL linkage [15]. In the NS-2 simulation, we applied the
AODV protocol for routing [16].
In this paper, the proposed approach was run on Network Simulator 2 with the
following hardware: an Intel(R) Core(TM) i3-6006U CPU @ 2.00 GHz, 8.00 GB installed
memory (RAM, 7.89 GB usable), and Ubuntu 18.10 64-bit, with a minimum of 100 GB of
disk space required for the experiment.
B. Simulation Parameters
Table 1 summarizes the experimental parameters and their values. Three routing
protocols are used in the 1200 m × 1500 m area, showing how the attacker affects
performance.
The network uses five scenarios: 20, 40, 60, 80, and 100 nodes. Our experimental
simulation time is 100 ms. Initial energy, transmission power, reception power, idle
power, and sensing power of a node (in watts) are essential simulation parameters.
In this experiment, nodes 2, 4, 6, 8, and 10 are malicious while transferring
1024-byte packets.
Table 1 Experimental parameters

Parameter           Values
Simulator name      NS 2.35
Protocol            AODV, BAODV, TAODV
Nodes               20, 40, 60, 80, 100
Time                180 s
Type of traffic     TCP and UDP
Size of packet      1024 bytes
Pause time          16
Size of scenario    1200 × 1500
Speed (maximum)     18 m/s
Malicious nodes     3, 6, 9, 12, 15
5.1 Result
This section presents the results generated by the proposed algorithm; we validate
them in terms of throughput, end-to-end delay, and PDR. In the NS-2 simulator, we
considered scenarios with a total of 45 nodes in the network.
The parameters are as follows:
Throughput: the data retrieved at the destination node in any time interval is
termed throughput [17]; it can be defined as Eq. 1.

    Throughput (kbps) = (received bytes × 8) / (time of simulation × 1024)    (1)
Avg End-to-End Delay: The time a packet takes to travel from the source to the destination
is called the end-to-end delay [17]; it can be defined as Eq. 2.

    Avg EE delay (ms) = (1/N) Σ_{n=1}^{N} (R_n − S_n)    (2)
Packet Delivery Ratio (PDR): The ratio of data packets received to data packets
sent is termed the PDR [17]. Mathematically, it can be defined as Eq. 3.

    PDR (%) = packets received / packets sent    (3)

PDR represents the ratio of packets received by the destination to packets generated
by the source.
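The three metrics above can be sketched in code. The following is a minimal Python illustration of Eqs. 1–3, assuming a hypothetical list of (send time, receive time, size in bytes) records for delivered packets; these names are illustrative and are not actual NS2 trace-file fields.

```python
def throughput_kbps(delivered, sim_time_s):
    # Eq. 1: received bytes * 8 / (simulation time * 1024)
    received_bytes = sum(size for _, _, size in delivered)
    return received_bytes * 8 / (sim_time_s * 1024)

def avg_end_to_end_delay_ms(delivered):
    # Eq. 2: mean of (receive time - send time) over the N delivered packets
    delays = [recv - sent for sent, recv, _ in delivered]
    return 1000 * sum(delays) / len(delays)

def pdr(packets_received, packets_sent):
    # Eq. 3: ratio of packets received to packets sent
    return packets_received / packets_sent

# Toy example: three delivered 1024-byte packets out of five sent
delivered = [(0.0, 0.1, 1024), (0.5, 0.7, 1024), (1.0, 1.4, 1024)]
throughput = throughput_kbps(delivered, sim_time_s=180)
delay_ms = avg_end_to_end_delay_ms(delivered)
ratio = pdr(len(delivered), 5)
```

In practice these quantities would be computed by parsing the NS2 trace file for send and receive events per packet.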
Figure 3 shows that the blackhole-affected AODV (BAODV) network performs poorly,
whereas AODV and trusted AODV show satisfactory results. The packet delivery
ratio of trusted AODV improves by 40–50% compared to BAODV.
B. End-to-End Delay
End-to-end delay is the time a packet takes to cover the distance from the source
to the destination.
Figure 4 shows that the delay increases in BAODV, but the delay value of TAODV is
low, which means better performance. The end-to-end delay of trusted AODV improves
by 40–50% (in milliseconds) compared to BAODV.
C. Throughput
Throughput is the number of packets successfully received per unit of time.
Figure 5 shows that the throughput values of AODV and TAODV are almost similar, but
BAODV shows poor performance. The throughput of trusted AODV improves by 40–50%
(in kbps) compared to BAODV.
Fig. 5 Throughput
6 Conclusion
MANET plays an essential role in several fields, so its security is a priority. This
paper studies MANET security attacks such as the blackhole attack and resolves the
problem using a trust-based method. The performance of the approach was evaluated
using parameters like packet delivery ratio (PDR), end-to-end delay, and
throughput, which show the average number of packets delivered. TAODV performs
well and improves the network's efficiency compared to BAODV.
In future, this study can be useful and can be used to build a trust-based system
for MANET for different types of attacks like wormhole attacks.
References
6. Mehetre DC, Emalda Roslin S, Wagh SJ (2018) Detection and prevention of blackhole and
selective forwarding attack in clustered WSN with active trust. Cluster Computing, Springer
Science + Business Media, LLC, part of Springer Nature 2018, pp 1–16
7. Arulkumaran G, Gnanamurthy RK (2017) Fuzzy trust approach for detecting blackhole attack
in mobile ad hoc network. Mobile Netw Appl, Springer Science + Business Media, LLC, part
of Springer Nature 2017, pp 1–8
8. Singh M, Singh P (2016) Blackhole attack in MANET using mobile trust points with clustering,
© Springer Nature Singapore Pte Ltd. 2016 A. In: Unal et al (eds) SmartCom 2016, CCIS 628,
2016, pp 565–572
9. Singh U, Samvatsar M, Sharma A, Jain AK (2016) Detection and avoidance of unified attacks
on MANET using trusted secure AODV routing protocol. In: Symposium on colossal data
analysis and networking (CDAN), pp 1–6
10. Singh S, Mishra A, Singh U (2016) Detecting and avoiding of collaborative black hole attack
on MANET using trusted AODV routing protocol. In: Symposium on colossal data analysis
and networking (CDAN), pp 1–6
11. Hazra S, Setua SK (2014) Blackhole attack defending trusted on demand routing in ad-hoc
network. Smart Innovation, Systems and Technologies 28, © Springer International Publishing
Switzerland, pp 1–8
12. Vo TT, Luong NT, Hoang D (2019) MLAMAN: a novel multi-level authentication model and
protocol for preventing blackhole attack in mobile ad hoc network. Wireless Netw 25:4115–
4132. https://doi.org/10.1007/s11276-018-1734-z
13. Arulkumaran G, Gnanamurthy RK (2019) Fuzzy trust approach for detecting black hole attack
in mobile adhoc network. Mobile Netw Appl 24:386–393. https://doi.org/10.1007/s11036-017-
0912-z
14. Cai RJ, Li XJ, Chong PHJ (2019) An evolutionary self-cooperative trust scheme against routing
disruptions in MANETs. IEEE Trans Mob Comput 18(1):42–55. https://doi.org/10.1109/TMC.
2018.2828814
15. Issariyakul T, Hossain E (2012) Introduction to network simulator NS2, Springer Science +
Business Media, LLC, pp 1–20
16. https://tools.ietf.org/html/rfc3561
17. Uddin M, Taha A, Alsaqour R, Saba T (2017) Energy-efficient multipath routing protocol for
a mobile ad-hoc network using the fitness function. IEEE Access, vol 5, pp 10369–10381.
https://doi.org/10.1109/ACCESS.2017.2707537
Identification of Gene Communities in Liver Hepatocellular Carcinoma: An OffsetNMF-Based Integrative Technique
1 Introduction
S. M. M. Hossain (B)
Computer Science and Engineering, Aliah University, Kolkata, West Bengal 700160, India
A. A. Halsana
Computer Science and Engineering, Jadavpur University, Kolkata, West Bengal 700032, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 411
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_30
412 S. M. M. Hossain and A. A. Halsana
Fig. 1 The figure shows the overall framework adopted in the present work: dataset preprocessing (rlog normalization and DE gene analysis of normal and LIHC samples), gene co-expression network (GCN) construction, offset NMF, and the resulting integrated gene communities (IGCs)
2 Method
This work focuses on two biological data types: gene expression profiles and protein-
protein interaction (PPI) data. Liver hepatocellular carcinoma (LIHC) RNASeq
dataset was obtained from The Cancer Genome Atlas (TCGA). It comprises raw
counts of 56,493 Ensembl genes across 421 samples which are categorized into 371
tumor samples and 50 normal samples. Normalized gene expression was extracted
from the raw read counts using a regularized logarithm (rlog) transformation [1].
The edgeR [28] R/Bioconductor package was used here to analyze read counts from
RNASeq gene expression profiles and evaluate differential gene expression, owing to its
efficient statistical negative binomial (NB) modeling of count data. 1964 differen-
tially expressed (DE) official genes were identified with adjusted p-value ≤ 0.01
by comparing the tumor and normal samples. The protein-protein interaction net-
work (PPIN) of the DE genes was obtained from the STRING webserver comprising
13,210 interactions. Normalized gene expression profiles of the DE genes along with
their PPI information were taken into account for further analysis.
A gene co-expression network (GCN) specifies the connection strength among the
participating genes, indicating the correlation of their expression profiles. Initially,
we constructed the network represented by a symmetric adjacency matrix Adj_{d×d},
where each element Adj_ij denotes the connection strength between genes i and
j, and d is the number of participating genes. We have used the Pearson correlation
of expression profiles for each pair of genes to compute their connection strength in
the present work.

    A_ij = |Adj_ij|^τ    (1)
Later, the scale-free GCN was augmented through the topological overlap measure
(TOM)-based similarity measure that defines the relative inter-connection strength
between each two nodes considering their shared neighborhood [21].
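The construction above can be sketched in code. The following minimal Python sketch builds the soft-thresholded adjacency of Eq. 1 and the standard unsigned topological overlap measure; the expression matrix and the soft-thresholding power τ here are illustrative assumptions, not the values used in the paper.

```python
import numpy as np

def gcn_adjacency(expr, tau=6):
    # expr: samples x genes matrix; Pearson correlation between every
    # pair of gene expression profiles gives the co-expression network
    adj = np.corrcoef(expr, rowvar=False)
    # Eq. 1: A_ij = |Adj_ij|^tau (soft thresholding toward a scale-free GCN)
    a = np.abs(adj) ** tau
    np.fill_diagonal(a, 0.0)
    return a

def tom_similarity(a):
    # Standard unsigned TOM: relative inter-connection strength of two
    # nodes accounting for their shared neighborhood
    l = a @ a                      # shared-neighbor term l_ij
    k = a.sum(axis=0)              # node connectivities k_i
    denom = np.minimum.outer(k, k) + 1.0 - a
    tom = (l + a) / denom
    np.fill_diagonal(tom, 1.0)
    return tom
```

The TOM matrix would then feed the hierarchical clustering step described next.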
The identification of gene co-expression modules has been carried out by CoExpNets
[6] that incorporates a refinement in the widely used weighted gene co-expression
network analysis (WGCNA) [21] framework via an additional processing step to
Identification of Gene Communities in Liver Hepatocellular Carcinoma … 415
    DisSim_{I,J} = (1/2)(1 − cor(ME_I, ME_J)),    (2)
where cor indicates the Pearson correlation coefficient. The dissimilarities computed
as above were then used to discover the GCN meta-modules by reapplying the average
linkage hierarchical clustering [20].
Botía et al. proposed CoExpNets [6], which uses a signed eigengene-based connectivity
kME_J(i) between the expression profile of each gene i and the Jth eigengene
ME_J, expressed as the distance 1 − kME_J(i), to perform an additional
k-means clustering that discovers refined modules from a GCN:

    kME_J(i) = (1/2)(1 + cor(Ex_i, ME_J)),    (3)
where Ex_i refers to the expression profile of gene i. CoExpNets uses a k-means
algorithm that initializes the k cluster centroids with the module eigengenes
(MEs) obtained through the WGCNA-based framework described above. Genes are
iteratively re-assigned to new clusters to form improved gene modules until a
stopping criterion (a decrease in the number of misplaced genes) is met.
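The eigengene-based measures of Eqs. 2 and 3 can be sketched directly. The following minimal Python illustration assumes illustrative sample-length vectors for the eigengenes and expression profiles; `cor` is the Pearson correlation as in the text.

```python
import numpy as np

def cor(x, y):
    # Pearson correlation coefficient between two vectors
    return float(np.corrcoef(x, y)[0, 1])

def dissim(me_i, me_j):
    # Eq. 2: DisSim_{I,J} = (1/2)(1 - cor(ME_I, ME_J))
    return 0.5 * (1.0 - cor(me_i, me_j))

def kme(ex_i, me_j):
    # Eq. 3: kME_J(i) = (1/2)(1 + cor(Ex_i, ME_J));
    # the k-means refinement uses 1 - kME_J(i) as the distance
    return 0.5 * (1.0 + cor(ex_i, me_j))
```

Perfectly correlated eigengenes give zero dissimilarity, and a gene perfectly anti-correlated with an eigengene gets kME = 0, i.e. maximal distance from that module.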
    CNP = (|E_out| / |E_in|) · (1/η)    (5)
Milligan et al. proposed the TrCovW index, which is the trace of the within-cluster
pooled covariance matrix. The optimal solution is obtained based on the highest
difference scores among the index hierarchy levels [24]. The TrCovW score (sc.trcovw)
is defined using:

    sc.trcovw = trace(covariance(D_m)),    (6)
where Dm is the matrix of within-group dispersion for data clustered into m modules
and is defined as:
    D_m = Σ_{k=1}^{m} Σ_{i∈C_k} (v_i − c_k)(v_i − c_k)^T,    (7)
where v_i is the ith d-dimensional feature vector and c_k is the centroid of the
module C_k.
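Eqs. 6 and 7 can be sketched as follows; this minimal Python illustration mirrors the formulas literally (the feature matrix and labels are illustrative, and this is not the NbClust implementation itself).

```python
import numpy as np

def within_group_dispersion(X, labels):
    # Eq. 7: D_m = sum over modules C_k of
    #         sum_{i in C_k} (v_i - c_k)(v_i - c_k)^T
    d = X.shape[1]
    D = np.zeros((d, d))
    for k in np.unique(labels):
        Vk = X[labels == k]
        centered = Vk - Vk.mean(axis=0)   # subtract the module centroid c_k
        D += centered.T @ centered
    return D

def trcovw_score(X, labels):
    # Eq. 6: trace of the covariance of the within-group dispersion matrix
    D = within_group_dispersion(X, labels)
    return float(np.trace(np.cov(D)))
```

The score would be evaluated over a range of module counts m, with the optimal m chosen from the largest difference between consecutive scores, as the text describes.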
The TraceW index has been among the most popular validity indices used in the
clustering context. This criterion increases monotonically for solutions with fewer
clusters; thus, the optimal number of modules is determined from the maximum
value of the second difference scores [9]. The TraceW score (sc.tracew) is computed by
This index was introduced by Friedman et al. to be used as a basis for a non-
hierarchical clustering technique [10] and the Friedman score (sc.Friedman) is
computed by
    sc.Friedman = trace(D_m^{-1} B_m),    (9)
where num_i is the number of objects in module C_k, with x as the centroid of the overall
data matrix.
In this work, we have used the offset NMF approach by Badea et al. [4] as a consensus
clustering technique to integrate GCN modules and protein complexes. This method
is a modified version of standard NMF that uses simple multiplicative updates
based on a Euclidean distance to fit a model including an intercept. To execute the
NMF algorithm, we initially prepared two module assignment matrices (C_{m×n}),
separately, one from the GCN modules (GCM) and the other from the PPIN modules
(PCM):

    C_pq = 1 if gene q ∈ module p; 0 otherwise,    (11)
where, m denotes the number of modules (either in GCM or in PCM) and n refers
to the number of genes. We prepared a single combined module assignment matrix
from the above two matrices by horizontal concatenation and used it as an input
to the offset NMF method to discover gene meta-communities that incorporates
characteristics of both gene expression profiles and PPI information. The rank for
NMF decomposition was set to the optimal number of modules detected from the
gene co-expression network.
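The preparation of the consensus input can be sketched as follows. This minimal Python sketch builds the binary module assignment matrices of Eq. 11 and concatenates them; the module contents here are illustrative toy data, not the LIHC results, and the factorization itself (offset NMF) is not reproduced.

```python
import numpy as np

def assignment_matrix(modules, genes):
    # Eq. 11: C_pq = 1 if gene q belongs to module p, else 0
    index = {g: q for q, g in enumerate(genes)}
    C = np.zeros((len(modules), len(genes)))
    for p, module in enumerate(modules):
        for g in module:
            C[p, index[g]] = 1.0
    return C

genes = ["g1", "g2", "g3", "g4"]
gcm = assignment_matrix([{"g1", "g2"}, {"g3", "g4"}], genes)  # GCN modules
pcm = assignment_matrix([{"g1", "g3", "g4"}], genes)          # PPIN complexes
# Horizontal concatenation of the two assignments (genes x total modules)
# forms the single combined matrix fed to offset NMF
combined = np.hstack([gcm.T, pcm.T])
```

The offset NMF decomposition would then be applied to `combined` with the rank set to the optimal module count found from the GCN, as described in the text.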
3 Result
In the present work, we have selected the top 1964 DE official genes in LIHC
following the method described in Sect. 2.1. Figure 2 presents the MA (ratio intensity)
plot indicating significantly up- and down-regulated genes. We found that the genes
'REG3G', 'PGC', 'REG3A', 'REG1B', 'LGALS14', 'REG1A', 'CLPS', 'LIN28B',
'PAEP', 'DCAF4L2', 'PAGE1', 'COL2A1', 'PRSS1', 'CPLX2' and 'MAGEB2' were
the top 15 DE genes in LIHC from our edgeR analysis.
Fig. 2 The figure shows the mean-difference plot (MA plot) picturizing significantly up- and
down-regulated genes in LIHC with absolute fold change ≥ 2
Fig. 3 The figure shows the cluster dendrogram obtained through the dynamic tree cut and merged
dynamic methods
Fig. 4 The figure shows the cluster validity scores for finding optimal number of clusters in LIHC
using a TrCovW, b TraceW and c Friedman indices
Additionally, we obtained 265 complexes from the PPIN of the DE genes through
the PC2P algorithm (Sect. 2.3). We discovered that the optimal number of modules for
gene communities in LIHC RNASeq data of DE genes is 7 using the TrCovW, TraceW
and Friedman cluster validity indices. Figure 4 shows the score of the TrCovW,
TraceW and Friedman cluster validity indices.
We have represented the identified GCM and PCM as two separate binary module
assignment matrices with 1964 × 4 and 1964 × 265 dimension, respectively. Finally,
we obtained 7 consensus modules by applying the offsetNMF-based module integration
technique proposed in Sect. 2.5, which incorporates both gene expression and PPI
information in LIHC. The discovered clusters contained 296, 257, 469, 738, 24, 122
and 58 genes, respectively. Figure 5 shows the first four integrated gene communities
(IGCs) depicting significantly up-, down-regulated and hub genes based on maximal
clique centrality (MCC) scores.
Fig. 5 The figure shows the first four gene communities discovered by the NMF-based integrated
clustering approach
Fig. 6 The figure shows the top two significant biological processes (BP), KEGG pathways, and
diseases genes association enrichment of the identified integrated gene communities (IGCs)
5 Conclusion
insights into biological networks. We found that the identified integrated gene com-
munities (IGCs) are highly associated with LIHC-related biological processes and
pathways. This study may be further enriched by integrating other distinct omics data
and modern machine learning algorithms. LIHC is one of the major primary
liver cancers worldwide, causing millions of premature deaths every year. Further
experimental study of the gene communities is needed to elucidate an in-depth under-
standing of this deadly disease. Discovery of potential biomarkers and survival
analysis of the genes inside the identified gene communities may offer more precise
diagnosis and therapeutic remedies for LIHC at an early stage.
References
1. Anders S, Huber W (2010) Differential expression analysis for sequence count data. Nat Precedings 1–1
2. Arista-Nasr J, Fernández-Amador JA, Martínez-Benítez B, de Anda-González J, Bornstein-
Quevedo L (2010) Neuroendocrine metastatic tumors of the liver resembling hepatocellular
carcinoma. Annals Hepatol 9(2):186–191
3. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K,
Dwight SS, Eppig JT et al (2000) Gene ontology: tool for the unification of biology. Nat Genet
25(1):25–29
4. Badea L (2008) Extracting gene expression profiles common to colon and pancreatic ade-
nocarcinoma using simultaneous nonnegative matrix factorization. In: Biocomputing. World
Scientific, pp 267–278
5. Bai KH, He SY, Shu LL, Wang WD, Lin SY, Zhang QY, Li L, Cheng L, Dai YJ (2020)
Identification of cancer stem cell characteristics in liver hepatocellular carcinoma by WGCNA
analysis of transcriptome stemness index. Cancer Med 9(12):4290–4298. https://doi.org/10.
1002/cam4.3047
6. Botía JA, Vandrovcova J, Forabosco P, Guelfi S, D'Sa K, Hardy J, Lewis PA, Ryten M, Weale
ME (2017) An additional k-means clustering step improves the biological features of WGCNA
gene co-expression networks. BMC Syst Biol 11(1):1–16
7. Charrad M, Ghazzali N, Boiteau V, Niknafs A (2014) NbClust: an R package for determining
the relevant number of clusters in a data set. J Stat Softw 61(6):1–36
8. Ding C, He X, Simon HD (2005) On the equivalence of nonnegative matrix factorization and
spectral clustering. In: Proceedings of the 2005 SIAM international conference on data mining.
SIAM, pp 606–610
9. Edwards AW, Cavalli-Sforza LL (1965) A method for cluster analysis. Biometrics 362–375
10. Friedman HP, Rubin J (1967) On some invariant criteria for grouping data. J Am Stat Assoc
62(320):1159–1178
11. Gu Y, Li J, Guo D, Chen B, Liu P, Xiao Y, Yang K, Liu Z, Liu Q (2020) Identification of 13
key genes correlated with progression and prognosis in hepatocellular carcinoma by weighted
gene co-expression network analysis. Front Genet 11:153. https://doi.org/10.3389/fgene.2020.
00153
12. Hossain SMM, Halsana AA, Khatun L, Ray S, Mukhopadhyay A (2021) Discovering key
transcriptomic regulators in pancreatic ductal adenocarcinoma using Dirichlet process Gaussian
mixture model. Sci Rep 11(1):7853. https://doi.org/10.1038/s41598-021-87234-7
13. Hossain SMM, Khatun L, Ray S, Mukhopadhyay A (2021) Identification of key immune
regulatory genes in hiv-1 progression. Gene 792:145735. https://doi.org/10.1016/j.gene.2021.
145735
14. Hossain SMM, Mahboob Z, Chowdhury R, Sohel A, Ray S (2016) Protein complex detection in
PPI network by identifying mutually exclusive protein-protein interactions. Procedia Comput
Sci 93:1054–1060. https://doi.org/10.1016/j.procs.2016.07.309
15. Hossain SMM, Ray S, Mukhopadhyay A (2019) Identification of hub genes and key modules
in stomach adenocarcinoma using nsnmf-based data integration technique. In: IEEE 2019
international conference on information technology (ICIT), pp 331–336
16. Hossain SMM, Ray S, Mukhopadhyay A (2017) Preservation affinity in consensus modules
among stages of HIV-1 progression. BMC Bioinform 18(1):181
17. Hossain SMM, Ray S, Mukhopadhyay A (2020) Detecting overlapping gene communities dur-
ing stomach adenocarcinoma: a discrete nmf-based integrative approach. In: 2020 IEEE inter-
national conference on advent trends in multidisciplinary research and innovation (ICATMRI),
pp 1–6. https://doi.org/10.1109/ICATMRI51801.2020.9398458
18. Hossain SMM, Ray S, Tannee TS, Mukhopadhyay A (2017) Analyzing prognosis characteris-
tics of Hepatitis C using a biclustering based approach. Procedia Comput Sci 115(Supplement
C):282 – 289
19. Krstic J, Galhuber M, Schulz TJ, Schupp M, Prokesch A (2018) p53 as a dichotomous regulator
of liver disease: the dose makes the medicine. Int J Mol Sci 19(3):921
20. Langfelder P, Horvath S (2007) Eigengene networks for studying the relationships between
co-expression modules. BMC Syst Biol 1(1):1–17
21. Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis.
BMC Bioinform 9(1):1–13
22. Li X, Wang X, Gao P (2017) Diabetes mellitus and risk of hepatocellular carcinoma. BioMed
Res Int
23. Masserot-Lureau C, Adoui N, Degos F, de Bazelaire C, Soulier J, Chevret S, Socié G, Leblanc T
(2012) Incidence of liver abnormalities in Fanconi anemia patients. Am J Hematol 87(5):547–
549
24. Milligan GW, Cooper MC (1985) An examination of procedures for determining the number
of clusters in a data set. Psychometrika 50(2):159–179
25. Omranian S, Angeleska A, Nikoloski Z (2021) PC2P: parameter-free network-based prediction
of protein complexes. Bioinformatics
26. Ray S, Hossain SMM, Khatun L (2016) Discovering preservation pattern from co-expression
modules in progression of HIV-1 disease: an eigengene based approach. In: 2016 IEEE interna-
tional conference on advances in computing communications and informatics, ICACCI 2016,
Jaipur, September 21–24, 2016. IEEE, pp 814–820
27. Ray S, Hossain SMM, Khatun L, Mukhopadhyay A (2017) A comprehensive analysis on
preservation patterns of gene co-expression networks during Alzheimer’s disease progression.
BMC Bioinform 18(1):579
28. Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential
expression analysis of digital gene expression data. Bioinformatics 26(1):139–140
29. Saitta C, Pollicino T, Raimondo G (2019) Obesity and liver cancer. Annals Hepatol 18(6):810–
815
30. Song E, Song W, Ren M, Xing L, Ni W, Li Y, Gong M, Zhao M, Ma X, Zhang X, An R (2018)
Identification of potential crucial genes associated with carcinogenesis of clear cell renal cell
carcinoma. J Cell Biochem 119(7):5163–5174. https://doi.org/10.1002/jcb.26543
31. Sun M, Song H, Wang S, Zhang C, Zheng L, Chen F, Shi D, Chen Y, Yang C, Xiang Z, Liu Q,
Wei C, Xiong B (2017) Integrated analysis identifies microrna-195 as a suppressor of hippo-
yap pathway in colorectal cancer. J Hematol Oncol 10(1):79. https://doi.org/10.1186/s13045-
017-0445-8
32. Zhang B, Horvath S (2005) A general framework for weighted gene co-expression network
analysis. Stat Appl Gene Mol Biol 4(1). https://doi.org/10.2202/1544-6115.1128
Machine Learning Based Approach
for Therapeutic Outcome Prediction
of Autism Children
C. S. KanimozhiSelvi, K. S. Kalaivani, M. Namritha, S. K. Niveetha,
and K. Pavithra
1 Introduction
Autism spectrum disorder is a condition affecting the early development of the child
that continues as a lifelong disorder [1]. The condition cannot be cured but can
improve with appropriate therapy. These therapies may help the child function and
participate in the community by reducing symptoms and improving cognitive ability
and daily living skills. A proper therapy plan is needed to accomplish success. The
condition can be controlled if treated at an early stage [2]. Each autistic child varies
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 425
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_31
426 C. S. KanimozhiSelvi et al.
with signs and symptoms and requires a unique therapy plan. As a result,
therapy plans are often interdisciplinary, include parent-mediated interventions, and
are tailored to the child's specific requirements.
Behavioral intervention plans have emphasized the development of social commu-
nication skills, especially at young ages when children are naturally learning these
abilities, as well as the elimination of restricted interests and repetitive and problem-
atic behaviors. Occupational and speech therapy, as well as social skills training and
medication, may be beneficial for some children. Depending on an individual’s age,
skills, challenges, and characteristics, the optimum treatment or intervention may
change.
Due to the limited availability of expert therapeutic decision systems, choosing the
appropriate therapy for a particular child is always very difficult for the therapist.
Often the therapeutic cost becomes high and unaffordable for parents with a weak
economic background.
Governmental institutions do not have a sufficient number of therapists, special
teachers, and other experts to handle the increasing number of autistic children. It is
always very difficult for the small number of available special therapists to plan
the therapy and monitor the progress and outcome of the therapy for every child.
Parent-mediated therapy has proven benefits and is a useful method in a populous
country like India. Parent-mediated therapy becomes possible if the therapeutic
system is user-friendly and usable with minimal education and expertise, even by
the children themselves [3].
A game-based therapeutic application may help to improve the skills of an
autistic child. However, it cannot replace real-life interaction, and the child
may find it difficult to apply the learned skills in real-life situations. Autistic
children, who tend to become addicted to visual sensory stimuli and to stick to
routines, may be prone to addiction to the therapeutic games themselves and find
it difficult to disengage from them. Parent-mediated therapy may not be possible
if the parents themselves suffer from a mental difficulty, are working parents with
little time to spend with their disabled child, etc.
A simple game-based therapeutic tool for the early developmental period of an
autistic child would be of great help to the parents and would support skill devel-
opment earlier than at school age [4]. The therapist's burden can be reduced if an
expert system is available to predict and suggest the therapeutic plan, measure the
therapy progress and outcome for every autistic child, and further predict the
possible therapeutic plan for a child presenting with a similar set of autistic features.
2 Literature Review
using emotion recognition system and closed-format questionnaire. Both the analysis
showed positive results in acceptance of the parrot-inspired robot and AMRM.
Linstead et al. [13] used the Applied Behavior Analysis (ABA) technique for
effective treatment of children with autism spectrum disorder (ASD). Using linear
regression, they studied and evaluated the children's treatment intensity and
treatment duration across various treatment domains such as social, cognitive,
adaptive skills, academic, language, motor, executive, and play skills. The results
show that the academic and language domains are the most responsive to treatment
intensity and duration. The domains are based on Applied Behavior Analysis
(ABA).
Adam Mourad Chekroud et al. developed a machine learning based algorithm that
uses clinically rated antidepressant data from a 12-week citalopram course to predict
the outcome of symptomatic remission. The system supports evaluating disease risk,
recurrence, or suitability of treatment. The model is most effective at identifying
future responders among people with depression, and it validated improvement in
depression level over the next 2 weeks. The model was trained on 25 identified
features using the Sequenced Treatment Alternatives to Relieve Depression
(STAR*D) and COMED trials, and validation was done with ten-fold cross-validation.
The technique is not suitable for predicting non-responders to the medication, and
its outcome prediction is suitable only at a basic level.
Linstead et al. [14] demonstrated the benefits of machine learning techniques for
predicting the learning outcome of behavioral therapy for children with autism
spectrum disorder (ASD). Applied Behavioral Analysis (ABA) therapies depend on
behavioral principles of learning, motivation, stimulus control, generalization,
and reinforcement. The system supports selecting between high-intensity and
low-intensity ABA treatment. Among the models considered, the linear regression
model was best suited to the learning outcome prediction dataset. It captures the
correlation between age, gender, treatment hours, and intensity of the ASD treatment.
Dvornek et al. [15] focused on selecting the best therapy for a particular child
affected by ASD and on avoiding ineffective interventions that waste money and
intervention time. Pivotal Response Treatment (PRT) therapies support motivation
and self-initiation but are time-consuming for training ASD children and caretakers.
Therefore, machine learning techniques, random forest and tree bagging models,
are used together with fMRI images to suggest, from the large number of available
therapies, the therapy likely to give the best outcome.
3 Flow Diagram
Figure 1 represents the flow diagram for predicting the outcome of therapy given
to autistic children. There are different phases, namely collecting data, building
the model, allocating the treatment plan, and predicting the therapy improvement.
Initially, the dataset, which is in the form of rows and columns gathered from special
schools and autism centers, is converted to a CSV (Comma-Separated Values) file.
Machine Learning Based Approach for Therapeutic Outcome … 429
Then, clustering of the autistic children is done, with each cluster containing
children with the most similar symptoms, using the K-means clustering algorithm.
Treatment is then allocated based on the cluster formed. Finally, improvement is
predicted from the scores obtained from the therapy using the Pearson correlation.
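The pipeline above can be sketched in code. The following minimal Python sketch shows a plain Lloyd's K-means (to form five symptom clusters) and a Pearson correlation between two score vectors; the data, column meanings, and cluster count here are illustrative assumptions, not the authors' implementation or the real questionnaire CSV schema.

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    # Lloyd's algorithm: assign each child to the nearest centroid,
    # then recompute centroids, repeated for a fixed number of iterations
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def pearson(x, y):
    # Pearson correlation between pre-therapy and post-therapy scores
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))
```

On the real data, `X` would hold the 84 children's questionnaire scores with k = 5 clusters, and `pearson` would compare questionnaire scores with the later game scores to estimate improvement.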
4 Module Description
The data has been collected for children with autism spectrum disorder. The data
collected consists of the answers to a parents' questionnaire. A total of 84 children
participated, and based on the data collected from their parents, they were clustered
into five groups using the K-means algorithm, in order to separate children with the
most similar symptoms into different categories. After the clustering of children
into groups, they were suggested therapy in the form of games.
is proposed that would measure and also improve the severity of symptoms in the
children, in their listening skills [16]. This game would consist of three parts, which
help in different issues that arise with ASD affected children in the listening skill
area. A child affected by ASD can lack what other normal children do, in specific
ways that are mentioned by the parents’ questionnaire, by severity that ranges from
mild to highly severe. The severity can be determined by the time it takes for the
child to respond to name-calling, verbal commands and looking toward the source,
whether or not the child reacts to extraneous sounds, or whether the child can coor-
dinate other senses as well have a response to listening. Based on the results from
the game score database, the game scores are correlated with those scores from the
initial parents’ questionnaire, which will then be used to determine how far a child
has progressed or regressed [17].
5 Data Collection
A real-time dataset has been collected from SSA schools. Figure 1 shows a sample
of the dataset. The source of the dataset is a parent questionnaire [18], which is filled
in by the parents. This questionnaire contains 14 categories of questions with subdivisions in
each category. The 14 categories are communicative skill, listening skill, social skill,
non-verbal skill, imitation, use of objects and interest, emotional response, visual
response, body movement and use level of activity, senses, adaptation to change,
fear and nervousness and intellectual and functional language skills. These questions
are specific to symptoms that are identified in the autism affected children and are
categorized into the skills list accordingly. The answers to these questions are obtained
on a rating scale (1–4), which increases with increasing severity. Further, the
result of the questionnaire is converted into the required dataset.
The dataset, which consists of the scores of the 84 children, is then clustered
into five groups. The clustering is necessary to identify children with closely related
symptoms, so that the prescription of therapy is made easier. Figure 2 is the visual
7 Game Application
Here, a simple game-based therapeutic tool for the early developmental period of
an autistic child is of great help to the parents and supports skill development
at an early school age. On this account, a game was developed to enhance the
listening capability of children, which is a category in the parent questionnaire. It was
designed to help literate or non-literate children with ASD aged between six [19]
and twelve to increase their listening skill.
Three different types of game are developed, and each game maps to a symptom
that is questioned to identify the level of listening ability in the autistic child.
The first game offers categories of things such as vegetables, fruits, shapes,
animals, colors, and birds. Children can select any of the categories mentioned and
are directed to the next page, where the sound of an animal (or another item) is played
and a total of three pictures of the specified category is displayed. After listening to
the sound, the child has to choose the appropriate picture. The sound can be played
any number of times. If the child chooses the correct picture, the score is increased,
but it is reduced according to the number of times the sound was played. The full score
is given only when the child finds the correct picture on the first try (Fig. 3).
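The scoring rule just described can be sketched as follows; the point values and per-replay penalties are hypothetical, since the paper does not give exact numbers.

```python
def game_score(attempts, sound_plays, full_score=10,
               retry_penalty=3, replay_penalty=2):
    """Hypothetical scoring rule for the first game: a correct answer earns
    points, reduced for every extra attempt and for every extra time the
    sound was replayed. The full score is kept only for a first-try answer
    with a single play of the sound."""
    penalty = (retry_penalty * (attempts - 1)
               + replay_penalty * (sound_plays - 1))
    return max(0, full_score - penalty)

print(game_score(attempts=1, sound_plays=1))  # first try, one play -> 10
print(game_score(attempts=1, sound_plays=3))  # two replays -> 6
```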
The second game consists of audio in which noises are mixed into the
middle of smooth music. The participant has to indicate when they are distracted by
the noise, at which point the game records a timestamp. Depending on the time it
432 C. S. KanimozhiSelvi et al.
took the participant to recognize that there is a noise, the severity of autism for
listening skill is determined, and the score is recorded (Fig. 4).
The third game consists of a set of pictures and the corresponding sounds related
to them. The audio consists of sounds that relate to the given pictures, in any
order. The participant must choose the pictures in the same order as they hear the
clips from the audio file. Finally, the score is recorded (Fig. 5).
The progress of treatment can be measured frequently using the scores gained by the
children, and based on the scores, extra care is given to particular children. Finally,
the game score is calculated and converted to the format of the dataset that
was obtained before the clustering of children and allocation of treatment. Using
these two datasets, the outcome of the treatment is predicted using
covariance.
This game is helpful both for treating autistic children and for measuring
the enhancement achieved by the treatment given (Figs. 6–8).
The activity of the children in the game is monitored, and therapy progress is
measured as a numerical score. The numerical score obtained from the three parts
of the game is converted to values on a scale between 1 and 4, equivalent
to the scale in the parent questionnaire data obtained before involving the child
in the therapy. The parameters considered in score evaluation are the total number
of attempts, the child’s score in the before-treatment dataset (Fig. 1), and the numbers
of correct and wrong answers given by the child. On the report page, the highest and
most recent score of each game is displayed, followed by the severity level of
the child as far as listening skill is concerned (Fig. 6).
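The conversion from a raw game score to the questionnaire's 1-4 severity scale can be sketched as below; the quartile thresholds are illustrative assumptions, not values given in the paper.

```python
def to_severity_scale(raw_score, max_score):
    """Map a raw game score onto the questionnaire's 1-4 scale, where a higher
    value means greater severity (so a low game score yields a high rating).
    The quartile thresholds here are illustrative, not from the paper."""
    ratio = raw_score / max_score
    if ratio >= 0.75:
        return 1   # mild
    if ratio >= 0.50:
        return 2
    if ratio >= 0.25:
        return 3
    return 4       # severe

print(to_severity_scale(9, 10))   # -> 1
print(to_severity_scale(2, 10))   # -> 4
```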
Pearson’s correlation coefficient gives a way to assess how well two sets of data
are related to one another, X versus Y on a graph. The linear relationship sought here
corresponds to a straight line (y = mx + b); however, a correlation coefficient can also
be computed using other formulas, such as polynomials.
Data collected after involving the autistic child in games relevant to the skills in
which the child is deficient are considered. The new dataset must be correlated [20]
with the original dataset that was collected at the beginning of the therapy. Pearson’s
correlation coefficient is used to correlate the initial state of the children with the
progress made after the game was suggested. The Pearson correlation algorithm
is applied to the two datasets and produces a correlation coefficient for each child,
as shown in Fig. 9. Here, a negative correlation coefficient indicates that the treatment
plan has proven successful in treating the child (Fig. 10). This is done to continually
monitor the progress made by the children and whether the treatment has had an
impact on them. If the current plan does not work, another plan is suggested.
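Pearson's r for the before/after score lists can be computed directly from its definition; the sample scores below are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical before/after severity scores for one skill across five children:
before = [4, 3, 4, 2, 3]
after  = [2, 2, 1, 1, 1]
print(round(pearson_r(before, after), 3))  # -> 0.327
```

The sign and magnitude of r are what the paper's monitoring step inspects for each child.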
10 Conclusion
Autism spectrum disorder is a condition affecting the early development of the child
and continues as a lifelong disorder. The condition cannot be cured fully, but it can
be improved with appropriate therapy, and a proper therapy plan is needed to
accomplish success. Each autistic child varies in signs and symptoms, so every child
requires a unique therapy plan. Owing to the limited availability of expert therapeutic
decision systems, choosing the appropriate therapy for a particular child is always very
difficult for the therapist. The therapeutic cost often becomes high and unaffordable
for parents from a weak economic background. Government institutions do not
have a sufficient number of therapists, special teachers, and other experts to handle
the increasing number of autistic children. It is always very difficult to plan the
therapy and to monitor the progress and outcome of the therapy for every child owing
to the small number of available special therapists. Parent-mediated therapy
has proven benefits and is a useful method for a highly populated country like
India. Parent-mediated therapy will be possible if the therapeutic system is available
in a user-friendly form that can be used with minimal education and expertise, even by the
children themselves. A simple game-based therapeutic tool available for the early
developmental period of an autistic child will be of great help to the parents and
will support skill development in the early developmental period rather than at school
age. The therapist’s burden can be reduced if an expert system is available that can predict
and suggest the therapeutic plan, measure the therapy progress and outcome
for every autistic child, and further predict a possible therapeutic
plan for a child presenting a similar set of autistic features.
In this paper, a game application has been developed considering only the listening
skill. Therapeutic game tools have yet to be developed for the other 13 categories of
skill considered in the parent questionnaire. By allowing several autistic children to use
the developed tool and by monitoring their progress, more statistical data on the outcome
of the treatment will be collected, processed, and used to train the machine learning
model to suggest more accurate treatment plans in the future.
References
1. Bernardes M, Barros F, Simoes M, Castelo-Branco M (June 2015) A serious game with virtual
reality for travel training with autism spectrum disorder. In: 2015 International conference on
virtual rehabilitation (ICVR). IEEE, pp 127–128
2. De Urturi ZS, Zorrilla AM, Zapirain BG (July 2011) Serious game based on first aid education
for individuals with autism spectrum disorder (ASD) using android mobile devices. In: 2011
16th International conference on computer games (CGAMES). IEEE, pp 223–227
3. Hiniker A, Daniels JW, Williamson H (June 2013) Go go games: therapeutic video games for
children with autism spectrum disorders. In: Proceedings of the 12th international conference
on interaction design and children. pp 463–466
4. Malinverni L, Mora-Guiard J, Padillo V, Valero L, Hervás A, Pares N (2017) An inclusive design
approach for developing video games for children with autism spectrum disorder. Comput Hum
Behav 71:535–549
5. Bhatt SK, De Leon NI, Al-Jumaily A (2017) Augmented reality game therapy for children
with autism spectrum disorder. Int J Smart Sens Intell Syst 7(2)
6. Hoque ME, Lane JK, El Kaliouby R, Goodwin M, Picard RW (2009) Exploring speech therapy
games with children on the autism spectrum
7. Phytanza DTP, Burhaein E (2019) Aquatic activities as play therapy children autism spectrum
disorder. Int J Disabil Sports Health Sci 2(2):64–71
8. Boyd LE, Ringland KE, Haimson OL, Fernandez H, Bistarkey M, Hayes GR (2015) Evaluating
a collaborative iPad game’s impact on social relationships for children with autism spectrum
disorder. ACM Trans Accessible Comput (TACCESS) 7(1):1–18
9. Grossard C, Grynspan O, Serret S, Jouen AL, Bailly K, Cohen D (2017) Serious games to
teach social interactions and emotions to individuals with autism spectrum disorders (ASD).
Comput Educ 113:195–211
10. Pennisi P, Tonacci A, Tartarisco G, Billeci L, Ruta L, Gangemi S, Pioggia G (2016) Autism
and social robotics: a systematic review. Autism Res 9(2):165–183
11. Wieckowski AT, White SW (2017) Application of technology to social communication
impairment in childhood and adolescence. Neurosci Biobehav Rev 1(74):98–114
12. Bharatharaj J, Huang L, Mohan R, Al-Jumaily A, Krägeloh C (2017) Robot-assisted therapy
for learning and social interaction of children with autism spectrum disorder. Robotics 6(1):4
13. Linstead E et al (2017) An evaluation of the effects of intensity and duration on outcomes cross
treatment domains for children with autism spectrum disorder. Transl Psychiatry 7(9):e1234
14. Linstead E, et al. (2015) An application of neural networks to predicting mastery of learning
outcomes in the treatment of autism spectrum disorder. 2015 IEEE 14th International
conference on machine learning and applications (ICMLA). IEEE
15. Dvornek NC, et al. (2018) Prediction of autism treatment response from baseline fmri using
random forests and tree bagging. arXiv preprint arXiv:1805.09799
16. Kasari C, Gulsrud A, Freeman S, Paparella T, Hellemann G. Longitudinal follow-up of children
with autism receiving targeted interventions on joint attention and play. J Am Academy Child
Adolesc Psychiatry 51(5):12
17. Omar KS, Mondal P, Khan NS, Rizvi MRK, Islam MN (2019) A machine learning approach
to predict autism spectrum disorder. In: 2019 International conference on electrical, computer
and communication engineering (ECCE). pp 1–6. https://doi.org/10.1109/ECACE.2019.8679454
18. Fletcher-Watson S, Pain H, Hammond S, Humphry A, McConachie H (2016) Designing for
young children with autism spectrum disorder: a case study of an iPad app. Int J Child-Comput
Interact 7:1–14
19. Wei X, Wagner M, Christiano ER et al (2014) Special education services received by students
with autism spectrum disorders from preschool through high school. J Spec Educ 48:167–179
20. Usta MB, Karabekiroglu K, Sahin B, Aydin M, Bozkurt A, Karaosman T, Aral A, Cobanoglu C,
Kurt AD, Kesim N, Sahin İ (2019) Use of machine learning methods in prediction of short-term
outcome in autism spectrum disorders. Psychiatry Clin Psychopharmacol 29(3):320–325
An Efficient Implementation of ARIMA
Technique for Air Quality Prediction
Abstract Among all the natural resources required for the survival of living
things, air is the most vital one; good air quality is therefore essential for their
existence. Yet, day by day, air is being polluted at an alarming rate. Increased
industrialization, along with the extensive use of vehicles and machines, is among
the reasons behind the air pollution problem. Changes in nature’s life cycle and
disruption of the life cycle of human beings are outcomes of air pollution. Because
of air pollution, all human beings face short-term and long-term health effects,
making air pollution a most alarming concern for all of us. This problem can be
conquered through progress in research as well as by the efficient use of machine
learning methods. The auto-regressive integrated moving average (ARIMA) model
is used to predict air quality, or air pollution, based on machine learning techniques.
In this paper, the ARIMA model is implemented in Python, and the results show that
the implemented method is better at predicting air quality.
1 Introduction
Polluted air and its impact are among the critical challenges faced by humanity due
to globalization, accelerated industrialization, and urban development. Urbanization
is one of the root causes of the growth of air pollution, which has a major impact on public
health. The main air pollution metric is PM2.5. These small and light particles are
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 441
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_32
442 R. Patil et al.
2 Related Work
The regression techniques compared here for air quality and/or pollution prediction are
random forest regression, decision tree regression, multi-layer perceptron regression, and
gradient boosting regression. MAE and RMSE have been used as parameters for comparing
the regression models with respect to processing time and sample size [9]. In addition,
the study set out to determine the best model considering both the processing
time and the least fault rate for each form. Among the four different approaches,
random forest regression proved to be the best method.
Algorithms such as decision tree, random forest, and support vector machine [10] are used
in the experiments. As per the experimental tests, the probability of accuracy increases with
the rise in the number of attributes [11]. When compared with an individual model,
a combined model’s outcome is higher. Among the various classification and regression
methods used for predicting the air quality index (AQI) of the main contaminants, including
PM10, O3, PM2.5, NO2, CO, and SO2, ANNs and support vector regression are best
suited for forecasting the air quality of New Delhi. Mean absolute error, R², and
mean squared error have been used to evaluate these methods [12]. BLSTM and IDW
techniques have been proposed for spatiotemporal forecasts of air quality at numerous
time granularities [13]. The LSTM network’s forecast results are delivered at hourly,
daily, and weekly granularity over numerous periods of time. As per the
results of the experiments, better predictive performance for the concentration of PM2.5
can be achieved with the proposed model [14]. The WNN model developed for short-term
forecasting of PM2.5 concentration has some strong features compared with other
models [15, 16].
Objectives of the ARIMA model are listed here.
1. Examination of the factors affecting air quality and increasing air pollution.
2. Recommending and creating a procedure for measuring air pollution quickly
and accurately.
3. Using AI and the ARIMA model to train the system so that air quality can be identified.
4. Recommending a system that suggests a method to decrease the variables
affecting air quality.
5. Verification of the results obtained against certain machine learning techniques
through analysis.
6. Planning a representation model in highly dense areas having confirmed or
assumed poor air quality.
7. Suggesting the most reliable information for assessing the population at risk from
exposure to poor air quality.
3 System Architecture
An ARIMA time-series prediction model with external feedback is utilized here as the
machine learning technique for forecasting air quality. The code is written in Python,
and the dataset used belongs to the city of Delhi. Figure 1 shows the overall basic
system architecture [17].
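As a rough sketch of the forecasting step, the fragment below hand-rolls an ARIMA(1,1,0): first-difference the series, fit the AR(1) coefficient by least squares on the differences, then iterate the AR model and undo the differencing. A real implementation would use a statistics library, and the PM2.5 readings shown are hypothetical.

```python
def arima_110_forecast(series, steps=3):
    """Minimal ARIMA(1,1,0) sketch: difference the series, fit an AR(1)
    coefficient by least squares on the differences, then forecast by
    iterating the AR model and integrating back to the original scale."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    num = sum(diffs[i] * diffs[i + 1] for i in range(len(diffs) - 1))
    den = sum(d * d for d in diffs[:-1])
    phi = num / den if den else 0.0
    level, d = series[-1], diffs[-1]
    out = []
    for _ in range(steps):
        d = phi * d          # next predicted difference
        level += d           # undo the differencing
        out.append(level)
    return out

# Hypothetical hourly PM2.5 readings:
pm25 = [120, 124, 131, 135, 142, 150]
print(arima_110_forecast(pm25, steps=2))
```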
4 Implementation
(1) Pictorial representation of attribute values affecting air quality from the dataset of
9350 records (Fig. 3).
(2) The following references are used for implementing air quality prediction (Fig. 4).
The AQI categories are PM10 (24 h), PM2.5 (24 h), NO2 (24 h), O3 (8 h), CO
(8 h), SO2 (24 h), NH3 (24 h), and Pb (24 h) [24].
Ranges are as follows:
• For PM10 (24 h)—a range of 0 to 50 is good; 51 to 100 is satisfactory; 101
to 250 is moderately polluted; 251 to 350 is poor; 351 to 430 is very poor,
and above 430 is severe.
• For PM2.5 (24 h)—a range of 0 to 30 is good; 31 to 60 is satisfactory; 61 to
90 is moderately polluted; 91 to 120 is poor; 121 to 250 is very poor, and
above 250 is severe.
• For NO2 (24 h)—a range of 0 to 40 is good; 41 to 80 is satisfactory; 81 to
180 is moderately polluted; 181 to 280 is poor; 281 to 400 is very poor, and
above 400 is severe.
• For O3 (8 h)—a range of 0 to 50 is good; 51 to 100 is satisfactory; 101 to 168 is
moderately polluted; 169 to 208 is poor; 209 to 748 is very poor, and above
748 is severe.
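The ranges above can be encoded as a simple lookup table; only the four pollutants listed are included here.

```python
# Breakpoints transcribed from the ranges above (upper bound -> category).
AQI_BANDS = {
    "PM10":  [(50, "good"), (100, "satisfactory"), (250, "moderately polluted"),
              (350, "poor"), (430, "very poor")],
    "PM2.5": [(30, "good"), (60, "satisfactory"), (90, "moderately polluted"),
              (120, "poor"), (250, "very poor")],
    "NO2":   [(40, "good"), (80, "satisfactory"), (180, "moderately polluted"),
              (280, "poor"), (400, "very poor")],
    "O3":    [(50, "good"), (100, "satisfactory"), (168, "moderately polluted"),
              (208, "poor"), (748, "very poor")],
}

def aqi_category(pollutant, value):
    """Return the AQI category for a pollutant concentration."""
    for upper, label in AQI_BANDS[pollutant]:
        if value <= upper:
            return label
    return "severe"   # above the last breakpoint

print(aqi_category("PM2.5", 45))   # -> satisfactory
print(aqi_category("NO2", 410))    # -> severe
```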
An Efficient Implementation of ARIMA Technique for Air Quality … 447
Fig. 2 Implementation process
Fig. 3 Pictorial representation of attribute values affecting air quality from dataset of 9350 records
Fig. 4 Time series dataset with different 15 attributes affecting air quality
6 Conclusion
A detailed review of Delhi’s 24-day air quality predictions, i.e., RH value predictions,
is rendered and improved by the machine learning process. The full (100%) dataset is
used for training and for checking the prediction of the next 24 days’ air quality. The
ARIMA time-series model is adapted with various features, and the model’s forecast
results are estimated within machine learning processes. According to the experimental
evidence, the probability of increasing precision improves with an improvement in
usability. All 15 dataset attributes are used during training. Compared with the model
trained on a limited dataset, the result from a large model with a big dataset is better.
7 Future Scope
By combining an air quality input dataset with a human health dataset, we can
forecast whether the environmental conditions are good or will increase the danger of
viral infections to the human body.
References
1. Saba A, Asghar MN (2017) Comparative analysis of machine learning techniques for predicting
air quality in smart cities. IEEE. https://doi.org/10.1109/ACCESS.2019.2925082
2. Bo Liu L (2016) Forecasting PM2.5 concentration using spatio-temporal extreme learning
machine. In: 2016 15th IEEE international conference on machine learning and applications.
Beijing, China
3. Pooja B (2019) Air quality prediction using machine learning algorithms. https://doi.org/10.
7753/IJCATR0809.1006
4. Kong T, Wang Y (2017) Air quality predictive modeling based on an improved decision tree
in a weather-smart grid. https://doi.org/10.1109/ACCESS
5. Weizhen L (2018) Air pollutant parameter forecasting using support vector machines. City
University of Hong Kong, Hong Kong, Department of Building and Construction
6. Li X (2017) Long short-term memory neural network for air pollutant concentration predictions:
method development and evaluation. Environ Pollut 231:997–1004
7. Amado TM (2018) Development of machine learning-based predictive. Proceedings of
TENCON 2018–2018 IEEE region 10 conference. pp 28–31
8. Ayyalasomayajula H (2016) Air quality simulations using big data programming models. In:
IEEE second international conference on big data computing service and applications 2016
9. Kuo J (2019) Deep learning-based approach for air quality forecasting by using recurrent
neural network with gaussian process in Taiwan. In: 2019 IEEE 6th international conference
on industrial applications and engineering
10. Li X (2017) Long short-term memory neural network for air pollutant concentration predictions:
method development and evaluation. Environ Pollut 231(Pt 1):997–1004
11. Song L (6–11 July, 2014) Spatio-temporal PM2.5 prediction by spatial data aided incremental
support vector regression. In: 2014 international joint conference on neural networks (IJCNN).
Beijing, China
12. Zhang S (2018) Prediction of urban PM2.5 concentration based on wavelet neural network.
IEEE
13. Lee MH (2012) Seasonal ARIMA for forecasting air pollution index: a case study. Am J Appl
Sci 9(4):570–578
14. Salemdawod A (2017) Water and air quality in modern farms using neural network. ICET2017.
Antalya, Turkey
15. Qi Z, Deep air learning: interpolation, prediction, and feature analysis of fine-grained air quality.
IEEE Trans Knowl Data Eng
16. Soundari AG, Jeslin JG, Akshaya AC (2019) Indian air quality prediction and analysis using
machinelearning. Int J Appl Eng Res 14(11):181–186. ISSN 0973-4562
17. Ip F (2010) Forecasting daily ambient air pollution based on least squares support vector
machines. In: Proceedings of the 2010 IEEE international conference on information and
automation. Harbin, China
18. Septiawan WM (2018) Suitable recurrent neural network for air quality prediction with
back propagation through time. In: 2018 2nd international conference on informatics and
computational sciences (ICICoS)
19. Zhang C (2017) Early air pollution forecasting as a service: an ensemble learning approach.
In: 2017 IEEE 24th international conference on web services. Beijing, China
20. Xi X (2015) A comprehensive evaluation of air pollution prediction improvement by a machine
learning method. In: 2015 IEEE international conference on service operations and
logistics, and informatics (SOLI). Beijing, China
21. Azzouni A, Pujolle G, NeuTM: a neural network-based framework for, LIP6/UPMC. Paris,
France
22. Tapale MT, Goudar RH, Birje MN et al (2020) Utility based load balancing using firefly
algorithm in cloud. J Data, Inf Manag 2:215–224. https://doi.org/10.1007/s42488-020-000
22-2
23. Rijal N (2018) Ensemble of deep neural networks for estimating particulate matter from images.
In: 2018 3rd IEEE international conference on image, vision and computing
24. Desai NS, IoT based air pollution monitoring and predictor system on Beagle bone black
A Survey on Image Emotion Analysis
for Online Reviews
Abstract Emotions are sentiments, opinions, and feelings that the public expresses
through text, images, and videos. Opinion analysis of Internet data is now attracting
growing research attention; people provide feedback on the Internet through reviews
and images of products on platforms such as Instagram, Facebook, Twitter, and other
online websites. Most prior work has addressed the processing of sentences, and only
a finite amount of research focuses on analyzing the opinions in image information.
Image emotion topics can be ANPs, i.e., adjective-noun pairs: manually assigned tags
for Internet imagery that are helpful for predicting the opinions or emotions people
convey in pictures. The main aim is to predict the emotions of images that are not
labeled. To address this issue, deep learning methods are utilized for the opinion
analysis of images, since deep learning techniques have the capability to successfully
understand the behavior of images.
1 Introduction
Currently, the public provides ever more information on social websites through images
of the places, products, or restaurants they visit every day, or depicts emotions in the
form of emojis, pictures, and videos. Analyzing and processing such information
from social websites or photo-sharing networks like Flickr, Twitter, Instagram,
Snapchat, etc., provides insight into the common emotion of the public. It would
also be useful to know the emotion an image depicts in order to predict emotional
tags such as happiness or sadness, since posts with visual information often
G. N. Ambika (B)
Department of CSE, BMSIT & M, Yelahanka 560064, India
e-mail: ambikagn@bmsit.in
Y. Suresh
Department of CSE, BITM, Ballari 583104, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 453
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_33
454 G. N. Ambika and Y. Suresh
consist of a short textual explanation or no wording at all. So, the visual characteristics
express most of the people’s opinion or sentiment in these types of content.
Further, images can overcome the language barrier and are easier to interpret. Figure 1
shows several pictures collected from different social networks in which
various emotions, such as happiness and sadness, are articulated. The study of
image emotion analysis is still at an initial stage. Interpreting the emotion from an
image is difficult because of various causes: image opinion analysis involves the
ability to identify objects, views, actions, and people’s judgments. Generating
characteristics from images to predict opinion involves a large amount of human
effort and time. Conversely, deep learning models need a huge quantity of
training information, and such images are complicated to gather. Deep learning is a
subfield, or a method, of machine learning that makes machines sufficiently
intelligent: the computer is able to learn through knowledge and recognize the
world in terms of concepts. Machines obtain information through real-world experience,
without the help of human beings, to make the system understand an entire state
or to make decisions. The word “deep” refers to the number of hidden layers in the
neural network. The representation is trained on a huge amount of labeled data
and uses a neural network architecture, so that the characteristics and parameters
are taken directly from the provided information without any individual interference.
Deep learning plays an important role in image opinion analysis by providing
different methods: the convolutional neural network (CNN), deep neural network (DNN),
region-based convolutional neural network (R-CNN), and deep belief network (DBN).
Deep learning can be defined as a framework that produces precise learning parameters
for the classification of images. The main aim of this manuscript is to study and analyze
the various deep learning architectures, specifically the deep neural network
(DNN), convolutional neural network (CNN), region-based convolutional neural
network (R-CNN), and Fast R-CNN. Section 2 of the manuscript describes the studies
implemented so far on picture opinion analysis by means of the above-mentioned
methods. Section 3 in the
2 Related Work
Several investigators have explored different methods for analyzing image emotions,
and the results of machine learning algorithms are significant. Among the various
machine learning methods, the techniques that rely on deep learning are best
for image opinion examination. This section provides a few important studies
carried out by investigators using deep learning methods, along with their results.
A deep neural network can be applied both to image emotion analysis and to textual
opinion analysis. A neural network has many layers: imagery is first given
to the input layer and is then processed to produce results via the output layer. Between
these layers, many hidden layers are at hand for further processing of the input image;
because of the many hidden layers in the network, it is called a deep neural network.
The machine understands an image in the form of rows and columns, with every
pixel holding some value; the pixel values define activations. Every neuron is
interconnected with other neurons, and the activation of the primary layer determines
the activation of the further layers. The main aim is to link the pixels of the image
into edges, the edges into sub-patterns, and at last to join the recognized
patterns to form an image for analyzing the emotions.
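The layer-to-layer activation idea described above can be sketched as a tiny fully connected forward pass; the weights and the two-pixel "image" are made-up illustrations.

```python
import math

def dense(inputs, weights, biases):
    """One fully connected layer: each neuron's activation is a weighted sum
    of the previous layer's activations, passed through a sigmoid."""
    return [1 / (1 + math.exp(-(sum(w * x for w, x in zip(ws, inputs)) + b)))
            for ws, b in zip(weights, biases)]

# Toy two-pixel "image" flowing through two hypothetical layers:
pixels = [0.8, 0.2]
hidden = dense(pixels, weights=[[0.5, -0.3], [0.9, 0.4]], biases=[0.0, -0.1])
output = dense(hidden, weights=[[1.2, -0.7]], biases=[0.05])
print(output)  # a single activation in (0, 1)
```

Stacking many such layers (with convolutional layers for images) is what makes the network "deep".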
This paper [1] presented different techniques explaining an image consistency
method to resolve whether the image data and the text information are consistent
with each other. The manuscript includes models such as the convolutional neural network,
which is suited to image emotion analysis. Deep belief networks are utilized for data
that are not labeled, to beat the limitation of unlabeled images. The studies
convey that deep learning methods are good compared with the support vector machine
(SVM).
This study [2] implemented a new BDMLA, which exploits bidirectional attention
plus multilevel associations between image and text information for classification.
The method focuses on demonstrative image regions. The study used
social network images for emotion analysis.
A convolutional neural network is a feed-forward neural network mainly applied
in image processing, image classification, and image prediction; one of its key
applications is image and visual analysis. A series of operations is implemented
for image emotion analysis with a convolutional neural network, which consists of a
convolutional layer followed by a nonlinear layer, a pooling layer, and then a fully
connected layer. The first layer of convolutional-network image classification is the
convolutional layer, where the image is given as input. Reading an image begins from
the top left corner; the image is converted into a matrix and scanned by small matrices
known as filters. There are many convolutional layers: the image is sent to a
convolutional layer, and the output of one layer is the input to the next.
The second layer in the pipeline is the nonlinear layer, where an activation function
gives the CNN its nonlinear behavior. After the nonlinear layer, there is a pooling
layer, which decreases the workload by dropping characteristics of the image if the
provided image is large. If any characteristics were already recognized by the model
during an earlier convolution operation, they are not processed again for further
classification; this process is called down-sampling or subsampling. After the
pooling layer, if a result is still expected, a fully connected layer is applied, and
the results from the convolutional network are considered (Fig. 2).
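The convolution, nonlinearity, and pooling steps just described can be sketched in plain Python; the image and filter values below are made up.

```python
def conv2d(img, kernel):
    """Valid 2-D convolution (strictly, cross-correlation) of an image with a filter."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(img[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(len(img[0]) - kw + 1)]
            for i in range(len(img) - kh + 1)]

def relu(fmap):
    """Nonlinear layer: negative responses are clamped to zero."""
    return [[max(0, v) for v in row] for row in fmap]

def maxpool2(fmap):
    """2x2 max pooling: down-sample by keeping the strongest response."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

# Hypothetical 5x5 grayscale image and a 2x2 edge-like filter:
img = [[1, 0, 0, 1, 1],
       [0, 1, 0, 0, 1],
       [0, 0, 1, 0, 0],
       [1, 0, 0, 1, 0],
       [1, 1, 0, 0, 1]]
kernel = [[1, -1], [-1, 1]]
features = maxpool2(relu(conv2d(img, kernel)))  # 5x5 -> 4x4 -> 2x2 feature map
```

In a full CNN, the pooled feature map would be flattened and fed to the fully connected layer for classification.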
This paper [3] describes image-text regularity, which is determined using a
multimodal emotion analysis technique; the technique identifies the relationship
between the images and the sentences. SentiBank is used for describing the
visual concepts.
In this paper [4], deep multimodal attentive fusion (DMAF) is used, which describes
the discriminative sorts and the interior relationships of the visual representation.
The authors present a technique containing a convolutional neural network
architecture for text and image emotion analysis in order to perform multimedia
emotion analysis. They considered images from Twitter and Tumblr; the
information contains both positive and negative reviews.
This study [5] implemented opinion analysis for text sentiment as well as
images. The model is based on Facebook datasets consisting of both positive
and negative data. The implemented technique compares CNN and SVM on the
text and concludes that the convolutional neural network performs best compared
with the classical machine learning techniques.
This paper [6] implemented a model using a convolutional neural network to
extract characteristics from an image and classify the image into the proper class
according to its behavior and features, creating various neural networks to train the
model for pattern examination and measuring the achieved performance.
The main aim of R-CNN is to take an input image and produce a list of bounding boxes as the result, where every bounding box contains an object and the category (e.g., car or pedestrian) of that object. R-CNN has since been extended to perform other computer vision tasks, and the following concepts cover a few versions of R-CNN that have been implemented. Given an input image, R-CNN starts by using a mechanism known as selective search to fetch regions of interest (ROIs), where each ROI is a bounding box that may correspond to the boundary of an object in the image.
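The propose-then-classify flow described above can be sketched as follows; `selective_search` and `classify_roi` are hypothetical stand-ins, shown only to make the data flow concrete (a real R-CNN uses the actual selective-search algorithm and a ConvNet classifier).

```python
# Illustrative sketch of the R-CNN flow: selective search proposes ROIs,
# then each ROI is classified. Both helpers are hypothetical stand-ins.

def selective_search(image):
    # Stand-in: propose two quadrant boxes as (x, y, w, h) tuples.
    h, w = len(image), len(image[0])
    return [(0, 0, w // 2, h // 2), (w // 2, h // 2, w // 2, h // 2)]

def classify_roi(image, box):
    # Stand-in classifier: any nonzero pixel in the ROI counts as "car".
    x, y, w, h = box
    total = sum(image[r][c] for r in range(y, y + h) for c in range(x, x + w))
    return "car" if total > 0 else "background"

def rcnn(image):
    # Output: a list of (bounding box, category) pairs, as described above.
    return [(box, classify_roi(image, box)) for box in selective_search(image)]

image = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
detections = rcnn(image)  # one "car" box, one "background" box
```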
Table 1 provides a comparison between the different techniques used for text and image classification. The deep learning techniques provide good accuracy for image opinion analysis.
3 Conclusion
Classification of images into different classes such as happy, sad, and neutral is a difficult task in which multiple factors must be considered. Currently, many new and helpful image classification techniques are evolving, and investigators examine them in terms of classification accuracy and time efficiency. A main insight is that the goal of a particular image classification task is to choose a suitable technique: various methods may perform differently on different tasks. To find the most suitable technique, investigators should choose the correct data type, data size, and expected outcome. In general, the classification system will be designed based on the dataset chosen.
Table 1 Literature survey

1. Udit Doshi (2021), "Emotion detection and sentiment analysis of static images". Proposed work: adopted a convolutional neural network model. Merits: implemented for classification of images for social networks. Demerits: happiness, surprise, sadness, anger, and fear are not predicted. Observation: deep learning techniques can be used to predict and improve the accuracy.

2. Zhao et al. [4], "An image-text consistency driven multimodal sentiment analysis method for social media". Proposed work: a multimodal adaptive sentiment analysis method together with conventional SentiBank methods. Merits: exploits an image-consistency method to resolve whether the image data and the text information agree with each other. Demerits: does not describe the interior relationship between image and semantic text contents; tested only on social media input; the dataset considered is small. Observation: using a deep learning method, characteristics of image and text can be extracted.

3. Huang et al. [5], "Image-text sentiment analysis via deep multimodal attentive fusion" (2019). Proposed work: two separate unimodal models as sentiment classifiers for the visual and textual modalities, respectively. Merits: the model learns effective multimodal emotion characteristics and is more effective for image-text sentiment exploration. Demerits: the model is not designed for learning deeper multimodal characteristics. Observation: an additional deep model for learning the multimodal characteristics is required; finding the fine-grained relationship between image regions and text is required.

4. Xu et al. [6], "Visual-textual sentiment classification with bidirectional multilevel attention networks". Proposed work: a new bidirectional multilevel attention (BDMLA) technique. Merits: achieves bidirectional attention and multilevel associations between image and textual information for emotion classification; the model focuses on demonstrative image regions related to the corresponding text narrative through a planned visual attention grid. Demerits: exploring the effect of social networks on opinion study of public pictures is required. Observation: proposes the discriminative features and the inner relationship among images.

5. Ortis et al. [7] (2020), "Exploiting objective text description of images for visual sentiment analysis". Proposed work: addressed the problem of image opinion study, converging on the valuation. Merits: exploits the associations among visual and textual structures. Demerits: cannot handle images of different structures. Observation: deep visual representations are to be considered; examining the assignment is required.

6. Ye et al. [8], "Visual-textual sentiment analysis in product review". Proposed work: a Tucker fusion method. Merits: a new deep Tucker fusion technique addressed the problem of visual-textual sentiment analysis. Demerits: not addressed on all the discriminative image-sentence characteristics. Observation: the four kinds of interconnected structures should be exploited.

7. Yang et al. [9], "Sentiment analysis for e-commerce product reviews in Chinese based on sentiment lexicon and deep learning". Proposed work: a new emotion analysis model, SLCABG. Merits: by analyzing customer feedback, assistance can be provided to merchants on e-commerce platforms to obtain user feedback in time, improve their products, and attract more customers. Demerits: can merely separate emotion into positive and negative classifications, which is not appropriate in areas with extraordinary requirements for sentiment analysis quality and refinement. Observation: study of fine-grained opinion classification of images is required.

8. Kausar et al. [10], "A sentiment polarity categorization technique for online product reviews". Proposed work: a sentiment polarity categorization technique. Merits: ability to process different kinds of textual data. Demerits: has difficulty handling diverse styles such as cynicism. Observation: clarification of dissimilar aspects of consumer analyses of merchandise quality can be considered.
G. N. Ambika and Y. Suresh, A Survey on Image Emotion Analysis for Online Reviews
References
1. Doshi U, Barot V, Gavhane S (2020) Classification of images for social networks. In: 2020
IEEE international conference on convergence to digital world—quo vadis. ICCDW
2. Maas A, Daly RE, Pham PT, Huang D, Ng AY, Potts C (2011) Learning word vectors for
sentiment analysis. In: ACL
3. You Q, Cao L, Jin H, Luo J (2016) Robust visual-textual sentiment analysis: when attention
meets tree-structured recursive neural networks. In: ACM MM
4. Zhao Z, Zhu H, Xue Z, Liu Z, Tian J, Chua MCH, Liu M (2019) An image-text consis-
tency driven multimodal sentiment analysis method for social media. Inf Process Manag
56(6):102097
5. Huang F, Zhang X, Zhao Z, Xu J, Li Z (2019) Image–text sentiment analysis via deep
multimodal attentive fusion. Knowl-Based Syst 167:26–37
6. Xu J, Huang F, Zhang X, Wang S, Li C, Li Z, He Y (2019) Visual-textual sentiment classification
with bi-directional multi-level attention networks. Knowl-Based Syst 178:61–73
7. Ortis A, Farinella GM, Torrisi G, Battiato S (2020) Exploiting objective text description of
images for visual sentiment analysis. Multimedia Tools and Appl 1–24
8. Ye J, Peng X, Qiao Y, Xing H, Li J, Ji R (2019) Visual-textual sentiment analysis in product
reviews. In: 2019 IEEE International conference on image processing (ICIP). IEEE, pp 869–873
9. Yang L, Li Y, Wang J, Sherratt RS (2020) Sentiment analysis for e-commerce product reviews
in Chinese based on sentiment lexicon and deep learning. IEEE Access 8:23522–23530
10. Kausar S, Huahu X, Shabir MY, Ahmad W (2019) A sentiment polarity categorization technique
for online product reviews. IEEE Access
11. Pang B, Lee L et al (2008) Opinion mining and sentiment analysis. Found Trends Inf Retr
2(1–2):1–135
12. Bollen J, Mao H, Pepe A (2011) Modeling public mood and emotion: twitter sentiment and
socio-economic phenomena. In: ICWSM
13. Hu X, Tang J, Gao H, Liu H (2013) Unsupervised sentiment analysis with emotional signals.
In: WWW
14. Ren Y, Zhang Y, Zhang M, Ji D (2016) Context-sensitive twitter sentiment classification using
neural network. In: AAAI
15. D’Avanzo E, Pilato G (2015) Mining social network users’ opinions’ to aid buyers’ shopping
decisions. Comput Hum Behav 51:1284–1294
An Efficient QOS Aware Routing Using
Improved Sensor Modality-based
Butterfly Optimization with Packet
Scheduling for MANET
S. Arivarasan (B)
Sathyabama Institute of Science and Technology, Chennai 600119, Tamil Nadu, India
S. Prakash
Department of Electronics and Communication Engineering, Bharath Institute of Science and
Technology, BIHER (Deemed To Be University), Chennai 600073, Tamil Nadu, India
S. Surendran
Department of Computer Science and Engineering, Tagore Engineering College, Chennai 600127,
Tamil Nadu, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_34
1 Introduction
2 Literature Review
An energy-aware ad hoc on-demand multipath distance vector routing protocol was introduced by Kokilamani and Karthikeyan [10] as a novel path selection strategy utilizing an energy factor. The proposed system aims to address the aforementioned issues by choosing energy-aware nodes along the path: a node is considered during route selection only if its energy value is greater than the energy threshold. The NS2 simulator is used to evaluate the developed model, and the results are significant [10].
Chen et al. [11] created a topological-change-adaptive ad hoc on-demand multipath distance vector routing protocol that can adapt to high-speed node mobility while maintaining quality of service. A stable path selection method is designed in this protocol that takes into account not only the node's resources (residual energy, available bandwidth, and queue length) but also the link stability probability between nodes as path selection parameters. Moreover, in order to adapt to fast topology changes, the protocol includes a link interrupt prediction mechanism that modifies the routing strategy depending on periodic probabilistic estimations of link stability. On the NS2 platform, various scenarios with node speeds ranging from 10 to 50 m/s, data rates ranging from 4 to 40 kbps, and node counts ranging from 10 to 100 were simulated. The findings demonstrate that when the node speed exceeds 30 m/s, the proposed protocol's quality-of-service metrics (PDR, E2E delay, throughput) are considerably improved, and they are higher still when the node speed is below 30 m/s [11].
For MANET, Saravanan et al. [12] devised a PSO-based method built on expected transmission count (ETX) metrics. The technique shows that, when repeating computations related to network load, the sender should select the optimal route; otherwise the unusually large number of transmissions has a significant effect on the system's efficiency. The PSO method improves route choice by integrating ETX metrics: the ETX metric values are used in the PSO-based method to scale back the transmissions needed to deliver packets efficiently to their specified destinations. ETX considers only the total number of transmissions when PSO selects the best route. Simulation outcomes demonstrate that the PSO-ETX strategy performs better with regard to delivery ratio, time delay, and throughput [12].
In MANET, Kasthuribai and Sundararajan [13] developed a safe and QoS-based energy-aware multipath routing scheme. A particle swarm optimization-gravitational search method is proposed for multipath route selection; this method selects the network's energy-efficient multipath routes. A route's link quality may degrade after a certain number of transmissions, so the cuckoo search method, which is based on cuckoo behavior, is used to choose an optimal path from the network's existing paths. Performance measurements of the designed work demonstrate that it improves energy efficiency and network lifespan [13].
A unique dynamic time-division multiple access scheduling approach for
MANETs was proposed by Ye et al. [14]. To begin, a service priority-based dynamic
TDMA scheduling method is described, which uses service priority as a reference
parameter for slot assignment while also taking into account transmission throughput
and E2E delay performance. A MD-CCH approach is also provided to improve the
frame structure for better slot utilization across the system. The unique strategy is
created by combining the SP-DS and MD-CCH approaches. Simulation findings indi-
cate that suggested method performs better with regards to slot usage, slot allocation
competence, E2E delay, and transmission throughput [14].
3 Proposed Methodology
Let us model the problem space as a graph G = (V, E): nodes are represented by vertices, and all the nodes in the network form the set V. A link between two nodes is indicated by an edge, and the set of all links is E. When the distance d between two nodes is at most the transmission range r (d ≤ r), there is a two-way link e (e ∈ E) between them. P denotes the set of paths from source s (s ∈ V) to destination D (D ∈ V). The collections of all edges and all nodes of a path p (p ∈ P) are denoted E(p) and N(p), respectively. TC indicates the quality of service of a path on T from s to destination d.
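As a minimal sketch, the network model above can be expressed in code; the node coordinates are illustrative assumptions, while the range of 250 m matches the transmission range used in the simulation setup.

```python
# Minimal sketch of the network model G = (V, E): a bidirectional link
# exists between two nodes whenever their distance d <= range r. The node
# coordinates are illustrative placeholders.
import math

r = 250  # transmission range in metres

nodes = {1: (0, 0), 2: (100, 0), 3: (400, 0)}  # V: node id -> coordinates

def dist(a, b):
    (x1, y1), (x2, y2) = nodes[a], nodes[b]
    return math.hypot(x2 - x1, y2 - y1)

# E: every unordered pair of nodes with d <= r gets a two-way link
E = {(a, b) for a in nodes for b in nodes if a < b and dist(a, b) <= r}
```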
Each radio link has its own cost, which includes factors such as energy, reliability, bandwidth, static resource capacity, quality, and delay. The MANET's mobile nodes are first configured in the chosen dynamic environment; mobile nodes serve as both host and router, and the links among mobile nodes are established within the network region. The mobile nodes' coordinates are calculated, allowing their location and velocity to be determined later. m is the number of nodes in the MANET, indexed as 1 < i < m. The second stage of the suggested multipath selection scheme is path discovery: the number of links among nodes connected by the relevant communication channel determines the path from the source to the destination mobile node. Let P be the total number of paths connecting the source and destination nodes, indexed as 1 < j < P.
The QoS routing issue is modeled as an optimization problem whose prime goal is to determine optimal paths by taking into account energy, reliability, bandwidth, static resource capacity, quality, and delay.
Energy: Node’s energy is computed for each potential path in MANET. Because
the MANET is battery-powered, it uses less energy, resulting in higher performance.
For improved communication between mobile nodes, the energy parameter should be
set to its maximum value. The energy associated with nodes also affects the network’s
lifetime. The energy function is defined as follows:
$$R_{energy} = \frac{1}{mn \cdot p} \sum_{i=1}^{mn} \sum_{\substack{j=1 \\ i \in j}}^{p} En_{ij} \qquad (1)$$

Here, 1 < i < mn indexes the mobile nodes, 1 < j < p indexes the paths, and $En_{ij}$ is the energy of each node $i$ on the resulting route $j$. The energy value is defined as

$$En_{ij} = PT_{ip}^{j} \cdot En_{i}^{TX} + PR_{p}^{j} \cdot En^{RX} \qquad (2)$$
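The averaging in Eq. (1) can be sketched as below; the `En` values are placeholders rather than energies derived from Eq. (2)'s packet counts.

```python
# Sketch of Eq. (1): average node energy over all candidate paths.
# En[i][j] would come from Eq. (2) (packet counts times per-packet TX/RX
# energies); the values used below are placeholders.

def r_energy(En, paths):
    """En: {node: {path: energy}}; paths: list of path ids."""
    mn, p = len(En), len(paths)
    total = sum(En[i][j] for i in En for j in paths if j in En[i])
    return total / (mn * p)

En = {1: {0: 2.0, 1: 4.0}, 2: {0: 2.0, 1: 4.0}}
avg = r_energy(En, paths=[0, 1])  # (2 + 4 + 2 + 4) / (2 * 2) = 3.0
```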
$$R_{p} = \prod_{l=1}^{hop_{p}} R_{link_{l}} \qquad (4)$$

where $R_{link_l}$ is the reliability of link $l$ and $hop_p$ is the total number of available links on path $p$.
Bandwidth: The bandwidth is calculated as
Static resource capacity (SRC): The packet queue size $P_q$ (MB), the CPU speed $P_{CPU}$ (GHz), the battery power $M_b$ (mW), and the maximum available bandwidth $BandWidth$ (kbps) together describe a node's static resource capacity (SRC). SRC is computed through

$$SRC = \gamma \cdot P_q + \lambda \cdot P_{CPU} + \beta \cdot M_b + \alpha \cdot BandWidth \qquad (6)$$

Here, $\gamma$, $\lambda$, $\beta$, and $\alpha$ are the weights of the node characteristics, and their sum is 1.
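Eq. (6) amounts to a weighted sum; a small sketch follows, with equal weights as an illustrative assumption (the text only requires that the weights sum to 1).

```python
# Sketch of Eq. (6): SRC as a weighted sum of queue size, CPU speed,
# battery power, and bandwidth. The equal weights are an illustrative
# assumption; the only constraint stated is that they sum to 1.

def src(Pq, Pcpu, Mb, bandwidth, gamma=0.25, lam=0.25, beta=0.25, alpha=0.25):
    assert abs(gamma + lam + beta + alpha - 1.0) < 1e-9  # weights sum to 1
    return gamma * Pq + lam * Pcpu + beta * Mb + alpha * bandwidth

value = src(Pq=4, Pcpu=2, Mb=8, bandwidth=2)  # 0.25 * (4 + 2 + 8 + 2) = 4.0
```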
Quality: The link quality between a node $N$ and its 1-hop neighbors is computed as

$$quality = \frac{M_{1hopRec}}{TM_{1hopSent}} \qquad (7)$$
$$delay = \frac{\sum_{N=0}^{n} (\text{received time} - \text{sent time})}{N} \qquad (8)$$
$$F(x) \rightarrow Fitness = \frac{1}{p} \sum_{i=0}^{p} TC_i \qquad (9)$$
$$f = c\,I^{a} \qquad (10)$$
where $f_i$ is the perceived fragrance level, that is, how strongly the fragrance is smelled by the $i$th butterfly, $c$ is the sensory modality, $I$ is the objective function value (node fitness), and $a$ is the power exponent dependent on modality that accounts for varying absorption levels. The global search and the local search stages are the two most important phases of the algorithm. Fragrance levels rise in an admired area, so in this manner a butterfly produces a fragrance that may be noticed from anywhere in the region. In the initial (global) search stage, nodes take a step toward the fittest node $g^{*}$, which may be depicted as
$$x_i^{t+1} = x_i^{t} + \left(r^2 \times g^{*} - x_i^{t}\right) \times f_i \qquad (11)$$
where $x_i^{t}$ is the solution vector $x_i$ for the $i$th node at iteration $t$, and $g^{*}$ denotes the current best solution (node) found among all solutions (all the nodes) at the present stage. The $i$th butterfly's fragrance is denoted $f_i$, and $r$ is an arbitrary value in [0, 1]. The neighborhood (local) search stage is formulated as
$$x_i^{t+1} = x_i^{t} + \left(r^2 \times x_j^{t} - x_k^{t}\right) \times f_i \qquad (12)$$
where $x_j^{t}$ and $x_k^{t}$ are the $j$th and $k$th nodes from the search space. When $x_j^{t}$ and $x_k^{t}$ belong to the same swarm and $r$ is an arbitrary value in [0, 1], then (12) turns into a neighborhood random walk. Butterflies may search for food and a mating partner at both a local and a global scale, so as part of BOA a switch probability $p$ is utilized to switch between the standard global search and the local search.
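The moves in Eqs. (10)-(12) can be sketched as below; the switch probability value and all numeric inputs are illustrative assumptions, not parameters from the paper.

```python
# Sketch of the BOA moves: fragrance f = c * I**a (Eq. 10), a global step
# toward the best node g* (Eq. 11), and a local step between two other
# nodes (Eq. 12), chosen by a switch probability p (value assumed here).
import random

def fragrance(c, I, a):
    return c * I ** a  # Eq. (10)

def boa_step(x, g_star, x_j, x_k, f_i, p=0.8, rng=random.random):
    r = rng()
    if rng() < p:
        return x + (r ** 2 * g_star - x) * f_i  # global search, Eq. (11)
    return x + (r ** 2 * x_j - x_k) * f_i       # local random walk, Eq. (12)
```

A deterministic `rng` can be injected to trace a single step by hand.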
Improved Sensor Modality-based Butterfly Optimization (ISMBO) Algorithm
In the conventional butterfly optimization algorithm, a static value of the sensory modality $c$ is used throughout the search process; however, this does not perform well. Theoretically, a large value of $c$ enables the butterflies to explore new search space but has an adverse effect on convergence toward the global optimum, whereas a small value of $c$ also gives poor results. This means $c$ has a great effect on the searching abilities of the butterflies, and modifying its value according to the stage of the optimization process benefits the performance of the algorithm. Hence, in this work, an improved adjusting strategy for the sensory modality is designed and used. In ISMBO, the value $c_{new}$ is calculated as
$$c_{new} = c_t + \frac{0.02}{c_t \times t_{Max}} \qquad (13)$$

where $t$ is the current iteration of the algorithm and $t_{Max}$ is the maximum number of iterations.
Using these concepts, in ISMBO the fragrance is updated as
$$f = c_{new}\,I^{a} \qquad (14)$$
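The ISMBO adjustment of Eqs. (13)-(14) is a small change over the static-modality BOA; a sketch with illustrative inputs:

```python
# Sketch of the ISMBO change: the sensory modality is nudged each
# iteration (Eq. 13) before computing the fragrance (Eq. 14). The input
# values below are illustrative.

def update_modality(c_t, t_max):
    return c_t + 0.02 / (c_t * t_max)  # Eq. (13)

def ismbo_fragrance(c_t, t_max, I, a):
    return update_modality(c_t, t_max) * I ** a  # Eq. (14)

c_next = update_modality(0.1, 100)  # 0.1 + 0.02 / 10 = 0.102
```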
Once the paths are selected, the packets are transmitted through the selected path according to the allocated time slots. A modified TDMA-based packet transmission method for MANET is presented in this work in order to prevent transmission collisions, achieve high levels of power conservation, and extend network lifetime. In the conventional TDMA frame architecture, control packets are broadcast in the control phase, and the control slots of the control phase are not utilized for data packet transmission. To reduce control packet slot usage, the conventional TDMA frame structure is modified by removing the control phase from the TDMA frame.
The MTDMA frame structure is shown in Fig. 2. It has a smaller control phase than a conventional TDMA frame. Suppose a control packet is broadcast during a data slot of the data phase: if the node gets the response control packet, this data slot is reserved, and data packets are then broadcast in the designated data slot. A TDMA frame in the MTDMA model consists only of a large number of data slots. Depending on the needs of the node, each data slot is utilized to broadcast data packets, control packets, or both; when required, a data packet may be broadcast along with the control packet. Data packet conflicts in broadcasting are avoided by using a specific slot reservation algorithm in the modified TDMA frame.
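The slot-reservation idea above can be sketched as follows; the frame size and the boolean handshake result are assumptions standing in for the actual control-packet exchange.

```python
# Illustrative sketch of the MTDMA idea: the frame holds only data slots,
# and a slot is reserved for data once the control handshake carried in
# that slot succeeds. Frame size and the handshake flag are assumptions.

class MTDMAFrame:
    def __init__(self, n_slots=8):
        self.reserved = [None] * n_slots  # no separate control phase

    def request_slot(self, node, slot, got_response):
        # A control packet goes out inside a data slot; on a response,
        # the slot is reserved and data packets follow in it.
        if got_response and self.reserved[slot] is None:
            self.reserved[slot] = node
            return True
        return False

frame = MTDMAFrame()
ok = frame.request_slot("n1", slot=3, got_response=True)
```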
4 Experimental Results
In this proposed research work, the simulation was conducted using the NS2 simulator, with 50 mobile nodes placed randomly in a 1000 × 1000 square area. The speed of each node varies from 5 to 30 m/s following a random waypoint model, and each node's transmission range is set at 250 m. The performance of the proposed improved sensor modality-based butterfly optimization for quality-of-service-aware routing with packet scheduling (ISMBOQAR-PS) is compared with the previous QoS-aware differential ant-stigmergy (QDAS) and red deer algorithm-based energy-efficient QoS routing (RDA-EQR) schemes with regard to throughput, PDR, energy usage, and E2E delay.
Four measures are considered for evaluating performance: (i) throughput; (ii) packet delivery ratio (PDR), the ratio of total packets received to total packets sent; (iii) energy consumption, the total energy consumed in completing successful data transmission; and (iv) end-to-end (E2E) delay, the total time taken to deliver a data packet from source to destination.
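Metrics (ii) and (iv) can be computed directly as sketched below; the sample values are placeholders, not results from the paper.

```python
# Sketch of metrics (ii) and (iv): PDR as received/sent packets, and
# average end-to-end delay over (sent, received) timestamp pairs.
# The sample values are placeholders, not simulation results.

def pdr(received, sent):
    return received / sent

def avg_e2e_delay(times):
    """times: list of (sent_time, received_time) pairs."""
    return sum(rx - tx for tx, rx in times) / len(times)

ratio = pdr(90, 100)                     # 0.9
delay = avg_e2e_delay([(0, 2), (1, 3)])  # 2.0
```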
Fig. 3 Throughput (kbps) versus number of nodes
5 Conclusion
References
Abstract Electricity theft in public spaces is increasing day by day and has affected the regular electricity management process. Identifying the theft of energy in public connections has been a tedious task in electricity management. As a result, a system that can identify power theft and make the necessary decisions under both normal and theft conditions is required. The idea presented here proposes the use of microcontrollers to monitor the electricity distribution system, which can be readily integrated with current electronic meters. The system continuously measures the amount of power delivered by the distribution unit as well as the amount of energy consumed at the consumer's location. If energy is tapped straight from an overhead distribution feeder, we can discover it and take appropriate action immediately by comparing the two values; the nontechnical losses at the customer's site can likewise be detected. The proposed system communicates in two directions, namely with the consumers and with the utility company. A web application is being developed for consumers to obtain data on power consumption, along with a desktop application for real-time data monitoring by the administration, through which the service can also be terminated on the basis of power theft and reconnected on specific measures.
1 Introduction
The need for electricity has grown in response to the growing demand for modern equipment and the general public's desire for a more opulent lifestyle. The theft of electricity has resulted in power distribution problems: if electricity is illegally consumed, the country's economic situation is severely disrupted. It is critical to control electrical power usage and make optimum use of it without wasting it.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_35
S. Saadhavi et al.
It is easier to find power theft from the legal consumer's meter than to check the same with illegal consumers. There are various types of power theft, such as energy meter by-passing, removing the wires from the energy meter, and energy meter tampering. Electricity theft occurs mainly at two places: the distribution line and the energy meters attached at home. There is a need for an electricity management system to find the amount and the location of the power theft. The only feasible way is to compare the consumer power consumption data with the power distribution transformer data.
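The comparison described above can be sketched as a simple balance check; the tolerance allowing for normal technical losses is an illustrative assumption.

```python
# Sketch of the comparison above: flag theft on the line when the power
# delivered by the transformer exceeds the sum of metered consumer loads
# by more than a tolerance. The tolerance value (covering normal
# technical losses) is an illustrative assumption.

def detect_line_theft(transformer_kw, consumer_kws, tolerance_kw=0.5):
    metered = sum(consumer_kws)
    return (transformer_kw - metered) > tolerance_kw

alert = detect_line_theft(10.2, [3.0, 3.1, 3.2])  # about 0.9 kW unaccounted
```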
In this paper, we propose a model to check for power theft and inform the authority. Monitoring the electricity distribution system helps in electricity management and in finding the misuse of power consumption. The hardware module consists of an Arduino board connected to sensors deployed for real-time monitoring of the power supply and theft-related aspects. The current sensors monitor the power-related data and send it to the Arduino board, and the data collected from the electricity distribution system is stored in a GoDaddy-hosted database. A desktop application to monitor the theft has been developed; it enables the administrator to observe power theft and also to take action by automatically disconnecting the power supply through a relay module.
The desktop application also monitors energy theft by meter tampering: based on the customer's tampering details, suitable action is taken to disconnect the power supply through the relay module. A web application is used at the customer site to get information on energy consumption, with email generation in case of power disconnection due to theft.
2 Related Work
In [1], R. E. Ogu and G. A. Chukwudebe have proposed a functional system for detecting electricity theft. They suggest a preventive mechanism to avoid energy theft using an IoT platform. An Arduino MKR1000 microcontroller board coordinates the overall functionality, and an infrared sensor detects theft when humans try to open the sensitive part of the meter.
In [2], Prashant Choudhary and Jitendra Nath Bera have proposed an algorithm to calculate the voltage drop across the power distribution network. A mechanism is proposed to identify theft locations and send SMS messages with energy consumption information.
In [3], Mohd. Uvais has implemented a detection system to find power theft in the distribution line. The system is implemented using MATLAB, and a controller is designed to read the voltage and current readings from the LT side of the distribution transformer.
In [4], Aishwarya P. Kamatagi et al. have proposed a system using IoT to monitor the energy meter readings. A Raspberry Pi connected to the energy meter monitors the variations in the meter readings and prints them.
IoT Based Electricity Theft Monitoring System
In [5], Muhammad Badar Shahid et al. have designed a power theft detection system. The system monitors theft using consumer load profiling, whether the theft happens at the customer's place or on the power distribution line. A prevention algorithm is designed to monitor the theft and take suitable action: when power theft occurs, measures are taken to disconnect the legal consumers and to send a high-voltage pulse onto the distribution line.
In [6], H. E. Amhenrior et al. have designed and implemented an automatic tamper detection and reporting system. The system identifies bypass internal theft and external theft on the service cable from the electric poles. It consists of a developed single-phase prepayment energy meter and a supply-authority Global System for Mobile Communications (GSM)-capable device platform. Wireless current transducers are used to detect external bypass energy theft.
In [7], M. M. Mohamed Mufassirin et al. have designed and proposed a model to detect electricity theft in Sri Lanka without human interaction. The implemented system detects energy theft and sends an alert message to the authorized energy provider, using Global System for Mobile Communications (GSM) technology.
In [8], Makarand Sudhakar Ballal et al. have presented a theft detection and prevention system based on logic control with a consumer care unit. The system identifies the pilferage locations and estimates the power stolen by illegal consumers. It maintains the voltage regulation of legal consumers and has helped ensure proper revenue collection for electricity.
In [9], Jaya Deepthi B et al. have proposed and designed an electricity theft monitoring and detection system built using Arduino and GSM modules. The power usage data and the amount of electricity stolen are displayed on an LCD; the percentage of theft is found from the difference between the power used and the power stolen. The status of the power theft is sent to the electricity board and the consumer.
In [10], Kumar Nalinaksh et al. have proposed a power theft detection system based on a grid and power discoms. The proposed solution is used as an add-on to the existing system, and it monitors and records the theft statistics without any human intervention.
In [11], Sukumar P. et al. have proposed and implemented a system to monitor power theft and identify the location of the theft. The designed system consists of a microcontroller and a ZigBee module to check for electricity theft, and an alarm system sends an alert signal to the user. The software tools used are AVR Studio and the WinAVR compiler.
In [12], A. U. Kulkarni et al. have proposed a solution to the electricity theft problem using IoT. The system consists of an Arduino Uno and a Raspberry Pi 3, which monitor the lines connected in parallel. Once theft is detected, its location is identified and reported to the admin through SMS.
In [13], Zahoor Ali Khan has proposed a supervised learning technique to detect electricity theft. The model is efficient compared to other existing models because the preprocessing of the data is done using the sigma rule and a normalization method. The ADASYN algorithm is applied to overcome the class imbalance problem.
In [14], Mahima Singh et al. have proposed a smart energy meter to keep track of the number of units used. The system monitors theft using wireless sensor networks: the integrated technology measures the consumer's normal power consumption and also identifies theft occurring in the distribution line.
In [15], Anish Jindal et al. have discussed various data-driven techniques that can be used for the detection of electricity theft in smart grid infrastructures. Two types of theft detection, at the meter level and at the aggregate level, are discussed; the values at the meter level and at the aggregate level can be used to identify anomalous measurements.
Problem Statement
Electricity pilferage constitutes the nontechnical losses in the power distribution system. Although there are no approved estimates of theft, it is assumed to be around 30%. Due to power theft, true bills are not generated by the meter, and the electricity authority cannot collect the actual charges for electricity usage. Power theft has resulted in significant revenue losses for the responsible authority, creating a funding crisis for investment in the power system and demanding an expansion in generating capacity to cope with the power losses. Yet the attitude toward electricity theft control remains weak, and it is difficult to identify theft manually. Considering these issues, there is a need for real-time automatic theft detection systems that can be easily incorporated into the existing infrastructure.
3 Existing System
The earlier energy meter was built on an aluminum disc and showed the voltage and current values in real time. But since no safety measures were used, thieves could easily tamper with the system to misuse the power supply, and several techniques are employed to tamper with the electricity meter; such meters have no mechanism to detect and deal with the theft. To handle the misuse of power, electronic meters were designed to overcome issues such as meter data accuracy and power theft. An electronic meter displays the amount of power consumed on an LED display, but a technician is still needed for the monthly billing process.
4 Proposed System
5 Design
The power theft can be identified by the additional current flowing in the distribution line. The theft of power supply is monitored, and once the usage crosses the limit, immediate action is taken by the admin or the controller. Based on the amount of theft, a decision is taken to prevent external tapping on the distribution line. Similarly, if the difference between the amount of power in the phase and neutral wires exceeds a limit, meter tampering at the consumer's side can be detected.
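The phase-neutral check just described can be sketched as a single comparison; the limit value is an illustrative assumption.

```python
# Sketch of the phase-neutral check above: a sustained difference beyond
# a limit suggests meter tampering at the consumer's side. The limit
# value is an illustrative assumption.

def tamper_detected(phase_amps, neutral_amps, limit_amps=0.25):
    return abs(phase_amps - neutral_amps) > limit_amps
```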
The proposed system is designed to solve many of the existing issues of power theft. A wireless network is used to establish reliable and effective communication within the system. The system monitors the distribution line and detects power theft based on the data received at that instant; the theft location can be determined from the additional current consumed at a particular place. The electricity inspection linemen inform the electricity authority in case of a direct power theft from the distribution line, and the controller simultaneously sends a signal to the relay to block the electricity. If theft is detected at the customer's site, an alert message asks the customer to remove the theft load, and the system is reinitialized after the rectification.
Fig. 1 shows the block diagram of the components of the electricity monitoring system. The system consists of Arduino and NodeMCU microcontrollers, a relay module, and current sensors.
482 S. Saadhavi et al.
6 Implementation
• Arduino IDE is open-source software whose editor is used for writing the required code and whose compiler is used for compiling and uploading the code to the given hardware module.
• Microsoft Visual Studio is used to develop an ASP.NET web application for consumer login and a C#.NET Windows Forms desktop application for the administrator.
7 Results
Initially, when the system is turned on, the microcontroller reads the values from the current sensors. It takes around 30 seconds for the microcontroller to calculate the sampling values for the first time. The system collects sensor information every 3 seconds, and the sensed data is aggregated in the microcontroller to obtain the instantaneous value.
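The aggregation step can be illustrated with a root-mean-square computation: samples gathered over the sampling window are reduced to one instantaneous value. The actual firmware runs on the Arduino; this Python version is only a sketch of the calculation:

```python
import math

def rms(samples):
    """Reduce a window of raw current samples to one instantaneous RMS value."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))
```

For an alternating current, RMS is the standard aggregate because the raw samples average to roughly zero over a full cycle.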
Fig. 3 shows the working model of the electricity monitoring system, consisting of an SCT013 current sensor for measuring alternating current across a wire, an Arduino Uno (R3) for reading the current sensor values, a NodeMCU ESP8266 with built-in Wi-Fi for uploading sensor values to the cloud, and a 5 V single-channel relay used to disconnect the power supply under theft conditions. Jumper wires interconnect the components, AC mains cords supply power, and bulbs are used to demonstrate the theft condition.
Fig. 3 Electricity monitoring system
The admin clicks the 'start monitoring' button to read the sensor values. Fig. 4 shows the desktop application designed for the admin. The purpose of this application is to monitor the customer's power consumption status. Monitoring for a particular customer is done by entering their email and clicking the start monitoring button. The power consumption values are read, stored in the cloud, and then sent to the desktop application for further processing. Each value is added to a list view control, and an appropriate log message is generated. The values indicate three conditions: normal, internal theft, and external theft.
During normal power consumption, the buttons used for indicating internal and external theft remain green, as shown in Fig. 5. In case of internal theft, the condition can be detected from the power consumption values, as shown in Fig. 6. The button for internal theft turns yellow on the first detection of theft and turns red if the condition persists. A theft warning mail is sent to the customer, and the power supply is automatically disconnected. The power supply is reconnected only after the values return to the normal condition and the internal theft button turns back to green.
The desktop application is also able to detect external theft: the corresponding button changes to red. The admin notices this change and informs the respective authorities to check the power supply lines and take the necessary actions. The button for external theft turns green only after all issues are resolved and normal values are again read by the sensors.
The consumer application consists of registration, login, and view-consumption pages. The power consumption page contains details such as the date, time, phase value, and the summation of the total energy consumed. The phase value is the product of the voltage and the current consumed. Fig. 7 shows the consumer login page, and Fig. 8 shows the power consumption details page of the consumer application.
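The quantities shown on the consumption page can be computed as follows. This is a sketch; the function and variable names are ours, not taken from the application code:

```python
def phase_value(voltage_v, current_a):
    """Phase value = product of voltage and current consumed (watts)."""
    return voltage_v * current_a

def total_energy_kwh(power_readings_w, interval_h):
    """Sum fixed-interval power readings (in watts) into total energy in kWh."""
    return sum(power_readings_w) * interval_h / 1000.0
```

For example, a 230 V supply drawing 2 A gives a phase value of 460 W, and three half-hourly readings of 1000, 2000, and 1000 W sum to 2 kWh.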
The electricity monitoring system is designed to monitor power theft in the distribution system. The measured data is collected from the sensors and published to a public-cloud MQTT broker. The system also generates an alert and sends the theft information to the concerned authorities. The experimental results show that the electricity monitoring system effectively monitors and alerts against power theft. The designed system is reliable, low cost, and scalable, and it is capable of handling the data collected from the different sensors. The system helps the admin access and control theft remotely without human intervention, which will reduce electric line tampering to the maximum extent. The proposed framework is capable of resolving the highly prevalent issue of power theft. By designing appropriate devices, the functionality of the system can be moved to the edge for easy implementation in the existing system. Online billing notification through SMS or email could also be linked with this system.
References
13. Khan ZA, Adil M, Javaid N, Saqib MN, Shafiq M, Choi J-G (2020) Electricity theft detection
using supervised learning techniques on smart meter data. Sustainability 12:8023. https://doi.
org/10.3390/su12198023, http://www.mdpi.com/journal/sustainability
14. Singh M, Kumari A, Goyal V, Kumar P (Feb 2019) Energy theft detection by smart energy
meter using WSN in real time. Int J Eng Res Technol (IJERT) 8(2) ISSN: 2278-0181
IJERTV8IS020089
15. Jindal A, Schaeffer-Filho A, Marnerides AK, Smith P, Mauthe A, Granville L (30 March
2020) Tackling energy theft in smart grids through data-driven analysis. In: 2020 international
conference on computing, networking and communications (ICNC), IEEE Xplore, INSPEC
accession number: 19493849. https://doi.org/10.1109/ICNC47757.2020.9049793
An Exploration of Attack Patterns
and Protection Approaches Using
Penetration Testing
1 Introduction
Data privacy and information security are at the top of the priority list for organizations nowadays; all sensitive business information demands protection to build competitive success. Therefore, penetration testing examines an organization's information technology infrastructure, which incorporates software, hardware,
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 491
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_36
492 K. Barik et al.
penetration testing explores vulnerabilities among employees and includes tailgating, phishing attacks, and eavesdropping [8]. In physical testing, an attempt is made to gain physical access in order to test the integrity of cameras, sensors, RFID systems, entry systems, and keypads [9]. Finally, cloud penetration testing measures the protection of cloud-based assets, including configuration, networks, credentials, procedures, and data sensitivity [10].
Three types of penetration tests conducted over independent networks in a protected environment are presented. Different penetration analysis tools within the Kali Linux platform are explored. The results are formulated and presented, along with defensive strategies to ward off those compromises.
The remainder of the paper is organized as follows. Section 2 discusses the related work. Section 3 comprises the proposed methodology, the laboratory environment setup, and the attacks performed. In Sect. 4, the different types of penetration attacks are captured and presented. Section 5 comprises the mitigation strategies, outcomes, and discussion. Finally, Sect. 6 concludes the paper and provides directions for future research.
2 Related Work
Reddy and Yalla [11] proposed a mathematical analysis of penetration testing, using simulation and graphs, to develop data security strategy. Guarda et al. [12] suggested a framework providing guidelines for penetration testing in virtual environments. Nagpure and Kurkure [13] performed vulnerability assessment and penetration testing of web applications using both manual and automated processes. They showed that the automated penetration testing process is more accurate than the manual process. Zitta et al. [14] performed penetration testing of the intrusion detection
and prevention system Suricata IPS tools in applications of security of embedded IoT
devices. Hasan and Meva [15] studied the mechanics of the VAPT process and gath-
ered tools that are useful during the VAPT process in web applications. Lyashenko
et al. [16] analyzed the effectiveness of detection of phishing attacks using real-time
data using different tools.
Salahdine and Kaabouch [17] presented a detailed study of social engineering
attacks, classifications, detection strategies, and prevention procedures. Rahalkar
[18] proposed a study on the essential details and configurations invoking other tools
in the Metasploit framework. Cayre et al. [19] proposed a new security audit and penetration testing framework called Mirage, dedicated to IoT systems. Patel [20]
surveyed current vulnerabilities and the tools used to determine them, in order to secure the organization from cybersecurity threats. Patel and Patel [21] presented an analytical
study of penetration testing using tools to enhance wireless infrastructure security.
Raj and Walia [22] performed system scanning and exploitation using the Metasploit framework tool. Pandey et al. [23] conducted a vulnerability assessment and penetration testing using a controlled setup on a Raspberry Pi 3B+. Alabdan [24] presented a comprehensive analysis of phishing attack techniques, their vectors, and other technical approaches. Lu and Yu [25] proposed monitoring, scanning, capturing, and data analysis on Wi-Fi networks using Kali Linux.
3 Proposed Methods
The Kali Linux platform [26], an open-source tool downloadable from www.kali.org, is used for this study. It comes with preinstalled tools to support information security assignments such as ethical hacking. Kali Linux was developed by Offensive Security [27], a renowned information security company primarily focused on advanced penetration testing and security auditing. Some significant points of Kali Linux are: it is customizable; it includes over 600 penetration testing tools; it has a custom kernel with the latest patches included; it supports multiple languages and GPG-signed packages; it is Filesystem Hierarchy Standard (FHS) compliant; and known bug issues are documented.
There are many penetration testing tools available in the market, and a comparison
study of various tools is presented in Table 1.
In this section, three types of penetration tests are conducted in a secure
environment.
This section discusses how to launch the credential harvester attack method [36]. This attack method clones a website to perform phishing attacks and obtain user credentials. Using Kali Linux, Metasploit, and the Social-Engineer Toolkit by TrustedSec [35], a clone of the application has been created. A clone URL of https://gmail.com is started and runs on port 80. Once the target user clicks the link on our secured test setup, it presents them with a replica of gmail.com. When the user logs in by entering a username and password, the user is redirected back to the legitimate site. This setup can capture the username and password of the user who logged in via the cloned URL. Fig. 1 shows the replica of gmail.com, and Fig. 2 represents the number of users who clicked and tried to log in using the cloned URL.
The web jacking attack creates a site clone and presents the target user with a link affirming that the website has moved to a new location [37]. A clone application is created using Kali Linux, Metasploit, and the Social-Engineer Toolkit by TrustedSec [35]. In this section, gmail.com is cloned; the URL floated over the link appears to be gmail.com. When the user clicks the 'moved' link, Gmail opens but is replaced with the malicious web server. The timing of the web jacking attack can be changed via the config/set_config flags. A standard website clone of https://gmail.com on port 80 is used, as represented in Fig. 3.
After clicking the link, users are redirected to the reproduced web page shown in Fig. 4.
Kali Linux, a virtual machine, and Android emulators are used for smartphone penetration testing [38]. An Android emulator acts as the Android device on which the penetration testing tasks are performed. First, start Kali Linux and log in as the root user through the virtual machine. Then create a deployable application using Kali Linux and Metasploit by entering the following command in the Kali Linux terminal: msfvenom -p android/meterpreter/reverse_tcp LHOST=<our IP address> LPORT=4444 R > pentest.apk. The generated pentest.apk file is shown in Fig. 5.
Load the Metasploit console using the msfconsole command and enter the multi/handler exploit, as shown in Fig. 6. Set the payload to android/meterpreter/reverse_tcp, the same one used when creating the APK file with msfvenom, and set the intended IP address and port 4444. Now transfer the .apk file to the target mobile device. After the app is installed on the target mobile device, the smartphone can be accessed through the Meterpreter session.
4 Results
These experiments apply the credential harvester attack, web jacking, and smartphone penetration testing on a secured testing platform using the Metasploit framework. Two computers, one for the attacker and one for the server, have been employed. The server computer runs the Windows 10 Professional operating system on an Intel(R) Core(TM) i7 processor at 5 GHz with 16 GB of RAM. Additionally, three virtual machines running Ubuntu Server 14.04.6 LTS x86 with 1 GB of RAM each are installed on the server.
Figure 7 presents the credential harvester attack, Fig. 8 shows the web jacking attack method, and Fig. 9 shows the mobile device penetration testing performed. Based on the results, the outcomes are classified into three categories: successful, partial, and failed. The blue color represents a successful attack, red shows partial success, and green shows unsuccessful attacks.
The credential harvester experiment is conducted among 40 users, of which nine attempts are successful, six are partial, and 25 fail. Successful means the user responded fully to the cloned URL, and partial means the user only partially reacted to the cloned URL.
The pie charts summarize the outcome shares: 22% successful, 15% partial, and 63% failed for credential harvesting; 20% successful, 53% partial, and 27% failed for web jacking; and 40% successful, 15% partial, and 45% failed for smartphone penetration testing.
In the web jacking attack method, attacks are performed among 40 users, of which eight attempts are successful, twenty-one are partial, and eleven fail. In mobile device penetration testing, attacks are performed among 40 users, of which sixteen are successful, six are partial, and eighteen fail.
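The percentage shares reported in the figures follow directly from these counts. A quick check (the helper below is ours, reporting one decimal place before the figures' rounding):

```python
def shares(successful, partial, failed):
    """Percentage share of each outcome among all attempted attacks."""
    total = successful + partial + failed
    return tuple(round(100 * n / total, 1) for n in (successful, partial, failed))

# 40 users per experiment:
credential_harvesting = shares(9, 6, 25)   # (22.5, 15.0, 62.5)
web_jacking = shares(8, 21, 11)            # (20.0, 52.5, 27.5)
smartphone_pentest = shares(16, 6, 18)     # (40.0, 15.0, 45.0)
```

These agree with the pie-chart figures up to whole-percent rounding.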
5 Discussions
Protecting ourselves and building awareness among users against digital crimes are the major requirements. First, identify the vulnerabilities and plan to patch them. This section illustrates mitigation measures for the three attacks performed in the laboratory setup.
Credential Harvester Attack Method
User awareness and employee response are significant aspects; organizations should also deploy up-to-date antiphishing and antivirus software. Users must be conscious of phishing and should not open any link from an unknown source. In addition, they should check the URL address details carefully before entering user credentials, to avoid being duped.
Web Jacking Attack Method
Web jacking is another type of social engineering phishing attack that illegally attempts to take command of a site. To prevent this attack, the user should not provide sensitive information to unknown links and should check the website URL carefully. Additionally, users should not assume a site is legitimate just because it looks plausible; a browser with an antiphishing detection program can be employed.
Smartphone Penetration Testing
The laboratory setup examined in this paper demonstrates remote control of an Android device using Kali Linux and Metasploit tools. To safeguard against such abuses, users should not download applications from unknown sources, should obtain applications only from trusted sites, and should run antivirus software with regular updates on their mobile devices.
The graph shown in Fig. 10 represents the typical reasons for and causes of security violations in organizations in 2020 [39]. As per the survey report, 34% of respondents attributed the security incidents their organization suffered to malware attacks, and 29% to data exposure [39]. The report [39] also counts 2,647,428 failed RDP login attempts under the username 'administrator', 376,206 under 'admin', and 9,384 under 'user'. Figure 11 presents a comprehensive breakdown of the failed RDP login attempts.
These experiments show that awareness is crucial in preventing malicious activities, including phishing attacks, smartphone hacking, ATM fraud, online banking fraud, etc. However, there are certain areas where this work can be substantially improved. For example, the experimental evaluation in this work is performed on a secured test platform, whereas a real-life attack scenario can vary to a great extent; in such a situation, there is always a chance of the aforementioned tools behaving unpredictably. Therefore, this work would benefit significantly from expanding the experimental scenario from a secure setup to real life.
Fig. 10 lists the reported breach causes: cryptojacking, malware, account compromise, exposed data, and ransomware. Fig. 11 shows that 83% of the failed RDP login attempts used the username 'administrator', followed by 'admin', 'user', 'ssm-user', and 'test'.
6 Conclusion
Different penetration testing processes are discussed in this paper, several factors to consider while conducting penetration tests are identified, and popular tools are utilized to perform the penetration tests. With the advancement of Internet technology and rapid digitization, information security is quite challenging for organizations and regular users alike. Penetration testing plays a significant role in closing the security analysis gap in an existing setup. Open-source penetration testing tools can be customized per user requirements and used in diverse domains. Three types of attacks are analyzed in a secured environment: the Credential Harvester Attack Method, the Web Jacking Attack Method, and Smartphone Penetration Testing. The attacks are analyzed under three different scenarios and presented with the corresponding mitigation techniques. The future scope is to explore and examine other cyberattacks and devise algorithmic strategies to prevent them.
Declaration The work is performed in a secure laboratory setup and does not possess any malicious
intent.
References
1. Weissman C (1995) Handbook for the computer security certification of trusted systems. Information Assurance Technology Analysis Center, Falls Church, VA
2. Denis M, Zena C, Hayajneh T (April 2016) Penetration testing: concepts, attack methods, and defense strategies. In: 2016 IEEE Long Island systems, applications and technology conference (LISAT). IEEE, pp 1–6
3. Shah S, Mehtre BM (2015) An overview of vulnerability assessment and penetration testing
techniques. J Comput Virol Hacking Tech 11(1):27–49
4. Shorter JD, Smith JK, Aukerman RA (2012) Aspects of informational security: penetration testing is crucial for maintaining system security viability. Technol Plann 13
5. Blackwell C (2014) Towards a penetration testing framework using attack patterns. In:
Cyberpatterns. Springer, Cham, pp 135–148
6. Shuaibu BM, Norwawi NM, Selamat MH, Al-Alwani A (2015) Systematic review of web
application security development model. Artif Intell Rev 43(2):259–276
7. Rahman A, Ali M (Aug 2018) Analysis and evaluation of wireless networks by implementation
of test security keys. In: International conference for emerging technologies in computing.
Springer, Cham, pp 107–126
8. Shindarev N, Bagretsov G, Abramov M, Tulupyeva T, Suvorova A (Sep 2017) Approach to identifying of employees profiles in websites of social networks aimed to analyze social engineering vulnerabilities. In: International conference on intelligent information technologies for industry. Springer, Cham, pp 441–447
9. Al Shebli HMZ, Beheshti BD (May 2018) A study on penetration testing process and tools. In: 2018 IEEE Long Island systems, applications and technology conference (LISAT). IEEE, pp 1–7
10. Mishra S, Sharma SK, Alowaidi MA (2020) Analysis of security issues of cloud-based web
applications. J Ambient Intell Humanized Comput 1–12
11. Reddy MR, Yalla P (March 2016) Mathematical analysis of penetration testing and vulnerability countermeasures. In: 2016 IEEE international conference on engineering and technology (ICETECH). IEEE, pp 26–30
12. Guarda T, Orozco W, Augusto MF, Morillo G, Navarrete SA, Pinto FM (Dec 2016) Penetration testing on virtual environments. In: Proceedings of the 4th international conference on information and network security, pp 9–12
13. Nagpure S, Kurkure S (Aug 2017) Vulnerability assessment and penetration testing of web application. In: 2017 international conference on computing, communication, control and automation (ICCUBEA). IEEE, pp 1–6
14. Zitta T, Neruda M, Vojtech L, Matejkova M, Jehlicka M, Hach L, Moravec J (Dec 2018)
Penetration testing of intrusion detection and prevention system in low-performance embedded
IoT device. In: 2018 18th international conference on mechatronics-mechatronika (ME). IEEE,
pp 1–5
15. Hasan A, Meva D (2018) Web application safety by penetration testing. Int J Advan Stud Sci
Res 3(9)
16. Lyashenko V, Kobylin O, Minenko M (Oct 2018) Tools for investigating the phishing attacks dynamics. In: 2018 international scientific-practical conference problems of infocommunications. Science and technology (PIC S&T). IEEE, pp 43–46
17. Salahdine F, Kaabouch N (2019) Social engineering attacks: a survey. Future Internet 11(4):89
18. Rahalkar S (2019) Metasploit. In: Quick start guide to penetration testing. Apress, Berkeley,
CA. https://doi.org/10.1007/978-1-4842-4270-4_3
19. Cayre R, Nicomette V, Auriol G, Alata E, Kaâniche M, Marconato G (Oct 2019) Mirage: towards a metasploit-like framework for IoT. In: 2019 IEEE 30th international symposium on software reliability engineering (ISSRE). IEEE, pp 261–270
20. Patel K (April 2019) A survey on vulnerability assessment & penetration testing for secure communication. In: 2019 3rd international conference on trends in electronics and informatics (ICOEI). IEEE, pp 320–325
21. Patel AM, Patel HR (March 2019) Analytical study of penetration testing for wireless infrastructure security. In: 2019 international conference on wireless communications signal processing and networking (WiSPNET). IEEE, pp 131–134
22. Raj S, Walia NK (July 2020) A study on metasploit framework: a pen-testing tool. In: 2020 international conference on computational performance evaluation (ComPE). IEEE, pp 296–302
23. Pandey R, Jyothindar V, Chopra UK (Sep 2020) Vulnerability assessment and penetration testing: a portable solution implementation. In: 2020 12th international conference on computational intelligence and communication networks (CICN). IEEE, pp 398–402
24. Alabdan R (2020) Phishing attacks survey: types, vectors, and technical approaches. Future
Internet 12(10):168. https://doi.org/10.3390/fi12100168
25. Lu HJ, Yu Y (2021) Research on WiFi penetration testing with Kali Linux. Complexity
26. https://www.kali.org/
27. https://www.offensive-security.com/
28. https://nmap.org/
29. https://www.tenable.com/products/nessus
30. https://www.metasploit.com/
31. https://www.wireshark.org/
32. https://www.ibm.com/jm/download/IBM_ISS_Overview.pdf
33. https://beefproject.com/
34. https://www.aircrack-ng.org/
35. https://www.trustedsec.com/tools/the-social-engineer-toolkit-set/
36. Boyanov PK, Savova ZN (Oct 2019) Implementation of credential harvester attack method in the computer network and systems. In: International scientific conference "Defense technologies," faculty of artillery, air defense and communication and information systems. Shumen, Bulgaria
37. Goutam A, Tiwari V (Nov 2019) Vulnerability assessment and penetration testing to enhance the security of web application. In: 2019 4th international conference on information systems and computer networks (ISCON). IEEE, pp 601–605
38. Alanda A, Satria D, Mooduto HA, Kurniawan B (May 2020) Mobile application security
penetration testing based on OWASP. IOP Conf Ser: Mater Sci Eng 846(1):012036. IOP
Publishing
39. SOPHOS (2021) Threat report. https://www.sophos.com/en-us/labs/security-threat-report.
aspx
Intrusion Detection System Using
Homomorphic Encryption
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 505
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_37
506 A. Singh et al.
1 Introduction
Our project lies in the domain of cybersecurity and mainly focuses on intrusion detection. Intrusion detection is one of the major research topics in the field today. With hackers using new techniques and technologies, there is increased interest in cyberattack detection systems, as we now face more advanced threats. For defense against cyberattacks such as Denial of Service (DoS), R2L, U2R, and Probe, Intrusion Detection Systems (IDS) are a valid and convenient solution. Many IDS rely on two techniques for efficient detection: (1) surveilling IT systems to collect data such as system logs and network packets, and (2) using detection models such as anomaly detectors, classifiers, and attack signatures to classify the system data [1]. Needless to say, a precise detection model plays a critical part in the operation of an IDS. Moreover, a sufficiently accurate IDS can be built only with an ample amount of historical data indicating attacks and good expertise in this field. Also, mitigation, prevention, and reaction after an attack has occurred require teams with well-defined skill sets. Thus, externalizing the IDS to cybersecurity specialists is a good policy for many organizations.
Security operation centers, also called SOCs, are a convenient and economical alternative. The issue with current IDS is that the analysis of the system data is done by an external SOC. Intrusion Detection Systems classify attacks by tracking various activities in IT systems containing many computers and network links. This is done by monitoring system data, which can be taken from multiple sources such as network traffic or log files, and which can reveal sensitive information about the firm or organization. This brings up many security concerns, such as revealing the details of network packets, which in turn reveal important information about the company's regular activities [1]. The main objectives of this paper are to provide an end-to-end encrypted model such that the SOC is not able to learn anything about the data owner's data; to evaluate the intrusion detection model on the system data using different machine learning algorithms; and to compare this model with other traditional or existing Intrusion Detection Systems with respect to security analysis and performance. For this, we tried various machine learning models and different types of encryption techniques. The main crux of our paper is to create an Intrusion Detection System that is highly efficient and secure and that maximizes the prevention of leakage of sensitive information from the data owner's side (Table 1).
2 Related Work
Intrusion detection is one of the major research topics in the field today. All the work done by a company can be stolen in moments if the company cannot stop intruders from stealing its data, or if the company does not know that someone has hacked its system or whether an attack has occurred; in either case, the data is going to be leaked. Current IDS fall into two types based on the data source: network based and host based. In a host-based IDS, the data is taken from the host's computer; it also keeps check on log files and network traffic associated with the host computer [2]. A network-based IDS keeps checking the data packets of users' work in a network [2]. In the paper by Roshan Kumar, the authors worked on a misuse-based intrusion detection system. Anomaly-based and misuse-based detection are another two categories of IDS [3]. An anomaly-based IDS takes into account the history of the user's actions, whereas a misuse-based IDS uses a set of predefined rules in order to work [3]. These rules should be updated regularly. From S. Niksefat, we learn how to classify privacy issues
in intrusion detection systems [1]. No single technique can identify all types of intrusion; therefore, to protect the data, the model is chosen for the specific application [1]. Datasets are very difficult to obtain for intrusion detection projects, as the dataset must contain the various types of cyberattacks that could be used against the data owner. I. Sharafaldin uses a dataset that includes various attacks and defines the best set of features to be considered while tackling those attacks [4]. In the paper by R. A. Popat, the data is encrypted before being sent to the SOC, and the features used in the IDS are also encrypted, to prevent leakage of the data to the security system owner and of the model to the data owner. R. A. Popat implements three different encryption algorithms, of which the decision tree variant is three times more efficient than the other methods [5]. D. Archer implements steganography to secure data storage on the cloud [6]. The most optimizable and secure encryption we have seen is homomorphic encryption, as it works easily on big data [7]. In these scenarios, machine learning is used to advantage by applying such models for intrusion detection purposes. One can employ a machine learning model that first ranks the security features based on their effect and later helps construct a specialized tree-based Intrusion Detection System on the basis of the previously selected features [8]. Alternatively, one can use an algorithm that first makes random combinations of three features using simulated annealing and then applies SVM to each feature combination, which is then able to detect anomalous behavior in Internet data traffic [9]. Also, a good fusion of machine learning feature selection techniques and classifiers can produce high-performance combinations [10]. Deep learning is one of the complex branches of ML that helps us learn the ranked feature depiction
Security Data: Data obtained from a networked system that helps us determine whether attacks, threats, suspicious behavior, anomalies, or any type of unsanctioned action has occurred is known as security data; examples include network packets and system log files. The company or firm responsible for providing the security data is known as the data owner.
Detection Model: A detection model is a machine learning model that takes historical security data as input and uses it for intrusion detection. In Fig. 1, the depicted decision tree is one example of a model used for detection, where the nodes of the tree are the TCP flag description for source and destination, the flow direction, and the name of the protocol [13].
Intrusion Policy: It is a bundle of attack policies which, when enforced by applying
the OR operation, indicate whether and what type of attack has occurred.
Homomorphic encryption: It is a technique of encryption that allows us to
operate on encrypted data without decrypting it first [14]. It is a very important
concept in our paper, as it helps us prevent the leakage of the Data owner's data at
the Security Operation Center (SOC). The Homomorphic encryption system provides
four main functions:

Encryption: Converting normal text to cipher text.
Decryption: Converting cipher text back to normal text.
Key generation: Producing private and public keys.
Evaluation: Performing operations on encrypted data, carrying out the procedure
represented in a binary circuit. Every binary circuit must specify its depth, number
of inputs, and size.
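To make the four functions concrete, the following pure-Python sketch implements a toy Paillier cryptosystem, the kind of partially homomorphic scheme this paper relies on. The tiny primes and the function names are illustrative assumptions only; a real deployment would use a vetted library and 2048-bit or larger moduli.

```python
import random
from math import gcd


def lcm(a, b):
    return a * b // gcd(a, b)


def keygen(p=293, q=433):
    """Key generation. Toy primes only; real keys are >= 2048 bits."""
    n = p * q
    g = n + 1                      # standard simple choice of base g
    lam = lcm(p - 1, q - 1)
    # mu = (L(g^lam mod n^2))^-1 mod n, where L(x) = (x - 1) // n
    x = pow(g, lam, n * n)
    mu = pow((x - 1) // n, -1, n)  # modular inverse (Python 3.8+)
    return (n, g), (lam, mu)


def encrypt(pub, m):
    """Encryption: c = g^m * r^n mod n^2 with random r coprime to n."""
    n, g = pub
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)


def decrypt(pub, priv, c):
    """Decryption: m = L(c^lam mod n^2) * mu mod n."""
    n, _ = pub
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n


def evaluate_add(pub, c1, c2):
    """Evaluation: multiplying ciphertexts adds the plaintexts."""
    n, _ = pub
    return (c1 * c2) % (n * n)
```

The additive evaluation step is what lets the SOC operate on the Data owner's records without ever seeing them in the clear.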
Intrusion Detection System Using Homomorphic Encryption 509
The additive homomorphic property of the Paillier cryptosystem can be written as

ε(m1) · ε(m2) = (g^m1 · r1^n)(g^m2 · r2^n) mod n² = g^(m1+m2) · (r1 · r2)^n mod n² = ε(m1 + m2),

where m1 and m2 are messages, g is the base, r1 and r2 are random values, n is the
public modulus, and the ε() function represents the encryption of the message.
4 System Diagram
In the proposed system, there are two entities involved: the Data Owner (DO) and the
Security Operation Center (SOC). The security data is owned by the Data owner, who
lacks expertise in the field of intrusion detection and thus shares the data with the
external SOC, which has the required expertise and offers its intrusion detection
service to the Data owner. The DO, however, is hesitant to share the data with an
external party because of security concerns, and does so only after having taken all
the necessary precautions.
1. First, the SOC forms its proprietary detection model with the help of an intrusion
   policy, which is simply a defined bundle of intrusion detection configurations.
2. The feature selection process is used to eliminate features which are either
   redundant or irrelevant, to lower the computing time.
3. The Data owner then encrypts the security data with its public key using partial
   homomorphic encryption and sends it to the SOC.
4. After the pattern matching phase, the result of the phase, which is encrypted by
   default, is sent to the DO. The DO then decrypts the result using its private key
   and sends it back to the SOC for examination.
5. The SOC then decrypts the result and learns about the offensive records and
   which rule in the intrusion policy these records matched.
6. It then alerts the Data owner in case of an intrusion, sends the offensive
   records, and advises on the steps to be taken in case of an attack.
5 Implementation Details
The duplicates have already been removed, as the NSL-KDD dataset is already standard-
ized [15]. The NaN and infinity values are initially replaced with zero. A preprocessing
operation is performed because the dataset contains both numerical and non-numerical
values; One Hot Encoding is used for this operation. An integer matrix denoting
the values of the categorical features is the input to the One Hot Encoder, which
transforms each categorical feature into a set of binary features, of which exactly
one is active at a time. The dataset is then divided into four parts based
on the attacks (U2R, Probe, DoS, R2L) which need to be classified (Fig. 2).
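As a rough illustration of the encoding step, the sketch below one-hot encodes categorical columns in pure Python. The function name and the dict-of-rows layout are assumptions for illustration, not the paper's actual code (which presumably uses a library encoder such as scikit-learn's OneHotEncoder).

```python
def one_hot_encode(rows, categorical_cols):
    """Expand each categorical column into one binary column per category.

    `rows` is a list of dicts. For any given row, exactly one of the
    binary columns derived from an original column is active (set to 1).
    """
    # Collect the category vocabulary of every categorical column.
    categories = {c: sorted({row[c] for row in rows}) for c in categorical_cols}
    encoded = []
    for row in rows:
        # Keep numerical columns unchanged.
        new_row = {k: v for k, v in row.items() if k not in categorical_cols}
        for c in categorical_cols:
            for value in categories[c]:
                new_row[f"{c}={value}"] = 1 if row[c] == value else 0
        encoded.append(new_row)
    return encoded
```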
Feature scaling is performed to steer clear of features with large values, as these
would skew the final result. Standard Scaler is used to perform this operation: the
average of a feature is calculated, the mean is subtracted from each value of the
feature, and the result is divided by the standard deviation. The standard deviation
will be 1 after each feature is scaled (Fig. 3).
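The scaling step just described can be sketched in a few lines, as a toy stand-in for scikit-learn's StandardScaler (using the population standard deviation):

```python
from statistics import mean, pstdev


def standard_scale(column):
    """Z-score scaling: subtract the mean, divide by the standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)
    return [(x - mu) / sigma for x in column]
```

After scaling, every feature has mean 0 and standard deviation 1, so no feature dominates the model purely because of its units.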
Feature selection is the process in which irrelevant and unnecessary features are
eliminated with minimal information loss. Subsets of the features are selected which
fully represent all the features in the dataset in terms of accuracy and other metrics.
When a large number of features are present, there may also be correlation between
features; feature selection helps to eliminate this problem as well. We have used
Recursive Feature Elimination (RFE) to perform this operation. In Figs. 4, 5, 6,
and 7, we plot the accuracy against the number of features and, based on these plots,
select the optimal number of features for each attack. We have built two models,
decision trees and random forest, for all four types of attack, i.e., U2R, R2L, DoS,
and Probe. Each model is trained on the dataset containing every feature (123) and
also separately on the features (13) selected after the feature selection operation.
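A minimal sketch of recursive elimination is shown below. Note the simplification: scikit-learn's RFE drops features by model-weight importance, whereas this toy version ranks features by absolute correlation with the label; the names and data layout are illustrative.

```python
from statistics import mean, pstdev


def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    sx, sy = pstdev(xs), pstdev(ys)
    return cov / (sx * sy) if sx and sy else 0.0


def recursive_feature_elimination(X, y, n_keep):
    """X maps feature name -> list of values; y is the label column.

    Repeatedly drop the feature least correlated with the label until
    only n_keep features remain, mirroring RFE's prune-and-repeat loop.
    """
    features = dict(X)
    while len(features) > n_keep:
        scores = {name: abs(correlation(col, y)) for name, col in features.items()}
        weakest = min(scores, key=scores.get)
        del features[weakest]
    return sorted(features)
```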
5.4 Encryption
– The customer data is encrypted at the Data owner using a public key and sent to
  the SOC; the encryption scheme used is a Paillier cryptosystem-based partial
  Homomorphic encryption system.
– At the SOC, the machine learning model is applied to the encrypted data and
  produces an encrypted result as output.
– The encrypted result is sent to the Data owner, where it can be decrypted using
  the private key, which is available only to the Data owner and not the SOC.
– The decrypted data is then re-encrypted using a simple encryption scheme to
  maintain end-to-end encryption and prevent external adversaries from learning
  about the system.
– The result is decrypted at the SOC, an alarm is raised if an intrusion has
  occurred, and the appropriate steps to reduce the severity of the attack damage
  are provided to the DO (Fig. 8).
6.2 Results
All the implementations, which include training the data, extracting the features,
and Homomorphic encryption, were built using Python libraries. Partial homomorphic
encryption based on the Paillier cryptosystem is achieved using a Paillier library
in Python. Figure 9 compares our model with the existing work in the literature [11].
Figure 10 represents the results for the attacks (DoS, Probe, U2R, R2L) obtained
when decision trees are used as the Intrusion Detection Model. Figure 11 represents
the results obtained when random forest is used as the Intrusion Detection Model.
7 Conclusion
In this paper, we present a protocol for signature-based IDS on encrypted security
data. This protocol helps the Data owner to trust the third-party Security Operations
Center, which has the required expertise in IDS, because the Data owner is confident
that the security data will remain encrypted during the entire protocol and can never
be decrypted without the private key, which is held only by the Data owner. Decision
trees and random forest are used as the machine learning models, which are then
privately evaluated over the encrypted network data using Homomorphic encryption.

Fig. 10 Results for different attacks using decision tree as intrusion detection model

Fig. 11 Results for different attacks using random forest as intrusion detection model
This intrusion detection protocol has several drawbacks, mainly the high computing
power required by the Homomorphic encryption algorithm and the significantly higher
overhead generated by HE compared to traditional approaches. The IDS also generates
alerts after a certain time lag: the SOC has no clear information on the output of
the intrusion detection model, which is itself encrypted and must be sent to the Data
owner to be decrypted with the private key; the decrypted results are then sent back
to the SOC for analysis, hence the time lag.
8 Future Work
In future work, we would like to use parallel execution to reduce the overhead that
comes with Homomorphic encryption, and to include other intrusion detection models
and classification methods in our proposed system.
References
Abstract There are many ways to make data secure with different processing tech-
niques. The data is embedded in a host and converted using encryption methods for
further transfer. The host medium is mutated using some principles of alteration
rules, and the genuine host medium is reclaimed after the extraction of the secret
data from it. This paper adopts a reversible data hiding approach to increase security
by hiding data in an image, and makes use of color images rather than grayscale to
increase the capacity of the hidden data. Senders can encrypt the original image
using data hiding in encryption (DHE) with an encryption key and a dynamic
histogram. The LSBs are then compressed to make space for the data hidden with the
data hiding key. The receiver makes use of both the encryption and hiding keys for
accurate retrieval of the data. If the receiver uses only one key, only the
functionality corresponding to that key will be available.
1 Introduction
The idea of information hiding proposes that secret data must be mediated
into a carrier medium subject to some host adjustment requirements. In the
usual approaches, the information hiding strategies will cause the host medium
D. N. V. S. L. S. Indira (B) · Y. K. Viswanadham (B) · Ch. Suresh Babu · Ch. Venkateswara Rao
Department of Information Technology, Gudlavalleru Engineering College, Gudlavalleru, AP
521356, India
J. N. V. R. Swarup Kumar
Department of Computer Science and Engineering, Gudlavalleru Engineering College,
Gudlavalleru, AP 521356, India
e-mail: swarupjnvr@gecgudlavalleru.ac.in
D. N. V. S. L. S. Indira · Y. K. Viswanadham · J. N. V. R. Swarup Kumar · Ch. Suresh Babu ·
Ch. Venkateswara Rao
Gudlavalleru Engineering College, Gudlavalleru, AP 521356, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 519
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_38
520 D. N. V. S. L. S. Indira et al.
to distort. In certain domains, such as clinical images and military material, these
kinds of distortions are simply not allowed. A few reversible information hiding
strategies have been presented: lossless compression-based techniques, difference
expansion strategies, and histogram shifting techniques are some of the techniques
that have been developed [1, 2].
Many of the commonly used lossless compression-based techniques rely on statistical
redundancy of the host in order to create room for concealing the sensitive data.
Contrast enhancement is one of the most important steps in the image processing
procedure. As an example, we discuss histogram equalization as a way to improve
the contrast of an image. A histogram is a visual representation of data [3, 4].
Histograms are graphical representations of the intensity of pixels in an image.
When we use this technique, we stretch the image so that it is more distinct.
A histogram is a graphical representation of a picture that depends on its pixels
and the specific intensity of the corresponding pixels [5, 6]. For an 8-bit grayscale,
there are 256 distinct possible intensities; the same applies per channel to the
histogram of color photographs. To increase the contrast of images, intensity levels
are remapped using the histogram equalization process, which changes the histogram
of the original picture into a flat, uniform histogram. The restored image has a
constant center or close-to-middle brightness level, indicating balanced brightness;
photographs with low and high brightness values should be adjusted [5, 7]. A few
standard methods are employed for histogram-equalization-based contrast enhancement,
such as brightness-preserving bi-histogram equalization (BBHE), dualistic sub-image
histogram equalization (DSIHE), and minimum mean brightness error bi-histogram
equalization (MMBEBHE) [8–10].
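The plain histogram equalization step that these methods build on can be sketched for a flat list of pixel intensities (a toy version assuming 8-bit levels; real implementations operate on 2-D images):

```python
def equalize(pixels, levels=256):
    """Histogram equalization: remap intensities via the normalized CDF."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution of intensities.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)
    total = len(pixels)
    # Map each level to round(cdf * (L - 1) / N), stretching the dynamic range.
    return [round(cdf[p] * (levels - 1) / total) for p in pixels]
```

A narrow cluster of intensities (e.g. 100–102) is spread across the full 0–255 range, which is exactly the contrast stretch described above.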
The BBHE method partitions the image into two parts at the input mean brightness
value, which is the average intensity of all pixels that make up the input image,
and the two parts are then equalized independently (Figs. 1 and 2).

The procedure of dualistic sub-image histogram equalization (DSIHE) follows a
model similar to BBHE, while minimum mean brightness error bi-histogram
equalization (MMBEBHE) is an extension of BBHE for additional improvement of
contrast [7, 11].
2 Related Work
Image contrast will be improved by using histogram equalization (HE), as it performs
dynamic range expansion and flattens the histogram. Standard results show that the
entropy of a message source is highest when the message has a uniform distribution;
accordingly, histogram equalization can produce dynamic changes in the image contrast
[3, 12].
Reversible Data Hiding Using LSB Scheme and DHE for Secured … 521
Fig. 1 DHS
BBHE is one such procedure which is generally used to maintain the contrast of the
image. In the BBHE process, we take an image and compute its histogram. The image
histogram is then partitioned into two parts at the brightness value computed as the
mean of the image intensities, which is simply the average intensity of the pixels
that make up the image. These two sub-histograms are independently equalized to
produce histograms lying in the ranges bounded by the input mean and the gray level.
When we join these two histograms, we obtain a histogram ranging from zero to L − 1.
When this histogram is separated depending on intensity, we produce two histograms
covering the lower and upper ranges [5, 13].
Minimum mean brightness error bi-histogram equalization (MMBEBHE) is another
strategy used to maintain the contrast of the image. MMBEBHE follows a process
similar to BBHE; the only difference is that when the image is sub-divided into
sub-modules, the threshold levels of the pictures are considered. The output modules
will have levels in the ranges [0, lt] and [lt + 1, L − 1]. MMBEBHE is formally
characterized by the following steps:
(1) Determine the AMBE (absolute mean brightness error) for each potential threshold level.
(2) Determine the XT threshold level that produces the smallest AMBE.
(3) Divide the input histogram into two halves based on the XT obtained.
Recursive mean-separate histogram equalization (RMSHE) is another such method
generally used to preserve the brightness of the image. In the BBHE technique, we
perform mean separation and then partition the image to preserve quality. In RMSHE,
we partition the image recursively further to maintain the brightness of the original
picture. HE is identical to RMSHE with recursion level r = 0, and BBHE is identical
to RMSHE with r = 1. The output picture brightness is preserved, and the original
picture is recovered [5].
The standard technique of data hiding introduces some disturbance in the input
picture when recovering the data from the stego picture. Reversible data hiding is a
process in which we embed the secret information inside a picture and recover the
original cover picture without any distortion.
In recent years, scientists have proposed numerous new methodologies for reversible
data hiding. In the difference expansion technique, we consider two contiguous pixel
values of the picture and double their difference. The doubling creates a new LSB,
which provides extra space to embed the information in the picture (Fig. 4).
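The difference expansion embed/extract pair can be sketched following Tian's classic pixel-pair scheme. The function names are ours, and the overflow/underflow checks that real implementations need are omitted:

```python
def embed_bit(a, b, bit):
    """Difference expansion on a pixel pair (a, b):
    double the difference and append the secret bit as the new LSB."""
    avg = (a + b) // 2
    diff = a - b
    new_diff = 2 * diff + bit        # expansion frees one LSB for the payload
    return avg + (new_diff + 1) // 2, avg - new_diff // 2


def extract_bit(a2, b2):
    """Recover the hidden bit and restore the original pixel pair exactly."""
    avg = (a2 + b2) // 2             # the pair average is unchanged by embedding
    new_diff = a2 - b2
    bit = new_diff & 1
    diff = new_diff >> 1             # undo the doubling (floor division)
    return (avg + (diff + 1) // 2, avg - diff // 2), bit
```

Because the pair average survives embedding, extraction restores the cover pixels bit-for-bit, which is exactly the reversibility property discussed above.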
The histogram-based hiding method also performs reversible data hiding: we consider
the histogram's peak points and adjust the pixel values to embed information into
them. In further studies, there are numerous methods which apply reversible data
hiding approaches to improve the performance [15].
owner can encrypt the original picture using a legitimate encryption key. Then,
using the hiding key, we compress the LSB bits to accommodate the data in the least
significant bits. If the recipient has only one kind of key, either the hiding key
or the encryption key, he can get only one output: either the hidden information or
the decrypted picture.
4 Experimental Results
We consider four kinds of host pictures of size 512 × 512, which we name Lena,
Mandrill, Plane, and Cake.
Both sets A and B were divided into 16 subsets. We require that the capacity of each
subset be more than the amount of data. The auxiliary data of a subset is produced
after the data of the previous subset has been embedded, and the original content can
be recovered from this in reverse order. The optimal transfer mechanism, applied to
every subset except the last one, is used to achieve a good payload-distortion
performance. Using the LSB substitution technique, we embed the auxiliary data in
the last subset and recover the content in reverse order (Figs. 5, 6, 7, and 8).
5 Conclusion
and that the RMSHE procedure is able to handle these cases. In addition to BBHE,
MMBEBHE is a technique that allows maximum brightness to be preserved in a
photograph. In spite of the fact that these techniques are useful
References
1. Zhang X (2011) Reversible data hiding in encrypted image. IEEE Signal Process Lett
18(4):255–258. https://doi.org/10.1109/LSP.2011.2114651
2. Pravalika SL, Joice CS, Joseph Raj AN (2014) Comparison of LSB based and HS based
reversible data hiding techniques. In: 2014 2nd international conference on devices, circuits
and systems (ICDCS). pp 1–4
3. Lee J-D, Chiou Y-H, Guo J-M (Oct 2013) Reversible data hiding scheme with high embedding
capacity using semi-indicator-free strategy. Comput Intell Image Process 2013
4. Qin C, Zhang X (2015) Effective reversible data hiding in encrypted image with privacy
protection for image content. J Vis Commun Image Represent 31:154–164
5. Puteaux P, Puech W (July 2018) An efficient MSB prediction-based method for high-capacity
reversible data hiding in encrypted images. IEEE Trans Inform Forensics Secur 13(7):1670–
1681
6. Anita H, Hangargi K, Pattan P (July 2019) Reversible data hiding in encrypted image. Int J
Innovative Technol Exploring Eng (IJITEE) ISSN: 2278-3075 8(9)
7. Wedaj FT, Kim S, Kim HJ et al. (2017) Improved reversible data hiding in JPEG images based
on new coefficient selection strategy. J Image Video Proc 63
8. Gonzalez RC, Woods RE (2002) Digital image processing, 2nd edn. Prentice Hall
9. Al-qershi O, Ee KB (Oct 2009) An overview of reversible data hiding schemes based on
difference expansion technique. First international conference on software engineering and
computer systems
10. Peter N (2015) A system for separable reversible data hiding using an encrypted image. Int J
Eng Res Technol (IJERT) 3(28)
11. Abikoye O, Adewole S, Oladipupo J (2012) Efficient data hiding system using cryptography
and steganography. Int J Appl Inform Syst (IJAIS) 4:6–11. https://doi.org/10.5120/ijais12-
450763
12. Yu C, Zhang X, Tang Z, Chen Y, J Huang (2018) Reversible data hiding with pixel prediction
and additive homomorphism for encrypted image. Secur Commun Networks 2018:13. Article
ID 9103418
13. Manikandan VM, Masilamani V (2018) Reversible data hiding scheme during encryption using
machine learning. Procedia Comput Sci 133:348–356
14. Sabeen Govind PV, Wilscy M (2015) A new reversible data hiding scheme with improved
capacity based on directional interpolation and difference expansion. Procedia Comput Sci
46:491–498
15. Ayyappan S, Lakshmi C, Menon V (2020) A secure reversible data hiding and encryption
system for embedding EPR in medical images. Curr Signal Transduct Ther 15(2)
Prediction of Solar Power Using Machine
Learning Algorithm
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 529
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_39
530 M. Rupesh et al.
1 Introduction
2 Proposed Model
Reliable data availability and choosing the right attributes from the collected data
are very important for accurate prediction, especially in solar power generation [4].
In this work, the data is collected from the BV Raju Institute of Technology,
Narsapur, Medak Dist., India.

We chose three years of weather information from the above-said location. The
dataset consists of minute-wise values of weather parameters such as irradiance,
temperature, panel temperature, wind direction, and wind speed. The weather values
were collected from the year 2012 to 2014, i.e., a dataset of 201,235 records, to
analyze the relationship between the weather parameters and the power generation
for accurate prediction (Figs. 1 and 2).
Prediction of Solar Power Using Machine Learning Algorithm 531
The state of the art in solar power generation will only be advanced if the forecast
algorithm can predict how much power will be generated at any location and time
[5–8]. The machine learning algorithm is developed by training, testing, and
validating on the collected data; the workflow [9] follows the flowchart shown in
Fig. 3.
We tested the selected data using a commonly used machine learning algorithm, a
feed-forward neural network with back-propagation, to evaluate its performance on
weather data. Here, the weather parameters are given as input to the ANN [10], and
it gives the predicted solar power as output. The neuron functions used in this model
are used to train on the dataset for 1000 epochs. It is observed that the RMS error
value decreases as the learning rate increases.
The given dataset is split into three categories: testing, training, and validation
datasets.
The multilayer feed-forward neural network in our proposed method consists of an
input layer, two hidden layers, and an output layer [11]. The input layer takes the
weather parameters as attributes, and the output layer produces the solar power and
voltage as attributes.
Figure 4 [12] represents the multilayer feed-forward back-propagation neural
network.
The cost function for a single training example is defined as [13]

  J(W, b; x, y) = (1/2) ‖h_{W,b}(x) − y‖²    (1)

From the above, the overall squared-error cost function over m examples, with a
weight-decay term, is defined as

  J(W, b) = [(1/m) Σ_{i=1}^{m} J(W, b; x^{(i)}, y^{(i)})] + (λ/2) Σ_{l=1}^{n_l−1} Σ_{i=1}^{s_l} Σ_{j=1}^{s_{l+1}} (W_{ji}^{(l)})²    (2)

  J(W, b) = [(1/m) Σ_{i=1}^{m} (1/2) ‖h_{W,b}(x^{(i)}) − y^{(i)}‖²] + (λ/2) Σ_{l=1}^{n_l−1} Σ_{i=1}^{s_l} Σ_{j=1}^{s_{l+1}} (W_{ji}^{(l)})²    (3)

One iteration of gradient descent updates the parameters as

  W_{ij}^{(l)} := W_{ij}^{(l)} − α ∂J(W, b)/∂W_{ij}^{(l)}    (4)

  b_i^{(l)} := b_i^{(l)} − α ∂J(W, b)/∂b_i^{(l)}    (5)

where the partial derivatives are

  ∂J(W, b)/∂W_{ij}^{(l)} = [(1/m) Σ_{i=1}^{m} ∂J(W, b; x^{(i)}, y^{(i)})/∂W_{ij}^{(l)}] + λ W_{ij}^{(l)}    (6)

  ∂J(W, b)/∂b_i^{(l)} = (1/m) Σ_{i=1}^{m} ∂J(W, b; x^{(i)}, y^{(i)})/∂b_i^{(l)}    (7)
End
The R value for the given model is about 0.994, which indicates the accuracy of the
predicted model.
5 Conclusion
In this paper, solar irradiance, temperature, wind velocity, and humidity as input
variables, and solar-generated voltage and power as output variables, have been
collected from BVRIT Narsapur, Medak Dist., Telangana, and a generalized artificial
neural network model using a machine learning algorithm, i.e., the feed-forward
back-propagation algorithm, has been developed for weather and solar power
forecasting using the MATLAB/Simulink application. Finally, it can be concluded that
solar forecasting is achieved with an accuracy of 99.4%; hence, our model can be
used to estimate the power generation of any solar plant at any location.
Acknowledgements The authors would like to thank BVRIT, Narsapur Solar Plant in charge Mr.
N. Ramchandar, Associate Professor, EEE, BVRIT, Narsapur, and Mr. M. Sudheer Kumar, Assistant
Professor, BVRIT HYDERABAD College of Engineering for Women, Hyderabad.
References
5. Wu YK, Chen CR, Abdul Rahman H (2014) A novel hybrid model for short-term forecasting
in PV power generation. Int J Photoenergy 2014
6. Coelho JP, Boaventura-Cunha J (2014) Long term solar radiation forecast using computational
intelligence methods. Appl Comput Intell Soft Comput 2014(December):1–14
7. Gupta A, Kumar P, Pachauri RK, Chauhan YK (2014) Performance analysis of neural network
and fuzzy logic based MPPT techniques for solar PV systems. 2014 6th IEEE Power India Int
Conf 1–6
8. Khan I, Zhu H, Khan D, Panjwani MK (2018) Photovoltaic power prediction by cascade
forward artificial neural network. 2017 Int Conf Inf Commun Technol ICICT 2017
2017(December):145–149
9. Ahmed R, Sreeram V, Mishra Y, Arif MD (2020) A review and evaluation of the state-of-the-
art in PV solar power forecasting: techniques and optimization. Renew Sustain Energy Rev
124(June 2019):109792
10. Aljanad A, Tan NML, Agelidis VG, Shareef H (2021) Neural network approach for global
solar irradiance prediction at extremely short-time-intervals using particle swarm optimization
algorithm. Energies 14(4)
11. Shekher A, Khanna V (2016) Modelling and prediction of 150KW PV array system in Northern
India using artificial neural network. 5(5):18–25
12. Kabilan R, et al. (2021) Short-term power prediction of building integrated photovoltaic (BIPV)
system based on machine learning algorithms. Int J Photoenergy 2021
13. (2015) Multi-layer neural network model. http://deeplearning.stanford.edu/tutorial/supervised/MultiLayerNeuralNetworks/. pp 1–6
14. Shaik NB, Pedapati SR, Ammar Taqvi SA, Othman AR, Abd Dzubir FA (2020) A feed-forward
back propagation neural network approach to predict the life condition of crude oil pipeline.
Processes 8(6)
15. Choudhary A, Pandey D, Bhardwaj S (2020) Artificial neural networks based solar radiation
estimation using backpropagation algorithm. Int J Renew Energy Res 10(4):1566–1575
Prediction of Carcinoma Cancer Type
Using Deep Reinforcement Learning
Technique from Gene Expression Data
Abstract In recent decades, investigation at the molecular level for the
classification of cancer has become a trending research topic, with several
researchers identifying the type of cancer based on gene expression data. Analyzing
the large number of gene characteristics poses an in-depth classification problem
for cancer types. These characteristics help in understanding gene functions and the
interaction between their abnormal and normal conditions; under various conditions,
gene-to-gene expression behavior is monitored through these characteristics. In this
paper, a deep reinforcement learning (DRL) model is proposed for the effective
analysis of gene expression data to find the type of cancer. A gene expression
dataset is used to evaluate the model for predicting cancer types. Furthermore, the
simulation results show that the proposed DRL model can predict the cancer type with
97.8% accuracy when compared with other existing models.
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 541
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_40
542 A. Prathik et al.
and paired in a strictly precise way. The four molecular units forming the DNA helix
are sequenced in specific arrangements such that each base on one strand can bind
only to its complementary base on the other strand. Replication of DNA occurs by
breaking the connection between the two strands of the twisted double helix, with
each strand then forming a matching strand.
The expression level of a gene represents the total RNA produced in each cell under
different biological conditions. During cell division, if the cells suffer from
threatening diseases, or from cancer that causes mutation or alteration in genes,
the uncontrolled behavior of the gene will be passed on to daughter cells. Moreover,
specific gene expressions become dominant, and hence expression levels can be
obtained by analyzing the RNA. The expression levels of thousands of genes can be
continuously measured under certain simulation conditions and circumstances thanks
to advances in DNA microarray technology. This methodology has made it possible to
understand life at the level of molecules. Transferring a DNA microarray experiment
from its analog form, which involves DNA sequences printed in a high-density array
on a microscopic glass slide, into a digital form, which is the matrix containing
the gene expression values that can be observed and simulated, requires several
steps to be completed.
The typical strategy is to extract mRNA from two cell samples, reverse-transcribe
both to form cDNA, and label them using fluorescent dyes. The two samples are spread
over the whole microarray so that each cDNA hybridizes (the labeled cDNAs bind to
their complementary cDNAs on the microarray to make a double-stranded molecule in
the process known as hybridization). This hybridization thereby acts as an indicator
of the specific gene. The slide is processed to obtain numerical values for each
dye; the measured intensities of each dye correspond to the number of mRNAs
expressed for each gene. By comparing the intensity of a gene's color under two
different experimental circumstances, gene expression levels can be checked. For
each gene on each chip, the gene expression ratio is log2(R/G), where R is the red
dye intensity and G is the green dye intensity.
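The per-gene ratio just described can be written directly (a trivial helper, assuming nonzero dye intensities):

```python
from math import log2


def expression_ratio(red_intensity, green_intensity):
    """Per-gene expression level: log2 of red/green dye intensity.
    Positive values mean higher expression in the red-labeled sample;
    negative values mean higher expression in the green-labeled sample."""
    return log2(red_intensity / green_intensity)
```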
The optimal selection of genes which discriminate among different types of cancers
or classes is a vital research area. Various trade-offs have been exposed, such as
maintaining the rate of accuracy versus maintaining generalization, handling
complexity versus enhancing the performance of classifiers, and reducing the memory
requirement. These parameters have affected the significance of algorithms for
classification in cancer. However, in the sizeable gene expression datasets used
for cancer classification today, the number of samples available for training is
very small compared to the huge number of genes included in the simulations.
When the total number of genes is drastically higher than the total number of
samples, it is likely that random yet apparently relevant biological correlations
between gene behavior and sample classes will be identified. To guard against such
results, the aim of gene selection is to find the smallest possible yet most
detailed subset of genes. This is a significant problem in AI, which is known as
feature selection [1]. In addition, a smaller subset of genes is also important in
creating expression
Prediction of Carcinoma Cancer Type Using Deep Reinforcement … 543
2 Related Work
Okun [6] described an ensemble model evaluated on the colon dataset. Filter-based
feature selection methods are utilized to alleviate overfitting effects. Three
different gene selection models were simulated, namely backward elimination with
the Hilbert–Schmidt independence criterion "BAHSIC" [7], extreme value
distribution-based gene selection "EVD" [8], and singular value decomposition
entropy gene selection [9]. The ensemble includes five classifiers and utilizes
k-nearest neighbor "K-NN" with K set to either 3 or 5 nearest neighbors. The K-NN
classifier was advocated because it does not require training, which makes it
largely suitable for the colon dataset given the nature of microarray data.
Because of the low sample size, the bolstered resubstitution error estimator
"BRE" is utilized in [10]. The bolstered resubstitution estimator is based on the theory that large
544 A. Prathik et al.
3 Proposed Methodology
A deep reinforcement learning model is proposed for analyzing the cancer types
based on the gene expression data. Figure 2 shows the overall architecture of the
proposed model. Combining a deep neural network with reinforcement learning, cancer
types can be identified easily, and performance is improved based on accuracy
metrics. The research framework has three main modules: a preprocessing method,
feature extraction, and classification.
In this process, the dataset is manipulated using the preprocessing module. The
major processes of this module are filtering, logarithmic transformation, data
normalization, and thresholding. Before classification, these preprocessing steps
prepare the dataset into a well-structured format. Once the preprocessing phase is
completed, enrichment of the gene data is initiated utilizing the various datasets
listed in Table 1. The functional span is estimated from the given gene expression
profile; Algorithm 1 lists the phases involved. Mathematically, the functional span
is written as a function F of the input parameters, where X describes the input
gene expression profile and mc is the number of cores available for parallel
processing.
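The preprocessing steps described above (thresholding, logarithmic transformation, normalization) can be sketched in plain Python; the clipping bounds and the use of log2 are illustrative assumptions, not values stated in the paper:

```python
import math

def preprocess(rows, floor=20.0, ceil=16000.0):
    """Threshold raw intensities to [floor, ceil], log-transform,
    then normalize each profile to zero mean and unit variance."""
    out = []
    for row in rows:
        clipped = [min(max(v, floor), ceil) for v in row]        # thresholding
        logged = [math.log2(v) for v in clipped]                 # log transform
        mean = sum(logged) / len(logged)
        var = sum((v - mean) ** 2 for v in logged) / len(logged)
        std = math.sqrt(var) or 1.0                              # guard constant rows
        out.append([(v - mean) / std for v in logged])           # normalization
    return out

profiles = preprocess([[10.0, 80.0, 640.0, 5120.0]])
print([round(v, 3) for v in profiles[0]])
```

Each output profile has zero mean and unit variance, which keeps downstream feature extraction comparable across samples.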
Algorithm 1
Input: Gene Expression (V)
Output: Functional Span (F)
Step 1: V = Scale(V) // scale the columns of numeric values
Step 2: [R C] = size(V)
Step 3: compute the measurement in parallel
Step 4: for ag_i ∈ R do
Algorithm 2
Input: gene expression functional span (S), labels (L).
Output: Classifier
Step 1: do in parallel
Step 2: adjust the framework
Step 3: J_f = as.h2o(S, L) // data frame preparation for H2O
Step 4: A_m = h2o.deeplearning(training_data) // build the deep learning model
Step 5: C_s = feature(A_m, S)
Step 6: return A_m
Parameter tuning is a tool to optimize the input to a specific step. It is carried
out using various methodologies and procedures; in this paper, the deep
reinforcement learning model is incorporated for this optimization purpose. The
parameters of the H2O parallel classifier are modified automatically using the
hyperparameter option, which performs a random grid search over all existing
parameters and returns the most accurate model. In this context, the proposed DRL
model is utilized to optimize the number of neurons: random numbers of neurons are
used to populate the hidden layers to obtain the optimal model.
Encoding the chromosome: Let the maximum number of neurons in the current hidden
layer be n and the number of output neurons be o. The neurons in the hidden layer
can be expressed using a binary encoding as:

m_1, m_2, m_3, ..., m_n (2)

Binary encoding is used for the neurons: m_i is either 0 or 1 depending on whether
the corresponding neuron exists. Real encoding is utilized for the weights V_ij,
which are usually represented as follows:

V_11 V_21 ... V_n1 V_12 V_22 ... V_n2 ... V_1o V_2o ... V_no (3)
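A minimal sketch of this mixed binary/real encoding, with illustrative values of n and o (the paper does not fix them):

```python
import random

random.seed(7)

n = 8   # maximum neurons in the hidden layer (illustrative)
o = 3   # output neurons (illustrative)

# Binary part: m_i = 1 if hidden neuron i exists, else 0 (Eq. 2).
neuron_mask = [random.randint(0, 1) for _ in range(n)]

# Real part: weight V_ij from hidden neuron i to output neuron j (Eq. 3).
weights = [[random.uniform(-1.0, 1.0) for _ in range(o)] for _ in range(n)]

active = sum(neuron_mask)
print(f"{active} of {n} hidden neurons are active")
```

A search procedure can then mutate the binary mask to add or drop neurons and perturb the real-valued weights independently.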
Feature extraction is an important process for improving the classification
performance of the proposed model. Features are concentrated here to extract
particular variables so that classification accuracy can be improved. The PCA
algorithm is used to analyze and extract features for the proposed model and to
extract the significant genes from the dataset.
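A hedged sketch of PCA-based reduction via SVD; the toy expression matrix and the number of retained components are illustrative only, not the paper's data:

```python
import numpy as np

def pca_reduce(X, k):
    """Project samples (rows) onto the top-k principal components."""
    Xc = X - X.mean(axis=0)                       # center each gene/feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                          # scores in the reduced space

# Toy expression matrix: 4 samples x 3 genes; gene 3 equals gene 1 + gene 2,
# so two components capture all the variance.
X = np.array([[1.0, 2.0, 3.0],
              [2.0, 1.0, 3.0],
              [3.0, 4.0, 7.0],
              [4.0, 3.0, 7.0]])
Z = pca_reduce(X, 2)
print(Z.shape)  # (4, 2)
```

Because the third gene is redundant, projecting onto two components loses no variance; on real microarray data, k is chosen to retain most of the variance while discarding noisy dimensions.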
The proposed DRL module is used to predict the cancer type from the gene expression
dataset. This classification module can easily predict the type of cancer from the
gene dataset even when it has multiple class labels. Each class is identified by
the deep neural network, with continuous estimation using the Q-learning method of
reinforcement learning.
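The paper does not spell out its Q-learning formulation; the following is a generic tabular sketch of one-step Q-learning for classification, where predicting a class is the action and a correct label yields reward 1. The states, rewards, and hyperparameters are all illustrative assumptions:

```python
import random

random.seed(0)

ALPHA, EPSILON = 0.5, 0.1        # learning rate and exploration rate
classes = ["breast", "glioblastoma", "lung"]
Q = {}  # (state, action) -> value; state is a discretized feature signature

def choose(state):
    if random.random() < EPSILON:                               # explore
        return random.choice(classes)
    return max(classes, key=lambda a: Q.get((state, a), 0.0))   # exploit

def update(state, action, reward):
    # One-step episode ends immediately, so there is no bootstrapped next-state term.
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + ALPHA * (reward - old)

# Train on a toy stream where signature "sig-A" always belongs to breast cancer.
for _ in range(50):
    a = choose("sig-A")
    update("sig-A", a, 1.0 if a == "breast" else 0.0)

print(max(classes, key=lambda a: Q.get(("sig-A", a), 0.0)))
```

After a few dozen episodes, the greedy action for the signature converges to the rewarded class; in the proposed model, the deep network would replace this table with a learned Q-function over gene-expression features.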
4 Experimental Results
In this section, the simulation results are analyzed using the proposed DRL model.
The results are compared with other existing techniques to analyze the performance
of the proposed classifier model. For the performance analysis, the accuracy, TPR,
and FPR metrics are used for the result analysis and three main datasets used for
this result analysis are breast cancer, glioblastoma dataset, and lung cancer. Table 2
shows the performance analysis and comparison results of the proposed model vs
existing algorithms.
Figure 3 shows the comparison results of different algorithms for various datasets,
namely breast cancer, glioblastoma, and lung cancer. The proposed DRL model
outperforms by obtaining more than 98% of accuracy when compared with other
existing techniques.
The ROC curve depicted in Fig. 4a, b illustrates the sensitivity and specificity of
the model on the training data and on additional test data. The prediction
performance on the test data shows an accuracy for DRL of 86.5%. The model performs
with AUC = 0.95 on the data used for training. More remarkably, the classifier
still performs well on validation with AUC = 0.75, and the validation curve lies
above the chance line.
Table 2 Performance comparison for different datasets

Algorithm | Breast cancer (Accuracy/TPR/FPR %) | Glioblastoma (Accuracy/TPR/FPR %) | Lung cancer (Accuracy/TPR/FPR %)
Proposed DRL model | 98.3 / 97.8 / 1.34 | 99.2 / 98.34 / 0.98 | 97.34 / 96.9 / 2.34
SVM | 91.23 / 90.78 / 8.77 | 92.34 / 91.25 / 7.65 | 93.42 / 91.3 / 7.26
RF | 78.9 / 76.7 / 23.2 | 81.23 / 80.94 / 20.34 | 82.34 / 81.98 / 20.12
ANN | 94.5 / 93.2 / 6.57 | 93.47 / 92.34 / 7.12 | 94.5 / 93.8 / 5.34
In our model, we have illustrated that the primary gene expression can be an
excellent predictor of response to cancer drugs. By utilizing various classification
and clustering techniques, we analyzed the cancer gene expression with validation
accuracy of 86%. Our performance analysis depicts that the DRL model performs
better than the other existing models, as shown in Fig. 3. The DRL model had a
substantially large sample size of patient data. This was beneficial, as the model
was able to achieve increased diversity in the training data and thus build a
robust model that successfully forecasts on a newer dataset.
5 Conclusion
In this research, the DRL model is proposed for analyzing cancer types using gene
expression data. This classification method obtains the correct class for a
particular cancer with more than 98% accuracy when compared with the ANN, RF, and
SVM classifiers. The false rate of the proposed model is much lower when
identifying cancer types. Overfitting is reduced by obtaining correct testing and
training data for the model, and using the PCA extraction technique we further
analyzed the features to improve performance. Moreover, this proposed model can
easily be used for the classification of multi-class datasets in different domains.
Fig. 4 a ROC curve for DRL model (sensitivity: 0.87, specificity: 0.70, AUC: 0.88). b ROC curve for
gene expression model: cross-validation (sensitivity: 0.75, specificity: 1.0, AUC: 0.86)
References
1. Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97(1–2):273–324
2. Blum A, Langley P (1997) Selection of relevant features and examples in machine learning.
Artif Intell 97(1–2):245–271
3. Golub T, Slonim D, Tamayo P, Huard C, Gaasenbeek M, Mesirov J, Coller H, Loh M, Downing J,
Caligiuri M, Bloomfield C, Lander E (1999) Molecular classification of cancer: class discovery
and class prediction by gene expression. Science 286:531–537
4. Alizadeh A et al (2000) Distinct types of diffuse large b-cell lymphoma identified by gene
expression profiling. Nature 403:503–511
5. Tizhoosh HR, Taylor GW (2006) Reinforced contrast adaptation. Int J Image Graph 6(03):377–
392
6. Okun O (2011) Feature selection and ensemble methods for bioinformatics: algorithmic
classification and implementations. Med Inform Sci Ref
7. Song L, Smola A, Gretton A, Borgwardt KM, Bedo J (2007) Supervised feature selection
via dependence estimation. In: Proceedings of the 24th international conference on machine
learning. ACM, pp 823–830
8. Li W, Sun F, Grosse I (2004) Extreme value distribution based gene selection criteria for
discriminant microarray data analysis using logistic regression. J Comput Biol 11(2–3):215–
226
9. Varshavsky R, Gottlieb A, Linial M, Horn D (2006) Novel unsupervised feature filtering of
biological data. Bioinformatics 22(14):e507–e513
10. Dougherty ER, Sima C, Hanczar B, Braga-Neto UM (2010) Performance of error estimators
for classification. Curr Bioinform 5(1):53–67
11. Braga-Neto UM, Dougherty ER (2004) Is cross-validation valid for small-sample microarray
classification? Bioinformatics 20(3):374–380
12. Volinia S, Calin G, Liu C (2006) A microRNA expression signature of human solid tumors
defines cancer gene targets. Proc Natl Acad Sci USA 103:2257–2261
13. Murakami Y, Yasuda T, Saigo K (2006) Comprehensive analysis of microRNA expression
patterns in hepatocellular carcinoma and nontumorous tissues. Oncogene 25:2537–2545
14. Bishop JA, Benjamin H, Cholakh H, Chajut A, Clark DP, Westra WH (2010) Accurate classifi-
cation of non-small cell lung carcinoma using a novel microRNA-based approach. Clin Cancer
Res 16(2):610–619
15. Parry RM, Jones W, Stokes TH, Phan JH, Moffitt RA, Fang H, Shi L et al (2010) k-Nearest
neighbor models for microarray gene expression analysis and clinical outcome prediction.
Pharmacogenomics J 10(4):292–309
16. Yousef M, Nebozhyn M, Shatkay H, Kanterakis S, Showe LC, Showe MK (2006) Combining
multi-species genomic data for microRNA identification using a Naive Bayes classifier.
Bioinformatics 22(11):1325–1334
17. Zheng Y, Kwoh CK (2006) Cancer classification with microRNA expression patterns found
by an information theory approach. J Comput 1(5):30–39
18. Ibrahim R, Yousri NA, Ismail MA, El-Makky NM (2013) MiRNA and gene expression
based cancer classification using self-learning and co-training approaches. In: 2013 IEEE
international conference on bioinformatics and biomedicine. IEEE, pp 495–498
Multi-variant Classification
of Depression Severity Using Social
Media Networks Based on Time Stamp
Abstract Many people in the modern day are suffering from severe depressive
illness. According to the World Health Organization (WHO), depression will become
more common in the next twenty years. Detecting depression at an early stage is
difficult since many people are unaware that they are suffering from it, and this
undetected situation can lead to suicidal thoughts. Thus, depression needs to be
predicted at early stages. Due to the increase in the number of people using social
media, online social networks have become a platform for many individuals to share
their feelings and experiences in day-to-day life. This paper attempts to develop a
system for analyzing social media posts (Twitter tweets) of the individual for a
specific time period of four weeks or more depending upon the case. The emotions
in the textual data are examined using LSTM-CNN; if the pattern changes, it identifies
a change in the person's emotional well-being. Based on that change, the method
identifies the degree of depression and the reason for it, whether it is due to
a personal connection, the job, or some other factor.
1 Introduction
Individuals experience a variety of emotions; among these, more than 350 million
people suffer from depression, one of the most prevalent mental diseases [1]. It
happens in various intensities as well. Prolonged phases of depression lead to a
number of serious mental health issues, not only affecting a person's productivity
but also sometimes leading to self-harm and suicide. Symptoms of depression include
anxiety, sometimes a feeling of loneliness, in the worst cases a sense of
worthlessness, along with mood swings, eating disorders, etc. People normally
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 553
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_41
554 M. Yohapriyaa and M. Uma
show different sets of symptoms, and while being in that state, they do not feel
comfortable in talking to others about their problems freely.
Nowadays, many people use social media like Twitter, Facebook, and forums to
share their emotions and feelings in day-to-day life [2]. Open and free communication
platforms such as the social media sites, online blogs, and discussion forums help in
problem solving and information sharing [3]. The traditional method of research in
the area of depression analysis was based on questionnaire methods, which require
subjective response or comments from the individuals. This method does not provide
good accuracy since it differs from person to person and it is difficult to obtain real
emotions like social media data. Thus, social media is widely used in detecting the
disorders like stress, anxiety, and depression. According to the WHO September 2012
report, 75% of suicides occurred in low- and middle-income countries. Based on the
2012 Lancet report, it is clear that in India many adults aged 15–29 years died by
suicide. National Crime Records Bureau reports state that 2471 students died by
suicide due to failure in examinations in the year 2013.
Figure 1 shows the percentage distribution of suicide causes in the year 2015
(family problem, illness, causes unknown, marriage related, other causes,
unemployment, failure in examination, drug abuse, bankruptcy, love affair).
According to the WHO, about 7,88,000 persons were affected by this illness in 2015, with
roughly 8934 students dying as a result of depression [4]. In the following five years,
this figure is expected to grow to 39,775. As a result, there is a need to establish
a system for early depression prediction in order to save the lives of many young
people. In this paper, we tried to develop a system to detect depression from Twitter
post of individuals for a specific time period. LSTM-CNN is used to detect emotions
that reflect in the post posted by the user. The system will detect the severity of the
depression based on the result of the neural network.
2 Related Works
Many research works were carried out in the field of depression detection by
analyzing the social media data. Depression can affect the language used by suffering
individuals. University of Texas has conducted an examination in the form of essay
writing for a group of individuals, who are depressed, non-depressed, and formerly
depressed college students. The study confirms the increased usage of the word “I” in
particular, along with more negative emotion words in the depressed student’s group,
thus telling us the use of singular form of words is done more frequently by such
individuals. Similarly, a Russian speech study also found an increased frequency of
pronouns and verbs in past tense among the depressed patients. Another study done
on English forum posts observed an elevated use of words like absolutely, completely,
every, nothing, etc., commonly known as absolutist words, among people suffering
from depression and anxiety. Thus, we believe that Twitter tweets will be very useful
in detection of depression [5].
Trotzek et al. [6] used natural language processing for depression detection by
comparing user-extracted data with the dataset and classifying the severity of
depression. The proposed system considers early risk detection error as a metric
for depression detection. This method does not provide high accuracy due to the
system's high false-positive rate; minimizing the false-positive rate can increase
the accuracy of the system.
Aldarwish et al. [7] used a Naïve Bayes classifier along with the questionnaire
method to detect stress in Facebook users based on their location and the posts
they made. The proposed system uses an API to extract the data and a questionnaire
for further classification to increase the accuracy of the system. Fatima et al.
[8] focus on classifying depression by studying the linguistic style of an
individual's posts, along with sentiment analysis using mood tags. Their paper
implements linguistic inquiry word count as an analysis tool to determine the
severity of depression based on user-generated content. Choudhury et al. [9] used
Twint to extract the Twitter data of individuals from their Twitter usernames.
Finding annotated data is difficult with this approach; if the data is cleansed
well, the accuracy can be increased.
Song et al. [10] consider ruminative thinking and writing style along with
depression symptoms for text analysis. A recurrent neural network is used to
understand user semantics. Their paper presents a feature attention network that
simulates the process by which a domain expert detects depression from social
media text.
Oak [11] focuses on structuring and processing text before analysis, followed by a
radial basis function network (RBFN). These discriminant predictors, along with a
random forest classifier, help identify depressed posts and differentiate them from
neutral ones. Victor et al. [12] proposed an automated evaluation with multimodal
neural networks to detect depression based on users' Facebook posts along with
questionnaire methods. The proposed framework also incorporates artificial
intelligence mental evaluation (AIME) and a Naïve Bayes classifier to increase the
accuracy of the system. Cacheda et al. [13] analyze user behavior based on
different aspects of their writings and other features such as textual spreading,
time span, and time gap. The research is unique because it considers the time gap
between posts, which was not taken into account in previous research; however, the
time-varying nature of this parameter makes training difficult and decreases the
accuracy of the system. Smys et al. [15] proposed a hybrid
approach of support vector machine and Naïve Bayes algorithm to improve the accu-
racy in early detection of depression. Depression data from different social media
domains should be included to test the accuracy and sensitivity of the proposed
model.
Valanarasu et al. [16] proposed a model that uses dynamic multi-context information
from various social media sources such as Twitter, Facebook, and Instagram for
predicting the personality of a person. The accuracy of the proposed approach is
high compared with other traditional approaches to personality prediction. Senthil
Kumar et al. [17] proposed a hybrid technique based on Naïve Bayes and a decision
tree for predicting children's behavior based on their emotional reactions. The
limitation of the proposed model is that it leads to overfitting when there is
substantial change in the training data.
3 Methodologies
The proposed model will work as follows: The LSTM with convolutional neural
network was built using Keras to determine whether social platform users are depres-
sive based on their Twitter posts. We used binary classification in this project because
retrieving datasets on mental illnesses is difficult. Long short-term memory (LSTM)
is well suited to classify and predict sequential data, which was chosen for this project
to retrieve random tweets. We retrieved a CSV file from the Kaggle Twitter
sentiment dataset. Since there are no public datasets available for depression, the
Twint tool is used to scrape data with the keyword "depression" to get data from
thousands of users at once. In the data processing stage, the tweet data goes
through a cleansing stage where all irrelevant data is removed, including emojis,
hashtags, stop words, and various punctuation. The contractions are then expanded.
The tokenizer is then used to assign indices to words and filter out infrequent
words, thus increasing the usability of the datasets and decreasing the time
Multi-variant Classification of Depression Severity … 557
complexity of the system. Then, we proceed with making the embedding matrix for
the embedding layer of the model. For model architecture, the tokens and tweets
are entered into the embedding layer in a structured manner to get an embedding
vector, which forms our working unit. Figure 2 represents the architecture model of
the proposed system.
We have designed the graphic user interface in such a way that it accepts basic user
details, such as name, age, gender, date of birth, and their Twitter account user ID.
This user ID is our main source for data extraction to get all the tweets from the
Fig. 3 Code snippet for entering Twitter ID and start date of analysis
user’s account. This includes all the tweets/posts, number of comments, number of
reactions, etc. For this, we have used the tool called Twint.
Twint is an advanced scraping tool written in Python that allows the user to scrape
tweets from Twitter user profiles without having to use Twitter APIs. It utilizes
Twitter's search operators to permit scraping tweets from specified users, along
with hashtags used and comments with the date and time the tweets were posted,
without extracting sensitive user information such as messages and other personal
interactions. Twint can differentiate tweets from other information such as e-mails
and telephone numbers.
Some of the benefits of Twint that made us choose it include: first, it fetches
almost all user tweets (the Twitter API limits retrieval to the last 3200 tweets);
second, it facilitates fast initial setup; third, and most important, it can be
used anonymously without a Twitter sign-up, all free of charge.
Figure 3 represents code snippet that shows the part of code where user entry for
Twitter ID and start date for analysis is done.
In this stage, the raw data is cleaned to remove all unnecessary information such
as hashtags, links, emojis, mentions, stop words, and punctuation. It is essential
to clean the data so that the unnecessary parts are removed and only the data
essential to the working of the project remains.
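A minimal cleansing sketch under these assumptions (the stop-word list and regular expressions are illustrative, not the authors' exact pipeline):

```python
import re
import string

STOP_WORDS = {"the", "a", "an", "is", "to", "and"}  # tiny illustrative list

def clean_tweet(text: str) -> str:
    text = re.sub(r"https?://\S+", " ", text)        # strip links
    text = re.sub(r"[@#]\w+", " ", text)             # strip mentions and hashtags
    text = text.encode("ascii", "ignore").decode()   # drop emojis / non-ASCII
    text = text.translate(str.maketrans("", "", string.punctuation))
    words = [w for w in text.lower().split() if w not in STOP_WORDS]
    return " ".join(words)

print(clean_tweet("I am so sad 😢 #depressed @friend https://t.co/x the end"))
# -> "i am so sad end"
```

The order of operations matters: links are removed before punctuation stripping so that URL characters do not leak into the token stream.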
Here, the individual tweets are padded so that every tweet has a uniform length of
140 characters. This makes all data uniform in length by inserting spaces until
every tweet is 140 characters long.
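A sketch of this padding step (the space-padding and truncation behavior are assumptions; the paper only states the 140-character target):

```python
def pad_tweet(text: str, length: int = 140) -> str:
    """Pad with spaces (or truncate) so every tweet is exactly `length` chars."""
    return text[:length].ljust(length)

padded = [pad_tweet(t) for t in ["feeling low today", "x" * 200]]
print([len(t) for t in padded])  # [140, 140]
```

Uniform lengths let every tweet map to a fixed-size input tensor for the model.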
3.1.4 Tokenizer
The tokenizer is used to assign indices to words and to filter out infrequent
words. It converts human-readable text into machine-readable form and separates the
words in a sentence so that the machine can understand the text well. During
training, the model learns depressive words because the tokenizer has separated
them. Thus, the tokenizer is essential for the project.
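A minimal tokenizer sketch in plain Python (the frequency cutoff and the index-0 convention for unknown words are illustrative assumptions, not the authors' exact tool):

```python
from collections import Counter

def build_vocab(texts, min_count=2):
    """Assign indices to words seen at least `min_count` times;
    index 0 is reserved for padding/unknown words."""
    counts = Counter(w for t in texts for w in t.split())
    vocab = {}
    for word, c in counts.most_common():
        if c >= min_count:
            vocab[word] = len(vocab) + 1
    return vocab

def tokenize(text, vocab):
    return [vocab.get(w, 0) for w in text.split()]

texts = ["i feel sad", "i feel fine", "so sad so sad"]
vocab = build_vocab(texts)
print(tokenize("i feel very sad", vocab))
```

Words below the frequency cutoff (here "fine") fall back to index 0, which keeps the vocabulary compact and the embedding matrix small.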
3.2 Analysis
The main model used for analysis is based on LSTM-CNN. Long short-term memory
(LSTM) is an artificial recurrent neural network (RNN) architecture used in the
field of deep learning. It is unique in that it processes not only single data
points, such as images, but also entire sequences of data, such as speech or video
[14]. An LSTM unit includes a cell, an output gate, an input gate, and a forget
gate. The main purpose of the cell is to remember values over arbitrary time
intervals, while the three gates regulate the flow of information into and out of
the cell. LSTM is used to classify, process, and make predictions on time series
data by handling lags of varying duration between important events in a time
series. Convolutional neural
network (CNN) is a class of deep neural networks, applied to analyze visual images. It
is useful in applications such as image and video recognition, recommender systems,
natural language processing, image classification, medical image analysis, and
financial time series. Multilayer perceptrons are fully connected, with each neuron
in one layer connected to all neurons in the next layer; CNNs can be seen as
regularized versions of multilayer perceptrons. Figure 4 represents the combined
structure of the CNN and LSTM algorithms.
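In standard notation (conventional symbols, not taken from the paper), the gate and cell-state updates described above can be written as:

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)}\\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(cell state)}\\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
```

Here $x_t$ is the input at step $t$, $h_t$ the hidden state, $\sigma$ the sigmoid, and $\odot$ element-wise multiplication; the three gates control what the cell forgets, admits, and exposes.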
The LSTM-CNN combined architecture involves using the CNN layers combined
with LSTM; here, CNN helps in feature extraction on input data, while LSTM
supports sequence prediction. This combination is used to analyze and predict visual
time series, and for generating textual descriptions using sequences of images, or
videos. This model is used for activity recognition problems, which included gener-
ating textual descriptions of activities, demonstration done through a sequence of
images and image description, which generated textual descriptions of single images.
Thus, the importance of this architecture lies in generating textual descriptions of
images. A key asset of the CNN is that it can be pre-trained on challenging image
classification tasks and used as a feature extractor. CNN-LSTM has also been used
for speech recognition, where LSTMs work on audio and textual input data and CNNs
perform feature extraction.
The convolutional layer is added in the proposed approach because a CNN is good at
learning spatial structure from data; the convolutional layer takes advantage of
this and learns structure from the embedding vector. The output of the
convolutional layer is then fed into the LSTM layer, whose output is fed into a
dense layer with a sigmoid function for the final prediction.
3.3 Classification
• If the frequency of the depressed tweets is below 20%, then the person is not
depressed.
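Only the 20% cutoff above is stated in the text; a sketch of the severity mapping with the remaining cutoffs as hypothetical placeholders:

```python
def severity(depressed_tweets: int, total_tweets: int) -> str:
    """Map the fraction of depressed tweets to a severity label.
    Only the 20% 'not depressed' cutoff comes from the text; the
    other thresholds are illustrative placeholders."""
    ratio = depressed_tweets / total_tweets
    if ratio < 0.20:
        return "not depressed"
    if ratio < 0.40:        # hypothetical cutoff
        return "mildly depressed"
    if ratio < 0.60:        # hypothetical cutoff
        return "moderately depressed"
    return "severely depressed"

print(severity(5, 100))   # not depressed
print(severity(70, 100))  # severely depressed
```

This matches the four severity labels reported for the test users (not, mildly, moderately, and severely depressed).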
The model has been successful in detecting users' depression from their Twitter IDs
and obtaining a severity measure. The result of our model is a bar graph of the
number of depressed tweets as a function of time stamp. Depressed tweets appear as
spikes, and the absence of spikes represents normal or non-depressed tweets from
the user.
Figure 5 represents the results of test users; the graph depicts the total number of
depressed and not depressed tweets, and based on this the severity is obtained.
The most important feature of the system is detecting depression without inter-
acting with any psychologists. The system is reasonably accurate in detecting depres-
sion from the scraped tweets without any external input, and the combined nature
of LSTM and CNN in this proposed system is well suited for increasing the
performance of the system. The system has successfully proved that it can help in
situations where users can recover from depression without the involvement of a
therapist or psychologist. Figure 6 represents the confusion matrix; its input is
obtained from fifty individuals who use Twitter continuously.
Twitter ID and few other factors like age, gender, and name are collected from the
users. Data like comments, tweets, and retweets is extracted from the user’s Twitter
ID, and the depression severity is obtained using the proposed model. The confusion
matrix states that even though the system has high successful detection rate it needs
to be improved on better accuracy and detection rate. This can be done by increasing
the size of the dataset and running more epochs while training the model.
5 Conclusion
computing power; with more computing power, we can anticipate that our model
will show competitive performance against the state-of-the-art model. The proposed
model takes advantage of high-dimensional representations of neural networks and
at the same time allows other high-level features to be readily incorporated; if we
add other useful features to the model, it will be possible to obtain more reason-
able and diverse explanations for different aspects of depression. If we can generate
appropriate feature for other mental disorders (such as dementia, schizophrenia, and
bipolar disorder), it will be possible to simulate the process of diagnosing them in a
similar way.
USER RESULT 1
NOT DEPRESSED: The severity of depression for user 1, based on the classification of depressed and
not-depressed tweets, is stated as not depressed.
USER RESULT 2
MILDLY DEPRESSED: The severity of depression for user 2, based on the classification of depressed
and not-depressed tweets, is stated as mildly depressed.
USER RESULT 3
MODERATELY DEPRESSED: The severity of depression for user 3, based on the classification of
depressed and not-depressed tweets, is stated as moderately depressed.
USER RESULT 4
SEVERELY DEPRESSED: The severity of depression for user 4, based on the classification of depressed
and not-depressed tweets, is classified as severely depressed.
Fig. 5 (continued)
References
1. Depression and other common mental disorders: global health estimates (2017) World Health
Organization
2. Glavan R, Mirica A, Firtescu B (2016) “The use of social media for communication”, official
statistics at European level. Rom Stat Rev 4:37–48
3. Rosa RL, Schwartz GM, Ruggiero WV, Rodríguez DZ (2018) A knowledge-based recom-
mendation system that includes sentiment analysis and deep learning. IEEE Trans Industr Inf
15(4):2124–2135
4. Khan A, Husain MS, Khan A (2018) Analysis of mental state of users using social media to
predict depression! a survey. Int J Adv Res Comput Sci 9(2):100–106
5. Al Asad N, Pranto MAM, Afreen S, Islam MM (2019) Depression detection by analyzing social
media posts of user. In 2019 IEEE international conference on signal processing, information,
communication & systems (SPICSCON). IEEE, pp 13–17
6. Trotzek M, Koitka S, Friedrich CM (2018) Utilizing neural networks and linguistic metadata
for early detection of depression indications in text sequences. IEEE Trans Knowl Data Eng
32(3):588–601
7. Aldarwish MM, Ahmad HF (2017) Predicting depression levels using social media posts.
In: 2017 IEEE 13th international symposium on autonomous decentralized system (ISADS).
IEEE, pp 277–280
8. Fatima I, Mukhtar H, Ahmad HF, Rajpoot K (2018) Analysis of user-generated content from
online social communities to characterise and predict depression degree. J Inf Sci 44(5):683–
695
9. Choudhury MD, Gamon M, Counts S, Horvitz E (2016) Predicting depression via social media.
In: Proceeding of AAAI conference on weblogs and social media
10. Song H, You J, Chung JW, Park JC (2018) Feature attention network: interpretable depression
detection from social media. In PACLIC
11. Oak S (2017) Depression detection and analysis. In: 2017 AAAI spring symposium series
12. Victor E, Aghajan ZM, Sewart AR, Christian R (2019) Detecting depression using a frame-
work combining deep multimodal neural networks with a purpose-built automated evaluation.
Psychol Assess 31(8):1019
13. Cacheda F, Fernandez D, Novoa FJ, Carneiro V (2019) Early detection of depression: social
network analysis and random forest techniques. J Med Internet Res 21(6):e12554
14. Tadesse MM, Lin H, Xu B, Yang L (2020) Detection of suicide ideation in social media forums
using deep learning. Algorithms 13(1):7
15. Smys S, Raj JS (2021) Analysis of deep learning techniques for early detection of depression
on social media network-A comparative study. J Trends Comput Sci Smart Technol (TCSST)
3(01):24–39
16. Valanarasu MR (2021) Comparative analysis for personality prediction by digital footprints in
social media. J Inf Technol 3(02):77–91
17. Kumar TS (2021) Construction of Hybrid deep learning model for predicting children behaviour
based on their emotional reaction. J Inf Technol 3(01):29–43
Identification of Workflow Patterns
in the Education System: A Multi-faceted
Approach
1 Introduction
Workflow technology is an ever-evolving field: its development process is ongoing, and it is still an emerging area of technology. The paradigm is broad in scope, with multiple products available on the market. As newer products are taken into consideration, older frames of reference become obsolete and provide very little contextual information. Although this is a problem
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 565
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_42
566 G. Shidaganti et al.
that has no quick fix, domain knowledge can readily be retrieved with the help of workflow perspectives in the education sector. The control-flow perspective deals with the movement of information from the source to the output layer and with the parameters and procedures involved, such as serialization, parallelism, synchronization and joins. The data perspective deals with all related information that is useful at any instant of time, whether on the business side or the model side; local variables hold crucial states of data before and after the execution of the workflow model. The resource perspective provides anchoring support and ties the whole workflow operation together through fixed syntactic and semantic functionality. The operational perspective deals with the mapping of higher-level to lower-level elements; this is essential because business activities can then be translated directly into meaningful applications, which is of utmost importance in education-related domains. The main contribution of this paper is to address the challenging task of covering a multitude of scenarios ranging from easy to difficult. A multi-faceted approach with different scenarios of slow, manual, laborious tasks and their systematic workflow of execution is validated with substantial results and inferences. The critical workflow patterns considered as part of the proposed approach are as follows:
• Sequence
• Parallel Split
• Synchronization
• Exclusive Choice
• Simple Merge
• Multi-Choice
• Synchronizing Merge
• Multi-Merge
• Discriminator
• N-Out-Of-M Join
• Arbitrary Cycles
• Multiple Instances with a priori Design Time Knowledge
• Multiple Instances with No a priori Run Time Knowledge.
2 Related Work
The flow of data, documents or tasks between participants according to procedural rules is described by workflow patterns. Models that implement such patterns have a built-in control program and handle the scheduling and sequencing of each step from a central location. An advantage of using workflow patterns in analysis is that patterns are defined before they are saved; hence, they can be recalled either for modification or for re-execution, which allows analysts to reuse typical forms in different scenarios rather than rebuilding them. Existing visual workflow management systems include Galaxy, Swift, Taverna, e-Science, Tavaxy, Kepler, ClowdFlows, etc.
Task scheduling plays an important role in cloud systems. The task-scheduling problem is that a task must be divided into a set of subtasks, and the available resources must be distributed among the sets of multiple tasks in such a way that the desired goal is met. The performance [2] of cloud systems depends on the task-scheduling algorithm. The referenced paper discusses different approaches to task scheduling based on energy and deadline awareness. Fine-grained and coarse-grained tasks together form a scientific workflow. Scheduling tasks onto virtual machines introduces system overhead, and this overhead grows when many fine-grained tasks execute in a scientific workflow. To overcome the scheduling overhead, multiple small tasks are combined into one large task, which decreases the scheduling overhead and improves the execution time of the workflow.
The dramatic rise in the size, volume and sophistication of cloud services and resources makes them increasingly difficult to monitor and access. Developing new strategies for identifying, implementing and handling resources to meet quality-of-service requirements [3] is becoming an area of research commonly referred to as Resource Orchestration (RO). The increasing complexity of cloud services has prompted the development of new programming and delivery frameworks, and major providers are encouraging pattern-based production to create additional value-added services. This approach attempts to build complicated systems and infrastructure through the integration of simpler ones. In this extremely competitive and evolving world, e-businesses need to actively alter business operations, i.e. the processing of reports and activities in a business, called a workflow. To support these continually evolving processes, more robust workflow management systems [4] are needed. In that study, an autonomous application model is demonstrated for the implementation of e-workflow applications.
Workflow management in [5] is made simpler with the help of tools available today such as Kubernetes and Terraform. Workflow management systems such as HyperFlow, combined with the aforementioned technologies, ensure fully distributed, decentralised execution and management of workflows. In [6] the concept of a bioinformatics workflow is also proposed, where scalability is another major concern because data is generated rapidly and every recorded datum has a role to play. Though the cloud and containers contribute extensively, they bring an inability to work across different cloud providers and difficulty in managing the tremendous number of containers. The authors of [7] highlight the importance of workflow management systems in scientific domains. Over the years, scientists have dealt with growing technological advancements, and modelling all the requirements has caused several setbacks and troubles. In [8]
The workflow patterns [9] offer a spectrum of possibilities, from seemingly easy to decidedly difficult. To enhance the understanding of these patterns, each is divided into a generic description and a related example that illustrates the given situation. For each of the patterns and the specific examples, a random dataset was considered using MongoDB.
Pattern 1: Sequence: A workflow is a sequence of steps in any environment to
achieve a defined goal. These steps are designed to improve performance and ensure
efficiency in a certain order. The end goal of a workflow [10–15] defines the structure,
performance and tracking of the different tasks.
Identification of Workflow Patterns in the Education … 569
Fig. 1 Sequence
In Fig. 1, the placement_eligibility activity is triggered after all the previous activities complete their execution in sequence. The SSLC_marks_verification task checks for the required cutoff in 10th grade; if it is satisfied, the 12th_marks_verification task is executed, which checks the cutoff in 12th grade; if that cutoff also holds, the CGPA_verification task checks the CGPA cutoff in the current degree, and on satisfying the required criteria the candidate is declared eligible for placement by the placement_eligibility task.
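A minimal Python sketch of this sequence pattern is given below. The task names mirror Fig. 1 (with 12th_marks_verification renamed marks_12th_verification to be a valid identifier), and all cutoff values are assumed purely for illustration:

```python
# Sketch of the Sequence pattern from Fig. 1. All cutoff values are
# assumed for illustration, not taken from the paper.

def sslc_marks_verification(student):
    return student["sslc_percent"] >= 60        # assumed 10th-grade cutoff

def marks_12th_verification(student):
    return student["puc_percent"] >= 60         # assumed 12th-grade cutoff

def cgpa_verification(student):
    return student["cgpa"] >= 7.0               # assumed degree cutoff

def placement_eligibility(student):
    # Sequence: each task runs only after the previous one completes,
    # and a failed check stops the workflow early (all() short-circuits).
    checks = (sslc_marks_verification, marks_12th_verification, cgpa_verification)
    return all(check(student) for check in checks)
```

The chain of checks makes the ordering explicit: a candidate who fails the 10th-grade cutoff is never evaluated against the later cutoffs.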
Pattern 2: Parallel Split: A parallel split pattern refers to a workflow that simultaneously executes one or more tasks. The pattern emulates a scenario in which parallelization using threads is key. The order in which the tasks are defined is not specified.
In Fig. 2, the scenario of a student applying for a competitive exam is examined. When the candidate applies for competitive exams such as JEE or CET, the filled application form is submitted to the examination portal; meanwhile, the status of the application is notified to the applicant via email or message. Hence, the activation of exam_application triggers the following two activities, application_sent and status_update, simultaneously using the AND workflow.
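A sketch of this AND-split using Python threads is shown below; the dict-based log and the message strings are illustrative stand-ins for the portal and notification actions:

```python
import threading

# Sketch of the Parallel Split (AND-split) from Fig. 2: activating
# exam_application enables application_sent and status_update at the
# same time. The `log` dict exists only so the demo can be inspected.

log = {}

def application_sent(form):
    log["portal"] = f"submitted:{form}"

def status_update(form):
    log["notification"] = f"status emailed for {form}"

def exam_application(form):
    branches = [threading.Thread(target=task, args=(form,))
                for task in (application_sent, status_update)]
    for b in branches:
        b.start()        # AND-split: both branches are enabled concurrently
    for b in branches:
        b.join()         # joined here only so the demo can inspect `log`

exam_application("JEE")
```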
Pattern 3: Synchronization: The confluence of more than one branch into a primary branch such that, when all inputs are enabled, the thread of control is passed to the subsequent branch.
Fig. 3 Synchronization
In Fig. 3, the activity SEE_eligibility is activated only after the completion of both activities, attendance_eligibility and internal_marks_eligibility. Both tasks execute concurrently and both influence the final task SEE_eligibility. According to the example, in order to be eligible for the SEE (semester end examination), a student should have at least 85% attendance and should have secured at least 30 out of 50 marks in internals.
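This AND-join can be sketched as follows; the thresholds follow the example above, while the dict-based student record is an assumed representation:

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of the Synchronization pattern (AND-join) from Fig. 3:
# SEE_eligibility fires only after BOTH incoming branches complete.

def attendance_eligibility(student):
    return student["attendance"] >= 85      # 85% attendance, per the example

def internal_marks_eligibility(student):
    return student["internals"] >= 30       # 30 out of 50 marks, per the example

def see_eligibility(student):
    with ThreadPoolExecutor(max_workers=2) as pool:
        branches = [pool.submit(task, student)
                    for task in (attendance_eligibility, internal_marks_eligibility)]
        # AND-join: control passes on only once every input has arrived
        return all(branch.result() for branch in branches)
```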
Pattern 4: Exclusive Choice: Division of a particular branch into sub-branches such that, when the incoming branch is triggered, control is transferred to exactly one branch. This is based on the rule that any one of the outgoing splits can be chosen.
In Fig. 4, the supplementary_eligibility activity implements exclusive choice, and hence only one of the subsequent activities will be triggered. In this example, the supplementary_eligibility task checks the student's eligibility to take the supplementary exam based on the number of subjects not cleared. If the count is less than or equal to two, the write_supplementary task is invoked and the student is allowed to write the supplementary exam; otherwise year_backlog is activated and the student is not allowed to write the supplementary exam.
Pattern 5: Simple Merge: The unification of more than one branch into a single resultant, subsequent branch. On activation of any incoming branch, the thread of control is transferred to the subsequent branch.
In Fig. 5, the receipt_generate task is activated either by completion of the online_payment activity or by offline_payment. According to the example, a student can pay the fees through online or offline mode, and after completion of the payment the receipt is generated.
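A sketch of this XOR-join follows; the transaction-id strings are illustrative:

```python
# Sketch of the Simple Merge (XOR-join) from Fig. 5: receipt_generate is
# triggered by whichever payment branch completes; no synchronization occurs.

def receipt_generate(txn):
    return f"receipt[{txn}]"

def online_payment(amount):
    # completing this branch passes control straight to receipt_generate
    return receipt_generate(f"online:{amount}")

def offline_payment(amount):
    return receipt_generate(f"offline:{amount}")
```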
Fig. 6 Multi-choice
Fig. 8 Multi-merge
any instances once the first instance is up and running. If the concept of multi-merge
is absent from the loop, generic design pattern ensures that the activity instances are
multiplied and are thus followed in the workflow model.
The previous scenario dealt with multiple instances of SIS_updation, one for each incoming transition, that is, Paper_publication, NPTEL_exam and Online_quiz. In Fig. 9, to overcome the above-stated problem, the SIS_updation activity is replicated for each incoming transition, resulting in a single instantiation of SIS_updation per incoming transition, with the instances executing concurrently.
Pattern 9: Discriminator: The unification of more than one branch into one subsequent branch after an earlier parallel divergence in the process template, such that control is transferred to the succeeding unit when the first arriving branch is enabled. Enablement of the subsequent branches does not pass the thread of control in any manner.
In Fig. 10, the review process of a research paper can be considered to explain the discriminator workflow. The review process involves multiple sub-processes, and three such sub-processes, plagiarism check, standard conformation and domain aptness, are considered here. The subsequent activity accept_paper is triggered if and only if the sub-processes, plagiarism_check, standard_conformation and domain_apt, are enabled and return positive responses.
Fig. 10 Discriminator
Once all active incoming branches have fired, the join construct resets. The join takes place in a fixed, cohesive and structured manner, i.e. in the finite model in which the join is present there must be one Parallel Split construction beforehand, and it must unify all the units emerging out of it.
In Fig. 12, a final year BE student is given the facility of choosing two subjects
out of three subjects as open electives. The selection of the respective subjects by
the student can be implemented using the N-out-of-M join. In Fig. 13, once the
open_elective is triggered, the availability of the multiple subjects, DS, CC and DL,
for each student is provided by the AND workflow. The selection of any two subjects
out of the three options provided will lead to the subsequent activity of SIS_updation.
Largely, workflow models do not have features that ensure a quick and easy realisation of the N-out-of-M join. However, the combined use of patterns 3 and 9 works remarkably well, and one can truly obtain the necessary results. One disadvantage is that the model becomes considerably more complex and harder to understand. In Fig. 13, the implementation uses the AND workflow at three levels. The activation of the open_elective activity triggers the first-level AND, which is used to depict the availability of the three subjects. The AND at the second level is used to form all the possible combinations, such as DS_CC, DS_DL and DL_CC, that a particular student can opt for. All the possible combinations formed are provided to the discriminator using the third-level AND, which in return activates SIS_updation.
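The N-out-of-M join itself can be sketched directly; the elective names follow the example, while the lambda branches and the helper's signature are assumptions:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Sketch of the N-out-of-M join: control passes downstream as soon as any
# N of the M incoming branches complete (here, 2 of the 3 open electives).

def n_out_of_m_join(branches, n):
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(branch) for branch in branches]
        arrived = []
        for future in as_completed(futures):
            arrived.append(future.result())
            if len(arrived) == n:       # N arrivals: fire the join
                break
    return arrived

electives = [lambda: "DS", lambda: "CC", lambda: "DL"]
chosen = n_out_of_m_join(electives, 2)   # would trigger SIS_updation downstream
```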
5 Structural Patterns
and engineering colleges to the respective student based on the relevant conditions (acquired rank). The above process results in the activation of the allocate_eng and allocate_med processes, whose results activate final_list, which provides a single final list of colleges using the MULTI-MERGE workflow. Using an XOR workflow, the student can accept the allocated college if satisfied and exit the process, activating the terminate_process transition. Otherwise, if the student is not satisfied with the allocated college, he or she can either withdraw from the entire CET process and exit, or proceed to the next round of counselling. This is achieved using another XOR workflow in which either withdraw_process or continue_process is triggered, based on whether the user wants to proceed with the process or not. The continue_process results in iterative cycles. The amorphous presence of the Arbitrary Cycles pattern is laborious to retain in some BPM offerings, most likely those that abide by structured principles. Some situations call for converting process models containing Arbitrary Cycles into structured elements.
In Fig. 15, the structured cycle is similar to the previous process up to the point where the final list of colleges is generated. Let ϕ be the parameter stating that the student is satisfied. If the student is satisfied, the parameter Θ is set to TRUE; otherwise, for further computation, Θ is set according to whether the student wants to discontinue the process. Either of the two assigned values is passed on to the MULTI-MERGE. Further computations are carried out as follows, based upon the Θ and ϕ values, using an XOR workflow. Θ set to TRUE indicates ϕ being set to satisfied, resulting in the student accepting the offer and terminating the process, triggering terminate_process. If the
student wants to discontinue and is not satisfied with the allocation (¬ϕ), then the student withdraws from the entire process, activating withdraw_process; otherwise the student proceeds with further rounds of counselling (¬Θ).
Pattern 12: Implicit Termination: If there are no left-over elements or residues of work, present either now or at any point in the future, a special mechanism should be in place to ensure that the system does not end up in a deadlock state. There are normative ways to determine the successful completion of the system.
In Fig. 16, fee_payment is followed by two subtasks, bank_update and college_update. In the bank_update subtask, the bank database is updated and the fee receipt is generated by the activation of the receipt_generate process in a sequential manner. The second subtask updates the college database with the amount paid by the student as fee, activating the college_update process. Both subtasks are independent of each other and terminate right after completing their function. As per the example, the first subtask terminates right after generating the fee receipt, and the second terminates after updating the college database. Neither process is terminated upon the termination of the other.
Pattern 13: Multiple Instances with a Priori Design Time Knowledge: More than one instance of a task can be generated in a given process instance. The number of instances needed is known at design time. These instances are separate and operate at the same time. Before any additional activities can be activated, the task instances must be synchronized at completion.
In Fig. 17, the activity result_in_progress is followed by an activity that updates the result (subject_update) for each subject. At the time of result estimation (design), the number of instances to be enabled is known, because the number of subjects the student has registered for is known beforehand. In this example, the student is assumed to have registered for six subjects, hence six instances are enabled. Completion of all the enabled tasks leads to the activation of result_announcement. In simple terms, before the final results are announced, the result of each subject has to be updated, which takes place concurrently.
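A sketch of this design-time-known multiple-instance pattern follows; the subject identifiers and update strings are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of Pattern 13 from Fig. 17: the number of subject_update
# instances (six registered subjects) is fixed at design time, and
# result_announcement waits for all of them to complete.

SUBJECTS = ["sub1", "sub2", "sub3", "sub4", "sub5", "sub6"]   # known at design time

def subject_update(subject):
    return f"{subject}:updated"

def result_announcement():
    with ThreadPoolExecutor(max_workers=len(SUBJECTS)) as pool:
        # all six instances run concurrently; leaving the pool's context
        # synchronizes on their completion before the announcement
        updates = list(pool.map(subject_update, SUBJECTS))
    return updates

results = result_announcement()
```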
Pattern 14: Multiple Instances with a Priori Run Time Knowledge: Numerous instances of a task can be generated in a given process instance. The number of instances needed can depend on several runtime factors, including state data, resource availability and within-process communication, but it is known before the task instances need to be generated. These instances, once begun, are independent of one another and execute at the same time. At completion, the instances must be synchronized before any subsequent functions can be invoked.
In Fig. 18, connectivity to the college Wi-Fi is depicted in the flow chart. The multiple-instances-with-a-priori-run-time-knowledge workflow initiates the calls depending on a number of factors such as the network bandwidth, the number of devices on the network, latency, etc. Whether a connection can be established is not known in advance, since the number of established connections at any instant varies dynamically and is known only at run time. Before a new connection is established, the constraint, namely the number of connections already established, is checked at run time. If the count is below the limit, the new connection is granted and the number of connections
spot registration; if the resources are available they are allocated, else additional resources are brought in and allocated, as in the function add_additional_systems. Only when the registration closes is the competition initiated, which is the next subsequent task.
Pattern 16: Multiple Instances with Synchronization: Several instances of a task can be generated in a given process instance. These instances are mutually exclusive and operate separately, and there is no need to synchronize them while they run. Each instance of the multiple activities involved has to execute immediately within the contextual information retained. The instances must be independent of each other and should not be referentially tied to each other.
In Fig. 20, a library fine is used to showcase the multiple-instances-requiring-synchronization workflow. In the flow chart, r stands for the return date of the book, that is, the date on which the book is returned, and e stands for the due date, the date after which the fine is charged. Initially, days and due_amount are 0. Days is computed by subtracting e from r, giving the number of days for which the fine is charged, which is known only at runtime, and that many instances of the function add_delay_charges are initiated. The final due_amount is computed by synchronizing (adding) the results of all the instances of add_delay_charges, where each instance sets due_amount to the library's charge for one day of delay.
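The library-fine computation can be sketched as below; the per-day charge of 5 is an assumed value, not one given in the paper:

```python
from datetime import date

# Sketch of the library-fine example from Fig. 20: the number of
# add_delay_charges instances equals the days of delay, which is known
# only at run time. PER_DAY_CHARGE is an assumed value.

PER_DAY_CHARGE = 5

def add_delay_charges():
    return PER_DAY_CHARGE

def compute_due_amount(r, e):
    """r: return date of the book; e: due date (fine accrues after e)."""
    days = max(0, (r - e).days)          # known only at run time
    # one instance per day of delay; their results are summed ("synchronized")
    return sum(add_delay_charges() for _ in range(days))
```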
effective solutions for the same. To further understanding, every pattern has also been
explained diagrammatically. The content presented in this paper can help facilitate
more work in this field and related domains.
References
1. Prerana KA, Sadashiv N (2017) A study of workflow management systems in the cloud environment. In: 2017 International conference on energy, communication, data analytics and soft computing (ICECDS). IEEE, pp 2262–2267
2. Kaur S, Aggarwal M (2018) Extended balanced scheduler with clustering and replication for data intensive scientific workflow applications in cloud computing. J Electron Res Appl 2(3)
3. Amato F, Moscato F (2017) Exploiting cloud and workflow patterns for the analysis of composite cloud services. Futur Gener Comput Syst 67:255–265
4. Ndeta J, Katriou S, Siakas K (2015) An approach to E-workflow systems with the use of patterns. Int J Entrepreneurial Knowl 3. https://doi.org/10.1515/ijek-2015-0007
5. Orzechowski M, Balis B, Pawlik K, Pawlik M, Malawski M (2018) Transparent deployment of scientific workflows across clouds: Kubernetes approach. In: 2018 IEEE/ACM international conference on utility and cloud computing companion (UCC Companion). IEEE, pp 9–10
6. Moreno P, Pireddu L, Roger P, Goonasekera N, Afgan E, Van Den Beek M, He S,
Larsson A, Schober D, Ruttkies C, Johnson D (2018) Galaxy-Kubernetes integration: scaling
bioinformatics workflows in the cloud. BioRxiv, 488643
Abstract COVID-19 is spreading rapidly across the globe. By April 14, 2020, 128,000 individuals had been killed by COVID-19, and 1.99 million incidents had been recorded in 210 countries and regions, totaling 219,747 cases. The rapid spread of the virus throughout the globe has resulted in a severe shortage of medical test kits in many parts of the world, particularly in Africa. A chest X-ray may prove to be a more effective screening method in certain situations than thermal screening of the whole body, because the respiratory system is the area of the human body most susceptible to this infection. Lung segmentation is the initial stage in identifying diseases from a chest X-ray picture. We describe a method for segmenting the lung region from CXR images based on the Euler number thresholding approach. When compared to current state-of-the-art methods, the suggested method demonstrates superior accuracy and performance.
1 Introduction
P. A. Shamna (B)
Electronics and Communication Engineering, KMCT College of Engineering, Kozhikode, Kerala,
India
A. T. Nair
KMCT College of Engineering, Kozhikode, Kerala, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 585
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_43
586 P. A. Shamna and A. T. Nair
2020, 128,000 individuals had died from COVID-19, and 1.99 million instances
had been recorded in 210 countries and territories, for a total of 219.747 cases.
COVID-19 is an infectious illness caused by the recently discovered and identified coronavirus. It was not identified until December 2019, when an outbreak occurred in Wuhan, China [6]. The most often seen symptoms of COVID-19 are fever, exhaustion, and a dry cough. The vast majority of patients (about 80%)
recover completely without additional therapy. According to the Centers for Disease
Control and Prevention, about one out of every six people infected with COVID-19
becomes very sick and has difficulty breathing. Senior individuals and those who have
chronic medical conditions such as hypertension, heart disease, or diabetes are more
likely than the general population to suffer from serious diseases [7], according to
research. According to current knowledge, the COVID-19 virus is mostly spread via contact with respiratory droplets rather than through the air. Pneumonia is the term used to describe an infection of the lungs caused by a kind of acute respiratory illness. The lungs are composed of tiny air sacs known as alveoli that fill with oxygen as a healthy person breathes in and out. When a person develops pneumonia, the alveoli become blocked with pus and fluid, causing discomfort when breathing and limiting the body's oxygen absorption. Symptoms associated with pneumonia include fever, difficulty breathing, and fatigue. Pneumonia infections can be transmitted in a variety of ways [8, 9].
2 Literature Review
Hubel and Wiesel's [10] 1959 publication on single-neuron receptive fields is widely regarded as a seminal work in computer vision [11], since it outlines the key response characteristics of visual cortex neurons and the mechanisms by which a cat's sensory experience changes its cortical architecture. Roberts [11] published
a paper in 1963 describing a technique for obtaining three-dimensional data from
two-dimensional pictures of solid objects. Simply stated, the external world has been
reduced to a collection of flat-surfaced geometric shapes. According to [12], the insight that vision is hierarchical was articulated in 1982. The vision system's primary role is to produce three-dimensional representations of the world with which the
user may interact. Early on in the development of a perception system, low-level
algorithms for line detection, curve detection, and corner detection were used as
stepping stones to a high-level understanding of visual information [13].
Simultaneously, this paper describes the construction of a self-organizing network
simulator comprising simple and complicated cells capable of pattern recognition
and that is not influenced by changes in location. Numerous convolutional layers
were used in this example, with weight vectors serving as filters in their receptive
fields. Following the completion of correct calculations, the filters were intended
to generate activating events that would be used as inputs for future layers of the
network to function properly. Various commercial text recognition and zip code decoding programs have been launched [14], with the most recent being text recognition plus.
Detection of COVID-19 Using Segmented Chest X-ray 587
In the end, it became viable to create the MNIST data collection of handwritten digits. Around the year 1999, many scholars concentrated their efforts on recognizing objects based on their physical characteristics [15].
Scale-invariant features, unchanged by rotation and position and, to a lesser degree, by orientation and changes in lighting, were created to assist in the recognition of objects in a visual recognition system. The first real-world facial recognition program was implemented a few years later, in 2001 [16]. Although the algorithm made no use of deep learning, it worked out which characteristics assist in facial recognition. A standardized picture collection, as well as a set of common evaluation criteria, was seen as immediately necessary when the field of computer vision first started to take shape, and the community set out to create these as soon as it could.
The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) was established in 2010. During the event, which is held annually, the most inventive submissions are judged for the award. With over one million images, ImageNet has established itself as the gold standard for categorizing and characterizing objects across a wide variety of object categories, and it continues to be so. On average, the ILSVRC error rate in image description was approximately 26% during 2010 and 2011.
In 2012 [17], researchers at the University of Toronto created a convolutional neural network that achieved a 16.4% error rate on picture-identification tests. This event transformed the history of CNNs.
A Microsoft research paper [17] achieved remarkable results in the fields of object detection and identification, as well as in localization tasks, via the use of its Residual Network, or ResNet. When applied to the ImageNet test set, the combined effect of residual nets resulted in a 3.57% error rate relative to the baseline. This achievement earned the team first place in the 2015 ILSVRC (Table 1).
3 Proposed Method
In machine learning, transfer learning is a technique in which a model that has previously been developed for one task is reused as the starting point for a different task [23]. It is a kind of machine learning method in which a model developed for one task is utilized as the starting point for the next activity, as opposed to traditional machine learning. Training the present model from scratch is not practical, owing to the fact that the dataset is too small to provide significant findings. To achieve good results, the method therefore makes use of an existing neural network that has been trained on a bigger dataset; this network is used as the foundation for a new model that takes advantage of the accuracy of the previous network in order to achieve a particular objective. Because of a number of factors, including its effectiveness, this method of optimizing the outputs of a neural network trained on a small dataset has gained increasing popularity in recent years. Image classification tasks were carried out with the help of ResNet-50 (Fig. 1). Initially, it was trained using the ImageNet dataset, which included about 3.2 M images.
With the use of transfer learning and the acquired dataset, both previously learned architectural models were re-trained and fine-tuned. ResNet-50 is designed in the same manner as ResNet-34 and is divided into five stages; however, each convolution block is composed of three convolution layers, giving a total of 23.52 million trainable parameters.
The original image, represented by the following equation, is formed by the union of all subregions:
I = R1 ∪ R2 ∪ R3 ∪ … ∪ RN (2)
For every i = 1, 2, … , N, the region Ri should be connected, and each region Ri should be homogeneous. Disjointness between adjacent regions Ri and Rj should be maintained, i.e.,
Ri ∩ Rj = ∅ (3)
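The two conditions above can be checked numerically on a toy label map; the 3 × 3 labelling below is illustrative:

```python
import numpy as np

# Toy check of Eqs. (2) and (3): a valid segmentation partitions the image
# into regions whose union is the whole image and whose pairwise
# intersections are empty.

labels = np.array([[1, 1, 2],
                   [1, 2, 2],
                   [3, 3, 2]])

regions = [labels == k for k in (1, 2, 3)]

union = np.logical_or.reduce(regions)          # Eq. (2): I = R1 ∪ … ∪ RN
disjoint = all(not np.logical_and(regions[i], regions[j]).any()
               for i in range(3) for j in range(i + 1, 3))  # Eq. (3)
```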
Several research groups have focused their efforts during the last decade on
the segmentation of the lung fields in chest X-rays. Numerous solutions have
been proposed. There are various generalized classifications of solutions, including
rule-based approaches, pixel classification-based methods, deformable model-based
where E(t) is the threshold value used to generate the binary image from a gray-level
image, q1 is the number of 2 × 2 matrices in the image that contain a single 1 and
the rest 0’s, and q2 is the number of 2 × 2 matrices in the image that contain two
1’s and the rest 0’s. There are four distinct matrices from which the value of q1 may
be determined. q3 denotes the number of 2 × 2 matrices in the image with three 1’s
and one 0. There are four distinct matrices that may be counted as q3 . The symbol
qd is used to represent the number of diagonal 2 × 2 matrices. There are two different sorts of qd matrices that may be
constructed. The Euler number E is computed for each given binary image. In the
case of a chest X-ray, the segmentation technique is expected to divide the image
into two lung regions. Consequently, the expected Euler number is 2, since the
expected number of connected components is two and the expected number of holes
is zero. The formula for computing the Euler number E is as follows.
E = c − h (5)
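The bit-quad counting described above can be sketched in numpy. This is a minimal illustration assuming Gray's 4-connectivity convention, E = (q1 − q3 + 2·qd)/4 (the text does not state the combining formula, so the convention is an assumption); the test images are illustrative:

```python
import numpy as np

def euler_number(img):
    """Euler number E = components - holes of a binary image,
    via 2x2 bit-quad counts (4-connectivity convention)."""
    p = np.pad(img.astype(int), 1)          # pad so border quads are counted
    # The four corners of every 2x2 window.
    a, b = p[:-1, :-1], p[:-1, 1:]
    c, d = p[1:, :-1], p[1:, 1:]
    s = a + b + c + d
    q1 = np.sum(s == 1)                     # quads with a single 1
    q3 = np.sum(s == 3)                     # quads with three 1's
    qd = np.sum((s == 2) & (a == d) & (a != b))  # diagonal quads
    return (q1 - q3 + 2 * qd) // 4

# One solid square: one component, no holes -> E = 1.
square = np.ones((4, 4), dtype=int)
print(euler_number(square))   # 1

# A ring with a hole: 1 component, 1 hole -> E = 0.
ring = np.array([[1, 1, 1],
                 [1, 0, 1],
                 [1, 1, 1]])
print(euler_number(ring))     # 0

# Two separate blobs, no holes -> E = 2, the value expected for segmented lungs.
two = np.array([[1, 0, 1],
                [1, 0, 1]])
print(euler_number(two))      # 2
```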
592 P. A. Shamna and A. T. Nair
It has been proven that a graph with threshold values on the X-axis and the
corresponding Euler numbers on the Y-axis exhibits decaying exponential behavior
for a given image [28, 29]. Consequently, a matching threshold value can be
determined for a given Euler number. This observation rules out the second
possibility; the result is therefore a singleton set containing a single value, which is
the required threshold. The Lung Segmentation Algorithm consists of the following
steps:
1. After acquisition, the grayscale image must be converted to a binary image. As
seen in Eq. 4, an input image I is transformed into a binary image B.
2. The Euler number-based thresholding approach is used to calculate the value
of T, the threshold associated with Euler number 2 for a given chest X-ray.
The Euler number is determined using the formula E = C − H, where E
denotes the Euler number, C the number of connected components, and H the
number of holes. The given CXR has two connected components and no holes,
so E = 2.
3. Eliminate the black zone that occurs in the CXR image’s four corners using the
Breadth First Search Algorithm.
4. Apply erosion and dilation to the lung borders, smoothing them out with the
aid of a disk as a structuring element.
5. Erosion is an operation in which a structuring element S acts on a binary
image B (denoted B ⊖ S), generating a new binary image Be = B ⊖ S. Erosion
removes a layer of pixels from the inner and outer boundaries of a region of
interest [30].
6. Dilation is the operation of a structuring element S on the image B, denoted
B ⊕ S. Dilation has the opposite effect of erosion: a layer of pixels is added to
both the inner and outer boundaries of regions.
7. Using the boundary acquired in step 6, initialize the snake’s control points
with a random point selection procedure.
8. The snake energy at each control point is minimized. The greedy snake
assumes that minimizing the energy at each control point reduces the overall
energy.
9. Examine the acquired picture to aid in the diagnosis of a variety of diseases.
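Steps 5 and 6 above can be sketched directly in numpy; a minimal illustration using a 3 × 3 cross-shaped structuring element as a simple stand-in for the disk mentioned in step 4 (the images and element here are illustrative, not the paper's):

```python
import numpy as np

# 3x3 cross structuring element, given as offsets (a stand-in for a disk).
OFFSETS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]

def shift(img, dr, dc):
    """Shift a binary image by (dr, dc), padding with zeros."""
    out = np.zeros_like(img)
    r0, r1 = max(dr, 0), img.shape[0] + min(dr, 0)
    c0, c1 = max(dc, 0), img.shape[1] + min(dc, 0)
    out[r0:r1, c0:c1] = img[r0 - dr:r1 - dr, c0 - dc:c1 - dc]
    return out

def erode(img):
    """B ⊖ S: a pixel survives only if S, centred there, fits inside the region."""
    return np.logical_and.reduce(
        [shift(img, -dr, -dc) for dr, dc in OFFSETS]).astype(int)

def dilate(img):
    """B ⊕ S: a pixel is set if S, centred there, touches the region."""
    return np.logical_or.reduce(
        [shift(img, dr, dc) for dr, dc in OFFSETS]).astype(int)

img = np.zeros((5, 5), dtype=int)
img[1:4, 1:4] = 1          # a 3x3 square region

print(erode(img).sum())    # 1: only the centre pixel keeps all its neighbours
print(dilate(img).sum())   # 21: one layer of pixels added around the square
```

As the text notes, erosion strips a pixel layer from region boundaries while dilation adds one, so an erosion followed by a dilation (a morphological opening) smooths the lung border.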
The model was trained with the Adam optimizer; as previously stated, a batch size
of 128 and a learning rate of 0.1 were used for a total of 10 epochs.
4 Experimental Results
4.1 Training
4.2 Evaluation
The confusion matrix (Fig. 5) and receiver operating characteristic curves (Fig. 6)
depict the performance of the system (Table 2). The confusion matrix chart is
generated from the true and predicted labels. The rows of the confusion matrix
represent the actual class, while the columns represent the predicted class. Correctly
classified observations appear in the diagonal cells, whereas misclassified
observations appear in the off-diagonal cells.
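The layout described here — rows for the actual class, columns for the predicted class — can be sketched in a few lines of numpy; the labels below are illustrative, not the paper's results:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows index the actual class, columns the predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Illustrative labels for 3 classes (e.g. 0=COVID-19, 1=pneumonia, 2=normal).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, 3)
print(cm)

# Correct classifications sit on the diagonal; errors off the diagonal.
print(np.trace(cm))  # 4 correctly classified observations
```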
The multiclass receiver operating characteristic (ROC) curve revealed that the
ResNet-50 model performed very well, attaining statistically significant results
(Table 2). This result established that the proposed technique was the most precise,
with a false positive rate of just 0.60 (misclassification). Thus, the ResNet-50 CNN
can enhance the categorization of COVID-19 images.
5 Conclusion
In this paper, we present a technique for detecting the COVID-19 virus by analyzing
chest X-ray images. Additionally, the method differentiates between people
suffering from pneumonia and those suffering from COVID-19, a distinction that is
necessary since the symptoms of both diseases are similar
and patients often confuse the two. A COVID-19 test kit is much more costly than
utilizing an X-ray to identify the presence of COVID-19, and it is not nearly as quick
as current thermal imaging methods. This implies that airports, hotels, and retail
malls may all utilize it for basic screening. The authors believe that their study will
inspire other researchers to create other methods for detecting potential COVID-19
infection that do not rely on medical COVID-19 test kits. COVID-19 detection with
segmented CXR has a higher detection rate than in prior examples. The suggested
hybrid lung segmentation approach may be used to estimate heart boundaries. It has
the potential to be used as a screening tool for lung disease. We hope that our study
inspires other researchers to create further methods for identifying viral infection in
patients that do not require COVID-19 test kits.
References
5. Carlos WG, Dela Cruz CS, Cao B, Pasnick S, Jamil S (2020) Novel Wuhan (2019-nCoV)
Coronavirus. Am J Respir Crit Care Med 201(4):P7–P8
6. Zhou P, Yang XL, Wang XG, Hu B, Zhang L, Zhang W, Si HR, Zhu Y, Li B, Huang CL, Chen
HD (2020) A pneumonia outbreak associated with a new coronavirus of probable bat origin.
Nature 1–4
7. Lai CC, Shih TP, Ko WC, Tang HJ, Hsueh PR (2020) Severe acute respiratory syndrome
coronavirus 2 (SARS-CoV-2) and corona virus disease-2019 (COVID-19): the epidemic and
the challenges. Int J Antimicrob Agents 105924
8. Ruuskanen O, Lahti E, Jennings LC, Murdoch DR (2011) Viral pneumonia. The Lancet
377(9773):1264–1275
9. Bartlett JG, Mundy LM (1995) Community-acquired pneumonia. N Engl J Med 333(24):1618–
1624
10. Marrie TJ (1994) Community-acquired pneumonia. Clin Infect Dis 18(4):501–513
11. Lee JY, Yang PC, Chang C, Lin IT, Ko WC, Cia CT (2019) Community-acquired adenoviral and
pneumococcal pneumonia complicated by pulmonary aspergillosis in an immunocompetent
adult. J Microbiol Immunol Infect Weimianyugan ran zazhi 52(5):838
12. Su IC, Lee KL, Liu HY, Chuang HC, Chen LY, Lee YJ (2019) Severe community-acquired
pneumonia due to Pseudomonas aeruginosa coinfection in an influenza A (H1N1) pdm09
patient. J Microbiol Immunol Infect 52(2):365–366
13. Hubel DH, Wiesel TN (1959) Receptive fields of single neurones in the cat’s striate cortex. J
Physiol 148(3):574–591
14. Roberts LG (1963) Machine perception of three-dimensional solids. Doctoral dissertation,
Massachusetts Institute of Technology
15. Marr D (1982) Vision: a computational investigation into the human representation and
processing of visual information
16. Fukushima K (1988) Neocognitron: a hierarchical neural network capable of visual pattern
recognition. Neural Netw 1(2):119–130
17. LeCun Y, Boser BE, Denker JS, Henderson D, Howard RE, Hubbard WE, Jackel LD (1990)
Handwritten digit recognition with a back-propagation network. In: Advances in neural
information processing systems, 396–404
18. Lowe DG (1999, September) Object recognition from local scale-invariant features. In: ICCV,
vol 2, pp 1150–1157
19. Viola P, Jones MJ (2004) Robust real-time face detection. Int J Comput Vision 57(2): 137–154
20. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In:
Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
21. Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009, June) Imagenet: a large-scale hierar-
chical image database. In 2009 IEEE conference on computer vision and pattern recognition,
pp 248–255. IEEE
22. Behera L, Kumar S, Patnaik A (2006) On adaptive learning rate that guarantees convergence
in feedforward networks. IEEE Trans Neural Networks 17(5):1116–1125
23. Zeiler MD (2012) Adadelta: an adaptive learning rate method. arXiv preprint arXiv:1212.5701
24. Pan SJ, Yang Q (2009) A survey on transfer learning. IEEE Trans Knowl Data Eng 22(10):1345–
1359
25. Li G, Müller M, Thabet A, Ghanem B (2019) Can GCNs go as deep as CNNs? arXiv preprint
arXiv:1904.03751
26. Lakhani P, Sundaram B (2017) Deep learning at chest radiography: automated classification of
pulmonary tuberculosis by using convolutional neural networks. Radiol Soc North Am 16–26
27. He L-F, Chao Y-Y, Suzuki K (2013) An algorithm for connected-component labeling, hole
labeling and Euler number computing. J Comput Sci Technol 28(3):468–478
28. Nair AT, Muthuvel K (2021) Automated screening of diabetic retinopathy with optimized deep
convolutional neural network: enhanced moth flame model. J Mech Med Biol 21(1):2150005
(29 pages). World Scientific Publishing Company. https://doi.org/10.1142/S02195194215
00056
29. Nair AT, Muthuvel K Blood vessel segmentation and diabetic retinopathy recognition: an intel-
ligent approach. Computer methods in biomechanics and biomedical engineering: imaging &
visualization. Taylor & Francis. https://doi.org/10.1080/21681163.2019.1647459
30. Nair AT, Muthuvel K (2020) Research contributions with algorithmic comparison on the diag-
nosis of diabetic retinopathy. Int J Image Graph 20(4):2050030 (29 pages). World Scientific
Publishing Company. https://doi.org/10.1142/S0219467820500308
31. Nair AT, Muthuvel K, Haritha KS (2020) “Effectual evaluation on diabetic retinopathy”
publication in Lecture Notes. Springer
32. Nair AT, Muthuvel K, Haritha KS (2021) “Blood vessel segmentation for diabetic retinopathy”
publication in the IOP. J Phys Conf Ser (JPCS). Web of Science
33. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional
neural networks. In: Advances in neural information processing systems, pp 1097–1105
34. Punia R, Kumar L, Mujahid M, Rohilla R (2020) Computer vision and radiology for COVID-
19 detection. In: 2020 international conference for emerging technology (INCET) Belgaum,
India, 5–7 Jun 2020
A Dynamic Threshold-Based Technique
for Cooperative Blackhole Attack
Detection in VANET
Abstract VANET is a highly dynamic network, where the vehicles frequently move
around various locations. Due to its rapidly changing topology and unreliable security
infrastructure, the routing protocols in VANET are vulnerable to several attacks such
as DDoS attacks, blackhole attacks, and wormhole attacks. This paper focuses on a
cooperative blackhole attack where several malicious nodes collaborate to execute
the attack. The attacker nodes drop all packets they receive. We present a security
technique to detect the cooperative blackhole attackers by analyzing the dropped
packets at each node. Using linear regression to determine the packet drop threshold
helps our proposal to improve its accuracy further. The simulation results show that
our proposed system provides a high detection accuracy of 99.78% and false positives
limited to 0.025%.
1 Introduction
VANET is a vehicle network whose principal goal is to ensure safe driving and
efficient traffic flow. It is a part of intelligent transport systems (ITS). The com-
munication in VANET occurs in two modes, vehicle-to-vehicle (V2V) and vehicle-
to-infrastructure (V2I) communication. VANET uses multi-hop intermediate nodes
to transfer the messages among the vehicles outside the communication range [1].
VANET has several complex features such as high density, dynamic topology, rapid
changes in the environment, interference, mobility, short-lived connections, etc., that
lead to packet loss in the network. The routing protocols in VANET must consider
these factors to improve the network performance. The well-known routing protocols
in VANET include AODV, DSR, DSDV, and OLSR [2]. Evaluating the efficiency of
existing routing protocols shows that the AODV has a better performance in VANET
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 599
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_44
600 P. Remya Krishnan and P. Arun Raj Kumar
scenarios concerning its end-to-end delay, throughput, packet delivery ratio, and
latency [3].
The VANET routing protocols should be robust against attacks. Hence, finding a
secure and suitable routing path in the rapidly changing topology of VANET becomes
a critical issue [4]. The blackhole attack is a severe packet-dropping attack in VANET,
as it discards all data packets that pass through it without forwarding them to their
intended destination. As a result, the critical messages may not reach the destination
on time, and the VANET safety application may fail [5]. Depending on the count of
attacker nodes that execute the attack process, a blackhole attack can be either single
or cooperative [6]. This paper suggests a method for secure routing in VANET using
AODV by eliminating the cooperative blackhole attack in VANET [7].
The following are our contributions in this paper:
1. We propose a security mechanism for detecting cooperative blackhole attacks in
VANET based on dropped packet analysis at each node.
2. Determination of dynamic packet drop threshold in the network using linear
regression.
3. A less complex detection algorithm for blackhole attacker nodes takes O(cn)
time, where ‘c’ is the observation time, and ‘n’ is the number of vehicles in the
RSU range.
The remaining part of this paper is structured as follows: Sect. 2 discusses the existing
works. Section 3 goes into detail about the proposed detection method. Section 4 gives
the simulation setup and analysis. We conclude the paper in Sect. 5.
2 Related Works
Several works have been done to ensure the quality of routing in VANET. Hortelano
et al. [4] proposed a watchdog-based blackhole detection mechanism for VANET. In
this approach, each sender node verifies whether the receiver retransmits the packet or
not. Each vehicle keeps a trust value for its neighbor vehicles. Moreover, if a neighbor
node drops packets higher than a predefined threshold, it is labeled malicious. The
researcher has proved that the solutions proposed for routing attacks in MANET are
also applicable to the VANET scenario. Banarje [6] introduced an approach to identify
cooperative blackhole/grayhole attacks in VANET by monitoring the neighbor traffic.
The source sends a prelude message before transmission, and the destination responds
via an acknowledgment in the postlude message. Each node keeps a list of blacklisted
nodes. The approach is well-explained but lacks simulation results and performance
analysis. In [7], a customized algorithm is introduced to guarantee the security and
performance of the AODV in VANET toward the blackhole and grayhole attacks.
The approach identifies attacker nodes according to their behaviors and deletes them
from the routing process.
Ramaswamy et al. [8] implemented a technique to evict cooperative blackhole
nodes. It presents a trust-based algorithm that uses data routing information (DRI)
for monitoring the trusted nodes. The approach is not applicable in gray-
hole attack detection due to its reliance on trust values. Agrawal et al. [9] presented
an approach for detecting cooperating attacker nodes. The approach is based on a
backbone node called a vital node that is considered trusted. The vital nodes are in
charge of detecting the malicious nodes. The approach fails when the vital node is
compromised.
Wahab et al. [10] also detected the blackhole nodes using watchdogs. The method
works in five phases. The reputation phase is to calculate vehicle reputation values.
The watchdog phase is for monitoring. The voting phase is to collect data, the fourth
phase is for reliability check, and the fifth phase is for information propagation.
The outcome is a list of blackhole attackers. In [11], the performance of blackhole
mitigation techniques in MANET is analyzed for grayhole attacks. A multi-attack
detection system for blackhole and grayhole attacks in MANET is implemented by
Ali et al. [12]; it chooses watchdog nodes based on the connected dominating set
concept.
In this approach, before adding a node into the watchdog set, its energy and non-
existence in the blacklist are cross-checked.
Ankit et al. [13] developed a modified AODV protocol to eliminate blackhole
attacks. The RREQ and RREP packets are modified for routing. It also uses a cryp-
tographic mechanism for verification. The performance of the approach is demon-
strated using NS2-based simulations. Ankit et al. [14] implemented another tech-
nique to minimize the causes of the blackhole attack by finding alternative paths to
the destination or using the sequence number in packets.
Most of the existing approaches mentioned here detect/prevent the blackhole
attack by bringing modifications to the AODV protocol or using an additional back-
bone node to perform the watchdog mechanism. However, our proposed approach
does not need any additional infrastructure or changes to the existing protocols.
Instead, the blackhole nodes are detected using a dynamic threshold determined for
packet drop at each node.
This section presents the attack model and assumptions. Then, we discuss the pro-
posed methodology for detecting the cooperative blackhole nodes in detail. Figure 1
shows the flow diagram of our proposed detection approach.
This approach considers the attack scenario where multiple blackhole attacker
nodes cooperate among themselves to drop packets. The routing is based on the Ad-
Hoc On-Demand Distance Vector (AODV) protocol. In AODV, the network topology
is not maintained in the routing table all the time. Instead, the best suitable path to
the destination is selected only when the source makes a request. Hence, due to the
dynamic behavior and rapidly changing topology of VANET, AODV is suitable for
its environment [1].
Figure 2 shows an attack scenario in VANET with multiple blackhole nodes. In
this scenario, the vehicles labeled A, B, F, and D are normal nodes, and vehicles
‘E’ and ‘C’ are the blackhole attacker nodes. The attacker nodes place themselves
in positions such that they receive the maximum traffic from the normal nodes. The
RREQ and RREP indicate the flow of route requests and route response packets sent
between the vehicles.
When source A requires a route to destination D, it makes a route request (RREQ)
to its neighbor nodes. Upon receiving this RREQ request, the blackhole attacker
nodes C and E along the path to destination D forge a false route response (RREP)
with a high sequence number and less hop count value. This forged RREP is sent
back to source A. Thus, source A selects either of the forged routes A-E-D (2 hops)
or A-C-D (2 hops), discarding the RREP from the original neighbor node B with
route A-B-F-D (3 hops) [15]. When the route is created, source A sends the data
packets to D, and the consequence is that the blackhole nodes E or C may drop all
received packets.
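The route selection that the forged RREPs exploit can be sketched as follows: AODV prefers the reply with the highest destination sequence number, breaking ties on the lower hop count, which is exactly what the inflated forged replies game. The values below are illustrative:

```python
# Each RREP carries (destination sequence number, hop count, route).
rreps = [
    (12, 3, ["A", "B", "F", "D"]),   # honest reply via neighbour B
    (99, 2, ["A", "E", "D"]),        # forged reply from blackhole E
    (99, 2, ["A", "C", "D"]),        # forged reply from blackhole C
]

# AODV route selection: highest sequence number first, then fewest hops.
best = max(rreps, key=lambda r: (r[0], -r[1]))
print(best[2])  # ['A', 'E', 'D'] -- a forged 2-hop route wins over A-B-F-D
```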
The detection mechanism we propose aims to identify these attacker nodes ‘E’
and ‘C’ based on their packet drop rate. According to our mechanism, the RSU
observes all vehicles in their range and records their packet drop values. The RSU
then computes a threshold for the tolerable packet drop value and compares the
packet drop at each node. The nodes that drop packets above this threshold will be
removed from the network, considering them as the attacker nodes. The following
are the assumptions made in our proposed approach:
• Assumption 1: Every node in range of an RSU broadcasts a beacon message that
contains the vehicle ID.
• Assumption 2: The incoming and outgoing traffic of each node is visible to RSU.
In a cooperative blackhole attack, the attackers drop every data packet they receive,
which degrades the network’s performance. Here we present a mechanism to find
the blackhole nodes in VANET based on the mean of dropped packets measured
from the vehicular nodes over an observation period. The RSU is responsible for
performing the detection process for all vehicles in its range. From Fig. 2, suppose
the source node A establishes a path to destination D via the blackhole node E,
i.e., Route (A–E–D), and starts sending data packets. Then, the RSU performs the
following steps:
1. Calculate the dropped packets at each vehicle
Consider a vehicle Vi , where the packets are either received properly or dropped
after the reception. Also, the packets may be sent properly or get dropped while
sending. The RSU calculates the packet drop rate PDi of a vehicle Vi using Eq. 1
[16].
PDi = (packetRD + packetSD ) / (packetRC + packetSC + packetSD ) (1)
where,
PDi —Packet Drop (PD) at vehicle Vi
packetRD —Packets received but dropped.
packetSD —Packets sent but dropped.
packetRC —Packets received correctly.
packetSC —Packets sent correctly.
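Equation (1) can be implemented directly as a small helper, using the counter names defined above (the traffic counts below are illustrative; note that, as written in Eq. 1, packetRD does not appear in the denominator):

```python
def packet_drop_rate(recv_dropped, sent_dropped, recv_ok, sent_ok):
    """PD_i from Eq. (1): dropped traffic relative to handled traffic."""
    return (recv_dropped + sent_dropped) / (recv_ok + sent_ok + sent_dropped)

# A well-behaved vehicle drops very little of what it handles...
print(packet_drop_rate(2, 1, 90, 7))    # ~0.031

# ...while a blackhole node drops nearly everything it receives.
print(packet_drop_rate(50, 0, 50, 0))   # 1.0
```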
d = Vehicle_Count / Segment_Len (2)
Here, Vehicle_Count is the number of vehicles an RSU can hear within the density
estimation period. Segment_Len is the length of the road occupied by the vehi-
cles. Thus, for an estimated density D and mobility M, the RSU can determine
the Permissible packet drop thresholds (δ1 and δ2 ) based on linear regression as
follows:
δ1 = s1 + b1 D (3)
δ2 = s2 + b2 M (4)
Here, s1 and b1 are the intercept and slope of the decision boundary in the density–
packet drop plane, and s2 and b2 are the intercept and slope of the decision
boundary in the mobility–packet drop plane. The above parameters can be learned
using data from our simulation experiments.
First, we run several simulations with various traffic densities and node mobilities
and record all measured packet drops for verification. Then, these values are
used as training data to identify the optimal decision boundaries. Figures 3 and
used as training data to identify the optimal decision boundaries. Figures 3 and
4 show the obtained results. After training, we obtain the intercept s1 and slope b1
from the node density–packet drop plane in Fig. 3, and the intercept s2 and slope b2
from the node mobility–packet drop plane in Fig. 4.
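Reading Eq. (3) as written, s1 is the intercept and b1 the slope; fitting them from recorded (density, packet drop) pairs is ordinary least squares. A numpy sketch with synthetic, illustrative data (not the paper's measurements):

```python
import numpy as np

# Synthetic training data: traffic density D vs. tolerable packet drop,
# lying exactly on the line 0.03 + 0.002*D for illustration.
density = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
pkt_drop = np.array([0.05, 0.07, 0.09, 0.11, 0.13])

# Least-squares fit of delta1 = s1 + b1*D (Eq. 3); the design-matrix
# columns are [1, D], so the solution vector is [intercept s1, slope b1].
A = np.column_stack([np.ones_like(density), density])
(s1, b1), *_ = np.linalg.lstsq(A, pkt_drop, rcond=None)
print(round(s1, 4), round(b1, 4))  # 0.03 0.002

# The fitted line yields the permissible packet drop threshold at any density.
def threshold(d):
    return s1 + b1 * d

print(round(threshold(35.0), 4))  # 0.1
```

The mobility threshold δ2 of Eq. (4) is fitted the same way against node mobility M.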
The Blackhole_list containing the IDs of attacker vehicles is broadcast to all vehicles
in the range of the RSU. Upon receiving this list, the genuine vehicles will discard
these attacker nodes from their routing choices.
The proposed algorithm’s complexity is calculated as O(cn), where c is the obser-
vation time, which is assumed to be constant, and n gives the number of vehicles in
RSU range. As a result, our algorithm’s complexity is linear, as it grows in direct
proportion to the size of n.
6: if (PDi ≥ δ1 or PDi ≥ δ2 ) then
7:   Blackhole_list = insert(Vehiclei )
8: end if
9: end for
10: end for
11: Broadcast the Blackhole_list to all vehicles in the network.
12: Eliminate the vehicles in Blackhole_list from routing.
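The listing above reduces to one threshold comparison per vehicle per observation window, which is where the O(cn) bound comes from. A pure-Python sketch with illustrative values:

```python
def detect_blackholes(drop_rates, delta1, delta2):
    """Flag vehicles whose packet drop exceeds either fitted threshold.

    drop_rates: dict of vehicle_id -> observed PD_i for one window.
    One comparison per vehicle per window gives the O(c*n) bound.
    """
    return [vid for vid, pd in drop_rates.items()
            if pd >= delta1 or pd >= delta2]

# One illustrative observation window: C and E behave like blackholes.
window = {"A": 0.02, "B": 0.04, "C": 0.93, "D": 0.01, "E": 0.88, "F": 0.03}
print(detect_blackholes(window, delta1=0.35, delta2=0.40))  # ['C', 'E']
```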
We simulated the VANET environment using SUMO traffic simulator version 0.32.0
with Open Street Map [20] and the entire simulation is ported to version 2.34 of NS-2
[21] for packet tracing and network animation. Table 1 shows the simulation parame-
ters. Individual simulations were run by varying the node density and mobility in the
network, with 10% of nodes set as malicious blackhole nodes that perform packet
drops. The entire simulation runs for 100 s, and detection is performed every
10 s. The observation period of 10 s is determined by conducting several simulation
experiments.
• False Positive Rate (FPR): The normal nodes falsely found as blackhole nodes.
Here, true negative (TN) gives the genuine vehicles correctly classified [22].
FPR = FP / (FP + TN) ∗ 100 (6)
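Equation (6) in code form, with illustrative counts chosen to reproduce the reported 0.025% figure:

```python
def false_positive_rate(fp, tn):
    """FPR (%) from Eq. (6): genuine nodes wrongly flagged as blackholes."""
    return fp / (fp + tn) * 100

# E.g. 1 genuine vehicle misflagged against 3999 correctly cleared:
print(round(false_positive_rate(1, 3999), 3))  # 0.025
```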
and the false positive rate decreases. To justify, initially, there are only a few nodes.
Hence, the packet drop at normal nodes will be less. So the approach can easily
separate the attackers from genuine vehicles. However, when there is an increase
in the node count, the approach takes time to fine-tune the packet drop threshold
δ1 and δ2 using linear regression. Once the threshold for increased traffic density is
determined, the approach again improves its performance. The approach also shows
a better detection rate and false positive rate at a node mobility of 60 km/h. Due to
mobility in VANET, there are rapid topology changes that lead to frequent route
changes and affect the accuracy of determining the packet drop thresholds δ1 and δ2
in our approach, affecting the detection and false positive rates.
We compare our approach with existing approaches in [24–26] that detect
blackhole attacks in VANET. The comparison is made in terms of detection rate by
changing the attacker rate from 10 to 20%. In [24–26], the detection rate varies from
95 to 98% in the presence of 10% attacker nodes and from 88 to 97% when 20% of
nodes are attackers. However, our approach maintains a detection rate of 97–99% in both
cases. The results of the comparison are shown in Figs. 7 and 8. From this, we can
observe that our technique attains good detection accuracy even when there is a large
number of attackers, compared to the existing approaches. We do not use a constant
threshold to classify nodes as normal or attacker in our approach. Instead,
as the node density increases, the RSU fine-tunes the packet drop threshold using
linear regression that helps in achieving a high detection rate.
5 Conclusion
In this paper, we proposed a technique for detecting cooperative blackhole attacks
in VANET using a dynamic packet drop threshold determined via a linear regres-
sion technique. The estimation of the packet drop threshold is critical in detecting
blackhole nodes accurately. The proposed system provides a high detection accu-
racy of 99.78% and false positives limited to 0.025%. Our method has a minimum
computational overhead compared with the existing techniques since we do not use
any complex cryptographic algorithms for the detection process. In the future, we
intend to evaluate the efficiency of our system for the grayhole and wormhole attack
detection in VANET.
Acknowledgements This work is supported by the funding agency Science and Engineering
Research Board (SERB), Government of India, under Core Research Grant (CRG) scheme. The
funding grant number is EMR/2016/007502.
References
15. Tobin J, Thorpe C (2017) An approach to mitigate multiple black hole attacks in VANET. In:
16th European conference on Cyber Warfare and Security
16. Al-Ani AD, Seitz J (2015) QoS-aware routing in multi-rate Ad hoc networks based on ant
colony optimization. Netw Protoc Algorithms 7:1–25
17. Huang M (2020) Theory and implementation of linear regression. In: 2020 International con-
ference on computer vision, image and deep learning (CVIDL), pp 210–217
18. Dhende S, Musale S, Shirbahadurkar S et al (2017) SAODV: black hole and gray hole attack
detection protocol in MANETs. In: 2017 International conference on wireless communications,
signal processing and networking (WiSPNET), pp 2391–2394
19. Darwish T, Abu Bakar K (2015) Traffic density estimation in vehicular ad hoc networks. Ad
Hoc Netw 24(PA):337–351
20. Lim KG, Lee CH, Chin RKY et al (2017) SUMO enhancement for vehicular ad hoc network
(VANET) simulation. In: 2017 IEEE 2nd international conference on automatic control and
intelligent systems (I2CACIS), pp 86–91
21. Bavarva A (2013) Traffic detection in VANET using NS2 and SUMO. Int J Adv Res Comput
Sci Software Eng 3:1–7
22. Hichem S, Senouci SM (2015) An accurate and efficient collaborative intrusion detection
framework to secure vehicular networks. Comput Electr Eng 43:33–47
23. Tyagi P, Dembla D (2018) A secured routing algorithm against black hole attack for better
intelligent transportation system in vehicular ad hoc network. Int J Inf Technol 04:11
24. Lachdhaf S, Mazouzi M, Abid M (2017) Detection and prevention of black hole attack in
VANET using secured AODV routing protocol. In: Proceedings of the 9th international con-
ference on networks and communications, pp 25–36, November-2017
25. Gautham PS, Shanmughasundaram R (2017) Detection and isolation of Black Hole in VANET.
In: 2017 International conference on intelligent computing, instrumentation and control tech-
nologies (ICICICT), pp 1534–1539
26. Hassan Z, Mehmood A, Maple C, Khan MA, Aldegheishem A (2020) Intelligent detection
of black hole attacks for secure communication in autonomous and connected vehicles. IEEE
Access 8:199618–199628. https://doi.org/10.1109/ACCESS.2020.3034327
Detecting Fake News Using Machine
Learning
Abstract The evolution of information and communication in this digital era has
increased the number of people with Internet access. The Internet has changed the
way information is consumed, and as a consequence, the fake news market has
boomed. Fake news is one of the major concerns accompanying the spread of
Internet connectivity, because it has the potential to cause serious political damage
to countries. “Fake news” gained popularity during the US electoral campaign. Fake
news detection applies natural language processing to clarify and clean up the news,
and the model then uses term frequency–inverse document frequency (TF-IDF) for
further processing. The aim of this paper is a computational approach that
automatically detects fake news and reports the accuracy of the model.
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 613
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_45
614 R. H. Patel et al.
and misdirecting articles distributed widely to bring in cash through site visits.
Facebook, and particularly its subsidiary WhatsApp, has been at the epicenter
of much of the fake news spread. Facebook has already introduced a function that
lets users flag fake news on the Web when they see it. Fake news and various types
of false information can take on different forms. They have major impacts,
because information plays a major role in shaping the world view, as humans make
important decisions based on information. How fake news affects the financial sector
can be explained using various examples. Whether it is a company, an institution, or
even a government, fake news can have a large adverse financial effect on it.
The mental effects of fake news on a person or crowd are a very sensitive
issue; reportedly, there has been a huge spike in mental harassment cases caused by
the ruining of people’s social image through fake news.
In politics, too, fake news is considered a major player, not only during
elections but also during presidential terms. Through the spread of fake news, the
reputation of a political party or even an individual politician can be damaged. Fake
news is not a new thing in India either. False accusations are not new nowadays;
people can easily be misled by brainwashing. Such was the case in the recently
ended CAA protest, where many protesters did not know what they were protesting
against. The points above briefly describe how fake news can have adverse impacts;
to understand them better, here are some recent cases where fake news played a
major role.
Regarding the CAA protests, the Supreme Court of India advised the central government to consider publicizing the “objectives and the benefits of the Citizenship Amendment Act” to counter the misleading news circulating about the CAA. The plea lawyer stated that he had visited Jamia and Seelampur and observed that more than 95% of the protesters did not even know what the CAA is. They were made to feel that the law would take away their citizenship. This was carried out using deep-fake videos circulated by accounts created in neighboring countries. The spread of these deep-fake videos reached such an extent that the Indian Ministry of External Affairs had to call out the Prime Minister of Malaysia for publicizing them and for making “factually inaccurate remarks” about the CAA. Fake news was also very prevalent in the 2019 elections. As Vice writes, political parties weaponized the platforms and the misinformation on them. It rose to such an extent that Facebook went on to delete about one million accounts a day for spreading false information [1].
Perhaps the Kashmir conflict best exhibits the adverse ill effects of fake news. Misinformation and disinformation related to Kashmir are widely prevalent; there have been many instances of photographs from the Syrian and Iraqi civil wars being passed off as images of the Kashmir conflict with the intention of fueling violence and backing insurgencies. After the Indian revocation of the special status of Jammu and Kashmir under Article 370 in August 2019, misinformation followed relating to defense, public welfare, lack of supplies, and other administrative issues. The Ministry of Electronics and Information Technology responded by getting Twitter to remove accounts distributing fake news [1].
Detecting Fake News Using Machine Learning 615
2 Literature Review
Ghosh et al. [2] have used different combinations of support vector machine (SVM), convolutional neural network (CNN), logistic regression (LR), and bidirectional long short-term memory (Bi-LSTM) algorithms. They used tweets from Twitter to create the dataset. The combined use of CNN and LSTM layers gave the most efficient model with maximum accuracy.
Pantech Solutions Institute created a fake news detector in which a count vectorizer and a TF-IDF vectorizer are used to transform the text, with a Naive Bayes classifier as the classifier. For testing the model, scikit-learn's grid search functionality is utilized. They found the optimal configuration for the count vectorizer to be two-word phrases (no single words), no lowercasing, and only words that appear at least three times in the corpus [3].
Rodríguez et al. [4] created three different neural architectures, two of their own and one with the BERT language model. The first model is based on SVM, the second on LSTM, and the last on CNN. The article notes that BERT is a pre-trained language model with maximum efficiency on a great number of NLP tasks. Trained over 5 epochs, the LSTM-based model achieved an accuracy of 91%; trained over 4 epochs, the CNN-based model achieved an accuracy of 93.7%.
Muhammed et al. [5] have implemented a combination of support vector machine, passive-aggressive, and Naive Bayes (NB) classifier algorithms. For text processing, two vectorizers are used: a count vectorizer and a TF-IDF vectorizer. Their approach combines Web crawling and machine learning processes. The maximum accuracy observed was 80%.
3 Proposed Model
After observing and analyzing different algorithms and models, this project considers two algorithms, Naive Bayes and passive-aggressive, for carrying out classification on the datasets. The varying results of both classifiers are discussed in the discussion section.
Table 1 (continued)

| References | Methodology | Features | Dataset | Accuracy | Result |
|---|---|---|---|---|---|
| [10] | TF-IDF vectors, dense neural network | Bag of words, word vector representation | Dataset created by Craig Silverman, used for the FNC-I challenge | 94.31% | Model is efficient when the stances between news articles and headlines are “unrelated,” “agree,” and “discuss,” but accuracy drops to 44% for the “disagree” stance |
| [11] | Unsupervised fake news detection framework (UFD) | Gibbs sampling, update rule | LIAR dataset, BuzzFeed dataset | 75.9%, 67.9% | UFD performs better on the LIAR dataset than on the BuzzFeed dataset |
| [12] | SVM | Length, conveys less clout, appears more negative in tone | National Public Radio, New York Times, and Public Broadcasting Corporation | 87% | SVM gives the best prediction results among logistic regression, random forest, decision tree, and k-neighbor classifier |
3.2 Dataset
The dataset used for this model is from the 2016 US Presidential Elections. Kaggle released this dataset as a challenge to create an accurate fake news detection model, since this election brought fake news into the spotlight. There are 20,000 articles in this dataset, with a column named “label.” This label column takes only two values, 1 or 0: 1 is assigned if the news is true, and 0 if the news is false.
Before feeding the dataset into the model, the raw news text requires preprocessing. We have used different methods to clear different types of text noise. In this data-cleaning part, regular expressions are used first: with regex, only words are kept, while symbols, numbers, and signs are filtered out of the text. After applying regex, only words remain in the text. The second cleaning step is stop-word removal. To remove stop words, the text must first be converted from sentences into words, for which a tokenizer is essential. Before tokenization the text is treated as sentences, but to remove stop words every word must be checked, so a tokenizer is used to break each string into tokens; here, the tokens are words. After tokenization, it is easy to remove the stop words.
Stop words are a set of commonly used words in a language that carry little significance in news text. They can be filtered out of the text because they are the most common words and hold little importance. Stop words are the parts of a sentence that connect words, for example, prepositions like “in,” “from,” “of,” “to”; conjunctions like “or,” “but,” “and”; and articles like “the,” “a,” “an.” Since such words hold little useful information yet take valuable processing time, removing stop words is a key step in natural language processing. The library used in this model to remove stop words is the Natural Language Toolkit (NLTK).
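The cleaning steps described above (regex filtering, tokenization, stop-word removal) can be sketched as follows. This is a minimal self-contained illustration: the paper's pipeline uses NLTK's stop-word list and tokenizer, while here a small hand-picked stop list and a whitespace split stand in; the function name `clean_text` is ours.

```python
import re

# Small illustrative stop-word list; in the paper's pipeline this comes
# from NLTK's stopwords.words("english")
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "from", "or", "but", "and"}

def clean_text(text):
    # Regex step: keep only letters and whitespace, filtering out
    # symbols, numbers, and signs
    text = re.sub(r"[^A-Za-z\s]", " ", text)
    # Tokenization step: break the string into word tokens
    tokens = text.lower().split()
    # Stop-word removal step
    return [t for t in tokens if t not in STOP_WORDS]
```

In the actual pipeline one would replace `STOP_WORDS` with NLTK's English stop-word corpus and the split with `nltk.word_tokenize`.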
3.3.2 Lemmatization
NLP techniques transform the text input into vectors so that the classifiers can perform mathematical operations on it. This model uses the TF-IDF vectorizer for the conversion.
Term Frequency (TF): Term frequency is the number of times a word appears in a document divided by the total number of words in the document. Term frequency changes from document to document; it is unique to each document [13]. It is calculated by:

tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}    (1)

Inverse Document Frequency (IDF): The inverse document frequency measures how much information a word provides, that is, whether the word is rare or common across all documents. It is the log of the inverse fraction of the documents that contain the word [13]. The resulting TF-IDF weight is found by:

w_{i,j} = tf_{i,j} \times \log \frac{N}{df_t}    (2)
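Equations (1) and (2) can be implemented directly. The sketch below computes raw TF-IDF weights exactly as defined above; note that library implementations such as scikit-learn's TfidfVectorizer apply extra smoothing and normalization, so their numbers differ. The function name `tfidf` is illustrative.

```python
import math
from collections import Counter

def tfidf(docs):
    # Document frequency df_t: number of documents containing each term
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    N = len(docs)
    weights = []
    for doc in docs:
        counts = Counter(doc)          # n_{i,j}
        total = sum(counts.values())   # sum_k n_{k,j}
        # w_{i,j} = tf_{i,j} * log(N / df_t), per Eqs. (1) and (2)
        weights.append({t: (c / total) * math.log(N / df[t])
                        for t, c in counts.items()})
    return weights
```

A term occurring in every document gets weight 0 (log of 1), which is exactly why common words carry no discriminative signal here.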
Naive Bayes is a simple supervised, generative model that returns probabilities. Predictions are made using Bayes' theorem, on the basis of the presence of a specific feature that is treated as separate from, or unrelated to, the presence of any other feature. The model is called “naive” because of this assumed conditional independence between every pair of features, hence the name Naive Bayes. By Bayes' theorem, the probability of a class variable y that depends on a feature vector x = (x_1, ..., x_n) is given by [14]:

P(y \mid x_1, \ldots, x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \ldots, x_n)}    (3)
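This formulation can be sketched as a tiny multinomial Naive Bayes over token lists. Laplace (add-one) smoothing is an assumption on our part, as is the class name `TinyNaiveBayes`; the paper itself would use a library classifier, so this is only a minimal from-scratch illustration of Eq. (3).

```python
import math
from collections import Counter

class TinyNaiveBayes:
    # Minimal multinomial Naive Bayes with Laplace smoothing (our assumption)
    def fit(self, docs, labels):
        self.classes = set(labels)
        self.priors = Counter(labels)                 # class counts for P(y)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.word_counts[y].update(doc)           # counts for P(x_i | y)
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        return self

    def predict(self, doc):
        best, best_lp = None, float("-inf")
        n_total = sum(self.priors.values())
        for c in self.classes:
            # log P(y) + sum_i log P(x_i | y): Bayes' theorem under the
            # conditional-independence assumption of Eq. (3)
            lp = math.log(self.priors[c] / n_total)
            total = sum(self.word_counts[c].values())
            for w in doc:
                lp += math.log((self.word_counts[c][w] + 1)
                               / (total + len(self.vocab)))
            if lp > best_lp:
                best, best_lp = c, lp
        return best
```

The denominator P(x_1, ..., x_n) of Eq. (3) is the same for every class, so it can be dropped when only the arg max is needed.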
In Eq. (4), t is the temporal dimension. If the data are collected from the same data-generating distribution, there will be no large parameter modification and the algorithm will keep learning; but if the source of the data changes to a different distribution, the weights will adapt to the new distribution and slowly forget the previous one. In this model, the data are drawn from the same distribution. Given a weight vector w, the prediction is calculated as [15]:

\hat{y}_t = \mathrm{sign}(w^T \cdot x_t)    (5)

In Eq. (6), the loss L varies between 0 and k; a value of 0 means a perfect match. This depends on f(x_t, \theta) [15]. The update rule generally used in the passive-aggressive algorithm is:

w_{t+1} = \arg\min_w \frac{1}{2} \lVert w - w_t \rVert^2 + C\xi^2, \quad \text{subject to } L(w; x_t, y_t) \le \xi    (7)

In Eq. (7), first assume the slack variable \xi = 0. When a sample x_t is presented, the classifier determines the sign using the current weights. When the sign is correct, the value of the loss function is 0 and the arg min is w_t itself; this clearly shows that when correct classification occurs the classifier remains in a passive state. Now assume that instead of a correct classification a misclassification occurred [15].
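The passive/aggressive behavior around Eq. (7) can be sketched as an online training loop. This follows the PA-II variant (matching the squared-slack penalty C\xi^2 above); the function name, hinge-loss formulation, and hyperparameter defaults are our illustrative assumptions, not the paper's exact settings.

```python
def pa_train(samples, C=1.0, epochs=5):
    # samples: list of (x, y) with x a list of floats and y in {-1, +1}
    w = [0.0] * len(samples[0][0])
    for _ in range(epochs):
        for x, y in samples:
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            loss = max(0.0, 1.0 - margin)          # hinge loss L
            if loss > 0.0:
                # Aggressive state: closed-form step size from Eq. (7)
                # with the C*xi^2 (PA-II) penalty
                norm_sq = sum(xi * xi for xi in x)
                tau = loss / (norm_sq + 1.0 / (2.0 * C))
                w = [wi + tau * y * xi for wi, xi in zip(w, x)]
            # loss == 0: weights unchanged (passive state)
    return w
```

When the margin already exceeds 1 the loss is zero and the update is skipped, which is exactly the "passive" case described above.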
Accuracy is measured on the basis of the confusion matrix. While applying the Naive Bayes algorithm and the passive-aggressive classifier, the confusion matrix is used to find out how much accuracy the model gives.
A confusion matrix is a table used to describe the performance of a classification model on a set of test data for which the true values are known. The concept of the confusion matrix is quite simple, but the terminology related to it is confusing; hence the name. The confusion matrix is a table [16] with four different combinations of actual and predicted values, as shown in Table 2.
To understand and analyze the performance of this model, the methods used are precision, recall, and F1-score. Precision is the ratio of true positives to all predicted positives, i.e., the fraction of relevant instances among the retrieved instances [17]:

Precision = True Positive / (True Positive + False Positive)    (8)

Recall measures how well the model identifies true positives; it is the fraction of the total number of positive instances that were actually retrieved [17]:

Recall = True Positive / (True Positive + False Negative)    (9)

F1-score = 2 × (Precision × Recall) / (Precision + Recall)    (10)
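Equations (8) through (10) follow directly from the confusion-matrix counts. The sketch below computes all three from raw counts; the function name is illustrative.

```python
def classification_metrics(tp, fp, fn):
    # Precision (Eq. 8): true positives over all predicted positives
    precision = tp / (tp + fp)
    # Recall (Eq. 9): true positives over all actual positives
    recall = tp / (tp + fn)
    # F1-score (Eq. 10): harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```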
After training the model, in the testing phase the accuracy of Naive Bayes is higher for some datasets and that of passive-aggressive for others. For the US Presidential Elections 2016 dataset, the accuracy of the passive-aggressive and Naive Bayes classifiers is 96.8% and 83.6%, respectively; based on both accuracies, this model suggests carrying out further classification using the passive-aggressive classifier only. It is to be noted that the Naive Bayes classifier works on Bayes' probability theorem, so its accuracy in predicting fake news varies, whereas passive-aggressive remains more stable.
The passive-aggressive classifier first checks the prediction: if it matches the actual result, the weights stay the same and the classifier remains in a passive state; if it does not match, the classifier goes into an aggressive state and tries to change the weights so that the predicted value comes as close as possible to the actual value. This is why passive-aggressive is more efficient than other classifiers most of the time, and why it gives higher accuracy in this model. The confusion matrices obtained in this model are shown in Figs. 2 and 3.
In Figs. 2 and 3, the values represent the numbers of test cases classified into these confusion matrices. Here, 2000 test cases were taken, meaning 2000 news articles were classified into these matrices. From Figs. 2 and 3, the true positive and true negative values are 1009 and 663 for Naive Bayes and 987 and 949 for passive-aggressive. For the prediction to be accurate, the sum of the true negative and true positive values must be higher than the sum of the false negative and false positive values. As reported in Sect. 3.7.1, the result clearly shows that the sum of true values for passive-aggressive is much higher than for Naive Bayes; hence, passive-aggressive has higher accuracy. To measure the two classifiers' performance more closely, precision, recall, and F1-score are used.
Figure 4 compares precision, recall, and F1-score for the proposed simulated model's classifiers (Naive Bayes and passive-aggressive) and a support vector machine (SVM) classifier. Precision, recall, and F1-score for an ideal classifier are all 1, so the model whose values are nearest to 1 is closest to an ideal classifier. As the graph shows, the passive-aggressive classifier has values much nearer to 1 than Naive Bayes, so it can be considered the better classifier for this model.
5 Conclusion
In this paper, we have proposed a model for detecting fake news from a dataset. The dataset used is the 2016 US Presidential Elections news data. Stop-word removal and lemmatization techniques are used to preprocess the dataset. A passive-aggressive classifier and a Naive Bayes classifier are used to classify whether the news is fake or real. The result of each classifier differs with different datasets. For this dataset, the accuracy of Naive Bayes is 83.6% and the accuracy of passive-aggressive is 96.8%; here, the performance of passive-aggressive is better than Naive Bayes. The reason is that the different states of the passive-aggressive classifier allow it to classify more accurately than the probability-based Naive Bayes classifier. Here, the performance of the passive-aggressive classifier is far better than that of the other classifiers.
References
1. https://en.wikipedia.org/wiki/Fake_news_in_India
2. Ghosh A, Veale T (2017) Magnets for sarcasm: making sarcasm detection timely, contextual
and very personal. In: Proceedings of the 2017 conference on empirical methods in natural
language processing, pp 482–491
3. “Fake news detection using machine learning”. Pantech Solutions (2018). www.pantechsolutions.net/fakenews-detection-using-machine-learning
4. Rodríguez ÁI, Iglesias LL (2019) Fake news detection using Deep Learning. arXiv preprint
arXiv:1910.03496
5. Murshid TM, Nikhil PP, Ranjith EP, Francis JJ (2019) Fake news detection using machine
learning. Int J Innovative Res Sci Eng Technol 8(06):6784–6786
6. Kar D, Bhardwaj M, Samanta S, Azad AP (2020) No rumours please! A multi-indic-lingual
approach for COVID fake-tweet detection. arXiv preprint arXiv:2010.06906
7. Sharma S, Sharma R (2020) A graph neural network based approach for detecting suspicious
Users on Online Social Media. arXiv preprint arXiv:2010.07647
8. Li Q, Zhou W (2020) Connecting the dots between fact verification and fake news detection.
arXiv preprint arXiv:2010.05202
9. Han Y, Karunasekera S, Leckie C (2020) Graph neural networks with continual learning for
fake news detection from social media. arXiv preprint arXiv:2007.03316
10. Thota A, Tilak P, Ahluwalia S, Lohia N (2018) Fake news detection: a deep learning approach.
SMU Data Sci Rev 1(3):10
11. Yang S, Shu K, Wang S, Gu R, Wu F, Liu H (2019) Unsupervised fake news detection on social
media: a generative approach. In: Proceedings of the AAAI conference on artificial intelligence,
vol 33, no 01, pp 5644–5651.
12. Singh V, Dasgupta R, Sonagra D, Raman K, Ghosh I (2017) Automated fake news
detection using linguistic analysis and machine learning. In: International conference on
social computing, behavioral-cultural modeling, & prediction and behavior representation in
modeling and simulation (SBP-BRiMS), pp 1–3
13. https://towardsdatascience.com/natural-language-processing-feature-engineering-using-tf-idf-e8b9d00e7e76
14. https://en.wikipedia.org/wiki/Naive_Bayes_classifier
15. https://www.bonaccorso.eu/2017/10/06/ml-algorithms-addendum-passive-aggressive-algorithms/
16. https://towardsdatascience.com/understanding-confusion-matrix-a9ad42dcfd62
17. https://www.analyticsvidhya.com/blog/2020/09/precision-recall-machine-learning/
Predicting NCOVID-19 Probability
Factor with Severity Index
A. Pandit (B)
Department of Computer Science and Engineering, Heritage Institute of Technology, Kolkata,
West Bengal, India
e-mail: ankush.pandit.cse20@heritageit.edu.in
S. Bose
Communications and Multimedia Engineering, University of Erlangen-Nuremberg, Erlangen,
Bavaria, Germany
e-mail: soumalya.bose@fau.de
A. Sen
Department of Electronics and Communication Engineering, Heritage Institute of Technology,
Kolkata, West Bengal, India
e-mail: anindya.sen@heritageit.edu
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 627
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_47
general population. Sahayata helps COVID-19 patients living in rural communities with smaller patient-care facilities and limited equipment by providing a way toward efficient treatment care.
1 Introduction
The motivation for this research work arises from the ongoing COVID-19 global crisis, in which hospitals in multiple nations are running short of beds, ventilators, and medical equipment, leading to the collapse of healthcare systems across the globe.
Nations like France, China, and the USA are under strain yet again, mostly due to the Delta variant of COVID-19. China's COVID-19 cases hit a seven-month high on August 10, 2021, with 143 new cases reported [1]. Similar trends can be seen far to the west, with B.1.617.2 (Delta) variant cases piling up in the USA. As per the report of the Centers for Disease Control and Prevention [2] in the USA, the daily trends in COVID-19 cases and death rates per 100,000 population are increasing, as shown in Figs. 1 and 2. The report also states that there was a 22% increase in hospital admissions in the USA for new COVID-19 cases between August 01, 2021, and August 07, 2021, compared to the previous week, i.e., July 25, 2021, to July 31, 2021, as shown in Fig. 3. The report went on to say that there was a whopping 21.4% increase in registered COVID-19 cases in the age group of 0–17 years during the same time frame, as shown in Fig. 4. The report further shows the global trend in COVID-19 cases in terms of incidence rates, as shown in Fig. 5.
Fig. 1 Daily trends in the number of COVID-19 cases till August 08, 2021, in the USA reported to the Centers for Disease Control and Prevention, and the total and cumulative incidence rate of COVID-19 deaths per 100,000 population
Fig. 2 Daily trends in the number of COVID-19 cases till August 08, 2021, in the USA reported to the Centers for Disease Control and Prevention, and the total and cumulative incidence rate of COVID-19 cases per 100,000 population
Fig. 3 New hospital admissions in the USA from August 01, 2021, to August 07, 2021, as per the Centers for Disease Control and Prevention
A report from “Times of India” [3] states that the USA is recording over 100,000 new COVID cases a day, and the average number of cases has doubled from two weeks ago. During the same time, deaths have doubled to 516 a day. “Bloomberg” reported [4] that Austin, the capital city of Texas, USA, with 2.4 million inhabitants, has only six intensive care unit beds left due to the massive rise in COVID-19 cases. The condition is even worse in the Middle East, with “Times of India” [5] reporting that in Iran one person is dying of COVID-19 every 2 min, with total deaths reaching 94,603.
“Sahayata” means “help” in Sanskrit, and it was developed in the Bengali year 1427, viz. 2020. As the name suggests, the proposed prediction algorithm can help health workers take quick decisions on the proper allocation of hospital beds, ventilators, or ICUs to deserving patients, eventually reducing the pressure on healthcare institutes.
Fig. 4 New hospital admissions in the USA by age group from August 01, 2021, to August 07, 2021, as per the Centers for Disease Control and Prevention
Fig. 5 Global trends in epidemic curve trajectory classification till August 8, 2021, as per the Centers for Disease Control and Prevention
Though multiple high-accuracy COVID-19 rapid testing kits are now available in the global markets, the cost of the kits and testing [6–8] is a matter of concern, especially for people from poor and developing nations. Also, these kits do not indicate the severity of COVID-19 cases. Although many effective methods have been developed to predict the severity of COVID-19 [9, 10], their cost and user-friendliness are questionable.
This research develops a free-of-cost novel prediction algorithm named Sahayata, which provides a probability factor for COVID-19 along with a severity index for each case.
Sahayata is easy to use and can be used by healthcare institutes as well as at home. Sahayata takes into account manually and scientifically computed weightages of parameters such as COVID-19 symptoms and the duration of each symptom, the severity factor of the locality, the national impact factor, comorbidity, age, sex group, travel history, and the SpO2 level of the patient to predict the probability that a person will be diagnosed with COVID-19, with a severity in each case. The model parameters are manually adjusted over repeated test trials with available data until the best output is obtained. Sahayata operates as a logistic regression model in which the weighted sum of each contributing parameter is passed to a sigmoid function. The model also outputs a severity index for COVID patients, following the same method, representing the severity of the patient's illness. The infection probability and severity computed by the model are plotted against their corresponding values from a preexisting database to determine the accuracy of the algorithm.
The two measurements that Sahayata provides are of utmost importance, especially during an hour of crisis in a country like India, where the population density is very high. When doctors or officials-in-charge face a scarcity of medical resources, they have to prioritize some patients based on their medical situation, and the probability factor and severity index prove handy for making a quick decision. Thus, our algorithm helps save not only a significant amount of time but also the money needed to run additional tests. The severity index is calculated such that if a patient's health condition is likely to deteriorate quickly, his or her severity index will be higher than others', so he or she will automatically get higher priority if our algorithm is followed.
Our algorithm, together with AI-based apps for measuring SpO2 level, GPS-based positioning, and a few questions regarding physical condition and travel history, can predict someone's COVID status and measure its severity with high accuracy, thus creating an easy way of self-assessment, especially for people with no particular medical knowledge.
Sahayata predicted patients diagnosed with COVID-19 with an accuracy of 88.17%, a precision of 100%, and a recall of 87.3%. Besides, it computed a severity index in each case. A graph has been plotted between SpO2 level and severity index for each patient. The graph depicts that severity increases with low SpO2 values, which is a typical medical case, thus supporting the authenticity of the severity index prediction curve given by Sahayata.
4 Major novel contributions presented by our project
• Self-testing is possible by users besides use by doctors or nurses.
• Easy integrations of new symptoms in the system, if observed during a particular
period.
• Helps in decision making for resource allocation approach.
• Predicts the severity of each case.
• Applicable anywhere in the world, as global and local factors have been taken into account for each of the parameters.
632 A. Pandit et al.
2 Materials
Sahayata has been developed in the Python high-level programming language. The algorithm is open-sourced, user-friendly, and available in a GitHub repository. Two freely available datasets are primarily used: (i) a probability factor dataset and (ii) a severity index dataset, as discussed below. The program is developed in Python 3.6 IDLE on the Windows 10 platform.
Sahayata has been tested on the dataset collected by the “Open COVID-19 Data Working Group” [11] as of May 26, 2020, for prediction of the probability factor. The dataset has been filtered further to remove redundancy and incomplete data. The filtered dataset includes 59 patient IDs of different nationalities, with the corresponding symptoms experienced, travel history, and travel country. Of these, 55 are COVID-19 positive, two of whom are asymptomatic.
Sahayata has also been tested on a dataset of 103 patients from India of different age, sex, and comorbidity status, with the duration of COVID-19 symptoms and the SpO2 level at the time of admission to hospital, for predicting the severity index.
3 Methods
The prediction of the probability factor for nCOVID-19 is shown in Fig. 6.
As the diagram depicts, our algorithm has three main modules: the first module receives the necessary user data and extracts the corresponding factors; the second module calculates the weighted sum of the different factors and normalizes the weighted value between 0 and 1 using a sigmoid filter; and the last module classifies the user by comparing the normalized value to a pre-calculated threshold value. The following sections describe each module in detail.
a person is COVID-19 positive and already showing some symptoms, adding a few more symptoms to his condition does not alter the fact that he is a positive case. So the final probability should not change much even when symptoms are added to the existing condition, although this increases the weighted sum significantly. Keeping this idea in mind, each of the three sections above under 3.1.1 is summed up and passed through a sigmoid filter to normalize the probability score to the scale of 0 to 1.
Classification Module
Finally, the authors computed the average probability factor of COVID-19-positive patients from the training dataset. This average is taken as the threshold value. The probability factor for each patient is compared with the threshold: if it is larger than or close to the threshold value, Sahayata declares the patient COVID-19 positive, else negative.
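The calculation and classification modules described above can be sketched as a weighted sum passed through a sigmoid and compared against a threshold. The factor names, weights, and threshold below are purely illustrative, not the paper's calibrated values, and the function names are ours.

```python
import math

def sigmoid(z):
    # Sigmoid filter: normalize the weighted sum into the 0-to-1 range
    return 1.0 / (1.0 + math.exp(-z))

def probability_factor(factors, weights):
    # Weighted sum of the extracted factors (symptom, locality,
    # travel-history factors, etc.; keys and weights are illustrative)
    z = sum(weights[k] * factors[k] for k in weights)
    return sigmoid(z)

def classify(prob, threshold):
    # Declare COVID-19 positive if the probability factor reaches
    # the pre-calculated threshold, else negative
    return "positive" if prob >= threshold else "negative"
```

In the actual algorithm the threshold would be the average probability factor of the COVID-19-positive training patients, as described above.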
TCF and WF have been calculated from Worldometer [12] as of June 14, 2020. LF, GF, and TSF have been calculated from the filtered dataset of [11] as of May 26, 2020. The LoF values have been chosen experimentally to yield the best prediction, with higher weightage given to the more severe zones. The authors have defined a total weightage on a scale of 10 for TF, TSF, and LoF (in Eq. 8), with the value for each parameter likewise chosen experimentally to yield the best prediction.
The computation of the severity index prediction for a patient affected with nCOVID-19 is shown in Fig. 7.
From the diagram, we get the basic difference between the calculation of severity and the probability factor. Here we are concerned with the absolute value of the weighted sum of all the user-related information, so no subsequent comparison or normalization is used. The whole algorithm thus has two sections: a User Module to collect user information and extract the corresponding factor values, and a Calculation Module to calculate the weighted sum and output the final severity index.
User Module
The computation of the weightage of COVID-19 symptoms works in the same way as explained in Eqs. 4, 5, and 6 under Sect. 3.1.2. The duration of each symptom is taken into account and multiplied by the TSF (from Eq. 6) calculated in the section above.
Then, Sahayata asks for the sex group and replaces it with the fatality rate of the corresponding sex group defined in [12] as of July 5, 2020.
After this, the algorithm takes into account the difference of the patient's SpO2 level from the standard value and stores it as the SpO2 level factor. The standard is calculated as the average SpO2 level of healthy persons. Sahayata then asks for the age of the patient, which is replaced by the fatality rate of the corresponding age group pre-defined by the World Health Organization [13], as dated from February 24, 2020, to April 13, 2020.
Next, the algorithm takes the patient's comorbidities and replaces each with its corresponding pre-defined fatality rate [14–17]. This pre-defined fatality rate of a comorbidity is defined as the comorbidity factor. The sum of the comorbidity factors of the existing comorbidities is computed.
Calculation Module
Each of the six sections above under 3.2.1 is summed up, and the sum is multiplied by the fatality rate of the country to predict the severity index of a COVID-19-positive patient.
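The Calculation Module for the severity index can be sketched as below. The parameter names and the grouping of terms are our assumptions based on the description above; the actual weightages are the authors' manually tuned values, so the numbers here are purely illustrative.

```python
def severity_index(symptom_durations, tsf, sex_factor, spo2_gap,
                   age_factor, comorbidity_factors, country_fatality_rate):
    # Symptom term: duration of each symptom multiplied by its TSF weight
    # (illustrative names; the real TSF values come from the filtered dataset)
    symptom_term = sum(d * tsf[s] for s, d in symptom_durations.items())
    # Sum the contributing terms described for the User Module, then scale
    # by the country's fatality rate, as the Calculation Module specifies
    total = (symptom_term + sex_factor + spo2_gap + age_factor
             + sum(comorbidity_factors))
    return total * country_fatality_rate
```

Because no sigmoid normalization is applied, the output is an unbounded score, which matches the severity values in the hundreds reported in the case studies below.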
4 Results
Graphical illustrations of the results given by the Sahayata algorithm are depicted in Figs. 8, 9, 10 and 11.
Fig. 8 Prediction results for probability of COVID-19 positive given by Sahayata algorithm. The
ordinate labeled “probability of COVID positive” represents “probability factor”
Fig. 9 Prediction results for probability of COVID-19 positive given by Sahayata algorithm with
threshold line at y = 0.6
Figure 8 shows the probability of being COVID-19 positive, viz. the probability factor (PF), plotted on the y-axis against the corresponding patient or user, present in dataset [11], on the x-axis, in the absence of a threshold line. Figure 9 shows Fig. 8 re-plotted with the threshold line placed at y = 0.6, and Fig. 10 shows it re-plotted with the threshold line placed at y = 0.8.
Fig. 10 Prediction results for the probability of COVID-19 positive given by the Sahayata algorithm with the threshold line at y = 0.8
Fig. 11 Prediction results for the severity value of nCOVID-19 cases given by the Sahayata algorithm. The ordinate labeled “severity values” represents “severity index”
In Fig. 11, the severity index (SI) has been plotted on the y-axis against the corresponding patients' SpO2 levels, from the dataset under Sect. 2.2, on the x-axis. The plot shows a general increase in severity with a decrease in SpO2 level, which is a common medical scenario; thus, the plot validates our calculation. However, the plot is not strictly a straight line, as factors other than SpO2 level are taken into account in the calculation.
Let us elaborate on the usage of these two measurements with simple case studies from our dataset:
Case Study 1: A person from an urban area of India with no serious symptoms and no travel history has a probability of 0.131683213 of being COVID positive, which is quite low relative to the decided threshold. Hence, the person is deemed COVID negative.
Case Study 2: A person living in Spain, who is exhibiting flu-like symptoms and has a recent travel history from Italy, has a probability of 0.844886766 of being COVID positive, which is considerably higher and thus falls into the COVID-positive category.
Case Study 3: A patient with cough and breathlessness for 2–3 days but no comorbidity history gets a 0.8437 chance of COVID positivity and a severity value of 111.2953507.
Case Study 4: Another patient with fever, cough and breathlessness for almost a week and a history of breathing problems gets a severity value of 209.5281853 with a 0.95631 chance of being COVID positive.
An interesting observation here is that, even though both patients described in Case Studies 3 and 4 are predicted to be COVID positive, patient 4 will be given higher priority over patient 3 based on patient 4's higher severity value. This saves the time of going through previous detailed test reports and the cost of re-conducting the tests.
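The prioritization described in these case studies can be sketched in Python. The `triage` function, the 0.8 threshold, and the severity value shown for Case Study 1 are illustrative assumptions, not the authors' Sahayata implementation.

```python
# Hedged sketch of the triage logic from the case studies: patients whose
# probability factor (PF) meets the threshold are deemed COVID positive,
# and positives are prioritized by descending severity index (SI).
# Function and field names are illustrative, not from the Sahayata code.

def triage(patients, threshold=0.8):
    """Return COVID-positive patients sorted by severity (highest first)."""
    positives = [p for p in patients if p["pf"] >= threshold]
    return sorted(positives, key=lambda p: p["si"], reverse=True)

patients = [
    {"id": 1, "pf": 0.131683213, "si": 0.0},      # Case Study 1 (SI assumed)
    {"id": 3, "pf": 0.8437, "si": 111.2953507},   # Case Study 3
    {"id": 4, "pf": 0.95631, "si": 209.5281853},  # Case Study 4
]

queue = triage(patients)
print([p["id"] for p in queue])  # patient 4 is served before patient 3
```

With these values, patient 1 is filtered out by the threshold and patient 4 precedes patient 3 in the queue, matching the observation above.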
5 Discussions
In Fig. 11 (Sect. 4), the authors have plotted the SI predicted by the Sahayata algorithm against the corresponding SpO2 level in each case. The SI scores are shown as blue dots. It can be seen from the figure that the blue dots lie either toward the bottom-right or the upper-left, which can be physically interpreted as the SI being inversely proportional to the patient's SpO2 level. The graph depicts that severity increases at low SpO2 values, which is a classical medical observation. This finding supports the accuracy of the proposed Sahayata algorithm in predicting the severity index for nCOVID-19 cases.
The authors have experimentally and manually optimized several parameters in the algorithms to get the best prediction results. Future work can proceed as follows:
• Fully autonomous optimization of the parameters.
• Integrating more potential COVID-19 symptoms into Sahayata.
• Multiple research efforts worldwide aim to use chest computed tomography (CT) for early detection of COVID-19 [18–21]; chest CT could be integrated as a symptom of COVID-19 in Sahayata.
• Use of image processing tools for detection of bluish lips and COVID toes, eventually to be integrated into Sahayata to widen the scope of the algorithm in predicting PF.
• Development of a user-friendly and platform-independent application based on the Sahayata algorithm to make it more accessible to the general public.
6 Conclusion
nCOVID-19 was officially first reported in China; by the end of January 2020 it had been followed by other countries, and a lockdown in the UK was announced on March 23, 2020 [22]. India and many other countries followed suit. The community believes that, as the first-, second- and third-order impacts of the virus manifest over different time frames across regions, this pandemic will not necessarily be "over" until we are through the impact of the "third wave" of the COVID-19 pandemic [22]. As of December 2020, India was coming down from the first peak of the pandemic, while Africa and Europe were experiencing the second peak and the UK had detected a new strain of the virus. Now the whole world is facing a spike in COVID-19 cases yet again, driven mainly by the Delta variant, and countries around the globe are busy defending against the virus once more. This clearly demonstrates the worldwide severity of the pandemic and reflects the need for efficient services for the ever-growing patient population. With the noticeable rise in demand for critical care units at hospitals, severely ill nCOVID patients are found waiting for hospital beds that are occupied by non-critical COVID patients. Demand has peaked so much in small villages and in areas with few hospitals per resident that patients are queued, waiting in turn for the availability of critical survival units. Due to the absence of prior knowledge of a severity index for COVID-19 patients, hospitals with limited numbers of ventilators and medical equipment fail to admit patients on any priority
basis. Of the multiple test kits available in the market to date, none provides an instantaneous index for severity prediction for COVID. This research has developed and implemented a free and user-friendly algorithm, "SAHAYATA 1427", which predicts the probability of a patient having the disease nCOVID-19, termed the "probability factor", and concurrently provides an index of how severely the patient is affected by nCOVID, termed the "severity index." The probability factor and severity index are intended to act as an initial guide for caregivers and patients in making progressive decisions. Sahayata will be a great advantage and boon for the patient and the caregiver if critical care units can be prioritized based on a patient's probability factor and severity of nCOVID-19.
The COVID situation is predicted to be worse when the second wave reaches its peak. The shortage of test kits, the limited oxygen supply and, most importantly, the absence of sufficient human resources will be unimaginably difficult to handle. Amid such an adverse situation, Sahayata can be an extremely useful tool to counter the hostility. The use of the "probability factor" and "severity index" will be beneficial from economic, medical and social points of view.
Sahayata is user-friendly, and the program is easily accessible. It can be used by the general population at home and by trained medical personnel at medical and healthcare institutes via the Python programming language platform. One potential situation may arise if a household has no measuring tool such as an oximeter available. Many AI-based applications can measure SpO2 very accurately but fail to provide any probability factor or severity measurement; hence, by using our model with the available SpO2 value, a non-medical person can also get an idea of his or her medical condition while staying at home.
The proposed algorithm is simple yet efficient and has achieved high accuracy in predicting the PF of nCOVID-19 along with the SI in each case. Moreover, Sahayata is open-source and thereby welcomes future mobile application development. Most importantly, Sahayata is free and can be used by any nation across the globe.
COVID-19 is evolving day by day, and doctors and researchers keep associating new symptoms with it. Our choice of algorithm in Sahayata supports this idea of scaling: if a new symptom is added to the data set, our model will not break. As we are using logistic regression, we can achieve similar accuracy after training the model once more.
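Since the text attributes this scalability to logistic regression, the claim can be illustrated with a minimal sketch; scikit-learn, the synthetic data, and the feature layout are assumptions for illustration, not the authors' code.

```python
# Hedged sketch: adding a new symptom column and retraining does not break
# a logistic-regression model. Data and the feature layout are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 4))     # 4 binary symptom features
y = (X.sum(axis=1) >= 2).astype(int)      # synthetic "COVID positive" label

model = LogisticRegression().fit(X, y)

# A new symptom is discovered: append one more column and simply retrain.
X_new = np.hstack([X, rng.integers(0, 2, size=(200, 1))])
model = LogisticRegression().fit(X_new, y)
pf = float(model.predict_proba(X_new[:1])[0, 1])   # PF-style probability output
print(round(pf, 3))
```

The same `fit` call handles the widened feature matrix, which is the scaling property claimed above.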
Author's Contribution Each author has contributed equally to this project, from collecting data from various sources to designing the algorithm, performing the testing and finally building a model based on the results.
Conflict of Interest The named authors have no conflict of interest, financial or otherwise.
Information About Funding Authors did not receive any financial support in the form of grants.
Resource Availability “SAHAYATA 1427” code and all the datasets, flowcharts and images used
in this research and final results are available at: https://github.com/AnkushPandit/Predicting-Pro
bability-Factor-of-nCOVID-19-with-Severity-Index.
References
1. Agence France-Presse (2021) China cases hit 7-month high, worst outbreak since virus emerged
in Wuhan, NDTV, Aug 2021
2. Centers for disease control and prevention CDC, Aug 2021
3. US see highest caseload since Feb, Times of India, Aug 2021
4. Chua L (2021) Austin sounds ‘Dire’ covid emergency as available ICU beds drop, Bloomberg,
Aug 2021
5. Iran: one person dying of Covid-19 every 2 min, Times of India, Aug 2021
6. Court E (2020) 'Game Changing' 15-minute Covid-19 test cleared in Europe, Bloomberg Quint, Oct 2020
7. Cost of RT-PCR test decreased to ₹1600 in Karnataka, Times of India, Sept 2020
8. India’s Feluda Covid-19 test cheaper, faster alternative to RT-PCR, ET Healthworld, Sept 2020
9. Assandri R, Buscarini E, Canetta C, Scartabellati A, Vigano G, Montanelli A (2020) Laboratory
biomarkers predicting COVID-19 severity in the emergency room. Arch Med Res 51(6):598–
599
10. Blood test can predict severity of Covid-19: Study, ET Healthworld, July 2020
11. https://github.com/beoutbreakprepared/nCoV2019
12. https://www.worldometers.info/coronavirus/
13. Coronavirus disease 2019 (COVID-19) situation report—89, p 3, Apr 2020
14. https://www.worldometers.info/coronavirus/coronavirus-age-sex-demographics/
15. Hussain A, Mahawar K, Xia Z, Yang W, Hasani S (2020) Obesity and mortality of COVID-19.
Meta-analysis. Obes Res Clin Pract 14:295–300
16. Juarez SY, Qian L, King KL, Stevens JS, Hussain SA, Radhakrishnan J, Mohan S (2020)
Outcomes for patients with COVID-19 and acute kidney injury: a systematic review and meta-
analysis. Clin Res 5(8):1149–1160
17. 63% of coronavirus deaths in India in 60+ age group: Health ministry, India Today, Apr 2020
18. Hui JY, Hon TY, Yang MK, Cho DH, Luk W, Chan RY, Chan K, Loke TK, Chan JC (2004)
High-resolution computed tomography is useful for early diagnosis of severe acute respira-
tory syndrome–associated coronavirus pneumonia in patients with normal chest radiographs.
J Comput Assisted Tomogr 28(1)
19. Huang C, Wang Y, Li X, Ren L, Zhao J, Hu Y, Zhang L, Fan G, Xu J, Gu X, Cheng Z, Yu T,
Xia J, Wei Y, Wu W, Xie X, Yin W, Li H, Liu M, Xiao Y, Gao H, Guo L, Xie J, Wang G, Jiang
R, Gao Z, Jin Q, Wang J, Cao B (2020) Clinical features of patients infected with 2019 novel
coronavirus in Wuhan, China. Lancet 395:497–506
20. Lia Y, Xia L (2020) Coronavirus disease 2019 (COVID-19): role of chest CT in diagnosis and
management. Am Roentgen Ray Soc 214:1–7
21. Chung M, Bernheim A, Mei X, Zhang N, Huang M, Zeng X, Cui J, Xu W, Yang Y, Fayad
ZA, Jacobi A, Li K, Li S, Shan H (2020) CT imaging features of 2019 novel coronavirus
(2019-nCoV). Radiology 295(1):202–207
22. Fisayo T, Tsukagoshi S (2020) Three waves of the COVID-19 pandemic. Postgrad Med J 0:1
Differentially Evolved RBFNN
for FNAB-Based Detection of Breast
Cancer
S. P. Gadige · K. Manjunathachari
GITAM University, Rudraram, Hyderabad, Telangana, India
e-mail: 221960404501@gitam.in
K. Manjunathachari
e-mail: mkamsali@gitam.edu
M. K. Singh (B)
Manuro Tech Research Pvt. Ltd., Bangalore, India
e-mail: mksingh@manuroresearch.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 643
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_48
1 Introduction
Breast cancer originates in breast tissue, most often from the inner linings of the milk ducts (ductal carcinoma) or from the lobules (lobular carcinoma) [1]. Breast cancer occurs in humans as well as other mammals. In 2020, 2.3 million women were diagnosed with breast cancer and more than 685,000 deaths were recorded globally. More than 7.8 million women were diagnosed with breast cancer in the five years up to the end of 2020, making it the most prevalent of all cancers. The five-year survival rate after first diagnosis varies widely: for high-income countries it is around 90%, while the situation is worse for poorer or underdeveloped countries (in India it is 66%, and in South Africa 40%) [1]. The treatment process is determined by the current cancer characteristics (such as size and growth rate) and involves an integrated approach comprising drugs (hormonal therapy and chemotherapy), radiation and/or immunotherapy, and surgery. From a statistical point of view, breast cancer accounts for nearly 22.9% of all cancers in women, and it occurs far more commonly in women than in men. Prognosis and survival rate depend on the cancer type and stage, the diagnosis and treatment applied, and the geographical location of the patient. At present, mammograms are considered the best front-line breast cancer screening test. But mammography carries its own limitations: a high sensitivity of 97% and a low specificity of 64.5% [2]. Dense breasts, which are common in women, increase the false-positive percentage further [3]. Other complications with
analyzing the mammogram arise because the analysis is highly subjective, which can induce serious issues with the final observation outcomes. The information available from abnormal mammograms is confirmed with further diagnostics such as ultrasound, MRI, and biopsy. A breast biopsy is recommended under certain circumstances: when there is a lump or thickening in the breast, when the mammogram shows a suspicious area in the breast, or when an ultrasound/MRI scan shows a suspicious finding. A breast biopsy provides a confirmative diagnosis of whether the patient has breast cancer or whether the abnormality in question is benign, with a specificity of 99.6% and a sensitivity of 97.4% [4]; depending on the biopsy outcome, further processes can be defined with better confidence. Fine needle aspiration biopsy (FNAB) of the breast is a minimally invasive procedure that prevents the need for open biopsy [5]. The FNAB process is cheaper and more comfortable, and results appear in a short period. Although the core biopsy diagnostic process is more robust and reliable, it carries the disadvantages of taking longer, causing patient discomfort, and being costlier. Because FNAB uses a smaller needle, it has a low probability of developing hematoma and other rare complications, such as pneumothorax [6, 7]. The triple diagnostic approach combining clinical evaluation, mammography, and FNAB provides a precise diagnosis for breast cancer and reduces the risk of a missed diagnosis to less than 1% [8]. Generally, a pathologist examines the biopsy tissue sample to reach a finding; however, manual analysis and quantification, together with the lack of a universal rule, can make the final decision erroneous [9]. The accuracy of manual analysis has been observed in the range of 62–90%, and this variability can greatly affect patient management. Early detection of breast cancer helps very much in the right treatment plan and survival period, and in this regard computer-aided diagnosis (CAD) has been applied to assist. Present knowledge-based artificial intelligence (KAI) approaches can improve CAD system performance to a great extent in comparison with rule-based, problem-specific solutions.
This work develops a knowledge-based approach over FNAB data, learned by an evolved radial basis function neural network, to recognize whether breast cancer is malignant or benign. In the conventional form of RBF, the basis function parameters are fixed and learning is provided only by updating the output layer weights. This restricts the learning and causes poor performance. To overcome this issue, the basis function parameters (the center and spread value of the Gaussian function) are also involved in the learning process, along with the output layer weights. The gradient-based approach to learning has the limitation of getting stuck in local minima and of limited accuracy; hence, a differential evolution-based approach has been applied to evolve the whole learning process. A hybrid mutation strategy that probabilistically selects between a vector derived from the best member and one derived from a random member has shown excellent benefit. The work is divided into several sections: Sect. 2 details the related work, the proposed work is presented in Sect. 3, the detailed experimental results and analysis are given in Sect. 4, and the conclusion and future work are presented last.
2 Related Work
Several works have been reported toward automated detection of breast cancer. To distinguish between the benign and malignant categories, Osareh and Shadgar [10] utilized SVM, K-nearest neighbors, and PNN, together with signal-to-noise ratio for feature ranking, while PCA was applied to extract features. Single nucleotide polymorphisms from the BRCA1, BRCA2, and TP53 genes were utilized in [11] to detect breast cancer with different machine learning approaches. A machine learning-based classifier was applied to mammograms in [12] to detect breast cancer from features extracted from segmented regions in craniocaudal (CC) and/or mediolateral oblique (MLO) mammography image views. The response of breast cancer patients to a single cycle of neoadjuvant chemotherapy (NAC) was discussed in [13]. A different form of artificial metaplasticity in the multilayer perceptron was used in [14] to detect breast cancer. The performances of different approaches such as SVM, C4.5, Naïve Bayes, and K-NN for breast cancer detection were discussed in [15]. Recurrence is an important aspect of the behavior of breast cancer related to mortality, and in [16] a machine learning-based method was applied to predict breast cancer recurrence. Microarray technology-based identification of genetic factors has given great help in diagnosis and treatment; Bektaş and Babur [17] applied machine learning algorithms to detect and classify breast cancer. Considering the active genes in breast cancer, Kolay and Erdoğmuş [18] proposed a clustering approach to classify breast cancer. Weighted K-means support vector machines and weighted support vector machines were considered in [19] over two large-scale real applications in the TCGA pan-cancer methylation data of breast and kidney cancer. A rule-based approach to breast cancer classification and a machine learning-based approach for survival prediction were discussed in [20]. The performance of SVM alone on breast cancer classification was improved by an ensemble approach in [21]. An FFBPN for the classification of breast cancer cases as malignant or benign was discussed in [22]. How machine learning algorithms help in the detection of breast cancer was discussed in [23, 24].
3 Proposed Work
The radial basis function neural network has been shown to be very effective in universal approximation and function-mapping applications. The mathematical form of the output delivered by an RBF neural network can be represented as given in Eq. 1:
$$O_q = f_q(p) = \sum_{z=1}^{N} W_{qz}\,\varphi_z(p, k_z) = \sum_{z=1}^{N} W_{qz}\,\varphi_z\!\left(\|p - k_z\|^2\right), \quad \forall q = 1, 2, \ldots, m \tag{1}$$
where σ is a spread parameter that regulates the "width" of the Gaussian function, and the centers k_z are normally taken from the input data set. In the standard form of RBFNN, the output layer weights are the only adaptive parameters that play an active part in the function mapping. This can be an issue when a complex function has to be mapped. In this work, along with the output layer weights, two more kernel function variables, the centers and the spreads, were adapted. This makes the mapping more flexible and expedites the process. Equation 3 shows how the error output is defined:
$$J(n) = \frac{1}{2}|E(n)|^2 = \frac{1}{2}\left(O_d(n) - \sum_{z=1}^{N} w_z(n)\,\varphi\{p(n), k_z(n)\}\right)^2 \tag{3}$$
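Equations 1 and 3 can be checked numerically with a small sketch. The explicit Gaussian form exp(−‖p − k_z‖²/(2σ_z²)) is an assumption here, since the paper names the Gaussian basis without writing it out, and all parameter values are illustrative.

```python
# Hedged sketch of Eqs. 1 and 3: RBF network output and squared-error cost.
# The Gaussian basis exp(-||p - k_z||^2 / (2*sigma_z^2)) is an assumed form.
import numpy as np

def rbf_output(p, centers, sigmas, w):
    """O = sum_z w_z * phi_z(||p - k_z||^2) for one input vector p (Eq. 1)."""
    r2 = np.sum((centers - p) ** 2, axis=1)    # ||p - k_z||^2 per hidden node
    phi = np.exp(-r2 / (2.0 * sigmas ** 2))    # Gaussian basis values
    return float(phi @ w)

def cost(p, target, centers, sigmas, w):
    """J = 0.5 * (O_d - O)^2 as in Eq. 3."""
    return 0.5 * (target - rbf_output(p, centers, sigmas, w)) ** 2

centers = np.array([[0.0, 0.0], [1.0, 1.0]])   # k_z: one center per hidden node
sigmas = np.array([0.5, 0.5])                  # spread per hidden node
w = np.array([1.0, -1.0])                      # output layer weights
print(cost(np.array([0.0, 1.0]), 1.0, centers, sigmas, w))  # -> 0.5
```

For the input [0, 1] both basis values coincide, the weighted sum cancels to 0, and J = 0.5·(1 − 0)² = 0.5.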
The major difference that makes differential evolution distinctive and efficient is its mutation strategy, which is defined in terms of difference vectors. Apart from the mutation strategy of DE defined in Eq. 4, which is called DE/rand/1, a number of other variants exist, as shown by Eqs. 7–9. In Eq. 7 there are two difference vectors and the base member is selected randomly, hence it is called DE/rand/2. In Eq. 8 the base member is the best member of the present generation and one difference vector exists, hence it is called DE/best/1. Similarly, the strategy in Eq. 9 is called DE/rand-to-best/1:
$$MV_i = M_{r1} + F\left(M_{\text{best}(g)} - M_{r2}\right) + F\left(M_{r3} - M_{r4}\right) \tag{9}$$
The DE/rand/1 strategy is very efficient and has a high level of exploration capability because all of its component members are selected randomly from the population. The problem with this strategy is that it does not explore intensively the regions that may carry better solutions; hence there is either slow convergence or a chance of missing quality solutions. The DE/rand/2 strategy generates extra pressure from carrying differences from other members and can be a cause of premature convergence. The DE/best/1 strategy centers exploration around the best solution only and can cause trapping in a local solution. Similarly, DE/rand-to-best/1 tries to increase the level of exploration through a random difference vector alongside the best-member-based difference vector, which helps exploitation; but always having a difference vector with respect to the best solution can dominate the random exploration, resulting in suboptimal convergence. To overcome this issue, a hybrid mutation strategy called probabilistically best solution directive differential evolution (PBDDE) is proposed in this work, as shown in Eq. 10:
$$MV_i = \begin{cases} M_{r1} + F\left(M_{\text{best}(g)} - M_{r2}\right) & \text{if } \mathrm{rand} < \mathrm{Thr} \\ M_{r1} + F\left(M_{r2} - M_{r3}\right) & \text{otherwise} \end{cases} \tag{10}$$
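The Eq. 10 mutation can be sketched as follows; the population values, F, and Thr are illustrative, and the index-exclusion bookkeeping of a full DE implementation is simplified.

```python
# Hedged sketch of the PBDDE mutation (Eq. 10): with probability Thr the
# mutant is directed by the best member, otherwise DE/rand/1 is used.
import numpy as np

def pbdde_mutant(pop, best, F=0.5, thr=0.5, rng=None):
    """Generate one mutant vector MV_i from population `pop` (rows = members)."""
    if rng is None:
        rng = np.random.default_rng(0)
    r1, r2, r3 = rng.choice(len(pop), size=3, replace=False)
    if rng.random() < thr:
        return pop[r1] + F * (best - pop[r2])   # best-directed branch
    return pop[r1] + F * (pop[r2] - pop[r3])    # DE/rand/1 branch

pop = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 1.0], [3.0, 3.0]])
best = pop[3]   # assume the last member is currently the fittest
mv = pbdde_mutant(pop, best)
print(mv)
```

The probabilistic switch is what balances the best-directed exploitation against random exploration, as argued above.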
DE has been applied to evolve the optimal basis function parameters and output layer weights of the RBF architecture. Consider an RBF architecture with two input nodes, two hidden nodes, and one output node, with the Gaussian function as the basis function. As a result, there are four Gaussian center values, two spread values, and two output layer weights; hence there is a total of 8 parameters to explore, which also defines the problem dimension. A solution in DE is therefore represented as an array of 8 numeric values, in which the first 4 values represent the basis function centers, the next 2 values the Gaussian spreads, and the last 2 values the output layer weights, as shown in Fig. 1.

[Fig. 2 block diagram: the evolved centers C, spreads σ, and output weights Wo parameterize the RBF; the error function between outputs and targets drives the differential evolution.]
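The stated 8-value solution layout can be sketched as a decode step. Row-major packing of the two 2-D centers is an assumption here, since the exact packing order is not specified in the text.

```python
# Hedged sketch of decoding an 8-value DE solution into RBF parameters,
# following the stated layout: 4 center values (two 2-D centers, packing
# order assumed row-major), then 2 spreads, then 2 output layer weights.
import numpy as np

def decode(solution):
    solution = np.asarray(solution, dtype=float)
    centers = solution[:4].reshape(2, 2)   # two hidden nodes, two inputs each
    sigmas = solution[4:6]                 # Gaussian spread per hidden node
    weights = solution[6:8]                # output layer weights
    return centers, sigmas, weights

c, s, w = decode([0, 0, 1, 1, 0.5, 0.5, 1.0, -1.0])
print(c.shape, s.shape, w.shape)  # (2, 2) (2,) (2,)
```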
The complete functional block diagram of the proposed DEARBF is shown in Fig. 2. First, a random population containing a number of solution members of a defined length (equal to the number of parameters in the RBF architecture) is generated. From each individual solution, the centers, spreads, and output layer weights are extracted and applied to the RBF architecture, where the output is generated for the given inputs with the available parameters. DE evolves the corresponding offspring, and the fitter of parent and offspring is selected as a next-generation member. This whole process is repeated for all other members in the population to form the next-generation population; once the next generation is obtained, the process defined above is repeated to evolve further.
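The generation loop just described can be sketched as follows. Crossover is omitted for brevity and the RBF-training fitness is replaced by a stand-in squared-distance objective, so this is an illustrative skeleton rather than the authors' MATLAB implementation.

```python
# Hedged sketch of the DEARBF generation loop: mutate each member with the
# PBDDE rule, then keep the fitter of parent and offspring (lower is better).
# `fitness` here is a stand-in objective, not the actual RBF training error.
import numpy as np

def evolve(pop, fitness, F=0.5, thr=0.5, generations=50, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(generations):
        best = pop[np.argmin([fitness(m) for m in pop])]
        next_pop = []
        for parent in pop:
            r1, r2, r3 = rng.choice(len(pop), size=3, replace=False)
            if rng.random() < thr:
                child = pop[r1] + F * (best - pop[r2])    # best-directed
            else:
                child = pop[r1] + F * (pop[r2] - pop[r3])  # DE/rand/1
            # greedy parent-vs-offspring selection
            next_pop.append(child if fitness(child) < fitness(parent) else parent)
        pop = np.array(next_pop)
    return pop[np.argmin([fitness(m) for m in pop])]

# Stand-in fitness: distance of the 8-parameter vector from an arbitrary target.
target = np.arange(8.0)
init = np.random.default_rng(1).normal(size=(20, 8))
best = evolve(init, lambda m: float(np.sum((m - target) ** 2)))
print(np.round(best, 2))
```

In the actual DEARBF the objective would be the RBF prediction error of Eq. 3 computed after decoding each member.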
4 Experimental Results and Analysis
Detailed experiments are presented in two parts. In the first part, the efficacy of the proposed differential evolution in evolving the RBF architecture is tested on the benchmark XOR classification problem, which carries nonlinear characteristics. A number of different algorithms are considered for comparison. First, the static RBF is considered, to show the limitation associated with fixed basis function parameters. The gradient-based approach is then applied to make the RBF self-adaptive, so that the basis function parameters also change with iterations along with the weight values. Since the performance of gradient algorithms depends heavily on the learning rate, performance is evaluated for three different learning rates. Different forms of mutation strategy in differential evolution are then applied and their performances compared. In the second part, the FNAB-based breast cancer category is obtained. The complete experimental work was developed in the MATLAB environment.
The problem of XOR classification is well known and is generally used to test newly developed algorithms. Its nonlinear characteristics make the problem difficult to solve with a small number of hidden nodes. In this work, the least optimum size of architecture [1, 2] has been considered, which includes two input nodes, two hidden nodes, and one output node. Ten different algorithms have been considered, namely static RBF (SRBF) and gradient-based self-adaptive RBF with three different learning rates, Gr1 (learning rate 0.1), Gr2 (learning rate 0.5), and Gr3 (learning rate 0.9), to capture in detail the effect of the learning rate on performance. Differential evolution with a population size of 100, mutation factor F equal to 0.5, and crossover rate CR equal to 0.9 has been considered in all cases. In this work, DE1 denotes DE/rand/1, DE2 denotes DE/rand/2, DE3 denotes DE/best/1, and DE4 denotes DE/rand-to-best/1. The allowed number of iterations for all algorithms was 1000, and the mean and standard deviation were estimated over 10 independent trials. The convergence performance is shown in Fig. 3. It can be observed that, except for DE1 (DE/rand/1), jDE, and the proposed PBDDE, none of the algorithms converged in all 10 trials; it can also be observed that the performance of PBDDE was the best. For the given inputs {[0, 0], [0, 1], [1, 0], [1, 1]}, the mean output and standard deviation obtained with all algorithms are shown in Table 1, together with the success rate. The proposed form of mutation strategy has shown excellent performance against all others. Evolution in the integer domain has also been applied with PBDDE, and the mean convergence over 10 trials is shown in Fig. 4. The best
Fig. 3 Convergence performance (log10(MSAE) versus iteration) of SRBF, Gr1, Gr2, Gr3, DE1, jDE, DE2, DE3, DE4, and PBDDE
parameters evolved for the self-adaptive RBFNN in the real and integer domains are also shown in Tables 2, 3, and 4.
Generally, after observation of a lesion, a cytologist assigns a binary value to 10 different attributes. The considered attributes are intracytoplasmic lumina, cellular dyshesion, 3D epithelial cell clusters, bipolar naked nuclei, foamy macrophages, nucleoli, nuclear pleomorphism, nuclear size, necrotic epithelial cells, and apocrine change. The age of the patient has also been considered. The considered data set contains a total of 692 instances, of which 200 randomly selected instances were used for training while the remainder were used for testing. The RBF neural architecture has 11 input nodes, 6 hidden nodes, and 1 output node. The mean performance obtained by PBDDE over 10 trials for the real and integer domains is shown in Fig. 5. The performance in terms of sensitivity and specificity has also been estimated for each case and is shown in Tables 4 and 5, respectively. It can be observed that the proposed PBDDE has shown very satisfactory performance in both the real and integer domains, which is appreciable for an application like health care. The best-evolved center positions of the basis functions at the different hidden nodes in the real domain are shown in Table 6, and the corresponding spread and output layer weight values in Table 7. Similarly, the best evolved in the integer domain is shown in Table 8 for the center positions, while Table 9 contains the spreads and output layer weights.
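The sensitivity and specificity reported in Tables 4 and 5 follow the standard definitions, which can be sketched as below; the example labels are invented for illustration, not taken from the FNAB data.

```python
# Hedged sketch of the sensitivity/specificity computation behind Tables 4
# and 5. Labels: 1 = malignant (positive class), 0 = benign (negative class).
def sensitivity_specificity(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)

# Illustrative labels only (not the paper's data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)  # 0.75 and 5/6
```

A high specificity with slightly lower sensitivity, as in Tables 4 and 5, means few benign cases are flagged while a small fraction of malignant cases is missed.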
Table 1 Mean and {std.deviation} output along with success rate over 10 trials
Algorithms Output Success rate (%)
5 Conclusion
The proposed work provides a module for the CAD facility in health care to detect breast cancer. The complexity of formulating the relationship among the various cell parameters has been overcome with the help of knowledge-based computational intelligence. The knowledge learning is done through an RBF neural network. Instead of a deterministic approach to learning, evolution-based learning has been proposed to improve performance. The learning capability of the gradient-based approach has shown limitations in optimal convergence, while the differential evolution-based stochastic search has given optimal and faster convergence. The proposed mutation strategy carries a probabilistic selection in defining the
Fig. 4 Mean convergence (MSAE versus iteration) of PBDDE over 10 trials in the integer domain
Table 2 Best RBF evolved real parameters value by PBDDE over XOR problem
Algorithms Center value Spread O/p layer weight
Input HN1 HN2 HN1 HN2 w1 w2
X1 1.0736 −0.1743 0.2870 0.2721 1.5225 1.5112
X2 0.1709 1.0134
Table 3 Best RBF evolved integer parameters value by PBDDE over XOR problem
Algorithms Center value Spread O/p layer weight
Input HN1 HN2 HN1 HN2 w1 w2
X1 5 −2 −3 2 3 2
X2 −1 2
Table 4 Performances by PBDDE evolved RBF for breast cancer in real value domain
Training data Test data
Error Sensitivity Specificity Error Sensitivity Specificity
Best 0.0500 0.9130 0.9695 0.0427 0.9096 0.9816
Mean 0.0625 0.8580 0.9794 0.0811 0.8428 0.9577
random difference vector or the best-member-directive difference vector, and it has shown very good balance between exploration and exploitation. The RBF has also been evolved in the integer domain, which can have numerous advantages in terms of low arithmetic cost and reduced susceptibility to noisy data in training. The proposed approach can further be integrated with mammogram outcomes to make the final decision outcome robust and accurate.
Fig. 5 Convergence characteristics (MSAE versus iteration) of the PBDDE-based evolved self-adaptive RBF for FNAB-based cancer detection in the real-value (PBDDE) and integer-value (iPBDDE) domains
Table 5 Performances by PBDDE evolved RBF for breast cancer in integer value domain
Training data Test data
Error Sensitivity Specificity Error Sensitivity Specificity
Best 0.0650 0.8841 0.9618 0.0813 0.8855 0.9356
Mean 0.0770 0.8246 0.9748 0.0925 0.8199 0.9521
Table 6 Evolved best center values by PBDDE for RBF for breast cancer in real domain
İnput Basis function mean value over hidden nodes
H1 H2 H3 H4 H5 H6
X1 2.3463 2.2099 10.5743 −19.7346 3.2959 20.5460
X2 −0.6736 2.5770 12.2502 −3.5754 −14.3008 −2.3810
X3 11.6818 −19.6153 −4.1456 5.5049 −11.1878 27.6747
X4 −4.6449 9.6686 −26.4811 −12.4404 −12.4535 16.6607
X5 2.2615 −4.4348 1.1222 −14.2182 0.6125 −12.5923
X6 7.9362 20.0116 5.0546 −27.2334 2.9395 −1.7550
X7 −5.6084 −24.8404 −8.7345 3.3618 −24.8347 9.3254
X8 20.8940 25.1380 −21.4681 12.8040 12.3531 6.6946
X9 12.7413 −15.4768 13.6615 −1.7509 −3.1255 14.7529
X10 −4.9205 −0.2314 −2.0485 5.1427 23.6456 7.8693
X11 12.9655 6.4217 −20.3528 4.3555 −5.0317 −6.5937
Table 7 Evolved best spread and o/p weight values by PBDDE for RBF for breast cancer in real
domain
Basis fun Hidden node
H1 H2 H3 H4 H5 H6
Spread −9.1167 −8.7138 14.6923 −5.7163 −1.7460 −25.9035
wo 9.4985 22.9837 11.9000 13.0526 4.9120 10.2559
Table 8 Evolved best center values by PBDDE for RBF for breast cancer in integer domain
Input Basis function center value over hidden nodes
H1 H2 H3 H4 H5 H6
X1 6 21 3 11 44 37
X2 54 −113 16 4 −66 23
X3 9 −3 47 20 −36 −12
X4 5 −10 35 −18 72 −18
X5 −57 −7 32 −7 −16 9
X6 39 12 58 28 −50 −5
X7 39 −13 54 30 −41 −25
X8 10 −39 123 6 −17 23
X9 −22 −97 46 48 −57 36
X10 19 12 43 4 −18 7
X11 −29 −20 −34 −49 20 −23
Table 9 Evolved best spread and o/p weight values by PBDDE for RBF for breast cancer in integer
domain
Basis fun Hidden node
H1 H2 H3 H4 H5 H6
Spread 33 −31 61 −44 −30 22
wo 2 21 22 21 −2 39
Acknowledgements This research work has been completed in Manuro Tech Research Pvt. Ltd.,
Bangalore, India, under the program of Computational Intelligence in Health Care (CIHC).
References
23. Saba T (2020) Recent advancement in cancer detection using machine learning: systematic
survey of decades, comparisons and challenges. J Infect Public Health 13(9):1274–1289
24. Mohammed SA, Darrab S, Noaman SA, Saake G (2020) Analysis of breast cancer detection
using different machine learning techniques. In: Tan Y, Shi Y, Tuba M (eds) Data mining and big
data. DMBD 2020. Communications in computer and ınformation science, vol 1234. Springer,
Singapore. https://doi.org/10.1007/978-981-15-7205-0_10
A Real-Time Face Mask Detection-Based
Attendance System Using MobileNetV2
Abstract COVID-19 has pushed global trade and commerce into a phase of recession, and a dramatic loss has been seen in the gross domestic product (GDP) of many nations worldwide. To set their economies back on track, nations across the globe are rolling back full lockdowns and taking steps to help businesses and the economy. To secure a triumph over the virus, wearing a protective mask should be the new normal. Because masks are now both needed and mandated, detecting them on faces has become a vital task for all of us. We train a very simple image classification model with the help of machine learning libraries such as TensorFlow and Keras, accompanied by the MobileNetV2 neural network architecture. Live video is taken as input from a webcam, and the system then predicts whether a face present in the region of interest (ROI) is wearing a mask. The system first detects a person's face and then identifies whether a mask is worn; a masked face in motion can also be detected. Instead of detecting only a single face per frame, our system is also capable of detecting multiple faces and the masks on them, whose presence is recorded in a tabular manner.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 659
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_49
660 K. Rathod et al.
1 Introduction
The novel coronavirus has affected human lives at their worst. Senior citizens and people with respiratory complications are at higher risk. On March 11, 2020, COVID-19 was declared a pandemic by the World Health Organization (WHO), with nearly 3 million cases and 207,973 deaths across 213 countries and territories worldwide [1]. According to scientists, a huge family of such deadly viruses is already present around us, and this coronavirus is just one of several that have infected humans. When the novel coronavirus started making people ill in late 2019, scientists named it coronavirus, also called SARS-CoV-2 by experts [2]. After attaching to a human cell and getting inside it, the virus makes copies of its RNA to spread everywhere; when a copying mistake is made, the RNA changes, and scientists call that a mutation [3]. After mutating into different variants, the coronavirus received new names with updated effects, namely B.1.1.7 (Alpha), B.1.351 (Beta), P.1 (Gamma), and B.1.617.2 (Delta) [4]. These variants were found at different places across the globe and caused huge destruction in the affected countries. Of them all, the Delta variant hit India the hardest: more than 200,000 people died as a result of COVID's second wave in India, including more than 2,000 deaths each day [4]. It was like a nightmare for all.
To prevent various respiratory problems, including those caused by COVID-19, wearing a mask has become vital. The World Health Organization (WHO) makes face masks a priority mainly for healthcare front-liners, and then for all the citizens of a country. Many countries mandated wearing a protective mask outdoors. During the second wave [4], WHO even recommended that people wear double masks, as the strain was becoming too deadly. As a result, detecting face masks became an important responsibility to society. Our model locates where someone's face is and then determines whether a mask is worn. The result is shown in a table, where beside each name the value '1' = with_mask and the value '0' = without_mask. Categorical face identification deals with distinguishing a specific entity group, i.e., the face. This paper presents a simple approach to the previously mentioned purpose using important libraries such as Keras, OpenCV, scikit-learn, and TensorFlow. The convolutional neural network architecture used here is MobileNetV2, which is a very effective feature extractor for object detection and segmentation [5].
2 Methodology
Fig. 1 Flowchart
3 Related Work
First of all, we started with the collection of the dataset. We collected data covering two categories of people: with_mask and without_mask. There are a total of 3,833 images, of which 1,915 show people wearing masks and the remaining 1,918 fall in the without-mask category (Figs. 2 and 3).
The data preprocessing steps convert the given data into a format that is user-friendly and meaningful. The data could be in any form, including images, tables, graphs, videos, and more [6]. To get started with the training part, we need a dataset that allows the model to reach a good range of accuracy; such data can be imported into our model with the help of various Python libraries. We can often encounter missing data in a dataset; to deal with it, we either delete the whole row or take the mean value of the data present. As noted, our dataset is divided into with_mask and without_mask parts. In preprocessing, steps such as reshaping and resizing are applied, followed by conversion of the images to NumPy arrays. Resizing is a critical part of preprocessing: the smaller the image, the better it tends to run. Here, our images are resized to 224 × 224 pixels and converted to array format, and the pixel intensities of the input image are scaled to the range [−1, 1]. Finally, the input is preprocessed using MobileNetV2.
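The intensity-scaling step described above can be sketched in a few lines of NumPy; the function name is illustrative, and the scaling mirrors what Keras' MobileNetV2 preprocess_input performs (x / 127.5 − 1):

```python
import numpy as np

def preprocess_for_mobilenetv2(img):
    """Scale 8-bit pixel intensities from [0, 255] to [-1, 1],
    the range MobileNetV2 expects (equivalent to x / 127.5 - 1)."""
    return img.astype(np.float32) / 127.5 - 1.0

# Dummy 224x224 RGB image standing in for a resized face crop.
img = np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8)
x = preprocess_for_mobilenetv2(img)
batch = np.expand_dims(x, axis=0)  # shape (1, 224, 224, 3) for model input
```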
The face mask detection model uses a convolutional neural network, a deep neural network model that can analyze any visual imagery. Data in image form is taken as input, captured, and passed through the neuron layers. MobileNetV2 is used here as the convolutional neural network architecture. It is a network model that uses depthwise separable convolution as its basic unit; each depthwise separable convolution has two layers, a depthwise convolution and a pointwise convolution [7]. The MobileNetV2 architecture contains an initial fully convolutional layer with 32 filters, followed by 19 residual bottleneck layers [8] (Fig. 4).
Training Part:
To achieve the main goal of face mask detection [9], we started by importing the various necessary libraries. After that, we performed fine-tuning with the help of the MobileNetV2 architecture [10]. After the completion of fine-tuning, we finally trained the model to detect face masks. Lastly, we plot the accuracy and loss curves to know the total accuracy rate and the loss ratio.
Testing and Implementation:
After training, the testing part is performed. To begin, we loaded the images from the dataset and focused on detecting the faces in each image. We then applied our model to classify whether each detected face contains a mask or not. The function detect_and_predict_mask detects the faces and then applies our model to each face region of interest (ROI). Once the face ROI is extracted and the preprocessing is done, we are ready to test our project.
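The paper does not list the body of detect_and_predict_mask; the control flow it describes (detect faces, filter weak detections, crop each ROI, classify) can be sketched framework-agnostically, with the detector and classifier passed in as callables. Both stubs below are hypothetical stand-ins, not the authors' models:

```python
import numpy as np

def detect_and_predict_mask(frame, face_detector, mask_model, conf_threshold=0.5):
    """Run the face detector on a frame, then apply the mask classifier
    to every detected face ROI.  Detector and classifier are callables,
    so any concrete model pair can be plugged in."""
    locations, predictions = [], []
    for (x, y, w, h, confidence) in face_detector(frame):
        if confidence < conf_threshold:
            continue                          # skip weak face detections
        roi = frame[y:y + h, x:x + w]         # extract the face region of interest
        predictions.append(mask_model(roi))   # probability the face wears a mask
        locations.append((x, y, w, h))
    return locations, predictions

# Stub detector/classifier to illustrate the control flow only.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
fake_detector = lambda f: [(10, 10, 100, 100, 0.9), (0, 0, 5, 5, 0.2)]
fake_classifier = lambda roi: 1.0             # always "with_mask"
locs, preds = detect_and_predict_mask(frame, fake_detector, fake_classifier)
```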
While training the model, in order to check the loss and accuracy, we performed 20 epochs. We observed that right from the second epoch the accuracy increased while the loss value decreased. Once the accuracy curve became stable, there was no further need to iterate to increase the model's accuracy (Figs. 5, 6, 7, 8, 9).
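The observation that iteration can stop once the accuracy line is stable can be expressed as a simple plateau check; this is a sketch, and the patience and threshold values are illustrative, not from the paper:

```python
def accuracy_plateaued(history, patience=3, min_delta=1e-3):
    """Return True once validation accuracy has stopped improving
    by more than min_delta for `patience` consecutive epochs."""
    if len(history) <= patience:
        return False
    best_before = max(history[:-patience])    # best accuracy seen earlier
    recent = history[-patience:]              # the last `patience` epochs
    return all(a - best_before < min_delta for a in recent)

# Accuracy rising quickly at first, then flat: training can stop.
acc = [0.62, 0.81, 0.90, 0.95, 0.97, 0.98, 0.9801, 0.9799, 0.9800]
stopped = accuracy_plateaued(acc, patience=3)
still_improving = accuracy_plateaued(acc[:4], patience=3)
```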
In the table above, '1' refers to the presence of a mask, while '0' refers to its absence.
The face mask detector model has been trained and tested on the chosen dataset, on which our method reaches an accuracy of 98–99%. Looking at the following figure, few signs of overfitting can be observed, with the validation loss lower than the training loss; hence, we can expect that our model will also generalize well to images that are not part of our dataset (Fig. 10).
In this paper, with the help of basic machine learning tools and libraries, the method has achieved high accuracy. It can be integrated into public healthcare centers and used in a variety of applications, and the work opens interesting future directions for researchers. In the future, wearing a mask may remain mandatory because of the COVID-19 crisis, and this model will help identify whether a person is wearing a mask and whether it is worn properly. Thus, we have created a model capable of detecting whether a person's face is covered with a mask, with the help of machine learning libraries such as OpenCV, Keras, and TensorFlow. A two-class model has been trained on people who are wearing face masks and people who are not. A classifier with approximately 99% accuracy is obtained by fine-tuning MobileNetV2 on the face mask/no face mask dataset. We applied the model to both real-time video streams and still images by detecting one or more faces in each image or frame, extracting each individual face, and applying the face mask classifier. The system is also capable of showing the results in tabular form. It should be made more flexible so that wherever it is implemented, whether in malls or other public places, it informs the authorities or alerts the uncovered person with an alarm or buzzer beep.
References
1. World Health Organization (2020) Naming the coronavirus disease (COVID-19) and the virus
that causes it. Braz J Implantology Health Sci 2(3)
2. Gage A et al (2021) Perspectives of manipulative and high-performance nanosystems to manage
consequences of emerging new severe acute respiratory syndrome coronavirus 2 variants. Front
Nanotechnol 3:45
3. Campbell F et al (2021) Increased transmissibility and global spread of SARS-CoV-2 variants
of concern as at June 2021. Eurosurveillance 26(24):2100509
4. Bhattacharya A (2021) COVID-19: deaths in 2nd wave cross 2 lakh at daily average of over
2,000. The Times of India [online]
5. Modi S, Bohara MH (2021) Facial emotion recognition using convolution neural network. In:
2021 5th International conference on intelligent computing and control systems (ICICCS).
IEEE
6. Wang W et al (2020) A novel image classification approach via dense-MobileNet models.
Mobile Inf Syst 2020
7. Venkateswarlu IB, Kakarla J, Prakash S (2020) Face mask detection using MobileNet and
Global Pooling Block. In: 2020 IEEE 4th conference on information & communication
technology (CICT), pp 1–5. https://doi.org/10.1109/CICT51604.2020.9312083
8. Ejaz MS, Islam MR (2019) Masked face recognition using convolutional neural network.
In: 2019 International conference on sustainable technologies for industry 4.0 (STI), pp 1–6.
https://doi.org/10.1109/STI47673.2019.9068044
9. Jiang M, Fan X, Yan H (2020) RetinaMask: a face mask detector. arXiv preprint arXiv:2005.03950
10. Howard A, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H
(2017) MobileNets: efficient convolutional neural networks for mobile vision applications
A New Coded Diversity Combining
Scheme for High Microwave Throughput
Abstract Multiple copies of signals received via different diversity branches are combined using appropriate combining techniques. Future wireless microwave communication systems will face two major challenges: high throughput and long distance. Network coding (NC) communication has recently piqued researchers' attention; it combats multipath fading using techniques in which many microwave stations cooperate, improving capacity. This paper investigates a new Coded Diversity Combining Scheme (CDCS) for large throughput in microwave point-to-point links. To reach this goal, we exploit NC at each microwave station. The suggested CDCS improves microwave transmission link quality and reliability, avoids signal deterioration, and increases throughput. We demonstrate that our CDCS outperforms several widely used combining methods, improving throughput by 70% and 50% compared to the typical technique and cross-polarization interference cancelation (XPIC), respectively.
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 671
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_50
672 Y. Lamrani et al.
given the same frequency. When the available frequencies are limited, the frequency
is assigned twice on the same route using both polarizations.
In contrast to traditional methods, which combine input flows according to predefined schemes established a priori, we explore a CDCS in which microwave stations mix packets using random or opportunistic network coding methodologies. While the latter may be more efficient, they may cause buffering in the coding nodes, lowering throughput and receiver sensitivity.
The primary advantage of our approach is that it employs a new microwave recep-
tion combination technology that delivers high throughput for enhanced transmission
link durability while using current methods.
After introducing the main concepts of our methodology, we go through the problem formulation and remedy in Sect. 2. The system model is presented and discussed in Sect. 3. The typical diversity combination techniques and the proposed combination approach are explained in Sect. 4. Section 5 compares these various methods, showing the better technique. Section 6 brings this study to a conclusion.
over long distances, and the adaptive coding and modulation technique cannot effec-
tively handle interference. Furthermore, sea surface reflection and strong multi-
path fading may readily degrade microwave connection transmissions. The usage
of CDCS provides for a much cheaper cost than undersea fibers.
We suggest an effective diversity combination method for long-range point-to-
point microwave wireless connections in this letter, which may be used in XPIC to
improve their performance. This method may be used for long-distance line-of-sight
transmission and improves transmission throughput.
The suggested CDCS with NC increases the transmission system’s capacity, resulting
in high overall connection spectrum efficiency.
The proposed CDCS system architecture is depicted in Fig. 1.
From N terrestrial receiving dishes, the microwave station receives N flows designated as F1, F2, ..., FN. Each block represents a corresponding source flow. The received packets in each generation are placed in the buffer after demodulation and demultiplexing; at this point, the proposed CDCS's NC is executed.
Co-channel dual-polarization technology (CCDP) is used with XPIC to allow one
RF channel to transmit two service streams simultaneously. The transmitters put out
two electromagnetic waves with polarization orientations orthogonal to the receiver
on the same channel. After XPIC processing removes the interference between the
two electromagnetic waves, the receiver retrieves the two original signal channels. As
a result, using XPIC doubles transmission capacity without changing channels. When
the XPIC line is not in use, the adjacent channel alternative polarization (ACAP)
arrangement is used to broadcast two service signals concurrently across two RF
channels.
Fig. 1 An intermediate microwave relay station is implementing XPIC transmission with double
polarization with the proposed scheme
The additional buffer may be employed at the coded microwave stations to create a linear combination of input packets rather than just forwarding them. The suggested method allows flexible combination at the intermediate coding station and relies on packet checksums at the receiver to recover the original packets. We focus on increasing end-to-end throughput in the worst-case scenario. Each input flow Fi represents a packet block Bi, i = 1, 2, ..., N.
Several combination techniques are known; we present the three main techniques in use and then detail the proposed CDCS.
where m denotes the number of paths to the receiver. The average of w_s is therefore:

E[w_s] = w_a \sum_{k=1}^{m} \frac{1}{k} \qquad (2)
Therefore, this method corresponds to selecting the path with the best SNR.
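The average in (2) is the m-th harmonic number times w_a, so the selection-combining gain over a single branch grows only logarithmically with the number of branches; a few lines of Python verify this numerically:

```python
def selection_combining_gain(m):
    """E[w_s] / w_a for selection combining over m branches:
    the harmonic number H_m = sum_{k=1}^{m} 1/k, per Eq. (2)."""
    return sum(1.0 / k for k in range(1, m + 1))

# Gain grows slowly: H_1 = 1, H_2 = 1.5, H_4 = 25/12 ~ 2.083
g1 = selection_combining_gain(1)
g2 = selection_combining_gain(2)
g4 = selection_combining_gain(4)
```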
Maximum-ratio combining. In this technique, the m links are required to be aligned in phase and weighted proportionally to the signal level before their summation. This method therefore corresponds to the summation of all the branches with weights calibrated according to the SNR. The distribution in this method follows (3):

\Pr[w \le w_m] = 1 - e^{-w_m/w_a} \sum_{k=1}^{m} \frac{(w_m/w_a)^{k-1}}{(k-1)!} \qquad (3)

E[w_m] = m \, w_a \qquad (4)
The applied CDCS is a random coding technique in terms of packet arrival dates, which minimizes the time spent by packets in the queues of the microwave stations. It uses the concept of block codes and is based on network encoding; it can reduce the waiting time of packets in queues by forwarding them without waiting for all packets of the same block.
Consider a directed graph G = (V, E) representing a wireless transmission link, where V = {v_1, v_2, ..., v_N} is the vertex set and E is the edge set. We consider a microwave station 1 with N input flows and one output flow (Fig. 1). We assume that each input flow F_1, F_2, ..., F_N is considered a block of packets, that the deadline for the arrival time of packet p_i, i = 1, 2, ..., n of block B_j, j = 1, 2, ..., N is known, and that the Rx and Tx stations are synchronized. Each link has a capacity of C_{1,2} (bits/s); i.e., a packet of L bits can be transmitted in at least L/C_{1,2} seconds.
Each flow is constituted of n packets. The packets of all the blocks are given by \bigcup_{i=1}^{N} \bigcup_{j=1}^{n} p_i^j, where N is the number of blocks and n is the number of packets in each block. We thus have F_1 = \{p_1^1, p_1^2, ..., p_1^n\} = \bigcup_{j=1}^{n} p_1^j, F_2 = \bigcup_{j=1}^{n} p_2^j, ..., and F_N = \bigcup_{j=1}^{n} p_N^j. These microwave flows arrive via the edges \{e_1, e_2, ..., e_N\} of station 1, as shown in Fig. 1.
Suppose that the packets in a given block B_j arrive at coding station 1 at time t. The buffer is created to store the data of a received packet block. In typical receivers, the data is received without applying any coding technique. In our scheme, the packets undergo network coding operations [18], identifying and reassembling the original error-free data from the sent messages.
Consider the case where packets from flows are combined in station 1 (Fig. 1) to
produce a combined output packet flow:
p_{coded}^{out} = \bigoplus_{i=1}^{N} \bigoplus_{j=1}^{n} \alpha_i^j \, p_i^j \qquad (6)
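Equation (6) reduces, when all coding coefficients α are 1 over GF(2), to a bitwise XOR of the packets in a block. The sketch below shows only that special case, since the CDCS's actual coefficient choice is not spelled out here; it also illustrates why such a coded packet lets the receiver recover any single lost packet:

```python
import secrets

def xor_bytes(a, b):
    """Bitwise XOR of two equal-length byte strings (GF(2) addition)."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode_block(packets):
    """Combine all packets of a block into one coded packet by XOR,
    i.e. Eq. (6) with every coefficient alpha = 1 over GF(2)."""
    coded = packets[0]
    for p in packets[1:]:
        coded = xor_bytes(coded, p)
    return coded

# With the coded packet plus any N-1 originals, the missing one is recoverable:
p1, p2, p3 = (secrets.token_bytes(8) for _ in range(3))
coded = encode_block([p1, p2, p3])
recovered_p2 = xor_bytes(xor_bytes(coded, p1), p3)  # XOR out the known packets
```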
We compare the suggested CDCS to the presently utilized methods to assess its
performance improvement. Our proposed CDCS method significantly increases the
throughput of the microwave connection under challenging transmission circum-
stances, as shown in Fig. 2.
The throughput for the different modulation types was significantly improved, as
seen in Fig. 2. When compared to 256 QAM modulation, the maximum 4096 QAM
Fig. 2 Performance throughput evaluation and comparison of two modulation formats 256-QAM,
4096-QAM (TDCT), and 4096-QAM (CDCS)
Fig. 3 Performance evaluation and comparison of throughput between the TDCT and the proposed
CDCS under various modulation formats
Fig. 4 Performance evaluation and comparison of throughput between the TDCT, XPIC, and the
proposed CDCS under various modulation formats
Thus, since CDCS also outperformed the typical method and XPIC by 70% and 50%, respectively, it eliminated all kinds of interference efficiently and rapidly.
In comparison with the current methods illustrated in Fig. 4, the suggested scheme improves the throughput by 4.36 times and 2.18 times compared to the typical technique and XPIC, respectively. CDCS does not wait for all packets to arrive at the microwave station before removing interference, which yields these benefits. When greater symbol rates were employed, however, the power efficiency of these modulations rapidly deteriorated; small variations in frequency at higher symbol rates are to blame for the deterioration.
Figure 5 shows the capacity evolution for various modulation types under the
recent technique XPIC and the proposed CDCS.
CDCS outperformed the state-of-the-art techniques by 12.6 times, 8.5 times, and 4.8 times for 256 QAM, 4096 QAM, and XPIC, respectively.
6 Conclusion
The CDCS presented in this paper is an excellent option for overcoming the significant difficulties long-distance connections face, especially when the largest single-hop link spans up to 160 km. The CDCS improves channel performance by avoiding interference and efficiently eliminates unnecessary waiting time in the buffer. The deployment of EDCT for medical applications will be the focus of future research.
References
1 Introduction
In modern days, the Internet and social media are loaded with an abundance of information, and any piece of information is best reduced to only what is necessary to understand the topic. One of the tools implemented for this is text summarization software.
Applications of summarization are:
• To get the highlights of a newspaper
C. P. Chandrika (B)
Department of Computer Science and Engineering, M S Ramaiah Institute of Technology,
Bangalore 560054, India
e-mail: chandrika@msrit.edu
J. S. Kallimani
Visvesvaraya Technological University, Belagavi, Karnataka, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 683
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_51
684 C. P. Chandrika and J. S. Kallimani
2 Related Works
Table 1 surveys related summarization work; the rows recoverable from this fragment are summarized below (objective; technique; results; remarks):
- Kannada summarization with LSA and fusion: summaries are generated by the LSA and fusion techniques separately and combined using set operations to produce the final summary; promising results were obtained, the approach can be combined with other techniques in future to get better accuracy, and it can be tried with different sets of articles.
- [5] Text classification on the AG's News dataset: different machine learning algorithms are used for training and testing the classification model; the results of all models are compared, and the accuracy of all the algorithms is more than 80%, with the same model trainable and testable with deep learning techniques in future. Text classification is an important task for summarization, and the ideas can be used to build models for other regional languages.
- Text summarization for Malayalam text: a model with a text ranking algorithm; on average, 51% of sentences are similar between the machine- and human-generated summaries, and there is scope for researchers to look further into these aspects.
This section discusses the entire process, from collecting the dataset and data preprocessing to applying the text ranking algorithm. The Kannada dataset is collected from various articles published over the Internet; the data varies from sports and mobile reviews to life skills.
The text rank algorithm is a derivative of the PageRank algorithm, in which web pages are ranked by the number of times they are visited and the number of pages linked to them. This algorithm is the heart of the summarization model, and the following steps are carried out before applying it, as demonstrated in Fig. 1.
• The first step would be to concatenate all the text contained in the articles
• Then split the text into individual sentences
• In the next step, we will find vector representation (word Embeddings) for each
word and every sentence
• Similarities between sentence vectors are then calculated and stored in a matrix
• The similarity matrix is then converted into a graph, with sentences as vertices
and similarity scores as edges, for sentence rank calculation
• Finally, a certain number of top-ranked sentences is represented as a final
summary.
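The steps above can be sketched end-to-end. This is a minimal illustration with a hand-rolled PageRank power iteration rather than the authors' exact implementation; the function names and toy vectors are illustrative:

```python
import numpy as np

def summarize(sentence_vectors, sentences, top_k=2, d=0.85, iters=50):
    """Rank sentences by TextRank: build a cosine-similarity graph over
    sentence vectors, run PageRank, and return the top_k sentences."""
    n = len(sentences)
    # Cosine similarity matrix with a zeroed diagonal (no self-links).
    norms = np.linalg.norm(sentence_vectors, axis=1, keepdims=True)
    sim = (sentence_vectors @ sentence_vectors.T) / (norms @ norms.T)
    np.fill_diagonal(sim, 0.0)
    # Column-normalize so each node distributes its score over its edges.
    col_sums = sim.sum(axis=0, keepdims=True)
    M = sim / np.where(col_sums == 0, 1, col_sums)
    # PageRank power iteration with damping factor d.
    rank = np.full(n, 1.0 / n)
    for _ in range(iters):
        rank = (1 - d) / n + d * (M @ rank)
    order = np.argsort(-rank)[:top_k]
    return [sentences[i] for i in sorted(order)]  # keep document order

vecs = np.array([[1.0, 0.1], [0.9, 0.2], [0.0, 1.0]])
sents = ["S1", "S2", "S3"]
summary = summarize(vecs, sents, top_k=2)
```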
A brief discussion of the prerequisite tasks to be carried out before applying the text ranking is given below:
Text Preprocessing:
Kannada text processing is quite complex due to the language's rich morphological structure. The same language is spoken in different ways in different regions of Karnataka, and processing it is not as easy as processing English. The dataset should be cleaned before processing to make it noise-free and efficient. Some words in the dataset are identified as stop words; these words do not contribute any meaning to the process, for example: {Mattu} [and], {Haagu} [also], {Avaru} [those people], {Bagge} [about], {Aadare} [but], {Avarannu} [them], {Thamma} [them], {Ondu} [single], {Endaru} [said], {Mele} [above], {Helidaru} [said], {Seridante} [including], {Balika} [afterward], etc. So, these words are removed from the dataset using the Kannada stop-words file, which has around 200 words. After this, we obtained a clean dataset.
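Stop-word removal itself is a simple filter; the romanized word list below is an illustrative stand-in for the actual ~200-entry Kannada stop-words file:

```python
# Romanized stand-ins for some of the Kannada stop words listed above
# (the real pipeline reads a stop-words file of about 200 entries).
KANNADA_STOP_WORDS = {"mattu", "haagu", "avaru", "bagge", "aadare",
                      "avarannu", "thamma", "ondu", "endaru", "mele",
                      "helidaru", "seridante", "balika"}

def remove_stop_words(tokens, stop_words=KANNADA_STOP_WORDS):
    """Drop tokens that appear in the stop-word set."""
    return [t for t in tokens if t.lower() not in stop_words]

tokens = ["Ravi", "mattu", "Sita", "bagge", "maatanaadidaru"]
clean = remove_stop_words(tokens)
```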
Word Embeddings:
The model does not work on raw Kannada words; the words must be represented in numerical form. This is done using a word-embedding representation available online from Wiki word vectors, a huge file with 300 dimensions for all the Kannada words. Each dimension represents the usage of a word in a different way. For example, for the word {Echharike} [careful], the following are the dimension values (only the first few are shown as samples):
−0.41236, 0.082654, 0.64355, −0.2292, −0.043449, 0.20626, −0.039713, −0.28893, 0.32872, −0.12913, −0.12276, −0.042511, 0.12026, −0.35143, 0.091768, 0.44619, −0.25532, 0.23058, −0.23937, 0.073449, −0.28712, 0.41471, −0.052678, 0.094267.
Python code is written to obtain the 300-dimension values for all the words in the dataset. The word-embedding file holds about 400 MB of Kannada words in 300 dimensions.
Once the numerical value of each word in a sentence is obtained, the sentence is represented in numeric form, called a vector, for further processing. It is created using the simple Eq. (1):

vector_each_sentence = sum(numerical_values_of_individual_words) / total_no_of_words_in_sentence    (1)
cosine_similarity_kannada = (S1 · S2) / (‖S1‖ × ‖S2‖)    (2)

For example, the similarity of S1 with itself is set to 0, while between S1 and S2 the value is 0.35831465; in this way, the similarity matrix is generated for all the sentences. The maximum value indicates high similarity between sentences. Once the sentence vectors are ready, the next step is applying the text rank algorithm. This algorithm works on graphs, so a similarity graph must be created, as shown in Fig. 2 (a sample of eleven sentences is considered). In the proposed work, sentences are considered as nodes and the similarity scores between sentences as edges. The graph shows that for the given text the summary is obtained in thirteen sentences, and it indicates how one sentence is linked to another: if sentence S1 has some words present in S2, there will be an edge between S1 and S2.
Extractive Text Summarization of Kannada Text Documents ... 691
PageRank is one of the popular algorithms used to rank important pages: it prioritizes a web page based on how many links point to it. Similarly, in our work it produces the top important sentences. Initially, the text rank of every sentence equals 1/11, where 11 is the total number of nodes. In the next iteration, for a given node 0, the algorithm identifies how many nodes point to it. Suppose nodes 2 and 3 point to node 0, and assume the total number of outgoing links is 2 for node 2 and 3 for node 3; then the rank of node 0 is calculated as shown in Eq. (3):

Rank of node 0 = (1/11)/2 + (1/11)/3    (3)
ROUGE 1: This metric considers the unigram feature; Table 2 shows its results.
ROUGE 2: It calculates results in the same way as ROUGE 1 but is based on bi-grams of words; the results produced by this metric are shown in Table 3.
ROUGE-L: The longest common subsequence between two sequences of text mainly indicates how long the similarity between two statements is; the longer it is, the more similar the two statements are. Table 4 shows the ROUGE-L metrics of our model.
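ROUGE-1 precision, recall, and F1 can be computed directly from unigram counts; a self-contained sketch (ROUGE-2 is identical with bi-grams in place of unigrams):

```python
from collections import Counter

def rouge_1(candidate_tokens, reference_tokens):
    """Unigram-overlap ROUGE-1: returns (precision, recall, F1)."""
    overlap = sum((Counter(candidate_tokens) & Counter(reference_tokens)).values())
    precision = overlap / len(candidate_tokens)
    recall = overlap / len(reference_tokens)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1

cand = "the summary covers the main points".split()         # 6 tokens
ref = "the summary covers all main points of the text".split()  # 9 tokens
p, r, f1 = rouge_1(cand, ref)   # overlap = 6 unigrams
```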
Comparisons with other techniques:
From the survey we carried out, the performance of summarization techniques used by other models is tabulated in Table 5.
In the proposed model, accuracy, precision, and recall are calculated using ROUGE metrics, while the other works have evaluated differently. ROUGE 1 shows better values than ROUGE 2. We obtained average F1-score, recall, and precision values of 65%, 70%, and 64%, respectively, which is better than the other techniques listed in Table 5. The remaining error is due to the lack of data-processing tools for the Kannada language; if the model were trained with a good set of processing tools and large, varied datasets, the results might improve. A sample snapshot of the summary is shown in Fig. 3.
Extractive text summarization gathers the important sentences from a given document. We used the TextRank graph-based algorithm, which is similar to Google's PageRank algorithm; here, the sentences play the role of the web pages. A GUI application was also developed to upload a Kannada document and to let the user specify how many lines the summary should contain. When the original and summarized documents are compared, the summary is found to contain the important lines without losing the meaning of the original document, and the result is satisfactory. The proposed work considers documents with an average size of 5000 lines drawn from different sources. Since we use 300-dimensional word embeddings to construct the vector for a single word, it would be better to identify the correct POS tag of each word instead. Similarity calculation is based only on the cosine function; other techniques could be used to test the performance of the proposed model. These limitations can be considered as our future work.
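The ranking step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the exact implementation: sentence vectors are supplied by the caller (in the proposed work they would be built from the 300-dimensional word embeddings), similarity is cosine, and ranking uses a PageRank-style power iteration.

```python
import numpy as np

def summarize(sentences, vectors, k=2, d=0.85, iters=50):
    """Rank sentences with a PageRank-style walk over a cosine-similarity
    graph and return the top-k sentences in their original order."""
    v = np.asarray(vectors, dtype=float)
    norm = np.linalg.norm(v, axis=1, keepdims=True)
    v = v / np.where(norm == 0, 1, norm)          # unit-normalize each vector
    sim = v @ v.T                                  # cosine similarity matrix
    np.fill_diagonal(sim, 0.0)                     # no self-links in the graph
    col = sim.sum(axis=0, keepdims=True)
    w = sim / np.where(col == 0, 1, col)           # column-normalized weights
    n = len(sentences)
    score = np.full(n, 1.0 / n)
    for _ in range(iters):                         # power iteration, as in PageRank
        score = (1 - d) / n + d * (w @ score)
    top = sorted(np.argsort(score)[-k:])           # keep original sentence order
    return [sentences[i] for i in top]
```

The damping factor `d = 0.85` is the conventional PageRank default, used here as an assumed setting.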
694 C. P. Chandrika and J. S. Kallimani
Extractive Text Summarization of Kannada Text Documents ... 695
Yashashree Patel, Panth Shah, Mohammed Husain Bohara, and Amit Nayak
Abstract As digitization reaches into our day-to-day lives, goods and services become easily available and accessible by computer, which makes transactions faster and more convenient. The pandemic also played a huge role in the growth of credit card fraud activities, leading to a dramatic increase in credit card fraud. As a result, fraud detection should include surveillance of the customer's spending behavior in order to determine, prevent, and detect unwanted activity. For both online and in-person purchases, credit cards are the most convenient method of payment. Fraud detection is concerned not only with capturing fraudulent activities but also with discovering them as early as possible, since this kind of fraud costs people millions of dollars. Machine learning algorithms have proven to be extremely useful in detecting credit card fraud. Because of the imbalanced nature of the data, regular classification algorithms are ineffective in detecting credit card fraud. In the proposed scheme, the isolation forest algorithm and the local outlier factor (Tripathi et al. in Int J Pure Appl Math 118:229–234, [1]) are used to recognize fraudulent transactions and assess their accuracy.
Y. Patel (B)
Department of Computer Science and Engineering, Devang Patel Institute of Advance
Technology and Research (DEPSTAR), CHARUSAT, Charotar University of Science and
Technology (CHARUSAT), CHARUSAT Campus, Changa 388421, India
P. Shah · A. Nayak
Department of Information Technology, Devang Patel Institute of Advance Technology and
Research (DEPSTAR), CHARUSAT, Charotar University of Science and Technology
(CHARUSAT), CHARUSAT Campus, Changa 388421, India
e-mail: amitnayak.it@charusat.ac.in
M. H. Bohara
Department of Computer Engineering, Devang Patel Institute of Advance Technology and
Research (DEPSTAR), CHARUSAT, Charotar University of Science and Technology
(CHARUSAT), CHARUSAT Campus, Changa 388421, India
e-mail: mohammedbohara.ce@charusat.ac.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 697
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_52
698 Y. Patel et al.
1 Introduction
Credit card fraud refers to deceit involving a card that occurs as a consequence of the card owner losing the card, the card being taken by fraudsters, or techniques such as phishing, skimming, identity theft, and so on. Financial fraud of this nature has a significant influence on a country's commercial, organizational, and government sectors. The rate of fraudulent transactions has increased in today's world of cyberspace technology, where credit card purchases have become the most convenient means of transaction, whether online or offline. As previously stated, there are two sorts of credit card fraud that might occur. The first covers situations in which the cardholder's information is leaked; the second covers theft that occurs when a lost card falls into the hands of fraudsters. Credit card fraud was once pictured as a figure clad all in black snatching your card from your wallet, but that was before the Internet erupted into society. Most scammers nowadays do not even require your physical
card. Before applying any machine learning techniques, successful preprocessing of the dataset is needed. This research takes into account the imbalanced nature of credit card data and uses an isolation forest together with the local outlier factor. The most notable advantage of employing this approach for recognizing credit card fraud is that it can easily operate on large amounts of training data. Electronic shopping has become a vital and necessary part of our modern life, and credit card companies must now be able to detect bogus credit card purchases in order to prevent customers from being charged for things they did not buy. As the number of transactions grows, the number of dishonest transactions grows as well [1]. Such difficulties can be addressed using machine learning and related methods. The purpose of this project is to demonstrate how to use machine learning to model a data collection; the model is then used to determine whether or not a new transaction is fraudulent. Our objective is to detect all fraudulent transactions while minimizing the number of false fraud classifications. The major focus was on data analysis and preprocessing, as well as the application of a number of anomaly identification methods, namely the isolation forest approach and the local outlier factor.
2 Literature Review
Credit card fraud has been the subject of a significant amount of research. The methods developed can be divided into two categories, as described below.
Destructive Outcomes of Digitalization (Credit Card) … 699
The technique presented in this work uses up-to-date machine learning algorithms to discover outliers, i.e., unusual behaviors, for credit card fraud detection. Viewed in depth on a larger scale with real-life elements, the entire architecture can be depicted as follows. To begin, we obtained our dataset from Kaggle, a data-analysis and dataset-sharing Web site [2]. This dataset has thirty-one columns, twenty-eight of which are labeled V1–V28 to protect personal information; the remaining columns represent the time and the amount. The time is the interval between the first transaction and each subsequent one, and the amount is the total sum of money exchanged. Class 0 represents a genuine transaction, whereas class 1 represents a fraudulent one. We use various graphs to look for abnormalities in the dataset and to visualize it. According to our analysis, the number of fraudulent transactions is far smaller than the number of genuine ones [3]. The data also show that the fewest transactions were performed at night and the most during the day. Only a few predictor variables correlate closely with the class variable, and the majority are trivial. At this stage, the dataset has been formatted and examined; to make sure that the evaluation is fair, the class column is excluded from the features. A sequence of algorithms from library modules then processes the data: the isolation forest algorithm and the local outlier factor are applied after the data has been fitted into a model [4].
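A minimal sketch of this stage is shown below. Since the Kaggle file itself cannot be bundled here, a small synthetic two-column dataset stands in for the V1–V28/Time/Amount features; with the real data one would load `creditcard.csv` with pandas and drop the Class column before fitting. The `contamination` parameter, set here from the (synthetic) fraud fraction, is an assumed configuration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(42)
# Synthetic stand-in for the Kaggle data: mostly genuine transactions (class 0)
# clustered together, with a few fraudulent ones (class 1) far away.
genuine = rng.normal(0.0, 1.0, size=(990, 2))
fraud = rng.normal(8.0, 1.0, size=(10, 2))
X = np.vstack([genuine, fraud])
y = np.array([0] * 990 + [1] * 10)        # the Class column, held out of training

contamination = y.mean()                   # expected fraud fraction (here 1%)

iso = IsolationForest(contamination=contamination, random_state=42).fit(X)
iso_pred = (iso.predict(X) == -1).astype(int)   # map -1 (outlier) -> class 1

lof = LocalOutlierFactor(n_neighbors=20, contamination=contamination)
lof_pred = (lof.fit_predict(X) == -1).astype(int)
```

Both detectors return +1 for inliers and −1 for outliers, which is remapped to the 0/1 class convention of the dataset.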
These algorithms are part of the sklearn (scikit-learn) library, whose packages cover classification, regression, clustering, and outlier detection [5]. A single data point that departs considerably from the rest of the data points is considered a point anomaly; one example is detecting credit card fraud based on the amount spent. If an object is anomalous only in a specific context, it is referred to as a contextual anomaly. If a group of related objects is anomalous when compared with other objects, this is referred to as a collective anomaly; here a set of items, rather than a single entity, is anomalous.
Anomaly detection can be done using various methods, including supervised anomaly detection: a setting in which both training and test datasets are labeled, allowing a standard classifier to be trained and applied [6]. This case is similar to conventional pattern recognition, except that the classes are usually highly unbalanced, so not all classification methods are appropriate for the task. For example, some decision trees struggle to deal with unbalanced data, whereas an artificial neural network (ANN) or support vector machine (SVM) can perform better. This configuration, however, is not always available, since it requires that all irregularities are known in advance and that the data are correctly labeled; anomalies are not always well known ahead of time, or they are only discovered as novelties during the analysis. Semi-supervised anomaly detection [4, 7] addresses this case: we start with no knowledge of the anomalies and learn from the training data alone. This setup often employs training and evaluation datasets, with the training data consisting solely of normal data free of anomalies. The idea is that once a model of the normal class has been learned, discrepancies can be detected as deviations from it. This type of classification is referred to as "one-class" classification; well-known methods include one-class SVMs and autoencoders. In general, any density estimation method, such as Gaussian mixture models or kernel density estimation, can be used to model the probability density function of the normal class.
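As an illustrative sketch of this one-class setup (not code from the paper): a one-class SVM is fitted on normal-only training data and then flags test points that deviate from the learned region. The data is synthetic and the `nu` value is an assumed hyperparameter (roughly, the tolerated fraction of training outliers).

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
normal_train = rng.normal(0, 1, size=(500, 2))   # training data: normal class only

# Learn a boundary around the normal class; no anomaly labels are needed.
oc = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(normal_train)

test_points = np.array([[0.1, -0.2],   # close to the learned normal region
                        [6.0, 6.0]])   # far outside it
pred = oc.predict(test_points)         # +1 = normal, -1 = anomaly
```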
Unsupervised anomaly detection [8, 9] is a setting in which we do not know in advance what is normal in the data and what is not. It is the most flexible configuration, as it needs no labels; furthermore, there is no distinction between a training dataset and a test dataset. By definition, unsupervised anomaly detection algorithms rate data based on intrinsic properties of the dataset, frequently using distances or densities to distinguish what is normal from what is an outlier.
The local outlier factor is an unsupervised outlier detection tool. It computes an anomaly score for each sample by measuring the local density deviation of that sample with respect to its neighbors; the score reflects how isolated the sample is from the surrounding neighborhood. This is accomplished by estimating a data point's local density relative to nearby data points, where the local density of each point is derived from the distances to its k-nearest neighbors. By comparing data points, we can see which ones have densities comparable to their neighbors and which have lower densities; the points with the lowest densities are considered outliers [12]. To begin, the k-distances, i.e., the distances from each point to its k-nearest neighbors, are measured (the point's second-nearest neighbor being its second-closest point, and so on). The k-distances between different neighbors in a point cluster are shown in Fig. 1.
This distance is used to calculate the reachability distance: the reachability distance of a point A from a point B is the maximum of the k-distance of B and the actual distance between A and B. Consider the equation below, in which B represents the center point and A represents a point close to it [13, 14].
The local outlier factor of a point A is then obtained by comparing the average of the local reachability densities (lrd) of the k-nearest neighbors of A with the lrd of A itself. The LOF equation is as follows:
\mathrm{LOF}_k(A) := \frac{\sum_{B \in N_k(A)} \mathrm{lrd}_k(B)/\mathrm{lrd}_k(A)}{|N_k(A)|} = \frac{\sum_{B \in N_k(A)} \mathrm{lrd}_k(B)}{|N_k(A)| \cdot \mathrm{lrd}_k(A)} \quad (3)
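The following sketch shows Eq. (3) in action via scikit-learn's `LocalOutlierFactor`: three mutually close points and one distant point, so with k = 2 the distant point's local reachability density is far below that of its neighbors and its LOF score is large. The toy coordinates are illustrative only.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Three tight points and one far-away point; with k = 2 neighbors, the far
# point's local reachability density (lrd) is much lower than its neighbors'.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0]])

lof = LocalOutlierFactor(n_neighbors=2)
labels = lof.fit_predict(X)              # -1 marks detected outliers
scores = -lof.negative_outlier_factor_   # LOF_k(A): ~1 for inliers, >> 1 for outliers
```

`negative_outlier_factor_` stores the negated LOF score, so negating it recovers the value defined in Eq. (3).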
3 Procedure
4 Experimental Setup
5 Result
Here, we calculate the mean, count, max, and other summary statistics of the data (Fig. 2). Histograms are used in the project to help distinguish between fraudulent and legitimate transactions; the Matplotlib package can be used for this, and the plot's size can be changed to fit our needs (Fig. 3). The output shows a bar chart for every attribute in the dataset: histograms group the data into bins and are the quickest way to get an idea of the distribution of each attribute.
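A sketch of this plotting step, assuming a pandas DataFrame in place of the real Kaggle frame (the column names here are a small synthetic stand-in subset):

```python
import matplotlib
matplotlib.use("Agg")            # off-screen backend; no display required
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Small stand-in for the Kaggle frame; the real one has V1-V28, Time, Amount, Class
df = pd.DataFrame({"V1": rng.normal(size=200),
                   "V2": rng.normal(size=200),
                   "Amount": rng.exponential(50.0, size=200),
                   "Class": rng.integers(0, 2, size=200)})

axes = df.hist(bins=20, figsize=(10, 8))   # one histogram per attribute
plt.tight_layout()
plt.savefig("histograms.png")              # figsize above controls the plot's size
```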
The correlation matrix is rendered as a heat map to see whether there are relationships between the various parameters and variables in our dataset (Fig. 4). The graph was generated with pyplot and a seaborn (sns) heat map; it gives the basic correlation matrix a visual appearance and makes analysis easier. All 31 parameters (V1–V28, Time, Amount, and Class) are present on both axes, with correlation values ranging from −0.75 to +0.50.
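This correlation view can be sketched as below. To keep dependencies minimal, the sketch draws the heat map with Matplotlib's `imshow`; with seaborn installed, `sns.heatmap(corr)` produces the equivalent plot described above. The columns are a synthetic stand-in for the real frame.

```python
import matplotlib
matplotlib.use("Agg")            # off-screen backend; no display required
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
v1 = rng.normal(size=300)
df = pd.DataFrame({"V1": v1,
                   "V2": 0.8 * v1 + rng.normal(scale=0.5, size=300),  # correlated with V1
                   "Amount": rng.exponential(50.0, size=300)})

corr = df.corr()                                  # Pearson correlation matrix

fig, ax = plt.subplots()
im = ax.imshow(corr, vmin=-1, vmax=1, cmap="coolwarm")
ax.set_xticks(range(len(corr)))
ax.set_xticklabels(corr.columns, rotation=45)
ax.set_yticks(range(len(corr)))
ax.set_yticklabels(corr.columns)
fig.colorbar(im)
fig.savefig("correlation_heatmap.png")
```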
For successfully identifying credit card fraud, this study applies the local outlier factor and isolation forest techniques and takes into account the unbalanced presence of credit card data. The results of the trials revealed that the suggested model is effective in addressing unbalanced situations in credit card fraud detection.
Fig. 2 Mean, count, max, min and other information of each of the predictor columns
Due to the growing
usage of credit cards for transactions, credit card fraud is on the rise. This study explores credit card fraud detection using machine learning methods, namely the local outlier factor and isolation forest, on a publicly available dataset; the proposed framework was implemented in the Python programming language. The suggested model has not been verified on high-dimensional datasets. Cleaning techniques include sampling and feature-selection algorithms, and the suggested model may be enhanced by combining it with additional data so that it can be used on high-dimensional datasets. This study does not address the unbalanced nature of detection methods or their influence on results. Another area of potential research is a thorough examination of the computational efficiency of credit card fraud detection techniques. The code outputs the number of false positives it discovered and compares it to the real figures; this is how the algorithm's precision and accuracy are calculated. We used only 10% of the total dataset for speedier testing; finally, the entire dataset is utilized and all reports are generated. These results, as well as the
classification report for each algorithm, are presented in the output, where class 0 indicates that the transaction was determined to be legitimate and class 1 indicates that it was determined to be fraudulent. To rule out false positives, this result was compared to the class values. This article also reviewed recent research in the subject and identified the most common types of fraud, as well as methods for detecting them. The technique, pseudocode, explanation, and experimentation results are all included in this paper, together with a full description of how machine learning might be applied to improve fraud detection outcomes. Since the full dataset is made up of just two days' worth of transaction information, it is only a small portion of the data that could be made accessible if this project were used commercially. Because the software is based on machine learning algorithms, it can only become more efficient over time as more data is fed into it. Although we did not reach our goal of 100% fraud detection accuracy, we did develop a system that can come close given enough time and data.
There is room for improvement in this project, as with any other of its kind. Several algorithms may be integrated as modules, and their outputs can be merged to increase the accuracy of the final result. To improve these models even further, new algorithms may be included using blockchain technology [11]. The outputs of these algorithms, however, must be in the same format as the others; once that requirement is fulfilled, the modules are straightforward to add, as demonstrated in the code. As a result, the project gains a great deal of modularity and flexibility. There is also room for modification in the dataset: as previously shown, the accuracy of the algorithms improves as the dataset size grows, so more data will undoubtedly improve the model's accuracy in detecting fraud and reduce the number of false positives.
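The evaluation described above (counting false classifications, deriving accuracy, and printing a per-class report) can be sketched with scikit-learn's metrics. The data below is a synthetic stand-in for the sampled dataset, and the class ratio and `contamination` setting are assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import accuracy_score, classification_report

rng = np.random.default_rng(3)
# Synthetic stand-in: 495 genuine transactions (class 0) and 5 frauds (class 1)
X = np.vstack([rng.normal(0, 1, size=(495, 2)),
               rng.normal(7, 1, size=(5, 2))])
y_true = np.array([0] * 495 + [1] * 5)

model = IsolationForest(contamination=0.01, random_state=0).fit(X)
y_pred = (model.predict(X) == -1).astype(int)    # map -1 (outlier) -> 1 (fraud)

acc = accuracy_score(y_true, y_pred)
errors = int((y_pred != y_true).sum())           # misclassified transactions
report = classification_report(y_true, y_pred, target_names=["legitimate", "fraud"])
print(f"accuracy: {acc:.4f}, misclassifications: {errors}")
print(report)
```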
Acknowledgements We would like to take this opportunity to express our heartfelt appreciation
and warm regards to our advisor for their outstanding guidance, monitoring, and relentless encour-
agement during the thesis. The blessings, assistance, and encouragement that they provide from
time to time will take us a long way in the life path that we are about to embark on. Every successful
project is built on the continuous motivation, goodwill, and support of those who surround it. We
want to take this moment to thank everyone who has helped the project flourish by donating their
time, full support, and collaboration. We are grateful to Dr. Amit Ganatra, Head of Department; Dr. Amit Nayak, Head of Department and Project Guide; and Prof. Mohammed Bohra for their help during the study and development phase. It is because of them that we have been motivated to work
hard and implement new technologies. They created a favorable atmosphere for us, and without
them, we would not have been able to achieve our target.
References
1. Tripathi D, Lone T, Sharma Y, Dwivedi S (2018) Credit card fraud detection using local outlier
factor. Int J Pure Appl Math 118(7):229–234
2. Banerjee R, Bourla G, Chen S, Purohit S, Battipaglia J (2018) Comparative analysis of machine
learning algorithms through credit card fraud detection, pp 1–10
3. Machine Learning Group, Credit card fraud detection. Kaggle, 23-Mar-2018. [Online].
Available: https://www.kaggle.com/mlgulb/creditcardfraud. Accessed 06 May 2019
4. Li Z et al (2021) A hybrid method with dynamic weighted entropy for handling the problem
of class imbalance with overlap in credit card fraud detection. Expert Syst Appl 175:114750
5. Desai J, Bohara MH (2021) “Farmer Connect”—a step towards enabling machine learning
based agriculture 4.0 efficiently. In: 2021 6th international conference on communication and
electronics systems (ICCES). IEEE
6. Isolation forests for anomaly detection improve fraud detection (2019) Blog Total Fraud
Protection, 2019. [Online]. Available
7. Waleed GT, Mawlood AT, jabber Abdulhussien A (2020) Credit card anomaly detection using
improved deep autoencoder algorithm. J College Educ 1
8. Joshi A, Soni S, Jain V, An experimental study using unsupervised machine learning techniques
for credit card fraud detection
9. Rai AK, Dwivedi RK (2020) Fraud detection in credit card data using unsupervised machine
learning based scheme. In: 2020 international conference on electronics and sustainable
communication systems (ICESC). IEEE
10. Beigi S, Amin Naseri MR (2020) Credit card fraud detection using data mining and statistical methods. J AI Data Mining 8(2):149–160
11. Bohara MH et al, Adversarial artificial intelligence assistance for secure 5G-enabled IoT.
Blockchain for 5G-Enabled IoT: 323
12. Anand H, Gautam R, Chaudhry R (2021) Credit card fraud detection using machine learning.
No. 5616. EasyChair
13. Revathi N (2021) Credit card fraud detection using unsupervised technique in time series data.
Turkish J Comput Math Educ (TURCOMAT) 12(13):3082–3088
14. Hussein AS et al (2021) Credit card fraud detection using fuzzy rough nearest neighbor and
sequential minimal optimization with logistic regression. Int J Interact Mobile Technol 15(5)
15. Lim CP, Seera M, Nandi AK, Randhawa K, Loo CK, Credit card fraud detection using adaboost
and majority voting. IEEE Access 6
16. Shukur HA (2019) Credit card fraud detection using machine learning methodologies.
8(3):257–260
17. “Local outlier factor”, En.wikipedia.org, 2019. [Online]. Available: https://en.wikipedia.org/
wiki/Local_outlier_factor. Accessed 06 May 2019
Impact of Blockchain Technology
in the Healthcare Systems
Abstract The healthcare industry is one of the most important industries in the world, and it is in dire need of restructuring because of its poor and outdated data-management techniques. The healthcare system has adopted a centralized environment and deals with many intermediaries, which makes it prone to single points of failure, lack of traceability of transactions, and privacy issues such as data leakage. Blockchain is a relatively new technology that is able to tackle the obsolete methods and practices existing in the healthcare industry. In this chapter, we analyze the applications of blockchain that can solve the issues prevalent in the healthcare industry. The aim of this chapter is to reveal the potential benefits that come from using blockchain technology in health care and to identify the various challenges that this technology faces.
G. Anand
Department of Computational Sciences, CHRIST (Deemed to be University), Bangalore, India
e-mail: garima.anand@christuniversity.in
A. Prajeeth · B. Gautam
Department of Computer Science and Engineering, Delhi Technological University, Delhi
110042, India
e-mail: ashwinkurumkulamprajeeth_2k19co092@dtu.ac.in
B. Gautam
e-mail: binavgautam_2k19co103@dtu.ac.in
Rahul (B)
Department of Software Engineering, Delhi Technological University, Delhi 110042, India
e-mail: rahul@dtu.ac.in
Monika
Department of Computer Science, Shaheed Rajguru College of Applied Sciences for Women,
University of Delhi, Delhi 110096, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 709
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_53
710 G. Anand et al.
1 Introduction
Blockchain is a technology that was first brought to light in 2008 through a white paper published by Satoshi Nakamoto, which introduced bitcoin, a peer-to-peer electronic cash system. Blockchain has disrupted every industry in the past decade and is considered one of the pathbreaking technologies in today's world, because it offers a platform based on trust and transparency in a decentralized environment [1]. Bitcoin was created as an alternative to fiat currencies after the financial crisis of 2008 [2]. A cause of the crisis was that most banks maintained financial records in a centralized manner; there was no one overseeing the processes, and most of the faults of the system were overlooked [3]. This made the arrival of bitcoin significant, as it uses blockchain to maintain a decentralized currency with an immutable ledger [4].
Blockchain is a very powerful technology: it is an "unbreakable" chain of data entries that allows its nodes, the computers present in the blockchain network, to conduct secure and open transactions. Because of its transparent and distributed structure, transactions can be viewed by every node present. Owing to its decentralized nature, blockchain does not rely on third parties to enable transactions between entities; instead, it uses miners, who validate transactions in a decentralized way through a distributed consensus, an algorithm used to validate information [5]. The management of transactions in the healthcare industry is often complex [6], and with the healthcare system being centralized, it is susceptible to many attacks from malicious users. Blockchain, being a decentralized ledger, can solve this problem effectively (Fig. 1).
The code upon which bitcoin was developed was released as open source, which helped researchers develop their own blockchain-based applications and prototypes. This made researchers aware of the vast potential blockchain holds, and they started implementing it in non-financial industries. Properties of blockchain such as transparency and its distributed network make it extremely useful for the safe exchange of information. This awareness increased the popularity of the technology, and more funds were invested in research and development for applications in health care, operations, supply chains, and more [1].
A few of the many problems faced by healthcare researchers and practitioners today are fragmented data, lack of communication, and unreliable supply chains for workflow tools. Practitioners are often afraid to share data because they are unsure whether regulations safeguarding patients' health and identification information prevent such sharing, or whether there are financial consequences associated with sharing data [7]. Interoperability is another pressing issue in the healthcare industry [8]. Hospitals store data in a centralized structure, which makes it vulnerable to problems such as fragmentation of health data, single points of failure, and a lack of quantity and quality of data for medical research [9]. Many records are fragmented because the centralized nature of the health-recording system prevents data from being shared with other sources.
Impact of Blockchain Technology in the Healthcare Systems 711
The structure followed by most medical institutions today is outdated and unreliable. With the increase in counterfeit products and the significant lack of communication between healthcare institutions, the structure of the industry has been declining. In this section, we highlight a few significant problems, such as the lack of interoperability, supply chain integrity, and the obsolete practices followed.
2.1 Interoperability
the same protocols. This is a long and arduous process, and will take several years
before the theory can be applied to practice [16].
Supply chain integrity comprises the various procedures and technologies used to monitor and trace products within the supply chain. It is implemented to weed out counterfeit products and to provide high-quality, safe products to the consumer. Some of the risks to the integrity of products are:
• Adulteration of products
• Counterfeit materials or products
• Misbranded products
• Expired products which are relabeled and sold to consumers [17].
Following certain protocols and assuring quality by tracking materials and products throughout the supply chain ensures that patients and consumers receive safe therapies and that problems are contained and minimized (Fig. 2).
The control environment within hospitals and clinics is very complex. Materials and products are handled by numerous individuals, and they are hard to identify because they are usually removed from their original packaging. This makes quality assurance of the supply chain in health care increasingly difficult [18].
The rise of gray markets poses a significant problem for supply chain integrity. Prices for many pharmaceutical products vary drastically from country to country: for example, a product may cost a diabetic patient around $210 (~Rs. 15,000) every month in the USA, whereas it costs only around $50 (~Rs. 3,500) every month in India [19]. This drives gray-market distribution, which is very damaging for healthcare institutions because supply chain integrity is lost.
The centralized structure of hospitals imposes certain restrictions on supply chain integrity as well, which can result in drug shortages. Since most of the raw materials for drugs are made in a single location, any disturbance in that location can cause significant shortages in drug manufacturing and increased raw-material prices. For example, India imports 70% of its pharmaceutical raw materials from China. During the dreadful second wave of COVID-19 in India, China suspended all cargo flights to and from India for two weeks; as a result, the price of the raw materials used in manufacturing COVID-19 medicines jumped more than 200% in the space of a month [20].
Medical data is one of the most important types of information that is abundantly
available in today’s world, and its safe and secure storage is very important to maintain
the integrity and privacy of the general public. But even today, healthcare data is
largely paper-based and as such faces a number of challenges. The management
of multiple records becomes very difficult, and sometimes, a patient is not able to
procure all the prescriptions, thus having an undesirable gap in their medical history.
This leads to relaying incomplete information about the patient to the doctor, which
makes it harder to provide an accurate diagnosis. A possible solution is to digitize
all the data stored in hospitals and make them accessible to patients online through
a portal. All healthcare systems and third-party providers that receive access to such
information must make sure that they have a very secure system to prevent any kind
of breach of this sensitive data.
This not only includes the healthcare system but extends to all other participants, such as insurance companies, which also work in very outdated ways, storing all their policies in paper format and keeping them within their agencies. Exchange of information between different insurance providers or between
hospitals/medical providers becomes a hassle for the patient when they might be in
urgent need of the insurance funds and are short of time. To claim one’s own insur-
ance benefits, it becomes a long and arduous process of weeks to get all the required
confirmations to receive the insurance reimbursements. Even then, there might be
technical difficulties or external wait times on either side of the transaction, leaving
the patient at a vulnerable position even though they have already purchased the
insurance. For this reason, sometimes people choose to go through middleman agen-
cies, who handle all their insurances in a simpler way as they are more experienced
in the field and have tie-ups with insurance companies to attract more customers.
This adds extra cost that the public must pay if they choose to avoid the lengthy and
complicated procedures of filing for insurance. Due to the involvement of so many
parties in the system, there is also the problem of constant fraud; many fall victim to it every year and lose a big portion of their life savings.
These problems are addressed in Sect. 4, where we discuss how blockchain, together with other technologies, can help simplify and digitize these processes in favor of the patient, so that patients no longer have to work for facilities they have already paid for. Blockchain also helps remove the middleman and the need to chase confirmations for claims, as the whole process is automated: when certain criteria are met, the claim is executed automatically without any external intervention from any party.
3 Background of Blockchain
A new block is appended to the chain only if it is valid and references the previous block in the chain via the corresponding hash. Blocks are added to the blockchain only if both conditions are met; otherwise, they are discarded [5].
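The linking rule above can be sketched with a minimal hash-chained block structure (an illustrative toy in Python; the field names and layout are our own, not any real blockchain's format):

```python
import hashlib
import json

def block_hash(block):
    # Hash the block's contents (excluding its own hash field) deterministically.
    payload = json.dumps({k: block[k] for k in ("index", "prev_hash", "data")},
                         sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def make_block(index, prev_hash, data):
    block = {"index": index, "prev_hash": prev_hash, "data": data}
    block["hash"] = block_hash(block)
    return block

def is_valid_chain(chain):
    # A block is accepted only if its stored hash matches its contents
    # and it references the previous block's hash.
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block):
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True

genesis = make_block(0, "0" * 64, "genesis")
second = make_block(1, genesis["hash"], "patient record A")
print(is_valid_chain([genesis, second]))  # True
```

Altering any field of a stored block changes its recomputed hash, so the chain immediately fails validation.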
Blockchains are divided into three categories based on their permission level. These
are:
• Public permissionless: These are open to the public, and any user can easily
participate and validate the transactions. It is a permissionless blockchain, which
means that users do not require any permission from a central authority to join
the blockchain. Transactions are pseudonymous. This category has the highest level of decentralization, as it is maintained by the community. Most cryptocurrencies use public permissionless blockchains (e.g., Bitcoin and Ethereum).
• Consortium: These are permissioned blockchains that operate under the authority of a certain group of nodes or users. Predefined consortium nodes control the consensus process. Users cannot participate in the network unless they are members of an organization that has access to the blockchain. The transactions may or may not be accessible to the public. Examples of this type of blockchain are Quorum and Corda.
• Private: Here, a single authority or organization typically looks after the network. It is a permissioned blockchain to which only employees of a single organization have access. The full decentralization property is lost, however, and the system becomes only partially decentralized because the blockchain is controlled by a single entity. In exchange, block times decrease and transaction throughput increases. Transactions are validated by this single entity and may or may not be visible to the public. Some examples of private blockchains are Hyperledger Fabric and Ripple.
Each node in the network maintains a copy of the ledger. Nodes whose copies are consistent are known as honest nodes, and agreement is formed when the honest nodes agree on their values.
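A toy version of this agreement condition, assuming a simple majority rule over the nodes' ledger copies (real consensus protocols such as proof-of-work or PBFT are considerably more involved):

```python
from collections import Counter

def reach_consensus(ledger_copies, threshold=0.5):
    """Return the agreed ledger value if more than `threshold` of the nodes
    hold the same copy, otherwise None (no consensus)."""
    value, count = Counter(ledger_copies).most_common(1)[0]
    return value if count / len(ledger_copies) > threshold else None

# Four honest nodes agree; one node holds a divergent copy.
copies = ["hashA", "hashA", "hashA", "hashA", "hashB"]
print(reach_consensus(copies))  # hashA
```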
An electronic health record (EHR) is one of the most popular ways of storing patient information digitally. These are real-time records that update instantly as the patient undergoes any kind of medical activity, and they are then securely available to all authorized personnel [25]. A patient's entire medical history can be recorded in this way, so that any medical professional has complete access to it without any missing information in between.
Health care is an intensely data-driven industry, and there is always ongoing
research for finding more efficient technologies that can amplify the productivity
of the healthcare system while also cutting down on costs. Blockchain provides the
opportunity to overcome the old and disparate health systems that are still in play
today [6]. With the help of blockchain, health care can be shifted from being provider-
centric, i.e., controlled by hospitals and service providers, to patient-centric, i.e.,
being controlled by a peer-to-peer network composed of patients, giving them the authority to choose who can access their personal information
(Fig. 4).
The field of health care deals with a large number of information exchanges and
microtransactions every second, and it is necessary to ensure that every one of them
is transparent and secure across all authorized organizations [3]. Many studies have
published material on how blockchain can be used for administering digital rules
for information access and maintenance [26]. Blockchain is considered the most robust and efficient method for ensuring interoperability wherever necessary, and well-planned, large-scale implementation is the most appropriate way to address this concern [1]. Many top-tier healthcare institutions have already tasked their research and development teams with exploring the possibilities blockchain can bring to their systems. This has led to a large increase in the volume of research papers being published, signifying the growing popularity of blockchain technology [27].
Healthcare data is one of the most sensitive forms of data, as a person's entire life history can be traced from their past medical records. Hence, such information must be stored with the highest levels of security and kept private from everyone except those who have been granted access to it. Until now, healthcare data has largely been in the hands of hospitals and so is controlled by a single entity. This presents a single point of failure that hackers wishing to steal or publicize information can choose to attack. If hackers get through the security system, they effectively gain access to the entire library of patient records stored in that hospital and can then use this information against innocent people or blackmail them for their own benefit. Because blockchain is a distributed peer-to-peer ledger system [32], it is controlled by a large network of smaller entities and thus becomes very difficult to target, and safer against malicious threats to the system [33]. Even if anyone did manage
720 G. Anand et al.
to breach the system at some point, causing actual damage would require changing records faster than new records are verified and stored on the blockchain, which is computationally infeasible due to the secure hashing algorithm used to hash each record stored in the blocks across the chain.
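This tamper-propagation property can be illustrated in a few lines: changing one record invalidates every subsequent hash, so an attacker would have to recompute the rest of the chain (the record names here are hypothetical):

```python
import hashlib

def h(prev_hash, record):
    # Each record's hash depends on the previous hash, chaining them together.
    return hashlib.sha256((prev_hash + record).encode()).hexdigest()

records = ["rec0", "rec1", "rec2", "rec3"]
hashes = []
prev = "0" * 64
for rec in records:
    prev = h(prev, rec)
    hashes.append(prev)

def first_mismatch(records, hashes):
    # Return the index of the first block whose hash no longer matches,
    # or None if the chain is intact.
    prev = "0" * 64
    for i, rec in enumerate(records):
        prev = h(prev, rec)
        if prev != hashes[i]:
            return i
    return None

tampered = ["rec0", "FORGED", "rec2", "rec3"]
print(first_mismatch(tampered, hashes))  # 1: every hash from the altered block onward breaks
```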
The use of blockchain technology shifts authorization over records to the patient: only the patient can manage who is allowed to access which data, without having to reveal the rest of the information to anyone else. This greatly increases the privacy of the data by restricting access to the permissions granted by the patient.
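A minimal sketch of such patient-controlled permissions (an in-memory stand-in for on-chain access grants; the class and method names are our own):

```python
from collections import defaultdict

class PatientRecordAccess:
    """Toy access-control layer: only grants explicitly made by the patient
    allow a requester to read a given category of the record."""
    def __init__(self):
        self.grants = defaultdict(set)  # requester -> set of record categories

    def grant(self, requester, category):
        self.grants[requester].add(category)

    def revoke(self, requester, category):
        self.grants[requester].discard(category)

    def can_read(self, requester, category):
        return category in self.grants[requester]

access = PatientRecordAccess()
access.grant("dr_smith", "radiology")
print(access.can_read("dr_smith", "radiology"))   # True
print(access.can_read("insurer_x", "radiology"))  # False: no grant exists
```

A requester sees only the categories the patient has granted; everything else stays hidden, mirroring the selective-disclosure idea above.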
The Internet of Things, or IoT, is another technology that has gained a lot of traction in recent years. Any device that connects to the Internet and transfers data forms part of the Internet of Things. We are now surrounded by thousands of IoT devices at all times, from mobile phones and laptops to smartwatches and Bluetooth devices. With advancements in technology, devices such as smartwatches can perform a variety of tasks very efficiently, such as tracking our pulse and heart health while continuously exchanging data with our smartphones.
Health care is a large system with a huge amount of activity and movement on a daily basis. Hospital attendants are usually in a rush to complete all their checkups and daily duties, to make sure all equipment is in the correct place, and to see that it is replaced periodically wherever necessary. This brings with it the problem of mismanagement: it is difficult to keep track of all the moving parts in a hospital at all times, so items are lost or misplaced with very high frequency. A key challenge [34] is tagging medical equipment with a usable ID and integrating trust into device identification and tracking. If this could be implemented effectively, any misplaced device could simply be traced back using its ID and recovered [35]. With radio frequency identification (RFID) tags on IoT devices across the entire hospital, it becomes much easier to track all the moving components at all times and view their entire movement history. This eliminates much of the time spent manually searching for untagged items, leaving more time for attending to patients in need.
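The RFID movement history described above could be sketched as an append-only event log (a simplified stand-in for on-chain events; the tag IDs and locations are hypothetical):

```python
from datetime import datetime

movement_log = []  # append-only log, standing in for immutable on-chain events

def record_scan(tag_id, location, ts):
    movement_log.append({"tag": tag_id, "location": location, "time": ts})

def movement_history(tag_id):
    # Reconstruct, in order, everywhere a tagged item has been scanned.
    return [e["location"] for e in movement_log if e["tag"] == tag_id]

record_scan("RFID-042", "storage", datetime(2021, 5, 1, 9, 0))
record_scan("RFID-042", "ward-3", datetime(2021, 5, 1, 11, 30))
record_scan("RFID-042", "operating-room", datetime(2021, 5, 1, 14, 0))
print(movement_history("RFID-042"))  # ['storage', 'ward-3', 'operating-room']
```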
Various studies conducted by big-name companies such as Deloitte [28] and IBM [29] have shown that implementing blockchain technology along with IoT can greatly benefit the management of medical assets in a hospital and bring more efficiency to the system. This would save hospitals huge sums of money that would otherwise be spent repurchasing misplaced or missing items.
Pharmaceuticals is one of the biggest industries in the world, responsible for a large portion of the income of many hospitals, clinics, and drug manufacturers. It is a huge industry with many working components and several thousand employees working daily, and hence is very prone to human error and, consequently, large overhead expenditures. Blockchain integrated with IoT can automate the entire system, eliminating a majority of unnecessary expenses and freeing more money for drug research rather than recurring faults in an obsolete supply chain.
A major problem pharmaceutical companies currently face is drug counterfeiting [36]. Counterfeits pose a serious health risk to consumers who come upon fake medicines and do not have access to proper healthcare facilities [37]. Various bodies are involved from the start of the manufacturing process until products reach the end consumer. This creates many opportunities to slip fake products into large shipments of medicinal supplies, and it then becomes difficult to distinguish the two without a thorough examination of the medicines, which is infeasible given the bulk quantity of supplies. The immutable and traceable nature of blockchain comes into play here: a medicine can be tracked from its inception until its sale, so the consumer can check a record of the medicine's movement right from the start of manufacturing and ensure that it is genuine [35]. A start-up company has implemented a technology that creates a chain-of-custody model [38], tracking the entire manufacturing timeline and using the immutable nature of blockchain to track medicine and prevent fraud. Blockchain facilitates the removal of a central authority in these trades, thus removing corruption and other negative incentives for fraudulent behavior and tampering.
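A chain-of-custody model of this kind can be sketched as hash-linked custody entries (illustrative only; the holders and actions below are hypothetical, and the start-up's actual design is not detailed here):

```python
import hashlib

def custody_entry(prev_entry_hash, holder, action):
    # Each custody step commits to the previous step's hash, so the
    # provenance trail cannot be rewritten without detection.
    entry = {"prev": prev_entry_hash, "holder": holder, "action": action}
    entry["hash"] = hashlib.sha256(
        (entry["prev"] + holder + action).encode()).hexdigest()
    return entry

def verify_custody(chain):
    # Provenance holds only if every entry links to the one before it.
    for i in range(1, len(chain)):
        if chain[i]["prev"] != chain[i - 1]["hash"]:
            return False
    return True

e1 = custody_entry("GENESIS", "PharmaCo", "manufactured batch 77")
e2 = custody_entry(e1["hash"], "LogisticsInc", "shipped batch 77")
e3 = custody_entry(e2["hash"], "CityPharmacy", "sold unit from batch 77")
print(verify_custody([e1, e2, e3]))  # True: unbroken chain of custody
```

A consumer (or regulator) can walk this chain back to the manufacturer; an injected counterfeit entry breaks the hash links and is detected.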
Smart contracts are programs stored on the blockchain that execute when certain conditions are met [39]. They are a great way to automate daily tasks that take place in a repeated, similar manner, so that they are performed instantly and without error. This may include activities such as purchasing medicines, paying bills, or settling regular consultation fees. Because the contract is digital, there is no paperwork or manual data entry, making processes more efficient while also cutting costs. Once executed on the blockchain, a contract cannot be reversed, and the execution is broadcast to the blockchain, where it can easily be verified. This removes the possibility of bribery or other negative influences. Another possibility is incorporating smart contracts into the pharmaceutical supply chain. A contract can track the movement of a shipment, execute conditions as they occur in real time, and ultimately execute once the medicines have been successfully delivered and paid for. This not only removes the possibility of counterfeit drugs but also automates payment, making it a seamless process for the patient to receive genuine medicines.
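A minimal sketch of such a conditionally executing contract, here a hypothetical delivery-and-payment agreement (real smart contracts run on-chain, e.g., in Solidity; this Python toy only mirrors the logic):

```python
class DeliveryContract:
    """Toy smart contract: payment is released automatically once both the
    delivery and the payment-deposit conditions are met; execution is final."""
    def __init__(self, amount):
        self.amount = amount
        self.delivered = False
        self.deposited = False
        self.executed = False

    def confirm_delivery(self):
        self.delivered = True
        self._try_execute()

    def deposit(self):
        self.deposited = True
        self._try_execute()

    def _try_execute(self):
        # Fires automatically the moment all conditions hold; irreversible.
        if self.delivered and self.deposited and not self.executed:
            self.executed = True

contract = DeliveryContract(amount=250)
contract.deposit()
print(contract.executed)   # False: delivery not yet confirmed
contract.confirm_delivery()
print(contract.executed)   # True: both conditions met, executed automatically
```

No party triggers the final step manually; the contract executes itself when its conditions are satisfied, which is the property the text relies on.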
Smart contracts could also be used for purchasing and maintaining insurance policies. People have realized the importance of different kinds of insurance for their safety and well-being, and have begun purchasing house, car, life, medical, and various other kinds of insurance. The current insurance industry is very inefficient, as work is mainly done by people and recorded on paper, and it carries the same faults that medical records suffer from. A person could provide their credentials to an insurance company through the blockchain and, every time they wish to purchase a new type of insurance, do so with the click of a button, without going through the long and arduous filing and registration process each time. Insurance payments can also be handled by smart contracts, making them automatic and guaranteeing that the policy is always up to date and readily available, with no fault or delay in the system withholding the coverage that was promised. If an insuree ever undergoes a surgical procedure covered by a policy, the claim would go through instantly without several appointments with the insurers.
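The insurance scenario can be sketched the same way, with a hypothetical policy object that approves covered claims automatically (procedure names and amounts are invented for illustration):

```python
class InsurancePolicy:
    """Toy policy contract: a claim for a covered procedure is approved and
    paid automatically, with no manual confirmation step."""
    def __init__(self, covered_procedures, payout_per_claim):
        self.covered = set(covered_procedures)
        self.payout = payout_per_claim
        self.paid_claims = []

    def file_claim(self, procedure):
        if procedure in self.covered:
            self.paid_claims.append((procedure, self.payout))
            return self.payout
        return 0  # not covered: nothing is paid out

policy = InsurancePolicy({"appendectomy", "cataract surgery"}, 5000)
print(policy.file_claim("appendectomy"))    # 5000, paid instantly
print(policy.file_claim("dental implant"))  # 0, not covered
```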
Health care, being one of the largest generators of data, produces massive amounts of information every year that could be useful for analytical and academic studies, greatly benefiting research scholars and helping bring theoretical studies into practice. The volume of data coming out of health care is so large that it is difficult for ordinary database software to store and work with it [40]. Types of data include patient consultation records, patient health records, identification, insurance, etc. [39]. A major concern is safeguarding the privacy, security, and authenticity of such data. It is also essential to hide confidential attributes such as name, gender, and age that could be used to identify a person if the data were ever leaked. These concerns can be addressed by blockchain technology, which provides a safe way to ensure the secure transfer and usage of such data by recording all transactions immutably, and makes it possible to share only the information required for research purposes without revealing any sensitive patient-identifiable information (Fig. 5).
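Minimal-disclosure sharing of this kind might look as follows (a sketch; the identifying-field list and the salted pseudonym scheme are assumptions for illustration, not the chapter's design):

```python
import hashlib

IDENTIFYING_FIELDS = {"name", "gender", "age", "address"}

def research_view(record, requested_fields):
    """Return only the requested non-identifying fields, plus a salted
    pseudonymous ID so records can be linked without revealing identity."""
    allowed = set(requested_fields) - IDENTIFYING_FIELDS
    view = {k: record[k] for k in allowed if k in record}
    view["pseudo_id"] = hashlib.sha256(
        ("research-salt:" + record["patient_id"]).encode()).hexdigest()[:12]
    return view

record = {"patient_id": "P-881", "name": "J. Doe", "age": 54,
          "diagnosis": "hepatitis C", "lab_result": 2.7}
# 'name' is requested but filtered out; a pseudo_id still allows linkage.
print(research_view(record, ["diagnosis", "lab_result", "name"]))
```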
Deep learning is one of the fastest-growing fields in computer science and has made tremendous progress in recent years in diagnosing medical conditions from radiological data such as X-ray and CT scans. Obtaining report results is an extensive and expensive task. With the help of blockchain and big data, it is possible to create a model capable of producing radiology reports instantly as scans are performed and uploading them to the respective patients' EHRs using
Fig. 5 How blockchain can help in storage and sharing of big data
a smart contract after clearance of all outstanding payments. In this way, blockchain allows the seamless integration of various technologies to create a truly modern, smart experience for users. MedRec [41] is a popular prototype for blockchain in health care that aims to prioritize patient agency, giving patients control of a transparent and accessible view of their electronic health records [42]. It incentivizes miners by providing them with medical data from hospitals in return for performing the hashing work; the data can then be used for research purposes, benefiting everyone taking part in the blockchain system.
Clinical trials are research studies conducted on large groups of willing volunteers to test the efficacy and safety of new treatment methods, such as a new drug, diet, or medical device. They are used to identify possible side effects and long-term effects of the treatment, and to weigh its pros and cons before deciding whether to introduce it to the general public. Sometimes, new medical procedures are also performed on people with chronic or life-threatening diseases in the search for revolutionary new cures [43]. But the task of recruitment is a tedious one, with many procedures and formalities that must be duly completed before a trial can begin. Meeting all requirements within the allotted time is difficult and results in substantial extra expenditure for the authorities concerned [44]. Due to these problems, an estimated 86% of clinical trials do not achieve their recruitment goals on time [45], and 19% of registered clinical trials were closed or terminated due to failure to reach expected enrollment [46].
A possible solution to this problem is the combined use of blockchain technology and smart contracts. Blockchain can increase the immutability and transparency of recruitment procedures, support the auditability and accountability of medical practitioners, and verify the trials and findings of the researchers [47]. Patient recruitment and matching, efficient data management, and daily updates of trial procedures can all be programmed into a smart contract uploaded to the blockchain for efficient monitoring of the whole trial. This will also help prevent the trial frauds that commonly occur, as anyone applying to a trial can verify the authority and authenticity of the parties involved through the blockchain [48] and ensure that it is a genuine clinical trial approved by a government body. Blockchain provides a decentralized data-tracking system for keeping records of all patients, performing background checks, and consulting their health records while maintaining their privacy by allowing access only to necessary information [49]. This smooths the process, as willing participants can apply directly to clinical trials without going through third parties, making it easier to match personal profiles against the protocol's inclusion and exclusion criteria [50]. A further integration of IoT and blockchain allows IoT devices to continuously monitor participants and upload sensory data and other vital information. Data traceability across the life cycle of the drugs can verify their authenticity and guarantee their safety for use, making it easier to check that all trials are legitimate and conducted with the approval of the correct compliance bodies, following the regulatory code established by the organizations holding the clinical trials [51].
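The inclusion/exclusion matching step can be sketched as predicates evaluated against a participant profile (the criteria shown are hypothetical examples, not from any real protocol):

```python
def matches_protocol(profile, inclusion, exclusion):
    """A participant matches if every inclusion predicate holds and no
    exclusion predicate does."""
    return (all(rule(profile) for rule in inclusion)
            and not any(rule(profile) for rule in exclusion))

# Hypothetical protocol criteria.
inclusion = [lambda p: 18 <= p["age"] <= 65,
             lambda p: p["diagnosis"] == "type-2 diabetes"]
exclusion = [lambda p: p["pregnant"]]

candidate = {"age": 44, "diagnosis": "type-2 diabetes", "pregnant": False}
print(matches_protocol(candidate, inclusion, exclusion))  # True
```

Encoded in a smart contract, such rules would let applicants be screened automatically and auditably, without a third-party recruiter.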
Mismatched patient identities and duplicate medical details are major issues that lead to high extra costs that healthcare systems and patients may have to bear due to identity conflicts. Radiological imaging or pathological sampling may have to be repeated because previous data was stored under the wrong identity, which also delays treatment.
Blockchain can solve this problem through a single, shared identity database that stores all the information of individuals, so that all necessary information can be obtained from one source and easily verified by any organization checking a person's identity. This also removes the possibility of duplicate data, as all information is stored in one place and can easily be recovered when necessary. A set protocol must be defined for storing the information, so that it is universally the same wherever it is used. The name, address, government-approved identity number, passport number, etc., can be stored in a proper format following specific guidelines so that there is no mix-up of information and identifying credentials [52]. This also solves the problem of having different IDs for different organizations, as one universally acceptable ID can serve all necessary purposes [54]. Due to the cryptographic nature of blockchain, identities are kept secure: the user holds a key that allows only them to share their identity and to verify themselves wherever necessary. In this way, the decentralized and auditable characteristics of blockchain technology enable much more robust and secure identity sharing and management, alleviating many of the problems the current system faces.
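A sketch of such a canonical identity format with key-based verification, using an HMAC commitment as a simple stand-in for blockchain-grade cryptography (the field order, normalization, and key handling are illustrative assumptions):

```python
import hashlib
import hmac

def canonical_identity(name, address, govt_id, passport_no):
    # One fixed field order and normalization, so the same identity always
    # serializes identically wherever it is used.
    return "|".join([name.strip().upper(), address.strip().upper(),
                     govt_id.strip(), passport_no.strip()])

def identity_commitment(identity, secret_key):
    # Only the holder of the key can produce (and hence prove) this value.
    return hmac.new(secret_key, identity.encode(), hashlib.sha256).hexdigest()

identity = canonical_identity("Asha Rao", "12 Lake Rd, Pune",
                              "GOV-123456", "P9876543")
key = b"patient-private-key"  # hypothetical user-held key
commit = identity_commitment(identity, key)

# Verification: re-derive the commitment and compare in constant time.
print(hmac.compare_digest(commit, identity_commitment(identity, key)))  # True
```

Because the serialization is canonical, every organization derives the same record from the same person, eliminating duplicate or conflicting entries.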
We have learnt a great deal about the importance of blockchain, its impact on healthcare systems, and how it is one of the most essential technologies for revolutionizing the healthcare sector. Though it is still a very young technology, and there is much to learn about its possible impacts, pros, and cons, it is surely one of the most promising developments in this field and is receiving huge investments from healthcare giants for research and development to utilize blockchain to its fullest potential. In this chapter, we have discussed the various problems faced by healthcare systems today, why they face them, what the possible solutions are, and why blockchain is one of the most efficient ways to address these concerns. We discussed the features and technical details of blockchain and blockchain-based solutions to the aforementioned problems in the healthcare system.
Although the idea of a completely digitized healthcare system is intriguing, it comes with a number of obstacles that must be overcome to make blockchain practically implementable. It is essential to make such a system scalable across a large network of providers, so that it can be accessed smoothly and securely for effective collaboration and accurate medical diagnosis by a large audience simultaneously, without excessive wait times, and so that all medical professionals working on such cases can collaborate and produce more meaningful results.
References
1. Shukla RG, Agarwal A, Shukla S (2020) Blockchain-powered smart healthcare system. In:
Handbook of research on blockchain technology, Elsevier Academic Press, S.l., pp 245–270
2. Markham JW (2011) A financial history of the United States. M.E. Sharpe, Armonk, NY, New
York
3. Rathore H, Mohamed A, Guizani M (2020) Blockchain applications for healthcare. In: Energy
efficiency of medical devices and healthcare applications, pp 153–166
4. Johnston D, Yilmaz SO, Kandah J, Bentenitis N, Hashemi F, Gross R, Wilkinson S, Mason S
(2014) The general theory of decentralized applications. DApps
5. Hölbl M, Kompara M, Kamišalić A, Nemec Zlatolas L (2018) A systematic review of the use
of blockchain in healthcare. Symmetry 10(10):470
6. Abujamra R, Randall D (2019) Blockchain applications in healthcare and the opportunities and
the advancements due to the new information technology framework. Adv Comput, 141–154
7. Zhang P, Schmidt DC, White J, Lenz G (2018) Blockchain technology use cases in healthcare.
Adv Comput, 1–41
8. Azaria A, Ekblaw A, Vieira T, Lippman A (2016) MedRec: using blockchain for medical data
access and permission management. In: 2016 2nd international conference on open and big
data (OBD)
9. McGhin T, Choo K-KR, Liu CZ, He D (2019) Blockchain in healthcare applications: Research
challenges and opportunities. J Netw Comput Appl 135:62–75
10. Dubovitskaya A, Xu Z, Ryu S, Schumacher M, Wang F (2017) Secure and trustable electronic
medical records sharing using blockchain. AMIA Annu Symp Proc 2017:650–659
11. Reid PP, Compton WD, Grossman JH, Fanjiang G (2005) Building a better delivery system
12. Integrations H, Healthcare interoperability. [Online]. Available: https://www.healthcareinteg
rations.com/healthcare-interoperability.php. Accessed 23 Apr 2021
13. O’Connor S (2017) What is interoperability, and why is it important? Advanced data systems
corporation, 30-May-2017. [Online]. Available: https://www.adsc.com/blog/what-is-interoper
ability-and-why-is-it-important. Accessed 27 Apr 2021
14. Hasselgren A, Kralevska K, Gligoroski D, Pedersen SA, Faxvaag A (2020) Blockchain in
healthcare and health sciences—a scoping review. Int J Med Inform 134:104040
15. Cichosz SL, Stausholm MN, Kronborg T, Vestergaard P, Hejlesen O (2018) How to use
blockchain for diabetes health care data and access management: an operational concept. J
Diabetes Sci Technol 13(2):248–253
16. Ash JS, Berg M, Coiera E (2003) Some unintended consequences of information technology
in health care: the nature of patient care information system-related errors. J Am Med Inform
Assoc 11(2):104–112
17. Zhang P, White J, Schmidt D, Lenz G (2017) Applying software patterns to address
interoperability in blockchain-based healthcare apps
18. Kennedy G (2015) Supply chain integrity and security
19. Byrnes J, Fixing the healthcare supply chain. HBS working knowledge. [Online]. Available:
https://hbswk.hbs.edu/archive/fixing-the-healthcare-supply-chain. Accessed 01 May 2021
20. Prasad R (2019) The human cost of insulin in America. BBC News, 14-Mar-2019. [Online].
Available: https://www.bbc.com/news/world-us-canada-47491964. Accessed 04 May 2021
21. Chandna H et al. Pharma industry warns of Covid drug shortages as raw materials
prices surge 200%. ThePrint, 03-May-2021. [Online]. Available: https://theprint.in/health/
pharma-industry-warns-of-covid-drug-shortages-as-raw-materials-prices-surge-200/650792/.
Accessed 05 May 2021
22. JS (2020) Blockchain: what are nodes and masternodes? Medium, 14-Oct-2020.
[Online]. Available: https://medium.com/coinmonks/blockchain-what-is-a-node-or-master
node-and-what-does-it-do-4d9a4200938f. Accessed 07 Mar 2021
23. Brush K, Rosencrance L, Cobb M (2020) What is asymmetric cryptography and how does
it work? SearchSecurity, 20-Mar-2020. [Online]. Available: https://searchsecurity.techtarget.
com/definition/asymmetric-cryptography. Accessed 07 Mar 2021
24. King S (2013) Primecoin: cryptocurrency with prime number proof-of-work
25. Zheng Z, Xie S, Dai H, Chen X, Wang H (2017) An overview of blockchain technology:
architecture, consensus, and future trends. In: 2017 IEEE international congress on big data
(BigData Congress)
26. What is an electronic health record (EHR)? HealthIT.gov, 10-Sep-2019. [Online]. Available:
https://www.healthit.gov/faq/what-electronic-health-record-ehr. Accessed 02 Mar 2021
27. Gökalp E, Gökalp MO, Çoban S, Eren PE (2018) Analysing opportunities and challenges of
integrated blockchain technologies in healthcare. Inf Syst Res Dev Appl Educ, 174–183
28. Kassab M, DeFranco J, Malas T, Graciano Neto VV, Destefanis G (2019) Blockchain: a panacea
for electronic health records? In: 2019 IEEE/ACM 1st international workshop on software
engineering for healthcare (SEH)
29. Open Source Blockchain Technologies. Hyperledger, 19-May-2021. [Online]. Available:
https://www.hyperledger.org/. Accessed 20 Mar 2021
30. Home. ethereum.org. [Online]. Available: https://www.ethereum.org/. Accessed 20 Mar 2021
31. Yue X, Wang H, Jin D, Li M, Jiang W (2016) Healthcare data gateways: found healthcare
intelligence on blockchain with novel privacy risk control. J Med Syst 40(10)
32. Patel V (2018) A framework for secure and decentralized sharing of medical imaging data via
blockchain consensus. Health Informatics J 25(4):1398–1411
33. Shah B, Shah N, Shakhla S, Sawant V (2018) Remodeling the healthcare industry by employing
blockchain technology. In: 2018 international conference on circuits and systems in digital
enterprise technology (ICCSDET)
34. Esmaeilzadeh P, Mirzaei T (2019) The potential of blockchain technology for health infor-
mation exchange: experimental study from patients’ perspectives. J Med Internet Res
21(6):e14184
35. Why Healthcare Industry Should Care About Blockchain? [Online]. Available: https://ww3.
frost.com/files/8615/0227/3370/Why_Healthcare_Industry_Should_Care_About_Blockch
ain_Edited_Version.pdf. Accessed 7 Mar 2021
36. Bell L, Buchanan WJ, Cameron J, Lo O (2018) Applications of blockchain within healthcare.
Blockchain in healthcare today, 1
37. Sylim P, Liu F, Marcelo A, Fontelo P (2018) Blockchain technology for detecting falsified and
substandard drugs in distribution: pharmaceutical supply chain intervention. JMIR Res Protoc
7:e10163
38. Coelho FC (2018) Optimizing disease surveillance with blockchain. bioRxiv 1 18
39. Chronicled I (2018) Chronicled Releases 2017 Progress Report for Blockchain Platform for
Track-and-Trace of Prescription Medicines, 27-Jun-2018. [Online]. Available: https://www.
prnewswire.com/news-releases/chronicled-releases-2017-progress-report-for-blockchain-pla
tform-for-track-and-trace-of-prescription-medicines-300611648.html. Accessed 29 Apr 2021
40. Szabo N (1996) Smart contracts: building blocks for digital markets. Extropy (16)
41. Dhagarra D, Goswami M, Sarma P, Choudhury A (2019) Big data and blockchain supported
conceptual model for enhanced healthcare coverage: the Indian context. Bus Process Manage
J. https://doi.org/10.1108/BPMJ-06-2018-0164
42. Ekblaw A, Azaria A, Halamka JD, Lippman A (2016) A case study for blockchain in health-
care: "MedRec" prototype for electronic health records and medical research data. In:
Proceedings of IEEE open and big data conference, vol 13, p 13
43. MedRec. [Online]. Available: https://medrec.media.mit.edu/. Accessed 15 Apr 2021
44. What Are Clinical Trials and Studies? National institute on aging. [Online]. Available: https://
www.nia.nih.gov/health/what-are-clinical-trials-and-studies. Accessed 15 Apr 2021
45. Zhuang Y et al (2020) Applying blockchain technology to enhance clinical trial recruitment.
In: AMIA ... Annual symposium proceedings. AMIA symposium, vol 2019, pp 1276–1285
46. Sullivan J (2004) Subject recruitment and retention: barriers to success. Appl Clin Trials
47. Carlisle B, Kimmelman J, Ramsay T, Mackinnon N (2015) Unsuccessful trial accrual and
human subjects protections: an empirical analysis of recently closed trials. NJCT 12(1):77–83
48. Roma P, Quarre F, Israel A et al (2016) Blockchain: an enabler for life sciences and healthcare
blockchain: an enabler for life sciences healthcare. Deloitte, 1–16
49. Barrett J (2007) Fraud and misconduct in clinical research. Princ Pract Pharm Med 4(2):631–
641
50. Omar IA, Jayaraman R, Salah K et al (2021) Applications of blockchain technology in clinical
trials: review and open challenges. Arab J Sci Eng 46:3001–3015
51. Gross CP, Mallory R, Heiat A, Krumholz HM (2002) Reporting the recruitment process in
clinical trials: who are these patients and how did they get there? Ann Intern Med 137:10–16
52. Petersen S, Hediger T (2017) The Blockchain (R)evolution—how blockchain technology can
revolutionise the life sciences and healthcare industry. Deloitte
53. Just BH, Marc D, Munns M, Sandefer R (2016) Why patient matching is a challenge: research
on master patient index (MPI) data discrepancies in key identifying fields. Perspectives in
health information management, 01-Apr-2016. [Online]. Available: https://www.ncbi.nlm.nih.
gov/pmc/articles/PMC4832129/. Accessed 03 May 2021
54. A framework for cross-organizational patient identity management (2015) The sequoia project
Sukhada Bhingarkar
Abstract Hepatitis C is a liver disease whose infection is often silent and can lead to
fibrosis or cirrhosis if it becomes chronic and goes undetected. It is generally spread
through blood-to-blood contact. Hence, it is important to accurately classify blood
donors as healthy or as having a Hepatitis C infection before a blood transfusion
takes place. Nowadays, machine learning has been used in various domains
including health care for accurate and fast results. This paper proposes a framework
for accurate classification of blood donors using five machine learning algorithms,
namely logistic regression, support vector machine, k-nearest neighbours, decision
tree, and neural networks. The backward elimination technique is implemented for
feature selection to improve the classification accuracy. The experimental results
show that k-nearest neighbours perform better with the testing accuracy of 94.3%
than other classifiers.
1 Introduction
Hepatitis C is a liver disease caused by the Hepatitis C virus (HCV) [1]. Unlike Hepatitis
A and Hepatitis B, there is no vaccination for Hepatitis C. Hepatitis C begins as an
acute infection after exposure to the virus. Some infected individuals clear the virus
on their own, but 75% of infected individuals develop chronic HCV.
As per the World Health Organization (WHO), more than 71 million people worldwide
have chronic Hepatitis C infection. There are many challenges in dealing with
Hepatitis C. Firstly, the infection with HCV is often silent: many infected individuals
either have no symptoms or have unspecific symptoms such as mild fatigue or
abdominal discomfort. Hepatitis C does not spread by coughing, sneezing, hugging,
shaking hands, or through food and water. However, it
S. Bhingarkar (B)
MIT World Peace University, Pune, India
e-mail: sukhada.bhingarkar@mitwpu.edu.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 731
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_54
can spread through activities that involve blood-to-blood contact, such as injection
drug use, sharing personal hygiene items like razors or toothbrushes with an infected
person, tattooing or piercing with inadequate sterilization, blood transfusion, organ
transplants, etc. Secondly, over many years, inflammation in the liver caused by the
Hepatitis C virus usually results in the formation of scar tissue, called fibrosis. The
developing fibrosis in the liver can eventually reach a specific level called cirrhosis.
Hepatitis C infection with cirrhosis leads to an increased risk of liver failure and liver
cancer, so it is important to cure the disease before cirrhosis develops. Thus, the
purpose of this paper is to propose an automated model to analyse laboratory data
that can help in making timely decisions.
Machine learning is one of the hottest trends in today's market. According to
Gartner [2], by 2022 at least 40% of new application development projects in the
market will involve machine learning. Data mining and machine learning also play
a vital role in the healthcare industry, detecting several diseases at an early stage,
such as diabetes, heart attack, autism, arthritis, and blood cancer. Classification is
one of the essential branches of machine learning, wherein the category of a data
item or object is predicted. In the last few decades, researchers have extensively
used machine learning algorithms to design predictive models based on clinical
data. This paper proposes a framework based on machine learning techniques that
can help pathologists, in particular, to classify a blood donor correctly as a healthy
blood donor or as having Hepatitis C infection based on various blood attributes.
The rest of the paper is structured as follows: Sect. 2 discusses the related work
in this area. Section 3 describes the proposed framework. Section 4 evaluates the
performance of the proposed framework with the help of various evaluation
parameters. Section 5 discusses the results. Finally, the conclusions are drawn in
the last section.
2 Related Work
In the related work discussed below, most of the researchers have worked on
predicting the risk of various side effects seen in fibrosis/cirrhosis patients, while
some have implemented prediction models for drug design or treatment.
In [3], esophageal varices were detected with the help of various machine
learning algorithms, namely support vector machine (SVM), Naïve Bayes (NB),
decision tree (DT), artificial neural network (ANN), random forest (RF), and Bayesian
network, to diagnose liver cirrhosis at an early stage. Esophageal varices are one of
the most common side effects of liver cirrhosis. The dataset used had twenty-four
features, and various feature selection techniques were applied in the proposed work
to select nine significant features. Out of the six machine learning algorithms, the
Bayesian network exhibited the highest accuracy, 74.8%.
Harry Chown [4] used SVM, ANN, RF, generalized linear model (GLM), and
linear discriminant analysis (LDA) to predict Hepatitis C NS3 cleavage patterns
of viral proteases, which can be helpful for future drug design. Two sequence-based
feature extraction methods were implemented in the proposed work. It was observed
that the choice of feature extraction method compensated for the chosen machine
learning algorithm.
Georg Hoffmann et al. [5] proposed using two decision tree implementations,
rpart and ctree, to detect liver fibrosis and cirrhosis. The authors implemented the
leave-one-out cross-validation method to improve the diagnostic accuracy, with
enhanced liver fibrosis (ELF) as a feature. The highest accuracy achieved was 75.3%
with rpart using ELF, and 72.6% with ctree without ELF. However, small changes in
the input data can result in largely different decision trees.
George N. Ioannou et al. [6] implemented a recurrent neural network (RNN)
to predict the risk of hepatocellular carcinoma (HCC) in Hepatitis C patients with
cirrhosis. The authors focused on two types of features: those that remain constant
over time and those that change over time. It was observed that the RNN outperformed
logistic regression, with an area under the receiver operating characteristic curve
(AUROC) of 0.759 across all samples.
Hiroaki Haga et al. [7] developed a treatment prediction model in which nine
machine learning algorithms were applied to HCV genome variants. Experiments
showed that the SVM algorithm performed best, with 95% validation accuracy.
Four machine learning algorithms, namely logistic regression, RF, gradient
boosted trees, and stacked ensemble, were used in [8] to find undiagnosed patients
with HCV infection. The authors extracted information such as risk factors, symptoms,
and treatment relevant to HCV from the patients' medical histories. The work
demonstrated that the stacked ensemble achieved the maximum precision of 97%
among the algorithms.
In [9], the authors employed the synthetic minority oversampling technique
(SMOTE) to deal with the problem of imbalance in the dataset of HCV patients
and to rank the features accordingly. The authors implemented five classification
algorithms, namely decision tree, k-nearest neighbours, random forest, logistic
regression, and Naïve Bayes. After removing the imbalance, it was found that random
forest performed better than the other algorithms, with a classification accuracy of 92%.
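The SMOTE idea summarized above, interpolating synthetic minority-class samples between existing ones, can be sketched in a few lines. This is a simplified illustration on toy data; the imbalanced-learn library provides the standard implementation:

```python
import numpy as np

def smote_oversample(X_min, n_new, k=3, rng=None):
    """Create n_new synthetic samples by interpolating between each chosen
    minority sample and one of its k nearest minority neighbours."""
    rng = rng if rng is not None else np.random.default_rng(0)
    X_min = np.asarray(X_min, dtype=float)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]     # skip the sample itself
        j = rng.choice(neighbours)
        gap = rng.random()                      # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)
```

Each synthetic point lies on a segment between two real minority samples, so the oversampled class stays inside its original region of feature space.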
K. Santosh Bhargav et al. [10] implemented decision tree, support vector machine,
logistic regression, and Naïve Bayes on an HCV dataset to classify whether a person
will live or die depending upon the attributes in the dataset. It was concluded that
logistic regression had a greater accuracy of 87.17% compared to the rest of the
classifiers.
The prediction of cirrhosis development in veterans was done using cross-sectional
and longitudinal models in [11]. The performance of the models was measured on
the basis of the concordance index: the longitudinal model achieved a concordance
index of 0.764, whereas the cross-sectional model achieved 0.746.
In another study [12], the authors evaluated the performance of machine
learning classifiers on an Egyptian patients' dataset [13] using Python and R tools.
The authors implemented binary and multi-class classification after applying feature
selection techniques such as principal component analysis (PCA). It was observed
that the feature selection mechanism helped to improve the classification accuracy.
Another study [14] compared machine learning approaches for the prediction of
advanced liver fibrosis in chronic HCV patients. Decision tree, particle swarm
optimization, genetic algorithm, and linear regression models were applied, and it
was concluded that the machine learning techniques achieved accuracies in the
range between 66.3 and 84.4%.
3 Proposed Framework
This section has three parts. The first part discusses the laboratory dataset that
contains blood values of the donors. The second part demonstrates the preprocessing
and feature selection techniques employed before applying the machine learning
algorithms. Lastly, the third part discusses the machine learning techniques employed
to find the classifier that achieves the highest accuracy. Figure 1 represents the
proposed framework and its various phases.
The dataset used for this research is the publicly available HCV dataset from the
University of California, Irvine (UCI) machine learning repository [15]. The dataset
consists of records of 615 patients, of which 238 are women and 377 are men. The
dataset has 14 features which present the information of each patient. Table 1
represents the description of the features of the dataset.
Table 2 Description of category

Code | Category | Number of records
0 | Blood donor | 533
0s | Suspected blood donor | 7
1 | Hepatitis C | 24
2 | Fibrosis | 21
3 | Cirrhosis | 30
Quality of data is an important factor in data mining that leads to accurate prediction.
In order to get precise results, data preprocessing is implemented first; it includes
dropping the column named "Unnamed:0", encoding the columns "Sex" and
"Category" to numeric values, and filling each missing value with the mean value of
the column in which it is located. Secondly, it is desirable for the features to have a
Gaussian (normal) distribution, as conventional statistical methods perform better
with such a distribution. Hence, as a part of preprocessing, all features except the
categorical "Sex" feature have been power transformed to approximate a normal
distribution.
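A minimal sketch of this preprocessing step, assuming NumPy arrays with NaN marking missing values; the log1p call is a simple stand-in for the power transform (scikit-learn's PowerTransformer would be the usual implementation):

```python
import numpy as np

def preprocess(X):
    """Mean-impute missing values, then apply log1p as a simple
    stand-in for the power transform used in the paper."""
    X = np.asarray(X, dtype=float).copy()
    col_means = np.nanmean(X, axis=0)        # per-column mean, ignoring NaNs
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]          # fill each NaN with its column mean
    return np.log1p(X)                       # compress right-skewed lab values
```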
Further, a feature selection technique was employed to choose significant features
from the dataset and achieve better classification results. In the proposed framework,
feature selection is performed in two stages. In the first stage, the correlation score of
each feature is measured against every other feature as well as against the target
feature. Features having a high correlation score with each other have a similar
impact on the target feature; hence, when two features have a correlation score
greater than the threshold, one of them can be dropped. Here, the threshold assumed
is 0.9. However, for the given dataset, there are no feature pairs with a correlation
score greater than 0.9, so no features are omitted at this stage.
Table 3 depicts the correlation score of each feature with the target feature.
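The first-stage correlation filter can be sketched as follows; the feature names below are illustrative, not the dataset's actual columns:

```python
import numpy as np

def drop_correlated(X, names, threshold=0.9):
    """Keep a feature only if its absolute Pearson correlation with every
    already-kept feature is at or below the threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for i, name in enumerate(names):
        if all(corr[i, j] <= threshold for j, _ in kept):
            kept.append((i, name))
    return [name for _, name in kept]
```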
In the second stage, feature selection is done on the basis of the backward
elimination method, wherein a probability value (p-value) is calculated for each
feature. The p-value measures the evidence against the null hypothesis, which
assumes that the feature is of no use, whereas the significance level is the threshold
that decides whether a feature significantly affects the final output. The significance
level is set to 0.05. In backward elimination, we fit the regression model with all the
features and then calculate the p-values. If the p-value of a feature is higher than the
significance level, that feature is removed. These steps are repeated until only features
with p-values less than or equal to the significance level remain. For the given dataset,
the two features CHE and ALT have p-values greater than the significance level; hence,
these two features are removed from the dataset and the remaining features are
selected for further classification. Figure 2 shows the distribution plot for the
selected features.
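The backward elimination loop can be sketched as below on synthetic data. This is a simplified illustration: the p-values use a normal approximation to the t-distribution, whereas a library such as statsmodels would give exact OLS p-values:

```python
import math
import numpy as np

def backward_elimination(X, y, sl=0.05):
    """Iteratively drop the feature with the largest p-value above sl."""
    cols = list(range(X.shape[1]))
    while cols:
        A = np.column_stack([np.ones(len(y)), X[:, cols]])   # add intercept
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        dof = len(y) - A.shape[1]
        sigma2 = resid @ resid / dof
        se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
        t = beta / se
        # two-sided p-values via the normal approximation (skip the intercept)
        p = [math.erfc(abs(ti) / math.sqrt(2)) for ti in t[1:]]
        worst = int(np.argmax(p))
        if p[worst] <= sl:      # every remaining feature is significant
            break
        cols.pop(worst)         # remove the least significant feature
    return cols
```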
4 Performance Evaluation
The dataset was divided into 80% for training and 20% for testing, and the test set
was used to evaluate the performance of all classifiers. The evaluation indices were
true positive (TP), true negative (TN), false positive (FP), and false negative (FN).
These indices were used to calculate performance measures such as sensitivity,
specificity, precision, F-measure, and accuracy. Sensitivity, also known as the
true-positive rate (TPR) or recall, is the proportion of actual positive cases that are
predicted as positive. It is calculated as:
Sensitivity (TPR) = TP / (TP + FN)    (1)

Specificity (TNR) = TN / (TN + FP)    (2)

Precision (PPV) = TP / (TP + FP)    (3)

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (4)
Precision and recall typically trade off against each other. F-measure combines the
properties of precision and recall into a single measure, the harmonic mean of the
two. It is sometimes also termed the F-score and is calculated as:

F-Measure = 2 × (Precision × Recall) / (Precision + Recall)    (5)
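Equations (1) to (5) translate directly into code; the confusion-matrix counts in the usage below are illustrative:

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the measures defined in Eqs. (1)-(5)."""
    sensitivity = tp / (tp + fn)                 # Eq. (1), recall / TPR
    specificity = tn / (tn + fp)                 # Eq. (2), TNR
    precision = tp / (tp + fp)                   # Eq. (3), PPV
    accuracy = (tp + tn) / (tp + tn + fp + fn)   # Eq. (4)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)  # Eq. (5)
    return sensitivity, specificity, precision, accuracy, f_measure
```

For example, `classification_metrics(tp=8, tn=85, fp=2, fn=5)` gives an accuracy of 0.93 and a precision of 0.80.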
5 Discussion
It is observed from the above results that the k-nearest neighbours algorithm
performs best, with a classification accuracy of 94.3%, higher than the other machine
learning algorithms considered in this study. The results reveal the significance of
applying machine learning models in the healthcare domain, saving time at low
cost. The feature selection results show that the two features cholinesterase (CHE)
and alanine aminotransferase (ALT) can be removed from the feature set while
maintaining good accuracy.
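The best-performing classifier, k-nearest neighbours, reduces to a few lines of code; this sketch uses two hypothetical features rather than the full set of selected blood attributes:

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training
    points under Euclidean distance."""
    nearest = sorted(
        (math.dist(p, x), label) for p, label in zip(train_X, train_y)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```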
6 Conclusion
This paper has proposed using five machine learning algorithms on laboratory data
to classify blood donors. These machine learning algorithms include logistic
regression, support vector machine, k-nearest neighbours, decision tree, and neural
networks.
Table 4 (continued)

Category (class labels) | Sensitivity | Specificity | Precision | F-measure
Hepatitis C | 0.00 | 100.00 | 0.00 | 0.00
Fibrosis | 0.00 | 100.00 | 0.00 | 0.00
Cirrhosis | 0.00 | 100.00 | 0.00 | 0.00

Testing accuracy (%): 61.79
References
3. Abd El-Salam SM, Ezz MM, Hashem S, Elakel W, Salama R, ElMakhzangy H, ElHefnawi M
(2019) Performance of machine learning approaches on prediction of esophageal varices for
Egyptian chronic hepatitis C patients. Inf Med Unlocked 17
4. Chown H (2019) A comparison of machine learning algorithms for the prediction of Hepatitis
C NS3 protease cleavage sites. EuroBiotech J 3(4):167–174
5. Hoffmann GF, Bietenbeck A, Lichtinghagen R, Klawonn F (2018) Using machine learning
techniques to generate laboratory diagnostic pathways—a case study. J Lab Precis Med 3:58–58
6. Ioannou GN et al (2020) Assessment of a deep learning model to predict hepatocellular
carcinoma in patients with hepatitis C cirrhosis. JAMA Netw Open 3
7. Haga H et al (2020) A machine learning-based treatment prediction model using whole genome
variants of hepatitis C virus. PloS One 15(11)
8. Doyle OM, Leavitt N, Rigg JA (2020) Finding undiagnosed patients with hepatitis C infection:
an application of artificial intelligence to patient claims data. Sci Rep 10:10521
9. Oladimeji O, Oladimeji A, Olayanju O (2021) Machine learning models for diagnostic
classification of hepatitis C tests. Front Health Inf 10(1)
10. Santosh Bhargav K et al (2018) Application of machine learning classification algorithms on
hepatitis dataset. Int J Appl Eng Res 13(16)
11. Konerman MA et al (2019) Machine learning models to predict disease progression among
veterans with hepatitis C virus. PloS One 14
12. Nandipati SCR, XinYing C, Wah K (2020) Hepatitis C Virus (HCV) prediction by machine
learning techniques. Appl Model Simul 4
13. Dua D, Graff C (2019) UCI machine learning repository. http://archive.ics.uci.edu/ml
14. Hashem S, Esmat G, Elakel W, Habashy S, Raouf S, Elhefnawi M, Eladawy M, Elhefnawi M
(2017) Comparison of machine learning approaches for prediction of advanced liver fibrosis
in chronic hepatitis C patients. In: IEEE/ACM transactions on computational biology and
bioinformatics
15. Dua D, Graff C (2020) UCI machine learning repository. http://archive.ics.uci.edu/ml
Monitoring the Soil Parameters Using
IoT for Smart Agriculture
K. Gayathri and S. Thangavelu
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 743
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_55
1 Introduction
2 Related Work
Growing plants without the help of soil is known as hydroponics. By replicating
their environmental requirements, the technique provides us with superior-quality
crops. It is also known as vertical farming. Different parameters, such as temperature
and humidity, must be measured for the weather monitoring system and irrigation
controller.
Bhosale et al. [6] described an irrigation scheduler that executes user-defined
functions. The irrigation scheduler also generates commands in order to control
relevant actuators. The soil moisture sensor was designed, developed, and also tested
in order to achieve accurate and reliable measurements at a low cost.
The sensor can also measure humidity using the same PCB circuit. Thus,
Bhosale et al. [6] present the model of a PIC16F877A microcontroller-based
irrigation system. Bhaskar et al. [7] proposed a design that assists farmers in
maintaining a consistent supply of water despite power outages and insufficient
water supply. The designed model aids in the reduction of human labor. Owing to
declining interest, cultivation in our country has been greatly reduced, and limited
knowledge about the dryness of land and improper usage of pesticides results in
very little production. Sowmiya et al. [8] described how the sensed data is processed,
stored in the cloud, and then relayed to registered farm owners in a user-friendly
format through their phone or device. In addition, if the pH value of the soil is low,
the system recommends the best pesticides for better cultivation.
This will be extremely beneficial to farmers who are unable to visit their farms and
will improve crop cultivation. The Internet of Things is one of the hottest topics in
the Internet world; the concept aids the interconnection of physical objects that have
sensing, actuating, and computing capabilities. Lakhwani et al. [9] discussed the
Agricultural IoT, a list of applications where the Internet of Things can be used for
agriculture, the advantages of IoT in agriculture, and an analysis of the literature.
On-site engineers require some basic information about the type and structure of
the soil. Chandan et al. [10] investigated traditional soil classification techniques and
developed and tested an efficient image processing-based classifier for soil
classification. Humus Clay, Clay, Silty Sand, Sandy Clay, Clayey Peat, Clayey Sand,
and Peat were the seven soil classes studied for classification. Preprocessed images of the
soils under study were collected. The features extracted from the preprocessed
images are used to train the SVM classifier, which is then tested for classification
efficiency and accuracy on each class. The built model is utilized for real-time soil
classification. Bhattacharya et al. [11] used a computer vision approach for soil
characterization and classification. For this, a Gravity Analog Soil
Moisture Sensor is utilized here along with an Arduino Uno and image processing
tool. The datasets of this study are from Ethiopia's Amhara region and Addis Ababa
city. Bhattacharya et al. [11] used six different types of soil.
3 Proposed Work
There are mainly four basic modules in the proposed system, as shown in Fig. 1:
the soil texture classification module, the soil monitoring module, the automatic
irrigation module, and the fire detection module. The soil texture classification
module classifies the soil into types such as Humus Clay, Clay, Silty Sand, Sandy
Clay, Clayey Peat, Clayey Sand, and Peat. Based on the soil type, suitable crops to
cultivate are suggested and predicted. The farmers can test the soil type multiple
times during or before the cultivation process and take necessary actions and
precautions to get a good yield. For this classification, the SVM algorithm is
implemented. Input images are fed to the classifier, which classifies and detects the
soil type. If any abnormalities are found, an alert is sent with the help of the buzzer
so that necessary actions can be taken. The next module is the soil monitoring module. The
soil monitoring module provides the temperature, pH, humidity, soil moisture, and
NPK level of the soil. The automatic irrigation module predicts and analyses the
adequate amount of water required for irrigation: if the moisture level in the soil
falls below the specified threshold, the ESP32 microcontroller turns on a water pump
to supply water to the crops and plants in the farm, and the pump turns off
automatically whenever the system finds the required moisture content in the soil.
The proposed system can also detect fire using the fire detection module. Fire is an
unexpected and unpredictable event that results in significant losses for farmers:
hot temperatures and dry conditions can produce tinder-dry crops and residue in
agricultural fields, so field fires can happen unintentionally. Hence, the fire detection
module is used in this project to avoid this situation. Whenever fire is found in the
agricultural land, or whenever a sensor value exceeds its threshold, the proposed
model alerts the end-user with the buzzer alarm.
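The module logic above amounts to threshold checks on each sensing cycle. A Python sketch of that logic follows (the firmware itself runs in Embedded C on the ESP32, and the threshold values here are illustrative assumptions, not the paper's calibrated settings):

```python
MOISTURE_THRESHOLD = 40   # percent; below this the soil is considered dry
GAS_THRESHOLD = 300       # raw gas-sensor reading suggesting smoke or fire

def control_step(moisture, gas_value):
    """Return the actuator commands for one sensing cycle."""
    return {
        "pump_on": moisture < MOISTURE_THRESHOLD,   # irrigate while soil is dry
        "buzzer_on": gas_value > GAS_THRESHOLD,     # alert the end-user on fire
    }
```

For example, `control_step(moisture=20, gas_value=100)` turns the pump on and leaves the buzzer off.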
3.2 Methodology
Soil texture classification is done using the SVM algorithm, one of the best and
fastest algorithms for producing accurate results in real-time; it classifies the soil
within seconds. A block diagram of the proposed system is depicted in Fig. 2.
Hardware used in the proposed work includes the ESP32 Microcontroller, DHT11,
Soil Moisture Sensor, Color Sensor, Buzzer, Relay, Water pump, TDS Sensor, and
Gas Sensor as shown in Fig. 2. The software requirements of this project are
MATLAB, Arduino IDE, Embedded C, and Google Firebase Cloud. The data is
stored in the cloud. MATLAB software is used for soil classification. The soil texture
classification module helps to classify the soil into seven different types and also
suggests and predicts the suitable crops to cultivate. The soil monitoring module
provides the temperature, pH, humidity, soil moisture, and NPK level of the soil
using the DHT11 sensor and color sensor. The automatic irrigation module provides
the adequate amount of water required for irrigation using a soil moisture sensor,
relay, water pump, and TDS sensor; the TDS sensor checks the quality of the water.
The soil moisture sensor checks the moisture level in the soil, and if the
moisture level is low then the microcontroller switches on a water pump to provide
water to the plant. The water pump gets automatically off when the system finds
enough moisture in the soil. All the sensors are connected to ESP32 Microcontroller.
Then using the gas sensor, the fire detector module detects fire. Whenever fire is
found in the agricultural land, the proposed model identifies fire. Along with the
detection of fire, the proposed system provides an alert to the end-user. Also, when-
ever the value comes greater than the threshold value, it will produce an alert using
the buzzer. The data will be updated in the Google Firebase cloud which is used for
data monitoring.
The SVM (support vector machine) algorithm proves to be one of the efficient
algorithms for providing accurate results at a faster rate in real-time. It is a
well-known supervised learning algorithm, mainly utilized to solve classification
problems and also regression problems. SVM's goal is to find the best line, or
decision boundary, for categorizing n-dimensional space into classes, so that new
data points are placed in the correct category in the future. The best decision
boundary is a hyperplane; SVM selects the extreme points, or vectors, that aid in
creating the hyperplane. These extreme cases are called support vectors, which
gives the algorithm its name. The SVM's goal is thus to find a hyperplane in
N-dimensional space, where N denotes the number of features, that clearly
categorizes the data points. Numerous hyperplanes could be selected to separate the
two classes of data points.
The main goal is to identify the plane with the greatest margin, i.e. the greatest
distance between the hyperplane and the data points of both classes. The
hyperplanes act as decision boundaries that aid in classifying data points, and
different classes are assigned to vectors on each side of the hyperplane [12–20].
The hyperplane's dimension is determined by the number of features: if there are
two features, the hyperplane is a straight line.
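The margin maximization described above can be sketched with sub-gradient descent on the hinge loss. This minimal binary linear SVM on synthetic 2-D points is an illustration only; the paper's module is a multi-class SVM with a linear kernel over image features:

```python
import numpy as np

def train_linear_svm(X, y, epochs=200, lr=0.05, lam=0.01):
    """Fit a separating hyperplane by sub-gradient descent on the
    regularized hinge loss; y must contain labels -1 and +1."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) < 1:      # margin violated: push towards xi
                w += lr * (yi * xi - lam * w)
                b += lr * yi
            else:                           # margin satisfied: only regularize
                w -= lr * lam * w
    return w, b

def svm_predict(w, b, x):
    """Assign the class by the side of the hyperplane x falls on."""
    return 1 if x @ w + b >= 0 else -1
```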
3.4 Dataset
The dataset used in this work is a soil classification image dataset taken from
Kaggle, composed of 700 images covering different types of soil: "Humus Clay,"
"Clay," "Silty Sand," "Sandy Clay," "Clayey Peat," "Clayey Sand," and "Peat."
This dataset is primarily used to classify soil into various types. Indian soils are
grouped based on where the soil is found or on the predominant particle size
present in the soil. Depending on its location, soil is classified as laterite soil,
alluvial soil, black or regur soil, forest soil, red soil, marshy or peaty soil, arid or
desert soil, and so on. Based on the dominant particle size, soil is classified as clay,
peat, or sand, while Silty Sand, Clayey Sand, Clayey Peat, Humus Clay, and Sandy
Clay are classified as mixtures of two soils.
The dataset is divided into a ratio of 8:2. That means 80% of the dataset is used for
training purposes and 20% of the dataset is used for testing purposes. For this project,
the necessary data and images of soil are also gathered from various sources.
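The 8:2 split can be sketched as below; the shuffle seed is an illustrative assumption, since the paper does not state how the split was randomized:

```python
import random

def train_test_split(samples, test_ratio=0.2, seed=42):
    """Shuffle the samples and split them 8:2 into train and test sets."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * (1 - test_ratio))
    return items[:cut], items[cut:]
```

Applied to the 700 soil images, this yields 560 training and 140 testing samples.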
4 Results
Soil texture classification using an image processing technique is done with the
SVM algorithm, implemented in MATLAB. The hardware and software are serially
connected using a USB cable. Using the SVM algorithm, soil texture classification
results are obtained with a high accuracy of 95.72%. The hardware setup was built,
and the results of the soil monitoring module, automatic irrigation module, and fire
detection module were obtained and assessed for accuracy.
The different input images are successfully classified into seven types of soil:
"Humus Clay," "Clay," "Silty Sand," "Sandy Clay," "Clayey Peat," "Clayey Sand,"
and "Peat," with an average accuracy of 95.72%. The six selected features given to
the proposed SVM classifier are auto correlogram, energy, mean amplitude, HSV
histogram, wavelet moments, and color moments. Suitable crops to cultivate are
also predicted. The results of the soil texture classification module are shown below.
Figure 4 shows an input image classified as Clay with 95.7742% accuracy. The
predicted suitable crops are paddy, fruit trees, and ornamental trees.
Figure 5 shows an input image classified as Silty Sand with 95.7742% accuracy.
The predicted suitable crops are willow, birch, dogwood, cypress, and fruit crops.
Fig. 5 Soil texture classification module—input image is classified into silty sand
Figure 6 shows an input image classified as Humus Clay with 95.7742% accuracy.
The predicted suitable crops are berry crops, climbers, bamboos, perennials, shrubs,
and tubers.
Figure 7 shows the values from the sensors for the soil monitoring, automatic
irrigation, and fire detection modules. These values are displayed in the Arduino console.
Fig. 6 Soil texture classification module—input image is classified into humus clay
That is, the gas sensor value, TDS value, humidity, soil moisture sensor value, and
temperature are displayed in the Arduino console.
Figure 8 shows the real-time values obtained from the sensors, which are stored in
the Google Firebase cloud server: the gas sensor value, TDS value, humidity, soil
moisture sensor value, and temperature.
Table 1 compares the performance of existing algorithms with the proposed SVM
for the soil texture classification module.
Precision is defined as the ratio of correctly predicted positive observations to the
total predicted positive observations. Recall, or sensitivity, is the percentage of
actual positive cases that are predicted as positive (true positives). The F1 score is
a metric for assessing a test's accuracy and is calculated as the harmonic mean of
precision and recall. Table 2 depicts the precision, recall, and F1 score of the SVM
method.

Table 1 Comparative analysis of previous research methods and proposed SVM method

Model | Number of soil classes for the experiment | Classification algorithm | Accuracy (%)
Bhattacharya and Solomatine [11] | 3 | Multi SVM with linear kernel | 90.7
Chung et al. [12] | 13 | Linear regression | 48
Vibhute et al. [13] | 5 | Multi SVM with linear kernel | 71.78
Chandan and Thakur [10] | 7 | Fine KNN | 93.8
Proposed SVM model | 7 | Multi SVM with linear kernel | 95.72
5 Conclusion
Remote monitoring of the moisture content, humidity, and temperature of the soil
is done at a very low cost. Farmers can access the values from anywhere in the
world at any time. As a result, the proposed work provides more precise values for
the moisture content, temperature, and humidity of the soil, which is really
important on farms. To assess any additional data, the humidity sensor, soil
moisture sensor, and temperature sensor are connected to the microcontroller. A
sustainable and reliable monitoring model focused on each farmer's land has been
developed successfully. The developed model is a low-cost, low-power,
noninvasive, real-time agriculture monitoring model. It is also simple to use and
gives precise results. The project has been implemented with both hardware and
software components; the hardware containing several sensors has been tested, and
the results obtained are accurate.
References
1. Srivastava A, Das DK, Kumar R (2020) Monitoring of soil parameters and controlling of soil
moisture through IoT based smart agriculture. IEEE Students Conf Eng Syst (SCES) 13(3):1–6
2. Madhumathi R, Arumuganathan T, Shruthi R (2020) Soil NPK and moisture analysis using
wireless sensor networks. In: 11th international conference on computing, communication and
networking technologies (ICCCNT), vol 9, no. 1, pp 1–6
3. Kapse S, Kale S, Bhongade S, Sangamnerkar S, Gotmare Y (2020) IoT enable soil testing &
NPK nutrient detection. JAC J Compos Theory 13(5):310–318
4. Marcu IM, Suciu G, Balaceanu CM, Banaru A (2020) IoT based system for smart agriculture.
In: 11th international conference on electronics, computers and artificial intelligence (ECAI),
vol. 11, no. 2, pp 1–4
5. Pawar S, Tembe S, Acharekar R, Khan S, Yadav S (2020) Design of an IoT enabled automated
hydroponics system using NodeMCU and Blynk. In: IEEE 5th international conference for
convergence in technology (I2CT), vol 11, no. 1, pp 1–6, March 2020.
6. Bhosale PA, Dixit VV (2020) Water saving-irrigation automatic agricultural controller. Int J
Sci Technol Res 1(11):118–123
7. Bhaskar L, Koli B, Kumar P, Gaur V (2020) Automatic crop irrigation system. In: 4th interna-
tional conference on reliability, infocom technologies and optimization (ICRITO) (trends and
future directions), vol 15, no. 1, pp 1–4
8. Sowmiya E, Sivaranjani S (2020) Smart system monitoring on soil using internet of things
(IoT). Int Res J Eng Technol (IRJET) 4(2):1070–1072
9. Lakhwani K, Gianey H, Agarwal N, Gupta S (2018) Development of IoT for smart agriculture
a review. Emerg Trends Expert Appl Secur 841(1):425–432
10. Chandan, Thakur R (2018) An intelligent model for Indian soil classification using various
machine learning techniques. Int J Comput Eng Res (IJCER) 8(9):33–41
11. Bhattacharya B, Solomatine DP (2020) Machine learning in soil classification. Neural Netw
19(2):186–195
12. Chung S-O, Cho K-H, Kong J-W, Sudduth KA, Jung K-Y (2020) Soil texture classification
algorithm using RGB characteristics of soil images. IFAC Proc 43(26):34–38
13. Vibhute AD, Kale KV, Dhumal RK, Mehrotra SC (2019) Soil type classification and mapping
using hyperspectral remote sensing data. In: International conference on man and machine
interfacing (MAMI), vol 13, no. 1, pp 1–4
14. Byiringiro E, Ndashimye E, Kabandana I (2021) Smart soil monitoring application (Case Study:
Rwanda). In: Future of information and communication conference, FICC 2021: advances in
information and communication, vol 1363, pp 212–224
15. Prakash C, Singh LP, Gupta A, Singh A (2021) Smart farming: application of internet of
things (IoT) systems. In: Congress of the international ergonomics association, IEA 2021:
proceedings of the 21st congress of the international ergonomics association (IEA 2021), vol
221, pp 233–240
16. Koresh HJD (2021) Analysis of soil nutrients based on potential productivity tests
with balanced minerals for maize-chickpea crop. J Electron 3(01):23–35
17. Adam EEB, Sathesh A (2021) Construction of accurate crack identification on concrete
structure using hybrid deep learning approach. J Innov Image Process (JIIP) 3(02):85–99
18. Shankhdhar GK, Sharma R, Darbari M (2021) SAGRO-lite: a light weight agent based semantic
model for the internet of things for smart agriculture in developing countries. Semantic IoT
Theory Appl 941:265–302
19. Bharti B, Pandey S, Kumar S (2021) An advanced agriculture system for smart irrigation and
leaf disease detection. Adv Electr Comput Technol 711:221–233
20. Srunitha K, Padmavathi S (2017) Performance of SVM classifier for image based soil classifica-
tion. In: International conference on signal processing, communication, power and embedded
system (SCOPES), pp 411–415
21. Sabarish BA, Vidhya S (2019) Facility recommendation system using domination set theory
in graph. Int J Innov Technol Explor Eng 8:313–317
Monitoring the Soil Parameters Using IoT for Smart Agriculture 757
K. Rajaram (B)
Department of Computer Science and Engineering, Sri Sivasubramaniya Nadar College of
Engineering, Chennai, Tamil Nadu, India
e-mail: rkanch@ssn.edu.in
P. K. Sharma · S. Selvakumar
IIIT Una, Una, Himachal Pradesh, India
e-mail: director@iiitu.ac.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 759
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_56

1 Introduction
760 K. Rajaram et al.
the maternal and child health and provisioning of health-related services to the bene-
ficiaries of India's public health system. This ambitious project currently covers all
the states in India. MCTS required Auxiliary Nurse Midwives (ANMs) to capture
data in registers, which were uploaded weekly or monthly to the RCH (Repro-
ductive and Child Health) portal. The ANMs used to carry the RCH registers with
them to the PHCs (Primary Health Centers), where a data entry operator collected
the data and updated it in the RCH portal, which was tremendously burdensome [2].
This system of data uploading left scope for data inconsistency, incompleteness, and
fraudulent data entries, thus defeating the actual purpose of building a robust
system. The data were not being uploaded in a timely manner, and feedback
mechanisms for the health-care workers to take appropriate action were not in
place [3]. A study conducted in the Indian state of Haryana observed a lack of
appropriate training [4], overburdened data entry operators and ANMs, poor
internet connectivity, slow server speed, and frequent power failures.
In 2014, Mission Indradhanush was introduced to improve the vaccination rate
among the rural population of India. Intensified Mission Indradhanush, a revised
version of the programme, was launched in 2017 to increase the vaccination rate in
urban slums and in higher urban classes so as to eradicate vaccine-preventable
diseases (VPD) completely [2].
UNICEF, along with the State Government of Bihar, launched a computer tablet-
based MCTS in 2014 to capture real-time data online and to minimize the challenges
faced with the conventional MCTS [3]. The MCTS software contained modules
embedded within the tablet for entering data electronically. These data were preserved
and managed on a dummy server and could be accessed in real time. However, because
the dummy server was not linked to the national MCTS portal, the burden on the data
entry operators was not lessened; they continued to enter data into the national portal
as before. In 2017, various Indian states such as Haryana, Jammu & Kashmir, Uttar
Pradesh, and Telangana launched ANMOL (ANM Online), tablet-based software built
to eliminate redundancy, automate data processing, and empower healthcare personnel
to achieve improved throughput. ANMOL offered real-time data entry and update by
the ANMs through options provided in the App for capturing mother- and child-related
data. ANMOL minimizes the paperwork required to fill the RCH registers by providing
fields in the App itself to register mothers and children for health services. However,
this process requires the ANMs to fill various fields in the App while administering
vaccines to children and to manually verify the details of the children to be vacci-
nated. This leads to several human errors and places a huge burden of manual
verification on the ANMs.
Collecting immunization data, maintaining them in Electronic Immunization
Registers (EIR), and sharing the reports helps in better decision-making to improve
vaccination rates among missed communities [5]. More meaningful information
can be extracted from the collected data by representing it visually in the form of
graphs.
To overcome the difficulties in the existing mobile applications used in RI sessions,
a robust system namely Nagarik Rog Prathirakshak Application (NRP-APP) is
proposed with the following functionalities:
NRP-APP: Robust Seamless Data Capturing and Visualization … 761
2 Literature Review
Various studies have shown that the use of growing mobile technologies (mobile
apps) has significantly enhanced immunization services. These studies emphasize
aiding healthy diet and exercise through regular monitoring of one's BMI, blood
pressure, and caloric intake in local communities where hospitals and health
centers are not easily accessible [6–8]. Similarly, it has been shown that m-Health
has played a vital role in eradicating polio in developing countries [9]. That
m-Health work focuses on providing vaccination and routine immunization services
in low- and middle-income countries using mobile technologies for polio eradication.
It is also backed by another study that identified m-Health interventions on
vaccination uptake in 21 countries: ten peer-reviewed studies and seven white or
gray studies showed improved vaccination uptake after the interventions [10].
A study conducted under the WHO to measure the impact of using e-Health tech-
nologies to encourage immunization and increase vaccination rates has shown positive
results, encouraging the use of mobile technologies for immunization [11].
Another study was conducted in Pakistan [12] to gather the qualitative experiences
of front-line health workers and district managers while engaging with real-time
digital technology to improve vaccination coverage in an underserved rural district.
It showed that the use of digital technologies increased satisfaction, transparency,
and the reliability of the system. The time required to complete both manual and
digital entries, and phones becoming outdated over time, were considered constraints.
Jeev [13] is a software application that tracks the vaccination coverage of chil-
dren in rural communities by combining the power of smartphones with the ubiquity
of cellular infrastructure, QR codes, and national identification cards. Its main focus
is to reduce childhood deaths by strengthening immunization surveillance and
monitoring in developing countries, as 24 million children born every year do not
receive adequate immunization during their first year. The Comprehensive Public
Health Management (CPHM) application, launched by a non-governmental
organization in Bengaluru, allowed data to be entered in an offline mode and
synchronized with the cloud later. Although it was easy to retrieve data from
the field, there were many barriers, such as internet connectivity, lack of technical
support and, importantly, the health worker needing to visit the PHC to synchronize
the data [14].
A study was conducted at the Aga Khan University Hospital vaccination centre in
Pakistan to evaluate whether an artificial-intelligence-based mobile app could improve
children's on-time visits at 10 and 14 weeks of age. The study revealed that caregivers
suggested the mobile app should include information regarding the doses, and they
were interested in monitoring their children's health progress through the app [15].
A vaccination app, VAccApp [16], was developed by the Vienna Vaccine Safety
Initiative to enable parents to keep track of their children's vaccinations, check
vaccination status, and review previous vaccination history. A study conducted in
rural Sichuan Province, China, with 32 village doctors [17] showed that the village
doctors found the EPI App convenient, as it saved time in looking up caregivers'
information and contacting them about overdue vaccinations on time.
3 Proposed Work
NRP-App uses SQLite [18] as its local database, a structured database embedded
in the App itself that stores data in the device's local storage. As the SQLite database
resides on the device, internet connectivity is not required to access it, and the data
remains under the security of the Android device. Figure 2 shows the schema of the
local database. It consists of nine normalized relational tables pertaining to the
details of children, mothers, facilities, ANMs, vaccines, and immunization. The
data in the tables NRP_child_immunization and NRP_vaccine_barcode are updated
during the RI session. The data in the rest of the tables are used in the RI session to
authenticate the children, verify the vaccines due, etc.
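The on-device store described above can be sketched with Python's built-in
sqlite3 module. The table names NRP_child_immunization and NRP_vaccine_barcode
come from the paper; the column names and sample rows below are illustrative
assumptions, not the app's actual schema.

```python
# Sketch of the local SQLite store. An in-memory database stands in for
# the file the app would keep in the device's local storage.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE NRP_vaccine_barcode (
    barcode       TEXT PRIMARY KEY,
    vaccine_name  TEXT NOT NULL,
    batch_no      TEXT
);
CREATE TABLE NRP_child_immunization (
    child_id      TEXT,
    barcode       TEXT REFERENCES NRP_vaccine_barcode(barcode),
    vaccinated_on TEXT,
    PRIMARY KEY (child_id, barcode)
);
""")
# Rows updated during an RI session (illustrative values).
conn.execute("INSERT INTO NRP_vaccine_barcode VALUES ('BC1', 'OPV', 'B-42')")
conn.execute("INSERT INTO NRP_child_immunization VALUES "
             "('777709256620', 'BC1', '2021-08-09')")

row = conn.execute("""SELECT v.vaccine_name
                      FROM NRP_child_immunization c
                      JOIN NRP_vaccine_barcode v USING (barcode)
                      WHERE c.child_id = ?""", ("777709256620",)).fetchone()
print(row[0])  # OPV
```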
The immunization data warehouse NRP-DW is designed as a NoSQL columnar
model database using Cassandra. Apache Cassandra is an open-source NoSQL
distributed database offering scalability and high availability without compromising
performance [19]. Linear scalability and proven fault-tolerance on commodity hard-
ware make it a suitable platform for mission-critical data. It is deployed in an HDFS
cluster of three nodes, where one node is the server and the remaining two are
data nodes. Figure 3 shows the schema of NRP-DW. It consists of four highly
denormalized tables holding the details of children, locations or facilities, ANMs,
and vaccine barcodes. In the table child_master, columns v1–v21, representing 21
vaccines, are of the user-defined data type vaccine with four fields: date_of_vaccination,
weight_at_vaccination, anm_id_at_vaccination, and facility_id.
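The user-defined type and wide table just described could be expressed in CQL
roughly as below. This is a hedged sketch built as Python strings (no cluster
connection is attempted); the four field names of the vaccine type come from the
text, while the CQL types and the child_id key column are assumptions.

```python
# Hypothetical CQL DDL mirroring the NRP-DW design: a user-defined type
# `vaccine` used by the v1-v21 columns of the denormalized child_master table.
CREATE_VACCINE_TYPE = """
CREATE TYPE vaccine (
    date_of_vaccination   date,
    weight_at_vaccination float,
    anm_id_at_vaccination text,
    facility_id           text
);
"""

# child_master holds one wide, denormalized row per child, with 21 vaccine
# columns generated programmatically.
CREATE_CHILD_MASTER = (
    "CREATE TABLE child_master (child_id text PRIMARY KEY, "
    + ", ".join(f"v{i} frozen<vaccine>" for i in range(1, 22))
    + ");"
)

print("v21 frozen<vaccine>" in CREATE_CHILD_MASTER)  # True
```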
On-time data synchronization plays a vital role in real-time tracking and provisioning
of services. Delayed uploading of data has always remained an issue, which further
delays the generation of work-plans for the ANMs. NRP-App supports real-time data
synchronization. After the RI session is over, the ANM can synchronize the updated
records in the device's local storage with the NRP-DW on the server using the option
provided in the App.
The local database in the NRP-App and the NRP data warehouse communicate with
each other via a layer of Spring Boot [20] APIs running on the server. The Spring
Boot API is configured with the NRP-DW's keyspace running on the cluster. Spring
Boot provides libraries with which CRUD operations can be performed on the
configured database. The API consists of various GET and POST methods, with
different API addresses each serving a different purpose. To make API calls from the
NRP-App, the Volley library [21] has been used. Volley is an Android library for
making HTTP requests. The benefit of using Volley over a plain HTTP request is
that it caches the response, so that when the same request is made again, the response
is fetched from the cache without delay. Through these API calls, data from NRP-DW
is accessed and stored in the app's local database, and vice versa, the NRP-DW is
updated with RI session data from the local database.
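The synchronization step can be sketched as building a POST request that carries
the RI session's updated rows to a Spring Boot endpoint. The app itself uses Volley
on Android; this Python sketch only constructs the request without contacting any
server, and the base URL, endpoint path, and payload fields are all assumptions.

```python
# Build (but do not send) a JSON POST request carrying updated local rows.
import json
from urllib.request import Request

def build_sync_request(rows, base_url="https://nrp.example.org/api"):
    """Serialize updated local-database rows for upload to the server API."""
    body = json.dumps({"records": rows}).encode("utf-8")
    return Request(f"{base_url}/immunization/sync", data=body,
                   headers={"Content-Type": "application/json"},
                   method="POST")

req = build_sync_request([{"child_id": "777709256620", "barcode": "BC1"}])
print(req.get_method(), req.get_full_url())
```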
Language should not be a barrier for the front-line health workers operating the app.
To remove this barrier, NRP-App supports multiple natural languages in the user
interface. Users can choose any language of their preference either on the starting
screen of the App or on the navigating screens by simply going to the menu bar.
Currently, it supports three languages (English, Hindi, and Tamil), but additional
natural languages can be supported. Inbuilt XML strings are used in the App to
represent a word or character in a language. The Google Script API is used to translate
a word from English to another language. To extend the support to more languages
in the App, a few parameter values need to be changed in the Google Script API for
dynamic translation of English words, and the XML strings representing the words in
the additional language need to be added to the App for static translation.
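The static-translation lookup described above can be sketched as a per-language
string table with an English fallback. The app uses Android XML string resources;
the keys and the Hindi/Tamil translations below are illustrative placeholders, not
the app's actual resources.

```python
# Per-language UI string table with fallback to English for missing entries.
STRINGS = {
    "en": {"work_plan": "Work-plan", "due_list": "Due list"},
    "hi": {"work_plan": "कार्य योजना", "due_list": "देय सूची"},          # illustrative
    "ta": {"work_plan": "பணித் திட்டம்", "due_list": "நிலுவைப் பட்டியல்"},  # illustrative
}

def t(key: str, lang: str = "en") -> str:
    """Look up a UI string; fall back to English if untranslated."""
    return STRINGS.get(lang, {}).get(key, STRINGS["en"][key])

print(t("work_plan", "en"))  # Work-plan
```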
NRP-App provides options to generate the day- or week-wise work-plan of an ANM
and the vaccines due for children under an ANM or a facility. Figure 4 shows the
NRP-App home screen containing the various options. In the day-wise work-plan, vaccines are
grouped and planned to be given on a specific day. For instance, DPT, second dose
of OPV and Hepatitis B and Pentavalent 1 are planned for Monday, second dose
of Pentavalent, DPT 3 and Hepatitis third dose are planned for Wednesday and so
on. This way it will be convenient for the ANMs to see their day-wise work-plan.
The weekly work-plan shows the list of all the beneficiaries who need to receive
vaccines in the current week. Similarly, the vaccine due list can also be generated
date wise or vaccine wise. With the date wise due list, an ANM can know the details
of the children who need to take vaccine but did not receive it till the current date.
The vaccine-wise due list shows how many children did not receive each vaccine
in a particular facility. In summary, the work-plan lists the children who have not
received vaccines on the due date and the beneficiaries who need to receive vaccines
in the current week, while the due list includes only the beneficiaries who have not
received vaccines on their due dates till the current date. Tables 1 and 2 show a sample
work-plan and due list of a particular ANM extracted from NRP-APP.
In order to track long-pending unimmunized children, color coding is used in the
work-plan and due list. Work-plan records highlighted in green denote beneficiaries
who need to receive vaccines in the current week, while green-highlighted due list
records indicate beneficiaries who missed a vaccination due within the last week.
Yellow records indicate beneficiaries who have been pending the required vaccination
for more than a week. Red records show beneficiaries who have been pending a
particular vaccine for more than two months. Figure 5 shows the color coding in
the day-wise work-plan. Among the various reasons behind non- or partial
immunization of children in India, lack of knowledge is the most prominent [22].
With these colored listings, an ANM can easily identify the beneficiaries who require
immediate attention and contact them to get vaccinated at a scheduled routine
immunization session. Thus, the work-plan and due list are useful in improving the
immunization coverage.
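The due-list color coding above reduces to a small classification over how long a
dose has been overdue. A minimal sketch, assuming "more than 2 months" is taken
as more than 60 days (the paper does not state an exact day count):

```python
# Classify a due-list record by how overdue the vaccine dose is:
# green  -> overdue up to a week
# yellow -> overdue more than a week
# red    -> overdue more than two months (assumed 60 days here)
from datetime import date

def due_list_color(due: date, today: date) -> str:
    overdue_days = (today - due).days
    if overdue_days > 60:
        return "red"
    if overdue_days > 7:
        return "yellow"
    return "green"

print(due_list_color(date(2021, 8, 1), date(2021, 8, 5)))    # green
print(due_list_color(date(2021, 8, 1), date(2021, 8, 15)))   # yellow
print(due_list_color(date(2021, 6, 1), date(2021, 8, 15)))   # red
```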
Table 1 Work-plan report between August 08, 2021 and August 15, 2021
Manoj Sharma
DOB: 06-Jun-2020
Child Id: 777709256620
Mother Id: 222220742091
Last Visit Date: 10-Aug-2020
Baby of Vijiya K
DOB: 05-Jul-2021
Child Id: 888881002105
Mother Id: 111110013086
Last Visit Date: 05-Jul-2021
Baby of Saru R
DOB: 03-Jul-2021
Child Id: 888881002110
Mother Id: 111110013286
Last Visit Date: 04-Jul-2021
Baby of Kajol
DOB: 20-June-2021
Child Id: 888881002124
Mother Id: 111110013692
Last Visit Date: 21-Jun-2021
Baby of Janaki
DOB: 15-Dec-2020
Child Id: 888881002123
Mother Id: 111110013592
Last Visit Date: 14-Jan-2021
in a wireless manner. Android phones support various APIs for establishing
connections between Bluetooth-capable devices, with which the Bluetooth weighing
scale can be paired with the mobile phone and the reading obtained in the Android
app. Our Bluetooth weighing scale uses a half-inch, six-digit seven-segment LED
display and works on classic Bluetooth technology. As a next step, the ANM scans
the barcode on the vaccine vials to capture the vaccine name, dosage information,
and batch details into the App's local database. The data captured in this way are
accurate, without manual intervention. With our proposed system, the RI session
does not require any manual entry by the health worker, and the data entry process
is seamless and free of manual errors. Figure 6 shows the working of routine
immunization in the NRP-App.
NRP-Portal has been designed to provide reports on an ANM's work-plan and due
list based on filters such as district, block, PHC, and sub-centre. NRP-Portal
communicates with the NRP-DW via a layer of Spring Boot APIs. As the data
warehouse is used by both NRP-App and NRP-Portal, APIs with different addresses
allow the user to query and update the data in the NRP-DW.
Data visualization using graphs helps the human mind comprehend data and
identify trends, patterns, and outliers within large data sets. It plays a vital role
in improving immunization coverage by allowing the authorities to make better
decisions. A dynamic dashboard with various filters has been designed using
Metabase to visualize the vaccination trend in a facility. Metabase [28] is an
open-source business intelligence tool for asking questions about data and displaying
the answers in visual formats such as tables and graphs. It provides graphs showing
the children pending vaccination for each vaccine or across different facility levels
such as state, district, and PHC; optionally, date filters can be applied. Figures 7
and 8 depict bar graphs and pie charts showing the vaccine-wise number of children
pending vaccination and the pending vaccination status across different states of
India. Immunization data pertaining to states such as Tamil Nadu and Himachal
Pradesh have been considered. With the color coding of the bar chart, alarming
cases that have been pending for a long time can be easily identified and appropriate
action taken for improved coverage.
4 Experimentation
An HDFS-based cluster with one name node and two data nodes has been set up.
The name node is an Intel Xeon server with a 3.3 GHz processor, 32 GB RAM, 4
cores, and 2 TB of storage. The data nodes are Intel i7 4-core workstations with
16 GB RAM and 1 TB HDDs. Our own dataset has been generated for the NRP
data warehouse comprising
Fig. 9 Response time in seconds versus number of users (0–12,000)
immunization data for sample states of India such as Tamil Nadu and Himachal
Pradesh, according to the schema shown in Fig. 3. The population of Tamil Nadu
was about 7.7 crore in 2020, and around 30% of them are children, i.e., approximately
2.3 crore children in the state. The Cassandra-based data warehouse is expected to
withstand the load of all children in a state. The details of the children along with
other immunization details amount to 3.27 crore records. Hence, the data warehouse
is loaded with 3.27 crore records with a storage size of 13 GB. The data were
generated using a tool called DbGen [29] and MS Excel. DbGen is a Windows-based
tool that can be configured based on a schema to generate data.
The proposed NRP-App is capable of running on all android devices (Mobiles and
Tablets) above android version 6.
Two experiments have been conducted and their objectives are given below:
• Load testing of NRP-APP to test its performance under high loads of the local
SQLite database.
• Stress testing to check the performance of the App while synchronizing the local
database and the data warehouse.
NRP-App has been tested using a tool called JMeter [30]. Apache’s JMeter is
an open-source test tool that is used to analyze and measure the performance of
applications.
For load testing of the App, the number of children under an ANM has been varied
between 50 and 200, and the minimum throughput and maximum response time have
been calculated by varying the SQLite DB size from 0.2 to 0.8 MB. The test results
are tabulated in Table 3. It is observed that for a fourfold increase in DB size, response
time increases by 40% while throughput decreases by only 26%.
As per the Rural Health Statistics Bulletin of 2014, there are 8682 sub-centers
providing health services to the rural population of Tamil Nadu. Hence, while
performing stress testing of the data synchronization between the SQLite database
and the NRP-DW, the number of users accessing the data warehouse has been varied
from 1000 to 10,000, with the data warehouse size kept constant at 13 GB. The
performance parameters, response time and throughput, measured during the stress
testing are shown as graphs in Figs. 9 and 10. It is observed that for a tenfold increase
in the number of users, the response time remains within 1.5 min and throughput
decreases only twofold.
5 Conclusion
With the proposed robust system for seamless data capturing and visualization for
RI sessions, there is significant scope for improved immunization coverage, while
improving data quality by reducing manual data entry errors. The multi-fold advan-
tages of the proposed system include: authentication of children and vaccines by
scanning the RCH card and vaccine barcode, accurate acquisition of baby weight,
a user interface supporting any natural language, real-time data synchronization,
and data visualization using a dynamic dashboard. Color-coded work-plans and due
lists help front-line workers easily identify long-pending beneficiaries. A data
warehouse designed using a modern big-data storage and processing framework
provides increased performance. The proposed system has been load tested with a
0.8 MB local database and stress tested with 10,000 users for data synchronization,
and the performance of the App is found satisfactory.
Though the existing similar application, ANMOL, provides an immense set of func-
tionalities and features, NRP-APP is an attempt to provide important additional
features such as a user interface in any natural language, no manual data entry,
color-coded work-plans and vaccine due lists, and visualization of RI session data
through a portal. These features help the ANMs simplify their work while improving
vaccination coverage. Biometric-based registration and authentication of children,
which would further improve ease of use for the ANMs and thus immunization
coverage, is ongoing work.
Acknowledgements This work was supported by Grand Challenges India (GCI) for Immuniza-
tion Data: Innovating for Action (IDIA) funded by BIRAC and jointly funded by Department of
Biotechnology, Government of India and Bill & Melinda Gates foundation.
References
Abstract The power of data lies in the insights derived from it. We trace this journey
as Data-Information-Knowledge-Wisdom and then arrive at Insight (Rowley in
J Inf Sci 33:163–180, 2007). A data warehouse is meant for storing and processing
enormous amounts of data gathered and transformed from various data sources
(Yessad and Labiod in 2016 International conference on system reliability and
science, ICSRS 2016—proceedings, pp 95–99, 2017). Data security is a major
concern in the data warehouse domain, along with privacy and confidentiality.
This paper discusses the various measures and actions to be taken to protect the
data in a data warehouse. We consider healthcare industry use cases in this study,
and the proposed measures are discussed in the context of healthcare data. The
healthcare industry is governed by various strict guidelines and regulatory
requirements in the aspects of data storage, processing, and transfer. Our proposed
methods concentrate on the privacy and confidentiality of a healthcare data
warehouse and consist of de-identification and user privilege-based access controls.
1 Introduction
Information security is a serious concern for all organizations and industries across
the globe. Awareness of and precautions taken with respect to information security
have increased over the past years. Healthcare is one of the industries where security
and privacy are key concerns. The healthcare industry is monitored and regulated
J. George (B)
Department of Computer Science and Engineering, Noorul Islam Centre for Higher Education
Kumaracoil, Tamilnadu, India
M. K. Jeyakumar
Department of Computer Applications, Noorul Islam Centre for Higher Education Kumaracoil,
Tamilnadu, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 777
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_57
778 J. George and M. K. Jeyakumar
• Security—All the measures and precautions taken to safeguard the data and
network.
International privacy laws are formulated from a risk-versus-benefit viewpoint,
and data access shall be allowed only if there exists a substantial benefit for
the patient. HIPAA (Health Insurance Portability and Accountability Act) of the
United States came into the picture to simplify health information exchange between
healthcare organizations and to protect the rights of patients [10].
PIPEDA (Personal Information Protection and Electronic Documents Act) of
Canada [11] is somewhat equivalent to HIPAA in the USA. The GDPR (General
Data Protection Regulation), the European Union regulation, focuses on the
protection and privacy of data within the region. The major difference between the
GDPR and HIPAA is the area of concentration: GDPR protects the PII (Personally
Identifiable Information) of EU citizens, while HIPAA concentrates on PHI.
3 Methodologies
Let us consider the following scenario for the study. Consider a healthcare orga-
nization with multiple facilities and multiple Hospital Information Systems. A data
warehouse is to be created from these multiple source systems to have a consolidated
view of the clinical and financial information [12]. Figure 1 describes the data flow
from multiple data sources to the data warehouse through a staging area.
The possible data sources are:
• EMR—Various Electronic Medical Records
• LIS—Various Laboratory Information Systems
• RIS—Various Radiology Information System
• CRM—Customer Relationship Management Systems
• RCM—Revenue Cycle Management Systems/Claim Management Systems
• Various types of regulatory benchmark data and patient surveys.
Now, let us focus on the data warehouse formed after the ETL (Extraction,
Transformation, Loading) process [13]. Multiple groups of people will have access
to the centralized data warehouse, which demands proper data governance and
security mechanisms to safeguard the data from unauthorized access and data
breaches. The privacy rules permit healthcare providers to disclose de-identified
data for secondary usage such as research and data mining. De-identification
performed to a scientifically acceptable degree ensures that re-identification of a
person from the de-identified data is not possible [14].
We will be using 3 different approaches for de-identification, namely:
Data jumbling/hiding: All data which can potentially be used to identify an
individual shall be anonymized. This is done at the data source itself, using carefully
designed scripts, before the data moves to the staging area for transformation and
loading. This ensures that only de-identified data leaves the source data location for
the staging area. The data warehouse could be in a different geographic region than
the source systems, so this also facilitates legitimate data movement with compliance
and privacy.
Data Removal: Removing or nullifying data that could lead to the identification
of individuals/patients.
Data Grouping: Replace absolute data with a data range. For example, if a
patient’s age is 45, this absolute value shall be replaced with a range, in this case,
40–50.
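As a concrete sketch, the three approaches can be combined into one small transformation routine run at the source. The field names and the hashing/grouping helpers below are illustrative assumptions, not the actual schema or scripts used in the study:

```python
import hashlib

def deidentify(record):
    """Apply the three de-identification approaches to one patient record.

    The field names here are illustrative; adapt them to the source schema.
    """
    out = dict(record)

    # 1. Data jumbling/hiding: replace direct identifiers with a one-way
    #    hash so the original value never leaves the source system.
    for field in ("patient_name", "mrn"):
        if out.get(field):
            out[field] = hashlib.sha256(out[field].encode()).hexdigest()[:12]

    # 2. Data removal: nullify fields that could identify an individual.
    for field in ("ssn", "phone", "street_address"):
        out[field] = None

    # 3. Data grouping: replace the absolute age with a 10-year range.
    age = out.pop("age", None)
    if age is not None:
        low = (age // 10) * 10
        out["age_range"] = f"{low}-{low + 10}"
    return out

record = {"patient_name": "Jane Doe", "mrn": "MRN001", "ssn": "123-45-6789",
          "phone": "555-0100", "street_address": "1 Main St", "age": 45}
print(deidentify(record)["age_range"])  # 40-50
```

Because the routine runs before staging, only the transformed records ever reach the warehouse region.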
Tables 1 and 2 show snippets of patient data at the source system and at the data
warehouse, respectively.
De-identification is one of the optimal ways of protecting privacy. However, one
should bear in mind that no method is 100% foolproof.
Methodologies to Ensure Security and Privacy of an Enterprise … 781
Table 3 shows the sample access right matrix of various roles assigned in the
healthcare data warehouse. For demonstration, we are considering the roles of Admin
Team, Physicians, Nurses, Pharmacy Team and Finance.
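A role-based access check over such a matrix can be sketched as follows. The roles mirror those listed above, but the specific permissions assigned to each role are assumptions for demonstration and need not match Table 3:

```python
# Illustrative role-based access matrix; permissions per role are assumed.
ACCESS_MATRIX = {
    "admin":     {"clinical": "read/write", "financial": "read/write"},
    "physician": {"clinical": "read/write", "financial": None},
    "nurse":     {"clinical": "read",       "financial": None},
    "pharmacy":  {"clinical": "read",       "financial": None},
    "finance":   {"clinical": None,         "financial": "read/write"},
}

def can_access(role, domain, mode="read"):
    """Return True if the role's permission for the domain covers the mode."""
    perm = ACCESS_MATRIX.get(role, {}).get(domain)
    return perm is not None and mode in perm

print(can_access("nurse", "clinical"))   # True
print(can_access("nurse", "financial"))  # False
```

Centralizing the matrix in one place makes the warehouse's governance rules auditable: every query path goes through the same check.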
5 Conclusion
Data is the new oil, and in today’s world data is the driving force behind every
business. Secure storage of data, irrespective of whether it is online or offline, is
not an optional feature but mandatory in the current era. Data is more valuable than
any other asset of an organization. Data warehousing, data lakes, and big data concepts
are getting more popular day by day, and security threats are also looming large.
A great deal of research and advancement has already taken place in the data security
arena, but data security is still an open-ended question. In this paper, we proposed two
prominent approaches, namely de-identification and role-based security, to ensure
privacy in the healthcare data warehouse. The power of data is enormous, and in the
same way, every individual has the right to protect his or her private information.
References
1. Rowley J (2007) The wisdom hierarchy: representations of the DIKW hierarchy. J Inf Sci
33(2):163–180. https://doi.org/10.1177/0165551506070706
2. Yessad L, Labiod A (2017) Comparative study of data warehouses modeling approaches:
Inmon, Kimball and Data Vault. In: 2016 international conference on system reliability and
science, ICSRS 2016—proceedings, pp 95–99. https://doi.org/10.1109/ICSRS.2016.7815845
3. Khalifa Alhamami H, Kumar Udupi P (2021) Warehouse safety and security. GSJ 8(7).
Accessed 24 Jul 2021. [Online]. Available: www.globalscientificjournal.com
4. Mathur S, Gupta SL, Pahwa P (2020) Enhancing security in banking environment using business
intelligence. Int J Inform Retrieval Res 10(4):21–34. https://doi.org/10.4018/IJIRR.2020100102
5. Kong G, Xiao Z (2015) Protecting privacy in a clinical data warehouse. Health Inform J
21(2):93–106. https://doi.org/10.1177/1460458213504204
6. Sorathiya H, Patel A, Jain H, Khajanchi A (2017) Security in data warehousing. Int J Eng Dev
Res 5(2). Accessed 24 Jul 2021. [Online]. Available: www.ijedr.org
7. Konda S, More R (2021) Augmenting data warehouse security techniques-a selective survey.
Int Res J Eng Technol. Accessed 24 Jul 2021. [Online]. Available: www.irjet.net
8. George J, Bhila T (2019) Security, confidentiality and privacy in health of healthcare data. Int
J Trend Sci Res Dev 3(4):373–377. https://doi.org/10.31142/ijtsrd23780
9. Abouelmehdi K, Beni-Hessane A, Khaloufi H (2018) Big healthcare data: preserving security
and privacy. J Big Data 5(1). https://doi.org/10.1186/s40537-017-0110-7
10. HIPAA in a Nutshell—RightPatient. https://www.rightpatient.com/blog/hipaa-explained/.
Accessed 24 Jul 2021
11. Personal Information Protection and Electronic Documents Act. https://laws-lois.justice.gc.ca/
ENG/ACTS/P-8.6/page-1.html
12. Kimball R, Caserta J (2004) The data warehouse ETL toolkit
13. Ong TC et al (2017) Dynamic-ETL: a hybrid approach for health data extraction, transformation
and loading. BMC Med Inform Decis Mak 17(1):134. https://doi.org/10.1186/s12911-017-0532-3
14. Methods for De-identification of PHI|HHS.gov. https://www.hhs.gov/hipaa/for-professionals/
privacy/special-topics/de-identification/index.html. Accessed 27 Jul 2021
15. Wyllie D, Davies J (2015) Role of data warehousing in healthcare epidemiology. J Hosp Infect
89(4):267–270. https://doi.org/10.1016/j.jhin.2015.01.005
Comparative Analysis of Open-Source
Vulnerability Scanners for IoT Devices
1 Introduction
Internet of things devices are becoming more and more common in everyday life.
These “things” often control many daily functions, whether it is apparent or not:
everything from signage displays in a company building, to security cameras, to
smoke detectors, to industrial applications. IoT is everywhere.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 785
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_58
786 C. deRito and S. Bhatia
2018 saw an estimated 3.96 billion devices, 4.81 billion in 2019, and in 2020 up to 5.81 billion
worldwide [1]. The vast number of IoT devices is hard to comprehend, but they are
in use practically everywhere. Making these devices intrinsically secure should be a
priority of manufacturers, administrators, and even consumers.
IoT stands for Internet of things. “Internet of things (IoT) is a collection of many
interconnected objects, services, humans, and devices that can communicate, share
data, and information to achieve a common goal in different areas and applica-
tions.” [2]. These IoT devices are network connected to share information between
devices or to computer systems. Often IoT devices serve a singular specific purpose.
For example, an IP security camera captures footage and sends it over the network.
In a commercial building, a signage player receives a video feed and outputs it to
a display. IoT devices can be used in a variety of situations such as transportation,
agriculture, health care, and industrial settings [3]. The focus of this paper will be
on smart home devices. Smart home IoT devices are very similar to the other types
but are often catered toward the consumer market. This means the devices are
designed with ease of use and accessibility as the main priorities. Security is sometimes
treated as unimportant by some of the cheaper and less reputable manufacturers but is
still maintained by the more popular ones.
Vulnerability scanners are automated tools that scan given networks or systems on the
network and produce a set of scan results. These vulnerabilities can be software bugs,
backdoors, missing patches, misconfiguration, and vulnerable ports and services [4].
Vulnerability scanners are an essential part of any organization’s vulnerability man-
agement program. These scanners by no means are a one-stop solution to finding
vulnerabilities, but they greatly cut down on the amount present and time taken to
find and resolve them. Vulnerability scanners are mainly used by organizations and
companies, as they have large inventories of devices and often hold large amounts of
sensitive data. However, these types of programs can also be used in a home setting
to ensure that a home network and systems are secure. In fact, if fixing vulnerabilities
in a home network is desired, then this is most likely the best route to go. It takes
the need for experience and knowledge of security out of the equation. The scanners
present the results directly to you and often also include clear steps on resolving
each vulnerability. This paper will focus on vulnerability scanners utilized in a home
network, but the concepts are very similar and can translate to a company setting.
Comparative Analysis of Open-Source … 787
IoT devices are becoming more and more prevalent in daily life; however, the security
of these devices is often overlooked. Vulnerability scanners are a great solution
to automating and ensuring the security of devices. However, there are not many
options for vulnerability scanners specifically meant to work with IoT. Ideally,
there would be vulnerability scanners built for IoT, targeting the vulnerabilities that
affect those devices and their architecture. These scanners would decrease the
frequency of attacks on IoT devices, including the ever-common botnets that plague
IoT. Analyzing and testing some of the common open-source vulnerability scanners
can give some insight into how well these scanners work with IoT devices, if at all.
This will also show what is missing from the scanners in terms of IoT support and
how this goal can be achieved. It can also provide insight into the use cases for
which each tested scanner is best suited.
This paper analyzes and compares the use of several open-source vulnerability
scanners used with home IoT devices. The paper covers all aspects of using these
programs: the ease of use, support available, effectiveness of the scanners, direction
provided in mitigation, and various operational metrics. In the end, a comprehensive
analysis of each scanner will be provided, discussing the advantages and disadvan-
tages of each, as well as their best use cases with the intent to provide an informative
viewpoint on the selection of vulnerability scanner based on a hands-on analysis and
comparison.
2 Related Work
IoT, and specifically IoT security, is currently a very popular topic in academic
research. Vulnerability scanner research is also fairly abundant; however, the focus is
on vulnerability scanners for the web, which is not a focus of this paper. Even though
the two topics separately are well-covered, there still seems to be little in the way of
vulnerability scanners meant for IoT devices.
Chalvatzis et al. [4] cover a comparison and analysis of vulnerability scanners for
general use with standard systems. This is a similar concept to the research conducted
in this paper, but without the focus on IoT. It covers three scanners in total:
Nessus, OpenVAS, and Nmap.
In terms of IoT devices, there are a few studies on IoT vulnerability scanning.
However, there is a difference between scanning for vulnerable IoT devices and
scanning IoT devices for vulnerabilities. The papers [5, 6] go into scanning a network
or the Internet in general for vulnerable devices. This is done through applications
such as Shodan or Masscan. This allows you to search the Internet of things for
devices with specific vulnerabilities. For example, search for webcams with default
username “admin” and password “admin.” Shodan and Masscan are very powerful
tools for what they are built for, just not within the scope of this research.
IoT security is a very popular research topic, and there are many papers going
into the specific types of threats that plague IoT and ways to mitigate said threats.
These also go into well-known vulnerabilities and mitigation techniques. Hassija et
al. [7] go into the threats that IoT devices face on the various layers. It also talks
about ways to secure these devices such as blockchain, machine learning, and edge
computing.
Anand et al. [8] go into the specific vulnerabilities of IoT devices across all
functions. It is a thorough review of IoT itself, the security problems that these
devices face, the broad vulnerabilities that are commonly seen, and solutions to
these issues. Similarly, Corp [9] goes into the same attack vectors commonly seen
with IoT, as well as case studies exploring these areas. Many IoT device categories
are explored including drones, IP cameras, smart cars, smart thermostats, etc.
Smart home is a separate category of IoT in and of itself. The key difference here is
that these devices are intended to bring automation and more specifically ease of use
to the consumer market. For example, Corp [9] dives into IoT smart home and city
environments along with the security risks that these devices pose. IoT can be used
in realistically any environment or industry, so focusing on the smart home area will
have significant differences from other areas such as industrial applications.
3 Methodology
This study will dive into the practicality of different open-source vulnerability
scanners and how well, if at all, they can be used with IoT devices. Five different
vulnerability scanners and five different IoT devices have been chosen to carry out the
testing. In total, there will be twenty-five different tests performed, each vulnerability
scanner against each of the five IoT devices.
Five different open-source vulnerability scanners are going to be tested. They were
chosen based on a few factors. The first is that they are open source, as closed-source
and paid software will usually have barriers or restrictions on how it can be used
at different pay tiers or settings. The second factor is how well-known they are. Most
of the list comprises well-known scanners or ones from reputable groups. The last
factor is how well they could potentially be used specifically with IoT devices. The
five scanners chosen are OpenVAS (Greenbone Vulnerability Management), Vuls,
SNOUT, Vulscan, and IoTSeeker.
OpenVAS Originally developed as a completely free open-source project, Open-
VAS is now developed by a company called Greenbone as part of their Greenbone
Security Management (GSM) product [10]. This product is a complete all-in-one
vulnerability management solution. The source code is still open source; however, they
offer both a free Greenbone Community Feed (GCF) and a subscription Greenbone
Security Feed (GSF). The GSF subscription feed offers more features along with
support and various enterprise applications. The free GCF is completely adequate
for home use and for the purposes of this analysis.
OpenVAS also shares its origins with the Nessus vulnerability scanner.
Nessus is a closed-source product that offers very similar features to OpenVAS.
OpenVAS forked from the GPL version of Nessus (version 2) after it went proprietary
in 2005. The plugins for OpenVAS are still written in the Nessus Attack Scripting
Language (NASL) [4].
Vuls Vuls is an open-source agentless vulnerability scanner for Linux/
FreeBSD [9]. The vulnerabilities it searches for are drawn from multiple vulnerability
databases: NVD, OVAL, JVN, and RHSA/ALAS/ELSA/FreeBSD-SA.
The scanner can be run anywhere, meaning it can be installed and run on the cloud,
physical hardware, virtual machines, and within Docker. Vuls is also capable of scan-
ning various components of a system. It can scan non-OS packages such as libraries,
frameworks, or code compiled yourself. These however must all be registered in the
Official Common Platform Enumeration Dictionary (CPE). This is simply a naming
scheme for IT systems, software, and packages.
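For illustration, a CPE 2.3 formatted name of the kind Vuls matches against can be assembled as below; the vendor, product, and version values are example assumptions, not entries from this study:

```python
def cpe23(part, vendor, product, version):
    """Build a CPE 2.3 formatted-string name.

    part is 'a' (application), 'o' (operating system), or 'h' (hardware);
    the remaining seven attributes are wildcarded here for brevity.
    """
    return f"cpe:2.3:{part}:{vendor}:{product}:{version}:*:*:*:*:*:*:*"

print(cpe23("a", "nginx", "nginx", "1.18.0"))
# cpe:2.3:a:nginx:nginx:1.18.0:*:*:*:*:*:*:*
```

Any self-compiled library registered under such a name becomes visible to the scanner's dictionary lookups.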
In terms of UI and reporting, Vuls has both a terminal user interface (TUI) and a
web-based graphical user interface (GUI). Both options are very descriptive in the
results they show, providing detailed information on the specific vulnerabilities
found. Email and Slack notifications can also be configured.
Snout Snout stands for SDR-based Network Observation Utility Toolkit [11].
This application utilizes software-defined radio (SDR) to interact with the
various non-IP wireless communication protocols; the two most popular, and the
two that SNOUT supports, are Zigbee and Bluetooth Low Energy. SNOUT is
advertised as offering device enumeration, vulnerability assessment,
advanced packet replay, and packet fuzzing. This program is not only a vulnerability
scanner but also has other features that support functions such as penetration testing.
SDR is a radio frequency communication method that does all of the processing
on the software level. The actual hardware (transmitter, receiver) is used just to
send and receive messages. The messages themselves are created with software.
This capability allows SNOUT to communicate over the different wireless
protocols, as they do not use the same communication methods that a Wi-Fi dongle or
physical connection would be capable of. This does mean that this program requires
a supported transceiver to be able to perform this SDR-based communication. The
supported devices are the HackRF One and USRP transceivers.
Vulscan This scanner is an addition to the popular and well-known command line
utility Nmap [12]. Nmap supports custom scripts to allow additional functionality.
Vulscan utilizes the Nmap flag -sV which enables version detection. From here,
the script analyzes the port number, port state, service running, and version of that
service to come up with a prediction of the vulnerabilities on that specific host and
ports.
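A minimal sketch of driving such a scan, assuming Nmap is installed and vulscan.nse has been copied into Nmap's scripts directory as described; the target address is a placeholder, not a host from this study:

```python
import shutil
import subprocess

def vulscan_command(target):
    """Build the Nmap invocation for a Vulscan run.

    -sV enables the service/version detection that the script's
    vulnerability lookups depend on.
    """
    return ["nmap", "-sV", "--script=vulscan/vulscan.nse", target]

def run_vulscan(target):
    """Run the scan and return Nmap's text output.

    Requires nmap on PATH plus vulscan.nse under the Nmap scripts directory.
    """
    if shutil.which("nmap") is None:
        raise RuntimeError("nmap is not installed")
    result = subprocess.run(vulscan_command(target),
                            capture_output=True, text=True, timeout=900)
    return result.stdout

# Example (illustrative target address):
# print(run_vulscan("192.168.1.50"))
```

Because the script only inspects what Nmap reports over the network, no agent or credential is needed on the target device.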
Results from this scanner can be complicated and hard to interpret. Since Nmap
is a command line utility, the results are presented in this fashion. In turn, the results
are not very intuitive or easy to read. It can take some time to go through the results
and determine for yourself what is of concern and what can be mostly ignored.
From the Vulscan GitHub: “Keep in mind that this kind of derivative vulnerability
scanning heavily relies on the confidence of the version detection of Nmap, the
amount of documented vulnerabilities and the accuracy of pattern matching” [13].
The resulting vulnerabilities presented depend heavily on how accurately Nmap
detects services and versions. This is especially important for IoT use,
as many of the devices meant for the home are using their own proprietary services
or custom embedded operating systems that may not be easily detected by Nmap,
resulting in no results found or incorrect results.
IoTSeeker This isn’t a traditional vulnerability scanner in the sense that it scans
for all vulnerabilities on a system [13]. This scanner scans for a single vulnerability,
but a very relevant and common one. This vulnerability is default credentials. Often,
owners will deploy an IoT device without changing the default logon credentials or
configuring the device at all. This is very common and easy to discover on services
such as Shodan. It will allow any unauthorized person to have full administrative
rights to a device.
IoTSeeker is open source and developed by Rapid7, the creators of Metasploit
and other IT security solutions [13]. It is a less well-known scanner but is developed
by a reputable and well-known company. Although this only scans for a single vul-
nerability, it is a very useful one that the other scanners do not check for. The script
includes a file containing the username and password combinations that it checks
for, which according to the creator is updated often.
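The core idea can be sketched as a small credential-testing loop. The credential list and the try_login stub below are illustrative assumptions; IoTSeeker ships its own, regularly updated list and speaks each device's actual login protocol:

```python
# Simplified sketch of IoTSeeker's approach: try known factory
# username/password pairs against a device's login. The pairs here are
# common illustrative defaults, not IoTSeeker's actual list.
DEFAULT_CREDS = [("admin", "admin"), ("admin", ""), ("root", "root"),
                 ("admin", "12345")]

def find_default_credential(try_login):
    """Return the first default pair the device accepts, or None.

    try_login is a callable (user, password) -> bool supplied by the
    caller, e.g. wrapping an HTTP login request to the device.
    """
    for user, password in DEFAULT_CREDS:
        if try_login(user, password):
            return (user, password)
    return None

# Simulated device that still uses its factory login:
vulnerable = lambda u, p: (u, p) == ("admin", "admin")
print(find_default_credential(vulnerable))  # ('admin', 'admin')
```

A non-None result means anyone on the network could gain full administrative rights, which is exactly the condition the scanner flags.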
The devices used in this analysis were chosen based on a few reasons. The types of
devices used were those that are commonly seen in the smart home environment.
These devices are a smart personal assistant, security camera, Smart IoT Hub, Zigbee
sensor, and a Raspberry Pi.
Smart personal assistants are seen everywhere and constantly used in the smart
home environment. These are devices that you can verbally ask questions and get
responses. They are based on speech recognition and artificial intelligence to
determine what the user asks and to come up with the appropriate response. As of
2017, an estimated 10% of worldwide consumers own a smart personal assistant [14],
so in 2021 the percentage should be higher. The most popular devices are Amazon
Echo, Google Home, and Apple HomePod. In this test, the Amazon Echo was chosen.
As of 2017, over 50 million Amazon Echos had been sold in the USA alone [14].
Security cameras are one of the most popular IoT devices currently used. Not only
are they used in the home, but also across all industries. In 2017, 98 million network
surveillance cameras and 29 million HD CCTV cameras were distributed [15]. Not
only are these devices used frequently, but often by users who are not aware of the
security risk they pose [15]. This is the case with most home IoT devices, but IP
cameras in particular.
In the home, IoT devices are often connected to one centralized device so that
they can be better managed and monitored. These devices are called hubs or smart
home hubs. The hub is the heart of the system and controls all elements connected
to it [16]. Hubs are very common in the home and critical to secure. They are a
gateway to all devices connected to it, so if this device gets compromised, all devices
connected are compromised.
The Zigbee device was chosen to analyze the effectiveness and how necessary it
is for a scanner to support non-IP wireless protocols such as Zigbee and Bluetooth
LE. In the tests performed, Snout is the only scanner to contain this feature. Usually,
insight into these devices is limited to the information sent to the gateways (hubs) that
connect the device [11]. Including one of these devices will give insight as to how
crucial and impactful it can be for a scanner to support the other wireless protocols.
The last device being tested is a Raspberry Pi. A Raspberry Pi is a very popular
computer about the size of a credit card that can be fully set up and configured any way
the user wants. They are often used in the home for DIY IoT projects that can utilize
the small amount of processing power, small size, and control that these bring. They
are perfect for building out custom solutions that may prove to be much cheaper or
provide more control than other off-the-shelf options. The Raspberry Pi has built-in
support for 10/100 Mb/s Ethernet, Wi-Fi, and Bluetooth [16]. Although not as easy
to set up and use as the other smart home devices tested, the Raspberry Pi would be
the kind of device used by individuals who would also deploy a vulnerability scanner
in their home. Table 1 lists the IoT devices used in this study.
Each of the vulnerability scanners was installed into its own virtual machine using
VMware. The VMs were configured with 4 GB of RAM, 1 processor, a 45 GB hard
drive, and a bridged network adapter. This is plenty of storage and resources to run
the Linux OSs and scanners. Bridged networking allows the virtual machine to act as
if it were its own device on the network it is connected to. In this case, the VM appears
as a device on my home network, and the DHCP server in the router assigns the VM
an IP. This allows the device to communicate with all other IPs on the network.
The IoT devices were all configured and set up as if they were being used normally.
This means downloading the required apps and connecting them as needed. The
Aqara Zigbee sensor was connected to the Aqara Smart Hub so it would be actively
transmitting data over the Zigbee protocol. The Raspberry Pi was set up with Raspbian
OS, which is a custom version of Debian meant for the low computing power of the
Raspberry Pi. SSH, Telnet, and Apache were all installed and enabled, as these are
services sometimes used to communicate to IoT devices. They were also enabled to
provide some vulnerabilities to discover.
Once everything was set up and installed, the scans were performed. This was
fairly straightforward at this point; however, the procedure for each scanner varied.
Once the scans finished, a few things were evaluated: whether the scan
worked, how long the scans took, how many vulnerabilities were found, and how
usable the scanner was.
TRIAL. GSM Source allows the user to download the source code and compile it
directly on their machine. This allows for more control of the installation but is also
much more complicated, not to mention time-consuming. The GSM Trial is a free
version of their professional solution meant to run on laptops or virtual machines.
This solution is incredibly simple to implement, consisting of just an ISO file
that can be installed like any other OS.
All in all, the program is very simple to install and configure if the GSM Trial
route is taken. The whole download, installation, and configuration process only took
around 10 min to complete and be fully operational.
Vuls Documentation for this scanner from the creators was substantial but seemed
to be lacking in community engagement. The Vuls website itself has plenty of
information on getting the application started and operational, but beyond that
there is not much information elsewhere on the internet. Very little can be found
on YouTube; however, a few guides can be found through Google on various blogs.
Installation was fairly straightforward. It can either be done manually, which
requires installing each module and all dependencies separately, or by
using what they call Vulsctl. Vulsctl is an easy-to-install Docker image that offers the
advantage of a non-complex installation and quick startup. The basics of Docker need
to be understood to use it at a moderate level, but all of the essentials can be found
in the tutorial that the creators provide.
Snout Out of all the scanners in this list, Snout is in the earliest stages. Being
at version 0.0.1, it is at the very first release, meaning that no bugs have been fixed
and that it may not be as refined as it could be. The documentation is almost
nonexistent. The only materials that were found were the GitHub page, which gives very
brief installation instructions, a brief showcase video by the creators with some
sample usage and explanations, and the academic paper written for it. Out of all this
material, the only install instructions are from the GitHub page, which contains two
simple Linux commands, whereas in reality the build process was much more involved
once dependency errors and build errors came up.
The installation and build were attempted on various operating systems and
versions to see if the issues would be resolved; however, they persisted regardless. The
build and installation process was, all in all, fairly frustrating. Although the package
was installed and could be run, in the end a usable installation of SNOUT was not
achieved. SNOUT relies upon SDR for its communication with the various wireless
protocols on different frequencies. As it turns out, the creators only built it to support
two pieces of SDR hardware, the HackRF and Ettus Research USRP devices.
This was not known beforehand and was only discovered once a working installation
of SNOUT was achieved and commands were run.
Vulscan Built upon the famous Nmap port scanner, Vulscan was very easy to
install and use. Nmap is installed directly through the package manager (“apt” on a
Debian-based OS). This creates the directory /usr/share/nmap/scripts, where
all Nmap scripts can be copied. Once this is done, running the scanner is as easy
as running an Nmap scan specifying the script to use. Documentation for Nmap itself
is very robust. Nmap’s popularity means that a ton of resources exist on
how to use it. Most of this can be applied to Vulscan, as the general syntax is the same.
OpenVAS OpenVAS was able to scan almost all of the devices. The Raspberry Pi
had the most robust results with all levels of vulnerabilities being detected. This is the
closest to a traditional device that would be scanned, so this makes sense. The Wyze
Cam v3 and Aqara Smart Hub also had results, with mostly log-level vulnerabilities
but also a low-level vulnerability. The Amazon Echo was not able to be scanned at
all. This seems to be the most consumer-aimed device, with rather strict security in
place. The scanner could not pull any information from it and did not even report
that the host was scanned at all.
One strength that OpenVAS seems to present is the amount of information gath-
ered. Not only does it detect vulnerabilities, but the reports also show various other
information. This includes the OS that is running, open ports and the services, and
applications running. This is all very useful information, especially for IoT devices
that are not monitored as closely as other machines would be. Even if vulnerabilities
aren’t directly detected and reported by OpenVAS, the user reviewing the results can
see what is going on with that device and if it lines up with what is desired. The
presence of log-level results can also prove to be useful. Even though they are not
vulnerabilities per se, these results can provide solid insight into how the devices
are operating.
Overall, OpenVAS provided great results that the other scanners could not achieve.
This scanner would prove to be the most useful in situations where large amounts
of information are desired. The amount of insight that this scanner provides is very
useful in enumeration and device monitoring. Also, no additional steps needed to be
taken on the end of the target devices. It is all performed remotely without the need
to install agents or set up authentication.
Vuls Vuls provides thorough results for the devices that it could scan. That is
the catch, however. In order to scan a device with Vuls, there needs to be key-based
SSH authentication setup on the target. This is not always possible with home IoT
products, as the manufacturers do not allow that type of control over the device. This is
understandable, though, for security reasons. Consumers do not need to have that
level of control over a smart home device, as they are provided with an application
that allows control that way. The only device that was able to be scanned with Vuls
was the Raspberry Pi, as this is completely setup by the user, and they have control
of the root account allowing anything to be done.
The vulnerabilities that Vuls detected were not very informative. The website
shows vulnerabilities in much more detail, including remediation steps, but the
ones that came up in this project had most fields blank. The only information present
was the CVE number, the affected processes/packages, and a short description of the
vulnerability. Having more information present, such as remediation steps and criticality,
would be much more helpful to the user in determining what to do in response to
each vulnerability: whether it should be fixed, and how, or whether it can be ignored
and left alone.
Vuls would be great for instances where all devices are able to have SSH set up,
but this usually would not be the case. With home IoT this generally is not possible, so
Vuls should be avoided, as it would not provide full coverage. If all home IoT devices
are self-made using Raspberry Pis or other microcomputers, then Vuls would be
possible and would provide a higher level of security than the other scanners. Having
SSH set up would also allow remote management of each device, which could prove
to be very useful.
Snout For a few glaring reasons, Snout does not seem to be in any usable
state for consumers currently. The package has quite a bit of potential as an
all-around utility for IoT security, but its current version does not quite hit that mark
for the average user. The package needs more work in many areas,
in particular the installation process. It is on its very first version (v0.0.1), so this is
completely expected. It does, however, seem more like a proof of concept
than a supported package.
The installation process and documentation are the biggest problems with the
current iteration. The package needs to be built from source, which is never an easy
thing to do on Linux, and thorough knowledge of the OS is needed to properly
go about this. Issues with the build process were encountered
on multiple OS installations and multiple Linux distributions, so this needs to be
addressed. The documentation could also use a decent amount of work. The GitHub
page does not give clear instructions on how the installation is done, and there is no
wiki or docs supporting the package.
Disregarding the state that the package is in currently, Snout boasts some impres-
sive features that could have many different potential use cases. It can be used for
monitoring and enumeration. If you want to see which devices are connected or
using the different wireless protocols (Zigbee, Bluetooth LE) in the area, this
is the best option. In terms of vulnerability scanning, the package does seem to be
lacking here. It does have the potential to scan Zigbee devices for vulnerabilities,
but as the demo video demonstrated, only one vulnerability can be detected with
Zigbee: the ZLL vulnerability.
796 C. deRito and S. Bhatia
Although Zigbee is not a major target of malicious users, there certainly exist
security risks that need to be managed. Snout has
the framework and ground set to perform more robust vulnerability scans but needs
more in the way of vulnerability detection.
Since Snout has many more features than vulnerability scanning, I would not
recommend it as the only solution in place for this, but it is a solid toolkit for
managing and monitoring a smart home environment. It can best be used in a situation
where there are multiple wireless protocols being utilized and many devices in a
close vicinity. Snout will allow the user to enumerate all devices and confirm that
everything is in order. It should be used alongside a more robust and established
vulnerability scanner, however, because this aspect is not as fleshed out as would be
needed. It also requires a fairly tech-savvy user to get the package installed
and into a usable state; it would not be feasible for an average user.
Vulscan Nmap paired with the Vulscan script makes a very effective scanning tool.
Capable of scanning devices completely remotely without any interaction with the
device itself, this solution would be very effective on a smart home network with a
wide variety of devices from different manufacturers. The script interacts with the
device purely at the network level, so nothing is altered or changed on the device
itself. IoT devices are different but also very similar in terms of the protocols they
utilize. The common protocols are HTTP, HTTPS, SSH, Telnet, and a few others.
These protocols and services running can be analyzed very easily with Nmap and
Vulscan. Vulscan also allows a wide variety of customization regarding how the
results are presented and what vulnerability databases are used for each scan.
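As a rough illustration of how such a scan might be assembled, the sketch below builds the Nmap command line for a Vulscan run. The script path and database file name follow the conventions in the Vulscan README, but may differ on a given installation, and the target address is a placeholder.

```python
# Sketch: assemble an Nmap + Vulscan invocation for a smart-home device.
# The script path and database name are assumptions; adjust to the local setup.

def build_vulscan_command(target, database="cve.csv"):
    """Return the argv list for a version scan routed through vulscan.nse."""
    return [
        "nmap",
        "-sV",                                       # service/version detection
        "--script", "vulscan/vulscan.nse",           # the Vulscan NSE script
        "--script-args", f"vulscandb={database}",    # choose the vulnerability database
        target,
    ]

# Example: scan a camera on the local network against the CVE database.
cmd = build_vulscan_command("192.168.1.10")
```

The list could then be handed to `subprocess.run`; building it separately keeps the database choice, which the text notes is customizable, in one place.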
The issue with Vulscan is that it requires the device to have open ports and to be
running common services in order to scan effectively. With more specialized devices, this is not
always the case. For example, the Wyze Cam had only one open port and it was
running a proprietary service. The Amazon Echo had a few open ports but
would not allow Nmap to scan it at all. It would never return any fingerprints of the
services running.
This scanner would also be useful when you need just a quick and easy approach to
vulnerability scanning. Very easy to install and perform scans with, Vulscan is the ideal
solution for quick tests. It is not a thorough be-all-end-all solution for vulnerability
scanning but will give a good idea of what is vulnerable at the OS level and the
port/service level. Since it runs on Nmap, this scanner will also be very familiar for
a lot of people. Nmap is very popular so users are likely to have used it in the past if
they are involved in IT or technology.
Lastly, Vulscan would be a good option for consumers who want a non-intrusive
scanner. It requires no installation on or interaction with the target devices,
will leave almost no trace on the target machines, and will not interfere with the
device’s normal operations. All in all, Vulscan is a great quick and easy solution that
will present a basic but also satisfactory understanding of the vulnerabilities present.
IoTSeeker This seems to be a great scanner if you are looking for the bare
minimum in terms of IoT security. A common problem with these devices is that
they are not configured or set up correctly. Especially when a large home network's
worth of devices is deployed, sometimes these steps can simply be neglected, often
not on purpose; one or two devices might just get skipped over. This scanner
Comparative Analysis of Open-Source … 797
is perfect for this kind of situation. It is also ideal for occasional use, as once this
vulnerability is fixed, it will not come back unless the device is completely reset.
IoTSeeker would be perfect to use after many devices are deployed to ensure nothing
was glossed over. It can also be used maybe a few times a year if devices are slowly
accumulated over time.
This script is actually the only one out of the five that searches for this vulnerability.
Since most of the other scanners are not tailored toward IoT devices, they are not
searching for default IoT manufacturer passwords. This scanner should most likely
be used alongside a more robust scanner that looks for multiple vulnerabilities.
IoTSeeker in conjunction with Vuls or OpenVAS can ensure that most of the bases
are covered.
IoTSeeker does not seem to support many device types at present. It includes a
configuration file with the device names and the username and password combina-
tion it checks for. At the time of writing this, there were only 18 different devices
supported. In the grand scheme of things, this is a tiny number when compared to
the number of IoT manufacturers and products that exist. The file can be extended
manually, but this takes away the convenience and ease of use of the script.
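The idea behind such a configuration file can be sketched as a simple lookup. The device names and credential pairs below are purely illustrative; they are not taken from the real IoTSeeker configuration.

```python
# Minimal sketch of a default-credential check in the spirit of IoTSeeker.
# The entries are hypothetical examples, not the real config file contents.

FACTORY_DEFAULTS = {
    "acme-cam-100": ("admin", "admin"),
    "acme-doorbell": ("admin", "1234"),
}

def uses_default_credentials(model, username, password):
    """Flag a device that is still running its factory username/password pair."""
    default = FACTORY_DEFAULTS.get(model)
    return default is not None and default == (username, password)
```

Extending coverage then amounts to adding entries to the dictionary, which mirrors the manual editing of the config file described above.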
Default usernames and passwords are a very common problem with IoT devices.
Leaving these configured like this makes it very easy for an attacker to gain access.
To an attacker who is aware of this vulnerability, it is practically like leaving the
device completely open with no password. IoTSeeker is a great solution to this, even
if just to ensure that no device was overlooked in its setup. All in all, it is a very simple
script, but a very useful one as well. It is definitely recommended, as the time it
takes for installation and execution is negligible. Tables 2, 3, and 4 summarize the
comparison of IoT vulnerabilities discovered, scan times of vulnerability scanners,
and additional information discovered by vulnerability scanners, respectively.
5 Conclusion
In general, most of these vulnerability scanners have a way to go before being fully
effective with IoT devices. The general framework and functionality exist, but the
IoT specifics still leave more to be desired. The open-source scanners are all very robust
and work well for their intended target hardware; the functionality just needs to be
extended. With IoT devices, it is also important to have minimal direct interaction
with the target devices themselves. Not all IoT devices can be accessed through the
OS with services such as SSH; some are much more locked down and only allow
interaction through their custom-built services such as phone applications. When using
a scanner for this class of hardware, it is important to ensure that nothing such as
an SSH connection or any configuration directly on the device is
required.
Most of the scanners used in these tests leverage vulnerability databases on which
to base their discoveries. These databases are precompiled with discovered
vulnerabilities and their corresponding information. They need to be updated to
include IoT devices, or a separate database focused on IoT needs to be created. There
are a few ongoing initiatives to create such a database, but they are not fully developed
yet. For example, the Warren B. Nelms Institute at the University of Florida is in
the process of building an IoT-specific security vulnerability database called IoT-
SVC [17]. It is a great start, but more work still needs to be done to make it a reliable
source. The last measure that needs to be taken with IoT scanners concerns the different
wireless protocols. The Snout scanner implements this and is a good start, but still
leaves a lot to be desired; it is simply not robust or developed enough to
be effective at vulnerability discovery. This functionality of scanning other wireless
protocols needs to be expanded and combined with the vulnerability discovery of
other scanners like OpenVAS or Vuls.
IoT devices are in use across practically all industries and environments, with the
total number of devices worldwide being upwards of 5 billion. Vulnerabilities present
on these devices can make them easy targets for attackers. Current vulnerability
scanners can be used with IoT devices, but the effectiveness is not consistent across
different device types and manufacturers. Expanding the functionality of open-source
scanners or having an IoT-specific scanner will greatly improve the security of these
devices and the convenience of making them as secure as possible.
References
1. Goasduff L (2021) Gartner Says 5.8 Billion enterprise and automotive IoT endpoints will be in
use in 2020. https://www.gartner.com/en/newsroom/press-releases/2019-08-29-gartner-says-
5-8-billion-enterprise-and-automotive-io. Accessed 8 June 2021
2. Mahmoud R, Yousuf T, Aloul F, Zualkernan I (2015) Internet of things (IoT) security: current
status, challenges and prospective measures. 2015 10th International conference for internet
technology and secured transactions (ICITST). IEEE, New York, pp 336–341
3. Deogirikar J, Vidhate A (2017) Security attacks in IoT: a survey. In: 2017 International con-
ference on I-SMAC (IoT in social, mobile, analytics and cloud) (I-SMAC). IEEE, New York,
pp 32–37
4. Chalvatzis I, Karras DA, Papademetriou RC (2019) Evaluation of security vulnerability scan-
ners for small and medium enterprises business networks resilience towards risk assessment.
In: 2019 IEEE international conference on artificial intelligence and computer applications
(ICAICA). IEEE, New York, pp 52–58
5. Amro A (2020) IoT vulnerability scanning: a state of the art. Comput Security, pp 84–99
6. Markowsky L, Markowsky G (2015) Scanning for vulnerable devices in the internet of things.
2015 IEEE 8th International conference on intelligent data acquisition and advanced computing
systems: technology and applications (IDAACS), vol 1. IEEE, New York, pp 463–467
7. Hassija V, Chamola V, Saxena V, Jain D, Goyal P, Sikdar B (2019) A survey on IoT security:
application areas, security threats, and solution architectures. IEEE Access 7:82721–82743
8. Anand P, Singh Y, Selwal A, Alazab M, Tanwar S, Kumar N (2020) IoT vulnerability assess-
ment for sustainable computing: threats, current solutions, and open challenges. IEEE Access
8:168825–168853
9. Corp F (2021) Vuls. https://github.com/future-architect/vuls. Accessed 8 June 2021
10. Rahalkar S (2019) Openvas. Quick start guide to penetration testing. Springer, Berlin, pp 47–71
11. Mikulskis J, Becker JK, Gvozdenovic S, Starobinski D (2019) Snout: an extensible IoT pen-
testing tool. In: Proceedings of the 2019 ACM SIGSAC conference on computer and commu-
nications security, pp 2529–2531
12. Vulscan (2021) https://github.com/scipag/vulscan. Accessed 8 June 2021
13. Rapid7 (2017) IoTSeeker: locate connected IoT devices and check for default passwords.
https://information.rapid7.com/iotseeker.html. Accessed 8 June 2021
14. Bugeja J, Jönsson D, Jacobsson A (2018) An investigation of vulnerabilities in smart connected
cameras. 2018 IEEE international conference on pervasive computing and communications
workshops (PerCom workshops). IEEE, New York, pp 537–542
15. Yang H, Lee W, Lee H (2018) IoT smart home adoption: the importance of proper level
automation. J Sensors 2018
16. Singh KJ, Kapoor DS (2017) Create your own internet of things: a survey of IoT platforms.
IEEE Consumer Electron Mag 6(2):57–68
17. Jin Y (2018) IoT/CPS security vulnerability database. https://iot.institute.ufl.edu/academics/
iot-cps-security-vulnerability-database/. Accessed 9 June 2021
Emotion and Collaborative-Based Music
Recommendation System
Abstract Music plays a vital role in the lives of many people, and they consider it
a part of their life. Whenever people are happy, sad or emotional, they prefer to relax
their minds by listening to music. To get songs of their own interest, users keep searching
for them in search engines. Looking into the history of searching, the complexity of
search has gradually decreased, owing to advances in technology and the various
methods adopted for searching. In this paper, we concentrate on suggesting
appropriate songs for the users based on their feelings (or mood) known as the music
recommendation system. The objective of the paper is to find a suitable method for
providing recommendations based on similar users' access to music and on history.
Here, we consider different methods for implementation, such as cosine similarity,
collaborative filtering, and popularity- and emotion-based methods, along with many
parameters such as singer, song name, genre and movie, which help in finding
the proper song. We also analyze their performance. The advantage of
the music recommendation system is that it saves the user from searching manually.
It not only saves searching time, but also surfaces similar new songs, if any.
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 801
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_59
802 R. Aparna et al.
for the development of such a system which, along with the main functionality,
constantly keeps track of the user's interest.
1.1 Objectives
Music is found to have an indirect effect on the listeners’ mood and makes them
active and energetic. It is also important to note that music can be used to cure
health issues in human beings like psychiatric disorders, substance abuse issues,
sensory impairments, physical disabilities, communication disorders, developmental
disabilities, interpersonal problems, aging, etc. Using music to enhance or maintain
health is known as music therapy.
A recommender system is a subclass of the information filtering system which
usually predicts the “rating” or “preference” a user would give to an item. We
find the application of recommender systems in different areas, such as product
recommenders in online stores, playlist generators in the video and music services,
recommendation based on content and context for social media platforms, and also
the open-web content recommenders [2–4]. These systems can be made to operate
across various domains such as books, news and search queries. We also know of
popular recommender systems for specific topics such as hotels and restaurants.
Recommender systems are also developed for exploring research articles and experts,
collaborators, and financial services. The main objective of developing such
a recommendation system is to boost user experience and attract more users.
The usage of highly efficient and more accurate recommendation techniques plays an
important role in a system to provide good and useful recommendations to the users of
the system [5–7]. This helps the developers to understand the importance of features
and potentials of different recommendation techniques. There are three different
types of recommendation techniques, namely content-based filtering, collaborative
filtering and hybrid filtering.
Fig. 1 Content-based recommendation system
Fig. 2 Collaborative filtering
approach is to first implement one technique and then add content-based capabilities
to a collaborative-based approach (or vice versa); another method is to unify the approaches into
one model. The hybrid approach is more accurate than pure
collaborative- and content-based methods. Netflix is one good example of a hybrid
recommendation system. This website considers the watching and searching
patterns of similar users (i.e., collaborative filtering) and also
offers movies which share the characteristics of films that the user rated highly
(i.e., content-based filtering).
2 Literature Survey
Adiyansjah et al. have worked on a music recommender system based on genre using
convolutional recurrent neural networks in [9]. Here, they have recommended the
music by comparing the similar features on audio signals. This approach can be
considered as a content-based recommendation system because it recommends based
on the perceptual resemblance of what users have heard previously. Input is prepro-
cessed and fed into a convolutional neural network. They have used convolutional
recurrent neural networks (CRNNs) for extracting features and similarity distance to
find the similarity between features. Receiver operating characteristic and precision–
recall curves are used for evaluation. A drawback of this paper is that it considers few
features for recommending.
In [10], Anand Neil et al. have recommended music based on collaborative filtering
and deep learning. In this paper, they have used collaborative filtering and YOLO
(you only look once) methods for music recommendation. Data is preprocessed using
R and Python. This paper concludes that hybrid recommendation systems yield better
results once the model is trained enough to recognize the labels.
Hu et al. recommend music based on user behavior in [11]. In [12], the authors
have used an ANN model and the KNN regression algorithm to compare different songs
based on similarity. Ranking scores were calculated based on the combination factor
of songs. Here, the loss function decreases with the increase in epochs. A drawback of
this paper is the high prediction complexity for large datasets.
In [13], Prachi Singh et al. have used random forest and XGB classifier for music
recommendation. The accuracy of the random forest algorithm is 0.75 and that of
the XGB classifier is 0.72. A drawback of this paper is that the accuracy of the
recommendation depended on the split between test and training data.
In the paper titled “Multimedia Recommender System using Facial Expression
Recognition,” the author Prateek Sharma [14] considers the human face as an input
which is captured from a webcam. The face is the usual source of expression which
is the key information used in the system to identify the emotion of the user. The
emotions of the user are mapped with the genres of song or movies.
In [15], Schedl et al. have addressed the various current challenges in the music
recommendation system. Drawbacks mentioned in this paper are the cold-start
problem and automatic playlist continuation.
In [16], the authors have used the “Forgetting Curve” to assess the freshness of a song
and evaluate its “favoredness” using user logs. They analyzed the user's listening pattern
to estimate the user's level of interest in the next song. The running time increases
linearly with the size of the song library. A drawback of this paper is
that the dataset used was not fetched from any music server.
By analyzing the work carried out in the above papers and the drawbacks identified,
in this paper we propose and develop a system whose performance is
better than that of currently available systems. Of the recommendation systems so far
developed, most concentrate on recommending movies and products (items),
but very few address music recommendation. In this paper, we develop a
collaborative- and emotion-based music recommendation system by considering
many parameters such as the singer, the name of the song, genre and movie, which
helps in producing accurate results. To our knowledge, the emotion-based
recommendation system is rarely addressed in the literature. It is an innovative way
to recommend music, which takes the real-time emotion of the user as input and thus
benefits the user with a dynamic experience. It is also an emerging trend in music
recommendation. In the existing methods, the emotions considered were few, so the
genres of songs that could be considered were limited. In our paper, we propose a
work that includes more emotions; hence, a wider variety of genres can be considered.
The existing methods usually require additional hardware such as EEG devices or
sensors for emotion recognition. In our proposed system, however, we use a CNN model
that takes an image as input, captured using the webcam.
In our paper, we have implemented four methods: cosine similarity,
popularity-based, collaborative filtering and emotion-based music recommendation.
The popularity-based method is used to display the top recommended songs from
the playlist, the emotion-based method recommends songs based on the user's
real-time expressions, the cosine similarity method considers multiple features
from the dataset and recommends songs, and the collaborative method recommends
songs to the user based on both the user-similarity and item-similarity approaches.
3 System Design
it is out of interest. With the Euclidean distance method, if a song
is compared with the same song repeated n times, the method indicates that the songs
are not similar. The same scenario under cosine similarity shows that both songs are
the same. The reason is that the word counts will be n times those of the given song,
so when the vectors are plotted, the angle between them is 0: both vectors point
in the same direction but have different magnitudes.
Cosine similarity can be used by considering multiple features of the song which
makes it unique from the other methods. As defined earlier, it is based on user data
and finds similar songs using the word-count matrix. As noted, this method is
advantageous over the Euclidean method and gives precise
output.
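The contrast described above can be verified with a small sketch: a song's word-count vector scaled by n = 3 has cosine similarity 1 with the original, while its Euclidean distance from the original is large. The three-element count vectors are toy data.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

song = [3, 1, 2]      # toy word counts for a song's metadata
repeated = [9, 3, 6]  # the same song counted n = 3 times

# Same direction, different magnitude: cosine similarity says identical
# (approximately 1.0), Euclidean distance says far apart.
```

This is exactly the behaviour the text describes: angle zero, magnitudes different.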
Figure 5 shows the output of the method in the user interface. As we can see,
“Majhe maher pandhari” is classical music. The system has recommended similar-genre
songs, the majority with the same singer. If the song is not present, it will display “No
Such Song” on the front-end.
The content-based filtering approach predicts what a user might like based on the
content previously listened to or streamed by that user. In collaborative
filtering, the system predicts what a particular user might like while also taking into
consideration the tastes and likes of similar users. Collaborative-based
filtering alone has three different ways of approaching the problem. The
approaches are model-based approach, neighbor-based approach and hybrid models
which combine the implementation of both the neighborhood-based and the
model-based approaches. Here, we have focused on the naïve popularity-based approach
of predicting songs. We then combine this with the item-similarity-based
personalized recommendation system. This is also called memory-based filtering,
which consists of two main methods, namely
(i) User-Item Filtering: predicts the songs listened to by users similar to
you.
(ii) Item-Item Filtering: predicts the items which you and other users also
liked.
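These two memory-based ideas can be illustrated with toy listening histories. The sketch below implements the item-item variant: a candidate song is scored by how many users listened to both it and the seed song. The user IDs and song names are illustrative.

```python
# Toy memory-based filtering over listening histories.
histories = {
    "u1": {"songA", "songB", "songC"},
    "u2": {"songA", "songB"},
    "u3": {"songB", "songD"},
}

def item_item_recommend(seed, histories):
    """Rank songs by how often they co-occur with `seed` in users' histories."""
    scores = {}
    for songs in histories.values():
        if seed in songs:
            for other in songs - {seed}:
                scores[other] = scores.get(other, 0) + 1
    # Highest co-listen count first.
    return sorted(scores, key=scores.get, reverse=True)
```

For the seed "songA", "songB" (co-listened by two users) ranks above "songC" (one user); user-item filtering would instead start from the set of users most similar to the target user.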
The K-nearest neighbor (KNN) algorithm is commonly used for collaborative
filtering because it is considered the standard method for both the user-based and
the item-based approach. KNN is a non-parametric, supervised method
used in regression and classification, and an example of a lazy-learner
algorithm. It is based on feature similarity: it assumes that similar
items, in this case songs, are located near each other. The selection of the k value is
very important in the KNN algorithm.
Whenever the KNN algorithm is used to recommend similar songs to users, the
algorithm calculates the distance between the input and the other songs
in the dataset. It then sorts the distances in ascending order and returns the top
k nearest-neighbor songs, which can be considered song recommendations for the
user. We use the NearestNeighbors method from scikit-learn. This method takes
several parameters, such as metric, algorithm and n_neighbors, and the steps are as shown
in Fig. 6.
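The distance-sort-return-top-k steps just described can be sketched in plain Python. This is a dependency-free stand-in for scikit-learn's NearestNeighbors; the two-feature catalogue and the song names are illustrative.

```python
import math

def knn_recommend(query, catalog, k=2):
    """Return the k songs whose feature vectors are nearest to `query`.

    catalog maps song name -> feature vector. Distances are computed,
    sorted in ascending order, and the top k neighbours returned,
    mirroring the steps described in the text.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    ranked = sorted(catalog, key=lambda name: dist(query, catalog[name]))
    return ranked[:k]

catalog = {
    "slow ballad":  [0.1, 0.9],
    "power ballad": [0.2, 0.8],
    "club track":   [0.9, 0.1],
}
```

A query near the ballads returns the two ballads before the club track; with scikit-learn, `NearestNeighbors(metric=..., algorithm=..., n_neighbors=k)` plus `kneighbors` performs the same job at scale.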
Collaborative filtering is based on the historical preference of the user on a set of
songs. We know the preference of the user by rating. Rating can be calculated both
implicitly and explicitly. Explicit rating means asking users to rate the song.
Implicit rating means checking whether the user has listened to the song or
not; it can be treated as a listening count. After finding the ratings,
we generate an interaction matrix. The interaction matrix has entries consisting
of user–song pairs and values representing the rating of the song.
The interaction matrix is huge, and most of its values are missing because most
of the songs are not rated by any given user.
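Building such a matrix from implicit listening counts can be sketched as follows. The play counts are toy data, and a dict-of-dicts stands in for the SciPy sparse matrix used in the actual implementation; the threshold of 16 matches the filtering applied in this section.

```python
# Build a sparse interaction matrix from implicit ratings (listening counts).
# Most user-song pairs never occur, so only observed entries are stored.

plays = [
    ("u1", "songA", 20),
    ("u1", "songB", 3),
    ("u2", "songA", 18),
]

def interaction_matrix(plays, min_count=16):
    """Keep only user-song pairs whose listening count reaches min_count."""
    matrix = {}
    for user, song, count in plays:
        if count >= min_count:
            matrix.setdefault(song, {})[user] = count
    return matrix

m = interaction_matrix(plays)  # songB's count of 3 is filtered out
```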
We use a dataset that is uploaded to the cloud. As the interaction matrix is
very sparse, dealing with the empty values wastes resources and memory, so we
consider only the songs that have a listening count greater than or equal to
16, and we use a SciPy sparse matrix via the csr_matrix function
from the scipy.sparse library. We reshape the data based on unique
values, with song_id as index and user_id as columns, resulting in a dataframe. Then
we use the pivot function to convert the dataframe into a pivot table, which
is then converted to a sparse matrix. The sparse matrix is used to fit the model. This
fitted model can be used to recommend songs. We use the fuzzy_matching function to
match the string of a new song against all the songs present in the dataset. It uses the
Levenshtein distance to match the strings.
We take the input from the user and recommend the best songs that are similar to
the song entered by the user; a sample output is shown in Fig. 7.
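The Levenshtein-based matching can be sketched as below. The `fuzzy_match` helper and its distance cutoff are illustrative stand-ins for the fuzzy_matching function described above; the edit distance itself is the classic dynamic-programming formulation.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def fuzzy_match(query, titles, max_distance=3):
    """Return the catalogue title closest to the user's (possibly misspelt) input."""
    best = min(titles, key=lambda t: levenshtein(query.lower(), t.lower()))
    return best if levenshtein(query.lower(), best.lower()) <= max_distance else None
```

A query such as "majhe mahr pandhari" (one missing letter) still resolves to the catalogue title, while gibberish beyond the cutoff returns nothing.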
From the biological point of view, facial expressions arise from the relative position
or movement of the muscles that lie under the skin of the human face. According to
some controversial theories, these also convey the emotional state of
the individual at a given instant of time. They are considered controversial because one
can easily fake expressions. Figure 9 represents the flow chart of the emotion-based
music recommendation system.
In this method, an image of the user is captured using the webcam and
converted to grayscale. We mark the face with a
rectangular frame. Here, we consider the human face as the region of interest (ROI), as
it is the primary source where the emotion of the user is visible. From this
ROI, we predict the probability of each emotion class. The emotion
with the maximum probability is identified and considered
as the expression of the human in the image. We use the CNN model for training
the emotions of the user; this is done using thousands of images. Once the emotion
is recognized, we search for the songs that relate to the identified emotion by
mapping the emotion of the user to the genre of the
song. Table 1 shows the mapping of the genre of
song to the emotion of the user. For example, if the emotion of the user in the image
is found to be happy, then romantic, funny and comedy songs are used to represent
the user emotion and songs belonging to those genres are recommended to the user.
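This emotion-to-genre lookup can be sketched as below. The "happy" entry follows the example just given; the other entries and the song list are illustrative, not the full contents of Table 1.

```python
# Mapping of detected emotion to song genres, in the spirit of Table 1.
# Only the "happy" entry comes from the text; the rest are placeholders.

EMOTION_TO_GENRES = {
    "happy": ["romantic", "funny", "comedy"],
    "sad": ["melody", "classical"],
    "angry": ["rock"],
}

def recommend_by_emotion(emotion, songs, n=5):
    """Pick up to n songs whose genre matches the detected emotion."""
    genres = EMOTION_TO_GENRES.get(emotion, [])
    matches = [s["title"] for s in songs if s["genre"] in genres]
    return matches[:n]

songs = [
    {"title": "t1", "genre": "comedy"},
    {"title": "t2", "genre": "rock"},
]
```

An empty result here corresponds to the case described below, where fewer suitable songs exist than the user requested and a message is shown instead.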
The song data is present in the dataset as a comma-separated values (.csv)
file. This file is stored in a bucket, which is the container used to store data in the
IBM cloud. The user can specify how many songs he/she would like to
have recommended, and that many songs are displayed. If the number of songs
suitable for recommendation is less than the number expected by the user,
the user is informed by displaying a message.
To identify the human face, Haar Cascade is used, an object detection
algorithm that identifies faces in an image or a real-time video.
This algorithm usually uses edge or line detection features. It is provided with a lot of
images consisting of faces (considered positive images) and a lot of images
not consisting of any face (negative images) to train the model. As
the number of training images increases, the accuracy of the method also increases. The
repository has XML files where the models are stored, and these are read using
the OpenCV methods. These include models for detection of the human face, eyes,
upper body, lower body, etc.
To predict the emotion of the user, we use the convolutional neural network (CNN)
model for training the emotions. There are three kinds of layers:
the convolutional layer, the pooling layer, and the fully connected (FC) layer. All these
layers are brought together and combined to form the CNN architecture. In addition
to these three layers, there are two more vital components: the dropout layer
and the activation function.
Convolutional Layer: It is the very first layer used for extracting the numerous
features from the given input images. These convolutional layers perform the
mathematical operation of convolution between the given input images and filters of
a specific size N × N. After sliding the filters over the input images, the scalar product
(dot product) is calculated between the parts of the input images and the filters with
respect to the size of the filter (N × N). The output thus obtained is termed the
feature map. It gives us information regarding the corners and edges of the image.
Later on, the obtained feature map is given as input to the other layers for the purpose
of learning about the various different features of the image which is the input.
Pooling Layer: In the CNN model, the pooling layer usually follows the convo-
lutional layer. This pooling layer performs the function of reducing the size of
convolved feature maps so that the computational costs are minimized. This is done by
reducing the number of connections present between the layers. It operates
independently on each feature map. There exist many types of pooling operations,
depending on the methodology used.
One of the pooling operations is Max Pooling. In this type of pooling operation,
the maximum element is obtained from the feature maps. In average pooling, we
calculate the average of the elements in a predefined size image section. In the sum
pooling function, the summation of the elements that are present in the predefined
section is calculated. The pooling layer acts like a connecting bridge among the fully
connected layer (FC) and the convolutional layer.
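Max pooling, for instance, can be demonstrated in a few lines: a 2 × 2 window with stride 2 slides over a toy 4 × 4 feature map and keeps the strongest activation in each window.

```python
# Plain-Python 2x2 max pooling with stride 2 over a 4x4 feature map.

def max_pool_2x2(fmap):
    pooled = []
    for i in range(0, len(fmap), 2):
        row = []
        for j in range(0, len(fmap[0]), 2):
            window = (fmap[i][j], fmap[i][j + 1],
                      fmap[i + 1][j], fmap[i + 1][j + 1])
            row.append(max(window))  # keep the strongest activation
        pooled.append(row)
    return pooled

fmap = [[1, 3, 2, 0],
        [4, 2, 1, 1],
        [0, 1, 5, 6],
        [2, 2, 7, 8]]
```

The 4 × 4 map shrinks to 2 × 2, which is exactly the connection-count reduction described above; average or sum pooling would replace `max` accordingly.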
FC layer: It comprises the weights and biases in addition to the neurons of the
CNN. It is also used for establishing the connection between two non-identical
layers. Usually, the FC layers are placed just before the output layer, forming the
last few layers of the CNN model's architecture.
We flatten the input from the previous layers and feed it into the
FC layer. This flattened vector then passes through some more FC layers in which
the mathematical operations occur. The classification of the image takes place at this
stage.
Dropout: When every feature is connected to the fully connected layer, it can result
in over-fitting of the training dataset. Over-fitting occurs when a model works very
well on the training data, which causes an adverse effect on the performance of the
model when it is used on new data.
To resolve this issue, we use dropout layers. Here, a few neurons are dropped from the neural network (NN) during training, reducing the effective size of the model. With a dropout value of 0.2, the network randomly removes 20% of its nodes.
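The effect of a dropout layer can be sketched in plain NumPy; this is an illustrative "inverted dropout" (rescaling the kept units), not necessarily the exact variant used in the paper's model:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, rate=0.2, training=True):
    """Randomly zero out roughly `rate` of the units during training."""
    if not training:
        return activations                        # dropout is disabled at inference
    mask = rng.random(activations.shape) >= rate  # keep ~80% of units for rate=0.2
    return activations * mask / (1.0 - rate)      # rescale so the expected sum is unchanged

x = np.ones(10)
print(dropout(x, rate=0.2))  # roughly 80% of entries become 1.25, the rest 0.0
```

At inference time the layer is a no-op, which matches how frameworks such as Keras apply dropout only while training.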
Activation Function: The activation function is a vital parameter of the CNN model. Activation functions are used to learn and approximate all kinds of continuous and complex relationships between the variables of the network. The activation function decides which information is fired forward through the network and which is suppressed, thereby adding non-linearity to the CNN. Commonly used activation functions include ReLU, softmax, tanh and sigmoid, each with a specific use: for a binary classification CNN model, the sigmoid and softmax functions are used, while for multi-class classification softmax is usually preferred.
We used Adam as the optimizer. Adam is an adaptive learning-rate optimization algorithm designed for training deep neural networks; it leverages adaptive learning-rate methods to compute an individual learning rate for each parameter.
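A minimal sketch of the Adam update rule underlying the optimizer (default hyperparameters assumed; a toy quadratic loss stands in for the real network):

```python
import numpy as np

def adam_step(w, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: per-parameter step sizes from running moment estimates."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad       # 1st moment (mean)
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2  # 2nd moment (variance)
    m_hat = state["m"] / (1 - beta1 ** state["t"])             # bias correction
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)

w = np.array([1.0, -2.0])
state = {"t": 0, "m": np.zeros(2), "v": np.zeros(2)}
for _ in range(100):
    grad = 2 * w                 # gradient of the toy loss ||w||^2
    w = adam_step(w, grad, state)
print(w)  # both entries move toward 0
```

Because the step is normalized by the second-moment estimate, each parameter effectively gets its own learning rate, which is the property the paragraph above refers to.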
An epoch is “one pass over the entire dataset.” Epochs can be used to separate training into well-defined phases, which is helpful for logging and periodic evaluation, although the cutoff is arbitrary. When validation_data or validation_split is used with the Keras fit_generator() method, evaluation is performed at the end of each epoch. The Keras library also supports callbacks specifically designed to run at the end of an epoch, such as model checkpointing and learning-rate changes. At every epoch, the loss and accuracy on the testing and validation data are monitored, and training stops when the loss starts to increase or the accuracy starts to decrease. If the number of epochs is too large, the model overfits the training dataset; if there are too few epochs, it underfits. A related method, early stopping, allows us to specify an arbitrarily large number of training epochs and to stop training once the model's performance stops improving, or its accuracy starts to decrease, on a validation dataset.
In our case, training stops at the 14th epoch: the loss increases from 1.0726 to 1.0975 and the accuracy decreases from 0.6175 to 0.6054 between the 13th and 14th epochs. Once training stops, the model restores the weights from the best epoch; this is called early stopping. This can be seen in Fig. 10.
Fig. 10 Loss and accuracy value in CNN model at 13th and 14th epoch
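The early-stopping logic described above can be sketched in plain Python; the last two loss values are the ones reported for the 13th and 14th epochs, while the earlier values are made up for illustration:

```python
def train_with_early_stopping(epoch_losses, patience=1):
    """Stop when validation loss stops improving; keep the best epoch.
    `epoch_losses` stands in for one validation evaluation per epoch."""
    best_loss, best_epoch, waited = float("inf"), 0, 0
    for epoch, loss in enumerate(epoch_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0  # checkpoint "weights" here
        else:
            waited += 1
            if waited >= patience:
                break  # loss increased: stop and restore the best checkpoint
    return best_epoch, best_loss

# Loss rises from 1.0726 to 1.0975 between two consecutive epochs, as in the paper,
# so training halts and the earlier epoch's checkpoint is kept.
print(train_with_early_stopping([1.31, 1.18, 1.0726, 1.0975]))  # (3, 1.0726)
```

Keras provides this behavior out of the box via the EarlyStopping callback with restore_best_weights enabled.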
Emotion and Collaborative-Based Music Recommendation System 817
Figure 11 is the representation of the CNN model that is used for the classification
of images based on the expression of the human in the image captured by the webcam.
Thousands of images are used for training purposes. The images are classified into
different classes based on the emotion of the human in the image. The images in
each class are again classified as training and validation data. The large amount of
data obtained from this model is stored in hierarchical data format 5 (HDF5) file. It
is an open-source file that stores data in a hierarchical structure within a single file.
The user decides how many songs he/she would like to be recommended and gives this number as input. The recommender system then displays the songs suitable for the user's particular emotion. If the user requests more songs than are available for that particular expression, a message is displayed to convey this to the user. Figure 12 illustrates this case.
In cosine similarity, when we observe the output song of interest and the recom-
mended songs, then they usually have the same words. It can be words in the song
name, movie name, genre or singer name. This shows that the recommendation is up
to the mark. The cosine-similarity results were obtained for some basic input songs, for which we manually checked whether the recommended songs are related.
Ten songs were considered for calculating the efficiency, covering scenarios ranging from mostly related songs to only a few related songs. Table 2 lists the songs considered. From the table, one can observe that the song “Bhagyada Laxmi Baramma” has the most similar songs; its line in the dataset is “Bhagyada Laxmi Baramma, Pt.Bhimsen Joshi, kannada, “Classical, Bhajan, Hindustani”, Nodi Swami Navirode Heege.”
Many songs share common words such as Pt.Bhimsen Joshi, Kannada, “Classical, Bhajan, Hindustani.” The cosine-similarity function creates a matrix holding the counts (one or more) of these words; hence, those songs are recommended.
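A minimal sketch of this word-count cosine similarity, using fragments of the dataset lines quoted above (the tokenization is simplified and the actual implementation may differ):

```python
import numpy as np

def word_count_vector(text, vocab):
    """Count occurrences of each vocabulary term in the text."""
    words = text.lower().split()
    return np.array([words.count(term) for term in vocab], dtype=float)

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

songs = [
    "Bhagyada Laxmi Baramma Pt.Bhimsen Joshi kannada Classical Bhajan Hindustani",
    "Nodi Swami Navirode Heege Pt.Bhimsen Joshi kannada Classical Bhajan Hindustani",
    "Yaakinge All ok Kannada Rap",
]
vocab = sorted({w for s in songs for w in s.lower().split()})
vectors = [word_count_vector(s, vocab) for s in songs]

# Songs sharing words such as "kannada", "classical", "bhajan" score high.
print(round(cosine_similarity(vectors[0], vectors[1]), 2))
print(round(cosine_similarity(vectors[0], vectors[2]), 2))
```

The two classical songs sharing six words score far higher than the pair sharing only “kannada”, which is exactly why songs with common metadata words get recommended.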
Similarly, the song “Yaakinge” has the fewest similar songs. Its entry in the dataset is “Yaakinge, All ok, Kannada, Rap.” Glancing at the dataset, we observe that there are only a few rap songs, so whatever Kannada rap songs exist are recommended first, followed by other Kannada songs. This observation shows that the dataset plays a predominant role in the recommendation system. When the data size is increased, the recommendation system degrades, as it takes more time for analysis.
The accuracy of the cosine similarity was calculated by taking the ratio of the number of similar songs to the total number of songs, expressed as a percentage, and then averaging this accuracy over the ten songs. The accuracy of cosine similarity for the considered songs is 80.67%.
In Fig. 13, we have considered the index of the song in the table on the x-axis and
the accuracy of the particular song on the y-axis. This provides a visualization of the
accuracy of the cosine similarity.
TP refers to true positives, which denote recommended songs that are near the input song. FP refers to false positives, which denote recommended songs that are far from the input song in terms of Euclidean distance. FN refers to false negatives, which denote that the song recommended to the user is erroneous. The precision–recall curve for the collaborative-based approach is depicted in Fig. 14.
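From these counts, precision and recall follow directly; the counts below are hypothetical, not the paper's results:

```python
def precision_recall_f1(tp, fp, fn):
    """Precision = TP/(TP+FP), recall = TP/(TP+FN), F1 = their harmonic mean."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 8 recommendations close to the input song, 2 far away,
# and 2 erroneous/missed recommendations.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
print(round(p, 2), round(r, 2), round(f1, 2))  # 0.8 0.8 0.8
```

Sweeping a similarity threshold and recomputing these values at each point is what produces a precision–recall curve like the one in Fig. 14.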
The accuracy of CNN model for emotion recognition is calculated using the
fit_generator() of Keras neural network library. In this fit_generator(), we first
initialize the number of epochs that we are going to train our network along with the
batch size. As real-world datasets are usually too large to fit into memory, training is challenging and requires data augmentation to avoid overfitting and to improve the model's ability to generalize. In data augmentation, a new training dataset is artificially created from the previously existing one, improving the performance of deep learning neural networks along with the amount of data available. We use the Keras ImageDataGenerator object to apply data augmentation to the images, randomly translating, resizing, rotating them, and so on. Every new batch of data is randomly adjusted according to the parameters supplied to ImageDataGenerator. Once the maximum accuracy is obtained, we
consider that epoch as the best epoch and restore the model weights. Our model has
training accuracy of 72.34% and validation accuracy of 60.54%. This is graphically
represented in Fig. 15.
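The kind of per-batch random adjustment ImageDataGenerator performs can be sketched with plain NumPy (flip and shift only; the real generator also resizes, rotates, rescales, etc.):

```python
import numpy as np

rng = np.random.default_rng(42)

def augment(image, max_shift=2):
    """Return a randomly transformed copy of an image: a possible horizontal
    flip plus a small pixel shift, the kind of perturbation ImageDataGenerator
    applies to each batch."""
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1]                 # random horizontal flip
    shift = rng.integers(-max_shift, max_shift + 1)
    out = np.roll(out, shift, axis=1)      # random horizontal translation
    return out

image = np.arange(16).reshape(4, 4)        # a tiny stand-in "image"
batch = np.stack([augment(image) for _ in range(8)])  # a fresh batch each epoch
print(batch.shape)  # (8, 4, 4)
```

Because each epoch sees freshly perturbed copies rather than the same pixels, the network cannot memorize the training images as easily, which is the overfitting defense the paragraph describes.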
The accuracy of recommending the song is analyzed on the basis of the number
of songs present in the actual dataset which belongs to a particular genre and the
number of songs recommended to the user. This is represented in Table 3. Thus, the
overall accuracy of recommending the songs is the average accuracy of songs of all
emotions, and that is found to be 98%.
Fig. 15 Graphical
representation of accuracy of
the model
Table 3 Accuracy of recommending songs of different genres

Emotion    Expected number of songs    Actual number of songs    Accuracy (%)
Sad        379                         346                       91.3
Happy      750                         750                       100
Surprise   1300                        1283                      98.7
Neutral    95                          95                        100
Anger      4                           4                         100
Music is woven into many of our lives in such a way that it is no longer considered an extra activity; we listen to it as part of our daily routine and enjoy it. In most cases, however, we find it difficult to discover the most appropriate songs. We are not very open to new songs that eventually become famous, and a user might miss many songs simply because he/she is not updated about new releases. The user keeps listening to the same set of songs or must search for songs on his/her own. To help curb this issue, there is nowadays a rise in the usage of recommendation techniques and systems.
Here, we have developed an emotion- and collaborative-based music recommendation system and implemented it using the KNN algorithm. We implemented four approaches, namely emotion-based, collaborative-based, cosine-similarity and popularity-based approaches, and analyzed the performance of each. The efficiency of KNN is high, but it is prone to cold-start problems: strong prior data is needed before recommendations can be made. That is why we implemented the emotion-based method as well, and we obtained accurate results from the algorithm. Along with the above two, we also implemented the cosine-similarity and popularity-based approaches to support these two implementations. The recommendation systems implemented have proved to be better than existing techniques.
We have implemented all four approaches in the front-end. There are options for improving the system further, such as adding multi-language support, easy search-based access, improved front-end design for aesthetics, more data in the dataset, display of songs bifurcated by genre, and the addition of media files.
Cricket Commentary Classification
1 Introduction
Let us return to a time before the Internet and modern technology. In those days, viewers relied on radio to learn the outcome of a match or of a ball in an over. Cricket is a fast-paced sport that demands quick reactions. In this case, responses may include the
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 825
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_60
826 A. S. Balaji et al.
conclusion of a match or the outcome of a coin flip, for example. Besides, cricket is a game full of predictions, starting with the toss, team selection, the outcome of a ball in an over, the innings-end score, and the final result of the entire match. It is regarded as one of the areas most rapidly being infused with artificial intelligence. The proposed study focuses mainly on the outcome of a ball in an over, where outcome means whether the ball resulted in a boundary, a wicket, a dot, or runs.
Have you ever tried to infer the outcome of a ball just by hearing the voice commentary? In the majority of cases, commentators specify what the outcome of a ball was, but there are situations where they do not state the outcome explicitly and only convey it indirectly in their voice. In such scenarios, it is very difficult for an ordinary listener to interpret the outcomes of the balls in an over, and even viewers who watch the match can be somewhat sluggish in responding. We proposed this study mainly to address this issue. In addition, we found that updates to the game databases (storage for scores, commentary, and ball-by-ball outcomes) were happening manually. Manual updates generally consume some time; we were trying to eliminate that delay as well.
Nowadays, people are highly mobile, moving from place to place easily with the help of modern transportation. Here comes the problem with mobility: while on the move, the signal strength is often too weak to handle video streaming, making it difficult for users to watch the match as video. The alternative is score- or commentary-delivery websites such as Cricbuzz or Dream11. We observed that these platforms generally perform manual updates to their game databases; using our study, we were trying to eliminate these manual updates through automation, with machine learning as one of the artificial intelligence techniques used to create it.
To make this automation happen, we need the commentary data (the voice delivered by a commentator during a match about a ball in an over), which acts as the input to our project. From the commentary data, we predict the outcome of a ball in an over using machine learning and natural language processing. Once the outcome is obtained, it is transferred to the game database, which makes the required updates automatically, completely eliminating human involvement.
2 Literature Survey
Simonyan et al. [1, 2] provide access to various datasets through a multi-task learning approach and, for recognizing human sports movements and gestures, train a ConvNet on multi-frame optical flow.
In [1, 3], Karpathy et al. trained a convolutional neural network together with a recurrent neural network for large-scale classification of images and their descriptions.
Geng et al. [1, 4] pretrained their convolutional neural network model using an autoencoder and then recognized human sports actions using an SVM classifier model.
Karpathy et al. [1, 5] integrate CNNs into fusion architectures to classify videos into specific kinds of sports; the architectures in [1, 5] are a large source of motivation for our project. Donahue et al. [1, 6] proposed a long-term recurrent convolutional network trained for image and video description; using novel architectures and data-augmentation methods, action recognition was the primary task performed on video attributes. For human action recognition in videos, Ji et al. [1, 7] proposed using 3D ConvNets. Tyagi et al. [8] used machine learning algorithms to predict the duration of a match in terms of the number of balls expected to be delivered; in this study [8], their prediction was based on historical data.
Amin et al. [9] suggested a new method for cricket team selection using data envelopment analysis (DEA). They proposed a DEA formulation for evaluating cricket players based on various outputs; the evaluation ranks the players by their DEA scores.
Kumar et al. [10] predicted the outcome of a cricket match ball by ball from videos with the help of convolutional neural networks and long short-term memory networks.
Rahman et al. [11] utilized a preliminary CNN architecture and transfer-learning models to classify the outcome of a ball based on the bowler's grip. They believed that if the bowler's grip on the ball is good, the probability of a mishit by the batsman is high, and vice versa.
Singh et al. [12] used a linear regression model to predict the first- and second-innings scores in a match based on attributes such as player performance, current run rate, and venue. They used naïve Bayes classification to predict the outcome of the match.
Subramaniyaswamy et al. [13] proposed a system called iSCoReS to formulate and provide relevant data about a player to the commentator during a match. Its main aim was to increase the efficiency of commentary delivery to end users.
Kaluarachchi et al. [14] developed a software tool called CricAI that outputs the probability of victory in an ODI cricket match using input factors such as toss advantage, player strength, and home advantage.
Semwal et al. [15] utilized a deep convolutional neural network (DCNN) to classify different types of bad shots played in cricket; the approach classifies the bad shots played by the players during the match from videos.
3 Proposed System
The proposed approach consists of several steps. As stated earlier, the input to the proposed classifier model is the commentary delivered by the commentator during the game. The methodology begins with collecting data to train the model; the entire process is illustrated in Fig. 1.
Machine learning projects require a dataset for training and the development of a model for prediction or categorization. Accordingly, our model requires a prebuilt dataset as a primary functional need: we cannot train or create a model without one, and the dataset must cover all the class labels in a well-balanced manner.
Sentiment analysis is central here. To do sentiment analysis well, we need data covering all situations, emotions, and so on. For this purpose, we collected data from four different score-delivery platforms, such as Dream11, ESPN Sports, and Cricbuzz. The dataset includes around 20,000 records and covers all the emotional sentiment for each particular outcome in cricket.
A detailed illustration of the collection of training data is shown in Fig. 2. We use Microsoft Excel to store the records. All records were collected manually, with some of the work done by a tool that makes heavy use of regular expressions to collect training data for the machine learning model. Each record is a table row with three columns: Id, Commentary Text, and Class Label. The dataset follows a specific schema, like a relational database: the column “Id” takes only numbers, while “Commentary Text” and “Class Label” take only text.
The process of getting the data ready for a machine learning model consists of 3 steps.
1. Data Selection: the process of selecting the relevant data. This project chooses only the commentary data; nothing apart from the commentary is needed, such as the number of runs in an over or information about the number of overs in a match.
2. Data Preprocessing: the most important step in dataset preparation. Here we remove all the unnecessary data; for our dataset, for example, we removed stopwords such as “the,” “a,” and “an.” In other words, data that is not suitable for prediction is removed in this step.
3. Data Transformation: this step has its own importance. In some cases, data in one format is not useful for prediction, but the same data becomes useful once it is in another format. For example, our model requires the data in text format rather than audio format.
One important thing to consider here is that the dependent variable must not leak into the input commentary data. For this reason, we use a technique called masking, which hides any mention of the predicted label in the training commentary data. An example of this technique with a sample record is given in Fig. 3, where XXXX may be “four” or “six.”
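A minimal sketch of such masking with a regular expression; the leak-term list here is illustrative, not the authors' actual list:

```python
import re

# Words that would leak the class label (e.g. a boundary) into the commentary text.
LEAK_TERMS = r"\b(four|six|boundary)\b"

def mask(commentary):
    """Hide outcome words in the training text so the model cannot cheat."""
    return re.sub(LEAK_TERMS, "XXXX", commentary, flags=re.IGNORECASE)

print(mask("Driven through the covers for four, what a shot!"))
# Driven through the covers for XXXX, what a shot!
```

Applying this to every record before training forces the model to rely on the descriptive language of the commentary rather than on the label itself.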
The final dataset contains an equal number of records for each class label; its complete composition is given in Table 1. The main reason for building the dataset this way is to avoid making any
Table 1 Composition of records

Class label    Number of records
Dot            5000
Boundary       5000
Wicket         5000
Runs           5000
class label dominant during training. Besides that, it eliminates the need for up- or down-sampling.
Once preprocessing is over, useful features are extracted from the commentary data. Tf–idf is a technique that enables us to find the usefulness of each word in the commentary; it is illustrated below with the example sentence “I LOVE NLP.”
df_NLP = 1 (2)

w_(NLP, j) = 0.43 (usefulness of the word “NLP” in the text “I love NLP”) (7)
As shown in the example above, the word “NLP” in the sentence “I Love NLP” has an importance of 0.43. Likewise, we try to extract the best possible combination of words for predicting the outcome from the commentary data. If w(word, text) is high, we can conclude that the word is very useful for predicting the class label, and vice versa.
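A small tf–idf sketch in plain Python (a smoothed variant; the paper's exact formula, which yields 0.43 for “NLP”, may differ):

```python
import math

def tf_idf(term, doc, corpus):
    """tf-idf weight of `term` in `doc`: term frequency times inverse
    document frequency (smoothed by adding 1 to the idf)."""
    words = doc.lower().split()
    tf = words.count(term.lower()) / len(words)
    df = sum(1 for d in corpus if term.lower() in d.lower().split())
    idf = math.log(len(corpus) / df) + 1
    return tf * idf

corpus = ["i love nlp", "i love cricket", "i like data"]
print(round(tf_idf("nlp", "i love nlp", corpus), 2))  # rare word: higher weight
print(round(tf_idf("i", "i love nlp", corpus), 2))    # word in every doc: lower weight
```

The rare word “nlp” scores higher than the ubiquitous “i”, which is exactly the property used to pick discriminative commentary words.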
Having covered the prerequisites needed to build our cricket commentary classifier model, we now build it using a classification algorithm. The random forest classifier is an ensemble method that internally uses the decision-tree algorithm for classification. Its working is simple to understand: from the training set, a finite number of decision trees is formed, say three for instance. If two of the three trees give the outcome “Dot” and one gives “Boundary,” then the final outcome is “Dot,” because “Dot” is the dominant class label, predicted by the maximum number of decision trees. This working is illustrated in Fig. 4.
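The majority vote at the heart of the forest can be sketched in a couple of lines:

```python
from collections import Counter

def forest_predict(tree_predictions):
    """Majority vote over the labels predicted by the individual decision trees."""
    votes = Counter(tree_predictions)
    return votes.most_common(1)[0][0]  # the dominant class label wins

# Two of three trees say "Dot", one says "Boundary" -> the forest outputs "Dot".
print(forest_predict(["Dot", "Dot", "Boundary"]))  # Dot
```

In practice one would use an off-the-shelf implementation such as scikit-learn's RandomForestClassifier, which also randomizes the data and features each tree sees.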
The data flow diagram for the proposed model is shown in Fig. 5. It provides a complete overview of the project and gives some clarity about how data flows among the modules; the data flow diagram is also used as a basis for the architecture.
The main input to our prediction is the commentary text, which comes from the audio delivered by the commentator at the end of each delivery in an over. To capture the commentator's voice, microphones are arranged in front of the commentator; they capture the voice and send the audio data to a central server location. This job requires a microphone and some Python code.
(Commentator → microphone → commentary data in audio format)
The commentary data obtained from the commentator is in audio format, while our cricket commentary classifier model needs text as input, so the audio must be converted to text. Services for this are available in Google Cloud (Google Text API), Amazon Cloud (Amazon Transcribe), and Azure Cloud, and our job would be simple if we used those cloud services for transcription. Instead of cloud services, however, we use Python libraries within our Flask application, writing our own Python code to complete the task.
API stands for application programming interface. On the server, we wrote Python code that accepts requests from the client computer; here, the client is our cricket commentary classifier model. Once the classifier makes a prediction, the predicted class label is sent to a special API method, which contains the code for handling the automatic update of the game database. After this process completes, the updated outcome of the ball is reflected on the client devices. This automation eliminates human involvement in handling the game databases. The entire process is shown in Fig. 6.
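What the special API method does on the database side can be sketched with Python's built-in sqlite3 as a stand-in for the SQL server; the table and column names are illustrative only:

```python
import sqlite3

# In-memory stand-in for the game database (the paper uses SQL Server).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE deliveries (over_no INTEGER, ball_no INTEGER, outcome TEXT)")

def record_outcome(over_no, ball_no, predicted_label):
    """Persist the classifier's predicted outcome for one delivery."""
    db.execute("INSERT INTO deliveries VALUES (?, ?, ?)",
               (over_no, ball_no, predicted_label))
    db.commit()

record_outcome(12, 4, "Boundary")  # classifier output for the 4th ball of over 12
print(db.execute("SELECT * FROM deliveries").fetchall())  # [(12, 4, 'Boundary')]
```

Each prediction is written as soon as it arrives, so the frontend can query the table and show the updated score without any manual step.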
4 Results
Following the construction of the cricket commentary classifier model, this research work calculated model parameters such as accuracy, recall, and F1 score on 2000 records to determine its efficacy. Most classification models have a discrepancy between accuracy and F1 score of less than 0.5, and our model produced good results: its accuracy and F1 score are quite near each other, which is a key feature of a good classification model. Model accuracy: 0.84.
• Precision: 0.838
• Recall: 0.84
• F1 Score: 0.836
Figure 7 is a screenshot showing how live predictions happen on the commentary text data. As shown in Fig. 7, the Web page is interactive (dynamic in nature). Once the predicted outcome of the ball is obtained, the game databases are updated automatically. The SQL server running in the background then returns the entire database (with the modifications) as a result set (a table with a defined schema) to the project's frontend. Figure 7 also shows how the results are presented to users, much like well-known websites such as Cricbuzz and ESPN Sports.
This project used some basic natural language processing and machine learning techniques to classify the outcome of the ball. TF–IDF is the main technique used to find the usefulness of a word in the commentary text, and the model is built with the random forest classifier algorithm. To deploy the application in real time, we used Python Flask to create a server managing Web pages and Web requests, and an SQL server to manage (update) the game database. Replicating any task through automation is quite challenging; we made our best possible effort to make this automation work, and with the help of our model at least some milliseconds of delay are eliminated. With no delay, users have a better game experience, and serving users the best possible experience automatically has a huge impact on the business. In this project, we used some basic algorithms and approaches to make a difference in the field of cricket. Some improvements remain and need to be addressed in the future. We believe this is a high-growth area for automation, since customers want quick replies, and if feasible we intend to apply a similar system to other sports such as football, volleyball, hockey, and so on.
References
1. Dixit K, Balakrishnan A (2016) Deep learning using CNNs for ball-by-ball outcome classifica-
tion in sports. cs231n.stanford.edu, Mar 23, 2016. [Online]. Available: http://cs231n.stanford.
edu/reports/2016/pdfs/273_Report.pdf. Accessed 13 Mar 2020
2. Simonyan K, Zisserman A (2014) Two-stream convolutional networks for action recognition
in videos. In: Proceedings of NIPS’14 27th international conference on neural processing
system’01, pp 568–576
3. Karpathy A, Fei-Fei L (2017) Deep visual-semantic alignments for generating image descrip-
tions. IEEE Trans Pattern Anal Mach Intell 39(4):664–676. https://doi.org/10.1109/TPAMI.
2016.2598339
4. Geng C, Song J (2016) Human action recognition based on convolutional neural networks
with a convolutional auto-encoder. In: Proceedings of 2015 5th international conference on
computer sciences and automation engineering, vol 42, pp 933–938
5. Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R, Fei-Fei L (2014) Large-scale video
classification with convolutional neural networks. In: IEEE conference on computer vision and
pattern recognition, pp 1725–1732. https://doi.org/10.1109/CVPR.2014.223
6. Donahue J et al (2015) Long-term recurrent convolutional networks for visual recognition
and description. In: IEEE conference on computer vision and pattern recognition (CVPR), pp
2625–2634. https://doi.org/10.1109/CVPR.2015.7298878
1 Introduction
The growing demand for Internet-based services necessitates efficient data collection and exchange. The Internet of things refers to a fast-growing network of interconnected devices that can gather and exchange information via integrated sensors. It is now widely used in virtually every industry and plays an important role in the proposed environmental surveillance system. The convergence of IoT and cloud computing provides a fresh approach to improved data planning from sensing devices, low power consumption, and low-priced gathering and communication
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 837
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_61
838 N. S. Talegaon et al.
2 Literature Survey
Arko Djajad et al. implemented environmental-quality monitoring techniques using IoT in their proposed system architecture. The different sensors are connected over the Internet via a serial interface called Modbus, and the collected data is then sent over the network. Data collected from all the motes acts as input to an IoT circuit board built from an Arduino and a Raspberry Pi. The results can be sent to the cloud to monitor changes in environmental conditions, and users can easily access them via Wi-Fi-enabled laptops and mobiles. Various sensors are deployed in the IoT kit; the sensors used in the circuitry are analog sensors, connected to the analog ports attached to the Raspberry Pi and Arduino [1].
Tamilarasi B. et al. proposed a model that provides a functional design and supports the implementation of sensor networks that can be deployed to observe environmental conditions in IoT applications [2].
Nikhil Ugale et al. presented a system that helps monitor environmental conditions in the home. The system uses different types of sensors, such as light, humidity, and temperature sensors, to observe various conditions; all sensors are controlled by a PIC microcontroller. Different devices are connected through these sensors, which helps verify the functionality of each connected device. Once a particular device is turned on, the sensors check that it is working correctly; if any anomaly is identified, a message is automatically sent to the concerned user by email. This system demonstrates the new IoT technology very efficiently and shows that IoT is well suited to advanced home automation [3].
Kondamudi Siva Sai Ram et al. proposed an advanced system for monitoring the weather conditions at a particular place and making the information visible anywhere in the world. With their architecture, environmental conditions can be assessed easily depending on location and time.
Performance Comparison of Weather Monitoring System … 839
The data can be observed from anywhere in the world. The system is designed to monitor temperature, humidity, light, and other parameters; after data computation and processing, the data are available over the Internet from anywhere. In this experiment, all functions, including data collection and processing, are handled by an LPC2148 microcontroller, which retrieves data from the sensors and sends it over the Internet via a Wi-Fi module.
Ms. Padwal S. C. et al. proposed a system architecture for sensor networks that can be used in environmental monitoring systems for IoT applications, building sensor networks in combination with IoT applications [4].
3 System Architecture
1. Raspberry Pi:
Fig. 2 Raspberry Pi
Fig. 4 Arduino
an analog audio connection for driving high-impedance loads (such as amplified speakers). The board also includes a Camera Serial Interface (CSI) socket for interfacing a camera unit and a Display Serial Interface (DSI) socket for interfacing an LED or LCD display. Both CSI and DSI are 15-pin connectors.
2. Arduino:
Arduino is a freely accessible open-source platform that makes it easier to build environmental monitoring devices than standalone systems do. It is an open prototyping framework built on a basic microcontroller board with an integrated development environment for building board applications. Arduino may be used to build responsive devices that accept input from various switches or sensing devices and control lighting systems, motors, and other physical outputs. Arduino applications can operate independently or interface with desktop applications. The board can be assembled by hand or bought pre-assembled; the fully open IDE software is free to download. Figure 4 depicts an Arduino microcontroller.
The Arduino programming language is a fork of Wiring, a physical computing framework built on the Processing multimedia development platform. The Arduino circuit board has 14 digital pins, 6 of which can be used as pulse-width modulation (PWM) output pins, and also provides six analog inputs, a USB connector, a power supply jack, a 16 MHz clock, and a reset button. The microcontroller can be connected to a desktop PC via USB or powered with an AC-to-DC adapter or battery. All integrated circuit components are directly connected to the Arduino board.
To access the DHT11 sensor embedded within the IoT trainer kit, we used a four-pin connector wire to connect the kit's RM2 socket to the RM19 socket.
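Whichever socket the sensor is wired through, the software ultimately decodes the 40-bit frame a DHT11 transmits: a humidity integer byte, humidity decimal byte, temperature integer byte, temperature decimal byte, and a checksum equal to the low 8 bits of the sum of the first four. A minimal sketch of that decoding step, assuming the five bytes have already been read from the sensor (the function name and sample values are illustrative, not from the kit's code):

```python
def decode_dht11(frame):
    """Decode a 5-byte DHT11 frame into (humidity %, temperature degC).

    The DHT11 sends humidity-integer, humidity-decimal, temperature-integer,
    temperature-decimal, then a checksum byte: the low 8 bits of the sum of
    the first four bytes.
    """
    if len(frame) != 5:
        raise ValueError("DHT11 frame must be exactly 5 bytes")
    hum_int, hum_dec, temp_int, temp_dec, checksum = frame
    if (hum_int + hum_dec + temp_int + temp_dec) & 0xFF != checksum:
        raise ValueError("DHT11 checksum mismatch - noisy read, retry")
    # DHT11 decimal bytes are usually 0; kept for protocol completeness.
    return hum_int + hum_dec / 10.0, temp_int + temp_dec / 10.0

# Example frame: 55 % RH, 24 degC, checksum = (55 + 0 + 24 + 0) & 0xFF = 79
humidity, temperature = decode_dht11([55, 0, 24, 0, 79])
```

The checksum test is what lets the reader loop retry a noisy single-wire read instead of logging a corrupted value.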
4. Light Intensity Sensor (LDR) (Fig. 7):
Light sensor: This sensor is designed to gauge the intensity of the light that strikes it. The light-dependent resistor (LDR) is linked to the Raspberry Pi's GPIO10 and measures the strength of the light, generating an analog signal. To avoid the use of an ADC, a capacitor network is established: light intensity is measured by estimating the capacitor's charging period, which depends on the LDR resistance. The resistance of an LDR varies significantly with the incident light; resistance drops as light intensity increases and rises as light intensity falls, so the charging time of the capacitor varies accordingly. Light intensity is classified as high, medium, or low depending on the charging time of the capacitor and is displayed as a percentage on thingspeak.com.
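Mapping the measured charge time to a percentage and a class takes only a few lines. A minimal sketch, assuming a calibrated maximum (dark) charging time `t_max_s` and illustrative class thresholds; both the thresholds and `t_max_s` are assumptions, not values from the paper:

```python
def light_level(charge_time_s, t_max_s=1.0):
    """Convert a capacitor charging time into a light percentage and class.

    A longer charging time means higher LDR resistance, i.e. less light,
    so the percentage falls as charge_time_s approaches t_max_s.
    """
    frac = min(max(charge_time_s / t_max_s, 0.0), 1.0)
    percent = round((1.0 - frac) * 100.0, 1)
    if percent >= 66.0:
        category = "high"
    elif percent >= 33.0:
        category = "medium"
    else:
        category = "low"
    return percent, category

print(light_level(0.1))  # fast charge: bright light
print(light_level(0.9))  # slow charge: dim light
```

The percentage is what gets pushed to the ThingSpeak field; the category can drive any local display or alert logic.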
An LDR is a unit whose resistance changes with the amount of light that strikes it; as a result, it can be used in light-sensing devices. A light-dependent resistor is a variable resistor controlled by light: it exhibits photoconductivity, as its resistance decreases when the ambient light intensity increases. Light-sensing devices, as well as light- and dark-triggered switching units, can all benefit from the use of an LDR. An LDR is a high-resistance semiconductor device; its resistance ranges from a few megaohms down to a few hundred ohms.
When the light incident on an LDR exceeds a particular threshold frequency, photons absorbed by the semiconductor give bound electrons enough energy to jump into the conduction band. The resulting free electrons (and their hole partners) conduct electricity, lowering the resistance. The resistance range and sensitivity of an LDR can vary greatly between devices.
To access the light sensor embedded within the IoT trainer kit, we used a four-pin connector wire to connect the kit's RM3 socket to the RM20 socket.
10. Python:
Python is a free, high-level programming language used for general-purpose programming. It is interpreted, interactive, and object-oriented, and is well suited to beginners. Python runs on Linux; the Integrated Development and Learning Environment (IDLE) is the text editor used here for Python programming.
4 ThingSpeak
information. The second part is when someone else must look at the information; ThingSpeak sits in the middle, allowing you to do both. This paper builds a proof-of-concept IoT system using readily available hardware to keep track of the surrounding humidity, temperature, gas level, soil moisture, light intensity, and so on. The system can be further adapted with various sensing devices or automation systems for a specific purpose. After the above-mentioned procedure is completed, the user has immediate access to all ecological factors.
Information is first detected by the various sensing tools located in the area of focus. When an appropriate link is established with the server device, the detected information is instantly transferred to the web server, and the results are displayed on the web server page. The page displays all environmental data according to the client's request and also stores every variation in the values. The entire record is automatically exported to Google spreadsheets, so the data can be analyzed and compiled frequently. All data are saved in a cloud database, and we can easily observe the changes occurring in the environment.
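Each update to a ThingSpeak channel is a single HTTP request against its REST `update` endpoint, carrying the channel's write API key and one value per field. A minimal sketch of building such a request; the API key and readings are placeholders, and the mapping of humidity, temperature, gas, and moisture to fields 1-4 is an assumption matching the visualizations below:

```python
from urllib.parse import urlencode
from urllib.request import urlopen  # only needed when actually sending

THINGSPEAK_UPDATE = "https://api.thingspeak.com/update"

def build_update_url(api_key, humidity, temperature, gas, moisture):
    """Build the GET URL that writes one reading into each channel field."""
    params = urlencode({
        "api_key": api_key,   # the channel's *write* key (placeholder here)
        "field1": humidity,
        "field2": temperature,
        "field3": gas,
        "field4": moisture,
    })
    return f"{THINGSPEAK_UPDATE}?{params}"

url = build_update_url("YOUR_WRITE_KEY", 55.0, 24.0, 120, 41)
# urlopen(url, timeout=10)  # uncomment on the Pi to actually push the reading
print(url)
```

ThingSpeak rate-limits free channels to roughly one update every 15 seconds, so the sensing loop should sleep at least that long between requests.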
• Visualization of humidity field on the thingspeak.com channel in a graphical form
with exact humidity on that particular day and time (Fig. 14).
• Visualization of temperature field on the thingspeak.com channel in a graphical
form with exact temperature on that particular day and time (Fig. 15).
• Visualization of gas field on the thingspeak.com channel in a graphical form with
exact gas level on that particular day and time (Fig. 16).
• Visualization of moisture field on the thingspeak.com channel in a graphical form
with exact moisture level on that particular day and time (Fig. 17).
• Performance comparison of weather monitoring system under different scenarios
(Fig. 18).
To compare the two IoT tools (Raspberry Pi and Arduino), we consider four parameters of the near environment: humidity, temperature, gas level, and moisture level.
Raspberry Pi and Arduino give nearly the same readings because the sensors used are the same, but with its built-in Wi-Fi unit the Raspberry Pi produces more accurate results, as it acts as a mini-computer and performs multiple tasks at once. Arduino, in contrast, performs one task at a time and acts as a microcontroller; it needs an additional Wi-Fi module to connect to the Internet because it has far fewer ports and hardware components.
Both Raspberry Pi and Arduino are affordable and small, consume little power, and provide fast data transfer, good performance, and remote monitoring.
The graph shows that the proposed method performs better with the Raspberry Pi than with the Arduino in every respect (Fig. 19).
6 Conclusion
We may conclude from this study that Arduino is useful for repetitive jobs such as opening a garage door, turning lights on and off, reading temperature sensors, and controlling a motor as the user desires, while the Raspberry Pi is capable of executing more complex activities such as controlling robots, playing videos, connecting to the Internet, and interfacing with cameras. For example, if you want to create an application that monitors humidity and temperature from a DHT11 sensor and displays the results on an LCD, you can use Arduino to do so. However, if you want
to track the humidity and temperature from a DHT11 sensor, send an e-mail with the statistics, examine and interpret the outcomes against an online weather report, and show the data on an LCD, then the Raspberry Pi is the appropriate choice. In simple terms, Arduino is intended for novice projects and quick electronics prototyping, whereas the Raspberry Pi is used for more complex projects. Across many environmental conditions, we can compare the performance of our system on the two IoT tools (Raspberry Pi, Arduino).
This IoT-based device monitors environmental indicators in real time: temperature, humidity, light intensity, gas level, and soil moisture are all monitored, and the data may be viewed from anywhere in the world. Using this method, the client can continuously monitor various environmental factors without interacting with any other server; the Raspberry Pi itself serves as the server, a task the Raspbian operating system handles admirably. This weather monitoring system, built with Raspberry Pi and Arduino, is inexpensive, compact, and low in power consumption, offers quick data transfer and good performance, and can be monitored remotely.
7 Future Scope
1. A smoke alert system can be connected to the module to inform the recipient in the event of excessive smoke concentrations.
2. Clients can be notified through SMS of the temperature/humidity/smoke
parameters.
References
1. Deshpande GR, Sannakki S, Madi S (2021) Advanced home automation by using Raspberry Pi
2. Djajadi A, Wijanarko M (2016) Ambient environmental quality monitoring using IoT sensor network. Internetworking Indonesia J (IIJ) 8(1)
3. Tamilarasi B, Saravanakumar P (2016) Smart sensor interface for environmental monitoring in IoT. Int J Adv Res Electron Commun Eng (IJARECE) 5(2)
4. Ram KSS, Gupta ANPS (2016) IoT based data logger system for weather monitoring using wireless sensor networks. Int J Eng Trends Technol (IJETT) 32(2)
5. Padwal SC, Kumar M (2016) Application of WSN for environment monitoring in IoT applications. In: International conference on emerging trends in engineering and management research (ICETEMR-16)
6. Richardson M, Wallace S (2012) Getting started with Raspberry Pi, 1st edn
7. Ugale N, Navale M (2016) Implementation of IoT for environmental condition monitoring in homes. Int J Eng Appl Technol (IJFEAT)
8. Rao BS, Rao KS, Ome N (2016) Internet of Things (IoT) based weather monitoring system. IJARCCE J 5(9)
9. Gawali SM, Gajbhiye SM (2014) Design of ARM based embedded web server for agricultural application. Int J Comput Sci (1)
10. Singh KK. Design of wireless weather monitoring system. Department of Electronics and Communication Engineering, National Institute of Technology
11. DeHennis AD, Wise KD. A wireless microsystem for the sensing of temperature and relative humidity. J Micro Elect
A Study on Surface Electromyography
in Sports Applications Using IoT
1 Introduction
In our day-to-day life muscle fatigue has become a very common problem among
most of the people, especially who does heavy activities like sports [1], bodybuilding
etc. Muscles are mainly responsible for our body movements and our postures, and
they also control our heartbeat, breathing and digestion [2]. Muscle fatigue is the
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 855
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_62
856 N. Nithya et al.
of sEMG data is a great challenge [13]. This study mainly deals with the detection of muscle fatigue in the upper limb, lower limb, and lumbar region.
Several studies on upper-limb muscle fatigue in athletes have been conducted with the help of the surface EMG technique. Clinical aspects such as the kinematics and surface EMG of athletes during isometric contractions have been examined. The technique is also used in sports activities to monitor the transition-to-fatigue condition and detect its progression. Data from five male athletes were collected: they were seated at a biceps curl machine to perform their activity and stopped once they reached total biceps fatigue.
Short-term Fourier transform: when frequency-spectrum analysis is applied, it extracts the surface EMG's median frequency (MDF) and mean frequency (MF).

X(t, f) = \int_{-\infty}^{+\infty} x(\tau) h(\tau - t) e^{-j 2\pi f \tau} \, d\tau    (1)

where t denotes time, f denotes frequency, h(t) denotes the window function, and |X(t, f)|^2 denotes the energy of x(t). To indicate the energy at each time and frequency, a new variable p(t, f) is introduced.
MDF and MF are computed from the power spectral density, obtained as the Fourier transform of the autocorrelation function r(\tau):

P(f) = \int_{-\infty}^{+\infty} r(\tau) e^{-j 2\pi f \tau} \, d\tau    (3)
For two variables X and Y, with

Y = \{Y_j\}, \quad j = 1, 2, \ldots, N    (8)

the Pearson correlation coefficient between them is

r = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{N} (X_i - \bar{X})^2 \sum_{i=1}^{N} (Y_i - \bar{Y})^2}}    (9)
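The quantities above (MF, MDF, and the correlation coefficient r) reduce to a few lines of NumPy. A minimal sketch, assuming a one-sided power spectrum of an sEMG epoch has already been obtained (the toy arrays are illustrative):

```python
import numpy as np

def mean_frequency(freqs, power):
    """MF: power-weighted average frequency of the spectrum."""
    return float(np.sum(freqs * power) / np.sum(power))

def median_frequency(freqs, power):
    """MDF: frequency that splits the total spectral power into two halves."""
    cumulative = np.cumsum(power)
    return float(freqs[np.searchsorted(cumulative, cumulative[-1] / 2.0)])

def pearson_r(x, y):
    """Pearson correlation between two equal-length series, as in Eq. (9)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    dx, dy = x - x.mean(), y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2)))

freqs = np.array([10.0, 20.0, 30.0])
power = np.array([1.0, 2.0, 1.0])      # symmetric toy spectrum
print(mean_frequency(freqs, power))    # 20.0
print(median_frequency(freqs, power))  # 20.0
print(pearson_r([1, 2, 3], [2, 4, 6])) # 1.0 (perfectly correlated)
```

As a muscle fatigues, the spectrum shifts toward lower frequencies, so MDF and MF computed this way decline over successive epochs.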
Muscle fatigue reduces metabolic performance and degrades the neuromuscular system, resulting in persistent muscle contraction and decreased steady activity. Postural control plays an important role in an appropriate biomechanical stance, but the main factor affecting postural control is fatigue. Recent research has found that the lower-extremity muscles play an important role in maintaining and balancing postural control [18]. Athletes who mostly use their lower limbs are affected by poorer balance in postural control and by muscle fatigue in the lower limb. The activity of the lower-extremity muscles was analyzed for fatigue using surface EMG before and after activity.
These results indicate that the activity levels of the rectus femoris, hamstring, and gastrocnemius muscles change significantly before and after fatigue, and an important relationship was found between posture, the rectus femoris muscle, and the tibialis anterior muscle [19]. The activity of the major lower-limb muscles during a soccer match was investigated with the help of 10 soccer players: the electromyographic activity of muscles such as the rectus femoris, biceps femoris, tibialis anterior, and gastrocnemius was monitored before and after exercise. The EMG data were then analyzed, and the root mean square (RMS) was computed over ten gait cycles. The results showed that after exercise at the intensity of a soccer-play simulation, the electromyographic activity in most of the lower-limb muscles was lower than before [20]. A real-time system to detect muscle fatigue during cycling exercise has also been developed, which provides online fatigue monitoring and analysis of the lower limb. It consists of a physical bicycle with a number of peripheral devices, a wireless EMG sensor set, and a computer that provides visual feedback. The cyclists pedaled at constant speed while the EMG signals of the lower-limb muscles, the velocity, and the time were recorded. Once fatigue occurs, the cycling speed shows a larger deviation in velocity, which is used as a reference to judge cycling stability. This method can be applied on a bicycle ergometer to monitor the onset and activity of lower-limb muscle fatigue in real time; kinesiological and kinematic data are measured using this system [21].
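The RMS-over-gait-cycles computation mentioned above is straightforward. A minimal sketch, assuming the sEMG record has already been segmented into equal-length gait cycles (the sample values are illustrative):

```python
import numpy as np

def rms_per_cycle(emg, n_cycles):
    """Split an sEMG record into n_cycles equal segments; return each RMS."""
    segments = np.array_split(np.asarray(emg, float), n_cycles)
    return [float(np.sqrt(np.mean(seg**2))) for seg in segments]

# Toy record: amplitude shrinks in the later cycle, as after fatiguing exercise
pre  = [1.0, -1.0, 1.0, -1.0]   # one "pre-exercise" cycle, RMS = 1.0
post = [0.5, -0.5, 0.5, -0.5]   # one "post-exercise" cycle, RMS = 0.5
print(rms_per_cycle(pre + post, 2))  # [1.0, 0.5]
```

Averaging the per-cycle RMS values before and after exercise gives the amplitude drop reported for the soccer-play simulation.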
2 Prevention of Injuries
Observing the muscle condition of an athlete during activity is very important for preventing injury. Swimming is a sport in which the arms and legs play a vital role, creating the successive movements that propel the body through the water. A system was therefore developed that can measure the stress level of a swimmer's muscles and indicate the muscle fatigue level. This device is
3 Performance Accessing
In the field of sports, surface EMG can analyze and monitor many different situations, which makes it of special interest. Improvement in the efficiency of a movement is mainly determined by the economy of effort, its effectiveness, and injury prevention [28]. The main goal of performance monitoring systems is to prevent overtraining of athletes, to reduce injuries caused by muscle fatigue, and to monitor training activities as well as to ensure performance maintenance [29]. In sports, movement strategy is critical, and surface EMG is used to evaluate muscle activation in applications that include performance, recovery, and evaluating injury risk parameters. One such system, the Athos wearable garment, integrates surface electromyography electrodes into the construction of compression athletic apparel.
It decreases the complexity and increases the portability of EMG data collection and also delivers processed data. A portable device that clips into the apparel collects the surface EMG signal, processes it, and sends it wirelessly to a client device that presents it to a trainer or coach, providing a consistent measure of surface EMG [30]. Muscle performance is measured in terms of strength, or the ability to generate force during contraction [31]. In a highly competitive world, we need monitoring systems that analyze body function at a high performance level, especially for athletes who train vigorously, so monitoring the fatigue condition is necessary to measure the fatigue stress level accurately and maximize performance [32]. Table 6 discusses the various applications of EMG sensors in sports.
4 Signal Processing
In the past few years, electromyogram signals have become greatly needed in different fields of application such as human-machine interaction, rehabilitation devices, clinical use, biomedical applications, sports applications, and many more [33]. EMG signals acquired from muscle need advanced techniques for detection, processing, decomposition, and classification. These signals are complicated because they are controlled by the nervous system and depend on the physiological and anatomical properties of the muscles. When the EMG transducer is placed on the skin surface, it collects signals from all the motor units active at a given time, which can generate interactions between various signals [34]. The EMG signals collected from the muscles using electrodes contain noise, so removing noise from the signal is also an important factor. Such noise arises from different sources, including the skin-electrode interface, the hardware, and other external sources; internal noise generated by semiconductor devices also affects the signal. Examples include motion artifact, ambient noise, ECG noise, crosstalk, and so on [35]. The EMG signals may be
high or low. The amplifier's direct-current offset produces low-frequency noise, which can be filtered using high-pass filters, whereas nerve conduction produces high-frequency noise. High-frequency interference also comes from radio broadcasts and computers and can be filtered using a low-pass filter [36]. In EMG transmission, only a specific band of frequencies should be passed, which requires removing both the low and the high frequencies; this is achieved by a band-pass filter. It is well suited to EMG signals because it transmits only the specific band fixed by the trainer [37]. EMG signal processing comprises three procedures: filtration, rectification, and smoothing. Advanced signal processing methods are used in muscle fatigue detection systems; suitable surface EMG signal processing methods for muscle fatigue evaluation and detection are listed below.
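The three procedures (filtration, rectification, smoothing) can be sketched with a simple FFT-based band-pass. A minimal NumPy sketch, assuming a 1 kHz sampling rate and the commonly used 20-450 Hz surface-EMG band; the band edges and smoothing window length are assumptions, not values prescribed by the paper:

```python
import numpy as np

def process_emg(signal, fs=1000.0, low=20.0, high=450.0, win=50):
    """Band-pass filter, full-wave rectify, then moving-average smooth."""
    # 1. Filtration: zero the FFT bins outside the [low, high] Hz band.
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum[(freqs < low) | (freqs > high)] = 0.0
    filtered = np.fft.irfft(spectrum, n=len(signal))
    # 2. Rectification: take the absolute value of the filtered signal.
    rectified = np.abs(filtered)
    # 3. Smoothing: moving average yields the linear envelope.
    kernel = np.ones(win) / win
    return np.convolve(rectified, kernel, mode="same")

# A pure 5 Hz drift lies below the band and is suppressed almost entirely.
t = np.arange(1000) / 1000.0
drift = np.sin(2 * np.pi * 5.0 * t)
print(np.max(process_emg(drift)))  # ~0: out-of-band content removed
```

In practice a causal IIR filter (e.g. a Butterworth band-pass) would replace the FFT step for real-time use; the rectify-and-smooth stages are unchanged.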
5 Discussion
This paper has focused on the role of surface EMG sensors and their contribution to monitoring and detecting muscle fatigue in different body parts, such as the lower limb, upper limb, and lumbar region, during sports activities. The study also analyzed the various signal processing methods. From this analysis, it was found that only prototypes and samples built under certain conditions exist. The challenges faced with surface EMG are: (1) the signal received from the surface EMG must be accurate; if any noise gets mixed in, the interpretation may go wrong; (2) the wearables are handy and battery-powered, so the operating hours should be long enough to avoid repeated replacements; (3) everyone is concerned about data privacy, so data security must be ensured during transfer; (4) if the electrodes are displaced on the muscles, the spatial relationship cannot be maintained, which affects the amplitude of the signal; (5) the variation between surface EMG and power loss is higher before and after activity, so EMG models may not give proper values of muscle fatigue after intense training. There are not yet sufficient, advanced technologies for evaluating muscle fatigue. Further, researchers can concentrate on adopting efficient machine learning and artificial intelligence technologies with secure IoT data transfer to give instant updates about the strain encountered in the muscles.
6 Conclusion
The aim of this study was to analyze the various surface EMG techniques used to monitor muscle fatigue in different sports activities. This paper also covered categories such as injury prevention in athletes using surface EMG, performance monitoring through muscle activity, and signal processing techniques. EMG signals can be transmitted over the Internet for further analysis, and cloud infrastructure provides the storage and processing resources to support an EMG monitoring system. Researchers should focus on making wearable devices available on the market as reliable tools to monitor these signals and derive valuable information in real time. Efficient machine learning algorithms can be introduced to classify the signals based on activity. In the future, a technologically advanced and compact muscle fatigue detection system based on surface EMG can be implemented.
References
1. Nithya N, Nallavan G (2021) Role of wearables in sports based on activity recognition and
biometric parameters: a survey. In: International conference on artificial intelligence and smart
systems (ICAIS), pp 1700–1705
2. Chaudhari S, Saxena A, Rajendran S, Srividya P (2020) Sensors to monitor the muscular activity: a survey. Int J Sci Res Eng Manage (IJSREM) 4(3):1–11
3. Yousif H, Ammar Z, Norasmadi AR, Salleh A, Mustafa M, Alfaran K, Kamarudin K, Syed
Z Syed Muhammad M, Hasan A, Hussain K (2019) Assessment of muscle fatigue based on
surface EMG signals using machine learning and statistical approaches: a review. In: IOP
conference series materials science and engineering, pp 1–8
4. Adam DEEB, Sathesh P (2021) Survey on medical imaging of electrical impedance tomography
(EIT) by variable current pattern methods. J IoT Soc Mob Anal Cloud 3(2):82–95
5. Liu SH, Lin CB, Chen Y, Chen W, Hsu CY (2019) An EMG patch for real-time monitoring of muscle-fatigue conditions during exercise. Sensors (Basel) 1–15
6. Taborri J, Keogh J, Kos A, Santuz A, Umek A, Urbanczyk C, Kruk E, Rossi S (2020) Sport
biomechanics applications using inertial, force, and EMG sensors: a literature overview. Appl
Bionics Biomech 1–18
7. Fernandez-Lazaro D, Mielgo-Ayuso J, Adams DP, Gonzalez-Bernal JJ, Fernández Araque A
(2020) Electromyography: a simple and accessible tool to assess physical performance and
health during hypoxia training. Syst Rev Sustain 12(21):1–16
8. Worsey MTO, Jones BS, Cervantes A, Chauvet SP, Thiel DV, Espinosa HG (2020) Assess-
ment of head impacts and muscle activity in soccer using a T3 inertial sensor and a portable
electromyography (EMG) system: a preliminary study. Electronics 9(5):1–15
9. Gonzalez-Izal M, Malanda A, Gorostiaga E, Izquierdo M (2012) Electromyographic models
to access muscle fatigue. J Electromyogr Kinesiol 501–512
10. Boyas S, Guevel A (2011) Neuromuscular fatigue in healthy muscle: underlying factors and
adaptation mechanisms. Annal Phys Rehabil Med 88–108
11. Al-Mulla MR, Sepulveda F, Colley M (2012) Techniques to detect and predict localised muscle
fatigue 157–186
12. Rum L, Sten O, Vendrame E, Belluscio V, Camomilla V, Vannozzi G, Truppa L, Notarantonio
M, Sciarra T, Lazich A, Manniini A, Bergamini E (2021) Wearable sensors in sports for persons
with disability. Sensors (Basel) 1–25
13. Chang KM, Liu SH, Wu XH (2012) A wireless sEMG recording system and its application to muscle fatigue detection. Sensors (Basel) 489–499
14. Al-Mulla MR, Sepulveda F, Colley M (2011) An autonomous wearable system for predicting
and detecting localised muscle fatigue. Sensors (Basel) 1542–1557
15. Ming D, Wang X, Xu R, Qiu S, Zhao Xin X, Qi H, Zhou P, Zhang L, Wan B (2014) SEMG
feature analysis on forearm muscle fatigue during isometric contractions 139–143
16. Cahyadi BN, Khairunizam W, Zunaidi I, Lee Hui L, Shahriman AB, Zuradzman MR, Mustafa
WA, Noriman NZ (2019) Muscle fatigue detection during arm movement using EMG Signal.
In: IOP conference series: materials science and engineering, pp 1–6
17. Angelova S, Ribagin S, Raikova R, Veneva I (2018) Power frequency spectrum analysis of
surface EMG signals of upper limb muscles during elbow flexion—a comparison between
healthy subjects and stroke survivors. J Electromyogr Kinesiol 1–29
18. Filipa A, Byrnes R, Paterno MV, Myer GD, Hewett TE (2010) Neuromuscular training improves performance on the star excursion balance test in young female athletes. J Orthopaedic Sports Phys Therapy 551–558
19. Fatahi M, Ghesemi GHA, Mongasthi Joni Y, Zolaktaf V, Fatahi M (2016) The effect of lower
extremity muscle fatigue on dynamic postural control analysed by electromyography. Phys
Treatments. 6(1):37–50
20. Rahnama N, Lees A, Reilly T (2006) Electromyography of selected lower-limb muscles fatigued by exercise at the intensity of soccer match-play. J Electromyogr Kinesiol 16(3):257–263
21. Chen SW, Liaw JW, Chan HL, Chang YJ, Ku CH (2014) A real-time fatigue monitoring
and analysis system for lower extremity muscles with cycling movement. Sensors (Basel)
14(7):12410–12424
22. Elfving B, Dedering A, Nemeth G (2003) Lumbar muscle fatigue and recovery in patients
with long-term low-back trouble—electromyography and health-related factors. Clin Biomech
(Bristol, Avon) 18(7):619–630
23. Coorevits P, Danneels L, Cambier D, Ramon H, Vandeerstraeten G (2008) Assessment of the
validity of the biering- sorensen test for measuring back muscle fatigue based on EMG median
frequency characteristics of back and hip muscles. J Electromyogr Kinesiol 18(6):997–1005
24. Roy SH, Bonato P, KnaflitZ M (1998) EMG assessment of back muscles during cyclical lifting.
J Electromyogr Kinesiol 8(4):233–245
25. Helmi M, Ping C, Ishak N, Saad M, Mokthar A (2017) Assessment of muscle fatigue using electromyogram sensing. In: AIP conference proceedings, pp 1–8
26. Benoit DL, Lamontage M, Cerulli G, Liti A (2003) The clinical significance of electromyo-
graphy normalisation techniques in subjects with anterior cruciate ligament injury during
treadmill walking. Gait Posture 18(2):56–63
27. Yousif HA, Zakaria A, Rahim NA, Salleh AF, Mahmood M, Alfran KA, Kamarudin L, Mamduh
SM, Hsan A, Hussain MK (2019) Assessment of muscle fatigue based on surface EMG signal
using machine learning and statistical approaches: a review. In: IOP conference series: materials
science and engineering, pp 1–8
28. Masso N, Rey F, Remero D, Gual G (2010) Surface electromyography application in the sport.
Apunts Med Esport 45(165):121–130
29. Taylor KL, Chapman D, Cronin J, Newton M, Gill N (2012) Fatigue monitoring in high
performance sport: a survey of current trends. J Aust Strength Conditioning 12–23
30. Lynn SK, Watkins CM, Wong MA, Balfany K, Feeney DF (2018) Validity and reliability of
surface electromyography measurements from a wearable athlete performance system. J Sports
Sci Med 17(2):205–215
31. Kuthe C, Uddanwadiker R, Ramteke A (2018) Surface electromyography based method for
computing muscle strength and fatigue of biceps brachii muscle and its clinical implementation.
Inf Med Unlocked 34–43
32. Austruy P (2016) Neuromuscular fatigue in contact sports: theories and reality of a high
performance environment. J Sports Med Doping Stud 6(4):1–5
A Study on Surface Electromyography in Sports Applications … 867
33. Chowdhury RH, Reaz RH, Ali MA, Bakar AA, Chellapan K, Chang TG (2013) Surface elec-
tromyography signal processing and classification techniques. Sensors (Basel) 13(9):12431–
12466
34. Raez MB, Hussain MS, Mohd-Yasin F (2006) Techniques of EMG signal analysis: detection,
processing, classification and application. Biol Proced Online 11–35
35. Shair EF, Ahmad S, Marhaban MH, Tamrin SM, Abdullah AR (2017) EMG processing based
measures of fatigue assessment during manual lifting. BioMedical Res Int 1–12
36. Senthil Kumar S, Bharath Knnan M, Sankaranarayanan S, Venkatakrishnan A (2013) Human
hand prosthesis on surface EMG signals for lower arm amputees. Int J Emerg Technol Adv
Eng 3(4):199–203
37. De Luca CJ, Gilmore LD, Kuznetsov M, Roy SH (2010) Filtering the surface EMG signal:
movement artifacts and baseline noise contamination. J Biomech 43(8):1573–1579
38. Cifrek M, Medved V, Tonkovic S, Ostojic S (2009) Surface EMG based muscle fatigue
evaluation in biomechanics. Clin Biomech 24(4):327–340
39. Ahmad Z, Jamaudin MN, Asari MA, Omar A (2017) Detection of localised muscle fatigue
by using wireless surface electromyogram (sEMG) and heart rate in sports. Int Med Devices
Technol Conf 215–218
Detection of IoT Botnet Using Recurrent
Neural Network
Abstract The Internet of Things (IoT) is one of the most widely used technologies nowadays.
Hence, the number of DDoS attacks generated using IoT devices has risen.
Conventional anomaly detection methods, such as signature-based and flow-based methods,
cannot be used for detecting IoT anomalies, as the user interface in IoT devices is limited
or unreliable. This paper proposes a solution for detecting botnet activity within
IoT devices and networks. Deep learning is currently a prominent technique used to
detect attacks on the Internet. Hence, we developed a botnet detection model based on
a bidirectional gated recurrent unit (BGRU). The developed BGRU detection model
is compared with a gated recurrent unit (GRU) model for detecting four attack vectors
(Mirai, UDP, ACK, and DNS) generated by the Mirai malware botnet, and both are evaluated
for loss and accuracy. The dataset used for the evaluation is the traffic data created
using the Mirai malware attack performed on a target server using C&C and scan servers.
1 Introduction
The Internet of Things (IoT) is an evolving means of communication [1]. In the near
future, it is anticipated that objects of everyday life will be equipped with microcon-
trollers, microprocessors for virtual communication [2], and proper protocol stacks,
so that they can communicate with one another and with users and become a vital
element of the Internet.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 869
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_63
870 P. Tulasi Ratnakar et al.
DoS and DDoS attacks have become widespread, posing substantial hazards to
network security and to the efficiency of online services. Because machines are
interconnected over the World Wide Web, they are convenient targets for
denial of service (DoS) attacks [2]. A denial of service (DoS) attack
attempts to prevent legitimate users from accessing a computer or network resource
by disrupting or stopping the service on an Internet host on a permanent basis. A
distributed DoS attack occurs when several hosts work together to bombard a
victim with an excess of attack packets, with the attack originating from
several locations at the same time.
1.3 Botnet
A bot is a computer program which carries out complex tasks. Bots are automated,
meaning they can run without assistance from a human user, according to their
own set of instructions. Bots typically try to imitate or replace human activities
[3]. Botnets are computer networks used to steal information, send phishing emails,
perform distributed denial of service attacks, and allow a hacker to access and extract
information from a particular system. Botnet detection encompasses various
techniques used to identify botnets of IoT devices [4]. The botmasters are responsible for
sending commands to all bots in a particular network using command and control
(C&C) tools. Multiple DoS and DDoS attacks have emerged as threats with the increased
use and deployment of IoT devices in recent years. These attacks take place at
different IoT network protocol levels: the physical, MAC, 6LoWPAN, network, and
device layers [5]. In 2016, a DDoS attack against the DNS provider Dyn became the
largest DDoS attack recorded at the time. Linux.Mirai created a massive botnet
(a network of infected devices) by infecting millions of connected devices, including
webcams, routers, and digital video recorders. The attack occurred on October 21,
2016, and reached a speed of approximately 1.1 terabits per second (Tbps).
processing edges and high layers identify human or animal, numbers, letters, or
facets [6]. Deep learning mainly benefits by leveraging unstructured data, achieving
higher-quality results, reducing costs, and eliminating the need for manual data
classification, which makes it a natural fit for neural networks.
In the domain of networking and cyber security, deep learning is crucial, since
networks are prone to security risks such as IP spoofing, attack replay, SYN
flooding, and jamming, as well as resource restrictions such as limited memory and
insecure software [7]. Deep learning's self-learning capability has improved accuracy
and processing speed, allowing it to be used effectively to detect Mirai botnet attacks
in IoT devices.
This paper proposes a method for detecting botnet activity within IoT devices and
networks. A detection model is created using recurrent neural networks (RNN), and
the algorithm used for detection is the gated recurrent unit (GRU). Detection is performed
at the packet level, with an emphasis on text recognition within features rather than
flow-based approaches. For text recognition and conversion, a method called word
embedding is used. The BGRU-based detection model is further compared with the GRU-
based detection model using the evaluation metrics accuracy and loss.
The main contributions of this paper are:
• To develop GRU and BGRU recurrent neural network (RNN)-based botnet
detection models.
• To compare the performance of the GRU and BGRU models with respect to loss and
accuracy.
The rest of the paper is organized as follows: Section 2 deals with the related work.
Section 3 outlines the design of the system used to develop the GRU and BGRU recurrent
neural network (RNN)-based detection models. Section 4 explains the detailed
implementation of the GRU and BGRU models. Section 5 presents results comparing
the accuracy and loss of the GRU and BGRU models. Section 6 concludes the paper
and makes recommendations for future studies.
2 Related Work
Torres et al. [8] have analyzed the viability of recurrent neural networks for
modeling network traffic as a sequence of time-varying states. The recent success
of applying RNNs to data sequence problems makes them a viable candidate for
sequence analysis. The performance of the RNN is evaluated with respect to two
important issues: optimal sequence length and network traffic imbalance. Both
issues have a potentially real impact on deployment. The evaluation is performed
by means of a stratified k-fold test together with a separate test on unseen
traffic from another botnet. The RNN model achieved an accuracy of 99.9% on
unseen traffic.
Sriram et al. [9] have proposed a botnet detection system based on deep learning
(DL), which works with network flows. On various datasets, this paper compares and
analyzes the performance of machine learning models versus deep neural network
models for P2P botnet detection. They employ the t-distributed stochastic neighbor
embedding (t-SNE) visualization technique to understand the various characteristics
of the datasets used in the study. On the DS-1 V3 dataset, their DNN model
achieved 100% accuracy.
DNN approaches have recently been used to detect malware efficiently; their key
strength is the ability to achieve a high detection rate while generating a low
false positive rate. Ahmed et al. [10] have proposed a strategy for identifying
botnet assaults that relies on a deep learning ANN. The developed model is compared
with other machine learning techniques. The performance of the ANN model is
evaluated by varying the number of neurons in the hidden layers: for six neurons,
accuracy is 95%; for eight neurons, 96%; and for ten neurons, 96.25%.
Yerima et al. [11] have proposed a deep learning approach based on convolutional
neural networks (CNN) to detect Android botnets. The proposed botnet detection
system implements a CNN model that differentiates between botnet applications and
normal applications using 342 static app features. The trained botnet detection
model is evaluated on a set of 6802 real apps, including 1929 botnet apps, from the
open ISCX botnet dataset. The results of the model are examined for different filter
counts; the best results are achieved with 32 filters, with an accuracy of 98.9%.
Nowadays, IoT devices are widely used to form botnets, and as a result, McDer-
mott et al. [12] have proposed a solution to detect IoT-based botnet attack packets
using deep learning algorithms such as long short-term memory (LSTM) and bidirec-
tional long short-term memory (BLSTM). They used a technique called word
embedding for mapping text data to vectors of real numbers. As LSTM is a recurrent
neural network, it stores past data in order to predict future results. To retain
past memory, it uses three gates: a forget gate, an input gate, and an output gate,
whereas bidirectional LSTM uses these gates to store both past and future context.
Both LSTM and BLSTM achieved an accuracy of 0.97.
By comparing the data collected from lower-level fog network devices with the
expected data, it is possible to detect real glitches in the collected data. The
glitches impacting performance may take the form of a single data point, a set of
data points, or data from sensors of the same type or from many different components.
Shakya et al. [13] proposed a deep learning approach that learns from the expected
data to identify these glitches. The proposed deep learning model achieved an
accuracy close to 1.0.
IoT devices are open to network attacks since they are interconnected over the
Internet to exchange and analyze accumulated data. To detect IoT attacks, it
is necessary to develop a security solution that takes into account the characteristics
of various types of IoT devices. Developing a custom-designed security solution for
every sort of IoT device is, however, a challenge, and a large number of false alarms
would be generated using traditional rule-based detection techniques. Hence, Kim
et al. [14] proposed a deep learning-based model using LSTM and recurrent neural
networks (RNN) for detecting IoT-based attacks. The N-BaIoT dataset is used to train
this model. When detecting BashLite scan botnet data, LSTM achieved
the highest accuracy of 0.99.
A massively connected world, such as the Internet of Things (IoT), generates a
tremendous amount of network traffic, and it takes a long time to detect malicious
traffic in such a large volume. Detection time can be considerably decreased if
detection is done at the packet level. Hwang et al. [15] proposed a unique word
embedding technique to extract the semantic value of the data packet and used LSTM
to learn the temporal relationship between the fields in the packet header and
determine whether an incoming packet is part of a normal flow or a malicious flow.
This model was trained on four datasets: ISCX-IDS-2012, USTC-TFC-2016, Mirai-RGU,
and Mirai-CCU. The highest accuracy of 0.9999 is achieved on the ISCX-IDS-2012
dataset.
Hackers have been attracted to IoT devices by their proliferation. The detection of
IoT traffic anomalies is necessary to mitigate these attacks and to protect the services
provided by smart devices. Since traditional anomaly detection systems are not
scalable, they fail when dealing with the large amounts of data generated by IoT
devices. Hence, in order to achieve scalability, Bhuvaneswari Amma et al. [16] proposed
an anomaly detection framework for IoT using a vector convolutional deep learning
(VCDL) approach. Device, fog, and cloud layers are included in the proposed frame-
work. As the network traffic is sent to the fog layer nodes for processing, this anomaly
detection system is scalable. The framework achieves a precision of 0.9971.
The dependability of IoT-connected devices depends on the security model employed to
safeguard user data and prevent devices from participating in malicious activity. Many
DDoS and botnet attacks are identified utilizing technologies that target
devices or network backends. Parra et al. [17] proposed a cloud-based distributed
deep learning framework to detect and defend against botnet and phishing attacks.
The model contains two important security mechanisms that work in tandem: (i) a
distributed convolutional neural network (DCNN) model embedded in the micro-
security plug-in of IoT devices to detect application-level phishing and DDoS attacks
and (ii) a long short-term memory (LSTM) network model hosted in the cloud, which is
used to detect botnet attacks and receive CNN attachments. The CNN component of
the model achieved an accuracy of 0.9430, whereas the LSTM component achieved
an accuracy of 0.9784.
In the above-mentioned works, many botnet identification methods have employed
deep learning algorithms. We can infer from [18] that deep learning algorithms
perform better than basic machine learning algorithms. As our detection is
performed at the packet level, most of the packet information is present in a sequential
pattern in the Info feature, and recurrent neural networks are more efficient than
artificial neural networks when it comes to sequential data [19]. We developed a gated
recurrent unit (GRU)-based botnet detection model that runs faster than
LSTM [20], as it has fewer training parameters, and the word embedding technique
is used for mapping text data to vectors of real numbers.
3 System Architecture
This section provides the blueprint for developing the GRU and BGRU-based recur-
rent neural networks (RNN) for detection of IoT-based botnet attack vectors. The
architecture functions as illustrated in Fig. 1.
The network traffic dataset contains the following features: (1) No., (2) Time, (3)
Source, (4) Destination, (5) Protocol, (6) Length, (7) Info, and (8) Label. Some
features do not affect the performance of the classification, or may even worsen
the results, and need to be removed; as a result, we selected the Protocol, Length,
and Info features from the dataset [10], with Label as the target feature.
Our computers, scripts, and deep learning models are unable to read and understand
text in any human sense, so text data must be represented numerically.
Gated recurrent units simplify the process of training a new model by improving
the memory capacity of recurrent neural networks, and they also mitigate the vanishing
gradient problem. Among other uses, they can be applied
to the modeling of speech signals as well as to machine translation and handwriting
recognition. Considering these advantages, we employed the GRU for detection of
botnets. Once feature selection and word embedding are complete, the data is
split into train data and test data. The GRU and BGRU models are built and
trained using the train data. The trained models are then tested using the test data,
and the required metrics are evaluated to compare the models.
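The gating mechanism that gives the GRU its memory capacity can be sketched as a single NumPy forward step. This is an illustrative sketch of the standard GRU update and reset gates, not the trained model from this paper; the weights are random placeholders and biases are omitted for brevity.

```python
import numpy as np

def gru_step(x, h, params):
    """One GRU time step: the gates decide how much past state to keep."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sigmoid(x @ Wz + h @ Uz)                # update gate: keep vs. overwrite memory
    r = sigmoid(x @ Wr + h @ Ur)                # reset gate: how much history to read
    h_tilde = np.tanh(x @ Wh + (r * h) @ Uh)    # candidate state
    return (1.0 - z) * h + z * h_tilde          # blend old state with candidate

rng = np.random.default_rng(0)
d_in, d_h = 8, 4                                # placeholder input/hidden sizes
params = (rng.standard_normal((d_in, d_h)), rng.standard_normal((d_h, d_h)),
          rng.standard_normal((d_in, d_h)), rng.standard_normal((d_h, d_h)),
          rng.standard_normal((d_in, d_h)), rng.standard_normal((d_h, d_h)))

h = np.zeros(d_h)
for x in rng.standard_normal((5, d_in)):        # a 5-step input sequence
    h = gru_step(x, h, params)
```

Because the update gate interpolates between the old state and a tanh-bounded candidate, the hidden state stays bounded across arbitrarily long sequences, which is what mitigates the vanishing gradient problem mentioned above.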
4 Implementation
The developed model uses GRU and BGRU recurrent neural networks, together with
word embedding, to convert the string data found in the captured data packets into
data that can be used as GRU and BGRU input.
The dataset [21] used in this work includes both normal network traffic and botnet attack
network traffic. No., Time, Source, Destination, Protocol, Length, Info, and Label
are the features in our dataset. Some features, such as No., Time, Source, and
Destination, are omitted as they are not useful for data processing. The Info feature
contains most of the captured information. Algorithm 1 shows the detailed steps of
our implementation.
As explained in Sect. 4.1, No., Time, Source, and Destination are not useful; hence, we
omitted them. The remaining features Protocol, Length, and Info are selected for further
processing.
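The column selection above can be sketched with Pandas, which the paper lists among its tools. The two sample rows are hypothetical, not taken from the actual capture; the real dataset [21] has the same eight columns.

```python
import pandas as pd

# Two hypothetical rows with the dataset's eight columns (illustrative values).
df = pd.DataFrame({
    "No.": [1, 2],
    "Time": [0.00, 0.12],
    "Source": ["192.168.0.5", "192.168.0.9"],
    "Destination": ["10.0.0.2", "10.0.0.2"],
    "Protocol": ["TCP", "DNS"],
    "Length": [60, 74],
    "Info": ["SYN Seq=0", "Standard query A example.com"],
    "Label": ["mirai", "normal"],
})

# Drop the identifier columns that carry no classification signal and keep
# Protocol, Length, and Info as inputs, with Label as the target.
kept = df.drop(columns=["No.", "Time", "Source", "Destination"])
X = kept[["Protocol", "Length", "Info"]]
y = kept["Label"]
```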
The data in our dataset's Info feature follows a sequential pattern. Hence, we
built our solution by converting each letter into a token and storing it in binary format.
A vocabulary dictionary of all tokenized words is produced, and each token in the Info
column is substituted with its associated index number. To understand each type of
attack, the order of the indices in a sequence must be maintained, and hence an array of
the indices is generated. Since the protocol and length of the captured packet
are associated with each attack, the Protocol and Length features are also included in the
array we previously generated. Word embedding is likewise used to generate
a dictionary of tokenized protocols together with their indices. The Length feature, as
well as the tokenized protocols, is added to the array. The target feature is converted
from string to integer to classify each type of captured packet, and we used one-hot
encoding for the target feature.
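The character-level tokenization and index-array construction described above can be sketched with the standard library; the sample Info strings, protocols, and lengths below are hypothetical. In practice, Keras's `Tokenizer(char_level=True)` builds the same character-to-index mapping.

```python
# Sample captured-packet fields (illustrative values, not from the dataset).
infos = ["SYN Seq=0", "Standard query A example.com"]
protocols = ["TCP", "DNS"]
lengths = [60, 74]

# Build a vocabulary dictionary of character tokens (index 0 reserved for padding)
# and replace each Info string with its sequence of indices, preserving order.
vocab = {}
for text in infos:
    for ch in text:
        vocab.setdefault(ch, len(vocab) + 1)
sequences = [[vocab[ch] for ch in text] for text in infos]

# Protocols get their own small dictionary; the protocol index and the packet
# length are then appended to each sequence, as described above.
proto_vocab = {p: i + 1 for i, p in enumerate(dict.fromkeys(protocols))}
arrays = [seq + [proto_vocab[p], ln]
          for seq, p, ln in zip(sequences, protocols, lengths)]
```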
After feature selection and word embedding, the data is split into train and test data.
The IoT-based botnet detection models are built using GRU and BGRU and trained
on the train data. Each detection model uses an output layer with sigmoid
activation. The models are trained for 50 epochs with the categorical cross-entropy
loss function and the Adam optimizer. We evaluated metrics such as loss,
accuracy, validation loss, and validation accuracy, and compared the results of GRU and
BGRU to determine their efficiency.
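A minimal Keras sketch of the two models as described above (sigmoid output layer, categorical cross-entropy, Adam, 50 epochs). The vocabulary size, sequence length, unit counts, and class count are illustrative assumptions, not the paper's exact hyperparameters.

```python
from tensorflow.keras import layers, models

VOCAB = 200      # assumed vocabulary size produced by word embedding
SEQ_LEN = 100    # assumed padded sequence length
N_CLASSES = 5    # assumed: normal traffic plus the four attack vectors

def build(bidirectional):
    """Build a GRU (or bidirectional GRU) packet classifier."""
    rnn = layers.GRU(64)
    model = models.Sequential([
        layers.Input((SEQ_LEN,)),
        layers.Embedding(VOCAB, 32),                    # word-embedding front end
        layers.Bidirectional(rnn) if bidirectional else rnn,
        layers.Dense(N_CLASSES, activation="sigmoid"),  # output layer as described
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

gru_model, bgru_model = build(False), build(True)
# Training (50 epochs, with 10% of the train data held out for validation):
# gru_model.fit(X_train, y_train, epochs=50, validation_split=0.1)
```

The bidirectional wrapper roughly doubles the recurrent parameter count, which is consistent with the longer per-epoch times reported for BGRU in Table 2.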
5 Results
Six experiments assess the overall performance of the GRU and BGRU models.
Python is the programming language used to build these models. We used Anaconda
(an IDE for Python), Keras (a Python library for building deep learning models),
scikit-learn (a Python library for data preprocessing), and Pandas and NumPy
(Python libraries for working with data frames and arrays).
Six experiments comparing the GRU and BGRU models are conducted on each
model. The first four experiments use a train dataset and a test dataset containing
normal network traffic and a single attack vector's network traffic. Both models are
trained using the train data and then tested using the test data, and for each attack
vector, evaluation metrics such as accuracy and loss are calculated. The fifth
experiment uses a train dataset containing normal network traffic and multi-attack
vector [Mirai, UDP, DNS, ACK] network traffic; both models are trained and tested
as before, and the evaluation metrics are calculated for the multiple attack vectors.
The sixth experiment uses a train dataset containing normal network traffic and
multi-attack vector [excluding the ACK attack] network traffic, trained and tested in
the same way. The validation data used in these experiments is 10% of the train data,
and it is used to determine whether or not overfitting exists in our model.
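The split described above can be sketched with scikit-learn; the array contents are placeholders, and the held-out test set is separate from the 10% validation slice that Keras carves from the training portion at fit time.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(200).reshape(100, 2)   # placeholder feature rows
y = np.array([0, 1] * 50)            # placeholder labels

# Hold out a test set first; validation_split=0.1 in model.fit() then takes
# the last 10% of the remaining training data for validation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
n_val = int(0.1 * len(X_train))      # 10% of the train data, as described
```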
Table 1 shows the evaluation metrics for all six experiments, including accuracy,
validation accuracy, test accuracy, loss, and validation loss.
According to Table 1, BGRU is more efficient than GRU: the accuracy of the two
algorithms is almost equal, but the loss for BGRU is lower than that of GRU in all the
experiments performed. While detecting ACK attacks in conjunction with other
attack vectors, the accuracy is reduced; however, the GRU model used in this paper
still performs commendably when predicting ACK attacks, as Table 1 shows that the
accuracy of the experiments that include the ACK attack vector (EXPT-3, EXPT-5)
is nearly 1.0. Table 1 also shows that the validation accuracy in all experiments is
nearly 1.0, which indicates that our model does not exhibit overfitting.
Table 2 displays the number of training and testing tuples used in each of the six
experiments, as well as the average time per epoch. Since BGRU is bidirectional,
it takes more time to train than GRU, as given in Table 2. Though BGRU takes more
time than GRU, Table 1 shows that it has lower loss, which makes it more effective
than GRU.
As mentioned in Sect. 4.4, 50 epochs are executed for both models, and two graphs
are plotted for each experiment to show how the accuracy and loss varied across each
epoch.
The graphs obtained from each experiment are shown in Figs. 2, 3, 4, 5, 6, and 7.
Once the highest accuracy is reached, the variation of the accuracy and loss across
the epochs in GRU and BGRU in experiments 1, 2, 3, 4 (single attack vector network
traffic) is linear, as shown in Figs. 2, 3, 4, and 5. However, in experiments 5, 6 (multi-
attack vector network traffic), the graphs of accuracy and loss across each epoch
show slight deviations in the case of GRU, as shown in Figs. 6 and 7, whereas the
Fig. 2 a–d Graphs of experiment-1 (Mirai attack) a GRU accuracy b GRU loss c BGRU accuracy
d BGRU loss
Fig. 3 a–d Graphs of experiment-2 (UDP attack) a GRU accuracy b GRU loss c BGRU accuracy
d BGRU loss
Fig. 4 a–d Graphs of experiment-3 (ACK attack) a GRU accuracy b GRU loss c BGRU accuracy
d BGRU loss
Fig. 5 a–d Graphs of experiment-4 (DNS attack) a GRU accuracy b GRU loss c BGRU accuracy
d BGRU loss
Fig. 6 a–d Graphs of experiment-5 (Multi-attack with ACK) a GRU accuracy b GRU loss c BGRU
accuracy d BGRU loss
Fig. 7 a–d Graphs of experiment-6 (Multi-attack without ACK) a GRU accuracy b GRU loss c
BGRU accuracy d BGRU loss
BGRU model behaves the same as in experiments 1, 2, 3, and 4. That is why, despite the
additional overhead, BGRU is a better model than GRU.
6 Conclusion
architecture for designing botnets. Hence, a P2P botnet detection method needs to be
developed to detect P2P botnets within IoT.
References
19. Apaydin H, Feizi H, Sattari MT, Colak MS, Shamshirband S, Chau K-W (2020) Compara-
tive analysis of recurrent neural network architectures for reservoir inflow forecasting. Water
12(5):1500
20. Yang S, Yu X, Zhou Y (2020) LSTM and GRU neural network performance comparison
study: taking yelp review dataset as an example. In: 2020 international workshop on electronic
communication and artificial intelligence (IWECAI). IEEE, pp 98–101
21. Dataset link: https://drive.google.com/drive/folders/148XD5gU7cAIlOGzF98N2uC42Bf74-
LID?usp=sharing
Biomass Energy for Rural India:
A Sustainable Source
Namra Joshi
1 Introduction
India is a developing nation, and its population is rising year by year. As per
the 2011 census, the Indian population is 1.21 billion, and it is expected
to rise by 25% by the year 2036. With such a rise in population, the power demand
is increasing tremendously [1]. In the upcoming two decades, worldwide power
consumption will rise by 60–70%. According to the World Energy Outlook, India
will have peak energy demand, and to fulfill it, emissions will also increase.
India is therefore looking toward clean sources of energy, i.e., renewable sources of
energy. It accounts for around 17% of the entire GDP. Energy sources that can be
renewed are termed renewable sources of energy. Renewable energy sources like solar
[2], wind [3], and geothermal include any type of energy obtained from natural
resources that are infinite or constantly renewed. The classification of renewable
energy sources is illustrated in Fig. 1. India is about to achieve the aim of 10 GW of
bioenergy-based generation by the year 2022. India ranks fourth in
renewable energy capacity. The Government of India is promoting a waste-to-energy
program with financial support from the Ministry of Petroleum and Natural Gas.
N. Joshi (B)
Department of Electrical Engineering, SVKM’s Institute of Technology, Dhule, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 885
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_64
(Fig. 1: classification of renewable energy sources into biomass, hydel, wind, and solar energy)
Agriculture plays a very crucial role in the Indian economy. India has 60.43%
agricultural land. Agricultural waste material can be used for power generation
through biomass-based plants. In India, the MNRE [4] is promoting biomass-based
power plants and cogeneration plants. The basic target is to extract as much power
as possible from sugarcane bagasse and agricultural waste. A special scheme was
introduced in 2018 by the ministry to promote this type of generation. The esti-
mated potential is around 18,000 MW. As per the MNRE annual report 2020–21, more
than 550 cogeneration plants were installed in India up to December 2020 [5]. The major
states with potential for this type of generation are Maharashtra, Chhattisgarh,
Tamil Nadu, Uttar Pradesh, West Bengal, Punjab, and Andhra Pradesh. For rural
areas located away from the central grid, biomass-based power plants are
a very good option. Currently, around 200 biomass-based plants with
a capacity of 772 MW are installed in the country. The MNRE has launched the
New National Biogas and Organic Manure Programme (NNBOMP) to promote
biogas plant installation in rural India. As of 3 June 2021, the installed capacity of
such power plants in India is 10,170 MW. A major contributor to achieving the set
bioenergy target is sugar-mill bagasse-based plants [6]. The percentage-wise source of
generation in January 2021 is illustrated in Fig. 2 [7].
(Fig. 2: share of generation, January 2021: bagasse 17%, small hydro 5%, biomass 3%)
Biomass is a very effective source of energy. Wood, garbage, crops, and agricultural
waste are categorized as biomass, as shown in Fig. 3. It can be transformed into energy-
rich fuel either chemically or biochemically. The energy can be extracted through
various methods, as illustrated in Fig. 4: either a dry process or a wet
process can be adopted for extraction of energy from biomass. The dry process is further
classified into pyrolysis and combustion, whereas the wet process is further classified into
anaerobic digestion, gasification, and fermentation [8]. The energy obtained from
biomass can be further utilized to produce either electrical power or heat.
Biomass-based power plants are most popular in rural areas. As illustrated in Fig. 5,
to operate a biomass-based power plant, biomass materials such as agricultural
waste, garbage, wood, and animal dung cakes are first gathered. Suitable sorting is
then carried out [11]. Once sorting is done, the gathered biomass materials are
treated to make them suitable for the gasification process. After gasification, the gas
obtained is checked for suitability to run turbines. If it is suitable, it is fed to
turbines, which in turn run the generator shaft, and electrical power is obtained. If
the gas is not capable of driving the turbine, it is fed to a bio-fueled engine. The
engine runs the shaft of the generator, and thus power is generated [12].
Biomass-based power plants have proven to be a good option for fulfilling energy needs,
but many challenges are associated with them [14]. The major challenges associated
with biomass-based power are as follows:
• Since seasonal agricultural waste is used as biomass fuel, it is difficult to maintain a
constant supply of such biomass, as agriculture depends on climatic conditions [15].
• The cost per unit may not be sustainable throughout the year in a competitive power
market.
• The space requirement for such plants is high.
• They are not suitable for densely populated areas.
• They are affected by temperature variation.
4 Future Scope
The number of biomass power plants in rural areas is increasing day by day
worldwide. In India [16], however, more emphasis needs to be placed on the usage
of this useful mode of power generation. The GoI is promoting biomass plants through
several schemes and the policy framework discussed in this paper. Several research
projects are ongoing to improve the effectiveness of generation through biomass power
plants. Investment from outside the country will also help to promote biomass
plant installation. Power production through biomass plants is a significant step toward
sustainable development.
5 Conclusion
India is the second largest producer of agricultural waste in the world, and it has
very good potential for biomass energy. As of now, around 30% of the available
potential is used for generation. The Government of India has an excellent
policy framework for implementing biomass-based power generation plants in India.
It can be concluded that in rural regions of India, biomass energy is the
best available option for fulfilling power requirements. As power is generated locally,
the cost required to construct huge transmission and distribution networks is
saved, and at the same time, T&D losses are also minimized. The feed-in tariff will
also motivate the use of biomass-based power generation plants.
References
1. Paul S, Dey T, Saha P, Dey S, Sen R (2021) Review on the development scenario of renew-
able energy in different country. In: 2021 Innovations in energy management and renewable
resources (52042), pp 1–2
2. Khandelwal A, Nema P (2021) A 150 kW grid-connected roof top solar energy system—case
study. In: Baredar PV, Tangellapalli S, Solanki CS (eds) Advances in clean energy technologies.
Springer Proceedings in Energy. Springer, Singapore
3. Joshi N, Sharma J (2020) Analysis and control of wind power plant. In: 2020 4th international
conference on electronics, communication and aerospace technology (ICECA), pp 412–415
4. Annual Report MNRE year 2020–21
5. Tyagi VV, Pathak AK, Singh HM, Kothari R, Selvaraj J, Pandey AK (2016) Renewable energy
scenario in Indian context: vision and achievements. In: 4th IET clean energy and technology
conference (CEAT 2016), pp 1–8
6. Joshi N, Nagar D, Sharma J (2020) Application of IoT in Indian power system. In: 2020 5th
international conference on communication and electronics systems (ICCES), pp 1257–1260
7. www.ireda.in
8. Usmani RA (2020) Potential for energy and biofuel from biomass in India. Renew
Energy 155:921–930
9. Patel S, Rao KVS (2016) Social acceptance of a biomass plant in India. In: 2016 biennial
international conference on power and energy systems: towards sustainable energy (PESTSE),
pp 1–6
10. Parihar AKS, Sethi V, Banerjee R (2019) Sizing of biomass based distributed hybrid power
generation systems in India. Renew Energy 134:1400–1422
11. Sharma A, Singh HP, Sinha SK, Anwer N, Viral RK (2019) Renewable energy powered elec-
trification in Uttar Pradesh State. In: 2019 3rd international conference on recent developments
in control, automation and power engineering (RDCAPE), pp 443–447
12. Khandelwal A, Nema P (2020) Harmonic analysis of a grid connected rooftop solar energy
system. In: 2020 fourth international conference on I-SMAC (IoT in social, mobile, analytics
and cloud) (I-SMAC), pp 1093–1096
13. Sen GP, Saxena BK, Mishra S (2020) Feasibility analysis of community level biogas based
power plant in a village of Rajasthan. In: 2020 ınternational conference on advances in
computing, communication and materials (ICACCM), pp 385–389
14. Saidmamatov O, Rudenko I, Baier et al (2021) Challenges and solutions for biogas production
from agriculture waste in the aral sea basin. Processes 9:199
15. Ghosh S (2018) Biomass-based distributed energy systems: opportunities and challenges. In:
Gautam A, De S, Dhar A, Gupta J, Pandey A (eds) Sustainable energy and transportation.
energy, environment, and sustainability. Springer, Singapore
16. Seth R, Seth R, Bajpai S (2006) Need of biomass energy in India. Prog Sci Eng Res J PISER
18, 3(02/06):13–17
Constructive Approach for Text
Summarization Using Advanced
Techniques of Deep Learning
Abstract Text summarization is a popular field, and demand for it is high owing to the large amount of text available on the Internet through social media sites, blogs, and other Web sites. The need to condense information is therefore growing. Nowadays, data sources are plentiful, and the number of tools for reducing the amount of information is increasing to meet this requirement. This paper discusses the various methods and techniques that are effective in shortening text using advanced technologies and algorithms such as deep learning, machine learning, and artificial intelligence. These advanced algorithms and technologies can also be combined with other technologies to resolve the various issues in text summarization, in other words, the reduction of information. The main requirement when reducing the amount of information is that the reduced text must retain the information that is essential from the user's or the application's point of view and must maintain consistency across the information available from the different sources.
S. J. Sapra
Research Scholar, Department of Computer Science and Engineering, Sant Gadge Baba Amravati
University, Amravati, India
S. A. Thakur (B)
Assistant Professor, Department of Computer Science and Engineering, G H Raisoni College of
Engineering, Nagpur, India
e-mail: shruti.thakur@raisoni.net
A. S. Kapse
Head of Department, Information Technology, Anuradha College of Engineering, Amravati
University, Chikhli, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 895
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_65
896 S. J. Sapra et al.
1 Introduction
Though there are various challenges associated with reducing the size or amount of information from different social media or Web site sources, many effective methods and techniques are available that can efficiently reduce the text without changing the meaning of the information [1]. Text summarization is the restating of the actual information of the text in as short a form as possible, in other words, expressing a large amount of information in as few words or sentences as possible [2].
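As a simple illustration of this idea (not the method proposed in this paper; all names below are illustrative), sentences can be scored by the frequencies of the words they contain and only the top-ranked ones kept:

```python
import re
from collections import Counter

def summarize(text, max_sentences=2):
    """Score each sentence by the average frequency of its words and
    return the top-ranked sentences in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        # Average word frequency, so long sentences are not favored unfairly.
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    ranked = sorted(sentences, key=score, reverse=True)[:max_sentences]
    # Preserve the original sentence order in the summary.
    return " ".join(s for s in sentences if s in ranked)

text = ("Text summarization reduces a long document to a short summary. "
        "The summary must retain the essential information. "
        "Cats are unrelated to this document.")
print(summarize(text, max_sentences=2))
```

This frequency heuristic is extractive: it keeps original sentences rather than generating new ones, which is the simplest baseline against which deep learning summarizers are usually compared.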
This research also studies and analyzes the various tools for maintaining the integrity of the data or information to be reduced and identifies the parameters that must be handled efficiently when dealing with such a large amount of data. It mainly studies how redundant information should be removed from the main content and replaced by short, summarized text [3].
Another important consideration is storage: after summarization of the text or information, the amount of memory required to save the shortened content is significantly smaller than that required for the original text [4].
Deep learning gives better results when plenty of quality data is available to learn from, and this advantage grows as the amount of available data increases. But if quality data is not available, the result may be a loss of useful data, which can be severe or may damage the whole system.
There is a well-known example in which researchers fooled Google's deep learning system by introducing errors, altering the useful data, and adding noise. Such errors were deliberately introduced on a trial basis in the case of image recognition algorithms, and it was found that performance was greatly hampered by the changes in the quality and quantity of the data fed to the system [5].
Even very small changes in the data were found to change the results greatly, so no alteration of the input data should be allowed in such cases. It has therefore become essential to add constraints to deep learning algorithms that improve their accuracy and, in turn, the efficiency of the overall system [6] (Fig. 1).
After reduction of the text or information, different analytical methods are applied to judge the performance of the applied method and technology. Performance measurement is thus another important aspect of understanding text summarization [7].
Often the text contains non-essential data, which must be removed in an efficient way to shorten the text. There is also metadata, that is, data about the data, which is important and must be preserved during shortening and represented differently.
Fig. 1 Summarization scenario to reduce a large amount of data to short data
Another significant parameter when reducing text or information is the application domain in which the reduced information is to be used; this plays a very important role because the method or technology to be used changes with the application domain [8].
2 Literature Survey
In the early days, researchers designed systems based on neural networks modeled by analogy with human intelligence, grouping and combining much of the underlying mathematics and many algorithms to create the processes described below.
Researchers from all corners of the world are continuously working on smart techniques for text summarization and have been very successful in most cases, but still more effective and efficient techniques are required. This section deals with the various studies made by researchers in the field of text summarization and analyzes their advantages and disadvantages with respect to the various crucial parameters [7].
Different methods may have disadvantages yet be useful in particular scenarios, serving various applications and user needs; each offers special usefulness for a given domain, and the characteristics of each method are essential to study [9].
In automated text summarization, machines perform the summarization of different kinds of documents using various statistical or heuristic techniques. A summary in this case is a shortened form of the text that captures and refers to the most crucial and relevant data contained in the document being summarized. Numerous tried-and-true automated text summarization methods are currently applied across different domains and fields of information.
3 Proposed Methodology
The newly proposed approach is applicable in all the different domains of data and information, using different methods of artificial intelligence together with deep learning algorithms for better and faster output. The deep learning algorithms increase efficiency and effectiveness, leading to faster operation of the proposed methods; they can produce very short, abstract renderings of the context of the information and save a large amount of time [11]. The different operations of the proposed method can be explored in different ways, as shown in Fig. 3.
The proposed method is useful in many domains and applications and has produced very good results with respect to the various parameters and constraints related to the summarization process. The method has different constraints depending on that process, and it offers many benefits for specific applications in terms of parameters such as summarization time, summarization speed, and summarization accuracy, as shown in Fig. 4.
The architecture addresses a known weakness of abstractive summarization: precise details in the source document, such as dates, locations, or phone numbers, are often reproduced erroneously in the summary [12].
All these parameters are important for making the text useful, showcasing a large amount of information in a very short space; in other words, they make the text crucial and readable both for the user and for different applications [13].
The proposed methodology was implemented on the CNN/DailyMail dataset.
4 Scope
The proposed model is very efficient in dealing with many challenges faced during the shortening of text and the formation of a new short text. Thus, the proposed method is of great use, and its scope is broad in a large number of business applications as well [14]. It is well suited to real-time applications in various domains, providing numerous advantages to the user and better efficiency than the other methods studied for shortening information. This leads to better-quality text that can be used directly in many sorts of applications where short data is essential or where it is essential to represent the data or information in very few words. Thus, the proposed method greatly helps in reducing the amount of redundant information and producing meaningful information in very few words [1].
• Precision-target—Precision-target (prec(t)) does the same thing as prec(s) but with respect to the real summary. The idea is to measure how many of the entities the model produces in the hypothesis summary are also present in the real summary. Mathematically, it is defined as

prec(t) = |N(h) ∩ N(t)| / |N(h)|

where N(h) and N(t) refer to the named-entity sets of the generated (hypothesis) summary and the real (target) summary, respectively.
• Recall-target—Under recall-target (recall(t)), the idea is to measure how many entities in the real summary are missing from the model-generated hypothesis summary. Mathematically, it is defined as

recall(t) = |N(h) ∩ N(t)| / |N(t)|

where N(h) and N(t) again refer to the named-entity sets of the generated and real summaries, respectively. To obtain a single measurable number, prec(t) and recall(t) are merged into an F1-score, defined as

F1(t) = 2 · prec(t) · recall(t) / (prec(t) + recall(t)).
These mathematical measures prove very useful and essential for the summarization of different types of text or information.
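Once the named-entity sets N(h) and N(t) are extracted (e.g., by an off-the-shelf NER tagger), these entity-level scores reduce to simple set arithmetic. A minimal sketch, using illustrative entity sets:

```python
def entity_scores(n_h, n_t):
    """Entity-level precision-target, recall-target, and their F1,
    computed from the entity set of the generated summary (n_h)
    and the entity set of the reference summary (n_t)."""
    n_h, n_t = set(n_h), set(n_t)
    overlap = len(n_h & n_t)
    prec_t = overlap / len(n_h) if n_h else 0.0    # generated entities found in the reference
    recall_t = overlap / len(n_t) if n_t else 0.0  # reference entities reproduced by the model
    f1 = (2 * prec_t * recall_t / (prec_t + recall_t)) if (prec_t + recall_t) else 0.0
    return prec_t, recall_t, f1

p, r, f = entity_scores({"Paris", "2021", "UNESCO"}, {"Paris", "UNESCO"})
print(p, r, f)  # prec(t) = 2/3, recall(t) = 1.0
```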
L^t := L^t_ML + λ · L^t_coverage
Classification can be performed on different datasets, which is useful for generating the usage patterns of the necessary information; the data can be grouped into clusters to improve the quality of data for the application and is also useful for representing the different notations used in a specific application domain [15].
The flattened results table reads as follows (mean ± std; rows per dataset compare the original training data, + filtering, + classification, and JAENS):

| Dataset | Training data | ROUGE-1 | ROUGE-2 | ROUGE-L | Macro-prec(s) | Micro-prec(s) | Macro-prec(t) | Micro-prec(t) | Macro-recall(t) | Micro-recall(t) | Macro-F1(t) | Micro-F1(t) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Newsroom | Original | 47.7±0.2 | 35.0±0.3 | 44.1±0.2 | 97.2±0.1 | 97.0±0.1 | 65.4±0.3 | 62.9±0.4 | 70.8±0.3 | 68.5±0.2 | 68.0±0.2 | 65.6±0.3 |
| Newsroom | + filtering | 47.7±0.1 | 35.1±0.1 | 44.1±0.1 | 98.1±0.1 | 98.0±0.0 | 66.5±0.1 | 63.8±0.1 | 70.2±0.2 | 67.7±0.3 | 68.3±0.1 | 65.7±0.1 |
| Newsroom | + classification | 47.7±0.2 | 35.1±0.1 | 44.2±0.2 | 98.1±0.1 | 98.0±0.0 | 67.2±0.4 | 64.2±0.4 | 70.3±0.2 | 67.8±0.4 | 68.7±0.3 | 65.9±0.4 |
| Newsroom | JAENS | 46.6±0.5 | 34.3±0.3 | 43.2±0.3 | 98.3±0.1 | 98.3±0.1 | 69.5±1.6 | 67.3±1.2 | 68.9±1.5 | 66.8±1.6 | 69.2±0.1 | 67.0±0.2 |
| CNNDM | Original | 43.7±0.1 | 21.1±0.1 | 40.6±0.1 | 99.5±0.1 | 99.4±0.1 | 66.0±0.4 | 66.5±0.4 | 74.7±0.7 | 75.4±0.6 | 70.0±0.2 | 70.7±0.3 |
| CNNDM | + filtering | 43.4±0.2 | 20.8±0.1 | 40.3±0.2 | 99.9±0.0 | 99.9±0.0 | 66.2±0.4 | 66.6±0.3 | 74.1±0.6 | 74.9±0.6 | 69.9±0.2 | 70.5±0.2 |
| CNNDM | + classification | 43.5±0.2 | 20.8±0.2 | 40.4±0.3 | 99.9±0.0 | 99.9±0.0 | 67.0±0.6 | 67.5±0.5 | 74.7±0.2 | 75.5±0.1 | 70.6±0.3 | 71.3±0.3 |
| CNNDM | JAENS | 42.4±0.6 | 20.2±0.2 | 39.5±0.5 | 99.9±0.0 | 99.9±0.0 | 67.9±0.7 | 68.4±0.6 | 75.1±0.7 | 76.4±0.7 | 71.3±0.2 | 72.2±0.3 |
| XSUM | Original | 45.6±0.1 | 22.5±0.1 | 37.2±0.1 | 93.9±0.1 | 93.6±0.2 | 74.1±0.2 | 73.3±0.2 | 80.1±0.1 | 80.3±0.3 | 77.0±0.1 | 76.6±0.2 |
| XSUM | + filtering | 45.4±0.1 | 22.2±0.1 | 36.9±0.1 | 98.2±0.0 | 98.2±0.1 | 77.9±0.2 | 77.3±0.2 | 79.4±0.2 | 79.6±0.2 | 78.6±0.1 | 78.4±0.2 |
| XSUM | + classification | 45.3±0.1 | 22.1±0.0 | 36.9±0.1 | 98.3±0.1 | 98.2±0.1 | 78.6±0.3 | 78.0±0.3 | 79.5±0.3 | 79.8±0.4 | 79.1±0.1 | 78.9±0.1 |
| XSUM | JAENS | 43.4±0.7 | 21.0±0.3 | 35.5±0.4 | 99.0±0.1 | 99.0±0.1 | 77.6±0.9 | 77.1±0.6 | 79.5±0.6 | 80.0±0.5 | 78.5±0.2 | 78.5±0.1 |
L^i_BIO(θ^(enc), x^i, z^i) = − Σ_{t=1}^{T_s(i)} log p_{θ^(enc)}(z^i_t | x^i)
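This sequence-labeling objective is a token-level negative log-likelihood of the gold tag sequence. A minimal numpy sketch with toy probabilities (not the paper's encoder) illustrates the computation:

```python
import numpy as np

def bio_nll(tag_probs, gold_tags):
    """Negative log-likelihood of a gold BIO tag sequence.
    tag_probs: (T, K) array, row t = predicted distribution over K tags at token t.
    gold_tags: length-T list of gold tag indices (z_t in the equation)."""
    t = np.arange(len(gold_tags))
    # Pick out p(z_t | x) at each position and sum the negative logs.
    return -np.sum(np.log(tag_probs[t, gold_tags]))

# Toy sequence of 3 tokens over the tags {O, B, I}.
probs = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.7, 0.1],
                  [0.1, 0.2, 0.7]])
gold = [0, 1, 2]
print(bio_nll(probs, gold))  # ≈ 0.9365
```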
5 Results
The proposed method was found to produce very accurate summarization of the text: it generates a short summary that is self-explanatory and directly applicable in applications for the representation of the data, which leads to better performance of the overall system.
6 Conclusion
7 Future Scope
It is expected that continuous research and improvement of the proposed model will increase the usefulness of the proposed architecture in the field of text summarization and will eventually result in a variety of utilities and tools. These strategies will also improve the effectiveness and efficiency of implementing various text summarization methods and technologies for fast application in different domains, leading to an enhanced summarization approach that improves the proposed method to a great extent.
References
Abstract Lane and vehicle detection are fundamental to driver assistance systems and self-driving. The proposed concept is to use the pixel difference between the intended lane line and the backdrop to isolate the lane from the road surface; a curve fitting model is then used to identify the lane in the image. Histogram of oriented gradients, histogram, and binary spatial features are extracted from vehicle and non-vehicle images. For vehicle detection, a support vector machine classifier is employed to separate vehicle from non-vehicle images using the extracted features. Many methods, however, are constrained by light and road conditions, such as weak light, fog, and rain, which may render lane lines invisible. In feature extraction, the lane images are processed using various filters. Our work focuses on a lane detection technique founded on the Sobel filter and a curve fitting model for lane line tracking under different conditions. Preprocessing encompasses noise mitigation and preparation of the image for the subsequent procedure. To achieve this, conversion to HLS color space was performed, and the lane is identified by summing pixel values. The main aim is to increase accuracy and reduce computation time compared with other existing methods.
Keywords Sobel filter · Curve fitting model · Lane detection · Vehicle detection ·
Sliding window · Support vector machine
1 Introduction
Most accidents occur due to invisible road lanes. Accidents can be reduced drastically by employing improved driving assistance, and a system that warns the driver can save a considerable number of lives.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 905
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_66
906 R. Rajakumar et al.
To increase safety and reduce road accidents, researchers have been working on better driving techniques that assure security.
While driving, accidents occur due to the driver's unawareness of the lane, specifically at curved lanes. Therefore, if the road and vehicles can be inferred before any adverse lane condition is reached, the driver can be assisted in advance to reduce speed and avoid road accidents by using the proposed sliding window algorithm.
In driving assistance, the challenging tasks for achieving road safety are road lane detection or boundary detection of the white and yellow lines marked on roads. Many researchers are working on lane detection, lane tracking, and lane departure warning. Yet many systems have limitations with shadows, changing illumination, worn road paint, and other image interference. These problems can be overcome by using the proposed algorithm.
This paper develops a curve fitting model that enhances the strength of lane detection and tracking for safe transportation. In our method, lane detection and tracking are performed by the curve fitting model and related component functions to improve both tasks. A widely used support vector machine classifier detects the vehicles.
2 Related Works
The literature [1] extracted the AROI to overcome the complexity of computation.
Then, Kalman filter along with progressive probabilistic Hough transform (PPHT)
is used to find boundaries of the lane in the image. Depending on the lane and the
position of the vehicle, their algorithm decides if the vehicle is offset. Different lane
conditions are used for detection and tracking for both city roads and highways. In
the literature [2], lane marks in road images are extracted which is based on the
multi-constraint model and a clustering algorithm is proposed to detect the lane. By
dividing the region of interest into sections, it is easy to track lane lines with curved
shapes. The literature [3] used the B-spline fitting from the RANSAC algorithm
for the front lane and Hough transform for the rear lanes. The algorithm is used
for lane detection, and it eliminates the interference lines, better than the RANSAC
algorithm. The literature [4] improved the accuracy of lane recognition and aimed to minimize the pixel-wise difference. The predicted lane has both white and yellow pixels, which do not directly reflect the lane parameters essential for detecting a straight line. To detect a lane, interference from fixed objects and other elements outside the lanes is avoided. After the pixels in the road area are selected as a reorganized data matrix, a pre-trained deep neural network is employed to obtain the moving vehicles' information.
The literature [5] proposed a flexible road identification method that connects
both lane lines and obstacle boundaries, applicable for detecting lanes. This algo-
rithm uses an adaptive sliding window for lane extraction using the least-squares
method for lane line fitting. The literature [6] employs a color threshold method
Lane Vehicle Detection and Tracking Algorithm … 907
to identify the lane edges along with perspective transform and Hough transform
technique to detect lane segments in the image. These conditions are a straight lane
and sunny climate. The literature [7], dealing with a vision-based methodology, performs well only in controlled weather conditions and uses the Hough transform to identify the straight road; there, edge-based detection with OpenStreetMap is used to detect lanes, which increases computation time. The Hough transform [8] identifies
the straight lane, and the curve fitting identifies the curved lane which increases the
computation time. In [9], vehicles are constructed by their geometry and structured
as a combination of a small image to form one histogram, similar to the sliding
window model. In the literature [10], feature pairing elimination (FPE filter) is used
for only feature extraction and SVM, random forest, and K-nearest neighbor clas-
sifiers were compared. In this lane detection, Hough transform [11, 12] is used to
detect them, but these algorithms increase the computational time and the complex
processing. It is essential to focus on the edge image derived from the response of the
CenSurE algorithm. By using the edge lane, we can identify the traffic lane which is
detected from its geometry. For identifying the blobs, an SVM classifier is used [13].
The literature [14] predicts the position of the vehicle by using the Kalman filter
and histogram technique with a mean detection accuracy of 94.05% during the day.
Particle-filter-based tracking was used to learn some road scene variations in [15].
Lane detection is used in image processing and computer vision and has many applications. Previous literature on lane detection dealt with curve detection methods. These algorithms detect the lane edges and are used to determine the vehicle position in the lane. A dataset captured using a single monocular camera is used for lane detection. This work contributes to finding the correct position of the vehicle in its own lane. The system recognizes most white and yellow markings across the lane effectively under different climatic conditions, including shadows, rain, snow, or damage to the road.
The lane detection algorithm includes both lane detection and a lane tracking method. By changing the input parameters, regions of interest are identified. Both perspective and inverse perspective transforms are performed on the lanes to detect the region of interest. In the next step, the detected lanes are analyzed by the Sobel filter, and the future lanes are calculated using the polynomial curve model.
In this section, we explain lane detection using the HLS method and Sobel edge detection; the results of this method are examined in Sect. 4. The following procedures are performed to detect the lane. The preprocessed image is perspective transformed, which converts the 3-dimensional scene into a 2-dimensional bird's-eye view. The Sobel filter is then applied for noise reduction and to identify the pixels representing edges. The filtered image is converted into HLS colour space, and its components (hue, lightness, and saturation) are used to detect the yellow
lane. To detect the white lane, a maximum lightness value of 100% was selected. The histogram is computed to separate the left and right lanes by summing the pixel values and selecting the maximum, which identifies the lane.
The sliding window method is applied from the bottom of the image by identifying lane pixels. Each next upward sliding window is constructed based on the previous window. A polynomial is then fitted to both lanes, and the previous lane is used to estimate the search area for the next frame. Eventually, the fitted lane is etched onto the original image, and an inverse perspective transform is performed (see Fig. 1).
The Sobel filter operates by estimating the image intensity gradient at every pixel of the lane image; it estimates the direction of the change in intensity in any direction. Figure 2 shows how the lane image changes at each pixel and how the pixels representing edges change.
The Sobel filter has two 3 × 3 kernels: one kernel to identify changes in the horizontal direction and another to identify changes in the vertical direction. The two kernels are convolved with the original lane image to calculate the derivatives.
By applying the threshold to the selected part of the image (the ROI), we obtain a hue, lightness, and saturation (HLS) component color image as input. In this step, one edge detection method, the Sobel filter, is used to find and mark the lane boundaries.
The main objective of the Sobel filter is to detect edges that are near the real lane edges. Sobel edge detection uses the gradient vector of the intensity image; lane boundary features are extracted using this gradient vector, through which the lane can be detected.
Many edge detection operators exist, but their efficiency levels differ; Sobel edge detection is one of the best and most efficient methods.
The notable feature of the Sobel method here is its low error rate, because the algorithm uses a double threshold for the yellow and white lanes; therefore, the detected edge is close to the real-world lane.
In the next step, the captured color image is transformed to HLS color space to speed up the process and reduce sensitivity to scene conditions. To detect the white lane, lightness is set to a value close to 100%. A combination of saturation and lightness values is then defined to detect the yellow lane. In our proposed method, images chosen from the Xi'an city database directory are processed. The camera is calibrated so that the vanishing point of the road is placed at the top of the region of interest (ROI).
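The HLS thresholding step can be sketched with Python's standard colorsys module; the threshold values below are illustrative assumptions, not the paper's tuned values:

```python
import colorsys

def classify_lane_pixel(r, g, b,
                        white_l_min=0.9,
                        yellow_h_range=(0.10, 0.18), yellow_s_min=0.5):
    """Classify one RGB pixel (0-255 channels) as 'white', 'yellow', or 'other'
    after converting it to HLS. Thresholds here are illustrative only."""
    h, l, s = colorsys.rgb_to_hls(r / 255, g / 255, b / 255)
    if l >= white_l_min:            # white lane: lightness near 100%
        return "white"
    if yellow_h_range[0] <= h <= yellow_h_range[1] and s >= yellow_s_min:
        return "yellow"             # yellow lane: combined hue + saturation test
    return "other"

print(classify_lane_pixel(255, 255, 255))  # white lane paint
print(classify_lane_pixel(255, 215, 0))    # yellow lane paint
print(classify_lane_pixel(60, 60, 60))     # dark road surface
```

Note that colorsys works on channels scaled to [0, 1] and returns hue as a fraction of a full turn, so a hue window rather than a degree range is used.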
Fig. 3 Histogram computation

A histogram contains the numerical values of an image, and the information obtained from it is rich. The histogram indicates the frequency of the different gray levels in a lane image. Lane images contain a series of pixel values, and each pixel value corresponds to a specific color intensity. This is an important step in segmentation, and it decreases computation time. At the lower part of the image, only the lane is present; scanning upward, other structures appear. A peak is drawn for each band. The histogram sums the pixel values column-wise, so the left and right lanes can be identified as the positions containing the larger pixel sums, as shown in Fig. 3.
The method takes the identified lane points, changes them depending on the vehicle projection, and then alternates points based on the values of the left and right edge points on the lane.
A curve fitting model was selected for efficient tracking. This algorithm requires less computational time and is less vulnerable to distortion. Equations of higher degree can provide a more exact fit of the lane. In this work, a curve fitting model is employed that can trace curved roads and is robust to noise, shadows, and weak lane markings; it can also give details about lane orientation and curvature. Lane tracking involves two parameters, which are discussed in the next section.
To construct the sliding window, the initial point of the windows must be known. To
find the initial point, a histogram for the bottom part of the image is calculated. Based
on the peak value of the histogram, the initial window is selected and the mean of
the nonzero points inside the window is determined. For the first half of the image,
the left lane peak is obtained and the other right half gives the peak of the right lane.
Thus, left and right starting sliding windows are formed, and then, left lane center
and right lane center are calculated. This kind of selection works fine for both lanes
on the left and right sides of the image.
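The histogram-based localization of the two starting windows can be sketched as follows (a numpy illustration with a synthetic binary bird's-eye image, not the paper's implementation):

```python
import numpy as np

def lane_base_points(binary_warped):
    """Find the starting x-positions of the left and right lanes:
    sum the bottom half of a binary bird's-eye image column-wise,
    then take the peak in each horizontal half of the histogram."""
    h, w = binary_warped.shape
    histogram = np.sum(binary_warped[h // 2:, :], axis=0)
    midpoint = w // 2
    left_base = int(np.argmax(histogram[:midpoint]))
    right_base = midpoint + int(np.argmax(histogram[midpoint:]))
    return left_base, right_base

# Synthetic 80x100 binary image with lane pixels near x=20 and x=75.
img = np.zeros((80, 100))
img[:, 19:22] = 1
img[:, 74:77] = 1
print(lane_base_points(img))  # → (19, 74)
```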
In some cases, for example, where the vehicle is gradually steered more toward
the right, then we might see the right lane present in the left half. In such situations,
improper detection is possible. To avoid such situations, a variable cache is defined
to save the starting point windows of previous lanes. The histogram is not calculated
throughout the detection process but only for the first few frames, and later, it will
be dynamically tracked using the cache. For each initial sliding window, the mean
of the points inside each window is calculated. Two windows to the left and right of
the mean point and three more windows on top of the mean point are selected as the
next sliding windows. This kind of selection of windows helps to detect the sharp
curves and dashed lines. The selection of sliding windows is shown in Fig. 4.
The window width and height are fixed depending upon the input dataset. The
width of the sliding window should be adjusted depending on the distance between
both lanes. The sliding windows on top help track the lane points turning left and
right, respectively. The windows need to have a relatively well-tuned size to make
sure the left- and right-curved lanes are not tracked interchangeably when lanes have
a sharp turn and become horizontally parallel to each other. The detected points inside
the sliding window are saved. The process of finding the mean point and next set of
sliding windows based on valid points inside the respective sliding windows for left
and right lanes is continued until no new lane points are detected. Points detected
in the previous sliding windows are discarded when finding points in the next set
of sliding windows. Then, the searching can stop tracking when no new points are
discovered.
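The window-walking procedure above can be sketched in simplified form; the window sizes, the stopping rule, and the single re-centred window per step are illustrative simplifications of the multi-window scheme described:

```python
import numpy as np

def track_lane(binary, x_start, win_w=20, win_h=10, min_pixels=5):
    """Walk sliding windows from the bottom of a binary image upward.
    Each window is centred on the mean x of the nonzero pixels found in
    the previous window; collected pixel coordinates are returned."""
    h, _ = binary.shape
    x_center = x_start
    xs, ys = [], []
    for bottom in range(h, 0, -win_h):
        top = max(bottom - win_h, 0)
        lo = max(x_center - win_w // 2, 0)
        win = binary[top:bottom, lo:x_center + win_w // 2]
        py, px = np.nonzero(win)
        if len(px) < min_pixels:            # stop when no new lane points appear
            break
        xs.extend(px + lo)
        ys.extend(py + top)
        x_center = int(np.mean(px)) + lo    # re-centre the next window upward
    return np.array(xs), np.array(ys)

# A gently curving synthetic lane: one pixel per row.
img = np.zeros((100, 100))
for y in range(100):
    img[y, 50 + y // 10] = 1
xs, ys = track_lane(img, x_start=59)
print(len(xs))  # → 100
```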
Once the left and right points are detected, these points are processed to polynomial
fitting to fit the respective lanes. Average polynomial fit values of the past few frames
are used to avoid any intermittent frames, which may have unreliable lane informa-
tion. The lane starting points are retrieved from the polynomial fitting equation. This
approach helps increase the confidence of the lane’s starting point detection based
on lanes rather than relying on starting sliding windows. The deviation of the vehicle
from the center of the lanes is estimated. Then, the image is inverse perspective trans-
formed, and the lanes are fitted onto the input image. The sliding window output is
shown in Fig. 4.
The curve model is obtained for the lane curve, and a quadratic equation is implemented to analyze and compare the merits and demerits of the different model structures.
The equation of the curve model is given as

Ax^2 + Bx + C = 0 (1)

where A, B, and C are the constants of the quadratic curve.
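Assuming the common convention of fitting the lane's cross-image coordinate as a quadratic function of the row index, the constants of a quadratic such as Eq. (1) can be recovered with np.polyfit. A small sketch on synthetic, noise-free points:

```python
import numpy as np

# Lane pixel coordinates: ys run down the image, xs across it, sampled
# from a known quadratic x = A*y^2 + B*y + C (illustrative coefficients).
ys = np.arange(0, 100, dtype=float)
xs = 0.002 * ys**2 - 0.1 * ys + 60.0

# Fit x as a quadratic in y; np.polyfit returns [A, B, C], highest degree first.
A, B, C = np.polyfit(ys, xs, deg=2)
print(round(A, 4), round(B, 4), round(C, 4))  # → 0.002 -0.1 60.0
```

In a real pipeline, the (xs, ys) arrays would come from the sliding-window pixel collection, and averaging the fitted coefficients over a few frames smooths out unreliable detections.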
Vehicle detection is implemented through the support vector machine classifier. To extract features, histogram of oriented gradients (HOG), color histogram, and binary spatial features are computed on the training images and input images. The processed image is converted with a YCbCr color space transformation, which increases the brightness. The training images are fed into the SVM network, and the model normalizes the data to approximately the same scale. The GTI vehicle image dataset, comprising 8792 vehicle images and 8968 non-vehicle images, is used to train the SVM classifier, which is stored in a pickle file.
For vehicle detection, a sliding window technique is performed at the pixel
level, and the trained classifier is used to search for vehicles in each window. After
training is completed, the support vector machine classifier is applied to the lane
images. As shown in [13], the support vector machine classifier is a simple and
efficient technique for classifying vehicles based on these features. To eliminate
false positives, a heat map with a high threshold value is used. The algorithm was
simulated using PyCharm.
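A minimal sketch of the window enumeration behind such a search (the default window size and overlap are assumptions, and scoring each window patch with the classifier is omitted):

```python
def slide_window(img_w, img_h, xy_window=(64, 64), xy_overlap=(0.5, 0.5)):
    # enumerate (top-left, bottom-right) corners of overlapping windows
    step_x = int(xy_window[0] * (1 - xy_overlap[0]))
    step_y = int(xy_window[1] * (1 - xy_overlap[1]))
    windows = []
    for y in range(0, img_h - xy_window[1] + 1, step_y):
        for x in range(0, img_w - xy_window[0] + 1, step_x):
            windows.append(((x, y), (x + xy_window[0], y + xy_window[1])))
    return windows
```

Each returned box would be cropped from the frame, converted to the feature vector described above, and passed to the SVM; boxes the classifier accepts feed the heat map discussed below.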
Our technique encodes the spatial variation between a reference pixel and its
neighboring pixels, which depends on abrupt gray-level changes in the horizontal,
vertical, and oblique directions. The difference between the center pixel and its
surrounding neighbors is calculated to reflect the amplitude information of the entire
image. We use a support vector machine classifier that exploits this spatial
information to classify the lane images, and the necessary features are identified for
each pixel. The features are then quantized to train the support vector machine
model. The resulting regions are modeled using statistical summaries of their
textural and shape properties, and the support vector machine model is then used to
compute the classification maps. Figure 8 shows the spatial binning graph.
A support vector machine (SVM) classifier is a machine learning approach for
separating data into two classes [11]. The classifier is given a set of labeled training
data for each category and is used to classify vehicles. The SVM algorithm finds a
hyperplane in N-dimensional space that separates the data points: positive samples
fall on one side and negative samples on the other, and the hyperplane is chosen so
that the margin between the two classes is close to maximal. SVM is thus a
supervised learning method formulated to solve classification problems. Training
amounts to selecting the support vectors, the discriminating samples from which the
hyperplane is estimated.
916 R. Rajakumar et al.
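The decision rule and margin of a linear SVM can be written down directly; this is a generic sketch, not the trained model from the paper:

```python
import numpy as np

def svm_decision(w, b, X):
    # signed score relative to the separating hyperplane w·x + b = 0
    return X @ w + b

def svm_classify(w, b, X):
    # positive samples fall on one side of the hyperplane, negative on the other
    return np.where(svm_decision(w, b, X) >= 0, 1, -1)

def margin_width(w):
    # distance between the two supporting hyperplanes w·x + b = ±1
    return 2.0 / np.linalg.norm(w)
```

Training maximizes `margin_width(w)` subject to every labeled sample lying on its correct side, which is why only the support vectors on the margin determine the final hyperplane.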
In the given image, overlapping detections occur for each of the two vehicles, and
two frames exhibit a false positive detection at the center of the road. To combine
overlapping detections and remove false positives, we build a heat map and apply a
high threshold to it.
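The heat-map step can be sketched as follows, assuming detections arrive as `((x1, y1), (x2, y2))` pixel boxes; labeling the surviving connected regions (for example with `scipy.ndimage.label`) is omitted:

```python
import numpy as np

def add_heat(heatmap, bboxes):
    # every detection window votes for the pixels it covers,
    # so overlapping detections accumulate higher values
    for (x1, y1), (x2, y2) in bboxes:
        heatmap[y1:y2, x1:x2] += 1
    return heatmap

def apply_threshold(heatmap, threshold):
    # zero out weak responses; isolated false-positive windows
    # rarely overlap, so they fall below the threshold
    cooled = heatmap.copy()
    cooled[cooled <= threshold] = 0
    return cooled
```

True vehicles are detected by many overlapping windows and survive the threshold, while a lone spurious window contributes only a single vote and is discarded.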
6 Experimental Results
6.1 Lane Vehicle Detection for Video Frames from Xi’an City
Dataset
The present section details the experimental results of our lane vehicle detection
method with two sets of various video frames obtained from the Xi’an city dataset.
Frames in this dataset have shadows from trees and cracks on the surface of the
roads. Figure 9a, b, c, d shows some sample frames marked with lanes and vehicles
for the dataset. When all frames in the dataset are processed, we see that our holistic
detection and tracking algorithm has 95.83% accuracy in detecting the left lane and
vehicle.
The proposed sliding window model was tested on a dataset with different driving
scenes to check its adaptiveness and effectiveness. The results showed that the
proposed sliding window algorithm can identify lanes and vehicles in various
situations while avoiding wrong identifications. In analyzing the parameters,
different window sizes were found to improve the performance of lane and vehicle
detection.
To assess the computational complexity of the proposed hybrid lane vehicle
detection and tracking algorithm, we computed the time required to fully process a
single frame of size 1280 × 720, which was found to be around 3 to 4 s/frame. For
real-time operation, computation time is an important parameter (Table 1).
Fig. 9 a–d Output frames of the Xi’an city database
Table 2  Accuracy calculation comparison

  Performance (%)      Dataset
  Total frames         975
  MLD                  1.8
  ILD                  2.37
  Accuracy             95.83

Table 3  Accuracy comparison

  Source                 Accuracy (%)
  Literature [8]         93
  Literature [10]        95.35
  Literature [13]        94.05
  Proposed algorithm     95.83
where MLD denotes a missed detection, ILD denotes an incorrect detection, C is the
number of correctly detected images in the dataset, and N denotes the total number
of dataset images.
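Under these definitions, the evaluation quantities reduce to simple ratios over the dataset; the helper below and its field names are illustrative:

```python
def detection_metrics(correct, missed, incorrect, total):
    # Accuracy = C / N; MLD and ILD are the missed- and
    # incorrect-detection rates over the N dataset images
    return {
        "accuracy_pct": 100.0 * correct / total,
        "mld_pct": 100.0 * missed / total,
        "ild_pct": 100.0 * incorrect / total,
    }
```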
The GTI vehicle image dataset, comprising 8792 vehicle images and 8968
non-vehicle images, was used to train the SVM classifier, and its accuracy was
calculated as 99%. In this paper, different scenes were selected from the video as
samples to test the accuracy. A sequence of 975 frames was tested, 934 lane vehicle
frames were correctly identified, and 95.83% accuracy was obtained by the curve
fitting model (Tables 2 and 3).
Lane and vehicle detection and tracking is an important application for reducing the
number of accidents. The algorithm was tested under different conditions to make
the transport system robust and effective.
For lane detection, we described and implemented the HLS color space and edge
detection using the Sobel filter. We then analyzed the curve fitting algorithm for
efficient lane detection.
For vehicle detection and tracking, support vector machine classifier and sliding
window techniques were performed. For our dataset, accuracy was calculated as
95.83%. This algorithm computation time was calculated as 3–4 s/frame.
In the future, we will improve the lane and vehicle detection system by reducing the
computation time of the proposed algorithm, so that lanes and vehicles can be
detected efficiently in real time. The algorithm can be further developed for
self-driving vehicles.
References
1. Marzougui M, Alasiry A, Kortli Y, Baili J (2020) A lane tracking method based on progressive
probabilistic Hough transform. IEEE Access 8:84893–84905, 13 May 2020
2. Xuan H, Liu H, Yuan J, Li Q (2018) Robust lane-mark extraction for autonomous driving under
complex real conditions. IEEE Access, 6:5749–5766, 9 Mar 2018
3. Xiong H, Yu D, Liu J, Huang H, Xu Q, Wang J, Li K (2020) Fast and robust approaches for lane
detection using multi-camera fusion in complex scenes. IET Intell Trans Syst 14(12):1582–
1593, 19 Nov 2020
4. Wang X, Yan D, Chen K, Deng Y, Long C, Zhang K, Yan S (2020) Lane extraction and
quality evaluation: a hough transform based approach. In: 2020 IEEE conference on multimedia
information processing and retrieval (MIPR), 03 Sept 2020
5. Li J, Shi X, Wang J, Yan M (2020) Adaptive road detection method combining lane line and
obstacle boundary. IET Image Process 14(10):2216–2226, 15 Oct 2020
6. Stević S, Dragojević M, Krunić M, Četić N (2020) Vision-based extrapolation of road lane lines
in controlled conditions. In: 2020 zooming innovation in consumer technologies conference
(ZINC), 15 Aug 2020
7. Wang X, Qian Y, Wang C, Yang M (2020) Map-enhanced ego-lane detection in the missing
feature scenarios. IEEE Access 8:107958–107968, 8 June 2020
8. Wang H, Wang Y, Zhao X, Wang G, Huang H, Zhang J (2019) Lane detection of curving
road for structural high-way with straight-curve model on vision. IEEE Trans Veh Technol
68(6):5321–5330, 26 Apr 2019
9. Vatavu A, Danescu R, Nedevschi S (2015) Stereovision-based multiple object tracking in traffic
scenarios using free-form obstacle delimiters and particle filters. IEEE Trans Intell Trans Syst
16(1):498–511
10. Lim KH, Seng KP, Ang LM et al (2019) Lane detection and Kalman-based linear parabolic lane
tracking. In: International conference on intelligent human-machine systems and cybernetics,
pp 351–354
11. Kang DJ, Choi JW, Kweon IS (2018) Finding and tracking road lanes using line-snakes. In:
Proceedings of the conference intelligent vehicles, pp 189–194
12. Wang Y, Teoh EK, Shen D (2014) Lane detection and tracking using B-snake. Image Vis
Comput 22(4):269–280
13. Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
14. Zhang X, Huang H (2019) Vehicle classification based on feature selection with anisotropic
magnetoresistive sensor. IEEE Sens J 19(21):9976–9982, 15 July 2019, 1 Nov 2019
15. Gopalan R, Hong T, Shneier M et al (2019) A learning approach toward detection and tracking
of lane markings. IEEE Trans Intell Transp Syst 13(3):1088–1098
A Survey on Automated Text
Summarization System for Indian
Languages
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 921
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_67
922 P. K. Vaishali et al.
1 Introduction
The need for automatic summarization increases as the amount of textual information
increases. Unlimited information is available on the Internet, but sorting the required
information is difficult. Automated text summarization is the process of developing
a computerized system that has the ability to generate an extract or abstract from
an original document. It presents that information in the form of a summary. The
need for summarization has increased due to the unlimited number of sources.
Summarization is useful in information retrieval, for example for news article
summaries, email summaries, mobile messages, business and office information,
and online search. Numerous online summarizers are accessible, such as Microsoft
News, Google, and Columbia Newsblaster [1]. For biomedical summarization,
BaseLine, FreqDist, SumBasic, MEAD, AutoSummarize, and SWESUM are
utilized [2]. Online tools include Text Compacter, Sumplify, Free Summarizer,
WikiSummarizer, and Summarize Tool. Open-source summarizing tools include
Open Text Summarizer, Classifier4J, NClassifier, and CNGL Summarizer [3]. As
the need for knowledge in abstract form has grown, so has the necessity for
automatic text summarization.
The first summarizing method was introduced in the late 1950s. An automatic
summarizer chooses key sentences from the source text and condenses them into a
concise form covering the general subject, so it takes less time to comprehend the
information of a huge document [4]. Automatic text summarization is a well-known
application in the field of Natural Language Processing (NLP). The majority of
work in this field has focused on sentence extraction and statistical analysis, but
recent research trends focus on cue phrases and discourse structure. Text
summarization is broadly classified into two types: extractive summarization and
abstractive summarization. The extractive approach takes essential lines or phrases
from the original text and puts them together to produce a summary that retains the
original meaning [5]. Abstractive summarization requires reading and
comprehending the source text; it employs linguistic components and grammatical
rules of the language. An abstractive system can produce new sentences, which
improves the summary's quality.
Manual summarization of a large text document is a difficult task and requires
considerable time. An automatic summarizer tool is therefore very much required to
provide a quick view in the form of a concise summary; it is a need of the current
era of information overload. The automatic summarizer converts a large document
into a shorter version while maintaining its overall content and meaning.
A Survey on Automated Text Summarization System … 923
Extractive text summarization chooses the most important sentences from the
original source. The relevant sentences are extracted by combining statistical and
language-dependent characteristics of sentences. Extractive summaries are chosen
in most instances since they are simple to implement. The problem with extractive
systems is that the summaries are long and may contain unnecessary information,
because the crucial information is dispersed across the document or in several text
sections [6].
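A minimal word-frequency extractive summarizer illustrates the idea; scoring sentences by the average corpus frequency of their words is one simple choice among the statistical features discussed, and the function is a sketch, not a production system:

```python
import re
from collections import Counter

def extractive_summary(text, n_sentences=1):
    # split into sentences, then score each sentence by the average
    # corpus frequency of its words (a simple statistical feature)
    sentences = [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]
    freq = Counter(re.findall(r'[a-z]+', text.lower()))

    def score(sentence):
        tokens = re.findall(r'[a-z]+', sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    top = set(sorted(sentences, key=score, reverse=True)[:n_sentences])
    # emit selected sentences in original order to keep the summary readable
    return ' '.join(s for s in sentences if s in top)
```

Sentences built from the document's most frequent words are assumed to carry its theme, which is exactly the weakness noted above: the selected sentences may still include dispersed, partly redundant material.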
2 Literature Survey
To study automatic text summarization systems, a survey of past literature was
carried out to gain specific knowledge and identify the scope of applications. Table 1
gives a brief history of the past literature.
Different forms of summaries are required for different applications, and
summarizer systems can be classified by the type of summary required. There are
two types of summarizer systems: extractive and abstractive. The table below
summarizes the key concepts of extractive and abstractive summarization in brief
(Table 2).
In addition to extractive and abstractive, there are various other types of summaries
that exist. Different summarization methods are used based on the type of summary
Table 1  Automatic text summarization systems for Indian languages

Sivaganesan et al. [7] (social networks)
  Technique: interest-based parallel influence analysis algorithm; semantic structure; partitioned graph with PageRank
  Dataset and features: interactive behavior of the user, weighted dynamically
  Lacuna: parallelism in large networks is a difficult task
  Outcome: the social influence analysis algorithm identifies influential users, implemented on machines with CPU architecture and community structure

Valanarasu et al. [8] (summarization of social media data for personality prediction using machine learning and AI)
  Technique: personality prediction of job applicants; Naive Bayes and SVM probability prediction models
  Dataset and features: job applicant data collected from different social media sites
  Lacuna: the proposed model cannot be used for applicants who are not social media users
  Outcome: digital footprints used to predict people through their communication, sentiments, emotions, and expectations

Sinha et al. [9] (extractive multi-document summarization, Malayalam)
  Technique: sentence encoding; TextRank; MMR; sentence scoring algorithm
  Dataset and features: 100 document sets, each with three news articles; TF-IDF, Word2Vec, Smooth Inverse Frequency (SIF)
  Lacuna: lack of standard Malayalam NLP tools; difficulty in multi-document summarization
  Outcome: ROUGE-1 and ROUGE-2 based evaluation of 0.59, 0.56, and 0.57 for precision, recall, and F-score, respectively

Malagi et al. [10] (survey on automatic text summarization)
  Technique: extraction and abstraction; LSA, HMM, SVM, and decision tree models; clustering algorithms
  Dataset and features: own datasets; sentence scoring features
  Lacuna: lack of multi-document summarizers due to limited tools and sources
  Outcome: an effort to bridge the research gap in the development of text summarizers

Manju et al. [11] (abstractive text summarization for Sanskrit prose)
  Technique: abstractive machine learning approaches; graph-based method
  Dataset and features: structural and semantic features, word significance, compounds and sandhis, verb usage diversity
  Lacuna: predefined structures may not result in a coherent or usable summary
  Outcome: the semantic approach yields a better analyzed summary

Verma et al. [12] (extractive and abstractive summarization methods and NLP tools for Indian languages)
  Technique: stop-word list, stemmer, NER, sentiment analyzer, WordNet, word vectors, segmentation rules, corpus
  Dataset and features: 100 news articles; NLP tools such as stemmer, PoS tagger, parser, and named entity recognition system
  Lacuna: unavailability of resources for language understanding and generation
  Outcome: NLP tools are essential for summarizing text accurately

Mamidala et al. [13] (automatic text summarization techniques, text mining, OCR, k-nearest neighbors)
  Technique: extraction and abstraction; TextRank algorithm; TF and IDF; ANN; fuzzy methods
  Dataset and features: own datasets; sentence length, title similarity, semantic similarity between sentences
  Lacuna: extractive summaries are not always convenient, and abstractive summaries sometimes fail to represent the content
  Outcome: a combination of preprocessing and processing techniques could give a good model

Rathod [16] (news article summarization, Marathi)
  Technique: extractive; TextRank for sentence extraction; graph-based ranking model
  Dataset and features: own collection of news articles; similarity-based features
  Lacuna: language specific and domain dependent
  Outcome: File 1 score 0.84, File 2 score 0.56; average ROUGE-2 score 0.70

Mohamed et al. [17] (document clustering, Tamil)
  Technique: LSA; multi-document summarization; similarity-based clustering
  Dataset and features: own database; word weight, sentence features, length, position, centrality, proper nouns
  Lacuna: language specific
  Outcome: clustering of Tamil text; gives good results for generating cohesive summaries

Dalwadi et al. [18] (text summarization using fuzzy logic and LSA)
  Technique: extractive and abstractive summarization techniques
  Dataset and features: various own-designed document datasets
  Lacuna: current systems are not efficient enough to produce good summaries
  Outcome: the study concludes that most researchers used rule-based approaches

Kanitha et al. [19] (comparison of extractive text summarization models)
  Technique: word and phrase frequency algorithm; machine learning; HMM; cluster-based algorithm
  Dataset and features: own datasets; sentence ranking methods
  Lacuna: comparing a manual summary with a generic machine summary is not appropriate
  Outcome: domain-independent summaries; LSA-based systems summarize large datasets within limited time

Gaikwad et al. [20] (text summarization overview for Indian languages)
  Technique: abstractive as well as extractive approaches
  Dataset and features: researchers' own datasets, news articles, story documents; linguistic and statistical features
  Lacuna: abstraction requires more learning and reasoning
  Outcome: the study covers text summarization and its importance

Sarda et al. [21] (text summarization using neural networks and rhetorical structure theory)
  Technique: neural network model; backpropagation; rhetorical structure theory
  Dataset and features: own document collections; sentence ranking; clustering
  Lacuna: difficulties in neural network training
  Outcome: numerical data features and rhetorical structure theory help select highly ranked summary sentences

Gulati et al. [22] (study of text summarization techniques)
  Technique: machine learning techniques, text mining algorithms, and semantic technologies
  Dataset and features: most researchers used their own text corpora as the database
  Lacuna: separating important content from text is difficult
  Outcome: the two main techniques, extraction and abstraction, are studied for text summarization

Ragunath et al. [23] (ontology-based document summarization)
  Technique: concept extraction algorithm; ontology model
  Dataset and features: own database collection; domain-specific features
  Lacuna: genre specific
  Outcome: accuracy calculated at 87.03%

Deshmukh et al. [24] (query-dependent multi-document summarization)
  Technique: multi-document summarization using feature-based and cluster-based methods
  Dataset and features: news document dataset; K-means clustering
  Lacuna: limitations of feature-based methods
  Outcome: the study details query-dependent multi-document summarization

Babar et al. [26] (text summarization using fuzzy logic and LSA)
  Technique: extractive summarization; feature vector algorithm; fuzzy inference model
  Dataset and features: own database; direct word matching and sentence feature scores
  Lacuna: the focus of the paper is narrow
  Outcome: precision of the fuzzy-based summary is 86.91%, average recall 41.64%, average F-measure 64.66%

Gupta [27] (survey of summarizers for Indian languages, Punjabi)
  Technique: weight learning algorithm and regression
  Dataset and features: topic identification; statistical and language-dependent features
  Lacuna: lack of text summarization techniques
  Outcome: research on summarization is at an initial state for Indian languages

Deshpande et al. [28] (text summarization using clustering)
  Technique: extractive multi-document summarization; document and sentence clustering by K-means
  Dataset and features: own collection; sentence scoring; document clustering by cosine similarity
  Lacuna: lack of simplification techniques for large and complex sentences
  Outcome: results compared using precision, recall, and F-measure; clustering reduces redundancy

Dhanya et al. [29] (comparison of text summarization techniques for eight different languages)
  Technique: extractive; TF-IDF; sentence scoring; graph-based sentence weights
  Dataset and features: own collection of documents; sentence similarity weights; sentence scores
  Lacuna: the same set of English sentences is used for comparing all methods
  Outcome: feature selection is important in summary generation

Dixit et al. [30] (automatic text summarization using fuzzy logic)
  Technique: feature-based extraction of important sentences using fuzzy logic; sentence scoring; fuzzy inference rules
  Dataset and features: 30 documents from news-based URLs; compared with the Copernic and MS Word 2007 summarizers
  Lacuna: the system is tested on only 30 news documents
  Outcome: 81% resemblance with human summaries; 79% resemblance in sentence position similarity

Prasad et al. [31] (feature-based text summarization)
  Technique: extraction; sentence scoring; fuzzy algorithm; feature decision module
  Dataset and features: own collected dataset; combines nine features to obtain a feature score for each sentence
  Lacuna: limited dataset and documents
  Outcome: modules with 9 and 5 features achieve better precision, recall, and F-measure than MS Word

Jayashree et al. [32] (text summarization using sentence ranking, Kannada)
  Technique: extractive, keyword-based summary; GSS coefficients, IDF and TF for keyword extraction
  Dataset and features: database obtained from Kannada Webdunia news articles
  Lacuna: requires human summaries from experts
  Outcome: machine summaries compared with human summaries average 0.14%, 0.11%, and 0.12% for sports, entertainment, and religious articles, respectively

Siva Kumar et al. [33] (query-based summarizer)
  Technique: multi-document, topic-driven summarizer
  Dataset and features: newswire articles from the AQUAINT-2 IR text collection
  Lacuna: need for simplification techniques
  Outcome: summaries can be evaluated using N-grams
and applications. The table below shows the classification of summarization systems
by their categories (Table 3).
Text summarizers identify and extract key sentences from the source and group
them properly to generate a concise summary. A list of features must be selected for
analysis and for better understanding of the theme. Some of the features used to
select the important, meaning-bearing content from the text are given in the table
below (Table 4).
Table 3 (continued)

  Content    Type              Scope of the summary
  Input      Single document   Summarization of a single document
             Multi-document    Several documents are summarized at a time
  Language   Mono-lingual      Input documents in one specific language; the output is in the same language
             Multi-lingual     Accepts input documents in different languages and generates summaries in different languages
Different types of methods are implemented in summarizer systems, which identify
and extract the important sentences from the source text and group them to generate
the final summary. Tables 6 and 7 list the important methods of extractive and
abstractive summarization, respectively.
3 Dataset
From the early days of summarization, most of the work has been done in English.
A number of standard English datasets are available for research, such as DUC
(Document Understanding Conference), CL-SciSumm (Computational Linguistics
Scientific Document Summarization), TAC (Text Analysis Conference), and
TIPSTER SUMMAC (text summarization evaluation conference). These datasets
are used to test systems and perform experiments. For Indian languages, however,
no proper datasets are available to researchers. Most data are collected from
newspapers and medical documents, or researchers build their own datasets in the
respective languages, designing the corpus on the basis of the specifications
outlined for the system.
4 Proposed Methodology
Through the literature review, it is observed that various methodologies are
followed in the development of text summarization systems. Figure 1 shows a
general architectural view of text summarization. Most systems follow some
common steps to produce the target summary after selecting the textual content.
Table 7 (continued)

  Multimodal semantic model: a semantic model that captures concepts and their relationships. Features: semantic features, useful for generating an abstract summary. Advantage: represents the contents of multimodal documents.
  Semantic graph-based method: summarizes a document by creating a rich semantic graph (RSG). Features: semantic features. Advantage: used for single-document summarization.
  Query-based: generates a summary of the text based on the query. Features: sentence scoring based on frequency counts. Advantage: useful for generating a precise summary.
Fig. 1  General architecture of text summarization: preprocessing (special character removal, sentence tokenization, word tokenization, POS tagging), processing (feature extraction, similarity computation and sentence ranking), and generation of the extract or abstract summary
(b) Preprocessing: basic operations are performed to normalize the text; this
step is important for selecting text in a particular script.
(c) Processing: the text is processed to select and extract the main important
sentences using the features.
(d) Theme identification: the most important information in the text is
identified using techniques such as word position, sentence length, and cue
phrases.
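Steps (b) and (c) above can be sketched with plain regular expressions, assuming Latin-script input for simplicity (Indian-script text would need script-aware tokenization, and POS tagging is omitted):

```python
import re

def preprocess(text):
    # (b) normalization: strip special characters, collapse whitespace
    cleaned = re.sub(r'[^0-9A-Za-z.!?\s]', ' ', text)
    cleaned = re.sub(r'\s+', ' ', cleaned).strip()
    # sentence tokenization on end-of-sentence punctuation
    sentences = [s for s in re.split(r'(?<=[.!?])\s+', cleaned) if s]
    # (c) word tokenization per sentence, ready for feature extraction
    words = [re.findall(r'[A-Za-z0-9]+', s) for s in sentences]
    return sentences, words
```

The sentence and word lists returned here are what the processing stage scores and ranks before the final extract or abstract is generated.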
For our literature study, we used research papers and trends from the last 10 years.
The study shows that extraction methods are mostly used for summarization:
extraction is easier than abstraction, the results of extractive systems are good, and
extractive systems currently have good scope in industry, while abstractive systems
are difficult to implement. The most useful features for generating a summary are
word frequency, word length, sentence scoring, sentence position, keywords and
phrases, semantics, and linguistic or structural features. Abstractive summaries are
sometimes not clear enough to express the meaning, and their development remains
a challenging task.
Based on the research study of past literature from 2008 to 2021, the following
observations are made. There are various challenges in the development of
automated text summarization that present technologies have yet to resolve.
• It is a difficult task to summarize the original content by selecting the significant
content from the rest of the text.
• No standard metric is available for evaluation of the summary.
6 Conclusion
Acknowledgements The authors would like to acknowledge and thank the CSRI DST Major
Project (sanction No. SR/CSRI/71/2015(G)) and the Computational and Psycholinguistics
Research Lab Facility supporting this work, as well as the Department of Computer Science and
Information Technology, Dr. Babasaheb Ambedkar Marathwada University, Aurangabad,
Maharashtra, India. Thanks are also due to the SARATHI organization for providing financial
assistance through a Ph.D. research fellowship, and to my research guide Dr. C. Namrata
Mahender (Asst. Professor), Department of Computer Science and IT, Dr. B.A.M.U.,
Aurangabad, for providing research facilities and constant technical and moral support.
References
1. Sindhu CS (2014) A survey on automatic text summarization. Int J Comput Sci Inf Technol
5(6)
2. Reeve LH, Han H, Nagori SV, Yang JC, Schwimmer TA, Brooks AD (2006) Concept
frequency distribution in biomedical text summarization. In: ACM 15th conference on
information and knowledge management (CIKM), Arlington, VA, USA
3. Mashape.com/list-of-30-summarizer-apis-libraries-and-software
4. Atif K, Naomie S (2014) A review on abstractive summarization methods. J Theor Appl Inf
Technol 59(1)
5. Manne S, Mohd ZPS, Fatima SS (2012) Extraction based automatic text summarization system
with HMM tagger. In: Proceedings of the international conference on information systems
design and intelligent applications, vol 132, pp 421–428
6. Sarwadnya VV, Sonawane SS (2018) Marathi extractive text summarization using graph
based model. In: Fourth international conference on computing communication control and
automation. (ICCUBEA). 978-1-5386-5257-2-/18/$31.00 IEEE
7. Sivaganesan D (2021) Novel influence maximization algorithm for social network behavior
management. J ISMAC 03(1):60–68. http://irojournals.com/iroismac/. https://doi.org/10.
36548/jismac.2021.1.006
8. Valanarasu R (2021) Comparative analysis for personality prediction by digital footprints in
social media. J Inf Technol Digital World 03(02):77–91. https://www.irojournals.com/itdw/.
https://doi.org/10.36548/jitdw.2021.2.002
9. Sinha S, Jha GN (2020) Abstractive text summarization for Sanskrit prose: a study of
methods and approaches. In: Proceedings of the WILDRE5–5th workshop on Indian language
data: resources and evaluation, language resources and evaluation conference (LREC 2020),
Marseille, 11–16 May 2020 European Language Resources Association (ELRA), licensed
under CC-BY-NC, pp 60–65
10. Malagi SS, Rachana, Ashoka DV (2020) An overview of automatic text summarization tech-
niques. In: International journal of engineering research and technology (IJERT), Published
by, www.ijert.org NCAIT—2020 Conference proceedings, vol 8(15)
11. Manju K, David Peter S, Idicula SM (2021) A framework for generating extractive
summary from multiple Malayalam documents. Information 12:41. https://doi.org/10.3390/
info12010041 https://www.mdpi.com/journal/information
12. Verma P, Verma A (2020) Accountability of NLP tools in text summarization for Indian
languages. J Sci Res 64(1)
13. Mamidala KK, Sanampudi SK (2021) Text summarization for Indian languages: a survey. Int J
Adv Res Eng Technol (IJARET), 12(1):530–538. Article ID: IJARET_12_01_049, ISSN Print:
0976-6480 and ISSN Online: 0976-6499
14. Sarker A (2020) A light-weight text summarization system for fast access to medical evidence.
https://www.frontiersin.org/journals/digital-health
15. Bhosale S, Joshi D, Bhise V, Deshmukh RA (2018) Marathi e-newspaper text summarization
using automatic keyword extraction. Int J Adv Eng Res Dev 5(03)
16. Rathod YV (2018) Extractive text summarization of Marathi news articles. Int Res J Eng
Technol (IRJET) 05(07), e-ISSN: 2395-0056
17. Mohamed SS, Hariharan S (2018) Experiments on document clustering in Tamil language.
ARPN J Eng Appl Sci 13(10), ISSN 1819-6608
18. Dalwadi B, Patel N, Suthar S (2017) A review paper on text summarization for Indian languages.
IJSRD Int J Sci Res Dev 5(07), ISSN (online): 2321
19. Kanitha DK, Muhammad Noorul Mubarak D (2016) An overview of extractive based automatic
text summarization systems. AIRCC’s Int J Comput Sci Inf Technol 8(5). http://www.i-scholar.
in/index.php/IJCSIT/issue/view/12602
20. Gaikwad DK, Mahender CN (2016) A review paper on text summarization. Int J Adv Res
Comput Commun Eng 5(3)
21. Sarda AT, Kulkarni AR (2015) Text summarization using neural networks and rhetorical
structure theory. Int J Adv Res Comput Commun Eng 4(6)
22. Gulati AN, Sarkar SD (2015) A pandect of different text summarization techniques. Int J Adv
Res Comput Sci Softw Eng 5(4), Apr 2015, ISSN: 2277 128X
23. Ragunath SR, Sivaranjani N (2015) Ontology based text document summarization system using
concept terms. ARPN J Eng Appl Sci 10(6), ISSN 1819-660
24. Deshmukh YS, Nikam RR, Chintamani RD, Kolhe ST, Jore SS (2014) Query dependent multi-
document summarization using feature based and cluster based method 2(10), ISSN (Online):
2347-2820
25. Patil PD, Kulkarni NJ (2014) Text summarization using fuzzy logic. Int J Innovative Res Adv
Eng (IJIRAE) 1(3), ISSN: 2278-2311 IJIRAE | http://ijirae.com © 2014, IJIRAE
26. Babar SA, Thorat SA (2014) Improving text summarization using fuzzy logic and latent
semantic analysis. Int J Innovative Res Adv Eng (IJIRAE) 1(4) (May 2014) http://ijirae.com,
ISSN: 2349-2163
27. Gupta V (2013) A survey of text summarizer for Indian languages and comparison of their
performance. J Emerg Technol Web, ojs.academypublisher.com
28. Deshpande AR, Lobo LMRJ (2013) Text summarization using clustering technique. Int J Eng
Trends Technol (IJETT) 4(8)
29. Dhanya PM, Jethavedan M (2013) Comparative study of text summarization in Indian
languages. Int J Comput Appl (0975-8887) 75(6)
30. Dixit RS, Apte SS (2012)Improvement of text summarization using fuzzy logic based method.
IOSR J Comput Eng (IOSRJCE) 5(6):05–10 (Sep-Oct 2012). www.iosrjournals.org, ISSN:
2278-0661, ISBN: 2278-8727
31. Prasad RS, Uplavikar Nitish M, Sanket W (2012) Feature based text summarization. Int J Adv
Comput Inf Res Pune. https://www.researchgate.net/publication/328176042
32. Jayashree R, Srikanta KM, Sunny K (2011) Document summarization for Kannada, soft
computing and pattern, 2011-ieeexplore.ieee.org
33. Siva Kumar AP, Premchand P, Govardhan A (2011) Query-based summarizer based on simi-
larity of sentences and word frequency. Int J Data Min Knowl Manage Process (IJDKP)
1(3)
34. Das A (2010) Opinion summarization in Bengali: a theme network model. Soc Comput (Social
Com), -ieeexplore.ieee.org
35. Agrawal SS (2008) Developing of resources and techniques for processing of some Indian
languages
36. Gupta V, Lehal GS (2010) A survey of text summarization extractive techniques. Int J Emerg
Technol Web Intell 2:258–268
A Dynamic Packet Scheduling Algorithm
Based on Active Flows for Enhancing
the Performance of Internet Traffic
Abstract Since the Internet is a large-scale network, its packet scheduling scheme must be highly scalable. This work develops a new packet scheduling mechanism to enhance the performance of today's Internet communication. Per-flow control techniques face a scalability challenge because of the vast number of flows in a large network. The proposed method, G-DQS, is based on aggregated-flow scheduling and can be used to manage huge networks. Packets are divided into two categories in this study: short TCP flows and long TCP flows. A scheduling ratio is determined based on the edge-to-edge bandwidth and the maximum number of flows that can be accepted on the path. This ratio varies dynamically and minimizes packet drop for short flows while long flows are not starved. This is required for today's Internet communication, as recent Internet traffic shows a predominance of short flows. The simulation results show that the suggested technique outperforms other algorithms that use a constant packet scheduling ratio, such as RuN2C and DRR-SFF.
1 Introduction
The Internet has been transformed into the world’s greatest public network as a
result of the Web. The Web has acted as a platform for delivering innovative appli-
cations in the domains of education, business, entertainment, and medicine in recent
years. Banking and multimedia teleconferencing are just two examples of business
applications.
The practice of storing multimedia data on servers and allowing users to access
it via the Internet has become increasingly common. Other applications include
distance education provided by colleges via video servers and interactive games that
are revolutionizing the entertainment sector.

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet of Things, Lecture Notes on Data Engineering and Communications Technologies 101, https://doi.org/10.1007/978-981-16-7610-9_68
Y. Suresh et al.

The quality of Internet communication will have a big impact on the utility value of these applications. The network should be able to handle large volumes of this type of traffic in a scalable manner. The network must be able to meet quality-of-service (QoS) standards to enable this service.
2 Literature Review
Today's Internet is still based on the best-effort service model [1]. In this model, all data packets are treated equally, and the network tries to ensure reliable delivery. The design of such a model is very simple and easily scalable, but it does not guarantee end-to-end delivery of traffic flows. During network congestion, FIFO drops packets regardless of their priority, and the transmission control protocol (TCP) handles retransmission of the dropped packets.
The work in [2, 3] shows that Internet traffic characteristics have a heavy-tailed property. Several academics have attempted to increase system performance by exploiting the considerable variability of Internet traffic; this research includes high-speed switching [4], dynamic routing [5], and scheduling [6]. The short-flow and long-flow concept has also been applied to data centers [7]. Determining whether Internet traffic fits heavy-tailed distributions or not is difficult [8].
According to studies of Internet traffic, the majority of flows are short, with less than 5% of flows (the long ones) carrying more than 50% of total bytes. The average size of a short flow is only 10–20 packets. In [9], short flows and long flows are referred to as mice and elephants. Long flows have been observed specifically in P2P data transfers [10] and in Web servers [11].
Short flows are primarily attributable to online data transfers initiated by interactivity [12]. The author of [13] showed that preferential treatment of short flows reduces Web latency. Most long flows originate from P2P applications. The MMPTCP transport protocol [14] benefits short flows by randomly scattering packets; the protocol then switches to multi-path TCP (MPTCP), which is a very efficient mode for long flows. FDCTCP [13] dramatically improves performance and decreases flow completion times, particularly for small and medium-sized flows, when compared to DCTCP.
As discussed above, from recent Internet traffic measurements, it is necessary to classify Internet flows as short and long to achieve service guarantees. The proposed scheduling algorithm applies flow-classified service to improve the performance of the Internet.
In today’s Internet routers, a basic scheduling technique known as first in first
out (FIFO) or first come first served (FCFS) is extensively utilized. FIFO only
provides best-effort service. Because flows are not categorized, it is not suited for
delivering guaranteed service. Any special requirements of a flow, such as latency
and throughput, are not taken into account by the scheduling technique. A high-
throughput data transfer in FIFO, for example, can starve a low-throughput real-time
link like voice traffic.
A Dynamic Packet Scheduling Algorithm Based on Active Flows … 945
Existing algorithms such as weighted fair queuing (WFQ), deficit round robin (DRR), deficit round robin-short flow first (DRR-SFF), and least attained service (LAS) execute per-flow scheduling, which involves a complicated mechanism for flow identification as well as flow-state maintenance. With the significant expansion in Internet communication, it is impracticable to keep all of the flow state in routers. A major advantage of the proposed approach is that routers need not maintain flow state.
In the RuN2C scheduling technique [15], packets with a low running number (class-1) are placed in one queue, whereas packets with a high running number (class-2) are placed in another. Class-2 packets get a chance only after the class-1 packets are completely scheduled, which creates starvation for the class-2 packets.
LAS [16] is used in packet networks to interact effectively with TCP to support short flows. This is accomplished by placing the first packet of a newly arriving flow, which has received the least amount of service, at the head of the queue. LAS prioritizes this packet and reduces the round trip time (RTT) of a slow-starting flow. Short-flow transfer times are reduced as a result.
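The LAS idea described above can be sketched as follows (a simplified, hypothetical illustration in our own naming, not the algorithm of [16]): flows are served one quantum at a time in order of least attained service, so a flow that has received no service so far is always at the head of the queue.

```python
import heapq

def las_schedule(arrivals, quantum=1):
    """Serve one quantum at a time to the flow with the least
    attained service. `arrivals` maps flow id -> flow size in
    packets; returns the order in which packets are served."""
    # Heap entries: (attained_service, flow_id); ties break on id.
    heap = [(0, fid) for fid in sorted(arrivals)]
    heapq.heapify(heap)
    remaining = dict(arrivals)
    order = []
    while heap:
        served, fid = heapq.heappop(heap)
        order.append(fid)
        remaining[fid] -= quantum
        if remaining[fid] > 0:
            heapq.heappush(heap, (served + quantum, fid))
    return order

# A short flow (2 packets) arriving alongside a long flow (4 packets)
# finishes early because it always has the least attained service.
print(las_schedule({"short": 2, "long": 4}))
# ['long', 'short', 'long', 'short', 'long', 'long']
```

The short flow completes after four service slots, while the long flow must wait until the end, which mirrors the reduced short-flow transfer times noted above.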
The author of [17] compared the performance of round-robin-based scheduling algorithms. Various network-performance attributes of WRR/SB were compared with those of WRR and PWRR, and WRR/SB outperformed the other algorithms.
The author [18] proposed that the network time delay might be reduced by using
the neural network approach. This is accomplished by collecting weights that influ-
ence network speed, such as the number of nodes in the path and congestion on each
path. The study demonstrates that an efficient technique that can assist in determining
the shortest path can be used to improve existing methods that use weights.
3 Calculation of Fmax
The maximum number of active flows that can be admitted on the edge-to-edge network path is represented by Fmax. In [19], a method is described to determine the flow completion time (FCT). For a packet flow of size S_f, the FCT is determined using Eq. (1):

FCT = 1.5 × RTT + log2(S_f / MSS) × RTT    (1)

where MSS is the maximum segment size and RTT is the round trip time.
Relating S_f and FCT, the throughput T_f and Fmax are determined based on network bandwidth using Eqs. (2) and (3):

T_f = (S_f × (MSS + H)) / (FCT × MSS)    (2)
Fmax = (BW_Pr ⊗ F_Ai(d_k)) / T_f    (3)
Fmax is related to the scheduling ratio in the proposed algorithm, which is detailed in Sect. 4, for effective scheduling of packets.
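Equations (1) and (2) can be sketched directly in code. This is our own illustration under stated assumptions (function names and the example parameter values are ours; Eq. (3) is omitted because its bandwidth terms are not fully defined in the text):

```python
import math

def flow_completion_time(flow_size, rtt, mss):
    """Eq. (1): FCT = 1.5 * RTT + log2(S_f / MSS) * RTT."""
    return 1.5 * rtt + math.log2(flow_size / mss) * rtt

def throughput(flow_size, rtt, mss, header):
    """Eq. (2): T_f = S_f * (MSS + H) / (FCT * MSS)."""
    fct = flow_completion_time(flow_size, rtt, mss)
    return flow_size * (mss + header) / (fct * mss)

# Example: a 16-segment flow, 100 ms RTT, MSS 1460 B, 40 B header.
sf = 16 * 1460  # flow size in bytes
fct = flow_completion_time(sf, 0.1, 1460)
print(round(fct, 3))  # 1.5*0.1 + log2(16)*0.1 = 0.55 s
print(round(throughput(sf, 0.1, 1460, 40), 1))
```

Note how, per Eq. (1), doubling the flow size adds only one more RTT to the completion time, which is why short flows are so sensitive to queueing delay.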
4 Proposed Algorithm
The proposed algorithm captures short flows using threshold th and considers the
remaining flows as long flows. They are inserted in two queues: SFQ and LFQ. The
total number of flows in SFQ is used to initialize the variable counter DC(r). Using
the number of flows in SFQ and LFQ, the algorithm derives the dynamic scheduling
ratio Q(r). It also determines the maximum flows that will be available on the path
using Fmax, which is connected to Q(r). To schedule the flows from SFQ and LFQ,
the conditions Fmax > Q(r) and Fmax < Q(r) are tested.
Algorithm

Begin
Using a threshold th, divide flows into short and long flows, and place them in two queues, SFQ and LFQ, respectively.

S:
• Total flows in SFQ = Σ_{i=1..n} B_SFQ(i)
• Total flows in LFQ = Σ_{i=1..n} B_LFQ(i)
• Initialize the variable counter DC = Σ_{i=1..n} B_SFQ(i)

D:
• Scheduling ratio Q(r) for any round r: Q(r) = (Σ_{i=1..n} B_SFQ(i) + Σ_{i=1..n} B_LFQ(i)) / Σ_{i=1..n} B_LFQ(i)
• Estimate Fmax using Eq. (3).

If Fmax > Q(r):
• Flows served in SFQ = Q(r)
• Flows served in LFQ = 1
• Perform DC(r) = DC(r) − Q(r)
• If DC(r) > Q(r), return to D:; else return to S: to recompute Q(r) and Fmax for the next round.
• When Σ_{i=1..n} B_SFQ(i) = 0, flows served in LFQ = Q(r)

If Fmax < Q(r):
When Fmax is greater than the number of flows to be scheduled, the algorithm schedules both short and long flows from SFQ and LFQ. When Fmax is limited, it prioritizes only short flows. The proposed algorithm works in accordance with the characteristics of the Internet, as Internet traffic exhibits short flows in vast numbers. This method provides a more reliable service than the best-effort method used in the Internet.
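One round of the scheduling logic above can be sketched as follows. This is a simplified, hypothetical single-round illustration in our own naming; the paper's Fmax estimation via Eq. (3) is replaced by a plain `f_max` parameter:

```python
from collections import deque

def gdqs_round(sfq, lfq, f_max):
    """One round of the G-DQS idea: derive the dynamic scheduling
    ratio Q(r) from the current queue occupancies and serve Q(r)
    short flows per long flow, but only when f_max permits it."""
    served = []
    if not lfq:
        # No long flows pending: drain short flows directly.
        while sfq:
            served.append(sfq.popleft())
        return served
    q_r = (len(sfq) + len(lfq)) / len(lfq)  # scheduling ratio Q(r)
    if f_max > q_r:
        # Serve Q(r) short flows, then one long flow.
        for _ in range(min(int(q_r), len(sfq))):
            served.append(sfq.popleft())
        served.append(lfq.popleft())
    else:
        # Fmax is limited: prioritize short flows only.
        if sfq:
            served.append(sfq.popleft())
    return served

sfq = deque(["s1", "s2", "s3", "s4"])
lfq = deque(["l1", "l2"])
print(gdqs_round(sfq, lfq, f_max=10))  # Q(r)=3 -> three short, one long
```

Because Q(r) is recomputed from the live queue occupancies each round, the short-flow preference shrinks automatically as SFQ drains, which is how long flows avoid starvation.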
5 Performance Analysis
In Fig. 1, R0 and R1 are the edge routers. S1–S5 are source nodes, and C1–C5 are sink nodes, which transmit and receive the packets. In our simulation, the packet size is set to 500 bytes; short flows are taken as flows of 1–20 packets, and long flows as flows of up to 500 packets. Transmission time, packet drop, and throughput parameters are analyzed here.
The transmission time of short flows is depicted in Fig. 2. In comparison with FIFO, the proposed algorithm G-DQS and the other algorithm, RuN2C, greatly lower the transmission time of short flows. FIFO, a single-queue discipline, does not make any distinction between the two flow classes, which greatly increases the transmission time.
In FIFO, long flows can obtain priority over short flows, increasing transmission time as shown in Fig. 2. The proposed method gives short flows preference over long flows and reduces the mean transmission time of short flows by 30.7% in comparison with FIFO.
Fig. 2 Transmission time (sec) versus flow size (packets)
Figures 3 and 4 indicate the packet drop for various flow sizes. They demonstrate that short flows of less than 25 packets do not incur packet loss when using the proposed technique, although FIFO flows of the same size do. Packet loss for short flows is lower in the proposed approach than in RuN2C due to the dynamic scheduling ratio. The proposed algorithm schedules packets from both SFQ and LFQ, whereas the RuN2C technique schedules long flows only if short flows are completely served. As a result, in addition to short flows, long flows are serviced in G-DQS.

Fig. 3 Number of packets dropped versus flow size
Fig. 4 Packet drop versus flow size (zoomed version of Fig. 3)
Fig. 5 Throughput (packets/sec) versus simulation time
5.3 Throughput
Figure 5 depicts flow throughput as the number of packets received per second. During the simulation time, it has been observed that the FIFO throughput drops abruptly; as a result, long flows are penalized and starved in FIFO. It also shows that the FIFO and RuN2C throughputs are not constant across the simulated duration, whereas the proposed G-DQS maintains and guarantees a nearly constant throughput.
6 Conclusion
The proposed algorithm reduces the transmission time of all flows in comparison with other protocols. The transmission time of short flows has been analyzed, and no packet loss was found up to the threshold th. The proposed algorithm reduces packet loss, as it schedules the flows based on the Fmax of the path. Since it reduces packet loss, the number of retransmissions decreases, which in turn reduces the mean transmission time. The throughput analysis shows that G-DQS maintains an almost constant throughput and performs better than the other protocols.
References
Abstract Automated short answer grading (ASAG) of free-text responses is a field of study wherein a student's answer is evaluated considering the baseline concepts required by the question. It mainly concentrates on evaluating the content written by the student rather than its grammatical form. In the educational domain, assessment is indeed a tedious and time-consuming task. If the time utilized for this task is reduced, the instructor can focus more on teaching and learning activities and can help students in their overall growth. Many researchers are working in this field to provide a solution that can assign scores to student responses that are close to the scores assigned by a human tutor. The goal of this paper is to provide insight into the ASAG domain by presenting a concise review of existing ASAG research. We have included research carried out using machine learning and deep learning approaches. We have also proposed our methodology to address this problem.
1 Introduction
Research in the fields of natural language processing (NLP), machine learning, and deep learning has opened doors for solving complex problems. One such complex problem is automated short answer grading (ASAG). It is the field wherein a student's short answer, comprising a few sentences or one paragraph, is evaluated and assigned a score close to the grade a human evaluator would assign.
In the education domain, along with teaching and learning, evaluation is one of the important tasks. Evaluation helps to assess students' understanding of the course being taught.

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet of Things, Lecture Notes on Data Engineering and Communications Technologies 101, https://doi.org/10.1007/978-981-16-7610-9_69
S. Patil and K. P. Adhiya

Evaluation mainly comprises multiple-choice, true-or-false, fill-in-the-blank, short-answer, and essay-type questions [1]. We are interested in assessing short answers, which comprise 2–3 sentences or a few paragraphs and are closed-ended, as they help to analyze overall understanding of the course. In evaluating short closed-ended answers, students are usually expected to concentrate on including specific concepts related to the question, which ultimately helps them get a good score. Automated assessment will also reduce the amount of time devoted to checking student answers, provide students with immediate detailed feedback that assists their overall growth, and produce unbiased scores.
Example 1: What is Stack?
Model Answer: Stack is a linear data structure which has PUSH and POP opera-
tions for inserting and deleting an element in an array, respectively.
Student Response: Stack is a data structure where elements are arranged sequentially
and which has majorly 2 operations PUSH for inserting element and POP for deleting
an element from an array.
In Example 1 shown above, the underlined words are important concepts that must be included in the student response for it to be evaluated positively. However, students may often represent the same concepts with synonyms, paraphrases, or polysemous words, so the system should be developed such that it can recognize all such surface representations and assess student responses correctly.
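The surface-variation problem just described can be made concrete with a toy concept-coverage check. This is purely illustrative (the concept list, synonym sets, and function are our own hypothetical constructions, not the paper's method):

```python
# Each required concept maps to a set of acceptable surface forms.
CONCEPTS = {
    "linear": {"linear", "sequential", "sequentially"},
    "push":   {"push"},
    "pop":    {"pop"},
    "insert": {"insert", "inserting", "insertion"},
    "delete": {"delete", "deleting", "deletion"},
}

def concept_coverage(answer):
    """Fraction of required concepts whose surface forms occur
    in the lowercased, whitespace-tokenized student answer."""
    tokens = set(answer.lower().replace(",", " ").split())
    hits = sum(1 for forms in CONCEPTS.values() if tokens & forms)
    return hits / len(CONCEPTS)

student = ("Stack is a data structure where elements are arranged "
           "sequentially and which has majorly 2 operations PUSH for "
           "inserting element and POP for deleting an element")
print(concept_coverage(student))  # all five concepts matched -> 1.0
```

Real ASAG systems replace the hand-made synonym sets with learned embeddings precisely because such lists cannot enumerate every paraphrase.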
Our main motivations for taking up this research are:
• To evaluate content rather than concentrating on grammar and style.
• Unbiased evaluation.
• To save instructors' evaluation time and utilize it for the overall progress of students.
• To provide immediate detailed feedback to students, which will help them in their future progress.
The study conducted so far has clearly shown that the ASAG problem can be solved by three state-of-the-art methodologies: rule-based approaches, machine learning, and deep learning [1, 2]. We have mainly studied, analyzed, and reported machine learning and deep learning approaches in this paper.
The remainder of this article is organized as follows: first, we review the existing ASAG systems; then, we illustrate the general approach of ASAG and our proposed methodology; finally, we present the discussion and conclusion.
Automated Evaluation of Short … 955
2 Related Study
Many researchers have addressed the problem of automated short answer grading and provided solutions by applying various feature-extraction techniques together with machine learning and deep learning approaches. In this study, we review automated grading via machine learning and deep learning.
The author of [3] proposed a method in which the feature vector for a student answer is generated using the word's part-of-speech (POS) tag acquired from the Penn treebank, along with the POS tags of the preceding and next words. The term frequency-inverse document frequency (TF-IDF) value and entropy are also included in the feature vector. Finally, using an SVM classifier, the student answers are labeled as +1 or −1. The author reports an average precision of 68% for the proposed model.
Kumar et al. [4] proposed the AutoSAS system, which incorporates various feature-vector generation techniques such as lexical diversity, word2vec, prompt, and content overlap. The authors trained on all student answers using the features described above and then used random forest for regression analysis. They employed quadratic weighted kappa to calculate the level of agreement between their proposed model and human-annotated scores, which comes to 0.79. The authors tested their proposed model on the publicly available ASAP-SAS dataset.
Galhardi et al. [5] presented an approach for the well-known ASAG datasets Beetle and SciEntsBank, which consist of electricity and electronics questions and physics, life, earth, and space science questions, respectively. They utilized a combination of distinct features such as text statistics (student-answer-to-question length ratio, word count, average word length, and average words per sentence), lexical similarity (token-based, edit-based, sequence-based, and compression-based), semantic similarity (LC, Lin, Resnik, Wu and Palmer, Jiang and Conrath, and shortest path), and bags of n-grams. Once the features were generated, they experimented with random forest and extreme gradient boosting classifiers. The system was evaluated using macro-averaged F1-score, weighted-average F1-score, and accuracy. They reported the overall accuracy between 0.049 and 0.083.
In [6], an automated assessment system is proposed for engineering assignments, which mainly comprise textual and mathematical data. The authors used the TF-IDF method to extract features from the textual data after initial preprocessing such as stop-word removal, case folding, and stemming. They then utilized the support vector machine (SVM) technique to assign scores to student answers, reporting an accuracy of 84% for textual data.
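The TF-IDF feature-extraction step that systems like [6] build on can be sketched as follows. This is our own toy implementation for illustration (not the authors' code; the SVM stage is omitted):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute TF-IDF vectors for a list of preprocessed documents
    (assumed already stop-word filtered, case-folded, stemmed)."""
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    # Document frequency: how many documents contain each term.
    df = {t: sum(1 for toks in tokenized if t in toks) for t in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # tf-idf = (term count / doc length) * log(N / df)
        vectors.append([tf[t] / len(toks) * math.log(n / df[t])
                        for t in vocab])
    return vocab, vectors

docs = ["stack linear structure", "queue linear structure", "tree structure"]
vocab, vecs = tfidf_vectors(docs)
# "stack" is distinctive for doc 0; "structure" appears everywhere,
# so its idf (and hence its weight) is zero.
print(vocab)
```

The resulting vectors would then be fed to a classifier such as an SVM to map each answer to a score.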
In the following, we present various existing deep-learning-based ASAG systems.
Zhang et al. [7] developed word embeddings from domain-general (Wikipedia) and domain-specific (student responses) information using CBOW. Student responses are then evaluated using an LSTM classifier.
Ichida et al. [8] deployed a measure to compute semantic similarity between sentences using a Siamese neural network with two symmetric recurrent networks, i.e., GRUs, chosen for their capability to handle the vanishing/exploding gradient problem. They also studied LSTM and showed how their GRU approach is superior to LSTM, as it has far fewer parameters to train. The system could be improved by utilizing sentence embeddings instead of word embeddings. They report a Pearson correlation of 0.844, which is far better than the baseline approaches they studied.
Kumar et al. [9] proposed a model comprising a cascade of three neural building blocks: a Siamese bidirectional LSTM unit applied to both the student and model answers, a pooling layer based on earth mover's distance applied over the LSTM network, and finally a regression layer comprising support vector ordinal regression for predicting scores. The evaluation of LSTM-earth mover's distance with support vector ordinal regression yielded an RMSE of 0.83, which is better than the scores generated by softmax.
Kwong et al. [10] proposed ALSS, which checks content, grammar, and delivery of speech using three Bi-LSTMs. End-to-end attention over context is learned through a MemN2N network. The system could be enhanced by utilizing a GAN.
The systems discussed so far use word-embedding techniques, which are limited in capturing context. Therefore, the approaches in [11, 12] utilized the sentence-embedding techniques skip-thought and Sentence-BERT, respectively. The author of [11] deployed a model wherein vectors for both the student and model answers are generated using skip-thought sentence embeddings. The component-wise product and absolute difference of both vectors are then computed, and a logistic linear classifier is used to predict the final score. In [12], a method is proposed that provides automated scoring of short answers by utilizing the SBERT language model. The model searches through all reference answers provided during training, determines the semantically closest answer, and provides the rating. A limitation is that false-negative scores are also generated, which must be checked manually, a very tedious job.
Hassan et al. [13] employed paragraph embeddings using two approaches: (1) summing pretrained word vectors from models such as word2vec, GloVe, ELMo, and fastText, and (2) utilizing pretrained deep learning models such as skip-thought, doc2vec, and InferSent to compute paragraph embeddings for both the student and reference answers. Once the vectors are generated, cosine similarity is used to compute the similarity between them. Yang et al. [14] utilized a deep autoencoder model with encoding and decoding layers: in the encoding layer, student answers are represented in lower dimensions and their label information is encoded using softmax regression, while in the decoding layer, the output of the encoding is reconstructed.
Riordan et al. [15] and Gong and Yao [16] utilized bidirectional LSTMs with attention mechanisms. In [15], word embeddings are initially fed to a CNN to extract relevant features, which are then given as input to an LSTM layer. The hidden states of the LSTM are aggregated by either mean-over-time pooling or an attention layer, yielding a single vector; that vector is passed through a fully connected layer to compute a scalar value or a label. In [16], student responses and reference answers are segmented into sentences, which in turn are tokenized. Each feature is fed into a bidirectional RNN network to generate sentence vectors; an attention mechanism is applied to each sentence vector, and final answer vectors are generated. At last, the answer vector is passed through a logistic regression function to predict the scores. Tan et al. [17] introduced a new approach for ASAG by utilizing a graph convolutional network (GCN). The authors deployed a three-step process: (1) graph building: an undirected heterogeneous graph with sentence-level nodes and word-bigram-level nodes is constructed, with edges between them; (2) graph representation: a two-layer GCN model encodes the graph structure; (3) grade prediction.
Table 1 gives a summary of ASAG systems studied by us.
3 Methodology
The architecture employed by most of the ASAG systems studied so far is shown (see Fig. 1). It mainly comprises four modules: preprocessing, feature engineering, model building, and evaluation.

3.1.1 Preprocessing

Though it is not a compulsory phase, some sort of preprocessing, such as stop-word removal, case folding, and stemming/lemmatization, is employed in many works to extract content-rich text for generating vectors.
Various feature-extraction techniques appear in the study carried out so far. Some of them are TF-IDF, n-grams, word embeddings [8], and sentence embeddings [11, 12].
3.1.4 Evaluation
Our major intention in carrying out this work is to recognize the level of semantic similarity between the student answer and the model answer. As per the study conducted, there are many ways through which semantic equivalence between terms can be recognized, such as TF-IDF, LSA, and embeddings. Much of the ASAG research studied utilizes word-embedding-based features for recognizing semantic similarity between terms, but the corpus on which the word embeddings are trained is usually the model answers and collected student responses, which are often limited in context. So, domain-specific and domain-general corpora can be utilized to train word embeddings. Utilizing sentence-embedding techniques can further overcome the problem of understanding context and intention across the entire text.
The proposed methodology concentrates on utilizing the word2vec skip-gram model for feature extraction and a 2-Siamese Bi-LSTM with attention mechanism for predicting scores for student answers. The work will be limited to the data structures course of an undergraduate engineering program. Instead of using pretrained word embeddings, we are going to generate domain-specific word vectors by utilizing the knowledge available for data structures. Once the domain-specific word embeddings are generated, word vectors for the concepts used in the reference answer and student answers will be extracted from those embeddings, and we will feed them to Siamese networks to predict the similarity between SA and RA.
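The similarity step of such a pipeline can be illustrated with a toy example: average the word vectors of each answer and compare them by cosine similarity. The hand-made 3-dimensional "embeddings" below are hypothetical stand-ins for domain-trained word2vec skip-gram vectors, and this averaging shortcut replaces the Siamese Bi-LSTM for the sake of a self-contained sketch:

```python
import math

# Toy 3-d "embeddings" standing in for domain-trained word2vec vectors.
EMB = {
    "stack": [0.9, 0.1, 0.0], "push": [0.7, 0.6, 0.1],
    "pop":   [0.7, 0.5, 0.2], "queue": [0.1, 0.2, 0.9],
}

def answer_vector(text):
    """Average the embeddings of the known words in an answer."""
    vecs = [EMB[w] for w in text.lower().split() if w in EMB]
    return [sum(c) / len(vecs) for c in zip(*vecs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

ra = answer_vector("stack push pop")       # reference answer concepts
sa = answer_vector("stack supports push")  # student answer
print(round(cosine(ra, sa), 3))            # close to 1.0
```

An answer about queues would score much lower against the same reference vector, which is the behavior the trained Siamese network is meant to learn at sentence level.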
For the purpose of this research, we created our own dataset by conducting two assignments in a class of undergraduate students. In assignment 1, ten general questions related to data structures were asked of more than 200 students, who were expected to answer each question in 3–4 sentences. Assignment 2 contained four application-oriented, open-ended questions on real-world situations, which students were asked to answer using more than 150 words. About 1820 answers have been collected so far, and the acquired answers have been graded by a human evaluator to check the reliability of the scores predicted by the model in the near future.
Table 2 shows sample questions asked in assignment 1 and assignment 2 and sample student answers collected.
The main objective behind carrying out this work was to study, analyze, and present the developments going on in the field of automated short answer grading related to machine learning and deep learning approaches.
We discovered that many researchers utilized only the reference answers (RA) provided by the instructor for rating student answers (SA). So there is a chance that concepts which are left out, or presented in a different way in the RA and SA, may lead to incorrect label/score assignment. Also, the generation of word vectors contributes
References
1. Galhardi LB, Brancher JD (2018) Machine learning approach for automatic short answer grad-
ing: a systematic review. In: Simari GR, Fermé E, Gutiérrez Segura F, Rodríguez Melquiades
JA (eds) IBERAMIA 2018. LNCS (LNAI), vol 11238. Springer, Cham, pp 380–391
2. Burrows S, Gurevych I, Stein B (2014) The eras and trends of automatic short answer grading.
Int J Artif Intell Educ 25(1):60–117. https://doi.org/10.1007/s40593-014-0026-8
3. Hou WJ, Tsao JH, Li SY, Chen L (2010) Automatic assessment of students’ free-text answers
with support vector machines. In: García-Pedrajas N, Herrera F, Fyfe C, Benítez JM, Ali M
(eds) Trends in applied intelligent systems. IEA/AIE 2010. Lecture Notes in Computer Science,
vol 6096. Springer, Berlin, Heidelberg
4. Kumar Y, Aggarwal S, Mahata D, Shah R, Kumaraguru P, Zimmermann R (2019) Get IT scored
using AutoSAS—an automated system for scoring short answers, AAAI
5. Galhardi LB, de Mattos Senefonte HC, de Souza RC, Brancher JD (2018) Exploring distinct
features for automatic short answer grading. In: Proceedings of the 15th national meeting on
artificial and computational intelligence. SBC, São Paulo, pp 1–12
6. Quah JT, Lim L, Budi H, Lua K (2009) Towards automated assessment of engineering assign-
ments. In: Proceedings of international joint conference on neural networks, pp 2588–2595.
https://doi.org/10.1109/IJCNN.2009.5178782
7. Zhang L, Huang Y, Yang X, Yu S, Zhuang F (2019) An automatic short-answer grading model
for semi-open-ended questions. Interact Learn Environ, pp 1–14
8. Ichida AY, Meneguzzi F, Ruiz DD (2018) Measuring semantic similarity between sentences
using a siamese neural network. In: International joint conference on neural networks (IJCNN),
pp 1–7. https://doi.org/10.1109/IJCNN.2018.8489433
9. Kumar S, Chakrabarti S, Roy S (2017) Earth mover’s distance pooling over Siamese LSTMs
for automatic short answer grading. In: Proceedings of the twenty-sixth international joint
conference on artificial intelligence, pp 2046–2052. https://doi.org/10.24963/ijcai.2017/284
10. Kwong A, Muzamal JH, Khan UG (2019) Automated language scoring system by employing
neural network approaches. In: 15th International conference on emerging technologies (ICET),
pp 1–6. https://doi.org/10.1109/ICET48972.2019.8994673
11. Gomaa WH, Fahmy AA (2019) Ans2vec: a scoring system for short answers. In: Hassanien
A, Azar A, Gaber T, Bhatnagar R, Tolba MF (eds) The international conference on advanced
machine learning technologies and applications (AMLTA2019). AMLTA 2019. Advances in
intelligent systems and computing, vol 921. Springer, Cham
12. Ndukwe IG, Amadi CE, Nkomo LM, Daniel BK (2020) Automatic grading system using
sentence-BERT network. In: Bittencourt I, Cukurova M, Muldner K, Luckin R, Millán E (eds)
Artificial intelligence in education. AIED 2020. Lecture Notes in Computer Science, vol 12164.
Springer, Cham
13. Hassan S, Fahmy AA, El-Ramly M (2018) Automatic short answer scoring based on paragraph
embeddings. Int J Adv Comput Sci Appl (IJACSA) 9(10):397-402. https://doi.org/10.14569/
IJACSA.2018.091048
14. Yang X, Huang Y, Zhuang F, Zhang L, Yu S (2018) Automatic Chinese short answer grading
with deep autoencoder. In: Penstein Rosé C et al (eds) AIED 2018, vol 10948. LNCS (LNAI).
Springer, Cham, pp 399–404
15. Riordan B, Horbach A, Cahill A, Zesch T, Lee C (2017) Investigating neural architectures
for short answer scoring. In: Proceedings of the 12th workshop on innovative use of NLP for
building educational applications, pp 159–168. https://doi.org/10.18653/v1/W17-5017
16. Gong T, Yao X (2019) An attention-based deep model for automatic short answer score. Int J
Comput Sci Softw Eng 8(6):127–132
17. Tan H, Wang C, Duan Q, Lu Y, Zhang H, Li R (2020) Automatic short answer grading by encod-
ing student responses via a graph convolutional network. In: Interactive learning environments,
pp 1–15
18. Mohler M, Bunescu R, Mihalcea R (2011) Learning to grade short answer questions using
semantic similarity measures and dependency graph alignments. In Lin D (ed) Proceedings
of the 49th annual meeting of the association for computational linguistics: human language
technologies volume 1 of HLT ’11. Association for Computational Linguistics, Portland, pp
752–762
Interactive Agricultural Chatbot Based
on Deep Learning
1 Introduction
Agriculture plays a significant role in employing people in many parts of the world
and is the main source of income for the majority of the population. India is a
country where 70% of people reside in rural areas and primarily depend on
agriculture, with 82% of farmers being marginal and small. The GDP growth of
many countries is still based on agriculture [1].
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 965
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_70
966 S. Suman and J. Kumar
Advancements in the field of farming are happening at a faster pace nowadays.
However, much information is still not accessible to farmers, as retrieving it
requires many steps and often fails to fetch responses to their queries.
A chatbot is a conversational assistant that provides easy communication to users,
as if they were conversing with a human being. Users' requests are processed and
interpreted, and appropriate responses are sent [2]. The chatbot extracts relevant
entities by identifying and interpreting the intent of a user's request, which is a
vital task for the chatbot.
Farmers face low-yield issues due to a lack of information. Many advanced
agriculture-related techniques are discussed in [3–5]. In the proposed work,
querying techniques that help farmers obtain agricultural information are designed
and implemented. The NLP [6] technique is used to take natural human language as
input. It also helps the system interpret a user query even if the sentence is
incomplete or contains grammatical mistakes.
The objectives of the project are as follows,
(i) Creating a user interface that allows people to engage successfully in order to
attain the required results in fewer steps.
(ii) Processing of the extracted data into a suitable format using machine learning
algorithms.
(iii) Responding quickly to user queries and suggesting appropriate responses.
(iv) Building a system that responds to users in real time.
The paper is organized as follows: the literature review is covered in Sect. 2;
Sect. 3 describes the system architecture; Sect. 4 explains the methodology; the
results and analysis are presented in Sect. 5; and Sect. 6 concludes the paper.
2 Literature Survey
This section discusses a literature review to highlight the work that has been done so
far in the field of chatbots.
Kannagi et al. [1] give insight into the Farmbot application, which helps farmers
solve queries related to their agricultural farmland. Farmbot uses natural language
processing (NLP) to identify keywords and respond with accurate results; NLP
interprets the natural language of humans as input. A neural network is constructed
based on the training dataset, and the gradient descent algorithm is used for error
optimization. The test dataset goes through certain preprocessing steps,
classification, and finally the construction of a neural network. The system output
is shown in text format in the user interface, and the text is translated to speech
using the Web Speech API. The 'ARIMA' prediction method is used to forecast the
future cost of agricultural products.
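The idea behind such price forecasting can be illustrated with the simplest special case of ARIMA: a first-order autoregressive AR(1) model fitted by least squares. This is a stdlib-only sketch with a hypothetical price series; a production system would use a full ARIMA implementation (e.g., statsmodels) with differencing and moving-average terms.

```python
# Simplified AR(1) forecast: x_{t+1} = mean + phi * (x_t - mean),
# where phi is the lag-1 autoregressive coefficient fitted by least squares.

def ar1_forecast(series, steps=1):
    x_prev = series[:-1]
    x_next = series[1:]
    mean = sum(series) / len(series)
    num = sum((a - mean) * (b - mean) for a, b in zip(x_prev, x_next))
    den = sum((a - mean) ** 2 for a in x_prev)
    phi = num / den  # lag-1 autoregressive coefficient
    forecasts, last = [], series[-1]
    for _ in range(steps):
        last = mean + phi * (last - mean)
        forecasts.append(last)
    return forecasts

prices = [100, 102, 101, 104, 103, 106, 105]  # hypothetical weekly crop prices
print(ar1_forecast(prices, steps=2))  # → [103.75, 103.28125]
```

Each forecast pulls the last observed value toward the series mean by the fitted coefficient, which is the core of autoregressive prediction.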
The study by Karri et al. [7] discusses a chatbot that was successful in answering
queries. It follows two steps.
switch. The send-message event is declared through the socket to the join-room
feature as soon as initialization is completed. One hundred interactions were
collected for the test domain to assess how correctly the chatbot understands user
inquiries; it was then run with questions posed in the chatroom through an agent
and examined to see whether the chatbot replies accurately or misleadingly.
Arora et al. [14] proposed a chatbot that would help farmers by providing various
solutions to their queries as well as supporting their decision-making. The bot not
only answers questions but also handles frequently asked questions and emphasizes
weather forecasting and crop disease detection. A sequence-to-sequence model, also
known as an encoder-decoder, is used to build the conversational system. It is a
multilayer RNN belonging to the generative class of models, meaning the model
automatically learns from the data and generates the response word by word. The
next step is the creation of a classification model, which can be constructed from
scratch or via transfer learning. The created model was trained for 50 epochs with
a batch size of 20. Weather prediction is included as one of the features of
Agribot: OpenWeatherMap is a service that provides climate information, including
current climatic conditions. The chatbot can thus guide farmers in detecting crop
diseases and predicting the weather.
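The word-by-word response generation described above can be illustrated with a toy decoding loop. Here the trained encoder-decoder is replaced by a hypothetical lookup table (NEXT_WORD), so only the decoding mechanics are shown, not a real seq2seq network.

```python
# Toy sketch of generative word-by-word decoding. A real Agribot-style system
# would replace NEXT_WORD with a trained RNN encoder-decoder that maps the
# encoded query plus the words generated so far to the next word.

NEXT_WORD = {  # hypothetical "decoder" states -> next word
    ("<start>",): "spray",
    ("<start>", "spray"): "neem",
    ("<start>", "spray", "neem"): "oil",
    ("<start>", "spray", "neem", "oil"): "<end>",
}

def decode(max_len=10):
    words = ["<start>"]
    while len(words) < max_len:
        nxt = NEXT_WORD.get(tuple(words), "<end>")
        if nxt == "<end>":  # stop token ends the response
            break
        words.append(nxt)
    return " ".join(words[1:])

print(decode())  # → "spray neem oil"
```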
The creation of chatbots using natural language approaches, as an initiative to
annotate and observe the interaction between humans and chatbots, is described
in [15]. The proposed system analyzes machine learning parameters that help farmers
increase their yield. The analysis is performed on the rainfall, season, weather,
and soil type of a particular area, based on historical data. The chatbot is
trained using NLP. The K-nearest neighbors (KNN) algorithm is used, which stores
the available cases and classifies new ones based on a similarity measure. The data
has been collected from different government websites and repositories. The
database is trained with machine learning using the TensorFlow architecture and the
KNN algorithm, and NLP is used in training and validating the data. Once the system
has gone through all the processes of data collection, cleaning, preprocessing,
training, and testing, it is deployed to the server for use. The system helps
farmers in remote places with limited connectivity to better understand which crop
to grow based on atmospheric conditions, and it also suggests answers to their
queries.
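The KNN-based crop suggestion described above can be sketched as follows. The training samples and feature scheme (rainfall in mm, temperature in °C) are hypothetical stand-ins for the government datasets; a real system would use a library such as scikit-learn.

```python
# Minimal k-nearest-neighbours classifier: store labeled cases, then classify a
# query by majority vote among the k closest cases (Euclidean distance).
from collections import Counter
import math

train = [  # hypothetical (rainfall_mm, temperature_C) -> crop samples
    ((1800, 26), "rice"),
    ((1700, 27), "rice"),
    ((600, 22), "wheat"),
    ((550, 20), "wheat"),
    ((900, 30), "millet"),
]

def knn_predict(query, k=3):
    nearest = sorted(train, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict((1600, 25)))  # → rice
```

In practice the features would be scaled before computing distances, since rainfall and temperature have very different ranges.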
Based on the analysis of the literature survey, we require a chat platform that
uses the Internet to make the discussion process more accessible and automated. In
addition, the system should include capabilities such as real-time outputs and a
user-friendly interface for farmers. Such a system could help farmers bridge the
knowledge gap and develop a more productive market.
3 System Architecture
This section gives an overview of the system architecture that was employed in the
project.
The proposed model is divided into three stages: processing of the query, training
and development of the chatbot, and retrieval of responses. The chatbot application
system architecture is shown in Fig. 1. The user enters a query as text through the
user interface. The interface receives user questions, which are then sent to the
chatbot application, where the textual query goes through a preprocessing stage.
During preprocessing, the query sentence is tokenized into words, stopwords are
removed, and words are stemmed to their root words. The query is then classified
using a neural network classifier, and the appropriate results are returned to the
user as text.
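The preprocessing stage described above can be sketched as follows. This stdlib-only version uses a crude suffix-stripping stemmer and a tiny stopword list for illustration; a real implementation would typically use NLTK's tokenizer, stopword corpus, and Porter stemmer.

```python
# Query preprocessing sketch: tokenize -> remove stopwords -> stem to root words.
import re

STOPWORDS = {"the", "is", "a", "an", "of", "for", "to", "in", "my", "what"}

def stem(word):
    # crude suffix stripping standing in for a real stemmer (e.g., Porter)
    for suffix in ("ing", "ers", "er", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(query):
    tokens = re.findall(r"[a-z]+", query.lower())        # tokenize
    tokens = [t for t in tokens if t not in STOPWORDS]   # remove stopwords
    return [stem(t) for t in tokens]                     # stem to root words

print(preprocess("What is the best fertilizer for growing rice?"))
# → ['best', 'fertiliz', 'grow', 'rice']
```

The resulting token list is what would be fed to the neural network classifier as a bag of word stems.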
4 Methodology
The proposed methodology focuses on responding to farmer queries, from which they
can benefit; it comprises three steps:
(A) Processing of the query
(B) Training and development of a chatbot
(C) Retrieval of responses
The chatbot was created based on the research and the various methodologies. After
making reasonable predictions, a Google Colaboratory setup was created to make the
chatbot interactive. The developed chatbot can help farmers in areas such as soil
detection, pesticide recommendation, and details about the Kisan call center, as
shown in Fig. 2.
Testing can be done to assess the chatbot’s quality. The procedures involved in
conducting chatbot tests are
(i) Gathering an overview of questions that can be asked and
(ii) Determining if the responses are correct or incorrect.
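The two-step test procedure above can be illustrated with a hypothetical keyword-matching bot stub and a small question set; the FAQ entries and expected answers are invented for illustration only.

```python
# Step (i): gather an overview of questions; step (ii): determine whether each
# response is correct or incorrect. The bot here is a toy keyword matcher.

FAQ = {  # hypothetical trained responses
    "soil": "Loamy soil suits most crops.",
    "pesticide": "Use neem-based pesticide.",
}

def bot_reply(question):
    for keyword, answer in FAQ.items():
        if keyword in question.lower():
            return answer
    return "Out of bound question."

tests = [
    ("Which soil should I use?", "Loamy soil suits most crops."),
    ("Suggest a pesticide", "Use neem-based pesticide."),
    ("Who won the match?", "Out of bound question."),
]
correct = sum(bot_reply(q) == expected for q, expected in tests)
print(f"{correct}/{len(tests)} correct")  # → 3/3 correct
```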
Table 1 gives a sample of the responses retrieved from the chatbot application. In
the first query there is a spelling mistake, but the chatbot still succeeds in
rendering the answer. In the second query, the chatbot had not been trained for
that particular query, but it learned from training on a similar type of dataset.
In the third query, the user asks an unrelated question, so the chatbot responds
that it is an out-of-bound question.
The false-negative scenario is described in Table 2, where the chatbot inaccurately
predicts the response to the user query. This can be overcome by training the model
on a larger dataset.
6 Conclusion
The need for chatbots in numerous industries is justified by their expansion and
popularity. The performance of chatbots is relatively high when compared to
traditional approaches. The typical amount of time spent interacting with a chatbot
is fairly brief, which helps farmers get quick responses to their questions. A
chatbot has been proven to suit the needs of users by responding quickly and
offering services and information. By leveraging natural language to answer
questions about agriculture, our chatbot has benefited neglected communities. The
chatbot provides agricultural facts to the farmer, who can send a direct message to
get an answer. Our approach allows a farmer to ask any number of questions at any
moment, which would aid the speedier and more widespread adoption of current
farming technology. Because most farmers interact in their native languages, future
advancements are possible; there is a need for a solution that can connect the
model to those languages, as well as estimate rainfall, production, and other
aspects.
7 Future Scope
Using the speech recognition capability, farmers can ask their questions verbally
and receive answers from the bot. Because most farmers interact in their native
languages, there is a need for a solution that can connect the model and their
languages. A weather prediction module can be added that accesses the location and
suggests crops based on it. To support farmers, integration with different channels
such as phone calls, SMS, and various social media platforms can be used.
References
Abstract Crowd counting is one of the main concerns of crowd analysis. Estimating
density map and crowd count in crowd videos and images has a large application
area such as traffic monitoring, surveillance, crowd anomalies, congestion, public
safety, urbanization, planning and development, etc. There are many difficulties in
crowd counting, such as occlusion and inter- and intra-scene deviations in
perspective and size. Nonetheless, in recent years, crowd count analysis has
progressed from earlier approaches, typically restricted to minor changes in crowd
density, to recent state-of-the-art systems that can perform successfully in a
broad variety of circumstances. The recent success of crowd counting methods can be credited
mostly to the deep learning and different datasets published. In this paper, a CNN-
based technique named You Only Look Once (YOLO), and its various versions have
been studied, and its latest version, YOLOv5, is analyzed in the crowd counting
application. This technique is studied on three benchmark datasets with different
crowd densities. It is being observed that YOLOv5 gives favorable results in crowd
counting applications with density ranges from low to medium but not in a very
dense crowd.
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 975
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_71
976 Ruchika et al.
etc. [1]. Crowd counting has remained a persistent challenge in the computer vision
and machine learning fields due to perspective distortions, severe occlusions, diverse
densities, and other issues. Existing research has primarily focused on crowds of
consistent density, i.e., sparse or dense. In the real world, however, an image may
include uneven densities due to camera perspective and changing distribution of
people in the crowd. As a result, accurately counting individuals in crowds
requires a focus on all density levels.
Crowd counting can be done in a number of ways. The fundamental approach is to
count manually, but this is infeasible in moderately and highly congested scenes.
Another approach is to enumerate the number of humans in a portion of a video frame
and extrapolate it to the whole frame to estimate the total strength. No such
algorithm gives a precise count, but computer vision techniques can produce notably
accurate estimates. Five broadly used methods of crowd counting are given below
[2]:
• Detection-based methods: Detectors are like moving windows used to identify
and count the people in an image. This method can further be classified as:
Monolithic detection: a direct approach to count people in an image or video.
Pedestrian detection involves this technique by training the classifier based on full
human body appearance [3–5].
Part-based detection: classifiers are trained on partially occluded human body
parts such as the head, shoulders, or face to count people [6–8].
Shape-based detection: realistic shape prototypes are used for detection;
these prototypes are employed to identify people in images.
Multi-sensor detection: this technique incorporates multi-view data generated
by multiple surveillance cameras deployed in an area. However, multi-camera setups
suffer from varied resolutions and viewpoints and from variations in illumination
and background. Solutions to such issues exploit spatio-temporal occurrences, scene
structures, and object size [9, 10].
• Clustering-based methods: This method exploits the relative uniformity in visual
features and individual motion fields. Coherent feature trajectories are clustered
to reveal independently moving entities. The Kanade-Lucas-Tomasi (KLT) tracker
[11], Bayesian clustering to track and group local features into clusters [12], and
head-detection-based person hypotheses [13] are some of the techniques used in this
method.
• Regression-based methods: In these methods, patches are cropped from the
image, and corresponding to each patch, low-level features are extracted. Density
is estimated based on the collective and holistic description of crowd patterns
[14].
• Density estimation-based methods: A density map is formed for objects in the
image. Extracted features and their object density maps are linearly mapped.
Random forest regression is also used for learning non-linear mapping.
• CNN-based methods: CNNs are used to build an end-to-end regression model
to analyse an image. Crowd counting is done on the whole image rather than
Analytical Study of YOLO and Its Various Versions … 977
only on a particular part of it. CNNs give remarkable results when working with
regression or classification tasks for crowd counting.
For sparse crowds, the authors in [11, 15] used sliding-window detectors, while
hand-crafted features were used in [16, 17] for regression-based techniques. These
techniques are not effective for dense crowd counts due to occlusions. Researchers
have therefore used CNN-based approaches to predict density, giving better results,
as in [18–22].
YOLOv5 [23] has been introduced for detection of different types of objects in
videos and images. In this paper, it is analysed exclusively for crowd counting in
video sequences ranging from low to high density. Weights are obtained by training
the model on the COCO dataset [24]. Based on these pre-trained weights, the model
is tested on three different datasets. Results show that the model works well for
low- to medium-density crowds, but performance degrades in densely crowded
scenarios.
2 Literature Survey
YOLO (You Only Look Once) is a fast and easy-to-use model designed for object
detection. YOLO was first introduced by Joseph Redmon et al. [25] in 2016. Later
versions YOLOv2, v3, v4, and v5 were published, each improving on the previous
release. Table 1 describes the basic working and year of publication of all the
YOLO versions. Table 2 shows the features and limitations of all the versions of
YOLO.
2.1 YOLO
Before YOLO, classifiers were repurposed for object detection; in YOLO, the full
image is fed directly into a neural network that predicts class probabilities and
bounding boxes. In real time, YOLO processes at 45 fps; its modified version, Fast
YOLO, processes at 155 fps. The detailed architecture of YOLO can be found in [25].
Steps involved in object detection are as given below:
1. The input image is divided into A × A grids.
2. A grid cell containing the center of the object is used for detection.
3. Predict bounding box B and confidence score C for each grid cell.
4. The confidence of the model for the bounding box and the object in it is predicted as:
C = Pr(object) × IoU(truth, pred) (1)
Table 2 (continued)
YOLO version: YOLOv5
Features:
• YOLOv5 is blazingly fast and accurate
• It can detect objects with inconsistent aspect ratios
• The YOLOv5 architecture is small, so it can be deployed to embedded devices easily
• PyTorch weights of YOLOv5 can be translated to Open Neural Network Exchange (ONNX) weights and to Core Machine Learning (CoreML) weights for iOS
Limitations:
• YOLOv5 has limited performance on highly dense images and videos
5. For each bounding box, the five prediction and confidence parameters are: the
center coordinates of the box w.r.t. the grid cell boundary (cx, cy), the height
and width of the bounding box relative to the image size (h, w), and the predicted
confidence C, i.e., the IoU between the ground truth and the predicted box.
6. Single conditional class probability, Pr(classx |object), is predicted for each grid
cell.
7. For testing, the class-specific confidence score for all bounding boxes is computed as:
Pr(classx |object) × Pr(object) × IoU(truth, pred) = Pr(classx ) × IoU(truth, pred) (6)
2.2 YOLOv2/YOLO9000
Px = σ (bx + cx ) (2)
Py = σ b y + c y (3)
Pw = ow ebw (4)
Ph = oh ebh (5)
2.3 YOLOv3
YOLOv3 replaced YOLO9000 because of its higher accuracy. Due to the complex
architecture of DarkNet-53, YOLOv3 is slower, but its accuracy is better than that
of YOLO9000. For each bounding box, the network predicts four coordinates labeled
nx, ny, nw, nh; the cell offset from the top-left corner of the image is (ox, oy),
and the initial height and width of the bounding box are (ih, iw).
The method to predict the box location is shown in Fig. 1. It is derived as:
Px = σ(nx) + ox (7)
Py = σ(ny) + oy (8)
Pw = iw e^(nw) (9)
Ph = ih e^(nh) (10)
The overlapping threshold is fixed at 0.5. The objectness prediction score for a
bounding box is 1 when the predicted bounding box covers the maximum area of the
ground-truth object. YOLOv3 predicts across three different scales, i.e., the
detection layer detects on three different-sized feature maps. Three anchor boxes
are assigned per scale, so with nine boxes it performs better than both YOLO and
YOLO9000.
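The IoU used throughout these predictions is the ratio of the intersection area to the union area of two boxes; a minimal sketch, with boxes given as (x1, y1, x2, y2) corner coordinates:

```python
# Intersection over Union (IoU) of two axis-aligned boxes (x1, y1, x2, y2),
# the overlap measure used by the YOLO family for matching and thresholding.

def iou(a, b):
    # intersection rectangle (clamped to zero if the boxes do not overlap)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7 ≈ 0.1429
```

With the 0.5 threshold mentioned above, these two boxes would not be considered a match.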
2.4 YOLOv4
YOLOv4 is a fast and accurate real-time network. The structure of YOLOv4 is built
using CSPDarknet53 [30] as the backbone; the Path Aggregation Network (PAN) [2] and
Spatial Pyramid Pooling (SPP) [33] as the neck; and YOLOv3 [27] as the head. The
authors combine several universal features for the backbone and detector in two
categories: Bag of Freebies (BoF) and Bag of Specials (BoS). BoF comprises
techniques that increase the training cost while improving the detector's accuracy
at a similar inference time, whereas BoS comprises several plugins and
post-processing units that improve detection accuracy with a small increase in
inference cost.
A number of crowd counting methods have been proposed in the literature. Some of
them are suitable for low-density videos, while others are suitable for
moderate-density videos. Further, very little work is applicable to crowd videos of
all densities. Therefore, there is a need to design ubiquitous methods for all
crowd densities.
3 Proposed Model
A detailed explanation of the proposed model is given in this section. The main aim
of this study is to count the number of people in images and videos using the
object detection technique YOLOv5. Glenn Jocher from Ultralytics LLC [23] designed
YOLOv5 and published it on GitHub in May 2020. YOLOv5 is an improvement on YOLOv3
and YOLOv4 and is implemented in PyTorch. To date, the developer is updating and
4. The model weights are optimized using the strip optimizer for input images and
videos.
5. Images are rescaled and reshaped, along with the bounding boxes, using the
normalization gain.
6. The boxes are then labeled according to the pre-trained weight classes.
7. The model processes the weights and generates bounding boxes on the input image
or video frames to produce the person count as output.
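As a sketch of step 7, assuming per-frame detections have already been produced by a YOLOv5 model pre-trained on COCO (where class 0 is 'person'), the person count reduces to filtering detections by class and confidence. The detection tuples below are hypothetical.

```python
# Counting people from YOLOv5-style detections (post-processing sketch).
# In practice, detections would come from something like:
#   model = torch.hub.load('ultralytics/yolov5', 'yolov5s')  # COCO pre-trained
#   results = model(frame)
# Here each detection is assumed to be a (class_id, confidence) pair.

def count_persons(detections, conf_threshold=0.25):
    """Count detections of COCO class 0 ('person') above a confidence threshold."""
    return sum(1 for cls, conf in detections if cls == 0 and conf >= conf_threshold)

# two confident persons, one car (class 2), one low-confidence person
frame_detections = [(0, 0.91), (0, 0.48), (2, 0.88), (0, 0.12)]
print(count_persons(frame_detections))  # → 2
```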
4 Experimental Results
The given model is tested on three benchmark datasets: the AVENUE dataset [31],
the PETS dataset [35], and the Shanghaitech dataset [32]. The first two datasets
represent low to medium crowd density, while the last is for high-density crowds.
AVENUE Dataset: The dataset consists of 35 videos; 30,652 frames corresponding to
these videos are used for testing.
PETS Dataset: Regular workshops are held by the Performance Evaluation of Tracking
and Surveillance (PETS) program, generating a benchmark dataset. This dataset
tackles group activities in public spaces, such as crowd counting, tracking
individuals in the crowd, and detecting different flows and specialized crowd
events. Multiple cameras are used to record different incidents, and several actors
are involved. The dataset consists of four subsections with different density
levels.
Shanghaitech Dataset: It is a large crowd counting dataset. It contains 1198
annotated crowd images. The dataset is divided into two sections, Part-A and
Part-B, containing 482 and 716 images, respectively. In all, 330,165 persons are
marked in the dataset. Part-A images were collected from the Internet, while Part-B
images are from Shanghai's bustling streets.
Figures 3, 4, and 5 show the detected humans in bounding boxes for AVENUE,
PETS2009, and Shanghaitech datasets. Average Precision (AP) and Mean Absolute
Error are the evaluation parameters. Table 5 shows the average precision for three
datasets. Table 6 gives the resultant MAE for all datasets.
It can be seen that the AP for the AVENUE and PETS2009 datasets is 99.5% and 98.9%,
respectively, whereas it is 40.2% for the high-density Shanghaitech dataset.
Further, in terms of MAE, the value is higher for the high-density dataset.
Therefore, it can be concluded that YOLOv5 works more efficiently for crowd
detection in low- to medium-density videos, and its performance degrades for
high-density videos.
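The MAE used in this evaluation can be computed as follows, a stdlib-only sketch with hypothetical per-frame counts:

```python
# Mean Absolute Error over per-frame person counts:
# MAE = (1/N) * sum(|gt_i - pred_i|) for N frames.

def mean_absolute_error(ground_truth, predicted):
    return sum(abs(g - p) for g, p in zip(ground_truth, predicted)) / len(ground_truth)

gt_counts = [12, 9, 15, 7]    # people actually present per frame (hypothetical)
pred_counts = [11, 9, 13, 8]  # counts produced by the detector (hypothetical)
print(mean_absolute_error(gt_counts, pred_counts))  # → 1.0
```

A lower MAE means the predicted counts track the ground truth more closely, which is why the high-density Shanghaitech results score worse.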
Fig. 3 Detection of humans in various frames of different video sequences of AVENUE dataset
(i)–(ii) represent frame 1, and frame 51 of video sequence 2, (iii)–(iv) represent frame 20 and frame
101 of video sequence 7, (v)–(vi) represent frame 301 and 451 of video sequence 13
Fig. 4 Detection of humans in various frames of different video sequences of PETS2009 dataset.
(i)–(ii) represent frame 1, 1001 of video sequence 2, (iii)–(iv) represent frame 101, 651 of video
sequence 8, and (v)–(vi) represent frame 1, 251 of video sequence 13
References
1. Ford M (2017) Trump’s press secretary falsely claims: ‘Largest audience ever to witness an
inauguration, period.’ The Atlantic 21(1):21
2. Liu S, Qi L, Qin H, Shi J, Jia J (2018) Path aggregation network for instance segmentation.
In: Proceedings of the IEEE conference on computer vision and pattern recognition 2018, pp
8759–8768
3. Cheng Z, Zhang F (2020) Flower end-to-end detection based on YOLOv4 using a mobile
device. Wirel Commun Mob Comput 17:2020
4. Leibe B, Seemann E, Schiele B (2005) Pedestrian detection in crowded scenes. In: 2005 IEEE
computer society conference on computer vision and pattern recognition (CVPR’05), vol 1,
IEEE, pp 878–885, 20 June 2005
5. Tuzel O, Porikli F, Meer P (2008) Pedestrian detection via classification on riemannian
manifolds. IEEE Trans Pattern Anal Mach Intell 30(10):1713–1727
6. Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: 2005 IEEE
computer society conference on computer vision and pattern recognition (CVPR’05), vol 1, 20
Jun 2005. IEEE, pp 886–893
7. Lin SF, Chen JY, Chao HX (2001) Estimation of number of people in crowded scenes using
perspective transformation. IEEE Trans Syst Man Cybern-Part A Syst Hum 31(6):645–654
8. Wu B, Nevatia R (2007) Detection and tracking of multiple, partially occluded humans by
bayesian combination of edgelet based part detectors. Int J Comput Vision 75(2):247–266
9. Felzenszwalb PF, Girshick RB, McAllester D, Ramanan D (2009) Object detection with
discriminatively trained part-based models. IEEE Trans Pattern Anal Mach Intell 32(9):1627–
1645
10. Wang M, Li W, Wang X (2012) Transferring a generic pedestrian detector towards specific
scenes. In: 2012 IEEE conference on computer vision and pattern recognition. IEEE, pp 3274–
3281, 16 Jun 2012
11. Wang M, Wang X (2011) Automatic adaptation of a generic pedestrian detector to a specific
traffic scene. In: CVPR 2011. IEEE, 20 June 2011, pp 3401–3408
12. Lucas BD, Kanade T (1981) An iterative image registration technique with an application to
stereo vision 1981
13. Brostow GJ, Cipolla R (2006) Unsupervised Bayesian detection of independent motion in
crowds. In: 2006 IEEE computer society conference on computer vision and pattern recognition
(CVPR’06), vol 1 17. Jun 2006. IEEE, pp 594–601
14. Tu P, Sebastian T, Doretto G, Krahnstoever N, Rittscher J, Yu T (2008) Unified crowd segmen-
tation. In: European conference on computer vision. Springer, Berlin, pp 691–704, 12 Oct
2008
15. Wu B, Nevatia R (2005) Detection of multiple, partially occluded humans in a single image
by bayesian combination of edgelet part detectors. In: Tenth IEEE international conference on
computer vision (ICCV’05), vol 1. IEEE, pp 90–97, 17 Oct 2005
16. Chan AB, Vasconcelos N (2009) Bayesian poisson regression for crowd counting. In: 2009
IEEE 12th international conference on computer vision 2009 Sep 29. IEEE, pp 545–551
17. Ryan D, Denman S, Fookes C, Sridharan S (2009) Crowd counting using multiple local features.
In: 2009 digital image computing: techniques and applications. IEEE, pp 81–88, 1 Dec 2009
18. Ruchika, Purwar RK (2019) Crowd density estimation using hough circle transform for video
surveillance. In: 2019 6th international conference on signal processing and integrated networks
(SPIN). IEEE, 2019 Mar 7, pp 442–447
19. Kampffmeyer M, Dong N, Liang X, Zhang Y, Xing EP (2018) ConnNet: A long-range
relation-aware pixel-connectivity network for salient segmentation. IEEE Trans Image Process
28(5):2518–2529
20. Li Y, Zeng J, Shan S, Chen X (2018) Occlusion aware facial expression recognition using cnn
with attention mechanism. IEEE Trans Image Process 28(5):2439–2450
21. Zhang K, Zuo W, Chen Y, Meng D, Zhang L (2017) Beyond a gaussian denoiser: Residual
learning of deep cnn for image denoising. IEEE Trans Image Process 26(7):3142–3155
22. Chao H, He Y, Zhang J, Feng J (2019) Gaitset: regarding gait as a set for cross-view gait
recognition. In: Proceedings of the AAAI conference on artificial intelligence, vol 33(01), pp
8126–8133, 17 Jul 2019
23. Jocher G, Changyu L, Hogan A, Changyu LY, Rai P, Sullivan T (2020) Ultralytics/yolov5:
initial release. https://doi.org/10.5281/zenodo.3908560
24. Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Doll’ar P, Zitnick CL (2014)
Microsoft COCO: common objects in context. In: ECCV, 2014. ISBN 978-3-319-10601-4.
https://doi.org/10.1007/978-3-319-10602-148
25. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object
detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition
2016, pp 779–788
26. Redmon J, Farhadi A (2017) YOLO9000: better, faster, stronger. In: Proceedings of the IEEE
conference on computer vision and pattern recognition 2017, pp 7263–7271
27. Redmon J, Farhadi A (2018) Yolov3: An incremental improvement. arXiv preprint arXiv:1804.
02767. 8 Apr 2018
28. Bochkovskiy A, Wang CY, Liao HY (2020) Yolov4: optimal speed and accuracy of object
detection. arXiv preprint arXiv:2004.10934. 23 Apr 2020
29. Davies AC, Yin JH, Velastin SA (1995) Crowd monitoring using image processing. Electron
Commun Eng J 7(1):37–47
30. Wang CY, Liao HY, Wu YH, Chen PY, Hsieh JW, Yeh IH (2020) CSPNet: a new backbone
that can enhance learning capability of CNN. In: Proceedings of the IEEE/CVF conference on
computer vision and pattern recognition workshops 2020, pp 390–391
31. Lu C, Shi J, Jia J (2013) Abnormal event detection at 150 fps in MATLAB. In: Proceedings of
the IEEE international conference on computer vision 2013, pp 2720–2727. Available at: http://
www.cse.cuhk.edu.hk/leojia/projects/detectabnormal/dataset.html [Accessed 15 Nov 2020]
32. Zhang Y, Zhou D, Chen S, Gao S, Ma Y (2016) Single-image crowd counting via multi-column
convolutional neural network. In: Proceedings of the IEEE conference on computer vision and
pattern recognition 2016, pp 589–597
33. He K, Zhang X, Ren S, Sun J (2015) Spatial pyramid pooling in deep convolutional networks
for visual recognition. IEEE Trans Pattern Anal Mach Intell 37(9):1904–1916
34. Oh MH, Olsen P, Ramamurthy KN (2020) Crowd counting with decomposed uncertainty. In:
Proceedings of the AAAI conference on artificial intelligence, vol 34(07), pp 11799–11806, 3
Apr 2020
35. Ferryman J, Shahrokni A (2009) Pets2009: dataset and challenge. In: 2009 twelfth IEEE inter-
national workshop on performance evaluation of tracking and surveillance. IEEE, 7 Dec 2009,
pp 1–6. Available at: http://www.cvg.reading.ac.uk/PETS2009/a.html [Accessed 15 July 2021]
IoT Enabled Elderly Monitoring System
and the Role of Privacy Preservation
Frameworks in e-health Applications
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 991
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_72
992 V. J. Aski et al.
1 Introduction
IoT has gained massive ground in the day-to-day lives of researchers and practitioners because of its ability to offer advanced connectivity and to uniquely identify every physical object on the planet, thanks to the IPv4 and IPv6 addressing spaces, which provide seamless addressing schemes that can be invoked remotely. These technological evolutions offer new ways for smart objects to communicate with things (M2T), machines (M2M), and humans (M2H) [1]. With the rapid rise in demand for remote monitoring, IoT-enabled healthcare devices can now commonly be seen in many homes. These devices help monitor body vitals, chiefly temperature, heart rate, blood pressure, and glucose level, enabling patients to perform self-guided diagnosis and reducing overcrowding at hospitals. The captured vitals are continuously uploaded to cloud-centric data centers such as iDoctor, from which one can seek professional medical advice on a regular basis [2]. In addition, such self-help medical services are becoming more prominent and more accurate thanks to recent advances in wireless communication techniques (WCTs) and information technologies [3, 4].
Privacy protection (of both data and devices) has been one of IoT's key problems since its inception. A universal solution to such issues is badly needed before technological solutions can be widely accepted in critical application domains such as healthcare and allied sectors. Researchers have witnessed a sharp rise in medical information leakage that compromises security goals in the recent past, as malware becomes more virulent and resistant [5, 6]. For instance, the database of the second-largest US health insurance corporation was once targeted by hackers, leaking the personal information, including health data, of approximately 80 million customers. Privacy protection strategies therefore play a vital role in the design of any e-health application. In this article, the authors provide a holistic overview of recent trends and techniques employed in the development of healthcare privacy preservation frameworks, along with the associated security concerns.
The following scenario helps us understand how privacy breaches have become a day-to-day reality in all our lives. Akshata looks at her smart wristband during her regular workout and observes that her heart rate is a little higher than the target rate (the maximum heart rate during exercise is normally estimated as "220 minus your age" [7]). After the workout, she asks Alexa (a smart speaker) to book her an appointment with the nearest cardiologist for a general heart health checkup. The next day, after finishing her office work, she visits the cardiologist and feels relieved when the doctor reassures her that nothing is wrong and that her heart rate rose because of the intensified workout. The next time Akshata opens her browser, she is irritated by annoying ads for heart medications and heart complications, and tutorials on identifying a heart attack, constantly popping up. Things get worse when she receives a telephone call from a health insurance agency recommending a plan. This is only one of many incidents in which modern technology brings high privacy risks that have become unavoidable in daily life.
2 Related Work
Several survey articles exist in the domain of HIoT security and privacy concerns [8–13]. Most of them give an overview of security issues and privacy aspects, along with the proposed solutions, in the healthcare IoT context, as shown in Table 1. In the present article, we focus on providing a holistic overview of the privacy protection of healthcare data and devices. In addition, we provide a conceptual architecture for an application that monitors the health status of the elderly (patients with degenerative disorders) and differently abled persons.
Table 1 Comparative analysis of existing survey articles in the domain of HIoT and implementation issues

Author and year of publication | Aim of the article | Security concerns discussed | Architecture | Research questions | Open issues | Challenges | Drawbacks
Luis et al. [8] | An exhaustive literature review on medical IoT | Access controlling rules and policies | ✕ | ✓ | ✕ | ✕ | Limited frameworks
Abbas et al. [9] | Survey on health clouds | Attribute-based encryption | ✕ | ✕ | ✓ | ✓ | Centralized architecture
Nuaimi et al. [10] | Survey on healthcare cloud implementation | Not mentioned | ✕ | ✕ | ✓ | ✕ | Limited scope variation
Idoga et al. [11] | Comprehensive review of security issues of e-health | AES | ✕ | ✕ | ✓ | ✕ | Application scope limitation
Pankomera et al. [12] | A review on security and privacy issues in healthcare | Not mentioned | ✓ | ✕ | ✓ | ✕ | Lack of public health concerns
Olaronke et al. [14] | A survey on bigdata challenges in healthcare | Biometric security functions | ✕ | ✕ | ✕ | ✕ | Incomplete information
Proposed study | A holistic overview on privacy preservation schemes | Access control and authentication | ✓ | ✓ | ✓ | ✓ | NA
IoT gives healthcare consumers a high degree of control over day-to-day tasks, ranging from capturing data from the patient's body to disseminating the captured information to remote servers for further analysis. It also provides a way to saturate patient environments with smart things. "Smart things" denotes a broad spectrum of low-power computing platforms such as microcontrollers, microprocessors, sensor area networks, and wireless communication entities that help the data settle at a cloud platform. Figure 1 describes a generic healthcare environment comprising the sensor network, data processing platforms, wireless infrastructure, cloud platform, and the base station from which multiple health workers benefit. HIoT can be implemented in both static and dynamic environments. In a static environment, the patient's position is fixed to a place such as an ICU or a physician's examination hall (the in-hospital monitoring use case of Fig. 1).
[Fig. 1 A generic HIoT environment. Local processing units (LPUs) aggregate data packets from sensor nodes and forward them through Wi-Fi gateways, outdoor Wi-Fi access points, and cellular services over an IP network to a medical data server. HIoT use cases shown: in-hospital patient monitoring, out-patient monitoring, ambulance (road and air) monitoring, prescription upload, enhanced drug management, report analysis, and improved hospital resource utilization. Backbone wireless protocols for HIoT systems: 2G/3G/4G, LPWAN (Sigfox, LoRa), WiMAX, ZigBee, Wi-Fi, BLE, 6LoWPAN, NFC, and RFID.]
However, in a dynamic environment, the patient can wear the device while performing daily activities such as jogging and walking (the out-patient monitoring use case of Fig. 1). In this article, we examine privacy concerns related to healthcare applications. The risk exposure of these devices is much higher at the place of deployment than at the place of development, and safeguard techniques shall be used to protect these devices from security threats at the deployment site.
Given the highly vulnerable nature of IoT devices, it is essential to know the risks and challenges such devices pose to the privacy of patient data. Moreover, one should obtain a satisfactory answer to the following question before opting for an IoT-enabled healthcare device from a hospital: is it possible to get a device that fully supports privacy-preserving and safe environments, as the traditional Internet provides? To answer this question precisely, one must understand the logical differences between trust, privacy, and confidentiality. Privacy, in healthcare IoT terms, means that any individual's health data must be protected from third-party access; likewise, the information should not be exposed to others without the patient's explicit consent. It is a fundamental right of patients to decide with whom to share their data. In our earlier example, only Akshata decides whether to share her data with the insurance company. Trust, in turn, is the consequence of transparency and consistency. Finally, confidentiality is the property that governs the data and ensures that the right person is accessing the right data, preventing access by unauthorized entities: if my data is confidential, it is accessible only to me, and no one is authorized to access it without my permission.
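The consent requirement described above can be expressed as a minimal access-control check. The following Python sketch is purely illustrative; all names (ConsentRegistry, read_record) are hypothetical and not part of any framework surveyed in this article:

```python
# Minimal sketch of consent-gated access to health records.
class ConsentRegistry:
    """Tracks which third parties a patient has explicitly authorized."""

    def __init__(self):
        self._grants = {}  # (patient_id, requester_id) -> True

    def grant(self, patient_id, requester_id):
        self._grants[(patient_id, requester_id)] = True

    def revoke(self, patient_id, requester_id):
        self._grants.pop((patient_id, requester_id), None)

    def is_authorized(self, patient_id, requester_id):
        # The patient always has access; everyone else needs explicit consent.
        return requester_id == patient_id or self._grants.get(
            (patient_id, requester_id), False)


def read_record(records, registry, patient_id, requester_id):
    if not registry.is_authorized(patient_id, requester_id):
        raise PermissionError("no explicit patient consent")
    return records[patient_id]


records = {"akshata": {"heart_rate_bpm": 132}}
registry = ConsentRegistry()

# The insurance company is denied until Akshata explicitly opts in.
try:
    read_record(records, registry, "akshata", "insurer")
except PermissionError:
    denied = True

registry.grant("akshata", "insurer")
shared = read_record(records, registry, "akshata", "insurer")
```

The key design point is that the default is deny: absence of a grant, not presence of a block, is what keeps data private.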
In this section, we discuss the various privacy concerns and security aspects. We categorize the different security and privacy concerns under three heads: process based, scheme based, and network and traffic based. The detailed classification is presented in the taxonomy diagram shown in Fig. 2.
The modern lifestyle has created a need to incorporate a huge number of smart devices around us. These smart devices are built from multiple sensors and actuators, comprising data acquisition systems that capture numerous physical parameters such as temperature and heart rate. They create massive amounts of data that can only be handled by the specific classes of algorithms defined in big data technologies, and this massive amount of data travels over an open channel, the Internet. Although the vulnerable nature of the Internet needs no mention, the more challenging job is to handle the issues of process-based techniques such as cloud, fog, and edge computing.
In centralized systems, user data is stored in one central database, which the end user queries as and when required. When the central server fails, the whole system freezes and it is difficult to recover the lost data, which is the biggest drawback of such systems. Distributed systems handle such issues easily. Table 2 shows the state-of-the-art security protocols in the distributed computing field.
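The availability argument above can be illustrated with a toy sketch: a record replicated across several nodes survives the failure of the primary, whereas a single central store would not. The Node/cluster setup below is purely illustrative, not a real distributed-storage protocol:

```python
# Toy contrast between a central store and replicated reads.
class Node:
    def __init__(self):
        self.up = True
        self.data = {}

def write_all(nodes, key, value):
    # Replicate the write to every node that is currently reachable.
    for n in nodes:
        if n.up:
            n.data[key] = value

def read_any(nodes, key):
    # Try the primary first, then fall back to replicas.
    for n in nodes:
        if n.up and key in n.data:
            return n.data[key]
    raise RuntimeError("all replicas unavailable")

primary, replica1, replica2 = Node(), Node(), Node()
cluster = [primary, replica1, replica2]
write_all(cluster, "p01:bp", "120/80")

primary.up = False                   # simulate central-server failure
value = read_any(cluster, "p01:bp")  # replicas keep the record available
```

A real deployment would add consistency and conflict handling, which this sketch deliberately omits.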
IoT exposes several Internet paradigms to security vulnerabilities because of its highly open nature. In healthcare, data needs to be securely captured, transferred, stored, and processed. Sensitive or critical data, such as the body vitals of patients, can be protected from unauthorized access using password-based mechanisms, cryptographic algorithms, or biometric authentication schemes.
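As one concrete illustration of the cryptographic safeguards mentioned above, the following Python sketch integrity-protects a vitals payload with an HMAC over a pre-shared key, so a server can detect tampering in transit. The key handling and field names are assumptions for illustration only; a production system would also encrypt the payload and manage keys properly:

```python
import hmac
import hashlib
import json

# Assumption: the device and server share this key at enrolment time.
SECRET_KEY = b"pre-shared-device-key"

def sign_payload(vitals: dict) -> dict:
    """Attach an HMAC-SHA256 tag to a canonically serialized payload."""
    body = json.dumps(vitals, sort_keys=True).encode()
    tag = hmac.new(SECRET_KEY, body, hashlib.sha256).hexdigest()
    return {"body": body.decode(), "tag": tag}

def verify_payload(msg: dict) -> bool:
    expected = hmac.new(SECRET_KEY, msg["body"].encode(),
                        hashlib.sha256).hexdigest()
    # Constant-time comparison resists timing attacks.
    return hmac.compare_digest(expected, msg["tag"])

msg = sign_payload({"patient": "p01", "temp_c": 37.2, "hr_bpm": 78})
assert verify_payload(msg)

# An attacker altering the heart rate invalidates the tag.
tampered = dict(msg, body=msg["body"].replace("78", "178"))
assert not verify_payload(tampered)
```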
Figure 4 depicts the proposed HIoT layered architectural framework for monitoring elderly and differently abled people. We have organized the different application scenarios and components, according to their functionalities and requirements, into three layers. The object layer, or perception layer, is where all the physical objects, such as sensors and actuators, work toward the common goal of capturing
[Fig. 4 Proposed HIoT layered architectural framework for monitoring elderly and differently abled people. Application layer: signaling gateway, Web 2.0. Network/gateways layer: Internet, LoRa, LPWAN, and WiMAX gateways; attack vectors at this layer include Sybil attacks, eavesdropping, man-in-the-middle (MIM) attacks, replay attacks, DDoS attacks, spoofing, and routing attacks. Object layer: room temperature and light intensity sensor, GPS sensor, pulse sensor, ECG sensor, IR sensor, sonar sensor, pressure sensors, movement and heart activity monitoring sensors, a data acquisition and processing unit, a gateway-enabled MCU, and joystick and wheelchair motor controllers. Use cases: (a) real-time monitoring of foot pressure and heart activity of a diabetic patient; (b) a real-time assistive smart wheelchair for a Parkinson's patient; (c) real-time monitoring of an ICU patient.]
the data from the patient's body in dynamic environments. The network or gateway layer is responsible for transporting data from the DAQs to storage infrastructures such as clouds and fog nodes. The application layer, sometimes also called the business layer, is responsible for data analytics tasks such as creating graphs and flow charts to improve business processes. The attack vectors of the layered architecture are also shown in Fig. 4.
Several researchers have worked on model-based, attack-oriented schemes to prevent unauthorized access. For instance, the authors in [37] presented a model-based, attack-oriented algorithm, built on the basic principles of Markov models, to safeguard healthcare data, and the authors in [38] designed a model to prevent information breaches in healthcare applications.
We have considered various chronic health issues of both the elderly and the differently abled community as use cases; they are briefly discussed in the subsections below.
Here, sensors such as force-sensitive resistive (FSR) pressure sensors are installed in the foot sole of the patient. A diabetic patient tends to develop wounds easily, as his or her foot skin is highly sensitive to rough surfaces. A wound may lead to gangrene and therefore to permanent disability, followed by amputation of the infected body parts. It is therefore important for a patient to know his or her foot pressure variations. These variations are continuously monitored by a medical professional through the FSRs, so that any abnormal variation can easily be tracked and further medical attention obtained. Multiple other sensors, such as a pulse rate sensor, an ECG sensor, and IR sensors, are interfaced to a microcontroller, and the data is transferred to the medical health server (MHS) through wireless technologies such as WiMAX. At the cloud level, the data is segregated according to the nature of the application; for instance, data from a diabetic patient and from an ICU patient is stored in the mobile sensor database and the monitoring database, respectively. Generally, LoRa gateways are used at the cloud level for further data distribution.
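A minimal sketch of the abnormal-variation tracking described above, using a trailing moving average over FSR samples. The window size and deviation threshold are illustrative values, not clinical parameters from the paper:

```python
# Flag FSR samples that deviate sharply from the recent baseline.
def abnormal_indices(samples, window=5, threshold=0.3):
    """Return indices where a sample deviates from the trailing mean
    of the previous `window` samples by more than `threshold` (relative)."""
    flagged = []
    for i in range(window, len(samples)):
        baseline = sum(samples[i - window:i]) / window
        if baseline and abs(samples[i] - baseline) / baseline > threshold:
            flagged.append(i)
    return flagged

# Steady gait readings followed by a sudden pressure spike at index 6:
readings = [1.0, 1.02, 0.98, 1.01, 0.99, 1.00, 1.9, 1.01]
assert abnormal_indices(readings) == [6]
```

In a deployment, flagged indices would trigger an alert to the monitoring medical professional rather than a local assertion.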
Here, the patient is equipped with a motion sensor (to detect motion), a GPS sensor (to obtain the location), and a motor controller. The wheelchair is smart enough to capture patient data and transfer it through the microcontroller to the nearest cloud database. In Parkinson's disease, patients cannot move their arms as they wish, so the smart joystick takes care of the patient's movements. The data captured from this patient is stored in a separate cloud database, called the mobile sensor database, for further evaluation.
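How a joystick deflection might be turned into wheel commands can be sketched with standard differential-drive mixing. The axis conventions and clamping below are assumptions for illustration, not taken from the paper's controller:

```python
# Differential-drive mixing: joystick (x, y) -> (left, right) wheel speeds.
def clamp(v, lo=-1.0, hi=1.0):
    return max(lo, min(hi, v))

def joystick_to_wheels(x, y):
    """x: left/right deflection in [-1, 1], y: forward/back in [-1, 1].
    Returns (left_speed, right_speed) in [-1, 1]."""
    left = clamp(y + x)
    right = clamp(y - x)
    return left, right

assert joystick_to_wheels(0.0, 1.0) == (1.0, 1.0)    # straight ahead
assert joystick_to_wheels(1.0, 0.0) == (1.0, -1.0)   # spin right in place
assert joystick_to_wheels(0.5, 0.5) == (1.0, 0.0)    # arc to the right
```

The MCU would feed these two values to the wheelchair's motor controller at a fixed control rate.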
In this section, we discuss the research challenges common to designing IoT frameworks for monitoring the vitals of elderly and disabled people. The major challenge observed is the customization of healthcare devices to fit disabled people comfortably, since every disabled individual has specific needs and different circumstances. Context-aware environments are created in smart workflows that take intelligent decisions based on the context information received from the bio-sensing devices. Another important challenge is the self-management of the IoT device: a human-intervention-free device that updates its environment automatically is always preferable, because it is difficult for elderly or disabled people to perform regular updates. Standardization is another key problem that every healthcare IoT designer must address; incorporating globally accepted standards into an IoT device is essential to avoid interoperability problems.
Finally, the future goal is to enhance and envision the evolution of IoT and allied technologies that help create devices for disabled and elderly people. Advances in brain-computer interfaces (BCIs) have made it possible to create control environments for artificial limbs such as arms and legs, and BCI technologies are being continuously transformed around the globe. It is expected that the disabled community will benefit greatly from such advancements in BCI.
6 Conclusion
This paper offers a detailed overview of the numerous privacy preservation concerns and security issues seen in the day-to-day functioning of HIoT applications. We have deliberated the key security aspects through a taxonomy diagram and discussed, in detail, a heterogeneous variety of recent state-of-the-art authentication and access control schemes and their implications. In addition, we have presented insights into the policy-based, process-based, and authentication-based security and privacy preserving schemes used in the HIoT application domain. An IoT-based healthcare architectural framework has been discussed, and multiple use cases, such as real-time monitoring of the foot pressure and heart activity of a diabetic patient and a real-time assistive smart wheelchair for Parkinson's disease, are deliberated with diagrams. Various sensors, such as force-sensitive resistive (FSR) pressure sensors, an ECG sensor, GPS sensors, and a pulse rate sensor, are explained with their usage implications.
References
1. Bahga A, Madisetti VK (2015) Healthcare data integration and informatics in the cloud. Comput (Long Beach Calif) 48(2):50–57
2. Zhang Y, Chen M, Huang D, Wu D, Li Y (2017) iDoctor: personalized and professionalized
medical recommendations based on hybrid matrix factorization. Futur Gener Comput Syst
66:30–35
3. Yu K, Tan L, Shang X, Huang J, Srivastava G, Chatterjee P (2020) Efficient and privacy-preserving medical research support platform against COVID-19: a blockchain-based approach. IEEE Consumer Electronics Magazine
4. Yu K-P, Tan L, Aloqaily M, Yang H, Jararweh Y (2021) Blockchain-enhanced data sharing with traceable and direct revocation in IIoT. IEEE Trans Industrial Informatics
5. Sriram S, Vinayakumar R, Sowmya V, Alazab M, Soman KP (2020) Multi-scale learning based
malware variant detection using spatial pyramid pooling network. In: IEEE INFOCOM 2020-
IEEE conference on computer communications workshops (INFOCOM WKSHPS), IEEE, pp
740–745
6. Vasan D, Alazab M, Venkatraman S, Akram J, Qin Z (2020) MTHAEL: cross-architecture IoT
malware detection based on neural network advanced ensemble learning. IEEE Trans Comput
69(11):1654–1667
7. Target heart rates chart | American heart association. [Online]. Available: https://www.heart.
org/en/healthy-living/fitness/fitness-basics/target-heart-rates. [Accessed: 28 May 2021]
8. Fernández-Alemán JL, Señor IC, Lozoya PÁO, Toval A (2013) Security and privacy in
electronic health records: a systematic literature review. J Biomed Inf 46(3):541–562
9. Abbas A, Khan SU (2014) A review on the state-of-the-art privacy-preserving approaches in
the e-health clouds. IEEE J Biomed Health Inform 18(4):1431–1441
10. Al Nuaimi N, AlShamsi A, Mohamed N, Al-Jaroodi J (2015) e-Health cloud implementation issues and efforts. In: 2015 international conference on industrial engineering and operations management (IEOM). IEEE, pp 1–10
11. Idoga PE, Agoyi M, Coker-Farrell EY, Ekeoma OL (2016) Review of security issues in e-
Healthcare and solutions. In: 2016 HONET-ICT, pp 118–121. IEEE
12. Pankomera R, van Greunen D (2016) Privacy and security issues for a patient-centric approach
in public healthcare in a resource constrained setting. In: 2016 IST-Africa week conference.
IEEE, pp 1–10
13. Olaronke I, Oluwaseun O (2016) Big data in healthcare: prospects, challenges and resolutions.
In: 2016 future technologies conference (FTC). IEEE, pp 1152–1157
14. Lee I, Lee K (2015) The Internet of Things (IoT): applications, investments, and challenges
for enterprises. Bus Horiz 58(4):431–440
15. Zhang D, Zhang D, Xiong H, Hsu C-H, Vasilakos AV (2014) BASA: building mobile Ad-Hoc
social networks on top of android. IEEE Network 28(1):4–9
16. Sharma G, Bala S, Verma AK (2012) Security frameworks for wireless sensor networks-review.
Procedia Technol 6:978–987
17. Wang K, Chen C-M, Tie Z, Shojafar M, Kumar S, Kumari S (2021) Forward privacy
preservation in IoT enabled healthcare systems. IEEE Trans Ind Inf
18. Hassan MU, Rehmani MH, Chen J (2019) Privacy preservation in blockchain based IoT systems: integration issues, prospects, challenges, and future research directions. Future Gener Comput Syst 97:512–529
19. Bhalaji N, Abilashkumar PC, Aboorva S (2019) A blockchain based approach for privacy preservation in healthcare IoT. In: International conference on intelligent computing and communication technologies. Springer, Singapore, pp 465–473
20. Du J, Jiang C, Gelenbe E, Lei X, Li J, Ren Y (2018) Distributed data privacy preservation in
IoT applications. IEEE Wirel Commun 25(6):68–76
21. Ahmed SM, Abbas H, Saleem K, Yang X, Derhab A, Orgun MA, Iqbal W, Rashid I, Yaseen A
(2017) Privacy preservation in e-healthcare environments: state of the art and future directions.
IEEE Access 6:464–478
22. Xu X, Fu S, Qi L, Zhang X, Liu Q, He Q, Li S (2018) An IoT-oriented data placement method
with privacy preservation in cloud environment. J Net Comput Appl 124:148–157
23. Bhattacharya P, Tanwar S, Shah R, Ladha A (2020) Mobile edge computing-enabled blockchain
framework—a survey. In: Proceedings of ICRIC 2019. Springer, Cham, pp 797–809
24. Al Hamid HA, Rahman SMM, Hossain MS, Almogren A, Alamri A (2017) A security model for preserving the privacy of medical big data in a healthcare cloud using a fog computing facility with pairing-based cryptography. IEEE Access 5:22313–22328
25. Zhou J, Cao Z, Dong X, Lin X (2015) TR-MABE: White-box traceable and revocable multi-
authority attribute-based encryption and its applications to multi-level privacy-preserving e-
healthcare cloud computing systems. In: 2015 IEEE conference on computer communications
(INFOCOM). IEEE, pp 2398–2406
26. Kaneriya S, Chudasama M, Tanwar S, Tyagi S, Kumar N, Rodrigues JJPC (2019) Markov
decision-based recommender system for sleep apnea patients. In: ICC 2019–2019 IEEE
international conference on communications (ICC). IEEE, pp 1–6
27. Mutlag AA, Abd Ghani MK, Arunkumar NA, Mohammed MA, Mohd O (2019) Enabling
technologies for fog computing in healthcare IoT systems. Future Gener Comput Syst 90:62–78
28. Zhou J, Cao Z, Dong X, Lin X (2015) PPDM: a privacy-preserving protocol for cloud-assisted
e-healthcare systems. IEEE J Sel Top Sign Process 9(7):1332–1344
29. Ziglari H, Negini A (2017) Evaluating cloud deployment models based on security in EHR system. In: 2017 international conference on engineering and technology (ICET). IEEE, pp 1–6
30. Sanz-Requena R, Mañas-García A, Cabrera-Ayala JL, García-Martí G (2015) A cloud-based radiological portal for the patients: IT contributing to position the patient as the central axis of the 21st century healthcare cycles. In: 2015 IEEE/ACM 1st international workshop on technical and legal aspects of data privacy and security. IEEE, pp 54–57
31. Huang C, Yan K, Wei S, Hoon Lee D (2017) A privacy-preserving data sharing solution for mobile healthcare. In: 2017 international conference on progress in informatics and computing (PIC). IEEE, pp 260–265
32. Banerjee M, Lee J, Choo K-KR (2018) A blockchain future for internet of things security: a
position paper. Digital Commun Net 4(3):149–160
33. Gordon WJ, Catalini C (2018) Blockchain technology for healthcare: facilitating the transition
to patient-driven interoperability. Comput Struct Biotechnol J 16:224–230
34. Kshetri N (2017) Blockchain’s roles in strengthening cybersecurity and protecting privacy.
Telecommun Policy 41(10):1027–1038
35. Kumar NM, Mallick PK (2018) Blockchain technology for security issues and challenges in
IoT. Procedia Comput Sci 132:1815–1823
36. Li X, Ibrahim MH, Kumari S, Sangaiah AK, Gupta V, Choo K-KR (2017) Anonymous mutual
authentication and key agreement scheme for wearable sensors in wireless body area networks.
Comput Netw 129:429–443
37. Strielkina A, Kharchenko V, Uzun D (2018) Availability models for healthcare IoT systems: classification and research considering attacks on vulnerabilities. In: 2018 IEEE 9th international conference on dependable systems, services and technologies (DESSERT). IEEE, pp 58–62
38. McLeod A, Dolezel D (2018) Cyber-analytics: Modeling factors associated with healthcare
data breaches. Decis Support Syst 108:57–68
Hybrid Beamforming for Massive MIMO
Antennas Under 6 GHz Mid-Band
1 Introduction
The ongoing evolution in wireless technologies has become a necessary evil of everyday life. Present systems use RF signals, i.e., electromagnetic (EM) waves, to forward data from a source point to its destination. 5G redefines the network with new global wireless standards for the fastest communications, and the use of macrocells forms the foundation of 5G technology by serving thousands of mobile
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 1007
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_73
1008 K. Bhagat and A. Suri
2 Literature Review
From the literature review, we observed that this technology, together with massive MIMO systems, has become a hot topic in this research area. In article [10], a survey was conducted on macro-cell millimeter-wave systems which discussed the performance of MIMO systems, showing that it is better to model for multi-dimensional accuracy using scattering models in outdoor scenarios. The remainder of the paper is organized as follows: first, the proposed model; next, its mathematical representation; third, the measurement units, followed by the results and discussion; and fifth, the conclusion and future scope.
3 Proposed Model
Starting with basic beamforming: analog beamformers produce a single beam per antenna array, which makes generating multiple beams somewhat complex. Fully digital beamformers require a digital transceiver behind the analog baseband chain of every antenna at each station, which raises costs, power utilization, and system complexity. To overcome these problems, hybrid beamforming is the best choice [28]: smartly combining beamforming in the RF and baseband domains forms the patterns transmitted from a large antenna array. In a hybrid beamforming system, transmission proceeds as in other beamforming schemes. If more than one data stream must be sent in a particular sequence over a propagation channel, we express this with precoding weights at the transmitter and combining weights at the receiver, applied over the channel impulse matrix; finally, every single data stream of each user can be recovered independently at the receiver. For signal propagation, ray tracing is applied to the model using a shooting-and-bouncing-rays (SBR) method to estimate the trajectories of the rays assumed to be launched and traced. The diagram in Fig. 1 below shows the transmission and reception of the signal in the MU-MIMO OFDM system.
In the transmitter section, the data of one or more users is sent over the transmitter antenna array after channel encoding with convolutional codes. The channel-encoded bits are then mapped into quadrature amplitude modulation (QAM) complex symbols of different orders (2, 16, 64, 256), generating mapped symbols for every single user [29]. The QAM data of the users is then distributed into multiple data streams for transmission. Once this is complete, the next phase begins: the output is passed to digital baseband precoding, which assigns precoding weights to the data streams. In our proposed model, these weights are computed using hybrid beamforming with the Orthogonal Matching Pursuit (OMP) algorithm for single users and the Joint Spatial Division Multiplexing (JSDM) algorithm for multi-users. JSDM is used because its performance is better for the maximum array response vector, and it also allows many base stations to transmit.
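A simplified numpy sketch of OMP-based hybrid precoding in the spirit of the algorithm named above (the paper's own experiments use MATLAB). The dictionary of candidate array-response vectors and all dimensions below are illustrative assumptions:

```python
import numpy as np

def omp_hybrid_precoder(F_opt, A_t, n_rf):
    """Approximate F_opt (Nt x Ns) as F_rf @ F_bb, choosing the columns of
    the analog precoder F_rf greedily from the candidate array-response
    dictionary A_t (Nt x L)."""
    F_res = F_opt
    chosen = []
    for _ in range(n_rf):
        # Pick the dictionary column most correlated with the residual.
        corr = A_t.conj().T @ F_res
        k = int(np.argmax(np.sum(np.abs(corr) ** 2, axis=1)))
        chosen.append(k)
        F_rf = A_t[:, chosen]
        # Least-squares baseband precoder for the columns chosen so far.
        F_bb, *_ = np.linalg.lstsq(F_rf, F_opt, rcond=None)
        F_res = F_opt - F_rf @ F_bb
        nrm = np.linalg.norm(F_res)
        if nrm > 0:
            F_res = F_res / nrm
    return F_rf, F_bb

rng = np.random.default_rng(0)
Nt, Ns, L, n_rf = 16, 2, 32, 4
# Toy dictionary of unit-norm candidate steering vectors (illustrative).
A_t = np.exp(1j * rng.uniform(0, 2 * np.pi, (Nt, L))) / np.sqrt(Nt)
F_opt = rng.standard_normal((Nt, Ns)) + 1j * rng.standard_normal((Nt, Ns))
F_rf, F_bb = omp_hybrid_precoder(F_opt, A_t, n_rf)
rel_err = np.linalg.norm(F_opt - F_rf @ F_bb) / np.linalg.norm(F_opt)
```

The greedy selection restricts the analog stage to constant-modulus steering vectors while the baseband stage absorbs the remaining amplitude and phase shaping, which is exactly the cost/complexity trade-off hybrid beamforming targets.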
Channel sounding and estimation are performed at both the transmitter and receiver sections to reduce the number of radio-frequency propagation chains [30]. The base station sounds the channel by transmitting a reference signal that the mobile station receiver can easily detect to estimate the channel. The mobile stations then transmit the same information back to the base station so that it can calculate the precoding required for the upcoming data transmission [31]. After the precoding weights are assigned, the MU-MIMO system combines these weights at the receiver, resulting in complex weights. The received digital signal is modulated using orthogonal frequency-division multiplexing with pilot-contaminated mapping, followed by radio-frequency analog beamforming for every transmitter antenna. This modulated signal is fed into a scattering MU-MIMO channel, and demodulation is then performed to decode the original signal once it reaches its destination [32]. Table 1 below lists the parameters of our model, which are generally assumed for the experiments, considering different numbers of users, the data streams allotted to those users, and the OFDM system.
The channel matrix of the MIMO system is shown below, with H as the channel impulse response and h_{nm} the gain from transmit antenna n to receive antenna m:

H = \begin{bmatrix}
h_{11} & h_{21} & \cdots & h_{N1} \\
h_{12} & h_{22} & \cdots & h_{N2} \\
\vdots & \vdots & \ddots & \vdots \\
h_{1M} & h_{2M} & \cdots & h_{NM}
\end{bmatrix}
We assume downlink transmission from the first base station, acting as the transmitter, to the mobile user. In each transmitter section, the baseband digital precoder F_BB processes the N_S data streams; its outputs are then converted into RF chains through an analog precoder F_RF driving the N_BS antenna elements for propagation over the channel. At the receiver, analog combiners W_RF are combined with the RF chains from the users' antennas to create the output.
Mathematically, the transmitted signal can be written as

x = F_{RF} F_{BB} s

where F_RF = analog precoder, F_BB = digital precoder, N_S = number of signal streams, and N_T = number of transmitter antennas.
With the combining weight matrices applied, the signal received by user k is

y_k = H_k W_k s_k + H_k \sum_{a \neq k} x_a + n_k, \quad k = 1, \ldots, K \qquad (4)

where K is the number of users, x_k = W_k s_k is the signal allotted to user k, H_k is the channel from the transceiver point to user k, and n_k is the noise. Per stream, this is

y_k = H_k w_k s_k + H_k \sum_{a \neq k} x_a + n_k \qquad (6)
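The per-user received-signal model of Eq. (4) can be checked numerically with a small Python/numpy sketch (the paper's own experiments use MATLAB). The dimensions and the single-stream-per-user setup are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
K, Nr, Nt = 3, 2, 8  # users, receive antennas per user, transmit antennas

# Random channel H_k, precoding weights W_k, and stream symbols s_k per user.
H = rng.standard_normal((K, Nr, Nt)) + 1j * rng.standard_normal((K, Nr, Nt))
W = rng.standard_normal((K, Nt, 1)) + 1j * rng.standard_normal((K, Nt, 1))
s = rng.standard_normal((K, 1, 1)) + 1j * rng.standard_normal((K, 1, 1))
x = W * s            # x_k = W_k s_k, the signal allotted to user k
noise = 0.01 * (rng.standard_normal((K, Nr, 1))
                + 1j * rng.standard_normal((K, Nr, 1)))

def received(k):
    """y_k = H_k W_k s_k + H_k * sum over a != k of x_a + n_k (Eq. 4)."""
    desired = H[k] @ W[k] @ s[k]
    interference = sum(H[k] @ x[a] for a in range(K) if a != k)
    return desired + interference + noise[k]

y = [received(k) for k in range(K)]
```

Because the desired term is H_k x_k, each y_k equals H_k applied to the sum of all users' precoded signals plus noise, which is what makes inter-user interference suppression the job of the precoding and combining weights.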
4 Measurement Units
Our channel measurements are obtained for an indoor scenario in MATLAB using the Communications Toolbox and the Phased Array System Toolbox. The Communications Toolbox plays a major role in designing the model and provides algorithms that make it easy to analyze the system and obtain outputs. The Phased Array System Toolbox handles the correct positioning of the transmitter and receiver antennas when they are deployed in large numbers. The performance of the software tool is verified once so that it gives accurate results when applied to characterize the indoor radio channels. The MIMO hybrid beamforming is designed in such a way that accurate results are highly achievable. The implementation part of the work is to plan a suitable environment for conducting measurements using the channel sounder. The environment considered in our work is an indoor environment operating in the 6 GHz frequency range, serving broadband applications and the low band for enhanced network capacity. Equipment
used for the measurement of the channel includes a channel sounder, antennas such as an isotropic antenna (it radiates power equally in all directions), mobile users at 250 m, 500 m, and 1 km range from the base station, an uninterruptible power supply, and a laptop for measurement purposes.

Fig. 2 Channel sounder with transmitter unit and receiver unit [30]

The number of rays is set to 500. The
modulation scheme used is OFDM, with the number of data symbols set to 10 and 8. Four and eight users are assigned multiple data streams in the order 3, 2, 1, 2, 2, 2, 1, 3. The next step is to calibrate the channel sounder
equipment for the spatial multiplexing system, shown in Fig. 2, which consists of two main units: transmitter and receiver. The function of the channel sounder is to apply the maximum power to the signal in the desired direction [27]. A preamble signal is sent from the transmitter so that the channel can be processed at the Rx section. The preamble signal is generated for all sounded channels and is then sent through the selected MIMO system. The receiver section then performs pre-amplification and OFDM demodulation for all established links. The experiment is carried out by varying parameters such as the transmission distance, propagation channel model, data symbols, and noise figures. We have simulated our results in terms of error vector magnitude (EVM) values, beam patterns, and ray patterns. In the following, different cases are considered, varying the number of users, data symbols, range from the base station, noise figures, propagation channel model, and modulation schemes.
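Since the results are reported as RMS error vector magnitude, a small sketch of how RMS EVM is computed from reference and received constellation points may be helpful (Python/NumPy; the QPSK data and the noise level are made up for illustration, not the paper's measurements):

```python
import numpy as np

def rms_evm_percent(ref, meas):
    """RMS error vector magnitude as a percentage of the reference RMS level."""
    ref, meas = np.asarray(ref), np.asarray(meas)
    err_power = np.mean(np.abs(meas - ref) ** 2)   # mean squared error vector length
    ref_power = np.mean(np.abs(ref) ** 2)          # mean squared reference magnitude
    return 100 * np.sqrt(err_power / ref_power)

# Ideal QPSK reference points vs. a noisy received version of them
rng = np.random.default_rng(3)
ref = np.tile(np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]) / np.sqrt(2), 250)
meas = ref + 0.05 * (rng.standard_normal(1000) + 1j * rng.standard_normal(1000))
evm = rms_evm_percent(ref, meas)
print(round(evm, 2))
```

Lower RMS EVM means the received constellation points sit closer to their ideal positions, which is why EVM tracks the noise figure and modulation order in the cases below.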
The MU-mm MIMO communication link between the BS and UEs is validated using a scattering-based MIMO spatial channel model with a "single bounce ray tracing" approximation. The users are placed randomly at different locations. Different cases are studied, and experiments have been performed and analyzed on that basis.
Hybrid Beamforming for Massive MIMO Antennas Under 6 GHz Mid-Band 1015
Case 1 In the first case, we change the distance between the users and the base station and then increase the number of users to check the impact on the bits received by the users, the RMS EVM values, and the antenna patterns. Here, the data symbol count is set to 10 with a noise figure of 10 dB. The number of rays is fixed at 500. The propagation channel model selected is the MIMO model, as it is better and more efficient than the scattering channel model. Figure 3 clearly shows that increasing the distance has no effect on the EVM values or the received bits; they remain constant as the distance is increased from 250 to 1000 m and beyond. It is observed that for users with a high number of data streams the RMS EVM value remains low, while for users with a single data stream it is high. This value increases as the base station antennas are decreased for users with multiple data streams. In other words, no impact on the bits or the EVM values is seen when the distances are increased (Fig. 3).
Fig. 5 Comparison of EVM values using the 16-QAM, 64-QAM scheme by considering eight
users
In Case 2, we increased the number of users from four to eight, with all other parameters kept as in Case 1. We analyzed that there is very little effect on the error vector magnitude values, on which the performance relies. As Fig. 5 shows, the EVM values go slightly higher with more users, the bit error rate remains low, and the output bits received are the same for the first four users. This value increases as the base station antennas are decreased for users with multiple data streams. The number of bits increases slightly for multiple data streams, as shown in Fig. 5. The transfer bit rate remains the same for multiple data streams as for 64-QAM. The root mean square value is minimized when the base station antennas are increased; more data streams limit the root mean square value of every single data stream.
Case 3 In the third case, we changed the data symbols from 10 to 8 and compared the results with the outputs of Fig. 4. It shows that varying the data symbols has an effect on the output bits and the error vector magnitude values. The output bits of this case are compared with those of Case 2. The EVM values are low when the data symbol value is low and increase with the symbol rate. To increase the output bits, the data symbols are set to a high value, as shown in Fig. 6. Comparing 64-QAM and 256-QAM for eight users, we found that the RMS EVM values of the two modulations are roughly the same. The EVM value is very high for two data streams in 64-QAM.
Case 4 As seen in Figs. 7, 8, 9, and 10, the noise figure is varied from 4 to 6 and then from 6 to 8, and a prominent difference is seen in the RMS values, yet the bits remain constant. There is again a direct relation between the noise figure and the EVM values: they go on decreasing as the noise figure level decreases.
Case 5 Figure 11 shows that, on varying the propagation channel model from scattering to the MIMO channel model, the RMS EVM value is high using the scattering channel,
Fig. 6 Comparison of values of 64-QAM and 256-QAM schemes by changing its data symbols at
eight users
Fig. 7 Comparison of EVM values at different noise levels using the 2-QAM scheme for four users
whereas using the MIMO channel it can be lowered. There is no effect on the bit error rate; it remains the same using both propagation channels.
Figures 12 and 13 show the 3D antenna radiation pattern using the MIMO and scattering channel models at the 256-QAM modulation scheme. More lobes are formed with the MIMO model as it works with multiple antennas; compared with the scattering design, as seen in Fig. 10, the lobes are formed in fewer numbers. The lobe on the right side of the diagram signifies the data streams of the users. The pointer shows that hybrid beamforming is achieved and that the data streams for every user are separated. It is quite clear from the diagram that the signal radiation beam pattern grows sharper as the antennas at the base station are increased, which makes the throughput of the signal more efficient.
Fig. 8 Comparison of EVM values at different noise levels using the 16-QAM scheme for four
users
Fig. 9 Comparison of EVM values at different noise levels using the 64-QAM scheme for four
users
The constellation diagrams, shown in Figs. 14, 15, 16, and 17 for eight users, reveal the point-tracing blocks of every data stream for the higher-order modulation schemes in the working model. The ray-tracing blocks indicate that the retrieved streams are high for those users with fewer data streams. The position of the blocks shows that blocks whose points are adjusted closely together correspond to a high rate of retrieved streams for users with multiple data streams, while points positioned with more space between them indicate a lower rate of retrieved streams for users with single data streams. More recovered data streams for users with multiple data streams result in a lower SNR, and fewer retrieved data streams for single-stream users result in a higher SNR.
Fig. 10 Comparison of EVM values at different noise levels using the 256-QAM scheme for four
users
Fig. 11 Comparison using scattering and MIMO channel model of 256-QAM scheme
5 Conclusion

We have analyzed hybrid beamforming with the ray-tracing method, in which each user can use multiple data streams. Spectral efficiency improves significantly with a high number of data streams. It is observed that for users with a high number of data streams the RMS EVM values are low, while for users with a single data stream the RMS EVM value is comparatively high. The possibility of errors is reduced; we can also assess the bit errors by comparing the actual transmitted bits with the bits received at the decoder per user. The number of antennas required decreases only if the users transmit their information using multiple data streams. If the users transmit their data using a single data stream, the antenna requirement increases, which results in more system complexity. For higher throughput and a lower bit error rate, multiple data streams are more advantageous to the users for every higher-order modulation scheme, as seen from the diagrams. Studies with different parameters can be carried out in the future through this experiment. The environment can be changed from indoor to outdoor, and the analysis can then compare different environmental conditions. Additionally, we will compare the ray-tracing results with different channel models, and the analysis should be done precisely and accurately. Complexity is reduced by minimizing the number of RF chains in the uplink conversions. The MU-MIMO hybrid beamforming is to be designed with the aim of reducing the RMS EVM values of users with single data streams.
References
1. Larsson EG, Edfors O, Tufvesson F, Marzetta TL (2014) Massive MIMO for next generation wireless systems
2. Gupta A, Jha RK (2015) A survey of 5G network: architecture and emerging technologies
3. Ahmed I, Khammari H, Shahid A, Musa A, Kim KS, De Poorter E, Moerman I (2018) A survey
on hybrid beamforming techniques in 5G: architecture and system model perspectives
4. Lizarraga EM, Maggio GN, Dowhuszko AA (2019) Hybrid beamforming algorithm using
reinforcement learning for millimeter-wave wireless systems
5. Zou Y, Rave W, Fettweis G (2015) Analog beam steering for flexible hybrid beamforming design in mm-wave communications
6. Palacios J, Gonzalez-Prelcic N, Mosquera C, Shimizu T, Wang C-H (2021) Hybrid beam-
forming design for massive MIMO LEO satellite communication
7. Choi J, Lee G, Evans BL (2019) Two-Stage analog combining in hybrid beamforming systems
with low-resolution ADCs
8. Lee JH, Kim MJ, Ko YC (2017) Based hybrid beamforming design in MIMO interference
channel
9. Hefnawi M (2019) Hybrid beamforming for millimeter-wave heterogeneous networks
10. Ratnam VV, Molisch AF, Bursalioglu OY, Papadopoulos HC (2018) Hybrid beamforming with
selection for multi-user massive MIMO systems
11. Chiang H-L, Rave W, Kadur T, Fettweis G (2018) Hybrid beamforming based on implicit
channel state information for millimeter-wave links
12. Yoo J, Sung W, Kim I-K (2021) 2D-OPC Subarray Structure for Efficient Hybrid Beamforming
over Sparse mmWave Channels
13. Zhang D, Wang Y, Xiang W (2017) Leakage-based hybrid beamforming design for downlink
multiuser mmWave MIMO systems
14. Chahrour H, Rajan S, Dansereau R, Balaj B (2018) Hybrid beamforming for interference
mitigation in MIMO radar, IEEE
15. Aldubaikhy K, Wu W, Shen X (2018) HBF-PDVG: Hybrid Beamforming and User Selection
for UL MU-MIMO mmWave Systems
16. Satyanarayana K, Ivanescu T, El-Hajjar M, Kuo P-H, Mourad A, Hanzo L (2018) Hybrid
beamforming design for dual-polarised millimeter wave MIMO systems
17. Mishra D, Johansson H (2020) Optimal channel estimation for hybrid energy beamforming
under phase shifter impairments
18. Vlachos E, Thompson J, Kaushik A, Masouros C (2020) Radio-frequency chain selection
for energy and spectral efficiency maximization in hybrid beamforming under hardware
imperfections
19. Sohrabi F, Yu W (2017) Hybrid analog and digital beamforming for mmWave OFDM large-scale antenna arrays
20. Hybrid-beamforming design for 5G wireless communications by ELE Times Bureau published
on December 12, 2016
21. Dama YAS, Abd-Alhameed RA, Salazar-Quiñonez F, Zhou D, Jones SMR, Gao S (2011)
MIMO indoor propagation prediction using 3D shoot-and-bounce ray (SBR) tracing technique
for 2.4 GHz and 5 GHz
22. Alkhateeb A (2019) DeepMIMO: a generic deep learning dataset for millimeter-wave and
massive MIMO applications
23. Dilli R (2021) Performance analysis of multi-user massive MIMO hybrid beamforming systems
at millimeter-wave frequency bands
24. Jiang X, Kaltenberger F (2017) Channel reciprocity calibration in TDD hybrid beamforming
massive MIMO systems
25. Alkhateeb A, Leus G, Heath R (2015) Limited feedback hybrid precoding for multi-user millimeter-wave systems. IEEE Trans Wireless Commun 14(11):6481–6494
26. Liu A, Lau V (2014) Phase only RF precoding for massive MIMO systems with limited RF chains. IEEE Trans Signal Process 62(17):4505–4515
27. Bjornson E, Hoydis J, Kountouris M, Debbah M (2014) Massive MIMO systems with non-ideal hardware: energy efficiency, estimation, and capacity limits. IEEE Trans Inf Theory 60(11):7112–7139
28. Eisenbeis J, Pfaff J, Karg C, Kowalewski J, Li Y, Pauli M, Zwick T (2020) Beam pattern
optimization method for subarray-based hybrid beamforming systems
29. Alkhateeb A, El Ayach O, Leus G, Heath RW (2014) Channel estimation and hybrid precoding
for millimeter wave cellular systems. IEEE J Selected Topics Signal Process 8(5):831–846
30. open example (‘phasedcomm./MassiveMIMOHybridBeamformingExample’)
31. Zhu Y, Zhang Q, Yang T (2018) Low-complexity hybrid precoding with dynamic beam assignment in mmWave OFDM systems. IEEE Trans Veh Technol 67(4):3685–3689
32. Foged LJ, Scialacqua L, Saccardi F, Gross N, Scannavini A (2017) Over the air calibration of
massive MIMO TDD arrays for 5G applications. In: 2017 IEEE international symposium on
antennas and propagation & USNC/URSI national radio science meeting, pp. 1423–1424, San
Diego, CA, USA, 2017
33. Gonzalez J (2021) Hybrid beamforming strategies for secure multicell multiuser mmWave
MIMO communication
34. Eisenbeis J, Tingulstad M, Kern N et al (2020) MIMO communication measurements in small cell scenarios at 28 GHz. IEEE Trans Antennas Propag
Multi-Class Detection of Skin Disease:
Detection Using HOG and CNN Hybrid
Feature Extraction
K. Babna (B)
Electronics and Communication Engineering, KMCT College of Engineering, Kozhikode, Kerala,
India
A. T. Nair
KMCT College of Engineering, Kozhikode, Kerala, India
K. S. Haritha
College of Engineering, Kannur, Kerala, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 1025
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_74
1026 K. Babna et al.
1 Introduction
The skin, which acts as the body’s outer layer, is the biggest organ in the human
body. The skin is made up of up to seven layers of ectodermal tissues that serve
as a protective covering over the underlying muscles, bones, ligaments and internal
organs. The skin protects the body from harmful substances and viruses, aids in
temperature control and gives feelings of cold, heat and touch. A skin lesion is defined
as a patch of skin that is abnormal in contrast to the surrounding skin. Infections
inside or on the skin are the basic and main cause of skin lesions. Skin lesions
may be categorised as primary (present at birth or developed over time) or secondary
(resulting from poor treatment of the original skin lesion), both of which can progress
to skin cancer. As a consequence, manual skin cancer diagnosis is not optimal: the skin lesion is assessed with the naked eye, which can result in mistreatment and ultimately death. Accurate detection of skin cancer at an early stage may significantly increase survival chances. Automated detection is therefore more reliable, increasing accuracy and efficiency.
In the proposed method, three types of skin lesions are included in the dataset. The training sets pass through four major steps: pre-processing, segmentation, feature extraction and classification. Here, we propose a feature extraction stage that combines HOG and GLCM features with ResNet-18 transfer learning for a better output in the classification process.
2 Literature Review
Dermoscopy methods are being developed in order to produce a clear skin lesion
site, which improves the visual impact by removing reflections. Automatic skin lesion
identification, on the other hand, is difficult owing to artefacts, poor contrast, skin
colour, hairs [1] and the visual similarities between melanoma and non-melanoma
[2]. All of this may be reduced to a minimum by using pre-processing processes.
The exact location of the skin lesion is determined by segmenting the pre-processed
skin lesion picture. The wavelet algorithm, basic global thresholding, region-based
segmentation, the watershed algorithm, the snakes approach, the Otsu method, active
contours and geodesic active contours are some of the segmentation methods avail-
able. Geodesic active contours [3] are used to segment the data. There are a variety
of methods for extracting characteristics from a segmented skin lesion image [4, 5],
including the cash rule, ABCD rule and ABCDE rule, as well as the GLCM rule, the
HOG rule, the LBP rule and the HLIFS rule. The ABCD rule is a scoring method
that collects asymmetry, colour, border and diameter information [6]; the authors
describe how to take the total dermoscopic score and identify melanoma and non-melanoma using wavelet analysis. Using HOG, it is possible to extract the form and
edge of a skin lesion [7]. In this study, the recovered feature is passed straight to an
SVM classifier, which yields an accuracy of 97.32%. The classifier is the last step in
the process of identifying skin lesions and is responsible for categorising them. This
method consists of two parts: teaching and testing. Unknown patterns are fed into
the system, and the information acquired during the training process is utilised to
categorise the unknown patterns. There are many different kinds of classifiers, such
as SVM, KNN, Naive Bayes and neural networks, amongst others. Khan et al. [8] applied features to the SVM, KNN, and Naive Bayes classifiers and achieved accuracy rates of 96%, 84%, and 76%, respectively. In their article [9], Victor and Ghalib describe pre-processing as the first and most significant step of image processing, which helps in the elimination of noise.
The output of the median filter is supplied as input to the histogram equalisation phase of pre-processing, and the histogram-equalised picture is then provided as input to the segmentation stage. Segmentation aids in the identification of the desired area. Area, mean, variance and standard deviation calculations for feature extraction are then carried out on the extracted output of the segmentation phase, and the output is fed into classifiers such as support vector machine (SVM), k-nearest neighbour (KNN), decision tree (DT) and boosted tree (BT).
The categorisations are compared with one another. Goel and Singh showed that GLCM extracts textural characteristics [10], and the extracted feature may then be passed straight to a neural network, resulting in a success rate of 95.83%. Skin
lesion segmentation is the essential step for most classification approaches. Codella
et al. proposed a hybrid approach, integrating convolutional neural network (CNN),
sparse coding and support vector machines (SVMs) to detect melanoma [11]. Yu et al.
applied a very deep residual network to distinguish melanoma from non-melanoma
lesions [12]. Schaefer used an automatic border detection approach [13] to segment
the lesion area and then assembled the extracted features, i.e. shape, texture and
colour, for melanoma recognition. Moataz et al. applied a genetic algorithm with an artificial neural network technique for early detection of skin cancers and obtained a sensitivity of 91.67% and a specificity of 91.43% [14]. Kamasak et al. classified dermoscopic images by extracting the Fourier identifiers of the lesion edges after segmenting the images, and obtained an accuracy of 83.33% in diagnosing melanoma [15] (Table 1).
3 Proposed Methodology
Table 1 (continued)

Sl No | Author (citation) | Methodology | Features | Challenges
12 | Yu et al. | Very deep residual networks | Automated melanoma recognition in dermoscopy images | Accuracy low and only two classifications
13 | Schaefer et al. | An ensemble classification approach | Ensemble classification | Accuracy is 93.83%
14 | Moataz et al. | Artificial intelligence techniques | Image classification using ANN and AI | Sensitivity 91.67% and specificity 91.43%
15 | Kamasak et al. | ANN, SVM, KNN and decision tree | Classification with different machine learning methods | Comparison of different classifiers
3.1.1 Dataset
The initial phase of this project involves gathering data from the International Skin Imaging Collaboration (ISIC) databases of skin lesion images. Three types of cancer are represented in this experiment: actinic keratosis, basal cell carcinoma and melanoma. The skin lesion photographs were taken from the ISIC 2017 dataset and are used in JPEG format. The skin lesion pictures were divided into three groups: 69 actinic keratosis, 80 basal cell carcinoma and 60 melanoma images for training and testing. Figure 2a, b, c shows actinic keratosis, basal cell carcinoma and melanoma, respectively.
3.2 Pre-Processing
3.3 Segmentation
The third step is the segmentation of the pre-processed images. Segmentation is used to pinpoint the exact site of a skin lesion. Geodesic active contours (GAC) were used in this study to segment the dataset. In general, GAC identifies the most significant changes in the overall skin lesion, which are usually seen near the lesion's borders. The Otsu thresholding technique is used to binarize the pre-processed skin images, and the GAC technique is then applied to the binarized image.
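Otsu's threshold maximizes the between-class variance of the gray-level histogram. A self-contained NumPy sketch follows; the bimodal toy image is an assumption standing in for a dermoscopic image, not data from the paper:

```python
import numpy as np

def otsu_threshold(img):
    """Return the gray level that maximizes between-class variance (Otsu's method)."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    omega = np.cumsum(p)                       # probability of class 0 up to each level
    mu = np.cumsum(p * np.arange(256))         # cumulative mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b2 = (mu[-1] * omega - mu) ** 2 / (omega * (1 - omega))
    return int(np.argmax(np.nan_to_num(sigma_b2)))

# Bimodal toy image: dark background around level 50, bright lesion around level 200
rng = np.random.default_rng(4)
pixels = np.concatenate([rng.normal(50, 10, 5000), rng.normal(200, 10, 5000)])
img = np.clip(pixels, 0, 255).astype(np.uint8).reshape(100, 100)
t = otsu_threshold(img)
mask = img > t        # binarized lesion mask, ready for contour refinement
print(t)
```

The resulting binary mask is what a contour method such as GAC would then refine toward the true lesion boundary.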
3.4 Feature Extraction

The fourth step is the extraction of characteristics from the segmented skin lesion. To acquire accurate information about the lesion, the feature extraction stage gathers information on its border [16], colour, diameter, symmetry and textural nature, which makes the identification of skin cancer straightforward. Three distinct feature extraction methods were employed: GLCM, HOG and CNN, with GLCM the most often used.
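As a concrete illustration of the GLCM texture features (a NumPy sketch, not the paper's implementation), the following builds a co-occurrence matrix for one pixel offset and derives the contrast and energy features from it:

```python
import numpy as np

def glcm_features(img, dx=1, dy=0, levels=8):
    """Co-occurrence matrix for offset (dx, dy) with dx, dy >= 0, plus contrast and energy."""
    q = img.astype(int) * levels // 256            # quantize to `levels` gray levels
    h, w = q.shape
    a, b = q[:h - dy, :w - dx], q[dy:, dx:]        # co-occurring pixel pairs
    P = np.zeros((levels, levels))
    np.add.at(P, (a.ravel(), b.ravel()), 1)
    P /= P.sum()                                   # normalize to joint probabilities
    i, j = np.indices(P.shape)
    contrast = ((i - j) ** 2 * P).sum()            # local gray-level variation
    energy = (P ** 2).sum()                        # texture uniformity
    return P, contrast, energy

# Uniform patch -> zero contrast, maximal energy; checkerboard -> high contrast
flat = np.full((16, 16), 128, dtype=np.uint8)
checker = (np.indices((16, 16)).sum(axis=0) % 2 * 255).astype(np.uint8)
_, c_flat, e_flat = glcm_features(flat)
_, c_chk, _ = glcm_features(checker)
print(c_flat, e_flat, c_chk)  # 0.0 1.0 49.0
```

In practice several offsets and angles are computed and their statistics concatenated into the texture feature vector.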
HOG is used to extract information about the shape and edges of objects. The orientation histogram is used to assess the intensity of a lesion's edges. For this purpose, there are two basic components to consider: the cell and the block.
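A minimal HOG-style computation — per-cell histograms of gradient orientation weighted by gradient magnitude, without the usual block normalization — might look like this in NumPy (the cell size and bin count follow common defaults, not values from the paper):

```python
import numpy as np

def cell_orientation_histograms(img, cell=8, bins=9):
    """HOG-style features: per-cell histograms of gradient orientation, weighted by magnitude."""
    gy, gx = np.gradient(img.astype(float))    # image gradients along rows and columns
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180 # unsigned orientation in [0, 180)
    h, w = img.shape
    feats = []
    for i in range(0, h - cell + 1, cell):
        for j in range(0, w - cell + 1, cell):
            m = mag[i:i + cell, j:j + cell].ravel()
            a = ang[i:i + cell, j:j + cell].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0, 180), weights=m)
            feats.append(hist)
    return np.array(feats)

# Toy 32x32 image with a vertical edge: all gradient energy falls in one orientation bin
img = np.zeros((32, 32))
img[:, 16:] = 255
feats = cell_orientation_histograms(img)
print(feats.shape)  # (16, 9): a 4x4 grid of 8x8 cells, 9 orientation bins per cell
```

Full HOG additionally groups cells into overlapping blocks and L2-normalizes each block, which is what gives the descriptor its illumination invariance.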
3.5 Classification
There is a plethora of models available for distinguishing between cancerous and non-cancerous skin lesions. The SVM, KNN, Naive Bayes and neural network algorithms are the most frequently used machine learning methods for lesion classification. In this study, a multi-SVM classifier is used, with the obtained features sent directly to the classifier.
The training and testing framework is based on SVMs. The support vector machine method builds and trains our proposed structural model using these feature vectors (colour and texture). The colour and texture attributes of each cancer image are recorded in the database, and these attributes are used in the subsequent categorisation phase.

The proposed SVM-based structure categorises cancer pictures on the basis of the colour and texture component vectors. Multiple distance metrics measure the feature similarity between one picture and others in order to categorise it successfully. This is done by comparing the characteristics of the query image with the features of the database images using SVM classifiers. The SVM classifier computes the feature values of the input image and the database images and, based on these values, determines which class the input image belongs to.
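One common way to realize a multi-class ("multi-SVM") classifier is one-vs-rest: train one binary SVM per class and pick the class with the highest decision score. The sketch below uses NumPy only, with a toy linear SVM trained by sub-gradient descent on synthetic clusters — it illustrates the scheme, not the paper's trained model or features:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Hinge-loss linear SVM trained by sub-gradient descent; labels y must be -1/+1."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        viol = y * (X @ w + b) < 1          # samples violating the margin
        grad_w = lam * w                    # gradient of the L2 regularizer
        if viol.any():
            grad_w -= (y[viol][:, None] * X[viol]).mean(axis=0)
            b += lr * y[viol].mean()
        w -= lr * grad_w
    return w, b

def ovr_fit_predict(X, y, X_test, n_classes):
    """One-vs-rest multi-class SVM: one binary SVM per class, pick the highest score."""
    models = [train_linear_svm(X, np.where(y == c, 1.0, -1.0)) for c in range(n_classes)]
    scores = np.stack([X_test @ w + b for w, b in models], axis=1)
    return scores.argmax(axis=1)

# Three well-separated 2-D clusters standing in for the three lesion classes
rng = np.random.default_rng(5)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
X = np.vstack([c + rng.standard_normal((40, 2)) for c in centers])
y = np.repeat(np.arange(3), 40)
pred = ovr_fit_predict(X, y, X, 3)
print((pred == y).mean())
```

Because one binary model is added per class, this design extends naturally when further skin-disease classes are introduced later.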
4 Experimental Results
The proposed method is applied to skin lesion images collected from the ISIC database, where it yields excellent results. The datasets contain 69 pictures of actinic keratosis, 80 images of basal cell carcinoma and 60 images of melanoma. Classes are taught to the classifiers using a number of different training and testing sets. As specified in the method, three different feature extractors are used in the analysis: in addition to GLCM [19] and HOG, CNN is used for feature extraction. Many stages of the process, including training, pre-processing, segmentation, feature extraction and classification, may be automated using the proposed algorithms. A multi-SVM classifier is used for the classification.
We created five push buttons for easy navigation between the different stages of the process, and the relevant findings are shown for each step. Figure 5 shows the selected image and its processed stage. The processed images then enter the noise removal stage; the hair-removed version of the selected image is shown in Fig. 6. The image then undergoes segmentation, and the segmented images are shown in Fig. 7. A confusion matrix is needed to get a thorough grasp of our proposed models, which is necessary due to the problem of class imbalance. It allows us to identify areas in which our models may be inaccurate and is used to assess the performance of the architecture. A comparison of the accuracy and precision of feature extraction using the proposed approach is shown in Table 2 [20, 21]. The accuracy may be increased to 95.2% by combining CNN, HOG and GLCM. The statistical results in Table 3 also show the comparison and the better sensitivity and specificity of the classifier.
Specificity (SP) and sensitivity (SE) of the classifier models are used to evaluate their performance. They are defined as follows:

$$\text{Specificity} = \frac{TN}{TN + FP}$$

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$
where
TP correctly classified positive class (True positive).
TN correctly classified negative class (True negative).
FP incorrectly classified positive class (False positive).
FN incorrectly classified negative class (False negative).
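These metrics are straightforward to compute from confusion-matrix counts (Python; the counts below are hypothetical, for illustration only, not results from the paper):

```python
def specificity_sensitivity(tp, tn, fp, fn):
    """Per-class specificity (true-negative rate) and sensitivity (true-positive rate)."""
    return tn / (tn + fp), tp / (tp + fn)

# Hypothetical confusion-matrix counts for one lesion class (illustrative only)
sp, se = specificity_sensitivity(tp=57, tn=130, fp=6, fn=3)
print(round(sp, 3), round(se, 3))  # 0.956 0.95
```

For a multi-class problem, the counts are taken one-vs-rest per class and the per-class metrics can then be averaged.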
By using hybrid feature extraction of HOG and GLCM along with convolutional neural network features, the proposed method became more accurate. The classifier achieved high sensitivity and specificity compared with other methods. Because a multi-SVM classifier is used, more skin disease classes can be added, so the system can work like a skin specialist able to identify any skin disease in the future. Further investigation of deeper convolutional networks for classification may increase the accuracy.
6 Conclusion
Skin lesions were classified using hybrid feature extraction in this proposed study. The suggested technique is applied to Kaggle images of skin lesions taken with a digital camera. The files include images of three distinct kinds of skin disease, including melanoma. In addition to GLCM and HOG, CNN is used for feature extraction. The GAC method was used to segment the skin lesion; segmentation with a JA of 0.9 and a DI of 0.82 has been achieved in this study. The CNN features are extracted using the ResNet-18 transfer learning technique, whilst texture features are retrieved using the GLCM and HOG methods. A multi-SVM classifier is used so that additional skin disease classes can be included in the future, allowing the system to serve as a skin expert capable of detecting any skin condition. The suggested technique was tested on a variety of datasets of skin lesion pictures. The multi-SVM classifier categorises the pictures into three different categories of skin diseases with 95.2% accuracy and 94.8% precision. As a result, we may be able to add more skin ailment classifications in the future. In the light of the information gathered, we infer that accuracy is enhanced after augmentation is applied. The technique could also be used on a neural network platform to further enhance accuracy.
References
1. Jaisakthi SM, Mirunalini P, Aravindan C (2018) Automated skin lesion segmentation of dermo-
scopic images using GrabCut and k-means algorithms. IET Comput Vis 12(8):1088–1095
2. Chung DH, Sapiro G (2000) Segmenting skin lesions with partial-differential- equations-based
image processing algorithms. IEEE Trans Med Imaging 19(7):763–767
3. Hemalatha RJ, Thamizhvani TR, Dhivya AJ, Joseph JE, Babu B, Chandrasekaran R (2018)
Active contour based segmentation techniques for medical image analysis. Med Biolog Image
Anal 4:17
4. Salih SH, Al-Raheym S (2018) Comparison of skin lesion image between segmentation
algorithms. J Theor Appl Inf Technol 96(18)
5. Li Y, Shen L (2018) Skin lesion analysis towards melanoma detection using deep learning
network. Sensors 18(2):556
6. Kasmi R, Mokrani K (2016) Classification of malignant melanoma and benign skin lesions:
implementation of automatic ABCD rule. IET Image Proc 10(6):448–455
7. Bakheet S (2017) An SVM framework for malignant melanoma detection based on optimized
hog features. Computation 5(1):4
8. Khan MQ, Hussain A, Rehman SU, Khan U, Maqsood M, Mehmood K, Khan MA (2019)
Classification of melanoma and nevus in digital images for diagnosis of skin cancer. IEEE
Access 7:90132–90144
9. Victor A, Ghalib M (2017) Automatic detection and classification of skin cancer. Int J Intell
Eng Syst 10(3):444–451
10. Goel R, Singh S (2015) Skin cancer detection using glcm matrix analysis and back propagation
neural network classifier. Int J Comput Appl 112(9)
11. Kawahara J, Hamarneh G (2017) Fully convolutional networks to detect clinical dermoscopic features. arXiv:1703.04559
12. Jerant AF, Johnson JT, Sheridan CD, Caffrey TJ (2000) Early detection and treatment of skin
cancer. Am Fam Phys 62:381–382
13. Binder M, Schwarz M, Winkler, A, Steiner A, Kaider A, Wolff K, Pehamberger H (1995)
Epiluminescence microscopy. A useful tool for the diagnosis of pigmented skin lesions for
formally trained dermatologists. Arch Dermatol 131:286–291
14. Celebi ME, Wen Q, Iyatomi H, Shimizu K, Zhou H, Schaefer G (2015) A state-of-the-art survey
on lesion border detection in dermoscopy images. In: Dermoscopy image analysis. CRC Press,
Boca Raton, FL, USA
Multi-Class Detection of Skin Disease … 1037
DeepFake Creation and Detection Using
LSTM, ResNext
Dhruti Patel, Juhie Motiani, Anjali Patel, and Mohammed Husain Bohara
Abstract Technology was created as a means to make our lives easier. There is
nothing more fast-paced than the advancements in the field of technology. Decades
ago, virtual assistants were only a far-fetched imagination; now, these fantasies have
become a reality. Machines have started to recognize speech and predict stock prices.
Witnessing self-driving cars in the near future will be an anticipated wonderment.
The underlying technology behind all these products is machine learning. Machine
learning is ingrained in our lives in ways we cannot fathom. It may have many good
sides, but it is also misused for personal and base motives. For example, various forged
videos, images, and other content, termed DeepFakes, go viral in a matter of
seconds. Such videos and images can now be created with the use of deep learning
technology, which is a subset of machine learning. This article discusses the mecha-
nism behind the creation and detection of DeepFakes. DeepFakes is a term generated
from deep learning and fake. As the name suggests, it is the creation of fabricated
and fake content, distributed in the form of videos and images. Deep learning is one
of the burgeoning fields which has helped us to solve many intricate problems. It
has been applied to fields like computer vision, natural language processing, and
human-level control. However, in recent years, deep learning-based software has
accelerated the creation of DeepFake videos and images without leaving any traces
of falsification which can engender threats to privacy, democracy, and national secu-
rity. The motivation behind this research article was to spread awareness among the
digitally influenced youth of the twenty-first century about the amount of fabricated
content that is circulated on the internet. This research article presents one algorithm
used to create DeepFake videos and, more significantly, the detection of DeepFake
videos by recapitulating the results of proposed methods. In addition, we also have
discussed the positive aspects of DeepFake creation and detection, where they can
be used and prove to be beneficial without causing any harm.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 1039
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_75
1 Introduction
Fake images and fake videos formed by DeepFake methods have become a great
public concern. The term “DeepFake” means to swap the face of one person with
the face of another. The first DeepFake video was created by a Reddit user in
2017, who used machine learning algorithms to morph celebrity faces into
pornographic content. Other harmful uses of DeepFakes include fake news and
financial fraud. Due to these factors, research traditionally devoted to general media
forensics is being revitalized and is now dedicating growing efforts to detecting facial
manipulation in images and videos [1].
The increasing sophistication of cell phones and the growth of social
networks have resulted in an enormous increase in new digital content
in recent times. This widespread use of digital images has led to a rise in
techniques for altering image content [2]. Until recently, such techniques
remained out of reach for most users because they were time-consuming
and tedious, and required a high level of domain expertise in computer vision.
Those constraints have steadily faded away, thanks to recent
advances in machine learning and access to vast quantities of training data. Conse-
quently, the time required to produce and manipulate digital content has dropped
significantly, allowing unskilled individuals to modify content at their leisure.
Deep generative models, in particular, have recently been widely used to create
fabricated photos that appear natural. These models are based on deep neural
networks, which can approximate the real-data distribution of a given training dataset.
Consequently, variants can be created by sampling from the learned distribution.
Two of the most frequently used and effective techniques are Variational
Autoencoders (VAE) and Generative Adversarial Networks (GAN). In particu-
lar, GAN techniques have lately been pushing the limits of state-of-the-art results,
boosting the resolution and quality of the images generated. Thus, deep gener-
ative models are ushering in a new era of AI-based fake image generation, paving
the way for the rapid dissemination of high-quality tampered image content [2].
Face manipulation is broken down into four categories:
(i) Entire face synthesis
(ii) Identity swap (Deep Fakes)
(iii) Attribute manipulation, and
(iv) Expression swap [1].
Illustrations of these face manipulation categories are given below in Fig. 1.
One of the mechanisms which can manipulate or change digital content is “Deep-
Fake.” DeepFake is a word that is derived from “Deep Learning” and “Fake.” It
is a mechanism through which one can morph or change an image over a video
Fig. 1 Examples of facial manipulation groups of real and fake images [1, 3, 4].
thereby creating a new fabricated video that may appear to be real. The underlying
mechanisms behind DeepFake creation are autoencoders and
Generative Adversarial Networks (GAN), which are deep learning models. Their
usage is concentrated in the computer vision field. These models are used to analyze a
person’s facial expressions and movements and synthesize facial images of someone
with similar expressions and movements. So, through the DeepFake mechanism, we
can create a video of a person saying or doing things that the other person is doing
just by using an image of the target person and a video of the source person.
2 Methods
In the following section, we describe our approach toward DeepFake creation and
DeepFake detection algorithms.
The popularity of DeepFakes can be attributed to the creative users who target
celebrities and politicians to generate fake and humorous content. DeepFakes have
burgeoned over the past 3–4 years due to the quality of tampered videos and the
accessibility of easy-to-use applications to a broad range of users, from professional
to amateur. These applications are built on deep learning techniques. One
such application is called Faceswap, captioned as “the leading free and open-source
multi-platform Deepfakes software.” Deep autoencoders constitute the blueprint of
this application. The idea behind using autoencoders is dimensionality reduction
and image compression because deep learning is well known for extracting the
higher-level features from the raw input.
A brief introduction to the techniques used is given below:
1. CNN: Convolutional Neural Network (CNN or ConvNet) is a category of deep
neural networks which are primarily used to do image recognition, image
classification, object detection, etc.
Image classification is the task of taking an input image and outputting a
class, or a probability over classes, that best describes the image. In CNN, we
take an image as input, assign importance to its numerous aspects/features
in the image, and thereby distinguish one image from another. The prepro-
cessing required in CNN is much lower compared with other classification
algorithms [5, 6].
2. RNN: RNN is short for Recurrent Neural Network. An RNN is used to remember
the past, and the decisions it makes are influenced by that past. An input
vector is provided to the RNN to produce single or multiple
output vectors. These outputs are governed not only by the weights
applied to the input but also by a “hidden” state vector. This hidden state vector
represents the context learned from previous input(s)/output(s) [7].
3. GAN: GANs stand for Generative Adversarial Networks. As the name implies,
GANs are largely used for generative purposes. They generate new and fake
outputs based on a particular input. GANs comprise two sub-models, which
are the generator model and the discriminator model. The difference between
the two is that the generator model, as the name suggests, is trained to generate
or create new examples, whereas the discriminator model is more of a binary
classification model that tries to identify the generated output as real or fake.
Interestingly, the discriminator model is trained until it believes, about half of
the time, that the generator model has produced a plausible output.
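As a schematic of this generator–discriminator interplay, the following numpy sketch (an illustration only, not any production implementation; the single-layer models and all shapes are assumptions) computes one round of adversarial losses:

```python
import numpy as np

rng = np.random.default_rng(0)

def bce(pred, label):
    # Binary cross-entropy: the training signal for both sub-models.
    eps = 1e-7
    pred = np.clip(pred, eps, 1 - eps)
    return -np.mean(label * np.log(pred) + (1 - label) * np.log(1 - pred))

def discriminator(x, w):
    # Single-layer "discriminator": maps a sample to a realness score in (0, 1).
    return 1.0 / (1.0 + np.exp(-(x @ w)))

def generator(z, w):
    # Single-layer "generator": maps random noise to a fake sample.
    return z @ w

d_w = rng.normal(size=16)               # discriminator weights (16-dim samples)
g_w = rng.normal(size=(8, 16))          # generator weights (8-dim noise input)

real = rng.normal(loc=2.0, size=(32, 16))        # a batch of "real" data
fake = generator(rng.normal(size=(32, 8)), g_w)  # a generated batch

# Discriminator objective: label real samples 1 and fake samples 0.
d_loss = bce(discriminator(real, d_w), 1.0) + bce(discriminator(fake, d_w), 0.0)
# Generator objective: fool the discriminator into labelling fakes as 1.
g_loss = bce(discriminator(fake, d_w), 1.0)
```

Training alternates gradient steps on `d_loss` and `g_loss`; at the equilibrium described above, the discriminator classifies the generator's outputs as real roughly half of the time.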
DeepFake creation uses two autoencoders, one trained on the faces of the target
and the other on the source. Once the autoencoders are trained, their decoders are
swapped, and then something interesting happens: a DeepFake is created!
The encoder extracts the latent features from the face picture, and the
decoder is used to reconstruct the face pictures. Two encoder-decoder pairs are
needed to swap faces between source and target pictures, where each pair is
trained on one image set and the encoder's parameters are shared between the two
network pairs. This strategy helps the common encoder find and learn the
similarity between the two sets of face pictures, since faces generally have
similar attributes such as eyes, nose, and so forth. We can say that the encoder represents the
data in a lower dimension, thus performing dimensionality reduction. The job of the
decoder is to reconstruct the face again from the compressed and extracted latent
features. Figure 2 shows the DeepFake creation process.
One may notice that the diagram shown in Figure 2 uses the same encoder but
two different decoders. Since latent features are common to all faces, the job of an
encoder remains uniform for all inputs. However, in order to generate a morphed
picture, one needs to use the decoder of the driving image on the source image.
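The shared-encoder, two-decoder arrangement can be sketched in a few lines of numpy (a conceptual illustration with made-up 64-pixel faces and an 8-dimensional latent space, not the actual application code):

```python
import numpy as np

rng = np.random.default_rng(1)

def encoder(img, w_enc):
    # Shared encoder: compresses a face image into a low-dimensional latent vector.
    return np.tanh(img @ w_enc)

def decoder(latent, w_dec):
    # Per-identity decoder: reconstructs a face from the latent features.
    return latent @ w_dec

# Hypothetical sizes: 64-pixel flattened faces, 8-dim latent space.
w_enc = rng.normal(size=(64, 8)) * 0.1
w_dec_a = rng.normal(size=(8, 64)) * 0.1  # decoder trained on person A (source)
w_dec_b = rng.normal(size=(8, 64)) * 0.1  # decoder trained on person B (target)

face_a = rng.normal(size=(1, 64))

# Normal reconstruction: encode A, decode with A's own decoder.
recon_a = decoder(encoder(face_a, w_enc), w_dec_a)
# The DeepFake step: encode A, but decode with B's decoder,
# rendering A's expression with B's identity.
swap_ab = decoder(encoder(face_a, w_enc), w_dec_b)
```

In practice each autoencoder is first trained to reconstruct its own subject; the swap in the last line is what produces the morphed face.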
Creating DeepFakes and spreading them over social media platforms, swapping
the faces of celebrities and politicians onto bodies in pornographic images or videos, can be
threatening to one's privacy. Sometimes DeepFakes threaten world security
with videos of world leaders giving fake speeches for falsification purposes, and they have
even been used to generate fake satellite images. Therefore, they can be menacing to privacy,
democracy, and national security. This raises the need to distinguish DeepFake videos
from genuine ones.
A brief introduction to the techniques used for the detection of DeepFakes is given
below:
3 Experiments
In this section, we present the tools and experimental setup used to design, develop,
and implement the model. We present the results obtained
from running the DeepFake detection model and give an interpretation of
the experimental outcomes [11].
3.1 Dataset
The detection model has been trained with three pairs of datasets. The variety of
datasets allows the model to train on diverse data and become more generic.
The description of the datasets has been listed below:
1. FaceForensics++: This dataset largely consists of manipulated videos. It
has 1000 original videos tampered with by four automated face manipulation
techniques: DeepFakes, Face2Face, FaceSwap, and NeuralTextures
[3].
This dataset itself has been derived from 977 distinct YouTube videos. These
videos primarily contain frontal faces without occlusions, which allows the automated
tampering methods to generate realistic forgeries. This data can be used for
both image and video classification.
2. Celeb-DF: This dataset is a large-scale dataset for DeepFake forensics. It stands
apart from other datasets in that its DeepFake-synthesized videos have visual
quality on par with those circulated online [4].
It contains 590 original videos collected from YouTube. The dataset has
been carefully created to maintain diversity; thus, it contains subjects of
different ages, ethnic groups, and genders. It also contains 5639 corresponding
DeepFake videos.
3. DeepFake Detection Challenge: [12] This dataset is provided by Kaggle. It
contains files in the ‘.mp4’ format, split and compressed into sets
of 10 GB apiece. The files are labeled REAL or FAKE, and the model is
trained accordingly.
The prepared dataset used to train the model includes 50% real videos and
50% manipulated DeepFake videos. The dataset is split in a 70–30 ratio, i.e.,
70% for training and 30% for testing.
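A balanced split of this kind might be sketched as follows (the helper and the file names are hypothetical; the paper does not specify its exact tooling):

```python
import random

def balanced_split(real_videos, fake_videos, train_frac=0.7, seed=42):
    """Pair equal counts of REAL and FAKE clips, then split train/test."""
    rng = random.Random(seed)
    n = min(len(real_videos), len(fake_videos))   # enforce the 50/50 balance
    labelled = ([(v, "REAL") for v in real_videos[:n]]
                + [(v, "FAKE") for v in fake_videos[:n]])
    rng.shuffle(labelled)
    cut = int(train_frac * len(labelled))         # 70% train, 30% test
    return labelled[:cut], labelled[cut:]

train, test = balanced_split([f"real_{i}.mp4" for i in range(100)],
                             [f"fake_{i}.mp4" for i in range(100)])
```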
In the preprocessing phase, the videos in the
dataset are split into frames. After a face is detected, the detected face is cropped
from the frame. Frames with no detected faces are ignored in preprocessing. The
preprocessed data of cropped-face videos are split into train and test datasets. The
model comprises a ResNext CNN followed by one LSTM layer. ResNext is used to
accurately extract and detect the frame-level features, while the LSTM performs
sequence processing so that temporal analysis can be done across the frames. The
video is then passed to the trained model, which predicts whether it is fake or real [10].
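The temporal half of this pipeline can be illustrated with a minimal numpy sketch: a single LSTM cell stepped over per-frame feature vectors (random stand-ins here for the ResNext features), ending in a REAL/FAKE probability. All dimensions are illustrative assumptions, not the authors' settings.

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    # One LSTM step over a frame's features: gates i, f, o and candidate g.
    z = x @ W + h @ U + b
    i, f, o, g = np.split(z, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

# Hypothetical sizes: 20 frames per clip, 512-dim CNN features, 64-dim state.
frames, feat_dim, hidden = 20, 512, 64
W = rng.normal(size=(feat_dim, 4 * hidden)) * 0.01
U = rng.normal(size=(hidden, 4 * hidden)) * 0.01
b = np.zeros(4 * hidden)
w_out = rng.normal(size=hidden) * 0.01

# Stand-in for ResNext frame features (in the paper these come from the CNN).
features = rng.normal(size=(frames, feat_dim))

h = c = np.zeros(hidden)
for x in features:           # temporal analysis across the frame sequence
    h, c = lstm_step(x, h, c, W, U, b)

p_fake = sigmoid(h @ w_out)  # final REAL/FAKE probability from the last state
```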
3.3 Evaluation
4 Results
This section depicts the working of our model. The DeepFake creation has been
depicted in Fig. 4. An image and a driver video are passed to the model to create a
resultant DeepFake. The generated DeepFake is of low resolution because when the
input is of high resolution, an extremely accurate GAN is required to generate fake
videos which are hard to detect. The poor resolution of the DeepFake makes it easily
identifiable by the naked eye; however, advancements in DeepFake technology are
making it increasingly difficult to identify DeepFakes even with the help of detection
algorithms.
DeepFake detection results have been depicted in Figs. 5 and 6. With the help of
LSTM and ResNext, we were able to build a model that detects fabricated videos
based on the inherent inconsistencies between the frames. These results were derived
by passing test data to a pre-trained model. The model was trained on a dataset of
videos containing low-resolution videos divided into 20 frames per video.
Fig. 5 Output of DeepFake detection showing that the provided video is REAL along with the
confidence of prediction
Fig. 6 Output of DeepFake detection showing that the provided video is FAKE along with the
confidence of prediction
5 Challenges
Although the performance and quality of DeepFake video creation, and especially
of DeepFake video detection, have greatly increased [13], the challenges
affecting the ongoing detection methods are discussed below.
The following are some challenges in DeepFake detection:
(1) Quality of DeepFake Datasets: Developing DeepFake detection methods
requires the availability of copious datasets. However, the available datasets
have certain impairments, such as a significant difference in visual quality
compared with the actual fake videos circulated on the internet. These imperfections
in the dataset can be color discrepancies, parts of the original
face remaining visible, low-quality synthesized faces, or certain inconsistencies
in face orientation [13].
6 Current Systems
In this section, we give insights regarding the current systems that can be utilized to
generate DeepFake videos.
Currently, applications such as FakeApp, Zao, and DeepFaceLab are used to
create DeepFake videos and images. The first DeepFake application to appear on the
internet was called DeepFaceLab. This application is very useful for understanding
the step-by-step process of DeepFake creation. DeepFaceLab allows users to swap
faces, replace entire faces, alter a person's age, and change lip movements. Thus, one could
easily morph an image or video and create a phony one. Zao is a Chinese application
that allows users to create DeepFake videos, but it is observed that Zao cannot create
natural images of Indian faces because it is mainly trained on Chinese facial data;
for Indian faces, it is easy to tell whether a Zao video is real or fake.
Faceswap is another DeepFake application that is free and open source. It is supported
by Tensorflow, Keras, and Python. The active online forum of Faceswap allows
interested individuals to get useful insights on the process of creation of DeepFakes.
The forum accepts questions and also provides tutorials on how to create DeepFake
[14].
7 Discussion
In this paper, we have delineated and assessed the mechanism of face manipulation
(DeepFake). We have explained the methods for creating the fake identity swap
video and also how to detect such videos. We were able to create a low-resolution
DeepFake video, as the accessible frequency spectrum is much smaller; nevertheless,
we can both create a DeepFake video and detect one. DeepFake is a technology that
has many negative aspects and, if not applied wisely, may pose a threat to society
and turn out to be dangerous. Since most online users believe content on the internet
without verifying it, such DeepFakes can create rumors.
Looking at the positive aspects, the concept of DeepFakes can be applied to create
videos/images in a creative way; for example, someone who is not able
to speak or communicate properly can swap their face into the video of a good orator
and thereby create their own video. It can also be used in the film industry to update
episodes without reshooting them. Face-manipulated videos can be created for
entertainment purposes as long as they pose no threat to society or anyone's privacy.
The DeepFake detection method can be applied in courtrooms to check
whether evidence provided in digital form is real or fake, which could be very bene-
ficial in such scenarios. Every coin has two sides; likewise, this technology has its pros
and cons, and if used wisely it can be a boon for society.
Acknowledgements Every work that one accomplishes relies on constant motivation, benevolence,
and moral support of people around us. Therefore, we want to avail this opportunity to show our
appreciation to a number of people who extended their precious time, support, and assistance in the
completion of this research article. This research article has given us a wide opportunity to think
and expand our knowledge about new and emerging technologies. Through this research article,
we were able to explore more about the current research and related experiments. Therefore, we
would like to show our gratitude to our mentors for their guidelines throughout the process and for
encouraging us to look forward to learning and implementing new emerging technologies.
References
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 1053
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_76
1054 K. S. Kalaivani et al.
1 Introduction
Food and oxygen are necessities for living organisms. In countries such as India,
where agriculture is a major occupation, proper automation
of the farming process will help to maximize crop yield while also ensuring long-
term productivity and sustainability [1, 2]. Crop yield in agriculture is
challenged by weed invasion on farmland. In general, weeds are unwanted
plants on farmland; they offer no valuable benefits such as nutrition, food, or medication.
Weeds grow faster than crops and hence deplete crop growth,
taking the nutrients and space that crops require.
To obtain better productivity, it is necessary to remove weeds from farmland
at an early stage of growth. Manual removal of weeds is neither easy nor
efficient. For precision agriculture, a decision-making system is employed to
save resources, control weeds, and minimize cost. Robots are used for
removing weeds from the field, so it is necessary to accurately detect weeds in the
field through machine vision [3–6]. In this work, the dataset is taken from the Kaggle
platform and consists of 12 plant species with a total of 5545
images. A basic CNN is widely used to classify plant species; to improve
the classification accuracy, VGG-19 and ResNet-101 architectures are used. The VGG-19
architecture is a nineteen-layer deep network, and ResNet-101 has 101 layers. The
proposed architecture helps enhance machine vision to classify plant species
more accurately than existing work.
Ashqar [7] implemented a CNN architecture for classifying plant seedlings.
The implemented algorithms are used extensively in this task to recognize images.
On a held-out test set, the implemented model classifies about eighty percent of
images correctly, demonstrating the feasibility of this method.
Nkemelu [8] compared the performance of two traditional algorithms and a
CNN. From the obtained results, it is found that the basic CNN architecture obtains
higher accuracy than the traditional algorithms.
Elnemr [9] developed a CNN architecture to differentiate plant seedling images
between crop and weed at an early growth stage. Due to the combination of the normaliza-
tion layer, pooling layer, and the filters used, performance is increased in this
system. With the help of the elaborated CNN, this work achieved higher precision,
reduced complexity, and accurate classification.
A segmentation phase is involved in order to classify plant species.
This work can be combined with IoT for controlling the growth of weeds by spraying
herbicides. The system achieved an accuracy of 90%.
Alimboyong [10] proposed deep learning approaches using CNN and RNN archi-
tectures. The dataset used for classification contains 4234 images belonging to 12
plant species, taken from the Aarhus University Signal Processing group. The system
achieves low memory consumption and high processing capability. Performance
metrics such as sensitivity, specificity, and accuracy are considered for evaluation. The
system involves three phases: first, the data are augmented and compared with
the existing data; second, a combination of RNN and CNN is evaluated using various other
Classification of Plant Seedling Using Deep Learning Techniques 1055
plant seedling datasets; finally, a mobile application for plant seedling images is
created using the developed model. This work produced an accuracy of 90%.
Dyrmann [11] worked on deep learning techniques for the classification of crop and
weed species from multiple datasets. The combined dataset contains 10,413 images
covering 22 different crop and weed species, taken from six different datasets.
The proposed convolutional neural network recognizes plant species in
color images, achieving approximately eighty-six percent accuracy in classification
between species.
Rahman [12] developed deep learning techniques to identify plant seedlings at
an early stage. The F1-score and accuracy, among other metrics, were measured for the
proposed architecture, and these measurements were compared with previously
implemented architectures. In this work, ResNet-50 performs well compared
with previous models, producing an accuracy of 88%, higher than previous work.
Haijian [13] proposed a CNN variant, VGG-19, for the classification of pests
in vegetables. The fully connected layers are optimized in VGG-19. The analysis
shows that VGG-19 performs better than existing work, obtaining an accuracy of
97%.
Sun [14] designed a twenty-six-layer deep learning model with eight residual
building blocks. Prediction is done in the natural environment, and the implemented
model predicts with 91% accuracy.
The dataset is taken from Kaggle platform. The total number of images present in
this dataset is 5550. Training dataset contains 4750 images, and test dataset contains
790 images. This dataset contains 12 species of plants (Table 1).
CNN is widely used to classify plant seedlings in order to distinguish between crop
and weed species at an early stage of development. The CNN has three types of layers:
an input layer, hidden layers, and an output layer. Before being passed to the input layer, the
images are resized to equal dimensions. There are five stages of learning layers in the hidden layers.
The convolutional layer at each stage uses filters with kernel sizes of 3 × 3 and
a number of filters of 32, 64, 128, 256, and 1024, respectively (Fig. 1).
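Assuming 'same' padding for the 3 × 3 convolutions and a 2 × 2 max-pooling after each stage (details the text leaves implicit), the feature-map sizes through the five stages can be traced with a small helper:

```python
def stage_output(size, pool=2):
    # A 3x3 'same' convolution keeps the spatial size; 2x2 pooling halves it.
    return size // pool

def trace_cnn(input_size=128):
    """Trace feature-map shapes through the five conv stages described above."""
    filters = [32, 64, 128, 256, 1024]
    size, shapes = input_size, []
    for f in filters:
        size = stage_output(size)
        shapes.append((size, size, f))
    return shapes
```

For a hypothetical 128 × 128 input this yields maps from 64 × 64 × 32 down to 4 × 4 × 1024 before the output layer.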
2.2 VGG-19
VGG-19 is a 19-layer network with 16 convolution layers, interleaved with pooling
layers, and three fully connected layers (Fig. 2).
The 16 convolution layers use different numbers of filters: 64,
128, 256, and 512. Blocks 1 and 2 each have two convolution layers, with 64 filters and
128 filters, respectively. Block 3 has four convolution layers with 256 filters, and
blocks 4 and 5 each have four convolution layers with 512 filters. The input to
VGG-19 is a fixed size of 224 × 224 × 3, and the filter size is 3 ×
3. Of the fully connected layers, the first two have 4096 channels
each, activated by the ReLU activation function, and the third has
1000 channels and acts as the output layer with softmax activation.
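The block structure above can be tallied to confirm the count of weighted layers (16 convolutional plus 3 fully connected; pooling layers carry no weights):

```python
# (conv layers, filters) per block of VGG-19, as described in the text.
VGG19_BLOCKS = [(2, 64), (2, 128), (4, 256), (4, 512), (4, 512)]
FC_CHANNELS = [4096, 4096, 1000]   # two ReLU layers, then the softmax output

def layer_count():
    convs = sum(n for n, _ in VGG19_BLOCKS)
    return convs + len(FC_CHANNELS)   # 16 conv + 3 FC = 19 weighted layers
```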
2.3 ResNet-101
2.4 Optimizer
An optimizer is a method or algorithm used to reduce the loss by changing
attributes of the neural network such as weights and learning rates; it solves
the optimization problem by minimizing the loss function. The following are the different
types of optimizers used.
AdaGrad. This optimization method adapts the learning rate to the parameters.
It applies lower learning rates to parameters associated with frequently occurring features
and higher learning rates to parameters associated with infrequently occurring features. It
takes a default learning rate of 0.01 and removes the need to tune the learning
rate manually.
Stochastic Gradient Descent (SGD). The model parameters are updated in each
iteration: the loss function is evaluated, and the model is updated after each training
sample. The advantage of this technique is its low memory requirement.
Root-Mean-Square Propagation (RMSProp). It balances the step size by
normalizing the gradient, using an adaptive learning rate that changes
over time.
Adam. It is a replacement optimization method for SGD for training deep learning
models. It combines the properties of AdaGrad and RMSProp to provide optimiza-
tion on large and noisy problems. It is efficient because its default
parameters perform well on most problems.
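The update rules behind these four optimizers can be written out in a few lines each (a numpy sketch of the standard formulas using the default rates quoted above; it is not tied to any particular framework's implementation):

```python
import numpy as np

def sgd(w, g, state, lr=0.01):
    # Plain gradient step; state is unused but kept for a uniform interface.
    return w - lr * g, state

def adagrad(w, g, state, lr=0.01, eps=1e-8):
    # Accumulate squared gradients; frequent features get smaller steps.
    state["G"] = state.get("G", 0.0) + g**2
    return w - lr * g / (np.sqrt(state["G"]) + eps), state

def rmsprop(w, g, state, lr=0.001, rho=0.9, eps=1e-8):
    # Exponential moving average of squared gradients normalizes step size.
    state["E"] = rho * state.get("E", 0.0) + (1 - rho) * g**2
    return w - lr * g / (np.sqrt(state["E"]) + eps), state

def adam(w, g, state, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # Combines momentum (m) with RMSProp-style scaling (v), bias-corrected.
    t = state.get("t", 0) + 1
    m = b1 * state.get("m", 0.0) + (1 - b1) * g
    v = b2 * state.get("v", 0.0) + (1 - b2) * g**2
    state.update(t=t, m=m, v=v)
    m_hat, v_hat = m / (1 - b1**t), v / (1 - b2**t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), state

# Toy demonstration: minimizing f(w) = w**2 for a few steps with Adam.
w, state = 5.0, {}
for _ in range(50):
    w, state = adam(w, 2 * w, state)   # gradient of w**2 is 2w
```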
To improve accuracy, the CNN variants VGG-19 and ResNet-101 are used in this
work. The accuracies obtained for VGG-19 and ResNet-101 are 87% and 94%, respec-
tively. From the results obtained, it is found that the ResNet-101 model outperforms
VGG-19 and the basic CNN for classifying plant species (Fig. 4).
The graph in Fig. 4 shows the accuracy comparison of the VGG-19 and ResNet-101
models over different epochs (Fig. 5).
The graph in Fig. 5 shows the accuracy for different batch sizes for ResNet-101;
when the batch size is increased, the accuracy also increases (Fig. 6).
The graph in Fig. 6 shows the accuracy for varying learning rates of 0.1, 0.01,
0.001, and 0.0001; the highest accuracy is obtained for a learning rate of 0.0001.
4 Conclusion
The main aim of this work is to classify plant species in order to remove
weeds from the farmland. Removing weeds helps the plants get enough nutrients and
water, which in turn makes them grow healthier; this increases productivity
and gives good yield to the farmers. In this paper, we proposed the VGG-19 and
ResNet-101 models for classifying plant species.
References
1. Chaki J, Parekh R, Bhattacharya S (2018) Plant leaf classification using multiple descriptors:
a hierarchical approach. J King Saud Univ—Comput Inf Sci 1–15
2. Prakash RM (2017) Detection of leaf diseases and classification using digital ımage processing
3. Kamilaris A, Prenafeta-Boldú FX (2018) Deep learning in agriculture: a survey. Comput
Electron Agric 147:70–90
4. Mohanty SP, Hughes D, Salathé M (2016) Using deep learning for ımage-based plant disease
detection
5. Grinblat GL, Uzal LC, Larese MG, Granitto PM (2016) Deep learning for plant identification
using vein morphological patterns. Comput Electron Agric 127:418–424
6. Lecun Y, Bengio Y, Hinton G (2015) Deep learning
7. Ashqar BAM, Abu-Nasser BS, Abu-Naser SS (2019) Plant seedlings classification using
deep learning
8. Nkemelu DK, Omeiza D, Lubalo N (2018) Deep convolutional neural network for plant seedling
classification. arXiv preprint arXiv:1811.08404
9. Elnemr HA (2019) Convolutional neural network architecture for plant seedling classification.
Int J Adv Comput Sci Appl 10
10. Alimboyong CR, Hernandez AA (2019) An improved deep neural network for classification of
plant seedling images. In: 2019 IEEE 15th international colloquium on signal processing & its
applications (CSPA). IEEE
11. Dyrmann M, Karstoft H, Midtiby HS (2016) Plant species classification using deep convolu-
tional neural network. Biosyst Eng 151:72–80
12. Rahman NR, Hasan MAM, Shin J (2020) Performance comparison of different convolutional
neural network architectures for plant seedling classification. 2020 2nd International conference
on advanced information and communication technology (ICAICT), Dhaka, Bangladesh, 2020,
pp 146150. https://doi.org/10.1109/ICAICT51780.2020.93333468
13. Xia D et al (2018) Insect detection and classification based on an improved convolutional neural
network. Sensors 18(12):4169
14. Sun Y et al (2017) Deep learning for plant identification in natural environment. Comput Intell
Neurosci 2017
A Robust Authentication
and Authorization System Powered
by Deep Learning and Incorporating
Hand Signals
Abstract Hand gesture recognition has several uses. Communication aids for
visually challenged people, assistance for the elderly or the disabled, health care,
automobile user interfaces, security, and surveillance are just a few of the possible
applications. A deep learning-based edge computing system that can authenticate
users without the need for a physical token or physical contact is designed and
implemented in this article. To authenticate, users present sign language digits as
hand gestures, and deep learning is used to classify these digit gestures. The
proposed deep learning model's feature and bottleneck modules are based on deep
residual networks. On a publicly available collection of sign language digits, the
model achieves an excellent classification accuracy of 97.20%. A Raspberry Pi 3
Model B+ is used as the edge computing device, and the model is deployed on it.
Edge computing proceeds in two phases. First, the device collects camera frames
and stores them in a buffer. The model then predicts the digit using the first image
in the buffer as input, with an inference time of 280 ms.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 1061
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_77
1062 S. Palarimath et al.
1 Introduction
available now that perform well on image classification datasets using convolutional
neural networks. On the ILSVRC-2012 dataset [6], the sharpness-aware minimization
technique applied to a CNN model obtains 86.73% top-1 accuracy, whereas the
EnAET model achieves a 1.99% error rate on the CIFAR-10 dataset. All of the
preceding models are large and require substantial memory and model inference
time. This has proven problematic, especially when inference must occur on a
computing device at the network edge rather than in the cloud (central processing
system). Complex models requiring more significant computing resources have been
used to attain cutting-edge performance [7]. We present a memory-efficient CNN
model for use in edge computing systems. The proposed memory-efficient CNN
model is compared with the existing state-of-the-art memory-efficient CNN model,
MobileNetV2.
2 Literature Review
In the last decade, many papers on processing hand gestures have been published,
and the topic has attracted considerable research interest, with some of these
studies considering a range of different applications. However, hand gesture
interaction systems depend on the recognition rate, which is affected by several
factors, including the type of camera used and its resolution, the technique utilized
for hand segmentation, and the recognition algorithm used. This section summarizes
some key papers on hand gestures.
In [8], the authors discussed recognizing hand gestures for the standard
Indonesian sign language, using the Myo armband as the hand gesture sensor. The
authors in [9] reviewed various hand gesture techniques along with their merits and
demerits. The authors in [1] used the Kinect V2 depth sensor for identifying hand
gestures and suggested three different scenarios to obtain effective outcomes. The
authors in [10] used inertial measurement unit (IMU) sensors for human–machine
interface (HMI) applications using a hand gesture recognition (HGR) algorithm. The
authors in [11] discussed hands-free presentations using hand gesture recognition
and described the design of a wearable armband that enables them. Finally, the
authors in [12] addressed the end-to-end development and deployment of a deep
learning-based edge computing system for gesture-based authentication, from
conception to completion.
The authors in [13] described in detail a technique to correct the rotational
orientation of the Myo bracelet's sensors. The method identifies the maximum-energy
channel for given samples of the synchronization gesture WaveOut, and the
researchers report that it can improve the recognition and classification of hand
gestures. The authors in [14] combined hand segmentation masks with RGB frames
for real-time hand gesture recognition. Furthermore, the authors in [15] discussed a
practical application of hand gesture recognition: utilizing hand gestures in
emergencies. Recognition of hand gestures was achieved via support vector
machine-based classification as well as deep learning-based classification techniques.
3 Proposed Model
3.1 Dataset
We used the Sign Language Digits Dataset [16] to train the proposed CNN
and MobileNetV2 systems. The dataset has 2500 samples in ten classes
numbered 0–9. Figure 1 illustrates four of the classes. Each sample is [150 × 150]
pixels. Table 1 lists the dataset's statistics, including the total number of samples
in each grouping. The dataset is separated into three parts: training, validation, and
testing. Because there are ten classes, the test data is split evenly among them. The
test set included 630 samples, or 25.20% of the whole dataset, with 63 instances
from each class chosen at random. The remaining samples are divided into training
and validation sets, comprising 59.60% and 15.20% of the total dataset, respectively.
Fig. 1 Hand signals and the decoded numbers. (Source dataset [16])
Table 1 Number of samples from dataset [16] for training, testing, and validation
Class No. of samples No. of training No. of validation No. of testing
0 250 149 38 63
1 250 149 38 63
2 250 149 38 63
3 250 149 38 63
4 250 149 38 63
5 250 149 38 63
6 250 149 38 63
7 250 149 38 63
8 250 149 38 63
9 250 149 38 63
Total 2500 1490 380 630
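The per-class split in Table 1 (149 training, 38 validation, and 63 test samples out of 250 per class) can be sketched in plain Python. The sample IDs and the `split_class` helper below are illustrative stand-ins, not the authors' code:

```python
import random

def split_class(samples, n_train=149, n_val=38, n_test=63, seed=0):
    """Shuffle one class's samples and split them into train/val/test,
    mirroring the per-class counts in Table 1 (149 + 38 + 63 = 250)."""
    assert n_train + n_val + n_test == len(samples)
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

# Ten classes (digits 0-9) with 250 placeholder sample IDs each.
dataset = {d: [f"class{d}_img{i}" for i in range(250)] for d in range(10)}

train, val, test = [], [], []
for d, samples in dataset.items():
    tr, va, te = split_class(samples)
    train += tr
    val += va
    test += te

print(len(train), len(val), len(test))  # 1490 380 630
```

Splitting per class (rather than over the pooled 2500 samples) is what guarantees the test set contains exactly 63 instances of each digit.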
Each sample in the collection has an image size of 150 × 150 pixels. These
images were upscaled to 256 × 256 pixels using bicubic interpolation [17]; we
upscale to 256 × 256 pixels because of the real-time testing constraints stated in
Sect. 3.4. The scaled image samples are then used as input images for the
deep learning model. Section 3.2 details the proposed authentication mechanism.
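Bicubic interpolation applies a cubic convolution kernel along each image axis in turn. As an illustrative sketch (not the implementation of [17]), the 1-D cubic kernel and resampling step might look like the following, here upscaling one flat 150-pixel row to 256 samples:

```python
def cubic_kernel(x, a=-0.5):
    # Keys cubic convolution kernel (a = -0.5), the kernel commonly used
    # for bicubic interpolation; bicubic applies it along rows, then columns.
    x = abs(x)
    if x <= 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0

def resample_1d(row, new_len):
    """Resample one row of pixel values with cubic convolution (edge-clamped)."""
    old_len = len(row)
    scale = old_len / new_len
    out = []
    for j in range(new_len):
        # Map the output index to a continuous input coordinate.
        x = (j + 0.5) * scale - 0.5
        base = int(x // 1)  # floor of x
        v = 0.0
        for k in range(base - 1, base + 3):  # the 4 nearest input samples
            kk = min(max(k, 0), old_len - 1)  # clamp at the image border
            v += row[kk] * cubic_kernel(x - k)
        out.append(v)
    return out

row = [100.0] * 150          # one flat 150-pixel row
up = resample_1d(row, 256)   # upscaled to 256 samples
print(len(up))               # 256
```

Because the four kernel weights sum to 1 at every fractional offset, a constant row stays constant after resampling; real libraries run this same 1-D step twice, once per image axis.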
This section addresses deep learning-based CNNs for hand gesture identification
using MobileNetV2.
The MobileNetV2 architecture is a state-of-the-art CNN that outperforms other
models in applications such as object recognition [18]. The network is built from
efficient depthwise-separable convolutional layers, and its premise is to encode the
intermediate inputs and outputs across its bottleneck layers efficiently.
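The parameter savings of depthwise-separable layers over standard convolutions can be shown with a quick count; the layer sizes below are arbitrary examples, not MobileNetV2's actual configuration:

```python
def standard_conv_params(k, c_in, c_out):
    # A k x k standard convolution: every output channel mixes all inputs.
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    # Depthwise: one k x k filter per input channel.
    # Pointwise: a 1 x 1 convolution that mixes channels.
    return k * k * c_in + c_in * c_out

k, c_in, c_out = 3, 32, 64
std = standard_conv_params(k, c_in, c_out)   # 18432 parameters
sep = separable_conv_params(k, c_in, c_out)  # 2336 parameters
print(std, sep, round(std / sep, 1))         # 18432 2336 7.9
```

For 3 × 3 kernels, the separable form needs roughly 8–9× fewer parameters, which is the core of MobileNetV2's memory efficiency.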
We train the MobileNetV2 model in two stages using two transfer learning methods:
feature extraction and fine-tuning. We first train the model using feature extraction
and then refine it with fine-tuning. These are briefly described below:
3.3.1 Feature Extraction
In this method, the MobileNetV2 model pre-trained on the ImageNet dataset is
used as the base model. The classification layer of the MobileNetV2 model is not
included, since the ImageNet dataset has more classes than the 'Sign Language
Digit Dataset' we utilize. The base model uses 53 pre-trained layers of the
MobileNetV2 model. The learned features form a four-dimensional tensor of size
[None, 8, 8, 1280]. A global average pooling 2D layer [19] flattens the base model
output into a two-dimensional matrix of size [None, 1280]. We then add a dense
layer, which serves as the classification layer for our dataset. In the feature
extraction approach, we do not train the base model (MobileNetV2, except the final
layer) but rather use it to extract features from the input sample and feed them into
the dense layer (the additional classification layer matching the 'Sign Language
Digit Dataset'). This technique uses the 'RMSprop optimizer' (gradient-based
optimization for training neural networks) [20].
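The pooling and classification steps described above can be mimicked in NumPy. The random feature tensor and weight matrix below are placeholders standing in for the real base-model output and the trained dense layer:

```python
import numpy as np

# Placeholder for base-model features of shape [batch, 8, 8, 1280].
features = np.random.rand(4, 8, 8, 1280)

# Global average pooling 2D: average over the two spatial axes,
# collapsing [None, 8, 8, 1280] to [None, 1280].
pooled = features.mean(axis=(1, 2))
print(pooled.shape)  # (4, 1280)

# The added dense (classification) layer maps 1280 features to 10 digits.
rng = np.random.default_rng(0)
W = rng.standard_normal((1280, 10)) * 0.01  # placeholder weights
b = np.zeros(10)
logits = pooled @ W + b
print(logits.shape)  # (4, 10)
```

In feature extraction, only `W` and `b` are trained; the 53 base layers that produce `features` stay frozen.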
3.3.2 Fine-Tuning
The MobileNetV2 base model (minus its classification layer) is pre-trained on the
ImageNet dataset, and an additional dense layer (the classification layer matching
our dataset) is added for fine-tuning. On the 'Sign Language Digit Dataset', we train
53 layers, including the last dense layer. Compared to the previous approach, this
method has more trainable parameters. The 'RMSprop optimizer' was again used to
train the MobileNetV2 model [20].
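As a rough sketch of what the RMSprop update in both training stages does (this is the generic update rule, not the exact optimizer configuration used by the authors), applied here to a toy quadratic objective:

```python
import numpy as np

def rmsprop_step(w, grad, cache, lr=1e-3, rho=0.9, eps=1e-7):
    """One RMSprop update: divide each gradient by a running RMS of its
    past magnitudes, so step sizes stay roughly uniform across weights."""
    cache = rho * cache + (1 - rho) * grad**2
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache

# Minimize f(w) = sum(w^2); the gradient is 2w.
w = np.array([1.0, -2.0])
cache = np.zeros_like(w)
for _ in range(500):
    w, cache = rmsprop_step(w, 2 * w, cache, lr=0.01)
print(w)  # both entries end up close to the minimum at 0
```

The running `cache` is what distinguishes RMSprop from plain gradient descent: weights with persistently large gradients take proportionally smaller steps.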
The Raspberry Pi 3 Model B+ single-board computer is used for this task. It is a
recent single-board edge computing device with a faster CPU and better
connectivity than the Raspberry Pi 3 Model B. We used a Raspberry Pi
Camera V2 module to collect pictures in real time, and the trained model predicts
hand gestures on the Raspberry Pi 3. Using the proposed and MobileNetV2 models
in their original form would introduce prediction latency; therefore, TensorFlow
Lite (TFL) versions of these models are created and deployed to address the latency
problem.
During real-time testing, the camera's pictures form the system's input. Before
being sent to the deep learning model, they are downscaled to 256 × 256 pixels;
images with a resolution of less than 256 × 256 were warped for real-time
prediction. Figure 3 depicts the complete authentication process. A counter
variable 'i' keeps track of the number of iterations, and the system loops 'n' times,
where n is the authentication code length. Two fundamental stages occur within
the loop. The system first reads the live camera feed from the Pi Camera and saves
it in the frame buffer. The input picture (the first image frame in the buffer) is then
scaled to 256 × 256 pixels, and the predicted digit class is displayed on the screen.
After 2 s, the user changes the digit sign and the frame buffer is cleared; the cycle
then repeats. This sign-changeover pause time may be customized to meet specific
application requirements. After the loop concludes, the authentication code is
displayed and printed for verification.
Fig. 3 Diagram of the process flow for the production of authentication codes (note: the code
length is denoted by 'n'; here n = 5)
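The loop in Fig. 3 can be sketched as follows. `capture_frame` and `predict_digit` are hypothetical stand-ins for the Pi Camera read and the CNN inference, here replaced by deterministic stubs so the flow can be demonstrated:

```python
import collections
import time

def capture_frame(camera):
    # Stand-in for reading one frame from the Pi Camera.
    return camera()

def authenticate(camera, predict_digit, n=5, pause_s=2.0):
    """Sketch of the Fig. 3 flow: buffer a frame, predict the shown digit
    from the first buffered frame, wait for the user to change signs,
    clear the buffer, and repeat n times to build the code."""
    code = []
    frame_buffer = collections.deque()
    for i in range(n):                           # loop n times
        frame_buffer.append(capture_frame(camera))
        digit = predict_digit(frame_buffer[0])   # first image in the buffer
        code.append(str(digit))
        time.sleep(pause_s)                      # sign-changeover pause
        frame_buffer.clear()
    return "".join(code)

# Deterministic stand-ins for the camera feed and the CNN.
frames = iter([3, 1, 4, 1, 5])
code = authenticate(lambda: next(frames), lambda f: f, n=5, pause_s=0.0)
print(code)  # 31415
```

In the real system the pause would stay at 2 s and `predict_digit` would run the TensorFlow Lite model on the scaled frame.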
4 Results
Figure 4 depicts a few instances of predicted labels produced by the proposed
model on the test data, together with real-time predictions of the proposed
mechanism. The model accurately predicts the authentication code under both
uniform and non-uniform illumination conditions.
4.1 Discussion
The findings show that authentication using hand gestures and deep learning on a
Raspberry Pi is feasible. The developed system can generate an authentication
PIN without touching a keypad. Because ATMs have enclosures on both sides,
the digit signs (hand gestures) are hidden from onlookers, and the entire
authentication setup is low cost. Moreover, as demonstrated in Fig. 5, code
generation is independent of illumination conditions and achieves excellent
accuracy under both uniform and non-uniform lighting. The Raspberry Pi 3
Model B+, used in this study as the edge computing device, may also offer
different output options for entering the security code.
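Assuming the 280 ms inference and the 2 s sign-changeover pause happen serially (an assumption for illustration; the paper does not state the exact timing breakdown), the end-to-end time for a 5-digit code works out as:

```python
# Back-of-the-envelope timing for the reported setup.
inference_s = 0.280   # model inference time per frame
pause_s = 2.0         # sign-changeover pause
digits = 5            # authentication code length (n)

per_digit = inference_s + pause_s
total = digits * per_digit
fps = 1.0 / inference_s

print(round(total, 2))  # 11.4 -> about 11.4 s for the full code
print(round(fps, 2))    # 3.57 -> about 3.57 frames per second
```

The pause, not the inference, dominates the total time, which is why the changeover interval is the natural knob for tuning the user experience.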
Fig. 4 Predictions of the proposed CNN model in identifying samples taken from the dataset
[16]
Fig. 5 Five digits predicted in real time under a variety of natural lighting conditions (both
uniform and non-uniform illumination), with the corresponding sign images captured; each
prediction is processed on the edge computing device. The picture depicts the final code produced
by the proposed CNN algorithm
4.2 Limitations
The number of samples available for each class (0–9) in the hand gesture sign
recognition dataset being used is limited. Nevertheless, the proposed deep learning
model's performance is promising, and the dataset may be further improved in the
future by including new classes. On the other hand, the model's performance may
be somewhat reduced by motion artefacts that occur during image capture. In
addition, performance may be adversely affected by the camera's restricted field of
view and the practical placement of the hands in front of the camera.
5 Conclusions
This study designed a comprehensive system that uses sign language digit hand
gestures to authenticate users in public and commercial locations, including ATMs,
information desks, railways, and shopping malls. A convolutional neural network
(CNN) was utilized to generate an authentication code from camera input, making
the process genuinely contactless. The whole deep learning model inference ran on
a Raspberry Pi 3 Model B+ CPU with a connected camera, making the system
suitable for large-scale deployment. The proposed CNN obtained 97.20% accuracy
on the test dataset. The system operates in real time with a model inference time of
280 ms per image frame and may replace traditional touchpad and keypad
authentication techniques. Furthermore, the dataset can be expanded in the future
to include classes such as 'accept', 'close', 'home', 'ok', and 'go back' to further
minimize the need for surface interaction in these secure systems. As previously
mentioned, deep learning techniques may be applied to a variety of applications,
including water quality assessment, medical illness prediction, and educational
computing [20–25]. The authors have planned future studies in these fields.
References
1. Oudah M, Al-Naji A, Chahl J (2021) Elderly care based on hand gestures using Kinect sensor.
Computers 10:1–25
2. Zhou R, Zhong D, Han J (2013) Fingerprint identification using SIFT-based minutia descriptors
and improved all descriptor-pair matching. Sensors (Basel). 13(3):3142–3156. https://doi.org/
10.3390/s130303142
3. Gu J, Wang Z, Kuen J, Ma L, Shahroudy A, Shuai B, Liu T, Wang X, Wang G, Cai J et al
(2018) Recent advances in convolutional neural networks. Pattern Recogn 77:354–377. https://
doi.org/10.1016/j.patcog.2017.10.013
4. Zulfiqar M, Syed F, Khan M, Khurshid K (2019) Deep face recognition for biometric authen-
tication. In: International conference on electrical, communication, and computer engineering
(ICECCE), Swat, Pakistan, pp 24–25. https://doi.org/10.1109/ICECCE47252.2019.8940725
5. Aizat K, Mohamed O, Orken M, Ainur A, Zhumazhanov B (2020) Identification and
authentication of user voice using DNN features and i-vector. Cogent Eng 7:1751557
6. Foret P, Kleiner A, Mobahi H, Neyshabur B (2020) Sharpness-aware minimization for
efficiently improving generalization. arXiv arXiv:cs.LG/2010.01412
7. Deng BL, Li G, Han S, Shi L, Xie Y (2020) Model compression and hardware acceleration for
neural networks: a comprehensive survey. Proc IEEE 108:485–532
8. Anwar A, Basuki A, Sigit R (2020) Hand gesture recognition for Indonesian sign language
interpreter system with myo armband using support vector machine. Klik—Kumpul J Ilmu
Komput 7:164
9. Oudah M, Al-Naji A, Chahl J (2020) Hand gesture recognition based on computer vision: a
review of techniques. J Imaging 6
10. Kim M, Cho J, Lee S, Jung Y (2019) Imu sensor-based hand gesture recognition for human-
machine interfaces. Sensors (Switzerland) 19:1–13
11. Goh JEE, Goh MLI, Estrada JS, Lindog NC, Tabulog JCM, Talavera NEC (2017) Presentation-
aid armband with IMU, EMG sensor and bluetooth for free-hand writing and hand gesture
recognition. Int J Comput Sci Res 1:65–77
12. Dayal A, Paluru N, Cenkeramaddi LR, Soumya J, Yalavarthy PK (2021) Design and implementation
of deep learning based contactless authentication system using hand gestures. Electronics
10:1–15
13. López LIB et al (2020) An energy-based method for orientation correction of EMG bracelet
sensors in hand gesture recognition systems. Sensors (Switzerland) 20:1–34
14. Benitez-Garcia G et al (2021) Improving real-time hand gesture recognition with semantic
segmentation. Sensors (Switzerland) 21:1–16
15. Adithya V, Rajesh R (2020) Hand gestures for emergency situations: a video dataset based on
words from Indian sign language. Data Brief 31:106016
16. Mavi A (2020) A new dataset and proposed convolutional neural network architecture for
classification of American sign language digits. arXiv:2011.08927 [cs.CV] https://github.com/
ardamavi/Sign-Language-Digits-Dataset
17. Dengwen Z (2010) An edge-directed bicubic interpolation algorithm. In: 3rd International
Congress on image and signal processing, pp 1186–1189. https://doi.org/10.1109/CISP.2010.
5647190
18. Sandler M, Howard A, Zhu M, Zhmoginov A, Chen L (2018) MobileNetV2: inverted residuals
and linear bottlenecks. In: IEEE/CVF conference on computer vision and pattern recognition,
Salt Lake City, UT, USA, 18–22 June 2018, pp 4510–4520
19. Lin M, Chen Q, Yan S (2013) Network in network. arXiv preprint arXiv:1312.4400
20. Blessing NRW, Benedict S (2017) Computing principles in formulating water quality
informatics and indexing methods: an ample review. J Comput Theor Nanosci 14(4):1671–1681
21. Sangeetha SB, Blessing NRW, Yuvaraj N, Sneha JA (2020) Improved training pattern in back
propagation neural networks using holt-winters’ seasonal method and gradient boosting model.
Appl Mach Learn. ISBN 978-981-15-3356-3, Springer, pp 189–198
22. Blessing NRW, Benedict S (2014) Extensive survey on software tools and systems destined
for water quality. Int J Appl Eng Res 9(22):12991–13008
23. Blessing NRW, Benedict S (2016) Aquascopev 1: a water quality analysis software for
computing water data using aquascope quality indexing (AQI) scheme. Asian J Inf Technol
15(16):2897–2907
24. Haidar SW, Blessing NRW, Singh SP, Johri P, Subitha GS (2018) EEapp: an effectual appli-
cation for mobile based student centered learning system. In: The 4th international conference
on computing, communication & automation (ICCCA 2018), December 14–15, India. IEEE,
pp. 1–4
25. Pyingkodi M, Blessing NRW, Shanthi S, Mahalakshmi R, Gowthami M (2020) Performance
evaluation of machine learning algorithm for lung cancer. In: International conference on artifi-
cial intelligence & smart computing (ICAISC-2020), Bannari Amman Institute of Technology,
Erode, India, Springer, October 15–17, 92