
Studies in Computational Intelligence 921

Ashish Khanna
Awadhesh Kumar Singh
Abhishek Swaroop
Editors

Recent Studies
on Computational
Intelligence
Doctoral Symposium on Computational
Intelligence (DoSCI 2020)
Studies in Computational Intelligence

Volume 921

Series Editor
Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
The series “Studies in Computational Intelligence” (SCI) publishes new developments and advances in the various areas of computational intelligence—quickly and with a high quality. The intent is to cover the theory, applications, and design methods of computational intelligence, as embedded in the fields of engineering, computer science, physics and life sciences, as well as the methodologies behind them. The series contains monographs, lecture notes and edited volumes in computational intelligence spanning the areas of neural networks, connectionist systems, genetic algorithms, evolutionary computation, artificial intelligence, cellular automata, self-organizing systems, soft computing, fuzzy systems, and hybrid intelligent systems. Of particular value to both the contributors and the readership are the short publication timeframe and the worldwide distribution, which enable both wide and rapid dissemination of research output.
The books of this series are submitted for indexing to Web of Science, EI-Compendex, DBLP, SCOPUS, Google Scholar and SpringerLink.

More information about this series at http://www.springer.com/series/7092


Ashish Khanna · Awadhesh Kumar Singh · Abhishek Swaroop
Editors

Recent Studies
on Computational
Intelligence
Doctoral Symposium on Computational
Intelligence (DoSCI 2020)

Editors

Ashish Khanna
Department of Computer Science and Engineering
Maharaja Agrasen Institute of Technology
New Delhi, India

Awadhesh Kumar Singh
Department of Computer Engineering
NIT Kurukshetra
Kurukshetra, India

Abhishek Swaroop
Department of Computer Science Engineering
Bhagwan Parushram Institute of Technology
New Delhi, India

ISSN 1860-949X ISSN 1860-9503 (electronic)
Studies in Computational Intelligence
ISBN 978-981-15-8468-8 ISBN 978-981-15-8469-5 (eBook)
https://doi.org/10.1007/978-981-15-8469-5

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore
Pte Ltd. 2021
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of
illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, expressed or implied, with respect to the material contained
herein or for any errors or omissions that may have been made. The publisher remains neutral with regard
to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Preface

We are delighted to announce that Shaheed Sukhdev College of Business Studies, New Delhi, in association with the National Institute of Technology Patna and the University of Valladolid, Spain, hosted the eagerly awaited Doctoral Symposium on Computational Intelligence (DoSCI 2020). The first edition of the symposium attracted a diverse range of engineering practitioners, academicians, scholars and industry delegates, receiving abstracts from more than 143 authors from different parts of the world. The committee of professionals dedicated to the symposium strove to assemble high-quality technical chapters along the symposium's tracks on computational intelligence. The chosen track is highly active in the present-day research community, and a great deal of research is under way in this field and its related sub-fields. The symposium targeted out-of-the-box ideas, methodologies, applications, expositions, surveys and presentations that help advance the current state of research. More than 40 full-length papers were received, with contributions focused on theoretical work, computer simulation-based research and laboratory-scale experiments. Of these manuscripts, nine papers were included in this Springer volume after a thorough two-stage review and editing process. All submitted manuscripts were peer-reviewed by at least two independent reviewers, who were provided with a detailed review proforma. The reviewers' comments were communicated to the authors, who incorporated the suggestions into their revised manuscripts. The recommendations of both reviewers were taken into consideration when selecting a manuscript for inclusion in the proceedings. The exhaustiveness of the review process is evident from the large number of articles received, addressing a wide range of research areas, and the stringent review process ensured that each published manuscript met rigorous academic and scientific standards. It is an exalting experience to finally see these contributions materialize as Recent Studies on Computational Intelligence: Doctoral Symposium on Computational Intelligence (DoSCI 2020) by Springer.


DoSCI 2020 invited six keynote speakers, eminent researchers in the field of computer science and engineering from different parts of the world. In addition to the plenary sessions on each day of the conference, 15 concurrent technical sessions were held every day to allow the oral presentation of around nine accepted papers. Keynote speakers and session chairs for each session were leading researchers from the thematic area of the session.
An event of this magnitude, with proceedings released by Springer, is the remarkable outcome of the untiring efforts of the entire organizing team. The success of such an event undoubtedly rests on the painstaking efforts of several contributors at different stages, driven by their devotion and sincerity. Fortunately, since the beginning of its journey, DoSCI 2020 has received support and contributions from every corner. We thank all who wished DoSCI 2020 well and contributed by any means towards its success. This edited proceedings volume by Springer would not have been possible without the perseverance of all the steering, advisory and technical program committee members.
The organizers of DoSCI 2020 thank all the contributing authors for their interest and exceptional articles. We would also like to thank the authors for adhering to the time schedule and for incorporating the review comments. We extend our heartfelt acknowledgment to the authors, peer reviewers, committee members and production staff whose diligent work gave shape to the DoSCI 2020 proceedings. We especially want to thank our dedicated team of peer reviewers who volunteered for the arduous and tedious task of quality checking and critiquing the submitted manuscripts. The management, faculty, administrative and support staff of the college have always extended their services whenever needed, for which we remain thankful.
Lastly, we would like to thank Springer for accepting our proposal to publish the DoSCI 2020 proceedings. The help received from Mr. Aninda Bose, Senior Acquisitions Editor, in the process has been very useful.

New Delhi, India
Ashish Khanna
Deepak Gupta
Organizers, ICICC 2020
Contents

Onto-Semantic Indian Tourism Information Retrieval System . . . . . . . . 1
Shilpa S. Laddha and Pradip M. Jawandhiya
An Efficient Link Prediction Model Using Supervised Machine
Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Praveen Kumar Bhanodia, Aditya Khamparia, and Babita Pandey
Optimizing Cost and Maximizing Profit for Multi-Cloud-Based Big
Data Computing by Deadline-Aware Optimize Resource Allocation . . . . 29
Amitkumar Manekar and G. Pradeepini
A Comprehensive Survey on Passive Video Forgery Detection
Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
Vinay Kumar, Abhishek Singh, Vineet Kansal, and Manish Gaur
DDOS Detection Using Machine Learning Technique . . . . . . . . . . . . . . 59
Sagar Pande, Aditya Khamparia, Deepak Gupta, and Dang N. H. Thanh
Enhancements in Performance of Reduced Order Modelling
of Large-Scale Control Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Ankur Gupta and Amit Kumar Manocha
Solution to Unit Commitment Problem: Modified hGADE
Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
Amritpal Singh and Aditya Khamparia
In Silico Modeling and Screening Studies of PfRAMA Protein:
Implications in Malaria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
Supriya Srivastava and Puniti Mathur
IBRP: An Infrastructure-Based Routing Protocol Using Static
Clusters in Urban VANETs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
Pavan Kumar Pandey, Vineet Kansal, and Abhishek Swaroop

Editors and Contributors

About the Editors

Ashish Khanna has 16 years of expertise in teaching, entrepreneurship and research & development. He received his Ph.D. degree from NIT Kurukshetra and completed his M.Tech. and B.Tech. at GGSIPU, Delhi. He completed his postdoctoral research at the Internet of Things Lab at Inatel, Brazil, and the University of Valladolid, Spain. He has published around 45 SCI-indexed papers in IEEE Transactions, Springer, Elsevier, Wiley and several other journals, with a cumulative impact factor above 100 to his credit. He has more than 100 research articles in top SCI/Scopus journals, conferences and edited books, and is co-author or co-editor of around 20 edited books and textbooks. His research interests include MANET, FANET, VANET, IoT, machine learning and many more. He is the convener and organizer of the ICICC conference series. He is currently working in the CSE Department of Maharaja Agrasen Institute of Technology, Delhi, India.

Awadhesh Kumar Singh received his Bachelor of Technology (B.Tech.) degree in Computer Science from Madan Mohan Malaviya University of Technology, Gorakhpur, India, in 1988, and his M.Tech. and Ph.D. degrees in Computer Science from Jadavpur University, Kolkata, India, in 1998 and 2004, respectively. He joined the Department of Computer Engineering at the National Institute of Technology, Kurukshetra, India, in 1991, where he is presently a Professor and Head of the Department of Computer Applications. Earlier, he also served as Head of the Computer Engineering Department during 2007–2009 and 2014–2016. He has published 150 papers in various journals and conference proceedings and has supervised 10 Ph.D. scholars. He has visited countries including Thailand, Italy, Japan, the UK and the USA to present his research work. His research interests include cognitive radio networks, distributed algorithms, fault tolerance and security.

Prof. (Dr.) Abhishek Swaroop completed his B.Tech. (CSE) from GBP University of Agriculture & Technology, his M.Tech. from Punjabi University Patiala and his Ph.D. from NIT Kurukshetra. He has 28 years of teaching and industrial experience and has served in reputed educational institutions such as Jaypee Institute of Information Technology, Noida, Sharda University, Greater Noida, and Galgotias University, Greater Noida. He is actively engaged in research: one of his Ph.D. scholars has completed his Ph.D. from NIT Kurukshetra, he is currently supervising 4 Ph.D. students, and he has also guided 10 M.Tech. dissertations. He has authored 3 books and 5 book chapters; 7 of his papers are indexed in DBLP and 6 are SCI-indexed. He has been part of the organizing committees of three IEEE conferences (ICCCA-2015, ICCCA-2016, ICCCA-2017) and one Springer conference (ICICC-2018) as Technical Program Chair. He is a member of professional societies such as CSI and ACM and serves on the editorial boards of various reputed journals.

Contributors

Praveen Kumar Bhanodia School of Computer Science and Engineering, Lovely Professional University, Phagwara, India
Manish Gaur Department of Computer Science and Engineering, Centre for
Advanced Studies, Dr. A.P.J Abdul Kalam Technical University, Lucknow, India
Ankur Gupta Department of Electronics and Communication Engineering,
Maharaja Ranjit Singh Punjab Technical University, Bathinda, Punjab, India
Deepak Gupta Maharaja Agrasen Institute of Technology, New Delhi, India
Pradip M. Jawandhiya PL Institute of Technology and Management, Buldana,
India
Vineet Kansal Department of Computer Science and Engineering, Institute of
Engineering and Technology Lucknow, Dr. A.P.J Abdul Kalam Technical
University, Lucknow, India
Aditya Khamparia School of Computer Science Engineering, Lovely
Professional University, Phagwara, Punjab, India
Vinay Kumar Department of Computer Science and Engineering, Centre for
Advanced Studies, Dr. A.P.J Abdul Kalam Technical University, Lucknow, India
Shilpa S. Laddha Government College of Engineering, Aurangabad, India
Amitkumar Manekar CSE Department, KLEF, Green Fields, Vaddeswaram,
Andhra Pradesh, India
Amit Kumar Manocha Department of Electrical Engineering, Maharaja Ranjit
Singh Punjab Technical University, Bathinda, Punjab, India
Puniti Mathur Centre for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University Uttar Pradesh, Noida, Uttar Pradesh, India
Sagar Pande School of Computer Science Engineering, Lovely Professional
University, Phagwara, Punjab, India
Babita Pandey Department of Computer Science and IT, Babasaheb Bhimrao
Ambedkar University, Amethi, India
Pavan Kumar Pandey Dr. A.P.J. Abdul Kalam Technical University, Lucknow,
India
G. Pradeepini CSE Department, KLEF, Green Fields, Vaddeswaram, Andhra
Pradesh, India
Abhishek Singh Department of Computer Science and Engineering, Institute of
Engineering and Technology Lucknow, Dr. A.P.J Abdul Kalam Technical
University, Lucknow, India
Amritpal Singh Department of Computer Science and Engineering, Lovely
Professional University, Phagwara, Punjab, India
Supriya Srivastava Centre for Computational Biology and Bioinformatics, Amity
Institute of Biotechnology, Amity University Uttar Pradesh, Noida, Uttar Pradesh,
India
Abhishek Swaroop Bhagwan Parashuram Institute of Technology, New Delhi,
India
Dang N. H. Thanh Department of Information Technology, School of Business
Information Technology, University of Economics Ho Chi Minh City, Ho Chi
Minh City, Vietnam
Onto-Semantic Indian Tourism
Information Retrieval System

Shilpa S. Laddha and Pradip M. Jawandhiya

Abstract Tourism is among the world's fastest-growing sectors. There has been profound growth in the volume of tourism information on the Web, yet despite this overload of data we frequently fail to find the relevant information we need. This is due to the absence of semantic identification of the user query when retrieving results. Motivated by these limitations, a framework called “Design and Implementation of Semantically Enhanced Information Retrieval using Ontology” is proposed. The objective of the paper is to present a semantic Indian tourism search framework to improve India's standing as a global travel destination, so that India can exploit its favored natural assets and thereby increase tourist arrivals and revenue from the travel industry. The proposed strategy uses an ontology constructed for Indian tourism to achieve precise retrieval. The framework is evaluated against keyword-based Web search engines to assess the effectiveness of the semantic approach over commonly used methodologies, measuring performance in terms of precision and execution time as evaluation parameters. The results obtained show a substantial improvement in information retrieval using this methodology.

Keywords Information retrieval · Semantic search engine · Ontology · Tourism

1 Introduction

In this world of technology, life without the Web is unimaginable. The Web, which connects billions of people all around the world, is the fastest and simplest medium of communication. The Internet is the greatest warehouse of information, and it is

S. S. Laddha (B)
Government College of Engineering, Aurangabad 431001, India
e-mail: kabrageca@gmail.com
P. M. Jawandhiya
PL Institute of Technology and Management, Buldana 443001, India
e-mail: pmjawandhiya@gmail.com

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
A. Khanna et al. (eds.), Recent Studies on Computational Intelligence, Studies in Computational Intelligence 921, https://doi.org/10.1007/978-981-15-8469-5_1
easily accessible in a user-friendly way through devices such as PCs, portable tablets, cell phones and many more. With the rapid advancement of the World Wide Web, search engines have become the essential tool of information retrieval for people searching Web data. Typically a user enters a few keywords, the search tool processes the query using those keywords and returns relevant URLs as the final result.
Today, tourism is among the world's fastest-growing sectors [1, 2]. Accordingly, efforts are needed to improve India's standing as a global travel destination so that India can exploit its favored natural assets and thereby increase tourist arrivals and revenue from the travel industry [3, 4]. The objective of this chapter is to design and implement a highly useful information retrieval framework for the Indian travel industry. This onto-semantic retrieval framework is designed, implemented and examined on the Indian tourism domain to study the effectiveness of onto-semantic retrieval over commonly used Web search tools such as Google, Bing and Yahoo. This involves considerable challenges in making sense of the heterogeneous data on the World Wide Web and the available systems for efficient retrieval and integration of data. The commonly used keyword-based strategy [3] has various limitations in data processing, which can be addressed by the onto-semantic strategy employed in this framework. Usually, a traditional search engine finds results that are syntactically correct, but the result set it provides is very large. Present systems are keyword-based retrieval systems working on phrase-term matching instead of the semantics of the words or tokens [5]. The need is to improve phrase-term-based Web search tools by considering the actual semantics of the query. Semantic information retrieval is an area of study and research that focuses on the meaning of expressions used as part of the user's query. Ontology plays a crucial role here in characterizing concepts along with the relationships between terminologies in a domain [4]. Since an ontology is domain specific, it is organized around a particular area; accordingly, queries in the “Tourism” domain are interpreted differently than in another domain, for example “Education.”
In this chapter, search engine effectiveness is raised through an onto-semantic similarity measure and algorithmic techniques [6]. The novel notion of the “query prototype” [7] is blended with an ontology-based model to control the accountability of the retrieval process. Sections 3, 4 and 5 of this chapter describe the challenges, goals and hypothesis, respectively. Sections 6 and 7 are the heart of the chapter, presenting the system architecture and the performance evaluation and analysis. Section 8 concludes and outlines the scope of future work.

2 Literature Review

Since the beginning of written language, people have been creating methods for rapidly indexing and retrieving information. Information retrieval spans a variety of paradigms and is defined as the act of storing, seeking and fetching information that matches a user's need [8].
Until the 1950s, information retrieval was mostly a library science. In 1945, Vannevar Bush introduced his vision of a future in which machines would provide easy access to the libraries of the world [9]. In the 1950s, the first electronic retrieval systems were built using punch cards; a lack of computing power limited the usefulness of these systems. During the 1970s, computers began to have enough processing capacity to perform information retrieval with near-instant results. With the growth of the Web, information retrieval became increasingly significant and widely studied. Today, most people use some kind of modern information retrieval system daily, whether Google or a custom-made system for libraries.
The volume of data available on the Web makes it difficult to find relevant information, so a suitable method to organize data becomes fundamentally indispensable. Keyword search is not well suited to locating relevant information for a particular concept. In a typical keyword-based Web search engine, the query terms are matched against the terms in an inverted index comprising all the document terms of a text corpus [10]. Only matched records are fetched and shown to the end user. The study in [6] discusses the critical reasons why a purely text-based search fails to discover some of the relevant documents, owing to the ambiguity of natural language and the absence of semantic relations.
Textual information retrieval depends on keywords extracted from documents and used as the building blocks of both document and query representations. Keywords, however, may carry multiple senses. For example, the word “train” in the tourism domain refers to a vehicle for transportation, whereas the same word in the education domain means “to teach.” Current keyword-based Web search tools match the query term against the terms in the documents and return every record containing that term, irrespective of semantics. Therefore, efforts are required to devise semantic information retrieval methods that render significant documents based on meaning rather than keywords. The central idea of semantic information retrieval is that the meaning of content depends on conceptual relationships to objects rather than on the linguistic relations found in the text.
In the area of tourism, Tomai et al. [11] introduced ontologies that support decision making in trip planning, using two separate ontologies: one for tourism information and the other for user profiles [12]. Jakkilinki et al. (2005) presented an ontology-driven intelligent tourism information system using a tourism domain ontology [13]. Lam et al. (2006) introduced an ontology-driven agent framework for semantic Web services, “OntiaiJADE,” with an upper-level ontology built from auxiliary data from various Web sites related and relevant to Hong Kong. They subsequently built an enhanced Intelligent Ontology Agent-driven Tourist Advisory System called “iJADEFreeWalker” (2007) and presented an intelligent ontology-based mobile framework for tourist guidance (2008) [11]. Heum Park, Aesun Yoon and Hyuk-Chul Kwon developed a task system and task ontology based on travelers' tasks, and an intelligent tourist information service system using them [14]. Wang et al. (2008) devised an intelligent ontology and Bayesian network approach for semantic tourism services, and an advanced ontology-driven recommendation system for tourism that integrates heterogeneous tourism data available on the Web and recommends vacation spots to users based on information from more than 40 Chinese Web sites covering tourist attractions in Beijing and Shanghai [15]. Kanellopoulos (2009) presented an ontology-based system for matching travelers' requirements for Group Travel Package Tours (GPT) with a Web portal service for travelers residing in Europe; the basic information resources in this system are travel organizations, travel-establishment-related news and the group tour requirements [16]. Tune et al. (2008) exhibited an ontology-based intelligent agent system for a tour and group package tourist service [17]. Chiu et al. (2009) presented a multi-agent information system, “MAIS,” and a Collaborative intelligent Travel Operator System (CTOS) utilizing semantic Web technologies for effective organization of information assets and service processes [18]. Barta et al. (2009) introduced an alternative methodology covering the semantic domain of tourism by coordinating modularized ontologies and created the core Domain Ontology for Travel and Tourism (cDOTT) [19]. They proposed a recommender framework that improves the effectiveness of traditional content-based recommender systems by taking ontology into account. Explicit proposals for future research directions include considering contextual data, for example the weather forecast, the period of the year, the time and so on. This research attempts to pursue these future research directions and to render relevant information by implementing a semantic information retrieval interface using ontology for the Indian tourism domain.

3 Challenges/Research Gap

Today, life without the Web, and without search engines in particular, is hard to imagine. Searching the net has become part of our daily lives, covering everything from finding a suitable book to following contemporary developments in advanced technologies. Search engines have fundamentally changed the way people access and discover information, making information about practically any topic easily and instantly available. However, all current information retrieval strategies rely on keyword matching: if the keywords match the available data, the page is returned; otherwise it is rejected. These approaches return a comparatively large variety of results, forcing the user to browse through pages to find the required information, and they are unable to provide exact answers to a given question. Because these techniques do not consider the semantics carried by the query terms, they make no attempt to understand what the user actually wants to ask, resulting in low accuracy and relevancy rates. The basic issues [20] include:
• Fetching and displaying irrelevant results.
• A large volume of results, making it hard for the user to locate the relevant information.
• The user is unaware of the rationale used to retrieve the results, making it hard to investigate them properly.
• Query execution takes time, and precision is low.
These issues are common to keyword-based search engines. The present Web is a collection of a wide variety of information, and search engines are expected to deliver information according to the user's query. Moreover, users often do not know the exact term needed for a search; if the query term does not match exactly, the result may not be very precise. Search engines must not confine themselves to keywords alone: the semantics of the words should also be considered, and the matching logic ought to be fuzzy. The framework is required to provide an information retrieval interface that renders exact and efficient query results in relatively little time.

4 Objective

Considering the vast unstructured data on the Web, traditional search engines are incapable of rendering relevant, precise and efficient information from the Web. The primary goal of this research is to enhance the precision and efficiency of information retrieval semantically, using ontology, to satisfy the user query and attain user satisfaction with the search results. This semantic information retrieval is evaluated against generally used conventional search engines, viz. Google, Yahoo and Bing, and the improvement is demonstrated in terms of the efficiency and precision of the search results of the resulting application. The results show that an information retrieval system using domain ontology can achieve better results than keyword-based information retrieval systems.
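Precision, one of the two evaluation parameters named above, can be computed per query as the fraction of retrieved results that are relevant. The following is a minimal illustrative sketch, not the authors' evaluation code; the document identifiers are invented toy data.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved results that are also relevant (0.0 if nothing retrieved)."""
    retrieved = set(retrieved)
    return len(retrieved & set(relevant)) / len(retrieved) if retrieved else 0.0

# Toy example: 4 results returned, 2 of them relevant -> precision 0.5
print(precision(["d1", "d2", "d3", "d4"], ["d1", "d3", "d7"]))  # -> 0.5
```

The same function can be applied to the result lists of each compared engine (Google, Yahoo, Bing and the proposed system) to tabulate precision per query.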

5 Hypothesis

This research work attempts to address some of the problems mentioned in the research issues. The proposed system aims at innovations in the design of an enhanced semantic information retrieval system on the Web, designed for general use but applied to a specific domain, and implements a Web-based interface that accepts a query from the user and provides the result through the ontology-based enhanced semantic retrieval system. This interface allows multiple users to remotely access the same application through a Web browser.

6 System Architecture/Methodology

One of the significant challenges in information retrieval is precise and relevant retrieval using domain knowledge and semantics. Semantic retrieval understands the end user's query in a more contextual manner, using an ontology corpus that plays a significant role in interpreting the connections between terms in the user query. Consequently, in this work an ontology with the novel ideas of “query prototype” and “query similarity” is created to understand the user query and provide precise and relevant information in the Indian tourism space.
The algorithm derives the semantic relatedness between the terms in the ontology-based corpus using the query prototype and a similarity measure. It is worth noting that, although the proposed model is tested on the Indian tourism domain, the developed methods are adaptable to other specific areas.
In this technical era, it is a difficult situation that, regardless of the overload of data, we normally fail to find relevant information. This is due to not considering the semantics of the client query when obtaining the required results. To overcome these basic challenges, the onto-semantic information retrieval framework is structured as shown in Fig. 1 (Part 1 combined with Part 2). This framework retrieves the relevant results for the client query semantically.
Let us now discuss the working of the semantically enhanced modules in detail.

6.1 The Basic Query Mapper

When an end user submits any query pertaining to Indian tourism, such as "tourist places in India," "places of interest in India," "India tourism," "explore tourist destinations and development of India," "incredible India" and so on, the basic query mapper is invoked and the relevant results are shown to the query seeker along with meta-data and the time taken for processing [21].

6.2 The Query Prototype Mapper

The query prototype mapper [7] is a novel idea introduced in this study, by which one query prototype can handle multiple user queries. A query prototype contains (i) simple tokens, (ii) template tokens, (iii) ontological tokens and (iv) stopwords.
Onto-Semantic Indian Tourism Information Retrieval System 7

For example, (flight) from [from-city] to [to-city]. Here, query prototypes are defined for the 17 services identified for the Indian tourism domain. This module is applied when the user query matches exactly one of the query prototypes defined for the identified services [7].
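As a rough illustration of how such a prototype might be matched (the actual matching procedure of [7] is not reproduced here; the synonym set and the token grammar below are assumptions made for the sketch):

```python
# Hypothetical prototype matcher. "(token)" is an ontological token matched
# through a synonym set, "[slot]" is a template token that captures a value,
# and any other token must match literally.
SYNONYMS = {"flight": {"flight", "flights", "airfare"}}  # illustrative only

def match_prototype(prototype, query):
    """Return the captured slot values if the query matches, else None."""
    slots = {}
    p_tokens, q_tokens = prototype.split(), query.lower().split()
    if len(p_tokens) != len(q_tokens):
        return None
    for pt, qt in zip(p_tokens, q_tokens):
        if pt.startswith("(") and pt.endswith(")"):      # ontological token
            if qt not in SYNONYMS.get(pt[1:-1], {pt[1:-1]}):
                return None
        elif pt.startswith("[") and pt.endswith("]"):    # template slot
            slots[pt[1:-1]] = qt
        elif pt != qt:                                   # simple token
            return None
    return slots
```

With the prototype above, the query "flights from Mumbai to Delhi" matches and the two city slots are captured.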

6.3 The Query Word Order Mapper

If the user query does not match exactly any of the query prototypes defined for the identified services, there is a strong probability that the sequence of words in the user query does not match the sequence of words in the defined query prototypes. To handle such queries and to locate the service to be executed, the query word order mapper is invoked.

Fig. 1 System architecture—part 1, part 2
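One plausible way to realize such order-insensitive matching (a sketch; the paper does not specify the algorithm, and slot capture is ignored here) is to compare token multisets:

```python
from collections import Counter

def match_any_order(prototype_tokens, query):
    """True when the query contains exactly the prototype's tokens, in any order."""
    return Counter(query.lower().split()) == Counter(t.lower() for t in prototype_tokens)
```

After a failed exact prototype match, a reordered query such as "Delhi to Mumbai from flight" would still resolve to the stored flight prototype.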

6.4 The Spelling Correction Algorithm

Another possibility is that the user mistakenly enters a misspelled state or city name in the query. To handle this, a list of valid Indian city and state names is maintained; this module uses it to replace the misspelled term with the nearest matching term in the stored list, reframes the query and forwards it to the query prototype mapper [22].
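A minimal sketch of this correction step, assuming closeness is measured with Python's `difflib` (the similarity threshold and the excerpt of the place list are illustrative, not the paper's actual values):

```python
import difflib

VALID_PLACES = ["mumbai", "delhi", "jaipur", "kolkata", "maharashtra"]  # excerpt only

def correct_spelling(query):
    """Replace each token with the closest valid place name, when one is close enough."""
    corrected = []
    for token in query.lower().split():
        close = difflib.get_close_matches(token, VALID_PLACES, n=1, cutoff=0.8)
        corrected.append(close[0] if close else token)
    return " ".join(corrected)
```

A query like "hotels in mumbci" is reframed to "hotels in mumbai" before being re-submitted to the query prototype mapper.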

6.5 The Ontological Mapper

There is a probability that, instead of the ontological tokens used in defining the query prototypes, the user query uses different terms. To match this kind of query, the ontological mapper is used, which internally makes use of an ontology constructed using WordNet. This accelerates the performance of the framework remarkably by handling practically every query related to the Indian tourism domain. This research puts forward a semantic information retrieval strategy based on an ontology created using a clustering technique. A clustering algorithm is designed and implemented that creates clusters based on the different ontological tokens, called cluster heads, defined in the query prototypes. The cluster elements are fetched using the Java WordNet Library (JWNL), relationships are assigned, and a score is calculated for each ontological token with respect to its cluster head. This process results in the creation of an ontology stored in memory to shorten retrieval time. The ontology representation for the ontological cluster head "Train" is shown in Fig. 2.

The role of the defined ontology is to characterize the relations among the terms relevant to the Indian tourism domain. When the user enters a query, the query is interpreted through the related terms defined in the ontology to improve the performance of the semantic search. The specific tourism service corresponding to the user query is located semantically and executed.
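The in-memory cluster structure might look like the following sketch (the terms and scores are invented for illustration; the paper derives them from WordNet relations via JWNL):

```python
# Each ontological cluster head maps related terms to a relatedness score.
ONTOLOGY = {
    "train": {"train": 1.0, "railway": 0.9, "rail": 0.85, "express": 0.7},
    "flight": {"flight": 1.0, "airline": 0.9, "airfare": 0.8},
}

def resolve_cluster_head(term):
    """Map a query term to the best-scoring cluster head, or None if unrelated."""
    candidates = [(scores[term], head)
                  for head, scores in ONTOLOGY.items() if term in scores]
    return max(candidates)[1] if candidates else None
```

A query containing "railway" would thus be routed to the "Train" cluster head and its associated service.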

Fig. 2 Ontology for the ontological token/cluster head “Train”



6.6 The State–City Mapper

Another possibility is that the user provides just the name of a city or state. To handle such queries, this mapper is used to invoke the "About service" for that particular state or city, as described in the performance analysis section of [22].

6.7 The Keyword Mapper

Often, the user enters a query that matches neither a city or state name nor any defined query prototype. If none of the previously mentioned mappers can handle the user query, the keyword mapper attempts to match keywords appearing in the user query against a keyword list built up, at the mapper interface of the resulting Web page, from previous input queries given by end users. To illustrate, if the user enters the query "about Mumbai", the system shows information about Mumbai and simultaneously, in the background, extracts every keyword from the result URLs and saves them in keyword.dat. Afterward, if the user submits any of the stored keywords, such as "Gate Way of India", the framework can return the Web page with the relevant data. In this way, the framework becomes gradually smarter as it grows and processes more and more relevant queries.
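A simplified sketch of this keyword store (the paper names the file keyword.dat but does not specify its format; a JSON mapping from keyword to source URL is assumed here):

```python
import json
import os

KEYWORD_FILE = "keyword.dat"

def save_keywords(url, keywords):
    """Record keywords harvested from a result page against its URL."""
    store = json.load(open(KEYWORD_FILE)) if os.path.exists(KEYWORD_FILE) else {}
    for kw in keywords:
        store.setdefault(kw.lower(), url)
    with open(KEYWORD_FILE, "w") as f:
        json.dump(store, f)

def lookup_keyword(query):
    """Return the stored URL for a previously harvested keyword, if any."""
    if not os.path.exists(KEYWORD_FILE):
        return None
    return json.load(open(KEYWORD_FILE)).get(query.lower())
```

After processing "about Mumbai", a later query "Gate Way of India" can be answered from the stored mapping.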

6.8 The Meta-processor

In this study, a meta-processor is designed that provides meta-data such as the title, time and brief information regarding the relevant URLs for the user-requested data. Whenever the user enters a query for the first time, only the Web links are shown, to give quick results; at the same time a thread is spawned by the meta-processor, which fetches the meta-data in the background and dumps it on the server. Processing these URLs for meta-data and titles at run time takes additional time, as it requires connections to numerous servers. At the next run of the same query, the user receives the relevant links together with the meta-data. Preparing the meta-data is a background procedure performed by the meta-processor, and this novel meta-processor enhances the performance of the system.

6.9 The Template Manager

A few services in the tourism domain, such as the "About city" service and the "Best time to visit" service, require information as a single line or a paragraph rather than URL links. For this, a template manager is designed in a novel way and invoked in the background. Templates are site specific: to include a new URL, an individual site template must be added, as different sites use different structures/layouts to display and store their data. This novel template manager approach helps in fetching such data.
In this way, the various modules described above are invoked based on the pattern of the query. As shown in Fig. 1 (Part 1 and Part 2), the user query is initially matched against the defined query prototypes. If an exact match is found, the query prototype mapper identifies the service to be executed. If the user query does not match any of the defined query prototypes, the query word order mapper checks for an alteration in word sequence and determines the service to be invoked. If this mapper fails to invoke a service, a check is made for a misspelled city or state name, the correction is made by the spelling correction module, and the query is sent back to the query prototype mapper to identify the service. If a matching query prototype is still not found, the query tokens are matched against the closest ontological cluster head, as explained in [23], to invoke the appropriate service. There is the possibility that the user enters just a city or state name, in which case the "About" service type is invoked for the respective state or city. If the user enters a very basic query related to the domain, the basic query mapper is invoked.

There is also the possibility that the user requests information whose query prototype is not defined in the framework; the framework then handles such a query by invoking the basic keyword mapper. In this way, the recognized service type is invoked and the relevant Web links are retrieved semantically. The first step of the procedure begins when the user enters a query in the semantic search interface depicted in Fig. 3.
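This dispatch order reads as a chain of responsibility; a schematic sketch (the mapper internals are stubbed, and the service names are illustrative):

```python
def dispatch(query, mappers):
    """Try each mapper in the order of Fig. 1; the first to claim the query wins."""
    for mapper in mappers:
        service = mapper(query)
        if service is not None:
            return service
    return "keyword mapper"              # final fallback described above

# Stand-ins for the real mappers, in the order the text describes.
PIPELINE = [
    lambda q: "distance service" if q.startswith("distance") else None,  # prototype
    lambda q: None,                                     # word order mapper (stub)
    lambda q: None,                                     # spelling correction (stub)
    lambda q: None,                                     # ontological mapper (stub)
    lambda q: "about service" if q in {"goa", "mumbai"} else None,       # state-city
]
```

A query that no stage claims falls through to the keyword mapper, exactly as described in the text.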
The user enters the query in the search box and, on clicking the search button, the relevant links, together with the meta-information and the time required for processing, are displayed to the end user as search results (Fig. 4).

7 Performance Analysis

A Web application is developed to realize the framework. The baseline results using the basic model are discussed in detail in [24, 25]. The advanced framework accepts a user query as input through the user interface shown in Fig. 3, and the results are obtained semantically, using the ontology, from the terms in the query, as shown in Fig. 4. The framework performance is determined by computing the accuracy and the effectiveness in terms of query execution time. Precision is used to gauge the accuracy of the framework. The performance of the semantic search interface is assessed by preparing a wide variety of queries for each of the services identified for the Indian tourism domain, as shown in Table 1, and testing is carried out for each identified service. For each type of service, diversified queries were tested. The complete testing results for the major services are described in detail as follows.

Fig. 3 Home page semantic search interface

Fig. 4 Results with meta-information presented by semantic search interface

Table 1 Indian tourism domain services

| S. No. | Service name |
|--------|--------------|
| 1 | About service |
| 2 | State service |
| 3 | City state service |
| 4 | Distance service |
| 5 | Tourist places service |
| 6 | Hotel type service |
| 7 | Hotel service |
| 8 | Hotel rating service |
| 9 | Train service |
| 10 | Things to do service |
| 11 | Flight service |
| 12 | Weather service |
| 13 | India place service |
| 14 | Keyword base service |
| 15 | Bus service |
| 16 | Best time to visit service |
| 17 | How to reach service |

Each query is given for processing to the onto-semantic search interface and to the conventional keyword-driven Web search engines Bing, Google and Yahoo, and the retrieved results are analyzed for each query. The framework performance is reported in terms of average precision and time taken for processing, as shown in the following table.

7.1 Detailed Testing

Different users may request the same information in different ways. Based on the user request, the framework infers the service and then renders the result. Comparing the results returned by all the Web search engines, the time taken to process each query and the observed precision, the onto-semantic search engine performs considerably better than the commonly used conventional search engines, viz. Google, Bing and Yahoo.

7.2 Appraisal of All Services

The principal part of our assessment procedure is computing the service-wise precision values and the time taken by the system to process the queries. These precision values and processing times are then averaged to obtain the average precision and average processing time shown in Table 2. The examination covers more than 1000 queries, and the service-wise comparative analysis shown in Graphs 1 and 2 depicts that this semantic search engine achieves a striking improvement over the commonly used keyword-based search engines, viz. Google, Bing and Yahoo. This system ensures quick retrieval of relevant, precise and efficient results, and it has a very easy-to-understand search interface.
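The averaging described above amounts to a per-service mean over the tested queries; a small sketch (the input values below are invented, not the measured data):

```python
def average_per_service(per_query_values):
    """Average the per-query precision (or processing time) values per service."""
    return {service: sum(values) / len(values)
            for service, values in per_query_values.items()}
```

Feeding the per-query precisions for each of the 17 services through this function yields averages of the kind reported in Table 2.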

8 Conclusion and Future Scope

This research presents a novel onto-semantic information retrieval framework and its application to Indian tourism, which fuses the novel ideas of "Query Prototype", "Query Word Order Mapper" and "Spelling Correction" together with the "Ontological Mapper" and "Keyword Mapper". When all these advances are combined with the comfort of a keyword-driven search interface, we obtain an easy-to-use, high-performance semantic search interface that addresses the vagueness and lack of definition of the retrieval procedure. Within the scope of this research, Indian tourism is chosen as the test domain, queries are specified for this domain, and the performance is assessed. Evaluation results show that the designed strategy can easily outperform conventional search engines such as Google, Bing and Yahoo with respect to precision and query processing time. This framework, tested on the Indian tourism domain, can be applied to other domains with appropriate changes in the query prototypes along with the development of an ontology specific to the domain.

Table 2 Comparative average precision and average processing time analysis for all services

| Service name | Id | No. of unique queries | Semantic search avg. precision | Semantic search avg. processing time | Google avg. precision | Google avg. processing time | Bing avg. precision | Yahoo avg. precision |
|---|---|---|---|---|---|---|---|---|
| About city service | 1 | 19 | 92.5 | 0.21 | 56 | 0.52 | 54.73 | 51.59 |
| Distance service | 2 | 12 | 100 | 0.37 | 65.83 | 0.57 | 61.92 | 53.32 |
| Best time to visit service | 3 | 10 | 79.86 | 0.2 | 66.4 | 0.56 | 53.11 | 38.31 |
| How to reach service | 4 | 17 | 100 | 0.24 | 82.57 | 0.63 | 89.03 | 73.43 |
| Things to do service | 5 | 10 | 74.07 | 0.25 | 100 | 0.8 | 91.89 | 71.85 |
| Hotel service | 6 | 13 | 96.61 | 0.18 | 89.23 | 0.69 | 88.07 | 85.56 |
| Hotel type service | 7 | 10 | 98.84 | 0.24 | 99.29 | 0.75 | 100 | 95.56 |
| Hotel rating service | 8 | 10 | 92.84 | 0.18 | 88 | 0.71 | 90.96 | 84.65 |
| Flight service | 9 | 19 | 97.74 | 0.28 | 97.44 | 0.72 | 93.77 | 90.43 |
| Tourist places service | 10 | 11 | 100 | 0.28 | 94.95 | 0.72 | 80.69 | 66.83 |
| Train service | 11 | 10 | 94 | 0.23 | 69 | 0.57 | 62.6 | 40.04 |
| Weather service | 12 | 10 | 75.6 | 0.19 | 96 | 0.49 | 88.53 | 63.95 |
| Bus service | 13 | 11 | 97.56 | 0.19 | 64.55 | 0.63 | 64.2 | 62.25 |
| India place service | 14 | 10 | 100 | 0 | 86 | 0.86 | 82.88 | 81.95 |
| Keyword base service | 15 | 22 | 100 | 0.11 | 84.8 | 0.79 | 83.07 | 70.12 |
| City state service | 16 | 20 | 100 | 0.23 | 67.73 | 0.7 | 61.71 | 54.37 |
| State service | 17 | 10 | 100 | 0.21 | 65.1 | 0.79 | 54.66 | 59.25 |

Graph 1 Comparative average precision analysis for all services

Graph 2 Comparative average processing time analysis of semantic and Google search engines
for all services

References

1. Buhalis, D., & Law, R. (2008). Progress in information technology and tourism manage-
ment: 20 years on and 10 years after the internet—The state of eTourism research. Tourism
Management, 29(4), 609–623.
2. Hall, C. M. (2010). Crisis events in tourism: Subjects of crisis in tourism. Current Issues in Tourism, 13(5), 401–417.
3. Hauben, J. R. (2005). Vannevar Bush and J. C. R. Licklider: Libraries of the future 1945–1965. The Amateur Computerist (p. 36).
4. Jakkilinki, R., Sharda, N., & Ahmad, I. (2005). Ontology-based intelligent tourism information
systems: An overview of development methodology and applications. In Proceeding of TES.
5. Laddha, S. S., & Jawandhiya, P. M. (2018). Onto semantic tourism information retrieval. International Journal of Engineering & Technology (UAE), 7(4.7), 148–151. ISSN 2227-524X, https://doi.org/10.14419/ijet.v7i4.7.20532.
6. Laddha S.S., Koli N.A., & Jawandhiya P. M. (2018). Indian tourism information retrieval
system: An onto-semantic approach. Procedia Computer Science, 132, 1363–1374. ISSN 1877-
0509, https://doi.org/10.1016/j.procs.2018.05.051.
7. Laddha S. S., & Jawandhiya P. M. (2020). Novel concept of spelling correction for semantic
tourism search interface. In: Tuba M., Akashe S., Joshi A. (eds) Information and Communica-
tion Technology for Sustainable Development. Advances in Intelligent Systems and Computing,
Vol. 933. Springer, Singapore. https://doi.org/10.1007/978-981-13-7166-0_2 ISBN: 978-981-
13-7166-0, ISSN: 2194-5357, Pages 13–21.
8. Laddha, S. S., & Jawandhiya, P. M. (2018). Novel concept of query-prototype and query-similarity for semantic search. In: Deshpande, A., et al. (eds) Smart Trends in Information Technology and Computer Communications. SmartCom 2017. Communications in Computer and Information Science, Vol. 876. Springer, Singapore. Online ISBN 978-981-13-1423-0, ISSN: 1865-0929.
9. Kanellopoulos, D. N. (2008). An ontology-based system for intelligent matching of trav-
ellers’ needs for Group Package Tours. International Journal of Digital Culture and Electronic
Tourism, 1(1), 76–99.
10. Aslandogan, Y. A., & Yu, C. T. (1999). Techniques and systems for image and video retrieval. IEEE Transactions on Knowledge and Data Engineering, 11(1), 56–63.
11. Tomai, E., Spanaki, M., Prastacos, P., & Kavouras, M. (2005). Ontology assisted deci-
sion making–a case study in trip planning for tourism. In OTM Confederated International
Conferences on the Move to Meaningful Internet Systems (pp. 1137–1146). Berlin: Springer.
12. Vinayek, P. R., Bhatia, A., & Malhotra, N. E. E. (2013). Competitiveness of Indian tourism
in global scenario. ACADEMICIA: An International Multidisciplinary Research Journal, 3(1),
168–179.
13. Kathuria, M., Nagpal, C. K., & Duhan, N. (2016). A survey of semantic similarity measuring
techniques for information retrieval. In 2016 3rd International Conference on Computing for
Sustainable Global Development (INDIACom) (pp. 3435–3440). IEEE.
14. Laddha S. S., & Jawandhiya P. M. (2017). Semantic tourism information retrieval inter-
face. In 2017 International Conference on Advances in Computing, Communications and
Informatics(ICACCI), Udupi, (pp. 694–697). https://doi.org/10.1109/icacci.2017.8125922.
15. Wang, W., Zeng, G., Zhang, D., Huang, Y., Qiu, Y., & Wang, X. (2008). An intelligent ontology
and Bayesian network based semantic mash up for tourism. In IEEE Congress on Services-Part
I (pp. 128–135). IEEE.
16. Laddha S. S., Laddha, A. R., & Jawandhiya P. M. (2015). New paradigm to keyword search:
A survey. In IEEE Xplore digital library (pp. 920–923). https://doi.org/10.1109/icgciot.2015.
7380594. IEEE Part Number: CFP15C35-USB, IEEE ISBN: 978-1-4673-7909-0.
17. Song, T. -W., & Chen, S. -P. (2008). Establishing an ontology-based intelligent agent system
with analytic hierarchy process to support tour package selection service. In International
Conference on Business and Information, South Korea.
18. Chiu, D. K. W, Yueh, Y. T. F., Leung, H., & Hung, P. C. K. (2009). Towards ubiquitous tourist
service coordination and process integration: A collaborative travel agent system architecture
with semantic web services. Information Systems Frontiers, 11, 3, 241–256.
19. Laddha S. S., & Jawandhiya P. M. (2017). An exploratory study of keyword based search
results. Indian Journal of Scientific Research, 14(2), 39–45. ISSN: 2250-0138 (Online).
20. Laddha S. S., & Jawandhiya P. M. (2019) Novel concept of query-similarity and meta-
processor for semantic search. In: Bhatia S., Tiwari S., Mishra K., & Trivedi M. (eds) Advances
in Computer Communication and Computational Sciences. Advances in Intelligent Systems
and Computing, Vol. 760. Springer, Singapore. https://doi.org/10.1007/978-981-13-0344-9_9.
Online ISBN 978-981-13-0344-9.
21. Laddha S. S., & Jawandhiya P. M. (2017). Semantic search engine. Indian Journal of Science
and Technology, 10(23), 01–06. https://dx.doi.org/10.17485/ijst/2017/v10i23/115568. Online
ISSN : 0974-5645.

22. Park, H., Yoon, A., & Kwon, H. C. (2012). Task model and task ontology for intelligent tourist
information service. International Journal of u-and e-Service, Science and Technology, 5(2),
43–58.
23. Lee, R. S. T. (Ed). (2007). Computational intelligence for agent-based systems (Vol. 72).
Springer Science & Business Media.
24. Pan, B., Xiang, Z., Law, Rob, & Fesenmaier, D. R. (2011). The dynamics of search engine
marketing for tourist destinations. Journal of Travel Research, 50(4), 365–377.
25. Korfhage, R. R. (1997). Information storage and retrieval.
An Efficient Link Prediction Model
Using Supervised Machine Learning

Praveen Kumar Bhanodia, Aditya Khamparia, and Babita Pandey

Abstract The link prediction problem is an important instance of online social network analysis. Easy access to and the reach of the Internet have scaled social networks exponentially. In this paper, we focus on understanding link prediction between nodes across a network. We explore certain features used in link prediction with machine learning. The features are quantified by exploiting the structural properties of the online social network, represented through a graph or sociograph. A supervised machine learning approach is used to classify potential node pairs as possible links. The proposed model is trained and tested using standard, publicly available online social network datasets and evaluated on state-of-the-art performance parameters.

Keywords Social network · Link prediction · Node · Graph · Common neighborhood

1 Introduction

Online Social Networks (OSNs) have established an era in which human life is highly influenced by the trends and activities prevailing across these social networks. The power of a social network can be understood from the fact that the majority of market and business trends are decided and set on social networks; even governments rely upon these mediums to implement their part
P. K. Bhanodia (B) · A. Khamparia


School of Computer Science and Engineering, Lovely Professional Univerity, Phagwara, India
e-mail: kumarpkb2@gmail.com
A. Khamparia
e-mail: adityakhamparia88@gmail.com
B. Pandey
Department of Computer Science and IT, Babasaheb Bhimrao Ambedkar University, Amethi,
India
e-mail: Shukla_babita@yahoo.co.in

© The Editor(s) (if applicable) and The Author(s), under exclusive license 19
to Springer Nature Singapore Pte Ltd. 2021
A. Khanna et al. (eds.), Recent Studies on Computational Intelligence,
Studies in Computational Intelligence 921,
https://doi.org/10.1007/978-981-15-8469-5_2

and parcel of policy implementation. Social networks now offer new avenues to build new friendships, associations and relationships, whether in business or in social life [1]. Facebook, Twitter, LinkedIn and Flickr are a few popular online social network sites around us that have become an integral part of our daily life. The rapid, exponential development of social networks has attracted the research community to explore their evolution in order to investigate and understand their nature and structure. Researchers study these networks employing various mathematical models and techniques. Online social networks are represented through graphs (Fig. 1 shows a sample social network graph), wherein the nodes represent users and the relationships or associations between the nodes are represented through links. Social networks are crawled by researchers for further examination, and when information is collected at a particular instant, the network information is only partially downloaded, so certain links between nodes may be missing. The missing link information is an obstacle to understanding the standing structure of the network. Apart from this, crunching the network's structural attributes to approximate fresh new links between nodes is another interesting challenge to be addressed.
The formal definition according to Liben-Nowell and Kleinberg [2] reads as follows: a social network is represented by a graph G(V, E), where e = (u, v) ∈ E is a link between two vertices (endpoints) at a specific timestamp t. Multiple links between two vertices are represented using parallel edges. For times t ≤ t′, let G[t, t′] denote the subgraph of G restricted to edges with timestamps between t and t′. Following the supervised training methodology, choose a training interval [t0, t0′] and a test interval [t1, t1′] with t0′ < t1. The link prediction task then outputs a list of links that are not present in G[t0, t0′] but are predicted to appear in G[t1, t1′].
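A minimal sketch of this temporal split, assuming edges are available as (u, v, t) triples:

```python
def temporal_split(timestamped_edges, t0, t0_end, t1, t1_end):
    """Split (u, v, t) edges into training edges and the new links to predict."""
    train = {(u, v) for u, v, t in timestamped_edges if t0 <= t <= t0_end}
    test = {(u, v) for u, v, t in timestamped_edges
            if t1 <= t <= t1_end and (u, v) not in train}
    return train, test
```

The training graph is built from the first set, while the second set serves as the ground truth of links that appear in the test interval.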
Identifying the information that contributes to determining such missing and new links is helpful in the friend recommendation mechanisms commonly used in online social networks. Several algorithmic techniques described by Chen et al. [3] have been introduced by IBM in its internal private online social network, established so that its employees and workers can connect with each other digitally. Predicting or forecasting such existing hidden links, or the creation of fresh new links, using existing social network data is termed the link prediction problem. The applications of link prediction include domains like bioinformatics, for finding protein–protein interactions [4]; the prediction of potential links between nodes across a network can be used to develop recommendation systems for e-commerce Web sites [5]; and link prediction can also assist security systems in detecting and tracking hidden terrorist groups or networks [6]. To address the link prediction problem in its different scenarios, many algorithms and procedures have been proposed, the majority of which belong to machine learning approaches.

The applications mentioned above do not work only on social networks; many different networks, such as information networks, Web link graphs, bioinformatics networks, communication networks, road networks, etc., may be included for further processing.

Fig. 1 Social network representation in graph
It is obvious that crunching a large, complex social network in a single pass, although possible, would be inefficient and complex. This complex task can be simplified into subtasks that are handled separately. The basic building blocks of any social network are nodes, edges, the degree associated with each node and the local neighborhood of each node. Potential links are approximated by exploiting the global and local topological features of the network; the local neighborhood features are used to identify the similarity between nodes. Thus, the features estimated using neighborhood techniques such as JC, AA and PA are further used to develop a classifier model for link prediction.

This paper addresses the link prediction problem with a machine learning classifier trained on similarity features extracted by exploiting topological features. The proposed classifier is experimentally evaluated using social networking datasets (Facebook and Wikipedia).
The paper also introduces the state-of-the-art similarity techniques, including Adamic/Adar, Jaccard's coefficient and preferential attachment, which are used as feature extraction techniques. The objectives of the paper are to explore:
• Online social networks and their evolution, along with their appropriate representation using graphs.
• The link prediction problem and its evolution.
• How link prediction problems can be comprehended and addressed.
• The techniques employed in link prediction for establishing relationships between nodes across an online social network.
• The contribution of machine learning in addressing link prediction between nodes in an online social network.
• Accordingly, a proposed model for effective and efficient link prediction between nodes in an online social network.

2 Link Prediction Techniques

The advent of online social networks has attracted researchers to crunch these bulging, increasingly complex networks to extract knowledge for further predictions and recommendations. Various techniques and predictive models have been introduced and proposed to analyze online social networks; these methods are classified by the way they exploit the data: local, global and machine learning-based methods are usually employed for network data exploitation. The distinguished methods may be explored in [7].
Common Neighborhood (CN). According to Newman [8], the CN measure is deduced by counting the existing common neighbors of the adjacent nodes between which a future link is to be predicted. Thus it is a similarity score calculated as the size of the intersection of the sets of nodes adjacent to each of the two nodes, indicating a potential link that could establish a relationship. Denoting the set of neighbors of x by Γ(x) and of y by Γ(y), it is computed as

CN(x, y) = |Γ(x) ∩ Γ(y)|

Jaccard’s Coefficient. In common neighborhood, the measure across the network


is not effectively distinguishable to further refine it has been normalized. According
to JC, prediction of link within nodes can be measured using following expression.

(x) ∩ (y)
J C(x y) =
(x) ∪ (y)

Adamic/Adar. This measure identifies the similarity between two nodes as the sum of the reciprocals of the logarithms of the degrees of the common neighbors shared by the nodes across the social network. The measure is computed using the formula of Adamic and Adar [9], where z ranges over the set of common neighbors of nodes x and y:

AA(x, y) = Σ_{z ∈ Γ(x) ∩ Γ(y)} 1 / log |Γ(z)|

Preferential Attachment. In the same vein, Kunegis et al. [10] proposed another approach for measuring the similarity between nodes, which typically identifies potential node pairs across a social network. It reflects the observation that most new nodes attach to the nodes of highest degree in the network. The mathematical representation of the measure is

PA(x, y) = |Γ(x)| · |Γ(y)|
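The four neighborhood measures above can be computed directly from an adjacency map in which adj[x] is the neighbor set Γ(x); a self-contained sketch:

```python
import math

def cn(adj, x, y):
    """Common neighborhood: |Γ(x) ∩ Γ(y)|."""
    return len(adj[x] & adj[y])

def jc(adj, x, y):
    """Jaccard's coefficient: |Γ(x) ∩ Γ(y)| / |Γ(x) ∪ Γ(y)|."""
    union = adj[x] | adj[y]
    return len(adj[x] & adj[y]) / len(union) if union else 0.0

def aa(adj, x, y):
    """Adamic/Adar: sum of 1/log|Γ(z)| over common neighbors z (degree > 1)."""
    return sum(1 / math.log(len(adj[z]))
               for z in adj[x] & adj[y] if len(adj[z]) > 1)

def pa(adj, x, y):
    """Preferential attachment: |Γ(x)| · |Γ(y)|."""
    return len(adj[x]) * len(adj[y])
```

The degree-greater-than-one guard in aa avoids dividing by log 1 = 0 for leaf common neighbors, a corner case the formula leaves implicit.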



Sørensen Index. This measure was developed and proposed for crunching network communities by Zou et al. [11]; mathematically, with k(x) and k(y) denoting the degrees of x and y, it is

S(x, y) = 2 |Γ(x) ∩ Γ(y)| / (k(x) + k(y))

Hub Promoted Index (HPI). This technique was introduced for processing overlapping pairs within complex social networks. It is defined as the ratio of the number of common neighbors of the two nodes to the smaller of the two node degrees (k(x) or k(y)). Mathematically, it is represented as

HPI(x, y) = |Γ(x) ∩ Γ(y)| / min(k(x), k(y))

Hub Depressed Index (HDI). This technique is similar to HPI; the only difference is that the denominator is the larger of the two node degrees (k(x) or k(y)); mathematically,

HDI(x, y) = |Γ(x) ∩ Γ(y)| / max(k(x), k(y))

Leicht–Holme–Newman Index (LHN). Leicht et al. proposed a technique for similarity approximation defined as the ratio of the number of common neighbors to the product of the degrees of the two nodes x and y:

LHN(x, y) = |Γ(x) ∩ Γ(y)| / (k(x) · k(y))

Path Distance. This is essentially a global method in which the network's global structure is exploited to generate a measure from which link likelihood is estimated. It is the measured distance between two nodes, identifying their closeness. Dijkstra's algorithm can be applied to retrieve the shortest path, but it is inefficient for large, complex social networks. This measure is also known as the geodesic distance between two nodes.
Katz. It considers all the paths across two nodes and designates the shortest path
with highest value. The approximation would reduce exponentially the involvement
of the path in a way to assign lower values to the longer paths; mathematically, it is
represented as


n
Katz(x y) = β < path(x, y)
l=0
24 P. K. Bhanodia et al.

where β is a damping parameter that controls how strongly paths of increasing
length are taken into account.
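For intuition, the Katz sum can be computed on a toy graph: for β below the reciprocal of the largest eigenvalue of the adjacency matrix A, the infinite series has the well-known closed form (I − βA)^(-1) − I, since (A^l)[x, y] counts paths of length l between x and y. The matrix and β below are illustrative, not from the paper:

```python
import numpy as np

# Illustrative Katz computation on a toy graph (not the paper's dataset).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

beta = 0.1  # must satisfy beta < 1 / largest eigenvalue of A for convergence
assert beta < 1.0 / max(abs(np.linalg.eigvals(A)))

# Closed form of the infinite sum over path lengths
S = np.linalg.inv(np.eye(4) - beta * A) - np.eye(4)

# Truncated sum for comparison: sum_{l=1..n} beta^l * A^l
S_trunc = sum(beta**l * np.linalg.matrix_power(A, l) for l in range(1, 20))
print(np.allclose(S, S_trunc, atol=1e-6))  # the series converges to the closed form
```

The closed form makes Katz tractable on moderate graphs, although inverting the matrix is still far costlier than the purely local indices above.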

3 Experimental Methods and Material

As discussed in the literature review, social networks are exploited on the basis
of their graph structures. Various methods have been used to compute links between
nodes; these methods typically vary with the nature of the network, such as
information networks, business networks, friendship networks and so on, and
therefore no single method can effectively address the link prediction problem.
To simplify, the problem is solved here in two phases: in the first phase, the local
structure of the network is exploited and a new resultant network with additional
features is formed. The new network with these additional features is then processed
with machine learning techniques to build a classifier for link prediction in a social
network. The naïve Bayes classifier has been used for the experimental analysis.
Bayes Theorem. The theorem is used to find the probability of an event A given
that event B has occurred. Here, B is designated as the evidence and A as the
hypothesis. The attributes or predictors are assumed to be independent, i.e., the
presence of one specific attribute does not affect any other; this is why the classifier
is called naive. Bayes' theorem is expressed as

P(A|B) = P(B|A) P(A) / P(B)

The Naïve Bayes Model. This is a supervised learning classification algorithm,
specifically for binary-class and, in certain cases, multiclass classification problems.
The algorithm is best suited to data with binary or categorical input values; in our
social network dataset, link prediction is indeed a binary classification task. It is
also referred to as idiot Bayes or naive Bayes because the probability computation
for a hypothesis is greatly simplified: instead of trying to compute the joint
probability of the attribute values P(a1, a2, a3|h), the attributes are assumed
conditionally independent given the output value, so the probability is computed as
P(a1|h) ∗ P(a2|h) and so forth. The naive Bayes model is represented by
probabilities, and in order to learn the model a list of probabilities is stored in a
file: the class probabilities, along with the conditional probability of each input
value with respect to each class present in the training dataset.
As far as learning of the model is concerned, it is simple: the frequencies of the
instances belonging to each class and the probabilities of the different input values
of x are calculated; no additional optimization procedure is required.
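The counting-based learning described above can be sketched as a minimal hand-rolled naive Bayes for binary features. The tiny dataset below is invented for illustration; a real experiment would use a library implementation:

```python
from collections import defaultdict

# Minimal naive Bayes for binary features: priors and per-class conditional
# probabilities are just frequencies, with no optimization step.

def train(X, y):
    classes = set(y)
    prior = {c: y.count(c) / len(y) for c in classes}
    cond = defaultdict(dict)  # cond[c][(i, v)] = P(feature i == v | class c)
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        for i in range(len(X[0])):
            for v in (0, 1):
                count = sum(1 for r in rows if r[i] == v)
                # Laplace smoothing avoids zero probabilities
                cond[c][(i, v)] = (count + 1) / (len(rows) + 2)
    return prior, cond

def predict(prior, cond, x):
    def score(c):
        p = prior[c]
        for i, v in enumerate(x):
            p *= cond[c][(i, v)]  # conditional independence assumption
        return p
    return max(prior, key=score)

X = [(1, 0), (1, 1), (0, 1), (0, 0)]  # invented feature vectors
y = [1, 1, 0, 0]                       # invented link/no-link labels
prior, cond = train(X, y)
print(predict(prior, cond, (1, 0)))    # predicts class 1
```

In a link prediction setting the feature vectors would be the structural scores (e.g., Adamic/Adar, Jaccard) computed in the first phase, binned or thresholded into categorical values.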
An Efficient Link Prediction Model Using Supervised … 25

The experimental study is evaluated over a Wikipedia network, the dataset for which
is downloaded from the SNAP website. The performance parameters used for the
analysis are precision, recall, F1 score and accuracy. The link prediction problem
is a kind of binary classification problem in which a positive link between nodes
designates the presence of a link and a negative link designates the absence of a
potential link. Precision is determined by dividing the true positive count by the
sum of the true positive and false positive counts. Sensitivity, or recall, is determined
by dividing the true positive count by the sum of the true positive and false negative
counts. The equations for performance evaluation are as follows.

Precision = True Positive / (True Positive + False Positive)

Recall = True Positive / (True Positive + False Negative)

F1 score = 2 ∗ True Positive / (2 ∗ True Positive + False Negative + False Positive)
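As a quick sanity check, the three metrics can be computed directly from confusion counts. The counts below are invented, not the paper's results:

```python
# Sanity check of the metrics above using invented confusion counts.
tp, fp, fn = 90, 10, 20

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * tp / (2 * tp + fn + fp)

print(round(precision, 3), round(recall, 3), round(f1, 3))
# The F1 score is also the harmonic mean of precision and recall:
assert abs(f1 - 2 * precision * recall / (precision + recall)) < 1e-12
```

The final assertion confirms that the F1 formula above is algebraically identical to the harmonic mean of precision and recall.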

4 Experimental Study

The dataset used for the experimental analysis is the vote history data. It includes
around 2794 nodes and around 103,747 votes cast among 7118 users, including
existing admins (either voting or being voted on). About half of the votes in the
dataset are by existing admins, while the other half comes from ordinary Wikipedia
users. The dataset is downloaded from https://snap.stanford.edu/data/wiki-Vote.html.
The network nodes are users, and a directed edge from node i to node j designates
that user i has voted on user j.
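Loading such an edge list is straightforward: SNAP files are whitespace-separated "from to" pairs with "#" comment lines. The inline sample below stands in for the downloaded wiki-Vote.txt:

```python
# Sketch of loading a SNAP-style edge list such as wiki-Vote.txt,
# where each non-comment line "i<TAB>j" means user i voted on user j.
# The sample text below stands in for the downloaded file.

sample = """# Directed graph: wiki-Vote.txt
# FromNodeId\tToNodeId
30\t1412
30\t3352
3\t28
"""

edges = []
for line in sample.splitlines():
    if not line or line.startswith("#"):
        continue  # skip header/comment lines
    src, dst = map(int, line.split())
    edges.append((src, dst))

nodes = {n for edge in edges for n in edge}
print(len(edges), len(nodes))  # 3 edges over 5 distinct users
```

For the real file, the same loop would read from `open("wiki-Vote.txt")` instead of the inline sample.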
The naive Bayes classification technique is used to create a classifier model
for link prediction in a social network. The model is created using stratified tenfold
cross-validation. It can be observed from Table 1 below that the classifier has
predicted around 90.37% of the instances correctly, leaving around 9.62% of the
instances incorrectly classified. The total time taken for building the model is
0.03 s, which is not much; although the selected network may be rather small and
the time may increase on real data in future, it is reasonably fair.

Table 1 Detailed accuracy of the model classifier


Model True positive False positive Precision Recall F-measure
AA + Naïve Bayes 1.000 0.750 0.901 1.000 0.948
JC + Naïve Bayes 0.991 0.000 1.000 0.991 0.996

Naive Bayes, when combined with Jaccard's coefficient, has produced significantly
better results, with accuracy improved to 99.12%. The classifier model is built in
negligible time; it has correctly classified around 340 instances against three
incorrectly classified instances. Figures 2 and 3 represent the classification of true
instances and false instances of the network.

Fig. 2 Graphical representation of classifier (AA + Naïve Bayes)

Fig. 3 Graphical representation of classifier (Jaccard’s coefficient + Naïve Bayes)



5 Conclusion and Future Enhancement

In this paper, online social networks are studied from the point of view of link
prediction between pairs of nodes in a large-scale online social network. In the
process, we have introduced various local and global classical techniques that
produce a measure used for identifying a potential link between nodes. These
dyadic structural techniques have been studied here together with supervised machine
learning techniques: Adamic/Adar and Jaccard's coefficient are combined with the
naive Bayes classification technique to build a classifier. The experimental analysis
shows that the use of Jaccard's coefficient with naive Bayes has produced more
accurate results than the former combination. Although the results show some
over-fitting compared to the previous approach, which is reasonably fair as well,
the latter approach is nevertheless superior in accuracy. The model was trained and
tested on only one type of social network; exploiting other types of social networks
may produce significant results toward generalizing the model to other online
social networks.

References

1. Liben-Nowell, D., & Kleinberg, J. (2007). The link prediction problem for social networks.
Journal of the American Society for Information Science and Technology, 58(7), 1019–1031.
2. Kautz, H., Selman, B., & Shah, M. (1997). Referral web: Combining social networks and
collaborative filtering. Communications of the ACM, 40(3), 63.
3. Chen, J., Geyer, W., Dugan, C., Muller, M., & Guy, I. (2009). Make new friends, but keep the old:
Recommending people on social networking sites. In: Proceedings of the 27th International
Conference on Human Factors in Computing Systems, ser. CHI'09 (pp. 201–210). New York:
ACM. https://doi.acm.org/10.1145/1518701.1518735.
4. Airoldi, E. M., Blei, D. M., Fienberg, S. E., & Xing, E. P. (2006). Fixed membership stochastic
block models for relational data with application to protein-protein interactions. In Proceedings
of International Biometric Society-ENAR Annual Meetings.
5. Huang, Z., Li, X., & Chen, H. (2005). Link prediction approach to collaborative filtering. In
Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries.
6. Hasan, M. A., Chaoji, V., Salem, S., & Zaki, M. (2006). Link prediction using supervised
learning. Counter terrorism and Security: SDM Workshop of Link Analysis.
7. Pandey, B., Bhanodia, P. K., Khamparia, A., & Pandey, D. K. (2019). A comprehensive survey
of edge prediction in social networks: Techniques, parameters and challenges. Expert Systems
with Applications, Elsevier. https://doi.org/10.1016/j.eswa.2019.01.040.
8. Newman, M. E. J. (2001). Clustering and preferential attachment in growing networks. Physical
Review E.
9. Adamic, L. A., & Adar, E. (2003). Friends and neighbors on the web. Social Networks, Elsevier,
25(3), 211.
10. Kunegis, J., Blattner, M., & Moser, C. (2013). Preferential attachment in online networks:
Measurement and explanations. In Proceedings of the 5th Annual ACM Web Science Conference
(WebSci’13) (pp. 205–214). New York: ACM.
11. Zhou, T., Lü, L., & Zhang, Y. C. (2009). Predicting missing links via local information. The
European Physical Journal B, 71, 623. https://doi.org/10.1140/epjb/e2009-00335-8.
Optimizing Cost and Maximizing Profit
for Multi-Cloud-Based Big Data
Computing by Deadline-Aware Optimize
Resource Allocation

Amitkumar Manekar and G. Pradeepini

Abstract Cloud computing is among the most powerful and in-demand technologies
for businesses in this decade. "Data is the future oil" can be proved in many ways,
as most businesses and corporate giants are deeply concerned about their business
data. In fact, to accommodate and process this data, a very expensive platform that
can work efficiently is required. Researchers and many professionals have proved
and standardized certain cloud computing standards, but some modifications and
major research toward big data processing in multi-cloud infrastructure still need
to be investigated. Reliance on a single cloud provider is challenging with respect
to services such as latency, QoS and non-affordable monetary cost to application
providers. We propose an effective deadline-aware resource management scheme
through novel algorithms, namely job tracking, resource estimation and resource
allocation. In this paper, we discuss two algorithms in detail and experiment in a
multi-cloud environment: first we check the job tracking algorithm, and then the
job estimation algorithm. Utilization of multiple cloud service providers is a
promising solution for an affordable class of services and QoS.

Keywords BDA · Resource allocator · Cloud computing · Optimization · Fair
share · Cost optimization

1 Introduction

The last decade was a "data decade." Many multinational companies have changed
their modes of operation based on data analysis. Big data and data analysis are
essential and mandatory for every industry. Companies like Amazon, Google and
Microsoft are ready with their data processing platforms completely based on the
cloud [1]; in other
A. Manekar (B) · G. Pradeepini


CSE Department, KLEF, Green Fields, Vaddeswaram, Andhra Pradesh 522502, India
e-mail: asmanekar24@gmail.com
G. Pradeepini
e-mail: pradeepini.gera@gmail.com

© The Editor(s) (if applicable) and The Author(s), under exclusive license 29
to Springer Nature Singapore Pte Ltd. 2021
A. Khanna et al. (eds.), Recent Studies on Computational Intelligence,
Studies in Computational Intelligence 921,
https://doi.org/10.1007/978-981-15-8469-5_3
30 A. Manekar and G. Pradeepini

sense, all social media companies are also targeting the cloud as a prominent
solution. Netflix and YouTube have already started using the cloud [2]. Cloud
computing has proved to be a very effective and reliable solution for multivariate
huge data. Still, researchers and professionals are working to extract more and more
possibilities from the existing cloud structure. One of the major and critical tasks
is resource provisioning in a multi-cloud architecture. We try to solve some of
the issues in multi-cloud architecture by implementing a prominent algorithm in it.
Cloud computing is available to each of us in three forms [3]. The foremost is the
Public Cloud Platform, in which third-party providers are responsible for delivering
services over the public cloud. In most cases, these services may be free or sold
by service providers on demand; sometimes customers pay only per usage for the
CPU cycles, storage or bandwidth they consume for their applications [4–6]. Second
is the Private Cloud Platform, in which the entire infrastructure is privately owned
by the organization and completely maintained and managed via internal resources.
If it is too difficult for an organization to maintain and manage the entire
infrastructure, it can own a VPC (Virtual Private Cloud), where a third-party cloud
provider owns the infrastructure but it is used on the organization's premises [7–9].
The third is the Hybrid Cloud Platform; as the name indicates, it mixes computing
resources from public and private services. This platform is rapidly adopted by many
as a cost-saving and readily available on-demand option for fast-moving digital
business transformation. Cloud providers have made their infrastructure distributed
by expanding data centers in different geographical regions worldwide [4–6]; Google
itself operates 13 data centers around the globe. Managing distributed data centers
while maximizing profit is a current problem. Ultimately, the customer is affected by
the high cost and maintenance charges of these data centers. This cost is bounded by
four principles for applications serving big data. Numerous cost-effective and
time-effective parallel tools are available for big data processing with the
programming paradigm. The master player in every such tool or big data application
is resource management, which uses the available resources and manages trade-offs
between cost and result. Complexity, scale, heterogeneity and hard predictability
are the key factors of these big data platforms. Complexity lies at the inner core
of the architecture and consists of proper scheduling of resources, power
management, the storage system and much more. Scale depends on the target problem:
data dimensions and parallelism under tight deadlines [10]. Heterogeneity is a
technology need, concerning maintainability and the evolving nature of the hardware.
Hard predictability is nothing but the combination of the three major factors
explained earlier together with the combined effect of hardware trade-offs.

2 Literature Survey and Gap Identification

Inacio and Dantas in 2014 specified a characterization [11] that deals with
optimization problems related to large datasets and mentioned that scale exacerbates
them. A variety of aspects affect the performance of scheduling policies, such as data volume
Optimizing Cost and Maximizing Profit for Multi-Cloud-Based Big Data … 31

(storage), data variety, data velocity, security and privacy, cost, connectivity and data
sharing [12, 13]. The resource manager can be organized in a two-layer architecture,
as shown in Fig. 1. The job scheduler [12] is responsible for allocating resources to
facilitate the execution of the various jobs running at the same time.
Figure 1 represents the local executable resource scale, which exacerbates known
management and dimensioning problems, both in relation to architecture and to
resource allocation and coordination [14, 15]. The task-level scheduler, on the other
hand, decides how to assign tasks to multiple task executors for each job [10, 16].
The cluster scheduler treats each job as a black box and executes a general policy
and strategy. Our effort is that, by exploiting application-specific features,
we finally optimize resource scheduling decisions and achieve better performance
for advanced data analytics [17].
Figure 2 shows various open-source big data resource management frameworks
[18]. In much of the literature, it is observed that most of the available big
data processing frameworks are open source. Some proprietary frameworks have
license fees and require specialized high-end infrastructure.
On the contrary, open-source frameworks use commodity hardware with marginal
variation in requirements. Basically, Spark is a mainstream data streaming framework
that is favored by industry, can be extended, and is ultimately used for data analysis
in various IoT-based applications. YARN is the heart of Hadoop and handles
global resource management (ResourceManager) and per-application management
(ApplicationMaster) [19–21].
As far as research gap identification and problem formulation are concerned, the
following observations are made.

Fig. 1 Hierarchical resource management system



Fig. 2 Classification of big data resource management frameworks

1. Apache Spark, with its fault tolerance mechanism and characterization to support
data streaming, is a prominent platform.
2. Spark MLlib and Flink-ML offer a variety of machine learning algorithms and
utilities to exploit distributed and scalable big data applications.
3. More focus should be placed on a few issues such as throughput, latency
and machine learning.
4. With deadline-aware job tracking and scheduling, resources should be managed
instead of fine-grained splitting of the resource pool when deadlines are not being met.
5. Deadlines should be achieved without wasting resources on IoT workloads in a
resource-constrained environment.

3 Problem Formulation

Missing a deadline disturbs an entire large data-intensive processing job, leads to
under-utilized resources, incurs cloud usage cost for both the cloud service
provider and the user, and leads to poor decision making [22, 23]. To address this issue,
we designed a framework that is framework-agnostic and does not rely on job
repetition or pre-emption support. On the other side, in this work the focus is on
utilizing job histories and statistics to control job admission. Instead of traditional fair
share resource utilization, we design a deadline-aware optimized resource allocation
policy by implementing two algorithms: one for job tracking, and the other for resource
estimation and resource allocation [8, 24]. The second algorithm is considered

based on a single decision made for effective deadline-aware resource allocation.
Let us discuss the actual problem formulation.
To overcome such issues, some research objectives are drawn based on an intensive
literature review and research gap identification. Our objectives for the formulation
of the problem are listed below.
1. Design a policy for improving deadline compliance and fairness in resource
allocation by using past workload data and deadline information for a more
diverse workload.
2. Use the information from objective 1 to estimate the fraction of the requested
resources needed to complete a job before its deadline expires, under hard and
soft deadline scenarios.
3. Optimize the result of the fair share policy for allocating resources while
running with fewer or greater resources according to the changing workload, with
strategies for repetition of jobs for allocation in a fair share.
4. Compare the result with the YARN resource allocator and demonstrate the
improvement, finding the best possible solution with low cost for cloud
providers and users in IoT workloads.
With the above research objectives and by considering certain scenarios, a
framework for a fair share resource allocator has been proposed. In this framework,
the goal is to meet all objectives one by one. In this section, two algorithms are
proposed to fulfill these objectives: the first is a job tracking algorithm, and the
second is a job estimation algorithm. The proposed algorithms are constructed with a
view to fair share and deadline-aware allocation with admission control by resource
negotiation. Our approach is to negotiate CPU and other commodity resources for
a job execution to meet deadlines in a resource-constrained environment.
To execute a newly arrived job, note should be taken of the execution time
data of previous jobs. By analyzing each job, an estimate can be drawn of the minimum
number of CPUs that the job would have needed to meet its deadline; this can
be noted as CPU_Deadline. CPU_Deadline is calculated as the fraction obtained by
dividing the compute time of the job by the deadline given (CPU_Deadline = C_time / deadline).
A maximum number of CPUs can be assigned to any job (MCpus). Algorithm 1 describes
the Job_Tracking algorithm, which calculates the deadline rate and estimates the
minimum number of CPUs.
Algorithm—Job tracking algorithm and resource allocation system based on Apache
Spark—For the execution of the algorithm, an application is submitted to a Spark
cluster with the desired RSA with respect to the available computing resources, such
as CPU, memory (M) and total executors (Ex) per application. Prior knowledge of the
total resource amount the cluster needs is essential.
A. In Apache Spark, master and worker nodes are deployed on cloud
virtual machines. We assume that these virtual machines (VMs) are used
homogeneously; by extension, the assumption is made that all virtual machines
have the same computational power, i.e., the same CPU cores, storage and
computational memory [25].

Algorithm 1 Job_Tracking
1  Initiation of Asp
2  Accept Fun Job_Track(C_time, R_Task, D, N_cpuAllo, R)
3  CPU_Deadline = C_time / D
4  M_Cpus = min(Req_Task, CC)
5  ReqMinRate = CPU_Deadline / Max_CPU
6  ReqMinList.add(ReqMinRate)
7  CPU_Frac_Min = min(ReqMinList)
8  CPU_Frac_Max = max(ReqMinList)
9  CPU_Frac_Last = N_cpuAllo / CPU_Frac_Max
10 Success_Last = Success
11 Function Ends
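A possible reading of Algorithm 1 in plain Python is sketched below. The variable roles (compute time, deadline, cluster capacity CC, per-job cap MaxCPU) are my interpretation of the pseudocode, not the authors' implementation:

```python
# Sketch of the Job_Tracking bookkeeping in Algorithm 1. Variable roles
# follow one reading of the pseudocode and may differ from the authors' code.

class JobTracker:
    def __init__(self, cluster_cpus, max_cpu):
        self.cluster_cpus = cluster_cpus  # CC: total CPUs in the cluster
        self.max_cpu = max_cpu            # Max_CPU: per-job CPU cap
        self.req_min_list = []            # history of minimum CPU rates

    def track(self, c_time, req_task, deadline, n_cpu_alloc, success):
        cpu_deadline = c_time / deadline           # CPUs needed to finish in time
        m_cpus = min(req_task, self.cluster_cpus)  # usable parallelism
        req_min_rate = cpu_deadline / self.max_cpu
        self.req_min_list.append(req_min_rate)
        cpu_frac_min = min(self.req_min_list)
        cpu_frac_max = max(self.req_min_list)
        cpu_frac_last = n_cpu_alloc / cpu_frac_max
        self.success_last = success                # remembered for later decisions
        return cpu_deadline, m_cpus, cpu_frac_min, cpu_frac_last

# Hypothetical job: 120 s of compute time with a 60 s deadline
tracker = JobTracker(cluster_cpus=64, max_cpu=16)
cpu_deadline, m_cpus, frac_min, frac_last = tracker.track(
    c_time=120.0, req_task=32, deadline=60.0, n_cpu_alloc=4, success=True)
print(cpu_deadline, m_cpus)  # needs at least 2 CPUs; can use up to 32
```

The history list is what lets the allocator compare a new job's minimum CPU rate against past jobs when making admission decisions.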

Algorithm 2—Job estimation based on Apache Spark—For the execution of the
algorithm, an application is submitted to a Spark cluster with the desired RSA
with respect to the available computing resources, such as CPU, memory (M) and
total executors (Ex) per application. The algorithm specifies a fair resource
allocation system (FRAS) based on Apache Spark. Prior knowledge of the total
resource amount the cluster needs is essential. From Algorithm 1, the analyzed data
of each previously completed job are used so that jobs finish while meeting their
deadlines with maximum parallelism; Dimopoulos et al. [18] named their algorithm
Justice. In this paper, we have tried to implement it in the same way as in [26],
with modifications with respect to our objectives drawn earlier in this section.
This algorithm admits all jobs as a condition for bootstrapping the system, acting
as a fair share resource allocator first.

Algorithm 2 Job_Estimation
1 Initiation of Asp
2 Accept Fun Job_Estimation(C_time, R_Task, D, N_cpuAllo, R)
3 CPU_Deadline = C_time / D
4 M_Cpus = min(Req_Task, CC)
5 ReqMinRate = CPU_Deadline / Max_CPU
6 If ReqMinRate > CPU_Frac then
7   CPU_Frac = ReqMinRate
8 End If
9 Function Ends
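Similarly, the decision step of Algorithm 2 can be sketched as follows; again, the variable roles are interpreted from the pseudocode rather than taken from the authors' implementation:

```python
# Sketch of the Job_Estimation step in Algorithm 2: the allocator keeps a
# running CPU fraction and raises it whenever a job's minimum required rate
# exceeds the current value (lines 6-8 of the listing above).

class JobEstimator:
    def __init__(self, max_cpu):
        self.max_cpu = max_cpu  # Max_CPU: per-job CPU cap
        self.cpu_frac = 0.0     # current fair-share CPU fraction

    def estimate(self, c_time, deadline):
        cpu_deadline = c_time / deadline           # CPUs needed to meet the deadline
        req_min_rate = cpu_deadline / self.max_cpu
        if req_min_rate > self.cpu_frac:           # only ever ratchet upward
            self.cpu_frac = req_min_rate
        return self.cpu_frac

est = JobEstimator(max_cpu=16)
print(est.estimate(c_time=120.0, deadline=60.0))  # raises the fraction to 2/16
print(est.estimate(c_time=30.0, deadline=60.0))   # a smaller job does not lower it
```

Keeping the fraction monotone means a burst of small jobs cannot undercut the allocation needed by earlier, more demanding jobs.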

4 Static and Dynamic Performance Analysis

We propose a novel algorithm for a resource-constrained environment with
deadlines, for resource-constrained cluster capacities (numbers of CPUs). The static
and dynamic performance of these algorithms will be evaluated on certain
parameters. We are very hopeful about the performance analysis after experimenting on
fairness evaluation, deadline satisfaction, efficient resource usage and cluster
utilization. Basically, all these parameters will be very helpful for enhancing the static
and dynamic stability of a multi-cloud environment of resources provided by
various cloud service providers. A multi-cloud environment provided by various
service providers may or may not be situated in the same geographical area
and also may not perform identically. The proposed set of algorithms is
statistically evaluated on the said parameters by using a simulation developed
in Java and Python over the different BDA analysis tools. The proposed set of
algorithms addresses problems related to fairness evaluation, deadline satisfaction,
efficient resource usage and cluster utilization, and facilitates appropriate selection and
management. Ultimately, the cost of data-intensive applications is minimized, while
the QoS specified by users is met. A discussion of the expected results on the basis
of these algorithms follows.
A. Fairness Evaluation—Fair share mechanisms can violate fairness in certain
conditions, e.g., when CPU demands exceed the available capacity. Fairness
violations happen because future workload prediction is complicated and not
anticipated by this kind of mechanism. A job with a high resource demand may
not get resources and may be dropped from execution. Usually, jobs waiting in the
queue can miss their deadlines due to a heavy workload. The proposed algorithms
mitigate this kind of problem with fair share resource allocation and deadline
awareness in a resource-constrained environment.
B. Deadline Satisfaction—The admission control mechanism is very important for this
parameter. Without admission control, admitted jobs may not meet their deadlines;
ultimately, unnecessary queuing of jobs and resource congestion may lead to
jobs being dropped from the execution queue. The proposed algorithm is trained
to achieve a larger fraction of deadline successes overall.
C. Efficient Resource Usage—For a fixed workload, resource scarcity is not a
problem; moreover, the proposed algorithm gives jobs a fair chance to get extra
resources, with provision to expand demands for extra resources during execution.
This is carried out conservatively, prioritizing fairness and deadline success
over resource saturation.
D. Cluster Utilization—It is very complicated for a fair share resource allocator
to enhance cluster utilization without implementing admission control and proper
resource utilization techniques. Hence, the proposed algorithm takes care to
execute more workload without making the CPUs too busy, by analyzing the
duration of idle CPU time.

5 Experimental Setup

Existing fair share resource allocators do not take the deadline of each individual
job into consideration. The general assumption in this kind of resource allocation
is that every job can run indefinitely and that there is no limit on the turnaround time a
job's owner is willing to tolerate. The proposed algorithm will be implemented
on top of the basic allocator by considering the job deadline for the resource-constrained
environment. A trace-based simulation developed in Python and Java
for admission control at job submission will give the desired result. We
are in the phase of implementing this for different resource-constrained clusters
with a variety of hardware configurations. For the entire experimental setup, nodes
run on an Ubuntu 12.04 Linux system with the MapReduce Hadoop stack.

6 Results Obtained and Work in Progress

The proposed algorithm is promising in tracking the success of its allocation
decisions and improves its future allocations accordingly. Every time a job completes,
it updates a cluster-wide model that includes information about the duration, size,
maximum parallelization, deadline and provided resources for each job. If the job was
successful, the proposed algorithm becomes more optimistic, providing the jobs that
follow with fewer resources in the hope that they will still meet their deadlines.
Next, we compare the result of the proposed algorithm with the existing methodology
in the big data analytics framework. The novelty of the proposed algorithm is that,
if a job is unsuccessful, Justice provides more conservative allocations to make
sure no more jobs miss their deadlines.

7 Expected Contribution to the Literature

Our research aims to satisfy deadlines and preserve fairness to enable reliable
use of multi-analytic systems in resource-constrained clusters. It achieves this in
a framework-agnostic way by utilizing admission control and predicting resource
requirements without exploiting job repetitions. A key point of our research is its
applicability without costly modifications and maintenance in existing popular open-
source systems like Apache Mesos and YARN. Thus it requires minimal effort to
integrate with the resource manager without the need to adapt to API or structural
changes of the processing engines.

8 Conclusion and Future Work

Modern big data analytics systems are designed for very-large-scale, fault-tolerant
operation, which gives new impetus to the corporate industry. Every sector of
industry, whether healthcare, tourism, bioinformatics, education, finance,
e-commerce, social networks, sports or much more, requires fast analysis and strong
support from big data analysis. The advent of IoT brings the combined operation of big
data processing systems into smaller, resource-constrained and shared clusters. With
the advancement of cloud-enabled big data, processing jobs are assigned
with low latency, and fair share resource allocation and deadline optimization are the
challenges. We try to mitigate the problem with the proposed algorithms in a
convincing way, which can lead to faster and more prominent BDA for the various
available tools like Hadoop, Spark, etc. Our proposed algorithms are in the
implementation phase. As future work, we are trying to implement this work as a
lightweight API-based integration module for resource management.

References

1. Gera, P., et al. (2016). A recent study of emerging tools and technologies boosting big data
analytics.
2. Shvachko, K., et al. (2010). The Hadoop distributed file system. In Proceedings of the 26th
IEEE Symposium on Mass Storage Systems and Technologies (MSST), Washington, DC, USA.
3. George, L. (2011). HBase: The definitive guide: Random access to your planet-size data.
O’Reilly Media, Inc.
4. Ghemawatand, S., & Dean, J. (2008). Mapreduce: Simplified data processing on large clusters.
Communications of the ACM.
5. Malik, P., & Lakshman, A. (2010). Cassandra: A decentralized structured storage system. ACM
SIGOPS OS Review.
6. Zaharia, M., et al. (2012). Resilient distributed datasets: A fault-tolerant abstraction for in-
memory cluster computing. In Proceedings of the 9th USENIX Conference on Networked
Systems Design and Implementation, San Jose, CA.
7. Vavilapalli, V. K., et al. (2013). Apache Hadoop yarn: Yet another resource negotiator. In
Proceedings of the 4th ACM Annual Symposium on Cloud Computing, Santa Clara, California.
8. Hu, M., et al. (2015). Deadline-oriented task scheduling for MapReduce environments. In
International Conference on Algorithms and Architectures for Parallel Processing (pp. 359–
372). Berlin: Springer.
9. Golab, W., et al. (2018). OptEx: Deadline-aware cost optimization for spark. Available
at https://github.com/ssidhanta/OptEx/blob/master/optex_technical.pdf, Technical Report, 01
2018.
10. Hindman, B., et al. (2011). Mesos: A platform for fine-grained resource sharing in the data
center. In NSDI (pp. 22–22).
11. Siddiqi, F. (2018). Netflix at Spark+AI Summit 2018.
12. Laney, D., et al. (2001). 3D data management: Controlling data volume, velocity, and variety.
13. Hindman, B., et al. (2011). Mesos: A platform for fine-grained resource sharing in the data
center. In Proceedings of the 8th USENIX Conference on Networked Systems Design and
Implementation, Boston, MA, USA.

14. Ghemawat, S., et al. (2004). MapReduce: Simplified data processing on large clusters.
In Proceedings of the 6th Conference on Symposium on Operating Systems Design &
Implementation (Vol. 6 of OSDI’04, pp. 10–10).
15. Pradeepini, G., et al. (2016). Cloud-based big data analytics a review. In Proceedings—2015
International Conference on Computational Intelligence and Communication Networks, CICN
IEEE 2016 (pp. 785–788).
16. Misra, V., et al. (2007). PBS: A unified priority-based scheduler. In ACM SIGMETRICS
Performance Evaluation Review (Vol. 35. 1, pp. 203–214). ACM.
17. Zaharia, M., Das, T., & Armbrust, M., et al. (2016). Apache spark: A unified engine for big
data processing. Communications of the ACM.
18. Dimopoulos, S., & Krintz, C., et al. (2017). Justice: A deadline-aware, fair-share resource
allocator for implementing multi-analytics. In 2017 IEEE International Conference on Cluster
Computing (CLUSTER) (pp. 233–244).
19. Jette, M. A., et al. (2003). Slurm: Simple Linux utility for resource management. In Workshop
on Job Scheduling Strategies for Parallel Processing (pp. 44–60). Berlin: Springer.
20. Pradeepini, G., et al. Experimenting cloud infrastructure for tomorrows big data analytics.
International Journal of Innovative Technology and Exploring Engineering, 8(5), 885–890.
21. Cheng, S., et al. (2016). Evolutionary computation and big data: Key challenges and future
directions. In Proceedings of the Data Mining and Big Data, First International Conference,
DMBD 2016, Bali, Indonesia, (pp. 3–14).
22. Singer, G., et al. (2010). Towards a model for cloud computing cost estimation with reserved
instances. CloudComp.
23. Xiong, N., et al. (2015). A walk into metaheuristics for engineering optimization: Principles,
methods, and recent trends. International Journal of Computational Intelligence Systems, 8,
606–636.
24. https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarnsite/FairScheduler.html. for
YARN Fair Scheduler.
25. Chestna, T., & Imai, S., et al. Accurate resource prediction for hybrid IAAS clouds using
workload-tailored elastic compute units. ser. UCC’13.
26. Pradeepini, G., et al. (2017). Opportunity and challenges for migrating big data analytics in
cloud. In IOP Conference Series: Materials Science and Engineering.
A Comprehensive Survey on Passive Video Forgery Detection Techniques

Vinay Kumar, Abhishek Singh, Vineet Kansal, and Manish Gaur
Abstract In recent years, video forgery detection has become a significant issue in video forensics. Unauthorized changes to video frames degrade the authenticity and integrity of the original content. With advances in technology, video-processing tools and techniques for altering recordings are readily available. Detecting modifications to a video is essential, since such videos may be used in authentication processes; video authenticity therefore needs to be verified. A video can be tampered with in several ways, for example frame insertion, deletion, duplication, copy-move, splicing and so on. This article reviews forgery detection techniques, namely inter-frame, intra-frame and compression-based forgery detection, that can be used for video tampering detection. A thorough analysis of recently developed passive video forgery detection techniques helps identify open problems and new opportunities in this area.

Keywords Video forgery detection · Video tamper detection · Passive-blind video


forensic · Video authentication

V. Kumar · M. Gaur
Department of Computer Science and Engineering, Centre for Advanced Studies,
Dr. A.P.J Abdul Kalam Technical University, Lucknow, India
e-mail: vinay.kumar@cas.res.in
M. Gaur
e-mail: director@cas.res.in
A. Singh (B) · V. Kansal
Department of Computer Science and Engineering, Institute of Engineering and Technology
Lucknow, Dr. A.P.J Abdul Kalam Technical University, Lucknow, India
e-mail: 2216@ietlucknow.ac.in
V. Kansal
e-mail: vineetkansal@ietlucknow.ac.in

© The Editor(s) (if applicable) and The Author(s), under exclusive license
to Springer Nature Singapore Pte Ltd. 2021
A. Khanna et al. (eds.), Recent Studies on Computational Intelligence,
Studies in Computational Intelligence 921,
https://doi.org/10.1007/978-981-15-8469-5_4

1 Introduction

Video forgery in the modern era demands significant attention. The prime reason is that multimedia has become the preferred medium for transmitting information, owing to its low encryption cost. Information in multimedia is processed frame by frame. Because this mechanism is so widely used in transmission, it is maliciously attacked by hackers who alter frames. To this end, researchers use distinct mechanisms to perform encryption and to detect any forgery within video frames.
Digital video tampering modifies or changes the contents of a video to produce a doctored or fake video [1]. Attacks that change video can be divided into three domains: spatial, temporal and spatial–temporal. Tampering can be done using various techniques [2]. The following types of tampering are applied to videos:
• Shot-level tampering: A scene is detected in the video and then copied to another place, or manipulated in place. This tampering operates at the temporal or spatial level.
• Frame-level tampering: Frames are first extracted from the video and then tampered with. The forger may remove, add or copy frames to change the video's contents. It is a temporal tampering mechanism used to alter frames within the video.
• Block-level tampering: It is applied to blocks of video, i.e., any specified area of the video frames. Blocks are cropped and replaced in the video. It is spatial tampering performed at the block level.
• Pixel-level tampering: Video frames are changed at the pixel level, where pixels are modified, copied or replaced [3]. Spatial attacks are performed at the pixel level.
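The frame-level operations listed above can be sketched in a few lines of code. This is an illustrative toy in which integer indices stand in for real frames; all function names here are our own, not taken from any forensic tool.

```python
# Toy sketch of frame-level tampering: integer frame indices
# stand in for real video frames. Names are illustrative only.

def insert_frames(frames, position, new_frames):
    """Frame insertion: splice foreign frames into the sequence."""
    return frames[:position] + list(new_frames) + frames[position:]

def delete_frames(frames, start, count):
    """Frame deletion: remove a contiguous run of frames."""
    return frames[:start] + frames[start + count:]

def duplicate_frames(frames, start, count, position):
    """Frame duplication: copy a run of frames to another position."""
    clip = frames[start:start + count]
    return frames[:position] + clip + frames[position:]

original = list(range(10))               # frames 0..9
forged = delete_frames(original, 3, 2)   # drop frames 3 and 4
print(forged)                            # [0, 1, 2, 5, 6, 7, 8, 9]
```

A detector's task is the inverse of these operations: given only `forged`, infer that frames were removed, added or copied.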

1.1 Video Forensic

The last decade has seen video forensics become an important field of study. As shown in Fig. 1, it is divided into three categories [4–6].

Fig. 1 Types of video forensic: source video identification, differentiating original and edited video, and forgery detection



The categories are source recognition, the ability to distinguish between computer-generated and actual video, and the detection of forgeries. The first group focuses on identifying the source of a digital product, such as mobile phones, camcorders and cameras. The second aims to differentiate between real and edited videos. The third, forgery detection, aims to find proof of tampering in digital video data.

1.2 Objective of Digital Video Forensic

Digital video forensics is concerned with three main tasks, as shown in Fig. 2. To tackle the challenge of digital content authentication, the video forensics area provides a set of tools and techniques known collectively as tamper or forgery detection techniques. Even minute adjustments to digital video or image content can cause real societal and legal problems. Altered recordings may be used to fabricate misleading news accounts or to mislead individuals. Large numbers of users manipulate media data on social networking sites such as Yahoo, Twitter, TikTok, Instagram, Facebook and YouTube.
This paper is organized as follows: Section 2 details the video forgery detection methods used to counter the above tampering methods and gives a qualitative analysis of passive video forgery detection; Sect. 3 presents a comparative analysis of different techniques; Sect. 4 presents highlights and issues in video forgery detection; and Sect. 5 concludes and presents the future scope of this paper.

2 Video Forgery Detection Mechanisms and Qualitative Analysis of Passive Techniques

Video forgery detection aims to establish the authenticity of a video and to expose
the potential modifications and forgeries that the video might have undergone [7].
Undesired post-processing operations or forgeries generally are irreversible and leave
some digital footprints. Video forgery detection techniques scrutinize these footprints

Fig. 2 Objectives of video forensic: camera identification, tamper detection and hidden data recovery



Fig. 3 Classification of video forgery detection: the active approach (digital signature, watermarking) and the passive approach (inter-frame, intra-frame and compression-based forgery detection)

in order to differentiate between the original and the forged videos. When a video is forged, some of its fundamental properties change, and detecting these changes is the task of video forgery detection techniques. There are two fundamental approaches to video forgery detection: the active approach and the passive approach, as shown in Fig. 3.

2.1 Active Approach

Active forgery detection includes techniques such as digital watermarking and digital signatures, which help authenticate content ownership and detect copyright violations [8]. Although the basic application of watermarking and signatures is copyright protection, they can also be used for fingerprinting, forgery detection, error concealment, etc. The active approach has several drawbacks: it requires a signature or watermark to be embedded during the acquisition phase at the time of recording, or an individual to embed it after the acquisition phase. This restricts the application of the active approach because of the need for distinctive hardware such as specially equipped cameras. Other factors that affect the robustness of watermarks and signatures include compression, scaling and noise.

2.2 Passive Approach

Passive forgery detection techniques are considered an advancing route in digital security. The approach works in contrast to the active approach: it requires neither specialized hardware nor any first-hand information about the video contents. Thus, it is also called the passive-blind approach. The basic assumption of this approach is that videos have some inherent properties or features that are consistent in original videos; when a video is forged, these patterns are altered. Passive approaches extract these features from a video and analyse them for different forgery detection purposes. Video forgery is

Fig. 4 Features used for video forgery detection: camera/sensor artefacts, coding artefacts, motion features and object features

sometimes not identified because of defects in the software system; such defects can be removed by predicting them early in the software [9, 10].
Different types of descriptive features have been used by various researchers to accomplish the task of forgery detection [11–16]. Figure 4 presents the features used for video forgery detection. Thus, to overcome the inefficiency of the active approach, the passive approach to video forgery detection can be used. The passive approach proves better than the active one because it works on first-hand information without needing extra information bits or hardware. It relies entirely on the available forged video data and its intrinsic features and properties, without needing the original video data.
To be specific, active techniques include motion detection mechanisms, and passive techniques include static mechanisms. Forgery under static mechanisms falls into inter-frame, intra-frame and compression-based mechanisms.

2.2.1 Inter-frame Forgery Detection

The inter-frame forgery detection mechanism exploits the temporal similarity between the frames of a video [17]. The parity difference between frames is used as a footprint to locate any anomalies within the video frames. This difference is exploited through even or odd parity checks: the parity check mechanism verifies whether the transmission contains an even or odd number of frames. If frames were sent with even parity but received with odd parity, forgery is detected. Table 1 describes the different inter-frame forgery detection techniques based on frame deletion, insertion and duplication, together with their advantages, disadvantages and accuracy results.
Table 2 describes the different inter-frame forgery detection techniques based on copy-frame analysis, together with their advantages, disadvantages and accuracy results.
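The temporal-similarity footprint described above can be illustrated with a minimal sketch (not the method of any cited paper): consecutive frames of an untampered shot change gradually, so a spliced frame produces an outlier in the inter-frame difference signal. The function names, threshold factor and synthetic data below are assumptions for illustration.

```python
# Minimal sketch of inter-frame forgery detection via temporal similarity:
# an abnormal jump in the frame-difference signal marks a likely splice.
import numpy as np

def temporal_difference(frames):
    """Mean absolute difference between each pair of consecutive frames."""
    return np.array([np.abs(frames[i + 1].astype(float) - frames[i].astype(float)).mean()
                     for i in range(len(frames) - 1)])

def suspicious_transitions(frames, factor=5.0):
    """Indices where the difference jumps well above the typical level."""
    d = temporal_difference(frames)
    baseline = np.median(d) + 1e-9          # avoid division by zero
    return [i for i, v in enumerate(d) if v > factor * baseline]

# Synthetic shot: slowly brightening 8x8 frames, one foreign frame spliced in.
rng = np.random.default_rng(0)
frames = [np.full((8, 8), 10 * t, dtype=np.uint8) for t in range(10)]
frames.insert(5, rng.integers(0, 255, (8, 8), dtype=np.uint8))  # forged splice
print(suspicious_transitions(frames))       # [4, 5]: jumps into and out of the splice
```

Real detectors in Tables 1 and 2 replace this raw-pixel difference with more robust cues such as HSV colour histograms, SURF features or optical-flow consistency.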

2.2.2 Intra-frame Forgery Detection

Intra-frame forgery detection detects forgery within individual video frames. These mechanisms include copy-move forgery, splicing, etc., through which the image frames within videos are altered.
Table 1 Inter-frame forgery detection based on frame deletion, insertion and duplication
• Detection of Inter-frame Forgeries in Digital Videos [18] (2018). Description: proposes a new forensic footprint based on the variation of the macro-block prediction types (VPF) in the P-frames, and also estimates the size of a GOP. Advantages: gives reliable results with an improved detection rate. Limitation: not efficient, owing to the use of more than one compression on the video. Results: better detection rate; works with both CBR and VBR.
• Inter-frame Passive-Blind Forgery Detection for Video Shot Based on Similarity Analysis [19] (2018). Description: a passive-blind video shot forensics scheme in which inter-frame forgeries are found. The method consists of two parts: hue-saturation-value (HSV) colour histogram comparison, and speeded-up robust features (SURF) extraction with fast library for approximate nearest neighbours (FLANN) matching as a double-check. Advantages: motion-based detection is done accurately using a tangent-based approach. Limitation: motion within the video could be detected more accurately with a noise-handling procedure. Results: accuracy of 99.01%.
• Inter-frame Forgery Detection in H.264 Videos Using Motion and Brightness Gradients [20] (2017). Description: uses residual and optical-flow consistency to detect frame insertion, duplication and removal in videos encoded in MPEG-2 and H.264; forgeries are detected by examining object motion. Advantages: reduces conflicting results and gives precise localization of forgery. Limitation: performance suffers on high-illumination videos. Results: average detection accuracy of around 83%.
Table 2 Inter-frame forgery detection based on copy-frame analysis
• Video Inter-frame Forgery Detection Approach for Surveillance and Mobile-Recorded Videos [21] (2017). Description: proposes a hybrid mechanism that uses motion and gradient features to measure variation between frames; forensic artefacts are analysed using an objective methodology. Advantages: defects are detected automatically using spike counts. Limitation: unable to detect forged frames in slow-motion videos. Results: detects a maximum of 60 and a minimum of 10 forged frames.
• A New Copy Move Forgery Detection Method Resistant to Object Removal with Uniform Background Forgery [22] (2016). Description: detects copy-move forgery in videos using a hybrid of AKAZE features and RANSAC to detect copied frames and eliminate false matches; detects object removal and replication. Advantages: efficient and suitable for removed/inserted frames. Limitation: feature detection is slower than the ORB feature. Results: detection remains good (98.7%) even if the forged image has been rotated or blurred.
• Inter-frame Video Forgery Detection and Localization Using Intrinsic Effects of Double Compression on Quantization Errors of Video Coding [23] (2016). Description: inter-frame manipulation detection and localization in MPEGx-coded video based on traces of quantization error in P-macro-block residual errors. Advantages: the picture is split into different frequency bands so that a particular image or video block can be processed easily. Limitation: does not split the image to the maximum level. Results: high error-detection accuracy of 99.46%.
• Detection of Re-compression, Transcoding and Frame Deletion for Digital Video Authentication [24] (2016). Description: a forensic technique that identifies recompressed or transcoded videos by inspecting optical flow; its detection accuracy is not limited by the number of post-production compressions. Advantages: inexpensive and independent of heuristically computed thresholds. Limitation: frame addition is not considered. Results: frame-removal detection achieved an average accuracy of 99.3%.
• Chroma Key Background Detection for Digital Video Using Statistical Correlation of Blurring Artifact [25] (2016). Description: a blurring-artefact-based technique for detecting features in video along with chroma key; it first extracts frames showing blurring effects, which are then analysed for forged regions. Advantages: gives a better recall rate with efficiency. Limitation: background colour is not handled. Results: detection accuracy of 91.12%.

To detect such forgery, boundary colours and differences between frames are analysed, and results are expressed in terms of bit error rate. Table 3 describes the different intra-frame forgery detection techniques used in video forgery, together with their advantages, disadvantages and accuracy results.
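The copy-move idea behind several of the intra-frame methods in Table 3 can be illustrated with a toy exact-match detector. The real detectors use robust features such as SIFT, SURF or AKAZE rather than raw-pixel hashes; the function name and block size here are illustrative assumptions.

```python
# Toy copy-move detector: hash non-overlapping blocks of one frame and
# report identical blocks appearing at two different positions.
import numpy as np

def copy_move_blocks(frame, block=4):
    """Return pairs of top-left coordinates of identical blocks."""
    h, w = frame.shape
    seen, pairs = {}, []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            key = frame[y:y + block, x:x + block].tobytes()
            if key in seen:
                pairs.append((seen[key], (y, x)))
            else:
                seen[key] = (y, x)
    return pairs

rng = np.random.default_rng(1)
frame = rng.integers(0, 256, (16, 16), dtype=np.uint8)
frame[8:12, 8:12] = frame[0:4, 0:4]        # copy-move: paste a region elsewhere
print(copy_move_blocks(frame))             # [((0, 0), (8, 8))]
```

Exact matching breaks as soon as the pasted region is rotated, blurred or recompressed, which is precisely why the surveyed methods move to keypoint descriptors.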

2.2.3 Compression-Based Mechanisms

Compression-based mechanisms include the discrete cosine transform. These mechanisms replace multiple distinct values within the image frame with single-valued vectors; the resulting feature vector then identifies any malicious activity within the video frames. Results are most often expressed as peak signal-to-noise ratio. Table 4 describes the features of the different compression-based forgery detection techniques.
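As a sketch of the transform step these mechanisms build on (an illustrative toy, not any cited detector): an orthonormal 8×8 DCT-II concentrates a block's energy into a few coefficients, and quantizing those coefficients produces the statistical footprint that double-compression detectors examine. The quantization step size here is an arbitrary assumption.

```python
# Orthonormal 8x8 DCT-II basis, built directly from the cosine definition.
import numpy as np

N = 8
n = np.arange(N)
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
C[0, :] = np.sqrt(1.0 / N)                  # first row scaled for orthonormality

def dct2(block):
    """2-D DCT-II of an 8x8 block."""
    return C @ block @ C.T

def quantize(coeffs, step=16):
    """Uniform quantization (a toy stand-in for a codec's quantization table)."""
    return np.round(coeffs / step) * step

block = np.full((N, N), 100.0)              # a flat block: all energy in the DC term
coeffs = dct2(block)
print(round(float(coeffs[0, 0]), 2))        # DC coefficient: 8 * 100 = 800.0
print(int(np.count_nonzero(quantize(coeffs))))  # only the DC term survives: 1
```

Quantizing twice with different step sizes leaves periodic gaps in the coefficient histograms, which is the trace exploited by the double-compression detectors in Table 4.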

3 Comparative Analysis of Passive Video Forgery Detection Techniques

This section presents a comparative analysis of various video forgery detection techniques. Earlier surveys appraise only a few forensic recording techniques, and many noteworthy recent achievements were not examined or analysed [35–38]. We analyse the performance of copy-paste forgery detection techniques for the motion-residue-based approach [39], the object-based approach [40] and the optical-flow-based approach [41]. Figure 5 presents the comparative outcomes of these approaches based on quality factors.
Figure 6 analyses the performance of inter-frame forgery detection for the noise-based approach [42], the optical-flow-based approach [43] and the pixel-based approach [44], presenting a comparative overview of the findings as a function of quality-factor percentage and the number of inserted/deleted/duplicated frames.
The analysis suggests that motion-based forgeries are uncommon and hard to detect. In category 1 (inter-frame forgery), the research analysed focuses mainly on parameters such as mean square error and peak signal-to-noise ratio. In category 2 (intra-frame forgery), the noise-handling procedures accommodated in these papers improve the peak signal-to-noise ratio. In category 3 (compression-based mechanisms), the video forgery detection mechanisms employed cause the frame rate to decrease, and hence noise within frames increases. Sometimes video forgery cannot be detected owing to software failure, which alters the peak signal-to-noise ratio value [45–47]. These detection mechanisms allow parameters such as PSNR and MSE to be optimized, and the accuracy results obtained are shown in Fig. 7. Generally, more than 100 videos were tested during the comparative analysis. All these videos show both basic and
Table 3 Intra-frame forgery detection techniques
• Object-Level Visual Reasoning in Videos [26] (2018). Description: proposes a method that detects objects in videos using a cognitive methodology; it learns from detailed interactions, and the forged region is detected. Advantages: takes less time to predict the forged region. Limitation: inefficient with compressed videos. Results: detection accuracy of 96%.
• MesoNet: A Compact Facial Video Forgery Detection Network [27] (2018). Description: a hybrid methodology that detects facial video forgery, focusing on the mesoscopic features of image properties. Advantages: the SIFT technique is used with the aid of neighbouring pixel values to process the frame and image. Limitation: the exact pixel values cannot be obtained with the SIFT technique, and the exact route of frame processing is not considered. Results: useful for automatic image-frame analysis.
• Coarse-to-Fine Copy-Move Forgery Detection Video Forensics [28] (2018). Description: a coarse-to-fine approach based on video optical-flow (OF) features that detects copy-move forgery in videos. Advantages: duplicated regions with changed contrast values, and blurred regions, can also be detected. Limitation: high computation time of the algorithm. Results: accuracy of 98.79%.
• Improvement in Copy-Move Forgery Detection Using Hybrid Approach [29] (2016). Description: SIFT and SURF methodology along with DWT for detecting forgery in videos; the algorithms, based on colour and texture descriptors, extract digital image features and match them to test whether the image is fake. Advantages: multi-dimensional and multi-directional analysis gives precise results. Limitation: cannot be applied to compressed images. Results: the outcomes of the error-detection and JPEG compression tests are vital.
• A Video Forgery Detection Using Discrete Wavelet Transform and Scale Invariant Feature Transform Techniques [30] (2016). Description: a SIFT- and DWT-based algorithm that first extracts features of video frames and then detects the forged region; mostly used for locating malicious manipulation in digital recordings (digital frauds). Advantages: robustly identifies objects even among clutter and under partial occlusion. Limitation: not useful for real-time videos. Results: 98.21% accuracy.
Table 4 Compression-based forgery detection techniques
• Optimizing Video Object Detection via a Scale-Time Lattice [31] (2018). Description: proposes a hybrid approach that uses detection, temporal propagation and across-scale refinement, under which various configurations are built. Advantages: the symmetry reduces execution time. Limitation: classification accuracy can be further improved by the use of STLK. Results: selects key frames as opposed to randomly sampled ones.
• Frame-wise Detection of Relocated I-frames in Double Compressed H.264 Videos Based on Convolutional Neural Network [32] (2017). Description: a methodology that uses pre-processing and a CNN for frame-wise detection in compressed videos. Advantages: high performance for relocating I-frames in compressed videos. Limitation: the filtering mechanism should be enhanced for better detection; frame-wise forgery detection results are not applied to the detection of various inter-frame forgeries. Results: average accuracy of around 96%, based on GOPs.
• Detection of Double-Compressed H.264/AVC Video Integrating the Features of the String of Data Bits and Skip Macroblocks [33] (2017). Description: a double-compressed H.264/AVC video detection method; the string-of-data-bits feature of each P-frame is extracted and then incorporated with the skip macro-block feature. Advantages: forgery detection accuracy is better than existing methods. Limitation: only robust to MPEG compression and recompression. Results: a better rank of accuracy.
• Double Compression Detection in MPEG-4 Videos Based on Block Artifact Measurement with Variation of Prediction Footprint [34] (2016). Description: uses block artefacts for detection of compression-based forgery, combining VPF with block artefacts for robust and efficient detection. Advantages: handles compressed videos efficiently. Limitation: low-compression-bit-rate videos are not handled. Results: better discriminative performance compared with existing techniques.

Fig. 5 Comparative outcomes of copy-paste forgery at different bit rates and quality factors (x-axis: bit rate at 3, 6 and 9; y-axis: quality factor), comparing motion-based, object-based and optical-flow-based approaches; reported quality factors range from 76.1 to 89.1

Fig. 6 Effect on quality-factor percentage of inserting/deleting/duplicating different numbers of frames (x-axis: number of frames at 30, 60 and 100; y-axis: quality factor), comparing noise-based, optical-flow-based and pixel-correction approaches; reported quality factors range from 71.9 to 88.0

complex lifelike scenarios, depicting both indoor and outdoor scenes. All of the forgeries were created plausibly to simulate practical forensic scenarios.
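The MSE and PSNR figures quoted throughout this comparison follow the standard definitions, which can be reproduced directly. This is a minimal sketch assuming 8-bit frames; the variable names and toy data are illustrative.

```python
# Standard MSE and PSNR definitions for comparing a frame against a reference.
import numpy as np

def mse(a, b):
    """Mean squared error between two equally sized frames."""
    return float(np.mean((a.astype(float) - b.astype(float)) ** 2))

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical frames."""
    m = mse(a, b)
    return float("inf") if m == 0 else 10.0 * np.log10(peak ** 2 / m)

original = np.full((8, 8), 100, dtype=np.uint8)
forged = original.copy()
forged[0, 0] = 110                          # a single tampered pixel
print(round(mse(original, forged), 4))      # 100 / 64 = 1.5625
print(round(psnr(original, forged), 2))
```

A higher PSNR means the forged frame is closer to the original, which is why subtle tampering with high PSNR is the hardest case for the surveyed detectors.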

Fig. 7 Cumulative accuracy of passive video forgery detection methods by category (x-axis: video forgery detection technique category; y-axis: accuracy percentage); the three categories (inter-frame, intra-frame and compression-based forgery) reach reported accuracies between 93.42% and 98.62%

4 Major Highlights, Issues & Findings in Passive Video Forgery Detection

This survey explores the domains of video forensics and video anti-forensics; the results cover the majority of passive video methodologies. Most techniques use the GOP structure [48–53] because it is easier to understand and has a fixed number of frames. We have covered the types of tampering a video can suffer and the various sources passive techniques use to detect attacks. The major highlights for detecting video forgery are as follows:
• In inter-frame techniques, forgery is detected by taking one frame at a single instance.
• In intra-frame techniques, forgery is detected by establishing the relationship between two adjacent frames.
• Various techniques detect forgery via the detection of double compression.
• Forgery is detected by motion- and brightness-feature-based inter-frame forgery detection techniques.
• Pixel-level analysis-based techniques detect pixel similarities in video forgery.
• Copy-paste forgery detection techniques analyse similarities or correlations between identical regions.

Digital video forensics is still seen as being in its rudimentary stages. The identification of digital forgery is a very difficult activity, and the lack of a widely available solution exacerbates the situation. The various issues in video forgery detection techniques identified during this survey are as follows:
• A significant shortcoming is the lack of sufficient validation on realistically manipulated video. Manually producing fake videos is very time-consuming, so most authors performed research on synthetically doctored sequences [54–56].
• The identification of digital forgery is a rather complex job, and the lack of a widely applicable solution exacerbates the situation [57–59].
• Video forensics detects frame manipulation through double compression; if the forger directly modifies the encoded video, current anti-forensic and counter-anti-forensic strategies are insufficient [60, 61].
• For better video forgery detection, a huge database of tampered videos is required [62–65].
From this survey of video forgery detection, we analysed the performance of various techniques, including optical-flow-based, motion-based, object-based, noise-based, pixel-correction-based and copy-paste detection techniques. The major findings obtained during these analyses are as follows:
• The reliability factors in video forgery detection need to be understood better. Reliability is affected by issues related to multimedia heterogeneity, editing software and the content of the video.
• In future, active and passive techniques could be combined to obtain better accuracy in the quality factors of forged videos.
• Integrating fields such as artificial intelligence, machine learning, signal processing, computer vision and deep learning with the discussed techniques can also produce more accurate results.

5 Conclusion and Future Scope

Nowadays, information is mostly presented through videos rather than text. Earlier, forgery commonly took place on textual information, but nowadays video forgery is common, and digital forensics on video is still in its infancy. Digital video's fidelity to fact is under threat from hacking, and various video-editing tools such as Adobe Photoshop and Illustrator, Cinelerra, Lightworks, GNU Gimp, Premiere and Vegas are easily available [66–72], so video forensics is one of the major research areas for detecting forgery in video. This paper analyses the various techniques used to detect forgery within digital videos; the issues in video forgery detection mechanisms are surveyed together with their advantages, limitations and results. Video counter-forensics and anti-forensics are also explored.
In future, researchers and developers can work on the overall lack of rigour potentially induced by the lack of standardized databases, on pixel-based approaches, and on motion detection mechanisms such as the tangent-based strategy, which can be enhanced for better encryption and decryption of video frames, along with splicing techniques for further improvement.

References

1. Saranya, R., Saranya, S., & Cristin, R. (2017). Exposing video forgery detection using intrinsic fingerprint traces. IEEE Access, 73–76.
2. Kelly, F. (2006). Fast probabilistic inference and GPU video processing [thesis], Trinity College
(Dublin, Ireland). Department of Electronic & Electrical Engineering, p. 178.
3. Li, J., He, H., Man, H., & Desai, S. (2009). A general-purpose FPGA-based reconfigurable platform for video and image processing. In W. Yu, H. He, & N. Zhang (Eds.), Advances in neural networks—ISNN 2009. Lecture notes in computer science (Vol. 5553). Berlin: Springer.
4. Sencar, H. T., & Memon, N. (2008). Overview of state-of-the-art in digital image forensics. Statistical Science and Interdisciplinary Research, 325–347.
5. Asok, D., Himanshu, C., & Sparsh, G. (2006). Detection of forgery in digital video. In 10th
World Multi Conference on Systemics, Cybernetics and Informatics, pp. 16–19, Orlando. USA.
6. Su, L., Huang, T., & Yang, J. (2014). A video forgery detection algorithm based on compressive
sensing. Springer Science and Business Media New York.
7. Shanableh, T. (2013). Detection of frame deletion for digital video forensics. Digital
Investigation, 10(4), 350–360.
8. Hsu, C. C., Hung, T. Y., Lin, C. W., & Hsu, C. T. (2008). Video forgery detection using
correlation of noise residue. In 2008 IEEE 10th workshop on multimedia signal processing,
pp. 170–174.
9. Ghosh, S., Rana, A., & Kansal, V. (2019). Evaluating the impact of sampling-based nonlinear
manifold detection model on software defect prediction problem. In S. Satapathy, V. Bhateja, J.
Mohanty, & S. Udgata (Eds.), Smart intelligent computing and applications. Smart innovation,
systems and technologies (Vol. 159), pp. 141–152.
10. Ghosh, S., Rana, A., & Kansal, V. (2017). Predicting defect of software system. In Proceedings
of the 5th International Conference on Frontiers in Intelligent Computing: Theory and Appli-
cations (FICTA-2016), Advances in Intelligent Systems and Computing (AISC), pp. 55–67,
2017.
11. Kurosawa, K., Kuroki, K., & Saitoh, N. (1999). CCD fingerprint method identification of a
video camera from videotaped images. In Proceedings of IEEE International Conference on
Image Processing, Kobe, Japan, pp. 537–540.
12. Lukáš, J., Fridrich, J., & Goljan, M. (2006). Digital camera identification from sensor pattern
noise. IEEE Transactions on Information Forensics and Security, 1(2), 205–214.
13. Goljan, M., Chen, M., Comesaña, P., & Fridrich, J. (2016). Effect of compression on sensor-
fingerprint based camera identification. Electronic Imaging, 1–10.
14. Mondaini, N., Caldelli, R., Piva, A., Barni, M., & Cappellini, V. (2007). Detection of malev-
olent changes in digital video for forensic applications. In E. J. Delp, & P. W. Wong (Eds.),
Proceedings of SPIE Conference on Security, Steganography and Watermarking of Multimedia
Contents (Vol. 6505, No. 1).
15. Wang, W., & Farid, H. (2007). Exposing digital forgeries in interlaced and deinterlaced video.
IEEE Transactions on Information Forensics and Security, 2(3), 438–449.
16. Wang, W., & Farid, H. (2006). Exposing digital forgeries in video by detecting double MPEG
compression. In: S. Voloshynovskiy, J. Dittmann, & J. J. Fridrich (Eds.), Proceedings of
8th Workshop on Multimedia and Security (MM&Sec’06) (pp. 37–47). ACM Press, New York.
A Comprehensive Survey on Passive Video Forgery … 55

17. Hsia, S. C., Hsu, W. C., & Tsai, C. L. (2015). High-efficiency TV video noise reduction
through adaptive spatial–temporal frame filtering. Journal of Real-Time Image Processing,
10(3), 561–572.
18. Sitara, K., & Mehtre, B. M. (2018). Detection of inter-frame forgeries in digital videos. Forensic
Science International, 289, 186–206.
19. Zhao, D. N., Wang, R. K., & Lu, Z. M. (2018). Inter-frame passive-blind forgery detection for
video shot based on similarity analysis. Multimedia Tools and Applications, 77(19), 25389–
25408.
20. Kingra, S., Aggarwal, N., & Singh, R. D. (2017). Inter-frame forgery detection in H.264
videos using motion and brightness gradients. Multimedia Tools and Applications, 76(24),
25767–25786.
21. Kingra, S., Aggarwal, N., & Singh, R. D. (2017). Video inter-frame forgery detection approach
for surveillance and mobile recorded videos. International Journal of Electrical & Computer
Engineering, 7(2), 831–841.
22. Ulutas, G., & Muzaffer, G. (2016). A new copy move forgery detection method resistant to
object removal with uniform background forgery. Mathematical Problems in Engineering,
2016.
23. Abbasi Aghamaleki, J., Behrad, A. (2016). Inter-frame video forgery detection and localization
using intrinsic effects of double compression on quantization errors of video coding. Signal
Processing: Image Communication, 47, 289–302.
24. Singh, R. D., & Aggarwal, N. (2016). Detection of re-compression, transcoding and frame-
deletion for digital video authentication. In 2015 2nd International Conference on Recent
Advances in Engineering & Computational Sciences RAECS.
25. Bagiwa, M. A., Wahab, A. W. A., Idris, M. Y. I., Khan, S., & Choo, K. K. R. (2016). Chroma key
background detection for digital video using statistical correlation of blurring artifact. Digital
Investigation, 19, 29–43.
26. Baradel, F., Neverova, N., Wolf, C., Mille, J., & Mori, G. (2018). Object level visual reasoning
in videos. In Lecture Notes in Computer Science (including Subseries Lecture Notes Artificial
Intelligence, Lecture Notes Bioinformatics) (Vol. 11217, pp. 106–122). LNCS.
27. Afchar, D., Nozick, V., Yamagishi, J., & Echizen, I. (2018). MesoNet: A compact facial video
forgery detection network. In 2018 IEEE International Workshop on Information Forensics
and Security (WIFS).
28. Jia, S., Xu, Z., Wang, H., Feng, C., & Wang, T. (2018). Coarse-to-fine copy-move forgery
detection for video forensics. IEEE Access, 6(c), 25323–25335.
29. Kaur Saini, G., & Mahajan, M. (2016). Improvement in copy—Move forgery detection using
hybrid approach. International Journal of Modern Education and Computer Science, 8(12),
56–63.
30. Kaur, G., & Kaur, R. (2016). A video forgery detection using discrete wavelet transform and
scale invarient feature transform techniques. 5(11), 1618–1623.
31. Chen, K., Wang, J., Yang, S., Zhang, X., Xiong, Y., Loy, C. C., & Lin, D. (2018). Optimizing
video object detection via a scale-time lattice. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition (pp. 7814–7823).
32. He, P., Jiang, X., Sun, T., Wang, S., Li, B., & Dong, Y. (2017). Frame-wise detection of relocated
Iframes in double compressed H.264 videos based on convolutional neural network. Journal
of Visual Communication and Image Representation, 48, 149–158.
33. Yao, H., Song, S., Qin, C., Tang, Z., & Liu, X. (2017). Detection of double-compressed
H.264/AVC video incorporating the features of the string of data bits and skip macroblocks.
Symmetry (Basel), 9(12), 1–17.
34. Rocha, A., Scheirer, W., Boult, T., & Goldenstein, S. (2011). Vision of the unseen: Current
trends and challenges in digital image and video forensics. ACM Computing Surveys, 43(4),
26.
35. Milani, S., Fontani, M., Bestagini, P., Barni, M., Piva, A., Tagliasacchi, M., & Tubaro, S. (2012).
An overview on video forensics. APSIPA Transactions on Signal and Information Processing,
1(1), 1–18.
56 V. Kumar et al.

36. Wahab, A. W. A., Bagiwa, M. A., Idris, M. Y .I., Khan, S., Razak, Z., & Ariffin, M. R. K. Passive
video forgery detection techniques: a survey. In Proceedings of 10th International Conference
on Information Assurance and Security, Okinawa, Japan, pp. 29–34.
37. Joshi, V., & Jain, S. (2015). Tampering detection in digital video e a review of temporal
fingerprints based techniques. In Proceedings of 2nd International Conference on Computing
for Sustainable Global Development, New Delhi, India, pp. 1121–1124.
38. Bestagini, P., Milani, S., Tagliasacchi, M., & Tubaro, S. (2013). Local tampering detection in
video sequences. In Proceedings of 15th IEEE International Workshop on Multimedia Signal
Processing. Pula, pp. 488–493.
39. Zhang, J., Su, Y., Zhang, M. (2009). Exposing digital video forgery by ghost shadow artifact.
In Proceedings of 1st ACM Workshop on Multimedia in Forensics (MiFor’09) (pp. 49–54).
NewYork: ACM Press.
40. Bidokhti, A., Ghaemmaghami, S.: Detection of regional copy/move forgery in MPEG videos
using optical flow. In: International symposium on Artificial intelligence and signal processing
(AISP), Mashhad, Iran, pp. 13–17 (2015).
41. De, A., Chadha, H., & Gupta, S. (2006). Detection of forgery in digital video. In Proceedings
of 10th World Multi Conference on Systems, Cybernetics and Informatics (pp. 229–233).
42. Wang, W., Jiang, X., Wang, S., & Meng, W. (2014). Identifying video forgery process using
optical flow. In Digital forensics and watermarking (pp. 244–257). Berlin: Springer.
43. Lin, G. -S., Chang, J. -F., Chuang, F. -H. (2011). Detecting frame duplication based on spatial
and temporal analyses. In Proceedings of 6th IEEE International Conference on Computer
Science and Education (ICCSE’11), SuperStar Virgo, Singapore, pp. 1396–1399.
44. Ghosh, S., Rana, A., & Kansal, V. (2017). Software defect prediction system based on linear and
nonlinear manifold detection. In Proceedings of the 11th INDIACom; INDIACom-2017; IEEE
Conference ID: 40353, 4th International Conference on—Computing for Sustainable Global
Development (INDIACom 2107) (pp. 5714–5719). INDIACom-2017; ISSN 0973–7529; ISBN
978–93–80544–24–3.
45. Ghosh, S., Rana, A., & Kansal, V. (2018). A nonlinear manifold detection based model for
software defect prediction. International Conference on Computational Intelligence and Data
Science; Procedia Computer Science, 132(8), 581–594.
46. Ghosh, S., Rana, A., & Kansal, V. (2019). Statistical assessment of nonlinear manifold detec-
tion based software defect prediction techniques. International Journal of Intelligent Systems
Technologies and Applications, Inderscience, Scopus Indexed, 18(6), 579–605. https://doi.org/
10.1504/IJISTA.2019.102667.
47. Luo, W., Wu, M., & Huang, J. (2008). MPEG recompression detection based on block artifacts.
In E. J. Delp, P. W. Wong, J. Dittmann, N. D. Memon, (Eds.), Proceedings of SPIE Security,
Forensics, Steganography, and Watermarking of Multimedia Contents X (Vol. 6819), San Jose,
CA.
48. Su, Y., Nie, W., & Zhang, C. (2011). A frame tampering detection algorithm for MPEG
videos. In Proceedings of 6th IEEE Joint International Information Technology and Artificial
Intelligence Conference, Vol. 2, pp. 461–464. Chongqing, China.
49. Vázquez-Padín, D., Fontani, M., Bianchi, T., Comesana, P., Piva, A., & Barni, M. (2012). Detec-
tion of video double encoding with GOP size estimation. In Proceedings on IEEE International
Workshop on Information Forensics and Security, Tenerife, Spain, Vol. 151.
50. Su, Y., Zhang, J., & Liu, J. (2009). Exposing digital video forgery by detecting motion-
compensated edge artifact. In Proceedings of International Conference on Computational
Intelligence and Software Engineering (Vol. 1, no. 4, pp. 11–13). Wuhan, China.
51. Dong, Q., Yang, G., & Zhu, N. (2012). A MCEA based passive forensics scheme for detecting
framebased video tampering. Digital Investigation, 9(2), 151–159.
52. Kancherla, K., & Mukkamal, S. (2012). Novel blind video forgery detection using Markov
models on motion residue. Intelligent Information and Database System, 7198, 308–315.
53. Fontani, M., Bianchi, T., De Rosa, A., Piva, A., & Barni, M. (2011). A Dempster-Shafer frame-
work for decision fusion in image forensics. In Proceedings of IEEE International Workshop
on Information Forensics and Security (WIFS’11) (pp. 1–6), Iguacu Falls, SA. https://doi.org/
10.1109/WIFS.2011.6123156.
A Comprehensive Survey on Passive Video Forgery … 57

54. Fontani, M., Bianchi, T., De Rosa, A., Piva, A., & Barni, M. (2013). A framework for decision
fusion in image forensics based on Dempster-Shafer theory of Evidence. IEEE Transactions
on Information Forensics and Security, 8(4), 593–607. https://doi.org/10.1109/TIFS.2013.224
8727.
55. Fontani, M., Bonchi, A., Piva, A., & Barni, M. (2014). Countering antiforensics by means
of data fusion. In Proceedings of SPIE Conference on Media Watermarking, Security, and
Forensics. https://doi.org/10.1117/12.2039569.
56. Stamm, M. C., & Liu, K. J. R. (2011). Anti-forensics for frame deletion/addition in mpeg video.
In Proceedings of IEEE International Conference on Acoustics Speech and Signal Processing
(ICASSP’11) (pp. 1876–1879), Prague, Czech Republic.
57. Stamm, M. C., Lin, W. S., & Liu, K. J. R. (2012). Temporal forensics and anti-forensics for
motion compensated video. IEEE Transactions on Information Forensics and Security, 7(4),
1315–1329.
58. Liu, J., & Kang, X. (2016). Anti-forensics of video frame deletion. [Online] https://www.paper.
edu.cn/download/downPaper/201407-346. Accessed 9 July (2016).
59. Fan, W., Wang, K., & Cayere, F., et al. (2013). A variational approach to JPEG anti-forensics.
In Proceedings of IEEE 38th International Conference on Acoustics, Speech, and Signal
Processing (ICASSP’13) (pp. 3058–3062), Vancouver, Canada.
60. Bian, S., Luo, W., & Huang, J. (2013). Exposing fake bitrate video and its original bitrate. In
Proceeding of IEEE International Conference on Image Processing (pp. 4492–4496).
61. CASIA Tampered Image Detection Evaluation Database. [Online]. https://forensics.idealtest.
org:8080. Accessed 30 Mar (2016).
62. Tralic, D., Zupancic, I., Grgic, S., Grgic, M., CoMoFoD—New Database for Copy-
63. Move Forgery Detection. In: Proceedings of 55th International Symposium ELMAR, Zadar,
Croatia (pp. 49–54), [Online]. https://www.vcl.fer.hr/comofod/download.html. Accessed 18
July (2016).
64. CFReDS—Computer Forensic Reference Data Sets, [Online]. https://www.cfreds.nist.gov/.
Accessed 17 May (2016).
65. Kwatra, V., Schödl, A., Essa, I., Turk, G., & Bobick, A. F. (2003). Graph cut textures image
and video synthesis using graph cuts. ACM Transactions on Graphics, 22(3), 277–286.
66. Pèrez, P., Gangnet, M., & Blake, A. (2003). Poisson image editing. ACM Transactions on
Graph. (SIGGRAPH’03, 22(3), 313–318.
67. Criminisi, A., Pèrez, P., & Toyama, K. (2004). Region filling and object removal by exemplar-
based image inpainting. IEEE Transactions on Image Processing, 13(9), 1200–1212.
68. Shen, Y., Lu, F., Cao, X., & Foroosh, H. (2006). Video completion for perspective camera
under constrained motion. In Proceedings of 18th IEEE International Conference on Pattern
Recognition (ICPR’06) (pp. 63–66). Hong Kong, China.
69. Komodakis, N., & Tziritas, G. (2007). Image completion using efficient belief propagation via
priority scheduling and dynamic pruning. IEEE Transactions on Image Processing, 16(11),
2649–2661.
70. Patwardhan, K. A., Sapiro, J., & Bertalmio, M. (2007). Video inpainting under constrained
camera motion. IEEE Transactions on Image Processing, 16(2), 545–553.
71. Hays, J., & Efros, A. A. (2007). Scene completion using millions of photographs. ACM
Transactions on Graph (SIGGRAPH’07), 26(3), 1–7.
72. Columbia Image Splicing Detection Evaluation Dataset. [Online]. https://www.ee.col
umbia.edu/ln/dmvv/downloads/AuthSplicedDataSet/AuthSplicedDataSet.htm. Accessed 3
June (2016).
DDOS Detection Using Machine Learning Technique

Sagar Pande, Aditya Khamparia, Deepak Gupta, and Dang N. H. Thanh

Abstract Numerous attacks are performed on network infrastructures, including attacks on network availability, confidentiality and integrity. The distributed denial-of-service (DDoS) attack is a persistent attack that affects the availability of the network. A Command and Control (C&C) mechanism is used to perform this kind of attack. Various researchers have proposed different methods based on machine learning techniques to detect these attacks. In this paper, a DDoS attack is performed using the ping of death technique and detected using a machine learning technique with the WEKA tool. The NSL-KDD dataset was used in this experiment, and the random forest algorithm was used to classify the normal and attack samples; 99.76% of the samples were correctly classified.

Keywords DDoS · Machine learning · Ping of death · Network security · Random forest · NSL-KDD

S. Pande · A. Khamparia (B)
School of Computer Science Engineering, Lovely Professional University, Phagwara, Punjab, India
e-mail: aditya.khamparia88@gmail.com
S. Pande
e-mail: sagarpande30@gmail.com
D. Gupta
Maharaja Agrasen Institute of Technology, New Delhi, India
e-mail: deepakgupta@mait.ac.in
D. N. H. Thanh
Department of Information Technology, School of Business Information Technology, University
of Economics Ho Chi Minh City, Ho Chi Minh City, Vietnam
e-mail: thanhdnh@ueh.edu.vn

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
A. Khanna et al. (eds.), Recent Studies on Computational Intelligence, Studies in Computational Intelligence 921, https://doi.org/10.1007/978-981-15-8469-5_5
1 Introduction

With the ongoing convergence of information technology (IT), digital devices are becoming enormously complex. Connected to one another, they continuously create and store significant digital information, ushering in the era of big data. However, there is a high probability that they may expose significant data, since they transmit a great deal of it through constant communication with one another. A framework becomes more vulnerable as more digital devices are connected; hackers may target it to steal information, personal data and industrial secrets and exploit them for unlawful gains [1]. Given these conditions, an attack detection system (ADS) ought to be smarter and more effective than before to counter attacks from hackers, which are continuously evolving. Confidentiality, integrity and availability can be considered the main pillars of security [2, 3]. All these pillars are discussed below.

1.1 Confidentiality

Confidentiality is also called secrecy. The motive behind secrecy is to keep sensitive information away from illegitimate users and to provide access only to legitimate users. Along with this, assurance must be given that access to the information is restricted.

1.2 Integrity

Integrity means keeping the data as it is, without any modification. Data must be received at the receiver end exactly as it was sent. To provide integrity, file permissions and user access controls can be used. A variety of techniques have been designed to provide integrity, such as checksums and encryption.
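To make the checksum idea concrete, here is a minimal sketch using Python's standard hashlib (an illustration added here, not a technique evaluated in this paper):

```python
import hashlib

def digest(data: bytes) -> str:
    """SHA-256 digest used as an integrity checksum."""
    return hashlib.sha256(data).hexdigest()

# The sender computes the digest and transmits it alongside the message.
original = b"transfer 100 units to account 42"
sent_digest = digest(original)

# The receiver recomputes the digest; any modification changes it completely.
print(digest(original) == sent_digest)                              # True: integrity holds
print(digest(b"transfer 900 units to account 42") == sent_digest)   # False: tampering detected
```

In practice, the digest itself must be protected (e.g. signed or sent over an authenticated channel), otherwise an attacker who alters the data can simply recompute it.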

1.3 Availability

Availability is also called accessibility. It means providing the required data whenever and wherever it is required, by fixing all issues as early as possible. Sometimes it is difficult to overcome situations caused by bottleneck conditions. RAID is one of the popular techniques used for providing availability. Precautions also need to be taken in the hardware context: the hardware must be kept in secure places. Apart from this, firewalls can be used to prevent malicious activity.
1.4 DDoS Attack Dynamics

As per the report of Kaspersky [4], both the frequency and the size of DDoS attacks grew in 2018. One of the largest DDoS attacks was mounted on GitHub in February 2018, generating 1.3 Tbps of traffic [5].

1.5 DDoS Tools

Various tools are freely available for performing DDoS attacks; some of them are listed below [6]:
• HULK (HTTP Unbearable Load King)
• GoldenEye HTTP DoS tool
• Tor’s Hammer
• DAVOSET
• PyLoris
• LOIC (Low Orbit Ion Cannon)
• XOIC
• OWASP DoS HTTP Post
• TFN (Tribe Flood Network)
• Trinoo.

2 Related Work

Many researchers are working on the detection of DDoS attacks, which have a large impact in the area of social networking, using deep learning and machine learning techniques. Some of the recent work done in this area is discussed below.
Hariharan et al. [7] used the C5.0 machine learning algorithm and compared the obtained results with different machine learning algorithms such as the naïve Bayes classifier and the C4.5 decision tree classifier. The authors mainly worked in offline mode.
BhuvaneswariAmma N. G. et al. [8] implemented a deep intelligence technique, extracting intelligence from a radial basis function network with multiple levels of abstraction. The experiment was carried out on the well-known NSL-KDD and UNSW-NB15 datasets, with 27 features considered. The authors claimed better accuracy compared to other existing techniques.
Muhammad Aamir et al. [9] implemented a feature selection method based on a clustering approach. The approach was evaluated with five different ML algorithms, with random forest (RF) and support vector machine (SVM) used for training. RF achieved the highest accuracy, around 96%.
Dayanandam et al. [10] performed classification based on features of the packets. Their prevention technique analyzes IP addresses by verifying the IP header; these addresses are used to differentiate spoofed from normal addresses. Firewalls do not provide an efficient solution when the attack size increases.
Narasimha et al. [11] used anomaly detection along with machine learning algorithms to separate normal and attack traffic. Real-time datasets were used for the experiment, and the well-known naïve Bayes ML algorithm was used for classification. The results were compared with existing algorithms such as J48 and random forest (RF).
J. Cui et al. [12] used cognitive-inspired computing along with an entropy technique. A support vector machine was used for classification, with details extracted from the switch's flow table. The obtained results were good in terms of detection accuracy.
Omar E. Elejla et al. [13] implemented an algorithm for detecting DDoS attacks in IPv6 based on a classification technique. The obtained results were compared with five well-known machine learning algorithms, and the authors reported that KNN obtained a good precision of around 85%.
Mohamed Idhammad et al. [14] designed an entropy-based semi-supervised approach using ML techniques. The implementation combines unsupervised and supervised components: the unsupervised technique gives good accuracy with few false positives, while the supervised technique further reduces the false-positive rate. Recent datasets were used for this experiment.
Nathan Shone et al. [15] implemented a deep learning algorithm for classifying attacks, using an unsupervised nonsymmetric deep autoencoder (NDAE) for feature learning. The proposed algorithm was implemented on a graphics processing unit (GPU) using TensorFlow on the well-known KDD Cup 99 and NSL-KDD datasets. The authors claimed to obtain more accurate detection results.
Olivier Brun et al. [16] worked in the area of the Internet of Things (IoT) to detect DDoS attacks, implementing the random neural network (RNN) deep learning technique. This deep-learning-based technique efficiently generates more promising results compared to existing methods.

3 Implementation of DDoS Attack Using Ping of Death

While performing a ping of death attack, the network information needs to be gathered first; to achieve this, the ipconfig command can be used. Figure 2 shows the detailed information of the network gathered after issuing the ipconfig command. As soon as the network information is gathered, the ping of death attack can be performed on the target IP address.
Enter the following command to start the attack:
Fig. 1 DDoS attacks dynamics in 2018 [4]

ping -t -l 65500 XX.XX.XX.XX

• the "ping" command transfers the data packets to the target
• "XX.XX.XX.XX" is the IP address of the target
• "-t" means sending packets repeatedly
• "-l" specifies the data packet load to be sent to the target.

Fig. 2 Details of the network obtained using ipconfig

Fig. 3 Packets transfer after implementing ping of death

Figure 3 shows the packet information after performing the ping of death attack; the attack will continue until the target's resources are exhausted. The primary goal of this type of DDoS attack is to utilize all the CPU memory and exhaust it. In Fig. 4, we can clearly see that before the attack started the performance graph was linear, and as soon as the attack starts, spikes become visible. Figure 5 signifies that the CPU is being utilized as much as possible, and this will continue until the complete network is exhausted. Details of the memory consumption, CPU utilization, uptime, etc., can be seen in Figs. 3, 4 and 5.

4 DDoS Detection Using Machine Learning Algorithm

Random forest (RF) is a popular machine learning technique for classification, developed by Leo Breiman [3]. A random forest produces multiple decision trees, each built from a different bootstrap sample of the original data using a tree classification algorithm. The NSL-KDD dataset was used for this experiment [16]. The experiment was performed on a laptop with a Windows 10 64-bit operating system, an Intel(R) Core(TM) i5-2450M CPU @ 2.50 GHz and 8.00 GB of RAM. A total of 22,544 instances with 42 attributes were used for training. Random forest was used to train the model; the model building time was 8.71 s, and the testing time was 1.28 s. The experiment was carried out using the Weka 3.8 tool. Table 1 provides the summary of the instances after classification using random forest, Table 2 shows the performance evaluation using various parameters, and Table 3 gives the confusion matrix for the normal and attack classes.
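The bootstrap-and-majority-vote idea behind random forests can be sketched in a few lines (a toy illustration with made-up one-dimensional data and depth-1 threshold "trees"; the actual experiment in this paper used full decision trees on NSL-KDD in Weka):

```python
import random
from collections import Counter

# Toy data: feature = packets/sec, label = 0 (normal) or 1 (attack). Invented values.
data = [(5, 0), (8, 0), (11, 0), (14, 0), (52, 1), (57, 1), (63, 1), (70, 1)]

def train_stump(sample):
    """Fit a one-feature threshold classifier (a depth-1 'tree')."""
    best_acc, best_t = -1.0, None
    for t in sorted({x for x, _ in sample}):
        acc = sum((1 if x >= t else 0) == y for x, y in sample) / len(sample)
        if acc > best_acc:
            best_acc, best_t = acc, t
    return best_t

def random_forest(data, n_trees=25, seed=42):
    """Each tree is trained on a different bootstrap resample of the data."""
    rng = random.Random(seed)
    return [train_stump([rng.choice(data) for _ in data]) for _ in range(n_trees)]

def predict(forest, x):
    """Majority vote over all trees."""
    votes = Counter(1 if x >= t else 0 for t in forest)
    return votes.most_common(1)[0][0]

forest = random_forest(data)
print(predict(forest, 9), predict(forest, 65))   # expected: 0 1
```

Because each tree sees a slightly different resample, individual trees disagree near the decision boundary, but the majority vote is stable; this variance reduction is what the full random forest classifier exploits.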
• Accuracy: It measures the fraction of instances of both classes that are correctly identified.
Fig. 4 CPU specifications before the attack
Accuracy = (TP + TN) / (TP + FN + FP + TN)
• Precision: It is the ratio of the number of correctly identified attacks to the total number of instances identified as attacks; also known as the positive predictive value.

Precision = TP / (TP + FP)
• Recall: It is the ratio of the number of correctly identified attacks to the total number of actual attacks; also known as sensitivity or the true positive rate.

Recall = TP / (TP + FN)
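The three formulas above can be checked directly against the reported results (a quick verification using the counts from Table 3, treating "attack" as the positive class):

```python
# Confusion-matrix counts from Table 3 ("attack" = positive class).
TP, FN = 12801, 32   # attack samples classified as attack / as normal
TN, FP = 9689, 22    # normal samples classified as normal / as attack

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)

print(f"accuracy  = {accuracy:.4%}")   # 99.7605%, matching Table 1
print(f"precision = {precision:.3f}")  # 0.998, matching Table 2 (attack row)
print(f"recall    = {recall:.3f}")     # 0.998, matching Table 2 (attack row)
```

Note that 22,490 of the 22,544 instances (TP + TN) are correct, which reproduces the 99.7605% figure reported in Table 1.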
Fig. 5 CPU specifications after the attack
Table 1 Classification summary


Correctly classified attack instances 22,490 99.7605%
Incorrectly classified attack instances 54 0.2395%

Table 2 Performance evaluation


TP rate FP rate Precision Recall Class
0.998 0.002 0.997 0.998 Normal
0.998 0.002 0.998 0.998 Attack

Table 3 Confusion matrix


a b Classification
9689 22 a = normal
32 12,801 b = attack
5 Conclusion

In this paper, several ongoing detection techniques for DDoS attacks were discussed, especially those using machine learning, along with a list of freely available DDoS tools. The command-based ping of death technique was used to perform a DDoS attack, and the random forest algorithm was used to train the detection model, resulting in 99.76% of instances being correctly classified. In future work, we will try to implement deep learning techniques for the classification of the instances.

References

1. Ganorkar, S. S., Vishwakarma, S. U., & Pande, S. D. (2014). An information security scheme
for cloud based environment using 3DES encryption algorithm. International Journal of Recent
Development in Engineering and Technology, 2(4).
2. Pande, S., & Gadicha, A. B. (2015). Prevention mechanism on DDOS attacks by using multi-
level filtering of distributed firewalls. International Journal on Recent and Innovation Trends
in Computing and Communication, 3(3), 1005–1008. ISSN: 2321–8169.
3. Khamparia, A., Pande, S., Gupta, D., Khanna, A., & Sangaiah, A. K. (2020). Multi-level framework for anomaly detection in social networking. Library Hi Tech. https://doi.org/10.1108/LHT-01-2019-0023.
4. https://www.calyptix.com/top-threats/ddos-attacks-101-types-targets-motivations/.
5. https://www.foxnews.com/tech/biggest-ddos-attack-on-record-hits-github.
6. Fenil, E., & Mohan Kumar, P. (2019). Survey on DDoS defense mechanisms. John Wiley &
Sons, Ltd. https://doi.org/10.1002/cpe.5114.
7. Hariharan, M., Abhishek, H. K., & Prasad, B. G. (2019). DDoS attack detection using C5.0
machine learning algorithm. I.J. Wireless and Microwave Technologies, 1, 52–59 Published
Online January 2019 in MECS. https://doi.org/10.5815/ijwmt.2019.01.06.
8. NG, B. A., & Selvakumar, S. (2019). Deep radial intelligence with cumulative incarnation
approach for detecting denial of service attacks. Neurocomputing. https://doi.org/10.1016/j.
neucom.2019.02.047.
9. Aamir, M., & Zaidi, S. M. A. (2019). Clustering based semi-supervised machine learning for DDoS attack classification. Journal of King Saud University—Computer and Information Sciences. Elsevier. https://doi.org/10.1016/j.jksuci.2019.02.003.
10. Dayanandam, G., Rao, T. V., BujjiBabu, D., & NaliniDurga, N. (2019). DDoS attacks—Analysis and prevention. In H. S. Saini, et al. (Eds.), Innovations in computer science and engineering, Lecture Notes in Networks and Systems 32. Springer Nature Singapore Pte Ltd. https://doi.org/10.1007/978-981-10-8201-6_1.
11. NarasimhaMallikarjunan, K., Bhuvaneshwaran, A., Sundarakantham, K., & Mercy Shalinie, S.
(2019). DDAM: Detecting DDoS attacks using machine learning approach. In N. K. Verma & A.
K. Ghosh (Eds.), Computational Intelligence: Theories, Applications and Future Directions—
Volume I, Advances in Intelligent Systems and Computing, 798, https://doi.org/10.1007/978-
981-13-1132-1_21.
12. Cui, J., Wang, M., & Luo, Y., et al. (2019). DDoS detection and defense mechanism based on
cognitive-inspired computing in SDN. Future Generation Computer Systems. https://doi.org/
10.1016/j.future.2019.02.037.
13. Elejla, O. E., Belaton, B., Anbar, M., Alabsi, B., & Al-Ani, A. K. (2019). Comparison of classification algorithms on ICMPv6 based DDoS attacks detection. In R. Alfred et al. (Eds.), Computational Science and Technology, Lecture Notes in Electrical Engineering 481. Springer Nature Singapore Pte Ltd. https://doi.org/10.1007/978-981-13-2622-6_34.
14. Idhammad, M., Afdel, K., & Belouch, M. (2018). Semi-supervised machine learning approach for DDoS detection. Applied Intelligence. Springer Science+Business Media, LLC, part of Springer Nature. https://doi.org/10.1007/s10489-018-1141-2.
15. Shone, N., Ngoc, T. N., Phai, V. D., & Shi, Q. (2018). A deep learning approach to network
intrusion detection. IEEE Transactions on Emerging Topics in Computational Intelligence,
2(1).
16. Brun, O., Yin, Y., & Gelenbe, E. (2018). Deep learning with dense random neural network for
detecting attacks against IoT-connected home environments. Procedia Computer Science, 134,
458–463, Published by Elsevier Ltd.
Enhancements in Performance of Reduced Order Modelling of Large-Scale Control Systems

Ankur Gupta and Amit Kumar Manocha

Abstract Enhancements in model order reduction techniques are occurring at a very fast rate, aiming at more accurate reduced approximations of large-scale systems to ease the task of studying them. In this paper, the enhancements occurring in the field of model order reduction are studied with the help of a test example. Early techniques such as balanced truncation are studied and compared with newly developed MOR techniques such as dominant pole retention, the clustering approach and the response matching technique. The study reveals that developments in MOR techniques are helping in designing reduced order approximations of large-scale systems with less error between the large-scale and reduced order systems, and that more study is required in this field to make the analysis of large-scale systems more accurate.

Keywords Balanced truncation · Clustering · Dominant pole retention · Order reduction · Response matching

1 Introduction

The design of high-order linear dynamic systems is tough to tackle due to problems in implementation and computation, and it is too tedious to be employed in practice. Model order reduction is a technique for simplifying high-order linear dynamic systems that are described by differential equations. The main

A. Gupta (B)
Department of Electronics and Communication Engineering, Maharaja Ranjit Singh Punjab
Technical University, Bathinda, Punjab, India
e-mail: ankurgarg2711@gmail.com
A. K. Manocha
Department of Electrical Engineering, Maharaja Ranjit Singh Punjab Technical University,
Bathinda, Punjab, India
e-mail: akmanochagzsccet@gmail.com

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
A. Khanna et al. (eds.), Recent Studies on Computational Intelligence, Studies in Computational Intelligence 921, https://doi.org/10.1007/978-981-15-8469-5_6
motive of model order reduction (MOR) is to replace a high-order system with a system of comparatively lower order while keeping the initial properties intact.
The purpose of carrying out this simplification is to obtain a reduced order version of the higher-order system so that the initial and final systems are identical in terms of system response and other physical means of representation. Numerous studies have been carried out, and varied techniques have been suggested for the reduction of high-order transfer functions [1, 2]. These techniques include Hankel-norm approximation [3], the projection technique [4], Schur decomposition [5], continued fraction expansion approximation [6], Pade or moment approximation [7], the stability-equation method [8] and the factor division method [9]. Each technique has its own benefits and drawbacks; the most important concerns amongst the limitations are the difficult computation procedure and maintaining stability in the reduced model. Many other approaches were also developed [10–18] to address the need for model order reduction through mixed approaches and various evolutionary techniques.
In this paper, timeline pertaining to the advancements in the various techniques
in the model order reduction of the high-order system into the low-order system has
been described in detail, and various illustrative examples have been used to justify
the facts. The major driving force behind this study was to obtain a comprehensive
view of these advancements so that, amongst the numerous techniques developed to
date, the most suitable one can be chosen for further work, minimizing the drawbacks
and building on the benefits.
The first part of this paper defines the problem of model order reduction and then
presents methods for the model described with detailed survey of each. Following
it is a test example to show the comparison of different techniques by step response
behaviour, and finally, the derived result is mentioned in the conclusion.

2 Defining the Problem

Consider a dynamic system of linear nature which is described by the transfer function
[11, 12, 19, 20] as

G(s) = N(s)/D(s) = (a_m s^m + a_{m−1} s^{m−1} + · · · + a_1 s + a_0)/(b_n s^n + b_{n−1} s^{n−1} + · · · + b_1 s + b_0)   (1)

where m < n. The number of poles is n, and m is the number of zeros. The zeros and
poles may be real, complex, or a combination of both; any complex poles occur in
conjugate pairs.
The reduced rth-order system is given by
Enhancements in Performance of Reduced … 71

G_r(s) = N_{r−1}(s)/D_r(s) = (c_{r−1} s^{r−1} + c_{r−2} s^{r−2} + · · · + c_1 s + c_0)/(d_r s^r + d_{r−1} s^{r−1} + · · · + d_1 s + d_0)   (2)

where r − 1 is the number of zeros and r is the number of poles, of the reduced order
model Gr(s). The zeros and poles could be either complex or real or combinations of
complex and real. The aim of model order reduction is to decrease the order of the
linear system while maintaining stability and obtaining a response with minimum
error relative to the original.

3 Description of Methods

The accuracy of the reduced-order systems obtained by the different techniques is
studied to check the improvements in the techniques of model order reduction. The
techniques discussed in this paper are as follows.

3.1 Balanced Truncation

The balanced truncation method [2] is the rudimentary technique for model order
reduction, as most reduction methods depend on it for transforming the system into
balanced form. The method uses the state matrices A, B, C and D, which are
transformed into a balanced system by a non-singular matrix T such that
   
A , B  , C  , D  = T −1 AT, T −1 B, C T, D (3)
 
The balanced system A , B  , C  , D  is obtained which is the reduced order
approximation of (A, B, C, D).
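As an illustration, the balancing transformation T of Eq. (3) can be computed with the square-root (Cholesky/SVD) method; the sketch below, using SciPy's Lyapunov solver, is our own minimal illustration of the idea rather than the implementation of [2]:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, cholesky, svd

def balanced_truncation(A, B, C, r):
    """Square-root balanced truncation sketch for a stable (A, B, C)."""
    # Controllability and observability Gramians:
    #   A P + P A^T + B B^T = 0,   A^T Q + Q A + C^T C = 0
    P = solve_continuous_lyapunov(A, -B @ B.T)
    Q = solve_continuous_lyapunov(A.T, -C.T @ C)
    # Cholesky factors and SVD of Lo^T Lc give the balancing T of Eq. (3):
    # T^{-1} P T^{-T} = T^T Q T = diag(Hankel singular values)
    Lc = cholesky(P, lower=True)
    Lo = cholesky(Q, lower=True)
    U, s, Vt = svd(Lo.T @ Lc)
    S_half = np.diag(s ** -0.5)
    T = Lc @ Vt.T @ S_half           # balancing transformation
    Tinv = S_half @ U.T @ Lo.T       # its inverse (Tinv @ T == I)
    Ab, Bb, Cb = Tinv @ A @ T, Tinv @ B, C @ T
    # Truncate the balanced system to the r most significant states
    return Ab[:r, :r], Bb[:r], Cb[:, :r], s
```

The returned vector `s` contains the Hankel singular values; the reduced model keeps the r states associated with the largest of them.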

3.2 Mukharjee (2005)

Mukharjee (2005) presented a response matching algorithm (dominant pole retention)
to obtain a reduced-order system from the original high-order system [21].
The roots of the denominator polynomial (poles) of the OHOS, as shown in Eq. 1,
can be of varied form, viz. distinct or repeated, real or complex conjugate.
Accordingly, a third-order ROS is assumed in four forms: (A) all poles repeated,
(B) one distinct pole and the other poles repeated, (C) one pair of complex poles
and one real pole, and (D) all poles real. The four conditions for a ROS with three
poles can be given as:

G_1r(s) = (a_1 s^2 + b_1 s + c_1)/(s + d_1)^3   (4)

when all poles are repeated

G_2r(s) = (a_2 s^2 + b_2 s + c_2)/((s + d_2)(s + e_2)^2)   (5)

when one pole is distinct and the other poles are repeated

G_3r(s) = (a_3 s^2 + b_3 s + c_3)/((s + γ)(s + δ + jβ)(s + δ − jβ))   (6)

when one pole is real and one pair is complex

G_4r(s) = (a_4 s^2 + b_4 s + c_4)/((s + d_4)(s + e_4)(s + f_4))   (7)

when all poles are real


Minimization over the impulse and step transient responses is then performed to find
the unknown parameters of the reduced-order systems assumed above. The unknown
coefficients must be determined such that the integral square error (ISE) between
the transient parts of the OHOS and ROS responses is minimal.
Therefore, two sets of reduced-order models are obtained: one by minimization of
the ISE between the transient parts of the unit step responses of the OHOS and ROS,
and the other by minimization of the ISE between their impulse responses.

3.3 Philip (2010)

Philip (2010) described procedures to estimate the reduced-order polynomial by the
dominant pole retention technique, presenting several algorithms for estimating the
dominant poles of the OHOS given by Eq. (1) [8]. These algorithms are described
as follows.
A. Dominant pole estimation using reciprocal transformation
The transfer function G(s) of Eq. (1) has the reciprocal transformation
 
G̃(s) = (1/s) G(1/s) = (a_{n−1} + a_{n−2} s + · · · + a_0 s^{n−1})/(b_n + b_{n−1} s + b_{n−2} s^2 + · · · + b_0 s^n)   (8)

This transformation reverses the order of the polynomial coefficients. The essential
property of the reciprocal transformation is that it inverts the nonzero poles and
zeros of the original transfer function. Now, the denominator polynomial of the
reciprocal system is considered:

D̃(s) = b_n + b_{n−1}s + b_{n−2}s^2 + · · · + b_0 s^n   (9)

Then, the approximation of one dominant pole is calculated to be,

r_d = n b_0 / b_1   (10)

The calculation follows from classical algebra: in a polynomial of degree n, the
coefficient of the degree (n − 1) term corresponds to the negation of the sum of its
roots (real parts only). Dividing this quantity by the polynomial's degree therefore
gives the average value of the roots. Since the polynomial here is the reciprocal
one, this average approximates the inverse of the dominant root, and its reciprocal
is the approximate dominant root of the original system.
B. Estimation of dominant pole by principal pseudo-break frequency
The next approximation of the dominant pole is given by the principal pseudo-break
frequency of the characteristic polynomial [19]. The denominator polynomial
of Eq. (1) gives the estimated dominant pole as

r_p = b_0 / √|b_1^2 − 2 b_2 b_0|   (11)

With the knowledge of r_d and r_p, the denominator polynomial of the reduced-order
system can be determined.
C. Model of reduced order dependent on frequency
Recent MOR research has used response matching at user-specified frequencies.
Following the same path, the technique is elaborated so that the reduced-order model
matches the frequency response at various frequencies. The approach used for
estimating the real pole of smallest magnitude can likewise be used to estimate the
real pole of largest magnitude, which can be described as

r_h = b_{n−1} / n   (12)
Averaging in this way gives one more pole estimate.
The three estimates cited above help improve the reduced-order approximation in the
high-, medium- and low-frequency regions, i.e., over the entire frequency range.
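For concreteness, the three estimates of Eqs. (10)–(12) can be computed directly from the denominator coefficients. The sketch below applies them, together with the coefficient reversal of Eqs. (8)–(9), to the denominator of the ninth-order test system of Sect. 4; it is our illustration of the formulas as we read them from the garbled typesetting, not code from [8]:

```python
def reciprocal_poly(coeffs):
    """Coefficient reversal of Eqs. (8)-(9): s^n D(1/s).
    `coeffs` are in descending powers of s."""
    return list(reversed(coeffs))

def pole_estimates(den):
    """Eqs. (10)-(12); den = [b_n, ..., b_1, b_0] in descending powers."""
    n = len(den) - 1
    b0, b1, b2 = den[-1], den[-2], den[-3]
    r_d = n * b0 / b1                            # Eq. (10): low-magnitude pole
    r_p = b0 / abs(b1**2 - 2 * b2 * b0) ** 0.5   # Eq. (11): pseudo-break frequency
    r_h = den[1] / n                             # Eq. (12): high-magnitude pole
    return r_d, r_p, r_h

# Denominator of the ninth-order test system (Eq. 18)
den = [1, 9, 66, 294, 1029, 2541, 4684, 5856, 4620, 1700]
r_d, r_p, r_h = pole_estimates(den)
```

The three values approximate poles in the low-, medium- and high-frequency regions of the test system, respectively.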

3.4 Desai and Prasad (2013)

Desai and Prasad (2013) performed model order reduction with two techniques
combined [22, 23]. The coefficients of the denominator of the ROS are obtained by
the Routh approximation method to ensure a stable model. In this technique, the
denominator of the high-order original system is first reciprocated to get
 
D̃_n(s) = s^n D_n(1/s)   (13)

Then, the α array is formed from the coefficients of the polynomial obtained in
Eq. 13, and the values of the parameters α_1, α_2, α_3, …, α_n are obtained.
The reduced (rth)-order denominator polynomial is obtained using

D̃_r(s) = α_r s D̃_{r−1}(s) + D̃_{r−2}(s) for r = 1, 2, … and D̃_{−1}(s) = D̃_0(s) = 1   (14)

Then, the reciprocal transformation is applied again for obtaining the reduced
order system’s reduced denominator as
 
D_r(s) = s^r D̃_r(1/s)   (15)

Using the Big Bang Big Crunch (BBBC) algorithm, the numerator coefficients are
obtained by minimizing the objective function 'F', defined as the integral square
error (ISE) between the transient responses of the OHOS and ROS. Big Bang Big
Crunch is an algorithm, similar to the genetic algorithm, that operates on the
principle of the evolution of the universe.

3.5 Tiwari (2019)

Tiwari (2019) carried out model reduction by separating the OHOS into two parts,
denominator and numerator, while preserving the stability of the system [24]. The
denominator part is reduced using the dominant pole retention technique with the
additional concept of clustering. Within this algorithm, the dominant poles of the
OHOS are analysed quantitatively, and the dominancy of a particular pole is
established using the MDI; the highest MDI value of a particular pole indicates that
the pole has high controllability and observability. Then, a cluster of dominant
poles is made, and a cluster center is found by applying Eq. (16):
λ_c = [ (1/|λ_1| + (1/k)(1/|λ_1| + Σ_{i=2}^{k} 1/|λ_i|)) / 2 ]^(−1)   (16)

where λ_c is the cluster center obtained from the k poles (λ_1, λ_2, λ_3, …, λ_k).
The number of poles of the ROS equals the number of clusters.
The reduced-order numerator is found using the popular Pade approximation
technique, which represents the system as a rational function N(s)/D(s) of degrees
m and n, respectively.
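As a small illustration of the clustering step, the sketch below implements our reading of Eq. (16): the inverse-distance term over all k poles is averaged with the most dominant pole and inverted. The reconstruction of that equation from the garbled typesetting is uncertain, the exact variant used in [24] may differ, and the pole values used here are hypothetical:

```python
def cluster_center(poles):
    """Cluster-center magnitude per our reading of Eq. (16).
    lambda_1 is the most dominant pole (smallest magnitude); for a
    cluster of stable poles the center would be taken as -center."""
    mags = sorted(abs(p) for p in poles)
    k = len(mags)
    idm = sum(1.0 / m for m in mags) / k        # inverse-distance term
    return ((1.0 / mags[0] + idm) / 2.0) ** -1

# Hypothetical cluster of three real poles
center = cluster_center([-1.0, -2.0, -5.0])
```

Note how the extra 1/|λ_1| term biases the center toward the most dominant pole of the cluster, which is the intent of dominancy-based clustering.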

4 Calculative Experiments

The performance of all the MOR methods discussed in Sect. 3 is compared through
numerical experiments on the basis of overshoot, integral square error (ISE),
settling time and rise time between the OHOS and the ROS obtained after applying
each MOR technique.
Integral square error is a measure of the quality of the obtained reduced-order
system:

ISE = ∫_0^∞ [y(t) − y_r(t)]^2 dt   (17)

where y(t) is the response of the OHOS and y_r(t) is the response of the obtained ROS.
Test Example: Consider the ninth-order linear dynamic system used in [6, 9, 10,
20, 22], given by the following transfer function:

G(s) = (s^4 + 35s^3 + 291s^2 + 1093s + 1700)/(s^9 + 9s^8 + 66s^7 + 294s^6 + 1029s^5 + 2541s^4 + 4684s^3 + 5856s^2 + 4620s + 1700)   (18)

Applying all the MOR techniques, the corresponding third-order transfer functions
of the reduced-order models are

G_1(s) = (0.1405s^2 − 0.8492s + 1.881)/(s^3 + 1.575s^2 + 3.523s + 1.717)   (19)

G_2(s) = (0.2945s^2 − 2.202s + 2.32)/(s^3 + 2.501s^2 + 4.77s + 2.32)   (20)

G_3(s) = (0.5058s^2 − 1.985s + 3.534)/(s^3 + 3s^2 + 5.534s + 3.534)   (21)

[Figure: step responses (amplitude vs. time, 0–20 s) of the original system and the
reduced-order models obtained by balanced truncation, Mukharjee, Philip & Pal,
Desai & Prasad, and Tiwari & Kaur.]

Fig. 1 Step response of original and reduced order models for test example

Table 1 Comparison between various reduced order models for test example

| Method of order reduction | ISE | Steady-state value | Rise time (s) | Overshoot (%) | Settling time (s) |
|---|---|---|---|---|---|
| Original | – | 1 | 2.85 | – | 8.72 |
| Balanced truncation [2] | High | 1.09 | 2.92 | – | – |
| Mukharjee [21] | 8.77 × 10^-2 | 1 | 4.67 | 0 | 12.9 |
| Philip [8] | 2.82 × 10^-2 | 1 | 2.99 | 0 | 7.6 |
| Desai [23] | 2.52 × 10^-2 | 1 | 3.43 | 1.96 | 10.6 |
| Tiwari [24] | 1.74 × 10^-2 | 1 | 2.92 | 0 | 6.91 |

G_4(s) = (0.0789s^2 + 0.3142s + 0.493)/(s^3 + 1.3s^2 + 1.34s + 0.493)   (22)

G_5(s) = (−0.4439s^2 − 0.4901s + 2.317)/(s^3 + 3s^2 + 4.317s + 2.317)   (23)

The step responses of all the MOR techniques are plotted in Fig. 1. The quantitative
comparison amongst all the methods is given in Table 1 on the basis of integral
square error, peak overshoot, rise time and settling time, along with the achieved
steady-state value.
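The ISE figures of Table 1 can be reproduced approximately by simulating the step responses numerically. The sketch below compares the original system of Eq. (18) with the reduced model of Eq. (23) over a finite horizon, so the value is only a finite-horizon approximation of the integral in Eq. (17):

```python
import numpy as np
from scipy import signal

# Original ninth-order system, Eq. (18)
num = [1, 35, 291, 1093, 1700]
den = [1, 9, 66, 294, 1029, 2541, 4684, 5856, 4620, 1700]
# Reduced third-order model of Eq. (23)
num_r = [-0.4439, -0.4901, 2.317]
den_r = [1, 3, 4.317, 2.317]

t = np.linspace(0.0, 20.0, 4001)
_, y = signal.step(signal.TransferFunction(num, den), T=t)
_, yr = signal.step(signal.TransferFunction(num_r, den_r), T=t)
# Rectangle-rule approximation of Eq. (17) on [0, 20]
ise = float(np.sum((y - yr) ** 2) * (t[1] - t[0]))
```

Both transfer functions have unit DC gain, so the responses settle to the same steady state and the error integral stays finite.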

The comparative analysis shows that the earliest technique, balanced truncation,
gives a good approximation of the large-scale system, but the amount of error is
significant. The more recently developed techniques reduce the error, whether
steady-state error or integral square error.

5 Conclusion

This paper has traced the enhancements in the area of model order reduction.
The initially developed balanced truncation technique yields a highly erroneous
reduced-order system, which motivated the development of improved techniques.
In 2005, Mukharjee used response matching to develop a more accurate system, which
renewed interest in model order reduction, and more advanced techniques followed,
as described by Philip, Desai and Tiwari. These techniques improved the agreement
between the original and reduced-order systems and reduced the error between them.
The present work shows that the integral square error has decreased with successive
developments in MOR techniques, but the error should be reduced further to
approximate the original higher-order system more exactly. This limitation can be
addressed by designing more advanced techniques that further reduce the error and
improve the accuracy between the original higher-order system and the reduced-order
system. Future development in the area of model order reduction can thus help in
obtaining more advanced techniques, making the reduced-order system more accurate
so that the study of large-scale systems becomes easier.

References

1. Antoulas, A. C., Sorensen, D. C., & Gugercin, S. (2006). A survey of model reduction methods
for large-scale systems. Contemporary Mathematics.
2. Moore, B. C. (1981). Principal component analysis in linear systems: Controllability, observ-
ability and model reduction. IEEE Transactions on Automatic Control, AC-26(1), 17–32.
3. Villemagne, C., & Skelton, R. E. (1987). Model reduction using a projection formulation. In
26th IEEE Conference on Decision and Control (pp. 461–466).
4. Safonov, M. G., & Chiang, R. Y. (1989). A Schur method for balanced-truncation model
reduction. IEEE Transactions on Automatic Control, 34(7), 729–733.
5. Shamash, Y. (1974). Continued fraction methods for the reduction of discrete-time dynamic
systems. International Journal of Control, 20(2), 267–275.
6. Shamash, Y. (1975). Linear system reduction using pade approximation to allow retention of
dominant modes. International Journal of Control, 21(2), 257–272.
7. Chen, T. C., & Chang, C. Y. (1979). Reduction of transfer functions by the stability-equation
method. Journal of the Franklin Institute, 308(4), 389–404.
8. Philip, B., & Pal, J. (2010). An evolutionary computation based approach for reduced order
modeling of linear systems. IEEE International Conference on Computational Intelligence and
Computing Research, Coimbatore, pp. 1–8.

9. Lucas, T. N. (1983). Factor division: A useful algorithm in model reduction. IEE Proceedings,
130(6), 362–364.
10. Sikander, A., & Prasad R. (2015). Linear time invariant system reduction using mixed method
approach. Applied Mathematics Modelling.
11. Tiwari, S. K., & Kaur, G. (2016). An improved method using factor division algorithm for
reducing the order of linear dynamical system. Sadhana, 41(6), 589–595.
12. Glover, K. (1984). All optimal hankel-norm approximations of linear multivariable systems
and their L∞ -Error Bounds. International Journal of Control, 39(6), 1115–1193.
13. Le Mehaute, A., & Grepy G. (1983). Introduction to transfer and motion in fractal media: The
geometry of kinetics. Solid State Ionics, 9 & 10, Part 1, 17–30.
14. Vishakarma, C. B., & Prasad, R. (2009). MIMO system reduction using modified pole clustering
and genetic algorithm. Modelling and Simulation in Engineering.
15. Narwal, A., & Prasad, R. (2015). A novel order reduction approach for LTI systems
using cuckoo search and Routh approximation. In IEEE International Advance Computing
Conference (IACC), Bangalore, pp. 564–569.
16. Narwal, A., & Prasad R. (2016). Optimization of LTI systems using modified clustering
algorithm. IETE Technical Review.
17. Sikander, A., & Prasad, R. (2017). A new technique for reduced-order modelling of linear
time-invariant system. IETE Journal of Research.
18. Parmar, G., Mukherjee, S., & Prasad, R. (2007). System reduction using factor division algo-
rithm and eigen spectrum analysis. International Journal of Applied Mathematical Modelling,
31, 2542–2552.
19. Cheng, X., & Scherpen, J. (2018). Clustering approach to model order reduction of power
networks with distributed controllers. Advances in Computational Mathematics.
20. Alsmadi, O., Abo-Hammour, Z., Abu-Al-Nadi, D., & Saraireh, S. (2015). Soft computing tech-
niques for reduced order modelling: Review and application. Intelligent Automation & Soft
Computing.
21. Mukherjee, S., & Satakshi, M. R. C. (2005). Model order reduction using response matching
technique. Journal of the Franklin Institute, 342, 503–519.
22. Desai, S. R., & Prasad, R. (2013). A novel order diminution of LTI systems using big bang
big crunch optimization and routh approximation. Applied Mathematical Modelling, 37, 8016–
8028.
23. Desai, U. B., & Pal, D. (1984). A transformation approach to stochastic model reduction. IEEE
Transactions on Automatic Control, AC-29(12), 1097–1100.
24. Tiwari, S. K., & Kaur, G. (2019). Enhanced accuracy in reduced order modeling for linear
stable/unstable system. International Journal of Dynamics and Control.
Solution to Unit Commitment Problem:
Modified hGADE Algorithm

Amritpal Singh and Aditya Khamparia

Abstract This research paper proposes a hybrid approach, an extension of the
hGADE algorithm combining differential evolution and the genetic algorithm, aimed
at solving the mixed-integer optimization problem known as the unit commitment
scheduling problem. Ramp up and ramp down constraints have been included in the
calculation of the total operating cost of power system operation. The proposed
approach is easy to implement and understand. The technique has been tested on a
six-unit system, taking into consideration various system and unit constraints for
solving the unit commitment problem. The hybridization of the genetic algorithm
and differential evolution has produced a significant improvement in the overall results.

Keywords hGADE · Thermal · Commitment · Ramp Rate · Genetic

1 Introduction

Power systems nowadays may consist of thermal plants only, a combination of hydro
and thermal plants, or a combination of thermal, hydro, and nuclear plants. As far
as modeling is concerned, a nuclear plant is the same as a thermal plant; in fact,
it is also called a thermal plant, so thermal, hydro, and nuclear reduce to hydro
and thermal [1]. The problems of power system operation are hierarchical, or
multilevel. They start with load forecasting, which is a very important problem in
both control systems and energy systems. The load has to be ascertained and
forecasted well in advance; very short term load forecasting, for instance, predicts
how the load will change over the next 10 min. At the other extreme, if a power
plant is needed in 2025, planning has to start right now, because the gestation
period for a hydropower plant is 7–8 years; that is the time available. And even

A. Singh (B) · A. Khamparia


Department of Computer Science and Engineering, Lovely Professional University, Phagwara,
Punjab, India
e-mail: apsaggu@live.com
A. Khamparia
e-mail: aditya.khamparia88@gmail.com

© The Editor(s) (if applicable) and The Author(s), under exclusive license 79
to Springer Nature Singapore Pte Ltd. 2021
A. Khanna et al. (eds.), Recent Studies on Computational Intelligence,
Studies in Computational Intelligence 921,
https://doi.org/10.1007/978-981-15-8469-5_7
80 A. Singh and A. Khamparia

in a thermal power plant the gestation period is some 5–6 years. The gestation
period is the time required, from the moment a plant is conceived and planned,
before a megawatt is produced; very long term planning is needed to decide the
initiation of the installation of new power plants, because the site, the fuel, and
the sources of resources all have to be chosen. That is why load forecasting is
needed. The major outcomes of the research are given as follows:
• This research aims at solving the UC problem, which is one of the biggest concerns
for power companies.
• A hybrid approach is proposed as an extension of the hGADE algorithm.
• Ramp up and ramp down constraints have been included in the calculation of the
total operating cost of power system operation.

2 Unit Commitment

In power system operations, the load changes from hour to hour, day to day, and
week to week, so there cannot be a permanent decision about which units should be
on and which should be off. For a given load, we have to find out which units of
the power station should be on to take up that load, and which units are not
required and should be switched off; this can be represented by the binary values
1 or 0, available or not available, working or not working [2]. Finding this binary
schedule is the unit commitment problem. The generic unit commitment problem
can be formulated mathematically as Eq. 1:

min Σ_{t=1}^{T} Σ_{i=1}^{N} f_i(x_it) u_it subject to Σ_{i=1}^{N} x_it u_it = D_t and Σ_{i=1}^{N} x_i^max u_it ≥ D_t + R_t   (1)

where
f_i is the cost function of each generating unit i
x_it is the generation level
u_it is the state of a unit
D_t and R_t are the demand and reserve requirement for time period t, respectively.

The constraints which play an important role in unit commitment are as follows:
i. Maximum generating capacity
ii. Minimum stable generation
iii. Minimum up time
iv. Minimum down time
v. Ramp rate
vi. Load generation constraint.

Notations

u(i, t) Status of unit i at time period t.


u(i, t) = 1 Unit i is on during time period t.
Solution to Unit Commitment Problem: Modified hGADE Algorithm 81

u(i, t) = 0 Unit i is off during time period t.


x(i, t) Power produced by unit i during time period t.
N Set of available units.
R(t) Reserve requirement at time t.

Minimum up time:
Once a unit in a power system is running, it may not be shut down immediately. The
mathematical representation of minimum up time is given as Eq. 2:
If u(i, t) = 1 and t_i^up < t_i^(up,min), then u(i, t + 1) = 1   (2)

Minimum down time:


Once a unit shuts down, it may not be restarted immediately. The mathematical
representation of minimum down time is given as Eq. 3:

If u(i, t) = 0 and t_i^down < t_i^(down,min), then u(i, t + 1) = 0   (3)

Maximum Ramp Rate:


The electrical output of a unit cannot change by more than a definite number over an
interval of time to avoid damaging the turbine [3]. The mathematical representation
of maximum ramp up rate and maximum ramp down rate is given as follows as Eqs. 4
and 5, respectively.
Maximum ramp up rate constraint:
x(i, t + 1) − x(i, t) ≤ P_i^(up,max)   (4)

Maximum ramp down rate constraint:

x(i, t) − x(i, t + 1) ≤ P_i^(down,max)   (5)

Load Generation constraint:


It can be defined as the requirement that the generators' total electricity
production equal the demand for electricity; its mathematical representation is
given as Eq. 6:


Σ_{i=1}^{N} u(i, t) x(i, t) = L(t)   (6)

Reserve Capacity constraint:


Sometimes, there is a need to increase production from other units to keep the
frequency drop resulting from the unanticipated loss of a generating unit within
acceptable limits. The mathematical formulation of the reserve capacity constraint
is given as Eq. 7 [4]:


Σ_{i=1}^{N} u(i, t) P_i^max ≥ L(t) + R(t)   (7)
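The unit-level constraints of Eqs. (2)–(5) can be checked mechanically for a candidate schedule. The sketch below is a minimal illustration with hypothetical data, not part of any UC solver:

```python
def run_lengths(status):
    """Collapse a 0/1 status sequence into (state, length) runs,
    e.g. [1, 1, 0, 0, 0] -> [(1, 2), (0, 3)]."""
    runs, cur, n = [], status[0], 1
    for s in status[1:]:
        if s == cur:
            n += 1
        else:
            runs.append((cur, n))
            cur, n = s, 1
    runs.append((cur, n))
    return runs

def check_unit(status, output, mut, mdt, ramp_up, ramp_down):
    """True if one unit's schedule respects minimum up/down times
    (Eqs. 2-3) and ramp limits (Eqs. 4-5). Boundary runs are skipped,
    since the horizon may truncate them."""
    for state, length in run_lengths(status)[1:-1]:
        if state == 1 and length < mut:
            return False
        if state == 0 and length < mdt:
            return False
    for t in range(len(output) - 1):
        if output[t + 1] - output[t] > ramp_up:      # Eq. (4)
            return False
        if output[t] - output[t + 1] > ramp_down:    # Eq. (5)
            return False
    return True
```

For example, a unit that shuts down for a single hour violates a two-hour minimum down time and would be rejected.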

3 Techniques for Solving UC Problem

Several investigators have tried various optimization techniques to find solutions
to the UC problem in the past [5–9]. The available solutions fall into three
categories:
(i) Conventional techniques
(ii) Non-conventional techniques
(iii) Hybrid techniques.

Conventional Techniques
Conventional techniques include dynamic programming [10], branch and bound
[11], tabu search, Lagrangian relaxation [12], integer programming, interior point
optimization, simulated annealing, etc.
Walter et al. have presented the field-proven dynamic programming formulation
of unit commitment; Eq. 8 gives the mathematical representation of the dynamic
programming algorithm for unit commitment.

F_cost(M, N) = min_{L} [P_cost(M, N) + S_cost(M − 1, L : M, N) + F_cost(M − 1, L)]   (8)

where
Fcost (M, N ) least cost to arrive at state (M, N)
Pcost (M, N ) production cost for state (M, N)
Scost (M − 1, L: M, N ) transition cost from state (M − 1, L) to state (M, N).
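The recursion of Eq. (8) can be sketched as a forward dynamic program over commitment states. In the sketch below, a state is a tuple of on/off flags, and `prod_cost` and `trans_cost` are user-supplied, purely hypothetical cost functions:

```python
from itertools import product

def uc_dynamic_programming(n_units, horizon, prod_cost, trans_cost):
    """Forward DP for Eq. (8):
    Fcost(M, N) = min over L of [Pcost(M, N) + Scost(M-1, L : M, N)
                                 + Fcost(M-1, L)].
    prod_cost(t, state) and trans_cost(prev_state, state) are supplied
    by the caller; every on/off combination is enumerated, so this is
    only viable for a small number of units."""
    states = list(product((0, 1), repeat=n_units))
    F = {s: prod_cost(0, s) for s in states}          # stage M = 0
    for t in range(1, horizon):
        F = {s: prod_cost(t, s) +
                min(trans_cost(p, s) + F[p] for p in states)
             for s in states}
    return min(F.values())
```

With, say, a running cost of 10 per committed unit (and a large penalty if nothing is on) plus a start-up cost of 5 per newly started unit, the two-unit, two-hour optimum is to keep a single unit on throughout.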
Arthur I. Cohen has presented a new algorithm based on the branch and bound
technique [11]. It differs from earlier techniques in that it assumes no priority
ordering, whereas most early techniques relied on a priority list of units defining
the order in which units start up or shut down.

Non-Conventional Techniques

Evolutionary Algorithms
Over the last few years, global optimization has received a lot of attention from
authors worldwide. The reason may be that optimization can play a role in every
area, from engineering to finance; simply everywhere. Inspired by Darwin's theory
of evolution, evolutionary algorithms can also be used to solve problems that humans
do not really know how to solve.
Differential Evolution
Differential evolution (DE) works through the same basic steps as other evolutionary
algorithms. DE was developed by Storn and Price in 1995 [13]. DE tends to provide
the optimal solution (global optimum), as it does not easily get trapped in local
optima, and its space complexity is quite low compared with other algorithms [14].
Genetic Algorithm
The genetic algorithm also belongs to the category of evolutionary algorithms and
is widely used to find optimal solutions to complex problems [2]. The UC problem
formulation using the genetic algorithm is given as Eq. 9:
min Σ_{j=1}^{T} Σ_{i=1}^{N} (a_i + b_i P_ij + c_i P_ij^2) u_ij + Σ_{j=1}^{T} Σ_{i=1}^{N} [σ_i + δ_i (1 − e^(−T_ij^off / T_i))] u_ij (1 − u_i,j−1)   (9)

Subject to


Σ_{i=1}^{N} P_ij ∗ u_ij − PD_j = 0

T_ij^ON > MUT_i

T_ij^OFF > MDT_i

where
N number of units
T scheduling interval
P_ij unit i's generation for hour j
a_i, b_i, c_i coefficients of fuel cost
σ_i coefficient of start-up
PD_j demand for hour j
T_ij^ON ON time for unit i for hour j
MUT_i min. up time
MDT_i min. down time
P_i^max max. generation of unit i
PR_j spinning reserve for hour j
u_ij ON/OFF status for unit i at hour j.

Hybrid Techniques
Numerous optimization algorithms have been devised in the past to address optimal
power flow; examples include the gray wolf optimizer [7], the dragonfly algorithm,
the artificial bee colony, and ant colony optimization. Himanshu Anand et al. have
presented a technique to solve the profit-based unit commitment (PBUC) problem
[15]. Anupam Trivedi et al. [7] have presented a unique approach, named hybridizing
genetic algorithm and differential evolution (hGADE), for solving the power system
optimization problem popularly known as the UC scheduling problem. The GA works
well with binary variables while DE works well with continuous variables, and the
authors have taken advantage of this to solve the UC problem.

4 Hybridization of Genetic Algorithm and Differential


Evolution

Anupam Trivedi et al. have presented a unique approach, named hybridizing genetic
algorithm and differential evolution (hGADE), for solving the power system
optimization problem popularly known as the unit commitment scheduling problem
[16]. The constraints involved in UC are spinning reserve, minimum up time, minimum
down time, start-up cost, hydro constraints, generator 'must run' constraints, ramp
rate, and fuel constraints. The authors mentioned in their future work that the
ramp up/down constraint had been neglected and not taken into consideration in
solving the unit commitment problem. This motivated us to work further and include
the ramp up and down constraints in the calculation of cost; the fitness functions
have been designed accordingly.

5 Proposed Approach

Figure 1 describes the working of the already implemented hGADE algorithm. Ramp up
and ramp down rates are considered in this research, and new fitness functions for
differential evolution [17] and the genetic algorithm have been formulated. Table 1
Table 1 Input data for six-unit system

| Unit | a ($/hr) | b ($/MW hr) | c ($/MW^2 hr) | P max (MW) | P min (MW) | Min. up (hours) | Min. down (hours) | Start-up cost | Ramp up (MW/hr) | Ramp down (MW/hr) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.00375 | 2 | 200 | 200 | 50 | 3 | 1 | 176 | 130 | 130 |
| 2 | 0.0175 | 1.75 | 257 | 80 | 20 | 2 | 2 | 187 | 130 | 130 |
| 3 | 0.0625 | 1 | 300 | 40 | 15 | 3 | 1 | 113 | 90 | 90 |
| 4 | 0.00834 | 3.25 | 400 | 35 | 10 | 3 | 2 | 267 | 60 | 60 |
| 5 | 0.025 | 3 | 515 | 30 | 10 | 2 | 1 | 180 | 60 | 60 |
| 6 | 0.05 | 3 | 515 | 25 | 12 | 3 | 1 | 113 | 40 | 40 |

Table 2 Load pattern (in MW) for 1 h interval


Hour 1 2 3 4 5 6 7 8 9 10
Load (MW) 140 166 180 196 220 240 267 283.4 308 323
Hour 11 12 13 14 15 16 17 18 19 20
Load (MW) 340 350 300 267 220 196 220 240 267 300
Hour 21 22 23 24 25 26 27 28 29 30
Load (MW) 267 235 196 166 140 166 180 196 220 240
Hour 31 32 33 34 35 36 37 38 39 40 41
Load (MW) 267 283.4 308 323 340 350 300 267 220 196 220

Table 3 Parameter setting

| Parameter | Value |
|---|---|
| Genetic algorithm mutation rate | 0.35 |
| Genetic algorithm crossover rate | 0.6 |
| Differential evolution mutation rate | 1 |
| Differential evolution crossover rate | 0.98 |

represents the input data for the six-unit system, which consists of the cost
coefficients, generation limits, minimum up/down times, start-up costs, and ramp
rates. Table 2 shows the load pattern (in MW) at 1 h intervals. Table 3 shows the
mutation and crossover rates defined for this research.
Fitness Function:
The ramp rate is considered in the calculation of the overall production cost of
the power plant. The fitness function used for the genetic algorithm is given below
(Eq. 10). Here, Fs is the average cost per generation and Ft is the mean of Fs:
 
1 if (1 − e/(max(Fs))) ∗ ramprate < Ft
f = (10)
0 otherwise

The fitness function considered for the working of the differential evolution
algorithm is given below (Eq. 11):
 
1 if (Fs) ∗ ramprate < (Ft/e)
f = (11)
0 otherwise

Working of the proposed approach:

The algorithm starts with the initial settings of the various constraints considered
in this research, which include ramp up, ramp down, start-up cost, and distribution
of cost. The algorithm is iterative in nature, and the number of iterations can be
set as per the requirement; the iteration counter is initialized to zero. The
distributed cost is included only in the first iteration. Six units are taken for
the power system analysis, and these units have to satisfy the load at minimum cost.
The load is distributed at one hour intervals.
The operation cost is calculated as follows (Eq. 12):

oc = a ∗ P_t^2 + b ∗ P_t + c   (12)

Here,
oc Operation cost
a, b, c Cost coefficients
P Maximum power.
Then, the priority of the power units is maintained as per the following Eq. 13:

Priority = oc / P_max   (13)

The priorities of all units are calculated and sorted in ascending order, meaning
that the unit of highest priority (the lower the number, the higher the priority)
takes the load first. As per the working of hGADE, the genetic algorithm operates
on the binary component and the differential evolution algorithm on the continuous
component of the chromosomes (Fig. 1).
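Using the data of Table 1, the priority ordering of Eqs. (12)–(13) can be computed directly; the sketch below evaluates the operation cost at maximum power, and assumes the first generation-limit column of Table 1 is P max (the printed header order appears swapped relative to the values):

```python
# (a, b, c, P_max) for the six units of Table 1
units = {
    1: (0.00375, 2.00, 200, 200),
    2: (0.0175,  1.75, 257,  80),
    3: (0.0625,  1.00, 300,  40),
    4: (0.00834, 3.25, 400,  35),
    5: (0.025,   3.00, 515,  30),
    6: (0.05,    3.00, 515,  25),
}

def priority(a, b, c, p_max):
    oc = a * p_max ** 2 + b * p_max + c   # Eq. (12) evaluated at P_max
    return oc / p_max                     # Eq. (13): cost per MW at full output

# Ascending priority value means the unit is loaded first
order = sorted(units, key=lambda u: priority(*units[u]))
```

For this data the priority ordering coincides with the unit numbering, so unit 1 takes the load first.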
Results
The research has been carried out on MATLAB 2016b. It has been found that
optimization makes a significant difference in the cost of operation. The following
are the results obtained under the conditions specified in Tables 4 and 5.
The results obtained are promising. The graph shown in Fig. 2 depicts the average
cost of the generators (units) over the generations (iterations). Table 4 shows the
unit commitment schedule of the six units over ten generations; here, 'on' indicates
that a unit is active in a particular iteration and 'off' indicates that the unit is
not taken into consideration in the calculation of the total operating cost.
It has been found that the average cost of operation is $142,814.9603, and it is
reduced to $142,809.8944 after applying optimization (hGADE), as shown in Table 5.
Case 1 represents the total operation cost without using any optimization; Case 2
shows the results with optimization. The comparative analysis clearly shows that
there is a significant improvement with respect to cost when the proposed approach
is applied.

6 Conclusion

The research is carried out using a hybridization of the genetic and differential evolution algorithms with consideration of ramp-up and ramp-down rates. Fitness functions have been designed accordingly. It has been observed that the hybridization of
88 A. Singh and A. Khamparia

Fig. 1 Working of hGADE [16]. The flowchart proceeds as follows: start; initialize the population; evaluate the fitness of the parent population; find the best solution; if the stopping condition is satisfied, output the optimal solution. Otherwise, the GA works on the binary components of the chromosomes (stochastic uniform selection, GA crossover, GA mutation) while DE works on the continuous components (DE mutation, DE crossover); fitness is then evaluated, and replacement is carried out to form the population of the next generation.
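One generation of the scheme shown in Fig. 1 can be sketched as follows. This is an illustrative Python sketch under simplifying assumptions, not the implementation of [16]: selection and replacement are reduced to their simplest forms, and the operator rates follow Table 5.

```python
import random

def ga_step(binary_pop, pc=0.60, pm=0.35):
    """GA on the binary (unit on/off) components: pairwise uniform
    crossover followed by bit-flip mutation. Shuffling stands in for
    stochastic uniform selection."""
    random.shuffle(binary_pop)
    children = []
    for a, b in zip(binary_pop[::2], binary_pop[1::2]):
        c1, c2 = a[:], b[:]
        if random.random() < pc:           # uniform crossover
            for k in range(len(a)):
                if random.random() < 0.5:
                    c1[k], c2[k] = c2[k], c1[k]
        for child in (c1, c2):             # bit-flip mutation
            children.append([1 - g if random.random() < pm else g for g in child])
    return children

def de_step(cont_pop, f=1.0, cr=0.98):
    """DE/rand/1/bin on the continuous (power output) components."""
    trials = []
    for i, x in enumerate(cont_pop):
        others = [v for j, v in enumerate(cont_pop) if j != i]
        r1, r2, r3 = random.sample(others, 3)
        mutant = [a + f * (b - c) for a, b, c in zip(r1, r2, r3)]   # DE mutation
        trials.append([m if random.random() < cr else g
                       for m, g in zip(mutant, x)])                 # DE crossover
    return trials
```

In the full algorithm, the offspring from both operators would be evaluated by the fitness function and the best individuals retained for the next generation.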



Table 4 UC schedule
Unit 1 Unit 2 Unit 3 Unit 4 Unit 5 Unit 6
Generation 1 Off On Off Off On Off
Generation 2 On On Off Off On On
Generation 3 On Off On On Off On
Generation 4 On On On Off On On
Generation 5 On Off Off Off On Off
Generation 6 On Off Off Off Off Off
Generation 7 Off On On Off On Off
Generation 8 On On Off Off Off Off
Generation 9 On Off Off On Off Off
Generation 10 On On On Off Off On

Table 5 Optimization results (comparative analysis)

Case    GA mutation rate  GA crossover rate  DE mutation rate  DE crossover rate  Cost ($)
Case 1  0.35              0.60               1                 0.98               142,814.9603
Case 2  0.35              0.60               1                 0.98               142,809.8944

Fig. 2 Average cost difference of generators/units

evolutionary algorithms with ramp rates and the newly designed fitness function showed a significant improvement with respect to the total operation cost.

References

1. Wood, A. J., & Wollenberg, B. F. (2007). Power generation, operation & control, 2nd edn.
New York: John Wiley & Sons.
2. Håberg, M. (2019). Fundamentals and recent developments in stochastic unit commitment.
International Journal of Electrical Power & Energy Systems. https://doi.org/10.1016/j.ijepes.
2019.01.037
3. Deka, D., & Datta, D. (2019). Optimization of unit commitment problem with ramp-rate
constraint and wrap-around scheduling. Electric Power Systems Research. https://doi.org/10.
1016/j.epsr.2019.105948
4. Wang, M. Q., Yang, M., Liu, Y., Han, X. S., & Wu, Q. (2019). Optimizing probabilistic spinning
reserve by an umbrella contingencies constrained unit commitment. International Journal of
Electrical Power & Energy Systems. https://doi.org/10.1016/j.ijepes.2019.01.034
5. Zhou, M., Wang, Bo., Li, T., & Watada, J. (2018). A data-driven approach for multi-objective
unit commitment under hybrid uncertainties. Energy. https://doi.org/10.1016/j.energy.2018.
09.008
6. Park, H., Jin, Y. G., & Park, J.-K. (2018). Stochastic security-constrained unit commitment
with wind power generation based on dynamic line rating. International Journal of Electrical
Power & Energy Systems. https://doi.org/10.1016/j.ijepes.2018.04.026.
7. Panwar, L. K., Reddy, S. K, Verma, A., Panigrahi, B. K., & Kumar, R. (2018). Binary grey wolf
optimizer for large scale unit commitment problem. Swarm and Evolutionary Computation.
https://doi.org/10.1016/j.swevo.2017.08.002
8. Tovar-Ramírez, C. A., Fuerte-Esquivel, C. R., Martínez Mares, A., & Sánchez-Garduño, J.
L. (2019). A generalized short-term unit commitment approach for analyzing electric power
and natural gas integrated systems. Electric Power Systems Research. https://doi.org/10.1016/
j.epsr.2019.03.005.
9. Zhou, Bo., Ai, X., Fang, J., Yao, W., Zuo, W., Chen, Z., & Wen, J. (2019). Data-adaptive robust
unit commitment in the hybrid AC/DC power system. Applied Energy. https://doi.org/10.1016/
j.apenergy.2019.113784
10. Hobbs, W. J., Hermon, G., Warner, S., & Shelbe, G. B. (1988). An enhanced dynamic
programming approach for unit commitment. IEEE Transactions on Power Systems.
11. Cohen, A. I., & Yoshimura, M. (1983). A branch-and-bound algorithm for unit commitment.
IEEE Transactions on Power Apparatus and Systems.
12. Yu, X., & Zhang, X. (2014). Unit commitment using Lagrangian relaxation and particle swarm
optimization. International Journal of Electrical Power & Energy Systems.
13. Price, K. V., & Storn, R. (1997). Differential evolution: A simple evolution strategy for fast
optimization. Dr. Dobb’s Journal, 22(4), 18–24.
14. Singh, A., & Kumar, S. (2016). Differential evolution: An overview. Advances in Intelligent
Systems and Computing. https://doi.org/10.1007/978-981-10-0448-3_17
15. Anand, H., Narang, N. & Dhillon, J. S. (2018). Profit based unit commitment using hybrid
optimization technique. Energy. https://doi.org/10.1016/j.energy.2018.01.138.
16. Trivedi, A., Srinivasan, D., Biswas, S., & Reindl, T. (2015). Hybridizing genetic algorithm
with differential evolution for solving the unit commitment scheduling problem. Swarm and
Evolutionary Computation. https://doi.org/10.1016/j.swevo.2015.04.001
17. Dhaliwal, J. S., & Dhillon, J. S. (2019). Unit commitment using memetic binary differential
evolution algorithm. Applied Soft Computing. https://doi.org/10.1016/j.asoc.2019.105502.
In Silico Modeling and Screening Studies
of Pf RAMA Protein: Implications
in Malaria

Supriya Srivastava and Puniti Mathur

Abstract Malaria is a major parasitic disease that affects a large human population,
especially in tropical and sub-tropical countries. The treatment of malaria is becoming
extremely difficult due to the emergence of drug-resistant parasites. To address this
problem, many newer drug target proteins are being identified in Plasmodium falci-
parum, the major causal organism of malaria in humans. Rhoptry proteins participate in the invasion of red blood cells by the merozoites of the malarial parasite. Interference with the rhoptry protein function has been shown to prevent invasion of
the erythrocytes by the parasite. As the crystal structure of RAMA protein of Plas-
modium falciparum (Pf RAMA) is not yet available, the three-dimensional structure
of the protein was predicted using comparative modeling methods. The structural
quality of the generated model was validated using Procheck, which is based on the
parameters of Ramachandran plot. The Procheck results showed 92.7% of backbone
angles were in the allowed region and 0.4% in the disallowed region. This structure was studied for interaction with the entire library of compounds in the ZINC database of natural compounds. The binding site of the protein was predicted using Sitemap, and the entire library was screened against the target. A total of 189,530 compounds were
used as an input to HTVS for the first level of screening. The docking scores of the
compounds were further calculated using “Extra Precision” (XP) algorithm of Glide.
On the basis of docking scores, 54 compounds were selected for further analysis.
The binding affinity was further calculated using the MMGBSA method. The interaction studies using molecular docking and MMGBSA revealed appreciable docking scores and ΔG_bind values. Ten compounds were selected as promising leads, with docking scores in the range of −17.891 to −5.328 kcal/mol. Our data generates
evidence that the screened compounds indicate a potential binding to the target and

S. Srivastava · P. Mathur (B)


Centre for Computational Biology and Bioinformatics,
Amity Institute of Biotechnology, Amity University Uttar Pradesh, Noida 201313, Uttar Pradesh,
India
e-mail: pmathur@amity.edu
S. Srivastava
e-mail: supriyabi14@gmail.com

© The Editor(s) (if applicable) and The Author(s), under exclusive license
to Springer Nature Singapore Pte Ltd. 2021
A. Khanna et al. (eds.), Recent Studies on Computational Intelligence,
Studies in Computational Intelligence 921,
https://doi.org/10.1007/978-981-15-8469-5_8

need further evaluation. Also, the analysis of interaction of these compounds can be
exploited for better and efficient design of novel drugs against the said target.

Keywords Molecular dynamics · Rhoptry-associated membrane antigen (RAMA) · Virtual screening · Molecular docking · Malaria

1 Introduction

Malaria is a major tropical parasitic disease and affects a large population in the
countries located in this region [1]. According to a WHO World Malaria Report
2015, malaria has resulted in about 438,000 deaths globally [2]. Although major
steps have been taken to reduce the burden of malaria, the ultimate aim of roll back
malaria (RBM), which is zero death, is yet far from achieved. Various classes of
antimalarial drugs such as quinoline derivatives, artemisinin derivatives, antifolates,
antimicrobials, and primaquine are known. However, due to increasing incidences of
resistance and adverse effects of existing drugs, there is a growing need for discovery
and development of new antimalarials.
Malaria is caused by Plasmodium species. These are apicomplexan parasites that
contain secretory organelles such as rhoptries, micronemes, and dense granules.
Rhoptries of Plasmodium are paired club-shaped organelles situated at the apical end of the parasite merozoites. When these merozoites attach to the surface of the human erythrocytes, the rhoptries discharge their contents onto the membrane [3]. The merozoites internalize and the rhoptries disappear. Very little is known about rhoptry biogenesis owing to a lack of biomarkers of organelle generation. Microscopic examination reveals that rhoptry organelles are formed by the continuous fusion of post-Golgi vesicles [3, 4]. Rhoptries are composed of proteins and lipids. Some rhoptry proteins have been
analyzed at the molecular level while others have been identified using immuno-
logical reagents [5, 6]. In the present work, a rhoptry protein of the Plasmodium
falciparum, namely rhoptry-associated membrane antigen (Pf RAMA) that appears
to have a role in both rhoptry biogenesis and erythrocyte invasion has been studied.
It has been suggested that RhopH3 and RAP1 show protein–protein interaction with
RAMA. Recently, it has been shown that certain proteins like sortilin are involved
in escorting RAMA from the Golgi apparatus to the rhoptries [7]. Considering the
importance of RAMA and its crucial role in forming the apical complex in Plasmodium, the protein appears to be an interesting drug target. A threading-based model of Pf RAMA (PF3D7_0707300), prediction of the binding site, and virtual screening using biogenic compounds belonging to the ZINC database have been performed. The compounds showing promising docking scores were selected. A molecular dynamics simulation of the protein was performed separately to understand its stability.

2 Materials and Methods

A. Modeling of structure of PfRAMA


The experimental conformation of Pf RAMA has not yet been deduced; therefore, a knowledge-based comparative modeling method was used to computationally predict the 3D structure of the protein. Multiple-template modeling and threading employing the local meta-threading server (LOMETS) [8] were used to derive the 3D structure. The method aligned the target sequence with protein-fold templates from known structures. Various scoring functions, such as secondary-structure match and residue contacts, were used, and the best-fit alignment was generated using dynamic programming. After the threading alignment, the continuous fragments were excised from their respective template structures and assembled to generate a full-length model. Finally, the model with the highest score was selected.
B. Protein refinement and validation
Energy minimization of the modeled protein was performed using MacroModel (version 9.9, Schrödinger) with the OPLS 2005 force field and the PRCG algorithm, using 1,000 minimization steps and an energy gradient of 0.001 kcal/mol. The total energy of the structure was calculated and the overall quality was evaluated using Procheck [9] and Errat [10].
Further, a 50 ns molecular dynamics simulation was carried out using Desmond through the multistep MD protocols of Maestro (version 10.3, Schrödinger) for further refinement of the modeled structure. The OPLS 2005 molecular mechanics force field was used for the initial steps of the MD simulation. The protein was solvated in a cubic box with a dimension of 20 Å using simple point charge water molecules, of which two were replaced with Na+ counter-ions for electroneutrality. In total, 5,000 frames were generated, of which 2,000 were used to obtain the final structure of the Pf RAMA protein. The total energy of the simulated model was calculated and the overall quality was again validated using Procheck.
C. Ligand preparation
A library of 276,784 biogenic compounds from the ZINC database was used for screening [11]. This library is composed of molecules of biological origin. The structures in the library were prepared for further analysis using LigPrep (version 3.5, Schrödinger) [12]. For each structure, different tautomers were produced and a proper bond order was assigned. All possible stereoisomers (up to the default value of 32) for each ligand were generated.
D. Molecular docking
The MD simulated structure of Pf RAMA protein was used to perform the molec-
ular docking studies in order to predict the protein–ligand interactions. Various
steps of Schrödinger's Protein Preparation Wizard were used to prepare the protein before the docking calculations. Hydrogens were added automatically through the Maestro interface (version 10.3, Schrödinger), leaving no lone pairs and using an explicit all-atom model [13].

As the binding site of the protein was not known, it was predicted using Sitemap (version 3.6, Schrödinger). Molecular docking calculations for all the compounds, to determine their binding affinity for the Pf RAMA protein binding site, were performed using Glide (version 6.8, Schrödinger) [14]. A receptor-grid file was generated after protein and ligand preparation using the receptor-grid generation program. The receptor grid was generated at the centroid of the predicted binding site. A cube of size 10 Å × 10 Å × 10 Å was defined at the center of the binding site for the binding of the docked ligand, and to enclose all the atoms of the docked poses, a larger box of 12 Å × 12 Å × 12 Å was also defined.
The structure was studied for interaction with the entire library of biogenic compounds selected from the ZINC database. The 276,784 compounds were screened using different filters (QikProp, reactive groups, Lipinski's rule of five), and the selected compounds were used as input for high-throughput virtual screening (HTVS). The screened compounds were subjected to the next level of molecular docking calculations using the "Standard Precision" (SP) algorithm. Compounds selected after SP docking were further refined by the "Extra Precision" (XP) method of Glide. On the basis of the XP docking scores, 10 compounds were selected for further analysis. The binding affinity was calculated based on the molecular mechanics generalized Born surface area (MMGBSA) method using Prime (version 4.1, Schrödinger), and the interaction studies using molecular docking and MMGBSA [15] revealed appreciable docking scores and ΔG_bind values.

3 Results and Discussion

A. Sequence analysis
The Pf RAMA protein sequence of 861 amino acids was primarily analyzed using BLAST against the PDB database to find structurally categorized proteins that display significant sequence resemblance to the target protein, utilizing evolutionary information by performing profile–profile alignment and assessing the likelihood that two proteins are correlated with each other, as shown in Fig. 1. The sequence identity and query coverage of the templates available for the Pf RAMA protein were very low.

B. Modeling of structure of selected Pf RAMA protein


The three-dimensional structure of the PfRAMA protein was constructed using LOMETS, as shown in Fig. 2. This meta-server used nine threading programs and ensured quick generation of the resultant structure. A predominantly helical structure was generated, with many loops as connectors.

C. Refinement of the structure model and minimization of energy


The predicted model was further refined by performing molecular dynamics simulations for 50 ns using Desmond 2.2. To evaluate the structural deviation, the RMSD was calculated during the simulation based on the initial backbone coordinates of

Fig. 1 Graphical results of the BLAST query for PfRAMA: the regions from the target database that aligned with the query sequence

Fig. 2 Structure of the modeled PfRAMA protein

the protein, as represented in Fig. 3. The RMSD plot revealed that the structure was stable after 35 ns.
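The stability check described above can be sketched as follows. This is an illustrative numpy sketch, not the Desmond analysis used in the study; it assumes the trajectory frames have already been superposed on a common reference, and the tolerance value is a hypothetical choice.

```python
import numpy as np

def rmsd_series(frames):
    """RMSD of each frame relative to the first frame.

    frames: array of shape (n_frames, n_atoms, 3) holding superposed
    backbone coordinates.
    """
    diff = frames - frames[0]                       # broadcast over frames
    return np.sqrt((diff ** 2).sum(axis=(1, 2)) / frames.shape[1])

def stable_after(rmsd, times_ns, tol=0.5):
    """First time after which the RMSD stays within tol of its final value."""
    drift = np.abs(rmsd - rmsd[-1])
    unstable = np.where(drift > tol)[0]
    return times_ns[0] if unstable.size == 0 else times_ns[unstable[-1] + 1]
```

Applied to a real trajectory, such a function would report the kind of plateau onset (here, 35 ns) read off the RMSD plot in Fig. 3.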

D. Validation of modeled structure of PfRAMA


Validation of the protein structure was performed with the SAVES server. The Procheck results showed that 92.7% of residues were in allowed regions and 0.4% of

Fig. 3 50 ns molecular dynamics simulation run of the Pf RAMA protein for refinement of the structure: RMSD of heavy atoms and backbone atoms

residues in the disallowed region (Fig. 4). Thus, a good-quality structure was generated, and the refined structure with minimum energy was further used to perform the molecular docking studies.

E. Analysis of predicted PfRAMA protein binding site


The binding site of the PfRAMA protein was computationally predicted in order to analyze the protein's interaction with lead molecules. Five druggable binding sites were predicted using Sitemap, and the binding site having a site score of 1.286 was selected for receptor-grid generation, as shown in Fig. 5.

Fig. 4 Validation of the MD-simulated structure of the PfRAMA protein: Ramachandran plot showing 92.7% of residues in allowed regions and 0.4% of residues in the disallowed region

Fig. 5 Different predicted binding sites of the Pf RAMA protein

F. Molecular docking calculations of PfRAMA protein


The interaction of Pf RAMA with the various ligands was studied using molecular docking calculations in Glide (version 6.8, Schrödinger). A total of 189,530 compounds were used as input for HTVS in the first level of screening. HTVS yielded 60,657 compounds, which were further used in the next level of molecular docking calculations using the "Standard Precision" (SP) algorithm. The 1,003 compounds selected after SP docking were further refined by the "Extra Precision" (XP) method of Glide docking. On the basis of the XP docking scores, 54 compounds were selected for further analysis. The 10 compounds among these that showed promising leads, with docking scores in the range of −17.891 to −5.328 kcal/mol, are shown in Table 1.
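The multi-stage funnel (HTVS, then SP, then XP) can be sketched generically as follows. This is a hypothetical Python sketch: Glide performs the real scoring, so the scoring functions and keep-counts here are placeholders standing in for the stage counts reported in the text (189,530 → 60,657 → 1,003 → 54).

```python
def keep_best(scored, n):
    """Keep the n entries with the lowest (most favorable) docking score."""
    return sorted(scored, key=lambda pair: pair[1])[:n]

def screening_funnel(compounds, scorers, keep_counts):
    """Run successive scoring stages, shrinking the candidate pool at
    each stage (e.g. HTVS -> SP -> XP)."""
    pool = list(compounds)
    for score, n in zip(scorers, keep_counts):
        scored = [(c, score(c)) for c in pool]
        pool = [c for c, _ in keep_best(scored, n)]
    return pool
```

With real Glide scores plugged in as the scorers, the keep-counts would mirror the numbers reported above.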

G. Binding affinity calculation


The binding free energy was derived using MMGBSA. Table 1 shows the ten best compounds, which show promising leads with appreciable docking scores and ΔG_bind scores ranging from −91.547 to −80.351 kcal/mol. After analyzing the binding modes, the ligand bearing the best (lowest) docking score and ΔG_bind value, namely ZINC08623270, was selected for further calculations. Analysis of the binding mode of the protein–ligand docked complex within the binding pocket of PfRAMA showed hydrogen-bonding (H-bond) patterns, as shown in Fig. 6a, b. A total of three hydrogen bonds were formed between the ligand and the protein: one main-chain hydrogen bond with Tyr625 and two main-chain hydrogen bonds with Asn614 (Fig. 6a, b).

Table 1 Glide energy, docking score, and MMGBSA (ΔG_bind) score of selected ligands

S. No.  ZINC ID        Glide energy  Docking score  MMGBSA
1       ZINC08623270   −48.440       −17.891        −91.547
2       ZINC03794794   −36.907       −11.029        −90.356
3       ZINC20503905   −35.241       −7.427         −85.930
4       ZINC67870780   −43.708       −5.547         −85.490
5       ZINC67870780   −47.18        −7.561         −85.490
6       ZINC22936347   −34.835       −6.461         −84.941
7       ZINC15672677   −34.502       −6.02          −84.845
8       ZINC09435873   −43.937       −6.53          −82.966
9       ZINC09435873   −37.125       −6.095         −82.966
10      ZINC20503855   −33.255       −5.328         −80.351

Fig. 6 a Ligand interaction with Pf RAMA protein showing hydrogen bonds between the ligand and the protein; b three-dimensional structure fitting of ligand-1 into the binding site of Pf RAMA protein showing hydrogen bonds

Fig. 7 Graph showing per-residue energy: a E_vdw of the ligand, b E_ele of the ligand

H. Per residue energy contribution


The amino acids present in the binding site showed significant contributions to the van der Waals (E_vdw) and electrostatic (E_ele) energies. Significant E_vdw contributions were made by amino acids such as Lys-40, Leu-39, Gly-3, Glu-116, Asn-41, Tyr-60, and Gln-4 (Fig. 7a). Appreciable E_ele contributions were made by the residues Lys-40, Lys-2, Lys-34, Lys-36, Met-1, Lys-153, and Arg-154 (Fig. 7b).
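Ranking per-residue decomposition output of this kind can be sketched as follows. This is an illustrative sketch: the residue labels follow the text above, but the energy values are invented placeholders, not the study's actual decomposition results.

```python
def top_contributors(per_residue_energy, k=3):
    """Return the k residues with the most favorable (lowest) energy first."""
    return sorted(per_residue_energy, key=per_residue_energy.get)[:k]

# Hypothetical per-residue van der Waals contributions (kcal/mol)
e_vdw = {"Lys-40": -3.1, "Leu-39": -2.4, "Gly-3": -2.0,
         "Glu-116": -1.7, "Asn-41": -1.4, "Tyr-60": -1.2, "Gln-4": -1.1}
print(top_contributors(e_vdw))  # -> ['Lys-40', 'Leu-39', 'Gly-3']
```

The same ranking applied to the electrostatic terms would surface the charged residues (Lys, Arg) highlighted in Fig. 7b.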

4 Conclusion

In conclusion, a good-quality three-dimensional structure of the Plasmodium falciparum rhoptry-associated membrane antigen (Pf RAMA) protein was determined using a comparative energy-based modeling method and molecular dynamics simulations. The Ramachandran plot showed 92.7% of the backbone angles of the Pf RAMA protein in the allowed regions and 0.4% of residues in the disallowed region. A series of docking studies was performed and the binding affinity of the ligands for the protein was evaluated. Out of the ten molecules that showed appreciable docking scores and high affinity toward the binding site of the protein, ZINC08623270 was selected for further analysis. The chemical name of the ligand is 1-(3-methylsulfanylphenyl)-3-[[5-[(4-phenylpiperazin-1-yl)methyl]quinuclidin-2-yl]methyl]urea. The interaction between the protein and this ligand was stabilized by three hydrogen bonds as well as hydrophobic and ionic interactions. Our data generates evidence that the reported compounds show potential binding to the target and need further experimental evaluation. It is therefore proposed that this study could serve as a basis for medicinal chemists to design better and more efficient compounds, which may qualify as novel drugs against the said target of malaria caused by Plasmodium falciparum.

References

1. Cowman, A. F., Healer, J., Marapana, D., & Marsh, K. (2016). Malaria: Biology and disease.
Cell, 167, 610–624.
2. WHO. (2015). The World Malaria Report. http://www.who.int/malaria/publications/world-malaria-report-2015/report/en/. ISBN 978 92 4 156515 8.
3. Bannister, L. H., Mitchell, G. H., Butcher, G. A., & Dennis, E. D. (1986). Lamellar membranes
associated with rhoptries in erythrocytic merozoites of Plasmodium knowlesi: A clue to the
mechanism of invasion. Parasitology, 92(2), 291–303.
4. Jaikaria, N. S., Rozario, C., Ridley, R. G., & Perkins, M. E. (1993). Biogenesis of rhoptry
organelles in Plasmodium falciparum. Molecular and Biochemical Parasitology, 57(2), 269–
279.
5. Preiser, P., Kaviratne, M., Khan, S., Bannister, L., & Jarra, W. (2000). The apical organelles of
malaria merozoites: Host cell selection, invasion, host immunity and immune evasion. Microbes
and Infection, 2(12), 1461–1477.
6. Blackman, M. J., & Bannister, L. H. (2001). Apical organelles of Apicomplexa: Biology and
isolation by subcellular fractionation. Molecular and Biochemical Parasitology, 117(1), 11–25.
7. Hallée, S., Boddey, J. A., Cowman, A. F., & Richard, D. (2018). Evidence that the Plasmodium
falciparum protein sortilin potentially acts as an escorter for the trafficking of the rhoptry-
associated membrane antigen to the Rhoptries. mSphere, 3 (1), e00551–17. https://doi.org/10.
1128/mSphere.00551-17.
8. Wu, S., & Zhang, Y. (2007). LOMETS: A local meta-threading-server for protein structure
prediction. Nucleic Acids Research., 35(10), 3375–3382.
9. Laskowski, R. A., MacArthur, M. W., Moss, D. S., & Thornton, J. M. (1993). PROCHECK:
A program to check the stereochemical quality of protein structures. Journal of Applied
Crystallography, 26, 283–291.
10. Colovos, C., & Yeates, T. O. (1993). Verification of protein structures: Patterns of non-bonded
atomic interactions. Protein Science, 2(9), 1511–1519.
11. Irwin, J. J., Sterling, T., Mysinger, M. M., Bolstad, E. S., & Coleman, R. G. (2005). ZINC–a
free database of commercially available compounds for virtual screening. Journal of Chemical
Information and Modeling, 45, 177–182.
12. Greenwood, J. R., Calkins, D., Sullivan, A. P., & Shelley, J. C. (2010). Towards the comprehen-
sive, rapid, and accurate prediction of the favorable tautomeric states of drug-like molecules
in aqueous solution. Journal of Computer-Aided Molecular Design, 24(6–7), 591–604.
13. Andrec, M., Harano, Y., Jacobson, M. P., Friesner, R. A., & Levy, R. M. (2002). Complete
protein structure determination using backbone residual dipolar couplings and sidechain
rotamer prediction. Journal of Structural and Functional Genomics, 2(2), 103–11.

14. Friesner, R. A., Banks, J. L., et al. (2004). Glide: A new approach for rapid, accurate docking
and scoring. 1. Method and assessment of docking accuracy. Journal of Medicinal Chemistry,
47(7), 1739–49.
15. Jacobson, M. P., Friesner, R. A., Xiang, Z., & Honig, B. (2002). On the role of the crystal
environment in determining protein side-chain conformations. Journal of Molecular Biology,
320(3), 597–608.
IBRP: An Infrastructure-Based Routing
Protocol Using Static Clusters in Urban
VANETs

Pavan Kumar Pandey, Vineet Kansal, and Abhishek Swaroop

Abstract Vehicular ad hoc networks (VANETs) are a popular subclass of mobile ad hoc networks (MANETs). These kinds of networks do not have a centralized authority to control the network infrastructure. Data routing is one of the most significant challenges in VANETs due to their special characteristics. In this paper, an effective cluster-based routing algorithm for VANETs has been proposed. Unlike
other clustering approaches, RSUs have been considered as fixed nodes in VANETs and treated as cluster heads in this approach. Due to the static clusters, the overhead of creating and maintaining clusters has been reduced. Multiple clusters in a large network
headed by RSUs make routing more efficient and reliable. Three levels of routing are defined in this approach. In first-level routing, the source vehicle itself is capable of sending data to the destination node. Second-level routing requires the RSU's intervention: the RSU finds the path from the source to the destination vehicle. At the third level, the RSU broadcasts the message to the other connected RSUs to spread messages more widely. The proposed approach is suitable for urban VANETs because of the availability of a dense network with multiple RSUs. Some applications where the approach may be useful are emergency help, broadcasting information, reporting observations to the authorities or to vehicles, and collecting nearby traffic-related information from the RSU of the respective cluster. The static analysis of the proposed approach shows that it is efficient, scalable, and able to reduce network overhead in large and dense urban VANETs.

P. K. Pandey (B)
Dr. A.P.J. Abdul Kalam Technical University, Lucknow, India
e-mail: pavan.pandey.1312@gmail.com
V. Kansal
Institute of Engineering and Technology, Dr. A.P.J Abdul Kalam Technical University, Lucknow,
India
e-mail: vineetkansal@ietlucknow.ac.in
A. Swaroop
Bhagwan Parashuram Institute of Technology, New Delhi, India
e-mail: abhishekswaroop@bpitindia.com

© The Editor(s) (if applicable) and The Author(s), under exclusive license
to Springer Nature Singapore Pte Ltd. 2021
A. Khanna et al. (eds.), Recent Studies on Computational Intelligence,
Studies in Computational Intelligence 921,
https://doi.org/10.1007/978-981-15-8469-5_9

Keywords Vehicular ad hoc networks (VANETs) · Infrastructure-based routing · Cluster-based routing · Static cluster based · Urban environment

1 Introduction

Vehicular ad hoc networks (VANETs) [1] are a prominent subclass of mobile ad hoc networks (MANETs). They provide a special kind of framework to allow communication among several vehicles. In VANETs, vehicles create a huge network (millions of vehicles running on roads) without any centralized authority, with each vehicle acting as a network node and a router itself. VANETs [2] are a kind of infrastructure-less, self-organized network in which the nodes are completely mobile. Unique characteristics of VANETs, such as dynamic topologies, limited bandwidth, and limited energy, make this one of the most challenging network scenarios.
As per Fig. 1, every vehicle in a VANET is well equipped [3] with wireless devices to communicate with other vehicles and road side units (RSUs). Nodes in VANETs can communicate with each other through either single-hop or multi-hop connectivity. Each vehicle and RSU is part of the VANET, as nodes are in any network. Vehicular communication can be further categorized into two kinds [4]: vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communication. V2V supports communication among vehicles only. However, V2I also supports the inclusion of other nodes in the communication, such as RSUs and traffic

Fig. 1 Architecture of VANET [5]



authorities, etc. Hence, vehicles can communicate directly if they are within their communication range, and communication beyond this range is possible through other infrastructure nodes.
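The range-based choice between direct V2V communication and infrastructure-assisted V2I relaying described above can be sketched as follows. This is a hypothetical illustration: the positions and the 300 m communication range are invented values, not parameters from the proposed protocol.

```python
import math

def can_reach_directly(a, b, comm_range):
    """True when two nodes are within each other's radio range."""
    return math.dist(a, b) <= comm_range

def choose_mode(src, dst, comm_range=300.0):
    """Pick direct V2V when in range, otherwise relay via infrastructure."""
    return "V2V" if can_reach_directly(src, dst, comm_range) else "V2I"

print(choose_mode((0, 0), (120, 50)))   # within 300 m -> direct V2V
print(choose_mode((0, 0), (900, 400)))  # out of range -> relay via an RSU (V2I)
```

In a cluster-based scheme such as the one proposed here, the V2I branch would hand the message to the RSU acting as the cluster head.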
VANETs are used to design intelligent transportation systems because of their useful applications [6], such as transportation safety, traffic efficiency, and traffic improvement. Transportation safety [7] includes message dissemination about several alerts and warnings, such as accident alerts, traffic situation alerts, poor road condition alerts, lane-change warnings, overtaking warnings, and collision warnings. Traffic efficiency and improvement using VANETs focuses on assisting and enhancing traffic flow according to the current traffic situation. It provides comfortable driving for drivers by sharing dynamic traffic information with the vehicles running on roads. These applications help to avoid road accidents, which directly reduces casualties on the road.
VANETs are a kind of network with many challenges that are yet to be addressed. These include routing challenges, security challenges, reduced signal quality, degraded signal strength, and quality of communication. Additionally, the rapid change of vehicle positions makes a vehicular communication framework more difficult to set up, implement, and deploy.
To increase cooperation between vehicles and other nodes, there should be a way to transmit messages from one node to another, which requires an effective routing procedure. Given the characteristics of VANETs, routing is the most important research challenge in VANETs; therefore, designing an effective and efficient routing protocol is a key requirement for providing reliable communication. Vehicles are not able to share messages among themselves without a well-defined routing mechanism. In order to address routing issues, several routing protocols have already been proposed for VANETs.
This paper also focuses on the problem of routing, and a new routing approach based on an efficient clustering technique has been proposed. The major contributions of the present exposition are as follows.
(1) A new cluster-based routing protocol for urban VANETs has been proposed.
(2) A mechanism has been added in the proposed approach to avoid flooding of messages in the network.
(3) Cached routes are used to apply controlled broadcasting.
(4) Static performance analysis of the proposed approach shows that the proposed approach is efficient and scalable.
The rest of the paper is organized as follows. Section 2 explains routing challenges
and discusses a few routing protocols. Section 3 captures all details of the new
routing approach proposed in this paper. Section 4 presents the performance
evaluation, which helps to understand the effectiveness and efficiency of the
designed routing approach, and Sect. 5 concludes the paper.
106 P. K. Pandey et al.

2 Related Work

The process of transmitting a message from one node to another is known as routing.
Numerous routing protocols have been proposed for different kinds of network
scenarios. Routing protocols designed for fully connected networks such as MANETs
cannot be used effectively and efficiently in VANETs. Therefore, different kinds of
routing procedures need to be designed that support dynamic topology, frequent link
breakage, and high node mobility.
In VANETs, several routing protocols [8] have already been proposed to achieve
efficient routing between nodes. The efficiency and usability of a routing protocol
can be measured by different quality-of-service (QoS) parameters such as end-to-end
delay, round-trip delay, packet loss, jitter, and interference. Based on the routing
behavior used, the numerous routing approaches [9] can be further classified into
different categories. As per Fig. 2, the major routing protocols fall into five
categories according to the routing mechanism used. Each category is discussed
separately below.
Topology-based routing protocols [11] are based on information about the complete
network topology. Each node keeps track of every other node in the network and
maintains a routing table capturing the best route to every other node. This
mechanism can be further classified into three categories: proactive, reactive, and
hybrid routing protocols. Destination-sequenced distance vector (DSDV) [12], ad hoc
on-demand distance vector (AODV) [13], and zone routing protocol (ZRP) [14] are a
few popular routing protocols in this category.
Position-based routing protocols [15, 16] use the current position or location of
vehicles, obtained from a global positioning system (GPS) or a related technology.

Fig. 2 Routing protocols in VANET [10]


IBRP: An Infrastructure-Based Routing Protocol … 107

These are also known as geographical routing protocols. Each node in the network
relies only on position and location information collected from different sources
and forwards messages accordingly. Distance Routing Effect Algorithm for Mobility
(DREAM) [17] and Greedy Perimeter Stateless Routing (GPSR) [18] are two major
representatives of this category.
Broadcast-based routing protocols use flooding techniques to collect routing
information in VANETs. The broadcast scheme allows any node to send packets to
every other node; hence, it consumes a lot of network bandwidth to provide reliable
communication. This scheme is very useful when information such as an emergency
notification needs to be circulated to all neighboring nodes. Distributed Vehicular
Broadcast Protocol (DV-CAST) [19] and Hybrid Data Dissemination Protocol (HYDI)
[20] are representative protocols in this category.
Geocast routing protocols form a separate category based on transmission type.
Geocast routing relies on a zone of relevance (ZoR), which contains a group of
vehicles with similar properties. Unlike the broadcast mechanism, this routing
sends packets from the source to a particular ZoR only, and no routing table or
location information is required. Robust Vehicular Routing (ROVER) and Distributed
Robust Geocast (DRG) are the best-known examples of geocast routing mechanisms.
Cluster-based routing protocols [21, 22] divide a large network into several
sub-networks known as clusters. One cluster head is elected for each cluster and is
responsible for communication beyond the cluster. Different criteria, such as
direction of movement and vehicle speed, can be used to form clusters.
Location-Based Routing Algorithm with Cluster-Based Flooding (LORA-CBF) and
Cluster for Open IVC Network (COIN) are well-known protocols in this category.
Cluster-based routing is currently one of the most popular routing categories, and
many enhancements have recently been proposed under it. Our contribution also lies
in this category; therefore, some recent enhancements in the category are presented
below.
The mobility-based and stability-based clustering scheme proposed in [23] is a
clustering technique for designing routing protocols. It is suitable for urban
areas, under the assumption that every vehicle is aware of the position and location
of every neighboring vehicle. According to the proposed clustering approach, every
cluster has one cluster head and two gateway nodes, and the remaining nodes are
cluster members.
The vehicular multi-hop algorithm for stable clustering (VMaSC) is proposed in [24].
This multi-hop clustering approach selects the cluster head based on link stability
and the relative speed with respect to neighboring vehicles. It supports a direct
connection between the cluster head and cluster members to offer connections with
reduced overhead.
A destination-aware context-based routing protocol [25] with a hybrid soft computing
clustering algorithm has been proposed. In this clustering approach, two different
soft computing algorithms are discussed. The first is a hybrid clustering algorithm
that combines geographic and context-based clustering; this combination reduces
control overhead and traffic overhead in the network. The second is a
destination-aware routing protocol that controls the routing of packets between
clusters and improves routing efficiency.
An unsupervised cluster-based VANET-oriented evolving graph (CVoEG) model with an
associated reliable routing scheme [26] is another clustering-based routing
approach. To provide reliable communication, the existing VANET-oriented evolving
graph (VoEG) is extended by introducing a clustering technique into the model. Link
stability is used to mark vehicles as either cluster heads or cluster members.
Additionally, a reliable routing scheme known as CEG-RAODV, based on the CVoEG
model, is discussed.
A moving zone-based routing protocol (MoZo) has been proposed in [27]. It offers an
architecture for handling multiple moving clusters, where each cluster contains
vehicles grouped by their movement patterns. The approach explains in detail the
formation and maintenance of moving clusters, together with the design of a routing
strategy through them. Apart from these approaches, numerous other clustering
techniques have been proposed; clustering is thus the recent trend in routing
enhancements for VANETs.

3 Proposed Routing Approach

VANETs are a collection of vehicles and roadside units (RSUs), where each vehicle is
expected to be equipped with an onboard unit (OBU). OBUs help vehicles communicate
with each other and with infrastructure nodes such as RSUs. Vehicles in the network
can communicate with other vehicles and RSUs only within a certain distance.
OBU-equipped vehicles are capable of maintaining and broadcasting traffic-related
information to their respective RSUs, such as current position, speed, direction,
current time, and traffic events.
The infrastructure-based routing protocol (IBRP) is also based on clustering. IBRP
divides the complete network into several clusters using the clustering approach.
Each vehicle repeatedly exchanges messages with the RSUs within its range, and based
on these messages, each RSU forms a cluster with all vehicles within its
communication range. In every cluster, the RSU acts as the cluster head and the
vehicles act as cluster members. The RSU therefore maintains information about all
vehicles in its cluster.

3.1 System Model

VANETs have a hierarchical architecture consisting of multiple levels. At the top of
the hierarchy, a transport authority (TA) or traffic control authority (TCA) is
responsible for monitoring and managing traffic conditions. The TA or TCA uses a Web
server (WS) or application server (AS) to monitor vehicles and RSUs through a secure
channel. RSUs sit at the third level and work as gateway routers for the vehicles at
the lowest level.
Generally, RSUs have higher computational power and a longer transmission range than
the OBUs equipping vehicles. Therefore, RSUs are expected to be placed in
high-density areas such as road intersections. Based on the transmission capability
of vehicles and RSUs, we assume that every vehicle can communicate with other
vehicles within a range of up to 500 m, while vehicles in the same cluster can
communicate through their RSU within a range of up to 1000 m. A further assumption
concerns vehicle identity: each vehicle is assumed to have a unique identification
number in the VANET. The vehicle identification number could be the registration
number itself or some other unique number derived from it using some algorithm. The
use of such identifiers allows a vehicle to initiate communication toward any other
vehicle.
In this approach, the road network of an urban area is represented as a graph, in
which each edge represents a road and each vertex represents an intersection of
multiple roads. The RSUs deployed in the network are marked as highlighted vertices
in the graph. Consider a road segment (R) on which a source vehicle (S) wants to
send a message to a destination vehicle (D); the speed of the vehicle is V1 at
timestamp T1, and the direction of movement takes a value in {−1, 0, 1}: −1 when the
vehicle moves away from its RSU, +1 when it moves toward its RSU, and 0 when it is
not moving. The RSU can then calculate the future position of the vehicle at
different timestamps by itself and send that information to the other vehicles in
the cluster.
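The prediction itself is a straight-line extrapolation along the road segment. A minimal sketch, assuming distances in meters, speed in m/s, and the direction convention of this section (+1 toward the RSU, −1 away from it):

```python
def predict_distance_from_rsu(current_m: float, speed_mps: float,
                              direction: int, elapsed_s: float) -> float:
    """Predict a vehicle's distance from its RSU after elapsed_s seconds.

    direction: +1 moving toward the RSU, -1 moving away, 0 stationary
    (the convention used in Sect. 3.1)."""
    assert direction in (-1, 0, 1)
    # Moving toward the RSU shrinks the distance; away grows it.
    return current_m - direction * speed_mps * elapsed_s
```

For example, a vehicle 500 m away moving toward its RSU at 10 m/s is predicted to be 400 m away after 10 s.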
Each RSU receives such information, including identity, speed, and direction of
movement, from the different vehicles within its range. The RSU maintains a routing
table for all vehicles by analyzing the provided details and keeps this routing
information updated periodically.

3.2 Data Structure and Message Format

Apart from the logical explanation, implementation insights are also needed for a
complete understanding of the algorithms. Therefore, the required data structures
and the formats of the different messages are given here. The implementation
involves two separate entities in the protocol: the vehicle and the RSU.

Data structure at vehicle: Data structure at vehicle V(i), for (0 < i < N), in a
VANET of N vehicles.
ID(i): Unique identification number of vehicle i.
S(i): Current state of vehicle i.
V(i): Current speed of vehicle i.
D(i): Direction of movement of vehicle i, where D(i) belongs to (−1, 0, 1).
L(i): Current location of vehicle i.
R(i): Identification number of the RSU acting as cluster head.
Neighbors(i): Map of neighbor vehicles maintained on vehicle i, defined as
map < ID(i), Path(i) >, where Path(i) is the list of nodes to traverse to reach V(i).
RSU(i): List of RSUs in communication range.
Data structure at RSU: Data structure at RSU R(j), for (0 < j < M), in a VANET of
M RSUs.
ID(j): Unique identification number of RSU j.
S(j): Current state of RSU j.
Members(j): Map of member vehicles maintained on RSU j, defined as
map < ID(i), Path(i) >, where ID(i) is the identification number of a vehicle and
Path(i) is the list of nodes to traverse to reach V(i).
Old_members(j): Map of old member vehicles maintained on RSU j, defined as
map < ID(i), R(x) >, where ID(i) is the identification number of a recently departed
vehicle and R(x) is the identification number of that vehicle's current RSU.
RSU(j): List of RSUs directly connected to R(j).
Message format: Several messages are used in this protocol and need to be defined
for a better understanding of the approach.
HELLO {ID(i), V(i), D(i), L(i)}: Message sent from a vehicle to an RSU while
joining a cluster.
HELLO_ACK {R(j), Neighbors(i)}: Message sent from the RSU to the vehicle in
response to HELLO.
BYE {R(i)}: Message sent by a vehicle to its RSU while leaving a cluster.
BYE_ACK {NONE}: Response from the RSU to the vehicle, in response to BYE.
MESSAGE {Vs(i), Vd(i), Path(Vd(i)), String}: Structure holding the data to be sent
from one node to another, where Vs(i) is the source vehicle identifier, Vd(i) is the
destination vehicle identifier, Path(Vd(i)) is the path traversed so far toward the
destination vehicle, and String is the data to be sent.
Some data structures, such as the list of neighbors maintained on vehicles and the
member list maintained on RSUs, are searched frequently. A map is therefore used for
these data structures to reduce the complexity of search operations. The
initialization of all data structures for both entities is explained below.

Initialization of vehicle V(i):
for (i = 1 to N)
    set S(i) = IN; set ID(i);
    fetch nearby RSUs; set list RSU(i);
    set R(i) = none; set neighbors(i) = none;
    wait on queue of V(i);
end for
Initialization of RSU R(j):
for (j = 1 to M)
    set S(j) = IN; set ID(j);
    fetch all directly connected RSUs; set list RSU(j);
    set members(j) = none; set old_members(j) = none;
    wait on queue of R(j);
end for
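The data structures listed in Sect. 3.2, with the initial values from the pseudocode above, can be rendered as two record types. This is a sketch; the field names are illustrative translations of the paper's notation, not part of the protocol specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VehicleState:
    """State held at vehicle V(i); fields mirror Sect. 3.2."""
    vid: str                                    # ID(i)
    state: str = "IN"                           # S(i), initial state
    speed: float = 0.0                          # V(i)
    direction: int = 0                          # D(i) in {-1, 0, 1}
    location: float = 0.0                       # L(i)
    cluster_head: Optional[str] = None          # R(i)
    neighbors: Dict[str, List[str]] = field(default_factory=dict)  # ID -> path
    rsus_in_range: List[str] = field(default_factory=list)         # RSU(i)


@dataclass
class RsuState:
    """State held at RSU R(j); fields mirror Sect. 3.2."""
    rid: str                                    # ID(j)
    state: str = "IN"                           # S(j)
    members: Dict[str, List[str]] = field(default_factory=dict)    # ID -> path
    old_members: Dict[str, str] = field(default_factory=dict)      # ID -> new RSU
    connected_rsus: List[str] = field(default_factory=list)        # RSU(j)
```

Using dictionaries for the neighbor and member maps gives the constant-time lookups that motivate the paper's choice of a map for these frequently searched structures.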

3.3 Cluster Formation

The cluster formation process starts as soon as the vehicle starts and is ready to
communicate with a nearby RSU. Once the OBU equipping the vehicle is ready with the
vehicle details, including movement details, it sends a "HELLO" message to all of
the nearest RSUs. The HELLO message consists of the identity of the vehicle (Id),
its current location (Lt), its speed (Vt), its direction of movement (Dt), and the
current timestamp (t). The RSU receives the message, analyzes the received details,
and updates its routing information with respect to that vehicle. Based on the
provided traffic information, the RSU fetches all neighbors of the newly joined
vehicle and publishes the neighbor list back to the vehicle in a "HELLO_ACK"
message, which is designated as the response to the "HELLO" message. The "HELLO"
message is sent periodically by the vehicle until it receives a response from an RSU
or the vehicle crosses an intersection point.
After crossing an intersection point, there is a high probability of a change in
traffic parameters, such as the direction after a turn or the speed on the new road.
Therefore, the OBU again prepares a new set of data and starts sending a new "HELLO"
message to the selected RSUs after crossing the intersection. In this way, each RSU
has routing information for all vehicles within its range, and each vehicle has
information about all directly reachable neighbors. The RSU acts as the master node
here, as it maintains routing information for all vehicles within its range and is
responsible for keeping that information updated.
Cluster state transition: In the proposed algorithm, at any moment, every vehicle is
marked as being in one of five states: initial (IN), start election (SE), wait
response (WR), cluster member (CM), or isolated member (IM).

Initial (IN)—The initial state of a vehicle before it joins any cluster. Any new
vehicle remains in this state for a certain period of time.
Start election (SE)—In this state, the vehicle tries to join a relevant cluster.
After the initial timer expires, the vehicle changes state from IN to SE and starts
sending HELLO messages to all neighbors. For an RSU, this state means that it is
ready to process HELLO/BYE requests and respond accordingly.
Wait response (WR)—After sending a HELLO message, the vehicle changes state from
SE to WR. In this state, the vehicle waits for a response from the respective RSU in
order to become a member of that cluster. If no response is received within a
certain period of time, the vehicle moves back to the SE state.
Cluster member (CM)—After a successful exchange of HELLO and HELLO_ACK messages
with the RSU, the vehicle changes state to CM, because it is now part of a cluster.
If the vehicle has to change cluster, its state changes to SE again after exchanging
BYE and BYE_ACK messages.
Cluster head (CH)—This state corresponds to RSUs only. After responding with
HELLO_ACK to a HELLO request, the RSU moves from IN to the CH state. After cleaning
up the complete cluster, the RSU changes its state from CH back to IN.
Isolated member (IM)—Vehicles that are not part of any cluster are marked in the
IM state. Vehicles that have completed a trip, or are changing cluster, move to the
IM state.
To illustrate the state transitions properly, a few events are also designated.
INIT_T—This event corresponds to an initial timer of 30 s that allows the OBU to
settle and get ready to exchange messages.
HELLO—The "HELLO" event signifies the message triggered from a vehicle to an RSU
during cluster formation.
HELLO_ACK—The "HELLO_ACK" event is the response to the "HELLO" message from the
RSU to the vehicle. It confirms that the cluster has been formed properly and that
the respective RSU is the cluster head.
BYE—The BYE event signifies leaving a cluster; the vehicle sends a BYE message to
the RSU while leaving the cluster.
BYE_ACK—The "BYE_ACK" event is triggered by the RSU in response to "BYE" when a
graceful exit is to be performed between vehicle and RSU.
WAIT_T—This event corresponds to a timer of 20 s, the time to wait for a response
to the "HELLO" and "BYE" messages.
DROP_T—This event corresponds to a timer of 20 s to decide whether the request is
dropped by the RSU.
START—This event is triggered whenever the OBU powers on as the vehicle starts.
STOP—Analogously to "START", this event is triggered whenever the OBU powers off
as the vehicle stops.
The state diagrams present the state transition flows based on the triggered events
and help to understand the complete behavior of vehicles and RSUs. In this approach,
the state diagrams for vehicles and RSUs are captured separately to explain their
roles properly.

First, consider the state diagram of the vehicle presented in Fig. 3, where the
vehicle starts from the IN state. The first event, "INIT_T", occurs on expiry of the
corresponding timer. This timer is started for a fixed period of 30 s to stabilize
the vehicle and ensure it has proper data for joining a cluster. The event changes
the state from IN to SE. A vehicle in the SE state starts communicating with nearby
RSUs to join the correct cluster: it sends "HELLO" messages to the RSUs and moves to
the WR state. The "HELLO_ACK" event occurs when the respective RSU responds to the
"HELLO" message received from the vehicle. A vehicle in the WR state receives
HELLO_ACK and changes its state to CM. If no response is received within 20 s, the
"WAIT_T" event is triggered and the vehicle changes state back to SE. If the
vehicle's movement pattern forces it to leave the cluster, the vehicle has to inform
its RSU; for leaving or changing clusters, the "BYE" and "BYE_ACK" events are
defined in the proposed approach. The BYE event notifies the RSU before the vehicle
leaves the cluster, but the vehicle stays in the CM state until the BYE_ACK event is
triggered, which is expected once the BYE message has been properly answered by the
RSU. After receiving BYE_ACK, the state changes back to SE. The last event, "STOP",
is initiated when the OBU finds the vehicle shut down after completing the current
trip, moving the state from CM to IM. From the IM state, the vehicle returns to IN
on the "START" event, which is triggered when the OBU finds the vehicle started.
Next, consider the state diagram of the RSU presented in Fig. 4. The RSU also starts
from the IN state, ready to receive messages from vehicles. The HELLO event occurs
when the RSU receives a "HELLO" message from a vehicle. The RSU then changes its
state from IN to SE and starts analyzing the data received from the vehicle. If the
RSU finds that the vehicle belongs to the cluster, it responds with "HELLO_ACK" and
changes state to CH. For subsequent HELLO and BYE messages, the RSU remains in the
CH state, although it temporarily changes from CH to SE to process each further
request. If the RSU does not find a request suitable to respond to for any reason,
the DROP_T event occurs and the RSU moves back to the CH state. After responding to
the pending requests with HELLO_ACK or BYE_ACK, the RSU returns to the CH state to
process other

Fig. 3 State diagram of vehicles

Fig. 4 State diagram of RSU

requests. If a BYE_ACK request or STOP is received from the last vehicle in a
cluster, the RSU changes its state from CH back to IN.
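The vehicle-side transitions of Fig. 3 can be sketched as a small transition table. This is an illustrative model, not part of the protocol specification; only the vehicle states are covered, and events undefined for a state are treated as leaving it unchanged.

```python
# (state, event) -> next state, following the vehicle state diagram in Fig. 3.
VEHICLE_TRANSITIONS = {
    ("IN", "INIT_T"): "SE",      # initial 30 s timer expires, start election
    ("SE", "HELLO"): "WR",       # HELLO sent, wait for a response
    ("WR", "HELLO_ACK"): "CM",   # RSU answered, vehicle is a cluster member
    ("WR", "WAIT_T"): "SE",      # no answer within 20 s, retry
    ("CM", "BYE_ACK"): "SE",     # left the cluster gracefully
    ("CM", "STOP"): "IM",        # trip finished, OBU powered off
    ("IM", "START"): "IN",       # OBU powered on again
}


def next_state(state: str, event: str) -> str:
    """Return the vehicle's next state; unknown events keep the state."""
    return VEHICLE_TRANSITIONS.get((state, event), state)
```

Driving the table with the join sequence INIT_T, HELLO, HELLO_ACK takes a vehicle from IN through SE and WR to CM, matching the narrative above.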
(1) Clustering procedures: Cluster formation starts as soon as the OBU gets ready.
The OBU-equipped vehicle is expected to prepare and send traffic-related
information to the RSU. For a detailed understanding of the approach, algorithms
and pseudocodes are given below.
Algorithms and pseudocodes: The step-by-step procedure of cluster formation is
captured with the pseudocode of the respective algorithms, giving detailed insight
into the idea proposed here.

3.3.1 Vehicle Side

1. The vehicle prepares and sends a HELLO message to the RSU with all relevant
information.
2. Start timer T for a period of 20 s.
3. Receive HELLO_ACK from the RSU and update the routing table information.
4. If timer T expires, repeat steps 1–3.

3.3.2 RSU Side

1. Receive the HELLO request and check the communication range of the vehicle.
2. If the vehicle is within the communication range of the RSU,
3. then add the vehicle's entry on the RSU,
4. prepare a HELLO_ACK, and send it back to the vehicle.
5. Otherwise, drop the HELLO message.

For sending HELLO message to RSU:
if V(i) starts OR signal strength gets weaker
    calculate and set values for L(i), V(i) and D(i);
    encode HELLO message with all the above data;
    for (j = 1 to size of RSU(i))
        send HELLO to R(j); start wait timer; set S(i) = WR;
    end for
end if
On receiving HELLO from vehicle:
if HELLO received in the queue of R(j) from V(i)
    set S(j) = SE; start timer for 20 secs;
    parse L(i), V(i), D(i); check range for vehicle V(i);
    if (vehicle in range)
        add vehicle V(i) in members(j); process HELLO_ACK to vehicle;
    else if (vehicle not in range OR timer expired)
        drop HELLO message;
    end if
end if

For sending HELLO_ACK message to vehicle:
if vehicle V(i) found in the range of R(j)
    add vehicle V(i) in members(j);
    prepare HELLO_ACK data; set R(j);
    fetch neighbor list for V(i); add list in HELLO_ACK;
    send HELLO_ACK to V(i);
    set S(j) = CH;
else
    drop HELLO message;
end if
On receiving HELLO_ACK from RSU:
if HELLO_ACK received from R(j)
    stop wait timer; set S(i) = CM;
    parse HELLO_ACK; set neighbors(i) of V(i);
    R(i) = R(j);
end if
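The RSU-side half of this exchange can be sketched as a single handler: check the range, register the member, and answer with the cluster-head identity and neighbor list. The class name, the explicit distance argument, and the flat member map are illustrative simplifications, not the paper's implementation.

```python
RSU_RANGE_M = 1000  # assumed RSU communication range from Sect. 3.1


class ClusterHead:
    """Minimal RSU-side HELLO handling, following the pseudocode above."""

    def __init__(self, rid: str):
        self.rid = rid
        self.members = {}  # vehicle id -> (location, speed, direction)

    def on_hello(self, vid, location, speed, direction, distance_m):
        """Return a HELLO_ACK dict if the vehicle is in range, else None
        (the HELLO is dropped)."""
        if distance_m > RSU_RANGE_M:
            return None                     # drop HELLO: vehicle out of range
        neighbors = list(self.members)      # vehicles already in this cluster
        self.members[vid] = (location, speed, direction)
        # HELLO_ACK carries the cluster-head id R(j) and the neighbor list
        return {"rsu": self.rid, "neighbors": neighbors}
```

A second vehicle joining the same RSU sees the first one in its HELLO_ACK neighbor list, which is exactly the neighbor map each vehicle maintains in Sect. 3.2.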

3.4 Routing

This section covers the procedure for sending a message from source to destination
by taking advantage of the proposed clustering approach. The complete network is now
divided into several clusters, each controlled by an RSU, and every vehicle has a
list of neighbors to which it can send messages directly. Two kinds of communication
are therefore supported: one in which the source and destination nodes lie in the
communication range of the same RSU, and one in which they belong to the
communication ranges of different RSUs.
(1) Intra-cluster routing: Intra-cluster routing covers the case where the source
and destination vehicles belong to the same cluster, so communication happens within
that cluster. The source node first checks its list of directly reachable neighbors.
If the destination node is in that list, the source forwards the message
[Si, NONE, M, Di] directly to the destination vehicle, where Si is the unique
identifier of the source vehicle, "NONE" indicates that the destination is directly
reachable from the source, Di is the unique identifier of the destination vehicle,
and M is the information to transmit.
If the destination is not in the neighbor list, the source node forwards the message
[Si, NONE, M, Di] to its RSU. The RSU then checks its routing table, finds the next
node toward the destination Di, and forwards the message to that node after adding
its RSU ID to the list of hops: [Si, Ri, M, Di]. Following the same mechanism, each
node forwards the message further, appending its identifier, until the message
reaches the destination node. The traversed list of nodes is saved by the
destination and can be used to back-trace the message if an immediate reply is
needed, instead of preparing a new route. The destination keeps this path record for
a certain time interval, after which the route data is removed.
(2) Inter-cluster routing: Inter-cluster routing specifies communication between
vehicles that belong to different clusters. In this case, the RSU does not find the
destination node in its routing table after receiving the message from the source
node. The RSU first checks its list of vehicles previously associated with it. If
the destination is not in that list either, the RSU broadcasts the message to all
directly reachable RSUs after adding its address to the message [Si, Ri, M, Di].
Each next RSU in turn checks its routing table and its list of previously associated
vehicles. If the destination was previously associated with that RSU, the
destination's new RSU can be tracked; otherwise, the message is broadcast to further
RSUs after the identity of the current RSU is added. If the destination vehicle is
not found after broadcasting up to two levels, the RSUs drop the message to avoid
further network overhead.
While a vehicle is changing cluster, the old RSU keeps the information about the new
RSU for up to 60 min, on the assumption that the vehicle will stay associated with
the new RSU for about an hour. This helps increase the message delivery percentage
while reducing network overhead. Thus, when processing a message, an RSU first
checks the vehicles in its own cluster and then the list of vehicles it served
earlier. If the destination vehicle belongs to that list, the RSU forwards the
message to the new RSU, which increases the probability of reaching the destination
near its newly elected RSU.
End-to-end routing algorithm and pseudocode: The complete end-to-end routing
algorithm, covering both intra-cluster and inter-cluster routing, is given below in
pseudocode form, with the implementation of the algorithm in mind.
On sending message (M) from source vehicle (Vs) to destination vehicle (Vd):

/***** Vehicle Side *****/
if message M is a valid message
    for x = 1 to size of neighbors(Vs)
        if neighbors(Vs)[x] equal to Vd
            extract path(Vd) from neighbors(Vs);
            set path(Vd) in MESSAGE;
            set Vs, Vd, M in MESSAGE;
            dispatch MESSAGE to send;
        else
            set R(Vs) as destination in MESSAGE;
            set Vs, M in MESSAGE;
            set Vs in path(Vd);
            dispatch MESSAGE to send;
        end if
    end for
end if

/***** RSU Side *****/
if R(j) receives a valid message from Vs
    parse MESSAGE and fetch Vs, Vd, path(Vd) and M;
    fetch broadcast counter from MESSAGE;
    for x = 1 to size of members(R(j))
        if members(R(j))[x] equal to Vd
            fetch path(Vd) from members(R(j))[x];
            add R(j) in path(Vd);
            set Vs, Vd, path(Vd), and M in MESSAGE;
            send MESSAGE to next node toward path(Vd);
        else if old_members(R(j))[x] equal to Vd
            get RSU[x] from old_members(R(j))[x];
            add R(j) in path(Vd);
            set Vs, Vd, path(Vd), and M in MESSAGE;
            send MESSAGE to RSU[x];
        else if number of broadcasts < 2
            fetch RSU(j);
            increment broadcast counter in MESSAGE;
            set Vs, Vd, path(Vd), and M in MESSAGE;
            for y = 1 to size of RSU(j)
                send MESSAGE to RSU(j)[y];
            end for
        else
            drop MESSAGE;
        end if
    end for
end if
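The RSU-side forwarding decision above reduces to four outcomes: deliver within the cluster, forward to a departed vehicle's new RSU, broadcast one more level, or drop. A sketch of that decision, using a flat dictionary per RSU as an illustrative simplification of the data structures in Sect. 3.2:

```python
def route_at_rsu(rsu: dict, dest: str, path: list, broadcasts: int = 0):
    """Return (action, target, updated_path) for a message arriving at an RSU.

    rsu: dict with keys 'id', 'members' (set of vehicle ids),
    'old_members' (vehicle id -> its new RSU id), 'neighbors' (RSU ids).
    broadcasts: how many broadcast levels the message has already used.
    """
    path = path + [rsu["id"]]                           # record this hop
    if dest in rsu["members"]:
        return ("deliver", dest, path)                  # intra-cluster case
    if dest in rsu["old_members"]:
        return ("forward", rsu["old_members"][dest], path)  # vehicle moved
    if broadcasts < 2:                                  # two-level flooding cap
        return ("broadcast", rsu["neighbors"], path)
    return ("drop", None, path)                         # avoid further overhead
```

The accumulated path is what the destination later uses to back-trace an immediate reply without computing a new route.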

3.5 Cluster Maintenance

In cluster maintenance, a vehicle changing cluster is the major challenge to
address. The cluster size remains essentially the same over time, as RSUs are at
fixed positions in the VANET; clusters are therefore almost static, and only the
cluster members change from time to time. Based on the details received from the
OBUs, the RSU uses a formula to check whether a vehicle is still in range. On the
vehicle side, the same formula is used to check whether a cluster change is
required.

Assumed Distance = Current distance ± (Direction × expected distance to be
traversed in the next 10 minutes)

Here, the direction is either −1 or +1 depending on the direction of movement: −1
when moving toward the RSU and +1 when moving away from it. The assumed distance
should remain within the communication range. When changing cluster, the vehicle
first asks for a new RSU and sends BYE to the current RSU after joining the new
cluster. The BYE message to the old RSU carries the information about the new RSU,
so the old RSU retains the new RSU information for a certain period. This is used
when sending any message to that vehicle, via the new RSU directly, without
broadcasting to other RSUs.
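The range check derived from the formula can be sketched as follows, with the direction convention of this section (−1 toward the RSU, +1 away) so the ± sign is absorbed into the multiplication; the range and horizon constants are the assumed values from Sects. 3.1 and 3.5.

```python
COMM_RANGE_M = 1000   # assumed RSU communication range (Sect. 3.1)
HORIZON_S = 600       # "next 10 minutes" from the formula above


def assumed_distance(current_m: float, direction: int,
                     speed_mps: float) -> float:
    """Assumed Distance = current ± (direction × expected distance in the
    next 10 min); direction is -1 toward the RSU, +1 away from it."""
    return current_m + direction * speed_mps * HORIZON_S


def cluster_change_needed(current_m: float, direction: int,
                          speed_mps: float) -> bool:
    """True if the vehicle is predicted to leave the RSU's range."""
    return assumed_distance(current_m, direction, speed_mps) > COMM_RANGE_M
```

For example, a vehicle 400 m away moving away from its RSU at 5 m/s has an assumed distance of 3400 m, so a cluster change is needed; moving toward the RSU, it stays in range.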
Algorithms and pseudocode: The step-by-step procedure of cluster transition is
captured with the pseudocode of the respective algorithms, giving insight into the
handoff technique used in this approach.
Vehicle Side
1. The vehicle sends a BYE message to the RSU with the relevant information.
2. Start timer T for a period of 20 s.
3. Receive the response from the RSU and clean up the cluster information.
4. If timer T expires, repeat steps 1–3.
RSU Side
1. Receive the BYE request and check the communication range of the vehicle.
2. If the vehicle is beyond the communication range of the RSU,
3. then remove the vehicle's entry from the RSU,
4. prepare a BYE_ACK, and send it to the vehicle.
5. Otherwise, drop the BYE message.

For sending BYE message to RSU:
if signal strength gets weaker for vehicle V(i)
    prepare BYE message; set R(i) in the message;
    start wait timer; set S(i) = WR;
    send BYE to RSU R(i);
end if
On receiving BYE from vehicle:
if BYE received on queue
    parse BYE message;
    set old_members(j); set S(j) = SE;
    start drop timer;
end if
For sending BYE_ACK message to vehicle:
if BYE processed successfully from V(i)
    prepare BYE_ACK data;
    stop drop timer; set S(j) = CH;
    send BYE_ACK to V(i);
end if
On receiving BYE_ACK from RSU:
if BYE_ACK received from R(j)
    stop wait timer;
    parse BYE_ACK;
    reset entry of R(j) from R(i);
end if

4 Performance Analysis

The performance of routing protocols for VANETs is measured in terms of message
overhead, message delivery time, and the probability of packet loss. As far as
message overhead and message delivery time are concerned, three cases are possible:
1. The destination is in the neighbor list: In this case, the source sends the
message directly to the destination, so one message is required and the message
delivery time is T (where T is the maximum message propagation delay).
2. The destination is not in the neighbor list but is in the same cluster: The
source forwards the request to its current RSU, which in turn forwards the message
to the destination. Thus, two messages are required and the message delivery time
is 2T.
3. The destination is neither in the neighbor list of the node nor in the member list
of the RSU: In this case, the following subcases are possible:
• The destination was previously associated with the RSU: The next RSU for the
destination is known to the current RSU; hence, the message will be forwarded to
120 P. K. Pandey et al.

the next RSU, which in turn will forward the message to the destination. Thus,
the message overhead will be three messages (Source → RSU, RSU → next
RSU, next RSU → destination) and the message delivery time will be 3 T.
• The destination was not previously associated with the RSU: The current RSU
will broadcast the message to all n neighboring RSUs. Each neighboring RSU
will check its member list, and the RSU that finds the destination in its member
list will forward the message to it. In this case, the number of messages required
will be n + 2 (Source → RSU, RSU → all n neighboring RSUs, matching
RSU → destination) and the message delivery time will be 3 T. However, if no
RSU has the destination as a member but one has information about its next
RSU, the message will be forwarded to that next RSU, which in turn will forward
the message to the destination. In this case, n + 3 messages will be required and
the message delivery time will be 4 T.
If none of these cases applies, the message will not be delivered. However,
the applications considered are such that the destination is highly likely to be
near the source; hence, it is very unlikely that the destination is not covered
even by the RSUs two hops away from the current RSU. Thus, the probability
of message loss is very low.
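The message counts and delivery times above can be collected into a small helper. This is a sketch, not part of the protocol: the case labels are ours, n is the number of neighboring RSUs, and delivery time is expressed in multiples of T.

```python
def routing_cost(case, n=0):
    """Return (messages_required, delivery_time_in_units_of_T) for the
    delivery cases analyzed above; n is the number of neighboring RSUs."""
    costs = {
        "neighbor_list":     (1, 1),      # source -> destination directly
        "same_cluster":      (2, 2),      # source -> RSU -> destination
        "next_rsu_known":    (3, 3),      # source -> RSU -> next RSU -> dest
        "broadcast_member":  (n + 2, 3),  # broadcast to n RSUs, one delivers
        "broadcast_forward": (n + 3, 4),  # broadcast, then one more RSU hop
    }
    if case not in costs:
        raise ValueError("message undeliverable in the cases considered")
    return costs[case]
```

For instance, with four neighboring RSUs the broadcast-with-forwarding case costs 7 messages and 4 T.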

5 Conclusion

In the present exposition, an effective and efficient clustering-based routing protocol
for urban VANETs has been presented. In order to divide a large network into several
clusters, infrastructure nodes such as RSUs have been used as cluster heads. This
gives the approach an advantage over other clustering approaches by supporting
clusters that are static in size and range, and it makes the proposed approach simple,
precise, and scalable. In order to reduce the network load of broadcast packets,
cached routes and controlled broadcasting are used; the approach broadcasts a
message up to two levels only, which avoids flooding. Given its assumptions, the
approach is recommended for well-connected areas such as urban areas. The static
performance analysis of the proposed approach demonstrates its scalability and
efficiency. The dynamic performance evaluation of IBRP and making it secure are
left as future work.

