
IITM Journal of Management and IT

Indexed in:
Google Scholar, EBSCO Discovery, indianjournals.com & CNKI Scholar
(China National Knowledge Infrastructure Scholar)

 Monitoring water quality by sensors in Wireless Sensor Networks-A Review


Rakesh Kumar Saini
 Multimode Summarized Text to Speech Conversion Application
Archit Sehgal , Gitika Khanna
 Aadhaar Smart Meter: A Real-Time Bill Generator
Ramneek Kalra, Vijay Rohilla
A Review on Optical Character Recognition
Archit Singhal, Bhoomika
 Prediction of Heart Attack Using Machine Learning
Akshit Bhardwaj, Ayush Kundra, Bhavya Gandhi, Sumit Kumar, Arvind Rehalia,
Manoj Gupta
 Detection and Prevention Schemes in Mobile Ad hoc Networks
Jeelani, Subodh Kumar Sharma, Pankaj Kumar Varshney
 A Review on Histogram of Oriented Gradient
Apurva Jain, Deepanshu Singh
 Breast Cancer Risk Prediction
Pankaj Kumar Varshney, Hemant Kumar , Jasleen Kaur, Ishika Gera
 CYBER: Threats in Social Networking Websites and Physical System Security
Tripti Lamba, Ashish Garg
 Comparative Analysis of Different Encryption Techniques in Mobile Ad-Hoc Networks (MANETs)
Apoorva Sharma, Gitika Kushwaha
 Fuchsia OS - A Threat to Android
Taranjeet Singh, Rishabh Bhardwaj
 Sentiment Analysis using Lexicon based Approach
Rebecca Williams, Nikita Jindal, Anurag Batra

INSTITUTE OF INFORMATION TECHNOLOGY AND MANAGEMENT
NAAC & NBA Accredited, Approved by AICTE, Ministry of HRD, Govt. of India
Category 'A' Institute, An ISO 9001:2015 Certified Institute
Affiliated to Guru Gobind Singh Indraprastha University, Delhi
D-29, Institutional Area, Janakpuri, New Delhi - 110058
Tel: 91-011-28525051, 28525882 Telefax: 28520239
E-mail: director@iitmipu.ac.in, journal@iitmipu.ac.in
Website: www.iitmjanakpuri.com, www.iitmipujournal.org

ISSN (Print): 0976-8629
ISSN (Online): 2349-9826
www.iitmipujournal.org

Indexed in:
Google Scholar, EBSCO Discovery, indianjournals.com & CNKI Scholar (China
National Knowledge Infrastructure Scholar)

IITM Journal of Management and IT


Volume 10 Issue 1 January - June 2019

CONTENTS

Research Papers & Articles

• Monitoring water quality by sensors in Wireless Sensor Networks-A Review 1


Rakesh Kumar Saini
• Multimode Summarized Text to Speech Conversion Application 6
Archit Sehgal, Gitika Khanna
• Aadhaar Smart Meter: A Real-Time Bill Generator 11
Ramneek Kalra, Vijay Rohilla
• A Review on Optical Character Recognition 15
Archit Singhal, Bhoomika
• Prediction of Heart Attack Using Machine Learning 20
Akshit Bhardwaj, Ayush Kundra, Bhavya Gandhi, Sumit Kumar,
Arvind Rehalia, Manoj Gupta
• Detection and Prevention Schemes in Mobile Ad hoc Networks 25
Jeelani, Subodh Kumar Sharma, Pankaj Kumar Varshney
• A Review on Histogram of Oriented Gradient 34
Apurva Jain, Deepanshu Singh
• Breast Cancer Risk Prediction 37
Pankaj Kumar Varshney, Hemant Kumar, Jasleen Kaur, Ishika Gera
• CYBER: Threats in Social Networking Websites and Physical System Security 46
Tripti Lamba, Ashish Garg
• Comparative Analysis of Different Encryption Techniques in Mobile Ad-Hoc
Networks (MANETs) 55
Apoorva Sharma, Gitika Kushwaha
• Fuchsia OS - A Threat to Android 65
Taranjeet Singh, Rishabh Bhardwaj
• Sentiment Analysis using Lexicon based Approach 68
Rebecca Williams, Nikita Jindal, Anurag Batra



Monitoring water quality by sensors in Wireless Sensor
Networks-A Review
Rakesh Kumar Saini
Department of Computer Science & Application, DIT University, Dehradun
Uttarakhand, India
rakeshcool2008@gmail.com

Abstract– Monitoring water quality is critical to human health; employing wireless sensor networks for such a task therefore requires a system that is robust, secure and has reliable communication. Water-borne diseases have become a major challenge to human health. Around 400 million cases of such diseases are reported annually, causing 6–12 million deaths world-wide. Access to safe drinking water is important as a health and development issue at national, regional and local level. The population in rural India mainly depends on ground water as a source of drinking water. The main problems in effectively deploying sensors are that, on one hand, there is a lack of standards for contamination testing in drinking water and, on the other hand, there are poor links between available sensor technologies and water quality regulations. This paper reviews the application of WSNs in environmental monitoring, with particular emphasis on water quality. Various WSN-based water quality monitoring methods suggested by other authors are studied and analyzed, taking into account their coverage, energy and security concerns.

Keywords-- Water quality monitoring, Remote, Wireless Sensor Network

I. INTRODUCTION

Recent advances in MEMS-based sensor technology, low-power analog and digital electronics, and low-power RF design have enabled the development of relatively inexpensive and low-power wireless micro sensors that are capable of detecting ambient conditions such as temperature and sound. Sensors are generally equipped with data processing and communication capabilities [1][2]. The sensing circuitry measures parameters from the environment surrounding the sensor and transforms them into an electric signal. Processing such a signal reveals some properties about objects located and/or events happening in the vicinity of the sensor. Each sensor has an onboard radio that can be used to send the collected data to interested parties. Such technological development has encouraged practitioners to envision aggregating the limited capabilities of individual sensors into a large-scale network that can operate unattended. Numerous civil and military applications can be leveraged by networked sensors. A network of sensors can be employed to gather meteorological variables such as temperature and pressure. One of the advantages of wireless sensor networks (WSNs) is their ability to operate unattended in harsh environments in which contemporary human-in-the-loop monitoring schemes are risky, inefficient and sometimes infeasible. Therefore, sensors are expected to be deployed randomly in the area of interest by relatively uncontrolled means, e.g. dropped by a helicopter, and to collectively form a network in an ad-hoc manner. Given the vast area to be covered, the short life span of the battery-operated sensors and the possibility of having damaged nodes during deployment, large populations of sensors are expected in most WSN applications.

In most parts of the world, ground water is the only and most important supply for the production of drinking water, particularly in areas where water supply is limited. Groundwater quality will directly affect human health [3]. A sensor is an electronic device that detects and responds to an external stimulus from the physical environment. The external stimulus can be temperature, heat, moisture or pressure in the environment. The output of a sensor is generally a signal which can be converted into human-readable form. Sensors may be classified as analog sensors and digital sensors. An analog sensor senses external parameters (wind speed, solar radiation, light intensity etc.) and gives an analog voltage as output [4]. A digital sensor is an electronic or electrochemical sensor where data is digitally converted and transmitted.



A base station is responsible for capturing and providing access to all measurement data from the nodes, and can sometimes provide gateway services to allow the data to be managed remotely. The importance of maintaining good water quality highlights the increasing need for advanced technologies to help monitor and manage water quality. Sensors in wireless sensor networks offer a promising infrastructure for municipal water quality monitoring and surveillance. In this paper we suggest some parameters for drinking water quality and propose a model and methods for water quality monitoring [5][6].

II. SENSORS

Sensors are electronic devices that can be used for monitoring physical and environmental conditions such as temperature, vibration and sound. Sensors are defined as sophisticated devices which aid in detecting and responding to electrical or optical signals. A sensor converts physical parameters like temperature, blood pressure, humidity or speed into a signal that can be measured electrically. Sensors can be classified based on the following criteria and conditions:

a. Primary input quantity
b. Transduction principle
c. Material and technology
d. Property applications

Figure 1 represents another categorization of sensors.

Figure 1. Classification of Sensors

III. CHALLENGES OF MONITORING WATER QUALITY

Access to safe drinking water is important as a health and development issue at national, regional and local level. The population in rural India mainly depends on ground water as a source of drinking water. As India is a developing country with wide-spread emerging technologies, there is a need for a system that provides timely help and monitors water pollution across the total state of the water system [7][8]. Monitoring of water quality is very important for good health. Wireless sensor networks can help control the quality of water using various methods, but there are several challenges in monitoring water quality:

(a) Sensor costs and specifications
(b) Low energy of sensors
(c) Security
(d) Connectivity
(e) Sensor location

(a) Sensor costs and specifications

Site-specific installation cost projections need to be developed. Typical cost components in the total cost of a sensor installation at each potential location include:

1. Land purchase
2. Construction of the vault in which the sensors and connections to the distribution piping will be located
3. Installation of the sensors and RTU
4. Supplying power to the site
5. Installing the communications equipment and upgrading/installing equipment at the central control room
6. Design and bidding of the construction and installation work
7. Access to the site

(b) Low energy of sensors

In wireless sensor networks, energy is the scarcest resource of sensor nodes and it determines their lifetime. Sensor nodes are battery powered, and these small batteries have limited power and may not be easily rechargeable or removable. Long communication distances between sensors and a sink can greatly drain the energy of sensors and reduce the lifetime of a network.



In wireless sensor networks, the energy of sensors is therefore a major issue to be considered [9].

(c) Security

An underwater wireless sensor network is a novel type of underwater networked system. Due to the characteristics of underwater wireless sensor networks and the underwater channel, such networks are vulnerable to malicious attacks. The existing security solutions proposed for WSNs cannot be used directly in underwater wireless sensor networks. Moreover, most of these solutions are layer-wise. In Figure 2, all sensors sense data under water and send these data to the base station. Different types of sensors can be used under water for monitoring water quality [10].

Figure 2. Underwater Wireless Sensor Network

(d) Connectivity

The most fundamental problem in an underwater sensor network is network connectivity. The connectivity problem reflects how well a sensor network is tracked or monitored by its sensors. Underwater wireless sensor networks are an emerging field with challenges in every area, such as the deployment of nodes, routing and the floating movement of sensors. For the connectivity of nodes a DFS (depth-first search) algorithm can be used, and for coverage a distributed coverage algorithm [11][12].

(e) Sensor location

Sensor nodes are addressed by means of their locations. The distance between neighboring nodes can be estimated on the basis of incoming signal strengths. Relative coordinates of neighboring nodes can be obtained by exchanging such information between neighbors. To save energy, some location-based schemes demand that nodes go to sleep if there is no activity. More energy savings can be obtained by having as many sleeping nodes in the network as possible.

IV. QUALITY RANGE OF SUGGESTED PARAMETERS FOR DRINKING WATER

Parameters examined by the US Environmental Protection Agency show that both chemical and biological waste have an adverse effect on many water monitoring parameters such as pH, ORP, turbidity, nitrates, dissolved oxygen, water temperature, fluoride, chlorine and oxygen. In order to detect water contamination or impurities, it is enough to determine the changes in the suggested parameters (Table 1). If there is any deviation when compared to the drinking water standards recommended by the WHO or the Central Pollution Control Board, India, then the water is not safe for drinking [13].

Table 1: Quality range of suggested parameters

Sr. | Parameter         | Units    | Quality range | Measured cost
1   | pH                | pH       | 6.5-9.0       | Low
2   | ORP               | mV       | 650-800       | Low
3   | Turbidity         | NTU      | 0-5           | Medium
4   | Nitrates          | mg/L     | <10           | High
5   | Dissolved Oxygen  | mg/L     | 0.2-2         | Medium
6   | Water temperature | °C       | <10           | Low
7   | Fluoride          | mg/litre | <0.05–0.2     | Medium
8   | Chlorine          | mg/litre | 0.2–1         | Low
9   | Oxygen            | mg/l     | 1-2           | High
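As a concrete illustration of how the Table 1 ranges could be applied in software, the following is a minimal Python sketch (not part of the original paper); the parameter keys and sample readings are hypothetical assumptions.

```python
# Minimal sketch (assumed, not from the paper): flag sensor readings that fall
# outside the quality ranges suggested in Table 1. Ranges are (low, high) pairs;
# the sample reading values below are hypothetical.

QUALITY_RANGES = {
    "pH":               (6.5, 9.0),    # pH units
    "ORP":              (650, 800),    # mV
    "turbidity":        (0, 5),        # NTU
    "nitrates":         (0, 10),       # mg/L, "<10" treated as 0-10
    "dissolved_oxygen": (0.2, 2),      # mg/L
    "water_temp":       (0, 10),       # deg C, "<10" treated as 0-10
    "fluoride":         (0.05, 0.2),   # mg/litre
    "chlorine":         (0.2, 1),      # mg/litre
    "oxygen":           (1, 2),        # mg/l
}

def check_sample(sample):
    """Return the list of parameters whose readings deviate from Table 1."""
    violations = []
    for parameter, value in sample.items():
        low, high = QUALITY_RANGES[parameter]
        if not (low <= value <= high):
            violations.append((parameter, value))
    return violations

reading = {"pH": 9.4, "turbidity": 2.1, "chlorine": 0.6}   # hypothetical sample
bad = check_sample(reading)
if bad:
    print("Water not safe for drinking, deviations:", bad)
```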



V. PROPOSED MODEL FOR WATER QUALITY

Figure 3. Proposed model for monitoring the quality of drinking water

The proposed model (Figure 3) for monitoring the quality of drinking water is very effective for monitoring ground water. In this model, all sensors sense and collect data, which is then stored in a local controller. Once the local controller receives the data, it is transferred to the cloud for analysis. Cloud storage works as a mediator between the data transmission layer and the database management layer. After analysis in the cloud, the data is transferred to the end-user. Domestic water supplied by the Municipal Corporation or taken directly from ground water is mainly used for drinking and cooking purposes.

Traditional water supply management involves storing the pool of water at various locations and distributing it through water head tanks and domestic pipelines.
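The sensing → local controller → cloud → end-user flow described above could be sketched as follows; this is an illustrative assumption, not the paper's implementation, and the endpoint URL, payload fields and `read_sensors()` stub are hypothetical.

```python
# Minimal sketch (assumptions, not the paper's implementation) of the proposed
# data flow: sensors -> local controller -> cloud -> end-user. The endpoint URL
# and payload fields are hypothetical; read_sensors() stands in for real drivers.
import json
import time
import requests

CLOUD_ENDPOINT = "https://example.org/api/water-quality"   # hypothetical

def read_sensors():
    """Placeholder for reading the attached probes (pH, turbidity, ...)."""
    return {"pH": 7.2, "turbidity": 1.4, "chlorine": 0.5}   # dummy values

def local_controller_loop(interval_s=60):
    buffer = []                                   # local storage on the controller
    while True:
        sample = read_sensors()
        sample["timestamp"] = time.time()
        buffer.append(sample)
        try:
            # Forward buffered readings to the cloud layer for analysis.
            requests.post(CLOUD_ENDPOINT, data=json.dumps(buffer),
                          headers={"Content-Type": "application/json"}, timeout=10)
            buffer.clear()                        # sent successfully
        except requests.RequestException:
            pass                                  # keep data locally, retry next cycle
        time.sleep(interval_s)
```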
VI. CONCLUSIONS

The main objective of this paper is to show the role of sensors in improving water quality in order to obtain a hygienic environment. In this paper we study the challenges of monitoring the quality of drinking water and the parameters required to monitor it. According to the study, drinking water obtained from both groundwater and surface water must satisfy the standards for safe drinking water. This paper gives a clear view of what a sensor is, the parameters that identify water quality, and the stages needed to create an online water quality management system.

REFERENCES

[1] Central Ground Water Board, Ministry of Water Resources, River Development and Ganga Rejuvenation, Government of India. http://cgwb.gov.in/.
[2] U.S. Environmental Protection Agency, "Drinking water standards and health advisories", Tech. Rep. EPA 822-S-12-001, 2012.
[3] Water Resource Information System of India. http://www.india-wris.nrsc.gov.in/wrpinfo/index.php?title=River_Water_Quality_Monitoring.
[4] Szabo, J. and Hall, J., Detection of contamination in drinking water using fluorescence and light absorption based online sensors, EPA/600/R-12/672 (2012), http://www.epa.gov/ord.
[5] Homeland Security Presidential Directive/HSPD-9, United States of America.
[6] Hart, D. B., Klise, K. A., McKenna, S. A. and Wilson, M. P., CANARY User's Manual, Version 4.2, EPA-600-R-08e040 (2009), U.S. Environmental Protection Agency, Office of Research and Development, National Homeland Security Research Centre, Cincinnati, OH.
[7] Laine, J., Huovinen, E., Virtanen, M. J. et al., An extensive gastroenteritis outbreak after drinking-water contamination by sewage effluent, Finland, Epidemiol. Infect. (2010) 139(7), pp. 1105–1113.
[8] Szabo, J. and Hall, J., Detection of contamination in drinking water using fluorescence and light absorption based online sensors, EPA/600/R-12/672 (2012), http://www.epa.gov/ord.
[9] Bergamaschi, B., Downing, B., Pellerin, B. and Saraceno, J. F., In Situ Sensors for Dissolved Organic Matter Fluorescence: Bringing the Lab to the Field, USGS Optical Hydrology Group, CA Water Science Centre.



[10] Hart, D. B., Klise, K. A., McKenna, S. A. and Wilson, M. P., CANARY User's Manual, Version 4.2, EPA-600-R-08e040 (2009), U.S. Environmental Protection Agency, Office of Research and Development, National Homeland Security Research Centre, Cincinnati, OH.
[11] Hart, D. B., Klise, K. A., McKenna, S. A. and Wilson, M. P., CANARY User's Manual, Version 4.2, EPA-600-R-08e040 (2009), U.S. Environmental Protection Agency, Office of Research and Development, National Homeland Security Research Centre, Cincinnati, OH.
[12] Homeland Security Presidential Directive/HSPD-9, United States of America.
[13] Storey, M. V., Van der Gaag, B. and Burns, B. P., Advances in online drinking water quality monitoring and early warning systems, Water Research 45 (2011), pp. 741–74.



Multimode Summarized Text to Speech Conversion
Application
Archit Sehgal, Gitika Khanna
Department of Computer Science, HMR Institute of Technology & Management
Hamidpur, Delhi-110036, India
Archit_150@yahoo.co.in, gitikakhanna392@gmail.com
Abstract - This paper draws focus towards summarizing the tremendous amount of data collected from various sources and presenting the output as speech. In recent years, huge data sets are being generated every moment and it becomes difficult to manage them. In order to extract relevant information, an innovative, efficient, real-time and cost-beneficial technique is required that enables users to hear the summarized content instead of reading it. This kind of application is beneficial for visually impaired people and people with disabilities. The TextRank algorithm, a ranking-based approach, is proposed with a variation in the similarity function to build a summary based on the scores computed for each sentence. The summarized text is then spoken out using a text-to-speech synthesizer (TTS).

Keywords - TextRank, PageRank, Lexemes, Image Segmentation, Character Recognition, Text-to-Speech (TTS).

I. INTRODUCTION

In our proposed work of collecting data from different sources and converting it into summarized text, we develop a cost-efficient and user-friendly interface. The input to the application can be an image, audio or video. While converting the input into editable text, various techniques are used such as image processing, image segmentation [1] and edge detection. The approach is directed towards format conversion, where audio, video or image data is converted into symbolic representations that fully describe the content. In the case of an image, segmented characters are obtained from preprocessing of the image. They are then provided as input to Optical Character Recognition (OCR) to obtain the converted text. In order to manage the enormous amount of information, the derived text is summarized using a graph-based technique, the TextRank algorithm.

The TextRank algorithm is applied in the construction of a meaningful summary by selecting useful paraphrases from the available text. The summarized text is then transformed into speech using a Text-to-Speech Synthesizer (TTS). The whole approach is categorized into three phases: text extraction from the input, formation of the summary, and conversion of the summary into speech.

II. LITERATURE REVIEW

Mrunmayee Patil [1]: This paper describes an OCR system to recognize characters from an image. Edge detection and image segmentation play a significant role in the extraction of text from an image. The algorithm which can be used to summarize the extracted text works similarly to the PageRank algorithm discussed in the paper for web search engines [10]. Modifications can be made to make the TextRank algorithm more effective. Sunchit Sehgal [5]: This paper presents a way to make the algorithm more efficient by taking the score of the title into account. Marcia A. Bush [19] shows the efforts put into the research on recognition of documents and their prediction models. This has enabled us to analyze the signal-based processes taking vocabulary, font and sentence formation sequence into account.

III. OVERVIEW OF IMAGE ANALYSIS

Over the decades, many researchers have been looking for possible ways of retrieving data from images and video content. In one research paper, a framework was proposed that decomposes a scanned image into its constituent visual patterns, and the parsed results are converted into a semantically meaningful text report. A model was also introduced where users send an image of their meter's display screen along with the kilowatt information [2]. The information is then processed to convert it into text.



Image analysis [3] is the extraction of information from images using different image processing approaches such as image filtering, image compression, image editing and manipulation, image preprocessing, image segmentation, feature extraction and object recognition. An image can be considered as a matrix of square pixels arranged in rows and columns; it can also be treated as a linear sequence of characters.

Fig 1. Conversion of image to text using Optical Character Recognition

Fig 1 depicts the various blocks responsible for detecting text in an image. Once the image is scanned, it is preprocessed to remove any noise and is further divided into segments. Every segment has its own unique features which must be extracted and classified into specific groups.

Edge detection and image segmentation are important aspects of image analysis. Edge detection differentiates regions of an image by identifying changes in gray scale and texture. Image segmentation is another technique which divides and decomposes the image for further processing. It categorizes pixels with similar gray scale values and organizes them into higher-level units so that the objects become more meaningful. The proposed system works in various phases. The input image undergoes pre-processing, such as removing noise induced by the thresholding technique and improving the quality. In the next step the image undergoes image segmentation to separate the non-text parts. Further, feature extraction is performed to extract preliminary features, which are compared with those stored in the database. Sometimes there are errors in which characters are blurred or broken; these are handled in the post-processing stage.
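The image-to-text step described above can be prototyped with an off-the-shelf OCR engine. The following minimal Python sketch is an illustration, not the authors' implementation: it assumes the Tesseract engine and the pytesseract and Pillow packages are installed, and the file name is a placeholder.

```python
# Minimal sketch (illustrative, not the paper's code): grayscale + binarize an
# input image, then hand it to the Tesseract OCR engine via pytesseract.
from PIL import Image
import pytesseract

def image_to_text(path):
    img = Image.open(path).convert("L")                    # pre-processing: gray level image
    binary = img.point(lambda p: 255 if p > 128 else 0)    # simple binarization
    return pytesseract.image_to_string(binary)             # character recognition

if __name__ == "__main__":
    print(image_to_text("scanned_page.png"))               # hypothetical input file
```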
IV. AUDIO ANALYSIS

Conversion of real-time speech to text requires special techniques, as it must be quick and precise to be recognizable. In an automatic speech recognition system, the size of the vocabulary affects the performance of the system. During the initial process, the system learns about patterns, the different speech sounds which embody the vocabulary of the application. If there is any unknown pattern, it is identified using the cluster of references. The whole approach can be categorized into phases such as analysis, feature extraction, modeling and testing. The analysis phase is used to extract information about speaker identity using the vocal tract, behavioral features and the excitation source. Every speech signal has different characteristics, which are fetched in the feature extraction phase in order to deal with the speech signals.
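As one possible way to prototype the audio-to-text step (the paper itself builds the application with Apache Cordova), the sketch below uses the Python SpeechRecognition package; the WAV file name is a placeholder.

```python
# Minimal sketch (one possible approach, not the paper's Cordova implementation):
# turning a recorded audio clip into text with the SpeechRecognition package.
# An internet connection is needed for the Google Web Speech API backend used here.
import speech_recognition as sr

def audio_to_text(wav_path):
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)          # read the whole clip
    try:
        return recognizer.recognize_google(audio)  # speech-to-text
    except sr.UnknownValueError:
        return ""                                  # speech was unintelligible

if __name__ == "__main__":
    print(audio_to_text("recorded_input.wav"))     # hypothetical recording
```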



V. TEXTRANK ALGORITHM

With the tremendous growth in the amount of text data, there is a need to summarize it effectively to be useful. Automatic text summarization [4] is a very demanding and non-trivial task. Methods have been proposed which use word and phrase frequency to extract salient sentences from the text. Overall, there are two different approaches to text summarization: extraction and abstraction. Extraction works by selecting sentences from the original text, whereas abstraction aims at modifying the original text using advanced natural language techniques in order to generate a new, brief summary. However, extractive summarization yields better results than abstractive summarization because abstraction faces issues such as semantic representation and natural language generation. Here, we focus on the graph-based TextRank algorithm to perform extractive summarization. TextRank [5] is an unsupervised machine learning algorithm and is an extension of the PageRank algorithm.

VI. PROPOSED APPROACH

In our proposed approach to building this application, input can be taken in different modes such as editable text, image, audio and also video. After the input is taken, the TextRank algorithm is used to convert it into summarized text. The summarized text is taken as output and converted into speech. Input processing for the different modes has been discussed above. The major concern is summarizing the content efficiently and accurately. Below we discuss improving the graph-based technique, the TextRank algorithm, for accurate summarization.

For any summarizer, an intermediate representation is built to express the main aspects of the text. Two types of representation are commonly used, topic representation and indicator representation. A score is accredited to each sentence depending on the type of representation approach: in topic representation, the score depends on how well the sentence describes the topic, whereas in the case of indicator representation a variety of machine learning techniques can be used to aggregate the results. In the final step, a methodology should be used which selects the combination of sentences that maximizes importance and minimizes redundancy.

In our proposed method, the TextRank algorithm is used to find the similarity between the sentences. This method describes the document as a connected graph where sentences represent the vertices and an edge indicates how similar two sentences are. It is based on the frequency of occurrence of words, so no language-specific processing is required.

Consider an undirected graph, say G = (V, E), where V is the set of vertices and E is the set of edges. For a given vertex Vi, let In(Vi) represent the set of vertices pointing towards it and Out(Vi) the set of vertices it points to. The score of each vertex can be calculated using the formula:

    PR(Vi) = (1 - d) + d * Σ_{Vj ∈ In(Vi)} PR(Vj) / |Out(Vj)|                         (1)

In order to build a connected graph, an edge is added between two vertices to represent the similarity between them. The similarity depends on the words common to the two sentences and can be calculated using the similarity function. Let Si and Sj be two sentences, each represented by the Ni words that form it:

    Similarity(Si, Sj) = |{Wk : Wk ∈ Si and Wk ∈ Sj}| / (log(|Si|) + log(|Sj|))        (2)

However, the title can also play an important role by adding distinguishing information that elaborates the meaning of the text. The similarity function can therefore be improvised by also computing the correlation between each individual sentence and the title of the article. The modified similarity between two sentences, and between each individual sentence and the title, are given by:

    Similarity(Si, Sj) = |{Wk : Wk ∈ Si and Wk ∈ Sj}| / ((|Si| + |Sj|) / 2)

    Similarity_title(Si, S_title) = |{Wk : Wk ∈ Si and Wk ∈ S_title}| / ((|Si| + |S_title|) / 2)

Therefore, the cumulative score for any sentence Si is given by:

    Score(Si) = Similarity_title(Si, S_title) + Σ_{j=1}^{n} Similarity(Si, Sj) - Similarity(Si, Si)
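To make the scoring above concrete, here is a minimal Python sketch written against the formulas above (an illustration, not the authors' code): it builds the sentence-similarity graph with the word-overlap measure, iterates the PR update of Eq. (1), and adds the title-similarity term. The tokenization and sample text are simplified assumptions.

```python
# Minimal sketch of the TextRank-style scoring described above (illustrative,
# not the paper's implementation). Sentences are vertices; edge weights use the
# word-overlap similarity, and scores are iterated with the PR update, d = 0.85.

def words(sentence):
    return set(sentence.lower().split())           # naive tokenization (assumption)

def similarity(s1, s2):
    common = len(words(s1) & words(s2))
    return common / (((len(words(s1)) + len(words(s2))) / 2) or 1)

def textrank(sentences, title, d=0.85, iterations=30):
    n = len(sentences)
    weights = [[similarity(sentences[i], sentences[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    scores = [1.0] * n
    for _ in range(iterations):                    # PR(Vi) = (1-d) + d * sum(...)
        new = []
        for i in range(n):
            total = 0.0
            for j in range(n):
                out_j = sum(weights[j]) or 1.0     # weighted out-degree of Vj
                total += weights[j][i] * scores[j] / out_j
            new.append((1 - d) + d * total)
        scores = new
    # add the title-similarity term from the modified scoring
    return [s + similarity(sent, title) for s, sent in zip(scores, sentences)]

title = "Water quality monitoring"                  # hypothetical example
sentences = [
    "Water quality monitoring protects human health.",
    "Sensors report pH and turbidity readings.",
    "Monitoring water quality needs reliable sensors.",
]
ranked = sorted(zip(textrank(sentences, title), sentences), reverse=True)
print(ranked[0][1])                                 # highest-scoring sentence
```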



VII. IMPLEMENTATION AND EVALUATION

We have built this application using Apache Cordova, and it is compatible with both Android and iOS. We divided the whole approach into three modules:

- Module 1: Uploading of an image and processing of the input text using the OCR approach. The text can also be typed directly in the text field, or taken in the form of audio, for which we used a button to record voice and then processed it using audio analysis.

- Module 2: In this module, text summarization takes place once an event is fired. Sentences are ranked and the best sentences are picked to make up a summary, which is shown in the output window.

- Module 3: The text in the output window is converted into speech when a button is clicked.

We have evaluated our application by taking 3 sample articles and evaluating the summaries using ROUGE. ROUGE [6] is the most widely used method to evaluate a summary automatically by correlating it with human summaries. There are various variations of ROUGE such as ROUGE-n, ROUGE-L and ROUGE-SU. In ROUGE-n, a series of n-grams is elicited from the human summaries used as reference and from the candidate summary. ROUGE-L uses the longest common subsequence (LCS) approach, i.e. the longer the LCS, the greater the similarity. The metric ROUGE-SU makes use of bi-grams as well as uni-grams.

The results of our evaluation are shown in the table below:

Rouge type | Task name | Average recall | Average precision | Average F-score | Referenced summaries
ROUGE-1    | Sample 1  | 1.0            | 0.29664           | 0.45732         | 1
ROUGE-1    | Sample 2  | 1.0            | 0.09125           | 0.16841         | 1
ROUGE-1    | Sample 3  | 1.0            | 0.33504           | 0.50192         | 1

Table 1. Results of summary evaluation using the ROUGE 2.0 Evaluation Toolkit
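To clarify how the recall, precision and F-score columns in Table 1 arise, the sketch below computes ROUGE-1 from unigram overlap. It is a simplified toy illustration, not the ROUGE 2.0 toolkit used in the paper, and the candidate/reference sentences are hypothetical.

```python
# Simplified illustration of ROUGE-1 (unigram overlap), to clarify the
# recall/precision/F-score columns in Table 1. Not the ROUGE 2.0 toolkit itself.
from collections import Counter

def rouge_1(candidate, reference):
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())           # clipped unigram matches
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f_score = (2 * recall * precision / (recall + precision)) if overlap else 0.0
    return recall, precision, f_score

# Hypothetical example: a generated summary vs. a human reference summary.
candidate = "sensors monitor water quality in wireless sensor networks"
reference = "wireless sensors monitor water quality"
print(rouge_1(candidate, reference))
```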
VIII. TEXT TO SPEECH SYNTHESIS

Text-to-speech synthesis [7] is the self-regulating conversion of a text into speech by transcribing the text into a phonetic representation and then generating the speech waveform. A text-to-speech system consists of a front-end and a back-end. The front-end performs two major operations, text normalization and assigning a phonetic transcription to each word (text-to-phoneme). The back-end generates the speech waveform. The engine is divided into modules such as the Natural Language Processing (NLP) module, the Digital Signal Processing (DSP) module, text analysis and the application of pronunciation rules. This can be developed using the Java programming language.

There are various techniques to perform speech synthesis, such as concatenative synthesis, articulatory synthesis, formant synthesis, domain-specific synthesis, unit selection synthesis, diphone synthesis, HMM-based synthesis etc. Concatenative synthesis involves concatenation of short samples of recorded speech. Articulatory synthesis makes use of articulatory parameters like the human vocal tract to generate speech. Formant synthesis is clear at high speeds; it is a rule-based synthesis which generates speech using acoustic rules. Domain-specific synthesis uses a simple approach of concatenating pre-recorded words and phrases to complete a sentence. Unit selection synthesis makes use of segmented recordings stored in a database to create speech.

Fig 2. Overview of conversion of text to speech

Fig 2 shows how speech is generated from text. Natural Language Processing analyzes and synthesizes natural language and speech.

IX. CONCLUSION & FUTURE SCOPE

The paper proposes an approach to generate an optimized summary taking input from various modes such as image, audio and editable text. We also discussed different summarization techniques, such as abstraction-based and extraction-based. In order to generate the summary we proposed a modification to a graph-based algorithm, the TextRank algorithm: besides the entire paragraph, the score of the paragraph title is also taken into account. Three sample articles were evaluated using the ROUGE evaluation toolkit and the results are depicted in Table 1. However, there is scope for video analysis: since the paper discusses taking input from multiple modes, video can be among them. Also, improvements can be made to make the summarization algorithm more efficient and accurate. This will in turn ensure that the generated summary retains its logical meaning.

REFERENCES

[1] Mrunmayee Patil, Ramesh Kagalkar, "A Review on Conversion of Image to Text As Well As Speech Using Speech Detection and Image Segmentation", International Journal of Science and Research.
[2] S. Shahnawaz Ahmed, Shah Muhammed Abid Hussain and Md. Sayeed Salam, "A Novel Substitute for the Meter Readers in a Resource Constrained Electricity Utility", IEEE Trans. on Smart Grid, vol. 4, no. 3, Sept. 2013.
[3] K. Kalaivani, R. Praveena, V. Anjalipriya, R. Srimeena, "Real Time Implementation of Image Recognition and Text to Speech Conversion", International Journal of Advanced Engineering Research and Technology, vol. 2, Sept. 2014.



[4] Mehdi Allahyari, Seyedamin Pouriyeh, Mehdi Assefi, Saeid Safaei, Elizabeth D. Trippe, Juan B. Gutierrez, Krys Kochut, "Text Summarization Techniques: A Brief Survey", 28 Jul. 2017.
[5] Sunchit Sehgal, Badal Kumar, Maheshwar Sharma, Lakshay Rampal, Ankit Chaliya, "A Modification to Graph Based Approach for Extraction Based Automatic Text Summarization", Institute of Electrical and Electronics Engineers.
[6] Chirantana Mallick, Ajit Kumar Das, Madhurima Dutta, Asit Kumar Das, Apurba Sarkar, "Graph-based Text Summarization Using Modified TextRank", Aug. 2018.
[7] Itunuoluwa Isewon, Jelili Oyelade, Olufunke Oladipupo, "Design and Implementation of Text to Speech Conversion for Visually Impaired People", International Journal of Applied Information Systems, vol. 7 no. 2, Apr. 2014.
[8] Kaladharan N, "An English Text to Speech Conversion", International Journal of Advanced Research in Computer Science and Software Engineering, vol. 5, Oct. 2015.
[9] R. Aida-Zade, C. Aril, A. M. Sharifova, "The Main Principles of Text-to-Speech Synthesis System", International Journal of Computer and Information Engineering, vol. 7 no. 3, 2013.
[10] S. Brin and L. Page, "The anatomy of a large-scale hypertextual Web search engine", Computer Networks and ISDN Systems, 30(1-7), 1998.
[11] Horacio Saggion and Thierry Poibeau, "Automatic text summarization: Past, present and future", in Multi-source, Multilingual Information Extraction and Summarization, Springer, 2013, 3–21.
[12] Dou Shen, Jian-Tao Sun, Hua Li, Qiang Yang, and Zheng Chen, "Document Summarization Using Conditional Random Fields", in IJCAI, Vol. 7, 2007, 2862–2867.
[13] Sérgio Soares, Bruno Martins, and Pavel Calado, "Extracting biographical sentences from textual documents", in Proceedings of the 15th Portuguese Conference on Artificial Intelligence (EPIA 2011), Lisbon, Portugal, 2011, 718–30.
[14] Karen Spärck Jones, "Automatic summarizing: The state of the art", Information Processing & Management 43, 6 (2007), 1449–1481.
[15] Josef Steinberger, Massimo Poesio, Mijail A. Kabadjov, and Karel Ježek, "Two uses of anaphora resolution in summarization", Information Processing & Management 43, 6 (2007), 1663–1680.
[16] Mark Steyvers and Tom Griffiths, "Probabilistic topic models", Handbook of Latent Semantic Analysis 427, 7 (2007), 424–440.
[17] Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum, "Yago: a core of semantic knowledge", in Proceedings of the 16th International Conference on World Wide Web, ACM, 2007, 697–706.
[18] Simone Teufel and Marc Moens, "Summarizing scientific articles: experiments with relevance and rhetorical status", Computational Linguistics 28, 4 (2002), 409–445.
[19] Marcia A. Bush, "Speech and Text-Image Processing in Documents", Technical Report P92-000150, Xerox PARC, Palo Alto, CA, November 1992.



Aadhaar Smart Meter: A Real-Time Bill Generator
Ramneek Kalra, Vijay Rohilla
Computer Science Department, IEEE Member, HMRITM, New Delhi, India
Assistant Professor, EEE Department, HMRITM, New Delhi, India
kalraramneek@ieee.org, vijay1402rohilla@gmail.com

Abstract- In the present scenario, humans are on the verge of rapidly changing technology that helps one explore new things, become familiar with them in a short time and produce much more output. These days, the Indian Government is working to move money into digital banks and wallets such as the BHIM app, UPI payments etc.

Alongside this, smart meter technology is emerging in every street of Indian societies and colonies. This change is initially somewhat unfamiliar, since citizens face problems in getting accurate details into their hands. So there arises a need to develop an innovative way to make government services completely transparent to the public and to remove the long queues for submitting electricity bills.

To address this kind of problem, we introduce a new way to interact with such governmental services: the "Aadhaar Smart Meter: A Real-Time Bill Generator" (ASM). This way of connecting government services brings the Aadhaar server into the picture and provides customer connectivity with the Smart Meter via the Electricity Department's server. The whole scenario is discussed in the following sections.

Keywords: Database, Smart Meter, Aadhaar Database, Cloud, Gateway.

I. INTRODUCTION

A Smart Meter is a digital meter that analyzes the power factor, units used and power consumed, displays them on an LCD screen, and connects both to the Electricity Department's server and to the customer database stored in the cloud, keyed by customer-provided details such as Aadhaar_Number, Customer_Name, Customer_Address, Customer_Region etc. Nowadays, however, the government has deployed smart meters that merely display the power factor and units consumed, with little information beyond the days since installation. Because of this limited transparency between the government's database and the customer's needs, we propose the idea of the "Aadhaar Smart Meter", which is a live example of the details discussed above. The idea is built purely upon database connectivity and an SMS service provided to the customer for the particular smart meter they own. The complete exploration of ASM is discussed in the coming methodology. For now, let us discuss the scenario which the Indian Government is trying to establish around these needs.

All governmental services are now tending to connect with the Aadhaar database (the largest biometric database in the world). However, due to security concerns, the public is not ready to share their personal Aadhaar credentials with the government, because many fear the details could leak into the wrong hands.

We address this issue by connecting the server's database with a simple mobile number: if the Aadhaar database is provided, the mobile number and the respective details can be retrieved easily and quickly, irrespective of the huge database it is connected to.

II. PROPOSED IDEA

Before digging into the actual methodology, let us look at the basic architecture of ASM, which can be used to manipulate the required electrical factors, calculate the bill month-wise and reset the count to zero.

A smart meter provides readings for real-time sensing, power outage notification and power quality monitoring. Due to the availability of these extra features, it differs from simple Automatic Meter Reading to a great extent.



With that, the following symbolic block diagram clarifies the important parts of ASM and how it works and functions.

Fig 1: Symbolic Block Diagram of Smart Meter

Fig 1 highlights the important parts that play a crucial role in making the smart meter reliable and beneficial for a home connection. It consists of the following components:

- Data Reader: This is the initial point where the functioning of a simple meter reading starts. It helps to store and fetch data such as power outages and power units consumed, which it passes to the LCD Controller. The home connection is also connected wirelessly to the Electricity Department server, which helps to fetch the electricity account number, stored as Elect_Acc_No (say), and the corresponding customer account details, which are indirectly fetched from the connected Aadhaar database.

- LCD Controller: This is the second stage, consisting of an LCD screen connected to the Data Reader, which presents data such as power factor and units consumed to the outside world. At the same time, the details are updated in real time in the database maintained under the Electricity Department server.

- Automatic SMS Sender Block: This block sends the data accumulated over 15 days, together with the calculated provisional bill amount shown in the database, to the user's registered mobile number, so that for the next 15 days electricity can be consumed consciously to ensure a green environment; this facility also helps to alert the user about late bill submission. After one month, the user gets an online payment link over the SMS service, which saves the customer from standing in a long queue at the electricity bill payment office.

- The real innovation comes when one month expires and it is time to pay the bill using the payment portal via the SMS service, PayTm or the different electricity department portals. There is also the question of how to generate the manual bill that is officially sent by the electricity department. This can be resolved using the customer's Android device, or, if the customer does not have one, he or she can request government officials to come and generate the bill using their official Android device.

- Nowadays the trend of using Android devices is so popular that almost every citizen is active online and consumes a tremendous amount of data, so this trend can be leveraged for the government's service.

- The basic idea can be used to generate the manual bill, clear the reading shown on a particular customer's meter and help print that bill at the same time.

- From Fig 2 one can understand the flow of information that takes place when the meter LCD is scanned to generate the required bill from the meter itself. The functioning of each component is as follows:

Fig 2: Depiction of Flow of Information



- Android Application: There will be an Android application that scans the meter's LCD screen and fetches the information from the electricity department server. It initializes a value in the database indicating that this particular meter, for a particular customer_id corresponding to the Customer_Name taken from the Aadhaar database, has its scanning status set to true. After the status is set to true, the units consumed and the rupees due are calculated inside the electricity department server.

- Mini Printer: A new feature in the Smart Meter is a mini printer attached inside the meter, used to print the required bill. The bill is requested by the Android app from the electricity department server, the corresponding response is sent back by the server to the mini printer, and the bill that is officially issued by the server is then printed automatically. In addition, the data computed with the server is temporarily stored in the Android app's database, which ensures that if any transaction fails in between, the changes are rolled back.

- Aadhaar Database: In this scenario we use the Aadhaar database because, after scanning, a customer who wants to pay the bill amount immediately can send a request from the Android app to the server; the amount can then be deducted using the connected bank server, which is linked to the customer's Aadhaar record and indirectly to the Electricity Department server. After the transaction is done, the receipt of the payment status can be printed on the mini printer.

Moreover, the applications of this proposed idea can be implemented easily under government services, which will definitely reduce manual paperwork and the exchange of cash.
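For illustration, the logic of the Automatic SMS Sender Block described above might look like the following sketch; the flat tariff, customer record and payment link are hypothetical assumptions, not part of the paper.

```python
# Minimal sketch (hypothetical, not the paper's implementation) of the
# "Automatic SMS Sender Block": every 15 days, turn the accumulated units into
# a provisional bill and format an SMS for the registered mobile number.

TARIFF_PER_UNIT = 6.5          # assumed flat rate in rupees per unit
BILL_CYCLE_DAYS = 15

def provisional_bill(units_consumed):
    return round(units_consumed * TARIFF_PER_UNIT, 2)

def build_sms(customer_name, units_consumed, pay_link=None):
    amount = provisional_bill(units_consumed)
    text = (f"Dear {customer_name}, your meter recorded {units_consumed} units "
            f"in the last {BILL_CYCLE_DAYS} days. Provisional amount: Rs {amount}.")
    if pay_link:                                # sent only at the end of the month
        text += f" Pay online: {pay_link}"
    return text

# Hypothetical customer record fetched via the Electricity Department server.
print(build_sms("A. Kumar", 142))
print(build_sms("A. Kumar", 290, pay_link="https://example.org/pay/ELEC12345"))
```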



III. CONCLUSIONS

If implemented by governmental services, this research idea can move the nation's current use of manual electricity bill payment to automatic payment, thus eliminating long queues. Moreover, the advantages that this proposed idea gives to the individual Indian citizen are as follows:

- Initiation of paperless and cashless technology, which will promote the Digital India movement initiated by the Indian Government.
- Elimination of wrong data transmitted on a citizen's electricity bill.
- Increased transparency in the government's functioning and processing.

IV. FUTURE WORK

This paper presents one of the possible applications, but there is still a large number of applications which can further be taken up for study and analysis. As this paper only covers the architectural point of view of the Smart Meter, the same proposed idea can be studied for further enhancements and applications as follows:

- It can be used for a large-scale study in industrial applications.
- It can be implemented by deploying the idea on a real-world machine.
- Security measures can be studied and applied thoroughly to this design.

REFERENCES

[1] W. Luan, J. Peng, M. Maras, J. Lo and B. Harapnuk, "Smart Meter Data Analytics for Distribution Network Connectivity Verification," IEEE Transactions on Smart Grid, vol. 6, no. 4, pp. 1964-1971, July 2015.
[2] G. R. Barai, S. Krishnan and B. Venkatesh, "Smart metering and functionalities of smart meters in smart grid: a review," 2015 IEEE Electrical Power and Energy Conference (EPEC), London, ON, 2015, pp. 138-145.
[3] V. G. Vilas, A. Pujara, S. M. Bakre and V. Muralidhara, "Implementation of metering practices in smart grid," 2015 International Conference on Smart Technologies and Management for Computing, Communication, Controls, Energy and Materials (ICSTM), Chennai, 2015, pp. 484-487.
[4] R. Rashedi and H. Feroze, "Optimization of process security in smart meter reading," 2013 Smart Grid Conference (SGC), Tehran, 2013, pp. 150-152.
[5] S. Elakshumi and A. Ponraj, "A server based load analysis of smart meter systems," 2017 International Conference on Nextgen Electronic Technologies: Silicon to Software (ICNETS2), Chennai, 2017, pp. 141-144.
[6] J. Russell, "Smart metering: Working towards mass roll-out," IET Conference on Power in Unity: a Whole System Approach, London, 2013, pp. 1-7.
[8] A. M. Barua and P. K. Goswami, "Smart metering deployment scenarios in India and implementation using RF mesh network," 2017 IEEE International Conference on Smart Grid and Smart Cities (ICSGSC), Singapore, 2017, pp. 243-247.
[9] P. Bansal and A. Singh, "Smart metering in smart grid framework: A review," 2016 Fourth International Conference on Parallel, Distributed and Grid Computing (PDGC), Waknaghat, 2016, pp. 174-176.



A Review on Optical Character Recognition
Archit Singhal, Bhoomika
Department of Computer Science, The NorthCap University,
Gurugram, Haryana 122017
singhal97.archit@gmail.com, wbhoomikaw@gmail.com

Abstract--- Nowadays, using a keyboard for entering data is the most common way, but it sometimes becomes time consuming and needs a lot of energy. So a technique was invented named Optical Character Recognition, abbreviated as OCR, that transfigures printed as well as handwritten text into machine-encoded text by electronic means. OCR has been a topic of research for more than half a century. It electronically and mechanically converts scanned images of handwritten, typewritten or printed text. In general, to figure out the characters on a page, OCR compares each scanned letter pixel by pixel to a known database of fonts and decides on the closest match.

Index Terms - optical character recognition, processed, pixel, scanned document, machine encoded text.

I. INTRODUCTION

Optical Character Recognition is a simple way of digitizing text into machine-encoded form that can be searched through and processed by a machine. It is among the greatest topics of research in the fields of Artificial Intelligence, Pattern Recognition, Machine Vision and Signal Processing. Character recognition techniques associate a symbolic identity with the image of a character. OCR extracts the significant information and enters it directly into the database instead of using the accustomed methods of manual data insertion.

This technique was first introduced for two main reasons: expanding telegraphy and helping blind people to get an education. Emanuel Goldberg and Edmund Fournier d'Albe were the first to work on this technique, in 1914. They built a machine that first scans the characters and later converts them into standard telegraph code, and another device named the Optophone that produced a specific tone for specific letters or characters. These machines were patented in 1931 and were later acquired by IBM.

OCR in general is classified into two types: off-line and on-line. Off-line recognition is used for the automated conversion of text into letter codes that are usable by computers and applications developed for text processing. It is more difficult, as different people have different handwriting styles. On-line recognition, on the other hand, deals with a continuous input data stream that comes from a transducer while the user types or writes.

Fig. 1. Types of Optical Character Recognition

II. LITERATURE REVIEW

Research paper statement: A technique named Optical Character Recognition, abbreviated as OCR, which is still in its development stage, has proven to be very beneficial for transfiguring any kind of handwritten material into digitized form.

This paper reviews the work done by various authors in the field of Optical Character Recognition. Prior studies have identified the various steps involved, from pre-processing the image to giving the final digitized output. The paper also depicts various fields where this technology has been efficiently implemented. But as it is still in its development stage, it also faces a few challenges in giving the best required output. Integrating the concepts and theories provided in the paper into various other fields, with more advanced development, will show much better results, surpassing 99%. Additionally, the material covered in the paper can be applied to benefit the community through a variety of tangible services.

III. PHASES OF OCR

The whole procedure of transfiguring handwritten as well as printed text into machine-encoded text is broadly divided into four simple phases:

Fig. 2. Phases of Optical Character Recognition

A. Pre-Processing

In this phase, the image is scanned from top to bottom and converted into a gray-level image, which is then converted into a digital binary image. This process is sometimes termed digitization of the image, or binarization. Various scanners are used for this phase, and the resulting digital image then goes to the next step.

B. Character Extraction

The pre-processed image of the previous step serves as the input of this step, in which each single character of the image is recognized. The image is also converted from the normalized form to the window size in this step.

C. Segmentation

This is the most important step of the whole process, as it removes most of the noise from the images to obtain a more understandable form. It segments the characters into various zones, i.e. upper, middle and lower zones. Segmenting is difficult in off-line recognition because of variability in paragraphs, words of a line and the characters' shapes (slant, skew, curve, etc.). Sometimes this difficulty also arises due to overlapping of one or more characters.

D. Post-Processing

In this phase, the features of every character are enhanced and extracted, and every character can be classified in a unique way. If some unrecognized characters are found, they are also given some meaning. Extra templates can also be added in this phase to provide a wide range of compatibility checking against the system's database.
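The binarization described under Pre-Processing above can be illustrated with a short sketch; this is an illustrative example, not from the paper, it assumes OpenCV is installed, and the file names are placeholders.

```python
# Minimal sketch (illustrative, not from the paper) of the pre-processing phase:
# convert a scanned page to a gray-level image and then to a binary image using
# Otsu's threshold, which picks the threshold value automatically.
import cv2

def binarize(path, out_path="binary.png"):
    image = cv2.imread(path)                            # scanned input image
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)      # gray-level image
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    cv2.imwrite(out_path, binary)                       # digitization/binarization
    return binary

if __name__ == "__main__":
    binarize("scanned_page.png")                        # hypothetical file name
```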



IV. APPLICATIONS OF OCR

Optical Character Recognition turns scanned documents into more than an image file; rather, it turns them into readable as well as searchable text files that can be processed by computers.

OCR is a field with enormous application in a number of industries such as legal, healthcare, banking, education, etc.

A. Banking

In banks, cheques are processed using OCR without any kind of human involvement. The cheque inserted into the device is searched and scanned for the writing in its various fields, and the amount is transferred to the corresponding account. This whole process reduces the overall cheque processing time.

B. Healthcare

The use of OCR technology has also increased in the healthcare industry to process paperwork. The healthcare industry deals with a huge number of forms such as patient details, medical history, insurance forms, etc., so this technology is used in order to reduce effort and time.

C. Legal Industry

Documents are scanned, and information is extracted and automatically entered into a database to save space. The time-consuming task of searching for information through boxes of documents is also eliminated. This helps in locating any specific text or document easily. It has also helped the legal industry to have easy, fast and readily available access to a huge library of documents.



D. Invoice Imaging

It is important to maintain a track of financial records to avoid piles of backlogged payments. Among other processes, OCR helps in simplifying the collection and analysis of large sets of data. It is also used to decrypt the large amount of information stored in digital codes like bar and QR codes.

E. Other Fields

OCR is extensively used in many other areas as well, such as:

CAPTCHA - to prevent hacking;
Digital Libraries - sharing of digital teaching material;
Optical Music Recognition - to extract information from images;
Automatic Number Recognition - to identify vehicle registration plates;
Handwriting Recognition; Education;
Maestro Recognition Server; Trapeze.

V. CHALLENGES OF OCR

The techniques of OCR require high-resolution images with basic structural properties differentiating text and background in order to get high accuracy in character recognition. Image generation plays an important role in determining the accuracy and success of recognition. Images generated by scanners give high performance and accuracy, while images generated by cameras have numerous errors due to the surroundings and camera-related factors. These errors are clarified as follows.

A. Tilting

The image of a document obtained by a scanner is parallel and in line to the plane of the sensor, which is not the case for an image taken by a hand-held device. The text nearer to the camera appears a little larger while the distant text appears smaller, which causes perspective distortion resulting in tilted pictures. A perspective-intolerant recognizer then suffers a lower recognition rate.

B. Scene Complexity

The image taken by a portable device generally contains various artificial objects such as buildings, symbols, cars etc. of a regular environment. The objects detected make text recognition in the processed image very challenging, as the appearance and structure of these objects is comparable to the text present around them. Text itself may be present in any form to encourage decipherability, making the task of segregating text from non-text very intricate.

C. Conditions of uneven lighting

A major challenge for OCR is degradation of text quality due to uneven lighting and shadows when images are taken in a natural environment. This results in poor detection and recognition of text. This case of shadows and uneven lighting differentiates images taken by a camera from those taken by scanners. The lack of proper lighting makes scanned images preferable to images captured by camera because of their better text and character quality. These lighting problems can be mitigated by using the camera flash, which in turn leads to some new challenges.

D. Skewness

In OCR, the point of view for the input image might change when the image is taken by a camera or any hand-held device, which is not an issue for scanner input. As a result, the change of point of view leads to skewing, which gives considerably poorer results when the image is processed. To overcome this problem, many deskew techniques are available, such as the RAST algorithm, Fourier transform methods, projection profile methods etc.

E. Aspect ratio

The image of a document obtained by a scanner is parallel and in line to the plane of the sensor, which is not the case for an image taken by a hand-held device. The text nearer to the camera appears a little larger while the distant text appears smaller, which causes perspective distortion resulting in tilted pictures and a lower recognition rate and accuracy. The newest cell phones can easily recognize when the portable device is tilted and can then prohibit users from capturing images. This detection and prohibition is done with the help of orientation sensors, which also allow the camera to align the text in the plane of the form, resulting in a greater degree of evenness.
Volume 10, Issue 1 ∙January-June 2019 17


F. Warping
One character overlapping another is a further challenge for precise OCR. This situation arises when images are scanned using flatbed scanners, which capture a picture of the twisted text. As a remedy, a technique called dewarping was introduced by Ulges et al., which treats these text lines as if they were equally spaced and parallel to each other.

G. Multilingual Environments
Languages such as Japanese, Korean and Chinese contain a very large number of symbols and character classes, far more than the Latin alphabet. Arabic has characters whose written shape changes with position, and Hindi is written in syllables formed by combining different shapes. Multilingual input therefore becomes a primary problem for OCR.

H. Fonts
Using different styles and fonts for characters can make them overlap with each other, making OCR difficult. It is hard to perform precisely accurate recognition because of the many within-class variations and the resulting pattern sub-spaces.

VI. SCOPE OF OCR
Nowadays a diverse collection of OCR systems is available, but many problems remain. OCR systems were earlier categorized into two groups. The first group consists of machines specially designed to recognize a specific set of problems; these are mostly hard-wired, so they are somewhat expensive and have lower throughput rates. The second group comprises software-based techniques that require only a computer and a low-cost scanner. Due to recent technological advances, this second group of OCR systems is much more cost effective with high throughput; however, these systems still have some limitations regarding speed and the set of characters they can read. They read the data line by line and transfer it to the OCR software. OCR systems are now categorized into five different groups based on character sets, namely fixed-font, multi-font, omni-font, constrained handwriting and scripts.

Imperfections and irregularities in OCR systems are mainly due to problems that occur during the scanning phase, which usually result in inappropriate text or characters. These irregularities often lead to misinterpretation between text and graphics or between text and noise. Even a perfectly scanned character can cause errors when different characters share the same shapes and features, which makes it difficult for the system to recognize the character exactly. From this we can conclude that the precision of OCR depends entirely on the quality of the input it receives.

We have seen a great deal of improvement and advancement in OCR in recent years, from reading only a limited set of characters, to reading characters in different fonts and styles, and further to reading handwritten text. Given the pace of technological advancement, one can predict that OCR will have even greater potential and recognition capability in the coming years.

VII. CONCLUSION
This paper describes a field of Artificial Intelligence, Optical Character Recognition: its types, its overall process and its applications in different areas. Optical Character Recognition has turned scanned documents into more than an image file, making them fully searchable, readable and editable text files that can be processed by computers. Research in this area has been going on for more than half a century and the results have been striking, with successful recognition rates surpassing 99% and notable advances achieved for cursive handwritten character recognition. Further research in this area aims at still more improvement and wider scope.




Prediction of Heart Attack Using Machine Learning
Akshit Bhardwaj, Ayush Kundra, Bhavya Gandhi, Sumit Kumar, Arvind Rehalia, Manoj Gupta
Department of Instrumentation & Control Engineering, Bharati Vidyapeeth's College of Engineering, Delhi-110063

Abstract- Cardiovascular diseases are one of the biggest causes of death of millions of people around the world, second only to cancer. A heart attack occurs when a blood clot blocks the blood flow to a part of the heart; if the clot cuts off the blood flow entirely, that part of the heart muscle begins to die. Going by the statistics, heart problems can gradually start between the ages of 40 and 50 for people with an unhealthy diet and bad lifestyle choices, so an early prognosis can make a huge difference in their lives by motivating them towards a healthy and active life. By changing lifestyle and diet this risk can be controlled. This project intends to pinpoint the most relevant risk factors of heart disease as well as predict the overall risk using machine learning. The machine learning model predicts the likelihood of a patient getting heart disease, trained on a dataset of other individuals. As the result, the probability of getting heart disease based on current lifestyle and diet is calculated. The model was trained on the Framingham Heart Study dataset.

Keywords:- Heart Disease, Machine Learning, Logistic Regression, Cross-validation

I. INTRODUCTION
Machine learning is one of the most rapidly evolving fields of AI and is used in many areas of life, primarily in healthcare. It has great value in the healthcare field, since it is an intelligent tool for analysing data and the medical field is rich with data. In the past few years enormous amounts of data have been collected and stored because of the digital revolution. Monitoring and other data-collection devices are available in modern hospitals and are used every day, and abundant amounts of data are gathered. It is very hard, or even impossible, for humans to derive useful information from these massive amounts of data, which is why machine learning is so widely used to analyse the data and diagnose problems in the healthcare field.

A simplified explanation of what the machine learning algorithm does is that it learns from previously diagnosed cases of patients. A heart problem must be diagnosed quickly, efficiently and correctly in order to save lives. For this reason researchers are interested in predicting the risk of heart disease, and they have created different heart-risk prediction systems using various machine learning techniques. The presence of missing and outlier data in the training set often hampers the performance of a model and leads to inaccurate predictions, so it is critical to treat missing and outlier values before making a prediction.

1. Missing values: for a continuous variable, we can find the missing values using the isnull() function; the mean of the data can also help identify them. We can also write an algorithm to predict the missing values (a small sketch of this treatment follows the list below).

2. Outliers: we can use a scatter plot to identify them and, as needed, delete the data or perform transformation, binning, imputation or any other method. K-fold cross-validation will be used to evaluate the data for the diagnosis of heart disease, making the result more accurate: 80% of the patients' data will be used for training and 20% for testing. Parameter tuning is also necessary if the accuracy is not close to 80%. Logistic regression is the suitable regression analysis to perform when the dependent variable y is either 0 or 1. Like all regression models, logistic regression is a type of predictive analysis; it is used to explain the relationship between the dependent variable y and various nominal, ordinal, interval or ratio-level independent variables (the array of x features).


3. The features should have higher odds of explaining the variance in the dataset, thus giving improved model accuracy. The dependent (target) variable should be binary/dichotomous in nature.

4. There should be no missing values or outliers in the data, which can be assessed by converting the continuous values to standardized scores.

5. There should not be high correlation among the predictors. This can be checked with a correlation matrix of the predictors. The regression analysis is the task of estimating the log of the odds ratio of an event.

6. Statistical tools easily allow us to perform the analysis for better results. Adding independent variables to a logistic regression model will always increase the amount of explained variance, which can reduce the accuracy.
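A minimal pandas sketch of the missing-value and outlier treatment described in points 1-6 above. The file name and column names (e.g. 'totChol', 'age') are illustrative Framingham-style fields, not the authors' exact code.

```python
import pandas as pd

df = pd.read_csv("framingham.csv")            # assumed local copy of the dataset

# 1. Missing values: locate them with isnull(), then impute with the column mean.
print(df.isnull().sum())
df = df.fillna(df.mean(numeric_only=True))

# 2. Outliers: inspect with a scatter plot, then clip extreme values (one simple option).
df.plot.scatter(x="age", y="totChol")
low, high = df["totChol"].quantile([0.01, 0.99])
df["totChol"] = df["totChol"].clip(low, high)

# 5. Check for highly correlated predictors with a correlation matrix.
print(df.corr(numeric_only=True))
```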
II. LITERATURE SURVEY
In the past few years a lot of projects related to heart disease risk prediction have been developed. Work carried out by various researchers in the field of medical diagnosis using machine learning is discussed in this section of the paper.

Das et al. [1] worked on deep learning technology to find odds ratios or prediction values from various analytical models, and with K-nearest neighbours obtained 89.00% classification accuracy on the Cleveland heart study dataset.

Anbarasi et al. [2] used three binary classifiers, Naive Bayes, K-means clustering and Random Forest, for heart attack risk prediction using 13 features, and then applied feature engineering for algorithm tuning, obtaining good prediction results. They discovered that Random Forest outperforms the other two classifiers with an accuracy of 99.2% for binary classification; the accuracy of K-means was 88.3% and Naive Bayes was about 96.5%.

Zhang et al. [3] suggested an effective heart attack prediction model using the Support Vector Machine (SVM) algorithm. Principal Component Analysis was applied to retrieve the important features, and different kernel functions were tried; the highest accuracy was found with the Radial Basis Function. To get the optimum parameter values, grid search over the SVM parameters was used, and the maximum classification accuracy reached about 88.64%.

Vadicherla and Sonawane [4] proposed a minimal optimization technique of SVM for coronary heart disease prediction. This technique helps in training the SVM by looking for the optimal values during the training period. It showed that the minimal optimization technique provides good results even on a big dataset, and execution time was also reduced significantly.

Elshazly et al. [5] presented a genetic-algorithm SVM classifier for biomedical diagnosis in which 18 features were reduced to 6 via dimensionality reduction. Different kernel functions were used and performance was compared in terms of measures such as accuracy, precision, recall, area under the curve and F1 score. The results showed that the linear SVM classifier managed 83.10% accuracy with an 82.60% true positive rate, 84.90% AUC and 82.70% F1 score.

III. PROPOSED SYSTEM
The machine learning technique used for the prediction of heart attack is logistic regression. The dataset used for analysis and training is taken from the Framingham Heart Study, a long-term, ongoing cardiovascular cohort study of residents of the city of Framingham, Massachusetts. The study began in 1948 with 5,209 adult subjects from Framingham and is now on its third generation of participants. The dataset can be found at www.framinghamheartstudy.org.

This research intends to pinpoint the relevant risk factors of heart disease as well as predict the overall risk using logistic regression. Mathematically, logistic regression uses a sigmoid function: its outputs are categorical, unlike linear regression outputs, which are continuous. The logistic (sigmoid) function maps any real value into the interval between 0 and 1:

S(y) = 1 / (1 + e^(-y))

or, for the probability, p = 1 / (1 + e^(-(β0 + β1x))). Considering y as a linear function in a regression analysis, y = β0 + β1x, and substituting y into the sigmoid S(y) gives, after solving, the logit form:

logit(p) = log(p / (1 - p)) = β0 + β1·Sexmale + β2·age + β3·cigsPerYear + β4·totChol + β5·BP + β6·heartRate + β7·BMI

Here β0 is the regression constant, p/(1 - p) is the odds ratio of the event, and βk is the coefficient of predictor xk, where k = 1, 2, ...
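A minimal scikit-learn sketch of fitting the logit model written above. The CSV path and feature names mirror the equation but are assumptions about the dataset layout, not the authors' exact code.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("framingham.csv").dropna()
features = ["male", "age", "cigsPerDay", "totChol", "sysBP", "heartRate", "BMI"]
X, y = df[features], df["TenYearCHD"]          # y is the binary CHD outcome

model = LogisticRegression(max_iter=1000).fit(X, y)

# The fitted intercept is beta_0 and coef_ holds beta_1 ... beta_7 of logit(p).
print("beta_0:", model.intercept_[0])
print(dict(zip(features, model.coef_[0])))

# Predicted probability of CHD for each individual (compare Table 5.2 later on).
p = model.predict_proba(X)[:, 1]
```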
Fig 3.1: Block Diagram of the Model

K-fold cross-validation will be used to evaluate the data for the diagnosis of heart disease, making the result more accurate: 80% of the patients' data will be used for training and 20% for testing. This gives a better accuracy score and takes the least amount of execution time.
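A minimal sketch of the 80/20 split and K-fold cross-validation described above, assuming X and y have been prepared as in the earlier sketch; this is an illustration, not the authors' code.

```python
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000)
scores = cross_val_score(model, X_train, y_train, cv=10)   # K-fold CV on the training 80%
print("mean CV accuracy:", scores.mean())

model.fit(X_train, y_train)
print("held-out test accuracy:", model.score(X_test, y_test))
```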
IV. IMPLEMENTATION
a. Feature Engineering
Feature engineering aids in extracting information from the current data. The information is extracted in the form of new features, which may have a higher chance of explaining the variance in the data, thus giving better model accuracy.

b. Feature Transformation
There are various cases where feature transformation is required (a small sketch follows this list):
• Changing a variable from its original scale to a scale between 0 and 1; this is known as normalization.
• Some algorithms work well with a normal distribution, so we have to remove the skewness of the variables. Techniques such as square-root or log transformation can remove skewness.
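A minimal sketch of the two transformations listed above: min-max normalization to the 0-1 range and a log transform to reduce skewness. It assumes the X_train/X_test split and the dataframe from the earlier sketches; the skewed column name is illustrative.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)     # fit on training data only
X_test_scaled = scaler.transform(X_test)

# Right-skewed variables such as cigarette count can be compressed with log1p.
df["cigsPerDay_log"] = np.log1p(df["cigsPerDay"])
```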
c. Feature Selection
Feature selection is the process of looking for the best attributes, those that best define the relationship of the independent variables with the target (dependent) variable. The data is split into x and y training and testing datasets using cross-validation.

d. Algorithm Tuning
The aim of parameter tuning is to find the best value for each parameter so as to improve the accuracy of the ML model. To tune them, we must have good knowledge of their impact on the output. We can repeat this process for other algorithms.

e. Results and Analysis
The machine learning model and the implementation of a heart disease risk predictor for patients at risk of future heart disease using a logistic regression algorithm were successful. Accuracy was calculated as the ratio of the total number of correct predictions to the total number of predicted outputs, written as (TP + TN) / (TP + TN + FP + FN), where TP = true positives, TN = true negatives, FN = false negatives and FP = false positives.
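A minimal sketch of the accuracy computation above written out from the confusion matrix; the fitted model and test split from the earlier sketches are assumed.

```python
from sklearn.metrics import confusion_matrix

y_pred = model.predict(X_test)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
print("accuracy:", accuracy)
```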
The cost function f(x) is the sum of the squares of the differences between the actual and predicted values, and the iterations are the number of times the update is executed to obtain the lowest value of f(x). Here the global minimum was found using gradient descent.

Fig 5.1: Cost function
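A minimal sketch of gradient descent on the squared-error cost f described above (the formulation used in the text; log-loss is the more common choice for logistic regression). X is assumed to be an (n, d) feature matrix with a leading column of ones and y holds 0/1 labels.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, lr=0.01, iterations=5000):
    beta = np.zeros(X.shape[1])
    cost = 0.0
    for _ in range(iterations):                    # iterations = times the update runs
        p = sigmoid(X @ beta)                      # predicted probabilities
        error = p - y
        cost = np.sum(error ** 2)                  # f(x): sum of squared differences
        grad = X.T @ (2 * error * p * (1 - p))     # d(cost)/d(beta) via the chain rule
        beta -= lr * grad
    return beta, cost
```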


We also used a Receiver Operating Characteristic (ROC) curve, which tells us how well the model can differentiate between the positive and negative classes. If the area under the ROC curve is more than 0.8, or the curve bends strongly towards the top left, the model will give better results and its predictions will be more precise.
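A minimal sketch of the ROC/AUC check described above, again assuming the fitted model and test split from the earlier sketches.

```python
from sklearn.metrics import roc_auc_score, roc_curve

probs = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, probs)
fpr, tpr, thresholds = roc_curve(y_test, probs)
print("AUC:", auc)   # values above roughly 0.8 indicate good separation of the classes
```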
Accuracy during testing was 87%, as calculated by this expression for the model on the predicted and test values.

Details of five individuals and their corresponding probabilities are given below. The threshold is 0.2, that is, a predicted risk probability greater than 0.2 means there is a chance of a heart attack in the future.

Table 5.1: Medical data of five individuals

In Table 5.1 we can see the parameters used in the dataset for the analysis. Several parameters are considered, including controllable and non-controllable factors: age, sex (male or female), cigarette count per month, total cholesterol, systolic and diastolic blood pressure, body mass index and heart rate of the individual.

Table 5.2: Probabilities of the individuals

In Table 5.2 we can see the result of the system. The probability of an individual getting a heart attack is calculated based on the parameters discussed in Table 5.1; the percentage chance of NOT getting a heart attack is shown in the first column, followed by the percentage chance of getting a heart attack in the second column.

V. CONCLUSION
Men seem to be more susceptible to heart disease than women: every 1 in 4 men is likely to have heart disease [8], whereas for women the figure is every 1 in 5 [9]. Increases in age, number of cigarettes and systolic blood pressure also show increasing odds of having heart disease. Interestingly, total cholesterol shows no significant change in the odds of CHD; this could be due to the presence of 'good cholesterol' in the total cholesterol reading. Early prognosis of cardiovascular disease would aid in making better decisions on lifestyle changes for high-risk patients and in turn reduce future heart problems.

VI. FUTURE SCOPE OF THE PROJECT
At some point in the future, the machine learning model will make use of a much larger training dataset, possibly more than a million data points maintained in an electronic health record system. Although this would be a huge leap in terms of computational power and software sophistication, a system based on artificial intelligence might allow the medical practitioner to decide the best-suited treatment for the concerned patient as soon as possible [6]. A software API could be developed to give health websites and apps access free of cost for patients, with the probability prediction performed with virtually no processing delay [7].

REFERENCES
[1] Das, Turkoglu, and Sengur, "Efficient diagnosis of heart disease via machine learning models", Expert Systems with Applications, 2009.
[2] Vanisree and Jyothi, "Decision Support model for Heart Disease prognosis based on early signs of 8-51 patients using binary classification", International Journal of Computer Applications, 2011.
[3] Y. Zhang, "Studies on application of Support Vector Machines in coronary heart disease prediction model", Electromagnetic Field Problems and Applications, Sixth International Conference (ICEF), IEEE, 2012.
[4] Vadicherla and Sonawane, "Decision Support for coronary Heart Disease analysis Based on Minimal Optimization technique", International Journal of Engineering Sciences and Emerging Technologies, 2013.
[5] H. Elshazly, Hassanien and Elkorany, "Lymph diseases prediction based on support vector machine algorithm", Computer Engineering & Systems 9th International Conference (ICCES), 2014.
[6] Bhupender Kumar & Yogesh Paul, "Medical Applications of Machine Learning Algorithms", UIET, Kurukshetra University, 2016.
[7] Ram Avatar & Vineet Kumar, "Deep Learning in healthcare", UIET, Kurukshetra University, 2018.
[8] Xu, JQ, Murphy, SL, Kochanek, KD, Bastian, BA, "Deaths: Final data for 2013", National Vital Statistics Report, 2016.
[9] CDC, "Million Hearts: strategies to reduce the prevalence of leading cardiovascular disease risk factors, United States, 2011", MMWR, 2011.


Detection and Prevention Schemes in Mobile Ad hoc
Networks
Jeelani1, Subodh Kumar Sharma2, Pankaj Kumar Varshney3
1 Scholar, Mangalayatan University, Aligarh, India
2 Associate Professor, Mangalayatan University, Aligarh, India
3 Associate Professor, Department of Computer Science, IITM Janakpuri, New Delhi, India
jeelani.jee@gmail.com, Subodh.sharma@mangalayatan.edu.in, pankaj.surir@gmail.com

Abstract- Wireless Sensor Networks (WSNs) have a wide range of application areas, such as health care, the military and industry, for real-time event detection. The sensing capability of a WSN comes from the sensor nodes that make up the network, but these nodes are constrained in terms of size, energy, memory and processing power. They sense environmental data, perform limited processing and communicate over short distances. As the applications of wireless sensor networks continuously grow, the need for security mechanisms increases day by day. It is essential to protect WSNs from malicious attacks in hostile situations. Designing security for such systems is a considerable challenge because of the various resource restrictions and the distinctive characteristics of a wireless sensor network. This article is a broad survey of WSN security issues examined recently by researchers, aimed at a better understanding of future directions for WSN security.

Keywords- Mobile Ad hoc Network, Wireless Sensor Network, Denial of Service.

I. INTRODUCTION
Wireless sensor networks, as a part of MANETs, consist of a large number of tiny sensor nodes that continuously monitor environmental conditions. Wireless sensor networks are collections of thousands of sensor nodes that are self-organized and capable of wireless communication, but these nodes are constrained in terms of size, energy, memory and processing power [23]. They sense environmental data, perform limited processing and communicate over short distances. As the applications of wireless sensor networks continuously grow, the need for security mechanisms increases day by day. Because wireless sensor networks may handle sensitive data and usually operate in hostile, unattended environments, it is necessary to address these security concerns. The security challenges of sensor networks differ from those of traditional networks because of the many constraints of these networks. Moreover, when we look at the applications of WSNs, there are many application areas, e.g., battlefield awareness and traffic monitoring systems, in which the security of information remains an important issue. Providing security to a WSN is a nontrivial problem: security mechanisms that are applicable to wired or other ad-hoc networks are not suitable for WSNs. There are many reasons behind this, and we discuss them in the subsequent sections. Although there are a variety of challenges in sensor networks, here we focus on the different security issues and their possible remedies.

A. Security Requirements in Wireless Sensor Networks
The main security requirements that each WSN has to fulfil are as follows.

Confidentiality: The secrecy of messages transmitted between nodes should be maintained properly; for that, the important segments of a message should be encrypted, and in some cases even the two endpoints are hidden. In dynamic systems where nodes keep joining and leaving the network, forward and backward secrecy need to be maintained. Forward secrecy means that nodes leaving the network should not be able to access future transmissions on the network after leaving it, and backward secrecy means that new nodes should not be able to access transmissions made before they joined the


network. These mechanisms are needed to maintain the confidentiality of data in wireless sensor networks [23].

Authenticity: Authenticity is vital for protecting the identities of communicating nodes. A node must be able to verify that an accepted message really comes from the claimed sender; in the absence of authentication, attackers can easily inject false data into the wireless sensor network. Generally, to authenticate the origin of a message, an appended message authentication code may be employed [22].

Integrity: Integrity should be ensured so that attackers cannot change the transmitted messages. Attackers can introduce interference packets to flip bit polarities, and a malicious routing node can alter significant data in packets before forwarding them. To find random errors during packet transmission, a cyclic redundancy checksum (CRC) is employed to detect them; similarly, a keyed checksum, for example a MAC, is used to secure packets against deliberate changes [22].

Availability: WSN services should always be available in spite of any resource-depletion attacks that may occur on the system, so the network should be resistant to such attacks [23].

Non-Repudiation: Neither the sender nor the receiver should be able to deny that a message was sent by them. For that, the message can be digitally signed by both the sender and the receiver [23].
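The integrity requirement above pairs a CRC, which catches random transmission errors, with a keyed checksum (MAC), which lets a receiver detect deliberate modification. The following is a minimal sketch of that idea only; the key handling and packet layout are illustrative assumptions, not a WSN protocol.

```python
import hmac, hashlib, zlib

SHARED_KEY = b"pairwise-key-established-at-deployment"   # assumption for illustration

def protect(payload: bytes) -> bytes:
    crc = zlib.crc32(payload).to_bytes(4, "big")                  # detects random bit errors
    tag = hmac.new(SHARED_KEY, payload, hashlib.sha256).digest()  # detects tampering
    return payload + crc + tag

def verify(packet: bytes) -> bytes:
    payload, crc, tag = packet[:-36], packet[-36:-32], packet[-32:]
    if zlib.crc32(payload).to_bytes(4, "big") != crc:
        raise ValueError("CRC mismatch: transmission error")
    if not hmac.compare_digest(hmac.new(SHARED_KEY, payload, hashlib.sha256).digest(), tag):
        raise ValueError("MAC mismatch: packet was modified")
    return payload
```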
B. Attacks on Wireless Sensor Networks
Since wireless sensor networks operate in unsafe environments, they are vulnerable to several types of attacks.

Denial of Service attack: A Denial-of-Service (DoS) attack is a serious attack, as it consumes network resources such as energy, bandwidth and power. A DoS attack floods an excess amount of unnecessary packets into the network and affects its overall performance. If there is only a single attacker in the network, this is a DoS attack; if there are multiple attackers, it is known as a Distributed Denial of Service (DDoS) attack. Denial of service is a multilayer attack: in a WSN there are numerous DoS attacks on different layers, such as jamming, tampering, exhaustion, flooding and so on [3].

Figure: Denial-of-Service defense in WSNs [22]

Hello Flood attack: The HELLO flood attack is one of the active attacks; it floods HELLO packets into the network. The attacker transmits packets from the source node towards the destination, advertising them as coming from a cluster head. All sensor nodes accept these packets and send join packets in reply, thinking the attacker is their neighbour, and the entire network falls into confusion. In a wireless sensor network the sensor nodes are deployed in a normally orchestrated region and data is transmitted from the source node to the destination through intermediate nodes; the sensor nodes cannot distinguish that the enemy node is not one of their neighbours, so as a result the network is spoofed by the attacker [3].

Attacks on information during transmission: The most dangerous attacks in a WSN are on the information being transmitted between nodes, because that information is susceptible to eavesdropping, injection and modification. A traffic analysis attack can also be performed, because an attacker may learn the layout of the network and damage its busiest portions to cause the greatest harm [23].

Replicating a Node attack: The attacker may insert a new node into the sensor network that is a clone of a pre-existing node. This cloned node can transmit useful information to the attacker. The node replication attack is most dangerous when the cloned node is a base station, so base stations


need to be deployed in secure locations [23].

C. Routing Attacks
The attacks that affect the routing protocol of a wireless sensor network are as follows.

Selective Forwarding: In a selective forwarding attack the malicious node may drop certain packets and transmit the rest. If it drops all the packets it is a black hole attack; if it forwards only selected packets it is a selective forwarding attack. The effectiveness of the attack depends on how close the malicious node is to the base station, because then more traffic will pass through it [23].

Sinkhole attack: A sinkhole attack attracts the maximum amount of traffic through a malicious node placed somewhere near the base station. If the sensor network has one main base station, this attack can be dangerous [23].

Sybil attack: In a Sybil attack one node presents multiple identities in the network, which may mislead other nodes. Sybil attacks can be used against topology maintenance and routing algorithms [23].

Wormhole attack: In a wormhole attack, much like a sinkhole attack, an attacker sitting close to the base station may tunnel the traffic over a low-latency link, thus disrupting the traffic [23].

II. LITERATURE REVIEW
The present literature review is based on the research work entitled "Security attacks detection and prevention schemes in wireless sensor networks". For the review of literature, the researcher gathered a number of articles as a secondary source of data, from which material related to the research topic was selected in order to acquire in-depth knowledge of work completed in the past. After reviewing previous research articles, the researcher summarized the reviewed literature and, at the end, a research gap is introduced.

Guechari M. et al. [1] presented an experimental study on a dynamic approach for detecting Denial of Service (DoS) attacks in cluster-based sensor networks. The method is based on the election of controller nodes, called cNodes, which observe and report DoS attack activity. Each cluster contains cNodes and normal sensor nodes; the role of a cNode is to analyse traffic and to send a warning back to the cluster head if any abnormal traffic is detected. The election of these cNodes is dynamic, done periodically, and based on a Multiplicative Linear-Congruential Generator (MLCG).

Rolla P. and Kuar M. [2] concluded an experimental study on a time-allocation-based request forwarding window technique to detect and prevent DoS (Denial of Service) attacks. DoS attacks flood an excess amount of packets into the network, which consumes the network's energy. In the Profile based Protection Scheme (PPS), the behaviour of all the nodes deployed in the network is observed.

Patil S. and Choudhari S. [3] analysed that the Denial-of-Service (DoS) attack is the most popular attack in sensor networks. Attack prevention techniques such as the fuzzy Q-learning algorithm, Dynamic Source Routing (UDSR) and Secure Auction-based Routing (SAR) have been used against DoS attacks. Their proposed cooperative immune system is an enhancement of the existing immune system, CO-FAIS, which improves the accuracy of the system; in their tests they reduced the false alarm rate.

Naik S. and Shekokar N. [4] designed and experimented with a program for defending against the denial-of-sleep attack. This solution is an effective method for preventing the attack, as all nodes sending sleep-synchronization messages are validated before those messages are accepted, and rejected if the node is not validated. The attacker node cannot replay the sleep synchronization signal again, as its sleep schedule will not be accepted without authentication.

Chaudhary S. and Thanvi P. [5] concluded an experimental study on a modified variant of the Ad-hoc On-demand Distance Vector (AODV) protocol to analyse the effect of a DoS attack on system performance and then applied a prevention scheme to analyse the change in network performance. The researchers used experimental scenarios covering a topology of 80 nodes, the simulation in progress, average end-to-end delay, throughput analysis, packet ratio and packet drop analysis. Various methods for successful attack detection have been proposed over time. The


technique used by the researchers is based on comparing the RREP sequence numbers of packets received by the sender from its neighbours broadcasting the availability of fresher or shorter routes.

S. Fouchal et al. [6] carried out a parametric study of a novel approach to detect denial-of-service attacks in wireless sensor networks. The approach is based on recursive clustering. They validated the proposal with two clustering algorithms on 100-sensor networks, using the LEACH (low-energy adaptive clustering hierarchy) and FFUCA (fast and flexible unsupervised clustering algorithm) algorithms. The results are convincing in terms of detecting the groups; in addition, the use of FFUCA gives better energy management and thus a longer network lifetime.

Mansouri D. et al. [7] presented a study on a method for detecting and preventing Denial of Service attacks in WSNs. The detection method they considered is based on special control nodes which monitor the throughput of traffic in clusters. The control nodes (cluster heads) are elected by recursively using the LEACH clustering algorithm. They presented a set of experiments using the SimEvents simulator, and the numerical results obtained show that the approach gives significant results in terms of detection rate and detection time.

Kiss I., Haller P. and Beres A. [8] proposed a clustering-based approach for detecting the influence of cyber-attacks, especially denial of service (DoS) attacks, on the observed system. The proposed approach is evaluated on the TEP (Tennessee Eastman challenge process), for which several DoS attack scenarios are experimented with to validate the effectiveness of the method. The researchers used a SCADA (Supervisory Control And Data Acquisition) simulator for this approach.

Ballarini P., Mokdad L. and Monnennt Q. [9] proposed a dynamic cNode displacement scheme in which cNodes are periodically elected among the ordinary nodes of each atomic cluster. Such a solution results in a better energy balance while maintaining good detection coverage. They analysed the trade-offs between static and dynamic solutions by means of two complementary approaches: through simulation with the NS-2 simulation platform, and by means of statistical model checking with the Hybrid Automata Stochastic Logic. The researchers used several different models, namely non-Markovian modelling and verification of DoS, Generalized Stochastic Petri Nets (GSPN), eGSPN and HASL model checking, with two algorithms, LEACH and k-LEACH.

Ghildiyal S. et al. [10] focused on the characteristics and constraints of wireless sensor networks and the types of DoS attacks at different layers. The different layers of WSN nodes have a variety of roles to play for proper functioning, such as signalling, framing, forwarding, reliable transportation and user interaction at both the receiving and the sending end. Many denial-of-service attacks are identified at each layer; these are purposeful, planned attacks meant to jeopardize the availability of service, restricting the utility of the WSN for its application.

Wazid M. et al. [11] carried out a parametric study in which the impact of the blackhole attack is measured on the network parameters, followed by the proposal of a novel technique for the detection and prevention of the blackhole attack in WSNs. In the presence of a blackhole attack both network parameters, end-to-end delay and throughput, are affected. They observed that in the presence of a blackhole attack the performance of the network degrades very rapidly: the end-to-end delay increases to 4.03 ms and the throughput decreases to 5027.85 bps. It has therefore become very important to provide a detection and prevention mechanism for the blackhole attack.

Singh V. Pal, Ukey Anand A. S. and Jain S. [12] proposed to detect and prevent the hello flood attack using the signal strength of received HELLO messages. Nodes are classified as friends or strangers based on the signal strength of the HELLO messages sent by them. Nodes classified as strangers are further validated by sending a simple test packet; if the reply to the test packet comes back within a predefined time the node is treated as valid, otherwise it is treated as malicious. The algorithm was implemented in ns-2 by modifying the AODV routing protocol, and its performance was tested under different network scenarios. The simulation results show improved performance of the new algorithm in terms of packet delivery ratio compared with AODV under a hello flood attack. The hello flood attack is an important attack on the network layer, in which an adversary,


which is not a legal node in the network, can flood hello requests to any legitimate node using high transmission power and break the security of the WSN.
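The decision rule summarized above (Singh et al. [12]) can be sketched roughly as follows. This only illustrates the idea; it is not the authors' ns-2 implementation, and the threshold, timeout and callback names are assumptions.

```python
import time

FRIEND_RSSI_DBM = -75        # HELLOs no stronger than this look like ordinary neighbours
TEST_REPLY_TIMEOUT = 0.5     # seconds to wait for the test-packet reply

def classify_hello_sender(rssi_dbm, send_test_packet, wait_for_reply):
    if rssi_dbm <= FRIEND_RSSI_DBM:
        return "friend"                       # normal signal strength: accept the HELLO
    # Unusually strong HELLO: treat the sender as a stranger and validate it.
    send_test_packet()
    start = time.time()
    if wait_for_reply(timeout=TEST_REPLY_TIMEOUT) and time.time() - start <= TEST_REPLY_TIMEOUT:
        return "validated-stranger"           # replied in time: accept the node
    return "malicious"                        # no timely reply: drop its HELLO packets
```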
Hai T. H. and Eui-Nam H. [13] proposed a lightweight detection algorithm based only on neighbourhood information. Their detection algorithm can detect the selective forwarding attack with high accuracy and imposes less overhead on the detection modules than previous work. The algorithm was evaluated and shows good effectiveness even with high network density and a high probability of collisions in WSNs. In addition, their detection modules consume less energy than previous work by using an over-hearing mechanism to reduce the transmission of alert packets.

Malik R. and Sehrawat H. [14] studied some important attacks on WSNs together with possible ways to detect and defend against the selective forwarding attack. WSNs have issues such as low memory and limited battery availability, so conventional security measures are not effective here. A number of attacks are possible on WSNs, such as black hole, wormhole and selective forwarding attacks; the selective forwarding attack is a special case of the black hole attack in which compromised nodes drop packets selectively. The researchers reviewed the OSI layers (application, transport, network, data link and physical), the attacks at each and the corresponding defence strategies.

Soni V., Modi P. and Chuadhri V. [15] presented some countermeasures against the sinkhole attack. There are many possible attacks on a sensor network, such as selective forwarding, jamming, sinkhole, wormhole, Sybil and hello flood attacks. The sinkhole attack is among the most destructive routing attacks for these networks: it may cause the intruder to lure all or most of the data flow that has to be captured at the base station. Once a sinkhole attack has been implemented and the adversary node has started to work as a network member in data routing, it can mount further threats such as black hole or grey hole attacks. Ultimately this dropping of important data packets can disrupt the sensor network completely.

Rassam M. A. et al. [16] discussed the vulnerabilities of the Mintroute protocol to sinkhole attacks and investigated the existing manual rules used for detection using a different architecture. Although different types of protocols have been proposed for WSNs, they cannot guarantee protection of the network from different attacks. Sinkhole attacks, which are launched by a new or a compromised node, attract the network traffic to pass through it; this attack leads to many other attacks such as blackhole, wormhole, or even information fabrication attacks.

Khanderiya M. and Panchal M. [17] proposed a method that can detect the Sybil attack in wireless sensor networks. Researchers have developed many schemes and methodologies for detecting and preventing the Sybil attack, but these security mechanisms are not used satisfactorily in real scenarios for wireless sensor networks. They presented a robust and lightweight solution to detect the Sybil attack using RSSI (Received Signal Strength Indicator). The RSSI value is used for detecting the Sybil attack, but three detectors are used in this scheme: three detectors are required because nodes at the same distance from the detecting node would have the same RSSI value, so a single node is not enough for the detection process, as it would regard those nodes as Sybil too.

Dhamodharan U. K. R. and Vayanaperumal R. [18] designed and experimented with a scheme for assuring security in wireless sensor networks, to deal with attacks of these kinds in unicasting and multicasting. Basically, a Sybil attack means a node presenting a false identity to other nodes; communication with an illegal node results in data loss and becomes dangerous in the network. The existing Random Password Comparison method only verifies node identities by analysing the neighbours. A survey was done on the Sybil attack with the objective of resolving this problem, and it proposed a combined CAM-PVM (compare and match position verification method) with MAP (message authentication and passing) for detecting, eliminating and eventually preventing the entry of Sybil nodes into the wireless sensor network.

Amish P. and Vaghela V. B. [19] carried out a parametric study in which the techniques dealing with the wormhole attack in WSNs are surveyed and a method is proposed for the detection and prevention of the wormhole attack. The AOMDV (Ad hoc On-demand Multipath Distance Vector) routing protocol is incorporated into


this method, which is based on the RTT (Round Trip Time) mechanism and other characteristics of the wormhole attack. Compared with other solutions reported in the literature, the proposed approach looks very promising; the NS-2 simulator is used to perform all simulations.
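The round-trip-time idea used in the work above can be sketched roughly as follows: a link whose measured RTT is far above that of its neighbours is flagged as a suspected wormhole. This illustrates the check only and is not the AOMDV/NS-2 implementation; the factor and sample values are assumptions.

```python
from statistics import median

def suspected_wormhole_links(per_hop_rtts, factor=3.0):
    """per_hop_rtts: dict mapping (node_a, node_b) -> measured per-hop RTT in ms."""
    baseline = median(per_hop_rtts.values())
    return [link for link, rtt in per_hop_rtts.items() if rtt > factor * baseline]

# Example: one 'hop' that actually tunnels traffic across the network stands out.
rtts = {("A", "B"): 2.1, ("B", "C"): 1.9, ("C", "D"): 25.0, ("D", "E"): 2.3}
print(suspected_wormhole_links(rtts))   # [('C', 'D')]
```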
Shaikh F. A. and Patil U. [20] carried out a study to explain a wormhole detection algorithm for wireless mesh networks, which detects wormholes by calculating the neighbour list as well as the directional neighbour list of the source node. The main feature of the algorithm is that it can offer the approximate location of nodes and the effect of the wormhole attack on all nodes, which is helpful in implementing countermeasures. The performance evaluation is done for varying numbers of wormholes in the network.

Modirkhazeni A. et al. [21] carried out a parametric study focused on the wormhole attack and proposed a distributed network discovery approach to mitigate its effect. The researchers presented selected countermeasures and then proposed a network discovery approach that needs no additional tools or accurate time synchronization. According to the simulations, their approach can mitigate almost 100% of the wormhole attack overload in an environment where 54% of the nodes are affected by the wormhole.

Ahmad Salehi S. et al. [22] analysed why such networks require a security plan, given the various resource limitations and the prominent characteristics of a wireless sensor network, which make security a considerable challenge in WSNs: the nature of the nodes causes limitations such as restricted energy, processing capability and storage capacity. These restrictions make WSNs very different from conventional ad hoc wireless networks, and specific methods and protocols have been developed for use in WSNs. All of the mentioned security dangers, including the hello flood attack, wormhole attack, Sybil attack and sinkhole attack, share one common goal, which is compromising the integrity of the network they attack. In their paper they principally concentrate on the threats to WSN security, and an overview of the WSN threats that affect the various layers, along with their defence techniques, is presented.

Bhalla M., Pandey N. and Kumar B. [23] analysed security issues and vulnerabilities in wireless sensor networks. All the security protocols mentioned should be analysed using simulation, and further features, such as speed of operation, power consumption and efficiency, should be evaluated.

Aldhobaiban D., Elleithy K. and Almazaydeh L. [24] carried out experimental research showing how the network nodes can be rerouted to avoid the attacked nodes. Deleting the links between the infected node and its neighbours protects the rest of the network; however, since this approach requires a table to monitor all the nodes, the load on the network is overwhelming.

Sakthivel T. and Chandrasekaran R. M. [25] proposed a Path Tracing (PT) algorithm for the detection and prevention of the wormhole attack as an extension of the DSR protocol. The PT algorithm runs on each node in a path during the DSR route discovery process and detects and prevents the wormhole attack using the per-hop distance between two nodes. The implementation of the proposed algorithm depends on the DSR protocol, and it was simulated in NS-2. Parameters such as throughput, overhead and the average delay of the proposed algorithm are compared with those of existing wormhole prevention techniques.

Modirkhazeni A. et al. [26] focused on the wormhole attack and proposed a distributed network discovery approach to mitigate its effect. They concentrated on the wormhole attack in these kinds of networks and presented selected countermeasures; they then generalized previous countermeasures, analysed them and selected the better one. Based on the presented results, they proposed a network discovery approach based on a distributed scheme that needs no additional tools or accurate time synchronization. According to the simulations, the proposed approach acted efficiently and mitigated almost 100% of the wormhole attack overload in an environment where 54% of the nodes are affected by the wormhole.

S. I. Eludiora et al. [27] reviewed the existing approaches to security solutions in WSNs and proposed the use of a distributed approach. The approach allows sensor nodes (SNs) to communicate directly with the base stations (BSs) rather than forming cluster heads among themselves. Mobile Agents (MAs) were introduced to facilitate communication among the BSs; MAs can easily move from one host to another and perform the necessary tasks. The researchers developed a distributed


IDS for WSNs. The distributed IDS is implemented using TMote Sky wireless sensors for testing and simulation over specified parameters.

Ioannou C., Vassiliou V. and Sergiou C. [28] proposed a general methodology for an anomaly-based Intrusion Detection System (IDS), named mIDS, that uses the Binary Logistic Regression (BLR) statistical tool to classify local sensor activity as either benign or malicious in order to detect malicious behaviour within a sensor node. Attacks were implemented within the Contiki OS and the results were tested using the associated COOJA simulator. All sensor nodes are equivalent to TelosB nodes and have a 25 m radio range. They created a model that takes both attacks into consideration and evaluated it on different network topologies.

Shanthi S. and Rajan E. G. [29] discussed many potential issues of WSN security and detection mechanisms and presented a comprehensive analysis of various intrusion detection approaches (signature-based, anomaly-based and hybrid detection systems) in wireless sensor networks.

III. CONCLUSION
The need for security in WSNs becomes more evident as WSNs grow in capability and are used much more widely; however, in WSNs the nature of the nodes imposes limitations such as restricted energy, processing capability and storage capacity. These limitations make WSNs quite different from conventional ad hoc wireless networks, and specific techniques and protocols have been developed for use in WSNs. All of the mentioned security threats, including the HELLO flood attack, the wormhole attack, the Sybil attack and the sinkhole attack, share one common objective: compromising the integrity of the network they attack.

The security of WSNs has become a major subject because of the different threats that have appeared and the importance of data confidentiality, even though in the past there was little focus on WSN security. There are several solutions for protecting against all of these risks, and some have already been recommended. In this article we focus mainly on the threats to WSN security, and an overview of the WSN threats that affect the different layers, along with their protection methods, is presented. Recently, instead of concentrating on individual layers, researchers have been striving for an integrated framework for the security mechanism. The most common security threats at the different layers, and the most reasonable solutions, are presented in this paper.

REFERENCES
[1] Malek Guechari, Lynda Mokdad and Sovanna Tan, "Dynamic Solution for Detecting Denial of Service Attacks in Wireless Sensor Networks", IEEE, pp. 173-177, 2012.
[2] Rolla P. and Kuar M., "Dynamic Forwarding Window Technique against DoS Attack in WSN", IEEE, pp. 212-216, DOI 10.1109/ICMETE.2016.93, 2016.
[3] Patil S. and Choudhari S., "DoS attack prevention technique in Wireless Sensor Networks", IEEE, pp. 715-721, DOI: 10.1016/j.procs.2016.03.094, 2016.
[4] Naik S. and Shekokar N., "Conservation of energy in wireless sensor network by preventing denial of sleep attack", Elsevier, pp. 370-379, doi: 10.1016/j.procs.2015.03.164, 2015.
[5] Chaudhary S. and Thanvi P., "Performance Analysis of Modified AODV Protocol in Context of Denial of Service (DoS) Attack in Wireless Sensor Networks", International Journal of Engineering Research and General Science, Volume 3, pp. 486-491, 2015.
[6] S. Fouchal et al., "Recursive-clustering-based approach for denial of service (DoS) attacks in wireless sensors networks", International Journal of Communication Systems, pp. 309-324, DOI: 10.1002/dac.2670, 2015.
[7] Mansouri D., Mokddad L., Ben-othman J. and Ioualalen M., "Preventing Denial of Service Attacks in Wireless Sensor Networks", IEEE, pp. 3014-3019, 2015.
[8] Kiss I., Haller P. and Beres A., "Denial of Service attack detection in case of Tennessee Eastman challenge process", Elsevier, pp. 835-841, doi: 10.1016/j.protcy.2015.02.120, 2015.


[9] Ballarini P., Mokdad L. and Monnennt Q., "Modeling tools for detecting DoS attacks in WSNs", Security and Communication Networks, pp. 420-436, DOI: 10.1002/sec.630, 2013.
[10] Ghildiyal S., Mishra A. K., Gupta A. and Garg N., "Analysis of Denial of Service (DoS) Attacks in Wireless Sensor Networks", IJRET: International Journal of Research in Engineering and Technology, pp. 140-143, 2014.
[11] Wazid M., Katal A., Sachan R. S., R. H. Goudar and D. P. Singh, "Detection and Prevention Mechanism for Blackhole Attack in Wireless Sensor Network", IEEE, pp. 576-581, 2013.
[12] Singh V. Pal, Ukey Anand A. S. and Jain S., "Signal Strength based Hello Flood Attack Detection and Prevention in Wireless Sensor Networks", International Journal of Computer Applications, pp. 1-6, DOI: 10.5120/10153-4987, 2013.
[13] Hai T. H. and Eui-Nam H., "Detecting Selective Forwarding Attacks in Wireless Sensor Networks Using Two-hops Neighbor Knowledge", IEEE, pp. 325-331, DOI 10.1109/NCA.2008.13, 2008.
[14] Malik R. and Sehrawat H., "Comprehensive Study of Selective Forwarding Attack in Wireless Sensor Networks", International Journal of Advanced Research in Computer Science, vol. 8, pp. 1835-1838, 2017.
[15] Soni V., Modi P. and Chuadhri V., "Detecting Sinkhole Attack in Wireless Sensor Network", International Journal of Application or Innovation in Engineering & Management (IJAIEM), Vol. 2, pp. 29-32, 2013.
[16] Rassam M. A., Zainal A., Maarof A. and Al-Shaboti M., "A Sinkhole Attack Detection Scheme in Mintroute Wireless Sensor Networks", IEEE, pp. 71-75, DOI: 10.1109/ISTT.2012.6481568, 2012.
[17] Khanderiya M. and Panchal M., "A Novel Approach for Detection of Sybil Attack in Wireless Sensor Networks", IJSRSET, Vol. 2, pp. 113-117, 2016.
[18] Dhamodharan U. K. R. and Vayanaperumal R., "Detecting and Preventing Sybil Attacks in Wireless Sensor Networks Using Message Authentication and Passing Method", Hindawi Publishing Corporation, The Scientific World Journal, pp. 1-7, http://dx.doi.org/10.1155/2015/841267, 2015.
[19] Amish P. and Vaghela V. B., "Detection and Prevention of Wormhole Attack in Wireless Sensor Network using AOMDV protocol", Elsevier, pp. 700-707, doi: 10.1016/j.procs.2016.03.092, 2016.
[20] Shaikh F. A. and Patil U., "Efficient Detection and prevention of Wormhole Attacks in Wireless Mesh Network", International Research Journal of Engineering and Technology (IRJET), Vol. 4, pp. 2208-2214, 2017.
[21] Modirkhazeni A., Aghamahmoodi S., Modirkhazeni A. and Niknejad N., "Distributed Approach to Mitigate Wormhole Attack in Wireless Sensor Networks", IEEE, pp. 122-128, 2012.
[22] Ahmad Salehi S., Razzaque M. A., Naraei P. and Farrokhtala A., "Security in Wireless Sensor Networks: Issues and Challanges", IEEE, 2013.
[23] Bhalla M., Pandey N. and Kumar B., "Security Protocols for Wireless Sensor Networks", IEEE, pp. 1005-1009, 2015.
[24] Aldhobaiban D., Elleithy K. and Almazaydeh L., "Prevention of Wormhole Attacks in Wireless Sensor Networks", IEEE, pp. 287-291, DOI 10.1109/AIMS.2014.57, 2014.
[25] Sakthivel T. and Chandrasekaran R. M., "Detection and Prevention of Wormhole Attacks in MANETs using Path Tracing Approach", European Journal of Scientific Research, pp. 240-252, 2012.


[26] Modirkhazeni A., Aghamahmoodi S., Modirkhazeni A. and Modirkhazeni N., "Distributed Approach to Mitigate Wormhole Attack in Wireless Sensor Networks", IEEE, pp. 122-128, 2012.
[27] S. I. Eludiora, O. O. Abiona, A. O. Oluwatope, S. A. Bello, M. L. Sanni, D. O. Ayanda, C. E. Onime, E. R. Adagunodo and L. O. Kehinde, "A Distributed Intrusion Detection Scheme for Wireless Sensor Networks", IEEE, 2011.
[28] Ioannou C., Vassiliou V. and Sergiou C., "An Intrusion Detection System for Wireless Sensor Networks", IEEE, 2017.
[29] Shanthi S. and Rajan E. G., "Comprehensive Analysis of Security Attacks and Intrusion Detection System in Wireless Sensor Networks", IEEE, pp. 24-31, 2016.


A Review on Histogram of Oriented Gradient
Apurva Jain, Deepanshu Singh
Research Scholar, IITM, GGSIPU, New Delhi, India
apurva.2296@gmail.com, singh.deepanshu207@gmail.com
Abstract—Pedestrian detection systems are receiving increasing attention in both industry and academia with the rapid development of autonomous automobiles which employ artificial intelligence. This article describes approaches based on the Histogram of Oriented Gradients (HOG) and the support vector machine, focusing on the HOG feature and its application, and detailing the process of HOG feature extraction and the design of classifiers for pedestrian detection.

I. INTRODUCTION
Real-time human detection from videos is one of the most active areas in computer vision due to its widespread applications such as intelligent surveillance and home security [1], personal protection and kidnapping detection [2], automatic detection of crimes [3] and human-computer interfaces [4]. The successful progress towards the design of autonomous vehicles such as autonomous cars [5,6], self-driven riderless bicycles [7], autonomous robots [8] and drones [9,10] has spearheaded research in the area of pedestrian detection. Pedestrian detection using HOG and neural networks is reported in [11], pedestrian detection for advanced driver assistance systems using HOG and Adaboost is reported in [12], and pedestrian detection using Bayesian and Edgelet detectors in [13], using local binary patterns (LBP) in [14], using motion and appearance patterns in [15], using Shapelet features in [16] and using deep networks in [17].
II. DESIGN OF PROPOSED PEDESTRIAN DETECTION SYSTEM
In this section, the design of the proposed real-time pedestrian detection system is explained, with feature extraction and classification being the two main stages of the system.
A. HOG Feature Extraction
The sequence of steps followed to obtain the HOG features from an input image considers an input image of size 256 x 256 pixels. The input image is divided into 256 cells with a cell size of 16 x 16 pixels, and each cell is divided into four sub-cells with a sub-cell size of 8 x 8 pixels.
The gradients Gx and Gy at a particular pixel location are computed using 1D masks in the X and Y directions as:
Gx = Mx • Ix
Gy = My • Iy
where Mx = [-1 0 1] and My = [-1 0 1]T are the masks applied to Ix in the X-direction and Iy in the Y-direction respectively. The gradient magnitude Gmag and the orientation angle Gdir are then computed.
The gradient vector for each pixel location in the sub-cells is formed using the Gmag and Gdir values. The gradient vectors in each sub-cell are normalized to obtain a key descriptor. Similarly, key descriptors are computed from each cell to obtain the HOG features of the input image.
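A minimal Python sketch of the gradient step described above is given below. It is illustrative only: NumPy is assumed, and the function name and the final histogram call are not taken from the cited systems.

```python
import numpy as np

def gradients(img):
    """Compute Gx, Gy, Gmag and Gdir with the 1D masks Mx = [-1 0 1], My = Mx^T."""
    img = img.astype(float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]          # filtering with [-1 0 1] along X
    gy[1:-1, :] = img[2:, :] - img[:-2, :]          # filtering with [-1 0 1]^T along Y
    gmag = np.hypot(gx, gy)                         # gradient magnitude Gmag
    gdir = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # unsigned orientation Gdir in [0, 180)
    return gmag, gdir

# One 9-bin orientation histogram for the 8 x 8 sub-cell at the top-left corner:
# np.histogram(gdir[0:8, 0:8], bins=9, range=(0, 180), weights=gmag[0:8, 0:8])
```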
B. SVM Classifier
The support vector machine (SVM) is a method of machine learning based on statistical learning theory proposed by Vapnik et al., the goal of which is to find the optimal classification surface [21]. SVM is considered to be among the simplest and fastest classifiers for both linear and non-linear classification problems [18]. SVM learning aims at finding a good hyperplane in a higher-dimensional feature space that best separates the two classes, as shown in Fig. 2. The decision function of the SVM hyperplane that classifies the two classes is given as
f(z) = sign( Σ_{i=1..NSV} αi yi K(xsv,i , z) + b )
where z is the test data, K(xsv, z) is the kernel function, NSV denotes the number of support vectors, xsv,i is the i-th support vector, yi is its target output, αi is the Lagrange multiplier associated with each training sample, and b is the bias term.
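The pairing of HOG features with a linear SVM can be sketched as follows. This is an illustrative example assuming scikit-image and scikit-learn, with randomly generated placeholder windows standing in for labelled pedestrian/background crops; it is not the authors' original implementation.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

def hog_descriptor(gray_image):
    # 9 orientation bins, 8 x 8 pixel cells, 2 x 2 cell blocks with L2-Hys normalization
    return hog(gray_image, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm="L2-Hys")

# Placeholder data: random 64 x 64 "images"; real training would use dataset windows.
rng = np.random.default_rng(0)
images = rng.random((40, 64, 64))
labels = np.repeat([1, -1], 20)            # +1 = pedestrian, -1 = background

X = np.array([hog_descriptor(im) for im in images])
clf = SVC(kernel="linear")                 # linear kernel: K(x_sv, z) = x_sv . z
clf.fit(X, labels)
print("training accuracy:", clf.score(X, labels))
```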
Fig. 2 Optimal classification surface in the linearly separable case [21].

III. EXPERIMENTAL RESULTS
The performance of the proposed system is evaluated by carrying out various experiments using images collected online, images from the INRIA Person Dataset [20] and real-time inputs. A photo collage of the samples considered for training from the two classes, background and pedestrians, was prepared for the images obtained online. The performance of the proposed pedestrian detection system on images obtained online and on real-time input images is likewise presented as a photo collage of a few of the results obtained, wherein a blue box denotes the identification of a pedestrian in an image. The performance of the proposed system is reported in Table I for all the experiments carried out using the different datasets. It is observed that a maximum accuracy of 98.31% is obtained on the INRIA dataset, which is a mixture of cropped images of pedestrians and backgrounds. A pedestrian detection accuracy of 96.35% is obtained on testing the system using full scenery images provided in the INRIA database. It is observed during experimentation that the proposed real-time pedestrian detection system gave 93.22% accuracy, depending on the distance between the pedestrian and the camera, whereas 100% accuracy is obtained the moment the pedestrian comes closer to the camera. This suffices for the requirement of pedestrian detection for autonomous vehicles.
IV. CONCLUSION
In this paper, the design of a real-time pedestrian detection system for autonomous vehicles is proposed and its performance is evaluated by carrying out various experiments using offline images, images from a standard dataset and real-time input. The system is capable of detecting pedestrians with an accuracy of 98.31%, and it is observed that the non-detected pedestrians are also detected once they come closer to the camera, thus achieving 100% recognition accuracy.

REFERENCES
[1] Yi G, Myoungjin K, Yun C and H Lee, "Design and Implementation of UPnP-based Surveillance Camera System for Home Security", in Proc. Int. Conf. on Information Science and Applications, Suwon, South Korea, June 2013.
[2] Miyahara A and Nagayama I, "An Intelligent Security Camera System for Kidnapping Detection", Journal of Advanced Computational Intelligence and Intelligent Informatics, 2017.
[3] Goya K, Zhang X, Kitayama K and Nagayama I, "A Method for Automatic Detection of Crimes for Public Security by using Motion Analysis", in Proc. Fifth Int. Conf. on Intelligent Information Hiding and Multimedia Signal Processing, Kyoto, Sep 2009.
[4] R. Moldovan, B. Orza, A. Vlaicu and C. Porumb, "Advanced human-computer interaction in external resource annotation," in Proc. Int. Conf. on Automation, Quality and Testing, Robotics, Cluj-Napoca, 2014.
[5] K. Bimbraw, "Autonomous cars: Past, present and future - a review of the developments in the last century, the present scenario and the expected future of autonomous vehicle technology," in Proc. 12th Int. Conf. on Informatics in Control, Automation and Robotics, Colmar, 2015.
[6] Pozna and C. Antonya, "Issues about autonomous cars," in Proc. 11th Int. Sym. on Applied Computational Intelligence and Informatics, Timisoara, 2016.
[7] A Brizuela-Mendoza, CM Astorga-Zaragoza, A Zavala-Río, F Canales-Abarca and J Reyes-Reyes, "Fault Tolerant Control for Polynomial Linear Parameter Varying (LPV) Systems applied to the stabilization of a riderless bicycle," in Proc. Control Conference (ECC), 2013 European, Zurich, 2013.
[8] S. A. Miratabzadeh et al., "Cloud robotics: A software architecture: For heterogeneous large-scale autonomous robots," 2016 World Automation Congress (WAC), Rio Grande, PR, USA, 2016.
[9] S. Perkins, "From insects to drones," in Engineering & Technology, Vol. 10, No. 11, pp. 72-76, December 2015.
[10] Moskvitch, "Take off: are drones the future of farming?," in Engineering & Technology, Vol. 10, No. 7-8, pp. 62-66, August-September 2015.
[11] Gangqiang Shi and Weiwei Hu, "The video people detection based on neural network," 2015 IEEE 16th International Conference on Communication Technology (ICCT), Hangzhou, 2015.
[12] Rettkowski, A. Boutros and D. Gohringer, "Real-time pedestrian detection on a Xilinx Zynq using the HOG algorithm," 2015 International Conference on ReConFigurable Computing and FPGAs (ReConFig), Mexico City, 2015.
[13] Wu B and Nevatia Ram, "Detection and Tracking of Multiple, Partially Occluded Humans by Bayesian Combination of Edgelet based Part Detectors," International Journal of Computer Vision, Nov 2007.
[14] X Wang, TX Han and S Yan, "An HOG-LBP human detector with partial occlusion handling," 2009 IEEE 12th International Conference on Computer Vision, Kyoto, 2009.
[15] P Viola, MJ Jones and D Snow, "Detecting pedestrians using patterns of motion and appearance," in Proc. Ninth IEEE International Conference on Computer Vision, Nice, France, 2003.
[16] P Sabzmeydani and G Mori, "Detecting Pedestrians by Learning Shapelet Features," 2007 IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, 2007.
[17] Anelia A, Alex K, Vincent V, Abhijit O and Dave F, "Real-time Pedestrian Detection with Deep Network Cascades," in Proc. 26th British Machine Vision Conference (BMVC), Swansea, UK, Sep 2015.
[18] Christopher Burges, "A Tutorial on Support Vector Machines for Pattern Recognition," Data Mining and Knowledge Discovery 2, 1998.
[19] J. Manikandan and B. Venkataramani, "Study and evaluation of a multi-class SVM classifier using diminishing learning technique," Neurocomputing, Volume 73, Issues 10-12, June 2010.
[20] INRIA Person Dataset. http://pascal.inrialpes.fr/data/human/
[21] Huo, D., Liu, H., Shang, Z. and Li, R., "Research for pedestrian detection classifier," in 2016 7th IEEE International Conference on Software Engineering and Service Science (ICSESS), 2018.
[22] Goel, R., Srivastava, S. and Sinha, A.K., "A Review of Feature Extraction Techniques for Image Analysis," International Journal of Advanced Research in Computer and Communication Engineering, 2017.
Breast Cancer Risk Prediction
Pankaj Kumar Varshney1, Hemant Kumar2,
Jasleen Kaur3, Ishika Gera4
1,2,3,4Department of Computer Science, Institute of Information Technology and Management, Janakpuri
pankaj.surir@gmail.com, hemantmbmgla@gmail.com, kaurjasleen420.jk@gmail.com, ishikagera1998@gmail.com
Abstract-The number and size of medical databases are expanding quickly, yet most of this data is not analysed to discover the significant knowledge hidden within it. Advanced data mining methods can be used to find hidden patterns and relationships, and models built from these techniques help medical specialists make the right decisions. The present research studied the use of data mining techniques to develop predictive models for breast cancer recurrence in patients who were followed up over a long period. The objective is to build models using several machine learning algorithms to predict whether breast cell tissue is malignant (cancerous) or benign (non-cancerous). We applied machine learning techniques/algorithms, i.e., the k-nearest neighbours algorithm, Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, Support Vector Machine (SVM), and Artificial Neural Network (ANN), to build the predictive models. The primary objective of this paper is to compare the performance of these well-known algorithms on our data in terms of sensitivity, specificity, and accuracy. Our analysis shows that the accuracies of DT, RF, SVM and ANN are 0.937, 0.951, 0.965, and 0.958 respectively. The SVM classification model predicts breast cancer recurrence with the lowest error rate and the highest precision, while the predicted accuracy of the DT model is the lowest of all. The results are achieved using 10-fold cross-validation for measuring the unbiased prediction accuracy of each model.

Keywords- Classification; Logistic Regression; k-nearest algorithm; Decision tree; Random Forest; Gradient Boosting; Support vector machine; Artificial Neural Network.

I. INTRODUCTION
Breast cancer (BC) is the most common cancer in women, affecting about 10% of all women at some stage of their life. In recent years the incidence rate keeps increasing, and data show that the survival rate is 88% after five years from diagnosis and 80% after 10 years from diagnosis [1]. Breast cancer is the second leading cause of cancer death among women. It occurs as a result of abnormal growth of cells in the breast tissue, commonly referred to as a tumor. A tumor does not mean cancer - tumors can be benign (not cancerous), pre-malignant (pre-cancerous), or malignant (cancerous). Tests such as MRI, mammogram, ultrasound and biopsy are commonly used to diagnose breast cancer. To predict whether breast cell tissue is malignant or benign, we construct a predictive model using the SVM machine learning algorithm to predict the diagnosis of a breast tumor. Earlier work studied 951 breast cancer patients and used tumor size, axillary nodal status, histological type, mitotic count, nuclear pleomorphism, tubule formation, tumor necrosis, and age as input variables [7], while Pendharkar et al. showed that data mining could be a valuable tool in identifying similarities (patterns) in breast cancer cases, which can be used for diagnosis, prognosis, and treatment purposes [4]. These studies are some examples of research that applies data mining to medical fields for the prediction of diseases. 246,660 new cases of invasive breast cancer were expected to be diagnosed among women in the US during 2016, and 40,450 deaths among women were estimated. Breast cancer represents about 12% of all new cancer cases and 25% of all cancers in women [3]. Early prediction of breast cancer is one of the most crucial
works in the follow-up process. Data mining methods can help to reduce the number of false positive and false negative decisions [2,3]. Consequently, new methods such as knowledge discovery in databases (KDD) have become a popular research tool for medical researchers who try to identify and exploit patterns and relationships among a large number of variables, and to predict the outcome of a disease using historical cases stored in datasets.
Machine learning is not new to cancer research. Artificial neural networks (ANNs) and decision trees (DTs) have been used in cancer detection and diagnosis for nearly 20 years. Today machine learning methods are being used in a wide range of applications, ranging from detecting and classifying tumors via X-ray and CRT images to the classification of malignancies from proteomic and genomic (microarray) data. According to the latest PubMed statistics, more than 1,500 papers have been published on the subject of machine learning and cancer. However, the vast majority of these papers are concerned with using machine learning methods to identify, classify, detect, or distinguish tumors and other malignancies. In other words, machine learning has been used primarily as an aid to cancer diagnosis and detection [4]. In this paper, using data mining techniques, the authors developed models to predict the recurrence of breast cancer by analyzing data collected from the ICBC registry. The next sections of this paper review related work, describe the background of this study, evaluate three classification models (DT, SVM, and ANN), explain the methodology used to conduct the prediction, and present experimental results; the last part of the paper is the conclusion. To estimate the validity of the models, accuracy, sensitivity, and specificity were used as criteria and compared. In the present work only studies that employed ML techniques for modeling cancer diagnosis and prognosis are presented.
II. LITERATURE REVIEW
A literature review showed that there have been several studies on the survival prediction problem using statistical approaches and artificial neural networks. However, we could only find a few studies related to medical diagnosis and recurrence using data mining approaches such as decision trees [5,6]. Delen et al. used artificial neural networks, decision trees and logistic regression to develop prediction models for breast cancer survival by analyzing a large dataset, the SEER cancer incidence database [6]. Linden et al. used ANN and logistic regression models to predict 5, 10, and 15-year breast cancer survival. They studied 951 breast cancer patients and used tumor size, axillary nodal status, histological type, mitotic count, nuclear pleomorphism, tubule formation, tumor necrosis, and age as input variables [7]. Pendharkar et al. studied patterns in breast cancer; they showed that data mining could be a valuable tool in identifying similarities (patterns) in breast cancer cases, which can be used for diagnosis, prognosis, and treatment purposes [4]. These studies are some examples of research that applies data mining to medical fields for the prediction of diseases.
III. MATERIAL AND METHODS
Machine Learning, a branch of Artificial Intelligence, relates the problem of learning from data samples to the general concept of inference [5]. Every learning process consists of two phases: (i) estimation of unknown dependencies in a system from a given dataset and (ii) use of the estimated dependencies to predict new outputs of the system. It has also proven an interesting area in biomedical research with many applications, where an acceptable generalization is obtained by searching through an n-dimensional space for a given set of biological samples, using different techniques and algorithms [2]. There are two main common types of Machine Learning methods, known as (i) supervised learning and (ii) unsupervised learning. In supervised learning a labeled set of training data is used to estimate or map the input data to the desired output. In contrast, under unsupervised learning methods no labeled examples are provided and there is no notion of the output during the learning process. As a result, it is up to the learning scheme/model to find patterns or discover the groups in the input data. In supervised learning this procedure can be thought of as a classification problem. The task of classification refers to a learning process that categorizes the data into a set of finite classes. Two other common ML tasks are regression and clustering. In the case of regression problems, a learning function maps the data into a real-valued variable. Subsequently, for each new sample the value of a predictive variable can be estimated, based on this process.
Clustering is a common unsupervised task in which one tries to find the categories or clusters that describe the data items. Based on this process each new sample can be assigned to one of the identified clusters according to the similar characteristics that they share. Suppose for example that we have collected medical records relevant to breast cancer and we try to predict if a tumor is malignant or benign based on its size. The ML question would be the estimation of the probability that the tumor is malignant or not (1 = Yes, 0 = No). This depicts the classification process of a tumor being malignant or not; the circled records depict any misclassification of the type of a tumor produced by the procedure. Another type of ML method that has been widely applied is semi-supervised learning, which is a combination of supervised and unsupervised learning. It combines labeled and unlabeled data in order to construct an accurate learning model. Usually, this type of learning is used when there is more unlabeled data than labeled.
When applying an ML method, data samples constitute the basic components [4]. Every sample is described by several features and every feature consists of different types of values. Furthermore, knowing in advance the specific type of data being used allows the right selection of tools and techniques for the analysis. Some data-related issues refer to the quality of the data and the preprocessing steps needed to make them more suitable for ML. Data quality issues include the presence of noise, outliers, missing or duplicate data, and data that is biased or unrepresentative. When improving the data quality, typically the quality of the resulting analysis is also improved. In addition, in order to make the raw data more suitable for further analysis, preprocessing steps should be applied that focus on the modification of the data. A number of different techniques and strategies exist, relevant to data preprocessing, that focus on modifying the data to better fit a specific ML method. Among these techniques some of the most important approaches include (i) dimensionality reduction, (ii) feature selection and (iii) feature extraction.
Feature Selection is one of the core concepts in machine learning and hugely impacts the performance of a model. The data features used to train machine learning models have a huge influence on the performance that can be achieved. Irrelevant or partially relevant features can negatively impact model performance. Feature selection and data cleaning should be the first and most important steps of model design [1,2]. In this section, we discuss feature selection techniques that can be used in Machine Learning. Feature Selection is the process where you automatically or manually select those features which contribute most to the prediction variable or output of interest. Having irrelevant features in the data can decrease the accuracy of the models and make the model learn based on irrelevant features. There are many benefits to dimensionality reduction when datasets have a large number of features. ML algorithms work better when the dimensionality is lower [10]. Additionally, the reduction of dimensionality can eliminate irrelevant features, reduce noise and produce more robust learning models due to the involvement of fewer features. In general, dimensionality reduction by selecting new features which are a subset of the old ones is known as feature selection. Three main approaches exist for feature selection, namely embedded, filter and wrapper approaches [8,9]. In the case of feature extraction, a new set of features can be created from the initial set that captures all the significant information in a dataset. The creation of new sets of features allows for gathering the described benefits of dimensionality reduction. However, the application of feature selection techniques may result in specific fluctuations concerning the creation of predictive feature lists. Several studies in the literature discuss the phenomenon of lack of agreement between the predictive gene lists discovered by different groups, the need for thousands of samples in order to achieve the desired outcomes, the lack of biological interpretation of predictive signatures and the dangers of information leak recorded in published studies. Fig 1 depicts the feature correlation/selection of the diagnosis dataset.
Fig 1 Feature correlation: red dots correspond to malignant diagnosis and blue to benign.

Fig 2 depicts the count of diagnoses, in which green color shows benign tumors and blue color shows malignant tumors.

Fig 2 Diagnosis: Benign (B) - 357, Malignant (M) - 219

A dataset used for machine learning should be partitioned into three subsets — training, test, and validation sets. Training set: a data scientist uses a training set to train a model and define its optimal parameters; it is the subset used to train the model. Test set: a test set is needed for an evaluation of the trained model and its capability for generalization; it is the subset used to test the trained model. Validation set: the purpose of a validation set is to tweak a model's hyperparameters — higher-level structural settings that cannot be directly learned from data. These settings can express, for instance, how complex a model is and how fast it finds patterns in data.
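A sketch of the data partitioning described above, assuming scikit-learn (the split ratio and random seed are assumptions; a validation split can be carved out of the training portion in the same way):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
# Hold out 25% of the samples as a test set, stratified by diagnosis.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)
```

The later sketches in this section reuse this X_train/X_test split.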
The k-NN algorithm is arguably the simplest machine learning algorithm. To make a prediction for a new data point, the algorithm finds the closest data points in the training dataset — its "nearest neighbors". Fig 3 depicts the training dataset and test dataset accuracy when the n_neighbors value is 3. The figure shows the training and test set accuracy on the y-axis against the setting of n_neighbors on the x-axis. Considering a single nearest neighbor, the prediction on the training set is perfect, but when more neighbors are considered the training accuracy drops, indicating that using the single nearest neighbor leads to a model that is too complex. This suggests that we should choose n_neighbors=3, for which the accuracy of the k-NN classifier on the training set and the test set is 0.96 and 0.92 respectively.
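A minimal k-NN sketch with n_neighbors=3, reusing the earlier split (exact scores depend on the split and preprocessing used by the authors):

```python
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
print("train accuracy:", knn.score(X_train, y_train))
print("test accuracy: ", knn.score(X_test, y_test))
```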
One of the most common linear classification algorithms is logistic regression. Logistic regression examines the relationship between a binary outcome (dependent) variable, such as the presence or absence of disease, and predictor (explanatory or independent) variables such as patient demographics or imaging findings. Fig 4 depicts the coefficient magnitudes of the features and how the accuracy changes when we set the value of C: C=1 provides quite good performance, with 96% (0.955) accuracy on the training set and 0.94 accuracy on the test set; C=100 provides higher accuracy on both the training set (0.967) and the test set (0.972); C=0.01 provides lower accuracy on the training set (0.948) and much lower accuracy on the test set (0.895).
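A sketch of logistic regression with the three values of C discussed above (solver settings such as max_iter are assumptions, and the split from the earlier sketch is reused):

```python
from sklearn.linear_model import LogisticRegression

for C in (0.01, 1, 100):
    logreg = LogisticRegression(C=C, max_iter=10000)
    logreg.fit(X_train, y_train)
    print(C, logreg.score(X_train, y_train), logreg.score(X_test, y_test))
```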
Fig 3 Training and test set accuracy

Fig 4 Coefficient magnitude of features

IV. MACHINE LEARNING TECHNIQUES
Artificial Neural Networks (ANNs), Bayesian Networks (BNs), Support Vector Machines (SVMs) and Decision Trees (DTs) have been widely applied in cancer research for the development of predictive models, resulting in effective and accurate decision making [7]. These techniques have been utilized with the aim of modeling the progression and treatment of cancerous conditions. Decision trees are a non-parametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features [5,6]. Random Forest is a flexible, easy to use machine learning algorithm that produces, even without hyper-parameter tuning, a great result most of the time [2]. It builds multiple decision trees and merges them together to get a more accurate and stable prediction. Random forests, also known as random decision forests, are a popular ensemble method that can be used to build predictive models for both classification and regression problems. Ensemble methods use multiple learning models to gain better predictive results — in the case of a random forest, the model creates an entire forest of random uncorrelated decision trees to arrive at the best possible answer [1, 3]. Gradient boosting is a machine learning technique for regression and classification problems which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees; the Gradient Boosting Classifier is used to obtain an accurate prediction. A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane in an N-dimensional space (N being the number of features) that distinctly classifies the data points on either side. The support vector machine is an emerging, powerful machine learning technique for classifying cases. SVM has been used in a range of problems and has already been successful in pattern recognition in bioinformatics and cancer diagnosis [6]. SVM is a maximum margin classification algorithm rooted in statistical learning theory. It is a method for classifying both linear and non-linear data. It uses a non-linear mapping technique to transform the original training data into a higher dimension, and it performs classification tasks by maximizing the margin separating both classes while minimizing the classification errors [9]. Neural networks are a set of algorithms, modeled loosely after the human brain, that are designed to recognize patterns. They interpret sensory data through a kind of machine perception, labeling or clustering raw input [5]. The patterns they recognize are numerical, contained in vectors, into which all real-world data, be it images, sound, text or time series, must be translated.

V. DISCUSSION AND RESULTS
This section presents the results of all the machine learning algorithms that we have used to predict breast cancer risk. This paper has explored risk factors for predicting breast cancer by using machine learning techniques. Each technique has its own limitations and strengths specific to the type of application.
Our results show that SVM is the best predictor and indicator of breast cancer because it gives the highest accuracy in predicting the data.
VI. DECISION TREE
Decision trees are a non-parametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features [5, 6]. The Decision Tree Classifier is a class capable of performing multi-class classification on a dataset. As with other classifiers, it takes as input two arrays: an array X, sparse or dense, of size [n_samples, n_features] holding the training samples, and an array Y of integer values, of size [n_samples], holding the class labels for the training samples. The maximum depth of the tree, 'max_depth', is an argument of the Decision Tree Classifier. If the max_depth of the tree is None, nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples; the accuracy on the training set is then 1.000 and the accuracy on the test set is 0.937. If max_depth is set to a value of 4, the accuracy on the training set is 0.986 and the accuracy on the test set is 0.937. Feature importance in decision trees rates how important each feature is for the decision a tree makes. It is a number between 0 and 1 for each feature, where 0 means "not used at all" and 1 means "perfectly predicts the target". Fig 5 illustrates that the feature "perimeter worst" is by far the most important feature. This confirms our observation in analyzing the tree that the first level already separates the two classes fairly well.

Fig 5 Feature importance of decision tree
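A sketch of the pre-pruned tree with max_depth=4 described above (the random seed is an assumption, and the split from the earlier sketch is reused):

```python
from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier(max_depth=4, random_state=0)   # pre-pruned tree
tree.fit(X_train, y_train)
print("train accuracy:", tree.score(X_train, y_train))
print("test accuracy: ", tree.score(X_test, y_test))
print("feature importances:", tree.feature_importances_)     # values in [0, 1]
```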
A. Random Forest
Random Forest is a flexible, easy to use machine learning algorithm that produces, even without hyper-parameter tuning, a great result most of the time [2]. It builds multiple decision trees and merges them together to get a more accurate and stable prediction. Random forests, also known as random decision forests, are a popular ensemble method that can be used to build predictive models for both classification and regression problems. Ensemble methods use multiple learning models to gain better predictive results — in the case of a random forest, the model creates an entire forest of random uncorrelated decision trees to arrive at the best possible answer [1, 3]. A random forest classifier creates a set of decision trees from randomly selected subsets of the training set; the accuracy on the training set is 0.995 and the accuracy on the test set is 0.951. The random forest gives us an accuracy of 95.8%, better than a single decision tree, without tuning any parameters. As Fig 6 illustrates, similarly to the single decision tree, the random forest also gives a lot of importance to the "worst radius" feature, but it chooses "perimeter worst" to be the most informative feature overall.

Fig 6 Feature importance of random forest
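A minimal random forest sketch, reusing the earlier split (the number of trees is an assumption):

```python
from sklearn.ensemble import RandomForestClassifier

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
print("train accuracy:", forest.score(X_train, y_train))
print("test accuracy: ", forest.score(X_test, y_test))
print("feature importances:", forest.feature_importances_)
```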
B. Gradient Boosting
Gradient boosting is a machine learning technique for regression and classification problems which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. The Gradient Boosting Classifier is used to obtain an accurate prediction. When the 'random_state' argument of the GB Classifier is zero (0), the accuracy on the training set is 1.000 and the accuracy on the test set is 0.944. As the training set accuracy is 100%, we are likely to be overfitting. To reduce overfitting, we could either apply stronger pre-pruning by limiting the maximum depth or lower the learning rate. When the 'max_depth' argument of the GB Classifier is one (1), the accuracy on the training set is 0.988 and the accuracy on the test set is 0.937. When the 'learning_rate' argument of the GB Classifier is 0.01, the accuracy on the training set is 0.984 and the accuracy on the test set is 0.930. Both methods of decreasing the model complexity reduced the training set accuracy, as expected. In this case, neither of these methods increased the generalization performance on the test set. Fig 7 depicts that the feature importances of the gradient boosted trees are somewhat similar to the feature importances of the random forests, though the gradient boosting completely ignored some of the features. It chooses "perimeter worst" to be the most informative feature overall.

Fig 7 Feature importance of Gradient Boosting
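A sketch of the three gradient boosting settings compared above (default, max_depth=1, learning_rate=0.01), reusing the earlier split:

```python
from sklearn.ensemble import GradientBoostingClassifier

for params in ({"random_state": 0},
               {"random_state": 0, "max_depth": 1},
               {"random_state": 0, "learning_rate": 0.01}):
    gbrt = GradientBoostingClassifier(**params)
    gbrt.fit(X_train, y_train)
    print(params, gbrt.score(X_train, y_train), gbrt.score(X_test, y_test))
```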
C. Support Vector Machine
A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane in an N-dimensional space (N being the number of features) that distinctly classifies the data points on either side. The support vector machine is an emerging, powerful machine learning technique for classifying cases. SVM has been used in a range of problems and has already been successful in pattern recognition in bioinformatics and cancer diagnosis [6].
SVM is a maximum margin classification algorithm rooted in statistical learning theory. It is a method for classifying both linear and non-linear data. It uses a non-linear mapping technique to transform the original training data into a higher dimension, and it performs classification tasks by maximizing the margin separating both classes while minimizing the classification errors [9]. By applying the Support Vector Machine directly, the accuracy on the training set is 1.00 and the accuracy on the test set is 0.63; the model overfits quite substantially, with a perfect score on the training set and only 63% accuracy on the test set. SVM requires all the features to vary on a similar scale, so we need to rescale our data so that all the features are approximately on the same scale. The Min-Max scaler is used to rescale the data so that all features vary on the same scale. After applying the Min-Max scaler, the accuracy on the training set is 0.95 and the accuracy on the test set is 0.94. Scaling the data made a huge difference: training and test set performance are now quite similar, though further from 100% accuracy. C and gamma are the parameters of the Radial Basis Function (RBF) kernel SVM. Now, we can try increasing either C or gamma to fit a more complex model. By increasing C to 1000 in the SVM, the accuracy on the training set is 0.986 and the accuracy on the test set is 0.965. Here, increasing C allows us to improve the model significantly, resulting in 96.5% test set accuracy.
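A sketch of the scaling plus larger-C workflow described above, reusing the earlier split:

```python
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Rescale all features to the [0, 1] range before fitting the RBF-kernel SVM.
scaler = MinMaxScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

svc = SVC(C=1000)          # RBF kernel by default; larger C fits a more complex model
svc.fit(X_train_s, y_train)
print("train accuracy:", svc.score(X_train_s, y_train))
print("test accuracy: ", svc.score(X_test_s, y_test))
```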
To analyze medical data, various data mining and
E. Neural Networks machine learning methods are available. An
Neural networks are a set of algorithms, modeled important challenge in data mining and machine
loosely after the human brain, that are designed to learning areas is to build accurate and
recognize patterns. computationally efficient classifiers for Medical
applications. In this study, we employed many
By importing MLP Classifier the accuracy on
algorithms: SVM, ANN, DT, RF, GB, and k-NN on
training set is 0.63 and accuracy on test set is
the Wisconsin Breast Cancer (original) datasets. We
0.63.Results are not good. Neural networks also
tried to compare efficiency and effectiveness of those
expect all input features to vary in a similar way, and
algorithms in terms of accuracy, precision, sensitivity
ideally to have a mean of 0, and a variance of 1. Now
and specificity to find the best classification accuracy
we need to scale our data by importing Min Max
of SVM reaches and accuracy of 97.13% and out
Scaler then the accuracy on training set is 0.962 and
performs, therefore, all other algorithms. In
accuracy on test set is 0.958.The results are much
conclusion, SVM has proven its efficiency in Breast
better after scaling, and already quite competitive.
Cancer prediction and diagnosis and achieves the
After scaling the data now we will again classify the
data with MLP Classifier. After scaling the data, best performance in terms of precision and low error
MLP Classifier is used then the accuracy on training rate. The results indicated that SVM are the best
set is 0.923 and accuracy on test set is 0.895. classifier predictor with the test dataset, followed by
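A sketch of the scaled MLP described above, with 100 hidden units as in the weight-matrix discussion (other settings such as max_iter are assumptions), reusing the earlier split:

```python
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

mlp = MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000, random_state=0)
mlp.fit(X_train_s, y_train)
print("train accuracy:", mlp.score(X_train_s, y_train))
print("test accuracy: ", mlp.score(X_test_s, y_test))
print("first-layer weight matrix shape:", mlp.coefs_[0].shape)   # (30, 100)
```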
VII. CONCLUSION
To analyze medical data, various data mining and machine learning methods are available. An important challenge in the data mining and machine learning areas is to build accurate and computationally efficient classifiers for medical applications. In this study, we employed several algorithms, SVM, ANN, DT, RF, GB, and k-NN, on the Wisconsin Breast Cancer (original) dataset. We compared the efficiency and effectiveness of these algorithms in terms of accuracy, precision, sensitivity and specificity to find the best classifier. SVM reaches an accuracy of 97.13% and therefore outperforms all the other algorithms. In conclusion, SVM has proven its efficiency in breast cancer prediction and diagnosis and achieves the best performance in terms of precision and low error rate. The results indicated that SVM is the best classifier and predictor on the test dataset, followed by ANN and DT. Further studies should be conducted to improve the performance of these classification techniques by using more variables and choosing a longer follow-up duration.
Breast cancer has created a terrible situation in almost all parts of the world, according to this study and discussion. It has been observed that the death rate is gradually coming down in some developed countries like the UK and US because of the developed technologies used in diagnosis and awareness. But in developing countries like India the situation is not good, and some effective steps should be taken in this direction without any delay [3]. This study has examined methodologies by which breast cancer can be detected at early stages by using the breast cancer data set. It is clear from this study that Association Rule Mining, Classification, Clustering and Evolutionary Algorithms are good at the detection and classification of breast cancer data. It is also observed that if the properties of the symptoms are identified correctly, the chances of accurate detection will improve [8]. It is also observed from the results of the previous methods that the classification algorithm increases the possibility of improved detection accuracy. The characteristics of breast cancer symptoms differ, so the chances of good results using a single algorithm are lower; the use of combined algorithms at different levels will produce good results. So it is concluded that a framework based on data mining and evolutionary algorithms can be a milestone in breast cancer detection [10].

REFERENCES
[1] https://www.omicsonline.org/using-three-machine-learning-techniques-for-predictingbreast-cancer-2157-7420.1000124.php?aid=13087
[2] https://www.sciencedirect.com/science/article/pii/S2001037014000464 ; https://www.sciencedirect.com/science/article/pii/S1877050916302575
[3] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2675494/
[4] https://pubs.rsna.org/doi/pdf/10.1148/rg.301095057
[5] http://journal.waocp.org/article_31073_4ac28ea9398b1d19335b6f44a0e79afd.pdf
[6] https://breast-cancer-research.biomedcentral.com/track/pdf/10.1186/bcr3110
[7] https://s3.amazonaws.com/academia.edu.documents/32919287/V3I1201402.pdf
[8] https://pdfs.semanticscholar.org/7bf7/3b15b7fd64c2b01a718a2848b4a3d35b939.pdf ; http://cancerpreventionresearch.aacrjournals.org/content/canprevres/9/1/13.full.pdf
CYBER: Threats in Social Networking Websites and
Physical System Security
Tripti Lamba1, Ashish Garg2
1Associate Professor, Institute of Information Technology, Janakpuri, New Delhi
2Research Scholar, Institute of Information Technology, Janakpuri, New Delhi
triptigautam@yahoo.co.in, ashishgarg518123@gmail.com
Abstract- A social network is a social structure made up of people or organizations, referred to as nodes, that are connected by one or more specific kinds of relationship, such as friendship, common interest, exchange of finance, or relationships of beliefs, knowledge or status. A cyber threat can be both unintentional and intentional, targeted or non-targeted, and it can come from a variety of sources, including foreign nations engaged in espionage and information warfare, criminals, hackers, virus writers, and disgruntled employees and contractors working inside an organization. Social networking sites are not only a means of speaking or interacting with people globally, but also an effective means of business promotion. In this paper, we investigate and study the cyber threats in social networking websites. The aim of this paper is to review and analyze these threats to social networks and to develop measures to protect identity in cyberspace, i.e., the security of personal data and identity in social networks is studied.

Keywords- Cyber threats, Protection, Crime, Malware, Hackers, Attacks, Breaches, Security.

I. INTRODUCTION
The term cyber-physical systems (CPSs) emerged just over a decade ago as an attempt to unify the emerging application of embedded computing and communication technologies to a range of physical domains, including aerospace, automotive, chemical production, civil infrastructure, energy, healthcare, manufacturing, materials, and transportation. The goal of the CPS program is to reveal crosscutting basic scientific and engineering principles that underpin the integration of cyber and physical components across all application sectors.

Fig. 1. General representation of a CPS

Nowadays, innumerable Internet users frequently visit thousands of social websites to stay linked with their friends, share their thoughts, photos and videos, and discuss even their daily life. In 2003, MySpace was launched, and in the following years several other social networking sites were launched, such as Facebook in 2004 and Twitter in 2006.

Fig. 2. Total number of users with respect to different social platforms

There are so many social networking sites and social media sites that there are even computer programs and search engines for them. These social websites have had positive and negative impacts.
INTERNET SECURITY THREAT REPORT 2019:
The Threat Report takes a deep dive into insights from the world's largest civilian global intelligence network, revealing:
• Formjacking attacks skyrocketed, with an average of 4,800 websites compromised every month.
• Ransomware shifted targets from consumers to enterprises, where infections rose 12%.
• More than 70 million records were stolen from poorly configured S3 buckets, a casualty of rapid cloud adoption.
• Supply chains remained a soft target, with attacks growing by 78%.
• "Smart speaker, get me a cyber-attack" - IoT was a key entry point for targeted attacks; most IoT devices are vulnerable.
This analysis is informed by 123 million sensors recording thousands of threat events each second from 157 countries and territories.
Due to the fact that the number of social network users is increasing day by day, the number of attacks carried out by hackers to steal personal data has also risen. Hacked data can be used for several purposes, such as sending unauthorized messages (spam), stealing money from victims' accounts, etc.
Section I gives a brief introduction to the need for cyber security and threat protection. The literature review is discussed in Section II. Section III describes the applications of cyber security. Cyber threats in social networking websites are discussed in Section IV. Anti-threat strategies, and the various ways that can be suggested for circumventing threats related to social websites, are discussed in Section V. The risk assessment methodology is discussed in Section VI. Section VII describes various cyber security threats and trends. The 5 laws of cyber security are described in Section VIII. Section IX describes the reasons cyber security is more important than ever, and Section X gives the conclusion of the paper.

Fig. 3. The number of malicious programs targeting popular social networking sites

Figure 2 shows the total number of users with respect to different social platforms, i.e., the number of users who are active on social networks. Figure 3 shows the number of malicious programs targeting popular social networking sites. The Internet today, unfortunately, offers cyber criminals many chances to hack accounts on social network sites, and the number of malicious programs that target social websites is very large.

II. LITERATURE REVIEW
The popularity of social networking websites has increased since 1997, and numerous individuals now use social networking websites to communicate with their friends, to perform business and for many other purposes according to their interests.
As interest in social networking websites has increased, many research papers have been published. A number of them discuss the protection issues of social networking, analyzing the privacy and the risks that threaten online social networking websites.
The article [7] identifies the security behaviors and attitudes of social network users from different demographic groups and assesses how these behaviors map against privacy vulnerabilities inherent in social networking applications.
The article [8] highlights the commercial and social benefits of safe and well-informed use of social networking websites. It emphasizes the most important threats to users and illustrates the basic factors behind those threats. Moreover, it presents policy and technical recommendations to enhance privacy and security
without compromising the benefits of information sharing through social networking websites.
The article [11] addresses the security problems of network and security managers, who regularly operate network policy management services such as firewall, intrusion prevention systems, antivirus and data loss prevention. It presents a security framework to safeguard corporate information against the threats associated with social networking websites.
Several other scientific research papers have also been published in which new technologies and methods related to the privacy and security problems of social networking websites are discussed.

III. APPLICATIONS OF CYBER SECURITY
• Filtered Communication - includes a firewall, anti-virus, anti-spam, wireless security, and online content filtration.
• Protection - cybersecurity solutions provide digital protection to your data, ensuring that your employees are not at risk from potential threats.
• Increased Productivity - viruses can slow computers down to a crawl, making work practically impossible. Effective cybersecurity eliminates this possibility, maximizing the potential output.
• Denies Spyware - spyware is a kind of cyber contamination intended to spy on your computer operations and deliver that data back to the cyber-criminal.

IV. CYBER THREATS IN SOCIAL NETWORKING WEBSITES
Lately, social networks attract thousands of users who represent potential victims to attackers of the following types, as shown in Figure 4: first, phishers and spammers, who use social networks for sending fraudulent messages to victims' "friends"; cybercriminals and fraudsters, who use social networks to capture users' data and then carry out their social-engineering attacks; and terrorist groups and sexual predators, who create online communities for spreading their thoughts, propaganda and views, and for conducting recruitment.

Fig. 4. Threat percentages posed on social networks

Fig. 5. Phishing and Trojan attacks on different software

V. ANTI-THREAT STRATEGIES
This section describes the different types of cyber threats in social networks; the possible contributing factors are also listed below:
• Most users are not concerned with the importance of private information disclosure and are therefore at risk of over-disclosure and privacy invasions.
• Users who are aware of the threats unfortunately choose inappropriate privacy settings and do not manage privacy preferences properly.
• Policy and legislation are not equipped to deal with every kind of social network threat, which are increasing day by day along with additional challenges and modern, complicated technologies.
• There is a lack of tools and of acceptable authentication mechanisms to handle the various security and privacy problems.
• Because of the mentioned factors that cause threats, the following ways can be suggested for circumventing threats related to social websites:
(a) Building awareness of information disclosure: Users must be very aware of the disclosure of their personal information in profiles on social websites.
(b) Encouraging awareness-raising and educational campaigns: Governments need to provide educational programmes regarding awareness-raising and security problems.
(c) Modifying the present legislation: Existing legislation has to be changed to address new technology and new frauds and attacks.
(d) Strengthening authentication: Access control and authentication must be very strong so that cyber crimes committed by hackers, spammers and other cyber criminals can be reduced as much as possible.
(e) Using the most powerful antivirus tools: Users should use the most powerful antivirus tools with regular updates and should keep the appropriate default settings, so that the antivirus tools can work more effectively.
(f) Providing appropriate security tools: Here, we give a recommendation to security software suppliers: they should offer special tools that enable users to delete their accounts and to manage and control the various privacy and security problems.

VI. RISK ASSESSMENT METHODOLOGY
The complexity of the cyber-physical relationship can present unintuitive system dependencies. Performing correct risk assessments requires the development of models that offer a basis for dependency analysis and for quantifying the resulting impacts. This association between the salient features of both the cyber and physical infrastructure can assist in the risk review and mitigation processes. This paper presents a risk assessment methodology illustrating the dependency between the power applications and the supporting infrastructure.
Risk is traditionally defined as the impact times the likelihood of an event. Likelihood should be addressed through the infrastructure vulnerability analysis step, which addresses the supporting infrastructure's ability to limit an attacker's access to the important management functions. Once potential vulnerabilities are discovered, the application impact analysis should be performed to determine the compromised grid management functions. This data should then be used to judge the physical system impact.
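A toy worked example of the risk definition above (risk = impact x likelihood); the asset names and scores below are hypothetical and are not taken from the paper.

```python
def risk(impact, likelihood):
    """Both inputs scored on a 1-5 scale; a higher result means higher priority."""
    return impact * likelihood

assets = {
    "grid management server": risk(impact=5, likelihood=2),   # score 10
    "social media gateway":   risk(impact=3, likelihood=4),   # score 12
}
for name, score in sorted(assets.items(), key=lambda kv: -kv[1]):
    print(name, score)
```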
A. Risk Analysis
The initial step in the risk analysis method is the infrastructure vulnerability analysis. Numerous difficulties are encountered when determining cyber vulnerabilities in system environments due to the high availability requirements and the dependencies on legacy systems and protocols.
A comprehensive vulnerability analysis should begin with the identification of cyber assets such as software, hardware, and communication protocols. Then, activities such as penetration testing and vulnerability scanning can be utilized to determine potential security concerns within the environment. In addition, continuing analysis of security advisories from vendors, system logs, and deployed intrusion detection systems should be utilized to determine additional system vulnerabilities.

B. Risk Mitigation
Mitigation activities should attempt to minimize unacceptable risk levels. This may be performed through the deployment of more robust supporting infrastructure or power applications. Understanding opportunities to target specific or combined approaches may present novel mitigation strategies. Various research efforts have addressed the cyber-physical relationship within the risk assessment process.

VII. CYBER SECURITY THREATS AND TRENDS
Phishing Gets More Sophisticated — Phishing attacks, in which carefully targeted digital messages are transmitted to fool people into clicking on a link that can then install malware or expose sensitive data, are becoming more sophisticated. Now that employees at most organizations are more aware of the dangers of email phishing or of clicking on suspicious-looking links, hackers are upping the ante — for example, using machine learning to much more quickly craft and distribute convincing fake messages in the hope that recipients will unwittingly compromise their organization's networks and systems. Such attacks enable hackers to steal user logins, credit card credentials and other types of personal financial information, as well as to gain access to private databases.

Ransomware Strategies Evolve — Ransomware attacks are believed to cost victims billions of dollars every year, as hackers deploy technologies that enable them to effectively take over an individual's or organization's databases and hold all of the information for ransom. The rise of cryptocurrencies like Bitcoin is credited with helping to fuel ransomware attacks by allowing ransom demands to be paid anonymously. As companies continue to focus on building stronger defenses to guard against ransomware breaches, some experts believe hackers will increasingly target other potentially profitable ransomware victims, such as high-net-worth individuals.

Cryptojacking — The cryptocurrency movement also affects cyber security in other ways. For instance, cryptojacking is a trend that involves cyber criminals hijacking third-party home or work computers to "mine" for cryptocurrency. Because mining for cryptocurrency (like Bitcoin, for example) requires immense amounts of computer processing power, hackers can make money by secretly piggybacking on someone else's systems. For businesses, cryptojacked systems can cause serious performance issues and costly downtime as IT works to track down and resolve the issue.

Cyber-Physical Attacks — The same technology that has enabled us to modernize and computerize critical infrastructure also brings risk. The ongoing threat of hacks targeting electrical grids, transportation systems, water treatment facilities, etc., represents a significant vulnerability going forward.

State-Sponsored Attacks — Beyond hackers looking to make a profit through stealing individual and corporate data, entire nation states are now using their cyber skills to infiltrate other governments and perform attacks on critical infrastructure. Cybercrime today is a major threat not just for the private sector and for individuals, but for the government and the nation as a whole. As we move into 2019, state-sponsored attacks are expected to increase, with attacks on critical infrastructure of particular concern.

IoT Attacks — The Internet of Things is becoming more pervasive by the day (the number of devices connected to the IoT is expected to reach nearly 31 billion by 2020). It includes laptops and tablets, of course, but also routers, webcams, household appliances, smart watches, medical devices, manufacturing equipment, vehicles and even home security systems. Connected devices are handy for consumers, and many companies now use them to save money by gathering immense amounts of insightful data and streamlining business processes. However, more connected devices mean greater risk, making IoT networks more vulnerable to cyber invasions and infections. Once controlled by hackers, IoT devices can be used to create havoc, overload networks or lock down essential equipment for financial gain.

Smart Medical Devices and Electronic Medical Records (EMRs) — The health care industry is still going through a major evolution as most patient medical records have now moved online, and medical professionals realize the benefits of advancements in smart medical devices. However, as the health care industry adapts to the digital age, there are a number of concerns around privacy, safety and cyber security threats.

Third Parties (Vendors, Contractors, Partners) — Third parties such as vendors and contractors pose an enormous risk to companies, the majority of which have no secure system or dedicated team in place to manage these third-party employees. As cyber criminals become increasingly sophisticated

and cyber security threats continue to rise, organizations are becoming more and more aware of the risk that third parties pose. A few years ago, Wendy's fell victim to a data breach that affected at least 1,000 of the fast-food chain's locations and was caused by a third-party vendor that had been hacked.

Connected Cars and Semi-Autonomous Vehicles: While the driverless car is close, but not yet here, a connected car uses on-board sensors to optimize its own operation and the comfort of its passengers. This is typically done through embedded, tethered or smartphone integration. As technology evolves, the connected car is becoming more and more prevalent; by 2020, an estimated 90% of new cars will be connected to the internet. For hackers, this evolution in automobile manufacturing and design means yet another opportunity to exploit vulnerabilities in insecure systems and steal sensitive data and/or harm drivers. In addition to safety concerns, connected cars pose serious privacy issues. As manufacturers rush to market with high-tech automobiles, 2019 will likely see an increase in not only the number of connected cars, but in the number and severity of system vulnerabilities detected.

A Severe Shortage of Cyber Security Professionals: The cyber-crime epidemic has escalated rapidly in recent years, while companies and governments have struggled to hire enough qualified professionals to safeguard against the growing threat. This trend is expected to continue into 2019 and beyond, with some estimates indicating that there are around 1 million unfilled positions worldwide (potentially rising to 3.5 million by 2021).

VIII. THE 5 LAWS OF CYBER SECURITY

It is time to establish a universal language and understanding of the foundational facts that govern our data-security levels. So, without further ado, here are five laws of cyber security; and while there could easily be more, these five will always be the immutable universal constants that govern this subject and our existence in relation to it.

Law No. 1: If There Is a Vulnerability, It Will Be Exploited

Consider, for a moment, that when the first bank was conceived and built, there was at least one person out there who wanted to rob it. In the more modern era, ever since the first "bug" was found in a system, we have been looking for ways to bypass the frameworks or rules that govern a computer, a device or even our society. Consider that there are those in our society who will attempt to hack everything within their capability. This can be obvious with more basic exploits, like the person who figured out how to obscure their car's number plate to pass through a toll booth for free, or the more obscure, like infecting a complex computer system to derail a notable nuclear weapons program. Finding ways around everything, for both good and bad purposes, is so prevalent today that we even have a term for it: "Life Hacking."

Law No. 2: Everything Is Vulnerable in Some Way

We cannot assume that anything is off the table and completely safe anymore. State-sponsored hacking is an excellent example of this. Government intelligence agencies have been astonishingly successful over the years in gaining access to an opponent's systems even when those systems were thought to be secure. Publicly, we have seen a series of huge data breaches over the years from companies that spend millions annually on cyber defense measures.

Law No. 3: Humans Trust Even When They Should Not

Trust, quite honestly, sucks. Yes, it is an essential
part of the human experience. We tend to trust our significant others, to trust, by default, whatever faith we adhere to, and also to trust the infrastructure around us: an expectation that the switch will turn on the light, or that the mechanic we pay to service our car actually knows what he is doing. We cannot have a functioning society without a sense of trust, and this is exactly why it is our greatest weakness in cyber security. People fall for phishing scams, assume that the program they bought for $20 will turn their PC into Fort Knox (it won't), or believe the form they are filling out is legitimate (it often isn't). It sounds strange to say we need to combat trust, but we do if we are going to survive the nonstop hacking that takes place.

Law No. 4: With Innovation Comes Opportunity for Exploitation

The world is full of smart people. Computer scientists created a global computing platform to get humanity on the same page. However, with every innovation and evolution in our technology come certain exploits. We live in the age of IoT, and by virtue of this our lives have, hopefully, been made better. One of the first big examples of this is the Ring doorbell. It made adding a video camera to your front door simple and very easy to monitor through a mobile app. Life was good with the clearly innovative Ring device, until a security vulnerability was discovered. The company has since fixed that exploit, but, as is usually the case, we are now waiting for the next vulnerability to be discovered. And naturally, it is made even worse by Law No. 3.

Law No. 5: When in Doubt

This one is not a cop-out. Every single law written here comes down to the simple fact that no matter what the issues or concerns regarding cyber security are, all of them stem from a vulnerability of some kind. If we ever forget this, we are doing nothing but asking for trouble. Our ability to properly defend ourselves comes from understanding that human nature makes these laws immutable. Once we begin thinking like a hacker, we can actually stop them; so here's to hacking the future together for our own security.

IX. REASONS CYBER SECURITY IS MORE IMPORTANT THAN EVER

The threat of cybercrime to businesses is rising fast. According to one estimate, the damages associated with cybercrime now stand at over $400 billion, up from $250 billion two years ago, with the costs incurred by UK businesses also running into the billions. In a bid to forestall e-criminals, organizations are increasingly investing in ramping up their digital defenses and security protocols; however, many are still put off by the costs, or by the unclear range of tools and services on the market. Five reasons why investing in cyber security is a smart decision are outlined below.

The rising cost of breaches

The fact is that cyber attacks are extraordinarily expensive for businesses to endure. Recent statistics suggest that the average cost of a data breach at a larger firm is around £20,000. However, this actually underestimates the real expense of an attack against a company. It is not just the financial damage suffered by the business or the cost of remediation; a data breach can also inflict untold reputational damage. Suffering a cyber attack can cause customers to lose trust in a business and spend their money elsewhere. In addition, having a reputation for poor security can also lead to a failure to win new contracts.

Increasingly Sophisticated Hackers

Almost every business has a website and externally exposed systems that could provide criminals with entry points into internal networks. Hackers have a great deal to gain from successful data breaches, and there are countless examples of well-funded and coordinated cyber-attacks against some of the biggest companies in the UK. Ironically, even Deloitte, the world's largest cybersecurity consultancy, was itself rocked by an attack in October last year. With highly sophisticated attacks
now commonplace, businesses should assume that they will be breached at some point and implement controls that help them to detect and respond to malicious activity before it causes damage and disruption.

Tighter Regulations

It is not just criminal attacks that mean businesses need to be more invested in cyber security than ever before. The introduction of regulations such as the GDPR means organizations need to take security more seriously than ever, or face serious fines. The GDPR has been introduced by the EU to force organizations into taking better care of the personal data they hold. Among the requirements of the GDPR is the need for organizations to implement appropriate technical and organizational measures to protect personal data, regularly review controls, and detect, investigate and report breaches.

Fig-6. Interconnection of different sectors

The figure above shows how different sectors are interconnected with each other; people consume and use various services online, registering and logging in themselves, which is why hackers can easily harvest information about them.

Widely accessible hacking tools

While well-funded and highly skilled hackers pose a significant risk to your business, the wide availability of hacking tools and programs on the internet also means there is a growing threat from less skilled individuals. The commercialization of cybercrime has made it easy for anyone to obtain the resources they need to launch damaging attacks, such as ransomware and cryptomining.

A proliferation of IoT devices

More smart devices than ever are connected to the internet. These are known as Internet of Things, or IoT, devices and are increasingly common in homes and offices. On the surface, these devices can simplify and speed up tasks, as well as offer greater levels of control and accessibility. Their proliferation, however, presents a problem. If not managed properly, every IoT device that is connected to the internet could provide cyber criminals with a way into a business. IT services giant Cisco estimates there will be 27.1 billion connected devices globally by 2021, so this problem will only worsen with time. With IoT devices potentially introducing a wide range of security weaknesses, it is wise to conduct regular vulnerability assessments to help identify and address the risks presented by these assets.

X. CONCLUSION

Social networking communities are an inherent part of today's internet. People love using them to stay in touch with friends, exchange photos, or simply to pass the time when bored. Companies have also discovered social media as a new way of targeting their customers with relevant information. With user groups of many millions of members, there are always some black sheep with malicious intent. We have seen several worms spread through social networks. In most cases, they use social engineering tricks to post attractive messages on behalf of an infected user. Curious friends who follow the link will also get infected with malware and unwillingly spread the message further. Many people will click on nearly any link that they see posted and add anybody to their personal network who asks, without knowing who is really behind it. This inherent trust, particularly in messages coming from friends whose accounts have been compromised, makes it easy for attacks to succeed, regardless of whether it is a phishing attack, a spam run, or a malicious worm spreading through automated scripts.

Some of the newer attacks are very sophisticated and are often hard to spot for
an untrained eye. Use a comprehensive security software package to safeguard against these threats.

You should never share your PIN with others. This includes services that promise to help you get more friends or something similar. Do not lose control of your PIN. If you enter your PIN, make sure that you are on the original website and not on a phishing scam page that merely looks like the original site. Should you suspect that you have fallen for a phishing attack and that your account has been compromised, use a clean system to log into the original service and change your PIN.

REFERENCES

[1] Giraldo, Jairo, et al. "Security and Privacy in Cyber-Physical Systems: A Survey of Surveys." IEEE Design & Test, vol. 34, no. 4, 2017, pp. 7-17, doi:10.1109/mdat.2017.2709310.
[2] Shree, Divya. "Cyber Attack." Social Networking Websites, vol. 9, no. 1, 2017, p. 6, https://csjournals.com/IJCSC/PDF9-1/28.%20Divya.pdf. Accessed 3 Aug 2019.
[3] Admin. "Hacking Social Media. Threats & Vulnerabilities - 'Threats & Anti-Threats Strategies for Social Networking Websites'." Hakin9, 2 Sept. 2014, hakin9.org/hacking-social-media-threats-vulnerabilities--threats-anti-threats-strategies-for-social-networking-websites/.
[4] Espinosa, Nick. 2018. http://www.forbes.com/sitesforbestechcouncil/2018/01/19/the-five-laws-of-cybersecurity/#17c9f4a82265. Accessed 2019.
[5] http://resource.elq.symantec.com/LP=6821?cid=70138000001Qv0FAAS, Vol. 24, 2019.
[6] https://www.quora.com/what-is-cyber-security-why-is-it-impoetant
[7] "Computer Security." En.Wikipedia.Org, 2019, https://en.wikipedia.org/wiki/Computer_security.
[8] "What Is Cyber Security? Definition, Best Practices & More." Digital Guardian, 2019, https://digitalguardian.com/blog/what-cyber-security.
Comparative Analysis of Different Encryption Techniques
in Mobile Ad-Hoc Networks (MANETs)
Apoorva Sharma1, Gitika Kushwaha2
1,2Research Scholar, Institute of Information Technology & Management, New Delhi, India
apoorva.sharma098@gmail.com, gitika1512@gmail.com
Abstract - This paper is an in-depth analysis of the Data Encryption Standard (DES), Triple DES (3DES) and Advanced Encryption Standard (AES) symmetric encryption algorithms in MANETs, carried out using the Network Simulator 2 (NS-2) in terms of energy consumption, data transfer time, End-to-End delay time and throughput with varied data sizes. Two simulation models were adopted: the first simulates the network performance assuming the availability of a common key, and the second simulates the network performance including the use of the Diffie-Hellman Key Exchange (DHKE) protocol in the key management phase. The obtained simulation results showed the superiority of AES over DES by 65%, 70% and 83% in terms of energy consumption, data transfer time and network throughput respectively. On the other hand, the results showed that AES is better than 3DES by around 90% for all of the performance metrics. Based on these results, AES is the recommended encryption scheme.

Keywords - MANET, AES, DES, Key management.

I. INTRODUCTION

In recent years, MANETs emerged as a major next generation wireless networking technology. However, security issues in MANETs became one of the primary concerns. MANETs are more vulnerable to attacks than wired networks. As a result, attackers with malicious goals will invariably contrive to exploit these vulnerabilities and disrupt MANET operation. The problem posed by potential breaching of the systems through passive observation and masquerading is further complicated by the variable nature of the wireless environment [1].

Security is provided through security services such as confidentiality. The goal of confidentiality is to control or restrict access to sensitive data to authorized parties only. A MANET uses an open medium, so usually all nodes within the transmission range can acquire the data. One way to keep data confidential is encryption, which can itself become a threat to confidentiality if the cryptographic keys are not encrypted when stored within the node [2].

Another challenge when it comes to MANET security is the key management issue. In order to prevent malicious nodes from joining the network, it is necessary to authenticate nodes when they join. Because of the restricted energy and computational capability of MANETs, it is necessary to design a lightweight and storage-efficient key management scheme [3][4].

Numerous security solutions, key management and cryptographic techniques have been designed to support MANETs; some of them are customized to fit the network requirements (minimum delay, minimum power consumption and maximum throughput) while others are known to be computationally demanding. They consume a substantial amount of computing resources such as bandwidth and power [5].

There is not enough data regarding the efficiency of incorporating different encryption techniques in ad hoc networks. This study was done to investigate the efficiency and suitability of the DES, 3DES and AES encryption techniques for MANETs.

Table 1 shows a comparison between these encryption techniques according to [6].
The DH algorithm was the first published public key algorithm, by Diffie, and is generally referred to as DHKE. Several commercial products use this key exchange technique [7]. The purpose of the algorithm is to allow two users to securely exchange a key that can then be used for encryption. The algorithm itself is limited to the exchange of secret values. The DH algorithm depends for its effectiveness on the difficulty of computing discrete logarithms. The general steps of the DHKE algorithm are shown in Figure 1.

Figure 1. Diffie-Hellman Key Exchange Algorithm General Steps

The rest of this paper is organized as follows: Section II presents the related work in the field of our study. Section III describes the implementation procedure of the cryptographic schemes in NS-2. Section IV contains the experimental results. Finally, the paper is concluded in Section V.

II. RELATED WORK

MANET security issues are quite a common topic; we will survey some research efforts on this topic. Some researchers focused on the analysis of the performance of various encryption schemes, while others focused on the key management and distribution issues that precede the actual encryption.

Mandal, et al. [8] presented a study that investigated the two most widely used symmetric encryption techniques, DES and AES. The encryption schemes were implemented using MATrix LABoratory (MATLAB) software. After the implementation, these techniques were compared on several points: the avalanche effect due to a one-bit variation in the plaintext keeping the key constant, the avalanche effect due to a one-bit variation in the key keeping the plaintext constant, the memory required for implementation, and the simulation time required for encryption. The authors concluded that the DES encryption algorithm has a disadvantage in terms of its high memory requirement. Moreover, in AES the avalanche effect is very high, so that AES is ideal for encrypting messages sent between parties via unsecured channels and is useful for parties that are part of financial transactions. They gave a future direction to include experiments on other kinds of data, such as images.

Umaparvathi and Varughese in [9] presented a comparison of the most commonly used symmetric encryption algorithms AES (Rijndael), DES, 3DES and Blowfish in terms of power consumption. A comparison was conducted for those encryption algorithms using different data types such as text, image, audio and video. The encryption algorithms were implemented in Java. In the experiments, the software encrypted different file formats with file sizes of 4MB - 11MB. Performance metrics such as encryption time, decryption time and throughput were collected. The presented simulation results showed that AES has higher performance than the other common encryption algorithms. Since AES had not shown any known security weak points in the presented study, this makes it an excellent candidate. 3DES showed poor performance results since it needs more processing power. Since battery power is one of the major limitations of MANET nodes, the AES encryption algorithm is the most suitable choice.

Sahu and Kushwaha in [10] implemented the symmetric key encryption algorithms DES, AES and Blowfish using the NS-2 network simulator to test their performance with different data types such as text and image, based on several performance metrics. In the experiments, the algorithms encrypted different file types such as text, image and video with sizes of 0.3KB - 1KB. Performance metrics such as encryption and decryption time, battery consumption, residual battery and throughput were recorded for each file type.
The proposed symmetric key encryption algorithms were implemented using NS-2 (v-2.34) with different packet sizes. The obtained simulation results showed that AES is simpler and better in terms of residual battery and encryption time than the other implemented algorithms. Blowfish had better performance in terms of throughput, but it consumes more battery power compared with the other implemented algorithms.

Norouzi, et al. [11] focused on improving security performance in a wireless ad hoc network with an encryption algorithm and transmission rate that they proposed. Simulation was done using MATLAB; the input was text files with a minimum size of 50 bytes and a maximum size of 300 bytes, and the data was transmitted using two modes: with encryption and without encryption. For the first mode, the data was transmitted without using any encryption. For the second mode, the data was transmitted with three encryption algorithms: DES, AES and Blowfish. These algorithms were chosen because they were commonly used in previous research. Throughout the conducted experiments only one key was used to encrypt and decrypt the data, namely the largest key size of the specific algorithm. For the encryption, data was encrypted with software: Encrypt On Click for AES with a 256-bit key, Blowfish 2000 for the Blowfish algorithm and Kryplite for the DES algorithm. Based on the inputs, which are distance and size, the time taken to send data to the receiver and the throughput could be calculated. All of these calculations were done in MATLAB, and the output produces the data transfer time. Based on the obtained results the authors recommended choosing AES to achieve fast delivery of data and high throughput, and choosing the Blowfish algorithm when sending larger amounts of data at a smaller transmission rate.

Kashani and Mahriyar in [12] analyzed video streaming characteristics in ad hoc networks using several cryptographic algorithms. The authors presented an application setup for secured video streaming in ad hoc networks. A public key infrastructure approach was chosen to provide authentication at the network layer. They proposed a fully distributed certification authority (CA) for Optimized Link State Routing (OLSR) based ad hoc networks. The initial assumption was that the network contains predefined special nodes known as shareholders. Shareholders can generate partial signatures. A node joining the network can obtain a certificate provided that it receives at least k partial signatures from k different shareholders; a shareholder providing the service can be identified from the broadcasted hello messages. On the other hand, different encryption schemes were implemented and analyzed in the study: RC4, 3DES, AES-128, AES-256, Salsa20-128 and Salsa20-256, and the time needed to encrypt different sizes of data was adopted as the performance metric. The results showed that RC4, 3DES, AES-128, AES-256, Salsa20-128 and Salsa20-256 all took less than 1500 ms to encrypt a 1 MB data file. 3DES consumed the largest encryption time, followed by Salsa20-256, Salsa20-128, AES-256, AES-128 and RC4 respectively.

Sandhiya, et al. [13] proposed an intrusion detection system named Enhanced Adaptive ACKnowledgment (EAACK) that consists of three parts: ACK, Secure ACKnowledgment (S-ACK), and Misbehavior Report Authentication (MRA). All the acknowledgement packets were signed and verified to prevent forged acknowledgement packets. For signing and verifying the acknowledgement packets, keys were generated and distributed beforehand. The proposed system uses one-hop ACK to enhance the misbehavior detection rates. To eliminate the need for pre-distributed keys, the proposed system considered DHKE, which depends on the difficulty of computing discrete logarithms and allows users to securely encrypt messages. The NS-2 simulation tool was used for running the simulation, and the results showed an improvement in misbehavior detection rates, which results in lower routing overhead than the existing Intrusion Detection Systems (IDS) when using the DHKE mechanism.
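Since several of the surveyed schemes, as well as the key-management phase evaluated later in this paper, rely on the DHKE exchange summarized in Figure 1, a minimal numeric sketch of those steps may help. The prime, generator and secret values below are toy placeholders chosen purely for illustration and are not taken from any of the implementations discussed above.

# Minimal Diffie-Hellman key exchange sketch (toy parameters, illustration only)
p = 23   # publicly agreed prime modulus (assumed example value)
g = 5    # publicly agreed generator (assumed example value)

a = 6    # source node's secret value (never transmitted)
b = 15   # destination node's secret value (never transmitted)

A = pow(g, a, p)   # source sends A = g^a mod p
B = pow(g, b, p)   # destination sends B = g^b mod p

# Each side derives the same shared secret from the other's public value
shared_source = pow(B, a, p)       # (g^b)^a mod p
shared_destination = pow(A, b, p)  # (g^a)^b mod p
assert shared_source == shared_destination  # common session key material
print("shared secret:", shared_source)

An eavesdropper who observes only p, g, A and B must solve a discrete logarithm to recover a or b, which is the hardness assumption the protocol relies on.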
Du and Xiong in [3] proposed a hop-by-hop authentication and routing-driven dynamic key management scheme named HARD-KM. An improved Elliptic Curve Diffie-Hellman (ECDH) protocol with mutual authentication was used to generate two pair keys, which were kept in caches before their expiration. HARD-KM handles all nodes in the network equally instead of assigning cluster heads or a base station; the scheme used an off-line certificate authority (CA) to sign certificates and distributed an authentication materials matrix to all the mobile nodes. The NS-2 simulator was used to evaluate the feasibility and efficiency of HARD-KM. The results showed that the HARD-KM key management scheme was resilient to adversaries and reduces key storage space. The benefits of the proposed key management scheme were: neighboring pair-wise keys are created on demand to save storage space, the pair-wise keys are derived from an authentication materials matrix to deal with eavesdropping attacks, and compromised nodes pose limited threats to other uncompromised nodes.

Taneja, et al. [14] proposed a standard secret key establishment for symmetric encryption over ad hoc networks using the DH key agreement protocol. The idea can be used to develop a new routing protocol for MANETs that provides maximum security against all kinds of attacks. While the DH key agreement protocol uses a symmetric system to encrypt the data and an asymmetric system to encrypt the symmetric keys, the authors proposed a protocol consisting of five stages: key generation and exchange, shared secret creation, encryption using the symmetric key, and encrypted data transmission. The CrypTool simulator, an open source e-learning application used for the implementation and analysis of cryptographic algorithms, was employed to model and test the DH key agreement protocol. As a first step in the simulation, the public parameters must be set. Since the public parameters are freely accessible to all, not only the source and destination are able to access these parameters; any third party can also observe them. Once the public parameters are set, the secret numbers of the source and the destination are chosen by pressing the Choose Secrets button in CrypTool. Then the source sends its shared key to the destination and vice versa. As a final step, the source and destination create a common secret session key by pressing the Generate Common Session Key button in CrypTool.

The implementation of a new security extension and the cryptographic schemes is written as a new implementation within NS-2 [15].

III. IMPLEMENTATION OF THE CRYPTOGRAPHIC SCHEMES IN NS-2

This section discusses the new security agent and the functions that were used to simulate the performance of the encryption schemes of interest. NS-2 is a common discrete event simulator developed chiefly for networking research. NS-2 is an open source software package that provides wide support for simulating network types, network applications, routing protocols, data sources and network components. In NS-2, the system is modeled as ordered events that take an arbitrary amount of time. NS-2 is designed with two basic building blocks: C++ for the core functionality that handles packet processing, and the Object TCL (OTcl) scripting language, a special-purpose language used for writing the control scripts that run the simulation.

The protocol implementation needs the C++ language for packet processing, and the use of a script language makes the modification of the simulation configuration quicker and freely adjustable with dynamic parameters [15].

NS-2 is additionally supported by the Network AniMator (NAM), which offers a GUI of the network being simulated. For MANETs, NS-2 provides a large library for ad hoc routing, topology generators, propagation models, mobility models and data sources. To run any simulation scenario in NS-2, it should be written as a TCL script within the OTcl file [15]. Although NS-2 offers various design alternatives, it does not provide everything. Our
implemented cryptographic schemes and security extensions were not included in the original NS-2, so we implemented our own source code, compiled the executable files and recorded results based on several network metrics [15].

During the security establishment process, the security agent file must be fed with the encryption type from the source and destination nodes through the TCL file. The encryption type received from the TCL file is attached to the encryption type variable using the bind statement. Once a node receives the encryption type and the key value, the actual encryption starts by reading a data file of variable size using the following pseudo code:

    get pointer to file ("test.txt");
    if (not permitted to access file)
        return (error);
    read data items from ("test.txt");
    read data as a separate block;
    test for end of file:
        if yes, end with read data;
    return (done);

IV. SIMULATION AND RESULTS DISCUSSION

The two main purposes of the performance evaluation of the implemented encryption schemes in the ad hoc network were: to perform a brief study of the performance of the implemented symmetric schemes, and to determine the overhead that the DH algorithm adds to the overall network performance. In this section we present the simulation results that we recorded according to the different performance metrics. By considering different sizes of data files (2KB to 64KB), the DES, 3DES and AES (128-bit key) encryption algorithms were evaluated in terms of energy consumption, data transfer time and network throughput. All the implementations were balanced to make sure that the results are relatively fair and accurate.

The simulation program accepts four inputs: the encryption algorithm, the encryption mode, the key and an input file. After a successful execution, the cipher text is generated.

A. Simulation Parameters

Along with the usual configuration of a wireless network simulation in NS-2, we set the routing protocol to AODV using the command set val(rp) AODV; the MAC layer, data rate, transmission range, simulation area, simulation time, number of nodes and other parameters are also set in the network configuration TCL file. We used the AODV routing protocol for power optimization, because it requires fewer control packets. The details of the computer system that we used to compile NS-2 and run the simulation are given in Table 2, and the NS-2 simulation parameters that we used in our experiments are shown in Table 3.

B. Simulation Factors and Metrics

The performance of the implemented cryptographic schemes
in the ad hoc network depends upon several factors:

1. Encryption schemes: this study evaluates three different symmetric algorithms, DES, AES (128-bit key) and 3DES.
2. Number of hops: in the conducted experiments the performance of the implemented cryptographic schemes was evaluated separately for three main scenarios, namely one hop, two hops and three hops between the source and the destination nodes.
3. Data file size: the implemented algorithms encrypt different file sizes, namely 2KB, 4KB, 8KB, 16KB, 32KB and 64KB.
4. Simulation models: in our study we applied two simulation modes; the first mode simulates the network behavior assuming the availability of a common key, and the second mode simulates the network behavior including the key management phase in the link sensing between the source and the destination nodes, to ensure a reliable and secure key management that precedes the actual encryption.

We performed several tests on our implemented cryptographic schemes to observe their performance using the performance metrics defined in Table 4.

Table 4. Simulation Metrics
- Energy consumption (Joule): the average amount of energy consumed by encryption and decryption during algorithm processing.
- Data transfer time (sec): the time from starting the encryption of the first packet in a selected data file until the end of the decryption of the last encrypted packet that reached the destination node, including the End-to-End delay time.
- End-to-End delay time (sec): the time taken for a packet to be transmitted across the network from source to destination.
- Network throughput (Kb/sec): the total plaintext size that was encrypted divided by the total encryption time consumed during encryption.

Performance analysis assumptions:
1. Free space network with no multipath and/or attenuation.
2. No noise affecting the network.
3. 20 repetitions for every experiment.

Results and Discussion

This section discusses the performance based on the chosen metrics under the varying factors detailed in the previous section.

1) Energy Consumption

In our experiments the energy consumption was evaluated using the same technique described in [16]. We take the basic cost of encryption and decryption to be the product of the total number of clock cycles taken by the encryption and the average current drawn by each CPU clock cycle. The author in [17] gives the cost of several encryption algorithms on a Pentium processor as clock cycles per byte, which we used in our calculations as shown in Table 4. To calculate the total energy cost, we divide the cost in Amperes for all encryption and decryption clock cycles by the processor clock speed in cycles/sec. For a Pentium processor the clock speed of 7590 cycles/sec given in [18] was used in our calculations. The energy cost per byte was therefore computed as (clock cycles per byte multiplied by the average current per cycle) divided by the processor clock speed, and the energy consumption for the various file sizes is shown in Figure 2 for the DES, 3DES and AES encryption schemes.
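The per-byte estimate described above can be sketched in a few lines of Python. Every numeric value below is an illustrative placeholder, not a figure taken from [17] or [18]:

# Rough energy estimate: (cycles per byte x average current per cycle) / clock speed
def energy_per_byte(cycles_per_byte, amps_per_cycle, clock_speed):
    # Follows the product-of-cycles-and-current formulation used in the paper
    return (cycles_per_byte * amps_per_cycle) / clock_speed

def total_energy(file_size_bytes, enc_cycles_per_byte, dec_cycles_per_byte,
                 amps_per_cycle, clock_speed):
    # Encryption and decryption cycles are summed before converting to an energy cost
    cycles = file_size_bytes * (enc_cycles_per_byte + dec_cycles_per_byte)
    return (cycles * amps_per_cycle) / clock_speed

# Example with made-up numbers (placeholders only)
print(total_energy(16 * 1024, 45, 45, 1e-9, 7590))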
Figure 2: Energy Consumption for Varying Data File Sizes

In general, the results showed the superiority of the AES algorithm over DES and 3DES in terms of energy consumption (when encrypting the same data file). In fact, we found that AES requires around 65% and 85% less energy than the energy consumed by the DES and 3DES algorithms respectively. The DES algorithm consumes around 58% less energy than the 3DES algorithm.

Table 5. Energy Consumption Results

2) Data Transfer Time

The data transfer time calculations in our experiments were based on the same technique used by [11], which takes the transfer time as the time from starting the encryption of the first packet in a selected data file until the end of the decryption of the last encrypted packet that reached the destination node, including the End-to-End delay time; the transfer time was computed accordingly. For the implemented encryption schemes in our study, the transfer time results are shown graphically in Figure 3.
Fig. 3: The Implemented Encryption Schemes Transfer Time Results for the Two Simulation Modes

As we can see from Figure 3, an advantage of using the AES encryption scheme is that it takes less data transfer time than the DES and 3DES encryption schemes. The experimental results showed that the AES transfer time is about 90% less than DES encryption when running simulation mode 1. On the other hand, AES consumes about 25% less transfer time than DES for small data files, and 57%-80% less than DES for larger data files, when applying the DHKE algorithm in simulation mode 2 (loading the same data sizes for both encryption schemes).

3) Network Throughput

In our study the throughput of the network while running the implemented encryption schemes is calculated using the formula given by [11], which normalizes the total encrypted file size in bytes by the data transfer time:

Throughput = size of plaintext / time consumed during encryption

For the different file sizes, the throughput results while running the two simulation modes are shown in Figure 4.

Fig. 4: Network Throughput Results for the Implemented Encryption Schemes

End-to-End Delay Time

The End-to-End delay time in our study is measured as the time from the moment the source node sends the first packet of data after completing the encryption procedure until the moment the destination node receives the last encrypted packet. According to this definition, the DHKE transactions add a certain preprocessing time overhead to the actual End-to-End delay time between the source and destination nodes; this time is fixed for DES, 3DES and AES because it is related to the packets transferred during the session initiation stage and not to the actual encryption. Assuming different numbers of hops between the source and destination nodes, and using a 16KB file size, the End-to-End delay time results are shown in Fig. 5 for the two applied simulation modes. The file size versus the percentage of the DHKE overhead is shown in Table 6.
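As a quick illustration of the two quantities defined above, the following Python sketch computes the throughput from the stated formula and the relative share of the fixed DHKE session-initiation cost in the overall delay. The sample figures are invented for illustration and are not the measured simulation results:

# Throughput = plaintext size / time consumed during encryption (as defined above)
def throughput_kb_per_sec(plaintext_bytes, encryption_time_sec):
    return (plaintext_bytes / 1024.0) / encryption_time_sec

# Relative overhead added by the DHKE session-initiation phase to the End-to-End delay
def dhke_overhead_percent(e2e_delay_sec, dhke_setup_sec):
    # The setup cost is fixed per session, so its share shrinks as transfers grow
    return 100.0 * dhke_setup_sec / (e2e_delay_sec + dhke_setup_sec)

# Invented example values (not from the simulation)
print(throughput_kb_per_sec(16 * 1024, 0.25))   # KB/s for a 16KB file
print(dhke_overhead_percent(0.40, 0.05))        # overhead share in percent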
Figure 5. Ad hoc Network End-to-End Delay Time Calculations for 16KB Data

Table 6. Data File Size vs. DHKE Overhead
File Size (KB) | DHKE Overhead (%)
1 | 66.7
2 | 50
4 | 33.3
8 | 20
16 | 11.1

From the results shown in the table above we can conclude that the overhead caused by applying the DHKE protocol to the overall MANET performance is acceptable compared with its benefits, especially for large data files.

V. CONCLUSIONS AND FUTURE DIRECTIONS

In this study we evaluated the performance of the DES, 3DES and AES symmetric encryption algorithms in a MANET environment. In addition, we applied a secure key management solution using the DHKE protocol. Finally, we offered the user the ability to choose the encryption type based on the required security level.

Table 7. Performance Evaluation Results Summary
Performance Metric | AES superiority over DES (%) | AES superiority over 3DES (%) | DES superiority over 3DES (%)
Energy Consumption | 65 | 85 | 59
Transfer Time | 70 | 95 | 63
Network Throughput | 83 | 95 | 64

Security in ad hoc networks is an open research issue, and investigative work is still in progress on new security solutions. Cryptographic solutions, and their suitability given ad hoc network limitations, will always be a challenge when providing protection from malicious attacks. The following are some future work suggestions:

• Analyze and evaluate the performance of another symmetric block cipher such as the Blowfish cipher.
• Analyze and evaluate the performance of stream ciphers such as the RC4 and SEAL ciphers. A comparative analysis of stream cipher cryptography with block cipher cryptography is expected to be valuable.
• Evaluate the performance of the network using another network simulator such as the Opnet network simulator in order to validate the obtained results.
• Evaluate the performance of the network with different network topologies.
• Evaluate the performance of the network assuming new nodes joining/leaving the network.

REFERENCES

[1] Nadeem, A. and Howarth, M. P. (2013), A Survey of MANET Intrusion Detection & Prevention Approaches for Network Layer Attacks. IEEE Communications Surveys & Tutorials, 15(4), 2027-2045.
[2] Chen, J. and Wu, J. (2010), A Survey on Cryptography Applied to Secure Mobile Ad hoc Networks and Wireless Sensor Networks. Handbook of Research on Developments and Trends in Wireless Sensor Networks: From Principle to Practice, IGI Global, AH
ALTALHI, 5, 2414-2424.
[3] Du, D. and Xiong, H. (2011), A Dynamic Key Management Scheme for MANETs. In Cross Strait Quad-Regional Radio Science and Wireless Technology Conference (CSQRWC), IEEE, 1, 779-783.
[4] Mokhtarnameh, R., Muthuvelu, N., Ho, S. B. and Chai, I. (2010), A Comparison Study on Key Exchange-Authentication Protocol. International Journal of Computer Applications IJCA, 7(5), 5-11.
[5] Abdul, D. S., Elminaam, H. M. A. K. and Hadhoud, M. M. (2009), Performance Evaluation of Symmetric Encryption Algorithms. International Journal of Computer Science and Network Security, 8(12), 78-85.
[6] Alanazi, H., Zaidan, B. B., Zaidan, A. A., Jalab, H. A., Shabbir, M. and Al-Nabhani, Y. (2010), New Comparative Study between DES, 3DES and AES within Nine Factors. arXiv preprint arXiv:1003.4085.
[7] Stallings, W. (2006), Cryptography and Network Security: Principles and Practice (5th ed.). India: Pearson Education.
[8] Mandal, A. K., Parakash, C. and Tiwari, A. (2012), Performance Evaluation of Cryptographic Algorithms: DES and AES. In Electrical, Electronics and Computer Science (SCEECS), 2012 IEEE Students' Conference, IEEE, 1-5.
[9] Umaparvathi, M. and Varughese, D. K. (2010), Evaluation of Symmetric Encryption Algorithms for MANETs. In Computational Intelligence and Computing Research (ICCIC), IEEE International Conference, 1-3.
[10] Sahu, S. K. and Kushwaha, A. (2014), Performance Analysis of Symmetric Encryption Algorithms for Mobile Ad hoc Network. In International Journal of Emerging Technology and Advanced Engineering IJETAE, 4(6).
[11] Norouzi, M., esmaeel Akbari, M. and Souri, A. (2012), Optimization of Security Performance in MANET. Journal of American Science, 8(6).
[12] Kashani, A. A. and Mahriyar, H. (2014), A New Method for Securely Streaming Real-time Video in Ad hoc Networks. Advances in Environmental Biology, 8(10), 1331-1338.
[13] Sandhiya, D., Sangeetha, K. and Latha, R. S. (2014), Adaptive ACKnowledgement Technique with Key Exchange Mechanism for MANET. In Electronics and Communication Systems (ICECS), 2014 International Conference, IEEE, 1-5.
[14] Taneja, S., Kush, A. and Hwang, C. J. (2011), Secret Key Establishment for Symmetric Encryption over Adhoc Networks. In Proceedings of the World Congress on Engineering and Computer Science (Vol. 2).
[15] Fall, K. and Varadhan, K. (2002), The NS Manual. Notes and Documentation on the Software NS2-Simulator.
[16] Elminaam, D. S., Kader, H. M. A. and Hadhoud, M. M. (2009), Energy Efficiency of Encryption Schemes for Wireless Devices. International Journal of Computer Theory and Engineering, 1, 302-309.
[17] Biham, E. (Ed.). (2006), Fast Software Encryption. 4th International Workshop, FSE'97, Haifa, Israel, January 20-22, 1997, Springer, Proceedings (Vol. 1267).
[18] Rhett (1999), x86 CPU Reference, Part 2. Retrieved May 25, 2015, from http://alasir.com/x86ref/index.htm
Fuchsia OS - A Threat to Android
Taranjeet Singh1, Rishabh Bhardwaj2
1,2Research Scholar, Institute of Information Technology and Management
t23singh@gmail.com, rishabh77k@gmail.com
Abstract - Fuchsia is a fairly new operating system whose development started back in 2016. Android supports various types of devices with different screen sizes, architectures, etc. The problem is that whenever Google releases new updates, due to the large variety of devices many devices do not receive those updates; that is the main issue with Android. This review discusses Fuchsia, its current status and how it differs from the Android operating system.

Keywords: Internet of Things (IOT), Operating System (OS), Microkernel, Little Kernel, Software Development Kit (SDK), GitHub

I INTRODUCTION

Fuchsia is an open source hybrid real-time operating system which is under development. Prior to Fuchsia we already had the Android OS, which is used in almost all kinds of devices. Android development was started in 2003 by Android Inc., which Google purchased in 2005. Android's first beta version was released in 2007 by Google and the OHA, and it was then officially adopted by most of the companies in the Open Handset Alliance [1]. But Android was not developed specifically for IOT, so Google came up with the idea of developing an operating system for Internet of Things devices.

In 2016 Google uploaded some documentation [2] on GitHub; at that time no one was aware of this new operating system. Some time later Google uploaded further documents and made it clear that this is a new OS whose main focus is IOT devices. Fuchsia runs on modern 64-bit Intel and ARM processors [1].

Purpose of development (Fuchsia)

Fuchsia is an open source operating system optimized and developed in a way that supports both personal computers and low-power devices, particularly IOT devices. Initially, Android was developed for cameras and was then extended to other electronic devices; developing apps for these devices is still a complex task because of compatibility issues with native devices. The Android operating system supports various types of devices such as Android Wear devices, auto cars, tablets, smart phones, etc., so developing an Android app for all these devices is a very tedious task. The major problem with Android is that not all devices receive updates on time. Fuchsia is being developed to overcome these problems; with Fuchsia we can develop apps for all these devices and they can be deployed flawlessly.

A. Architecture of Fuchsia

Fuchsia uses a microkernel, named Zircon, which is an evolution of the Little Kernel (LK). Zircon is a much more segmented model and is designed for smaller electronic devices; this is what makes Zircon different from Android's Linux kernel [3]. The biggest problem with the Android operating system is its fragmentation: there are nearly 840 million Android versions in circulation, which makes it complex for developers to decide which versions to target.

B. Languages supported in Fuchsia for software development

It supports almost all modern and trending languages such as Swift, C++, Go, Rust and Python, which might attract many developers to the environment.

C. Technologies Recommended by Fuchsia for Software Development

Flutter is Google's mobile UI framework for crafting high-quality native experiences on iOS and Android in record time [4]. It has several notable features, such as:

• Fast Development: thanks to hot reload, apps can be developed faster [4].
• Expressive UI: Flutter has built-in Material Design widgets, a rich API, smooth natural scrolling and platform awareness [4].
• Native Performance: Flutter's widgets incorporate all critical platform differences such as scrolling, navigation, icons and fonts to provide full native performance on both iOS and Android [4].

DART [5] (the recommended programming language for developing applications based on Fuchsia) is an open-source project that aims to enable developers to build more complex, highly performant apps for the modern web. Using the Dart language, you can quickly write prototypes that evolve rapidly, and you also have access to advanced tools, reliable libraries, and good software engineering techniques. Meanwhile, Dart also receives some criticism for being a modern language that does not handle null references.

II CURRENT STATUS OF FUCHSIA OS

Fuchsia is still under development and may take a few more years to become operational. Fuchsia is not Google's 20% project. Armadillo was the first rendering engine of Fuchsia, but it has now been replaced by Escher, a Vulkan-based rendering engine that provides faster graphics rendering. Google has released the Fuchsia SDK, and anyone can download and run it. Currently, Google is using the Honor Play [6] to demonstrate the very first look of Fuchsia, along with a web interface demo [7] by Manuel Goulão [8] which has attracted audiences due to its spectacular user interface.

A. Scope

Fuchsia is an operating system primarily built to support Internet of Things devices. The Internet of Things is a booming industry: more and more devices are being connected to each other, but we currently do not have a dedicated operating system for them, so Fuchsia definitely has huge scope in the near future. Fuchsia uses the Flutter software development kit, which is used to create native apps, so developers can develop Fuchsia apps, Android apps and iOS apps from a single code base. This is remarkable because nowadays we have to write code for different platforms individually, so Fuchsia has huge scope when it comes to native software development for different devices within the same software development kit, using a common language for all.

B. Fuchsia: A threat to Android?

As observed above, Android's architecture has some compatibility issues with smaller devices. It has been observed, on the basis of Google's documentation of the Fuchsia OS code [9], that Android apps will run on Fuchsia too. However, no official statement has been made by Google that Fuchsia will completely make Android obsolete. Google wants to position Fuchsia as a unifying OS, the single operating system the company has been mulling since 2015. But at this point, it is just a kernel, the core of an operating system.

C. Fuchsia's effect on Windows

Implementing Fuchsia seems a sound strategy for catching up with Microsoft's Windows 10 IoT Enterprise [10]. However, Microsoft has sheer dominance in desktop-oriented operating systems, and Fuchsia's target is to dominate smaller devices such as phones and tablets. Microsoft is also working on similar projects to create a new unified mobile platform, for example the Universal Windows Platform and Continuum, which basically enable apps to resize and adapt their interface to different screen sizes and orientations. Fuchsia's objective is similar: to handle portability based on different requirements. If Google successfully develops a new OS, the market could definitely shift, experts indicate. When Microsoft Windows Mobile failed to take off, it left the field open for Google and Apple.

III OBSERVATION

Fuchsia is designed to have the ability to scale from small devices like watches and mini cameras up to larger devices like laptops and computers. The language recommended by Google for Fuchsia, i.e. DART, is highly suited for prototyping but can
also receive some criticism for being a modern language that does not handle null references [11].

Dart in reference to the billion dollar mistake (null references): Dart, despite being a modern language, is missing nullable types. That might hold it back as a modern language, and it does not solve the Billion Dollar Mistake [12] referred to by Tony Hoare [13] (the British computer scientist and inventor of quicksort); null references historically do not have a positive record. If the billion dollar mistake was the null pointer, the C gets function is a multi-billion dollar mistake that created the opportunity for malware and viruses to infiltrate [11]. This mistake has been addressed in many modern languages such as Kotlin, C#, etc.

IV CONCLUSION

It is intended that Fuchsia will run on everything from phones and laptops to IoT devices and more. Fuchsia is going to accomplish much of what Microsoft and Apple already have in Windows 10 and in iOS-to-macOS Sierra Continuity, respectively, but in a very Google way.

REFERENCES

[1] https://www.openhandsetalliance.com/
[2] https://android.googlesource.com/platform/prebuilts/fuchsia_sdk/
[3] https://github.com/fuchsia-mirror
[4] https://flutter.dev/
[5] https://www.dartlang.org/
[6] https://www.digit.in/news/mobile-phones/google-fuchsia-os-successfully-booted-on-kirin-970-powered-honor-play-44842.html
[7] https://mgoulao.github.io/fuchsia-web-demo/
[8] https://mgoulao.github.io/cv.html
[9] https://fuchsia.googlesource.com/third_party/android/platform/frameworks/native/
[10] https://developer.microsoft.com/en-us/windows/iot
[11] https://www.infoq.com/presentations/Null-References-The-Billion-Dollar-Mistake-Tony-Hoare
[12] https://www.youtube.com/watch?v=kz7DfbOuvOM
[13] https://www.cs.ox.ac.uk/people/tony.hoare/
Sentiment Analysis using Lexicon based
Approach
Rebecca Williams1, Nikita Jindal2, Anurag Batra3
Research Scholar, Institute of Information Technology & Management, Janakpuri, New Delhi, India
rebecca3williams@gmail.com, nikitajindal68@gmail.com, anuragbatra1999@gmail.com
Abstract - Triple talaq, also known as talaq-e-biddat or instant divorce, is a kind of Islamic divorce used by Muslims in India. It allows a Muslim man to divorce his wife legally by simply stating the word 'Talaq' three times in any form (verbal, written, or electronic). Nowadays, a huge amount of data is posted on a daily basis on social media platforms. Twitter is a well-known social networking platform where users can post their views, opinions, and thoughts freely. Sentiment analysis is a process of understanding the opinions, thoughts and feelings of people about a given subject. This paper analyses tweets posted on Twitter on the subject of triple talaq from the year 2002 to the year 2019. We have transformed unstructured data into well-informed data for getting the insights of people. The main focus of the work is to analyze the feelings of people using two well-known APIs, TextBlob and SpaCy. These APIs are based on the lexicon approach. This paper predicts sentiment into three classes: positive, negative and neutral.

Keywords: Talaq, Application Programming Interface (API), SpaCy, TextBlob, Parts-of-Speech (POS), Natural Language Processing (NLP)

I. INTRODUCTION

Triple talaq, also named talaq-e-biddat or instant divorce, is an Islamic divorce in the Muslim religion. It allows a Muslim man to divorce his wife legally by simply stating the word 'Talaq' three times in any form (verbal, written, or electronic). On 22nd August 2017 the Indian Supreme Court struck down instant triple talaq. Out of five judges, three were of the opinion that triple talaq is illegal, and the remaining two stated that the government should ban this practice by simply following the law. The Modi government drafted a bill known as the Muslim Women Bill, 2017, and it was passed on 28th December 2017 by the Lok Sabha. The bill stated that if a man gives instant triple talaq in any form - spoken, written or by any electronic means such as email, message or any other means of communication - it will be considered illegal, and if any such practice is found there will be three years' imprisonment and a fine. In Islam marriage is considered a contract between husband and wife, and various procedures have been written on how to annul it. As per Islamic traditions, a woman can ask for divorce via "khula", whereas the husband can end the marriage instantly by pronouncing talaq thrice. But many have highlighted the misuse of instant divorce by men as a reason to ban it. The man must pay some maintenance, and custody of the child goes to the mother. The bill should not be viewed from the point of politics, nor from the point of religious motive or vote banks. The bill was passed for the rights and respect of women.

Sentiment analysis is a process of analyzing the views of people about a given subject or topic, which can be in the form of written or spoken language. Today, in a world where a huge amount of data is generated every day, sentiment analysis has become a vital tool for making sense of that data. It has allowed companies to get results from the various processes they run. It is a type of text research or mining. In this paper, we apply statistics, natural language processing (NLP), and machine learning to identify, analyze and extract important information from tweets. The main objective is to observe the reviewer's feelings, expressions, thoughts or judgments about a particular topic, event, or company. This type of analysis is also known as opinion mining, and its main focus is on extraction. The goal of this analysis is to get the opinion of our audience on a particular subject by analyzing a large amount of data from heterogeneous sources. Today sentiment analysis has a number of different uses. With the increase in the use of social networking sites and the rise of feedback forms and ranking sites, companies are becoming more interested in this type of analysis.
interested in this type of analysis. Consumers can freely share their thoughts easily on the web. But with a huge amount of information out there, it can be tedious for companies to hone in on the most valuable parts of consumers' comments.

This is the main reason why we are applying sentiment analysis. Organizations lean on sentiment analysis and then filter out valuable information so that they can successfully understand consumer behavior on a topic and take targeted decisions. Nowadays, a huge amount of data is posted on a daily basis on social media platforms. Twitter is a well-known social networking platform where users can post their views, opinions, and thoughts freely.

Fig 1. Flow chart of the process

Sentiment analysis can be done by either a machine learning or a lexicon-based approach. In this paper, we have applied a lexicon-based approach. This is a feasible and practical approach which can analyze tweet text without training or using machine learning. A lexicon is a collection of words; one can say it is like a dictionary in which words are arranged alphabetically. This approach is subdivided into a dictionary-based approach and a corpus-based approach. Here we are using a corpus-based approach. A corpus is a large body of words or text which formulates a set of conceptual rules that govern a natural language from texts in that language and examines how that language relates to other languages.
II. RELATED WORK
In the last few decades there has been a vast expansion of social media, due to which social media has started playing a vital role when it comes to gathering the sentiments of the masses. There are various ways to perform sentiment analysis using the lexicon-based approach, and some of the methods are discussed by Taboada, M., Brooke, J., Tofiloski, M., Voll, K., & Stede, M. in their research paper "Lexicon-based methods for sentiment analysis" [1].

III. EXPERIMENTAL SET-UP
The experimental setup presents the research methodology employed and the tools and libraries used to analyze the opinion of the people of India on triple talaq. We used a DELL laptop with an i3 processor at 3.6 GHz and 8GB DDR3 RAM. In this study, open source libraries, packages and APIs are extensively used. This section is further categorized into two sub-sections.

A. System Architecture
In this section, we discuss the overall architecture of the system. The tweets that were extracted do not contain their corresponding labels. So, labeling of tweets was required in order to get the desired results. In this paper, we used a lexicon-based approach to classify our tweets. Currently, many web services are available which automatically provide more clear-cut labels as compared to human annotators. The main focus of the work is to analyze the feelings of people using four well-known APIs: TextBlob, NLTK, SentiWords, and Vivekn. The services provided by these four approaches vary from one another.

a. Data Extraction
With the increase in the importance of text analysis in many research areas, many researchers have started analyzing the sentiments of people from their posts on social sites. In this paper, we are analyzing the tweets posted by people on triple talaq. The first step of the analysis is data extraction. Data extraction is the process of retrieving the data posted by people from the available sites (in this research we take data from Twitter). After this we preprocessed our data, i.e., we cleaned our data (removed noisy data).

b. Labeling of Tweets
In this part, we have labeled our tweets using four well-known APIs: TextBlob, SentiWord, NLTK, and Vivekn. These four APIs label our tweets into three well-known categories (Positive, Neutral and Negative).

c. Data Visualization
In this section, we discuss how, after applying all our techniques, we visualize all our gathered information.

● Pie charts of our results of APIs
In this, we have visualized how many tweets are classified into which category (positive, negative or neutral) in the form of pie charts for each API.

● Comparison of our APIs used
In this, we have compared all our APIs and have determined the accuracy of each API that we have used. This accuracy has been shown using a histogram.

B. Tools Used
In this section, we discuss the various tools that we have used during this research. Here we describe the programming language, APIs and open source libraries in brief.

● Tweepy
Tweepy is an open source library which enables Python to communicate with Twitter and lets us use its API.

● Python programming language
Python is a popular programming language which nowadays is being used for text mining and analysis. It is an object-oriented, high-level programming language.

● TextBlob
It is a python library for textual data. Using TextBlob we can tokenize our paragraphs into sentences or words.

● SpaCy
It is a free and open source library which is used for advanced NLP (natural language processing).

● Matplotlib
It is a python plotting library (in 2D). It produces quality figures in a variety of formats. It also helps in plotting various graphs for data visualization.

● Seaborn
It is a python library for making statistical graphs.

IV. Data Extraction

A. Extraction of Data
This is one of the most important steps, because everything starts from here. Twitter is the most famous social network where people from all around the world share their views, which are also called tweets. Twitter lets us access its API through a python library called Tweepy, which allows us to extract data from the Twitter account of any user. Tweepy provides an easy-to-use Cursor interface by which we can iterate through different types of objects. This python library can be installed by using the pip command on cmd or a terminal.

The first things we need to know are the keys, that is: the consumer key, the consumer secret key, the access key and the access secret key from Twitter's developer site, available easily for each user. These keys will help the API with authentication.

Steps to obtain these keys are as follows:
● Log in to Twitter's developer site.
● Go to "Create an App".
● Fill in the details in the form.
● Then click on "Create your Twitter Application".
● Details of our new application will be shown along with the consumer key and consumer secret key.
● To obtain the access token key, click on "Create my access token".

The page will refresh and generate an access token. For authorization of our Twitter account, we use the OAuth interface. By this, we can authorize our app to access our account. Now, Twitter limits the tweets to be extracted to
3200 maximum at a time. A JSON file is created when we fetch tweets from Twitter using Tweepy. Since the JSON data is complex and contains a lot of unrequired data, we created a corpus file which contains only the required data that can be analyzed. We also converted the time to a timestamp to fetch data between the years 2002 and 2019; timestamp data is easily read by a machine. We created separate files for name and text to improve the performance of the code.
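The timestamp conversion described above is not shown in the paper's code; a minimal sketch with pandas, where the file name and column layout follow the CSV written by the extraction snippet below and are otherwise assumptions:

import pandas as pd

# Assumes a corpus CSV with two columns (created_at, text), as written by the extraction snippet below.
df = pd.read_csv("tweets_corpus.csv", names=["created_at", "text"])

# Parse the date strings and convert them to Unix timestamps, which are easy for the machine to compare.
df["created_at"] = pd.to_datetime(df["created_at"])
df["timestamp"] = (df["created_at"] - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")

# Keep only tweets posted between 2002 and 2019.
df = df[(df["created_at"] >= "2002-01-01") & (df["created_at"] <= "2019-12-31")]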
Code snippet for extracting tweets

import tweepy
import csv
import pandas as pd

def TWITTER(Save):
    # input your credentials here
    CONSUMER_KEY = input("Enter Consumer Key: ")
    CONSUMER_SECRET = input("Enter Consumer Secret Key: ")
    ACCESS_TOKEN = input("Enter Access Token: ")
    ACCESS_TOKEN_SECRET = input("Enter Secret Access Token: ")

    AUTH = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
    AUTH.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
    API = tweepy.API(AUTH, wait_on_rate_limit=True)

    file = STANDARDIZED_CSV(Save)   # helper defined elsewhere in the authors' code; returns the output CSV path
    csvFile = open(file, 'a')

    # Using csv Writer
    csvWriter = csv.writer(csvFile)
    temp0 = get_input()             # search terms entered by the user (authors' helper)
    for temp1 in temp0:
        for tweet in tweepy.Cursor(API.search, q=temp1, count=10, lang="en", since="2001-04-03").items():
            print(tweet.created_at, tweet.text)
            csvWriter.writerow([tweet.created_at, tweet.text.encode('utf-8')])
Code snippet to tokenize content

import nltk
from nltk.tokenize import word_tokenize
# tex is assumed to be one record from the corpus file, e.g. [timestamp, text]
m = nltk.word_tokenize(tex[1].strip().lower())

B. Pre-Processing of Data
Another important step before proceeding further is preprocessing. Preprocessing of data is required to clean the data and obtain the required data. In this step, all the noisy characters are removed from the text so that it can be analyzed further. Misspelled words, grammatical errors, punctuation errors, unnecessary capitalization, stop words and the use of non-dictionary words such as abbreviations or acronyms of common terms are a few examples of noise in the text.

A typical tweet contains variations of words, emoticons, mentions of users, hashtags, etc. The main goal of the preprocessing step is to standardize the text into a relevant form to derive the sentiments of the user. Following are the steps to pre-process text into useful data for classification:

a. Tokenization-
First of all, the text is tokenized. Tokenization is the process in natural language processing (NLP) by which large textual data is divided into smaller parts called tokens. In other words, tokenization helps to subdivide sentences into groups of words and paragraphs into groups of phrases. This step is a crucial step in NLP.

Tokenization can be of two types:
i) word tokenization
ii) sentence tokenization

We are using the nltk word_tokenize() function to tokenize our textual data and split a sentence into words. The output of the tokenization is then converted into a data frame. Now, tokenization of text from the corpus is done in three ways using nltk, i.e. unigram, bigram, and n-gram. These text models can also be used on tokenized sentences.
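A minimal sketch of these tokenization views with NLTK (the sample sentence is illustrative only):

import nltk
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.util import ngrams

# nltk.download('punkt')  # needed once for the tokenizers

text = "Triple talaq has been declared illegal. People shared many opinions."  # illustrative sample

sentences = sent_tokenize(text)       # sentence tokenization
words = word_tokenize(text.lower())   # word tokenization (unigrams)
bigrams = list(ngrams(words, 2))      # bigrams; ngrams(words, n) gives general n-grams

print(sentences)
print(words)
print(bigrams[:3])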
b. POS Tagging-
The second step of pre-processing the data is POS tagging. A parts-of-speech tagger is very useful as it reads the text and assigns a part of speech
(i.e. noun, pronoun, verb, adjective, etc.) to each word.

Code example:
text = word_tokenize("This sentence is written to check pos tagging in nltk")
nltk.pos_tag(text)
Output: [('This', 'DT'), ('sentence', 'NN'), ('is', 'VBZ'), ('written', 'VBN'), ('to', 'TO'), ('check', 'VB'), ('pos', 'NN'), ('tagging', 'VBG'), ('in', 'IN'), ('nltk', 'NN')]

c. Lemmatization:
The third step of preprocessing is lemmatization. It is an algorithmic process of finding the lemma of a word depending on its meaning. Lemmatization usually refers to the linguistic analysis of words. The main aim of this process is to remove any inflection at the ending of a word.

Text pre-processing uses both stemming and lemmatization. They seem similar but are different, because the stemming method cuts the suffix from the word, i.e. either the ending or the beginning of a word, which sometimes makes the word meaningless. For example, the stem for "studies" is "studi", which indeed has no meaning in the dictionary. On the other hand, lemmatization is a much better and more powerful method, as it also considers the morphological analysis of a word, which helps convert the word into its base form without changing its meaning. For example, the lemma for "studies" is "study". So we can say that lemmatization is a smart method and stemming is a generic method. Hence, lemmatization will help in creating better machine learning features.

Code Example:
from nltk.stem import WordNetLemmatizer
wordnet_lemmatizer = WordNetLemmatizer()
wordnet_lemmatizer.lemmatize("teaches")
Output: 'teach'

Also, for verbs, we use "v" as an argument to pos, as the default pos argument is "n".

Code Example:
wordnet_lemmatizer.lemmatize("is", pos="v")
Output: 'be'

d. Stop words removal
Stop words removal is one of the major pre-processing steps as it is used to filter out useless data. In natural language, stopwords are the frequently used words such as "is", "am", "are", "an", "the", etc., which have very little meaning. These words are ignored by search engines while indexing entries for searching and retrieving them, and text-processing tools are commonly configured to ignore them. We are removing these words as they do not add any value to our analysis.

Code snippet for removing stop words and punctuation in the tweet data
import string
from nltk.corpus import stopwords
extra = ["`", "``", "''"]
stop_words = set(stopwords.words("english") + list(string.punctuation) + list(extra))
e. Translation and Language Detection
Last but not least, this step is used to detect and translate a given language into a required language. We are using TextBlob, a python library, for this task. TextBlob is an amazing tool which makes NLP faster and easier to work with, and this translation feature is one of its best features. Let us look at some code examples and see how it works:

Code Example:
from textblob import TextBlob
blob = TextBlob("مرحبا بالعالم")
blob.detect_language()
Output: 'ar'

Code Example:
blob.translate(from_lang="ar", to='en')
Output: TextBlob("Hello World")

V. Labeling of Data
This step is further subdivided into four steps involving various APIs to label the text as negative, positive or neutral. The APIs we are using are
TextBlob, NLTK, SentiWord, and Vivekn. We are using these four APIs to label the tweet text and then we will compare and contrast them. These APIs are freely available and are usually used in the lexicon-based approach. The pre-processing steps are almost common to all of these APIs, i.e. tokenization, POS tagging, stemming and lemmatization, stop words removal, etc. Vivekn labels sentiment using words, n-grams, and phrases, whereas TextBlob uses Parts-of-Speech (POS) tagging to label the sentiments. Four python scripts are written for labeling the sentiment by accessing their APIs.

A. TextBlob
TextBlob is a python library that helps in various natural language processing tasks by providing a simple API. TextBlob has an easy-to-use interface, hence it is beginner friendly. It is fairly simple to learn TextBlob when compared to other open source libraries, which is one of its key features.

It is basically used for text processing, which includes tasks such as tokenization, Parts-of-Speech tagging, stemming, lemmatization, stopwords removal, translation, noun phrase extraction, sentiment analysis by labeling, classification using machine learning algorithms, and more. It has a sentiment property which returns a tuple of the form Sentiment(polarity, subjectivity). The polarity score is a floating point number within the range [-1.0, 1.0] (where -1 means a negative statement and 1 means a positive statement). The subjectivity is a floating point number within the range [0.0, 1.0] (where 0.0 is very objective and 1.0 is very subjective); subjectivity is basically the person's opinion or emotion on a particular topic. If you want to work on basic NLP tasks, TextBlob is an excellent open source choice. In fact, TextBlob performs better than NLTK for textual analysis.
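As a quick illustration of the sentiment property described above (the sample sentence is ours and purely illustrative, not a tweet from the dataset):

from textblob import TextBlob

blob = TextBlob("This bill protects the rights and respect of women")  # illustrative sentence
print(blob.sentiment)           # a Sentiment(polarity=..., subjectivity=...) tuple
print(blob.sentiment.polarity)  # a value in [-1.0, 1.0]; > 0 is treated as positive in the labeling loop below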
Code snippet:
from textblob import TextBlob

# counters for the three classes
text_pos = text_neg = text_confused = 0
for m in tweets:
    tem = 3
    analysis = TextBlob(m)
    if analysis.sentiment.polarity > 0:
        tem = 2
        text_pos = text_pos + 1
    elif analysis.sentiment.polarity == 0:
        tem = 0
        text_confused = text_confused + 1
    elif analysis.sentiment.polarity < 0:
        tem = 1
        text_neg = text_neg + 1

B. SpaCy
SpaCy is a free and open source library which is used for advanced NLP (natural language processing). SpaCy is gaining popularity at a very fast rate and is said to be overtaking NLTK. SpaCy is lightning fast, highly accurate and easy to run. It also works well with other tools like TensorFlow, Scikit-Learn, PyTorch and Gensim, and provides models for Named Entity Recognition, Dependency Parsing and Part-of-Speech tagging. This library works best while preprocessing data for deep learning. Some of its other features include pre-trained word vectors, support for more than 31 languages, and easy model packaging and deployment. State-of-the-art speed is its best unique feature, and spaCy v2.0 features neural models for tasks such as tagging, parsing and entity recognition.
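Before the sentiment loop, a minimal sketch of spaCy's basic pipeline usage (the model name en_core_web_sm is an assumption; the paper does not state which English model was used):

import spacy

nlp = spacy.load("en_core_web_sm")  # an English pipeline (assumed)
doc = nlp("The Supreme Court of India examined the triple talaq bill in 2017.")

# Part-of-speech tags and named entities come from the pre-trained pipeline.
for token in doc:
    print(token.text, token.pos_, token.dep_)
for ent in doc.ents:
    print(ent.text, ent.label_)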
Code snippet:
import spacy
from spacy.tokens import Doc
from nltk.sentiment.vader import SentimentIntensityAnalyzer

sentiment_analyzer = SentimentIntensityAnalyzer()

def polarity_scores(doc):
    return sentiment_analyzer.polarity_scores(doc.text)

Doc.set_extension('polarity_scores', getter=polarity_scores)

nlp = spacy.load("en_core_web_sm")  # an English model; the exact model used is not stated in the paper
spacy_pos = spacy_neg = spacy_confused = 0
for m in tweets:
    tem = 3
    doc = nlp(m)
    if doc._.polarity_scores["pos"] == doc._.polarity_scores["neg"]:
        tem = 0
        spacy_confused = spacy_confused + 1
    elif doc._.polarity_scores["neg"] > doc._.polarity_scores["pos"]:
        tem = 1
        spacy_neg = spacy_neg + 1
    elif doc._.polarity_scores["pos"] > doc._.polarity_scores["neg"]:
        tem = 2
        spacy_pos = spacy_pos + 1

VI. Visualization
Data in visual form makes more sense than data in textual form. The representation of data using charts and graphs is more presentable and helps in understanding facts and figures better.

In this step, we are using statistics and mathematical functions to organize our data in graphical format. Data visualization helps us in understanding changing patterns or trends in the data over time. It also helps in the comparison of different sets of data.

We are using the following graphs for basic data visualization:
● Line Plot
● Bar Chart
● Histogram Plot
● Pie Chart

We will do the following analysis and visualize the data in the form of various graphs.

A. Pie charts of our results of APIs
In this, we have visualized how many tweets are classified into which category (positive, negative or neutral) in the form of pie charts for each API.

B. Bar graph of our results of APIs
In this, we have visualized how many tweets are classified into which category (positive, negative or neutral) in the form of a bar graph for each API.

C. Comparison of our APIs used
In this, we have compared our APIs and have determined the accuracy of each API that we have used. This accuracy has been shown using a histogram.

● Sentiment analysis visualization results using various APIs

Pseudocode: To create pie chart
import matplotlib.pyplot as plt

# Data to plot
labels = 'Positive', 'Negative', 'Neutral'
sizes = [text_pos, text_neg, text_confused]
colors = ['gold', 'yellowgreen', 'lightskyblue']
explode = (0.1, 0, 0)  # explode 1st slice

# Plot
plt.pie(sizes, explode=explode, labels=labels, colors=colors,
        autopct='%1.1f%%', shadow=True, startangle=140)
plt.axis('equal')
plt.title("Sentimental Analysis Using TEXTBLOB",
          bbox={'facecolor': '0.8', 'pad': 5})
plt.show()

Fig 3. Pie chart of data labelling using TextBlob

Fig 4. Pie chart of data labelling using SpaCy
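Only the pie-chart code is shown in the paper; a minimal sketch of the grouped bar chart used for the comparison in B and C, assuming the counters from the two labeling loops above are in scope:

import matplotlib.pyplot as plt
import numpy as np

labels = ['Positive', 'Negative', 'Neutral']
x = np.arange(len(labels))
width = 0.35

# Counts come from the labeling loops above (text_* for TextBlob, spacy_* for SpaCy).
textblob_counts = [text_pos, text_neg, text_confused]
spacy_counts = [spacy_pos, spacy_neg, spacy_confused]

plt.bar(x - width / 2, textblob_counts, width, label='TextBlob')
plt.bar(x + width / 2, spacy_counts, width, label='SpaCy')
plt.xticks(x, labels)
plt.ylabel('Number of tweets')
plt.title('Comparison of TextBlob and SpaCy labelling')
plt.legend()
plt.show()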

Fig 5. Bar graph of SpaCy and TextBlob APIs

Fig 6. Histogram of SpaCy and TextBlob APIs showing comparison

The pie charts (Fig 3 and Fig 4) show the results for the SpaCy and TextBlob APIs. SpaCy shows better results than TextBlob, as its share of neutral labels is smaller. However, SpaCy took more time than TextBlob for the analysis; hence, SpaCy is slower. To get the most accurate final results, we created a text file which contains the result for each individual tweet, and we take the mode of both APIs to check whether the tweet is Positive, Negative or Neutral.
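The paper does not show code for taking the mode of the two APIs; a hedged sketch, where textblob_labels and spacy_labels are illustrative per-tweet label lists and the tie-breaking rule is our assumption:

def combine_labels(textblob_label, spacy_label):
    # If the two APIs agree, keep that label; otherwise fall back to 'Neutral'
    # (one reasonable tie-breaking rule; the paper does not spell out its rule).
    if textblob_label == spacy_label:
        return textblob_label
    return 'Neutral'

final_labels = [combine_labels(t, s) for t, s in zip(textblob_labels, spacy_labels)]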
Table 1. Difference between TextBlob and SpaCy on the basis of their recent repositories

Repository     TextBlob                      SpaCy
Stars          6,130                         13,094
Watchers       283                           537
Forks          805                           2,214
Last Commit    About 2 months ago            2 days ago
Code Quality   L3                            -
Language       Python                        HTML
License        MIT License                   MIT License
Tags           Text Processing, Natural      Natural Language Processing,
               Language Processing,          Linguistic, Scientific,
               Linguistic                    Engineering

VII. CONCLUSION
In this paper, we used Twitter's tweets on triple talaq and analysed how positive, negative or neutral each tweet is. We used two well-known APIs, i.e. TextBlob and SpaCy. We compared the results of the two APIs and found that TextBlob is faster than SpaCy, but SpaCy produced more accurate results (refer to Fig 3 and Fig 4). With a gentle learning curve and a surprising amount of functionality, TextBlob has become one of the best libraries at the beginner level. It makes text processing simple by providing an intuitive interface to NLTK. TextBlob can be used for initial prototyping in almost every NLP project. On the other hand, SpaCy is the new kid on the block, and it is becoming quite sensational. It is marketed as an "industrial-strength" Python NLP library that is geared toward performance. SpaCy is minimal and opinionated, and it does not present you with plenty of options the way NLTK does. It gives the best algorithm for each purpose, so we do not have to waste time choosing an optimal algorithm and can just focus on productivity. SpaCy is built on Cython and is lightning-fast, but in our experiments it was not as fast as TextBlob. SpaCy is known as "state-of-the-art," but its main weakness is that it currently only supports English.

SpaCy is new but growing at a good pace and will probably overtake NLTK. If one is building a new application or revamping an old one, then one must try SpaCy.

REFERENCES

[1] Taboada, M., Brooke, J., Tofiloski, M., Voll, K., & Stede, M. (2011). Lexicon-based methods for sentiment analysis. Computational Linguistics, 37(2), 267-307.
[2] M. Ravichandran, G. Kulanthaivel, and T. Chellatamilan. Intelligent Topical Sentiment Analysis for the Classification of E-Learners and Their Topics of Interest. The Scientific World Journal, Hindawi Publishing Corporation, Volume 2015, Article ID 617358, 8 pages.
[3] Sudhir Kumar Sharma. Sentiment Predictions Using Deep Belief Networks Model for Odd-Even Policy in Delhi. International Journal of Synthetic Emotions, Volume 7, Issue 2, July-December 2016.
[4] Anaïs Collomb, Crina Costea, Damien Joyeux, Omar Hasan and Lionel Brunie. A Study and Comparison of Sentiment Analysis Methods for Reputation Evaluation. University of Lyon, INSA-Lyon, F-69621 Villeurbanne, France.
[5] Braja Gopal Patra, Dipankar Das, and Amitava Das. Sentiment Analysis of Code-Mixed Indian Languages: An Overview of SAIL_Code-Mixed Shared Task @ICON-2017. March 2018.
[6] Prabu Palanisamy, Vineet Yadav and Harsha Elchuri. Serendio: Simple and Practical Lexicon Based Approach to Sentiment Analysis. Serendio Software Pvt Ltd, Guindy, Chennai 600032, India.
[7] Haseena Rahmath P, Tanvir Ahmad. Sentiment Analysis Techniques - A Comparative Study. IJCEM International Journal of Computational Engineering & Management, Vol. 17, Issue 4, July 2014. ISSN (Online): 2230-7893.
[8] Matthew A. Russell, Mikhail Klassen. Mining the Social Web, 3rd Edition.
