
Theory Digital Assignment

Aditya Sarangarajan
20BCE0985
Theory Slot: E2
Mathematical Modelling for Data Science

Comparing Various Machine Learning Algorithms for Intrusion Detection Systems

For each algorithm, the research paper, the algorithm, the reported result and the method are listed.
Research Paper: Bayesian Based Intrusion Detection System (Journal of King Saud University - Computer and Information Sciences, Volume 24, Issue 1, January 2012)
Algorithm: Bayesian Classifier
Result: The Bayesian classifier identifies threats with an accuracy of 99.03%.
Method: The Bayesian IDS is built out of a naïve Bayesian classifier. The classifier is anomaly based: it works by recognizing that features have different probabilities of occurring in attacks and in normal TCP traffic. The filter is trained by giving it classified (labelled) traffic, from which it adjusts the probabilities for each feature. After training, the filter calculates the probabilities for each TCP connection and classifies it as either normal TCP traffic or an attack. The Bayesian filter therefore consists of the following two components:

(i) Training engine
Each input record has a label describing the type of connection. This label is used to train the engine as follows:

First, the numbers of good (normal) records and bad (attack) records in the training dataset are counted.

Then two hash tables are created: the first holds the frequency of each attribute in the normal records, and the second holds the frequency of each attribute in the not-normal records.

Finally, a third hash table is created. It contains each attribute from the normal and not-normal records, and each attribute is scored from the following two quantities:

B: the frequency of that attribute in the hash table for the not-normal records.

G: the frequency of that attribute in the hash table for the normal records.

(ii) Testing
After training, the engine is tested by loading the KDD corrected dataset. A formula over the attribute scores is applied to obtain P(record), the probability used to decide whether the record is normal or an attack, where:

n: the number of attributes used to test the record

score(i): the score of the i-th attribute

The record is considered to be an attack if P(record) is greater than a specified threshold.
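The two components can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the paper's implementation: the scoring and combining formulas are not reproduced in this summary, so the code assumes a Graham-style score, b / (b + g) with B and G normalised by the numbers of bad and good records, and the standard naïve Bayes combination of the n scores. The record format (a dict of attribute to value), the reading of "frequency of an attribute" as the frequency of an (attribute, value) pair, the functions `train`, `p_attack` and `classify`, and the 0.9 threshold are all hypothetical.

```python
from collections import Counter
from math import prod

# ASSUMPTION: a record is a dict {attribute: value}; the "frequency of an
# attribute" is taken as the count of each (attribute, value) pair.

def train(records, labels):
    """Training engine: returns the third hash table, (attribute, value) -> score."""
    n_good = sum(1 for lab in labels if lab == "normal")   # good records
    n_bad = len(labels) - n_good                            # bad records
    good, bad = Counter(), Counter()                        # the two frequency tables
    for rec, lab in zip(records, labels):
        (good if lab == "normal" else bad).update(rec.items())
    scores = {}
    for key in good.keys() | bad.keys():
        b = bad[key] / n_bad if n_bad else 0.0     # normalised B
        g = good[key] / n_good if n_good else 0.0  # normalised G
        s = b / (b + g)                            # ASSUMED Graham-style score
        scores[key] = min(0.99, max(0.01, s))      # keep scores away from 0 and 1
    return scores

def p_attack(record, scores):
    """ASSUMED naive Bayes combination of the n scores known for this record."""
    s = [scores[k] for k in record.items() if k in scores]
    if not s:
        return 0.5                                 # no evidence either way
    attack = prod(s)
    normal = prod(1 - x for x in s)
    return attack / (attack + normal)

def classify(record, scores, threshold=0.9):
    """Testing: an attack if P(record) exceeds the (hypothetical) threshold."""
    return "attack" if p_attack(record, scores) > threshold else "normal"
```

Clamping each score to [0.01, 0.99] is a design choice borrowed from spam filtering: it stops a single attribute value seen only in one class from forcing P(record) to exactly 0 or 1.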

Research Paper: Decision Tree Based Algorithm for Intrusion Detection (January 2016)
Algorithm: Decision Tree
Result: The accuracy in detecting threats using this algorithm was found to be 79.5245%.
Method: The experiments compare the performance of different tree-based classifiers with the DTS algorithm. The analysis is based on several parameters: the time (in seconds) the classifier takes to construct the model, the false positive rate, the true positive rate, and the accuracy. Taking an attack as the positive class, True Positive (TP) counts the instances correctly predicted as an attack, True Negative (TN) counts the instances correctly predicted as normal, False Positive (FP) counts the instances predicted as an attack while they are normal, and False Negative (FN) counts the instances predicted as normal while they are in reality an attack. Accuracy is the proportion of correct predictions, (TP + TN) / (TP + TN + FP + FN). The Receiver Operating Characteristic (ROC) curve is also plotted for the various techniques; it plots the true positive rate (TPR) against the false positive rate (FPR) of an algorithm.
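The evaluation measures above can be computed with a short Python sketch. It treats "attack" as the positive class, as the FP and FN definitions do; the function names and the sample labels and scores in the usage below are hypothetical. The ROC points come from sweeping a decision threshold down through per-record attack scores, which is one common way to draw the curve; the paper's exact procedure is not stated here.

```python
def confusion(y_true, y_pred):
    """Counts (TP, TN, FP, FN) with "attack" as the positive class."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == "attack":
            if actual == "attack":
                tp += 1
            else:
                fp += 1          # flagged as an attack, actually normal
        elif actual == "attack":
            fn += 1              # passed as normal, actually an attack
        else:
            tn += 1
    return tp, tn, fp, fn

def rates(tp, tn, fp, fn):
    """Accuracy, true positive rate and false positive rate."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return accuracy, tpr, fpr

def roc_points(y_true, attack_scores):
    """(FPR, TPR) pairs obtained by lowering the threshold through the scores."""
    points = [(0.0, 0.0)]
    for threshold in sorted(set(attack_scores), reverse=True):
        y_pred = ["attack" if s >= threshold else "normal" for s in attack_scores]
        _, tpr, fpr = rates(*confusion(y_true, y_pred))
        points.append((fpr, tpr))
    return points
```

For example, `rates(2, 1, 1, 1)` gives an accuracy of 0.6, a TPR of 2/3 and an FPR of 0.5.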

Research Paper: Random Forest Modelling for Network Intrusion Detection System (January 2016)
Algorithm: Random Forest
Result: The accuracy in detecting threats using this algorithm was found to be 99.67%.
Method: Random forest is an ensemble classifier used to improve accuracy. It consists of many decision trees and has a low classification error compared to other traditional classification algorithms. Its main parameters are the number of trees, the minimum node size and the number of features used for splitting each node. When the individual trees are constructed, randomization is applied: the best split at each node is chosen from a random subset of the features, and the size of this subset is √A, where A is the number of attributes in the data set [14]. However, a random forest can generate many noisy trees, which affect accuracy and can lead to wrong decisions for new samples.
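The two sources of randomness described above (a bootstrap sample per tree and a random subset of about √A features per split) can be sketched in Python. This is a simplified sketch, not the paper's model: each "tree" is a depth-1 stump, so the feature subset drawn per tree is also the subset examined at its single split, and the minimum node size is ignored. `n_split_features`, `fit_stump`, `fit_forest`, `predict` and the 25-tree default are hypothetical.

```python
import math
import random
from collections import Counter

def n_split_features(n_attributes):
    """Features examined per split: sqrt(A), rounded down, at least 1."""
    return max(1, int(math.sqrt(n_attributes)))

def fit_stump(X, y, features):
    """Depth-1 tree: the (feature, threshold) split with the fewest errors."""
    best = None
    for f in features:
        for thr in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            l_lab = Counter(left).most_common(1)[0][0]      # never empty
            r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
            errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, thr, l_lab, r_lab)
    _, f, thr, l_lab, r_lab = best
    return lambda row: l_lab if row[f] <= thr else r_lab

def fit_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    n, a = len(X), len(X[0])
    k = n_split_features(a)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample, with replacement
        feats = rng.sample(range(a), k)               # random subset of ~sqrt(A) features
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return trees

def predict(trees, row):
    """Majority vote over all trees."""
    return Counter(tree(row) for tree in trees).most_common(1)[0][0]
```

With A = 41 attributes, for instance, `n_split_features(41)` returns 6, since √41 ≈ 6.4.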

Research Paper: Support Vector Machine Based Intrusion Detection System (IDS) with Different Kernels (International Journal of Electronics Communication and Computer Engineering, July 2013)
Algorithm: Support Vector Machines
Result: The SVM was most accurate with the polynomial kernel, reaching a detection accuracy of 97.64%.
Method: A Support Vector Machine (SVM) is one of the most popular supervised learning algorithms. It can be used for both classification and regression problems, although it is primarily used for classification. The goal of the SVM algorithm is to create the best line or decision boundary that segregates n-dimensional space into classes, so that new data points can easily be put in the correct category in the future. This best decision boundary, the one with the largest margin between the classes, is called a hyperplane. SVM chooses the extreme points/vectors that help in creating the hyperplane; these extreme cases are called support vectors, which is why the algorithm is termed Support Vector Machine. Here different kernels are used, because the SVM performs differently with each kernel.
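A trained SVM classifies a point x by the sign of f(x) = Σ αᵢ yᵢ K(svᵢ, x) + b, where the sum runs over the support vectors. The Python sketch below shows this decision function with three common kernels, using the usual parameterisation (polynomial (γ⟨x, z⟩ + r)^d, RBF exp(−γ‖x − z‖²)); the kernel parameters the paper used are not given here, so the defaults are hypothetical. The model is a hand-worked hard-margin solution for two hypothetical points, (1, 1) labelled normal (−1) and (3, 3) labelled attack (+1): it gives w = (0.5, 0.5) and b = −2, so f = −1 and +1 at the two support vectors.

```python
import math

def linear(x, z):
    return sum(a * b for a, b in zip(x, z))

def polynomial(x, z, degree=3, gamma=1.0, coef0=1.0):
    return (gamma * linear(x, z) + coef0) ** degree

def rbf(x, z, gamma=0.5):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def decision(x, support_vectors, alphas, labels, b, kernel=linear):
    """f(x) = sum_i alpha_i * y_i * K(sv_i, x) + b."""
    return sum(a * y * kernel(sv, x)
               for sv, a, y in zip(support_vectors, alphas, labels)) + b

def svm_classify(x, support_vectors, alphas, labels, b, kernel=linear):
    # ASSUMPTION for this sketch: label +1 means "attack", -1 means "normal".
    f = decision(x, support_vectors, alphas, labels, b, kernel)
    return "attack" if f > 0 else "normal"

# Hard-margin solution (linear kernel) for the two hypothetical points above.
MODEL = ([[1.0, 1.0], [3.0, 3.0]], [0.25, 0.25], [-1, 1], -2.0)
```

The αᵢ and b in `MODEL` are valid only for the linear kernel; switching to `polynomial` or `rbf` means re-solving the SVM optimisation for new αᵢ and b, which is why each kernel yields a different detection accuracy.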

Research Paper: Back propagation neural network approach to Intrusion Detection System (December 2011)
Algorithm: Back propagation neural network
Result: The success rate for detecting threats with this method is 95.6%.
Method: The proposed IDS architecture can be divided into four sub-processes: Data Collector, Pre-processor, Encoder and Neural Network Classifier. Their functions are briefly described below.

Data Collector: the NSL-KDD data set is first collected in this block.

Pre-processor: this block takes the original data from the MIT Lincoln Lab, extracts the required features, and converts the data set into a Matlab-compatible format. In effect, it performs the data cleaning.

Encoder: the attributes in the data set are converted to the double data type to make them compatible with the ANN Toolbox of Matlab.

Neural Network Classifier: the output of the encoder stage is fed into the neural network, which is trained with back propagation.

The results given are the highest detection rates reported for each type of algorithm in detecting various types of threats. According to the data above and the research papers, the least accurate method for detecting threats is the Decision Tree method, with an accuracy of 79.5245%, and the most successful is the Random Forest algorithm, with an accuracy of 99.67%. The experiments above were conducted on the NSL-KDD dataset, except that the Bayesian classifier was tested on the KDD corrected dataset.
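The comparison can be reproduced by ranking the reported figures. The values below are the ones listed in this assignment; treating the neural network's "success rate" as an accuracy is an assumption made for the ranking.

```python
# Highest reported accuracy (%) per algorithm, as listed above.
reported = {
    "Bayesian classifier": 99.03,
    "Decision tree": 79.5245,
    "Random forest": 99.67,
    "SVM (polynomial kernel)": 97.64,
    "Back propagation neural network": 95.6,   # reported as a success rate
}

# Sort from most to least accurate and print a small table.
ranking = sorted(reported.items(), key=lambda kv: kv[1], reverse=True)
for name, acc in ranking:
    print(f"{name:35s} {acc:8.4f}%")

least = min(reported, key=reported.get)
best = max(reported, key=reported.get)
```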
