
CHAPTER- 2

LITERATURE SURVEY

2.1 INTRODUCTION

This chapter provides an in-depth look at existing approaches in cloud security and attack
detection technologies, with the aim of giving comprehensive background on the area in which
the research fits. The chapter is structured around seven main sections. Section 2.2 outlines
the state of the art in cloud security and cloud attacks as they have developed over past
decades. Section 2.3 considers the role of effective intrusion detection in securing cloud
systems and devices, emphasising the need to move away from reliance on the network threat
detection mechanisms that are widely deployed in systems. Section 2.4 explores the role of
honeypot technology as a tool for data collection and attack analysis. Section 2.5 presents
earlier feature selection approaches for intrusion detection systems. Section 2.6 discusses
the state of the art in machine learning based classification for intrusion detection
systems. Section 2.7 explores earlier ensemble learning approaches for IDS. Section 2.8
outlines the proposed research work.

2.2 CLOUD SECURITY

Cloud computing provides many new facilities to the user community. As a result, cloud
security has become a major challenge for many researchers and internet users, because new
kinds of attacks by malicious users emerge every day. In order to alleviate these security
threats in the cloud, it is important to impose security requirements on the data and
services of the cloud [9]. Many serious security concerns have emerged in the recent past, in
areas such as network security, data security (maintaining the secrecy and privacy of
personal data), data access and maintenance, application security, web security and
virtualization security. Different security issues have arisen from the vulnerability of the
cloud computing system. Therefore, there is an urgent and strong requirement to identify
unique security threats and vulnerabilities by evaluating cloud networks [12].

An intrusion detection system (IDS) for a cloud system can be used to detect several types
of attacks, mainly denial-of-service and masquerade attacks, that may compromise the security
and trust of the cloud system [10]. An IDS is software that protects the servers and nodes of
a cloud system from malicious actions. Many methods have appeared in the literature on cloud
security in recent years, but IDSs incorporating intelligent classification techniques have
proved more effective in intrusion detection than other techniques used for detecting cloud
intrusions. In this chapter, a detailed literature survey on IDS is provided with suitable
discussions. Figure 2.1 depicts the overall structure of the state-of-the-art analysis for
IDS.

Figure 2.1 Overall structure of the state-of-the-art analysis

2.2.1 Cloud Security Attacks


The Cloud Security Alliance (CSA) identifies the malicious insider attack as one of the
leading threats to cloud resources [2]. Because this attack is so severe, cloud security is
an essential research issue. Junaid Arshad et al. [9] discussed the insider threat from three
different perspectives and investigated the cloud system structures that attract malevolent
insiders. They characterise the malicious insider as: (1) a malevolent insider in the form of
a dishonest cloud administrator; (2) a worker in the organization who reveals private
information to others; and (3) an insider who uses cloud resources to attack the
infrastructure. They did not provide any solution to the problem of detecting attackers. The
cloud system provides storage resources through Infrastructure as a Service (IaaS), which
makes use of cloud virtualization technology [15].

Preeti Mishra et al. [31] made a thorough analysis of IDS in cloud computing systems.
Multiple IDSs were placed at each cloud infrastructure layer to protect every VM in the cloud
environment. The cloud alliance concept was adopted in their work to enable communication
among the several IDSs. Based on the detection technique, an IDS can be classified as either
(1) behaviour-based or (2) knowledge-based. A behaviour-based IDS detects unknown attacks by
monitoring deviations from the normal behaviour of users or systems. A knowledge-based IDS
identifies known threats by using threat patterns and the expert rules of the system.
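
The difference between the two detection styles can be illustrated with a minimal Python sketch. It is not taken from any of the surveyed systems: the signature strings, the function names and the three-standard-deviation threshold are all assumptions chosen purely for illustration.

```python
from statistics import mean, stdev
from typing import Sequence

# Hypothetical signature list for the knowledge-based detector.
KNOWN_SIGNATURES = ("' OR 1=1", "../../etc/passwd")


def knowledge_based_alert(payload: str) -> bool:
    """Flag a request whose payload contains a known attack pattern."""
    return any(sig in payload for sig in KNOWN_SIGNATURES)


def behaviour_based_alert(history: Sequence[float], observed: float,
                          k: float = 3.0) -> bool:
    """Flag an observation that deviates from past behaviour.

    `history` holds past measurements for one user (for example,
    requests per minute); an alert is raised when `observed` lies more
    than k standard deviations from their mean.
    """
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return observed != mu
    return abs(observed - mu) > k * sigma
```

The knowledge-based check can only recognise patterns it has been given, whereas the behaviour-based check can flag previously unseen activity at the cost of possible false alarms when legitimate behaviour changes.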

Chirag Modi et al. [32] developed a methodology that treats security as an essential part
of the design and implementation of cloud systems. It was meant for the implementation of
secure cloud applications and services. A set of stereotypes was used to define a vocabulary
for annotating Unified Modelling Language (UML) based models with information relevant for
integrating the security specifications into cloud architectures.

Taeshik Shon & Jongsub Moon [39] proposed a network intrusion detection system based on
user behaviour stored in a profile. The system creates a profile for each legitimate user.
Their IDS combines behaviour-based and knowledge-based detection so that the two complement
each other. Any deviation from the legitimate user's actions is therefore identified as an
intrusion. However, this system cannot detect an intruder who imitates a legitimate user.

Takeshi Takahashi et al. [41] employed a hybrid and categorised event correlation
methodology to detect intrusions using a knowledge base represented as an ontology. This work
uses distributed security probes and a complex event processing engine to identify intrusions
at various levels of the cloud architecture.

A hypervisor, also referred to as a Virtual Machine Monitor (VMM), is software used to
control and monitor the behaviour of VMs and middleware in cloud systems. Hence, a feasible
solution is to apply the intrusion detection framework in the cloud [42]. Taking the dynamic
nature of cloud systems as a starting point, Kleber Vieira et al. [43] suggested the idea of
Virtual Machine Introspection (VMI). They designed a prototype named Livewire that exploits
the functionalities of a host-based IDS and draws on those of a network-based IDS to identify
network-related attacks in the virtual cloud network. The authors proposed a novel
architecture for developing an IDS with a semantically rich view and high resistance to
attack through virtual machine introspection [44].

Several researchers have proposed cloud architectures for securing cloud environments in
which organizations have to prepare adequate security controls. These include a secure
virtualized architecture [45], an architecture named CASViD (Cloud Application SLA Violation
Detection Architecture) [46] and the Insider Threat Security Reference Architecture (ITSRA).
Decoy information technology has also been used to confuse the attacker. H. Monowar et al.
[47] developed a detection component called VMWatcher, which supports anti-malware software
against outsider attacks only. However, no one has used the SLA violation detection method
for malicious insider detection. Likewise, artificial neural networks have been used for the
detection of malicious insiders in a cloud environment.

Karen Scarfone et al. [48] offered a new intrusion detection and prevention system to
explore attacks involving various nondeterministic actions. They proposed a new direction
for checking intrusive activities while keeping the time and space overhead reasonable. This
model uses introspection and replay to improve security by identifying malicious activities
that occurred even before the vulnerability was detected [49].

Several researchers have developed novel IDSs that identify attackers using various machine
learning techniques. Hi-Fong Tsai et al. [50] used log-based approaches and data mining
techniques. Heber Ezra et al. [51] created a new IDS for identifying DoS attacks using
machine learning techniques with Gain Ratio.

2.3 INTRUSION DETECTION SYSTEMS

Intrusion detection systems are used for detecting intrusive activities in a computer,
network or cloud system. Most Intrusion Detection Systems (IDSs) are based on signatures
that are formed by manually coding expert knowledge. These systems match the activity on the
system being monitored, whether normal or malicious, against known attack signatures. The
main problem with this approach is that such IDSs fail to generalize from existing attacks in
order to detect new attacks or attacks without known signatures. Recently, there has been
increased interest in the application of data mining techniques for building security models
for IDSs. These models are capable of predicting known attacks and normal behaviour, and
hence they are able to detect unknown attacks. One limitation of existing IDSs is that they
are not capable of performing difficult analysis tasks on audit data using the knowledge of
domain experts. Though several methods have been proposed for intrusion detection in
distributed databases by various researchers [23,24,25], they are not sufficient to
safeguard the data stored in cloud databases.

In the past, intrusion detection systems were used for the detection of only specific types
of attacks. An integrated approach was developed by A. Hisham et al. [26] to detect all types
of attacks and guarantee the safety of cloud users.

D.P. Gaikwad et al. [69] modelled a multithreading approach to improve the performance of
IDS in the cloud environment. This multithreaded IDS works with three modules, namely the
capture module, the analysis module and the reporting module, which help it handle large
flows of data packets. The modules are as follows: (1) the capture module observes the data
packets on the network; (2) the analysis module analyses the observed data packets using the
stored set of rules; and (3) the reporting module generates reports based on the alerts
issued. However, this system is used for the detection of only network-based attacks.
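
The three-module arrangement can be sketched as a small pipeline of threads joined by queues. This is only an illustration of the general structure and not the implementation of [69]: the packet format, the rule set of suspicious ports and all function names are assumptions.

```python
import queue
import threading

# Hypothetical rule set: destination ports treated as suspicious.
SUSPICIOUS_PORTS = {23, 445}
_DONE = object()  # sentinel that tells the next stage to stop


def capture(packets, out_q):
    """Capture module: hand each observed packet to the analysis stage."""
    for pkt in packets:
        out_q.put(pkt)
    out_q.put(_DONE)


def analyse(in_q, out_q):
    """Analysis module: check each packet against the stored rules."""
    while (pkt := in_q.get()) is not _DONE:
        if pkt["dst_port"] in SUSPICIOUS_PORTS:
            out_q.put({"src": pkt["src"], "dst_port": pkt["dst_port"]})
    out_q.put(_DONE)


def report(in_q, alerts):
    """Reporting module: collect the issued alerts into a report list."""
    while (alert := in_q.get()) is not _DONE:
        alerts.append(alert)


def run_pipeline(packets):
    """Run the three modules concurrently and return the alert report."""
    captured, flagged, alerts = queue.Queue(), queue.Queue(), []
    stages = [
        threading.Thread(target=capture, args=(packets, captured)),
        threading.Thread(target=analyse, args=(captured, flagged)),
        threading.Thread(target=report, args=(flagged, alerts)),
    ]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return alerts
```

Because each module runs in its own thread, packet capture does not have to wait for analysis or reporting to finish, which is the motivation usually given for a multithreaded design.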

Chirag Modi et al. [33] discussed the security and privacy threats against the cloud and
recent techniques to overcome them. They analysed the issues from different perspectives,
including stored data security, user identity management, and secured virtualized
environments.

Gustavo Nascimento & Miguel Correia [35] proposed a hybrid architecture for detecting
intrusions in the distributed cloud environment and protecting it from possible security
threats. This system uses an individual IDS for every user in the cloud, with a controller
managing the several IDS instances. Since a signature-based IDS is used in this system, it is
not capable of detecting all attacks. A survey of past cloud-based IDSs in terms of their
type, deployment, detection method, source of data, detection accuracy and types of attack
was carried out by Ahmed Patel et al. [36]. This research deals with the restrictions and
limitations of each technique in order to evaluate the further security requirements of the
cloud computing environment. They highlighted the use of IDSs that employ multiple detection
methods for handling the security challenges.

Ryan Shea et al. [37] used a Snort-based IDS on Linux to guarantee secure and efficient
management in a virtualized environment. The Snort-based IDS was used to analyse network
traffic and the flow of data in networks. Snort analyses network traffic using rules that
the administrator writes in Snort's own rule language.

K. Parag et al. [38] considered the trade-off between the security level of an IDS and
system performance in providing secure and reliable services. They found that the system
consumed large computational resources as the level of security rose, that is, when more
security rules were used. They also stressed that a huge volume of log detail is very hard
for system administrators to analyse. This problem led to the proposal of a technique to
balance the trade-off between system resources and level of security.

An IDS based on collected audit data was proposed by H.H. Kayacik et al. [17]. This work
describes an IDS for cloud architecture that uses agents to collect audit data and pass it to
a storage and analysis service. However, it gives no details of how the data is used, and the
system is not scalable. M.M. Najafabadi et al. [20] proposed an IDS that overcomes the
scalability problem by distributing the IDS among different analysis servers.

J. Mirkovic et al. [19] recommended a performance-based IDS for the detection of DoS attacks
through analysis of the network traffic. Even though the system uses the abundant resources
of the cloud, it detects only network-based attacks.

B. Li et al. [16] surveyed the different intrusions affecting cloud computing security with
respect to the availability, confidentiality and integrity of cloud services and resources.
They discussed the integration of various IDSs and their detection techniques.

Mrutyunjaya Pandaa et al. [53] proposed a network-based intrusion detection system (NIDS)
for cloud systems which uses a hybrid approach based on fuzzy clustering and an artificial
neural network. Its purpose is to detect and analyse the activities of the cloud network.
This NIDS is located outside the virtual servers, on devices such as switches, routers or
gateways, so that it can examine the complete system. Since a network-based IDS is suitable
only for traditional network-based systems, this model cannot obtain a high detection
accuracy. The system forwards intrusion alerts to a third-party monitoring service, which
can in turn directly alert the cloud user about the system under attack. If the third party
is compromised, the monitoring service is neutralized.

Jun-Ho Lee et al. [13] designed a distributed IDS which integrates knowledge and behaviour
analysis to detect intrusions, so that it can handle a large volume of traffic data and
administrative control. However, problems remain in the form of low detection accuracy and a
high false positive rate.

IDSs developed for wired and wireless networks are different from the IDSs
developed for cloud systems due to the structure and properties of the cloud. Compared with
wired and wireless networks where traffic monitoring for security is usually performed at
networking devices, monitoring of the cloud requires additional methods to take care of
virtualized environments [24]. Therefore, the intrusion detection algorithms must work on
both networks and cloud.

Claudio Mazzariello et al. [30] suggested a distributed intrusion detection system for
networks in which every node participates in intrusion detection activities and hence acts
as a host-based IDS. On the other hand, the network-based IDS model is more suitable for
cloud databases. Intelligent techniques can be used for making effective decisions in
intrusion analysis. Therefore, soft computing techniques and other techniques from
Artificial Intelligence (AI), such as intelligent agents, can be used to build an
intelligent intrusion detection system.

S.N. Dhage et al. [23] proposed an Intrusion Detection System architecture for the
deployment of security in distributed cloud infrastructure. They added a new plugin concept,
the Event Gatherer, which is realized as Handler, Receiver, and Sender components to provide
flexible integration of various sensors. The Intrusion Detection Message Exchange Format
standard is used in their work to represent alarms. They developed a user-friendly interface
for viewing the results. The architecture also combines a virtual machine monitor and
virtualization techniques for handling Virtual Machine (VM) based IDS.

P. Srinivasu et al. [116] proposed a new algorithm based on a genetic algorithm (GA) to
detect intrusions in computer networks. They used an effective feature selection method and
an information theoretic approach to extract relevant features and reduce time complexity.
Moreover, they developed new methods for rule firing and hence improved the classification
accuracy. S.P. Muhammad et al. [119] proposed a GA-based algorithm for effective misuse
intrusion detection. In their work, they performed effective pre-processing in order to
enhance the accuracy of classification. The main limitation of their work is that they
focused on specific attacks from the KDD Cup dataset.

J. Kevric et al. [109] proposed a new intrusion detection model with a tree induction
algorithm that automatically classifies the attacks in a network environment. They also used
a decision tree (DT) algorithm to classify the network attacks. When they evaluated the
methods on a cloud dataset, however, the results showed rather low performance. D. Gaikwad
et al. [65] proposed a DT algorithm with feature reduction in the design of an IDS using a
GA-based approach. Their algorithm was developed by enhancing the C4.5 algorithm.

A. Snehal et al. [34] recommended a new intrusion detection model based on fuzzy logic.
Their work uses a decision tree data mining approach for effective intrusion detection on
TCP dump data. Gisung Kim et al. [63] suggested a novel misuse-based intrusion detection
system that uses two algorithms, namely decision tree and clustering algorithms, to identify
known and unknown intrusions effectively. Rajeswari & Kannan (2008) proposed a rule-based
approach that used a modified version of the C4.5 algorithm called the Enhanced Decision
Tree algorithm. The algorithm was tested with the KDD Cup 99 data set and showed an
improvement in classification accuracy. In spite of these existing works, the security of
cloud networks is not complete without a suitable IDS for the cloud. Hence a new IDS for the
cloud is proposed in this thesis.

2.4 HONEYPOT AND HONEYNET FOR INTRUSION DETECTION

Honeypots are devices which imitate legitimate devices and systems in order to detect, track
and analyse patterns of user behaviour when the system is illegally accessed. They are
categorised as an active network defence mechanism, and are deployed solely with the
intention that they will be attacked. A key characteristic of a successful honeypot is its
attractiveness to an attacker. 'Attractive' in this context means that the honeypot should
appear to be exactly the device that the attacker is looking for: a device that has the
potential to be easily exploited, whilst offering maximum value to the attacker.

Honeypots have been used widely for research on network traffic and malware. Lance Spitzner
[32], the founder of the Honeynet Project, defines a honeypot as "a security resource whose
value lies in being attacked or compromised". Honeypots leverage the concept of deception in
order to combat attackers. As explained in a paper by Cliff et al. [35], the crucial parts of
cyber deception are the fabricated information planted by the defender and the wrong actions
taken by the adversary as a result of the deception.

Dowling et al. developed a research honeypot targeted at attackers of Zigbee devices, which
are commonly used in WSNs [14]. The development of this honeypot was motivated by the fact
that as these IoT devices become more widely deployed, their vulnerabilities are also
becoming well understood. Therefore, an assessment of the threat to these devices is
essential. In their implementation of the honeypot, Dowling et al. included a number of
interesting measures to attract the interest of attackers.

Pisarcik et al. (2014) [36] developed a distributed honeynet of high-interaction honeypots
using containers for OS-level virtualisation, an approach which at that time was relatively
unexplored in research [61]. Their motivations for developing such a solution were twofold:
the fact that administration of honeypot-driven systems is time-consuming, which motivates
the use of containers to simplify their management; and the ability of honeypots to
correlate attack events in order to determine whether they are localised or distributed in
nature, allowing attack trends to be identified quickly.

Though the approach to designing and evaluating their honeynet solution is worthy of note,
the more significant contribution made by this research is in relation to the
containerisation of honeypots and honeynets. The research highlights the fact that, compared
to virtual machines or bare-metal systems, OS-level virtualisation incurs very little
performance or maintenance overhead [61]. In addition, they note that using containers adds
a further layer of deception to honeypots: since containers are isolated environments
sharing the kernel of a real OS, they are more likely to appear as a legitimate system when
fingerprinted.

Kedrowitsch et al. [45] explore the use of Linux containers (LXCs) in honeypot deployments
[62]. However, the use of LXCs here is examined from the perspective of their ability to
evade detection by malware, which has often been found to change its behaviour after
detecting a VM-based environment [63]. The motive was to establish whether container
environments will be feasible in the long term as an approach to hosting honeypots without
being detected by malware.

TraCINg is a honeypot-driven cyber incident monitor developed by Vasilomanolakis et al. in
2015 [43]. Its development was motivated by the observation that consolidating data gathered
from honeypots in different deployment contexts allows attack data to be correlated, enabling
the identification of emerging outbreaks of related attacks. The TraCINg system gathers data
from a large number of geographically distributed honeypots with the aim of correlating
attack events. Many of these honeypots are deployed on cloud platforms, an approach chosen by
the researchers to provide the greatest diversity of deployment locations and maximum system
uptime for uninterrupted monitoring. Arbitrary open-source honeypots can be used with the
TraCINg system, provided that their data is logged in JSON format.
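
To make the idea of correlating JSON-logged events concrete, the following minimal sketch groups events by source address and reports the addresses observed by more than one honeypot. It does not reproduce the TraCINg data format: the field names `src_ip` and `sensor`, and the threshold of two sensors, are hypothetical.

```python
import json
from collections import defaultdict


def correlate(json_lines, min_sensors=2):
    """Return source IPs seen by at least `min_sensors` distinct honeypots.

    Each element of `json_lines` is one JSON-encoded event with the
    (assumed) fields "src_ip" and "sensor".
    """
    sensors_by_src = defaultdict(set)
    for line in json_lines:
        event = json.loads(line)
        sensors_by_src[event["src_ip"]].add(event["sensor"])
    return sorted(
        ip for ip, sensors in sensors_by_src.items()
        if len(sensors) >= min_sensors
    )
```

An address that appears at several geographically separate sensors is more likely to belong to a distributed campaign than to a localised probe, which is the kind of distinction such correlation is meant to support.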

This incident monitoring solution is not intended for large-scale, low-cost use in
production systems. Instead, it addresses the problem of aggregating and summarising complex
data in a meaningful manner through visualisation, enabling the succinct description of
important data for key decision-makers. Their proposal for correlating data from large
deployments of distributed honeypots is very interesting. However, their solution uses two
low-interaction honeypots, limiting the level of detail of the attack data captured. An
improved system would have the ability to capture more data about attack events in order to
gain greater insights into the motivations and methods of attackers.

Majithia et al (2017) [8] ran honeypots of three types on a Docker server, with a log
management mechanism built on top of the ELK stack, and discussed the issues and security
concerns associated with each honeypot. The honeypots used were HoneySMB; HoneyWEB-SQLi, an
HTTP honeypot that includes a SQL injection vulnerability; and HoneyDB, a honeypot built for
MySQL database vulnerabilities. The work presented an analysis of the attacks by unique IP
address and their distribution among the honeypots.

Adufu et al. (2015) [9] compared running the molecular modeling simulation software AutoDock
on container-based and hypervisor-based virtualization systems. They concluded that the
container-based systems managed memory resources efficiently even when the memory allocated
to instances was higher than the physical resources available, and that they also reduced
execution times for multiple containers running in parallel.

Stockman et al. (2015) [12] adopted a free honeynet system to capture hackers' illegal
actions. Their experiment ran for 25 days, and almost 200,00 hits were found during this
period. The experiment investigated ways to make services such as web, FTP and database
servers attractive targets for attackers.

2.5 FEATURE SELECTION METHODS FOR INTRUSION DETECTION

This section presents findings from the literature relating to feature selection methods for
intrusion detection systems. Feature selection helps remove insignificant features from the
data set. Reduction can be performed in two ways, namely tuple reduction and attribute
reduction. Feature reduction using attribute selection helps to improve the performance of
the classifier.
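
The two kinds of reduction can be illustrated with a deliberately simple sketch. It assumes one possible reading of the terms, in which tuple reduction drops duplicate records and attribute reduction drops columns that never vary; real methods use far more selective criteria, and the function names are hypothetical.

```python
def reduce_tuples(rows):
    """Tuple reduction: drop duplicate records, keeping first occurrences."""
    seen, kept = set(), []
    for row in rows:
        if row not in seen:
            seen.add(row)
            kept.append(row)
    return kept


def reduce_attributes(rows):
    """Attribute reduction: drop columns that hold a single constant value.

    Returns the reduced rows together with the indices of the columns kept.
    """
    keep = [i for i in range(len(rows[0])) if len({r[i] for r in rows}) > 1]
    return [tuple(r[i] for i in keep) for r in rows], keep
```

For example, with records of the form `(protocol, flag, service)` in which the first two fields are always identical, `reduce_attributes` keeps only the `service` column, so the classifier is trained on fewer, more informative inputs.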

The literature contains several research works on feature selection: Travallaee et al (2009)
[15]; Li et al (2009) [16]; Farhan et al (2010) [17]; Sindhu et al (2012) [18]; Ganapathy et
al (2013) [19]. Verikas (2002) [20] proposed a novel feed-forward neural network based
feature selection algorithm for selecting the features relevant for classification, with a
focus on effective training based on those features. Liu et al (2003) [21] proposed a method
for feature selection based on reduced subset evaluation, which optimizes the search space
using heuristics and prunes it based on the search direction. Srilatha et al (2005) [22]
recommended novel feature selection techniques for providing an optimal number of features
to the intrusion detection system. They applied the selected features to classification with
the Naive Bayes classifier and regression tree-based classification algorithms. Chi-Ho Tsang
& Kwong (2005) [23-24] proposed a genetic feature selection algorithm which provides an
optimal feature subset through row and column reduction. All these works are based on better
activation functions for neural classifiers.

Dimensionality reduction can be done by either feature selection or feature extraction
(Guyon and Elisseff (2003) [25,26], van der Maaten et al. (2008) [27]). Feature extraction
reduces data dimensionality by projecting the data into a lower dimensional space formed by
combinations of features. Feature selection is especially useful with high dimensional data
where there are thousands of features (Kohavi and John (1997) [28]). In such data sets, it is
necessary to find an optimal feature subset. Guyon and Elisseff (2003) [29] proposed a new
feature selection algorithm and proved its efficiency in improving the performance of
learning models built on the selected features.

Bovas Abraham & Giovanni Merola (2005) [31-32] have proposed a feature selection
methodology for the selection of relevant input features, in which they used multivariate filter
methods for selecting the best feature subsets among the entire feature space. They achieved
the final robust subset among initial feature subsets and improved the unambiguousness of
learned outcomes.

Muhammad Atif Tahir et al (2007) [33] proposed an innovative real-time feature selection
ranking methodology based on the K-NN classifier. They achieved improved performance and a
reduction in data dimensionality by projecting the data into a lower dimensional space
formed by combinations of features.

Richard Nock et al (2002) [34] offered new feature selection techniques based on hybrid
approaches that combine filter and wrapper methods. An optimal feature subset is obtained
through information theory, and the subsets of features are evaluated by a classification
model; with this combination they achieved the best performance.

Huan Liu et al (2003) [35] presented a new feature selection approach using a
stability-based search technique. They used a heuristic feature selection algorithm and
evaluated the quality of each feature with heuristic measures such as information gain, the
Gini index, the inconsistency measure, and Chi-square. Manoranjan Dash et al (2003) [36] also
proposed a feature selection methodology using the stability-based search technique. They
achieved improved performance and a reduction in data dimensionality by projecting the data
into a lower dimensional space. However, most of the existing systems are not suitable for
the cloud environment. Hence, new feature selection techniques are proposed in this thesis to
suit the cloud environment.
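
Two of the heuristic measures named above, information gain and the Gini index, have standard textbook definitions that can be sketched as follows. The sketch assumes categorical feature values and is not taken from any of the cited works; the function names are illustrative.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on one feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder
```

A feature whose values separate attack records from normal records perfectly receives the maximum gain, whereas a feature that takes the same value in every record receives a gain of zero and is a natural candidate for removal.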

However, the great majority of current feature selection (FS) approaches, including
PSO-based approaches, aim to maximize classification performance only. Attempts have
therefore been made to use PSO to develop a multi-objective FS approach that minimizes both
the classification error rate and the number of features selected. Ant colony optimization
(ACO) is inspired by the behaviour of ants and has numerous applications in discrete
optimization problems. The approach relies on a meta-heuristic that is used to guide other
heuristics in order to obtain better solutions than those produced by local optimization
methods; in ACO, a colony of artificial ants cooperates to search for good solutions to
discrete problems (Kennedy and Spears (1998) [40]). ACO is especially attractive for feature
selection because there is no heuristic information that can reliably guide the search to
the optimal minimal subset. On the other hand, if features are represented as a graph, ants
can discover the best feature combinations as they traverse the graph (Qablan et al. (2012)
[42]). One effective way of dealing with the class imbalance problem is the Synthetic
Minority Over-sampling Technique (SMOTE). In this technique, SMOTE creates synthetic
minority-class samples within the overlapping regions.
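
The core step of SMOTE, interpolating between a minority-class sample and one of its nearest minority-class neighbours, can be sketched in a few lines. This is a simplified illustration using only the standard library, not a replacement for an established implementation; the function name, the default of two neighbours and the fixed random seed are assumptions.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Generate `n_new` synthetic samples from minority-class points.

    `minority` is a list of at least two numeric tuples. Each synthetic
    point lies on the line segment between a randomly chosen sample and
    one of its k nearest minority-class neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x by squared Euclidean distance
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic
```

Because every synthetic point is a convex combination of two existing minority samples, the new records stay inside the region already occupied by the minority class rather than being exact duplicates of existing records.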

2.6 MACHINE LEARNING METHODS FOR INTRUSION DETECTION

Lee & Stolfo (2000) [72] proposed a new intrusion detection model based on machine
learning. They applied association mining algorithms to audit data to compute frequent
patterns and to select features. They used selected features from the KDD Cup dataset to
obtain better classification results. Mukkamala et al (2000) [73] provided a new approach to
intrusion detection using an enhanced SVM. In their work, they used only relevant features
for building classifiers. Amor et al (2004) [74] proposed a hybrid classifier using Naive
Bayes and DT for intrusion detection. The performance of their hybrid model was better than
that of the individual classifiers. Steinwart et al (2005) [75] proposed a new, modified SVM
to identify anomalies. Their system provided higher accuracy than the basic SVM.

An anomaly-based IDS built using a Bayesian network, discussed by Mutz et al (2006) [76],
was meant to detect traces of intrusive activities. In their work, multiple models were
established and subsequent events were analysed to detect deviations, on the assumption that
anomalies represent evidence of an intruder. An anomaly score was calculated from each
model, and a sophisticated Bayesian method was used to combine the anomaly scores from the
models into an overall aggregate score. This aggregate score was used to determine whether
an event is part of intrusive activities. As individual anomaly scores were frequently
contradictory, they proposed a weighted sum aggregate score that can deliver reliable
results.
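
A weighted sum of per-model anomaly scores can be sketched as below. The normalisation by the total weight, the example weights and the decision threshold are assumptions made for illustration; the survey does not state how [76] chose them.

```python
def aggregate_score(scores, weights):
    """Weighted average of per-model anomaly scores in [0, 1]."""
    if len(scores) != len(weights):
        raise ValueError("one weight is required per model score")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * s for w, s in zip(weights, scores)) / total


def is_intrusive(scores, weights, threshold=0.5):
    """Treat an event as intrusive if the aggregate reaches the threshold."""
    return aggregate_score(scores, weights) >= threshold
```

With scores of 0.9, 0.2 and 0.8 and weights of 2, 1 and 1, the contradictory middle score is outweighed and the aggregate is 0.7, so the event would be flagged under the assumed threshold.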

Min (2007) [77] proposed a network intrusion detection technique based on pattern
recognition which can detect intrusions even without any prior knowledge of earlier
intrusive signatures. The author generated fuzzy rules for intrusion detection using
induction-enhanced PSO. Das et al (2008) [78] established a new network-based IDS (NIDS)
implemented on a Field Programmable Gate Array (FPGA). Their FPGA-based system offers high
throughput and the flexibility required by the dynamic nature of NIDS. They used principal
component analysis to detect outliers, together with a feature extraction module to extract
features.

Yang et al (2008) [79] have suggested a new approach for intrusion detection in
which, rather than sending packets to any destination host at any time, senders must first gain
authorization to send from the receiver, which provides the authorization in the form of
capabilities to those sender hosts whose network traffic it agrees to accept. The
sender incorporates these capabilities in the packet. This allows the verification nodes, which
are distributed across the network, to verify that traffic has been authorized by the receiver
node and the path in between, and hence to discard unauthorized network traffic.
The authors evaluated this approach using a network architecture
called traffic validation architecture.

A framework for automatic detection of known and unknown intrusions was put
forward by Portnoy et al (2001) [80]. Their system does not require any classified data for
training, since a single-linkage clustering algorithm is used for the creation of clusters from
the input data instances. They used various combinations of the KDD CUP 99 dataset
for training and testing, using standard cross-validation techniques such as ten-fold
cross-validation. Each combination produced slightly different results. On average, the
detection rate obtained was between 40% and 55%, with a false positive rate between 1.3% and
2.3%. This detection rate was very low in comparison with all other works.
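
The idea of clustering unlabelled instances with single linkage can be sketched as follows. This is only an illustration and not the implementation of Portnoy et al: the cluster `width`, the `min_size` rule for flagging small clusters and the function names are assumptions. Cutting a single-linkage hierarchy at a fixed distance is equivalent to joining every pair of points that lie within that distance, which is what the union-find loop does.

```python
import math


def single_linkage_clusters(points, width):
    """Group points so that any two points within `width` share a cluster."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= width:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def flag_small_clusters(clusters, min_size):
    """Return indices of points in clusters smaller than `min_size` (assumed anomalous)."""
    return sorted(i for cluster in clusters if len(cluster) < min_size for i in cluster)
```

For example, with points `[(0, 0), (0, 1), (1, 1), (10, 10)]` and a width of 1.5, the first three points form one cluster and the last point is left alone, so it is the only one flagged when `min_size` is 2.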

Zhi-Xin Yu et al (2005) [81] have outlined a novel adaptive intrusion detection model
based on data mining which applies a heuristic clustering algorithm for the classification of
normal and intrusive actions. In that system, an attribute constraint-based fuzzy mining
algorithm has been used for automatic construction of an intrusive pattern dataset. Their
experimental results show that the clustering algorithm is successful in terms of not only
accuracy but also efficiency with reference to network intrusion detection. The main
limitation of their system is that some parameters are based on limited statistical data and
the knowledge of domain experts. Therefore, intelligent methods should be applied to
improve detection accuracy by adopting and proposing new techniques.

Feng et al. (2014) [83] have used a hybrid classifier based on the Self Organized Ant
Colony Optimization technique and Support Vector Machine (SVM), resulting in a detection
rate of 94.86%. Adel Sabry Eesa et al. (2014) [84] have used the Cuttlefish algorithm, which
resulted in a detection rate of 71.1%. Akilesh Kumar Shrivas et al. (2014) [85] have used an
ensemble of ANN and Bayesian Network and achieved a detection rate of 97.53% on one
dataset and a detection rate of 99.41% using the KDD’99 dataset.

Gang Wang et al. (2010) [86] have presented work based on a hybrid classifier.
Yung-Tsung Hou et al. (2010) [87] have used four machine learning algorithms, namely
Naïve Bayes (NB), Decision Tree, Support Vector Machine (SVM) and AdaBoost. The
detection accuracies achieved were 58.28%, 94.745%, 93.51% and 96.14% respectively.
Su-Yun Wu et al. (2009) [88] have used the C4.5 Decision Tree (DT) algorithm for intrusion
detection and have achieved a detection rate of 70.62% with a false alarm rate of 1.44%.
Tich Phuoc Tran et al. (2009) [89] have used a Probabilistic Neural Network with Adaptive
Boosting and achieved a detection rate of 94.31%. The performance of an IDS is a very
important factor in its success when it is put into operation. A combined search approach
is slow in detection, and hence many intruders may enter the system in the
meantime. Hence, it is necessary to propose a single optimal search technique.

Bhat et al. (2013) [96] used a machine learning based CIDS, which achieved a success
rate of only 91% in detecting intrusions. This was attributed to the intrusion detection
dataset containing a larger number of records for more frequent attacks and a smaller number
of records for less frequent attacks in its set of intrusive patterns. And also, this

Farid Oveisi et al (2012) [97] have recommended an efficient tree-based feature
extraction technique in which a new feature is created at each step by selecting two features
and forming a linear combination of them such that the mutual information between the new
feature and the class is maximized. In the work proposed by Gautheir et al (2014) [98], a new
Intelligent Multi-Objective PSO and Mutual Information based Feature Selection
(IMOPSO-MIFS) algorithm is introduced for selecting necessary features.

Gowrison et al (2013) [99] have designed and presented an intrusion detection system
for classification through the incorporation of enhanced rules learned from network
user behaviour, with a low computational complexity of O(n). Their method was tested with
the KDD Cup’99 dataset. Krunal & Rohit (2013) [101] have proposed a security framework
that integrates a network intrusion detection system in the cloud infrastructure. They used
Snort and a machine learning based Bayesian classifier for implementing their
proposed framework. A large set of rules can be generated for detecting intruders by the use
of genetic operations. A tree-structured multiclass SVM has been proposed by Jun et al (2008)
[106] for handling the multi-label features of a benchmark dataset. It provides better
classification accuracy than SVM and a normal tree-based classifier. Later on, an Intelligent
Agent-based Enhanced Multiclass SVM (IAEMSVM) was developed by Ganapathy et al (2012)
[107], based on the tree-structured multiclass SVM, for improving the detection accuracy.
Hence, an Intelligent Rule based Enhanced Multiclass Support Vector Machine (IREMSVM)
has been used for improving the classification accuracy on feature selected datasets.

Shun and Malki (2008) [108] have developed a Neural Network (NN) based IDS for
detecting internet-based attacks in computer networks. The authors used NNs for the
identification of known attacks and for the prediction of future attacks from the experimental
results. They performed the experiments using the KDD Cup 99 dataset and reported
that their results for detection of intrusions are more effective than the existing works.

Adel Nadjaran Toosi & Kahani (2007) [109-110] have proposed a new IDS in which an
Adaptive Neural Fuzzy Inference System (ANFIS) was used for the classification of
intrusions. According to them, the use of ANFIS produced better intrusion detection rates.
Thomas & Balakrishnan (2009) [111] have explained the use of multiple sensors for intrusion
detection. Their model showed better performance in intrusion detection, but the sensors
consume more power.

Xiaodan et al (2006) [114]; Lippman et al (2006) [115]; and Xiang et al (2004) [116]
have developed effective IDSs using various tree classifiers on standard datasets. Their work
focuses only on network security. However, it is necessary to provide cloud security by
classifying the data to identify DoS attacks in cloud databases.

Sandhya et al (2007) [117] have presented an intrusion detection model for wireless
networks using a supervised classification algorithm. Their system evaluated the performance
of the Multilayer Perceptron (MLP) and the Support Vector Machine (SVM) discussed by
Aikaterini Mitrokotsa et al (2009) [118]. Their results showed that SVM exhibits higher
accuracy in classification when there was a two-level decision. Therefore, many other
researchers have developed intrusion detection systems using SVM (Jun et al 2008).

2.7 ENSEMBLE APPROACHES FOR INTRUSION DETECTION

An ensemble is a combination of various machine learning algorithms that are used
in the detection of intrusions using the standard available data and real-time data. Several
algorithms are combined on the basis of the combination method used. Ensemble methods
usually produce more accurate results than a single model.
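
One of the simplest combination methods is majority voting over the labels predicted by the individual classifiers. The sketch below is a generic illustration and is not taken from any of the works surveyed; the function name and the labels are assumptions.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine label predictions from several classifiers by majority vote.

    predictions: one list of labels per classifier, all of the same length.
    On a tie, the label that appears first among the classifiers wins.
    """
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("every classifier must predict the same number of samples")
    combined = []
    for labels in zip(*predictions):
        # most_common(1) returns the label with the highest vote count.
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined
```

With three classifiers, a sample is labelled as an attack only when at least two of them agree, which is how an ensemble can correct the isolated errors of a single model.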

Parikh and Tsuhan Chen (2008) [120] have proposed an ensemble-based intelligent
intrusion detection system using a machine learning based ensemble classifier, which achieved
a success rate of only 91% in detecting intrusions. This was attributed to the intrusion
detection dataset containing a larger number of records for more frequent attacks and a
smaller number of records for less frequent attacks in its set of intrusive patterns. Their
results showed that ensemble methods exhibit high accuracy in classification when there was a
two-level decision. Therefore, many other researchers have established intrusion detection
systems using ensemble approaches.

Ming-Guang Ouyang et al (2002) [121] have proposed an ensemble-based feature
selection and classification approach for intrusion detection systems, in which an ensemble
classifier with feature selection and a boosting algorithm was used to produce better intrusion
detection rates. Their approach was evaluated on the KDD Cup 99 dataset, and they achieved the
best classification accuracy on feature selected datasets.

Tsung-Nan Chou and Te-Shun Chou (2009) [122] have proposed a hybrid intrusion
detection system that combines anomaly-based intrusion detection and signature-based
intrusion detection. They used an ensemble feature selecting classifier and a data mining
classifier. Their system was capable of finding intrusions effectively.

Dartigue et al (2009) [123] have presented an intrusion detection approach based on data
mining. They used machine learning based ensemble classifier techniques for implementing
their proposed framework on the KDD Cup dataset and reported the best detection rate and
classification accuracy.

Breiman (1996) [124] demonstrated the success of the stacking method for regression. That
study used regression decision trees of different sizes or linear regression
algorithms with different numbers of variables as the first level. It also used a linear
regression algorithm as the second-level algorithm and imposed the constraint that the
coefficients of all regressions should be positive. Breiman (1996) [125] emphasized that this
constraint is an important factor for the performance of the stacked ensemble when it is
compared to the single best learner. Ting & Witten (1999) [126] have recommended the use of
class probabilities as features for the training dataset used in the second level, which makes
it possible to take into account both the predictions and the confidences of the classifiers.
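
The second-level step of stacked regression with non-negative coefficients can be sketched as follows. This is an illustrative assumption and not Breiman's procedure: the weights are fitted here by projected gradient descent, and the function names, learning rate and step count are hypothetical choices that suit small, well-scaled inputs.

```python
def fit_nonnegative_weights(base_predictions, targets, lr=0.1, steps=2000):
    """Fit second-level linear weights w >= 0 by projected gradient descent.

    base_predictions: one row per sample, one column per first-level model.
    targets: the true regression value for each sample.
    The learning rate must be small enough for the data at hand to converge.
    """
    n = len(base_predictions)
    k = len(base_predictions[0])
    weights = [0.0] * k
    for _ in range(steps):
        # Residual of the current stacked prediction for every sample.
        residuals = [
            sum(w * p for w, p in zip(weights, row)) - y
            for row, y in zip(base_predictions, targets)
        ]
        # Gradient of half the mean squared error with respect to each weight.
        grads = [
            sum(r * row[j] for r, row in zip(residuals, base_predictions)) / n
            for j in range(k)
        ]
        # Gradient step, then projection back onto the non-negative region.
        weights = [max(0.0, w - lr * g) for w, g in zip(weights, grads)]
    return weights


def stacked_prediction(weights, row):
    """Second-level output: weighted sum of the first-level predictions."""
    return sum(w * p for w, p in zip(weights, row))
```

The projection step is what enforces the constraint: a weight that would become negative is clipped to zero, so a first-level model can be ignored but never subtracted.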

Bahri et al. (2011) [127] have proposed a new intrusion detection system model called the
Greedy-Boost method. They used three classifiers, namely Decision Tree, AdaBoost and
Greedy-Boost. Classification performance was compared in terms of the precision and recall
performance measures. The results showed that Greedy-Boost performed better
than the other methods. Syarif et al. (2012) [128] have proposed an ensemble-based intrusion
detection system. They used three different ensemble methods, namely bagging,
boosting and stacking, in order to improve accuracy and reduce the false positive rate. They
used selected features from the NSL-KDD dataset to obtain better classification performance.

Feng et al. (2014) [129] have suggested a new hybrid classifier based IDS using the Self
Organized Ant Colony Optimization technique and Support Vector Machine (SVM), resulting
in a detection rate of 94.86%. Adel Sabry Eesa et al. (2014) [130] have used the Cuttlefish
algorithm, which resulted in a detection rate of 71.1%. Akilesh Kumar Shrivas et al. (2014)
[131] have used an ensemble of ANN and Bayesian network and achieved a detection rate of
97.53% on one dataset and a detection rate of 99.41% using the KDD’99 dataset.

Tama et al (2015) [133] and Rhee et al (2015) [134] proposed a new intrusion
detection model. They applied an ensemble feature selection method based on particle
swarm optimization and correlation-based feature selection techniques to compute frequent
patterns and select features. They used only the relevant features to build ensemble classifiers,
and hence their model was able to recognize known and unknown intrusions in real time.
They used selected features from the NSL-KDD dataset to obtain better classification
performance.

2.8 PROPOSED WORK

In this research work, a novel intrusion detection system has been designed for cloud
computing, which is capable of detecting and classifying the well-known malicious network
security attacks on the cloud environment using supervised learning algorithms and
ensemble learning techniques. The evaluation of an IDS requires the creation of a dataset to
serve as ground truth. The system has been integrated with the existing Honeynet setup, and it
demonstrates the researcher’s approach using data from real Honeypots which are deployed
on the cloud using Docker. For this purpose, a new architecture, which sets out the various
components of the proposed system, has been proposed for data collection in the cloud
environment.

In addition, an effective ensemble feature selection methodology has been proposed
for the selection of a valuable reduced feature set, in which the research has used a univariate
ensemble-based filter feature selection technique for intrusion detection. The outputs of filter
feature selection techniques such as Information Gain, Gain Ratio, Chi-squared, Symmetric
Uncertainty and Relief have been combined to produce the final outcome.
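
How the outputs of several filter methods might be combined can be sketched as follows. The combination rule is not specified above, so averaging the per-method ranks is an assumption of this sketch, as are the function names, the feature names and the scores.

```python
def ranks_from_scores(scores):
    """Map each feature to its rank for one filter method (1 = highest score)."""
    ordered = sorted(scores, key=lambda f: (-scores[f], f))
    return {feature: rank for rank, feature in enumerate(ordered, start=1)}


def combine_filter_rankings(scores_by_method, top_k):
    """Average the ranks given by every filter method and keep the best top_k."""
    rank_tables = [ranks_from_scores(s) for s in scores_by_method.values()]
    features = list(rank_tables[0])
    mean_rank = {
        f: sum(table[f] for table in rank_tables) / len(rank_tables)
        for f in features
    }
    # Lower mean rank is better; ties are broken alphabetically for determinism.
    ordered = sorted(mean_rank, key=lambda f: (mean_rank[f], f))
    return ordered[:top_k]


# Hypothetical scores from three filter methods for three illustrative features.
example_scores = {
    "info_gain": {"duration": 0.1, "src_bytes": 0.9, "flag": 0.5},
    "chi_squared": {"duration": 2.0, "src_bytes": 30.0, "flag": 12.0},
    "relief": {"duration": 0.30, "src_bytes": 0.35, "flag": 0.40},
}
selected = combine_filter_rankings(example_scores, top_k=2)
```

Working with ranks rather than raw scores avoids having to rescale measures such as Chi-squared and Information Gain, which live on different numeric ranges.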

According to the inference from the in-depth study of the relevant literature,
the work proposed in this thesis is different and efficient in many ways. First, most of the
intrusion detection systems reported in the literature have been developed for securing cloud
networks. On the other hand, the intrusion detection systems available in the literature have
used only the KDD Cup and NSL-KDD datasets. The most important problems with the KDD
dataset, observed over the past ten years, are that it is outdated and contains noisy records,
which cause bias in the learning algorithms and leave networks exposed to hazards such as U2R
and R2L attacks.

This research work has developed a honeypot and honeynet setup to capture real
attack traffic data and attacker activity in the cloud environment for overcoming these issues.
Second, most of the feature selection algorithms reported in the literature are not adequate to
provide security to the cloud. Hence, this research work has proposed a feature ranking
methodology, which provides solutions for the following issues. (1) An efficient
feature ranking algorithm to reduce the problems of ranking without adopting any learning
algorithms, the statistical bias of existing methods, and computational overheads. (2) The
selection of the least threshold value for retaining the important selected features.

Third, most of the works on intrusion detection discuss only the detection of
intrusions but not intrusion classification problems. However, the system proposed in this
research work handles both intrusion detection and classification. Fourth, this research work
investigates intrusion detection as a classification challenge and designs a system using
ensemble learning for effective classification of the cloud dataset. The idea of building
ensemble models is that it affords the highest accuracy and the lowest false alarm rate (FAR).
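
The measures mentioned above can be computed from the counts of correct and incorrect decisions. The sketch below uses the standard definitions of accuracy, detection rate and false alarm rate; the function name and the label values are assumptions, and every label other than the attack label is treated as normal traffic.

```python
def detection_metrics(true_labels, predicted_labels, attack_label="attack"):
    """Compute accuracy, detection rate and false alarm rate (FAR)."""
    tp = fp = tn = fn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == attack_label:
            if predicted == attack_label:
                tp += 1  # attack correctly detected
            else:
                fn += 1  # attack missed
        else:
            if predicted == attack_label:
                fp += 1  # normal traffic raised a false alarm
            else:
                tn += 1  # normal traffic correctly passed
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "detection_rate": tp / (tp + fn) if (tp + fn) else 0.0,
        "false_alarm_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }
```

A classifier can reach high accuracy on an imbalanced dataset while still missing rare attacks, which is why the detection rate and the FAR are reported alongside accuracy.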

Finally, the proposed method has been evaluated with three different intrusion
datasets and four machine learning algorithms, namely SVM, Naïve Bayes, Logistic
Regression and Decision Tree, for efficient classification of malicious threats using the
Bagging, Boosting and Stacking ensemble methods. Compared with all these approaches, the
ensemble-based feature selection and classification proposed in this research work is unique
in many aspects. The output of the ensemble design model is better than that of all the
previous models, and the detection accuracy is seen to be greatly improved over the existing
literature for new and known attacks.
