
Research Proposal/Synopsis for MS/M.Phil/Ph.D Thesis
Department of Computer Science

1. Name of the Student: Muhammad Shahid Azeem


2. VU ID: MS160400843
3. Session: Fall 2016
4. Semester: Fall 2018
5. Field of Specialization: Computer Networks
6. Title of Research Proposal: Enhanced Network Anomaly Detection
Model Based on Supervised Learning Techniques with qualitative features
selection
7. Date of Enrolment in Research: October 19, 2017

8. Duration of Proposed Research: 1.5 years

9. Total Funds Requested (if any): Rs. Nil (Rupees Nil)
===============================================================================
Supervisor, Supervisory Committee (SC) Information
1. Name of Supervisor: Hasnain Ahmed
Designation: Assistant Professor
Email ID: hasnain@vu.ed.pk
Affiliation: Virtual University of Pakistan
2. Name of Supervisory Committee (SC) Member 1: Mr. Syed Shah Muhammad
Designation: Lecturer
Email ID: syed@vu.edu.pk
Affiliation: Virtual University of Pakistan
3. Name of Supervisory Committee (SC) Member 2: Mr. Zafar Nazir
Designation: Instructor
Email ID: mzafarnazir@vu.edu.pk
Affiliation: Virtual University of Pakistan
Topic

Enhanced Network Anomaly Detection Model Based on Supervised Learning Techniques with Qualitative Features Selection
Abstract/Summary

The massive growth of the Internet during the last two decades has greatly
increased the importance of cyber security. Numerous new threats to data
security are created on a daily basis. An Intrusion Detection System (IDS) is
a primary defence mechanism for protecting data and resources from illegal
disclosure and unauthorized use. Researchers have proposed various approaches
to intrusion detection, e.g. signature-based intrusion detection and
anomaly-based intrusion detection. In the signature-based approach, the IDS
maintains a database of signatures of harmful traffic such as viruses. The IDS
sniffs and analyses each data packet and compares it with this database; in
case of a match, it removes the data packet from the network. An
anomaly-based IDS categorizes network traffic into valid traffic and anomalous
traffic, and the accuracy of this categorization depends on appropriate
feature selection. Existing approaches mostly rely on quantitative features,
and the scarcity of work on encoding qualitative features into quantitative
form is a significant gap in these approaches. Encoding qualitative features
into quantitative form can increase the accuracy of an anomaly detection
model. This research proposes an IDS based on an anomaly detection model that
uses supervised learning techniques with both qualitative features, e.g.
attack type, protocol, timing of attack, source IP address and destination IP
address, and quantitative features, e.g. number of bytes in source packets,
source-to-destination packet count, source bits per second and record total
duration. Different supervised learning techniques, i.e. Nearest Neighbour,
Random Forest, Multilayer Perceptron and Decision Tree, along with encoding
techniques, i.e. Polynomial encoding, Leave-One-Out encoding and Target
encoding, are used to enhance the anomaly detection process on unbalanced
network traffic. Improving performance and anomaly detection accuracy is the
major concern of the proposed model. The proposed model will be trained and
tested on the UNSW-NB15 data set, and the experimental results will be
recorded and compared to assess the suitability of the anomaly detection
model for this data set.

Introduction

Countering security threats to information and other network resources is an
active research topic in the field of Information Technology. The cyber threat
landscape is expanding drastically: about 430 million new pieces of malware,
362 crypto-ransomware and other cyber threats were discovered on the Internet
in 2015, as reported by the Internet Security Threat Report (ISTR) (Ali et al., 2017).
Protecting cyber assets has become a critical concern for governments,
corporations and individuals alike. In 2016, about 75 billion dollars were
spent on cyber security services around the globe.
Intrusion detection in the domain of network security is the process of
analysing network traffic for invalid usage patterns. As a primary security
mechanism, an intrusion detection system is responsible for distinguishing
hostile traffic flows from valid network flows. The need to protect cyber
assets from unauthorized access and illegal disclosure places the IDS at the
forefront of the cyber security domain. Various approaches have been proposed
for detecting invalid behaviour. In the past, signature-based IDSs were
commonly used; they work well for known attacks and attack vectors
(Naseer et al., 2018), but they are not appropriate for the modern era because
of the ever-changing threat landscape. A huge number of new threats are
invented and spread via the Internet daily. Traffic analysts must analyse
network traffic continuously to identify harmful data patterns, which are then
added to the signature database. Until a pattern has been detected and added,
however, there is no mechanism to stop such packets from harming resources.
This scenario is known as a zero-day attack; it can be very dangerous and can
even lead to an unrecoverable state of affairs.
An alternative approach is anomaly detection in network traffic. An anomaly
is a traffic pattern that deviates from expected network traffic behaviour
(Nevat et al., 2018). An anomaly can cause extensive damage in a network, e.g.
by probing the network for vulnerabilities or by initiating an attack such as
a DDoS attack. Anomaly detection is also used for related purposes, e.g. fraud
detection and location spoofing detection in IoT (Koh et al., 2016). Because
it models expected behaviour rather than matching known signatures, anomaly
detection can also perform well against unknown attacks.
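The definition of an anomaly as a deviation from expected behaviour can be illustrated with a minimal statistical baseline. This sketch is not the supervised model proposed in this research; it assumes a single hypothetical numeric feature (packets per second) and a hypothetical threshold of three standard deviations.

```python
from statistics import mean, stdev

def fit_baseline(values):
    """Learn 'expected behaviour' as the mean and sample standard deviation of normal data."""
    return mean(values), stdev(values)

def is_anomalous(value, baseline, threshold=3.0):
    """Flag a value whose distance from the mean exceeds `threshold` standard deviations."""
    mu, sigma = baseline
    return abs(value - mu) / sigma > threshold

# Hypothetical packets-per-second counts observed during normal operation.
normal_rates = [120, 135, 128, 140, 122, 131, 126, 138]
baseline = fit_baseline(normal_rates)

print(is_anomalous(133, baseline))  # → False (close to the expected rate)
print(is_anomalous(900, baseline))  # → True (sudden burst, e.g. the start of a flood)
```

A one-feature threshold like this cannot separate attack types; the supervised approach described next learns the decision boundary from labelled examples instead.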
An efficient intrusion detection system is a core component of network
security (Li et al., 2015). An IDS applies its capabilities to detect unusual
and unacceptable traffic patterns and intrusions attempted by crackers
(Salama et al., 2011). Designing an efficient, reliable and affordable IDS
that meets network security objectives is a major challenge.
Anomaly detection is usually formulated as a classification problem. Seminal
work in this regard was presented by Denning (Denning, 1987) and Stanford Chen
(García-Teodoro et al., 2009). Denning (1987) proposed learning statistical
profiles of normal behaviour in order to classify abnormal activity and
intrusion attempts. Classifiers fall into three main categories: supervised
learning, unsupervised learning and semi-supervised learning. In supervised
learning, labelled data are used to train the anomaly detection model: the set
of possible classes is defined in advance, so a training data set consists of
inputs together with their known outcomes, and the model classifies new data
based on this training set (Aljawarneh et al., 2018).
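The supervised setting described above (labelled inputs paired with known outcomes, and a model that classifies new data) can be sketched with a k-nearest-neighbour classifier in plain Python. The feature values, labels and the choice k = 3 are hypothetical and only illustrate the workflow.

```python
import math
from collections import Counter

# Hypothetical labelled training flows: (bytes sent, packet count) -> class label.
training_set = [
    ((500.0, 4.0), "normal"),
    ((620.0, 5.0), "normal"),
    ((550.0, 6.0), "normal"),
    ((9000.0, 80.0), "attack"),
    ((8700.0, 95.0), "attack"),
    ((9400.0, 70.0), "attack"),
]

def knn_predict(train, x, k=3):
    """Label x by majority vote among its k nearest training examples (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict(training_set, (580.0, 5.0)))    # → normal
print(knn_predict(training_set, (8800.0, 85.0)))  # → attack
```

Because the two features have very different scales, a real model would normally normalise them first; otherwise the byte count dominates the distance.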
A review of the literature on the anomaly detection problem reveals some
prominent gaps. First, there has been no comprehensive attempt to investigate
a complete solution for network security; different researchers have carried
out isolated studies, as described in the Literature Review section of this
document, but no adequate solution has been presented yet. Deep learning has
been suggested by a number of researchers. It is a powerful and successful
technique for classification problems, but it is computationally very
expensive, which makes it difficult to apply in low-cost solutions. Training a
deep neural network (DNN) based classification model is also hard and
typically requires thousands of training examples, which is tedious and
time-consuming. Another prominent problem is the lack of proper validation
metrics for comparing different classifiers.
To fill these gaps, an efficient anomaly detection model will be designed and
implemented using conventional supervised learning techniques. Supervised
learning (Kwon et al., 2017), also referred to as predictive or directed
classification, identifies a set of possible classes in advance. This family
of techniques receives a set of pre-classified data instances for training;
the training dataset comprises both inputs and desired results, and new data
are then classified on the basis of this training dataset. Well-known
supervised learning algorithms include support vector machines (SVMs),
artificial neural networks (ANNs), logistic regression, naïve Bayes (NB),
k-nearest neighbours (KNN), random forests (RF) and decision trees (DT).
Feature selection is the procedure of choosing an appropriate set of features
across the various attack classes for detecting anomalous behaviour in data
flows (Ravale et al., 2015). There may be a huge number of features, and it is
impractical to work with all of them, so a subset of valuable features is
selected. Most studies work with quantitative features; for example, Ravale et
al. (2015) state that they work with measurable features such as the number of
failed login attempts. Various symbolic (qualitative) features are also
available alongside quantitative features, and in this study we hypothesise
that, by also using symbolic features, the anomaly detection model will
perform more accurately than existing anomaly-detection-based IDSs. Symbolic
features must be encoded into quantitative form before they can be measured.
For this purpose, various encoding schemes can be used: BinaryEncoder,
HashingEncoder, HelmertEncoder, OneHotEncoder, OrdinalEncoder, SumEncoder,
PolynomialEncoder, BaseNEncoder, LeaveOneOutEncoder and TargetEncoder.
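Several of the encoding schemes listed above can be sketched in a few lines. The names in the list match classes in the Python `category_encoders` package, which may be a practical choice for the experiments; the plain-Python sketch below instead implements ordinal, one-hot, target and leave-one-out encoding directly on a hypothetical `proto` column with a binary attack label. It uses unsmoothed means, and falling back to the global mean for categories that occur only once is an assumption of this sketch, not necessarily how any library behaves.

```python
from collections import defaultdict

# Hypothetical toy sample: a qualitative feature ("proto") and a binary label (1 = attack).
protocols = ["tcp", "udp", "tcp", "icmp", "udp", "tcp"]
labels = [0, 1, 1, 1, 0, 0]

def ordinal_encode(values):
    """Map each category to an integer in order of first appearance."""
    mapping = {}
    for v in values:
        mapping.setdefault(v, len(mapping))
    return [mapping[v] for v in values]

def one_hot_encode(values):
    """Represent each category as a 0/1 indicator vector (columns in sorted order)."""
    categories = sorted(set(values))
    return [[int(v == c) for c in categories] for v in values]

def _sums_and_counts(values, y):
    sums, counts = defaultdict(float), defaultdict(int)
    for v, t in zip(values, y):
        sums[v] += t
        counts[v] += 1
    return sums, counts

def target_encode(values, y):
    """Replace each category by the mean label of its rows (no smoothing)."""
    sums, counts = _sums_and_counts(values, y)
    return [sums[v] / counts[v] for v in values]

def leave_one_out_encode(values, y):
    """Like target encoding, but exclude the current row's own label to limit leakage."""
    sums, counts = _sums_and_counts(values, y)
    global_mean = sum(y) / len(y)  # fallback for categories seen only once (assumption)
    return [
        (sums[v] - t) / (counts[v] - 1) if counts[v] > 1 else global_mean
        for v, t in zip(values, y)
    ]

print(ordinal_encode(protocols))                # → [0, 1, 0, 2, 1, 0]
print(one_hot_encode(protocols)[0])             # → [0, 1, 0]  (columns: icmp, tcp, udp)
print(leave_one_out_encode(protocols, labels))  # → [0.5, 0.0, 0.0, 0.5, 1.0, 0.5]
```

Ordinal and one-hot encoding ignore the label, whereas target and leave-one-out encoding use it; the latter therefore must be fitted on the training set only, so that test labels do not leak into the features.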
Research Questions

1. To what extent does the inclusion of qualitative/symbolic features increase
the performance of the anomaly detection model?
2. Which encoding schemes are appropriate for quantifying qualitative
features?

Research Objectives

1. This research is carried out to propose an appropriate intrusion
detection system using supervised learning techniques.
2. Enhancing the performance and accuracy of the model is a major concern;
qualitative features are used along with quantitative features in the
model.
3. Qualitative features are difficult to measure and must be encoded into
quantitative form before they can be used. Various encoding schemes are
used to quantify qualitative features, and ensuring the validity and
appropriateness of these encoding schemes is also a significant
objective of this research.

Socio-Economic Importance/Benefits (Not Applicable)

Literature Review:

The anomaly detection problem is a kind of classification problem that uses
features, or characteristics, of sample data to categorize it. A feature is a
kind of summary of the raw data. Various techniques, including supervised,
unsupervised and semi-supervised learning, have been proposed to improve the
performance of anomaly detection models.

Shadi Aljawarneh et al. state that the first IDS was proposed by Dorothy E.
Denning during research conducted at SRI International (Aljawarneh et al.,
2018). This led to a new generation of intrusion detection systems referred to
as anomaly-detection-based IDSs.

Bhavesh Borisaniya and Patel (Borisaniya & Patel, 2015) discussed a misuse
detection case study using the ADFA-LD and ADFA-WD datasets. They used a
modified vector space representation and an N-gram feature extraction
technique to generate classifier models for both binary and multiclass
classification, using the IBk and J48 classifiers in Weka. They reported
classification accuracy of up to 92% with a 20% false positive rate on the
binary and multiclass datasets, and an accuracy of 96% with a 19% false
positive rate for the binary-class problem.
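The N-gram feature extraction mentioned above turns each system-call trace into a vector of N-gram counts over a shared vocabulary. The sketch below is a plain count-based vector space, not the modified representation of Borisaniya and Patel, and the traces use hypothetical call names for readability.

```python
from collections import Counter

def ngrams(trace, n):
    """Return all contiguous n-grams of a system-call trace as tuples."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]

def vectorize(traces, n):
    """Build a shared, sorted n-gram vocabulary and turn each trace into a count vector."""
    counts = [Counter(ngrams(t, n)) for t in traces]
    vocabulary = sorted(set().union(*counts))
    return vocabulary, [[c[g] for g in vocabulary] for c in counts]

# Hypothetical system-call traces.
traces = [
    ["open", "read", "read", "close"],
    ["open", "read", "write", "close"],
]
vocab, vectors = vectorize(traces, n=2)
print(vocab)
print(vectors)  # → [[1, 1, 1, 0, 0], [1, 0, 0, 1, 1]]
```

The resulting fixed-length vectors can be fed to any standard classifier, such as the IBk and J48 classifiers mentioned above.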

In a study by Assem, Rachidi and El Graini (Assem et al., 2018), a misuse
detection system named SC2.2 is proposed to address the binary-class problem
using the UNM datasets. Using a Markov chain model of long sequences of system
calls, they define conditional probabilities on four different datasets. They
evaluated naïve Bayes multinomial (NBm), C4.5 decision tree, Repeated
Incremental Pruning to Produce Error Reduction (RIPPER), support vector
machine (SVM) and logistic regression (LR) classifiers in their model using
the true positive rate (TPR) and false positive rate (FPR). They reported
classifier accuracy ranging from 97% to 99% and false positive rates ranging
from 0.3% to 3% on all UNM datasets.
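The Markov chain idea can be sketched by estimating the conditional probability of each system call given the previous one from normal traces. This is a first-order sketch on hypothetical traces; it does not reproduce SC2.2 or its Bayesian classifiers, and a practical version would use log-probabilities and smoothing for unseen transitions.

```python
from collections import Counter, defaultdict

def transition_probabilities(traces):
    """Estimate P(next call | current call) by counting transitions in the traces."""
    pair_counts = defaultdict(Counter)
    for trace in traces:
        for current, nxt in zip(trace, trace[1:]):
            pair_counts[current][nxt] += 1
    return {
        current: {nxt: c / sum(followers.values()) for nxt, c in followers.items()}
        for current, followers in pair_counts.items()
    }

def sequence_likelihood(probs, trace):
    """Probability of a trace's transitions under the model; unseen transitions score 0."""
    p = 1.0
    for current, nxt in zip(trace, trace[1:]):
        p *= probs.get(current, {}).get(nxt, 0.0)
    return p

# Hypothetical traces of normal behaviour.
normal_traces = [["open", "read", "close"], ["open", "read", "read", "close"]]
model = transition_probabilities(normal_traces)
print(model["read"])                                         # P(close|read) = 2/3, P(read|read) = 1/3
print(sequence_likelihood(model, ["open", "read", "close"]))  # a familiar sequence
print(sequence_likelihood(model, ["open", "exec", "close"]))  # → 0.0 (never-seen transition)
```

Low likelihoods mark sequences that deviate from the learned normal behaviour.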

In a study by Eduardo De la Hoz et al. (De la Hoz et al., 2015), the authors
discussed a combination of statistical techniques and self-organizing maps
for network anomaly detection. They used Fisher's discriminant ratio along
with principal component analysis for feature selection. Using probabilistic
self-organizing maps and noise removal, network traffic is classified into two
broad categories: normal traffic and anomalous traffic.
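Fisher's discriminant ratio scores a single feature by how far apart the class means are relative to the within-class spread, (m1 - m2)^2 / (v1 + v2). A minimal sketch for ranking features follows; the feature names borrow UNSW-NB15 column names only as an assumption, the values are made up, and the PCA step used by De la Hoz et al. is omitted.

```python
from statistics import mean, pvariance

def fisher_ratio(normal_values, anomalous_values):
    """Fisher's discriminant ratio for one feature: (m1 - m2)^2 / (v1 + v2)."""
    m1, m2 = mean(normal_values), mean(anomalous_values)
    v1, v2 = pvariance(normal_values), pvariance(anomalous_values)
    return (m1 - m2) ** 2 / (v1 + v2)

# Hypothetical per-feature samples: (normal class, anomalous class).
features = {
    "sbytes": ([500, 520, 480, 510], [9000, 8800, 9100, 8950]),
    "dur": ([1.0, 1.2, 0.9, 1.1], [1.1, 0.8, 1.3, 1.0]),
}
ranking = sorted(features, key=lambda f: fisher_ratio(*features[f]), reverse=True)
print(ranking)  # → ['sbytes', 'dur']  (features that separate the classes best come first)
```

Here `dur` has the same mean in both classes and therefore scores close to zero, so it would be a candidate for removal.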

Wathiq Laftah Al-Yaseen et al. (Al-Yaseen et al., 2017) proposed a multilevel
hybrid intrusion detection model that classifies network traffic into normal
and abnormal behaviour. Their model is based on a support vector machine and
an extreme learning machine, with the aim of enhancing the detection of both
known and unknown attacks. They used a modified K-means algorithm to build new
training datasets for the model. This improved the working of the classifiers
and reduced training effort and time, contributing significantly to the
efficiency of the IDS. The KDD Cup 1999 dataset was used to evaluate the
performance and efficiency of the model, and an attack detection accuracy of
up to 95.75% was reported.
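The source does not describe how Al-Yaseen et al. modified K-means, so the sketch below shows only the standard algorithm and the general idea of summarising training samples by cluster centres, which can then stand in for many raw samples when training a classifier. Using the first k points as initial centres is a simplifying assumption, and the points are hypothetical.

```python
import math

def kmeans(points, k, iterations=10):
    """Standard Lloyd's k-means; the first k points are used as initial centres."""
    centres = [list(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centre.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        # Update step: move each centre to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # keep the previous centre if a cluster ends up empty
                centres[i] = [sum(axis) / len(members) for axis in zip(*members)]
    return centres

# Hypothetical 2-D flow summaries (e.g. scaled duration and byte count).
points = [(1.0, 1.0), (10.0, 10.0), (1.5, 2.0), (9.0, 11.0), (2.0, 1.0), (10.0, 9.0)]
centres = kmeans(points, k=2)
print(centres)  # two centres that summarise the six raw samples
```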

In another study (Naseer et al., 2018), different deep learning techniques
were investigated for their suitability for anomaly detection in network
flows. The authors developed IDS models based on different deep learning
techniques, i.e. Convolutional Neural Networks (CNNs), auto-encoders and
Recurrent Neural Networks (RNNs). They trained their models on the NSLKDD
training dataset and evaluated them on both of its test datasets, NSLKDDTest+
and NSLKDDTest21. They reported that the Deep Convolutional Neural Network
(DCNN) and Long Short-Term Memory (LSTM) recurrent neural network models
achieved accuracies of up to 85% and 89% on the test datasets, and concluded
that deep learning is a viable and promising technology for anomaly detection
in network security.

Aygun and Yavuz (Aygun & Yavuz, 2017) used vanilla and de-noising deep
auto-encoders in an anomaly-detection-based IDS model with the NSLKDD dataset
and reported accuracy rates of 88.28% and 88.6% on the NSLKDDTest+ dataset.
Results on the NSLKDDTest21 dataset were not provided in this study, and other
quality metrics needed to evaluate their classifiers are also lacking.
Methodology/Research Design

The proposed IDS model is based on supervised learning techniques that
classify normal and abnormal traffic patterns. Four established
classification techniques, Nearest Neighbour, Random Forest, Multilayer
Perceptron and Decision Tree, are used in this model. All of these techniques
will be implemented in the Python programming language.
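As an illustration of the Decision Tree technique listed above, the sketch below finds the single best split (a depth-one tree, or decision stump) by minimising weighted Gini impurity. It is plain Python on hypothetical data; a full tree applies the same search recursively, and a Random Forest combines many such trees trained on bootstrap samples. In practice, library implementations such as scikit-learn's `DecisionTreeClassifier`, `RandomForestClassifier`, `KNeighborsClassifier` and `MLPClassifier` could be used instead.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the list is pure)."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(rows, labels):
    """Search every feature/threshold pair for the lowest weighted Gini impurity."""
    best = (None, None, float("inf"))
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue  # a split must send samples both ways
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best[2]:
                best = (f, threshold, score)
    return best

def majority(labels):
    return max(set(labels), key=labels.count)

def fit_stump(rows, labels):
    """Depth-one decision tree: one split, each side predicts its majority class.
    Assumes at least one valid split exists."""
    f, t, _ = best_split(rows, labels)
    left = [y for r, y in zip(rows, labels) if r[f] <= t]
    right = [y for r, y in zip(rows, labels) if r[f] > t]
    return f, t, majority(left), majority(right)

def predict_stump(stump, x):
    f, t, left_label, right_label = stump
    return left_label if x[f] <= t else right_label

# Hypothetical flows: (source bytes, source packets); label 1 = attack.
rows = [(500, 4), (620, 5), (550, 6), (9000, 80), (8700, 95), (9400, 70)]
labels = [0, 0, 0, 1, 1, 1]
stump = fit_stump(rows, labels)
print(stump)                           # → (0, 620, 0, 1)
print(predict_stump(stump, (560, 5)))  # → 0
```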

This model will be trained using a well-known and reliable standard dataset
named UNSW-NB15, which is divided into two parts, a training set and a testing
set (Moustafa & Slay, 2016). Moustafa and Slay (2016) state that UNSW-NB15 was
recently generated as a benchmark dataset for IDS performance evaluation. It
contains nine types of modern attacks together with realistic normal traffic
activity captured over time, and it describes the data flows between nodes in
a network with 49 features. The authors compared the UNSW-NB15 dataset with
the KDD99 dataset statistically and practically, and showed that UNSW-NB15 is
more complex and reflects real-world situations more faithfully, providing a
number of modern attacks in a more realistic way.
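Before encoding, the columns have to be split into qualitative and quantitative ones, and the class balance should be checked because the traffic is unbalanced. The sketch below does this with Python's `csv` module on a small inline sample. The column names (`dur`, `proto`, `service`, `state`, `spkts`, `sbytes`, `attack_cat`, `label`) are assumed to follow the UNSW-NB15 training CSV, but only a few columns are shown and all values are made up.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt in the layout assumed for the UNSW-NB15 training CSV.
SAMPLE = """dur,proto,service,state,spkts,sbytes,attack_cat,label
0.12,tcp,http,FIN,10,1200,Normal,0
0.00,udp,dns,INT,2,146,Generic,1
1.50,tcp,-,FIN,40,5200,Normal,0
0.00,udp,dns,INT,2,146,Generic,1
0.30,tcp,ftp,FIN,12,900,Exploits,1
"""

def is_numeric(text):
    try:
        float(text)
        return True
    except ValueError:
        return False

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
# Exclude the target columns; everything else is a candidate feature.
feature_columns = [c for c in rows[0] if c not in ("attack_cat", "label")]
quantitative = [c for c in feature_columns if all(is_numeric(r[c]) for r in rows)]
qualitative = [c for c in feature_columns if c not in quantitative]
class_counts = Counter(r["label"] for r in rows)

print(quantitative)  # → ['dur', 'spkts', 'sbytes']
print(qualitative)   # → ['proto', 'service', 'state']
print(class_counts)  # class balance of the sample
```

The qualitative columns are the ones that would then pass through an encoding scheme such as target or leave-one-out encoding.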

In this study, after the model has been trained, its anomaly detection
accuracy will be evaluated on the UNSW-NB15 dataset and compared statistically.
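For the statistical comparison, accuracy alone can be misleading on unbalanced traffic, so metrics derived from the confusion matrix are also useful. The sketch below computes accuracy, precision, recall (detection rate), F1 score and false positive rate for a binary normal (0) / attack (1) task; the labels and predictions are hypothetical.

```python
def evaluate(y_true, y_pred, positive=1):
    """Confusion-matrix based metrics for a binary normal/attack task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # also called detection rate or TPR
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall) if precision + recall else 0.0,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
    }

# Hypothetical test labels and classifier predictions.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]
metrics = evaluate(y_true, y_pred)
print(metrics)  # accuracy 0.8, precision/recall/F1 0.75, FPR 1/6
```

Reporting these metrics for every classifier and encoding scheme on the same test split makes the comparison between them explicit.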
Risk Analysis and Management (Not Applicable)

References/Bibliography

[1] Ali, A., Hu, Y., Hsieh, C., & Khan, M. (2017). A Comparative Study on
Machine Learning Algorithms for Network Defense. Virginia Journal Of
Science, 68(3 & 4), 1-19. doi: 10.25778/PEXS-2309
[2] Nevat, I., Divakaran, D., Nagarajan, S., Zhang, P., Su, L., Ko, L., & Thing, V. (2018).
Anomaly Detection and Attribution in Networks with Temporally Correlated
Traffic. IEEE/ACM Transactions on Networking, 26(1), 131-144. doi:
10.1109/tnet.2017.2765719
[3] Koh, J., Nevat, I., Leong, D., & Wong, W. (2016). Geo-Spatial Location Spoofing
Detection for Internet of Things. IEEE Internet Of Things Journal, 3(6), 971-978. doi:
10.1109/jiot.2016.2535165
[4] Li, Y., Ma, R., & Jiao, R. (2015). A Hybrid Malicious Code Detection Method based on
Deep Learning. International Journal Of Security And Its Applications, 9(5), 205-216.
doi: 10.14257/ijsia.2015.9.5.21
[5] Salama, M., Eid, H., Ramadan, R., Darwish, A., & Hassanien, A. (2011). Hybrid
Intelligent Intrusion Detection Scheme. Advances In Intelligent And Soft Computing,
293-303. doi: 10.1007/978-3-642-20505-7_26
[6] Denning, D. (1987). An Intrusion-Detection Model. IEEE Transactions On Software
Engineering, SE-13(2), 222-232. doi: 10.1109/tse.1987.232894
[7] García-Teodoro, P., Díaz-Verdejo, J., Maciá-Fernández, G., & Vázquez, E. (2009).
Anomaly-based network intrusion detection: Techniques, systems and
challenges. Computers & Security, 28(1-2), 18-28. doi: 10.1016/j.cose.2008.08.003
[8] Ravale, U., Marathe, N., & Padiya, P. (2015). Feature Selection Based Hybrid Anomaly
Intrusion Detection System Using K Means and RBF Kernel Function. Procedia
Computer Science, 45, 428-435. doi: 10.1016/j.procs.2015.03.174
[9] Aljawarneh, S., Aldwairi, M., & Yassein, M. (2018). Anomaly-based intrusion detection
system through feature selection analysis and building hybrid efficient model. Journal Of
Computational Science, 25, 152-160. doi: 10.1016/j.jocs.2017.03.006
[10] Kwon, D., Kim, H., Kim, J., Suh, S., Kim, I., & Kim, K. (2017). A survey of deep
learning-based network anomaly detection. Cluster Computing. doi: 10.1007/s10586-017-1117-8
[11] Liao, Y., & Vemuri, V. (2002). Use of K-Nearest Neighbor classifier for intrusion
detection. Computers & Security, 21(5), 439-448. doi: 10.1016/s0167-4048(02)00514-x
[12] Borisaniya, B., & Patel, D. (2015). Evaluation of Modified Vector Space Representation
Using ADFA-LD and ADFA-WD Datasets. Journal of Information Security, 06(03),
250-264. doi: 10.4236/jis.2015.63025
[13] Assem, N., Rachidi, T., & Taha El Graini, M. (2018). Intrusion Detection Using
Bayesian Classifier for Arbitrarily Long System Call Sequences. IADIS International
Journal On Computer Science And Information Systems, 9(1), 71-81. Retrieved from
http://www.iadisportal.org/ijcsis/papers/2014170106.pdf
[14] De la Hoz, E., De La Hoz, E., Ortiz, A., Ortega, J., & Prieto, B. (2015). PCA filtering and
probabilistic SOM for network intrusion detection. Neurocomputing, 164, 71-81. doi:
10.1016/j.neucom.2014.09.083
[15] Al-Yaseen, W., Othman, Z., & Nazri, M. (2017). Multi-level hybrid support vector
machine and extreme learning machine based on modified K-means for intrusion
detection system. Expert Systems with Applications, 67, 296-303. doi:
10.1016/j.eswa.2016.09.041
[16] Naseer, S., Saleem, Y., Khalid, S., Bashir, M., Han, J., Iqbal, M., & Han, K. (2018).
Enhanced Network Anomaly Detection Based on Deep Neural Networks. IEEE
Access, 6, 48231-48246. doi: 10.1109/access.2018.2863036
[17] Aygun, R., & Yavuz, A. (2017). Network Anomaly Detection with Stochastically
Improved Autoencoder Based Models. 2017 IEEE 4th International Conference On
Cyber Security And Cloud Computing (CSCloud). doi: 10.1109/cscloud.2017.39
[18] Moustafa, N., & Slay, J. (2016). The evaluation of Network Anomaly Detection Systems:
Statistical analysis of the UNSW-NB15 data set and the comparison with the KDD99
data set. Information Security Journal: A Global Perspective, 25(1-3), 18-31. doi:
10.1080/19393555.2015.1125974
Gantt chart (to be used as guideline)
Details of Funds/Expenditure (Not Applicable)

Student Signature
Date: ______

DECLARATION

We hereby agree to supervise the research work as per above proposal/synopsis.

____________________
Signature of Supervisor

Signature of SC Member 1 Signature of SC Member 2


Date: _________ Date: _________
Note: Hard and soft copy of synopsis/research proposal must be submitted to secretary ASRB for final approval.
FOR VU THESIS SUPERVISOR USE ONLY

Profile of Supervisor

Name of Supervisor:_________________________________________________________
Designation: _______________________________________________________________
Total No. of Impact Factor Research Publications during last 5 years: ____
Total No. of Publications without Impact Factor during last 5 years: _______________

Ongoing Research Students
Number of MS/M.Phil. students: ____
Number of PhD students: ____

Signature of Supervisor

Endst. No. ___________ Dated:______________

The Proposal entitled “ ” duly
recommended by the Graduate Research Committee (GRC) in its meeting held on __________ is
forwarded to ASRB through the Dean of the Faculty for approval and allocation of funds (if
requested).

Signature / Seal
Chairperson of the Department
Date: ___________

Signature / Seal Signature / Seal


Dean of the Faculty Secretary ASRB
Date: ___________ Date: ___________
