Intrusion Detection System Using Unsupervised ML Algorithms: School of Information Technology and Engineering
A PROJECT ON
INTRUSION DETECTION SYSTEM USING
UNSUPERVISED ML ALGORITHMS
Submitted by:
Aditya Kumar (18BIT0235)
Ritvik Gupta (18BIT0218)
3. Modification: The attacker both accesses and tampers with the asset. This
may include altering a program so that it performs an additional
computation, or modifying data being transmitted. Such modifications range
from simple changes to subtle ones that may never be detected.
1. The number of real attacks is often far lower than the number of false
alarms raised. As a result, real threats often go unnoticed.
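The imbalance described above can be made concrete with a little arithmetic. The sketch below uses entirely hypothetical numbers (100 attacks among 100,000 events, a 90% detection rate and a 2% false-alarm rate) that are assumptions for illustration, not measurements from this project. The function name `alert_precision` is likewise made up for this example.

```python
def alert_precision(attacks, benign, detection_rate, false_alarm_rate):
    """Return (true_alerts, false_alerts, precision) for a batch of events."""
    true_alerts = attacks * detection_rate      # attacks that raise an alert
    false_alerts = benign * false_alarm_rate    # benign events that raise an alert
    precision = true_alerts / (true_alerts + false_alerts)
    return true_alerts, false_alerts, precision

# Hypothetical workload: 100 attacks hidden among 99,900 benign events.
true_alerts, false_alerts, precision = alert_precision(
    attacks=100, benign=99_900, detection_rate=0.9, false_alarm_rate=0.02
)
print(f"true alerts: {true_alerts:.0f}, false alerts: {false_alerts:.0f}")
print(f"share of alerts that are real attacks: {precision:.1%}")
```

Under these assumed rates, false alarms outnumber true alerts by more than twenty to one, so an analyst working through the alerts would find that only a small share point to a real attack.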
1. Noise can severely reduce the capabilities of the IDS by generating a high
false-alarm rate.
2. An IDS monitors the whole network, so it is vulnerable to the same attacks
as the network's hosts. Protocol-based attacks can cause the IDS to fail.
3. A network IDS can detect only network anomalies, which limits the
variety of attacks it can discover.
4. A network IDS can create a bottleneck, as all inbound and outbound traffic
passes through it.
5. A host IDS relies on audit logs, so any attack that modifies the audit
logs threatens the integrity of the HIDS.
Machine Learning is the field of study that gives computers the capability to
learn and improve automatically from experience without being explicitly
programmed. Machine learning focuses on the development of programs
that can use data to learn for themselves.
ABSTRACT
With the advent of vast amounts of information and technology, businesses
of all kinds around the world are becoming increasingly data driven.
Companies collect and handle data of high velocity, variety and volume.
This also opens up various loopholes in the systems developed
for working with such large amounts of data.
AIM
We train two unsupervised machine learning algorithms, K-Means and the
Gaussian Mixture Model, on Spark clusters running in local memory.
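As a rough illustration of the K-Means half of this aim, the following is a minimal single-machine sketch in plain Python rather than Spark. The two-dimensional points, the helper names `nearest` and `kmeans`, the hand-picked starting centroids, and the idea of scoring a record by its distance to the nearest centroid are all assumptions made for illustration. They are not taken from the project's code, and the Gaussian Mixture Model is not covered here.

```python
import math

def nearest(point, centroids):
    """Return (index, distance) of the centroid closest to point."""
    distances = [math.dist(point, c) for c in centroids]
    index = min(range(len(centroids)), key=distances.__getitem__)
    return index, distances[index]

def kmeans(points, initial_centroids, iterations=10):
    """Lloyd's algorithm: assign each point, then recompute each centroid."""
    centroids = list(initial_centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            index, _ = nearest(p, centroids)
            clusters[index].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids

# Hypothetical, already-scaled traffic records with two numeric features each.
traffic = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]

# Starting centroids are chosen by hand here so the run is reproducible.
centroids = kmeans(traffic, initial_centroids=[traffic[0], traffic[3]])

# Assumed scoring rule: a record far from every centroid looks unusual.
_, normal_score = nearest((1.1, 1.0), centroids)
_, odd_score = nearest((9.0, 0.5), centroids)
print(centroids, normal_score, odd_score)
```

In this toy run the record that sits far from both clusters receives a much larger score than the one close to a cluster centre. A threshold on that score would be one possible way to flag suspicious traffic.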
DATASET DESCRIPTION
These data sets contain records of the internet traffic seen by a
simple intrusion detection network. They are the traces left behind
by the traffic that a real IDS encountered.
The data set contains 42 features per record: 41 of them describe
the traffic input itself, and the last is a label (whether the
record is normal or an attack).
The test dataset contains 22,000 entries and the train
dataset contains 1.26 lakh (about 126,000) entries.
The training dataset covers 22 of the 37 attack types present in
the test dataset.
The known attack types are those present in the training dataset,
while the novel attacks are the additional attacks in the test
dataset, i.e. those not available in the training dataset.
The attack types are grouped into four categories: DoS, Probe,
U2R and R2L.
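The record layout and the known/novel split described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: records are taken to be comma-separated with the label in the last field, normal traffic is taken to carry the label `normal`, and the attack names `attack_a`, `attack_b` and `attack_c` are placeholders rather than names from the dataset. The helpers `split_record`, `attack_types` and `make_record` are hypothetical.

```python
NUM_FIELDS = 42  # 41 traffic features followed by one label

def split_record(line):
    """Split one comma-separated record into (features, label)."""
    fields = line.strip().split(",")
    if len(fields) != NUM_FIELDS:
        raise ValueError(f"expected {NUM_FIELDS} fields, got {len(fields)}")
    return fields[:-1], fields[-1]

def attack_types(lines, normal_label="normal"):
    """Collect the distinct non-normal labels seen in a set of records."""
    return {label for _, label in map(split_record, lines) if label != normal_label}

def make_record(label):
    """Build a placeholder record: 41 dummy feature values plus a label."""
    return ",".join(["0"] * 41 + [label])

# Placeholder data standing in for the real train and test files.
train = [make_record(name) for name in ("normal", "attack_a", "attack_b")]
test = [make_record(name) for name in ("normal", "attack_a", "attack_c")]

known = attack_types(train)          # attack types present in training
novel = attack_types(test) - known   # attack types that appear only in test
print(sorted(known), sorted(novel))
```

Applied to the real files, the same set difference would list the attack types that a model never sees during training.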
The feature types in this data set can be broken down into 4 types:
RATIO
Feature selection is also known as attribute selection or variable
selection. It helps in selecting the most appropriate features among those
available. Feature selection can be performed manually or automatically.
Importance:
Features may be expensive to obtain, so selecting fewer of them
reduces cost.
It helps in improving the accuracy of the model.
It also reduces the time the model needs to train.
It discards garbage data.
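One simple way to perform feature selection automatically is to drop features that never change, since a constant column cannot help separate records. The sketch below shows this variance filter as an example only. It is not necessarily the method used in this project, and the column names, sample rows and function name `select_by_variance` are hypothetical.

```python
from statistics import pvariance

def select_by_variance(rows, names, threshold=0.0):
    """Return the names of columns whose population variance exceeds threshold."""
    columns = zip(*rows)  # transpose rows into per-feature columns
    return [name for name, column in zip(names, columns) if pvariance(column) > threshold]

# Hypothetical feature names and values; the middle column is constant.
names = ["duration", "flag_a", "bytes"]
rows = [
    (0.0, 1.0, 100.0),
    (2.0, 1.0, 250.0),
    (1.0, 1.0, 90.0),
]

kept = select_by_variance(rows, names)
print(kept)  # ['duration', 'bytes']
```

A filter like this only removes uninformative columns. Ranking the remaining features by how useful they are would need a criterion that this sketch does not attempt.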
References: https://www.naun.org/main/UPress/cc/2014/a102019-106.pdf
http://www.wseas.us/e-library/conferences/2013/Nanjing/ACCIS/ACCIS-30.pdf