DDoS Attack Detection and Mitigation Using Anomaly Detection and Machine Learning Models
2021 IEEE International Conference on Computation System and Information Technology for Sustainable Solutions (CSITSS) | 978-1-6654-0610-9/21/$31.00 ©2021 IEEE | DOI: 10.1109/CSITSS54238.2021.9683214
Authorized licensed use limited to: Sri Sivasubramanya Nadar College of Engineering. Downloaded on March 11,2023 at 09:22:00 UTC from IEEE Xplore. Restrictions apply.

Abstract—With cyber-crimes increasing each day, it is important to build a layer of security to defend against attacks which can compromise Confidentiality, Integrity and Availability (CIA). One of the most dangerous attacks in the domain of cyber-attacks is the Distributed Denial of Service (DDoS) attack. A DDoS attack can cause a huge disruption of services, leading to monetary loss as well as loss of reputation in case of data theft, if immediate action is not taken. There is a need for efficient detection of and response to such attacks, with high accuracy, few false positives, and low latency. This paper puts forth a methodology which could detect attacks and efficiently mitigate them, all in a seamless fashion. The proposed methodology relies on machine learning ensemble algorithms and on anomaly detection using fast entropy and attribute thresholding algorithms. The combined results of these algorithms are used to give a final verdict.

Keywords—Threat, Security, DDoS, Machine Learning, Anomaly Detection

I. INTRODUCTION

In the past decade, Distributed Denial of Service (DDoS) attacks have caused financial losses to companies and government organizations worldwide. This is expected to grow with the increasing number of devices that are being added to the network via the popularization of Cloud Computing and the Internet of Things. These devices interact with the application and server and also run remotely on the network. This invites malicious users to cripple or take control of these devices by launching a DDoS attack, which leaves the server without the resources to process all the bogus requests and renders it vulnerable or unusable.

A Denial of Service (DoS) attack floods the target system with more requests than it can handle, thereby depleting the system's resources and causing it to disregard requests from legitimate users.

While DoS attacks originate from a single attacker system, a Distributed Denial of Service attack uses multiple sources to perform the same attack. This greatly increases the rate of requests sent to the target system and causes greater damage. In addition, since the requests have different sources of origin, it is difficult to trace them back to the attacker.

Thus, a system is proposed to identify a DDoS attack and mitigate the attack.

A. Types of DDoS attacks

Distributed Denial of Service attacks can be commonly categorized into Volumetric, Application and Protocol based attacks.

Volumetric attacks use copious traffic to saturate the network bandwidth and jam the network, preventing any other requests from getting through. Some common volumetric attacks are UDP flooding, ICMP flooding and DNS amplification.

UDP (User Datagram Protocol) and ICMP (Internet Control Message Protocol) are communication protocols used in services such as DNS, SNMP, RIP and DHCP, and in ping and traceroute, respectively, to communicate between two systems over a network.

In a UDP/ICMP flood, UDP packets or ICMP request packets are sent continuously to the server until the server can no longer respond to any request. These attacks are generally accompanied by a reflection attack, where the attacker aims at saturating the bandwidth in both directions by spoofing the packets with the victim's address, making the server respond back to itself.

Application attacks are aimed at layer 7 of the OSI model. They exploit vulnerabilities in this layer and initiate requests that consume resources such as disk and memory.

One such attack is the HTTP flood attack, wherein multiple interactions between the attacker and the website are made to look like normal interactions; in the background, however, they are coordinated to consume as many server resources as possible.

Protocol attacks consume the resources of network equipment such as firewalls and load balancers by attacking the different protocol communications. Two of the most common protocol attacks are the SYN flood attack and the Ping of Death. In a SYN flood attack, the server is weighed down with a large number of SYN requests from an attacker, causing the server-side backlog of Transmission Control Blocks (TCBs) to fill up with half-open connections that never receive the final ACK. Once this backlog is full, the server is no longer able to respond to any legitimate requests.

A Distributed Denial of Service attack is more dangerous than it first appears. At first glance, it simply implies "denial of service" to a system or application; however, there is more to this attack than a breach of availability. In most cases these attacks can be used to implant malware on the system and create a botnet using all the available resources.

These botnets are extremely dangerous and can be further used to launch DDoS attacks of higher magnitude, evade spam filters, mine bitcoins and even speed up brute-force password guessing.

The longer such an attack goes undetected, the greater the damage caused. It can result in loss of money, time, clients as
well as reputation. The severity of an attack corresponds to the threat it poses to the organization. During the attack, neither the clients nor the employees can access any resources over that network.

B. DDoS defense mechanism

It is important to safeguard critical assets from these attacks to avoid losses and identity theft. The DDoS defence mechanism comprises four major stages: Monitor, Detect, Mitigate and Prevent.

Monitoring everyday traffic and system activity to detect malicious activities plays an important role in the "pro-active" stage of defence. Once an attack is detected, immediate action must be taken to mitigate it and to prevent the next set of attacks. Every second lost in this process could have a huge impact on the server.

DDoS detection can be done in multiple ways: signature-based detection, anomaly-based detection, and application/protocol based analysis.

In signature-based detection, signatures are added to a database based on previous intrusions on the system. Every packet signature is compared with the existing records, and upon a match the user is alerted.

In anomaly-based detection, a performance baseline is established, and the observed network traffic is compared with the expected activity. Machine Learning techniques are implemented for a clear classification of a packet as malicious or benign.

After the detection of an attack, further mitigation has to be performed.

Mitigation can be done using multiple techniques such as MDT, Rate Limiting and Black Hole routing. Following mitigation, the next step is to prevent such attacks by updating the existing firewall rules to filter out the packets before they enter the internal network.

II. LITERATURE SURVEY

The literature survey was carried out in two different categories: the network security and the machine learning domains.

The authors Idhammad, Karim, Afdel, Belouch, and Mustapha et al. in [7] proposed an HTTP DDoS attack detection system using Information Theoretic Entropy and Machine Learning on the CIDDS-001 dataset. The results showed higher accuracy with Random Forest, Naive Bayes, KNN, and decision tree than with Multi-layer Perceptron, which gave an accuracy of 28%.

The authors Deepa, V. and Sudar.K, Muthamil and Deepalakshmi et al. in [5] compare the accuracy of different machine learning models using the CAIDA dataset, both individually and after combining them (ensemble methods). Ensemble methods gave high accuracy and high sensitivity when compared to traditional DDoS detection and mitigation methods like D-Ward. The false alarm rate was comparatively higher for the KNN model.

The authors De Assis, Marcos VO, Anderson H. Hamamoto, Taufik Abrão, and Mario Lemes Proenca et al. in [6] proposed two systems to mitigate DDoS attacks: GT-HWDS, which uses the Holt-Winters method and game theory, and Fuzzy-GADS, which uses fuzzy logic. The GT-HWDS system gave an accuracy of 97%, while the Fuzzy-GADS system gave an accuracy of 95%.

The authors Başkaya, Dilek, and Refi Samet et al. in [2] put forth different machine learning algorithms to detect different types of DDoS attacks, such as Multilayer Perceptron, KNN, Support Vector Machines, and Random Forest. All the algorithms gave a high accuracy in the range of 86% to 99%, except SVC, which gave an accuracy of 36%.

The authors Srinivasan, Karthik, Azath Mubarakali, Abdulrahman Saad Alqahtani, and A. Dinesh Kumar et al. in [13] portray the various types of DDoS attack along with the consequences of such attacks on the cloud. Prevention, detection and mitigation approaches are thoroughly discussed, along with their strengths, challenges and limitations.

The authors Shahil, Deekshitha, Nuzha A M, and Mustafa Basthikodi et al. in [12] talk about the detection and prevention of various DDoS attacks using NEIF and honeypots. NEIF helps in the prevention of such attacks, whereas honeypots also help capture the attacker's activities. Each method carries risks; however, the methods could be refined in order to secure and implement the systems effectively.

The authors Bakr Ahmed, A.A. El-Aziz, Hesham A. Hefny, et al. in [1] put forth the different DDoS mitigation methodologies along with a few commonly used traceback technologies such as Moving Target Defense, EDoS, Resource Quota, sPoW, IP Traceback, Packet Marking and Logging, and SBTA.

The authors Nsaif M Ridha, MF Abbood, Abbas F Mahdi, et al. in [8] talk about the detection and mitigation of DDoS attacks. Their approach employs two algorithms, one for the detection and the other for the mitigation of DDoS attacks. The detection algorithm uses various kinds of lists to track the incoming IP and MAC addresses and detect an attack.

In summary, the following can be derived from the conducted literature survey:
1. DDoS attacks on the cloud are dangerous and need to be prevented. Reducing false negatives must be given higher priority, as they can cause serious damage to applications.
2. Attacks must be detected with low latency and high accuracy, and the defence mechanism must be lightweight, transparent, and precise.
3. Honeypots and NEIF techniques can be used as a preventive approach in DDoS defence mechanisms.
4. Random Forest and Multi-Layer Perceptron are found to be the best suited machine learning models for the detection of DDoS attacks with a high accuracy rate.
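The summary points are phrased in terms of accuracy, false positives and false negatives. As a reminder of how these quantities relate, the following minimal sketch computes them from confusion-matrix counts. The function name and the example counts are hypothetical and are not taken from any of the surveyed papers.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute basic detection metrics from confusion-matrix counts.

    tp: attack packets flagged as attacks      (true positives)
    fp: benign packets flagged as attacks      (false positives)
    tn: benign packets passed as benign        (true negatives)
    fn: attack packets passed as benign        (false negatives)
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        # Share of benign traffic that is wrongly blocked.
        "false_positive_rate": fp / (fp + tn),
        # Share of attack traffic that slips through undetected.
        "false_negative_rate": fn / (fn + tp),
    }


# Hypothetical counts, for illustration only.
metrics = detection_metrics(tp=90, fp=5, tn=95, fn=10)
```

A detector can score well on accuracy and still miss a noticeable share of attack traffic, which is why point 1 singles out false negatives.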
III. PROPOSED METHODOLOGY

The proposed methodology follows a client-server architecture. It is composed of multiple components which work in parallel and communicate with each other over a network. For simplicity, it is assumed that the entire set-up is in a NAT network, for easier data transfer between the components of the system. The idea behind this approach is not to rely on a single classification or prediction result, but instead to combine the results from different classifiers based on anomaly detection and machine learning algorithms. Based on the accuracy of the classifiers, weights can be assigned while aggregating these results, for a final verdict on the packet.

The different components of this system, as seen in Fig. 1, are:

A. Client

A client, from this system's perspective, refers to the attack initiator, or the attacker. This client starts a DDoS attack on a target server which is to be brought down. The attack is assumed to come from one system sending packets with various spoofed IP addresses, and not from a botnet which is trying to attack the target server.

B. Server

The server is a hosted web application or any hosted service which the attacker seeks to bring down. This server will have a designated IP address, a MAC address and a port assigned to the running application.

C. Attacker Proxy node

The attacker proxy node is a black-hole node, to which all the malicious packets are routed and where they are discarded after further analysis. This is a node on the network that routes nowhere. It is essentially the system's dead end.

D. Head Proxy node

The Head proxy node is an intermediate node between the external internet and the internal servers which are to be safeguarded. This component can also be called the scrubbing station.

Within the scrubbing station, multiple filters and detection algorithms are run in order to classify a packet as malicious or benign.

The scrubbing station has a series of events that occur in the following order:
1. Data collection and Filtering: The network traffic is captured using a sniffer written in Python with raw sockets. The collected data is sent to a filter where the data is normalized, features are extracted, and redundant or unnecessary data is removed. This real-time data is plotted on a dashboard in terms of graphs and other visualizations. The filtered data is sent to the next phase of scrubbing, called the Proxy-Tail. Within the proxy-tail are four different components which perform different activities.
2. Classifier: The classifier is a trained machine learning model which uses ensemble learning to classify a given packet as malicious or benign.
3. Traffic Monitor: The traffic monitor tracks network traffic and establishes the expected daily traffic pattern. For any new incoming packet, it compares the observed traffic activity with the expected activity and reports the result of that comparison.
4. MAC-IP: The MAC-IP layer establishes a relationship between a MAC address and the IP addresses associated with that MAC address over a short period of time. The assumption made here is that when the number of IP addresses corresponding to one MAC address exceeds a threshold value, the IP addresses are considered spoofed and malicious.
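Steps 1 and 4 can be illustrated together. The sketch below is a minimal example and not the authors' implementation: it pulls the source MAC and source IPv4 address out of a raw Ethernet frame, then counts the distinct IP addresses seen per MAC inside a sliding window. The names `parse_frame` and `MacIpTracker`, the 10-second window and the limit of 5 addresses are all assumptions made for illustration.

```python
import socket
import struct
from collections import defaultdict, deque


def parse_frame(frame: bytes):
    """Return (src_mac, src_ip) for an Ethernet II frame carrying IPv4, else None."""
    if len(frame) < 34:  # 14-byte Ethernet header + 20-byte minimal IPv4 header
        return None
    _dst_mac, src_mac, ethertype = struct.unpack("!6s6sH", frame[:14])
    if ethertype != 0x0800:  # not an IPv4 payload
        return None
    # The IPv4 source address sits at bytes 12..15 of the IP header.
    src_ip = socket.inet_ntoa(frame[26:30])
    return src_mac.hex(":"), src_ip


class MacIpTracker:
    """Flags a MAC address that shows too many distinct source IPs in a window."""

    def __init__(self, window_seconds=10.0, max_ips_per_mac=5):
        self.window = window_seconds    # assumed window length
        self.limit = max_ips_per_mac    # assumed threshold value
        self.seen = defaultdict(deque)  # mac -> deque of (timestamp, ip)

    def observe(self, mac, ip, now):
        """Record one packet; return True if the MAC exceeds the IP limit."""
        events = self.seen[mac]
        events.append((now, ip))
        # Drop observations that have fallen out of the window.
        while events and now - events[0][0] > self.window:
            events.popleft()
        distinct_ips = {addr for _, addr in events}
        return len(distinct_ips) > self.limit
```

On Linux, frames of this shape can be read from a socket opened with `socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.ntohs(0x0003))`, which requires elevated privileges. Each received frame would be passed to `parse_frame` and the result to `MacIpTracker.observe` together with the current timestamp.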
E. Anomaly Detection

Two approaches to anomaly detection are implemented: one uses a static threshold, and the other a dynamic threshold which is updated based on the previous results.

1) Attribute Threshold [4]: This algorithm relies on four different attributes of the packets in a window, identified as:
a1 = Total number of packets in a window
a2 = Total number of unique source IP addresses among the packets in a window
a3 = a2/a1
a4 = Total number of protocols (2 in this case)

The weights for each attribute are set in the ratio a1:a2:a3:a4 = 1:1:3:1.

Algorithm: See Algorithm 1

Algorithm 2: Fast Entropy

F. Machine Learning

Two machine learning models are employed to independently detect TCP and ICMP flood attacks.

TABLE I
RNN SPECIFICATIONS

Layers               Activation Func   No. of Neurons
Bidirectional LSTM   tanh              64
Dense CNN(i/p)       relu              128
Dense CNN(o/p)       sigmoid           1

TCP SYN Flood: Bidirectional LSTM Neural Network [11]: An LSTM cell contains weights and gates, the gates being the distinguishing feature of LSTM models. There are three gates inside every cell, namely the input gate, the forget gate, and the output gate. The main advantage of using an LSTM neural network is that it mitigates the vanishing gradient problem that affects plain recurrent networks.
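For reference, the three gates named above are commonly written as follows. This is the standard textbook LSTM formulation, not necessarily the exact variant used in [11]. The symbols are introduced here for illustration: $x_t$ is the input at step $t$, $h_t$ the hidden state, $c_t$ the cell state, $W$, $U$ and $b$ are learned parameters, $\sigma$ is the logistic sigmoid and $\odot$ is the element-wise product.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

The additive update of $c_t$ is what lets gradients flow across many time steps. A bidirectional layer runs one such recurrence forward and one backward over the sequence and concatenates the two hidden states at each step.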
G. Analyser
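The detectors described above each produce a signal, and Section III states that the results are aggregated with accuracy-based weights into a final verdict. The sketch below shows one way this could fit together, under stated assumptions. The attribute definitions and the 1:1:3:1 weights come from the text. The rule "an attribute fires when it exceeds a static limit", the use of plain Shannon entropy as a stand-in for the fast entropy computation (whose details are not given here), the detector names, and the 0.5 cutoff are all hypothetical.

```python
import math
from collections import Counter

# Attribute weights a1:a2:a3:a4 = 1:1:3:1, as given in the text.
WEIGHTS = (1, 1, 3, 1)


def window_attributes(packets):
    """packets: list of (source_ip, protocol) pairs seen in one window."""
    a1 = len(packets)                          # total packets
    a2 = len({src for src, _ in packets})      # unique source IPs
    a3 = a2 / a1 if a1 else 0.0                # unique-source ratio
    a4 = len({proto for _, proto in packets})  # distinct protocols
    return a1, a2, a3, a4


def attribute_score(attrs, limits):
    """ASSUMED rule: weighted share of attributes above their static limits."""
    fired = sum(w for w, a, lim in zip(WEIGHTS, attrs, limits) if a > lim)
    return fired / sum(WEIGHTS)


def source_entropy(packets):
    """Shannon entropy in bits of the source-IP distribution (stand-in only)."""
    counts = Counter(src for src, _ in packets)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def weighted_verdict(votes, weights, cutoff=0.5):
    """votes: detector name -> True if that detector flags the traffic."""
    total = sum(weights[name] for name in votes)
    flagged = sum(weights[name] for name, vote in votes.items() if vote)
    return "Malicious" if flagged / total >= cutoff else "Normal"
```

A flood from many spoofed addresses pushes both a3 and the source entropy up, since almost every packet carries a new source. In use, the entropy value would be compared against the dynamic threshold, which is not modelled here, and each detector's boolean result would be passed to `weighted_verdict` with weights proportional to its measured accuracy.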
Each component in the system would run producer and consumer threads. These threads are used to perform computation in parallel, and also act as a medium of communication between the components of the system. A real-time channel is important for security detection systems. These channels are set up using Apache Kafka queues, which help in the seamless flow of data from one process to another. All the components would be deployed individually on Docker containers, each of which would have an isolated environment, separating one component from the other and also helping in easy packaging of the product.

Fig. 2. BRNN accuracy
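The producer and consumer arrangement between components can be sketched in a single process. This is only an illustration of the pattern: a standard-library `queue.Queue` stands in for a Kafka queue, and the function names, the sentinel convention and the `classify` callback are hypothetical.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream in this sketch


def producer(out_q, packets):
    """Push each captured packet onto the channel, then signal completion."""
    for pkt in packets:
        out_q.put(pkt)
    out_q.put(SENTINEL)


def consumer(in_q, results, classify):
    """Pull packets off the channel and record a verdict for each."""
    while True:
        pkt = in_q.get()
        if pkt is SENTINEL:
            break
        results.append((pkt, classify(pkt)))


def run_pipeline(packets, classify):
    channel = queue.Queue()  # in-process stand-in for a Kafka queue
    results = []
    workers = [
        threading.Thread(target=producer, args=(channel, packets)),
        threading.Thread(target=consumer, args=(channel, results, classify)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results
```

With a real broker, the producer and consumer would live in separate containers and exchange serialized messages. The control flow on each side stays the same.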
IV. IMPLEMENTATION AND RESULTS
A. Data
Fig. 4. BRNN Confusion matrix

D. Conclusion

A system is proposed to be deployed as a proxy server, to sniff packets and analyse them. The packets are then sent through the Machine Learning classifier, Anomaly Detection, and MAC/IP relationship detector systems to categorize the packet. The aggregated results of the three systems are used to decide whether the packet is "Normal" or "Malicious". The malicious packets are then discarded via Black-Hole routing, while the benign packets are routed to the host server.

The proposed methodology does not rely solely on a single detection mechanism. Instead, it combines the results from a machine learning based classifier and entropy-based algorithms, which reduces the probability of false positives and also increases the accuracy.

In the near future, the aim is to implement the above model in an efficient way to test the model's capabilities and its effectiveness in detecting and mitigating DDoS attacks.

REFERENCES

[13] Karthik Srinivasan, Azath Mubarakali, Abdulrahman Saad Alqahtani, and A Dinesh Kumar. A survey on the impact of DDoS attacks in cloud computing: Prevention, detection and mitigation techniques. In Intelligent Communication Technologies and Virtual Mobile Networks, pages 252–270. Springer, 2019.