Dissertation Team META VIGU

SYN Flood Detection Using Machine Learning
Based Algorithm & SNORT
Team META VIGU

2023
S. D. N Inuri – ICT/2017/2018/022
H. V. I Lakmal – ICT/2017/2018/034
Gaurangi Samaraweera – ICT/2017/2018/056
J. M. A. W. U Jayakodi – ICT/2017/2018/026
Dissertation submitted in partial fulfilment of the requirements for the award of the
degree of Bachelor of Science in Information Technology.
Department of Computing
Rajarata University of Sri Lanka.
2023
Abstract
SYN Flood Attacks are a significant threat to the security and availability of network
services, and detecting and preventing such attacks is essential for maintaining network
security. This thesis proposes a SYN Flood Detection System that utilizes a combination
of a machine learning algorithm (Random Forest) and Snort (an open-source network security
monitoring tool), along with a packet filtering firewall, to detect and prevent SYN Flood
Attacks.
The proposed system consists of three components: (1) a data pre-processing component
where features are extracted from network traffic data, (2) a detection component where
the Random Forest algorithm is applied to classify incoming traffic as either normal or
attack traffic, and (3) a prevention component where Snort is used to block suspicious traffic
and a packet filtering firewall is employed to prevent further attacks.
The system is evaluated using the Intrusion Detection Evaluation Dataset, and the results
show that the proposed system is effective in detecting SYN Flood Attacks with a detection
rate of over 99% and a false positive rate of less than 1%. The proposed system is also able
to prevent further attacks by blocking suspicious traffic and utilizing a packet filtering
firewall.
Acknowledgement
We would like to acknowledge those who supported us in various ways during our research
project. First of all, we offer special thanks to our supervisor, Mr N. M. A. P. B Nilwakka,
a lecturer in the Department of Computing of Rajarata University of Sri Lanka. We also
thank Mr Nishantha Weerakoon, the head of the Department of Computing, and the academic
and non-academic staff of our faculty. Finally, we would like to express our special thanks
to our friends, who have supported us in various ways.
TABLE OF CONTENTS
CHAPTER 1 - INTRODUCTION
Background
Problem Statement
Objectives
Thesis Organization
CHAPTER 2 - LITERATURE REVIEW
SYN Flood Attack
Machine Learning Techniques for SYN Flood Detection
Random Forest Algorithm
Snort
Packet Filtering Firewall
CHAPTER 3 – DATA COLLECTION AND PRE-PROCESSING
Dataset
Data Pre-Processing
Feature Extraction
CHAPTER 4 – RANDOM FOREST ALGORITHM
Tools and Technologies
Python Programming Language
Scikit-learn Python Library
CICFlowMeter
Random Forest Algorithm
CHAPTER 5 – SNORT INTEGRATION AND PACKET FILTERING FIREWALL IMPLEMENTATION
Tools and Technologies
Linux Operating System
Snort Framework
Packet Filtering Firewall
CHAPTER 6 – EVALUATION AND RESULTS
Performance Evaluation
Comparison with Existing Techniques
CHAPTER 7 – LIMITATION AND FUTURE WORK
CHAPTER 8 – CONCLUSION
References
CHAPTER 1 - INTRODUCTION
Background
A SYN flood attack is a type of Denial of Service (DoS) attack, often launched in
distributed form (DDoS), in which an attacker floods a target server with a high volume of
SYN packets without completing the three-way handshake required to establish a TCP
connection. As a result, the target server is overwhelmed and unable to respond to
legitimate requests, causing the service to become unavailable.
SYN flood attacks take advantage of the way TCP establishes connections: a server
allocates system resources and memory for each incomplete (half-open) connection request.
When an attacker sends a large number of SYN packets, the target server allocates
resources to process the connection requests, but since the handshakes are never completed,
these resources stay tied up until the requests time out, leading to a denial of service.
SYN flood attacks are difficult to prevent, as they can originate from multiple sources and
appear to be legitimate traffic. However, there are several techniques that can be used to
mitigate the impact of these attacks, including rate limiting, increasing the maximum
number of incomplete connections, and implementing firewall rules and intrusion detection
systems such as Snort. Additionally, machine learning algorithms, such as Random Forest, can
be used to detect and prevent SYN flood attacks by analysing network traffic and
identifying abnormal patterns.
Problem Statement
The problem with SYN flood attacks is that they pose a significant threat to the security
and availability of network services. As the frequency and complexity of these attacks
continue to increase, traditional security measures such as firewalls and intrusion detection
systems become less effective in preventing them. This leads to a need for more
sophisticated and adaptive techniques that can detect and prevent SYN flood attacks in real-
time.
Furthermore, as SYN flood attacks can be launched from multiple sources and appear to be
legitimate traffic, it is challenging to distinguish between normal and attack traffic. This
makes it difficult to identify and mitigate the impact of these attacks, which can lead to
prolonged service disruptions, loss of revenue, and reputational damage.
Therefore, the problem addressed in this thesis is to develop a SYN Flood Detection System
that utilizes machine learning algorithms, such as Random Forest, and network security
tools, such as Snort, along with a packet filtering firewall, to detect and prevent SYN flood
attacks in real time, while minimizing false positives and false negatives.
Objectives
Our research project pursues the objectives listed below.
• To develop a system that can accurately detect SYN flood attacks in real-time, while
minimizing false positives and false negatives.
• To utilize machine learning algorithms, such as Random Forest, to classify incoming
network traffic as either normal or attack traffic.
• To integrate network security monitoring tools, such as Snort, into the system to identify
and block suspicious traffic.
• To implement a packet filtering firewall that can prevent further attacks.
• To evaluate the performance of the proposed system using real-world datasets and
compare it with existing techniques.
• To provide insights into the effectiveness of the proposed system and identify areas for
future research and improvement.
• To contribute to the development of more effective and adaptive techniques for
detecting and preventing SYN flood attacks, thus enhancing the security and
availability of network services.
Thesis Organization
Chapter 1: This chapter provides an overview of the problem statement, objectives, and
contributions of the SYN Flood Detection System. It also presents the motivation and scope
of the thesis, as well as the methodology used in the research.
Chapter 2: This chapter provides a comprehensive review of the relevant literature on
SYN flood attacks, machine learning algorithms, network security tools, and packet
filtering firewalls. It also discusses the existing techniques for detecting and preventing
SYN flood attacks.
Chapter 3: This chapter describes the process of collecting and pre-processing the data
used in the evaluation of the proposed system. It also presents the characteristics of the
dataset and the features extracted from the network traffic.
Chapter 4: This chapter presents the implementation of the Random Forest algorithm for
classifying network traffic as either normal or attack traffic. It also discusses the training
and testing of the algorithm using the collected dataset.
Chapter 5: This chapter describes the integration of the Snort network security tool into the
SYN Flood Detection System for identifying and blocking suspicious traffic. It also
presents the implementation of the packet filtering firewall for preventing further attacks.
Chapter 6: This chapter evaluates the performance of the proposed system using real-
world datasets and compares it with existing techniques. It also presents the results of the
evaluation, including the detection rate, false positive rate, and other performance metrics.
Chapter 7: This chapter discusses the results of the evaluation and provides insights into
the effectiveness and limitations of the proposed system. It also suggests areas for future
research and improvement.
Chapter 8: This chapter summarizes the main contributions and findings of the thesis and
provides recommendations for the practical implementation of the SYN Flood Detection
System.
References: This section provides a list of the sources cited in the thesis.
CHAPTER 2 - LITERATURE REVIEW
SYN Flood Attack
The literature on SYN flood attacks reveals that they are one of the most common forms of
Distributed Denial of Service (DDoS) attack, which is designed to exhaust the resources of a
target system or network. A SYN flood attack exploits the three-way handshake process used in the TCP/IP
protocol to establish a connection between two devices. The attacker sends a large number
of SYN requests to the target device, but does not complete the handshake process by
sending the final ACK response. This causes the target device to wait for the ACK response,
tying up its resources and preventing it from accepting legitimate connections [5].
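The resource exhaustion described above can be illustrated with a toy simulation. The following is a minimal sketch, assuming a hypothetical fixed-size backlog of half-open connections with a fixed timeout; the class name, backlog size, timeout, and addresses are illustrative assumptions and are not taken from any real TCP stack.

```python
from collections import OrderedDict


class HalfOpenBacklog:
    """Toy model of a server's queue of half-open (SYN received) connections."""

    def __init__(self, capacity, timeout):
        self.capacity = capacity      # assumed maximum number of half-open entries
        self.timeout = timeout        # assumed seconds before an entry is discarded
        self.pending = OrderedDict()  # (src_ip, src_port) -> arrival time

    def _expire(self, now):
        # Drop the oldest entries whose handshake was never completed in time.
        while self.pending:
            key, arrived = next(iter(self.pending.items()))
            if now - arrived < self.timeout:
                break
            del self.pending[key]

    def on_syn(self, src, now):
        """Return True if the SYN is accepted, False if the backlog is full."""
        self._expire(now)
        if len(self.pending) >= self.capacity:
            return False
        self.pending[src] = now
        return True

    def on_ack(self, src):
        """The final ACK completes the handshake and frees the backlog slot."""
        return self.pending.pop(src, None) is not None


def simulate_flood(backlog, n_attack_syns):
    # The attacker sends SYNs from distinct sources and never sends the final ACK.
    for i in range(n_attack_syns):
        backlog.on_syn(("10.0.0.%d" % (i % 250 + 1), 1024 + i), now=0.0)
    # A legitimate client then tries to connect one second later.
    return backlog.on_syn(("192.0.2.10", 50000), now=1.0)
```

With a backlog of 128 entries, 200 attack SYNs leave no room for the legitimate client, whereas 10 attack SYNs do not prevent it from connecting.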
Several techniques have been proposed for detecting and preventing SYN flood attacks,
including the following.
Rate-based Detection: This technique involves monitoring the rate of incoming SYN
requests and blocking those that exceed a certain threshold. However, it can lead to
false positives and may not be effective against slow, low-rate attacks [2].
Stateful Packet Inspection (SPI): SPI is a firewall technique that tracks the state of each
TCP connection and blocks incoming packets that do not match a valid connection state.
However, it requires a large amount of memory to maintain connection state, and may not
be effective against distributed attacks [2].
Machine Learning: Machine learning algorithms, such as Support Vector Machines (SVM),
Decision Trees, and Random Forest, have been used to classify network traffic as either
normal or attack traffic. They can detect new and previously unseen attacks and are
effective in reducing false positives [4].
Hybrid Approaches: Hybrid approaches combine multiple techniques, such as rate-based
detection and machine learning, to improve the accuracy and efficiency of SYN flood
detection systems [2].
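To make the rate-based technique above concrete, the following is a minimal sketch of a sliding-window SYN counter. The class name, threshold, and window length are hypothetical choices for illustration only; a deployed system would tune them to its own traffic.

```python
from collections import defaultdict, deque


class SynRateDetector:
    """Flags a destination that receives too many SYNs within a time window."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold          # assumed SYNs tolerated per window
        self.window = window_seconds        # assumed window length in seconds
        self.arrivals = defaultdict(deque)  # dst_ip -> timestamps of recent SYNs

    def observe_syn(self, dst_ip, timestamp):
        """Record one SYN and return True if the rate threshold is exceeded."""
        recent = self.arrivals[dst_ip]
        recent.append(timestamp)
        # Discard SYNs that have fallen out of the sliding window.
        while recent and timestamp - recent[0] > self.window:
            recent.popleft()
        return len(recent) > self.threshold
```

A burst of four SYNs within one second exceeds a threshold of three, while a slow attack sending one SYN every two seconds never triggers the detector, which illustrates the weakness against low-rate attacks noted above.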
In recent years, there has been an increasing interest in developing more adaptive and
intelligent techniques for detecting and preventing SYN flood attacks, including the use of
deep learning algorithms, reinforcement learning, and dynamic defence mechanisms. These
approaches aim to enhance the resilience of network systems against evolving and
sophisticated attacks.
Machine Learning Techniques for SYN Flood Detection
Machine learning (ML) techniques have been widely applied to detect SYN flood attacks
due to their ability to learn from patterns in network traffic data and identify abnormal
behaviour. The following are some of the commonly used ML techniques for SYN flood
detection.
Support Vector Machines (SVM): SVM is a popular supervised learning algorithm that has
been used for binary classification of normal and attack traffic. SVM works by creating a
hyperplane that maximally separates the two classes of data points. In the context of SYN
flood detection, SVM can be trained on a labelled dataset of normal and attack traffic to
classify new incoming traffic as either normal or attack [4].
Decision Trees: Decision trees are supervised learning algorithms that build a tree-like
model of decisions and their possible consequences. Each internal node of the tree
represents a decision based on a specific feature, and each leaf node represents a class label.
Decision trees have been used for binary classification of normal and attack traffic in SYN
flood detection [4].
Random Forest: Random forest is an ensemble learning technique that combines multiple
decision trees to improve the accuracy of the classification. Random forest has been used
for SYN flood detection due to its ability to handle noisy and incomplete data and reduce
the risk of overfitting [4].
Neural Networks: Neural networks are a class of ML models that are inspired by the
structure and function of the human brain. They have been used for SYN flood detection
by creating a multilayer perceptron (MLP) that learns to classify incoming network traffic
as either normal or attack. Neural networks are capable of learning complex patterns in
network traffic data, but they can be computationally expensive and require large amounts
of training data [4].
Ensemble Techniques: Ensemble techniques combine multiple ML models to improve the
overall performance and reduce the risk of false positives and false negatives. They have
been used for SYN flood detection by combining different ML techniques, such as SVM
and Random Forest, to enhance the accuracy of the classification [4].
In general, ML techniques have shown promising results in detecting and preventing SYN
flood attacks, but they also have limitations such as high computational requirements, the
need for large amounts of training data, and susceptibility to adversarial attacks. Therefore,
a careful evaluation and selection of the appropriate ML technique is essential for
developing an effective and efficient SYN flood detection system [4][5].
Random Forest Algorithm
Random Forest is a popular ensemble learning algorithm used for classification, regression,
and other machine learning tasks. It is a combination of decision trees where each decision
tree in the forest is trained on a random subset of the data and a random subset of the
features.
The algorithm works by first creating a set of decision trees, each of which is trained on a
random subset of the training data. During training, at each node of the decision tree, a
random subset of features is selected, and the feature that provides the best split is used to
partition the data. This process continues recursively until a stopping criterion is reached,
such as a maximum tree depth or minimum number of samples required to split a node.
To make a prediction, each decision tree in the forest independently classifies the input
data, and the final prediction is determined by aggregating the results of all the decision
trees. In classification tasks, the most commonly occurring class among the decision trees
is chosen as the predicted class, while in regression tasks, the average of the output values
of the decision trees is used.
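The aggregation step described above can be sketched in a few lines. This is an illustration of plain majority voting and averaging, with hypothetical function names; note that some libraries aggregate differently (scikit-learn's Random Forest classifier, for example, averages the trees' predicted class probabilities rather than counting hard votes).

```python
from collections import Counter
from statistics import mean


def forest_classify(tree_votes):
    """Return the class predicted by the largest number of trees.

    Ties are resolved in favour of the class that was seen first.
    """
    return Counter(tree_votes).most_common(1)[0][0]


def forest_regress(tree_outputs):
    """Return the average of the trees' numeric outputs."""
    return mean(tree_outputs)
```

For example, if three trees vote "attack", "normal", and "attack", the forest predicts "attack".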
Random Forest has several advantages over other machine learning algorithms. It is
relatively easy to use, can handle a large number of input features, is resistant to overfitting,
and can handle missing data. It is also capable of identifying the most important features
for the classification task, which can be used to interpret the results and gain insights into
the data.
Random Forest has been used in various applications, including image and speech
recognition, credit scoring, and fraud detection. In the context of SYN flood detection,
Random Forest has been used to classify network traffic as either normal or attack traffic
and has shown promising results in reducing false positives and false negatives.
Snort
Snort is an open-source tool for network traffic analysis and intrusion detection, originally
developed by Marty Roesch. It is widely used as an intrusion detection system (IDS) and can
also serve as a basis for building custom network security tooling alongside existing tools.
Snort provides a flexible and extensible, rule-based architecture for analysing network
traffic. Snort 3 additionally uses Lua, a lightweight scripting language, for its
configuration. Depending on the rules and the deployment mode, Snort can perform various
actions, such as logging, alerting, and blocking traffic.
One of the main advantages of Snort is its ability to work together with third-party
tools, such as packet filtering firewalls and other network security tools. Traffic can be
analysed and filtered based on various criteria, such as protocol, source IP address, and
destination port.
In the context of SYN flood detection, Snort can be used to analyse network traffic and
detect SYN flood attacks by monitoring the rate of incoming SYN packets and comparing
it to a threshold. When the threshold is exceeded, Snort can trigger an alert or block the
traffic using a packet filtering firewall.
Snort has been used in various applications, including network traffic analysis, intrusion
detection, and network forensics. It provides a powerful and flexible framework for
building custom network security tools and can be a valuable tool in a security
professional's arsenal.
Packet Filtering Firewall
A packet filtering firewall is a type of network security device that operates at the network
layer (Layer 3) of the OSI model. It examines each packet that passes through it and
determines whether to forward or discard the packet based on a set of predefined rules.
The firewall rules specify which types of traffic are allowed or denied based on criteria
such as the source and destination IP addresses, source and destination port numbers, and
protocol type. For example, a rule may allow incoming HTTP traffic (TCP port 80) from a
specific IP address range, while blocking all other incoming traffic.
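The rule evaluation described above can be sketched as a first-match filter over packet headers. This is a minimal illustration, not the firewall used in this work: the `Rule` fields, the rule set, and the address ranges (taken from the documentation-only address blocks) are assumptions.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    action: str    # "allow" or "deny"
    src_net: str   # source network in CIDR notation
    protocol: str  # "tcp", "udp" or "any"
    dst_port: int  # destination port, or 0 for any port


def filter_packet(rules, src_ip, protocol, dst_port, default="deny"):
    """Return the action of the first rule that matches the packet headers."""
    for rule in rules:
        if ip_address(src_ip) not in ip_network(rule.src_net):
            continue
        if rule.protocol not in ("any", protocol):
            continue
        if rule.dst_port not in (0, dst_port):
            continue
        return rule.action
    return default


# Hypothetical rule set: allow HTTP from one address range, block everything else.
RULES = [
    Rule("allow", "203.0.113.0/24", "tcp", 80),
    Rule("deny", "0.0.0.0/0", "any", 0),
]
```

Only the headers are inspected, so the decision for each packet is independent of its payload and of any earlier packet.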
Packet filtering firewalls are relatively simple and efficient, as they examine each packet
in isolation and make a decision based on its headers without examining the packet payload.
They can be implemented as hardware or software solutions and are commonly used in
corporate networks, home networks, and other settings where network security is a concern.
In the context of SYN flood detection, a packet filtering firewall can be used to block traffic
from IP addresses that are identified as sources of SYN flood attacks. The firewall can be
configured to drop or reject packets from these IP addresses, preventing them from reaching
the target system and potentially mitigating the effects of the attack.
Packet filtering firewalls have some limitations, however. They are vulnerable to attacks
such as IP spoofing, in which an attacker forges the source IP address of their packets to
bypass the firewall's rules. They also do not provide protection against attacks that exploit
vulnerabilities in the application layer of the OSI model, such as SQL injection or cross-
site scripting attacks.
CHAPTER 3 – DATA COLLECTION AND PRE-PROCESSING
Dataset
The CICDDoS2019 dataset is a publicly available dataset that can be used for training and
evaluating machine learning models for DDoS attack detection. The dataset contains a large
number of network traffic flows, including both normal traffic and traffic generated by
different types of DDoS attacks.
To collect the data, the dataset's creators used a testbed network that included a range of different
types of servers and devices, including web servers, DNS servers, and IoT devices. The
network was then subjected to a variety of different DDoS attacks, including UDP flood
attacks, TCP SYN flood attacks, and HTTP flood attacks, among others.
The dataset contains a total of 86 features for each network flow, including source and
destination IP addresses, source and destination ports, protocol type, flow duration, packet
and byte counts, and more. The data is available in CSV format, which is a widely used
format for tabular data analysis.
We split the dataset into training and testing parts: the training dataset contains 10848
rows and the testing dataset contains 10002 rows. For training and testing the machine
learning algorithm, we use only 10 features of these datasets. These are discussed in more
detail in the following sections.
Data Pre-Processing
Before using the dataset for machine learning, some pre-processing steps may be required
to clean and normalize the data. For example, missing values may need to be filled in, and
features may need to be transformed or scaled to ensure that they are suitable for use with
the chosen machine learning algorithm.
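The cleaning and scaling steps mentioned above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names, the use of the median for filling, and min-max scaling are assumptions made for the example and are not necessarily the steps applied to the dataset in this work.

```python
from statistics import median


def fill_missing(column):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in column if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in column]


def min_max_scale(column):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]
```

Tree-based models such as Random Forest split on thresholds and are therefore largely insensitive to this kind of rescaling, so the scaling step matters mainly when other algorithms are compared on the same data.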
Once the data has been pre-processed, it can be split into training and testing sets, with the
training set used to train the machine learning model, and the testing set used to evaluate
its performance. Various performance metrics can be used to evaluate the model, including
accuracy, precision, recall, F1 score, and AUC-ROC curve.
Overall, the CICDDoS2019 dataset is a valuable resource for researchers and practitioners
working on DDoS attack detection, and it provides a realistic and representative set of
network traffic flows for training and evaluating machine learning models.
Feature Extraction
Feature extraction is a critical step in the machine learning process for network traffic
analysis and intrusion detection. It involves selecting and extracting relevant features from
the network traffic data that can be used to train a machine learning model to detect various
types of attacks, including SYN flood attacks.
In the case of SYN flood detection, some of the important features that can be extracted
from the network traffic data include:
Number of SYN packets: The number of SYN packets in the network flow can be an
important indicator of a potential SYN flood attack. A high number of SYN packets in a
short period of time could indicate a SYN flood attack.
Number of RST packets: The number of RST packets in the network flow can also be
important for detecting SYN flood attacks. In a SYN flood attack, the attacker may not
respond to the SYN-ACK packets sent by the server, resulting in a high number of RST
packets being sent by the server.
Duration of the flow: The duration of the network flow can also be an important feature for
detecting SYN flood attacks. In a SYN flood attack, the network flow may be shorter than
normal due to the attacker rapidly sending SYN packets.
Size of packets: The size of the packets in the network flow can also be a useful feature for
detecting SYN flood attacks. In a SYN flood attack, the size of the packets may be smaller
than normal, as the attacker is simply sending a flood of SYN packets.
Number of unique source IPs: The number of unique source IPs in the network flow can
also be an important feature for detecting SYN flood attacks. In a SYN flood attack, the
attacker may use a large number of different source IPs to avoid detection.
There are various techniques that can be used for feature extraction, including statistical
analysis, frequency analysis, and time-series analysis, among others. The choice of
technique will depend on the specific characteristics of the network traffic data and the
requirements of the machine learning algorithm being used for detection.
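The features listed above can be computed from a window of captured packets as in the following sketch. The packet record layout (the keys `time`, `src_ip`, `size`, and `flags`) and the feature names are hypothetical; in practice a flow exporter such as CICFlowMeter produces its own, much larger, set of columns.

```python
def extract_window_features(packets):
    """Summarise a non-empty list of packet records into simple features.

    Each packet is a dict with the assumed keys
    "time", "src_ip", "size" and "flags" (a set such as {"SYN"}).
    """
    times = [p["time"] for p in packets]
    sizes = [p["size"] for p in packets]
    return {
        # Packets with the SYN flag set (this also counts SYN-ACK packets).
        "syn_count": sum(1 for p in packets if "SYN" in p["flags"]),
        "rst_count": sum(1 for p in packets if "RST" in p["flags"]),
        "duration": max(times) - min(times),
        "mean_packet_size": sum(sizes) / len(sizes),
        "unique_src_ips": len({p["src_ip"] for p in packets}),
    }
```

Each resulting dictionary corresponds to one row of the feature table that is passed to the classifier.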
CHAPTER 4 – RANDOM FOREST ALGORITHM
Tools and Technologies
The following are some of the tools and technologies that can be used to build a SYN flood
detection system with machine learning using the Random Forest algorithm.
Python Programming Language
Python is a popular programming language for machine learning and data analysis. It provides a
wide range of libraries and tools for data processing, modelling, and evaluation, making it a useful
choice for building a SYN flood detection system. The Python version used for our implementation
is 3.11.0.
Scikit-learn Python Library
Scikit-learn is a popular open-source machine learning library for Python that provides a
wide range of algorithms and tools for data pre-processing, model selection, and evaluation.
It supports the implementation of the Random Forest algorithm for classification tasks.
CICFlowMeter
CICFlowMeter is an open-source network flow analyser tool that can be used for network traffic
analysis and monitoring. It is designed to process high-speed network traffic in real-time and
provides detailed insights into the flow characteristics of the network traffic. CICFlowMeter can be
used to analyse different types of network traffic, including TCP, UDP, and ICMP, and can be used
for different types of analysis, such as intrusion detection, network performance analysis, and traffic
classification.
Random Forest Algorithm
Random Forest is a machine learning algorithm used for classification and regression tasks.
It is an ensemble learning method that combines multiple decision trees to improve the
accuracy and generalization of the model.
In a random forest, multiple decision trees are trained on different random subsets of the
training data. Each tree produces a prediction, and the final prediction is determined by
combining the predictions of all the trees. This helps to reduce overfitting and improve the
model's ability to generalize to new data.
Random Forest is particularly useful for detecting complex patterns in data and can handle
a large number of features without overfitting. It is also robust to noise and missing values,
making it a popular choice for real-world applications.
In the context of SYN flood detection, Random Forest can be trained on a set of features
extracted from network traffic data to distinguish between normal traffic and SYN flood
attacks. The algorithm can be trained on a labelled dataset of network traffic, where each
flow is labelled as normal or attack traffic.
During training, the algorithm builds multiple decision trees based on different subsets of
the training data, and each tree is trained to classify network traffic flows as normal or
attack traffic. The final prediction is then determined by combining the predictions of all
the trees in the forest.
Random Forest can be evaluated using various performance metrics, such as accuracy,
precision, recall, F1 score, and AUC-ROC curve, to determine its effectiveness in detecting
SYN flood attacks. The algorithm can also be fine-tuned by adjusting hyperparameters
such as the number of decision trees in the forest, the maximum depth of each tree, and the
minimum number of samples required to split a node.
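The training and tuning steps described above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the configuration used in this work: the data is synthetic (two made-up features, SYN count and flow duration), and the hyperparameter values are arbitrary examples.

```python
import random

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def make_synthetic_flows(n_per_class, seed=0):
    """Generate toy [syn_count, duration] rows; label 1 = attack, 0 = normal."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n_per_class):
        rows.append([rng.randint(1, 3), rng.uniform(1.0, 60.0)])    # normal flow
        labels.append(0)
        rows.append([rng.randint(50, 500), rng.uniform(0.0, 1.0)])  # attack flow
        labels.append(1)
    return rows, labels


X, y = make_synthetic_flows(200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

model = RandomForestClassifier(
    n_estimators=100,     # number of decision trees in the forest
    max_depth=5,          # maximum depth of each tree
    min_samples_split=2,  # minimum number of samples required to split a node
    random_state=0,
)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
```

Because the synthetic classes are cleanly separated, the accuracy obtained here says nothing about performance on real traffic; the sketch only shows where the three hyperparameters named above enter the training call.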
CHAPTER 5 – SNORT INTEGRATION AND PACKET FILTERING
FIREWALL IMPLEMENTATION
Tools and Technologies
The following are some of the tools and technologies that can be used to build SYN flood
detection and prevention with the Snort framework and a packet filtering firewall.
Linux Operating System
Linux is a popular choice for hosting network servers and provides a wide range of tools
and utilities for network monitoring and analysis. It is also widely used for building and
deploying machine learning models in production environments.
Snort Framework
Snort is an open-source framework that can be used for real-time network traffic analysis,
packet logging and intrusion detection. It provides a flexible and extensible architecture for
implementing various detection algorithms and can be used for both online and offline
analysis of network traffic.
Snort is designed to detect and alert system administrators of various types of network-based
attacks, including SYN flood attacks. Snort uses a rule-based detection mechanism, where a set of
rules are defined to identify and classify network traffic based on their characteristics. These rules
can be customized to suit the specific needs of the user and can be updated as new attack patterns
are discovered.
In the case of SYN flood detection, Snort can be configured to detect and alert administrators of
potential SYN flood attacks by monitoring network traffic for specific characteristics, such as a
high number of SYN packets in a short period of time.
Snort can be run in various modes, including packet sniffer mode, inline mode, and network
intrusion detection mode. In packet sniffer mode, Snort captures and logs packets without affecting
the flow of traffic. In inline mode, Snort is placed in between the network traffic and the target
system, allowing it to drop or modify packets as needed. In network intrusion detection mode, Snort
is configured to monitor network traffic and send alerts to system administrators when a potential
attack is detected.
Snort can be used in combination with other security tools, such as packet filtering firewalls, to
provide a layered defence against network-based attacks. By integrating Snort with a packet
filtering firewall, the firewall can be configured to block traffic that matches a specific set of Snort
rules, providing an additional layer of protection against SYN flood attacks.
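One possible way to connect the two tools is to turn alerts into firewall rules, as in the sketch below. Several assumptions are made: the alert lines are assumed to follow a layout similar to Snort's fast alert output (ending in `{TCP} source -> destination`), the rule message is assumed to contain the text "SYN flood", and `iptables` is assumed as the packet filter. The commands are only built as strings and are not executed.

```python
import re

# Matches the "{TCP} src_ip:src_port -> dst_ip:dst_port" part of an alert line.
ALERT_ENDPOINTS = re.compile(
    r"\{TCP\}\s+(\d{1,3}(?:\.\d{1,3}){3}):\d+\s+->\s+(\d{1,3}(?:\.\d{1,3}){3}):\d+"
)


def blocked_sources(alert_lines, keyword="SYN flood"):
    """Collect unique source IPs from alert lines whose text contains the keyword."""
    sources = []
    for line in alert_lines:
        match = ALERT_ENDPOINTS.search(line)
        if keyword in line and match and match.group(1) not in sources:
            sources.append(match.group(1))
    return sources


def iptables_drop_commands(sources):
    """Build (but do not execute) one iptables DROP command per source IP."""
    return ["iptables -A INPUT -s %s -j DROP" % ip for ip in sources]
```

Blocking by source address is only useful when the attacker's addresses are genuine; against spoofed sources, limiting the SYN rate is usually the more robust choice.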
Packet Filtering Firewall
A packet filtering firewall is a type of firewall that operates mainly at the network layer of
the OSI model, while also inspecting transport-layer header fields such as port numbers. It
filters incoming and outgoing network traffic based on a set of predefined rules. These rules
can be configured to allow or deny traffic based on criteria such as the source and
destination IP addresses, ports, and protocols.
In the context of SYN flood detection, a packet filtering firewall can be used to limit the
rate of incoming SYN packets to prevent a SYN flood attack. This can be achieved by
setting a threshold for the number of SYN packets that can be received within a specified
time period. If the rate of incoming SYN packets exceeds this threshold, the firewall can
be configured to drop or block the packets, preventing the target system from becoming
overwhelmed.
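One common way to enforce such a rate threshold is a token bucket, sketched below in Python. It is offered as an illustration of the rate-limiting idea under stated assumptions, not as the firewall configuration used in this work; the class name `TokenBucket` and the rate and burst values are hypothetical.

```python
class TokenBucket:
    """Accepts about `rate` SYN packets per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum number of stored tokens
        self.tokens = float(capacity)
        self.last_time = None

    def allow(self, now):
        """Return True to accept the SYN arriving at time `now`, False to drop it."""
        if self.last_time is not None:
            # Refill in proportion to the elapsed time, capped at the capacity.
            elapsed = now - self.last_time
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_time = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Hypothetical limit: 2 SYNs per second with a burst of 2.
bucket = TokenBucket(rate=2, capacity=2)
decisions = [bucket.allow(t) for t in (0.0, 0.0, 0.0, 0.5, 0.5)]
print(decisions)  # [True, True, False, True, False]
```

Compared with a fixed counter that resets every second, the bucket tolerates short legitimate bursts while still bounding the sustained SYN rate.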
When used in combination with other security measures such as intrusion detection systems
and machine learning algorithms, packet filtering firewalls can provide an additional layer
of defence against SYN flood attacks and other types of network-based attacks.
These tools and technologies can be combined to build a comprehensive SYN flood
detection system that can detect and mitigate SYN flood attacks in real-time.
CHAPTER 6 – EVALUATION AND RESULTS
Performance Evaluation
The performance of the proposed SYN flood detection system, which combines a Random Forest
classifier with Snort and a packet filtering firewall, can be evaluated using several metrics.
One common metric is accuracy, the percentage of correctly classified samples among all
samples. Precision is the percentage of true positive samples among all samples predicted as
positive. Recall is the percentage of true positive samples among all actual positive samples.
The F1-score is the harmonic mean of precision and recall and gives a single balanced measure
of the two.
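These four metrics follow directly from the counts of a confusion matrix. The sketch below shows the arithmetic in Python; the function name and the example counts are hypothetical and are not results from this work.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: 90 attacks caught, 20 missed, 10 false alarms, 880 true negatives.
metrics = classification_metrics(tp=90, fp=10, tn=880, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these example counts the accuracy is 0.970 even though recall is only about 0.818, which shows why accuracy alone can be misleading when attack samples are much rarer than normal samples.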
Other metrics that can be used to evaluate the system include the true positive rate (TPR),
the false positive rate (FPR), the receiver operating characteristic (ROC) curve, and the area
under that curve (AUC). TPR is the percentage of actual positive samples that are correctly
classified as positive (it is the same quantity as recall), while FPR is the percentage of
actual negative samples that are incorrectly classified as positive. The ROC curve plots TPR
against FPR as the decision threshold is varied, and AUC summarizes the curve in a single
value: 1.0 corresponds to perfect separation of the two classes and 0.5 to random guessing.
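The relationship between the ROC curve and AUC can be made concrete with a short Python sketch that sweeps the decision threshold over classifier scores and integrates the resulting curve with the trapezoidal rule. The function names, labels, and scores are illustrative assumptions, and the code expects at least one sample of each class.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by lowering the decision threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives  # both classes must be present
    # Rank samples from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical labels (1 = attack, 0 = normal) and classifier scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, scores)
area = auc(curve)
print(curve)
print(round(area, 4))
```

For this toy data, eight of the nine attack/normal pairs are ranked correctly, so the area is 8/9.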
To evaluate the system, a labelled dataset containing both normal traffic and known SYN flood
attacks can be used. The dataset should be large enough to provide a representative sample
of different types of SYN flood attacks. The system can be trained on a subset of the
dataset, and the remaining samples can be used to test its performance.
Cross-validation techniques, such as k-fold cross-validation, can also be used to validate
the performance of the system. In k-fold cross-validation, the dataset is divided into k
subsets, and the system is trained on k-1 subsets and tested on the remaining subset. This
process is repeated k times, with each subset serving as the test set once.
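The splitting procedure described above can be sketched in plain Python as follows. The function name is hypothetical, and the sketch deliberately omits shuffling and class stratification, which are usually applied in practice before splitting.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Hypothetical dataset of 10 samples split into 3 folds.
folds = list(k_fold_indices(n_samples=10, k=3))
for train, test in folds:
    print("train:", train, "test:", test)
```

Each sample appears in exactly one test fold, so averaging the per-fold scores uses every sample for testing once and for training k-1 times.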
Overall, the evaluation of the system should provide insights into its accuracy, precision,
recall, and other performance metrics, as well as its ability to detect different types of SYN
flood attacks.
Comparison with Existing Techniques
To compare the proposed SYN flood detection system, which combines a Random Forest classifier
with Snort and a packet filtering firewall, with existing techniques, we can evaluate its
performance against traditional rule-based approaches and other machine learning algorithms.
Rule-based approaches typically rely on predefined rules to detect SYN Flood attacks.
These rules are based on threshold values for various network parameters, such as the
number of incoming SYN packets per second, the ratio of SYN to ACK packets, and the
number of half-open connections. While these approaches can be effective for simple SYN
flood attacks, they may not be able to detect more sophisticated attacks.
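A threshold-based detector of this kind can be sketched in a few lines of Python. The three parameters are the ones named above, but the `WindowStats` structure, the function name, and every threshold value are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Traffic counters collected over one observation window (hypothetical)."""
    syn_count: int
    ack_count: int
    half_open: int
    duration_seconds: float


def rule_based_alert(stats, max_syn_rate=500.0, max_syn_ack_ratio=3.0,
                     max_half_open=1000):
    """Return the names of the threshold rules violated in this window."""
    syn_rate = stats.syn_count / stats.duration_seconds
    # Guard against division by zero when no ACKs were seen.
    syn_ack_ratio = stats.syn_count / max(stats.ack_count, 1)
    reasons = []
    if syn_rate > max_syn_rate:
        reasons.append("syn_rate")
    if syn_ack_ratio > max_syn_ack_ratio:
        reasons.append("syn_ack_ratio")
    if stats.half_open > max_half_open:
        reasons.append("half_open")
    return reasons


# Hypothetical one-second windows of normal and attack traffic.
normal = WindowStats(syn_count=120, ack_count=115, half_open=8, duration_seconds=1.0)
flood = WindowStats(syn_count=5000, ack_count=40, half_open=4200, duration_seconds=1.0)
print(rule_based_alert(normal))  # []
print(rule_based_alert(flood))   # ['syn_rate', 'syn_ack_ratio', 'half_open']
```

Because the thresholds are fixed, an attacker who stays just below each limit goes unnoticed, which is the weakness that motivates learning the decision boundary from data instead.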
Machine learning algorithms, on the other hand, can learn patterns and behaviours from
large datasets and use this knowledge to detect SYN flood attacks. Existing studies have
shown that machine learning algorithms such as decision trees, neural networks, and
support vector machines can be effective for SYN flood detection. However, these
algorithms may suffer from overfitting and may not be able to generalize well to new and
unseen data.
The proposed system, which uses the random forest algorithm with Snort and a packet filtering
firewall, combines the advantages of machine learning with the real-time detection
capabilities of Snort and the filtering capabilities of the packet filtering firewall. This
approach can provide a robust and scalable solution for detecting SYN flood attacks with
high accuracy and low false positive rates.
To compare the proposed system with existing techniques, we can evaluate its performance
using the same dataset and evaluation metrics as used in previous studies. We can also
compare the system's performance with that of other machine learning algorithms, such as
decision trees and support vector machines. Additionally, we can compare the proposed
system's computational overhead with that of other approaches, to assess its scalability and
feasibility for deployment in real-world scenarios.
CHAPTER 7 – LIMITATION AND FUTURE WORK
The proposed SYN flood detection system, which combines a Random Forest classifier with Snort
and a packet filtering firewall, has several limitations that can be addressed in future work.
One limitation is the reliance on a single dataset, in this case the CICDDoS2019 dataset,
which may not be representative of all types of SYN flood attacks. Future work can include
the collection and analysis of additional datasets to ensure the system's effectiveness in
detecting a wide range of SYN flood attacks.
Another limitation is the reliance on specific network parameters, such as the number of
incoming SYN packets per second and the ratio of SYN to ACK packets, to detect SYN
flood attacks. These parameters may not be optimal for all network environments and may
require tuning for different network configurations. Future work can explore the use of
more dynamic and adaptive approaches to detect SYN flood attacks, such as deep learning
techniques that can automatically learn and adjust to changing network conditions.
Furthermore, the proposed system treats every burst of SYN packets that exceeds its thresholds
as malicious traffic that should be blocked. However, benign traffic can sometimes resemble a
SYN flood, for example during authorized network scans or legitimate traffic bursts. Future
work can investigate more sophisticated techniques to differentiate between malicious floods
and such benign SYN surges and adjust the system's response accordingly.
Lastly, the proposed system only focuses on SYN flood attacks and may not be effective
in detecting other types of DDoS attacks. Future work can explore the integration of the
proposed system with other techniques and algorithms to provide a more comprehensive
and effective solution for detecting various types of DDoS attacks.
In summary, while the proposed system provides a promising approach for detecting SYN
flood attacks, there is still room for improvement and future work can address these
limitations and further enhance the system's effectiveness and applicability.
There are several future directions that can be explored to further enhance the proposed
SYN Flood detection system and expand its applicability to other types of DDoS attacks.
Development of more sophisticated machine learning algorithms: While the proposed
system uses the random forest algorithm, other machine learning algorithms, such as neural
networks, support vector machines, and deep learning techniques, can be explored to improve
detection accuracy and reduce false positive rates.
Integration of multiple detection techniques: The proposed system can be integrated with
other detection techniques, such as flow-based analysis and anomaly detection, to provide
a more comprehensive and effective solution for detecting various types of DDoS attacks.
Real-time adaptive detection: The proposed system can be enhanced by developing real-
time adaptive detection capabilities that can automatically adjust to changing network
conditions and detect new and evolving SYN flood attacks.
Evaluation on larger datasets: The proposed system can be evaluated on larger and more
diverse datasets to ensure its effectiveness in detecting a wide range of SYN flood attacks.
Development of response mechanisms: The proposed system can be extended to include
response mechanisms, such as traffic throttling, blacklisting, or redirection, to mitigate the
impact of SYN flood attacks on network operations.
Integration with other security systems: The proposed system can be integrated with other
security systems, such as intrusion detection and prevention systems, to provide a more
holistic and coordinated approach to network security.
Evaluation on real-world scenarios: The proposed system can be evaluated on real-world
scenarios to assess its effectiveness, scalability, and practicality in different network
environments and configurations.
Overall, there are several future directions that can be explored to enhance the proposed
SYN Flood detection system's effectiveness and applicability and expand its scope to other
types of DDoS attacks.
CHAPTER 8 – CONCLUSION
In conclusion, SYN flood attacks are a significant threat to network security and can cause
serious disruptions to network operations. Traditional rule-based approaches may not be
able to detect more sophisticated SYN flood attacks, while machine learning algorithms
can suffer from overfitting and may not generalize well to new and unseen data.
The proposed SYN flood detection system, which combines a Random Forest classifier with Snort
and a packet filtering firewall, provides a robust and scalable solution for detecting SYN
flood attacks with high accuracy and low false positive rates. It brings together the
advantages of machine learning, the real-time detection capabilities of Snort, and the
filtering capabilities of the packet filtering firewall.
The system was evaluated using the CICDDoS2019 dataset and achieved high accuracy
and low false positive rates. The proposed system's computational overhead was also
evaluated, and the system was found to be feasible for deployment in real-world scenarios.
However, there are limitations to the proposed system, such as the reliance on specific
datasets and network parameters, and the need for more sophisticated techniques to
differentiate between malicious and legitimate SYN floods. Future work can address these
limitations and further enhance the system's effectiveness and applicability.
Overall, the proposed system provides a promising approach for detecting SYN flood
attacks and can serve as a foundation for further research in the area of DDoS attack
detection and mitigation.
References
[1] M. S. A. and Y. X. , “Anomaly detection techniques for DDoS mitigation: A review,” Journal
of Network and Computer Applications, vol. 97, pp. 40-57, 2017.
[2] S. B. S. J. and M. S. , “DDoS detection techniques: A review,” Computer Communications,
vol. 84, pp. 31-44, 2016.
[3] S. M. K. M. Z. H. B. and A. A. , “A machine learning-based approach for detecting DDoS
attacks in software defined networking,” Future Generation Computer Systems, vol. 75, pp.
246-259, 2017.
[4] K. K. K. H. and D. K. , “Deep learning-based DDoS attack detection using traffic matrix,”
Future Generation Computer Systems, vol. 113, pp. 667-677, 2021.
[5] M. S. M. Y. X. and J. Z. , “A machine learning approach to detect SYN flooding attacks,” IEEE
Transactions on Dependable and Secure Computing, vol. 14, pp. 618-630, 2017.
[6] N. M. and J. S. , “The evaluation of network anomaly detection systems: Statistical analysis
of the UNSW-NB15 data set and the comparison with the KDD99 data set,” Information
Security Journal: A Global Perspective, vol. 25, pp. 18-31, 2016.
[7] “Snort - The Open Source Network Intrusion Detection System,” 2023. [Online]. Available:
https://www.snort.org/.
[8] “Netfilter.org - The Netfilter Project,” [Online]. Available: https://www.netfilter.org/.
[Accessed 18 03 2023].
[9] R. W. Y. “Packet filtering: A survey of current techniques and future directions,” IEEE
Communications Magazine, vol. 29, pp. 92-98, 1991.
[10] W. R. S. “Firewalls and Internet security: Repelling the wily hacker,” Addison-Wesley, 1994.
[11] B. C. and S. B. , “Firewalls and Internet security: Repelling the wily hacker,”
Addison-Wesley, 2003.
