NCCS Technical Report v2
DR NAJAM UL ISLAM
CRC LAB, BUIC – A Partner Lab of National Center for Cyber Security, Pakistan 2
List of Abbreviations and Acronyms
PI Principal Investigator
Co-PI Co-Investigator
Table of Contents
PART A - PROJECT REPORT
1. PROJECTS DETAILS
2.1.4.2. Implementation Details
2.1.4.3. Results (Actual and Perceived)
2.1.4.4. Analysis and Discussion
2.1.4.5. Testing
2.1.4.6. Additional Features Activities added
2.2. Milestone/Deliverable 2:
2.3. Milestone/Deliverable 3:
4.1.4. Implementation, Analysis, theoretical and/or analytical models and results
4.2. Milestone/Deliverable 2:
4.3. Milestone/Deliverable 3:
5. OUTCOMES
Structure of the Periodic Report
The periodic technical report must be submitted by the project PIs within two weeks following the end of each
reporting period.
Part A of the periodic technical report contains the cover page, publishable summary and the answers to the
questionnaire covering issues related to the project implementations and its impact in the context of key
performance indicators and the milestones/deliverables committed in the NCCS PC-1 Document.
Part B of the periodic technical report is the narrative part that includes explanations of the work carried out by the
beneficiaries during the reporting period. Note: Part B should not exceed ten (10) pages (excluding the title page,
TOC, list of figures, bibliography, and appendices).
Part A and Part B both need to be submitted as a PDF document following the template provided in this
document.
PART A - PROJECT REPORT
1. PROJECTS DETAILS
P1 7th – 12th months
P1 13th – 18th months
P1 19th – 24th months
P1 25th – 30th months
P1 31st – 36th months
P1 37th – 42nd months
1.1.1. Input Parameters of the Lab
Columns: Ser.; Input (Equipment Budget, HR Budget)
1.1.3. Detail of HR (complete HR, including PI, Co-PI and everyone else)
Columns:
Ser.
Personnel Detail with start and end date of employment
Permanent Employment Status (Organization and appointment)
Qualification (PhD, MS, BS, Certification) (students give details of degree with start and end dates)
Project for which employed
Working Detail (Total hours / week and timings for each day of week)*
Remuneration Paid per month
Details of Tasks Assigned
Contribution towards overall objective of the Lab
Sub-columns: Working Schedule; Working Hour; Task Assigned; Due Date; Status
Columns: Technical Problem; Problem's likely cause(s); Perceived Impact on the project; Solution; Resources Required (Equipment, Trained HR, Other); Resources Released (Equipment, HR, Other); Impact mitigation plan (if any)
* Permanent employed and students (not employed) to provide written consent for their employability at lab with hours / week
and timings by the employer and educational institution respectively
Columns: Ser.; Output; Target as per KPI; Outcome; Impact
Journal Publications: Linkage to the project, deliverables, products, PC-1 objectives
Conference Publications: Networking with top tier researchers & Cyber Security experts, in addition to above
Non PSDP Research Fund from External Sources: 15 million (after 3 years completion); Self-Sustainability
Industrial Project Funding: 7.5 million (after 3 years completion); Self-Sustainability & Commercial Viability
Startups: 1; Solution to local problems, commercial viability
Trained Professional: 3 trainings per year (10 paid attendees per training); Skilled workforce & Knowledge workers
Indigenous capacity building
Products / deliverables: Sponsored user, utility, user feedback, delivery date, potential users
Columns: Releases; Expenditure on Equipment; Expenditure on HR; Amount with the lab; quarter
1
2
3
PART B – PERIODIC REPORT (TECHNICAL DETAILS)
(The information provided in this section will only be available to NCCS Secretariat, members of the NCCS NSC, and the NCCS Scientific and Industrial
Advisory Board)
Lab Project (s) Titles:
PI Name:
Columns: Domain; Duration; Employment Start/End Date; Milestone; Deliverable [% Completed]; Status; Impact as per PC-1
Co PI name:
Duration: 0-6 Months; 7-12 Months; 12-18 Months; 19-24 Months; 24-30 Months; 30-36 Months; 37-42 Months
Milestones: 1-; 2-; 3-
Co PI name:
Duration: 0-6 Months; 7-12 Months; 12-18 Months; 19-24 Months; 24-30 Months; 30-36 Months; 37-42 Months
Milestones: 1-; 2-; 3-
Co PI name:
Duration: 0-6 Months; 7-12 Months; 12-18 Months; 19-24 Months; 24-30 Months; 30-36 Months; 37-42 Months
Milestones: 1-; 2-; 3-
2. DOMAIN 1: DETECTION ENGINE
Columns: Sr. No.; Project Title/Domain; Prior / Existing Work prior to this quarter (reporting period); Market Requirement; Project Progress Status
2.1. Milestone/Deliverable 1:
During the last decade, attackers have compromised large numbers of victim systems to launch massive Distributed Denial of Service (DDoS) attacks against banking services, corporate websites, e-commerce businesses, and similar targets. Such attacks can cause enormous financial losses and deny service to authorised users. Many solutions have been proposed to counter DDoS attacks, but no ideal solution has been found to date. To validate existing solutions, researchers have mostly relied on simulation-based experiments, but the trend has now shifted to publicly available realistic datasets. In this research study, we therefore provide a comprehensive review of current datasets and propose a novel taxonomy for the classification of DDoS attacks. Further, we generated a new dataset called "CRCDDoS2022", which is designed to overcome the shortcomings of existing datasets. With this dataset, we also provide a new attack family classification and detection approach based on a set of network flow features. Lastly, we gave the most important sets of features for detecting different
Cyber-attacks are currently a critical problem in internet-based interconnected systems, and DDoS attacks have emerged as a significant threat to Internet services [1]. In a DDoS attack, the attacker produces a huge amount of traffic and exhausts the resources of victim systems. The attack is normally started by one attacker who exploits and takes control of multiple devices called "zombies". The owners of these zombies do not know that their devices are compromised and being used for an attack. Normally, the attacker conducts a sweep to identify devices eligible to become zombies, such as a device with an open port. After that, the attacker uses the zombie devices to launch an attack. Detecting such attacks is difficult because the number of zombie devices can reach hundreds, thousands, or even millions [2]. Different techniques have been presented for the prevention of DDoS attacks, but they remain a significant threat to network security. Existing solutions use anomaly-based and signature-based techniques for intrusion detection [3,4]. Signature-based Intrusion Detection Systems (IDS) and Intrusion Prevention Systems (IPS) play the most active roles in defending against cyberattacks but are mostly ineffective against zero-day and DDoS attacks. Current research shows that anomaly-based detection approaches are effective in intrusion detection, and they have received considerable attention from the research community in recent years. While a signature-based system is easy to implement, it is limited to known signatures. Anomaly-based approaches use Machine Learning (ML) or Deep Learning (DL), both subfields of Artificial Intelligence (AI), to distinguish between benign and abnormal traffic. Telecom vendors are now focusing more on anomaly-based IDS solutions because of increased computing power and the effectiveness of anomaly approaches in identifying cyber-attacks. Palo Alto Networks released an anomaly-based IDS system in June 2020, reported as the first of its kind [5]. However, the performance of an anomaly-based detection approach depends on useful datasets for training. The higher the accuracy of the learned model, the more reliably various network attacks can be detected.
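The contrast between the two detection styles can be sketched in a few lines of Python. This is a minimal illustration only: the signature list, the packet-rate baseline, and the z-score threshold of 3 are all assumptions made for the example, not values taken from this report or from any IDS product.

```python
import statistics

# Hypothetical signature list: byte patterns of already-known attacks.
KNOWN_SIGNATURES = [b"GET /admin.php?cmd=", b"\x90\x90\x90\x90"]


def signature_match(payload: bytes) -> bool:
    """Signature-based check: flags only payloads containing a known pattern."""
    return any(sig in payload for sig in KNOWN_SIGNATURES)


def is_anomalous(rate: float, baseline: list[float], threshold: float = 3.0) -> bool:
    """Anomaly-based check: flags a packet rate far from the benign baseline."""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    if stdev == 0:
        return rate != mean
    # z-score: how many standard deviations the observation is from the mean.
    return abs(rate - mean) / stdev > threshold


# Made-up benign packets-per-second samples used as the training baseline.
benign_rates = [100.0, 110.0, 95.0, 105.0, 98.0, 102.0]
```

The signature check cannot flag traffic it has no pattern for, whereas the anomaly check depends entirely on how representative the baseline data is, which is why dataset quality matters for the second approach.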
The DARPA dataset [1], proposed in 2000, consists of three datasets: a DDoS attack run by a novice attacker, a DDoS attack run by a stealthy attacker (LLDOS 2.0.2), and the Windows NT attack dataset. Researchers extract features that serve as flags in DDoS attack detection by studying application layer attack tools. A flow correlation coefficient is defined, which helps in detecting flash mob DDoS attacks. The principle behind the flow correlation coefficient is that the flow standard deviation of an attack is lower than that of legitimate traffic. CAIDA [2] is a DDoS attack dataset proposed in 2007 that contains traffic traces. Details of the attack and its response are also available in pcap format. Traffic traces are anonymized by removing payloads from the packets. Such traces are exceedingly difficult to gather because of IP spoofing, the stateless nature of IP routing, and similar factors.
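The idea behind the flow correlation coefficient and the flow standard deviation mentioned above can be illustrated as follows. The exact definition used in the cited work is not given in this report, so the sketch assumes a plain Pearson correlation between two per-second packet-count series; the sample flows are made up.

```python
import math
import statistics


def flow_correlation(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equally long packet-count series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    norm = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return cov / norm if norm else 0.0


# Hypothetical packets-per-second samples: two bot flows and one human flow.
bot_flow_a = [50, 52, 51, 50, 52, 51]
bot_flow_b = [40, 42, 41, 40, 42, 41]
human_flow = [3, 40, 0, 12, 75, 8]

print(flow_correlation(bot_flow_a, bot_flow_b))   # tool-generated flows move together
print(statistics.pstdev(bot_flow_a), statistics.pstdev(human_flow))
```

In this toy data the two bot flows are almost perfectly correlated and have a much smaller standard deviation than the human flow, which is the property the detection principle relies on.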
The BOUN dataset [3] is intended for evaluating network-based intrusion detection techniques. Traffic is captured by mirroring a campus router, and the recorded traffic is converted to a .csv file using Wireshark. Two attack scenarios are included in this dataset. In both scenarios, randomly generated spoofed destination IP addresses are used for flooding. In the TCP flood attack, port 80 is used as the destination port. A realistic dataset is developed using Spirent Communications' state-of-the-art emulator, CyberFlood CF20, which meets the network topology requirements. [4] CyberFlood CF20 is a user-friendly testing platform that generates realistic attack scenarios for testing IDS performance and scalability. In a DDoS attack, hundreds of zombies attack servers to consume network bandwidth. CF20 is also used to simplify network configuration and create the network topology needed to develop efficient datasets. It also removes the need for other DDoS simulation tools such as LOIC and HOIC, which create only a limited set of attack vectors. A realistic dataset for intrusion detection requires a set of prominent features and an efficient machine learning algorithm for detection. CICIDS2017, proposed in 2017, is a very comprehensive dataset for intrusion detection. [5] Some datasets generate various kinds of attack traffic using a network topology with benign background traffic, and use CICFlowMeter to analyse the generated traffic. Some researchers simulated 25 users' traffic with five different protocols to generate realistic
2.1.3. Methodology
Some taxonomies of DDoS attacks are presented in this section. Mirkovic and Reiher discussed [6] some DDoS attack taxonomies and defence mechanisms. DDoS attacks are categorised by automation, vulnerability, source address validity, attack rate dynamics, victim, and impact on the victim, among other criteria. In the automation-based category, an attacker searches for vulnerabilities on a machine manually, automatically, or with a combination of both. Bhardwaj et al. proposed a taxonomy of DDoS attacks based on cloud computing. [7] DDoS attacks in cloud computing are categorised by degree of automation, attack impact, attack rate, and vulnerability. Another study identifies the same DDoS attacks and additionally considers real-time response, throughput response time, request, and zero-day attacks. [8] Masdari and Jalali performed a detailed analysis of DDoS attacks in cloud computing. They identify the major types of DDoS attacks by identifying the vulnerabilities that lead to a DDoS attack, and then classify those attacks by cloud component, such as virtual machines and hypervisors. The most common types of cloud DDoS attacks are bandwidth attacks, connectivity attacks, resource exhaustion, and physical and data disruption. This research shows that DDoS attacks on the cloud are more severe because more resources are available. [9] Singh et al. concentrated on the HTTP-GET flood DDoS attack. In this research, they categorise high-rate and low-rate attacks
Limitations
Existing datasets have several limitations. They only train on application layer DDoS attacks, and only one tool is used to generate the attack. They do not give accurate results, and real-time implementation is not possible with them. All tools used in these datasets show zero prediction, and most tools return false positives. Detection results are provided only for a single trained model. The available datasets cannot perform efficiently in real-time scenarios, and no mechanism is provided to integrate them with an intrusion detection system for real-time detection and prediction.
We implement three networks, named Attack-network, Client Network, and Victim-Network. The victim network includes an Ubuntu server, a capturing server, a Windows server, a firewall, a router, and a network switch. The attacker network includes bots that generate DDoS attacks using tools such as Zero-day DDoS, Apache and Windows, LOIC, HOIC, Slowloris, DDoSIM, HULK, GoldenEye, BoNeSi, Mirai Botnet, and Tor's Hammer. The attacker network generates traffic that is captured by the capture server placed in the victim network to generate the dataset.
Before training, Principal Component Analysis (PCA) is applied to the generated dataset to check whether it separates into two classes. PCA is a dimensionality reduction technique used in machine learning, and here it is also used to visualize the data before training. PCA is used because algorithms fail to perform efficiently on high-dimensional data without feature reduction. The figure above shows the data visualized in two dimensions. A decision tree is also used to extract the best features. In addition, features are evaluated separately to measure the accuracy achievable on the dataset with each individual feature. After checking accuracy on individual features, we create subsets of features to test dataset accuracy. The methodology proposed in this report is to combine the best features, those that give above 80% accuracy, and then generate the dataset.
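The per-feature accuracy check described above can be sketched as follows. This is a simplified stand-in, not the report's implementation: each feature is scored with a one-threshold rule (a depth-1 decision tree), accuracy is measured on the same rows used to pick the threshold, and the feature names and flow records are hypothetical. A real evaluation would use a held-out test split.

```python
def stump_accuracy(values: list[float], labels: list[int]) -> float:
    """Best accuracy of a single-threshold rule on one feature."""
    best = 0.0
    for t in sorted(set(values)):
        # Rule: predict attack (1) when value >= t; also score the flipped rule.
        hits = sum((v >= t) == bool(y) for v, y in zip(values, labels))
        accuracy = hits / len(labels)
        best = max(best, accuracy, 1 - accuracy)
    return best


def select_features(rows, labels, names, min_accuracy=0.80):
    """Keep the features whose individual accuracy reaches the threshold."""
    selected = []
    for i, name in enumerate(names):
        column = [row[i] for row in rows]
        if stump_accuracy(column, labels) >= min_accuracy:
            selected.append(name)
    return selected


# Hypothetical flow records: (packets per second, average packet size, port parity).
names = ["flow_pkts_per_s", "avg_pkt_size", "dst_port_parity"]
rows = [
    (900.0, 60.0, 0.0), (950.0, 64.0, 1.0), (870.0, 58.0, 0.0), (990.0, 62.0, 1.0),
    (20.0, 500.0, 0.0), (35.0, 61.0, 1.0), (15.0, 700.0, 0.0), (40.0, 450.0, 1.0),
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = attack, 0 = benign

best_features = select_features(rows, labels, names)
```

With this toy data the first two features pass the 80% threshold and the third, which carries no class information, is dropped.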
2.1.4.2. Implementation Details
The dataset presented in this report covers three types of DDoS attacks: volume-based attacks, protocol-based attacks, and application layer attacks. The volume-based attacks are UDP Flood, ICMP Flood, and Spoofed-Packet Flood. The protocol-based attacks are SYN Flood, Ping-of-Death, Smurf DDoS, and Fragmented Packet attacks. The application layer attacks include Slowloris, Zero-day DDoS, and Apache and Windows attacks. The DDoS attack tools used for this dataset are LOIC, HOIC, Slowloris, DDoSIM, HULK, GoldenEye, BoNeSi, Mirai Botnet, and Tor's Hammer. Slowloris is used to send legitimate-looking traffic to the server through GET and POST requests. Its main advantage is that it sends partial requests rather than corrupted packets, which traditional intrusion detection systems cannot detect efficiently. HULK is used to generate unique traffic that can bypass the cache server. LOIC is used to send customized TCP, UDP, and HTTP requests. One of the main reasons for using LOIC is that it can hide identity and lets us control zombie network computers. HOIC is used to launch DDoS attacks over HTTP and can attack up to 256 websites at once.
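The three attack families listed above map naturally onto a lookup table that can be used to label captured flows. The sketch below only restates the categories from the paragraph; the function name and the "unknown" fallback are illustrative choices, not part of the dataset tooling.

```python
# Attack families and members as listed in this report.
ATTACK_CATEGORIES = {
    "volume-based": ["UDP Flood", "ICMP Flood", "Spoofed-Packet Flood"],
    "protocol-based": ["SYN Flood", "Ping-of-Death", "Smurf DDoS", "Fragmented Packet"],
    "application-layer": ["Slowloris", "Zero-day DDoS", "Apache and Windows"],
}


def category_of(attack: str) -> str:
    """Return the family of a named attack, ignoring letter case."""
    wanted = attack.casefold()
    for category, attacks in ATTACK_CATEGORIES.items():
        if any(a.casefold() == wanted for a in attacks):
            return category
    return "unknown"
```

For example, `category_of("syn flood")` returns `"protocol-based"`, and a name that is not in the table falls back to `"unknown"`.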
2.1.4.3. Results (Actual and Perceived)
The results indicate that the SVM (Support Vector Machine) classifier with feature reduction achieves an accuracy of 0.983 (98.3%) with 46 selected features. The logistic regression classifier with feature reduction achieves 0.987 (98.7%), and the linear classifier with feature reduction achieves up to 0.97 (97%). The decision tree with feature reduction achieves an accuracy of 1 (100%). A comparison of all classifiers therefore shows that the decision tree gives the best results when combined with feature reduction. Without feature reduction, the classifiers show lower accuracy.
The figure above shows the accuracy of multiple approaches. The red line shows the accuracy of ExtraTreeClassifier, which is the lowest of all. The green line shows the accuracy of subsets of features. Individual feature accuracy is also shown in the graph; some features show accuracy above 80%. This individual feature accuracy measurement helped us propose an approach called crcApproach, in which only those features that give 80% accuracy are selected and used with the best classifier and feature reduction technique. When number of selected
2.1.4.5. Conclusion / Future work
In this report, a new dataset is presented that covers 11 types of DDoS attacks for evaluating IDS algorithms and systems. We reviewed existing datasets used for the evaluation of IDS algorithms; these datasets have limitations such as supporting only offline detection. The proposed dataset is designed to address the weaknesses of existing datasets. One of its main advantages is that it supports real-time analysis. The proposed approach is effective because it selects the best features and uses feature reduction techniques to reduce computation. Most DDoS generation tools anonymize the traffic and remove extra information from the packet. About 8 tools are used in this dataset to generate DDoS attack traffic. Four classifiers are compared, and the best-performing classifier is used along with feature reduction techniques to provide efficient results.
2.2. Milestone/Deliverable 2:
2.3. Milestone/Deliverable 3:
3.1. Milestone/Deliverable 1:
3.1.1. Description of Milestone
Signatures are the core of any IDS/IPS. In the previous module we converted Snort and Suricata rules so that they are compatible with our IDS. As an extension of this module, and to make Netspection more robust and advanced, YARA rules are planned to be migrated to detect the network footprints of malware and antiviruses. YARA offers a large repository of rules for known malware, which can be used to detect the malicious activity of any malware over the network. For now, YARA rules
3.1.3. Methodology
The proposed methodology contains the following set of activities in the below order.
Malware has become one of the most severe cyber risks in recent years. Malware is any program that performs harmful acts, such as information theft and espionage. Kaspersky Labs (2017) defines malware as "a sort of computer program designed to infect a genuine user's machine and harm it in numerous ways". Anti-virus scanners cannot keep up with the rising diversity of malware, resulting in millions of hosts being infected. According to Kaspersky Labs (2016), 6,563,145 distinct hosts were targeted in 2015, with 4,000,000 malware items identified. According to Juniper Research (2016), the worldwide cost of data
YARA rules are a way of identifying malware (or other files) by creating rules that look for certain characteristics. YARA was originally developed by Victor Alvarez of VirusTotal and is mainly used in malware research and detection. It was developed to describe patterns that identify strains or entire families of malware.
Syntax
Each rule must start with the word rule, followed by the name or identifier. The identifier can contain any alphanumeric
character and the underscore character, but the first character is not allowed to be a digit. There is a list of YARA keywords
that are not allowed to be used as an identifier because they have a predefined meaning.
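The naming rule above is easy to check mechanically. The following sketch validates a candidate rule name against the constraints stated in this section; the keyword set is only a partial list chosen for illustration, and the function name is hypothetical. The YARA documentation holds the complete keyword list.

```python
import re

# Partial set of YARA reserved keywords, for illustration only.
RESERVED = {
    "all", "and", "any", "ascii", "at", "condition", "contains", "false",
    "filesize", "for", "fullword", "global", "import", "in", "meta", "nocase",
    "not", "of", "or", "private", "rule", "strings", "them", "true", "wide",
}

# Alphanumerics and underscore, with a first character that is not a digit.
IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_rule_name(name: str) -> bool:
    """Check a rule identifier against the syntax constraints described above."""
    return bool(IDENTIFIER_RE.fullmatch(name)) and name not in RESERVED
```

For instance, `vendor` is accepted, while `3vendor` (leading digit), `bad-name` (hyphen), and `rule` (keyword) are rejected.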
Condition
Rules are composed of several sections. The condition section is the only one that is required. This section specifies when
the rule result is true for the object (file) that is under investigation. It contains a Boolean expression that determines the
result. Conditions are by design Boolean expressions and can contain all the usual logical and relational operators. You can
Strings
To give the condition section a meaning, you also need a strings section. The strings section is where you define the strings that will be looked for in the file. Let’s look at an easy example.
rule vendor
{
    strings:
        $text_string1 = "Vendor name"
        $text_string2 = "Alias name"
    condition:
        $text_string1 or $text_string2
}
The rule shown above is named vendor and looks for the strings “Vendor name” and “Alias name”. If either of those strings
2. Text strings, with modifiers: nocase, fullword, wide, and ascii.
There are many more advanced conditions you can use, but they are outside the scope of this post. If you would like to
Memory analysis is used to detect malicious activity in many prominent cases. Signature matching is used to identify malicious content or code within memory. The authors of [1] proposed a mechanism that examines both files and physical memory to identify malicious activity efficiently. It also covers the creation of new signatures for efficiently searching for malware in physical memory.
Ransomware is a sub-category of malware that blocks access to the user's data by encrypting the files on the target system for financial gain. The authors of [2] use static and dynamic analysis to detect WannaCry ransomware intrusions. The features extracted during the analysis are used by intrusion detection systems, together with signature rules created by examining the WannaCry file.
Due to the exponential increase in internet traffic, intrusion detection in networks is emerging as a huge challenge for network administrators. To minimize the risk of network intrusion, [3] presents a decision-tree-based mechanism for identifying network attacks in intrusion detection systems. This research uses a dataset called Kyoto 2006+, in which network traffic is classified as normal (legitimate traffic), attack (known intrusion), or unknown attack [3]. The J48 decision tree algorithm is used to classify network traffic. The mechanism was trained and tested on a set of network traffic for the creation of IDS rules.
Network monitoring and network security are major areas of work, given the sensitive information flowing over computer networks. Intrusion detection systems are widely used to secure network traffic by allowing legitimate traffic and blocking unintended traffic. [4] presents a mechanism for automatic signature generation based on a hashing scheme. The malicious content is processed by purpose-built tools to create hash-based signatures, which are then populated in the rule file for the IDS. Snort rules follow a set syntax. The mechanism takes a malicious file and other IDS rule parameters as input and creates a fixed-length output, which is populated in the content section of the rule file for detection of the corresponding malware file.
A custom binary signature based on the source code of the original malware is used, and a custom database is created. The ClamAV signature is converted to YARA format automatically with a Python script. YARA is used to create rules that detect strings, instruction sequences, regular expressions, byte patterns, and so on. Today, various IDS, IPS (intrusion prevention system), and antivirus solutions use YARA rules to detect or prevent malware; YARA's popularity comes from its simple and efficient way of writing rules. As soon as malware is found on a device, ClamAV mitigates it using YARA rules. Figure 2.6 shows a simple example of what YARA rule syntax looks like.
meta:
    threat_level = 3
    in_the_wild = true
strings:
    $a = {6A 40 68 00 30 00 00 6A 14 8D 91}
    $b = {8D 4D B0 2B C1 83 C0 27 99 6A 4E 59 F7 F9}
condition:
    $a or $b
}
YARA is the industry standard for searching for patterns in malware collections. Malware analysts mostly depend on YARA rules to identify specific threats, for example by scanning a collection of unknown malware samples for a pattern specific to a particular malware strain. YARIX, a more efficient methodology, is introduced in [5]. YARIX uses an inverse n-gram index that maps fixed-length byte sequences (n-grams) to lists of the files in which they appear. To make queries over the corpus more efficient, YARIX optimizes the YARA search by translating the YARA rule into an index lookup that retrieves a set of candidate files that may match the rule. Because indexing binary files is memory-intensive, YARIX compresses the disk footprint with variable-byte delta encoding and does not store file offsets. The candidate set is then scanned with YARA to obtain the actual set of matching files. The index footprint is quite small thanks to the compression techniques used, including a grouping-based compression scheme. As a result, YARA searches are sped up by five orders of magnitude, while only 74% of the accumulated storage space of all samples is required to store the YARIX inverse n-gram index.
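The two mechanisms described above, an inverse n-gram index for candidate retrieval and variable-byte delta encoding of the posting lists, can be sketched in Python. This is a toy illustration under stated assumptions: the n-gram length of 4, the byte layout of the encoding, and the sample files are choices made for the example and are not taken from the YARIX implementation.

```python
from collections import defaultdict

N = 4  # n-gram length; an illustrative choice


def ngrams(data: bytes, n: int = N) -> set[bytes]:
    """All distinct n-byte windows of the input."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def build_index(files: dict[int, bytes]) -> dict[bytes, list[int]]:
    """Map every n-gram to the sorted list of file IDs that contain it."""
    index = defaultdict(list)
    for file_id in sorted(files):
        for gram in ngrams(files[file_id]):
            index[gram].append(file_id)
    return dict(index)


def candidates(index: dict[bytes, list[int]], pattern: bytes) -> set[int]:
    """Files containing every n-gram of the pattern; still needs a full scan."""
    grams = ngrams(pattern)
    if not grams:
        raise ValueError("pattern is shorter than the n-gram length")
    result = None
    for gram in grams:
        ids = set(index.get(gram, ()))
        result = ids if result is None else result & ids
    return result


def vbyte_delta_encode(sorted_ids: list[int]) -> bytes:
    """Store gaps between IDs, 7 bits per byte; high bit marks the last byte."""
    out = bytearray()
    previous = 0
    for value in sorted_ids:
        delta = value - previous
        previous = value
        while delta >= 0x80:
            out.append(delta & 0x7F)
            delta >>= 7
        out.append(delta | 0x80)
    return bytes(out)


def vbyte_delta_decode(data: bytes) -> list[int]:
    """Inverse of vbyte_delta_encode."""
    ids, value, shift, previous = [], 0, 0, 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            previous += value
            ids.append(previous)
            value, shift = 0, 0
        else:
            shift += 7
    return ids


# Hypothetical corpus: file ID -> content.
files = {
    1: b"\x6A\x40\x68\x00\x30\x00\x00\x6A",
    2: b"hello world",
    7: b"xx\x6A\x40\x68\x00\x30yy",
}
index = build_index(files)
hits = candidates(index, b"\x6A\x40\x68\x00\x30")
```

The candidate set can contain false positives, because the n-grams may occur in a file without forming the full pattern, which is why the final YARA scan over the candidates is still needed.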
Technological advances are accompanied by many information-related concerns: security, privacy, and integrity. Malware is one of the security problems that threaten computer systems. Ransomware is a kind of malicious software that threatens to publish the victim's data or block further access to it unless a ransom is paid. [2] investigates the WannaCry ransomware and detects it through static and dynamic analysis. The features of the malware are extracted during the analysis, and detection is performed using those features. The intrusion detection technique used in this study is YARA rule-based detection, which involves establishing a set of rules containing unique strings decoded from the WannaCry file. The proposed approach uses YARAGUI, a malware analysis tool that compares the rules against a chosen directory. Custom-written rules contain important strings that are compared with the directory contents, and if the string is found in
[6] proposes a new approach for malware detection that produces static YARA signatures based on n-gram analysis. The proposed approach uses a genetic algorithm (GA) with Artificial Intelligence (AI) methods for creating YARA rules. The application of a GA to generating YARA rules is considered the main contribution of the work.
The proposed methodology contains the following set of activities in the below order.
c. Add the network features and parameters to YARA ruleset for network monitoring.
Figure 1: Architecture
A high-level design of the proposed signature conversion framework is shown in Figure 1, where the framework takes file-based YARA rules and performs two major tasks that include:
Formulating IDS signatures from YARA rules involves a series of steps: extracting YARA rules from the rule database, parsing and extracting the signature strings along with their conditions, and converting the extracted signatures to an IDS-compatible rule option. Network parameters are also an integral part of an IDS signature, so network parameters, along with the protocol and action, are added to the rule option to formulate a
The detailed flow diagram for the conversion of YARA signatures to IDS compatible signatures can be shown in the figure
below.
Conversion of YARA signature to IDS
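The conversion steps described above (parse the rule, extract the signature strings, wrap them in an IDS rule option, then add the network header) can be sketched as follows. This is only an illustrative sketch, not the framework's implementation: the function names, the regular expressions, the default header values, and the sample rule are all assumptions. It emits Snort-style `content` options, handles only plain text and plain hex strings, and treats every string as required, which corresponds to an `all of them` condition.

```python
import re

def yara_strings(rule_text):
    """Extract (identifier, kind, value) triples from a YARA rule's strings section."""
    found = []
    # Text strings of the form: $name = "value"
    for name, value in re.findall(r'(\$\w+)\s*=\s*"((?:[^"\\]|\\.)*)"', rule_text):
        found.append((name, "text", value))
    # Hex strings of the form: $name = { 4D 5A 90 } (no wildcards or jumps handled)
    for name, value in re.findall(r'(\$\w+)\s*=\s*\{([0-9A-Fa-f\s]+)\}', rule_text):
        found.append((name, "hex", " ".join(value.split()).upper()))
    return found

def to_ids_rule(rule_text, sid, action="alert", proto="tcp",
                src="any", sport="any", dst="$HOME_NET", dport="any"):
    """Build one Snort-style rule; the header defaults are hypothetical."""
    match = re.search(r'rule\s+(\w+)', rule_text)
    name = match.group(1) if match else "unnamed"
    options = [f'msg:"YARA {name}"']
    for _, kind, value in yara_strings(rule_text):
        # Snort writes raw bytes between pipes inside a content option
        options.append(f'content:"|{value}|"' if kind == "hex" else f'content:"{value}"')
    options += [f"sid:{sid}", "rev:1"]
    header = f"{action} {proto} {src} {sport} -> {dst} {dport}"
    return f"{header} ({'; '.join(options)};)"

# Hypothetical sample rule, for illustration only
SAMPLE_RULE = '''rule WannaCry_Demo {
    strings:
        $a = "WNcry@2ol7"
        $b = { 4d 5a 90 00 }
    condition:
        all of them
}'''

print(to_ids_rule(SAMPLE_RULE, sid=1000001))
```

A real converter would also need to escape characters that are special inside a `content` option and to translate richer YARA conditions, which this sketch ignores.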
So far, the literature has been reviewed and the proposed methodology has been presented above. Implementation, deployment, and testing are
3.1.4.1. Analysis and Discussion
As attacks become more sophisticated, detecting them also becomes more difficult. An IDS alone cannot detect such
attacks efficiently. We need a solution that can correlate multiple attack patterns and generate events from them.
In future we plan to work on SIEM solutions. A SIEM is capable of detecting modern attacks
3.2. Milestone/Deliverable 2:
3.3. Milestone/Deliverable 3:
4. DOMAIN NAME 3: EBPF
Sr. Project Title/ Prior / Existing Work prior to Market Requirement Project Progress Status
No. Domain this quarter (reporting period)
3.
4.
4.1. Milestone/Deliverable 1:
The development of new technologies has opened new horizons for monitoring and analyzing network traffic. Modern
solutions such as the extended Berkeley Packet Filter (eBPF) are clearly distinct from conventional techniques and
allow more customized and more efficient filtering. These technologies can significantly increase or decrease system
performance, because they operate entirely in the lowest layer of the operating system, the kernel. Network-based
Intrusion Detection/Prevention Systems (IDPS) such as Snort and Bro passively monitor the network traffic obtained
from network Terminal Access Points. Most IDPS are signature based. On large networks, the drop rate increases due to
limitations in IDPS packet capture and processing. High throughput results in overheads, and IDPS buffers start to
drop packets, which can pose a serious threat to the network. IDPS are mostly attacked with volumetric DDoS, which
pushes the traffic volume beyond the reception and processing capacity of the IDPS and causes it to drop packets due
to buffer overflows. To overcome this threat, the proposed solution, iKern, uses eBPF and Virtual Network Functions
(VNF) to examine and filter packets at kernel level before forwarding them to userspace.
eBPF is a technology that makes the Linux kernel programmable by injecting code fragments at different locations in
the kernel[1][2]. eBPF is used to safely and efficiently extend the capabilities of the kernel without changing kernel
source code or loading kernel modules. An eBPF program can be injected dynamically at runtime and is then verified to
make sure that it cannot crash the kernel and cannot get caught in an infinite loop[9]. However, this type of
verification is only possible for programs that are not Turing complete. eBPF programs therefore lack features such as
arbitrary-length loops; any loop must have a maximum iteration count. Backward jumps in the code are not allowed in
general. This means that eBPF can only be used for the
eBPF programs are written in C and then compiled to eBPF bytecode. Once injected into the kernel, the eBPF bytecode
first undergoes verification and is then just-in-time compiled to native code. eBPF is particularly suitable for packet
processing. When a packet reaches a network interface, specific actions can be performed on it, such as dropping the
packet. This is essential for programs such as firewalls or tcpdump, which records packets according to certain
filters. For example, if only packets coming from port 80 should be recorded, tcpdump compiles an eBPF program that
encodes this filter and loads it into the kernel. The kernel then drops all packets that do not match the filter, and
only the matching ones are passed to tcpdump. The alternative is for tcpdump to receive all packets first and then
filter them itself. The disadvantage of this is that each packet has to be passed individually from the kernel to
tcpdump. This involves copying the whole packet in memory, along with other computational steps. This is why passing
packets between programs and the kernel should be avoided: performance can suffer. eBPF helps overcome this
problem[3]. Since eBPF bytecode is compiled to native code, it should in general run about as fast as other code in
the kernel.
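The cost difference between filtering in the kernel and filtering in the capture tool can be illustrated with a small simulation. This is a toy model in Python, not eBPF code: the `capture` function, the packet dictionaries, and the idea of counting one copy per delivered packet are assumptions made purely for illustration.

```python
def capture(packets, wanted, filter_in_kernel):
    """Return (packets delivered to the tool, number of kernel-to-user copies)."""
    copies = 0
    delivered = []
    for pkt in packets:
        if filter_in_kernel and not wanted(pkt):
            continue          # dropped inside the kernel, never copied
        copies += 1           # packet crosses the kernel/user boundary
        if wanted(pkt):       # user-space filter (a no-op if already filtered)
            delivered.append(pkt)
    return delivered, copies

# Hypothetical traffic: only packets from source port 80 are of interest
traffic = [{"sport": 80}, {"sport": 443}, {"sport": 80}, {"sport": 22}]
from_port_80 = lambda pkt: pkt["sport"] == 80

print(capture(traffic, from_port_80, filter_in_kernel=True))
print(capture(traffic, from_port_80, filter_in_kernel=False))
```

Both modes deliver the same packets, but the in-kernel filter copies only the matching ones, which is the saving the paragraph above describes.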
eBPF programs mostly use specific, safe data structures. This can incur a performance penalty, because checking the
bounds of an array requires extra work on every access. An alternative to eBPF is to use kernel modules. However,
kernel modules have the drawback that they cannot be checked for stability and have to be built for a specific kernel
version. Apart from this, kernel module development is not straightforward, and at times it is not possible to extend
specific functionality in the kernel with a module without changing the kernel itself. It is very hard to recompile
the whole kernel[2].
• A & B is the code for execution
• C is the userspace code of BPF
• D is the bpf bytecode to be inserted in kernel
• The verifier initially verifies the bytecode
• BPF applies the filtering conditions to incoming traffic and stores data in maps (G)
By definition, a UDP flood is any DDoS attack that floods a target with User Datagram Protocol (UDP) packets[4]. The
goal of the attack is to flood random ports on a remote host. This causes the host to repeatedly check for an
application listening at each port and, if no application is found, to reply with a ‘Destination Unreachable’ packet.
This process can eventually make the host inaccessible [3].
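One simple way to recognise such a flood is to count the UDP packets arriving for a host within a sliding time window. The sketch below is a minimal illustration under stated assumptions: the class name, the threshold, and the window length are hypothetical and are not taken from iKern.

```python
from collections import defaultdict, deque

class UdpFloodDetector:
    """Flags a destination that receives more than max_packets UDP packets per window."""

    def __init__(self, max_packets=100, window=1.0):
        self.max_packets = max_packets      # hypothetical threshold
        self.window = window                # window length in seconds
        self.seen = defaultdict(deque)      # dst_ip -> timestamps of recent packets

    def observe(self, timestamp, dst_ip):
        """Record one UDP packet; return True if dst_ip is currently being flooded."""
        recent = self.seen[dst_ip]
        recent.append(timestamp)
        # Forget packets that have fallen out of the sliding window
        while recent and timestamp - recent[0] > self.window:
            recent.popleft()
        return len(recent) > self.max_packets
```

A detector of this kind only looks at packet rate; it cannot tell a flood from a legitimate burst, so the threshold has to be tuned to the protected network.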
The principle is similar to the UDP flood attack. An ICMP flood overwhelms the target resource with ICMP Echo Request
packets, sending them as fast as possible without waiting for replies [5]. This attack can saturate both the outgoing
and the incoming bandwidth. The victim's servers will try to respond with ICMP packets, causing a significant slowdown
of the whole system.
This attack exploits a known weakness in the TCP three-way handshake. Normally, a SYN request to open a TCP connection
with a host is answered by a SYN-ACK from that host and then confirmed by an ACK from the requester. In this attack
scenario, the requester sends many SYN requests but either does not acknowledge the host's SYN-ACK or sends the SYN
requests from a spoofed IP address. Either way, the host keeps waiting for each request to be acknowledged, new
connections cannot be made, and this eventually leads to denial of service and system slowdown [3].
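The effect of unanswered SYN requests on the host can be shown with a toy model of the half-open connection backlog. The class name and the backlog size are assumptions for illustration; real TCP stacks also expire pending entries after a timeout and may use SYN cookies, which this sketch omits.

```python
class HalfOpenTracker:
    """Toy model of a server's backlog of half-open TCP connections."""

    def __init__(self, backlog=128):
        self.backlog = backlog     # hypothetical backlog size
        self.pending = set()       # (src_ip, src_port) pairs awaiting the final ACK

    def on_syn(self, src):
        """Handle a SYN; return False if the backlog is full and it must be refused."""
        if len(self.pending) >= self.backlog:
            return False
        self.pending.add(src)      # SYN-ACK sent, now waiting for the ACK
        return True

    def on_ack(self, src):
        """Handle the final ACK; the connection leaves the half-open backlog."""
        self.pending.discard(src)
```

When spoofed sources never send the final ACK, `pending` fills up and `on_syn` starts refusing legitimate clients, which is the denial of service described above.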
In a ping of death (“POD”) attack, the attacker sends multiple malformed or malicious pings to a computer. The maximum
length of an IP packet, including the header, is 65,535 bytes. However, the Data Link Layer usually limits the maximum
frame size, for example to 1500 bytes over an Ethernet network [5]. In this scenario, a large IP packet is split into
a number of smaller IP packets, known as fragments, and the receiving host reassembles these fragments into the
complete packet. In a ping of death, the attacker manipulates the fragments so that the receiving host ends up with a
packet larger than 65,535 bytes after reassembly. This can overflow the memory allocated for the packet, which leads
to denial of service.
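A receiver can guard against this by checking, for every fragment, where its last byte would land in the reassembled packet. The sketch below assumes IPv4, where the fragment offset field counts 8-byte units; the function names and the default 20-byte header are illustrative assumptions.

```python
MAX_IP_PACKET = 65535  # largest value the 16-bit IPv4 total length field can hold

def reassembled_size(fragment_offset, payload_len, header_len=20):
    """Size the reassembled packet must have for this fragment to fit.

    fragment_offset is the raw IPv4 header field, counted in 8-byte units.
    """
    return header_len + fragment_offset * 8 + payload_len

def is_oversized(fragment_offset, payload_len, header_len=20):
    """True if accepting this fragment would yield a packet above the IPv4 limit."""
    return reassembled_size(fragment_offset, payload_len, header_len) > MAX_IP_PACKET
```

For example, a fragment at offset 8189 (byte 65,512) carrying 1480 bytes of payload would end well past the 65,535-byte limit and can be dropped before reassembly.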
In this type of overload, the attacker sends many packet fragments that are never completed. All these fragments are
saved in the IDS buffer, where they stay until the remaining parts arrive. When the buffer is full, the oldest
fragment is deleted. If the attacker manages to fill the buffer before the host's fragment timeout expires, he can
then send the fragments that complete the packet [4]. The target host will reassemble the
The attacker sends an enormous number of packets into the target network without a definite purpose. These packets
are crafted so that they require a large amount of computational power to process. An IDS that is not powerful or
fast enough to process them may skip some of the resource-intensive packets altogether[6].
1.3.1 HULK
HULK stands for HTTP Unbearable Load King. It is a flood attack tool used against web servers. It was created solely
for research purposes[7]. It can bypass the cache engine and can generate unique, obfuscated traffic. It creates a
huge amount of traffic at the web server.
1.3.2 SLOWLORIS
Slowloris sends legitimate HTTP traffic directly to the server. It does not affect the other services and ports of
the target network. The attack aims to keep as many connections to the server open as possible[7]. It does so by
sending partial requests and holding the connections open for as long as possible. While the server keeps the false
connections open, the connection pool overflows and then as a
1.3.3 LOIC
LOIC stands for Low Orbit Ion Cannon. It is a free and popular DDoS attack tool that is quite easy to use. It sends
UDP, TCP, and HTTP requests to the server and can perform the attack based on the IP address or URL of the server[8].
Within seconds, the website slows down and stops responding to legitimate requests. LOIC does not hide the attacker's
IP address. Routing the attack through a proxy server does not help, because the proxy server then becomes the target
and stops working.
1.3.4 RUDY
RUDY stands for R-U-Dead-Yet. This tool attacks by submitting long form fields through the POST method. Its console
menu is interactive[7]. The forms can be selected from the URL for the POST-based DDoS attack. The form fields are
identified for the data submission. Data with a long content length is injected into this form at a
1.4 – Objectives
b) To analyze large traffic floods within the kernel at 1 Gbps throughput and integrate traffic balancing to reduce CPU
overheads.
c) To create a Virtual Network Function using eBPF to enable programmability of the Linux kernel from user space.
d) To create eBPF maps within the kernel to store the generated drop rules based on bytecode.
Various attack detection and prevention mechanisms have been proposed based on user-space and kernel-space detection
models. Signature-based and anomaly-based intrusion detection systems detect malicious traffic using detection engines
that function using eBPF. In this literature review, attack detection models based on user-space Intrusion Detection
Systems and kernel-space attack detection based on Virtual Network Functions are discussed. Existing work in the
domain can be divided into two categories, the Kernel Space Detection Model (KDM) and the User Space Detection Model
(UDM), as shown in Figure 2.1.
In the research conducted by Sebastiano Miano[9], the Polycube framework is proposed. It is a software framework whose
main aim is to bring Network Functions Virtualization (NFV) to in-kernel packet-processing applications. It enables a
range of customization and flexibility. Polycube helps in creating complex and arbitrary network function chains,
where each function includes a very efficient in-kernel data plane and a flexible user-space plane. The network
functions of Polycube, known as cubes, can be generated dynamically and then injected into
the kernel networking stack by using the AF_PACKET socket,
and debugging. The injected cubes use the eBPF functions which
Marcos A. M. Vieira [10] described fast packet processing with eBPF and XDP. The paper discusses the BPF and eBPF
machines and gives an overview of the eBPF system in the Linux kernel, the currently available hooks, and some results
of recent research. The paper also covers filtering network traffic on the basis of the TCP protocol. The author
developed a program that can be injected into the Linux kernel and exploits eBPF to increase network monitoring
performance by defining filters using the TCP protocol. It uses eBPF to drop packets that do not carry TCP segments
and mitigates the TCP reset attack. The program is presented from two perspectives: the higher-level C code and the
eBPF instructions generated by the compilation process. The program is designed to be loaded directly into the XDP
hook, which is why the input parameter of the function must be of a structure type. The bytes of the packet being
processed are delimited by the data and data end pointers, and these must be used throughout the program to access
the packet. By making use of the data, the parsing of the headers can be completed
Research conducted by Luca et al. [11] showed how eBPF can be used to programmatically trace and monitor the
behaviour of software and of network traffic, with the aim of identifying stegomalware. To demonstrate the efficacy
of the idea, they evaluated the use of eBPF to gather data in two use cases. The first case showed how it can be used
to track specified system calls while an attack based on a colluding applications scheme is ongoing. In the second
case, an eBPF program was developed to evaluate the behaviour of the Flow Label field when it is used to implement a
covert channel in bulk IPv6 traffic.
Josy Elsa Varghese[12] proposed an IDS framework for DDoS attacks in SDN environments. The suggested approach presents
a DDoS detection framework that uses a single statistical parameter within the SDN architecture, built on the DPDK
framework. The attack detection algorithm runs in userspace. The framework addresses both the conflicting
relationship between the SDN architecture and DDoS attacks and the drawbacks of IDS in high-speed networks. Apart
from that, the detection algorithm predicts attacks with efficient detection performance. The experimental results
show that the framework achieves a successful trade-off between framework efficiency and detection effect in a
high-speed network. The CICDDoS2019 dataset is used to generate attacks for testing the framework.
In the research conducted by Sumit Badotra [13], a DDoS detection system is implemented with the help of the SNORT
Intrusion Detection System (IDS) on Opendaylight (ODL) and the Open Networking Operating System (ONOS). To analyze the
behaviour of the implemented DDoS detection tool, various scenarios with a varied number of hosts, switches, and
generated data traffic are used. For traffic generation, penetration tools such as hping3 and nping are used, while
the Mininet emulation tool is used to vary the number of switches and hosts. The final evaluation of the DDoS
detection tool was based on the number of packets dropped, the number of packets received, and the efficiency of CPU
utilization. Figure 2.5 shows the experimental setup of the proposed system.
It has been observed that Kernel Space Attack Detection Models (KDM) clearly outperformed User Space Attack
Detection Models (UDM) in terms of volumetric attack detection accuracy, packet reception rate, packet drop rate,
and CPU utilization. UDM systems, however, showed better accuracy on average-sized networks, whereas their CPU
overhead and packet drop rate increased on larger networks. Detection engines operating in userspace required
more processing power than the Virtual Network Functions (VNF) running inside kernel space. Packets received by the
network card are copied to the kernel, and an attack is detected before the flow is forwarded to other processes
working for userland applications, which reduces CPU overhead. A comparison of related work based on threat detection
accuracy, data acquisition methods, and achieved network throughput is shown in Table 2.1.
Processing with eBPF.. J,20 ebpf libpcap KDM 178 100 97.42
Methodology
The overall goal of the research is to provide an innovative network monitoring technique that adopts the latest
technologies, such as eBPF and PF_RING, to detect and mitigate volumetric and multi-vector flood attacks at high
throughput. The proposed solution must be as versatile and flexible as possible, allowing the creation of networking
probes that dynamically adapt to user needs, changing the filtering program at runtime and exporting the
requested metrics. The traditional approaches to monitoring network traffic include IDPS systems. A more optimized
approach to using IDPS is proposed, in which initial traffic monitoring and filtering is performed at kernel level by
exploiting the eBPF system along with a kernel ring buffer for acquiring network traffic on multiple cores.
In order to detect and mitigate volumetric and multi-vector flood attacks, traffic monitoring and filtering is
implemented at the Linux kernel level to avoid application overheads and the CPU utilization of IDPS signature
matching algorithms. This is performed by combining the Linux eBPF system and PF_RING into a hybrid
framework for monitoring network traffic over high-speed networks. This solution is intended for large
networks where the data rate reaches around 1 Gbps. Detection and mitigation of flood attacks is performed at
kernel level before the packets are sent to user space applications such as IDPS[8][22]. Detection of attacks on
high-speed networks and large flows of data is a challenging task. It requires an efficient Data Acquisition Module
that receives packets without dropping them. If packets are dropped due to CPU overheads or buffer overflows, the
network is put at great risk of resource overhead and data loss. The first phase of the methodology is
to implement the Data Acquisition Module with load balancing capability for running instances on multiple cores and
The proposed system comprises separate modules, as shown in Figure 3.1. The Data Acquisition Module (DM) is
designed for capturing packets over large network flows. Volumetric and multi-vector floods are generated to test
the capability of the DM. A multi-core implementation of the DM is used in the Streamed Data Acquisition Module (SDM)
by applying load balancing techniques. Incoming traffic is inspected by an in-kernel detection module built
from eBPF actions and eBPF bytecode injected from user space, which together create a Virtual Network Function (VNF).
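One common way to balance captured traffic across cores is to hash each packet's flow identifier, so that all packets of a flow are handled by the same core. The sketch below illustrates that general idea only; the function name and the choice of CRC32 are assumptions, and the report does not state which balancing scheme the SDM actually uses.

```python
import zlib

def core_for_flow(src_ip, dst_ip, src_port, dst_port, proto, n_cores):
    """Map a flow to a core index so every packet of the flow lands on one core."""
    # Sort the two endpoints so both directions of a flow hash to the same value
    first, second = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{first[0]}:{first[1]}|{second[0]}:{second[1]}|{proto}".encode()
    return zlib.crc32(key) % n_cores

# Hypothetical flow between a client and a web server, spread over 4 cores
print(core_for_flow("10.0.0.1", "192.168.1.5", 12345, 80, 6, n_cores=4))
```

Keeping a flow on one core avoids sharing per-flow state between cores, at the cost of uneven load when a single flow dominates the traffic.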
performance, the CIC-DDoS2019[13] data set is used along with the attack generation tools most commonly used by
attackers for launching floods on systems. The data set consists of benign traffic and up-to-date common volumetric
DDoS attacks, which resemble real-world data, in the form of pcaps. The included floods are UDP, TCP,
PortMap, NetBIOS, LDAP, MSSQL, SYN, DNS, NTP, and SNMP. In addition to this data set, some of the most
common and effective tools for launching volumetric and multi-vector attacks are also tested, such as HOIC, RUDY,
HULK, and LOIC[11]. These tools are capable of launching attacks with large-volume flows that compromise the
resources of the target system. These tools were used to test the data acquisition and attack detection capability of
iKern is loaded inside the Linux kernel to detect volumetric and multi-vector floods using the iKern detection
algorithm and iKern drop rules. iKern is directly connected to the Data Acquisition Module (DM) to receive the network
packets sniffed by the PF_RING socket. The iKern architecture consists of multiple modules operating within the Linux
kernel. NAPI copies packets from the NIC to the circular buffer. Incoming packets are inspected within the iKern
Engine by the eBPF Virtual Network Function (VNF). The VNF is created by injecting eBPF bytecode from userspace.
The iKern detection algorithm and drop rules are stored in eBPF maps. Every time new eBPF bytecode is inserted into
the Linux kernel, it is examined by the verifier for compatibility and syntax issues in order to preserve the kernel
space state. Verified bytecode is stored in eBPF maps. It includes the iKern algorithm and the iKern drop rules
written for detecting the attacks. Incoming traffic received at the ring buffer is matched against the eBPF map
attributes and, in case of any malicious activity, an iKern drop rule is triggered to drop those packets and send an
alert to userspace. Traffic forwarded from the VNF to the ring-buffer-aware libpcap has been filtered of any
volumetric or multi-vector flood. Figure shows the internal
iKern filtering and detection module
The iKern detection algorithm checks whether the traffic coming into the system is malicious or normal. If it senses
a volumetric or multi-vector attack, iKern specifies the type and priority of the attack, either high priority or low
priority. It sends alerts containing the address of the attacker, the port number, and the attack type. A high-volume
flood is detected if the following equation holds, and it is flagged as a high-priority attack.
1.1.1.1. Results
For the initial tests, generated traffic is filtered using the iKern filter mechanisms supported by eBPF. Traffic
filters are based on IP addresses, protocols, and ports. A UDP filter is applied to filter all the UDP packets
received at the NIC. The screenshots below show the different filtering mechanisms. The initial phase of testing
consists of evaluating the iKern DM and comparing it with the default libraries. All these tests are performed using
the same parameters for the tested libraries and the iKern DM. Socket clustering is used for receiving packets and
balancing the load across multiple
1.1.1.2. Conclusion & Future Work
iKern uses a PF_RING socket to capture network packets at a high data rate, and iKern filtering filters specified
traffic by in-kernel processing before packets are sent to userspace. Detected threats and malicious packets are
dropped using eBPF filters. The iKern filter code is injected into the Linux kernel as eBPF bytecode. ebpf maps are stored for
In the future, this research can be extended to detect volumetric attacks on larger network flows with more than
1 Gbps throughput. The multi-core Streamed Data Acquisition Module (SDM) is scalable by using 10 Gbps multi-queue
network interface cards. This research can also be extended towards the study of other cyber attacks, whose detection
can be implemented in kernel space using eBPF and Virtual Network Functions. Signature matching performed by the IDPS
detection engines in user space can be implemented in kernel space using VNF for decreasing the CPU overheads
1.2. Milestone/Deliverable 2:
1.3. Milestone/Deliverable 3:
5. DOMAIN NAME 4: HARDWARE
Sr. Project Title/ Prior / Existing Work prior to Market Requirement Project Progress Status
No. Domain this quarter (reporting period)
5.
6.
6.1. Milestone/Deliverable 1:
In this report, a complete hardware-based Netspection IDS is designed to achieve high performance, throughput, and
data rate. In a hybrid IDS, there is a communication delay between software and hardware. So, in order to achieve a
high-speed NIDS, we need to implement all the modules shown in figure 3 (i.e., packet capturing, packet decoding,
packet preprocessing, and pattern matching engine) on an FPGA. In terms of hardware implementation, the packet
capturing module captures incoming internet traffic (in terms of packets) using an FPGA board, while the packet
decoder splits captured packets into packet header and payload. The preprocessing module organises the split packets
for the event detection engine. Finally, the event detection module performs the most computationally intensive
parts. Among these, pattern matching is a complex and time-consuming process. So far, these modules have been
implemented on the Spartan (SP605) kit, and we are now porting the prototype to a high-performance board, NetFPGA
SUME, to achieve high-end speed.
The exponential increase in malicious activity over the internet causes security threats. Several software-based
applications and hardware-based communication devices are commonly available to protect networks against
security threats and attacks. Hardware-based solutions are preferred because they offer stronger security provisions
and maximise the throughput of communication devices. Communication devices require scalable hardware architectures
for network security to provide protection against threats and attacks. Scalability is also important because of the
ever-increasing number of attack types. As a result, many researchers [1-49] have created scalable
The term scalability refers to the range of capabilities for processing the computations involved in intrusion
detection engines. An intrusion is an unauthorised entry into a network. An intrusion detection system (IDS) is a
software application or communication device that monitors communication devices or incoming network traffic for
malicious activities [50]. IDSs are of two types: (1) network intrusion detection systems (NIDS) and (2) host
intrusion detection systems (HIDS). The former monitors incoming traffic from the internet, while the latter
monitors operating system files [51]. The focus of this article is the existing scalable architectures for the NIDS
environment.
Network intrusion detection systems contain packet capturing, packet decoding, packet preprocessing, and event
detection/engine modules [52]. In terms of hardware implementations, the packet capturing module is used to capture the
incoming internet packets using a network interface card (NIC), while the packet decoder module is responsible for splitting
the captured packet into a packet header and payload. Moreover, the packet header can further be analysed to obtain the
five internet tuples: (1) source IP, (2) destination IP, (3) source port, (4) destination port, and (5) protocol. The preprocessing
module is utilised to organise the incoming packets for the event detection and engine module. Finally, the event detection
module is in charge of the most computationally intensive part, pattern matching, which is required for the development of
NIDS [1]-[49].
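As a software illustration of the packet decoder step, the five-tuple can be read from fixed offsets in the IPv4 header and the first four bytes of the transport header. This is a minimal sketch under stated assumptions: it expects a raw IPv4 packet without the Ethernet header, handles only TCP and UDP ports, and the function name is hypothetical; the hardware decoder described in this report is not shown.

```python
import socket
import struct

def five_tuple(ip_packet: bytes):
    """Return (src_ip, dst_ip, src_port, dst_port, protocol) for a raw IPv4 packet."""
    header_len = (ip_packet[0] & 0x0F) * 4       # IHL field, in 32-bit words
    protocol = ip_packet[9]                      # 6 = TCP, 17 = UDP
    src_ip = socket.inet_ntoa(ip_packet[12:16])
    dst_ip = socket.inet_ntoa(ip_packet[16:20])
    src_port = dst_port = None
    if protocol in (6, 17):
        # TCP and UDP both start with 16-bit source and destination ports
        src_port, dst_port = struct.unpack("!HH", ip_packet[header_len:header_len + 4])
    return src_ip, dst_ip, src_port, dst_port, protocol

# Hand-built sample: a 20-byte IPv4 header followed by a 20-byte TCP header
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0, 64, 6, 0,
                     socket.inet_aton("10.0.0.1"), socket.inet_aton("192.168.1.5"))
sample += struct.pack("!HH", 12345, 80) + b"\x00" * 16

print(five_tuple(sample))
```

Everything after the transport header offset is the payload that the pattern matching engine inspects.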
3.1.1. Methodology
Pattern matching is the process of comparing a set of incoming characters with the elements of the patterns stored in
a database [53]. Broadly speaking, there are two types of pattern matching: (1) string matching (SM) and (2) regular
expression (RegEx) matching. SM matches a set of strings against a stream of received characters, whereas RegEx
patterns describe regular languages, built using character classes over a fixed alphabet [54]. The choice of pattern
matching algorithm or technique is determined by the needs of the target application, such as incoming Ethernet
traffic in network security [55], protomata matching in computational biology [56], and data mining in artificial
intelligence [57]. This study has looked at how different pattern-matching algorithms can be used to make networks
safer.
Pattern Matcher
Aho-Corasick (AC) Algorithm
The AC algorithm is a string search algorithm developed by Alfred V. Aho and Margaret J. Corasick. It is a
dictionary matching algorithm that detects elements of a finite set of strings within the input text. The algorithm's
time complexity is $O(n + m + z)$, where $n$ is the length of the strings, $m$ is the length of the input text, and
$z$ is the total number of outputs [12].
The AC algorithm builds a finite-state machine that resembles a trie with additional links between the internal nodes.
When a match fails, these additional links permit a fast transition to the branch of the trie whose prefix is the longest suffix of
the text matched so far. In this way, the automaton can move between nodes without backtracking. When
the signature set is known in advance, the automaton can be constructed once off-line and then reused. In such a scenario,
the run time is proportional to the length of the input text plus the number of matched outputs. All the signatures are
combined into a single deterministic finite automaton (DFA), so that the processing
time is independent of the size of the signature set. The AC algorithm comprises three functions: the goto function, the failure function, and
the output function. Fig. 1 shows the finite state machine for the signature set app, apple, aim, cap, and cat. The goto function is
shown by solid lines, while failure transitions are shown by dotted lines. The input character is consumed when the DFA
takes a goto transition edge. If no valid goto transition is found and the current node is not the root, the DFA follows the failure
pointer without consuming the character. If a node has no explicit failure pointer, it defaults to the root state. If
the current node is the root and there is no valid goto transition, the input character is discarded. Whenever the DFA comes
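The goto, failure, and output functions can be sketched in a few lines of Python for the signature set app, apple, aim, cap, and cat. This is a minimal software illustration of the textbook algorithm, not the project's hardware design; the names build_ac and search are hypothetical.

```python
from collections import deque

def build_ac(patterns):
    """Return the (goto, fail, output) tables of an Aho-Corasick automaton."""
    goto, output = [{}], [set()]        # state 0 is the root
    for p in patterns:                  # goto function: a trie of the patterns
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                output.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        output[s].add(p)
    fail = [0] * len(goto)              # failure pointers default to the root
    queue = deque(goto[0].values())
    while queue:                        # breadth-first, so shallower states come first
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            output[t] |= output[fail[t]]   # inherit matches ending at the suffix state
    return goto, fail, output

def search(text, automaton):
    """Return (start_index, pattern) for every occurrence in text."""
    goto, fail, output = automaton
    s, matches = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]                 # failure transition: character not consumed
        s = goto[s].get(ch, 0)          # goto transition, or discard at the root
        for p in output[s]:
            matches.append((i - len(p) + 1, p))
    return matches

automaton = build_ac(["app", "apple", "aim", "cap", "cat"])
found = sorted(search("an apple cap", automaton))
```

Each input character is examined in a single left-to-right pass, which is what makes the approach attractive for streaming packet payloads.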
A String-Matching Engine
At a high level, our algorithm works by breaking the set of strings down into groups and building a
small state machine for each group. Each state machine is in charge of recognising a subset of the strings from the rule set.
The first concern is that building a state machine from any general regular expression can, in the worst case, require an
exponential number of states. We get around this problem by exploiting the fact that we are not matching general regular
expressions but rather a proper and well-defined subset of them for which we can apply the Aho-Corasick algorithm [Aho
and Corasick 1975]. The other problem is that if we are not careful, we will need to support 256 possible out-edges (one for
each possible byte) on each and every node on the state machine. This results in a huge data structure that can neither be
stored nor traversed efficiently. We solve this problem by bit-splitting the state machines into many smaller state machines,
which each match only one bit (or a small number of bits) of the input at a time (in parallel). Our architecture is built
hierarchically around the way that the sets of strings are broken down. The full device is at the highest level. Each device
holds the entire set of strings that are to be searched, and each cycle the device reads in a character from an incoming
packet and computes the set of matches. Matches can be reported either after every byte, or can be accumulated and
reported on a per-packet basis. For the purposes of this paper, we will focus on a single device. Inside each device is a set
of rule modules. The left side of Figure 1 shows how the rule modules interact with one another. Each rule module acts as a
large state machine, which reads in bytes and outputs string match results. The rule modules are all structurally equivalent,
being configured only through the loading of their tables, and each module holds a subset of the rule database. As a packet
flows through the system, each byte of the packet is broadcast to all of the rule modules, and each module checks the
stream for an occurrence of a rule in its rule set. Because throughput, not latency, is the primary concern of our design, the
broadcast has limited overhead because it can be deeply pipelined, if necessary. The full set of rules is partitioned between
the rule modules. The way this partitioning is done has an impact on the total number of states required in the machine and
will, hence, have an impact on the total amount of space required for an efficient implementation. Finding an efficient
partition is discussed in Section 3. When a match is found in one or more of the rule modules, that match is reported to the
interface of the device so that the intrusion detection system can take the appropriate actions. It is what happens inside
each rule module that gives our approach both high efficiency and throughput. Each rule module is made up of a set of tiles.
The right hand side of Figure 1 shows the structure of each and every tile in our design. When working together, tiles are
responsible for the actual
Core Concept of Bit-Split:
An AC state machine has 256 possible outgoing edges from each node, which creates a large state machine that requires memory of
the order of kb. The data therefore needs to be compressed before it is stored. This is done using the bit-split technique. By
Figure 7: Complete bit split Memory based architecture
Memory Based AC
The memory-based implementation of AC uses the dedicated BRAMs of the FPGA to build the transition tables and
leaves the slices unused. The advantage of this technique is that whenever the signature/rule set needs to be updated, only
the contents of the memory have to be replaced. The disadvantage of the memory-based AC algorithm is the
The algorithm used for the memory-based AC is the bit-split algorithm. Its approach is to take the AC
state machine and divide it into multiple state machines, referred to as bit-state machines, which
work independently on the input. When a string matches against the signature set, the output logic is tied to
their respective states. The division of the AC state machine into different machines is done on the basis of the individual bits
of the input.
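The idea of splitting the AC state machine by input bit can be sketched in software as follows. This is an illustrative model only, assuming eight single-bit machines and Python sets in place of hardware partial match vectors; the function names are hypothetical, and the project's generator may group bits differently.

```python
from collections import deque

def build_ac_dfa(patterns):
    """Aho-Corasick as a full DFA: delta[state][byte] and out[state] (pattern ids)."""
    goto, out = [{}], [set()]
    for pid, p in enumerate(patterns):
        s = 0
        for b in p:
            if b not in goto[s]:
                goto.append({})
                out.append(set())
                goto[s][b] = len(goto) - 1
            s = goto[s][b]
        out[s].add(pid)
    fail = [0] * len(goto)
    delta = [[0] * 256 for _ in goto]
    for b, t in goto[0].items():
        delta[0][b] = t
    queue = deque(goto[0].values())
    while queue:                                   # breadth-first over the trie
        s = queue.popleft()
        for b in range(256):
            if b in goto[s]:
                t = goto[s][b]
                fail[t] = delta[fail[s]][b]
                out[t] |= out[fail[t]]
                delta[s][b] = t
                queue.append(t)
            else:
                delta[s][b] = delta[fail[s]][b]    # failure links folded into the DFA
    return delta, out

def bit_split(delta, out, bit):
    """Binary machine that sees only one bit of each input byte (subset construction)."""
    groups = [[c for c in range(256) if ((c >> bit) & 1) == v] for v in (0, 1)]
    start = frozenset([0])
    index, trans, pmv = {start: 0}, [[0, 0]], [set(out[0])]
    work = deque([start])
    while work:
        cur = work.popleft()
        for v in (0, 1):
            nxt = frozenset(delta[s][c] for s in cur for c in groups[v])
            if nxt not in index:
                index[nxt] = len(trans)
                trans.append([0, 0])
                pmv.append(set().union(*(out[s] for s in nxt)))  # partial match vector
                work.append(nxt)
            trans[index[cur]][v] = index[nxt]
    return trans, pmv

def match(data, machines):
    """Run all bit machines in lock step; a pattern matches only if every machine agrees."""
    states, hits = [0] * len(machines), []
    for i, byte in enumerate(data):
        agreed = None
        for k, (trans, pmv) in enumerate(machines):
            states[k] = trans[states[k]][(byte >> k) & 1]
            agreed = set(pmv[states[k]]) if agreed is None else agreed & pmv[states[k]]
        hits.extend((i, pid) for pid in sorted(agreed))
    return hits

patterns = [b"app", b"apple", b"aim", b"cap", b"cat"]
delta, out = build_ac_dfa(patterns)
machines = [bit_split(delta, out, bit) for bit in range(8)]
hits = match(b"an apple cap", machines)   # (end index, pattern id) pairs
```

Each binary machine has only two outgoing edges per state instead of 256, which is what makes its transition table compact enough to hold in block memory.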
A C++ application has been developed for the memory-based AC, which takes the text file of a rules/
• The main advantage of bit-split is that whenever the signature/rule set needs to be updated, only the
• The Auto Memory Files Generator is used to generate the memory files for any number of input rules and
update them in BRAMs.
Figure 2: Memory Based AC
Implementation on FPGA
Before implementing the complete pattern detection engine on the FPGA, the individual IPs were tested on the Xilinx Zynq-7000
ZC702 evaluation kit, which carries the XC7Z020-CLG484-1 device. This evaluation kit provides both PS (Processing System) and PL
Figure 3: Xilinx Zynq-7000 ZC702
Based on the Verilog-HDL produced by the auto-HDL-generator, a project was created in the Xilinx Vivado Design Suite.
Packet capturing, decoding and preprocessing of incoming Ethernet packets are carried out by the PS,
which finally passes the payload data to the PL by writing it to the BRAM of the FPGA. The BRAM controller module reads the
payload byte by byte from the BRAM and sends it to the pattern matching module, implemented within the PL section using the
auto-generated Verilog-HDL code. Alerts are reported back when the incoming payload contains any signature
considered in the design. After the IPs had been tested individually on the above-mentioned evaluation kit, the
whole pattern engine was implemented and tested on the Spartan 6 SP605 FPGA kit, because this device provides PCIe.
The PCIe IP controller core exchanges data with the user logic through a standard application FIFO, which is
supplied by the PCIe IP. At the other end, the PCIe IP talks to the host PC.
Key Features:
Figure 5: High End NetFPGA Sume Evaluation Board
5.1.2. Implementation, Analysis, theoretical and/or analytical models and results
We performed pattern matching on hardware, while the host PC handled software tasks such as packet capture, decoding,
and preprocessing. Here, we discuss the parts that we have since implemented off the host PC.
Capturing packets
Since the host PC previously captured the packets itself, our main task was to acquire data on the Spartan SP605 instead of
on the host PC, to increase the performance of the system. The Spartan SP605 has a data acquisition capacity of 1
GB. The capturing procedure on the FPGA includes modules for handling Ethernet frames as well as IP, UDP, and ARP, and the
components for constructing a complete UDP/IP stack. The packet capturing module on the FPGA has submodules to capture
Decoding of Packets
The next task was to decode the captured packets. The packet decoder module divides the data into source IP, destination IP,
length, checksum, and payload data. The packet decoding module has been implemented on the Spartan SP605
prototype.
Design Inclusion
All the modules have now been designed and implemented as discussed above. The next step is their
integration. This section explains how the modules are integrated and how they work together.
As the block design shows, there are three blocks: packet capturing, decoding, and the SME engine. Previously,
there were two blocks, one on the FPGA and one running on the host PC. Now the two blocks that ran on the host,
packet capturing and decoding, are integrated with the SME engine. The packet decoder outputs the source IP, destination IP,
length, checksum, and payload data. The controller reads data from the packet decoder and gives it to the FIFO module,
PCIe
In our project, we interfaced an FPGA board to a host PC through PCIe. PCIe is a high-throughput protocol available on
most modern motherboards as well as some embedded boards. PCI Express provides an end-to-end solution for data
transport between an FPGA and a host running Linux. The PCIe IP controller core exchanges data with the user logic
through a standard application FIFO, which is supplied by the PCIe IP. On the other side, the PCIe IP talks to the host PC.
The above system was tested on the SP605 development board and an HP Core i3 system. The below figure shows an
Furthermore, we interfaced a NetFPGA board to a host PC through PCIe in the same way: the PCIe IP controller core exchanges
data with the user logic through the application FIFO supplied by the PCIe IP, while on the other side the PCIe IP talks to the
host PC, which only shows the results. In future
work, the complete hardware implementation of the IDS prototype will be moved to the high-performance board. The below figure shows an
PCIe IP Synthesis
After successful simulation, the IP was synthesized. The table below shows the resource utilization of the IP. For
After successful simulation, the FIFO IP was synthesized and implemented on the FPGA. The table below shows the synthesized
Number of BRAM/FIFO 1 1470 0.03%
To test and validate the design, we generated a pcap file with known content in the payload. The figure below shows the
pcap file.
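A test capture of this kind can be produced with the Python standard library alone. The sketch below is one possible way to do it, not the procedure the project used: the MAC and IP addresses, ports, and function names are hypothetical, and only the payload string GUID=2E is taken from the test described here.

```python
import socket
import struct

def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of the 16-bit words of an IPv4 header."""
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def udp_frame(payload, src="192.168.1.10", dst="192.168.1.20", sport=5000, dport=6000):
    """Ethernet + IPv4 + UDP frame around the given payload (addresses are made up)."""
    eth = bytes.fromhex("ffffffffffff") + bytes.fromhex("020000000001") + struct.pack("!H", 0x0800)
    udp = struct.pack("!HHHH", sport, dport, 8 + len(payload), 0)  # UDP checksum 0: not computed
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(udp) + len(payload), 0, 0, 64, 17, 0,
                     socket.inet_aton(src), socket.inet_aton(dst))
    ip = ip[:10] + struct.pack("!H", ipv4_checksum(ip)) + ip[12:]  # fill in header checksum
    return eth + ip + udp + payload

def pcap_bytes(frames):
    """Classic pcap file: 24-byte global header, then a 16-byte record header per frame."""
    blob = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)  # linktype 1 = Ethernet
    for n, frame in enumerate(frames):
        blob += struct.pack("<IIII", n, 0, len(frame), len(frame)) + frame
    return blob

frame = udp_frame(b"GUID=2E")
capture = pcap_bytes([frame])
# To inspect the result in Wireshark, write it to disk, for example:
# with open("known_payload.pcap", "wb") as fh:
#     fh.write(capture)
```

Because the payload is known in advance, any alert raised by the detection engine can be checked against the expected signature.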
Figure 6: Pcap File View in Wireshark
As shown in fig. 10 above, the payload contains GUID=2E and has ID: 06. Our proposed system
detected that ID and data; the results, captured using ChipScope debugging in the ISE
Design Suite, are shown in fig. 11 below.
Figure 7: ID Detection and Data Detection of content of Signature Set
The time taken by the software-based detection engine is shown in fig. 12. The figure shows that the software took 1 μs to 2 μs.
This is the time from the arrival of the packet in the detection engine until its ID is detected. On hardware side, detection engine
The hardware-based detection engine performs better than the software-based detection engine, as can be seen in fig.
13, owing to the parallelism of the FPGA. The system runs at a frequency of 125 MHz, which gives a clock cycle time
of 8 ns. The SP605 FPGA took about 27 clock cycles to detect the packet ID from the time of its arrival:
27 × 8 ns = 216 ns, or 0.216 μs. The hardware-based detection engine is thus up to almost 10 times faster than the
software-based detection engine.
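The latency arithmetic can be checked with a short calculation. It uses the 27 cycles and 125 MHz stated above and the 1 μs to 2 μs software figure reported earlier; the function name is hypothetical.

```python
def detection_latency_ns(cycles: int, clock_hz: float) -> float:
    """Latency of a fixed number of clock cycles, in nanoseconds."""
    return cycles * 1e9 / clock_hz

hw_ns = detection_latency_ns(27, 125e6)                      # 27 cycles at 125 MHz
speedups = [sw_us * 1000 / hw_ns for sw_us in (1.0, 2.0)]    # software at 1 us and 2 us
print(f"hardware latency: {hw_ns:.0f} ns")
print("speed-up over software: " + ", ".join(f"{s:.1f}x" for s in speedups))
```

With the reported software range, the ratio lies between roughly 4.6 and 9.3, so the upper end of the range corresponds to the factor of almost 10.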
Our next task was to implement the Netspection IDS on the NETFPGA SUME, which is a platform for high-performance and
high-density networking design. The NetFPGA-SUME is an advanced board that features one of the largest and
most complex FPGAs ever produced, a Xilinx Virtex-7 690T supporting thirty 13.1 GHz GTH transceivers. Four SFP+
10Gb/s ports, five independent high-speed memory banks built from both 500 MHz QDRII+ and 1866 MT/s DDR3 SoDIMM
devices, and an eight-lane third generation PCIe offer incredible throughput and can sustain a large number of high-speed
data streams to the FPGA fabric and memory devices. Other features include the presentation of twenty transceivers in total
on FMC and QTH expansion connectors, and SATA ports. The NetFPGA-SUME's main mission is to give students,
researchers and developers a state-of-the-art platform for networking, whether it’s learning the fundamentals or creating
new hardware and software applications. This board easily supports simultaneous wire-speed processing on the four
10Gb/s Ethernet ports, and it can manipulate and process data on-board, or stream it over the 8x Gen.3 PCIe interface and
Figure 9: NET FPGA- SUME Board
Features
10Gbps
Two SATA-III ports
MicroUSB Connector for JTAG programming and debugging (shared with UART interface)
Figure 10: NET FPGA-SUME block Diagram
Packet Capturing and decoding on NETFPGA SUME
Packet capturing and decoding are implemented on the NETFPGA SUME rather than on the host PC. Integration of all three
Since the Spartan SP605 had few resources for the pattern matching algorithm, it was designed for a small
rule set of up to 512 rules. On the NETFPGA SUME more resources are available, so the hardware implementation can
match strings for up to seven thousand rules. The SME engine in this case is designed for more
rules and uses at most 90% of the resources. Implementation and testing of pattern matching on hardware for NETFPGA
This project proposed Netspection, a signature-based hybrid network intrusion detection system intended to provide a
robust system with high throughput. The bit-split algorithm, in both memory-based and hardwired variants, has been
implemented to make maximum use of the FPGA resources and to enhance the performance of the system. Packet
capturing, decoding and pattern matching are implemented on the Spartan SP605 FPGA to make the Netspection IDS more
efficient and robust. All three modules have been interfaced to PCIe; through the PCIe interface, output alerts are displayed
on the host PC and stored in a database for further use. We concluded that integrated design and all IPs are working
Future work includes the complete end-to-end implementation of the Netspection IDS, covering packet capturing, packet
decoding, preprocessing, and pattern matching, on the high-performance NETFPGA SUME hardware development
board. At this stage, the packet capturing, decoding, and pattern detection engines are implemented on the Spartan SP605.
Next, all three modules will be integrated and tested as a prototype, and the prototype will then be ported to
the NETFPGA SUME evaluation board. Finally, through the PCIe interface, the result will be displayed on the host PC
1.4. Milestone/Deliverable 2:
1.5. Milestone/Deliverable 3:
6. OUTCOMES
1. At least 1 SCI indexed journal papers OR 1 CORE ranked A/B conference paper
2. At least 1 formal international collaboration with meaningful interactions and targets beneficial to both
NCCS and the collaborator
3. At least 2 Research Proposals on cyber security related projects to be submitted to ICT R&D Fund, HEC,
PSF or other national/international funding agencies
4. Dissemination activities carried out (quarterly events, open houses, industry interactions and linkage,etc.)
5. 1 Training program (5-days) carried out in Lab’s area of focus
6. Agreements with Industry (if any)
7. MoUs signed (if any)
8. Commercialization efforts/Start-ups (if any)
5.1.2. Research Proposals (at least 2 per year)
Ser. | Research Proposal Title | Date of Submission | Present status | Funding Detail (Funding Agency, Funding Amount) | Deliverable Link with KPI
5.1.4.1. Detail of Partner/ Organization
Ser. | Subject/Topic & Date | Dissemination Activity Detail (Seminar, Workshop, Conference, Open House) | Duration | Total Attendees and Target Audience (Gen Pub, Industry, Investor, Policy Maker, Customer, Others) | Link with KPI
5.1.5.2. Details of Training Programs
Ser. | Subject/Topic & Date | Dur. | Target Audience | Training Detail (Seminar, Workshop, Courses, Certifications) | Attendees Details (Name, Contact, Employment; Paid (at least 5), Non Paid) | Revenue Generated | Link with KPI
5.1.5.3. Visits
Ser. | Visit Details (Local, Foreign) | Objective | Members | Expenditures | Detail Of Schedule (Location, Duration, Timings) | Deliverable link with KPI
5.1.6.1. Start-ups
Ser. | Local Problems Identified | Commercial Viability | Partner Name (If any) | Fund Required | Fund Released | Employ Details (5 Paid Employee): Name & Contact Detail, Status, Remuneration | Solution Provision along with Status/Detail (Submitted Proposals/Software Tool/Prototype) | Survival Rate (at least 6 month) | Revenue Generated | Deliverable link with KPI
6. CONCLUSION & LESSONS LEARNT
Free textbox
7. FUTURE WORK AND PLAN
Explain the work plan for the future/ remaining duration of the project as per PC-1
Free textbox
8. RISK, ISSUES AND CHALLENGES
Explain the issues and challenges faced and how they are going to impact the outcome and timelines of the project. Provide
the mitigation plan
Free textbox
HISTORY OF CHANGES
VERSION DISSEMINATION DATE CHANGE
1.0 version