Technical Report (30-36)

CYBER RECONNAISSANCE AND COMBAT LAB
BAHRIA UNIVERSITY
TECHNICAL REPORT – 4.5

CRC NIDS IP integration on FPGA and design validation
Module IV: Hardware Prototyping of Intrusion Detection System
1. PROJECT BRIEF
Project Title: Development of Network Intrusion Detection System and Hardware Prototyping
Principal Investigator: Prof. Dr. Muhammad Najam ul Islam
Contact Information – PI: najam@bahria.edu.pk
2. TECHNICAL REPORT BRIEF
Report Title: CRC NIDS – IP integration on FPGA and design validation
Date Submitted:
Relevant Project Module(s): Module IV: Hardware Prototyping of Intrusion Detection System
Co-PI(s): Dr. Atif Raza Jafri
Technical Team: Roman Shah, Mashood ul Hassan
3. Module Milestone (please select reporting period milestones)
Duration (months) Milestone

0–6
6 – 12
12 – 18
18 – 24
24 – 30
30 – 36 System Integration and final system level validation
Abstract
This report presents the hardware module of the Netspection IDS. The system monitors and analyzes all
incoming packets on a given network to detect intrusions. Netspection IDS has four basic parts: packet
capturing, packet decoding, preprocessing and pattern matching. Among these, pattern matching is the
most complex and time-consuming process. To improve performance, a hybrid Netspection IDS has been
proposed in which the first three processes run on a Linux-based PC, while an FPGA-based hardware
solution has been developed for pattern matching; the parallelism and reconfigurability of the FPGA
make a high-throughput IDS possible. A PCIe-based serial connection is developed for communication
between software and hardware. The algorithms used for the detection engine are Aho-Corasick and
Bit-Split. The Aho-Corasick implementation combines a memory-based approach, which utilizes the
BRAMs, with a hardwired approach, which utilizes the slices, in order to make maximum use of the FPGA
resources. Because writing HDL code for a hardware solution that caters to hundreds of rules takes a
lot of time and effort, we developed a tool in this project that automatically generates Verilog-HDL
code from a rule set. Initially, the Xilinx ZC702 evaluation board is used for testing the individual
IPs of the developed system, while the whole detection engine is implemented and tested on the
Spartan 6 (SP605) evaluation kit, which runs at an operational frequency of 125 MHz.
Table of Contents
Abstract
List of Figures
List of Tables
Chapter 1
1.1 Introduction
1.2 Report Organization
Chapter 2
Background and Related Work
2.1 Aho-Corasick (AC) Algorithm
2.1.1 Memory Based AC
2.1.2 Hardwired Based AC
2.1.3 Auto HDL Generator
2.1.4 Implementation on FPGA
Chapter 3
Integration of FPGA with Host PC
3.1 PCIe IP Synthesis
3.2 Application FIFO Synthesis
3.3 Design Integration
3.4 Time Comparison of Pattern Matching of Software and Hardware
Conclusion
Future Work
References
List of Figures
Figure 1: Network Based Intrusion Detection (NIDS)
Figure 2: Hybrid Netspection IDS
Figure 3: FSM of AC algorithm for the signature set {app, apple, aim, cap, cat}
Figure 4: Memory Based AC
Figure 5: Conversion of IDS Rules into HDL
Figure 6: Xilinx Zynq-7000 ZC702
Figure 7: Spartan 6 - SP605 Evaluation kit
Figure 8: AC Algorithm Implementation Results
Figure 9: SME Match Engine
Figure 10: IP Re-customization
Figure 11: Event Capture
Figure 12: Integration of FPGA with Host PC through PCIe
Figure 13: Top System Design
Figure 14: Design Testing System
Figure 15: Pcap File View in Wireshark
Figure 16: ID Detection and Data Detection of content of Signature Set
Figure 17: Time utilized by Software based detection engine
Figure 18: Time utilized by FPGA (SP605) based detection engine
List of Tables
Table 1: Progress on Milestone and Deliverables
Table 2: Device Utilization Summary
Table 3: Device Utilization Summary
Chapter 1
1.1 Introduction
Security has become one of the most serious concerns because of the rapid increase in
malicious activity over the internet as information and communication technology advances
and the number of internet users grows exponentially [1]. One of the most important aspects
of system and network management is keeping the network safe from unauthorized access. If a
hostile attacker gains access to a network, it may result in significant costs for the
organization; for this reason, an Intrusion Detection System (IDS) is deployed to ensure the
network's security [2]. An IDS is a tool that works with the network to keep it safe and
raises an alert when someone tries to break into it. IDSs are of two types: Host-based IDS
(HIDS) and Network-based IDS (NIDS). Both are computer network security systems used to
protect against viruses, malware, and other malicious activities. The difference is that a
NIDS is installed only at certain intersection points, such as servers and routers, while a
HIDS is installed on every host machine [3]. Internal changes (e.g., a virus accidentally
downloaded by a user and spreading inside the system) can be detected by a host-based IDS,
whereas a network-based IDS can detect malicious packets as they enter the network.
This report discusses NIDS only, because of its capability to monitor Ethernet traffic. A
Network Interface Card (NIC) on the NIDS system passes all network traffic to the NIDS. The
incoming network traffic is then analyzed against the set of rules and attack signatures to
determine whether it is traffic of interest. If it is, an alert is generated. The most
commonly used NIDS methodologies are signature-based and anomaly-based [4]. In a
signature-based technique, the incoming traffic is monitored and compared with predefined
patterns, usually called signatures; it therefore covers only known attacks [5]. An
anomaly-based scheme, in contrast, can also handle unknown attacks.
Figure 1: Network Based Intrusion Detection (NIDS)

The NIDS methodology adopted and implemented here is signature-based. A network-based IDS
has four building blocks. The packet capturing block, which contains the NIC (network
interface card), captures the incoming Ethernet packets [6]. The packet decoding module then
divides the received packets into packet headers and payloads. Next, the packet
pre-processing module prepares and reorganizes the Ethernet packets for the pattern matching
module [7]. The main task of comparing incoming network packets with patterns/signatures is
done in the pattern detection engine. Pattern matching is of two types: (i) string matching
and (ii) regular expression matching [8]. In string matching, the incoming strings are
matched against a set of strings or patterns already stored in the rule sets [9]. A regular
expression matching technique, on the other hand, also includes some special characters.
From the implementation point of view, previous analysis has shown that string matching
algorithms and techniques are employed more frequently. Although software-based
implementations, which generally run on general-purpose processors, provide a higher degree
of flexibility, they are not suitable for time-critical requests [10]. Hardware-based
implementations, on the other hand, provide better timing results and are more secure.
In this report, a hybrid Netspection IDS solution has been developed in order to achieve
maximum throughput and to guarantee robust packet detection and alert generation. As
discussed earlier, a NIDS has four basic building blocks (i.e. packet capturing, packet
decoding, packet preprocessing and the pattern matching engine). The hybrid Netspection IDS
consists of software, which is responsible for the first three blocks, and a hardware-based
pattern matching engine.
Figure 2: Hybrid Netspection IDS

The milestones and deliverables for months 0–30 of the project are listed in Table 1.
Table 1: Progress on Milestone and Deliverables
Duration | Milestones | Deliverables | Progress on Deliverables
0-6m | Trade study among different algorithms related to each functional block and finalization of algorithms to be utilized. | 1 x Technical Report | Submitted
6-12m | Finalization of architectural choices for each IP. | Technical Hardware Architecture Report | Submitted
6-12m | | 1 x Conference Paper submission | Paper Title: “Hardware Architectures for String Matching Algorithms in Network Intrusion and Detection Systems”. Accepted in IAIT 2020.
6-12m | | Seminar | Please check
12-18m | HDL/HLS modelling of individual IPs and design verification under simulation | Verilog/VHDL/C Model of IPs | Done, please find details in the technical report attached
12-18m | | 1 x Seminar | Please check
18-24m | Individual IP synthesis and their validation on FPGA | FPGA prototype of individual IPs | Done, please find details in the technical report attached
18-24m | | 1 x Journal Paper submission | Under Review
18-24m | | 1 x National Workshop | Please check
24-30m | IP integration on FPGA prototype and their design validation | FPGA prototype of integrated IPs | Done, please find details in the technical report attached
30-36m | | 1 x Technical Report | Done
1.2 Report Organization
Following is the organization of chapters in this project report:

Chapter 1: Introduction and Organization of the report.
Chapter 2: Background and Related work.
Chapter 3: Integration of FPGA with Host PC.
Conclusion and Future Work.
Chapter 2
Background and Related Work
Netspection IDS comprises four modules, namely packet capturing, packet decoding, packet
preprocessing and the pattern matching engine. The pattern matching module performs the most
computationally intensive part, i.e. pattern matching, which is essential for the development of a
NIDS. As discussed above, a hybrid technique has been developed for Netspection IDS in which packet
capturing, decoding and preprocessing are done in software, while hardware is responsible for the
pattern matching engine.
Owing to the growing demand for scalable hardware architectures for network security applications, a
large body of research is available on pattern matching algorithms and techniques [11]. The
Aho-Corasick (AC) algorithm is the most frequently targeted algorithm for hardware-based NIDS [12].
The efficiency of the AC algorithm is reported to increase with multi-character searching, pipelining
and parallelism of memory allocations. In many works, the AC algorithm is implemented using a
memory-based approach [13]. The designed system is then parallelized to achieve high throughput [14].
The other possible implementation of the AC algorithm is the hardwired approach [15]; however, it
becomes impractical when the signature set is vast. Hardwired AC implementations are reported only
for very small signature sets. For a large signature set, implementing hardwired AC by hand is
tedious and time consuming, as it requires designing a very large state machine and writing lengthy
HDL code that is prone to human error [16].
Proposing a hardwired solution for a large rule set in parallel with a memory-based solution becomes
significant once an FPGA, the most suitable target platform for an IDS, is used. Memory-based
solutions use BRAMs in large quantities, whereas logic resources such as LUTs and flip-flops are
hardly used. A hardwired solution, on the other hand, primarily uses logic resources. Providing a
hardwired solution in parallel with a memory-based solution therefore promises the best overall
utilization of the major resources of a single FPGA. Considering the research gap in implementing the
hardwired AC algorithm for a large rule set, an auto-HDL-generator tool for the hardwired AC
algorithm is presented in this report. In addition, a comparative analysis is presented to assess
the trade-off among resource utilization, operational frequency and resulting throughput of a single
hardware block implementing a large rule set versus multiple parallel hardware blocks covering small
subsets of rules.
2.1 Aho-Corasick (AC) Algorithm
The AC algorithm is a string searching algorithm developed by Alfred V. Aho and Margaret J.
Corasick. It is a dictionary matching algorithm that detects elements of a finite set of
strings within the input text. The algorithm's time complexity is O(n + m + z), where n is
the total length of the signature strings, m is the length of the input text and z is the
total number of matches reported [12].
The AC algorithm builds a finite-state machine that is similar to a trie with additional
links between the internal nodes. These additional links permit fast transitions to other
branches of the trie that share the longest common prefix when a match fails. In this way
the automaton can move between nodes without backtracking. When a signature set is known in
advance, the automaton can be constructed once off-line and then reused. In that scenario,
the run time is proportional to the length of the input text and the number of matched
outputs. All the signatures are combined into a single deterministic finite automaton (DFA)
in such a way that the size of the signature set and the processing time are independent of
each other. The AC algorithm comprises three functions: the goto function, the failure
function, and the output function. Figure 3 shows the finite state machine for the signature
set {app, apple, aim, cap, cat}. Goto transitions are shown by solid lines, while failure
transitions are shown by dotted lines. The input character is consumed when the DFA follows
a goto transition edge. If no valid goto transition is found and the current node is not the
root, the DFA follows the failure pointer without consuming the character. If no failure
pointer is present, it refers to the root state by default. If the current node is the root
and the goto function is invalid, the input character is discarded. Whenever the DFA reaches
an output node, it generates an output signal.
Figure 3: FSM of AC algorithm for the signature set {app, apple, aim, cap, cat}
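The goto, failure and output functions described above can be illustrated with a minimal software sketch. It is written in Python purely for readability and is not the hardware implementation used in this project; the function names `build_automaton` and `search` are illustrative assumptions.

```python
from collections import deque

def build_automaton(signatures):
    """Build the goto, failure and output tables for a signature set."""
    goto = [{}]        # goto[state][char] -> next state (trie edges)
    output = [set()]   # output[state] -> signatures recognised in this state
    for sig in signatures:
        state = 0
        for ch in sig:
            if ch not in goto[state]:
                goto.append({})
                output.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].add(sig)

    # Breadth-first pass: the failure link of a node points to the longest
    # proper suffix of its path that is also a prefix of some signature.
    fail = [0] * len(goto)
    queue = deque(goto[0].values())   # depth-1 nodes fail to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            output[nxt] |= output[fail[nxt]]
    return goto, fail, output

def search(text, automaton):
    """Return (start_index, signature) for every match in text."""
    goto, fail, output = automaton
    state, matches = 0, []
    for pos, ch in enumerate(text):
        # Follow failure links without consuming the character.
        while state and ch not in goto[state]:
            state = fail[state]
        # Take the goto edge, or stay at the root and drop the character.
        state = goto[state].get(ch, 0)
        for sig in output[state]:
            matches.append((pos - len(sig) + 1, sig))
    return matches

automaton = build_automaton(["app", "apple", "aim", "cap", "cat"])
print(sorted(search("an apple in a cap", automaton)))
```

The automaton is built once and every input character is then processed in a single left-to-right pass, which is the property that makes the algorithm attractive for a streaming hardware engine.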
Hardware prototyping of search engines based on the AC algorithm has attracted considerable attention
recently because of its high efficiency and comparatively low time complexity.
The AC algorithm can be implemented in two ways:
i. Memory-based implementation of AC
ii. Hardwired implementation of AC
2.1.1 Memory Based AC
The memory-based implementation of AC uses the BRAMs of the target FPGA to hold the
transition tables and leaves the slices unused. The advantage of this technique is that
whenever the signature/rule set needs to be updated, only the contents of the memory have
to be replaced. The disadvantage of the memory-based AC algorithm is the high memory
demand of storing the DFA's transition table.
The algorithm used for memory-based AC is the bit-split algorithm. The idea behind the
bit-split algorithm is to divide the AC state machine into multiple state machines,
referred to as bit-state machines, each of which works independently on its share of the
input. The division of the AC state machine into different machines is done on the basis
of individual bits of the input. The output logic is tied to the respective states of the
bit-state machines, so that a match of the input string against the signature set can be
reported.
A C++ application has been developed for the memory-based AC, which takes a text file
containing the rule/signature set as input and creates the memory initialization files.
Figure 4: Memory Based AC
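The bit-split idea described in this subsection can be sketched in software as follows. This is a minimal Python illustration, not the C++ memory content generator or the hardware IP of this project. It assumes eight bit-state machines, one per bit of the input byte, and a per-state partial match vector whose bitwise AND across all machines yields the final match; the function names are hypothetical.

```python
from collections import deque

def build_dfa(signatures):
    """Aho-Corasick DFA over bytes: delta[state][byte] and a match bitmask per state."""
    delta, mask = [[0] * 256], [0]
    for idx, sig in enumerate(signatures):
        state = 0
        for byte in sig:
            if delta[state][byte] == 0:
                delta.append([0] * 256)
                mask.append(0)
                delta[state][byte] = len(delta) - 1
            state = delta[state][byte]
        mask[state] |= 1 << idx
    fail = [0] * len(delta)
    queue = deque(s for s in delta[0] if s)
    while queue:
        state = queue.popleft()
        for byte in range(256):
            nxt = delta[state][byte]
            if nxt:                      # trie edge: compute its failure link
                fail[nxt] = delta[fail[state]][byte]
                mask[nxt] |= mask[fail[nxt]]
                queue.append(nxt)
            else:                        # missing edge: resolve through the failure state
                delta[state][byte] = delta[fail[state]][byte]
    return delta, mask

def split_on_bit(delta, mask, bit):
    """Subset construction that looks at a single bit of every input byte."""
    start = frozenset([0])
    index, subsets = {start: 0}, [start]
    table, pmv = [], []   # table[s] = [next on bit 0, next on bit 1]; pmv = partial match vector
    i = 0
    while i < len(subsets):
        subset = subsets[i]
        vec = 0
        for s in subset:
            vec |= mask[s]
        pmv.append(vec)
        row = []
        for value in (0, 1):
            target = frozenset(delta[s][b] for s in subset
                               for b in range(256) if (b >> bit) & 1 == value)
            if target not in index:
                index[target] = len(subsets)
                subsets.append(target)
            row.append(index[target])
        table.append(row)
        i += 1
    return table, pmv

def bit_split_search(data, signatures):
    """Run the eight bit-state machines in lock step and AND their match vectors."""
    delta, mask = build_dfa(signatures)
    machines = [split_on_bit(delta, mask, bit) for bit in range(8)]
    states, hits = [0] * 8, []
    for pos, byte in enumerate(data):
        full = (1 << len(signatures)) - 1
        for bit, (table, pmv) in enumerate(machines):
            states[bit] = table[states[bit]][(byte >> bit) & 1]
            full &= pmv[states[bit]]
        for idx, sig in enumerate(signatures):
            if (full >> idx) & 1:
                hits.append((pos - len(sig) + 1, sig))
    return hits

rules = [b"app", b"apple", b"aim", b"cap", b"cat"]
print(bit_split_search(b"an apple in a cap", rules))
```

Each bit-state machine has only two outgoing transitions per state, so its transition table is far narrower than the 256-entry rows of the original DFA, which is what makes the tables a good fit for block memories.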
2.1.2 Hardwired Based AC
The hardwired implementation of AC uses the slices or logic cells of the target FPGA. To
overcome the poor access times of off-chip memory modules, many researchers have built
the FSM as hardwired logic circuits [17]. However, this type of implementation is not
flexible. Whenever the signature set is changed or updated, the code has to be written
again and the circuits have to be recompiled.
Different metrics need to be assessed when implementing a single hardwired solution for
a big rule set versus multiple hardwired solutions that target subsets of rules and work
in parallel. This task is achieved in two steps:
i. An auto-HDL-generator tool is developed to minimize the time needed to implement the IDS.
ii. Single and parallel hardwired IDSs are implemented on the FPGA.
Figure 5: Conversion of IDS Rules into HDL
2.1.3 Auto HDL Generator
Considering the complexity of the state machine and the related HDL coding for a large
rule set, the idea of developing an auto-HDL-generator was conceived. It is a utility
written in C that takes the rule set as input in the form of a text file.
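To illustrate the principle of generating HDL from a rule set, the following Python sketch builds the AC state machine for a list of rules and prints it as a Verilog case statement. It is not the C utility developed in this project: the module name `ac_hardwired`, the port names and the single `match` output are assumptions made for the example, and the emitted Verilog has not been synthesized.

```python
from collections import deque

def build_dfa(rules):
    """Return (delta, accepting); delta[state] maps byte -> next state, missing bytes go to the root."""
    trie, accepting = [{}], [False]
    for rule in rules:
        state = 0
        for byte in rule.encode("ascii"):
            if byte not in trie[state]:
                trie.append({})
                accepting.append(False)
                trie[state][byte] = len(trie) - 1
            state = trie[state][byte]
        accepting[state] = True
    alphabet = sorted({b for node in trie for b in node})
    fail = [0] * len(trie)
    delta = [dict(node) for node in trie]
    queue = deque(trie[0].values())
    while queue:
        state = queue.popleft()
        for byte in alphabet:
            via_fail = delta[fail[state]].get(byte, 0)
            if byte in trie[state]:
                nxt = trie[state][byte]
                fail[nxt] = via_fail
                accepting[nxt] = accepting[nxt] or accepting[via_fail]
                queue.append(nxt)
            elif via_fail:
                delta[state][byte] = via_fail   # failure transition folded into the DFA
    return delta, accepting

def generate_verilog(rules, module_name="ac_hardwired"):
    """Emit a one-byte-per-clock FSM as Verilog text (illustrative only)."""
    delta, accepting = build_dfa(rules)
    width = max(1, (len(delta) - 1).bit_length())
    lines = [
        f"module {module_name} (",
        "    input  wire       clk,",
        "    input  wire       rst,",
        "    input  wire [7:0] char_in,",
        "    output reg        match",
        ");",
        f"    reg [{width - 1}:0] state;",
        "    always @(posedge clk) begin",
        "        match <= 1'b0;",
        "        if (rst) state <= 0;",
        "        else case (state)",
    ]
    for state, edges in enumerate(delta):
        lines.append(f"            {state}: case (char_in)")
        for byte, nxt in sorted(edges.items()):
            hit = " match <= 1'b1;" if accepting[nxt] else ""
            lines.append(f"                8'h{byte:02x}: begin state <= {nxt};{hit} end")
        lines.append("                default: state <= 0;")
        lines.append("            endcase")
    lines += ["            default: state <= 0;", "        endcase", "    end", "endmodule"]
    return "\n".join(lines)

print(generate_verilog(["app", "apple", "aim", "cap", "cat"]))
```

Because the failure transitions are folded into the state table before the code is emitted, the generated FSM consumes exactly one input byte per clock cycle, and regenerating the file is the only step needed when the rule set changes.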
2.1.4 Implementation on FPGA
Before the complete pattern detection engine was implemented on the FPGA, the individual
IPs were tested on the Xilinx Zynq-7000 ZC702 evaluation kit, which carries the
XC7Z020-CLG484-1 device. This evaluation kit provides both a PS (Processing System) and
PL (Programmable Logic) in a single chip.
Figure 6: Xilinx Zynq-7000 ZC702
Based on the Verilog-HDL generated by the auto-HDL-generator, a project was created in
the Xilinx Vivado Design Suite. The tasks related to packet capturing, decoding and
preprocessing of incoming Ethernet packets are carried out on the PS, which finally passes
the payload data to the PL by writing it to a BRAM of the FPGA. The BRAM controller module
reads the payload byte by byte from the BRAM and sends it to the pattern matching module,
which is implemented within the PL using the auto-generated Verilog-HDL code. An alert is
reported back if the incoming payload data contains any signature considered in the
design.
After the individual IPs were tested successfully on the above-mentioned evaluation kit,
the whole pattern engine was implemented and tested on the Spartan 6 – SP605 FPGA kit,
because this device provides PCIe.
Figure 7: Spartan 6 - SP605 Evaluation kit
The PCIe IP Controller core exchanges data with the user logic through a standard
Application FIFO, which is supplied by the PCIe IP. On the other end, the PCIe IP
communicates with the host PC.
Implementation and Performance Results of Hardwired AC design:
Single Architecture:
The auto-HDL-generator produces hardware code from a rule set file containing thousands
of IDS rules. Different configurations have been used based on the number of rules. For
each configuration, the implementation information, i.e. the FPGA resource utilization
and the operating frequency f, is acquired through the Xilinx Vivado tool suite. The
throughput ($T_p$) is then calculated assuming a 1500-byte payload in a packet and the
worst-case scenario of not detecting any signature in the packet, i.e. the proposed AC
hardware consumes 1500 clock cycles (CC) for one received packet. This corresponds to
8 bits processed per clock cycle. The worst-case throughput can be computed using the
following expression:
Throughput = $T_p = 8 \times f$ (bits/sec)
The ratio of throughput to FPGA slices consumed is computed to assess the
throughput/area metric. The results are tabulated in Table 1 and shown graphically in
Figure 8. It is observed that as the number of rules increases, the number of used slices
increases, the operational frequency decreases and consequently the throughput decreases.
Because the throughput decreases while the slice count increases, the key metric of
throughput/area decreases drastically. In our case, as the number of rules goes from 500
to 10,000, i.e. a 20-fold increase, the throughput/area metric drops from 3.26
Mbits/sec/slice to 0.04 Mbits/sec/slice, i.e. a reduction of more than 80 times. In order
to achieve high throughput, particularly in the presence of a large number of rules, a
parallel architecture is required.
Figure 8: AC Algorithm Implementation Results
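The throughput expression above can be worked through with a short calculation. The Python sketch below assumes the 125 MHz operating frequency quoted in the abstract for the SP605 detection engine; the slice count used for the throughput/area metric is a hypothetical value chosen only to show the arithmetic and is not a measured result.

```python
def worst_case_throughput_bps(freq_hz, bits_per_cycle=8):
    """T_p = 8 x f: one payload byte (8 bits) is consumed per clock cycle."""
    return bits_per_cycle * freq_hz

def packet_time_s(freq_hz, payload_bytes=1500):
    """Time to scan one packet when no signature is found (one byte per cycle)."""
    return payload_bytes / freq_hz

def throughput_per_slice(freq_hz, slices):
    """Throughput/area metric in bits/sec/slice."""
    return worst_case_throughput_bps(freq_hz) / slices

freq = 125_000_000          # 125 MHz, as stated for the SP605 implementation
example_slices = 2_000      # hypothetical slice count, for illustration only

print(worst_case_throughput_bps(freq))             # bits per second
print(packet_time_s(freq))                         # seconds per 1500-byte packet
print(throughput_per_slice(freq, example_slices))  # bits/sec/slice
```

At 125 MHz the expression gives 1 Gbit/s, and a 1500-byte payload is scanned in 12 microseconds.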
Parallel Architecture:
A parallelism study based on the number of rules was conducted, and the results are
presented here. In this scenario, multiple hardware blocks that implement small rule sets
and work in parallel are considered. The rule sets are taken randomly for the parallelism
study, and every rule is distinct and independent of the others. For this purpose, a rule
set of 5000 rules was taken. This set was divided into five independent subsets of 1000
rules each. A second class of subsets was created, consisting of 10 independent subsets
of 500 rules each. These subsets were individually synthesized, placed and routed on the
FPGA. The implementation and performance results for these two types of subsets are
tabulated in Tables 2 and 3 respectively. Tables 2 and 3 provide the important metrics
associated with the different chunks of rule sets. For example, Table 2 shows that the
hardware for the first 1000 rules consumes the highest number of slices while operating
at the lowest frequency, and hence provides the worst performance. Table 3 shows that,
among the first 1000 rules, the second chunk of 500 rules has the worst performance.
Implementation and Performance Results of Memory Based AC design:
The theoretical background of the memory-based architecture has been discussed in the
previous reports, and the preliminary design (version 1) of the hardware IP for string
matching (SME_IP_v01) has been simulated and verified for correctness as well. The
following sections present the hardware implementation and analysis of the design, along
with some improvements to the original design.
Design of Hardware IP Version 2 (v2.1)
The first version of the IP had a lot of debug logic included inside the package,
including an instance of the ILA IP from Xilinx. It was designed to serve the purpose of
testing and verification with a small number of signatures.
The design and structure of the source code were changed drastically to accommodate a
number of features, discussed later, that help in developing a functioning system design.
Some of the key aspects of the current tested version of the build (SME_IP_v2.1) are
presented below:
1. Architectural Changes
• Scalability
In the second version of the SME IP, the source code was restructured to support
scalability, since the number of signatures depends on the requirements of the
system or application. The end user can scale the resources of the IP to
accommodate a given number of signatures by adjusting parameters.
• Output Control
Two options are provided within the IP to control the output of the IP instance.
The user can either use the included event capture block to capture the trigger
events when signature matches happen, or introduce custom logic to suit the
application. This is done through conditional synthesis of the event capture
block, so resources are used for this block only if the user opts to include it in
the IP synthesis. This helps keep the design modular to some extent, so that the
developer can include more output control blocks serving different applications
in the future.
2. Port Description
An instance of String Match Engine IP is shown below followed by the port
description of its signals.
Figure 9: SME Match Engine
3. Design Parameters
As discussed in previous sections, the IP version used for testing has gone through
some significant improvements, and the design can be configured to suit the
requirements of the application. The configuration options are described in this
sub-section:
• Module Count
This option can be found in the resource allocation tab of the IP re-configuration
window. Each module can hold up to 256 states created by the Aho-Corasick
algorithm's next move function. The total states of the Aho-Corasick trie are
therefore divided among modules. The number of modules needed to accommodate the
transition tables is exactly half the number of memory content files generated by
the Memory Content Generator (MCG) program.
• File Directory
This option can also be found in the resource allocation tab. It refers to the
parent directory in which the MCG memory files have been placed on the host
machine.
Figure 10: IP Re-customization

• Performance
In the feature select tab, the first option relates to the performance presets.
There are two choices, “LOW LATENCY” and “HIGH PERFORMANCE”. The low latency
preset ensures that the output registers of the BRAM primitives are not included
in the design, which gives a read latency of a single clock cycle (minimum) while
compromising the actual throughput of the match engine. The high performance
preset includes these registers in the primitives, which allows the design to run
at a higher frequency at the cost of a two clock cycle latency (minimum).
• Event Capture
The last option lets the designer either include the provided event capture logic
in the synthesis of the IP instance or, by disabling the event capture block,
introduce custom logic to handle the output of the IP block.
Figure 11: Event Capture
These parameters are used to configure the SME IP according to the requirements
of the application.
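The Module Count relation described above can be written out as a small calculation. The Python sketch below uses the two facts stated in this section (at most 256 states per module, and two memory content files per module); the function names are illustrative and are not part of the IP or the MCG program.

```python
import math

STATES_PER_MODULE = 256   # stated limit: each module holds up to 256 states

def modules_for_states(total_states):
    """Number of modules needed to hold a given number of states."""
    return max(1, math.ceil(total_states / STATES_PER_MODULE))

def modules_for_files(mcg_file_count):
    """Module count derived from the number of MCG memory content files."""
    if mcg_file_count % 2:
        raise ValueError("expected an even number of memory content files")
    return mcg_file_count // 2

print(modules_for_states(1000))   # 1000 states need 4 modules
print(modules_for_files(8))       # 8 memory content files correspond to 4 modules
```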
Chapter 3
Overview
Pattern matching was previously performed in hardware, while the software parts (packet capturing,
decoding and preprocessing) ran on the host PC. This chapter discusses the implementation of these
parts on the FPGA instead of the host PC.
Packet capturing
Previously, the host PC captured the packets itself. Our main task was to acquire the data on the
Spartan SP605 instead of the host PC in order to increase the performance of the system. The Spartan
SP605 can acquire data at 1 Gbit/s. The packet capturing procedure on the FPGA includes modules for
handling Ethernet frames as well as IP, UDP, and ARP, and the components for constructing a complete
UDP/IP stack. The packet capturing module on the FPGA has the following submodules to capture data:
eth_mac_1g_fifo, eth_axis_rx, eth_axis_txs and udp_payload_fifo.
Packet Decoding
The next task was to decode the captured packets. The packet decoder module divides the
data into source IP, destination IP, length, checksum and payload data. The packet
decoding module has been implemented on the Spartan SP605 prototype.
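The field split performed by the packet decoder can be illustrated in software. The Python sketch below assumes an untagged Ethernet frame carrying IPv4 and UDP, and it takes the length and checksum from the UDP header; the report does not state which header these two fields come from, so that choice and the function name `decode_udp_frame` are assumptions.

```python
import socket
import struct

def decode_udp_frame(frame):
    """Split an Ethernet/IPv4/UDP frame into the fields named in the report."""
    eth_type = struct.unpack("!H", frame[12:14])[0]
    if eth_type != 0x0800:
        raise ValueError("not an IPv4 frame")
    ihl = (frame[14] & 0x0F) * 4            # IPv4 header length in bytes
    if frame[14 + 9] != 17:                 # IPv4 protocol field, 17 = UDP
        raise ValueError("not a UDP datagram")
    src_ip = socket.inet_ntoa(frame[26:30])
    dst_ip = socket.inet_ntoa(frame[30:34])
    udp = 14 + ihl                          # offset of the UDP header
    _sport, _dport, length, checksum = struct.unpack("!HHHH", frame[udp:udp + 8])
    payload = frame[udp + 8:udp + length]   # UDP length includes the 8-byte header
    return {
        "src_ip": src_ip,
        "dst_ip": dst_ip,
        "length": length,
        "checksum": checksum,
        "payload": payload,
    }

# Build a small test frame with a known payload.
payload = b"GUID=2E"
eth = bytes(12) + b"\x08\x00"
ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 28 + len(payload), 0, 0, 64, 17, 0,
                 socket.inet_aton("10.0.0.1"), socket.inet_aton("10.0.0.2"))
udp_header = struct.pack("!HHHH", 1234, 5678, 8 + len(payload), 0xBEEF)
print(decode_udp_frame(eth + ip + udp_header + payload))
```

Only the payload bytes are forwarded to the pattern matching engine; the header fields identify the flow to which an alert belongs.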
Design Integration
All the modules have now been designed and implemented as discussed above. The next step is the
integration of these modules. This section explains how the modules are integrated and how the
integrated design works.
As the block design shows, there are three blocks: packet capturing, packet decoding and
the SME engine. Previously there were two blocks, one on the FPGA and one running on the
host PC. In the present design, the packet capturing and decoding blocks are integrated
with the SME on the FPGA.
As noted above, the packet decoder provides the source IP, destination IP, length,
checksum and payload data. The controller reads the data from the packet decoder and
passes it to the FIFO module, which then feeds the input characters to the SME engine.
PCIe
In our project, the FPGA board is interfaced to the host PC through PCIe. PCIe is a high-throughput
protocol available on most modern motherboards as well as on some embedded boards. PCI Express
provides an end-to-end solution for data transport between an FPGA and a host running Linux. The PCIe
IP Controller core exchanges data with the user logic through a standard Application FIFO,
which is supplied by the PCIe IP. On the other end, the PCIe IP communicates with the host PC.
The above system was tested on the SP605 development board and an HP Core i3 system. The
figure below shows the FPGA board (SP605) installed in the host system.
Figure 12: Design Testing System
PCIe IP Synthesis
After successful simulation, the IP was synthesized. The table below shows the resource
utilization of the IP. PCITree was used to validate the IP on the FPGA.
Table 2: Device Utilization Summary
Logic Utilization         Used    Available    Utilization

Number of slice LUTs      6995    433200       1.61%
Number of BRAM/FIFO       12      1470         0.81%
Application FIFO Synthesis

After successful simulation, the FIFO IP was synthesized and implemented on the FPGA. The
table below shows the device utilization summary from the synthesis report.
Table 3: Device Utilization Summary
Logic Utilization            Used    Available    Utilization

Number of slice registers    48      866400       0.011%
Number of slice LUTs         47      433200       0.011%
Number of BRAM/FIFO          1       1470         0.03%
To test and validate the design, we generated a pcap file with known payload content. The figure
below shows the pcap file.
Figure 13: Pcap File View in Wireshark
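A test capture of this kind can be produced with a few lines of Python. The sketch below is only an illustration of the approach, not the tool used in the project: the MAC addresses, IP addresses and ports are placeholder values, the function names are hypothetical, and the UDP checksum is left at zero (meaning "not computed", which is permitted for UDP over IPv4). It writes the classic pcap format with an Ethernet link type, which Wireshark can open.

```python
import socket
import struct


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of the 16-bit words of an IPv4 header."""
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_udp_frame(payload: bytes,
                    src_ip: str = "192.168.1.10",    # placeholder address
                    dst_ip: str = "192.168.1.20",    # placeholder address
                    src_port: int = 1234,
                    dst_port: int = 5678) -> bytes:
    """Build an Ethernet/IPv4/UDP frame that carries a known payload."""
    eth = (bytes.fromhex("020000000002")             # destination MAC (placeholder)
           + bytes.fromhex("020000000001")           # source MAC (placeholder)
           + struct.pack("!H", 0x0800))              # EtherType: IPv4
    udp_len = 8 + len(payload)
    ip = struct.pack("!BBHHHBBH4s4s",
                     0x45, 0, 20 + udp_len,          # version/IHL, DSCP, total length
                     0, 0,                           # identification, flags/fragment
                     64, 17, 0,                      # TTL, protocol UDP, checksum = 0
                     socket.inet_aton(src_ip), socket.inet_aton(dst_ip))
    ip = ip[:10] + struct.pack("!H", ipv4_checksum(ip)) + ip[12:]
    udp = struct.pack("!HHHH", src_port, dst_port, udp_len, 0)
    return eth + ip + udp + payload


def write_pcap(path: str, frames) -> None:
    """Write frames to a classic little-endian pcap file (link type 1, Ethernet)."""
    with open(path, "wb") as f:
        # magic, version 2.4, timezone offset, sigfigs, snaplen, link type
        f.write(struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1))
        for i, frame in enumerate(frames):
            # ts_sec, ts_usec, captured length, original length
            f.write(struct.pack("<IIII", i, 0, len(frame), len(frame)))
            f.write(frame)
```

For instance, write_pcap("known.pcap", [build_udp_frame(b"GUID=2E")]) creates a one-packet capture whose payload contains the signature content to be detected.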
As shown in Figure 13, the payload contains the content GUID=2E, and this payload has ID 06. The
proposed system detected both the ID and the data. The results, captured with ChipScope debugging
in the ISE Design Suite, are shown in Figure 14.
Figure 14: ID Detection and Data Detection of content of Signature Set
3.4 Time Comparison of Pattern Matching of Software and Hardware

This section compares the pattern matching time of the Netspection IDS with the FPGA-based
detection engine against that of the software-based detection engine.
Figure 15: Time utilized by Software based detection engine
The time taken by the software-based detection engine is shown in Figure 15. The software took
1 μs to 2 μs, measured from the arrival of a packet at the detection engine until its ID is
detected.
Figure 16: Time utilized by FPGA (sp605) based detection engine
As Figure 16 shows, the hardware-based detection engine performs better than the software-based
one, owing to the parallelism of the FPGA. The system runs at 125 MHz, which gives a clock period
of 8 ns. The SP605 FPGA took about 27 clock cycles to detect the packet ID after the packet arrived:
27 × 8 ns = 216 ns, or 0.216 μs. Compared with the 1 μs to 2 μs taken by the software, the
hardware-based detection engine is roughly 5 to 9 times faster.
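The latency arithmetic above can be reproduced with a short script. This is a worked calculation only; the function names are hypothetical, and the inputs are the figures quoted in the text (125 MHz clock, 27 cycles, 1 μs to 2 μs for software).

```python
def hardware_latency_ns(clock_hz: float, cycles: int) -> float:
    """Detection latency in nanoseconds for a given clock and cycle count."""
    clock_period_ns = 1e9 / clock_hz
    return cycles * clock_period_ns


def speedup(software_ns: float, hardware_ns: float) -> float:
    """How many times faster the hardware path is than the software path."""
    return software_ns / hardware_ns


hw_ns = hardware_latency_ns(125e6, 27)          # 27 cycles at 8 ns each
print(f"hardware latency: {hw_ns:.0f} ns")      # hardware latency: 216 ns
for sw_us in (1.0, 2.0):
    ratio = speedup(sw_us * 1000.0, hw_ns)
    print(f"software {sw_us:.0f} us -> speedup {ratio:.2f}x")
```

With these inputs the speedup is about 4.63 for a 1 μs software time and about 9.26 for 2 μs.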
3.5 NetFPGA SUME
Our next task was to implement the Netspection IDS on the NetFPGA SUME, a platform well suited to
high-performance, high-density networking designs. The NetFPGA-SUME features one of the largest and
most complex FPGAs produced, a Xilinx Virtex-7 690T supporting thirty 13.1 GHz GTH transceivers.
Four SFP+ 10 Gb/s ports, five independent high-speed memory banks built from both 500 MHz QDRII+ and
1866 MT/s DDR3 SoDIMM devices, and an eight-lane third-generation PCIe interface provide high
throughput and can sustain a large number of high-speed data streams to the FPGA fabric and memory
devices. Other features include twenty transceivers in total presented on the FMC and QTH expansion
connectors, and SATA ports. The main purpose of the NetFPGA-SUME is to give students, researchers
and developers a state-of-the-art platform for networking, whether for learning the fundamentals or
for creating new hardware and software applications. The board supports simultaneous wire-speed
processing on the four 10 Gb/s Ethernet ports, and it can manipulate and process data on board or
stream it over the x8 Gen3 PCIe interface and the expansion interfaces.
Figure 19: NET FPGA-SUME

3.5.1 Features:
• Xilinx Virtex-7 XC7V690T FFG1761-3
• Four SFP+ interface (4 RocketIO GTH transceivers) supporting 10Gbps
• PCI-E Gen3 x8 (8Gbps/lane)
• QTH Connector (8 RocketIO GTH transceivers)
• Two SATA-III ports
• One HPC FMC Connector (10 RocketIO GTH transceivers)
• Three x36 72Mbits QDR II SRAM (CY7C25652KV18-500BZC)
• Two 4GB DDR3 SODIMM (MT8KTF51264Hz-1G9E1)
• MicroUSB Connector for JTAG programming and debugging (shared with UART interface)
• Two 512Mbits Micron StrataFlash (PC28F512G18A)
• Xilinx CPLD XC2C512 for FPGA configuration
• User LEDs and Push Buttons
Figure 20: NET FPGA-SUME block Diagram
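As a rough check on the interface figures listed above, the sketch below compares the aggregate rate of the four 10 Gb/s ports with the bandwidth of a Gen3 x8 PCIe link. It assumes the standard PCIe Gen3 signalling rate of 8 GT/s per lane with 128b/130b line encoding, and it ignores packet and protocol overhead, so the usable throughput will be lower. The function name is hypothetical.

```python
def pcie_gen3_link_gbps(lanes: int, gt_per_s: float = 8.0) -> float:
    """Bandwidth per direction in Gb/s after 128b/130b line encoding."""
    return lanes * gt_per_s * 128 / 130


ETHERNET_PORTS = 4
PORT_GBPS = 10.0

link_gbps = pcie_gen3_link_gbps(8)
ethernet_gbps = ETHERNET_PORTS * PORT_GBPS
print(f"PCIe Gen3 x8 link: {link_gbps:.2f} Gb/s")       # 63.02 Gb/s
print(f"4 x 10G Ethernet:  {ethernet_gbps:.2f} Gb/s")   # 40.00 Gb/s
```

Under these assumptions the PCIe link has more raw bandwidth than the four Ethernet ports combined.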

Packet Capturing and Decoding on NETFPGA SUME
Packet capturing and decoding have been implemented on the NETFPGA SUME instead of on the host PC.
Integration of all three modules is in progress.
Pattern Matching on NETFPGA SUME
Because the Spartan SP605 has limited resources for the pattern matching algorithm, the design on
that board supports at most 512 rules. The NETFPGA SUME offers more resources, so the hardware
implementation can match strings for up to seven thousand rules. The SME engine in this case is
designed for the larger rule set and uses at most 90% of the resources. Pattern matching has been
implemented and tested in hardware on the NETFPGA SUME.
The figure below shows the FPGA board (NETFPGA SUME) installed in the host system.
Conclusion
This project proposed Netspection, a signature-based hybrid network intrusion detection system
designed to be robust and to provide high throughput. The Bit-Split algorithm has been implemented
in both memory-based and hardwired forms to make maximum use of the FPGA resources and to enhance
the performance of the system. Packet capturing, decoding and pattern matching are implemented on
the Spartan SP605 FPGA to make the Netspection IDS more robust. All three modules have been
interfaced to PCIe, and PCIe has been interfaced to the host PC. We conclude that the integrated
design and all IPs are working properly. All the results are included in this report.
Future Work
Future work includes the complete end-to-end implementation of packet capturing, decoding and
pattern matching on the high-performance NETFPGA SUME development board. At this stage, packet
capturing, decoding and the pattern detection engine are implemented on the Spartan SP605. All three
modules will be integrated and tested on the NETFPGA SUME. The final module will then be interfaced
to PCIe, and PCIe will be interfaced to the host PC.
References
[1] P. M. K. Tharaka, D. M. D. Wijerathne, N. Perera, D. Vishwajith and A. Pasqual, “Runtime Rule-

Reconfigurable High Throughput NIPS on FPGA,” International Conference on Field Programmable Technology
(ICFPT), Melbourne, 2017, pp. 251–254.
[2] D. Pao and X. Wang, “Multi-Stride String Searching for High-Speed Content Inspection,” The Computer
Journal, vol. 55, pp. 1216–1231, 2012.
[3] R. Abdulhammed, M. Faezipour and K. M. Elleithy, "Network intrusion detection using hardware techniques:
A review," 2016 IEEE Long Island Systems, Applications and Technology Conference (LISAT), Farmingdale, NY,
2016, pp. 1-7.
[4] X. Wang and D. Pao, “Memory based architecture for multi character Aho Corasick string matching,”
Transaction on VLSI Systems, vol. 26, pp.143–154, 2018.
[5] Domínguez, P. P. Carballo and A. Nunez, “Programmable SoC Platform for Deep Packet Inspection using
Enhanced Boyer-Moore Algorithm,” 12th International Symposium on Reconfigurable Communication-centric
Systemson-Chip (ReCoSoC), Madrid, 2017, pp. 1–8.
[6] I. Sarbishei, S. Vakili, J.M. P. Langlois, and Y. Savaria, “Scalable MemoryLess Architecture for String
Matching With FPGAs,” IEEE International Symposium on Circuits and Systems (ISCAS), Baltimore, 2017, pp.
1–4.
[7] S. Pontarelli, G. Bianchi, and S. Teofili, “Traffic-Aware Design of a HighSpeed FPGA Network Intrusion
Detection System,” Transactions on Computers, vol. 62, pp. 2322–2333, 2013.
[8] J. M. Bande, J. H. Palancar and R. Cumplido, “Multi-character Cost-effective and High Throughput
Architecture for Content Scanning,” Microprocessors and Microsystems, vol. 37, pp. 1200–1207, 2013.
[9] H. Kim, K. Choi and S. Choi, “A Memory-Efficient Deterministic Finite Automaton-Based Bit-Split String
Matching Scheme Using Pattern Uniqueness in Deep Packet Inspection,” PLoSONE, vol. 10, pp. 1–24, 2015.
[10] H. J. Kim, H. S. Kim, and S. Kang, “A Memory-Efficient Bit-Split Parallel String Matching Using Pattern
Dividing for Intrusion Detection Systems,” Transaction on Parallel and Distributed Systems, vol. 22, pp. 1904–
1911, 2011.
[11] M. Arun and A. Krishnan, “Functional Verification of Signature Detection Architectures for High Speed
Network Applications,” International Journal of Automation and Computing, vol. 9, pp. 395–402, 2012.
[12] T. N. Thinh and S. Kittitornkun, “Massively Parallel Cuckoo Pattern Matching Applied For NIDS/NIPS,” 5
th International Symposium on Electronic, Design, Test and Applications, Ho Chi Minh, Vietnam, 2010, pp. 217–
221.
[13] O. Erdem, “Tree-based String Pattern Matching on FPGAs,” Computers and Electrical Engineering, vol. 49,
pp. 117–133, 2016.
[14] M. H. Hajiabadi, H. Saidi and M. Behdadfar, “Scalable High-Throughput and Modular Hardware Based
String Matching Algorithm,” 11th International ISC Conference on Information Sec and Cryptology, Tehran,
2014, pp.192–198.
[15] H. Le and V. K. Prasanna, “A Memory-Efficient and Modular Approach for Large-Scale String Pattern
Matching,” IEEE Transaction on Computers, vol. 62, pp. 844–857, 2013.
[16] C. H. Lin, and S. C. Chang, “Efficient Pattern Matching Algorithm for Memory Architecture,” Transaction
on VLSI Systems, vol. 19, pp. 33–40, 2011.
[17] A. Madhavan, T. Sherwood, and D. B. Strukov, “High-throughput Pattern Matching with CMOL FPGA
Circuits: Case for Logic-in-memory Computing,” Transaction on VLSI Systems, vol. 26, pp. 2759–2772, 2018.
[18] https://digilent.com/reference/programmable-logic/netfpga-sume/reference-manual
[19] https://www.xilinx.com/products/boards-and-kits/1-6ogkf5.html
