Technical Report (30-36)
BAHRIA UNIVERSITY
1. PROJECT BRIEF
This report presents the hardware module of the Netspection IDS. The system monitors and analyzes all
incoming packets on a given network to detect intrusions. Netspection IDS has four basic parts: packet
capturing, packet decoding, preprocessing and pattern matching. Of these, pattern matching is the most
complex and time-consuming. To improve performance, a hybrid Netspection IDS is proposed in which the
first three processes run on a Linux-based PC, while an FPGA-based hardware solution performs pattern
matching; the parallelism and reconfigurability of the FPGA make a high-throughput IDS possible. A
PCIe-based serial connection provides communication between software and hardware. The detection engine
uses the Aho-Corasick and Bit-Split algorithms. The Aho-Corasick design combines a memory-based approach,
which uses BRAMs, with a hardwired approach, which uses slices, so that the FPGA's resources are used as
fully as possible. Because writing HDL code by hand for hundreds of rules takes considerable time and
effort, we also developed a tool that automatically generates Verilog HDL code from a rule set. The
individual IPs of the system were first tested on a Xilinx ZC702 evaluation board; the complete detection
engine was then implemented and tested on a Spartan-6 (SP605) evaluation kit running at an operating
frequency of 125 MHz.
Table of Contents
Abstract
List of Figures
List of Tables
Chapter 1
1.1 Introduction
1.2 Report Organization
Chapter 2
Background and Related Work
2.1 Aho-Corasick (AC) Algorithm
2.1.1 Memory Based AC
2.1.2 Hardwired Based AC
2.1.3 Auto HDL Generator
2.1.4 Implementation on FPGA
Chapter 3
Integration of FPGA with Host PC
3.1 PCIe IP Synthesis
3.2 Application FIFO Synthesis
3.3 Design Integration
3.4 Time Comparison of Pattern Matching of Software and Hardware
Conclusion
Future Work
References
List of Figures
Netspection IDS comprises four modules: packet capturing, packet decoding, packet preprocessing and
the pattern matching engine. The pattern matching module performs the most computationally intensive
task, pattern matching, which is essential to any NIDS. As discussed above, a hybrid technique has
been developed for Netspection IDS in which packet capturing, decoding and preprocessing are done in
software, while hardware implements the pattern matching engine.
Owing to the growing demand for scalable hardware architectures for network security applications,
there is a large body of research on pattern matching algorithms and techniques [11]. The
Aho-Corasick (AC) algorithm is the one most often targeted for hardware-based NIDS [12]. Its
efficiency is reported to improve with multi-character searching, pipelining and parallel memory
allocation. Many works implement the AC algorithm using a memory-based approach [13] and then
parallelize the design to achieve high throughput [14]. The alternative is a hardwired
implementation [15]; however, it is reported only for very small signature sets and becomes
impractical when the signature set is large. With a large signature set, a hardwired AC
implementation is tedious and time-consuming because it requires designing a very large state
machine and writing lengthy HDL code that is prone to human error [16].
Offering a hardwired solution for a large rule set alongside a memory-based solution becomes
worthwhile on an FPGA, which is the most suitable target platform for an IDS. Memory-based solutions
use BRAMs in large quantities but leave logic resources such as LUTs and flip-flops largely unused.
A hardwired solution, on the other hand, primarily uses logic resources. Running a hardwired
solution in parallel with a memory-based one therefore promises the best overall utilization of the
major resources of a single FPGA. To address the research gap in implementing the hardwired AC
algorithm for large rule sets, an auto-HDL-generator tool for hardwired AC is presented in this
report. In addition, a comparative analysis assesses the trade-off among resource utilization,
operational frequency and resulting throughput for a single hardware block implementing a large rule
set versus multiple parallel hardware blocks that each cover a small subset of the rules.
Figure 3: FSM of AC algorithm for the signature set {app, apple, aim, cap, cat}
Hardware prototyping of search engines based on the AC algorithm has attracted considerable interest
recently because of the algorithm's high efficiency and comparatively low time complexity.
The AC algorithm can be implemented in two ways:
i. Memory-based implementation of AC
ii. Hardwired implementation of AC
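As a software illustration of the state machine that both implementation styles realize, the following minimal Python sketch builds the goto, failure and output tables of an AC automaton for the signature set of Figure 3 and scans a string with it. The function names and table layout are assumptions made for this example; they are not taken from the Netspection sources.

```python
from collections import deque

def build_automaton(patterns):
    """Build goto, failure and output tables for the given patterns."""
    goto = [{}]        # goto[state][char] -> next state; state 0 is the root
    output = [set()]   # output[state] -> patterns recognised on entering state
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                output.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].add(pattern)

    # Breadth-first pass: a state's failure link points to the longest
    # proper suffix of its path that is also a path from the root.
    fail = [0] * len(goto)
    queue = deque(goto[0].values())   # depth-1 states fail to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            output[nxt] |= output[fail[nxt]]
    return goto, fail, output

def search(text, automaton):
    """Return sorted (start_index, pattern) pairs for every match in text."""
    goto, fail, output = automaton
    state, matches = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in output[state]:
            matches.append((i - len(pattern) + 1, pattern))
    return sorted(matches)

automaton = build_automaton(["app", "apple", "aim", "cap", "cat"])
print(search("capple", automaton))
```

Scanning "capple" exercises a failure transition: after "cap" is matched, the automaton falls back to the state for "ap" and goes on to recognise "app" and "apple" without rescanning any input.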
2.1.1 Memory Based AC
Given the complexity of the state machine and the associated HDL
coding for a large rule set, an auto-HDL-generator was developed.
It is a utility written in C that takes the rule set as a
text file.
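The idea of generating hardwired HDL from a rule set can be sketched as follows. This is an illustrative Python outline only: the actual tool is written in C, and its internal structure is not described in this report. The module name, port names and the single-bit match output are hypothetical, and the rules are assumed to be plain ASCII strings passed as a list rather than read from a file.

```python
from collections import deque

def build_dfa(rules):
    """Return (delta, accepting) for an Aho-Corasick DFA over the rules."""
    trie, accepting = [{}], set()
    for rule in rules:
        s = 0
        for ch in rule:
            if ch not in trie[s]:
                trie.append({})
                trie[s][ch] = len(trie) - 1
            s = trie[s][ch]
        accepting.add(s)
    alphabet = sorted({ch for rule in rules for ch in rule})
    fail = [0] * len(trie)
    delta = [{} for _ in trie]
    for ch in alphabet:
        delta[0][ch] = trie[0].get(ch, 0)
    queue = deque(trie[0].values())
    while queue:                          # breadth-first over the trie
        s = queue.popleft()
        if fail[s] in accepting:          # a suffix of this path is a rule
            accepting.add(s)
        for ch in alphabet:
            if ch in trie[s]:
                t = trie[s][ch]
                fail[t] = delta[fail[s]][ch]
                delta[s][ch] = t
                queue.append(t)
            else:                         # resolve the failure link statically
                delta[s][ch] = delta[fail[s]][ch]
    return delta, accepting

def emit_verilog(rules, module_name="ac_hardwired"):
    """Emit a Verilog FSM (as text) with one case branch per DFA state."""
    delta, accepting = build_dfa(rules)
    w = max(1, (len(delta) - 1).bit_length())   # state register width
    out = [f"module {module_name} (",
           "    input  wire       clk,",
           "    input  wire       rst,",
           "    input  wire [7:0] char_in,",
           "    output reg        match",
           ");",
           f"    reg [{w - 1}:0] state, next_state;",
           "    always @(*) begin",
           "        case (state)"]
    for s, row in enumerate(delta):
        out.append(f"            {w}'d{s}: case (char_in)")
        for ch, t in row.items():
            if t != 0:                    # transitions to the root use default
                out.append(f"                8'h{ord(ch):02x}: next_state = {w}'d{t};")
        out.append(f"                default: next_state = {w}'d0;")
        out.append("            endcase")
    hit = " || ".join(f"(next_state == {w}'d{s})" for s in sorted(accepting)) or "1'b0"
    out += [f"            default: next_state = {w}'d0;",
            "        endcase",
            "    end",
            "    always @(posedge clk) begin",
            "        if (rst) begin",
            f"            state <= {w}'d0;",
            "            match <= 1'b0;",
            "        end else begin",
            "            state <= next_state;",
            f"            match <= {hit};",
            "        end",
            "    end",
            "endmodule"]
    return "\n".join(out)

print(emit_verilog(["app", "apple", "aim", "cap", "cat"]))
```

Because every transition becomes a branch of a case statement, the generated logic maps onto LUTs and flip-flops rather than BRAM, which is the property the hardwired approach relies on.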
Before the complete pattern detection engine was implemented on the FPGA, the individual
IPs were tested on the Xilinx Zynq-7000 ZC702 evaluation kit, which carries the
XC7Z020-CLG484-1 device. This device provides both a PS (Processing System) and PL
(Programmable Logic) on a single chip.
Figure 6: Xilinx Zynq-7000 ZC702
After the IPs were successfully tested individually on the above-mentioned evaluation kit,
the whole pattern engine was implemented and tested on the Spartan-6 SP605 FPGA kit,
because this board provides PCIe.
Figure 7: Spartan 6 - SP605 Evaluation kit
The PCIe IP controller core exchanges data with the user logic through a standard
Application FIFO, which is supplied by the PCIe IP. On the other end, the PCIe IP talks
to the host PC.
Throughput: $T_p = 8 \times f$ (bits/sec)
From the throughput and the number of FPGA slices consumed, their ratio is computed as
the throughput/area metric. The results are tabulated in Table 1 and shown graphically
in Fig. 2. As the number of rules increases, the number of slices used increases, the
operational frequency decreases and, consequently, the throughput decreases. Because
throughput falls while slice usage rises, the key throughput/area metric decreases
drastically. In our case, as the number of rules goes from 500 to 10,000 (a 20-fold
increase), the throughput/area metric drops from 3.26 Mbits/sec/slice to
0.04 Mbits/sec/slice, a reduction of more than 80 times. To achieve high throughput,
particularly with a large number of rules, a parallel architecture is required.
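The throughput relation and the throughput/area metric described above can be checked with a few lines of Python. The factor of 8 is taken here to mean that one 8-bit character is consumed per clock cycle, which is an interpretation rather than something stated explicitly. The slice count in the last call is a hypothetical value chosen only to show the calculation; it is not a measured result.

```python
def throughput_bps(freq_hz, bits_per_cycle=8):
    """Throughput in bits/sec: T_p = 8 * f for one character per clock."""
    return bits_per_cycle * freq_hz

def throughput_per_slice(freq_hz, slices):
    """Throughput/area metric in bits/sec per slice."""
    return throughput_bps(freq_hz) / slices

# At the 125 MHz operating frequency quoted for the SP605 design:
print(throughput_bps(125e6) / 1e9, "Gbit/s")

# Reduction in throughput/area reported between 500 and 10,000 rules:
print(3.26 / 0.04, "times")

# Hypothetical example: 100 MHz clock, 400 slices (illustrative values only).
print(throughput_per_slice(100e6, 400) / 1e6, "Mbit/s per slice")
```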
Parallel Architecture:
A study of parallelism based on the number of rules was conducted, and the results are
presented here. In this scenario, multiple hardware blocks, each implementing a small
rule set, work in parallel. The rules were chosen randomly for the study, and every rule
is distinct and independent of the others. A set of 5000 rules was taken and divided
into five independent subsets of 1000 rules each. A second class of subsets was also
created, consisting of 10 independent subsets of 500 rules each. These subsets were
individually synthesized, placed and routed on the FPGA. The implementation and
performance results for the two types of subsets are tabulated in Tables 2 and 3
respectively. Tables 2 and 3 give the important metrics for the different chunks of the
rule set. For example, Table 2 shows that the hardware for the first 1000 rules consumes
the highest number of slices while operating at the lowest frequency, and hence gives
the worst performance. Table 3 shows that, within the first 1000 rules, the second
chunk of 500 rules has the worst performance.
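The partitioning used in this study can be sketched as below. The summary function additionally assumes that all parallel blocks scan the same input stream on one shared clock, so that the slowest block sets the system frequency; the report does not state this, and the slice and frequency figures in the example are hypothetical placeholders, not the values from Tables 2 and 3.

```python
def partition_rules(rules, subset_size):
    """Split the rule list into consecutive, independent subsets."""
    return [rules[i:i + subset_size] for i in range(0, len(rules), subset_size)]

def summarize_parallel(blocks, bits_per_cycle=8):
    """Combine per-block results; blocks is a list of (slices, fmax_hz)."""
    total_slices = sum(slices for slices, _ in blocks)
    system_fmax = min(fmax for _, fmax in blocks)   # assumed shared clock
    throughput = bits_per_cycle * system_fmax
    return {
        "total_slices": total_slices,
        "system_fmax_hz": system_fmax,
        "throughput_bps": throughput,
        "throughput_per_slice": throughput / total_slices,
    }

rules = [f"rule_{i}" for i in range(5000)]      # stand-in for the real rule set
print(len(partition_rules(rules, 1000)))        # number of 1000-rule subsets
print(len(partition_rules(rules, 500)))         # number of 500-rule subsets

# Hypothetical synthesis results for two blocks (illustrative values only).
print(summarize_parallel([(1000, 200e6), (1500, 150e6)]))
```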
Implementation and Performance Results of Memory Based AC design:
The theoretical background of the memory-based architecture was discussed in the
previous reports, and the preliminary design (version 1) of the hardware IP for string
matching (SME_IP_v01) has been simulated and verified for correctness. The following
sections present the hardware implementation and analysis of the design, along with
some improvements to the original design.
Design of Hardware IP Version 2 (v2.1)
The first version of the IP included a lot of debug logic inside the package, including
an instance of the ILA IP from Xilinx. It was designed for testing and verification
with a small number of signatures.
The design and structure of the source code were changed drastically to accommodate
several features, discussed later, that help in developing a functioning system design.
Some of the key aspects of the currently tested build (SME_IP_v2.1) are
presented below:
1. Architectural Changes
Scalability
In the second version of the SME IP, the source code was restructured to
support scalability, since the number of signatures depends on the requirements
of the system or application. The end user can scale the resources of the IP to
accommodate a given number of signatures by adjusting parameters.
Output Control
Two options are provided within the IP to control the output of the IP
instance. The user can either use the included event capture block to capture
the trigger events when signature matches occur, or introduce custom logic to
suit the application. This is done through conditional synthesis of the event
capture block, so resources are used for this block only if the user opts to
include it in the IP synthesis. This keeps the design modular to some extent,
so that the developer can add more output control blocks in the future to
serve different applications.
2. Port Description
An instance of the String Match Engine IP is shown below, followed by the
description of its ports.
3. Design Parameters
As discussed in the previous sections, the IP version used for testing has gone through
significant improvements, and the design can be configured to suit the requirements
of the application. The configuration options are described in this
sub-section:
Module Count
This option is found in the resource allocation tab of the IP re-configuration
window. Each module can hold up to 256 states created by the next-move function
of the Aho-Corasick algorithm, so the total states of the Aho-Corasick trie
are divided among the modules. The number of modules needed to accommodate the
transition tables is exactly half the number of memory content files generated
by the Memory Content Generator (MCG) program.
File Directory
This option is also found in the resource allocation tab. It refers to the
parent directory in which the MCG memory files have been placed on the host
machine.
These parameters are used to configure the SME IP according to the requirements
of the application.
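The Module Count rule described above can be expressed as a small calculation. This Python sketch assumes only what is stated in this section: at most 256 states per module and two memory content files per module. The function names are hypothetical and are not part of the IP or of the Memory Content Generator.

```python
import math

STATES_PER_MODULE = 256   # capacity of one module, as stated above

def modules_for_states(total_states):
    """Minimum number of modules needed to hold all trie states."""
    return max(1, math.ceil(total_states / STATES_PER_MODULE))

def modules_for_files(memory_file_count):
    """Module Count implied by the number of MCG memory content files."""
    if memory_file_count % 2:
        raise ValueError("expected an even number of memory content files")
    return memory_file_count // 2

print(modules_for_states(12))    # the 12-state trie of Figure 3 fits in one module
print(modules_for_states(257))   # one state past a module boundary needs two
print(modules_for_files(8))      # eight memory files correspond to four modules
```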
Chapter 3
Overview
So far, pattern matching was done in hardware, while the software parts (packet capturing, decoding
and preprocessing) were done by the host PC. This chapter discusses the implementation of these
parts on the FPGA instead of the host PC.
Packet capturing
Previously, the host PC captured the packets itself. Our main task was to acquire the data on the
Spartan SP605 instead of the host PC in order to increase system performance. The Spartan SP605 can
acquire data at 1 Gb/s. The packet capturing design on the FPGA includes modules for handling Ethernet
frames as well as IP, UDP and ARP, together with the components needed to build a complete UDP/IP stack.
The packet capturing module has the following submodules to capture data: eth_mac_1g_fifo, eth_axis_rx,
eth_axis_txs and udp_payload_fifo.
Packet Decoding
The next task was to decode the captured packets. The packet decoder module splits the data
into source IP, destination IP, length, checksum and payload data. The packet
decoding module has been implemented on the Spartan SP605 prototype.
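The field extraction performed by the packet decoder can be illustrated in software. The following Python sketch decodes an Ethernet frame carrying IPv4 and UDP using the standard header layouts. It assumes that the length and checksum fields mentioned above are those of the UDP header, which the report does not specify, and the function name and dictionary keys are hypothetical.

```python
import socket
import struct

ETH_HEADER_LEN = 14     # destination MAC, source MAC, EtherType
ETHERTYPE_IPV4 = 0x0800
IPPROTO_UDP = 17

def decode_udp_frame(frame):
    """Split an Ethernet/IPv4/UDP frame into the fields used by the IDS."""
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_IPV4:
        raise ValueError("not an IPv4 frame")
    ip_header_len = (frame[ETH_HEADER_LEN] & 0x0F) * 4   # IHL in 32-bit words
    ip = frame[ETH_HEADER_LEN:ETH_HEADER_LEN + ip_header_len]
    if ip[9] != IPPROTO_UDP:
        raise ValueError("not a UDP packet")
    udp_start = ETH_HEADER_LEN + ip_header_len
    _src_port, _dst_port, length, checksum = struct.unpack(
        "!HHHH", frame[udp_start:udp_start + 8])
    return {
        "src_ip": socket.inet_ntoa(ip[12:16]),
        "dst_ip": socket.inet_ntoa(ip[16:20]),
        "length": length,          # UDP length: 8-byte header plus payload
        "checksum": checksum,
        "payload": frame[udp_start + 8:udp_start + length],
    }
```

In the hardware design the payload bytes are what is eventually passed, one character at a time, to the pattern matching engine.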
Design Integration
All the modules have now been designed and implemented as discussed above. The next step is
to integrate them. This section explains how the modules are integrated
and how the integrated design works.
As the block design shows, there are three blocks: packet capturing, decoding
and the SME engine. Previously there were two blocks, one on the FPGA and the other
running on the host PC. In the present design, the packet capturing and decoding
blocks are integrated with the SME.
As noted, the packet decoder provides the source IP, destination IP, length, checksum and
payload data. The controller reads data from the packet decoder and passes it to the FIFO
module, which then feeds the input characters to the SME engine.
PCIe
In our project, the FPGA board is interfaced to the host PC through PCIe. PCIe is a high-throughput
protocol available on most modern motherboards as well as on some embedded boards. PCI Express
provides an end-to-end solution for data transport between an FPGA and a host running Linux. The PCIe
IP controller core exchanges data with the user logic through a standard Application FIFO,
which is supplied by the PCIe IP. On the other end, the PCIe IP talks to the host PC.
The above system was tested on an SP605 development board and an HP Core i3 system. The
figure below shows the FPGA board (SP605) installed in the host system.
Figure 12: Design Testing System
PCIe IP Synthesis
After successful simulation, the IP was synthesized. The table below shows the resource
utilization of the IP. Pci Tree was used to validate the IP on the FPGA.
Table 2: Device Utilization Summary
To test and validate the design, we generated a pcap file with known content in the payload. The
figure below shows the pcap file.
Figure 13: Pcap File View in Wireshark
As shown in Figure 13 above, the payload contains GUID=2E and has ID: 06. Our
proposed system detected that ID and data; the results, captured with ChipScope
debugging in ISE Design Suite, are shown in Figure 14 below.
The time taken by the software-based detection engine is shown in Figure 15. The software
took 1 μs to 2 μs, measured from the arrival of the packet at the detection engine until its ID
was detected.
Figure 16: Time utilized by FPGA (sp605) based detection engine
The hardware-based detection engine performs better than the software-based one, as can be
seen in Figure 16, owing to the parallelism of the FPGA. The system runs at a frequency of
125 MHz, which gives a clock period of 8 ns. The SP605 FPGA took about 27 clock cycles to
detect the packet ID from the time of its arrival: 27 × 8 ns = 216 ns, or 0.216 μs. Compared with
the 1 μs to 2 μs taken in software, the hardware-based detection engine is up to almost 10 times faster.
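The latency arithmetic above can be reproduced with a short Python calculation. The only inputs are the figures quoted in this section (125 MHz, 27 clock cycles, and 1 μs to 2 μs for software); the function names are chosen for illustration.

```python
def clock_period_ns(freq_hz):
    """Clock period in nanoseconds for a given frequency in Hz."""
    return 1e9 / freq_hz

def latency_ns(cycles, freq_hz):
    """Latency in nanoseconds of a fixed number of clock cycles."""
    return cycles * clock_period_ns(freq_hz)

hw_ns = latency_ns(27, 125e6)
print(clock_period_ns(125e6), "ns per cycle")
print(hw_ns, "ns hardware detection latency")

# Speed-up relative to the 1 us and 2 us software measurements.
for sw_ns in (1000, 2000):
    print(sw_ns, "ns software ->", round(sw_ns / hw_ns, 2), "x faster in hardware")
```

The speed-up therefore ranges from about 4.6 times for the 1 μs software case to about 9.3 times for the 2 μs case.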
3.5 NetFPGA SUME
Our next task was to implement Netspection IDS on the NetFPGA SUME, a platform designed for
high-performance, high-density networking designs. The NetFPGA SUME is an advanced board built
around a Xilinx Virtex-7 690T, a very large FPGA that supports thirty 13.1 Gb/s GTH transceivers.
Four SFP+ 10 Gb/s ports, five independent high-speed memory banks built from 500 MHz QDRII+ and
1866 MT/s DDR3 SoDIMM devices, and an eight-lane third-generation PCIe interface provide high
throughput and can sustain a large number of high-speed data streams to the FPGA fabric and memory
devices. Other features include twenty transceivers in total on the FMC and QTH expansion connectors,
and SATA ports. The board is intended to give students, researchers and developers a state-of-the-art
platform for networking, whether for learning the fundamentals or for creating new hardware and
software applications. It supports simultaneous wire-speed processing on the four 10 Gb/s Ethernet
ports, and it can manipulate and process data on board or stream it over the 8x Gen 3 PCIe interface
and the expansion interfaces.
Packet capturing and decoding are implemented on the NetFPGA SUME instead of the host PC. Integration
of all three modules is in progress.
Because the Spartan SP605 has limited resources for the pattern matching algorithm, the design on it
was limited to at most 512 rules. The NetFPGA SUME offers far more resources, so the hardware
implementation can match strings for up to seven thousand rules. The SME engine in this case is
designed for more rules and uses at most 90% of the resources. Implementation and testing of pattern
matching in hardware on the NetFPGA SUME have been completed.
The figure below shows the FPGA board (NetFPGA SUME) installed in the host system.
Conclusion
This project proposed a signature-based hybrid network intrusion detection system, Netspection, which
aims to provide a robust system with high throughput. The Bit-Split algorithm has been implemented
with both memory-based and hardwired approaches to make maximum use of the FPGA's resources and to
enhance the performance of the system. Packet capturing, decoding and pattern matching are implemented
on the Spartan SP605 FPGA to make Netspection IDS more robust. All three modules have been interfaced
to PCIe, and PCIe has in turn been interfaced to the host PC. We conclude that the integrated design
and all IPs are working properly. The results are presented in the preceding sections of this report.
Future Work
Future work includes a complete end-to-end implementation of packet capturing, decoding and pattern
matching on the high-performance NetFPGA SUME hardware development board. At this stage, the packet
capturing, decoding and pattern detection engine are implemented on the Spartan SP605. All three modules
will be integrated and tested on the NetFPGA SUME. The final module will be interfaced to PCIe, and
PCIe will then be interfaced to the host PC.
References