Performance Evaluation of Hardware Unit For Fast IP Packet Header Parsing
Danijela Efnusheva
1 Introduction
The rapid expansion of the Internet has resulted in an increased number of users,
network devices, connections, and novel applications, services, and protocols in
modern multi-gigabit computer networks [1]. As technology has advanced, network
links have gained higher capacities (especially with the development of
fiber-optic communications) [2], and consequently networking devices have
struggled to cope with the increased network traffic and to satisfy, in a timely
manner, the newly imposed requirements for high throughput and speed and low
delay [3].
© Springer Nature Switzerland AG 2019
R. Silhavy et al. (Eds.): CoMeSySo 2019, AISC 1046, pp. 142–154, 2019.
https://doi.org/10.1007/978-3-030-30329-7_14
Network processors (NPs) have become the most popular solution to the bottleneck
problem in constructing high-speed gigabit networks. Therefore, they are included
in various network equipment devices, such as routers, switches, firewalls and
IDS (Intrusion Detection Systems). In general, NPs are defined as programmable
chip devices that are specifically tailored to provide network packet processing
at multi-gigabit speeds [2–4]. They are usually implemented as
application-specific instruction processors (ASIPs), with a customized
instruction set that is based on RISC, CISC, VLIW or some other instruction set
architecture [3]. Over the last few years many vendors (Intel, Agere, IBM etc.)
have developed their own NPs, which has resulted in many NP architectures
existing on the market [3]. Although there is no standard NP architecture, most
NP designs include many processing engines (PEs), dedicated hardware
accelerators (coprocessors or functional units), adjusted memory architectures,
interconnection mechanisms, hardware parallelization techniques (e.g.
pipelining), and software support [4,5]. NP architecture design is an ongoing
field of research, and the NPU market is expected to achieve strong growth in the
near future. What is more, many new ideas, such as the NetFPGA architecture [6]
or software routers [7], are constantly emerging.
The most popular NPs in use today include one or many homogeneous or
heterogeneous processing cores that operate in parallel. For example, Intel’s
IXP2800 processor [8] consists of 16 identical multi-threaded general-purpose
RISC processors organized as a pool of parallel homogeneous processing cores that
can be easily programmed with great flexibility towards ever-changing services
and protocols. Furthermore, EZchip has introduced the first network processor
with 100 cache-coherent programmable ARM processor cores [9], which is by far
the largest 64-bit ARM processor yet announced. Along with the general-purpose
ARM cores, this novel chip also includes a mesh core interconnect architecture
that provides high bandwidth, low latency and high linear scalability.
The discussed NPs confirm that most network processing is performed by
general-purpose RISC-based processing cores (a cheaper but slower solution)
combined with custom-tailored hardware units (a more expensive but also more
energy-efficient and faster solution) that execute complex tasks such as traffic
management and fast table look-up. If network packet processing on
general-purpose processing cores is analyzed, it is easily seen that a
significant share of processor cycles is spent on accessing packet header fields,
especially when those fields are not word-aligned. In such cases, bit-wise
logical and arithmetic operations are needed to extract the field’s value from
the packet header (i.e. to parse the header) before it can be processed further.
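As a rough illustration of the bit-wise work described above, the following Python sketch extracts non-word-aligned fields from the first 32-bit word of an IPv4 header using shifts and masks. It assumes the word is interpreted in network byte order as laid out in RFC 791 (Version in the four most significant bits); the exact shift amounts depend on how the word is held in a register. The helper name `extract_field` and the sample word are hypothetical and are not taken from the paper.

```python
def extract_field(word: int, lsb: int, width: int) -> int:
    """Return `width` bits of a 32-bit word, starting at bit `lsb` (bit 0 = least significant)."""
    mask = (1 << width) - 1
    return (word >> lsb) & mask      # shift the field down, then mask off its neighbours

# Hypothetical first header word: Version=4, Header Length=5, ToS=0, Total Length=84.
first_word = 0x45000054

version = extract_field(first_word, 28, 4)
header_length = extract_field(first_word, 24, 4)
type_of_service = extract_field(first_word, 16, 8)
total_length = extract_field(first_word, 0, 16)
```

Every field read costs at least one shift or one mask on top of the memory load, which is the overhead the proposed parsing unit is meant to remove.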
Assuming that network processing usually begins by copying the packet header
into a memory buffer that is available for further processing by the processor,
this paper proposes a specialized IP header parsing unit that performs field
extraction operations directly on the memory buffer output, before forwarding the
IP header to the processor. In this way, the bit-wise logical and arithmetic
operations needed to extract IP header fields that are not word-aligned are
avoided, and the packet header fields are sent directly to the processor’s ALU
to be further evaluated and inspected by the processor. The proposed IP header
parsing unit is applied to a general-purpose MIPS processor [10] and to a
memory-centric processor that operates with on-chip memory [11], and then the
gain in IP header parsing speed for both processors is compared, discussed and
evaluated.
The rest of this paper is organized as follows: Sect. 2 gives an overview of
different approaches for improving network packet processing speed. Section 3
describes the proposed IP header parsing unit and provides details about its
design and operation. Section 4 presents and evaluates the simulation results
for the IP header parsing speed attained when the proposed parser is applied to
general-purpose processor architectures (a MIPS processor and a RISC-based
memory-centric processor). Section 5 concludes the paper, outlining the
performance gain that is achieved with the proposed IP header parsing unit.
Fig. 1. Read access to an IPv4/IPv6 header field with the IP header parsing hardware
unit
The IP header parsing unit assumes that the IPv4 or IPv6 packet headers are
placed in a fixed area of the memory (the headers buffer) before they are
processed. The descriptions of the IPv4 and IPv6 packet headers supported by the
IP header parsing unit give the type of the IP header and its location in memory
in the first line, while each following line defines a single field. For each
IP header field, its name and size in bits are specified, and the fields are
defined in the order in which they appear in the IP header.
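To make the role of such a description concrete, the sketch below derives, for each field, the word it falls in and its bit position inside that word from nothing more than the ordered list of field sizes. The dictionary representation, the address 0x0100, the field names and the function `layout_from_description` are hypothetical; the concrete syntax of the paper's description is not shown in this excerpt. The sketch also assumes 32-bit memory words, network bit order, and that no field crosses a word boundary, which holds for IPv4.

```python
WORD_BITS = 32

# Hypothetical in-memory form of a header description: type, location, ordered fields.
ipv4_description = {
    "type": "IPv4",
    "address": 0x0100,
    "fields": [
        ("version", 4), ("header_length", 4), ("type_of_service", 8), ("total_length", 16),
        ("identifier", 16), ("flags", 3), ("fragment_offset", 13),
        ("ttl", 8), ("protocol", 8), ("header_checksum", 16),
        ("source_address", 32), ("destination_address", 32),
    ],
}

def layout_from_description(description):
    """Map each field name to (word_offset, lsb, width), walking the fields in order."""
    layout = {}
    bit_position = 0                                   # bits consumed since the header start
    for name, size in description["fields"]:
        word_offset = bit_position // WORD_BITS        # word-aligned offset of the field
        bits_before = bit_position % WORD_BITS         # bits preceding it in that word
        lsb = WORD_BITS - bits_before - size           # position counted from the low end
        layout[name] = (word_offset, lsb, size)
        bit_position += size
    return layout

ipv4_layout = layout_from_description(ipv4_description)
```

The word offsets produced this way are the values a lookup table would hold, and the bit positions are what each field's extraction logic would need.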
The IP header starting address specified in the IP header description is used
to set the base register inside the field/data memory address generator of the
IP header parsing unit. This address generator module also receives a field
address, which the lookup table (LUT) translates into a field offset. The field
offset is a word-aligned offset from the starting address of the IP header, so
it points to the word in the headers buffer that holds the given IP header
field. This means that if a field is shorter than the memory word, the closest
word-aligned offset is stored in the LUT for that field. For example, the field
offset for the IPv4 fields placed in the first word of an IPv4 header (Version,
Header Length, Type of Service and Total Length) is 0000h, while for those in
the second word (Identifier, Flags and Fragment Offset) it is 0001h, and so on.
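The address generation just described can be modelled in a few lines. In this sketch the first two LUT rows follow the offsets given in the text, and the remaining rows follow the standard IPv4 layout of RFC 791. The class name `FieldAddressGenerator`, the string field keys, the base address 0x0100 and the assumption of word-addressed memory are illustrative choices, not details from the paper.

```python
# LUT: field -> word-aligned offset from the start of the IPv4 header.
IPV4_FIELD_OFFSET_LUT = {
    "version": 0x0000, "header_length": 0x0000,
    "type_of_service": 0x0000, "total_length": 0x0000,
    "identifier": 0x0001, "flags": 0x0001, "fragment_offset": 0x0001,
    "ttl": 0x0002, "protocol": 0x0002, "header_checksum": 0x0002,
    "source_address": 0x0003,
    "destination_address": 0x0004,
}

class FieldAddressGenerator:
    """Software model of the field/data memory address generator (word-addressed memory assumed)."""

    def __init__(self, header_start_address: int):
        self.base_register = header_start_address      # set from the header description

    def word_address(self, field: str) -> int:
        field_offset = IPV4_FIELD_OFFSET_LUT[field]    # LUT translation of the field address
        return self.base_register + field_offset       # address of the word holding the field

generator = FieldAddressGenerator(0x0100)
flags_address = generator.word_address("flags")
```

Fields that share a memory word resolve to the same address; telling them apart is left to the selector stage that follows the memory read.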
According to Fig. 1, the field offset selected from the LUT is added to the IP
header starting address to generate the address of the memory word that holds
the required IP header field. This address is applied to the memory (headers
buffer), and the word that is read is forwarded to the field/data selector. This
selector module consists of separate field logic (FL) blocks that extract the
values of the various IP header fields (FieldLogic1...N). Each field is
extracted by its own FL block, which is activated by an output enable (OE)
signal connected to a decoder output. The decoder is driven by the field
address, so only one FL block is selected at any given moment. The selected FL
block then performs bit-wise and/or shifting operations to extract and
zero-extend the appropriate IP header field. When the IP header field is
word-aligned, its FL block is empty and the word is forwarded directly from the
memory to the output of the field/data selector module.
Fig. 2. Write access to an IPv4/IPv6 header field with the IP header parsing hardware
unit
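The read path of Fig. 1 can be approximated in software as a table of per-field functions, of which exactly one is applied to the word read from the headers buffer. This is only a behavioural sketch: the names `make_field_logic`, `FIELD_TABLE` and `read_field`, the string field keys and the sample header words are hypothetical, and the bit positions assume the RFC 791 layout with 32-bit words in network byte order.

```python
WORD_BITS = 32

def make_field_logic(lsb, width):
    """Build one FL block for a field of `width` bits whose lowest bit is `lsb`."""
    if lsb == 0 and width == WORD_BITS:
        return lambda word: word                  # word-aligned field: empty FL block
    mask = (1 << width) - 1
    return lambda word: (word >> lsb) & mask      # shift, then mask (zero-extends the field)

# Per field: (word offset as stored in the LUT, FL block).
FIELD_TABLE = {
    "version":         (0, make_field_logic(28, 4)),
    "header_length":   (0, make_field_logic(24, 4)),
    "total_length":    (0, make_field_logic(0, 16)),
    "flags":           (1, make_field_logic(13, 3)),
    "fragment_offset": (1, make_field_logic(0, 13)),
    "ttl":             (2, make_field_logic(24, 8)),
    "source_address":  (3, make_field_logic(0, 32)),
}

def read_field(headers_buffer, base, field):
    offset, field_logic = FIELD_TABLE[field]      # decoder: exactly one FL block is selected
    word = headers_buffer[base + offset]          # address generation and memory read
    return field_logic(word)                      # output of the field/data selector

# Hypothetical header words, for illustration only.
sample_header = [0x45000054, 0x1C464000, 0x40011234, 0xC0A80001, 0xC0A800C7]
ttl = read_field(sample_header, 0, "ttl")
```

In hardware the shift and mask are fixed wiring inside each FL block, so the extraction adds no instructions to the program; the table lookup here stands in for the decoder and its output enable signals.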
Figure 1 shows the hardware of the IP header parsing unit that is used to read
out a single IP header field from the headers buffer. The same concept is used
for writing directly to an IP header field in the headers buffer, as shown in
Fig. 2. Figures 1 and 2 show that both modules use the same field/data memory
address generator logic to generate the address of the memory word that holds
the IP packet header field to be accessed.
The only difference between the two modules given in Figs. 1 and 2 is in the
field/data selector logic, since the FL blocks of the parsing unit receive two
inputs during writing: the word-aligned IP header data that was read from the
memory and the IP header field that will be written to the memory. To provide
write access to the required field, the decoder that is driven by the field
address activates only one of the FL blocks. This FL block places the input IP
header field at the appropriate position in the input word-aligned IP header
data. After that, the whole word, including the newly written IP header field,
is stored in the headers buffer at the generated address.
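The write path amounts to a read-modify-write of one memory word, which the following sketch models. The function name `write_field`, its parameters and the sample words are hypothetical; the TTL and Flags bit positions assume the RFC 791 layout with 32-bit words in network byte order.

```python
WORD_MASK = 0xFFFFFFFF

def write_field(headers_buffer, address, lsb, width, value):
    """Merge `value` into the field at bits [lsb, lsb + width) of the word at `address`."""
    field_mask = ((1 << width) - 1) << lsb
    old_word = headers_buffer[address]                 # input 1: word read from memory
    cleared = old_word & ~field_mask & WORD_MASK       # clear the old field bits
    positioned = (value << lsb) & field_mask           # input 2: new field, moved into place
    headers_buffer[address] = cleared | positioned     # store the whole word back
    return headers_buffer[address]

# Hypothetical header words, for illustration only.
buffer = [0x45000054, 0x1C464000, 0x40011234]
write_field(buffer, 2, 24, 8, 63)                      # overwrite the 8-bit TTL with 63
```

Only the targeted bits change; the neighbouring fields in the same word (Protocol and Header Checksum in this case) are preserved.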
The design of the IP header parsing unit is flexible, provided that the packet
header formats to be supported are well defined. The proposed parsing unit
currently operates with IP headers and can be extended to support other packet
header formats. In addition to this flexibility, the presented hardware approach
of direct access to IP header fields also yields much faster packet processing
than the bare general-purpose processing used by nearly all network processors.
A more detailed analysis of this is given in the next section.
Fig. 3. Assembly programs that perform IPv4 header parsing in MIPS and MIMOPS
processors, without and with IP header parsing hardware unit
the case of the Version field, only a logical AND operation is needed to set
all bits to zero except the last 4, which hold the field’s value. The Header
Length field, on the other hand, needs a shift first. After that, an AND
instruction is used to select the last 4 bits of the shifted word, which hold
the field’s value. All the other fields are also retrieved by shifting and
logical operations. The second program is equivalent to the first, except that
it targets a MIPS processor that operates with the IP header parsing unit. This
program addresses the fields directly, using mnemonics that start with the
letter ‘h’ followed by the number of the field, as specified in the header
description. Accordingly, only one instruction is needed to read out a header
field to a register.
The third and fourth programs, shown in Fig. 3, implement IPv4 header field
access in the plain MIMOPS processor and in the MIMOPS processor that includes
the IP header parsing unit, respectively. The third program has many
similarities with the first one, while the fourth program is similar to the
second one. Although the third program (intended for the MIMOPS processor)
operates directly on the IP header words that are placed in the on-chip memory,
it still has to extract the fields that are not word-aligned. The instructions
that perform field extraction can address up to three operands, where a 3-bit
immediate value signifies which of the operands use base addressing. The
extracted fields are placed in contiguous memory locations in the Fields array,
so that afterwards they can be directly accessed and processed by the ALU. On
the other hand, the fourth program (intended for the MIMOPS processor that
includes the IP header parsing unit) only has to set the base register to point
to the starting address of the IP header in order to gain direct access to the
IP header fields (specified with mnemonics, as in the second program).
Accordingly, an instruction that decrements the TTL (h9) field could simply be
written as SUB h9, h9, 1, allowing the ALU to process the extracted TTL field
immediately. This simplification could significantly speed up the complete
network processing of an IP packet.
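For comparison, the sketch below spells out the steps that a processor without the parsing unit would have to perform in software to achieve the effect of the single instruction SUB h9, h9, 1. It is a behavioural illustration in Python rather than MIPS or MIMOPS assembly; the function name `decrement_ttl` and the sample words are hypothetical, and the TTL position (top byte of the third header word) assumes the RFC 791 layout with 32-bit words in network byte order.

```python
WORD_MASK = 0xFFFFFFFF
TTL_WORD_OFFSET = 2                      # TTL lives in the third word of the IPv4 header
TTL_LSB, TTL_WIDTH = 24, 8
TTL_MASK = ((1 << TTL_WIDTH) - 1) << TTL_LSB

def decrement_ttl(headers_buffer, base):
    """Software-only TTL decrement: load, extract, subtract, merge, store."""
    address = base + TTL_WORD_OFFSET
    word = headers_buffer[address]                              # load the header word
    ttl = (word & TTL_MASK) >> TTL_LSB                          # mask and shift to extract
    ttl = (ttl - 1) & 0xFF                                      # the only real ALU work
    merged = (word & ~TTL_MASK & WORD_MASK) | (ttl << TTL_LSB)  # put the field back in place
    headers_buffer[address] = merged                            # store the header word
    # A real forwarder would also drop expired packets and update the checksum (not shown).
    return ttl

# Hypothetical header words, for illustration only.
header = [0x45000054, 0x1C464000, 0x40011234, 0xC0A80001, 0xC0A800C7]
new_ttl = decrement_ttl(header, 0)
```

With the parsing unit, the extraction and merge steps are handled by the field logic around the memory access, leaving only the subtraction for the ALU.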
Figure 4 shows MIPS and MIMOPS assembly programs that parse an IPv6
packet header, without and with the IP header parsing unit. The given programs
perform parsing of an IPv6 packet header, by extracting its fields: version, traffic
class, flow label, payload length, next header, hop limit, source IP address and
destination IP address. These programs are very similar to the ones related
to IPv4 header parsing and also consist of many bit-wise logical and shifting
operations.
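The same shift-and-mask pattern applies to the IPv6 fields listed above, as the following sketch shows. The field widths follow the standard IPv6 header (RFC 8200): Version 4 bits, Traffic Class 8, Flow Label 20, Payload Length 16, Next Header 8, Hop Limit 8, and two 128-bit addresses. The function names `parse_ipv6_header` and `join_words` and the sample header are hypothetical, and the header is assumed to be held as ten 32-bit words in network byte order.

```python
def join_words(words):
    """Concatenate 32-bit words (most significant first) into one integer."""
    value = 0
    for word in words:
        value = (value << 32) | word
    return value

def parse_ipv6_header(words):
    """Extract the IPv6 header fields from ten 32-bit words."""
    w0, w1 = words[0], words[1]
    return {
        "version":             w0 >> 28,
        "traffic_class":       (w0 >> 20) & 0xFF,
        "flow_label":          w0 & 0xFFFFF,
        "payload_length":      w1 >> 16,
        "next_header":         (w1 >> 8) & 0xFF,
        "hop_limit":           w1 & 0xFF,
        "source_address":      join_words(words[2:6]),    # spans four memory words
        "destination_address": join_words(words[6:10]),   # spans four memory words
    }

# Hypothetical header: 2001:db8::1 -> 2001:db8::2, 40-byte payload, ICMPv6, hop limit 64.
sample_ipv6 = [
    0x6A0B1234, 0x00283A40,
    0x20010DB8, 0x00000000, 0x00000000, 0x00000001,
    0x20010DB8, 0x00000000, 0x00000000, 0x00000002,
]
fields = parse_ipv6_header(sample_ipv6)
```

Unlike IPv4, the two address fields are wider than a memory word, so reading one of them takes four word accesses regardless of how the smaller fields are extracted.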
The comparative analysis of the MIPS and MIMOPS processors operating without
and with the IP header parsing unit is shown in Fig. 5. This analysis verifies
that the proposed parsing unit improves the IPv4/IPv6 header parsing speed,
providing an impressive speed-up for the MIMOPS processor.
The results of the comparative analysis are given in Fig. 5, where Fig. 5a/b
shows the execution time of an IPv4/IPv6 header parsing program, while
Fig. 5c/d illustrates the IPv4/IPv6 parsing speed improvement that is achieved
by using the IP header parsing unit in the MIPS and MIMOPS processors,
respectively. Referring to these results, it can be noticed that the MIPS processor that
Fig. 4. Assembly programs that perform IPv6 header parsing in MIPS and MIMOPS
processors, without and with IP header parsing hardware unit