NetFPGA10G 3ways NoAtrhors
I. INTRODUCTION

For the internal structure of a NetFPGA card you can use your own project, or you can modify one of the prepared demo projects, which are available on the website [1]. In my proposal I focus on efficient usage of FPGA chip performance. I assumed a new internal structure of the code, which allows me to perform more than one calculation at the same time. In hardware, you can dedicate a particular part, or several of them, to some very strictly defined tasks. If the designer multiplies the hardware, the efficiency is multiplied as well. By simply replicating the calculation part, we obtain a system of parallel calculating engines. We have to control them by sending task parameters and collecting the results of the calculations. In this paper I show three approaches to sending commands and receiving results, where both are placed in IP packets. I use a NetFPGA card, which can be programmed to serve Ethernet traffic. In the new architecture, however, the programmable core is changed; only the outer view is realized as in the primary assumption, i.e. with Ethernet frames and IP packets. Due to strict timing, commands and, above all, responses have to be processed in a proper way.

The rest of this paper is organized as follows: Section 2 describes NetFPGA cards, Section 3 presents the usage of NetFPGA cards as a calculating engine with the proposed internal structure, Section 4 describes software tools, and Section 5 defines approaches to sending and receiving commands via IP packets. The last section summarizes the paper and outlines future work.

II. NETFPGA CARDS

NetFPGA cards (Figure 1) have been developed by the University of Cambridge and Stanford University networking groups as an "open hardware" project [1]–[3]. A NetFPGA card is an extension card for a PC; it has four physical network interfaces (4 × 1 Gbps or 4 × 10 Gbps electrical or optical (SFP+) Ethernet ports) and an FPGA, i.e. programmable hardware, as the main chip. More details about the structure, architecture and usage of such a card can be found in the literature. Importantly, there is a public framework which implements basic network functionality (routers, switches and interface cards), and it is relatively easy to add our own functionality. The literature contains a lot of information about typical usage and projects prepared by the NetFPGA community [4]–[6].

In the primary assumption, which is presented in Figure 2a, the main chip of the NetFPGA card receives packets from physical ports (eth0-eth3) as input traffic, analyzes and modifies them, and sends them out to physical ports as output traffic. The functionality of the main chip is programmable and can be very flexible. The source and destination of traffic can also be logical interfaces, visible in the operating system as nf0-nf3 interfaces. A very important fact is that this functionality is realized in hardware; hence the performance of such an appliance allows it to serve the whole traffic at line speed, which for these Ethernet ports is 1 Gbps or 10 Gbps.
Fig. 2. Schema of modules on the NetFPGA card and inside the FPGA chip, and modified schema of the NetFPGA card; the modification is only in the FPGA chip
Fig. 3. Internal pipeline of modules on the NetFPGA card and inside the FPGA chip, and modified schema of the NetFPGA card; the modification is only in the FPGA chip
III. HARDWARE PART AS A SIMULATOR

In my approach, whose schematic is presented in Figure 2b, there are no typical modules in the FPGA chip. Only communication with the operating system through one interface is used; it is visible as one active interface in the operating system. All the traffic sent to this interface is visible to all modules; they identify traffic dedicated to them by a given IP/MAC address, or even by VLAN ID or UDP port. The return path for data is realized analogously: each module sends traffic with given parameters. The control application in the operating system serves this interface in promiscuous mode and receives all frames.

Figure 3a presents the primary pipeline, where 8 incoming queues and 8 outgoing queues are implemented, because they serve traffic from two representations of 4 interfaces in two directions. In the proposed architecture only one buffer is realized, because the nature of the transferred frames is different. In the primary version, the 8 incoming traffic streams were fully served thanks to internal speedup. The main difference between the primary and the proposed architecture is the use of parallel custom modules (Figure 3b), each of which can calculate its task at the same time. Internal speedup is not needed here: each module calculates at its maximal speed and multiple modules do so at the same time, hence the final calculation performance is the product of the performance of one module and the number of modules. Proper control and timing mechanisms should be used to send requests to particular modules, and an adequate way of receiving responses has to be implemented (in the module named output module arbiter). This is reflected in the functionality realized in the software part.

The same FPGA chip (Virtex 5) is used on the developers boards ML555 [7] and ML505 [8] (Figures 4 and 5), which are available in my laboratory, so the same procedure and architecture can be used in projects for them. Those boards have only 1 Gbps Ethernet SFP cages, but this is not a problem, because the new architecture is focused inside the FPGA chip, which is the same on all three analyzed boards.
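The address-based selection described in this section can be illustrated with a small software model. The sketch below is not the hardware design itself: the names FrameHeader, CustomModule, broadcast and aggregate_rate are hypothetical, and matching on the triple of destination MAC, destination IP and UDP port is only one of the identification options mentioned in the text. It shows that every module sees each frame in the shared buffer while only the matching module reads it, and that the ideal aggregate performance is the per-module performance times the number of modules.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class FrameHeader:
    """Hypothetical subset of header fields used to address a module."""
    dst_mac: str
    dst_ip: str
    udp_port: int


@dataclass
class CustomModule:
    """Software stand-in for one parallel calculating module."""
    name: str
    mac: str
    ip: str
    udp_port: int
    inbox: List[bytes] = field(default_factory=list)

    def matches(self, hdr: FrameHeader) -> bool:
        # A module accepts a frame only if all addressing fields match its own.
        return (hdr.dst_mac, hdr.dst_ip, hdr.udp_port) == (
            self.mac, self.ip, self.udp_port)


def broadcast(modules: List[CustomModule], hdr: FrameHeader,
              payload: bytes) -> List[str]:
    """Show one frame to every module; return names of modules that read it."""
    readers = []
    for module in modules:
        if module.matches(hdr):
            module.inbox.append(payload)
            readers.append(module.name)
    return readers


def aggregate_rate(per_module_rate: float, n_modules: int) -> float:
    """Ideal total performance: one module's rate times the module count."""
    return per_module_rate * n_modules
```

With unique addresses per module, broadcast returns exactly one name for a correctly addressed frame and an empty list for traffic that no module claims. The aggregate figure is an upper bound that assumes requests and responses never wait for each other.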
of work. When the simulator is prepared and configured (i.e. the IP/MAC addresses used in the software part are also implemented in the custom modules), tasks can be sent to the hardware. When the simulator decides to run some task and the module for this task is ready, the parameters of the task are packed into an IP payload and sent to the NetFPGA card as a typical IP packet packed in an Ethernet frame. This block of data, after passing DMA over the PCI bus, is placed in the input buffer, where it is visible to every custom module. Only one module has matching IP/MAC addresses, so only one custom module reads this data. The content of the IP payload is used as the parameters for the calculations. When the calculations are finished and the results are ready, they are packed into an IP packet in order to be received by DMA and the software part. However, the output of the custom modules has to be served in a proper way to avoid problems. Different types of problems call for different approaches to solving them; the realization of these approaches is described further.

Fig. 4. ML555 developers board from Xilinx
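The packing of task parameters into an IP packet inside an Ethernet frame can be sketched as follows. This is a minimal illustration under stated assumptions, not the format used by the author: the payload layout (a 32-bit task identifier, a 16-bit parameter count, then 32-bit parameters), the use of UDP as the transport, and the helper names build_task_frame, ipv4_checksum, mac_bytes and ip_bytes are all hypothetical. Only the Python standard library is used.

```python
import struct
from typing import Sequence


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of 16-bit words (RFC 1071)."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def mac_bytes(mac: str) -> bytes:
    return bytes(int(part, 16) for part in mac.split(":"))


def ip_bytes(ip: str) -> bytes:
    return bytes(int(part) for part in ip.split("."))


def build_task_frame(src_mac: str, dst_mac: str, src_ip: str, dst_ip: str,
                     src_port: int, dst_port: int,
                     task_id: int, params: Sequence[int]) -> bytes:
    """Build Ethernet + IPv4 + UDP headers around a task payload."""
    # Assumed payload layout: task id, parameter count, parameters.
    payload = struct.pack(f"!IH{len(params)}I", task_id, len(params), *params)

    udp_len = 8 + len(payload)
    # UDP checksum 0 means "not computed", which IPv4 permits.
    udp = struct.pack("!HHHH", src_port, dst_port, udp_len, 0)

    total_len = 20 + udp_len
    # Version/IHL 0x45, TTL 64, protocol 17 (UDP), checksum field zeroed.
    ip_hdr = struct.pack("!BBHHHBBH4s4s", 0x45, 0, total_len, 0, 0,
                         64, 17, 0, ip_bytes(src_ip), ip_bytes(dst_ip))
    checksum = ipv4_checksum(ip_hdr)
    ip_hdr = ip_hdr[:10] + struct.pack("!H", checksum) + ip_hdr[12:]

    # EtherType 0x0800 marks an IPv4 payload.
    eth = mac_bytes(dst_mac) + mac_bytes(src_mac) + struct.pack("!H", 0x0800)
    return eth + ip_hdr + udp + payload
```

The destination MAC, IP address and UDP port select the custom module, so the software part must use the same values that are implemented in that module. Transmitting the resulting bytes, for example through a raw socket bound to the card's interface, is outside this sketch.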