Mellanox ConnectX®-5 25GbE Ethernet Adapter: The Bottom Line Executive Summary
#219130
Commissioned by
Mellanox Technologies
[Figure: Aggregate Throughput (Gbps); Broadcom could not complete the test of 2,048 connections]
FIO Tests

These storage benchmarking tests demonstrated the many-to-one networked "write" performance of the NICs under test at three increasing load levels.

The Mellanox performance was higher at all three load levels. See Figure 4. The Mellanox aggregate throughput for a single job was 33Gbps, more than 50% greater than Broadcom's 21Gbps.

With 12 concurrent jobs, the Mellanox aggregate throughput was 36Gbps, which was 20% greater than Broadcom's 30Gbps.

With 50 concurrent jobs, the Mellanox aggregate throughput was 35Gbps. Engineers were unable to complete the 50-job test with the Broadcom solution. After testing was completed and Tolly shared the results with Broadcom, the vendor noted that they could not reproduce the problem with their current software version.
DPDK Tests

DPDK consists of libraries to accelerate packet-processing workloads running on a wide variety of CPU architectures. Run on a single system outfitted with a dual-port 25GbE NIC, this test illustrated the lossless throughput of the NIC at the standard RFC2544 packet sizes, which ranged from 64 to 1518 bytes.

…rate throughput for Broadcom. At 128- through 512-byte packets, Broadcom throughput delivered 70, 71 and 73% of line rate, respectively.

DPDK CPU Cycles Per Packet

The final test was run on a single system and was used to gauge the CPU efficiency of the NIC under test. The TestPMD utility was used to generate a stream of 1518-byte packets and measure the number of CPU cycles required to process a single packet. The test was run using single, dual and quad cores active in the server.

The Mellanox solution demonstrated better CPU efficiency in all tests, with dramatically lower CPU requirements. See Figure 6.

Mellanox needed only 49 CPU cycles in the single- and dual-core tests and only 59 cycles in the quad-core test. By contrast, Broadcom required 534 cycles for the single-core test, 838 cycles for the dual-core and 907 cycles for the quad-core. Broadcom's CPU demand was 15X that of Mellanox in the quad-core test.

[Figure 6: DPDK CPU Cycles Per 1518-Byte Packet - 25GbE; Mellanox ConnectX-5 vs. Broadcom NetXtreme E (lower numbers are better)]
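For context, those cycle counts can be compared against the per-core cycle budget available at line rate. The short calculation below is our own illustration, not part of the Tolly methodology; the 2.3GHz clock is the base frequency of the HPE server's Gold 5118 CPUs (Table 3).

    awk 'BEGIN {
      wire_bits = (1518 + 8 + 12) * 8   # 1518B frame + 8B preamble + 12B inter-frame gap
      pps = 25e9 / wire_bits            # ~2.03 million frames/sec per 25GbE port
      printf "%.0f cycles per frame\n", 2.3e9 / pps   # ~1132-cycle budget per 2.3GHz core
    }'

Against that roughly 1,100-cycle-per-frame budget, 49 to 59 cycles leaves ample headroom, while 907 cycles consumes most of it.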
Test Setup & Methodology

Details of server configuration, SUT configurations, release levels and relevant information are found in Tables 1-3.
Test Infrastructure

Table 2
  Make & Model: Dell PowerEdge R630
  CPU: 2 x Intel(R) Xeon(R) CPU E5-2660 v4 @ 2.0GHz, 14 physical cores per socket, HyperThreading=ON
  Memory (RAM): 32GB per NUMA node, 16GB DIMMs DDR4 @ 2400 MT/s, Dual Rank
  Server OS: FIO: Red Hat Enterprise Linux Server 7.6; RoCE: RHELS 7.4

Table 3
  Make & Model: HPE ProLiant DL380 Gen10 (868703-B21)
  CPU: 2 x Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz, 12 physical cores per socket, HyperThreading=ON
  Add'l software for Broadcom: libbnxt_re-214.0.181.0-rhel7u6.x86_64.rpm

Source: Tolly, June 2019
NVMe-oF devices '/dev/nvme0n0' and '/dev/nvme0n1' then appear on each Initiator (one for each NIC port).

The FIO storage benchmark is run on each Initiator to issue read/write requests to '/dev/nvme0n0' and '/dev/nvme0n1'. NVMe-oF transports the read/write requests via RDMA to the Target.
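The report does not list the Initiator-side connection commands. A minimal sketch using the standard nvme-cli RDMA transport might look like the following; the Target address and subsystem NQN are placeholders of our own, not values from the test.

    modprobe nvme-rdma                              # load the NVMe-oF RDMA initiator module
    nvme discover -t rdma -a 192.168.1.10 -s 4420   # list subsystems exported by the Target
    nvme connect -t rdma -a 192.168.1.10 -s 4420 -n testnqn   # remote namespaces appear as /dev/nvme*
    nvme list                                       # confirm the remote devices are visible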
FIO Parameters

• job composition "--rw":
  • 'randread' = the workload consists of 100% read requests, to random locations in the destination file
  • 'randwrite' = the workload consists of 100% write requests, to random locations in the destination file
• number of FIO processes "--numjobs": 1, 12, or 50
• blocksize: 4KB
• runtime: 70 sec. "--ramp_time": 5 sec.
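The report gives the parameters but not a full command line. One plausible assembly is sketched below; the device path, ioengine and direct-I/O flag are our assumptions, not documented test settings.

    fio --name=nvmeof-randwrite --filename=/dev/nvme0n1 \
        --rw=randwrite --bs=4k --numjobs=50 \
        --runtime=70 --ramp_time=5 --time_based \
        --direct=1 --ioengine=libaio --group_reporting
    # --rw=randread selects the 100%-read workload; --numjobs was 1, 12 or 50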
The test results are: 1) the sum of the FIO-reported BW on all the Initiators (this sum is limited by the Target's NIC TX/RX on both ports = 50GbE) and 2) the sum of the FIO-reported IOPS on all the Initiators (this sum is likewise limited by the Target's NIC TX/RX on both ports = 50GbE).

DPDK Tests
Data Plane Development Kit (DPDK) consists of libraries to accelerate packet processing, according to its source, The Linux Foundation. These are low-level, hence technical, tests, so the test description is necessarily very granular. Two tests were run on DPDK. L3fwd was run to find the maximum rate of throughput for standard RFC2544 packet sizes with a maximum frame loss of 0.001%. TestPMD was run against 100% line-rate traffic to determine the number of host CPU cycles required to process a given frame. Tests were conducted on RHEL 7.5. DPDK 19.02 was used for testing.

TestPMD Procedure

• Install clean Red Hat Enterprise Linux 7.5
• Install MLNX_OFED_LINUX-4.6-0.3.1.0
• Enable mlx5 PMD before compiling DPDK:
  • In the .config file generated by "make config",
  • set: "CONFIG_RTE_LIBRTE_MLX5_PMD=y"
  • set: "CONFIG_RTE_TEST_PMD_RECORD_CORE_CYCLES=y"
• Compile DPDK
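Under DPDK 19.02's legacy make-based build system, those steps might translate into something like the following; the build target string and the use of sed to flip the two options are our illustration, not commands from the report.

    make config T=x86_64-native-linuxapp-gcc        # generate build/.config
    # enable the mlx5 poll-mode driver and per-core cycle accounting called for above
    sed -ri 's/(CONFIG_RTE_LIBRTE_MLX5_PMD=).*/\1y/' build/.config
    sed -ri 's/(CONFIG_RTE_TEST_PMD_RECORD_CORE_CYCLES=).*/\1y/' build/.config
    make -j"$(nproc)"                               # compile DPDK, including testpmd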
TestPMD Optimization

During testing, TestPMD was given real-time scheduling priority. An Ixia traffic generator was used to generate 100% line-rate 1518-byte frames. Only TestPMD cycles-per-frame results were recorded, as packet throughput/loss was not relevant in this scenario.

*Note: For Broadcom, Flow Control was disabled via a TestPMD runtime command.

(For Mellanox only):
a) Flow Control OFF:
"ethtool -A $netdev rx off tx off"

(For System):
b) Memory optimizations:
"sysctl -w vm.zone_reclaim_mode=0";
"sysctl -w vm.swappiness=0"

c) Move all IRQs to the far NUMA node:
"IRQBALANCE_BANNED_CPUS=$LOCAL_NUMA_CPUMAP irqbalance --oneshot"

d) Disable irqbalance:
"systemctl stop irqbalance"

(For Mellanox only):
e) Change PCI MaxReadReq to 1024B for each port of each NIC:
Run "setpci -s $PORT_PCI_ADDRESS 68.w"; it will return 4 digits ABCD -->
Run "setpci -s $PORT_PCI_ADDRESS 68.w=3BCD"

f) Set CQE COMPRESSION to "AGGRESSIVE":
"mlxconfig -d $PORT_PCI_ADDRESS set CQE_COMPRESSION=1"

(For System):
g) Disable Linux realtime throttling:
"echo -1 > /proc/sys/kernel/sched_rt_runtime_us"
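Gathered into one place, the optimizations above might look like this hypothetical script; $netdev, $PORT_PCI_ADDRESS and $LOCAL_NUMA_CPUMAP must be supplied for the NIC under test, and the 3BCD value in step e) must reuse the last three digits returned by the read.

    #!/bin/bash
    # Hypothetical consolidation of optimizations a)-g); variable values are
    # site-specific and the 68.w write must keep digits BCD from the prior read.
    ethtool -A "$netdev" rx off tx off                      # a) flow control off (Mellanox only)
    sysctl -w vm.zone_reclaim_mode=0                        # b) memory optimizations
    sysctl -w vm.swappiness=0
    IRQBALANCE_BANNED_CPUS=$LOCAL_NUMA_CPUMAP irqbalance --oneshot   # c) move IRQs to far NUMA node
    systemctl stop irqbalance                               # d) disable irqbalance
    setpci -s "$PORT_PCI_ADDRESS" 68.w=3BCD                 # e) MaxReadReq=1024B (Mellanox only)
    mlxconfig -d "$PORT_PCI_ADDRESS" set CQE_COMPRESSION=1  # f) aggressive CQE compression (Mellanox only)
    echo -1 > /proc/sys/kernel/sched_rt_runtime_us          # g) disable realtime throttling
    # TestPMD itself was run with real-time priority, e.g. via chrt (our assumption);
    # core counts and queue settings here are illustrative only:
    chrt -f 99 ./build/app/testpmd -l 0-4 -n 4 -- --nb-cores=4 --rxq=1 --txq=1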
L3fwd Procedure

An Ixia traffic generator was used to generate layer 3 (L3) traffic according to IETF RFC2544. Traffic with Ethernet and IP headers, with 8K different srcIP addresses, was sent from each of the two Ixia ports. Frame sizes were: 64, 128, 256, 512, 1024, 1280, and 1518 bytes.
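The report does not show the l3fwd command line. A representative dual-port launch for DPDK 19.02, with core and queue assignments that are illustrative choices of our own, would be:

    # Hypothetical l3fwd launch: ports 0 and 1 (mask 0x3), queue 0 of each port
    # pinned to lcores 1 and 2; adjust the binary path and cores for the system under test.
    ./examples/l3fwd/build/l3fwd -l 1-2 -n 4 -- \
        -p 0x3 --config="(0,0,1),(1,0,2)"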
Terms of Usage
This document is provided, free-of-charge, to help you understand whether a given product, technology or service merits additional
investigation for your particular needs. Any decision to purchase a product must be based on your own assessment of suitability
based on your needs. The document should never be used as a substitute for advice from a qualified IT or business professional. This
evaluation was focused on illustrating specific features and/or performance of the product(s) and was conducted under controlled,
laboratory conditions. Certain tests may have been tailored to reflect performance under ideal conditions; performance may vary
under real-world conditions. Users should run tests based on their own real-world scenarios to validate performance for their own
networks.
Reasonable efforts were made to ensure the accuracy of the data contained herein but errors and/or oversights can occur. The test/
audit documented herein may also rely on various test tools the accuracy of which is beyond our control. Furthermore, the
document relies on certain representations by the sponsor that are beyond our control to verify. Among these is that the software/
hardware tested is production or production track and is, or will be, available in equivalent or better form to commercial customers.
Accordingly, this document is provided "as is," and Tolly Enterprises, LLC (Tolly) gives no warranty, representation or undertaking,
whether express or implied, and accepts no legal responsibility, whether direct or indirect, for the accuracy, completeness, usefulness
or suitability of any information contained herein. By reviewing this document, you agree that your use of any information contained
herein is at your own risk, and you accept all risks and responsibility for losses, damages, costs and other consequences resulting
directly or indirectly from any information or material available on it. Tolly is not responsible for, and you agree to hold Tolly and its
related affiliates harmless from any loss, harm, injury or damage resulting from or arising out of your use of or reliance on any of the
information provided herein.
Tolly makes no claim as to whether any product or company described herein is suitable for investment. You should obtain your own
independent professional advice, whether legal, accounting or otherwise, before proceeding with any investment or project related
to any information, products or companies described herein. When foreign translations exist, the English document is considered
authoritative. To assure accuracy, only use documents downloaded directly from Tolly.com. No part of any document may be
reproduced, in whole or in part, without the specific written permission of Tolly. All trademarks used in the document are owned by
their respective owners. You agree not to use any trademark in or as the whole or part of your own trademarks in connection with
any activities, products or services which are not ours, or in a manner which may be confusing, misleading or deceptive or in a
manner that disparages us or our information, projects or developments.