
BRKINI-2005

Engineering Fast I/O to the Network

David Nguyen, UCS Technical Marketing Engineers


Cisco Spark
Questions?
Use Cisco Spark to communicate
with the speaker after the session

How
1. Find this session in the Cisco Live Mobile App
2. Click “Join the Discussion”
3. Install Spark or go directly to the space
4. Enter messages/questions in the space

cs.co/ciscolivebot#BRKINI-2005

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public
Agenda

• VIC 101
• Enic tuneables
• Hypervisor Optimization
• Kernel Bypass Technologies
• VIC CLI Monitoring Commands
Sometimes the default is not a bad thing
Tuning
• Many layers of tuning
• Application
• Kernel (network socket)
• Host (HW/BIOS)
• Adapter

• Lab != real world


• Test, test, and more testing

I/O for all Platforms
• Converged Network Adapter (CNA)
• Multi-form factor support
• mLOM* and PCIe based for rack servers
• mLOM* and Mezzanine for blade servers
• Multi-speed support
• 10G, 20G, and 40G connectivity support
• Multi-management support
• CIMC – rack servers in standalone mode
• UCSM – rack and blade servers that are connected to the FI either directly or via IOM/FEX

[Figure: adapter form factors: mLOM (rack), PCIe, Mezzanine, mLOM (blade)]

* Note: mLOMs are not interchangeable between rack and blade servers

VIC Generation Comparison

Features                                 VIC 1200                    VIC 1300
PCIe                                     Gen-2 x16                   Gen-3 x16 (dual x8)
Speed                                    10G, 20G*, or 40G*          10G, 20G*, or 40G**
PPS                                      x                           x
VIF                                      256                         256+
Offloads                                 IPv4 and IPv6 L3/L4         IPv4 and IPv6 L3/L4
                                         Checksum, TSO, LSO          Checksum, TSO, LSO
FCOE
QOS
Netflow
VM-FEX Technology (UCSM Only - SRIOV)
Kernel Bypass Technology                 usNIC/Intel OpenMPI/DPDK    usNIC/Intel OpenMPI/DPDK
Netqueue/VMQ
Network Overlay Offload NVGRE/VxLAN
ROCE Support v1

* Multiple 10G bundled in a port-channel

** Operate either 10G bundle or native 40G interface

Standalone
CIMC
• Logical PCIe vNIC adapters to the OS
• Up to 16 vNIC and 2 vHBA
• Default is 2 vNIC and 2 vHBA
• No vNIC-to-vNIC forwarding on the VIC
• Upstream device
• LLDP/DCBx for link connectivity
• Disable LLDP if the upstream switch does not support DCBx
• LACP is tagged with vlan-id 0 for priority
• If the switch does not recognize vlan-id 0, use switch-independent NIC teaming for Active/Active

VN-Tag: Instantiation of Virtual Interfaces
• Virtual interfaces (VIFs) help distinguish between FC and Eth interfaces
• They also identify the origin server
• VIFs are instantiated on the FI and correspond to frame-level tags assigned to the adapter cards
• A 6-byte tag (VN-Tag) is prepended by the VIC as traffic leaves the server to identify the interface
• VN-Tag associates frames to a VIF
• VIFs are ‘spawned off’ the server’s EthX/Y/Z interfaces (examples follow)

Abstracting the Logical Architecture
• Dynamic, Rapid Provisioning
• State abstraction
• Location Independence
• Blade or Rack

[Figure: Physical vs. Logical view. Physical: UCS-FI switch port Eth 1/1, 10GE cable, IOM A, adapter, blade (server) with vHBA 1 and vNIC 1. Logical: vFC 1 and vEth 1 on the UCS-FI, joined by a virtual cable (VN-Tag) to vHBA 1 and vNIC 1 of the service profile (server).]

VIC Block ASIC Diagram
M4 Blade and Rack Servers
• Dual Uplink Interface (UIF)
• Multispeed support, 10G to 40G, based on number of active lanes
• 2 host PCIe x8 Gen 3 connections to the compute*
• Each host connection provides up to ~63Gbps**

[Figure: 3rd Gen VIC ASIC with UIF 0 and UIF 1 (10G or 40G connection) and two PCIe x8 Gen 3 host links into the PCIe x16 connector]

* VIC 1387 and S3260 SIOC are the exceptions, x8 connection

** Theoretical value, actual BW will vary
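
The ~63Gbps figure per host connection is consistent with the raw arithmetic for a PCIe Gen 3 x8 link. A minimal sketch, assuming the standard Gen 3 signaling rate of 8 GT/s per lane and 128b/130b line encoding; packet-level protocol overhead is ignored, so real throughput will be lower. The function name is illustrative only.

```python
def pcie_gen3_gbps(lanes: int) -> float:
    """Theoretical PCIe Gen 3 bandwidth in Gbps for one direction."""
    gt_per_lane = 8.0       # Gen 3 signaling rate per lane, in GT/s
    encoding = 128 / 130    # 128b/130b line-encoding efficiency
    return lanes * gt_per_lane * encoding


if __name__ == "__main__":
    print(f"x8:  {pcie_gen3_gbps(8):.1f} Gbps")
    print(f"x16: {pcie_gen3_gbps(16):.1f} Gbps")
```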

Virtual I/O Autonomy
• Virtual I/O is independent from one another
• Independent adapter profile
• COS, MTU, # of queues, ring sizes, etc.
• Flexible I/O placement
• Uplink
• Host PCIe connection

[Figure: virtual interfaces 1-5 placed across UIF0 (to network A) and UIF1 (to network B) and across host PCIe 1 and host PCIe 2, each a PCIe x8 Gen 3 link into the PCIe x16 connector; legend: vNIC, vHBA]

VIC1340

• Supported in mLOM slot only


• Multispeed operation, 10G or 40G
• IOM will dictate the speed of operation
• Port Expander is optional
• Increases the available bandwidth to the server
• Occupies the mezzanine slot
• Passive device, requires VIC1340 in order to operate

Blade VIC1340 (mLOM) Connectivity Comparison
2204 vs 2208 or 2304

2204 (VIC1340 UIF0 to IOM-A, UIF1 to IOM-B)
• Dual-10Gbps
• Total BW to the server is 20G

2208 or 2304 (VIC1340 UIF0 to IOM-A, UIF1 to IOM-B)
• Dual-20Gbps
• HW 2x10G port-channel
• Hash based on traffic type
• Total BW to the server is 40G

B200M4
40G Interface – VIC 1340 plus Port-Expander
4x10G vs Native 40G

2204 (VIC1340 + PE, UIF0 to IOM-A, UIF1 to IOM-B)
• Dual-20Gbps
• HW 2x10G port-channel
• Hash based on traffic type
• Max BW for a given flow is 10G
• Total BW to the server is 40G

2208 (VIC1340 + PE, UIF0 to IOM-A, UIF1 to IOM-B)
• Dual-40Gbps
• HW 4x10G interface
• Hash based on traffic type
• Max BW for a given flow is 10G
• Total BW is 80G

2304 (VIC1340 + PE, UIF0 to IOM-A, UIF1 to IOM-B; true 40G interface)
• Dual-40Gbps
• Native 40G interface via 40G protocol
• Bit spray across 4 lanes
• Max BW for a given flow is 40G
• Total BW is 80G
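
The per-flow ceiling above follows from how frames are placed on lanes: a hashed port-channel pins each flow to one 10G member, while a native 40G interface sprays every frame across all four lanes. A rough sketch of that difference; the CRC32 hash and the 5-tuple key are stand-ins chosen for illustration, since the slides do not state the actual hash the VIC uses.

```python
import zlib


def pick_member_link(flow: tuple, num_links: int) -> int:
    """Map a flow to one member link; the same flow always lands on the same link."""
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % num_links  # illustrative hash, not the VIC's


def max_single_flow_gbps(native_40g: bool, lanes: int = 4, lane_gbps: int = 10) -> int:
    """Ceiling for one flow: a single lane when hashed, all lanes when bit-sprayed."""
    return lanes * lane_gbps if native_40g else lane_gbps


if __name__ == "__main__":
    flow = ("10.0.0.1", "10.0.0.2", 6, 49152, 445)  # hypothetical 5-tuple
    print("member link:", pick_member_link(flow, 4))
    print("hashed 4x10G:", max_single_flow_gbps(native_40g=False), "G per flow")
    print("native 40G:", max_single_flow_gbps(native_40g=True), "G per flow")
```

Because the hash is deterministic, adding more member links raises aggregate bandwidth but never the bandwidth of any one flow.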

UCSM
• Implicit or Explicit VIF distribution
• User can modify the adapter, admin host port, and/or desired order

[Figure: compute node with two VICs, each with UIF0 and UIF1 uplinks to networks A and B and two host PCIe x8 Gen 3 connections into the PCIe x16 connector, with numbered virtual interfaces 1-5]

Implicit Method
VIF Distribution between Adapters
• Distribute based on adapter capabilities
• Cisco VIC vs 3rd party
• Divide VIF into groups based on weight
• Weight is a ratio between the adapters of max numbers of VIF supported
• Assign groups to the adapters

[Figure: VIC1340 and VIC1380]

Examples of Use Cases of Implicit Method

Even numbered vEth
• VIC1340 and VIC1380
• 8 vEths (veth0-7)
• 4 per Fabric Interconnect
• vEths are ordered from low to high
• Since VIC1340 and VIC1380 have the same VIF scale number, vEth distribution is 1:1
• vEth[0-3] are in group 1 and are assigned to VIC1340
• vEth[4-7] are in group 2 and are assigned to VIC1380

Odd numbered vEth
• VIC1340 and VIC1380
• 7 vEths (veth0-6)
• 4 vEth on FI-A and 3 vEth on FI-B
• vEths are ordered from low to high
• VIF distribution is 1:1
• vEth[0-4] are in group 1 and are assigned to VIC1340
• vEth[5-7] are in group 2 and are assigned to VIC1380
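
The implicit method can be sketched as a weighted split of the ordered vEth list. This is only an illustration of the idea on the slide, not UCSM's actual algorithm: the per-adapter maximum VIF counts below are hypothetical equal values, and giving the extra vEth to the earlier adapter when the count is odd is an assumption.

```python
def distribute_vifs(vifs, max_vifs_per_adapter):
    """Split an ordered vEth list into contiguous groups, one group per adapter."""
    total_weight = sum(max_vifs_per_adapter.values())
    adapters = list(max_vifs_per_adapter.items())
    groups, start, running = {}, 0, 0
    for index, (adapter, weight) in enumerate(adapters):
        running += weight
        if index == len(adapters) - 1:
            end = len(vifs)  # the last adapter takes whatever remains
        else:
            # ceiling division: an odd remainder goes to the earlier adapter (assumption)
            end = -(-len(vifs) * running // total_weight)
        groups[adapter] = vifs[start:end]
        start = end
    return groups


if __name__ == "__main__":
    veths = [f"veth{i}" for i in range(8)]
    # Equal hypothetical VIF scale numbers give the 1:1 ratio from the even example.
    for adapter, group in distribute_vifs(veths, {"VIC1340": 256, "VIC1380": 256}).items():
        print(adapter, group)
```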

Implicit Method
VIF Distribution between Admin Host Port
• Distribute VIF across admin host ports as evenly as possible
• Divide into equal parts and distribute as a group
• Similar distribution logic as with the adapter distribution

[Figure: VIC1340 and VIC1380, each with Admin Host Port 1 and Admin Host Port 2]

Explicit Method 1
Service Profile/Service Profile Template

[Screenshot callouts: VIC Placement, Host Placement, Order of the vNIC]

Explicit Method 2
Policy Based – vCon Policy

• Virtual (Network Interface) Connections, or vCons, are abstractions of physical adapters and provide a method for consistent PCIe mappings for a service profile regardless of how many physical cards are installed
• To provide a mobility mechanism between different resources (i.e., 1 adapter vs many adapters) or form factors (i.e., blades vs racks)

[Figure: Rack and Blade servers, each with vCon1-vCon4 carrying hba0 and eth0-eth7]

vCon Mapping
• 4 vCon Scheme
• 2-step mapping
• 1st is adapter to vCon
• 2nd is virtual interfaces to vCon
• Two types of mapping
• Round-robin
• Linear Mapping (default since 2.1+)

Linear Mapping
# of Adapters   vCon1   vCon2   vCon3   vCon4
1               1       1       1       1
2               1       1       2       2
3               1       2       3       3
4               1       2       3       4

Round-Robin Mapping
# of Adapters   vCon1   vCon2   vCon3   vCon4
1               1       1       1       1
2               1       2       1       2
3               1       2       3       3
4               1       2       3       4

vCon Mapping

Linear Mapping
# of Adapters   vCon1   vCon2   vCon3   vCon4
1               1       1       1       1
2               1       1       2       2
3               1       2       3       3
4               1       2       3       4

vNICs   Desired vCon   Actual vCon   Desired Order   Actual Order
vEth0   1              1             1               1
vEth1   2              1             2               2
vEth2   3              2             3               1
vEth3   4              2             4               2

Round-Robin Mapping
# of Adapters   vCon1   vCon2   vCon3   vCon4
1               1       1       1       1
2               1       2       1       2
3               1       2       3       3
4               1       2       3       4

vNICs   Desired vCon   Actual vCon   Desired Order   Actual Order
vEth0   1              1             1               1
vEth1   2              2             2               1
vEth2   3              1             3               2
vEth3   4              2             4               2
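
The two mapping tables and the desired/actual placement can be reproduced with a simple lookup. A sketch under two assumptions inferred from the numbers rather than stated on the slide: the example uses 2 adapters, and the actual order is assigned per adapter in desired order. Function and variable names are illustrative.

```python
# vCon-to-adapter tables copied from the slide, keyed by number of adapters.
LINEAR = {1: (1, 1, 1, 1), 2: (1, 1, 2, 2), 3: (1, 2, 3, 3), 4: (1, 2, 3, 4)}
ROUND_ROBIN = {1: (1, 1, 1, 1), 2: (1, 2, 1, 2), 3: (1, 2, 3, 3), 4: (1, 2, 3, 4)}


def place_vnics(desired, num_adapters, table):
    """desired: list of (vnic, desired_vcon) already sorted by desired order."""
    next_order = {}
    placement = []
    for vnic, vcon in desired:
        adapter = table[num_adapters][vcon - 1]               # step 1: vCon -> adapter
        next_order[adapter] = next_order.get(adapter, 0) + 1  # step 2: order on that adapter
        placement.append((vnic, adapter, next_order[adapter]))
    return placement


if __name__ == "__main__":
    desired = [("vEth0", 1), ("vEth1", 2), ("vEth2", 3), ("vEth3", 4)]
    for name, table in (("linear", LINEAR), ("round-robin", ROUND_ROBIN)):
        print(name, place_vnics(desired, 2, table))
```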

NIC Teaming
Standalone mode (C series only)
• When rack servers with VIC adapters are connected directly to the network switches, there is no restriction on hashing or mode type, with the exception of breakout-type cables with the VIC 1385, 1387, or SIOC
• VIC 1385, 1387 or SIOC in breakout mode
• The port interface (UIF) is always in a port-channel irrespective of the number of active links
• Channel-group is static (mode on), no LACP
• Not user configurable (cannot be disabled)
• Port-channel needs to be configured on the connected switch
• Static port-channel (mode on)
• 6-tuple hashing (src/dest L2-L4)
• vNIC is unaware of the port-channel
• vNIC will advertise the speed to the OS based on the number of active links

Standalone mode (C series only)

C-series or S3260 with VIC1385, 1387 or SIOC in 4x10GE mode
- Configure static port-channel on the switch and trunking
- Can’t run LACP on top of static; NIC teaming should be switch independent

[Figure: vnic 1 on a VIC1385, 1387 or SIOC, UIF-0 and UIF-1 in 4x10GE mode, port-channel to a 10G switch]

Windows
Supported: Teaming Type – Switch Independent (Active/Standby or Active/Active); Hashing Type – Hyper-V Port
Not Supported: Teaming Type – Switch Dependent or LACP

VMware ESXi
Supported: Route based on Originating Port id or source mac address
Not Supported: Route based on IP hash or on Physical NIC Load

Linux
Supported: Active/Backup (mode 1), Balance-TLB (mode 5), Balance-ALB (mode 6)
Not Supported: balance-rr (mode 0), balance-xor (mode 2), broadcast (mode 3), 802.3ad (mode 4)

UCSM Mode
B- and C- series

Windows 2016
Supported: Teaming Type – Switch Independent (Active/Standby or Active/Active); Hashing Type – Hyper-V Port
Not Supported: Teaming Type – Switch Dependent or LACP

Windows 2012
Supported: Teaming Type – Switch Independent (Active/Standby or Active/Active); Hashing Type – Hyper-V Port
Not Supported: Teaming Type – Switch Dependent or LACP

VMware ESXi
Supported: Route based on Originating Port id or source mac address
Not Supported: Route based on IP hash or on Physical NIC Load

Linux
Supported: Active/Backup (mode 1), Balance-TLB (mode 5), Balance-ALB (mode 6)
Not Supported: balance-rr (mode 0), balance-xor (mode 2), broadcast (mode 3), 802.3ad (mode 4)
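
The Linux rows of the teaming table lend themselves to a quick pre-deployment check. A minimal sketch that encodes only the mode numbers listed on the slide; the function name and message format are made up for illustration.

```python
# Linux bonding modes as listed in the teaming table.
SUPPORTED = {1: "active-backup", 5: "balance-tlb", 6: "balance-alb"}
NOT_SUPPORTED = {0: "balance-rr", 2: "balance-xor", 3: "broadcast", 4: "802.3ad"}


def check_bond_mode(mode: int) -> str:
    """Report whether a bonding mode number is listed as supported."""
    if mode in SUPPORTED:
        return f"mode {mode} ({SUPPORTED[mode]}): supported"
    if mode in NOT_SUPPORTED:
        return f"mode {mode} ({NOT_SUPPORTED[mode]}): not supported"
    raise ValueError(f"unknown bonding mode {mode}")


if __name__ == "__main__":
    for mode in (1, 4):
        print(check_bond_mode(mode))
```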

enic Tuneables
Increase the q and Spread the load
• Increase queue depth
• Increase number of tx-queues and rx-queues

[Figure: RX Q-0 through RX Q-x on the VIC, each serviced by its own CPU (CPU 0 through CPU x)]

                                  Default   Windows   Linux   ESXi
TX Settings (Queue/Ring Size)     1/256     1/256     1/256   1/256
RX Settings (Queue/Ring Size)     1/512     4/512     1/512   1/512
CQ/Interrupts                     2/4       5/8       2/4     2/4

Increased queues and ring sizes
                                  Windows   Linux     ESXi
TX Settings (Queue/Ring Size)     8/256     8/256     8/256
RX Settings (Queue/Ring Size) *   8/4096    8/4096    8/4096
CQ/Interrupts                     16/32     16/18     16/18

* Enabled RSS

UCSM

Configuration Considerations
• Descriptors point to a memory space on the host
• It is possible to exceed the kernel network memory space
• The interface will not come up
• Deep queues will reduce drops but will also increase latency
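
The latency cost of a deep queue can be estimated with simple arithmetic. A back-of-the-envelope sketch, assuming every descriptor in the ring holds a full-size frame and the ring drains at line rate; the 1500-byte frame and 10G link are example inputs, not values from the slide.

```python
def worst_case_queue_delay_ms(ring_size: int, frame_bytes: int, link_gbps: float) -> float:
    """Time for the last frame in a full ring to be serviced, in milliseconds."""
    bits_queued = ring_size * frame_bytes * 8
    return bits_queued / (link_gbps * 1e9) * 1e3


if __name__ == "__main__":
    for ring in (512, 4096):
        delay = worst_case_queue_delay_ms(ring, frame_bytes=1500, link_gbps=10)
        print(f"ring {ring}: {delay:.2f} ms")
```

Under these assumptions, going from a 512-entry ring to a 4096-entry ring multiplies the worst-case queuing delay by eight.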

CIMC

Interrupt Coalescence at the NIC Layer
• Tune the interrupt wait time
• Min is more predictable than Idle
• Lower wait time for lower latency, at the cost of higher CPU utilization
• Higher wait time for higher throughput
• Adaptive Interrupt Coalescing Timer (Linux only)
• Configured through ethtool
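
On Linux, coalescing is adjusted with `ethtool -C`. The sketch below only builds the command line so it can be reviewed before running it. `adaptive-rx` and `rx-usecs` are standard `ethtool -C` parameters, but which of them a given enic driver version honours is an assumption to verify on the host; the interface name and the 30 µs value are placeholders.

```python
def coalesce_command(ifname, rx_usecs=None, adaptive_rx=None):
    """Build an `ethtool -C` argument list; nothing is executed here."""
    cmd = ["ethtool", "-C", ifname]
    if adaptive_rx is not None:
        cmd += ["adaptive-rx", "on" if adaptive_rx else "off"]
    if rx_usecs is not None:
        cmd += ["rx-usecs", str(rx_usecs)]
    return cmd


if __name__ == "__main__":
    # Fixed wait time for predictable latency (placeholder interface and value).
    print(" ".join(coalesce_command("eth0", rx_usecs=30, adaptive_rx=False)))
    # Let the driver adapt the timer to the traffic rate.
    print(" ".join(coalesce_command("eth0", adaptive_rx=True)))
```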

Offloads
• Default offloads are enabled with the exception of network overlay offloads
• Most cases, offloads are beneficial and should be left on
• Takes advantage of the adapter HW to do packet level specific functions
• Reduces CPU workload

       TCP Segmentation (TSO)   TCP Large Receive (LSO)   TX Checksum              RX Checksum
UCSM   IPv4 and IPv6            IPv4 and IPv6             IPv4 and IPv6 /TCP/UDP   IPv4 and IPv6 /TCP/UDP
CIMC   IPv4 and IPv6            IPv4 and IPv6             IPv4 and IPv6 /TCP/UDP   IPv4 and IPv6 /TCP/UDP

Network Overlays
• Overlay technologies use a tunneling technique that inserts additional headers
• Breaks existing offload engines

VxLAN Frame
Outer frame: Outer MAC | Outer IP Header | Outer UDP | VxLAN Header
Inner frame: Inner MAC | Inner IP/Protocol | Inner Payload
FCS

NVGRE Frame
Outer frame: Outer MAC | Outer IP Header | GRE Header
Inner frame: Inner MAC | Inner IP/Protocol | Inner Payload
FCS

[Figure: egress offload table beside each frame, with Outer Frame and Inner Frame columns. VxLAN rows: IP Checksum, TCP/UDP Checksum, TSO. NVGRE rows: IP Checksum, TCP/UDP Checksum. Cell annotations: "UDP only", "(for both protocol)".]
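
One way to see why tunneling breaks a conventional offload engine is to compute where the inner frame starts: the headers the engine expects at fixed offsets are pushed back by the encapsulation. A small sketch using standard header sizes, assuming an untagged outer Ethernet header, an outer IPv4 header without options, and a GRE header carrying the key field as NVGRE does.

```python
# Header sizes in bytes (assumptions: no outer VLAN tag, IPv4 outer header without options).
ETHERNET = 14
IPV4 = 20
UDP = 8
VXLAN = 8
GRE_WITH_KEY = 8  # 4-byte base GRE header plus the 4-byte key used by NVGRE


def inner_frame_offset(encapsulation: str) -> int:
    """Bytes of outer headers that precede the inner MAC header."""
    if encapsulation == "vxlan":
        return ETHERNET + IPV4 + UDP + VXLAN
    if encapsulation == "nvgre":
        return ETHERNET + IPV4 + GRE_WITH_KEY
    raise ValueError(f"unknown encapsulation {encapsulation!r}")


if __name__ == "__main__":
    for kind in ("vxlan", "nvgre"):
        print(kind, "inner frame starts at byte", inner_frame_offset(kind))
```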

Network Overlay Offload Configuration Consideration
• Supported with VIC1300s only
• Current OS Support
• ESXi
• RHEL 7.x
• SUSE SLES 12SP1 or greater
• Increase rx-queue depth
• Increase RSS

[Figure: "ESXi: VxLAN Offload+RSS", aggregate BW (Gbps, axis 0 to 20) for VSM and NSX with 1, 2, 4, and 8 pairs of VMs; series: Single Flow, Multi-Flow]
[Figure: "OVS: VxLAN Offload+RSS", aggregate BW (Gbps, axis 0 to 20) for 1, 2, and 4 pairs of VMs]

Hypervisor
Bare Metal Like I/O Performance
• Virtualization allows admins to maximize investment and HW capability
• Introduces a new set of problems
• Will RSS help in this situation?

Yes, but there are other alternatives
• VM-Q and Netqueue for Windows Hyper-V and ESXi respectively
• Not a hypervisor bypass technology; packet forwarding still goes through the virtual switch
• Offload the L2 sorting (L2 vlan/mac classifier) to the adapter HW
• Dedicated q-pairs to the guest machine
• CPU affinity

[Figure: two servers, each with vm 1-4 above a hypervisor and a server NIC adapter; the second shows adapter queues numbered 1-4, one set per VM]

Configuration Consideration
UCSM
• Connection Policy
• Adapter policy will be overwritten by the connection policy
• VMQs 64 by default; valid range 1-128
• Interrupts
• ESXi, 2 x VMQ + 2
• Hyper-V, 2 x VMQ + 2, round up to the nearest power of 2

Configuration Consideration
CIMC
• Enable VMQ
• Configure RX-Q, TX-Q, and CQ
• Number of VMQ == RX-Q == TX-Q
• CQ = RX-Q + TX-Q
• Interrupts
• ESXi, 2 x VMQ + 2
• Hyper-V, 2 x VMQ + 2, round up to the nearest power of 2
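
The sizing rules above reduce to a few lines of arithmetic. A minimal sketch; the function name and the returned dictionary keys are illustrative, and the hypervisor strings are simply labels chosen here.

```python
def vmq_resources(num_vmq: int, hypervisor: str) -> dict:
    """Derive queue, CQ, and interrupt counts from the number of VMQs."""
    rx_q = tx_q = num_vmq              # Number of VMQ == RX-Q == TX-Q
    cq = rx_q + tx_q                   # CQ = RX-Q + TX-Q
    interrupts = 2 * num_vmq + 2       # ESXi rule
    if hypervisor == "hyper-v":
        # Round up to the nearest power of 2 (values already a power of 2 are kept).
        interrupts = 1 << (interrupts - 1).bit_length()
    elif hypervisor != "esxi":
        raise ValueError(f"unknown hypervisor {hypervisor!r}")
    return {"rx_q": rx_q, "tx_q": tx_q, "cq": cq, "interrupts": interrupts}


if __name__ == "__main__":
    for hv in ("esxi", "hyper-v"):
        print(hv, vmq_resources(8, hv))
```

With 8 queues the same arithmetic gives 16 CQs and 18 or 32 interrupts, which lines up with the 16/18 and 16/32 CQ/Interrupts values on the enic Tuneables slide.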

Configuration Consideration
Hypervisor
ESXi
• VMware supports 16 but recommends no more than 8 netqueues per port for standard or jumbo frame
• Should be enabled only for MSI-X systems
• VMware recommends disabling netqueue for 1G NICs

Hyper-V
• Do not use CPU 0
• Only physical cores can be used
• Do not span NUMA node
• Stay below 64 logical cores
• Live migration supported

VMFEX Overview
Extend networking into hypervisor (Cisco Nexus 1000V Switch)
Extend physical network to VMs (Cisco UCS VM-FEX)

[Figure: left, a server with a generic adapter and the Cisco Nexus 1000V inside the hypervisor; right, a server with Cisco UCS VM-FEX below the hypervisor]

UCS General Baseline: Dynamic vNICs Policy
Setting a Dynamic Adapter Policy
• Policies automatically provision dynamic vNICs on servers
• Scale may be dependent on the number of FI to IO Module (IOM) connections
• Gen 2 Hardware (FI-62xx, IOM-22xx): (# IOM to FI links * 63) – 2
• Gen 3 Hardware (FI-6332, IOM-22xx or IOM-2304): max available, thus VIC dependent
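
The Gen 2 scale formula is easy to tabulate per link count. A short sketch of exactly that formula; the function name is illustrative, and the link counts printed are just sample inputs.

```python
def gen2_dynamic_vnic_scale(iom_to_fi_links: int) -> int:
    """Gen 2 hardware (FI-62xx, IOM-22xx): (# IOM to FI links * 63) - 2."""
    if iom_to_fi_links < 1:
        raise ValueError("at least one IOM to FI link is required")
    return iom_to_fi_links * 63 - 2


if __name__ == "__main__":
    for links in (1, 2, 4, 8):
        print(f"{links} link(s): {gen2_dynamic_vnic_scale(links)}")
```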

UCS General Baseline: Building Service Profile
Adding the Dynamic Policy and Static Adapters
• 2 Statics – 1 to each UCS Fabric Interconnect
• Change dynamic vNIC connection policy to setup dynamics

UCS General Baseline: Building Service Profile
Static and Dynamic Adapter Policy
Windows Linux
Static vNIC SR-IOV SR-IOV
Dynamic vNIC Hyper-V KVM/OVS

UCS General Baseline: Building Port Profiles
Creating Folders of Network Access Attributes
• Creating Port Profiles Includes:
• VLAN(s)
• Tagging or untagged (Native)
• Class-of-Service, traffic weights and/or rates

Bypass Technologies
RDMA/ROCE
• Remote Direct Memory Access: moving packets with minimal packet copy through the stack == lower latency and higher throughput with minimal CPU processing power
• Supported with Microsoft SMB 3.0 (2012, 2012R2, and 2016)
• Multi-Channel Support
• UCS VIC 13xx hardware
• Max 4 rnic per VIC adapter
• Not supported together with VMQ, Netflow, or NVGRE/VxLAN

Configuration
• Option to enable RNIC
• The number of q-pairs determines the number of SMB connections that can be established. SMB multichannel uses 2 q-pairs per SMB connection
• Windows 2012R2 requires 512 MRs per q-pair on SMB Client. On SMB server, no MRs are used
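
The q-pair and MR rules above can be turned into a quick sizing helper. A sketch that applies only the two ratios stated on the slide (2 q-pairs per SMB connection, 512 MRs per q-pair on the client); the function and key names are hypothetical.

```python
def rnic_budget(q_pairs: int) -> dict:
    """Connections and memory regions implied by a q-pair count."""
    return {
        "smb_connections": q_pairs // 2,         # 2 q-pairs per SMB connection
        "client_memory_regions": q_pairs * 512,  # 512 MRs per q-pair on the SMB client
        "server_memory_regions": 0,              # no MRs are used on the SMB server
    }


if __name__ == "__main__":
    print(rnic_budget(8))
```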

RDMA Performance Numbers
Simultaneous Live Migration of 20 guest machines

Compression (No RDMA)
• CPU jumped up to ~45%
• BW is around 8.7Gbps

RDMA
• CPU jumped up to ~9%
• BW is around 10Gbps
• Multichannel (5Gbps per channel)
• Capable of pushing higher BW

https://www.youtube.com/watch?v=mbrFsygj5Q0

DPDK
• Open source
• Improve pps performance and throughput for smaller packet size
• Packet processing is done in user space
• NFV, Service Provider Telecommunication
• Supported on VIC12xx and VIC13xx
• Upstream PMD on dpdk.org

DPDK Configuration
• Dedicate a vNIC for PMD
• Every 1 TX-Q requires 2 RX-Q
• Increase q-size; most optimal is 512 for TX-Q and 4096 for RX-Q
• RSS is supported for TCP
• 8 TX-Q / 16 RX-Q / 16 CQ

User Space NIC (usNIC)
• Consolidating simplified network, where latency is not the biggest factor
• Single connection for mgmt., file system, and MPI/Application traffic
• Direct access to NIC hardware from linux userspace
• Via the Linux libfabric API (ud)
• Dual functionality, kernel and user space
• Open Source and
• Open MPI
• Upstream starting with Open MPI v1.7
• Libfabric
• Upstream starting with Libfabric v1.0

usNIC Overview
• Supported on both blades and rack adapters
• VIC 1300 for better latency
• Especially in 40G mode
• Blade – 1340+Port-Expander with 3rd Gen FI/IOM Hardware
• Half-round trip (HRT) ping-pong latencies
• Raw back-to-back: 1.57μs
• MPI back-to-back: 1.85μs
• Through MPI+N3548: 2.02μs

[Figure: "App to App Latency Components", stacked bars on a Latency (usecs) axis from 0 to 10: usNIC at 2.02 usecs and TCP/IP at 9.42 usecs; components: Middle Ware, Kernel, NIC, Network; annotations: "Kernel Overhead", "(possible, but unrealistic)"]
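
The three half-round-trip numbers can be differenced to see what each layer adds. A small sketch using only the figures quoted above; attributing each difference to a single layer is a simplification made here, not a claim from the slide.

```python
def latency_increments(raw_us: float, mpi_us: float, mpi_switch_us: float) -> dict:
    """Difference successive HRT measurements to estimate per-layer cost in microseconds."""
    return {
        "raw back-to-back": raw_us,
        "added by MPI": round(mpi_us - raw_us, 2),
        "added by the N3548 hop": round(mpi_switch_us - mpi_us, 2),
    }


if __name__ == "__main__":
    for layer, usecs in latency_increments(1.57, 1.85, 2.02).items():
        print(f"{layer}: {usecs} us")
    # Ratio of the TCP/IP bar to the usNIC bar in the latency chart.
    print("TCP/IP vs usNIC:", round(9.42 / 2.02, 2), "x")
```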

usNIC Configuration Consideration

Host Port   Switch Port
eth0        VLAN 27, MTU 1500B, Bandwidth: 100 Mbps (ssh)
eth1        VLAN 42, MTU 9000B, Bandwidth: 2Gbps (filesystem I/O)
eth2        VLAN 64, MTU 9000B, Bandwidth: Not limited (MPI)

[Figure: Isolated HW Resource. CPU processes for MPI HPC traffic, storage traffic, and SSH reach eth 2, eth 1, and eth 0 through RX/TX queue pairs and PCIe physical functions]

CLI Monitoring Commands
VIC Monitoring
Standalone
• ssh to CIMC IP_address
• List adapter

C240-FCH1838V1SM /chassis # show adapter


PCI Slot Product Name Serial Number Product ID Vendor
-------- -------------- -------------- -------------- --------------------
2 UCS VIC 1385 FCH1850JD0Q UCSC-PCIE-C... Cisco Systems Inc
5 UCS VIC 1225 FCH1924J7NN UCSC-PCIE-C... Cisco Systems Inc
MLOM UCS VIC 1227 FCH1905K3TJ UCSC-MLOM-C... Cisco Systems Inc

vNIC list
C240-FCH1838V1SM /chassis # connect debug-shell MLOM

adapter (top):1# attach-mcp

adapter (mcp):1# vnic


vnic id : internal id of vnic, use for other vnic cmds
vnic name/mac : ucsm provisioned name (-n) or mac address (-m)

<output truncated>

-------------------------------------- --------- --------------------------


v n i c l i f v i f
id name type host state lif state uif ucsm idx vlan state
---- -------------- ------- ---- ----- --- ----- --- ----- ----- ---- -----
15 eth0 enet 0 UP 3 UP =>0 0 ce 180 UP
16 eth1 enet 0 UP 4 UP =>1 0 ce 180 UP
17 fc0 fc 0 UP 5 UP =>0 0 ce 0 UP
18 fc1 fc 0 UP 6 UP =>1 0 ce 0 UP
19 mgmtvnic0 mgmt 0 INIT

vNIC Stats
adapter (mcp):2# lifstats 3
DELTA TOTAL DESCRIPTION
9446100 9446100 Tx unicast frames without error
4937 4937 Tx multicast frames without error
15 15 Tx broadcast frames without error
604793382 604793382 Tx unicast bytes without error
631248 631248 Tx multicast bytes without error
960 960 Tx broadcast bytes without error
88 88 Tx TSO frames
16204 16204 Rx unicast frames without error
19 19 Rx multicast frames without error
11805870 11805870 Rx broadcast frames without error
1225105 1225105 Rx unicast bytes without error
1786 1786 Rx multicast bytes without error
987929775 987929775 Rx broadcast bytes without error
549677 549677 Rx good frames with RSS
31619 31619 Rx frames len == 64
11696649 11696649 Rx frames 64 < len <= 127
59021 59021 Rx frames 128 <= len <= 255
34468 34468 Rx frames 256 <= len <= 511
325 325 Rx frames 512 <= len <= 1023
10 10 Rx frames 1024 <= len <= 1518
1 1 Rx frames len > 1518
663.652bps Tx rate
1.024kbps Rx rate
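
When this output is captured to a file, the counter lines are regular enough to parse. A hedged sketch that assumes the three-column DELTA / TOTAL / DESCRIPTION layout shown above; the header and the rate lines do not match the pattern and are skipped. The parser is illustrative and not part of any Cisco tooling.

```python
import re

# Two integer columns followed by free-text description.
COUNTER_LINE = re.compile(r"^\s*(\d+)\s+(\d+)\s+(.+?)\s*$")


def parse_stats(text: str) -> dict:
    """Return {description: (delta, total)} for every counter line in the capture."""
    stats = {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line)
        if match:
            delta, total, description = match.groups()
            stats[description] = (int(delta), int(total))
    return stats


if __name__ == "__main__":
    sample = """DELTA TOTAL DESCRIPTION
9446100 9446100 Tx unicast frames without error
88 88 Tx TSO frames
11805870 11805870 Rx broadcast frames without error
663.652bps Tx rate
"""
    for description, (delta, total) in parse_stats(sample).items():
        print(f"{description}: delta={delta} total={total}")
```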

Physical Port Level Stats
• Similar command for the physical port: dcem-macstats [0 or 1]
• Remember these are total statistics: an aggregate count for all of the VIFs pinned to that port
adapter (mcp):7# dcem-macstats 0
DELTA TOTAL DESCRIPTION
18 5329790 Tx frames len == 64
48 9855045 Tx frames 64 < len <= 127
18 358248 Tx frames 128 <= len <= 255
0 7628 Tx frames 256 <= len <= 511
1 11253 Tx frames 512 <= len <= 1023
3 4059 Tx frames 1024 <= len <= 1518
0 27706 Tx frames 1519 <= len <= 2047
88 15593729 Tx total packets
11462 1130542662 Tx bytes
88 15593729 Tx good packets
69 10453642 Tx unicast frames
18 5111308 Tx multicast frames
1 28779 Tx broadcast frames

<output truncated>

UCS Physical Connectivity

UCS-FI 62xx or 63xx
• Up to 8 links between FI and IOM (NI)
• IOM 22xx or 2304 (4x10G only)
• HIF to server adapter: 2x10G or 4x10G

UCS-FI 63xx (only)
• Up to 4 links between FI and IOM (NI)
• IOM 2304 (40G only)
• HIF to server adapter: True 40G Interface

IOM Statistics
• ssh to UCSM
• connect to IOM, where IOM == chassis id
• to connect the subordinate IOM, connect local-mgmt <subordinate_id>, connect iom_id

ucs-3GFI-B# connect iom 1


Attaching to FEX 1 ...
To exit type 'exit', to abort type '$.’
Last login: Mon Jun 26 14:12:02 from 127.15.1.250
fex-1#

<connecting to the other IOM that is on the subordinate FI>

ucs-3GFI-B# connect local-mgmt a


ucs-3GFI-A(local-mgmt)# connect iom 1

IOM Interface Forwarding Rates
• 22xx == woodside
• 2304 == tiburon
fex-1# show platform software tiburon rate
+--------++------------+-----------+------------++------------+-----------+------------+-------+-------+---+
| Port || Tx Packets | Tx Rate | Tx Bit || Rx Packets | Rx Rate | Rx Bit |Avg Pkt|Avg Pkt| |
| || | (pkts/s) | Rate || | (pkts/s) | Rate | (Tx) | (Rx) |Err|
+--------++------------+-----------+------------++------------+-----------+------------+-------+-------+---+
| 0-BI || 17 | 3 | 3.31Kbps || 9 | 1 | 1.80Kbps | 102 | 106 | |
| 0-CI || 42 | 8 | 13.05Kbps || 27 | 5 | 12.40Kbps | 175 | 269 | |
| 0-NI20 || 25 | 5 | 11.92Kbps || 62 | 12 | 20.00Kbps | 281 | 183 | |
| 0-HI31 || 8 | 1 | 7.84Kbps || 1 | 0 | 192.00 bps | 599 | 106 | |
| 0-HI30 || 0 | 0 | 0.00 bps || 1 | 0 | 288.00 bps | 0 | 164 | |
| 0-HI29 || 4 | 0 | 624.00 bps || 1 | 0 | 768.00 bps | 78 | 464 | |
| 0-HI28 || 0 | 0 | 0.00 bps || 1 | 0 | 288.00 bps | 0 | 164 | |
| 0-HI20 || 14 | 2 | 2.74Kbps || 1 | 0 | 760.00 bps | 104 | 464 | |
| 0-HI16 || 13 | 2 | 2.11Kbps || 1 | 0 | 976.00 bps | 82 | 596 | |
| 0-HI12 || 5 | 1 | 760.00 bps || 1 | 0 | 344.00 bps | 76 | 200 | |
| 0-HI8 || 5 | 1 | 760.00 bps || 1 | 0 | 344.00 bps | 76 | 200 | |
| 0-HI7 || 5 | 1 | 864.00 bps || 0 | 0 | 0.00 bps | 90 | 0 | |
| 0-HI5 || 1 | 0 | 160.00 bps || 1 | 0 | 552.00 bps | 81 | 332 | |
| 0-HI3 || 9 | 1 | 1.48Kbps || 1 | 0 | 552.00 bps | 83 | 332 | |
| 0-HI1 || 5 | 1 | 768.00 bps || 0 | 0 | 0.00 bps | 77 | 0 | |
| 0-HI0 || 0 | 0 | 0.00 bps || 1 | 0 | 288.00 bps | 0 | 164 | |
+--------++------------+-----------+------------++------------+-----------+------------+-------+-------+---+

IOM Port Map
legend:
' '= no-connect
X = Failed
- = Disabled
: = Dn
| = Up
. = SFP waiting for t_start_up timer expiration
i = SFP waiting for prom info read
c = SFP waiting for t_start_up_cooled timer expiration
* = SFP present [X] = SFP validation failed
fex-1# show platform software tiburon sts
Board Status Overview:
* * * * * * * * <- QSFPs
+-----------------------------------------------------------------------------------------------+
- - - - - - - - : : : : : : : : : : : : | : : :
+-----------------------------------------------------------------------------------------------+
|0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23|
|I |
|N |
| Tiburon |
| Asic: 0 |
|H |
|I |
|0 1 2 3 4 5 6 7 8 9 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4|
| 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7|
+-----------------------------------------------------------------------------------------------+
| | | | | | | | | : : : | : : : | : : : | : : : : : : : | | | | - - - - - - - - - - - - - - - -
3 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 9 8 7 6 5 4 3 2 1
2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
\_\_/_/ \_\_/_/ r_\_r_/ r_\_r_/ r_\_r_/ r_\_r_/ \_\_/_/ \_\_/_/ (r = retimer lane configured)
blade8 blade7 blade6 blade5 blade4 blade3 blade2 blade1

Native 40G HIF

Adapter
Port Level and Per-vnic statistics
• Same stats output as standalone (CIMC)
• ssh to UCSM
• connect to blade adapter (connect adapter <chassis_id>/<server>/<adapter>)
• mLOM = 1
• Mezzanine = 2
ucs-3GFI-B# connect adapter 1/1/1
adapter 1/1/1 # connect
adapter 1/1/1 (top):1#
adapter 1/1/1 (mcp):1#

• connect to rack adapter (connect adapter <rack>/<adapter>)
• mLOM = 1
• Mezzanine = 2
ucs-3GFI-B# connect adapter 1/1
adapter 1/1 # connect
adapter 1/1 (top):1#
adapter 1/1 (mcp):1#

Stats output
• Port level stats
• dcem-macstats [0|1]
• Per-vnic stats
• vnic to list out the vnics
• lifstats <lif_no>

Continue Your Education
• Demos in the Cisco campus
• Walk-in Self-Paced Labs
• Lunch & Learn
• Meet the Engineer 1:1 meetings
• Related sessions

Complete Your Online Session Evaluation
• Please complete your Online Session Evaluations after each session
• Complete 4 Session Evaluations & the Overall Conference Evaluation (available from Thursday) to receive your Cisco Live T-shirt
• All surveys can be completed via the Cisco Live Mobile App or the Communication Stations
Don’t forget: Cisco Live sessions will be available for viewing on-demand after the event at www.ciscolive.com/global/on-demand-library/.

Continue Your Education
• Demos in the Cisco campus
• Walk-in Self-Paced Labs
• Tech Circle
• Meet the Engineer 1:1 meetings
• Related sessions

Thank you
