Professional Documents
Culture Documents
BRKCOM-3002 - UCS Performance Troubleshooting (2015 Melbourne) - 90 Mins - OK
BRKCOM-3002 - UCS Performance Troubleshooting (2015 Melbourne) - 90 Mins - OK
BRKCOM-3002 - UCS Performance Troubleshooting (2015 Melbourne) - 90 Mins - OK
Cisco Public
UCS Performance Troubleshooting
BRKCOM-3002
• Ask Questions
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 5
Agenda
• Infrastructure Path Tracing
• LAN Performance
• SAN Performance
• BIOS Settings
Presentation_ID © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 6
Troubleshooting Methodology
Methodology
• Troubleshooting is an Art
• Pre/Post Production baselines are essential
• Document all changes
• Keep network diagrams up to date
• Leverage tools you have access to - Free or Paid
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 10
Troubleshooting Process
• Define the problem – what is vs. what isn’t (Build the Picture)
– Identify components Involved, note FW/Driver versions
– Identify/Isolate traffic path (as granular as you can)
– Reference/create a network diagram
• Single change at a time
– Assess the impact of one tuning
– Don’t use a shotgun approach
– Try to keep tests consistent (ToD, OS Versions, Test Parameters etc)
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 11
What affects Performance?
Rx Queues UIF B2B Credits Firmware Driver
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 13
Which path will UCS choose?
LAN
FEX/Interconnect A
FEX/Interconnect B
Ingress
Egress
?
UCS Blade
Chassis
CNA
Eth0 Eth1
MAC A MAC B
OS NIC Teaming
OS
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 14
UCS Frame Flow Decisions
Egress
FEX/Interconnect B
(many choices)
Fabric
Which Port
Fabric Port?Pinning/Port
(4 choices)
Channeling Algorithm
UCS Blade
Chassis
CNA
OS NIC Teaming
OS
OS Routing Table or
Which PCIe Ethernet Interface?
OS NIC
(1-114choices Teaming
depending on SW/HW)
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 16
UCS Frame Flow Decisions
Ingress
LAN
(Upstream Switch
Which downlink Decides)
or port channel?
Allow
FEX/Interconnect A
FEX/Interconnect B
Déjà the
vu,frame
RPF, inbound?
border port
(decision depends on ‘switch
pinning
mode’ vs. ‘end host mode’)
Fabric
Which PortExtender
Fabric Pinning/Port
Port?
Channeling Algorithm
WhichVNTag
Server Bay Port?
+ Offset
UCS Blade
(8 or more choices)
Chassis
CNA
Eth0 Eth1
Which PCIe Device (vNIC)?
(1-114VNTag
choicesIdentifier
MAC A MAC B
Pass
Dest.frame
MACtoand
OS?Ethertype
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 18 binding
Infrastructure Path Tracing
System Components – Hop by Hop
UCSM NXOS
Trunk Interface
Fabric Interface
NIF (Network)
HIF (Host)
UIF (Uplink)
VIF (Virtual)
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 20
Defining the Ports
• FI Uplink/Trunk Port
– The Fabric Interconnect defines Uplink ports
as those ports connecting to the LAN
– Always in trunk mode (no such thing as mode
access configuration)
– VLAN 1 is default (native) & can be changed
– Port-channel configuration allowed (LACP
only)
– There is currently no vPC or Fabric Path
feature in the FI
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 21
Defining the Ports
• Fabric Interconnect FEX-Fabric aka Server
Interfaces (SIF)
– The Fabric Interconnect (FI) defines fex-fabric
ports as those ports connecting to the IOMs in
the chassis
– IOM Host Interfaces (HIFs) ports are pinned
to FEX-fabric ports (SIF)
– Same concept Nexus FEXs use with Satellite
ports.
Note: The term “FEX” and “IOM” are commonly
used interchangeably
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 22
Defining the Ports
• IOM Network Interfaces (NIF)
– The IOM defines these ports which are
external connecting the IOM to the FI.
– NIF port are either configured as
individual or channeled to the FI’s as
server ports (SIF) – depends on model
of IOM.
– Same concept Nexus FEXs use with
Satellite ports.
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 23
Defining the Ports
• IOM Host Interfaces (HIFs)
– Each IOM provides a number of internal ports
per blade
– IOM model 2104XP provides 8x internal ports
(one for each blade)
– IOM model 2204XP provides 16x internal
ports (two for each blade)
– IOM model 2208XP provides 32x internal
ports (four for each blade)
– Each HIF is defined by three different values,
EthX/Y/Z. <chassis-id>/<io-module-id>/<Host-
Interface-Port-Id> == <chassis-id>/1/<Host-
Interface-Port-Id>
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 24
Defining the Ports
• Adapter Uplink Interface (UIFs)
– Each Adapter has 2 physical
connections, one to each Fabric
– References as 0 and 1
– These are also known as the Data
Center Ethernet (DCE) Interfaces
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 25
Defining the Ports
• Virtual Interface (VIF)
– Defined as Ethernet (veth) or Fibre Channel
(vfc)
– A vNIC with Fabric Failover enabled will have
two VIFs assigned (Primary & Backup)
– Represent the vNIC or vHBA on the compute
blade towards OS
– Pinned automatically or manually (pin groups)
to border port or FC uplink ports
– veth and vfc numbers are dynamically assigned
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 26
Defining the Ports
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 27
System Components – ASICs (Gen 1, Gen 2 and 3)
SAN LAN MGMT SAN • Fabric ASIC : Altos/Sunnyvale
P P S S P P
Fabric Fabric
F F
Switch Switch
P P P P P P
• Port ASIC : Gatos/Carmel
Compute Chassis
Compute Chassis
Fabric Fabric
ComputeL Chassis
I C C I L
Extender Extender
x8 x8 x8 x8
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 28
Why do I care about ASIC names?
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 29
Narrowing Down the Problem
• Define the problem
– From which point to what other point is the problem?
– Do we see the problem in one direction or both?
• Eliminate variables
– Is the problem seen between traffic traversing the same fabric?
– Is the problem only happening on a specific path?
• List all the ports in the traffic path
– VIFs, FEX, HIFs, NIFs, Fabric and Uplink ports
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 30
Trace Example
• Let’s trace the path for the Eth0 (00:25:b5:44:00:3b )
first vNIC (eth0) on blade 1/6
• First, what’s the VIF # ? Blade 1/6
0 1
FEX 1 FEX 2
Fabric A Fabric B
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 31
VIF Pinning – Service Profile View
• UCSM top level : show service-profile circuit server <chassis#>/<slot#>
UCS-A# show service-profile circuit server 1/6
Service Profile: roberbur/Perf-Test-3
Server: 1/6
Fabric ID: A
VIF vNIC Link State Oper State Prot State Prot Role Admin Pin Oper Pin Transport
---------- --------------- ----------- ---------- ------------- ----------- ---------- ---------- ---------
9178 Up Active No Protection Unprotected 0/0 0/0 Ether
986 fc0 Up Active No Protection Unprotected 0/0 0/0 Fc
988 eth1 Up Active Passive Backup 0/0 1/7 Ether
990 eth3 Up Active Passive Backup 0/0 1/7 Ether
991 eth0 Up Active Active Primary 0/0 1/7 Ether
993 eth2 Up Active Active Primary 0/0 1/7 Ether
Fabric ID: B
<snip>
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 32
VIF Pinning – GUI vs CLI
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 33
Trace Example
• We know eth0 is eth0 VIF 991
Fabric A Fabric B
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 34
IOM Internal Port Information – 2200XP
• connect iom <chassis #>
• show platform software woodside sts
FEX Ports
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 36
Trace Example
• VIF 991 is using eth0 VIF 991
IOM 1 IOM 2
2204 2204
Fabric A Fabric B
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 37
FEX to Fabric Port Pinning (2204XP)
1-2 IOM Fabric
3-4 1 Interconnect
5-6
2
7-8 NIF SIF
3
9-10
Blade 6 11-12 4
13-14
15-16
IOM 1 IOM 2
2204 2204
NIF 2
Fabric A Fabric B
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 39
IOM Port Information
• Connect nxos : show fex <chassis#> detail
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 40
Trace Example
• VIF 991 is using eth0 VIF 991
IOM 1 IOM 2
2204 2204
NIF 2
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 41
VIF Pinning – Fabric Interconnect View
• Connect nxos : show pinning border-interface active
UCS-A(nxos)# show pinning border-interfaces active
--------------------+---------+----------------------------------------
Border Interface Status SIFs
--------------------+---------+----------------------------------------
Eth1/7 Active Veth988 Veth990 Veth991 Veth993
Eth1/8 Active Veth963 Veth974 Eth1/1/3 Eth2/1/7
Total Interfaces : 2
IOM 1 IOM 2
2204 2204
NIF 2
BRKCOM-3002
Vif 991
© 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 43
Narrowing Down the Problem
• Define the problem
– From which point to what other point is the problem?
– Do we see the problem in one direction or both?
• Eliminate variables
– Is the problem seen between traffic traversing the same fabric?
– Is the problem only happening on a specific fabric path?
• List all the ports in the traffic path
– VIFs, FEX, HIFs, NIFs, Fabric and Uplink ports
Blade 1/6 HIF: 11
vNIC: eth0 NIF: 2
VIF: 991
DCE: 0
SIF: Eth 1/12
Uplink: Eth 1/7 .
FEX: 1/1/11
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 44
LAN Performance
Baseline Performance
46
Performance 101
Throughput
• In data transmission, throughput is the amount of data transferred successfully
over a link from one end to another in a given period of time. It is usually
expressed in a magnitude of bits per second (Gbps/Mbps).
• Refers to how fast a device is actually sending data over the communication
channel
• Also known as “Consumed Bandwidth”
Bandwidth
Refers to how fast a device can send data over a single communication channel
Also known as “Maximum Throughput”
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 47
Performance Tools – Free vs. Paid
No Charge/Free Paid
Iperf Ttcp IxChariot
Jperf Netcps Spirent
Netperf Qcheck Agileload
Ntttcp Ostinato etc.
Nettcp etc
Note: All variations of ttcp/iperf report payload or user data rates, i.e. no
overhead bytes from headers (TCP, UDP, IP, etc.) are included in the reported
data rates. When comparing to "line" rates or "peak" rates, it is important to
consider all of this overhead.
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 48
Tools Compared
Tool Type Platform Protocols
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 49
Simple Test
• Running iperf on two blades, different Chassis
• This will test max TCP throughput between the two nodes
• Reporting Interval every 10s for 300s duration
• Uses the default windows size
• Uses the default port of 5001
• Prints the max MTU (less headers)
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 50
IPERF Test Results
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 51
Baseline Testing
• Repeat tests at min. 3 times
• Test both directions Sender Receiver
• Try different size MTU ie. Jumbo frames if using iSCSI / IP Storage.
• Ensure test duration is >3mins. Allows for TCP windowing adjustments .
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 53
Monitoring Performance
54
Looking for Congestion
UCS-A(nxos)# show interface ethernet 1/1/11 priority-flow-control
============================================================
Port Mode Oper(VL bmap) RxPPP TxPPP
============================================================
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 55
QoS Considerations
• CoS/QoS within UCS is simple to configure
• Needs to be configured End-to-End
• Can do more harm than good if configured incorrectly
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 56
QoS Queing GUI vs. CLI
• Connect nxos - > show queuing interface eth x/y
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 57
QoS – Misconfiguration
show queuing interface ethernet 1/5
Ethernet1/5 queuing information:
TX Queuing
qos-group sched-type oper-bandwidth
0 WRR 50
1 WRR 50
RX Queuing
qos-group 0
q-size: 360960, HW MTU: 9216 (9216 configured)
drop-type: drop, xon: 0, xoff: 360960
Statistics:
Pkts received over the port : 0
Ucast pkts sent to the cross-bar : 0
Mcast pkts sent to the cross-bar : 0
Ucast pkts received from the cross-bar : 0
Pkts sent to the port : 0
Pkts discarded on ingress : 0
Per-priority-pause status : Rx (Inactive), Tx (Inactive)
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 58
QoS – Misconfigured
show queuing interface ethernet 1/5 – cont’d
qos-group 1
q-size: 79360, HW MTU: 2158 (2158 configured)
drop-type: no-drop, xon: 20480, xoff: 40320
Statistics:
Pkts received over the port : 809739
Ucast pkts sent to the cross-bar : 743529
Mcast pkts sent to the cross-bar : 0
Ucast pkts received from the cross-bar : 67599
Pkts sent to the port : 67599
Pkts discarded on ingress : 66210
Per-priority-pause status : Rx (Inactive), Tx (Inactive)
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 59
Adapter Commands (VIC)
60
Adapter Specific Commands
• Based on the Adapter used, there are various commands we can leverage.
• Cisco VIC allows to attach to the Master Control Program (MCP) to view
verbose enic stats & counters, or Fabric Layer Services (FLS) to view fnic (FC)
stats & counters. We will focus on the VIC command sets.
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 61
VIF Details
• Connect adapter x/y/z (Chassis, Blade, Adapter)
UCS-A# connect adapter 1/6/1
adapter 1/6/1 # connect
adapter 1/6/1 (top):1# attach-mcp Indicates which Fabric Failover
adapter 1/6/1 (mcp):1# vnic
enabled interface is active
<snip>
---------------------------------------- --------- --------------------------
v n i c l i f v i f
id name type bb:dd.f state lif state uif ucsm idx vlan state
--- -------------- ------- ------- ----- --- ----- --- ----- ----- ---- -----
13 vnic_1 enet 06:00.0 UP 2 UP =>0 991 91 1 UP
- 1 992 84 1 UP
14 vnic_2 enet 07:00.0 UP 3 UP - 0 987 92 1 UP
=>1 988 85 1 UP
15 vnic_3 enet 08:00.0 UP 4 UP =>0 993 93 1 UP
- 1 994 86 1 UP
16 vnic_4 fc 0a:00.0 UP 5 UP =>1 985 87 200 UP
17 vnic_5 fc 0b:00.0 UP 6 UP =>0 986 94 100 UP
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 62
VIF Details
• Connect adapter x/y/z (Chassis, Blade, Adapter)
UCS-A# connect adapter 1/6/1
adapter 1/6/1 # connect
adapter 1/6/1 (top):1# attach-mcp
adapter 1/6/1 (mcp):1# vif
------- ----- ----- ---- --------------- -----
vif
lif.uif index pri hash state flags
------- ----- ----- ---- --------------- -----
2.0 91 0 91 UP NIV, CREATED, VIFHASH, VUP, VIFINFO, DCXUP
2.1 84 0 84 UP NIV, CREATED, VIFHASH, VUP, STANDBY, VIFINFO, DCXUP
3.0 92 0 92 UP NIV, CREATED, VIFHASH, VUP, STANDBY, VIFINFO, DCXUP
3.1 85 0 85 UP NIV, CREATED, VIFHASH, VUP, VIFINFO, DCXUP
4.0 93 0 93 UP NIV, CREATED, VIFHASH, VUP, VIFINFO, DCXUP
4.1 86 0 86 UP NIV, CREATED, VIFHASH, VUP, STANDBY, VIFINFO, DCXUP
5.0 94 0 94 UP NIV, CREATED, VIFHASH, VUP, STANDBY, VIFINFO, DCXUP
5.1 87 0 87 UP NIV, CREATED, VIFHASH, VUP, VIFINFO, DCXUP
6.1 88 0 88 UP NIV, CREATED, VIFHASH, VUP, VIFINFO
7.0 95 0 95 UP NIV, CREATED, VIFHASH, VUP, VIFINFO
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 63
LIF Stats
adapter 1/6/1 (mcp):1# lifstats [LIF#]
DELTA TOTAL DESCRIPTION
0 7552141 Tx unicast frames without error
0 51 Tx multicast frames without error
0 95328 Tx broadcast frames without error
0 11487518549 Tx unicast bytes without error
0 10359 Tx multicast bytes without error
0 6482868 Tx broadcast bytes without error
0 7545073 Tx TSO frames
0 514067 Rx unicast frames without error
2 12257348 Rx multicast frames without error
11 45826617 Rx broadcast frames without error
0 38526561 Rx unicast bytes without error
284 1224613817 Rx multicast bytes without error
3116 10944716423 Rx broadcast bytes without error
1 579234 Rx frames len == 64
5 37635413 Rx frames 64 < len <= 127
1 1546077 Rx frames 128 <= len <= 255
3 8981081 Rx frames 256 <= len <= 511
3 9856176 Rx frames 512 <= len <= 1023
0 51 Rx frames 1024 <= len <= 1518
BRKCOM-3002
35.690kbps
© 2015 Cisco and/or its affiliates. All rights reserved.
Rx rate 64
Cisco Public
DCE (UIF) Stats
adapter 1/6/1 (mcp):1# dcem-macstats [UIF#]
TOTAL DESCRIPTION
1061 Tx frames len == 64 42954 Rx Frames 64 < len <= 127
168 Tx frames 64 < len <= 127 2644 Rx Frames 128 <= len <= 255
5647 Tx frames 128 <= len <= 255 85018 Rx Frames 256 <= len <= 511
6 Tx frames 256 <= len <= 511 16 Rx Frames 512 <= len <= 1023
16 Tx frames 512 <= len <= 1023 1 Rx Frames 1024 <= len <= 1518
8 Tx frames 1024 <= len <= 1518 1 Rx Frames 1519 <= len <= 2047
6906 Tx total packets 130634 Rx total received packets
1143159 Tx bytes 32292176 Rx bytes
6906 Tx good packets 130634 Rx good packets
1445 Tx unicast frames 1485 Rx unicast frames
5423 Tx multicast frames 27672 Rx multicast frames
38 Tx broadcast frames 101477 Rx broadcast frames
1143159 Rx bytes for good packets
114.638bps Tx Rate
3.238kbps Rx Rate
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 65
IO Module Commands
66
IOM Commands
• Two different methods to pull IOM counters.
• Option 1:
UCS-A# connect iom 1
Attaching to FEX 1 ...
To exit type 'exit', to abort type '$.'
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 69
Monitoring IOM Interface Drops
• connect iom <chassis#>
• show platform software [redwood][woodside] drops 0 <HIF# | NIF#>
fex-1# show platform software woodside drops 0 ni3
fex-1# show plat soft woodside drops 0 HI3
WOO_BI_CNT_RX_FWD_DROP [40204]: 0
WOO_HI_CT_CNT_MUX_TX_FLUSHED [f1648]: 1 HI7
WOO_HI_CT_CNT_MUX_TX_FLUSHED [271648]: 2 HI31
fex-1# show plat soft woodside drops 0 NI1
WOO_BI_CNT_RX_FWD_DROP [40204]: 0
WOO_HI_CT_CNT_MUX_TX_FLUSHED
WOO_HI_CT_CNT_MUX_TX_FLUSHED
[f1648]: 1 HI7
[271648]: 2 HI31 .
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 70
SAN Performance
SAN Performance
• Most SAN related issues are due to array limitations than the host side
• Default Queues are set according to OS vendor recommendations
• Rx/Tx Queues can be adjusted but not recommended unless application or
storage array vendor recommended
• Mismatched COS/MTU negatively impact iSCSI performance
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 73
What to look for - FC
• Are seeing the issue with only certain hosts?
• If so, are there any commonalities between these hosts?
– Adapter model
– Driver & Firmware Versions
– Chassis ID
– FC uplink Pinning
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 74
What to look for - FC
• Verify Pinning
UCS-A(nxos)# show npv flogi-table
--------------------------------------------------------------------------------
SERVER EXTERNAL
INTERFACE VSAN FCID PORT NAME NODE NAME INTERFACE
--------------------------------------------------------------------------------
vfc924 100 0x6c000d 20:00:00:25:b5:b2:26:1d 20:00:00:25:b5:a2:26:1f fc1/29
vfc986 100 0x6c000f 20:00:00:25:b5:b2:37:2e 20:00:00:25:b5:a2:37:0f fc1/29
vfc1191 100 0x6c000e 20:00:00:25:b5:b2:37:0f 20:00:00:25:b5:a2:37:2f fc1/29
vfc1305 100 0x6c0002 20:00:00:25:b5:b2:33:0f 20:00:00:25:b5:a2:33:2f fc1/29
vfc1423 100 0x6c0011 20:00:00:25:b5:02:22:0e 20:00:00:25:b5:02:22:0c fc1/29
vfc1450 100 0x6c01c5 20:00:00:25:b5:b2:33:07 20:00:00:25:b5:a2:33:0b fc1/29
vfc1458 100 0x6c0421 20:00:00:25:b5:b2:33:16 20:00:00:25:b5:a2:33:1b fc1/30
vfc1464 100 0x6c0422 20:00:00:25:b5:b2:33:25 20:00:00:25:b5:a2:33:2a fc1/30
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 75
What to look for - FC
• B2B Credit depletion/exhaustion
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 76
What to look for - FC
• Counters: Drop, Discards, Errors (CRC)
UCS-A(nxos)# show int fc1/29 counters
Fc1/29
1 minute input rate 88 bits/sec, 11 bytes/sec, 0 frames/sec
1 minute output rate 88 bits/sec, 11 bytes/sec, 0 frames/sec
401580 frames input, 22505468 bytes
0 discards, 0 errors, 0 CRC
0 unknown class, 0 too long, 0 too short
401611 frames output, 22513040 bytes
0 discards, 0 errors
0 input OLS, 1 LRR, 0 NOS, 0 loop inits
1 output OLS, 1 LRR, 0 NOS, 0 loop inits
0 link failures, 0 sync losses, 0 signal losses
0 BB credit transitions from zero
16 receive B2B credit remaining
250 transmit B2B credit remaining
0 low priority transmit B2B credit remaining
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 77
What to look for - FC
• Transceiver Info
UCS-A(nxos)# show int fc1/29 transceiver detail
fc1/29 sfp is present
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 78
What to look for - iSCSI
• iSCSI Latency
UCS-A# connect adapter 1/1/2
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public
SAN Performance Tools – Free vs Paid
No Charge/Free Paid
Dd Solarwinds
iometer Spirent
SQLio SAN Vendor Tools
copy/cp
etc.
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public
Simple Test – dd on Linux
• ‘dd’
– Widely available
– Highly customisable
Example: ‘Input File’ ‘Output File’ ‘Block Size’ ‘Sync Data Before Exit’
[root@localhost ~]# dd if=/dev/zero of=/root/file.big bs=1M count=1000 conv=fdatasync
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 0.830429 s, 1.3 GB/s
• Other Usage:
if=/dev/urandom Random Data .
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public
BIOS Settings & Performance Impact
82
BIOS Settings
• Each generation of processor will add new chipset features
• BIOS tokens are added to manage BIOS settings from UCSM (BIOS Policy)
• Many times it’s a decision between performance and power efficiencies. Many
settings are default for balanced power saving.
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 83
Intel SpeedStep / SpeedBoost
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 84
Processor C3 and C6 States
• These are two states or levels of halt & sleep the processor can enter into when
not busy.
• Used to improve power efficiency
• Drawback is there is added overhead when processors “Wake up” and exit
these states.
• C states range from 0 – 6.
– 0 is a fully powered CPU
– 1 is the halt state. The CPU is not currently executing instructions.
– 3 is deep sleep. All internal clocks are stopped
– 6 is deep power down. Reduces internal voltage
• C states are transitional.
• For max performance, these states can be disabled.
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 85
Hyperthreading
• Enables additional parallelisation of processing by allowing two processes to
leverage the same resource
• Useful to applications that can take advantage of multi-threaded instructions
• Requires Operating System (OS) support.
• If your OS has not been optimized for Hyperthreading, it should be disabled.
• Recommendation to run baseline test against your applications with HT enabled
& disabled to gauge impact.
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 86
Memory Performance
• Certain Memory can run at dual voltage (Eg: DDR3 -> 1.35V or 1.5V)
• BIOS setting for Power Saving or Performance set via BIOS policy
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 87
Non Uniform Memory Access (NUMA)
• Addresses the latest server chipset designs
• Allows the system to access memory belonging to the other CPUs but adds a
“cost” to doing do, minimizing this action when necessary.
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public
Recap
90
What have we learned
• Understanding of the various hops & interfaces within the UCS
• How to trace the exact path for VIF through FI uplink egress
Networking Performance on RHEL with Cisco UCS 1240 & 1280 Virtual Interface Card (VIC):
http://www.cisco.com/en/US/solutions/collateral/ns340/ns517/ns224/ns944/whitepaper_C11-720526.html
Storage Performance on RHEL with Cisco UCS 1240 & 1280 Virtual Interface Card (VIC):
http://www.cisco.com/en/US/solutions/collateral/ns340/ns517/ns224/ns944/whitepaper_C11-721280.html
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public
Thank you.
Appendix
97
IOM to Fabric Interface Pinning (2104XP)
Server slots pinned to uplink
slot 1
slot 2 F 1 link
slot 3
Fabric Interconnect Uplink: slots 1,2,3,4,5,6,7,8
slot 4
slot 5 E NIF
slot 6
slot 7
slot 8
X
slot 1
slot 2
slot 3
F 2 links Uplink 1: slots 1,3,5,7
slot 4
slot 5 E Fabric Interconnect Uplink 2: slots 2,4,6,8
slot 6
slot 7 X NIF
slot 8
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 98
IOM HIF to NIF Pinning (2204XP)
14-15 IOM Fabric
12-13
Interconnect
10-11
8-9
NIF SIF
HIF
1
6-7
4-5
2-3
0-1
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 99
IOM HIF to NIF Pinning (2204XP)
14-15 IOM Fabric
12-13
Interconnect
10-11
1
8-9
HIF NIF SIF
6-7
2
4-5
2-3
0-1
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 100
IOM HIF to NIF Pinning (2204XP)
14-15 IOM Fabric
12-13
Interconnect
1
10-11
2
HIF 8-9 NIF SIF
3
6-7
4-5 4
2-3
0-1
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 101
IOM HIF to NIF Pinning (2208XP)
28-31 IOM Fabric
27-24
Interconnect
20-23
16-19
NIF SIF
HIF
1
12-15
8-11
4-7
0-3
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 102
IOM HIF to NIF Pinning (2208XP)
28-31 IOM Fabric
27-24
Interconnect
20-23
1
16-19
HIF NIF SIF
12-15
2
8-11
4-7
0-3
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 103
IOM HIF to NIF Pinning (2208XP)
28-31 IOM Fabric
27-24
Interconnect
1
20-23
2
HIF 16-19 NIF SIF
3
12-15
8-11 4
4-7
0-3
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 104
IOM HIF to NIF Pinning (2208XP)
IOM 1
28-31 Fabric
27-24 2 Interconnect
20-23 3
16-19 4
HIF NIF SIF
12-15 5
8-11 6
4-7 7
0-3 8
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 105
Port Channel Hashing (6200+2200)
Port-Channel Hash IP
• L2 DA, L2 SA, L3 DA, L3 SA,VLAN
FCoE
• L2 SA, L2 DA, FC SID ,FC DID
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 106
Appendix – Case Study
107
Storage Best Practice Tips - FC
• Maintain Separate A/B Fabrics
• Do zoning by WWPN. Most exact & secure
• Ensure WWPNs use pools, and they are not overlapping
• Make use of the customisable WWN format
AA:BB:CC:CC:CC:DD:DD:DD
– Section labelled with C is the OUI (00:25;b5 = Cisco)
– Section labelled with D is safely configurable allowing for > 1M addresses
Example: 20:00:00:25:b5:11:A0:01
1 = Datacenter #
1 = Rack #
A = Fabric A
000 = Host #
• Don’t modify HBA adapter policies (unless instructed by OS Vendor)
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public
Storage Best Practice Tips - iSCSI
• iSCSI Boot – Must use the Native VLAN for Boot. Use dedicated [overlay] vNICs
if possible for iSCSI boot NICs
• Disable Fabric Failover on iSCSI overlay vNICs unless host does not support an
mpath driver
• Ensure proper a QoS class for iSCSI traffic with correct CoS value and MTU
• Match vNIC adaptor policy with bare metal OS
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public
Case Study #1
• Symptom: ESX host experiencing high avg disk wait times on SAN LUNs.
• Now where do we start?
• Show Tech from each device – snapshot the environment
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 110
Case Study #1 – High Avg Disk Wait times
• Define the scope of the problem – what is vs. what isn’t
New or existing problem?
Just starting happening lately, was previously working fine
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 112
Check IOM Counters
fex-1# show platform software redwood sts
Board Status Overview:
+---+----+----+----+
| | | [$]| [$]|
+---+----+----+----+
: : | |
+-+----+----+----+-+
| 0 1 2 3 |
| I I I I |
| N N N N |
| |
| ASIC 0 |
Blade 3 = HIF 5
| |
| H H H H H H H H |
| I I I I I I I I |
| 0 1 2 3 4 5 6 7 |
+-+-+-+-+-+-+-+-+--+
| | | | | | : |
+-+-+-+-+-+-+-+-+
|v|v|v|v|v|v|v|v|
+-+-+-+-+-+-+-+-+
Blade:
BRKCOM-3002 © 82015
7 6Cisco
5 4and/or
3 2 its affiliates. All rights reserved. Cisco Public 113
Check IOM Counters – cont’d
fex-1# # show platform software redwood rate
+-------++------------+-----------+------------++------------+-----------+------------+-------+-------+---+
| Port || Tx Packets | Tx Rate | Tx Bit || Rx Packets | Rx Rate | Rx Bit |Avg Pkt|Avg Pkt| |
| || | (pkts/s) | Rate || | (pkts/s) | Rate | (Tx) | (Rx) |Err|
+-------++------------+-----------+------------++------------+-----------+------------+-------+-------+---+
| 0-NI3 || 85940 | 17188 | 275.14Mbps || 56961 | 11392 | 162.40Mbps | 2001 | 1781 | |
| 0-NI2 || 22204 | 4440 | 67.53Mbps || 5643 | 1128 | 4.07Mbps | 1901 | 451 | |
| 0-HI7 || 10 | 2 | 1.86Kbps || 1 | 0 | 240.00 bps | 116 | 150 | |
| 0-HI5 || 41028 | 8205 | 125.15Mbps || 68248 | 13649 | 224.56Mbps | 1906 | 2056 | |
| 0-HI4 || 1236 | 247 | 275.59Kbps || 8886 | 1777 | 28.39Mbps | 139 | 1997 | |
| 0-HI3 || 9999 | 1999 | 24.25Mbps || 10796 | 2159 | 31.09Mbps | 1516 | 1800 | |
| 0-HI2 || 3318 | 663 | 3.38Mbps || 13247 | 2649 | 41.58Mbps | 637 | 1961 | |
| 0-HI1 || 4030 | 806 | 4.15Mbps || 13874 | 2774 | 42.34Mbps | 643 | 1907 | |
| 0-HI0 || 1978 | 395 | 550.84Kbps || 3780 | 756 | 9.22Mbps | 174 | 1524 | |
| 0-BI || 29 | 5 | 4.58Kbps || 21 | 4 | 9.42Kbps | 98 | 280 | |
| 0-CI || 13 | 2 | 5.71Kbps || 12 | 2 | 8.26Kbps | 274 | 430 | |
+-------++------------+-----------+------------++------------+-----------+------------+-------+-------+---+
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 114
Check IOM Counters – cont’d
fex-1# # show platform software redwood rmon 0 hif5
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 115
Let’s get the VIF Information
UCS-A# show service-profile circuit server 1/3
Fabric ID: A
VIF vNIC Link State Overall Status Prot State Prot Role Admin Pin Oper Pin Transport
---------- --------------- ----------- -------------- ------------- ----------- ---------- ---------- ---------
1370 vnic0 Up Active No Protection Unprotected 0/0 0/0 Ether
1372 vhba0 Up Active No Protection Unprotected 0/0 2/1 Fc
45 Up Active 0/0 0/0 Unknown
9564 Up Active No Protection Unprotected 0/0 0/0 Ether
Fabric ID: B
VIF vNIC Link State Overall Status Prot State Prot Role Admin Pin Oper Pin Transport
---------- --------------- ----------- -------------- ------------- ----------- ---------- ---------- ---------
1371 vnic1 Up Active No Protection Unprotected 0/0 0/0 Ether
1373 vhba1 Up Active No Protection Unprotected 0/0 2/8 Fc
46 Up Active 0/0 0/0 Unknown
9565 Up Active No Protection Unprotected 0/0 0/0 Ether
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 116
Verify Pinning
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 117
Check out VIF Counters
UCS-A(nxos)# sh int vfc1372
vfc1372 is trunking
Bound interface is Ethernet1/1/3
Port description is server 1/3, VHBA vhba0
Hardware is Virtual Fibre Channel
Port WWN is 25:5b:00:0d:ec:f4:e8:7f
Admin port mode is F, trunk mode is on
snmp link state traps are enabled
Port mode is TF
Port vsan is 3
Trunk vsans (admin allowed and active) (3)
Trunk vsans (up) (3)
Trunk vans (isolated) ()
Trunk vsans (initializing) ()
1 minute input rate 141314960 bits/sec, 17664370 bytes/sec, 8589 frames/sec
1 minute output rate 255304552 bits/sec, 31913069 bytes/sec, 15633 frames/sec
2138130329 frames input, 4373806983358 bytes
557 discards, 0 errors
4990800688 frames output, 10220173553430 bytes
332 discards, 0 errors
last clearing of "show interface" counters never
Interface last changed at Mon Nov 10 14:16:28 2012
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 118
Check out VIF Counters
UCS-A(nxos)# show interface fc2/1 0 CRC, 0 unknown class
fc2/1 is up 0 too long, 0 too short
Hardware is Fibre Channel, SFP is short wave laser w/o OFC (SN) 2804330715 frames output, 5503741450656 bytes
Port WWN is 20:41:00:0d:ec:f4:e8:40 0 discards, 1742 errors
Admin port mode is NP, trunk mode is off 0 input OLS, 0 LRR, 0 NOS, 0 loop inits
snmp link state traps are enabled Counter 0 output OLS, 0 LRR, 0 NOS, 0 loop inits
Port mode is NP Incrementing 2012
last clearing of "show interface" counters Mon Nov 5 19:45:09
Port vsan is 3
Speed is 4 Gbps
16 receive B2B credit remaining
Transmit B2B Credit is 250
250 transmit B2B credit remaining
Receive B2B Credit is 16
0 low priority transmit B2B credit remaining
Receive data field Size is 2112
Interface last changed at Mon Nov 5 14:16:18 2012
Beacon is turned off
1 minute input rate 240137264 bits/sec, 30017158 bytes/sec, 15780
frames/sec
1 minute output rate 199311496 bits/sec, 24913937 bytes/sec,
12756 frames/sec
5106467544 frames input, 9978925374164 bytes
5 discards, 0 errors
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 119
Let’s check the MDS side
MDS-1# show interface fc12/3
fc12/3 is up
Port description is <multiple>
Hardware is Fibre Channel, SFP is short wave laser w/o OFC (SN)
Port WWN is 22:c3:00:0d:ec:6d:42:40
Admin port mode is auto, trunk mode is on
snmp link state traps are enabled
Port mode is F, FCID is 0x450076
Port vsan is 3
Speed is 4 Gbps
Rate mode is dedicated
Transmit B2B Credit is 16
Receive B2B Credit is 250
Receive data field Size is 2112
Beacon is turned off
5 minutes input rate 170053656 bits/sec, 21256707 bytes/sec, 10945 frames/sec
5 minutes output rate 358776792 bits/sec, 44847099 bytes/sec, 22802 frames/sec
324684133152 frames input, 601241450563276 bytes
0 discards, 152040 errors
76020 CRC, 0 unknown class Counter
0 too long, 0 too short Incrementing
<snip>
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 120
Conclusion
• After finding the CRC errors on the FC interface which had the affected VFC
pinned, we shut down the FC uplink fc2/1, which cause the VFC to re-pin to a
remaining uplink.
• Further troubleshooting found the uplink traced to a faulty transceiver on the
MDS. Once the link was relocated, the performance issue went away.
UCS-A(nxos)# show npv flogi-table
--------------------------------------------------------------------------------
SERVER EXTERNAL
INTERFACE VSAN FCID PORT NAME NODE NAME INTERFACE
--------------------------------------------------------------------------------
vfc1218 3 0x4500a0 20:00:00:25:b5:00:00:df 20:00:00:25:b5:01:00:df fc2/2
vfc1293 3 0x450092 20:00:00:25:b5:00:01:5f 20:00:00:25:b5:01:01:0f fc2/2
vfc1372 3 0x4500ac 20:00:00:25:b5:00:00:bf 20:00:00:25:b5:01:00:ef fc2/2
fc2/1
vfc1376 3 0x4500ad 20:00:00:25:b5:00:00:9e 20:00:00:25:b5:01:00:cf fc2/2
BRKCOM-3002 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 121