Brocade Product Training: Diagnostics and Troubleshooting
Product Training
Diagnostics and Troubleshooting
Describe troubleshooting processes and best practices
  Gather, Analyze & Solve
  Predictive SAN problems
Identify and categorize problems and associated commands that fall into the following categories:
  Timeout/Sluggishness
  Segmented Fabric
  Port Configuration Issue
  Missing Device
  Marginal Link
Discuss Traditional and SAN troubleshooting approaches
Describe switchshow & errshow
Discuss supportshow & some associated commands
Introduce hardware diagnostics
2003 Brocade Communications Systems, Incorporated. Revision CFP261-1001-2003 Chapter 7 - 2
[Figure labels: Power, Speed Neg, FLOGI, Security ACL, Zoning, LUN Masking]
Configuration/CLI Commands
Portlog Analysis
Configuration Problems
  Port/device/switch is not correctly configured
Brocade Zoning Problems
  Zoning is not configured correctly
Brocade QuickLoop (QL) Problems
  QL is configured or used inappropriately
License Problems
  Customers do not have the license to do what they are attempting
  Single-use transaction key used to retrieve switch license expired
Marginal Links
  Bad or marginal cables/GBICs/SFPs
Timeout/sluggishness: congestion
Segmented Fabric: switch cannot join Fabric
Port/Node configuration: port offline, incorrect portCfgShow output, incompatible node parameters or device drivers
Missing Device: Nx_Port attributes not shown in the Fabric database, or Nx_Port not seen by other Nx_Port
Marginal Link: intermittent transmitting/receiving signal
Timeout/sluggishness: ISL overloaded, insufficient BB-credit, marginal link
Segmented Fabric: wrong product license, Domain ID conflicts, zoning conflicts, incompatible switch parameters
Port/Node configuration: port/node offline and/or configured in wrong topology
Missing Device: Nx_Port is not registered with the name server; zoning is enabled and device is not in the zone; LUN masking is enabled and device is not properly defined; application accessing device is not configured correctly
Healthy: no worries
Not healthy: gather & analyze to identify the problem
1. Timeout/Sluggishness
Start
Marginal Link
1. Timeout/Sluggishness (contd)
Label Analysis Commands with an A and Solve Commands with an S:
Congestion Related Commands/Tools: portPerfShow, topologyShow, uRouteShow, linkCost/lsdbShow, portErrShow, errShow/mqShow, portStatsShow, PM & Scripts/API, Fabric Watch, spinFab
portStatsShow portLogDump
portCfgShow distance
Check HBA configuration from its utility
2. Segmented Fabric
Start
Zoning conflict
cfgShow, licenseShow, faShow, portZoneShow, zoneHelp
Zone Commands: cfgCreate, zoneAdd, aliDelete
3. Port Configuration
Start
Speed Related
Topology Related Commands: configShow, licenseShow, portShow, portCfgLport, portCfgGport, portCfgEport, fabricShow, topologyShow
Trunking Related Commands: version, licenseShow, portCfgTrunk, switchCfgTrunk, trunkShow, islShow, trunkDebug
Web Tools
4. Missing Device
Start
Decision questions, in order:
1. Does the device have a good physical connection to the Fabric? (See tip #1)
2. Is the device logically connected to the Fabric? (See tip #2)
3. Does the device correctly follow FC connection protocol? (See tip #3)
Flowchart outcomes:
Use the Marginal Link Troubleshooting Flowchart
Use port/node configuration troubleshooting commands/tools
Check zoning, QuickLoop (see tip #4), and LUN masking configurations
The
Cable/Terminator
How do you troubleshoot? Divide and conquer, use a process of deduction and logical elimination
To troubleshoot this scenario (assuming a single switch SAN), you must concentrate on:
The The The The
Heterogeneity
How do you troubleshoot? Start at the switch: check switchshow and the error log, then look at additional supportshow output
0 = Panic
1 = Critical
2 = Error
3 = Warning
4 = Information
5 = Debug
System Error Log
  Logs diagnostics and system error messages
  Fabric OS v3.1: circular log with 64 entries
  Fabric OS v4.1: circular log with 1536 entries (256 entries per message level)
    Persistent log saves errors across reboots
    Can be resized from 1024 to 2048 with errnvlogsizeset
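The circular-log behavior described above can be illustrated with a small Python sketch. This is a toy model only, not the switch implementation; the class name `CircularErrorLog` and the message texts are hypothetical. It shows that a 64-entry circular log silently discards the oldest entries, and that six message levels at 256 entries each give the 1536 total quoted for v4.1.

```python
from collections import deque

# Severity levels as listed on the slide.
SEVERITY = {0: "Panic", 1: "Critical", 2: "Error",
            3: "Warning", 4: "Information", 5: "Debug"}


class CircularErrorLog:
    """Toy model of a fixed-size circular log (hypothetical, for illustration)."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest entry once it is full.
        self.entries = deque(maxlen=capacity)

    def log(self, severity, text):
        self.entries.append((severity, SEVERITY[severity], text))


# Model the v3.1 log size (64 entries) and write 70 messages into it.
v31_log = CircularErrorLog(64)
for i in range(70):
    v31_log.log(4, f"message {i}")

oldest = v31_log.entries[0][2]
newest = v31_log.entries[-1][2]

# v4.1 figure from the slide: 256 entries for each of the six levels.
v41_capacity = 256 * len(SEVERITY)

print(len(v31_log.entries), oldest, newest, v41_capacity)
```

The practical consequence is that an old error can be overwritten before anyone reads it, which is one reason to capture the log soon after a problem occurs.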
Error 02
0x10fbdca0 (tSwitch): Feb 17 19:21:29
Error DIAG-POST_SKIPPED, 3, (999) Skipped POST tests: assuming all ports are healthy, Err# 0004

Error 01
0x11fbdca0 (tSwitch): Feb 17 19:21:29
Error SYS-BOOT, 4, Restart reason: Reboot

[Slide callouts: Error Description; Error Message Code = Error Types]
Logs can be forwarded using syslogdIpAdd
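As a rough sketch of how the message line of such an entry could be picked apart, the Python below splits it into module, code, severity level, and text. The regular expression and the function name `parse_message` are assumptions made for illustration; they are fitted only to the two sample entries shown here and may not cover every message format.

```python
import re

SEVERITY_NAMES = {0: "Panic", 1: "Critical", 2: "Error",
                  3: "Warning", 4: "Information", 5: "Debug"}

# Fitted to lines such as "Error SYS-BOOT, 4, Restart reason: Reboot".
MESSAGE_RE = re.compile(
    r"^Error\s+(?P<module>[A-Z0-9]+)-(?P<code>[A-Z0-9_]+),"
    r"\s*(?P<level>\d),\s*(?P<text>.*)$"
)


def parse_message(line):
    """Return a dict for a message line, or None if it does not match."""
    match = MESSAGE_RE.match(line.strip())
    if match is None:
        return None
    level = int(match.group("level"))
    return {
        "module": match.group("module"),
        "code": match.group("code"),
        "level": level,
        "severity": SEVERITY_NAMES.get(level, "Unknown"),
        "text": match.group("text"),
    }


boot = parse_message("Error SYS-BOOT, 4, Restart reason: Reboot")
post = parse_message(
    "Error DIAG-POST_SKIPPED, 3, (999) Skipped POST tests: "
    "assuming all ports are healthy, Err# 0004"
)
print(boot["severity"], post["severity"])
```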
supportShow is a group of preselected Fabric OS and Linux commands
The command groups can be controlled with:
  supportshowcfgshow
  supportshowcfgenable
  supportshowcfgdisable
When executed, it gathers information and displays numerous Brocade Fabric OS command outputs on the terminal screen
Capture steps:
1. Clear the old portlog by invoking portLogClear
2. Recreate the problem or force the suspected port to log in
3. Capture supportShow
os          enabled
exception   enabled
port        enabled
fabric      enabled
services    enabled
security    enabled
network     enabled
portlog     enabled
system      enabled
extend      disabled
filter      disabled
perfmon     disabled
switch:admin>
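When a captured supportShow looks incomplete, it can help to check which command groups were disabled at capture time. The following Python sketch parses a saved copy of the listing above into a dictionary; the function name `parse_groups` is hypothetical, and the one-group-per-line layout is an assumption about how the text was captured.

```python
SAMPLE = """\
os enabled
exception enabled
port enabled
fabric enabled
services enabled
security enabled
network enabled
portlog enabled
system enabled
extend disabled
filter disabled
perfmon disabled
switch:admin>
"""


def parse_groups(text):
    """Map each group name to True (enabled) or False (disabled)."""
    groups = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip prompts and anything that is not "<group> <state>".
        if len(parts) == 2 and parts[1] in ("enabled", "disabled"):
            groups[parts[0]] = parts[1] == "enabled"
    return groups


groups = parse_groups(SAMPLE)
disabled = sorted(name for name, enabled in groups.items() if not enabled)
print(disabled)
```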
v3.1 supportShow
OS Fabric
version uptime switchShow tempShow psShow licenseShow diagShow portFlagsShow portErrShow portCfgShow configShow
Extend
filterShow
PerfMon
ps_dump
v4.1 supportShow
OS
mii-tool -vv
/usr/bin/du -xh / | /bin/sort:
/bin/ps -elfh
/bin/echo
/bin/rpm -qa
/bin/cat /var/log/dmesg
/bin/cat /etc/fstab
/bin/cat /etc/mtab
Fabric
fabricShow islShow trunkShow topologyShow fabStatsShow fabSwitchShow fabStatsShow fabPortShow fspfShow fcplogsShow /fabos/bin/zone stateshow portZoneShow portCamShow cfgSize cfgShow rcssmShow rcsinfoShow rcsregistryShow
portLogDump
filterShow
PerfMon
ps_dump a n port#
version uptime tempShow psShow licenseShow diagShow * errDump switchShow portFlagsShow portErrShow * mqShow portSemShow portShow * portRegShow portRouteShow portStructShow bloomDataShow
fabricShow trunkShow topologyShow qlShow faShow portCfgShow nsShow nsAllShow nsCamShow cfgShow configShow faultShow traceShow memShow mallocShow fastCheckHeap portLogDump
v4.0:admin> diagShow
Displays diagShow output for slots specified in supportShow, default: all
diagshow of slot 2 (<truncated output> Slot: 1 UPORTS <more truncated output>):
Diagnostics Status: Thu Oct 17 22:21:47 2002
Slot: 2 UPORTS
Port  BPort  Diag  Active  Speed    FrTX   FrRX   LLI Errs  Loopback
16    15     OK    DN      2G Auto  ---
17    14     OK    DN      2G Auto  ---
18    13     OK    DN      2G Auto  ---
19    12     OK    DN      2G Auto  ---
20    31     OK    DN      2G Auto  ---
21    30     OK    DN      2G Auto  ---
22    29     OK    DN      2G Auto  ---
23    28     OK    DN      2G Auto  ---
24    47     OK    DN      2G Auto  ---
25    46     OK    DN      2G Auto  ---
26    45     OK    UP      2G Auto  30073  29860  45166
27    44     OK    DN      2G Auto  ---
28    63     OK    DN      2G Auto  ---
29    62     OK    DN      2G Auto  ---
30    61     OK    DN      2G Auto  ---
31    60     OK    DN      2G Auto  ---
Central Memory: OK
Total Diag Frames Tx: 0
Total Diag Frames Rx: 0
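A listing like this is easy to scan by eye for one slot, but a short script can pull out the active ports across many captures. The Python sketch below is illustrative only: the function name `parse_diag_rows` is hypothetical, and the column order (Port, BPort, Diag, Active, two-word Speed, then counters) is assumed from the sample rows above.

```python
SAMPLE = """\
Port BPort Diag Active Speed FrTX FrRX LLI Errs Loopback
25 46 OK DN 2G Auto ---
26 45 OK UP 2G Auto 30073 29860 45166
27 44 OK DN 2G Auto ---
"""


def parse_diag_rows(text):
    """Parse port rows; header and non-port lines are skipped."""
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) < 6 or not tokens[0].isdigit():
            continue
        rows.append({
            "port": int(tokens[0]),
            "bport": int(tokens[1]),
            "diag": tokens[2],
            "active": tokens[3],
            "speed": " ".join(tokens[4:6]),
            # Counters are present only for ports that show numbers.
            "counters": [int(t) for t in tokens[6:] if t.isdigit()],
        })
    return rows


rows = parse_diag_rows(SAMPLE)
up_ports = [row["port"] for row in rows if row["active"] == "UP"]
failed = [row["port"] for row in rows if row["diag"] != "OK"]
print(up_ports, failed)
```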
v3.0:admin> portErrShow
Note: v4.0 has identical output
     frames      enc   crc   too   too   bad   enc   disc  link  loss  loss  frjt  fbsy
     tx    rx    in    err   shrt  long  eof   out   c3    fail  sync  sig
-----------------------------------------------------------------------------------------
 0:  33    32    0     0     0     0     0     15    0     271   14    0     0     0
 0:  43m   107m  0     0     0     0     0     38    0     76    84    17    0     0
 1:  35m   107m  0     0     0     0     0     39    0     75    111   17    0     0
 2:  70m   29m   0     0     0     0     0     1.6k  0     9     5     9     0     0
 3:  10m   7.3m  0     0     0     0     0     95k   33    0     21    30    0     0
 4:  3.0m  2.0m  0     0     0     0     0     0     22    1     15    22    0     0
 5:  1.3m  859k  0     0     0     0     0     0     35    2     16    18    0     0
 6:  08m   36m   0     0     0     0     0     8     70    0     10    13    0     0
<truncated output>
14:  23m   46m   0     0     0     0     0     37    0     826   107   20    0     0
15:  21m   47m   0     0     0     0     0     38    0     888   140   20    0     0
portStatsClear [slotNumber/]portNumber can be used to clear most error statistics for the ports on the same ASIC quad as the specified port
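To compare counters between two portErrShow captures, the abbreviated values first have to be turned into numbers. The Python sketch below does this for one row. It assumes that the `k`, `m`, and `g` suffixes mean thousands, millions, and billions; the column names and the function names are made up for the example and follow the header order shown above.

```python
COLUMNS = ["frames_tx", "frames_rx", "enc_in", "crc_err", "too_shrt",
           "too_long", "bad_eof", "enc_out", "disc_c3", "link_fail",
           "loss_sync", "loss_sig", "frjt", "fbsy"]

# Assumed meaning of the abbreviations used in the output.
SUFFIX = {"k": 1_000, "m": 1_000_000, "g": 1_000_000_000}


def to_number(token):
    """Convert '95k' -> 95000, '7.3m' -> 7300000, '33' -> 33."""
    if token[-1] in SUFFIX:
        return round(float(token[:-1]) * SUFFIX[token[-1]])
    return int(token)


def parse_row(line):
    """Split 'port: v1 v2 ...' into (port, {column: value})."""
    port_text, _, rest = line.partition(":")
    values = [to_number(token) for token in rest.split()]
    return int(port_text), dict(zip(COLUMNS, values))


port, stats = parse_row("3: 10m 7.3m 0 0 0 0 0 95k 33 0 21 30 0 0")

# Example check: list the error columns on this port that are not zero.
nonzero = {name: value for name, value in stats.items()
           if value and not name.startswith("frames")}
print(port, nonzero)
```

Because the abbreviated values are rounded by the switch, differences between two captures are approximate for large counters.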
portLogDump
What a portLogDump is:
The command invoked to retrieve switch portlog information
A recorder inside the switch
Has a limited length or record time
Captures only activities that are related to the Fabric and Fabric services: port states, Fabric and port logins, state changes, name service queries
Does not normally capture end-to-end (device-to-device) activity
portLogDump
What a portLogDump is not:
portLogDump is not an FC analyzer trace
No trigger capability
Limited storage capacity
Not end-to-end
The Event Log is separate
You must analyze and catch abnormal Fabric activities
[Figure: annotated portLogDump output with callouts for Timestamp, Task, R_CTL, Class 3 frame received, and Frame payload size]
Data Collection
The data is collected. What is next?
Debugging Tool?
Hardware Diagnostics
General Behavior of the Tests
If none entered, most tests default to run one set
If none entered, the following tests default to an infinite loop:
portLoopBack
crossPortTest
spinSilk
Tests are invoked from either the Telnet screen or the front panel, with
The test stops at the first error it finds
The test can be interrupted with the Enter key while still running,
Pt0 (Lm0) "WordsTx" is 0x200020 sb 0x400040 er 0x600060, off 0x100 phy 0x80030100 msk 0xffffffff, Err# 0415
Line 1 shows the task (tPBmenu) and the date/time (Mar 22 11:03:23)
Line 2 shows the error (DIAG-REGERR), the severity (1) and the test (RegTest)
Line 3 (or more lines) describes the failure, often showing the actual (is) and the expected (sb) values
The last line shows the equivalent error number
See the Brocade Fabric OS Diagnostic and Error Reference Guides per OS version
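The actual (is) and expected (sb) values in the sample failure line can be compared bit by bit. The Python sketch below extracts the hexadecimal fields and XORs the two values. In this sample the XOR result equals the `er` field; reading `er` as "the bits in error" is an assumption drawn from that arithmetic, not something the slide states, and the regular expression is fitted only to this one line.

```python
import re

LINE = ('Pt0 (Lm0) "WordsTx" is 0x200020 sb 0x400040 er 0x600060, '
        'off 0x100 phy 0x80030100 msk 0xffffffff, Err# 0415')

# Pick out "<name> <hex value>" pairs for the field names seen in the sample.
FIELD_RE = re.compile(r"\b(is|sb|er|off|phy|msk)\s+(0x[0-9a-fA-F]+)")

fields = {name: int(value, 16) for name, value in FIELD_RE.findall(LINE)}

# Bits that differ between the actual and expected register values,
# limited to the bits selected by the mask.
differing_bits = (fields["is"] ^ fields["sb"]) & fields["msk"]

print(hex(differing_bits), hex(fields["er"]))
```

Here 0x200020 XOR 0x400040 gives 0x600060, so bits 5, 6, 21, and 22 are the ones that disagree.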
Test                     Notes
camTest                  2, 3
portLoopbackTest         2
portloopbacktestserdes   2
txdtest                  2
crossPortTest            5, 6
spinSilk                 3
spinsilktestserdes       2
filterTest               3
statsTest                3
portTest                 5
loopPortTest             5
spinFab                  5
fPortTest                5, 6
backPlaneTest            2, 4

Notes:
1  Part of v4.x POST 1
2  Part of v4.x POST 2
3  Part of v3.x POST
4  Available on v4.x only
5  Can be run while switch is online
6  Can be run in OS versions 2.x while switch is online
diagPostEnable turns POST back on
fastBoot disables POST this boot only
following tests1:
There are 3 boot types (~boot times for V4.x series switches):
Cold
Warm
Fast boot (fastboot)
crossPortTest
Functionally verifies the port's ability to send and receive
Tests the entire path: main board, SFP/GBIC, and cable
To run this test, external cables are required
It tests the data integrity of each frame
SW connects to SW
LW connects to LW
Only one frame is transmitted and received at any one time
The port LEDs rapidly flicker green while the test is running
crossPortTest
Modes
With the switch enabled, it probes for at least one M-M or M-N connection; the test fails if at least one loop (M-M or M-N) is not found
switches
Test aborts if any port is not connected
If setGbicMode 1 (a separate telnet command):
  Only ports with SFPs or GBICs are included in the test
  Disconnect all non-looped SFPs/GBICs
v3.x:admin> crossPortTest
Operands
crossPortTest [ passCount, singlePortAlso ]
singlePortAlso - Allows port M to be connected to itself (M-M)
  If 1: allows both single port (M-M) and cross port (M-N) cable connections
  If 0: only cross port (M-N) cable connections are allowed
times (or until you press Enter) and allows ports to be connected to themselves (M-M):
v3.x:admin> crossPortTest 0,1
Running Cross Port Test ...
v4.x:admin> crossPortTest
Operands
crossPortTest [-nframes count] [-lb_mode mode] [-spd_mode mode] [-gbic_mode mode] [-norestore mode] [-ports itemlist]
This example will run crossPortTest online or offline 100 (default value is 10) times on slot 2 to the SERDES at 2 Gbit/sec, not all
spinSilk
Full speed functional test of internal/external tx and rx paths
Configures each looped port pair to route received frames to each other
Four frames spin around each two-port loop at full hardware speed continuously
All loop back plug ports will send frames to all other loop back plugs and themselves
There is no CPU intervention during the test
8b/10b encoders
spinSilk - Operands
v3.X+: spinSilk [nmegs, gbic_mode, lb_mode, spd_mode]
Example: switchname:admin> spinSilk 10,1,1,0
  nmegs - Test will run 10 times
  gbic_mode - Only ports with SFP/GBICs will be tested (this test only)
  lb_mode - Both M-M and M-N cables are allowed
  spd_mode - Ports with SFPs/GBICs will auto-negotiate speed
v4.x+: spinSilk [--slot number] [-nmegs count] [-gbic_mode mode] [-lb_mode mode] [-spd_mode mode] [-norestore mode] [-verbose mode] [-ports itemlist]
Example: spinSilk --slot 1 -nmegs 100 -lb_mode 1 -spd_mode 2 -verbose 1 -ports 1/4-16
  --slot 1 - Test will run on slot 1
  -nmegs - Test will run 100 times
  -gbic_mode - Only ports with GBICs will be tested (this test only)
  -lb_mode - Both M-M and M-N cables are allowed
  -spd_mode - Ports are locked at 2 Gbit/sec
  -verbose - Either on (1) or off (0); in this case it will be on
  -ports 1/4-8 - Will run test port 4 to/from port 8 on slot 1
spinFab
Exercises E_Port connections in a manner similar to
spins them, verifies that the frames are still in order and there are no 8b/10b encoder errors
The frame is sent with the original switches S_ID so it
spinFab - Operands
v3.X spinFab [nMillionFrames, ePortBeg, ePortEnd, setFail]
nMillionFrames - The number of million frames per port to execute this test (default is 100)
ePortBeg - The first port to test; if omitted, 0 will be used
ePortEnd - The last port to test; the test will be performed on ePortBeg to ePortEnd inclusive
setFail - Specify 1 to mark failing ports as bad, or 0 to not mark failed ports as bad
v4.X spinFab [-nmegs count] [-ports itemlist] [-setfail mode] [-domain value]
-nmegs - The number of million frames per port to execute this test (default is 100)
-ports - Specify a list of user ports to test; by default, all of the ISL ports in the current switch will be tested
-setfail - Specify 1 to mark failing ports as bad, or 0 to not mark failed ports as bad
-domain - Specifies a specific remote domain that the switch is connected to (default is to automatically determine the remote domain number)
loopPortTest
Used to test NL Ports and the devices attached
Tests from main board through devices
Should be connected with the same SFP/GBIC type; SW
loopPortTest - Operands
This will execute loopPortTest 100 times on port 8 with payload pattern 0xaa55 and pattern width 2 (meaning word width)
v4.x+ loopPortTest [-nframes count] [-ports itemlist] [-seed payload_pattern] [-width pattern_width]
-nframes - The number of million frames per port to execute this test (default is 10)
-ports - Specify a list of user ports to test
-seed - The pattern of the test packet's payload
-width - The width of the user-specified pattern: 1, 2, or 4 (byte, word, or quad)
fPortTest
Functional test of F->N, N->F Ports
Tests from mainboard through devices
N_Port receiver
fPortTest - Operands
This will execute fPortTest 100 times on port 8 with payload pattern 0xaa55, pattern width 2 (meaning word width) and default payload size 512 bytes
v4.x+ fPortTest [-nframes count] [-ports itemlist] [-seed payload_pattern] [-width pattern_width] [-size pattern_size]
-nframes count - The number of times (or number of frames per port) to execute this test (default value is 10)
-ports itemlist - Specify a list of user ports to test
-seed payload_pattern - The pattern of the test packet's payload
-width pattern_width - The width of the user-specified pattern: 1, 2, or 4 (byte, word, or quad)
-size pattern_size - Number of words of the test packet's payload (default value is 512)
portTest
Used to isolate problems to a single replaceable element
Diagnostics can be run on demand
frames from ports Tx to the Rx
Exercises all components: main board, SFP, cable
Should be connected with the same SFP/GBIC type; SW
v3.1:admin> portTest
Operands
portTest [ports, iteration, delay, timeout, pattern, patsize, seed]
ports - The type of port on which to run the test
iteration - Number of times to run the test; -1 runs infinitely
delay - Time delay in minutes between frames being sent; 20 is default
v3.1:admin> portTest
Command Example
v3.1:admin> portTest -4,30
Test run with 4 ports on switch:
  Port 12 contained a loopback adapter
  Port 13 was an F_Port attached to a server
  Port 14 was an E_Port
  Port 15 was an L_Port attached to a JBOD
Operand -4,30 used to run the test only against E_Ports, 30 iterations
Possible port types are as follows:
  -1 All ports
  -2 All L_Ports
  -3 All F_Ports
  -4 All E_Ports
  -5 All N->N loopback ports
stopporttest used to halt test
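The port-type codes for the v3.1 form of the command can be kept in a small table so that a script builds the command string from a readable name instead of a bare number. The Python sketch below is a hypothetical helper, not part of Fabric OS; it only formats the `portTest ports,iteration` string using the five codes listed on the slide.

```python
# Port-type codes for the v3.1 "ports" operand, as listed on the slide.
PORT_TYPES = {
    -1: "All ports",
    -2: "All L_Ports",
    -3: "All F_Ports",
    -4: "All E_Ports",
    -5: "All N->N loopback ports",
}


def port_test_command(port_type, iterations):
    """Build a v3.1-style portTest command line for a port-type code."""
    if port_type not in PORT_TYPES:
        raise ValueError(f"unknown port type code: {port_type}")
    return f"portTest {port_type},{iterations}"


command = port_test_command(-4, 30)
print(command, "#", PORT_TYPES[-4])
```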
v4.1:admin> portTest
Operands
portTest [-ports itemlist, -iteration count, -userdelay time, -timeout time, -pattern pattern, -patsize size, -seed seed, -listtype porttype]
-ports - A list of ports on which to run the test
-iteration - Number of times to run the test; -1 runs infinitely
-userdelay - Time delay in minutes between frames sent; 10 is default
-timeout - Max seconds to allow test to run; default is 0
-pattern - The pattern of the test packet's payload (as per datatypeshow); default is random
-patsize - The width of the pattern; default is 1024
-seed - Seed value used with the pattern; default is 0xaa
-listtype - Type of ports on which to run the test (v4.1.0 is case sensitive)
v4.1:admin> portTest
Command Example
v4.1:admin> porttest -ports 1/12-14 -iteration 50
Test run with 3 ports on switch:
Port Port Port
Operand -ports 1/12-14 indicates test run against slot 1, ports 12-14
Operand -iteration 50 used to run test 50 times
stopporttest used to halt test
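The `slot/first-last` notation used in the `-ports` operand can be expanded into an explicit port list, which is handy when matching test results back to individual ports. The Python sketch below handles only the single `slot/first-last` (or `slot/port`) form shown in the example; the function name `parse_itemlist` is hypothetical and other item list forms are not covered.

```python
def parse_itemlist(item):
    """Expand 'slot/first-last' into (slot, [ports...])."""
    slot_text, _, ports_text = item.partition("/")
    first_text, _, last_text = ports_text.partition("-")
    first = int(first_text)
    # A single port such as "1/12" has no "-last" part.
    last = int(last_text) if last_text else first
    return int(slot_text), list(range(first, last + 1))


slot, ports = parse_itemlist("1/12-14")
print(slot, ports)
```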
[Table: Action column entries: replace MB; replace CBL, GBC, SFP, MB, or attached device]
Summary Page
Divide and Conquer - Troubleshoot starting at the switch
Brocade commands in supportshow can help determine where problem breakdown occurred in LINK, LOGIN, FABRIC, DEVICE process
Hardware diagnostics can be useful to the field
Review Questions
1. Trunking problems fall into which categories? Choose all that apply:
   a. Port Configuration Problem
   b. Fabric Issues
   c. Marginal Link
   d. Optional Product License
2. How can the nsShow command output help you to troubleshoot a SAN?
3. What are the differences between the portStatsClear and the portLogClear commands?
4. What information is provided in response to the diagShow command?