
FC Management Security

Fibre Channel Common Transport (FC-CT) Management Security


• Attached FC devices can send Common Transport (CT) commands to query the Name Server
• FC-CT management security ensures that only approved devices can send FC CT commands
• Introduced in NX-OS 6.2(9)
• To enable:
  • fc-management enable
• Devices that are permitted to query the Name Server must be added to the database:
  • fc-management database vsan <vsan>
  • pwwn <pwwn> feature <feature or all> operation <both or read>
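To make the database semantics concrete, the Python sketch below models the permission check. It is not how NX-OS implements it. The class name `FcMgmtDatabase`, the feature string `fcns`, and the choice to allow everything while the feature is disabled are all assumptions made for illustration. Only the per-VSAN pWWN entries and the `all`, `both` and `read` keywords come from the CLI above.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class FcMgmtDatabase:
    """Hypothetical model of the fc-management database (not the NX-OS code)."""

    enabled: bool = False  # mirrors "fc-management enable"
    # (vsan, pwwn) -> {feature: operation}, where operation is "read" or "both"
    entries: Dict[Tuple[int, str], Dict[str, str]] = field(default_factory=dict)

    def add(self, vsan: int, pwwn: str, feature: str, operation: str) -> None:
        """Rough equivalent of: fc-management database vsan <vsan> / pwwn ... operation ..."""
        if operation not in ("read", "both"):
            raise ValueError("operation must be 'read' or 'both'")
        self.entries.setdefault((vsan, pwwn.lower()), {})[feature] = operation

    def permits(self, vsan: int, pwwn: str, feature: str, write: bool = False) -> bool:
        """Would a CT request from this pWWN for this feature be accepted?"""
        if not self.enabled:
            return True  # assumption: no CT filtering until the feature is enabled
        grants = self.entries.get((vsan, pwwn.lower()), {})
        # An explicit per-feature entry wins; otherwise fall back to "all".
        operation = grants.get(feature, grants.get("all"))
        if operation is None:
            return False  # device is not in the database for this VSAN/feature
        return operation == "both" or not write


if __name__ == "__main__":
    db = FcMgmtDatabase(enabled=True)
    db.add(10, "21:00:00:24:FF:4C:AA:01", "all", "read")
    print(db.permits(10, "21:00:00:24:ff:4c:aa:01", "fcns"))
    print(db.permits(10, "21:00:00:24:ff:4c:aa:01", "fcns", write=True))
    print(db.permits(20, "21:00:00:24:ff:4c:aa:01", "fcns"))
```

With a `read` entry the device may query but not register, and the same pWWN in another VSAN is rejected because database entries are per VSAN.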

#CLUS BRKSAN-2883 © 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public 48
FLOGI Scale Enhancements
NX-OS 8.1(1): two enhancements allow larger limits on the MDS 9718
• FLOGI scale
• FLOGI quiesce

Feature                 Parameter           Old limit (MDS 9700)   New limit (MDS 9718)
Login (FLOGI / FDISC)   FLOGIs per module   1000                   2000
Login (FLOGI / FDISC)   FLOGIs per switch   4000                   8000

FLOGI Scale Enhancements
FLOGI scale
• On a switch or module reload, the routing information (RIB) is updated from the prior fcdomain manager information, even before devices log in. This reduces the time taken to generate the ELS_ACC(FLOGI).
• CLI: flogi scale enable (enabled by default on all switch types except the 9148S and 9250i)

FLOGI Scale Enhancements
FLOGI quiesce
• When a device logs out of a port, the Fport-server sends the ELS_ACC(LOGO) normally but delays notifying other applications and sending any RSCNs, by default for 2 seconds.
• If the device logs back in to the same interface, the Fport-server has very little to do.
• If the device does not log back in to the same interface within 2 seconds, or logs in to a different interface on the same switch, FLOGI notifies all of the other applications and sends the appropriate “offline” RSCNs.
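The quiesce behavior amounts to a per-device timer. The sketch below models it with explicit millisecond timestamps so the three cases above can be checked. `QuiesceModel`, its methods, and the "online" notification recorded on a fresh login are illustrative assumptions, not the Fport-server's real interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class QuiesceModel:
    """Hypothetical model of FLOGI quiesce; all times are in milliseconds."""

    timeout_ms: int = 2000  # quiesce window; 0 means no delay
    pending: Dict[str, Tuple[str, int]] = field(default_factory=dict)  # pwwn -> (interface, LOGO time)
    notifications: List[tuple] = field(default_factory=list)  # what other apps / RSCNs would see

    def logo(self, pwwn: str, interface: str, now_ms: int) -> None:
        # ELS_ACC(LOGO) goes out immediately (not modeled); the offline event is deferred.
        if self.timeout_ms == 0:
            self.notifications.append(("offline", pwwn))
        else:
            self.pending[pwwn] = (interface, now_ms)

    def tick(self, now_ms: int) -> None:
        # Quiesce window expired without a re-login: announce the device offline.
        for pwwn, (_, logo_ms) in list(self.pending.items()):
            if now_ms - logo_ms >= self.timeout_ms:
                del self.pending[pwwn]
                self.notifications.append(("offline", pwwn))

    def flogi(self, pwwn: str, interface: str, now_ms: int) -> None:
        self.tick(now_ms)  # expire any window that has already closed
        previous = self.pending.pop(pwwn, None)
        if previous is not None and previous[0] == interface:
            return  # quick re-login on the same interface: nothing to announce
        if previous is not None:
            # Logged in on a different interface: the old login goes offline first.
            self.notifications.append(("offline", pwwn))
        self.notifications.append(("online", pwwn, interface))
```

A LOGO followed by a FLOGI on the same port 500 ms later leaves `notifications` empty, which is the "very little to do" case; a re-login on another port produces the offline event immediately.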

FLOGI Scale Enhancements
FLOGI quiesce
• Normally this 2-second delay is harmless
• If devices share PWWNs by logging in to separate switches under failover conditions, quiesce should be disabled
• CLI: flogi quiesce timeout <0-20000> (milliseconds); 0 disables it
• NX-OS 8.1 and 8.2: the default is 2000 (2 seconds)
• NX-OS 8.3(1) and later: the default was changed to 0 (no delay)
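When auditing configurations across switches running different releases, a small helper can capture the version-dependent default and the valid range. This is a sketch: the function names are hypothetical, version strings are assumed to look like `8.3(1)`, and releases earlier than 8.1 return `None` because the enhancement arrived in 8.1(1).

```python
import re
from typing import Optional


def default_quiesce_timeout_ms(nxos_version: str) -> Optional[int]:
    """Default 'flogi quiesce timeout' by NX-OS release, per the notes above."""
    match = re.match(r"(\d+)\.(\d+)", nxos_version)
    if match is None:
        raise ValueError(f"unrecognized NX-OS version: {nxos_version!r}")
    release = (int(match.group(1)), int(match.group(2)))
    if release < (8, 1):
        return None  # FLOGI quiesce not available before 8.1(1)
    return 2000 if release < (8, 3) else 0  # 8.1/8.2: 2 s; 8.3(1)+: disabled


def check_quiesce_timeout(timeout_ms: int) -> int:
    """Validate a configured value against the 0-20000 ms range; 0 disables quiesce."""
    if not 0 <= timeout_ms <= 20000:
        raise ValueError("flogi quiesce timeout must be between 0 and 20000 ms")
    return timeout_ms
```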

Design Principles for Slow Drain and Congestion Isolation
SAN Congestion
What is SAN congestion?
• SAN congestion occurs when some part of the SAN has frames that cannot be transmitted immediately
• There are two main causes:
1. “Traditional” slow drain
• Devices deliberately withholding buffer-to-buffer credits
• Lost credits
• Easy to spot

2. Overutilization / oversubscription
• Devices requesting more data than they can receive at their link rate
• Harder to spot

Slow Drain - Example
“Typical” Slow Drain causing ISL congestion
[Figure: a server attached to switch1 (left) and two storage arrays (targets) attached to switch2 (right); 8Gb device links and a 16Gb FC port-channel between the switches]
1. Server sends multiple SCSI reads to different targets
2. Data sent by both arrays
3. Data arriving at a maximum combined 16 Gbps
4. Data sent by the FC switch
5. Server stops or slows R_RDYs
6. FC switch must frequently stop sending due to 0 Tx credits
7. Data builds up on ingress due to excessive data being received
8. Left FC switch stops sending R_RDYs to right FC switch
9. Data builds up on ingress due to excessive data being received
10. Right FC switch stops sending R_RDYs to arrays

Both arrays and all devices utilizing ISLs are affected!
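The credit mechanics behind steps 5 and 6 can be illustrated with a toy model of a single link. This is a deliberate simplification: one frame per tick, a transmit queue that is never empty, and a receiver that returns one R_RDY every `rrdy_every` ticks. `simulate_bb_credits` is a hypothetical name, and its result is only analogous to a TxWait-style count of time spent at 0 Tx credits.

```python
def simulate_bb_credits(initial_credits: int, ticks: int, rrdy_every: int) -> int:
    """Toy buffer-to-buffer credit model of one FC link (e.g. switch port -> server).

    Assumes the sender always has a frame queued and sends one frame per tick
    when it holds a credit. The receiver returns one R_RDY every `rrdy_every`
    ticks for a frame it is still holding (1 = keeps up; larger = slow drain).
    Returns the number of ticks spent with a frame queued but 0 Tx credits.
    """
    credits = initial_credits
    zero_credit_ticks = 0
    for t in range(1, ticks + 1):
        if credits > 0:
            credits -= 1  # transmit a frame, consuming one credit
        else:
            zero_credit_ticks += 1  # frame must wait: no credit available
        if t % rrdy_every == 0 and credits < initial_credits:
            credits += 1  # receiver frees a buffer and returns an R_RDY
    return zero_credit_ticks


if __name__ == "__main__":
    print("healthy server:", simulate_bb_credits(8, 100, rrdy_every=1))
    print("slow-drain server:", simulate_bb_credits(8, 100, rrdy_every=2))
```

Once credits come back more slowly than the sender wants to transmit, the initial credit pool drains and the port then spends roughly half of every interval at 0 Tx credits; in the fabric, that backlog spreads back across the ISL as in steps 7 to 10.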


Overutilization - Example
Multiple SCSI reads cause congestion without withholding R_RDYs
[Figure: a server attached to switch1 (left) and two storage arrays (targets) attached to switch2 (right); 8Gb device links and a 16Gb FC port-channel between the switches]
1. Server sends multiple closely spaced SCSI reads to different targets
2. Data sent at line rate (8Gbps) by both arrays
3. Data arriving at a combined 16 Gbps
4. Data sent at line rate (8Gbps) by the FC switch
5. Server sending R_RDYs without delay
6. FC switch receiving excess data due to sheer volume
7. Data builds up on ingress due to excessive data being received
8. Left FC switch stops sending R_RDYs to right FC switch
9. Data builds up on ingress due to excessive data being received
10. Right FC switch stops sending R_RDYs to arrays

Not strictly “slow drain” but the effects are exactly the same!
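The arithmetic behind this example is simple: two arrays each sending at 8 Gbps offer 16 Gbps toward a single 8 Gbps server link. A minimal sketch, using nominal link rates (an assumption; real FC payload rates are somewhat lower) and a hypothetical buffer size, computes the oversubscription ratio and how quickly that excess would consume a given amount of buffering.

```python
from typing import Sequence, Tuple


def egress_oversubscription(ingress_gbps: Sequence[float], egress_gbps: float) -> Tuple[float, float, float]:
    """Return (offered load in Gbps, excess in Gbps, offered/egress ratio)."""
    offered = float(sum(ingress_gbps))
    excess = max(0.0, offered - egress_gbps)
    return offered, excess, offered / egress_gbps


def ms_to_fill(buffer_bytes: int, excess_gbps: float) -> float:
    """Milliseconds until `buffer_bytes` of buffering is consumed at the excess rate."""
    if excess_gbps <= 0:
        return float("inf")  # no excess: the buffer never fills
    bytes_per_ms = excess_gbps * 1e9 / 8 / 1000
    return buffer_bytes / bytes_per_ms


if __name__ == "__main__":
    offered, excess, ratio = egress_oversubscription([8, 8], 8)
    print(f"offered={offered} Gbps, excess={excess} Gbps, ratio={ratio}:1")
    # Hypothetical 1 MB of buffering in the path:
    print(f"{ms_to_fill(1_000_000, excess):.1f} ms to fill")
```

The point of the second function is that even large buffers only absorb a 2:1 burst for milliseconds; after that, the switch has no choice but to stop returning R_RDYs upstream.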
SAN Congestion – Alerting / Prevention
Alerting - Port-monitor
• Port-monitor is the primary alerting mechanism
• Port-monitor runs at periodic “poll-intervals”
• At each poll-interval it checks each configured counter
• If a counter is >= its configured “rising-threshold”, an SNMP alert/syslog is generated
• While the counter stays above the “falling-threshold”, no further SNMP alerts or syslogs are generated
• When the counter is <= its configured “falling-threshold”, an SNMP alert/syslog is generated
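The rising/falling behavior is a hysteresis loop. The sketch below replays a series of per-poll values against the two thresholds. `port_monitor_alerts` is a hypothetical helper, and it assumes, as in RMON, that a falling alert is only raised after a preceding rising alert.

```python
from typing import Iterable, List, Tuple


def port_monitor_alerts(samples: Iterable[int], rising: int, falling: int) -> List[Tuple[int, str]]:
    """Return (poll_index, "rising" | "falling") for each alert that would be raised."""
    alerts: List[Tuple[int, str]] = []
    in_alarm = False  # True after a rising alert until the falling threshold is reached
    for i, value in enumerate(samples):
        if not in_alarm and value >= rising:
            alerts.append((i, "rising"))  # SNMP alert / syslog
            in_alarm = True
        elif in_alarm and value <= falling:
            alerts.append((i, "falling"))  # SNMP alert / syslog: condition cleared
            in_alarm = False
        # Values between the thresholds generate nothing new.
    return alerts


if __name__ == "__main__":
    print(port_monitor_alerts([0, 50, 120, 150, 90, 10, 130], rising=100, falling=20))
```

The values 150 and 90 after the first rising alert produce no further alerts, which is what keeps port-monitor from flooding the SNMP manager while a port stays congested.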

Slow Drain Alerting and Prevention
Alerting – Port-monitor - Counters
• counter <name> poll-interval <interval> {delta | absolute} rising-threshold <rthresh> event <id> falling-threshold <fthresh> event <id> [warning-threshold <wthresh>] [portguard {errordisable | flap | cong-isolate}]
• poll-interval: how often the counter is checked, in seconds
• delta: compare the current value with the value at the previous poll interval
• absolute: match the actual value
• rising-threshold: the value (with delta, the increase during this poll interval) that triggers an alert
• event: the severity of the alert (info, warning, error, etc.)
• warning-threshold: optional; a value lower than rising-threshold that issues a warning syslog message (no alert)
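To make the delta/absolute distinction and the optional warning-threshold concrete, here is a minimal sketch. The function names are hypothetical, readings are assumed to be cumulative counter values taken once per poll-interval, and the warning rule (value at or above warning-threshold but below rising-threshold) is an interpretation of the description above.

```python
from typing import List, Optional, Sequence


def poll_values(readings: Sequence[int], mode: str = "delta") -> List[int]:
    """Convert raw counter readings (one per poll) into the values compared to thresholds.

    delta: change since the previous poll, so there is one fewer value than readings.
    absolute: the reading itself.
    """
    if mode == "absolute":
        return list(readings)
    if mode == "delta":
        return [cur - prev for prev, cur in zip(readings, readings[1:])]
    raise ValueError("mode must be 'delta' or 'absolute'")


def classify(value: int, rising: int, warning: Optional[int] = None) -> str:
    """Outcome for a single poll value (ignores the falling-threshold hysteresis)."""
    if value >= rising:
        return "alert"  # SNMP alert / syslog with the configured event severity
    if warning is not None and value >= warning:
        return "warning-syslog"  # syslog only, no alert
    return "ok"
```

For example, cumulative readings of 100, 150, 150 and 400 give deltas of 50, 0 and 250; with a rising-threshold of 200 only the last poll alerts, whereas in absolute mode every reading above 200 would.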
Slow Drain Alerting and Prevention
Alerting – Port-monitor - Counters
• falling-threshold: the value (with delta, the increase during this poll interval) at or below which the alert is reset
• portguard: optional; the action to take when the rising-threshold is reached
  • errordisable: place the port in error-disabled state; requires a manual shut/no shut to reactivate
  • flap: shut/no shut the port
  • cong-isolate: for congestion isolation; valid only on these four counters:
    • txwait
    • tx-credit-not-available
    • credit-loss-reco
    • tx-slowport-oper-delay
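The restriction on cong-isolate is easy to check before a policy is pushed. The sketch below uses hypothetical names and encodes only the rules listed above; treating errordisable and flap as valid on every counter is an assumption, since the slide does not restrict them.

```python
CONG_ISOLATE_COUNTERS = frozenset(
    {"txwait", "tx-credit-not-available", "credit-loss-reco", "tx-slowport-oper-delay"}
)
PORTGUARD_ACTIONS = frozenset({"errordisable", "flap", "cong-isolate"})


def portguard_allowed(counter: str, action: str) -> bool:
    """True if `action` is a known portguard action and is valid for `counter`."""
    if action not in PORTGUARD_ACTIONS:
        return False
    if action == "cong-isolate":
        return counter.lower() in CONG_ISOLATE_COUNTERS
    return True  # assumption: errordisable and flap are not counter-restricted
```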

Slow Drain Alerting and Prevention
Alerting - Port-monitor – RMON event severities
• Event indicates the severity in the alert
• 1 – Fatal
• 2 – Critical
• 3 – Error
• 4 – Warning
• 5 – Informational

Suggestions:
2 – Link failure events
3 – Packet loss events
4 – Delay type events

    mds9513(config-port-monitor)# show rmon events
    Event 1 is active, owned by PMON@FATAL
    Description is FATAL(1)
    Event firing causes log and trap to community public, last fired never
    Event 2 is active, owned by PMON@CRITICAL
    Description is CRITICAL(2)
    Event firing causes log and trap to community public, last fired never
    Event 3 is active, owned by PMON@ERROR
    Description is ERROR(3)
    Event firing causes log and trap to community public, last fired never
    Event 4 is active, owned by PMON@WARNING
    Description is WARNING(4)
    Event firing causes log and trap to community public, last fired 2014/02/21-17:13:11
    Event 5 is active, owned by PMON@INFO
    Description is INFORMATION(5)
    Event firing causes log and trap to community public, last fired 2014/03/08-08:25:19
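Because a lower event number means a higher severity, it can help to encode the scale once when post-processing port-monitor alerts. A sketch using Python's `IntEnum`; the class, dictionary and function names are hypothetical, and the suggested mapping is the one given on this slide.

```python
from enum import IntEnum
from typing import Iterable


class PmonEvent(IntEnum):
    """RMON event IDs used by port-monitor; a lower value is more severe."""

    FATAL = 1
    CRITICAL = 2
    ERROR = 3
    WARNING = 4
    INFORMATIONAL = 5


# Suggested event per category of counter, as listed above.
SUGGESTED_EVENT = {
    "link-failure": PmonEvent.CRITICAL,
    "packet-loss": PmonEvent.ERROR,
    "delay": PmonEvent.WARNING,
}


def most_severe(events: Iterable[PmonEvent]) -> PmonEvent:
    """Pick the most severe event (the smallest number)."""
    return min(events)
```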

Slow Drain Alerting and Prevention
Alerting – Port-monitor – Congestion / Slow Drain Counters
Counter name              Slow drain level   Event   Description
credit-loss-reco          3                  2       Credit loss recovery counter (1-1.5 s at 0 Tx credits)
lr-rx                     3                  2       Number of link resets received by the FC port
lr-tx                     3                  2       Number of link resets transmitted by the FC port
timeout-discards          2                  3       Timeout discards counter
tx-discards               2                  3       Tx discards counter (all reasons)
tx-credit-not-available   1                  4       Credit not available counter, in 10% (100 ms) increments
tx-slowport-oper-delay    1                  4       Slowport operational delay (9500 Gen 4 and 16G)
txwait                    1                  4       Percentage of time at 0 Tx credits with packets queued
tx-datarate               1                  4       Tx data rate as a percentage of link speed
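Several counters in this table are expressed as percentages, and tx-credit-not-available moves in 100 ms (10%) steps. The sketch below shows the arithmetic. The function names are hypothetical, link speed is taken as nominal Gbps (an assumption), and reading each tx-credit-not-available increment as 100 ms at 0 Tx credits follows the table's description rather than a documented formula.

```python
def txwait_percent(zero_credit_s: float, poll_interval_s: float) -> float:
    """Share of the poll interval spent at 0 Tx credits with frames queued."""
    return 100.0 * zero_credit_s / poll_interval_s


def credit_not_available_percent(increments: int, poll_interval_s: float) -> float:
    """Each increment is read as 100 ms (0.1 s) spent at 0 Tx credits."""
    return 100.0 * increments * 0.1 / poll_interval_s


def tx_datarate_percent(tx_bytes: int, poll_interval_s: float, link_gbps: float) -> float:
    """Transmitted bits as a percentage of nominal link capacity over the interval."""
    return 100.0 * tx_bytes * 8 / (poll_interval_s * link_gbps * 1e9)
```

For instance, 500,000,000 bytes sent in one second on an 8 Gbps port is 50% of nominal capacity, and 5 tx-credit-not-available increments in a 1-second poll corresponds to 50% of that interval at zero credits.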

Slow Drain Alerting and Prevention
Alerting – Port-monitor – Link Error Counters
Counter name    Slow drain level   Event   Description
link-loss       NA                 2       Link loss counter
sync-loss       NA                 2       Sync loss counter
signal-loss     NA                 2       Signal loss counter
invalid-words   NA                 3       Invalid words counter
invalid-crc     NA                 3       Frames received with invalid CRC counter

Note: link-loss, sync-loss and signal-loss are very similar; link-loss alone is probably sufficient.
