BRKRST 3363

Routed Fast Convergence
Denise “Fish” Fishburne

Customer Proof of Concept Team Lead
CCIE #2639, CCDE 2009:0014

BRKRST-3363
Spoiler Alert!!
Spoiler Alert!!
Designing for fast convergence is more than tuning a few timers
 Think about network convergence, not just routing protocol convergence
 Look at layer 1 and layer 2 for failure detection properties and physical
topology (shared-risk link groups)
 Failure detection is key to fast convergence
 OSPF – check your timers
 EIGRP – design feasible successors and bound your query domain
 When designing for fast network convergence – the business needs are the
driving factor
Agenda
• Thinking About Fast Convergence
• Routed Convergence
• Failure Detection
• OSPF/ISIS
• EIGRP
• BGP
• Additional Routed Convergence
• NSF/NSR
Agenda
• OSPF/ISIS
• EIGRP
• BGP
• NSF/NSR
Fast Convergence Mindset
• How Fast?
• 200ms (or less)
• 50ms – SONET APS
• Do I Need It?
• Complexity vs. Return
• Business Drivers
• Risks
• More than timers

• Processes
• Monitoring
• Applications
• Everything Matters!
• Not the same thing, but faster
• Not just about routing protocols
• Not just about failure recovery
• Not just about one node

Measuring Convergence
Convergence =
Failure Detection + Event Propagation + Routing Process + FIB Update
Neighbor Down Tell Neighbors RIB + CEF + Hardware

Measuring Fast Convergence
A D
C
A D
C
• What happened?
B
! !
A D
C
• What happened?
• Event Propagation
• Spread the word
B
! !
My Link to D is
A down! D
C
My Link to B is down!
• Failure Detection • Routing Process

• What happened? • Now where do we go?
• Spread the word
A
??? ??? D
C

• Spread the word
Reach D via C
A No change D
C
• Event Propagation • FIB Update

• Spread the word • Make it so
A D
C
• Event Propagation • FIB Update

• Spread the word • Make it so
A D
C
Network Convergence Overview
• Network convergence is the time needed for traffic to be rerouted to the
alternative or more optimal path after the network event
• Network convergence requires all affected routers to process the event and
update the appropriate data structures used for forwarding
• Network convergence is the time required to:

• Detect the event
• Propagate the event
• Process the event
• Update the routing table/FIB
The Culture of Availability
Measuring
What’s FastLevel?
Your Availability Convergence
Availability DPM Downtime Per Year (24x365)

99.000% 10000 3 Days 15 Hours 36 Minutes
99.500% 5000 1 Day 19 Hours 48 Minutes
99.900% 1000 8 Hours 46 Minutes
99.950% 500 4 Hours 23 Minutes
99.990% 100 53 Minutes
99.999% 10 5 Minutes
99.9999% 1 30 Seconds
What’s Your Availability Level?
Reactive ~99%
 Few if any identified processes (except maybe to fix problems as reported by users)
 Significant number of Single Points of Failures
 Low tool utilization
 Low level of consistency (HW, SW, config, design)
 No quality improvement processes
Proactive ~99.9%
 Good change management processes including what-if analysis and change validation
 Low number of Single Points of Failures
 Fault and configuration management tools
 Improved consistency (HW, SW, Config, design)
 Typically no quality improvement process
Predictive ~99.99+%
 Consistent processes for fault, configuration, performance and security

 No Single Points of Failures except at edge of network
 Fault, configuration, performance and workflow process tools
 Excellent consistency (HW, SW, config, design)
Designing for Fast Convergence
• Designing for FC is more than tuning a few timers
• Designers need to look at all network layers
Layer 1 and Layer 2 for failure detection properties and physical topology (shared-risk
link groups)
Layer 3 protocol behaviour, interactions between different protocols
Layer 4-7 for application requirements and behaviour
Only 3 numbers are interesting in the context of network convergence
• 1) What is the longest the network could take to converge?
• 2) What is the average amount of time the network could take to converge?
• 3) How long before applications running on the network lose their state?
If you know these three numbers you can tell…

– 1) How well the network is going to perform
– 2) Whether or not the network is meeting application requirements
– 3) Whether or not to look for some way to make the network faster (or slower!)
The Mistake in Fast Convergence
• Fast Convergence is doing the same things but faster
• Fast Convergence is about getting the routing protocol quicker to converge
The mistake: we should not have been thinking about routing protocol
convergence, but of network convergence
• What can the network do to minimize loss in the event of a failure?
• What can we do outside the routing protocol?

Routed Convergence
Agenda
• OSPF/ISIS
• EIGRP
• BGP
• NSF/NSR
Failure Detection
Failure Detection
Detecting Link Failure
Link Failure -> Interface Down, Easy?

Hardware Software
Port Firmware
PHY CPU IOS
ASIC
Hardware Dependent
• Polling vs Interrupt
• 6748-GE-TX: 20ms/port * 48 ports = 960ms (polled)
• Nexus 7k, ASR9k, 6708-10GE/ES/ES+: <10ms (interrupt)
Failure Detection
• Link Failure -> Interface Down, Easy?
Hardware Software
Port Firmware
PHY CPU IOS
ASIC
• Debounce Timer
• Throttles down notification
• Switches only
Failure Detection
• Link Failure -> Interface Down, Easy?
Hardware Software
Port Firmware
PHY CPU IOS
ASIC
• Debounce Timer • Carrier Delay

• Throttles down notification • Throttles up + down
• Switches only • Routers only
Debounce Timer Carrier Delay
• Not Always Configurable • Generally Configurable
• Platform/Linecard/Media Dependent • Software Dependent
• 7600 • IOS/IOS-XE
• 10ms on Fiber (10Gig) • 2 Seconds
• 300ms Copper • NX-OS
• NX-OS • 100ms (SVI Only)
• 100ms • XR/ASR9k
• ASR9k • 0 ms
• 0ms
Recommendation: 0 down, 2sec Up
Recommendation: Leave unchanged
7600(config)# interface ...
7600(config)# interface ... 7600(config-if)# carrier-delay msec 0
7600(config-if)# link debounce time ... 7600(config-if)# carrier-delay up 2
Failure Detection at Layer 3
In some environments, some or all failures can be directly tied to the physical
One physical cable
Layer 3 Routing Adjacency

Failure Detection at Layer 3
In some environments, some or all failures require checks at Layer 3 (i.e. IP)
L2 bridged
network
DWDM/X without LoS

propagation
Tunnels (GRE,
IPsec, etc.)
Layer 3 Routing Adjacency

Event
FailureDetection
DetectionatatLayer
Layer33: Fast Hellos
• All IGPs (EIGRP, OSPF and ISIS) use HELLOs to maintain adjacencies and to
check neighbour reachability
• Hello/Hold timers can be tuned down (“Fast Hellos”), this may not be ideal --
•This does not scale well to larger number of neighbors
•Not a robust solution, high CPU load can cause false-positives
•Having said this: Works reasonably well in small & controlled environments, for example
Campus networks
 We need another solution: Use BFD!
BFD (Bi-directional Forwarding Detection)
BFD
L2 bridged
network
• In normal scenarios if a failure occurred in the L2 bridged network, the layer 3

routers would have to rely on their IGP/BGP timers to detect the failures
• With BFD, failure can be detected in less than a second
Fast Hellos vs BFD
Fast Hellos BFD
• Even Faster
• Normal Hellos…but fast!
• 50ms x 3 = 150ms detection
• ~1 second detection
• Interrupt Driven (like CEF)
• Process Driven
• 1 Hello to Rule Them All
• 1 Hello/Protocol
• PIM, LDP, BGP, OSPF • Hardware Offload Possible
• Nexus 7k, ASR 1k/9k, me3600-CX,
• Handled by Central CPU 7600 ES+
BFD Configuration
IOS-XE(config)# interface…
IOS-XE(config-if)# bfd interval 50 min_rx 50 multiplier 3
IOS-XE(config)# router ospf 1
IOS-XE(config-router)# bfd all-interfaces
RP/0/RSP0/CPU0:XR# configure
RP/0/RSP0/CPU0:XR(config)# router ospf 0
RP/0/RSP0/CPU0:XR(config-ospf)# area 0
RP/0/RSP0/CPU0:XR(config-ospf)# interface …
RP/0/RSP0/CPU0:XR(config-ospf-ar-if)# bfd fast-detect
NX-OS(config)# interface…
NX-OS(config-if)# bfd interval 50 min_rx 50 multiplier 3
NX-OS(config)# router ospf 1
NX-OS(config-router)# bfd

• BFD (150 ms)
• FIB Update
• Event Propagation • Make it so
• Spread the word
OSPF
SPF and LSA Generation Throttling
Throttling is the general process of slowing down responses to the frequently
oscillating events such as link flaps.
The general idea is to reduce resource wastage in unstable situations and wait till the
situations calm down…….The general idea is as follows.
When an event occurs, e.g. a link goes down or new LSA arrives, do not respond to
it immediately, e.g. by generating an LSA or running SPF, but wait some time, hoping
to accumulate more similar events, e.g. waiting for the link to go back up, or more
LSAs arriving.
This could potentially save a lot of resources, by reducing the number of SPF runs or
amount of LSAs flooded.
The question is – how long should we hold or throttle the responses?
Petr Lapukhov
http://blog.ine.com/2009/12/31/tuning-ospf-performance/
OSPF Convergence Times
• Convergence =
Failure Detection + Event Propagation + SPF + FIB Update
Neighbor Down LSA generation RIB + CEF + Hardware
Neighbor Down: Dead Time vs. BFD

Event Propagation: LSA generation throttle vs. LSA generation tuning
Event Processing: SPF throttle vs. SPF tuning
LSAs
• Initial LSA Generation Delay
•OSPF_LSA_DELAY_INTERVAL
•Only Router and Network LSA Generation Delayed
• Recurring LSA Origination Delay

•MinLSInterval
•The minimum time between distinct originations of any particular LSA.
 Fast LSA Generation

 Repeated events increase regeneration delay
 Configuration:
timers throttle lsa all <lsa-start> <lsa-hold> <lsa-max>
SPF
• SPF-DELAY and SPF-HOLDTIME protect the router as the cost of

convergence time
• Configuration:
•timers throttle spf <spf-start> <spf-hold> <spf-max>
OSPF:
IOS Default
OSPF Timers
Defaults
IOS
brisbane#sh ip ospf 100
Routing Process "ospf 100" with ID 30.0.0.13
Initial SPF schedule delay 5000 msecs Initial SPF delay (5 secs)
Minimum hold time between two consecutive SPFs 10000 msecs (10 secs)
Maximum wait time between two consecutive SPFs 10000 msecs (10 secs)
Incremental-SPF disabled
Minimum LSA interval 5 secs Min LSA interval
IOS XE OSPF
OSPF: Defaults
Default Timers
IOS XE
darwin#sh ip ospf 100
.
.
Initial SPF schedule delay 5000 msecs Initial SPF delay (5 secs)
Minimum hold time between two consecutive SPFs 10000 msecs (10 secs)
Incremental-SPF disabled
Minimum LSA interval 5 secs Min LSA interval
IOS XR OSPF
OSPF: Defaults
Default Timers
IOS XR
RP/0/5/CPU0:rangers#sh ospf

Initial SPF schedule delay 50 msecs Initial SPF delay (5/100ths sec)
Minimum hold time between two consecutive SPFs 200 msecs (2/10ths sec)
Initial LSA throttle delay 50 msecs Initial LSA delay (5/100ths sec)
Minimum hold time for LSA throttle 200 msecs (2/10ths sec)
Maximum wait time for LSA throttle 5000 msecs (5 secs)
IOS N7KDefault
OSPF: OSPF Defaults
Timers
N7K
N7KC1# sh ip ospf
Routing Process 100 with ID 10.1.1.1 VRF default
...
…
SPF throttling delay time of 200.000 msecs, Initial SPF delay (2/10ths sec)
SPF throttling hold time of 1000.000 msecs, (1 sec)
SPF throttling maximum wait time of 5000.000 msecs (5 secs)
LSA throttling start time of 0.000 msecs, Initial LSA delay (0 secs)
LSA throttling hold interval of 5000.000 msecs, (5 secs)
LSA throttling maximum wait time of 5000.000 msecs (5 secs)
..
..
Improving OSPF Event Convergence
Wild Side Nice and Easy
• Start: 0ms • Start: 5ms
• Hold: 20ms • Hold: 40ms
• Max: 5000ms • Max: 10000ms
General theory for timer tuning

React immediately the first time, then wait significant
periods of time for subsequent events
EIGRP
Event Propagation in EIGRP
EIGRP A
• The Good Query
• Immediate event notification
• The Bad
• Query Domain Size B C
EIGRP
Query
EIGRP
Query
D
EIGRP
Query
Event Propagation in EIGRP
EIGRP A
• The Good Query
• Immediate event notification
• The Bad
• Query Domain Size B C
EIGRP
Query
EIGRP
Query
D
EIGRP
Query
Improving EIGRP Event Propagation
EIGRP A
• Reduce Query Domains Query
• Summary
• Stub
• Filters
Summary
Boundary
B C
D
EIGRP A
• Reduce Query Domains Query
• Summary EIGRP
Reply
• Stub
• Filters
Summary
Boundary
B C
D
EIGRP Summary Configuration
IOS-XE(config)# interface…
IOS-XE(config-if)# ip summary address eigrp <AS_NUM> 192.168.0.0 255.255.0.0
!
IOS-XE(config)#router eigrp <AS_NUM>
IOS-XE(config-router)#eigrp stub
IOS-XE(config-router)#distribute-list 7 out <Interface>
IOS-XE(config)# access-list 7 permit 172.16.1.0 0.0.0.255
RP/0/RSP0/CPU0:XR(config)# router eigrp <AS_NUM>
RP/0/RSP0/CPU0:XR(config-eigrp)# address-family ipv4
RP/0/RSP0/CPU0:XR(config-eigrp-af)# stub
RP/0/RSP0/CPU0:XR(config-eigrp-af)# route-policy summary out
RP/0/RSP0/CPU0:XR(config-eigrp-af)# <interface>
RP/0/RSP0/CPU0:XR(config-eigrp-af-if)# summary-address 192.168.0.0/24
NX-OS(config)# interface…
NX-OS(config-if)# ip summary address eigrp <AS_NUM> 192.168.0.0 255.255.0.0
!
NX-OS(config)#router eigrp <AS_NUM>
NX-OS(config-router)#eigrp stub
NX-OS(config-router)#distribute-list 7 out <Interface>
NX-OS(config)# access-list 7 permit 172.16.1.0 0.0.0.255
• Reduce Query Domains A

• Summary
<10s ms
• Stub
• Filters
B C
• Feasible Successors
• Don’t even ask!
D
• Reduce Query Domains A

• Summary
<10s ms
• Stub
• Filters
B C
• Feasible Successors
• Don’t even ask!
~0 ms
• No Query/Reply
D
EIGRP
Feasible Successor
• Whether the next best path is considered loop free by EIGRP (a feasible
successor) or not has a large impact on convergence times.
• Don’t just consider the best path from every point in your network, but also
the next best path.
• Determine how best to set up your path metrics to improve convergence
performance.
• Always use the delay metric to engineer your routing, never the bandwidth
metric!
EIGRP Feasible Successors
• EIGRP selects Successor and Feasible Successor
• Successor is the best route
• Feasible Successor is 2nd best route
• Must be mathematically loop-free (meets feasibility condition)
• Feasible Successor acts as a “backup route”
• Kept in topology table (not routing table)
• Up to 6 Feasible Successors
• Built into the protocol, nothing to enable
Delay 10
172.16.2.0/24 EIGRP 10 B
Delay 15
Metric based on bandwidth and delay

RouterB# show ip route 172.16.2.0
Routing entry for 172.16.2.0/24
Known via "eigrp 10", distance 90, metric 285440, type internal
Routing Descriptor Blocks:
* 192.168.200.1, from 192.168.200.1, 00:34:19 ago, via Eth0/1
Route metric is 285440, traffic share count is 1
172.16.2.0/24
B
RouterB#show ip eigrp topology
P 172.16.2.0/24, 1 successors, FD is 285440
via 192.168.200.1 (285440/281600), Ethernet0/1
via 172.16.1.1 (307200/281600), Ethernet0/0
Feasible Successor reported distance (281600)

is less than Successor feasible distance (285440)
 Feasibility Condition met

 Instant convergence after Successor loss
172.16.2.0/24
B
BGP
BGP Fast Convergence Primer
A
• BGP != IGP OSPF OSPF
• Different Goals
• Lots of Data…. iBGP
• ….means lots of CPU B C
• ….means lots of memory
• ….means lots of packets
• BGP generally relies on IGP OSPF OSPF
• Little Events vs. Big Events D

• Route Flap vs. clear ip bgp *
Think about data plane over control plane

BGP Failure Detection
• Keepalives
• 60/180s default
• Don’t tune (at least not aggressively)
• BFD
• neighbor <> fall-over bfd
• Interface Tracking
• Notifies BGP if interface/route down
• Enabled by default
BGP
• BGP and IGP Convergence tuning have a different focus
•IGPConvergence - Rebuild the topology quickly following an event
•BGP Convergence - Transfer large amounts of prefix information very quickly
• The magnitude of time involved is different

•IGP- Sub-Second
•BGP - Seconds to Minutes
• Fast IGP Convergence plays a role in maintaining availability for BGP

prefixes
Often topological changes can result in no BGP changes, the IGP updates the
next-hop information for BGP prefixes
BGP
Initial Convergence
Initial convergence is limited by

the amount of work that needs to be done and
the router/network’s ability to do this fast and efficiently
BGP
Initial Convergence
Initial convergence is limited by the amount of work that needs to be done and the
router/network’s ability to do this fast and efficiently
•Thenumber of packets required to transfer the entire BGP database
• The number of routes
• The number of peers
• The ability of BGP to pack routes into a small number of packets
• The number of peer specific policies
•TCP transport issues
• How often does TCP go into slow start?
• How much can TCP put into one packet?
•Router Specific
• Horsepower of your CPU, Code version, Outbound Interface Speed
BGP Failure Detection – Configuring BFD
• BFD
XE-NX(config)# router bgp 65535
XE-NX(config-router)# neighbor <> fall-over bfd
RP/0/RSP0/CPU0:XR(config)# router bgp 65535
RP/0/RSP0/CPU0:XR(config-bgp)# bfd mimimum-interval 50
RP/0/RSP0/CPU0:XR(config-bgp)# bfd multipler 3
RP/0/RSP0/CPU0:XR(config-bgp)# neighbor <>
RP/0/RSP0/CPU0:XR(config-bgp-nbr)# bfd fast-detect
BGP Event Propagation
• MTU L3 Source
• Bigger packets L3 Destination
TTL
• BGP Based On TCP
• MSS Source Port
• Maximum amount of TCP data Destination Port
• Window Size Flags MTU
• Local TCP buffer Window
• ACKs reduce window as it fills
• Update Groups BGP Routes MSS

• Single policy update per group
• More groups = more work
BGP Event Propagation
IOS-XE(config)# ip tcp mss <>

IOS-XE(config)# ip tcp window-size <>
IOS-XE(config)# ip tcp selective-ack
RP/0/RSP0/CPU0:XR(config)# tcp selective-ack

BGP Routing Update
• BGP Scanner
• Old and Busted
• The janitor of BGP
• Runs every 60 seconds
• Next Hop Tracking

• New Hotness
• Event driven (3-5 sec delay)
• IGP metric or path change
XE-NX(config)# router bgp 65535

XE-NX(config-router)# bgp nexthop trigger-delay <>
RP/0/RSP0/CPU0:XR(config-bgp)# address-family ipv4 unicast
RP/0/RSP0/CPU0:XR(config-bgp-af)# nexthop trigger-delay critical <> non-critical <>
BGP Routing Update – PIC Core
• Flat RIB = slow convergence
10.1.1.0/24 192.168.1.1
10.1.2.0/24 192.168.1.1
10.1.3.0/24 192.168.1.1
• Before PIC
• Update per route
• Convergence dependent on
BGP RIB size
• Instead of flat FIB, Hierarchical
10.1.1.0/24 192.168.1.1
10.1.2.0/24 192.168.1.1
10.1.3.0/24 192.168.1.1
• Instead of flat FIB, Hierarchical
10.1.1.0/24 192.168.1.1
10.1.2.0/24 Next Hop 1 192.168.1.1
10.1.3.0/24 192.168.1.1
• Single change updates multiple entries
• Convergence time independent from prefix
count
7600(config)# cef table output-chain build favor convergence-speed
BGP
Minimum Route Advertisement Interval (MRAI)
“…determines the minimum amount of time that must elapse

between an advertisement and/or withdrawal of routes to a
particular destination by a BGP speaker to a peer. This rate
limiting procedure applies on a per-destination basis, although
the value of MinRouteAdvertisementIntervalTimer is set on a
per BGP peer basis.”
RFC 4271
Section 9.2.1.1
BGP
• MRAI timers are maintained per peer
•iBGP – 0 seconds by default
•eBGP – 30 seconds by default
•neighbor x.x.x.x advertisement-interval <0-600>
• Pros
•Promotes stability by batching route changes
•Improves update packing in some situations
• Cons
•May drastically slow convergence
•One flapping prefix can slow convergence for other prefixes
BGP
• BGP is not a link state protocol, but instead is path vector based
• May take several “rounds/cycles” of exchanging updates & withdraws for the
network to converge
• MRAI must expire between each round!
• The more fully meshed the network and the more tiers of Autonomous
Systems, the more rounds required for convergence
• Think about
•The many tiers of Autonomous Systems that are in the Internet
•The degree to which peering can be fully meshed
Additional Routed
Convergence
Forwarding Table Overview (CEF)
OSPF
Adjacency Routing
Table Table EIGRP
FIB Hardware CEF

(Software CEF) (TCAM)
Software Hardware
OSPF
Adjacency Routing
Table Table EIGRP
FIB Hardware CEF

Software Hardware
OSPF
Adjacency Routing
Table Table EIGRP
FIB Hardware CEF

Software Hardware
Software CEF Updates
• Controlled by CPU + OS Dual
Quad Core
• Supervisors Matter 2 Quad Core 2.66 GHz
Core 2.27 GHz
2.13 GHz
• RIB Size Matters Single 2 Single
Core Core
• Summarize and Filter Single
1.5 GHz 1.5 GHz
Core
• XE: OSPF prefix suppression 1.5 GHz
• XR/XE: ISIS advertise passive-only
• Process quantum
• XE only
• Prefix Prioritization RSP720 RP1 RSP2 Sup2e RSP440 RP2
• Install /32s first
• OSPF Prefix Suppression (XE Only)
XE(config)#router ospf 1
XE(config-router)#prefix-suppression
• ISIS advertise-passive-only (XE/XR)
XE(config)#router isis CLUS
XE(config-router)#advertise-passive-only
RP/0/RSP0/CPU0:XR(config)# router isis CLUS
RP/0/RSP0/CPU0:XR(config-isis)# address-family ipv4 unicast
RP/0/RSP0/CPU0:XR(config-isis-af)# advertise passive-only
• Process Quantum (XE Only)

XE(config)#process-max-time 50
• Prefix Prioritization
XE(config)#interface g0/0
XE(config-interface)#isis tag 7
XE(config)#router isis CLUS
XE(config-router)# ip route priority high tag 7
RP/0/RSP0/CPU0:XR(config)# ipv4 prefix-list critical-priority-prefix
RP/0/RSP0/CPU0:XR(config-ipv4_pfx)# 10 permit 0.0.0.0/0 eq 32
RP/0/RSP0/CPU0:XR(config)# router isis CLUS
RP/0/RSP0/CPU0:XR(config-isis)# address-family ipv4 unicast
RP/0/RSP0/CPU0:XR(config-isis-af)# spf prefix-priority critical critical-priority-prefix
OSPF
Adjacency Routing
Table Table EIGRP
FIB Hardware CEF

Software Hardware
OSPF
Adjacency Routing
Table Table EIGRP
FIB Hardware CEF

Software Hardware
OSPF
Adjacency Routing
Table Table EIGRP
FIB Hardware CEF

Software Hardware
BGP Prefix Independent Convergence (PIC)
• What is it, and why?
30
Connectivity
25
t, Loss Of
20
15 no pic
10 pic
5
0
n 2n 3n 4n 5n 6n
• PIC is the ability to restore forwarding without resorting to per prefix operations.
• Loss Of Connectivity does not increase as my network grows (one problem
less).
And Then There Were LFAs
• Loop Free Alternate: a routing protocol calculates the next-best hop should a
link fail (per-link LFA) or a particular prefix become unreachable (per-prefix
LFA)
• EIGRP’s concept of Feasible Successor is per-prefix LFA
• Recent LFA technologies simply apply FS logic to link-state protocol
LFA in Link-State Protocols
• WWLSPD (What Would Link-State Protocols Do?)
• OSPF and ISIS can apply the same concept as EIGRP’s FS
• Calculate the second-best NH in the event of a failure
• …variants can handle link or node failure
• Cannot work in all topologies

• Same as EIGRP; not all topos have a FS
Non Stop Forwarding and
Graceful Restart
NSF/GR Frequently Used Terms
 SSO allows standby RP
to take immediate control
 NSF continues to forward
packets until route
convergence is complete
Active Standby
 GR (graceful restart) RP RP
reestablishes the routing
information bases without
churning the network Line Card Line Card
NSF/GR NSF/SSO
 Limit restart to be local event, not network Redundant Route Processors
wide
RIB RIB
 Packet forwarding during switchover while
routing is converging on Standby RP
FIB FIB
 Non-stop forwarding of packets while

control plane is reestablished and routing
information is validated
Packet forwarding continues using current
FIB (on all LCs)
forwarding information base (FIB)
 Layer 3 (BGP, OSPF, IS-IS, EIGRP)

recovers routing information from
NSF/SSO Seeks to
neighbors, rebuilds routing information
Preserve Traffic
base (RIB) and updates FIB
Forwarding
NSF/ SSO
Graceful Restart Relationship Building Process
NSF/SSO Capable NSF Aware Peer

“I Can Preserve My During Restart
Forwarding Table 1) I Will Preserve Forwarding Table
During Restart” Agreement 2) I Will Not Declare you Dead
3) I Will Not Inform my Neighbors
I Have Restarted Restart Notification

OK. I Acknowledge.
and Acknowledgement
I Will Stick to My
Agreement
I Will Use Your Knowledge Transfer
Knowledge to This is my knowledge of
Build My the network
Database Updates
NSF/GR OSPF
• OSPF uses an extension to the hello
packets called link local signaling.
Control Data A
Empty Hello + Restart

 The first hello A sends to B has an
empty neighbor list; this tells B that
something is wrong with the neighbor
relationship.
 A sets the restart bit in its hello, which

tells B that A is still forwarding traffic,
and would like to resynchronize its
database. B
Control Data
NSF/GR OSPF
• B moves A into the exchange state,
and uses out of band signaling
(OOB) to resynchronize their Control Data A
databases.
DBD exchange
LSA exchange
• This process is the same as initial
database synchronization, but it uses
set A to
different packet types.
exchange
Control Data B
NSF/GR OSPF
• When A and B have resynchronized
their databases, they place each
other in full state, and run SPF. Control Data A
• After running SPF, the local routing

table is updated, and OSPF notifies
CEF.
• CEF then updates the forwarding
tables, and removes all information
marked as stale.
Control Data B
NSF/GR OSPF
router ospf 100
• Use the nsf command under the nsf A
router ospf configuration mode to ....
enable graceful restart.
• Show ip ospf can be used to verify
graceful restart is operational. router ospf 100
nsf
....
router#sh ip ospf
....
B
Non-Stop Forwarding enabled, last NSF restart 00:02:06
ago (took 44 secs)
router#show ip ospf neighbor detail
Neighbor 3.3.3.3, interface address 170.10.10.3
....
Options is 0x52
LLS Options is 0x1 (LR), last OOB-Resync 00:02:22 ago
NSF/GR BGP
• When the BGP peering session is
brought up, the graceful restart
capability is negotiated. If both peers Control Data
state they are capable of GR, it is A
New TCP Session

GR capability
enabled on the peering session.
 When A restarts, it opens a new TCP
session to B, using the same router Restart; close
ID. old session
 B interprets this as a restart, and

closes the old TCP session.
Control Data B
NSF/GR BGP
• B transmits updates containing its BGP
table (it’s local RIB out). Control Data
A
A
End of RIB Marker

 A goes into read only mode, and does
Updates
not run the bestpath calculations until
its B has finished sending updates.
Read only
mode
 When B has finished sending updates,
it sends an end of RIB marker, which is
an update with an empty withdrawn
NLRI TLV.
Control Data B
B
NSF/GR BGP
• When A receives the end of RIB
marker, it runs bestpath, and installs
the best routes in the routing table. Control Data A
• After the local routing table is

updated, BGP notifies CEF.
• CEF then updates the forwarding
tables, and removes all information
marked as stale.
Control Data B
NSF/GR BGP
• Use the bgp graceful-restart
command under the router bgp router bgp 65000
bgp graceful-restart A
configuration mode to enable ....
graceful restart.
• Show ip bgp neighbors can be
used to verify graceful restart is router bgp 65501
operational. bgp graceful-restart
....
router#show ip bgp neighbors x.x.x.x

....
Neighbor capabilities: B
....
Graceful Restart Capabilty:advertised and received
Remote Restart timer is 120 seconds
Address families preserved by peer:
IPv4 Unicast, IPv4 Multicast
Non Stop Routing (NSR)
Non Stop Routing (NSR)
 SSO allows standby RP to take

immediate control
 NSF continues to forward packets until State Information
route convergence is complete
 GR (graceful restart)
reestablishes the routing information
bases without churning the network Active Standby
 NSR (Non-Stop Routing) RP RP
Continue forwarding packets
and maintain routing state
Line Card Line Card
Non-Stop Routing (NSR) Operation
ACTIVE RP STANDBY RP
Process routes from peers and update DB Process routes from peers and update DB
1 1 independently
independently
2 Active mirrors bestpath info before advertising to peers
route DB 3 Active sends updates to peers route DB
BGP Active synchronizes the send state BGP
4 information after mapping from the TCP
ACKs
RIB LSD TCP RIB LSD TCP
 Unlike GR, NSR is a self-contained solution to maintain the routing topology across HA events
 TCP connections and the routing protocol sessions are migrated from the active RP to standby RP without
letting the peers knowing about the switchover
 Does not depend on any protocol extensions—relies on forwarding-plane’s NSF capability
 Neighbors/protocol peers and rest of the network do not notice that an OSPF/LDP/BGP process went
through a restart
Minimal LSA/Route information re-flooded during NSR recovery
Overall CPU usage greatly reduced during NSR recovery
Improves reliability of the overall system
Agenda
o Thinking About Fast Convergence
 Reactive Convergence
o Failure Detection
o Event Propagation
o Routing Update
o BGP Convergence
 Forwarding Table Update
• Proactive Convergence
• Closing Remarks
Agenda
o Thinking About Fast Convergence

o Reactive Convergence
 Proactive Convergence
 Loop Free Alternate (IP FRR)
• BGP PIC Edge
• LDP Session Protection
• Closing Remarks
OSPF Loop Free Alternate
• A has a primary (A-C) and

secondary (A-B-C) path to
10.1.1.0/24
B
• Link State allows A to know entire
topology
• A should know that B is an 10.1.1.0/24
A C
alternative path
• Loop Free Alternate (LFA)
OSPF Loop Free Alternate
• OSPF presents a primary and backup to CEF
• Backup calculated from secondary SPF run
RouterA# show ip route 10.1.1.0
Routing Descriptor Blocks:
* 172.16.0.1, from 192.168.255.1, 00:01:57 ago, via Ethernet4/1/0
Repair Path: 192.168.0.2, via Ethernet4/2/0
RouterA#show ip CEF 10.1.1.0

10.1.1.0/24
nexthop 172.16.0.1 Ethernet4/1/0
repair: attached-nexthop 192.168.0.2 Ethernet4/2/0
EIGRP LFA
RouterB#show ip route 172.16.2.0
Known via "eigrp 10", distance 90, metric 1100800, type
internal
* 172.16.1.2, from 172.16.1.2, 00:00:17 ago, via Ethernet0/1
Repair Path: 192.168.1.1, via Ethernet0/0
RouterB#show ip cef 172.16.2.0

172.16.2.0/24
nexthop 172.16.1.2 Ethernet0/1
repair: attached-nexthop 192.168.1.1 Ethernet0/0
LFA Configuration Examples
XE(config)# router ospf 1
XE(config-router)# fast-reroute per-prefix enable
!
XE(config)# router eigrp CLUS
XE(config-router)# address-family ipv4 autonomous-system 10
XE(config-router-af)# topology base
XE(config-router-af-topology)# fast-reroute per-prefix all
RP/0/RSP0/CPU0:XR(config)# router ospf 1

RP/0/RSP0/CPU0:XR(config-ospf)# area 0
RP/0/RSP0/CPU0:XR(config-ospf-ar)# interface G0/5/0/0
RP/0/RSP0/CPU0:XR(config-ospf-ar)# fast-reroute per-link enable
BGP PIC Edge – Link Protection
Swap PE4 label
and redirect!
PE1 PE3
CE CE
1 2
PE2 PE4
BGP PIC Edge
XE(config)# router bgp 65535

XE(config-router)# address-family vpnv4
XE(config-router-af)# bgp advertise-best-external
XE(config-router-af)# bgp additional-paths install
RP/0/RSP0/CPU0:XR(config)# route-policy BACKUP

RP/0/RSP0/CPU0:XR(config-rpl)# setp pat-selection backup 1 install
RP/0/RSP0/CPU0:XR(config-bgp)# address-family vpnv4 unicast
RP/0/RSP0/CPU0:XR(config-bgp-af)# additional-paths selection route-policy BACKUP
Summary
• Designing for FC is more than tuning a few timers
• Designers need to look at all network layers
Layer 1 and Layer 2 for failure detection properties and physical topology (shared-risk
link groups)
Layer 3 protocol behaviour, interactions between different protocols
Layer 4-7 for application requirements and behaviour
Participate in the “My Favorite Speaker” Contest
Promote Your Favorite Speaker and You Could Be a Winner
• Promote your favorite speaker through Twitter and you could win $200 of Cisco
Press products (@CiscoPress)
• Send a tweet and include
• Your favorite speaker’s Twitter handle <@DeniseFishburne>
• Two hashtags: #CLUS #MyFavoriteSpeaker
• You can submit an entry for more than one of your “favorite” speakers
• Don’t forget to follow @CiscoLive and @CiscoPress
• View the official rules at http://bit.ly/CLUSwin
Complete Your Online Session Evaluation
• Give us your feedback to be
entered into a Daily Survey
Drawing. A daily winner
will receive a $750 Amazon
gift card.
• Complete your session surveys
though the Cisco Live mobile
app or your computer on
Cisco Live Connect.
Don’t forget: Cisco Live sessions will be available
for viewing on-demand after the event at
CiscoLive.com/Online
Continue Your Education
• Demos in the Cisco campus
• Walk-in Self-Paced Labs
• Table Topics
• Meet the Engineer 1:1 meetings
• Related sessions
Thank you

BRKRST 3363

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

BRKRST 3363

Uploaded by

Copyright:

Available Formats

Routed Fast Convergence

Denise “Fish” Fishburne

CCIE #2639, CCDE 2009:0014

• More than timers

• Not just about routing protocols

• Not just about failure recovery

• Not just about one node

Failure Detection + Event Propagation + Routing Process + FIB Update

Neighbor Down Tell Neighbors RIB + CEF + Hardware

• Failure Detection • Routing Process

• Failure Detection • Routing Process

• Event Propagation • FIB Update

• Event Propagation • FIB Update

• Network convergence is the time required to:

Availability DPM Downtime Per Year (24x365)

 Consistent processes for fault, configuration, performance and security

If you know these three numbers you can tell…

• Fast Convergence is about getting the routing protocol quicker to converge

• What can the network do to minimize loss in the event of a failure?

• What can we do outside the routing protocol?

Link Failure -> Interface Down, Easy?

• Debounce Timer • Carrier Delay

One physical cable

Layer 3 Routing Adjacency

DWDM/X without LoS

Layer 3 Routing Adjacency

• In normal scenarios if a failure occurred in the L2 bridged network, the layer 3

• Failure Detection • Routing Process

Neighbor Down LSA generation RIB + CEF + Hardware

Neighbor Down: Dead Time vs. BFD

• Recurring LSA Origination Delay

 Fast LSA Generation

• SPF-DELAY and SPF-HOLDTIME protect the router as the cost of

Routing Process "ospf 100" with ID 30.0.0.4

General theory for timer tuning

• Reduce Query Domains A

• Reduce Query Domains A

Metric based on bandwidth and delay

Feasible Successor reported distance (281600)

 Feasibility Condition met

• BGP generally relies on IGP OSPF OSPF

• Little Events vs. Big Events D

Think about data plane over control plane

• The magnitude of time involved is different

• Fast IGP Convergence plays a role in maintaining availability for BGP

Initial convergence is limited by

• Update Groups BGP Routes MSS

IOS-XE(config)# ip tcp mss <>

RP/0/RSP0/CPU0:XR(config)# tcp selective-ack

• Next Hop Tracking

XE-NX(config)# router bgp 65535

10.1.2.0/24 Next Hop 1 192.168.1.1

“…determines the minimum amount of time that must elapse

FIB Hardware CEF

FIB Hardware CEF

FIB Hardware CEF

• Process Quantum (XE Only)

FIB Hardware CEF

FIB Hardware CEF

FIB Hardware CEF

• Cannot work in all topologies

 Non-stop forwarding of packets while