BRKIPM 2001 NSF NSR
v1.1
Routing High Availability NSF & NSR
2010 Cisco and/or its affiliates. All rights reserved. Cisco Confidential BRKIPM-2001
Agenda
Setting the stage: Introduction
Non-Stop Forwarding & Graceful Restart (NSF/GR)
Non-Stop Routing (NSR)
Deployment Considerations and Scenarios
Introduction: High Availability
Availability Definitions
The probability that an item (or network, etc.) is operational, and
functional as needed, at any point in time
Or, the expected or measured fraction of time the defined service,
device or area is operational; annual uptime is the amount (in
days, hrs., min., etc.) the item is operational in a year
[Figure: availability across the user network, the shared network (network provider), and the server network]
Availability Definitions
Network Availability
There is a working network
path between source and
destination (generally bi-
directionally)
Generally involves only the
Network Layer (OSI Layer 3)
Service Availability
The offered service performs
according to the stated SLAs
(packet loss, delay, jitter,
response time, etc.)
Involves all layers
Network vs. Service Availability
Our focus is on Network Availability today
What Is High Availability?
DPM = Defects per Million (Hours of Running Time)
Availability    Downtime Per Year (24x365)      DPM
99.000%         3 Days 15 Hours 36 Minutes      10000
99.500%         1 Day 19 Hours 48 Minutes       5000
99.900%         8 Hours 46 Minutes              1000
99.950%         4 Hours 23 Minutes              500
99.990%         53 Minutes                      100
99.999%         5 Minutes                       10
99.9999%        30 Seconds                      1
High Availability
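The downtime and DPM figures in the table follow directly from the availability percentage. A minimal sketch of that arithmetic, assuming the slide's 24x365 year (8760 hours); the function names are illustrative and not part of the original deck:

```python
HOURS_PER_YEAR = 24 * 365  # the "24x365" year used by the table


def downtime_per_year_hours(availability_percent: float) -> float:
    """Expected unavailable hours in one 24x365 year."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


def dpm(availability_percent: float) -> float:
    """Defects per million hours of running time."""
    return (1 - availability_percent / 100) * 1_000_000


def format_duration(hours: float) -> str:
    """Render a duration in hours as days/hours/minutes/seconds."""
    total_seconds = round(hours * 3600)
    days, rem = divmod(total_seconds, 86400)
    hrs, rem = divmod(rem, 3600)
    mins, secs = divmod(rem, 60)
    return f"{days}d {hrs}h {mins}m {secs}s"


for a in (99.0, 99.5, 99.9, 99.95, 99.99, 99.999, 99.9999):
    print(f"{a}%  {format_duration(downtime_per_year_hours(a))}  DPM={dpm(a):.0f}")
```

The table rounds the smaller values (for example, 99.999% works out to roughly 5.3 minutes per year and is shown as 5 Minutes).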
Most common causes of downtime
Common causes of Enterprise Network Downtime **
Telco/ISP: 35%
Human error: 31%
Power failure: 14%
Hardware failure: 12%
Other: 8%
Mitigating the Exposure: Targeting Downtime
Operational Process: 40%
Network: 20%
Software Application: 40%
Mitigations shown: Embedded Management, Best Practices, System and Network Level Resiliency
What is Routing High Availability?
Routing HA
Set of technologies & features
to enable traffic to continue to
flow through a device during
a fault
Routing HA maintains the
logical network topology while
the faulty device recovers
Routing HA helps to address
failures within the control
plane of a routing device
Routing HA increases the
resiliency of a single system
What is Routing Fast Convergence?
Routing FC
Set of technologies & features
to enable traffic to continue to
flow around a device during a
fault
Routing FC adapts the logical
network topology to avoid the
faulty component
Routing FC aims to address
any component failure within
a routing device
Routing FC increases the
resiliency of the network
Routing Convergence vs. Routing HA
Routing FC
Set of technologies & features
to enable traffic to continue to
flow around a device during a
fault
Routing FC adapts the logical
network topology to avoid the
faulty component
Routing FC aims to address
any component failure within
a routing device
Routing FC increases the
resiliency of the network
Routing HA
Set of technologies & features
to enable traffic to continue to
flow through a device during
a fault
Routing HA maintains the
logical network topology while
the faulty device recovers
Routing HA helps to address
failures within the control
plane of a routing device
Routing HA increases the
resiliency of a single system
Main Routing HA Applications
Route Processor failure
Routing Process failure
(modular OS)
Chassis Failure
Cat6k-VSS
Routing HA to help Planned Downtime
Routing HA technologies can help minimize
customer impact during planned maintenance
Controlled RP failover, for example to swap hardware, or to
upgrade memory on RPs
Routing Protocol patches (IOS-XR)
Clearing BGP Sessions (IOS-XR)
HA technologies are a prerequisite for In-Service
Software Upgrade (ISSU)
Non-Stop-Forwarding (NSF)
Behaviour without NSF
Router A loses its control
plane for some period of time
It will take some time for
Router B to recognize this
failure, and react to it
[Figure: routers A and B, each with a control plane and a data plane]
Behaviour without NSF
During the time that A has
failed, and B has not detected
the failure, B will continue
forwarding traffic through A
Once the control plane resets,
the data plane will reset as
well, and this traffic will be
dropped
NSF reduces or eliminates the
traffic dropped while A's
control plane is down
[Figure: routers A and B, each with a control plane and a data plane; A resets]
Prerequisite 1: Separated Forwarding Plane
[Figure: CPU running IOS with route DRAM and packet DRAM; ASIC/NP (Network Processor); interfaces; interconnect; control-packet and data-packet paths]
Control Plane: RIB (Routing Information Base), a.k.a. the routing table
Data Plane: FIB (Forwarding Information Base)
Separating the control plane from the forwarding plane is
essential for routing HA
Routing HA maintains the forwarding plane while the control
plane restarts/recovers
Prerequisite 1: Separated Forwarding Plane
[Figure: active and standby RPs (control plane, CPU running IOS) and linecards Engine0 622M, Engine3 3G, Engine5 10G, Engine6 20G (data plane)]
Distributed router architectures have this natively
Forwarding information base (FIB) located on Linecards
Cisco 12000
Prerequisite 1: Separated Forwarding Plane
[Figure: chassis with 4, 6, 9, or 13 linecard/Sup slots and Sup720 supervisors (one marked standby), each with SP and RP]
Catalyst 6500
The Cat6500 also has a separated forwarding plane, even though the FIB and
switching matrix are physically located on the RP
FIB is synced between active and standby
Prerequisite 2: Stateful Switch Over (SSO)
Any routing HA requires one important mechanism:
The link and its line protocol need to stay up
If not, all neighbours would re-route around the restarting
node
Can be trivial: Keep the linecard up and laser on,
for example for POS/HDLC
Keeping physical link active is easy with Ethernet
as well, but need to sync ARP/v6ND/adjacency
information
Can be complex: PPP, ATM or FrameRelay require
state to be maintained when failing over the control-
plane, sync needed as well
GR/NSF Fundamentals
If A is NSF capable, the control
plane will not reset the data
plane when it restarts
Instead, the forwarding
information in the data plane is
marked as stale
Any traffic B sends to A will still
be switched based on the last
known forwarding information
This is the Non-Stop
Forwarding behaviour
[Figure: A's control plane restarts with no reset of the data plane; forwarding information is marked as stale]
GR/NSF Fundamentals
While A's control plane is
down, the routing protocol hold
timer on B counts down
A has to come back up and
signal B before B's hold timer
expires, or B will route around
it
When A comes back up, it
signals B that it is still
forwarding traffic, and would
like to resync
This is the first step in
Graceful Restart (GR)
[Figure: B's hold timer counting down: 15 14 13 12 11 10 9 8 7 6]
GR/NSF Fundamentals
The second GR phase deals
with neighbors updating the
restarting router's routing table
This involves new protocol
mechanisms
[Figure: A tells B "I'm restarting"; B replies "Ok, fine, I'll send routes"]
GR/NSF Fundamentals Summary
Key Components of NSF on the restarting
router (NSF/GR capable)
Keeping interfaces/linecards up
Maintaining forwarding state in the data plane
Synchronizing routing information post failover
On the neighbouring router(s) (NSF/GR aware)
Maintain routes while the neighbour restarts
Help the restarting node synchronize its routing table
GR/NSF implementations in the various protocols
generally differ in the way synchronization
works
EIGRP GR/NSF Fundamentals
The signal in EIGRP is an
update with the initialization
and restart (RS) bits set.
A sends its hellos with the
restart bit set until GR is
complete.
B transmits the routing
information it knows to A.
When B is finished sending
information, it sends a special
end of table signal so A knows
the table is complete
[Figure: A to B: Hello + Restart, then Init + Restart; B to A: topology information, then End of table]
EIGRP GR/NSF Fundamentals
When A receives this end of
table marker, it recalculates its
topology table, and updates
the local routing table
When the local routing table is
completely updated, EIGRP
notifies CEF
CEF then updates the
forwarding tables, and
removes all information
marked as stale
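The mark-stale, refresh, and purge sequence described for the forwarding table can be sketched as follows. This is a simplified model under stated assumptions, not the CEF implementation: the `ForwardingTable` class and its method names are hypothetical, and lookups are exact-match rather than longest-prefix match.

```python
from dataclasses import dataclass, field


@dataclass
class ForwardingTable:
    # prefix -> (next_hop, stale_flag)
    entries: dict = field(default_factory=dict)

    def install(self, prefix, next_hop):
        """Install or refresh an entry; a refreshed entry is no longer stale."""
        self.entries[prefix] = (next_hop, False)

    def mark_all_stale(self):
        """Called when the control plane restarts: keep entries, flag them stale."""
        self.entries = {p: (nh, True) for p, (nh, _) in self.entries.items()}

    def lookup(self, prefix):
        """Stale entries are still used for forwarding."""
        entry = self.entries.get(prefix)
        return entry[0] if entry else None

    def purge_stale(self):
        """Called after the routing protocol signals convergence."""
        self.entries = {p: e for p, e in self.entries.items() if not e[1]}


fib = ForwardingTable()
fib.install("10.1.0.0/16", "B")
fib.install("10.2.0.0/16", "B")
fib.mark_all_stale()                  # control plane restart
print(fib.lookup("10.2.0.0/16"))      # still forwards on the last known state
fib.install("10.1.0.0/16", "B")       # relearned from the neighbour
fib.purge_stale()                     # 10.2.0.0/16 was not relearned
print(fib.lookup("10.2.0.0/16"))
```

Entries that the neighbour does not re-advertise disappear only at the purge step, which is why forwarding continues uninterrupted in the meantime.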
EIGRP GR/NSF Configuration
Use the nsf command under
the router eigrp configuration
mode to enable graceful
restart
no configuration required on
helper node
Show ip protocols can be
used to verify graceful restart
is operational
Currently only supported for
IPv4
Restarting Node (A):
router eigrp 100
 nsf
 ....
A#show ip protocols
Routing Protocol is "eigrp 100"
....
Redistributing: eigrp 100
EIGRP NSF-aware route hold timer is 240s
EIGRP NSF enabled
NSF signal timer is 20s
NSF converge timer is
....
Helper Node (B): no configuration required
http://www.cisco.com/en/US/tech/tk365/technologies_white_paper0900aecd8023df74.shtml
http://www.cisco.com/en/US/products/sw/iosswrel/ps1839/products_feature_guide09186a0080160010.html
OSPF NSF Implementations
There are two mechanisms: Cisco-style and IETF-style
(RFC 3623)
Cisco-style is also documented in the informational RFC 4811 and
RFC 4812
The approaches differ in how ...
the restart process is signalled
the restarting node synchronizes the LSA database
the decision to abort the GR process is made
OSPF NSF Cisco Style
OSPF uses an extension to
the hello packets called link
local signaling
The first hello A sends to B
has an empty neighbor list;
this tells B that something is
wrong with the neighbor
relationship
A sets the restart bit in its
hello, which tells B that A is
still forwarding traffic, and
would like to resynchronize its
database
[Figure: A sends B an empty Hello with the restart bit set]
OSPF NSF Cisco Style
B moves A into the exchange
state, and uses out of band
signaling (OOB) to
resynchronize their databases
This process is the same as
initial database
synchronization, but it uses
different packet types
[Figure: B sets A to exchange state; A and B perform a DBD exchange, then an LSA exchange]
OSPF NSF Cisco Style
When A and B have
resynchronized their
databases, they place each
other in full state, and run SPF
After running SPF, the local
routing table is updated, and
OSPF notifies CEF
CEF then updates the
forwarding tables, and
removes all information
marked as stale
OSPF NSF CISCO Configuration
Restarting Node (A):
router ospf 1
 nsf cisco
Helper Node (B):
router ospf 1
B#show ip ospf int
GigabitEthernet0/0 is up, line protocol is up
NSF capable
IS-IS GR/NSF Fundamentals (Cisco-Style)
IS-IS Cisco-Style works
without any GR protocol
extensions
IS-IS constantly syncs the
neighbour adjacency state as
well as LSP header
checkpoints on the standby
Once A restarts, it requests the
full LSPs from its neighbors,
using a CSNP (Complete
Sequence Number Packet)
packet
Neighbour follows regular IS-
IS mechanisms and floods its
complete LSP database
[Figure: A sends a CSNP to B; B sends its LSPs to A]
IS-IS GR/NSF Fundamentals (Cisco)
When A has resynchronized its
database, A runs SPF
After running SPF, the local
routing table is updated, and
IS-IS notifies CEF
CEF then updates the
forwarding tables, and
removes all information
marked as stale
IS-IS GR/NSF Cisco Configuration
Use the nsf cisco command
under the router isis configuration
mode to enable graceful restart
No configuration required on
helper node
Restarting Node (A):
router isis
 nsf cisco
 ....
A#show isis nsf
NSF is ENABLED, mode 'cisco'
Helper Node (B): no configuration required
IS-IS IETF/RFC3847 vs. Cisco
With nsf cisco requiring no
protocol extensions to
synchronize the LSDB,
deploying it is much easier
Cisco nodes configured with
nsf cisco will also signal
support for neighbours using
IETF-style GR
A
B
router isis
nsf cisco
....
B#show clns neighbor detail
System Id Interface SNPA
neighborxx Gi4/3 0005.00fe.3444
NSF capable
router isis
nsf ietf
....
BGP GR/NSF Fundamentals
Graceful restart capability is
negotiated when session comes
up. If both peers state they are
capable of GR, it's enabled on the
peering session, on a per-
address-family (ipv4, ipv6, vpnv4,
etc.) basis
When A restarts, it opens a new
TCP session to B, using the same
router ID
B interprets this as a restart, and
closes the old TCP session
B also considers TCP session going
down as a signal for A restarting
While A restarts, B marks all paths
received from A as stale
[Figure: A and B exchange the GR capability; after restarting, A opens a new TCP session; B treats this as a restart and closes the old session]
r3#show ip bgp 10.20.0.0
BGP routing table entry for 10.20.0.0/16, version 47
Paths: (1 available, best #1, table Default-IP-Routing
Flag: 0x820
Not advertised to any peer
Local, (stale)
10.0.0.2 (metric 21) from 10.0.0.1 (0.0.0.0)
Origin IGP, metric 0, localpref 100, valid, internal, best
BGP GR/NSF Fundamentals
B transmits updates containing
its BGP table (its local RIB
out)
A goes into read-only mode,
and does not run the bestpath
calculation until B has
finished sending updates
When B has finished sending
updates, it sends an End-of-RIB
marker, which is an
UPDATE with no reachable NLRI and
empty withdrawn NLRI
[Figure: B sends updates to A, followed by the End of RIB marker; A stays in read-only mode until then]
BGP GR/NSF Fundamentals
When A receives the end of
RIB marker, it runs bestpath,
and installs the best routes in
the routing table
After the local routing table is
updated, BGP notifies CEF
CEF then updates the
forwarding tables, and
removes all information
marked as stale
BGP GR/NSF Fundamentals
Use the bgp graceful-restart
command under the global router bgp
configuration mode to enable graceful
restart
IOS-XR and recent IOS can
disable it on a per-nbr basis
Needs to be enabled on both ends,
sessions need to be reset in order for
the config to take effect
Show ip bgp neighbors can be
used to verify graceful restart is
operational
A
B
router#show ip bgp neighbors x.x.x.x
....
Neighbor capabilities:
....
Graceful Restart Capability: advertised and received
Remote Restart timer is 120 seconds
Address families preserved by peer:
IPv4 Unicast, IPv4 Multicast
router bgp 65000
bgp graceful-restart
....
router bgp 65501
bgp graceful-restart
....
GR/NSF Summary
All NSF protocols require some form of neighbour
interaction and functionality/configuration on the
adjacent systems
Holding onto the routes while the neighbour restarts
Re-Sending the routing information
Deploying NSF in scaled edge deployments (for
example large hub site or service provider edge)
can be challenging as all neighbors need to be
touched (config, OS upgrade, etc.)
What if we used another approach?
Non-Stop Routing
Non-stop Routing NSR
Idea: Why not sync all
routing protocol state to
the standby RP (or
standby process)?
Restarting RP could pick
up right where the primary
left off
No need to refresh any
information, no need for
the neighbour to know that
anything happened
Easy idea, challenging
implementation
Now we absolutely must
avoid anything that would let the
neighbour notice
[Figure: active and standby RPs kept in sync via SSO above the line cards; forwarding continues, routing adjacencies to neighbours are maintained, no link flap]
The easy NSR
IS-IS nsf cisco (available for a long time) actually looks
like NSR (only on the surface, though)
Adjacency state (as maintained by hellos) and the LSDB are
checkpointed on the standby, which can recover with existing
protocol mechanisms
The neighbour actually notices that something happened, but we still
achieve non-stop forwarding
RSVP and PIM in IOS-XR use checkpoints and refresh
state from neighbors
There is a substantial difference from real NSR, though:
the restarting node forwards on potentially outdated
information
Let's look at real NSR now
OSPFv2 NSR (IOS-XR)
Neighbour & interface state
and LSDB constantly
synced between active and
standby
Input packets replicated to
both active and standby (1)
LSDB updated on active &
standby (2a/2b)
Standby ACKs LSA to
Active (3)
Active RP acks LSA to
sender (4)
[Figure: sender/peer, active RP, and standby RP, each RP running OSPF over raw IP, with state and LSDB sync between the RPs; arrows numbered (1), (2a)/(2b), (3), (4)]
More tricky: NSR for TCP-based Protocols
LDP and BGP use TCP for reliable delivery of
PDUs
Eases protocol implementation, but makes NSR quite
challenging
Strict requirement to maintain TCP session during
failover
A TCP session reset would be interpreted by the neighbour as adjacency
down, triggering rerouting
How can we reliably maintain the TCP session?
Need to ensure TCP stack on active and standby RP are
synced (sequence numbers, etc.)
Need to acknowledge receipt of a packet only
once both active and standby have received it
TCP NSR Receive Path (IOS-XR)
Input pkt replicated to
both active and standby
TCP stack (1)
Standby ACKs pkt to
active once it stored it in
buffer (2)
Once active TCP sees
the ACK, it ACKs pkt to
sender
Active owns TCP
session
TCP delivers data to
application
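The receive-path rule (acknowledge to the sender only what the standby has also stored) can be illustrated with a small model. This is a sketch under stated assumptions, not the IOS-XR implementation: `TcpReceiver`, `receive`, and the `standby_reachable` flag are hypothetical names, and only in-order segments are handled.

```python
class TcpReceiver:
    """Minimal in-order byte-stream receiver used for both RPs."""

    def __init__(self):
        self.next_seq = 0   # next expected sequence number
        self.data = b""     # bytes stored so far

    def store(self, seq, payload):
        """Buffer an in-order segment; duplicates and gaps are ignored."""
        if seq == self.next_seq:
            self.data += payload
            self.next_seq += len(payload)


def receive(active, standby, seq, payload, standby_reachable=True):
    """Replicate a segment to both stacks and return the ACK number for the peer.

    The ACK covers only bytes that BOTH stacks have stored, so the peer
    retransmits anything the standby missed.
    """
    active.store(seq, payload)
    if standby_reachable:
        standby.store(seq, payload)
    return min(active.next_seq, standby.next_seq)


active, standby = TcpReceiver(), TcpReceiver()
print(receive(active, standby, 0, b"abc"))                          # both stored it
print(receive(active, standby, 3, b"de", standby_reachable=False))  # ACK held back
print(receive(active, standby, 3, b"de"))                           # retransmission
```

Because the ACK never runs ahead of the standby, a failover at any point leaves the new active RP holding every byte the peer believes was delivered.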
[Figure: sender/peer, active RP, and standby RP, each RP with APP over TCP; arrows numbered (1), (2), (3), (4), (4a)/(4b)]
TCP NSR Send Path (IOS-XR)
In the send path, standby
TCP stack sends the
packet towards the peer
Standby owns the
session
[Figure: sender/peer, active RP, and standby RP, each RP with APP over TCP; arrows numbered (1) to (4)]
NSR Support in IOS-XR
Supported for BGP, OSPFv2,
and LDP
OSPFv3/IPv6 planned for 4.2
Configured on global protocol
level
When GR/NSF is also enabled,
protocols can fall back to NSF in
case NSR is not possible
for example when standby RP is
not in NSR-ready state
generally recommended to enable
NSF alongside NSR
Important to monitor NSR state
on standby
router bgp
 nsr
router ospf ..
 nsr
mpls ldp
 nsr
router isis
 nsf cisco
RP/0/RP0/CPU0:router#show redundancy
Redundancy information for node 0/RP0/CPU0:
==========================================
Node 0/RP0/CPU0 is in ACTIVE role
Partner node (0/RP1/CPU0) is in STANDBY role
Standby node in 0/RP1/CPU0 is ready
Standby node in 0/RP1/CPU0 is NSR-ready
NSR Support in IOS
BGP NSR
Supported for IPv4 VRF
neighbors on c10k and c7600
GR/NSF should also be
enabled
For peers supporting GR, TCP
state is not maintained and
failover is done via NSF
OSPFv2 NSR
coming in 15.1(2)S
GR/NSF can be enabled to
support fallback to NSF in case
NSR not ready
router bgp
 bgp graceful-restart
 address-family ipv4 vrf ..
  neighbor x.x.x.x ha-mode sso
 ....
# show ip bgp vpnv4 all sso summary
# show tcp ha connections
router ospf 1
nsr
[ nsf cisco|ietf ]
....
# show ip ospf nsr
NSR Summary
Unique, Self-Contained Routing HA Solution
Simplifies NSF/SSO deployment by synchronizing
edge routes automatically
NSF-aware neighbour devices not needed
Addresses additional network scenarios e.g. unmanaged
CPE devices
Delivers persistent routing for the entire customer
edge
Retains scalability and safety of NSF/GR with
benefits of NSR
Deployment Considerations and Use
Cases
Complex?!?!
Two approaches (NSF and NSR) to address the
same problem
Different protocols, different NSF/NSR variants,
implementations and roadmaps
Different fundamental approaches to increase
availability: HA and Fast Convergence
Let's look at some generic deployment guidance,
some implementation caveats and use cases
GR/NSF Deployment Considerations
Be careful with partial
deployments of GR/NSF
capability
If B restarts, A will reset its
session, removing all the
routing information it learned
from B. However, D will
continue to forward traffic
through B
This will, at best, cause
asymmetric routing. At worst, it
could cause a routing loop
Router A must be GR capable
or GR aware
[Figure: core with routers A, B, C, D; B is GR/NSF capable; A resets its session while D continues forwarding through B, giving an asymmetric return path]
Multiple Routing Protocols
[Figure: service provider network with routers A, B, C, D running OSPF]
OSPF is configured for
GR/NSF, while BGP is not
D's next hop for all routes is A;
the path to A is learned via
OSPF
If the control plane on B
restarts, D will continue
learning BGP routes from C
with a next hop of A; it will also
maintain the best path to that
next hop through B
Multiple Routing Protocols
Since the best path to A is still
through B, D will continue
forwarding through B for all the
BGP routes it is learning
through C
B will drop this traffic, since it
is not maintaining its BGP
state, only its OSPF state
If BGP and an IGP are running
together, they must both have
GR enabled
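The rule that every protocol on the forwarding path needs GR can be checked mechanically before a rollout. A minimal sketch, assuming a hand-built inventory of routers and their protocols; the `gr_gaps` function and the inventory layout are hypothetical and not tied to any Cisco tool:

```python
def gr_gaps(routers):
    """Return (router, protocol) pairs where a running protocol lacks GR.

    routers: {router_name: {protocol_name: gr_enabled}}
    """
    return sorted(
        (name, proto)
        for name, protos in routers.items()
        for proto, enabled in protos.items()
        if not enabled
    )


# Inventory mirroring the scenario above: B runs OSPF with GR, BGP without.
inventory = {
    "B": {"ospf": True, "bgp": False},
    "D": {"ospf": True, "bgp": True},
}
print(gr_gaps(inventory))
```

An empty result is a necessary condition only; neighbours still have to be GR aware for each of those protocols.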
IPv6 Deployment Considerations
NSF/NSR implementation for
IPv6 is not yet at the same
state as for IPv4, i.e.
no GR support for IPv6-AF in BGP
in IOS
no NSF support for OSPFv3
but: works with IS-IS
As v4 and v6 routing is
carried in different protocols,
everything is fine
[Figure: IPv4 continues through the restarting node; IPv6 routes around the failure]
MPLS Deployments P/LSR Routers
MPLS P (or LSR) routers act as transit
node only
no directly connected customers or
services
Assuming there is sufficient
redundancy and capacity within the
network, it can be better to just route
around the failure
There are still several deployments
around with IOS releases not
supporting MPLS SSO
RPR redundancy should be configured
to let linecards reload on RP
failure/failover
Fast Convergence required to minimize
packet loss
Other Protocols
To achieve hitless convergence, all protocols and
features involved in routing and forwarding of a
given service along a given path need to be GR-enabled
or GR-capable
All routing protocols
Don't forget PIM (Mcast), RSVP (MPLS-TE)
ARP/IPv6 ND
HSRP/VRRP
etc.
Did we miss anything?
HA with NAT/FW/IPSec/L2TP
Network Address Translation (NAT), Firewall, and
IPSec/L2TP/PPPoX all maintain session state
Broadband platforms (ASR1000, c10k, ASR9000)
support SSO for PPPoX/L2TP to allow for stateful
switch-over
ASR9000 maintains session state on linecard(s), so state is
much easier to maintain for RP failovers
Currently, IPSec (incl. DMVPN), NAT and FW are not
SSO-capable on any platform, so sessions need to be
re-established after RP failover
Lack of SSO support for a fundamental feature like the
ones above on a given platform is often a reason to not
deploy Routing HA at all
We would rather fail over to a redundant device
Protocol Hello Considerations
Depending on platform and OS, it can
take a few seconds until standby
process is operational
Neighbour adjacencies configured with
fast hellos could time out, leading to re-
route
Default hello timers are ok, no need to
increase
Restarting RP/process starts to send
hellos as soon as possible and at higher
rate right after restart
Make sure to test failover with tuned
hello times with platforms/software prior
to deployment (see [1] for some test
results)
[1] http://www.cisco.com/en/US/technologies/tk869/tk769/technologies_white_paper09186a00801dce40.html
%OSPF-5-ADJCHG: [],
Neighbor Down: Dead timer
expired
BFD Consideration
BFD (Bidirectional Forwarding Detection)
is a hello-type protocol designed and
deployed to provide sub-second failure
detection
BFD needs to be SSO-aware to ensure standby RP can take
over
BFD session state synced
Still, platform restrictions apply; for example, an RP failover on the 6500/7600
causes a short traffic disruption on the bus, affecting traffic
to/from the RPs
S/E chassis and 67xx/ES linecards mitigate this
Still: it is recommended not to go below 500 msec x 3; smaller values
can cause BFD to go down
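The "500 msec x 3" guidance is a detection-time budget: interval times multiplier has to exceed the time during which the restarting node sends nothing. A minimal sketch of that comparison; the function names and the 1000 ms control-plane gap in the example are illustrative assumptions, not measured values:

```python
def detection_time_ms(interval_ms, multiplier):
    """Time without received packets after which the neighbour declares failure."""
    return interval_ms * multiplier


def session_survives(interval_ms, multiplier, control_plane_gap_ms):
    """True if packets resume before the neighbour's detection timer expires."""
    return control_plane_gap_ms < detection_time_ms(interval_ms, multiplier)


# Assumed example: the failover silences the control plane for 1000 ms.
print(detection_time_ms(500, 3))          # recommended floor from the slide
print(session_survives(500, 3, 1000))     # within budget
print(session_survives(50, 3, 1000))      # aggressive timers time out
```

The same comparison applies to routing protocol hellos with tuned timers: the actual gap depends on platform and software, which is why testing before deployment matters.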
Single-RP Deployments
Any platform supporting only
a single
control plane (e.g. 7200,
ISRs, fixed Catalyst L3
switches, etc.) can only
act as a GR helper node
SSO and NSF are not
configurable
When BGP GR is
configured to act as
helper, these platforms won't
announce GR for any
address family (AF)
10.0.0.2
7600,
dual RP
7200
router#show ip bgp neighbors 10.0.0.2
....
Neighbor capabilities:
...
Graceful Restart Capability: advertised and received
Remote Restart timer is 120 seconds
Address families advertised by peer:
none
router#show ip bgp neighbors 10.0.0.1
....
Neighbor capabilities:
...
Graceful Restart Capability: advertised and received
Remote Restart timer is 120 seconds
Address families advertised by peer:
IPv4 Unicast
10.0.0.1
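The two outputs above differ only in the address families each peer advertises in its GR capability. A simplified model of what that means for route retention, assuming the helper keeps routes only for address families the restarting peer advertised; the function name and arguments are hypothetical and this is not the IOS implementation:

```python
def afs_retained_by_helper(helper_gr_enabled, restarting_peer_afs, session_afs):
    """Address families for which the helper keeps stale routes during a peer restart."""
    if not helper_gr_enabled:
        return set()
    return set(restarting_peer_afs) & set(session_afs)


session = {"IPv4 Unicast"}
dual_rp_7600_afs = {"IPv4 Unicast"}   # advertised by 10.0.0.1 in the output above
single_cp_7200_afs = set()            # 10.0.0.2 advertises "none"

# The 7600 restarts: the 7200 (helper) keeps its IPv4 unicast routes.
print(afs_retained_by_helper(True, dual_rp_7600_afs, session))
# The 7200 restarts: the 7600 has nothing to retain.
print(afs_retained_by_helper(True, single_cp_7200_afs, session))
```

An empty address-family list is therefore how a helper-only node avoids promising forwarding state it cannot preserve.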
Single-RP Deployments
Dual-RP platforms (e.g.
6500, 7600, 12000, ASR1000) with only a
single RP installed are problematic
In this case, redundancy mode can be
configured as RPR, documenting that
linecards/etc. will be restarted when RP
reloads
NSF should not be configured for any protocol,
helper support is generally enabled by default
However, configuring BGP GR (to act as
helper) will announce GR for supported/
configured AFs
Neighbors will hold on to routes if peer goes
down
Recommendation: Avoid single-RP
deployments when using NSF/GR
7600
single RP
router#show ip bgp neighbors 10.0.0.1
....
Neighbor capabilities:
...
Graceful Restart Capability: advertised and received
Remote Restart timer is 120 seconds
Address families advertised by peer:
IPv4 Unicast
Example, using multiple AFs
Remote node shutdown, no failover
router#show bgp all neighbors 10.0.0.1 routes