Tecrst-2310 (2015)

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 146

Deployment and Operation of BGP

TECRST-2310
Oliver Boehmer, AS Solutions Architect
Denise “Fish” Fishburne, CPOC Engineer
Agenda
• BGP General Operation
• Controlling Traffic
Overview
Controlling Outbound Traffic
eBGP
BGP Multipath
eBGP
Controlling Inbound Traffic
• Attributes and Best Path Selection
Algorithm • Route Reflectors
Route Origination • Convergence
AS-PATH Initial Convergence
NEXTHOP BGP Routing Convergence
Communities
• High Availability
• Show and Tell/Demo Lab

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 3
General Operation
Overview AS 20
R2
ASR1K
EBGP Router_20
ASR1K

Routing Information Base


RIB Internet
40.40.40.0/24
IBGP N7K

EBGP Router_30
AS 40
ASR9K
R3 40.40.40.0/24
ASR1K

AS30
AS 10

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 5
Overview AS 20
R2
ASR1K
EBGP Router_20
ASR1K

Routing Information Base


RIB Internet
40.40.40.0/24
IBGP N7K

BGP Best Path EBGP Router_30


AS 40
Algorithm R3 ASR9K

ASR1K
40.40.40.0/24
BGP Table
AS30
40.40.40.0/24
AS 10
Path #1: via Router_20
Path #2: via Router_3

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 6
BGP General Operation
Peering
• An AS originates their routes into BGP
R2
• BGP peers with other BGP speakers R_20

– Peer is also called “neighbor”


AS 20
– Uses TCP port 179 AS 10 Peering
• BGP peers exchange routes
• Picks the best path R3
– Installs in the forwarding table
– Advertises to BGP peers via UPDATEs
• UPDATEs have Attributes
– Routing policies tweak attributes to influence best path
selection

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 7
Overview AS 20
R2
ASR1K
EBGP Router_20
ASR1K
40.40.40.0/24

Internet
IBGP N7K

R2#sh ip bgp 40.40.40.0


BGP routing table entry for 40.40.40.0/24
EBGP Router_30
AS 40
Paths: (2 avail, best #2, table default) ASR9K
R3
30 40  as-path ASR1K

10.100.100.3 (metric 2) from 10.100.100.3 AS30


Origin IGP, metric 0, localpref 100, valid, internal AS 10
20 40  as-path Internally learned (iBGP)
20.2.20.20 from 20.2.20.20 (20.100.100.20) Externally learned (eBGP)
Origin IGP, localpref 100, valid, external, best
Community: 40:1  community
R2#
TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 8
Autonomous System
AS 10 AS 20
R2 R_20

R1
Internet

AS 40
R_30
R3
AS30 • AS Numbers
– Historically 2 bytes
• A network sharing the same • 1 to 65535
routing policy • 64512 to 65535 are private
– Possibly multiple IGPs • Running out of AS numbers…
– Usually under single administrative – RFC 4893
control • 4-byte AS number
• Unique AS for every IPv4 address 

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 9
IGP vs. EGP
• IGP – Interior Gateway Protocol
– Exchange routes within an Autonomous Systems
– Limited Scalability
– Sub-second convergence
– OSPF, ISIS, EIGRP, etc.

• EGP – Exterior Gateway Protocol


– Exchange routes between Autonomous Systems
– Once was an EGP called “EGP”
– BGP is standard EGP today
– Slower convergence in exchange for scalability

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 10
Incremental Updates
• Once BGP sends a route to a peer, it assumes the peer will keep it
• There is no periodic refresh
• New UPDATEs are sent when
– Bestpath change
– Peer bounces
– Route-Refresh

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 11
eBGP Peering
eBGP - External BGP
• Neighbor in different AS
AS 20 AS 40
• Usually directly connected
• Next Hop set to self R_20

External
eBGP Interne
t
AS #s ≠
(Autonomous System Numbers)
TTL 1 (default)
(Time to Live) R_30
Next Hop Change
AS 30
Directly Connected Check Enabled
(default)

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 13
eBGP - External BGP
Configuration
AS 20 AS 40
R_20
router bgp 20
bgp router-id 20.100.100.20 R_20
neighbor 5.20.40.40 remote-as 40
neighbor 5.20.40.40 send-community

Interne
Internet t
router bgp 40
router-id 40.100.100.40
neighbor 5.20.40.20 remote-as 20
address-family ipv4 unicast R_30
send-community
AS 30
Router_20#sh ip bgp summary
BGP router identifier 20.100.100.20, local AS number 20
<snip>
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
5.20.40.40 4 40 8256 9102 4 0 0 5d17h 3
TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 14
eBGP Multihop

AS 10 AS 20

R2 R20

• Peer between loopbacks


• Often used to load-balance traffic over multiple links
router bgp 10
neighbor 10.1.20.1 remote-as 20
neighbor 10.1.20.1 update-source loop0
neighbor 10.1.20.1 ebgp-multihop 2

ip route 10.1.20.1 255.255.255.255 s0/0


ip route 10.1.20.1 255.255.255.255 s1/0
TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 15
iBGP Peering
iBGP - Internal BGP
• Neighbor in same AS R2

• NEXTHOP is unchanged
AS 10
• Peer to loopbacks

Internal R3

iBGP
AS #s =
(Autonomous System Numbers)
TTL 255
(Time to Live)
Next Hop unchanged
Directly Connected Check disabled

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 17
iBGP - Internal BGP

• Cannot advertise route received from


AS 10
one iBGP peer to another iBGP peer
– Full iBGP mesh is required R2
– n*(n-1)/2 peering mesh – scaling problem!
– Route-Reflectors relax this constraint

R1

R3

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 18
iBGP – Loopback Peering
• Best Practice
– Loopbacks should be /32s
– Have an IGP route to loopbacks R2

Configuration
R_2 AS 10
router bgp 10
bgp router-id 10.100.100.2
neighbor 10.100.100.3 remote-as 10
neighbor 10.100.100.3 update-source Loopback0 R3
neighbor 10.100.100.3 next-hop-self

R3
router bgp 10
bgp router-id 10.100.100.3
neighbor 10.100.100.2 remote-as 10
neighbor 10.100.100.2 update-source Loopback0
neighbor 10.100.100.2 next-hop-self
TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 19
iBGP – Loopback Peering
• Loopback peering promotes stability AS 10
• There are two paths between R1 and R2
• If the link between them fails R2
– Peering to the interface IP would bring
down the BGP session
– Peering to a loopback allows the session to R1
stay up

R3

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 20
Attributes and Best Path Selection Algorithm
Attributes
• IGP
– Primary attribute is a cost/metric
– The path with the lowest metric is the best…nice and easy

• BGP
– Routing Policy between AS is usually more complex than this
– Shortest path is not necessarily the best one
– Has many attributes to describe reachability to a destination
– The “Best Path Algorithm” compares attributes between different paths to select a best
path
– Route-policies are used tweak attributes to influence routing

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 22
BGP Route vs. BGP Path
• BGP can have multiple paths per route
• Here we have 2 paths to the 40.40.40.0/24 prefix
R2#sh ip bgp 40.40.40.0
BGP routing table entry for 40.40.40.0/24
Paths: (2 avail, best #2, table default)
30 40
10.100.100.3 (metric 2) from 10.100.100.3
Origin IGP, metric 0, localpref 100, valid, internal
20 40
20.2.20.20 from 20.2.20.20 (20.100.100.20)
Origin IGP, localpref 100, valid, external, best
Community: 40:1
R2#
TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 23
BGP Route vs. BGP Path
• “show ip bgp summary” provides the total number of routes and paths
• Paths and routes both consume memory
• The more paths you have per route, the more memory consumed

R3#sh ip bgp summary


BGP router identifier 10.100.100.3, local AS number 10
BGP table version is 4, main routing table version 4
3 network entries using 432 bytes of memory
6 path entries using 480 bytes of memory
5/3 BGP path/bestpath attribute entries using 800 bytes of memory
2 BGP AS-PATH entries using 48 bytes of memory
1 BGP community entries using 24 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 1784 total bytes of memory
BGP activity 9/6 prefixes, 28/22 paths, scan interval 60 secs
TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 24
BGP Path Selection Algorithm
Attribute Logic
1 Weight Higher is better. Local to the router…not really an attribute.

2 Local Preference Local to an AS…higher is better

3 Locally Originated Corner case…”network 10.0.0.0” vs. “aggregate 10.0.0.0”


vs. “redistribute” on the same router
4 AS-PATH Shorter AS-PATH is better

5 ORIGIN IGP < EGP < Incomplete

6 MED Is often a reflection of IGP metrics so lower is better

7 eBGP vs. iBGP Prefer eBGP path over iBGP path

8 IGP cost to NEXTHOP Lower is better

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 25
BGP Path Selection Algorithm (contd)
Attribute Logic
9 Lowest Router ID Lower is better

10 Shortest Lower is better


CLUSTER_LIST
11 Lowest neighbor IP Lower is better
address

All details at http://www.cisco.com/c/en/us/support/docs/ip/border-gateway-protocol-bgp/13753-25.html

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 26
BGP Path Selection Algorithm
BGP Attribute “Denise”ism
Weight Wise

Local Preference Lip

Locally Originated Lovers

AS-PATH Apply

ORIGIN Oral

MED Medication

eBGP vs. iBGP Every • Hard to remember all of that?


NEXTHOP IGP Cost Night • Wise Lip Lovers Apply Oral
Medication Every Night 

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 27
Route Origination
Route Origination

AS 40
I am AS 40 and I own 103.103.40.0/24
router bgp 40
Internet
router-id 40.100.100.40
address-family ipv4 unicast
network 103.103.40.0/24

 An AS must “originate” routes for their address space


 Three ways to originate a route…

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 29
Route Origination – Network Statements
• Easiest/Cleanest method
• Network 103.103.0.0 mask 255.255.0.0
– Requires 103.103.0.0/16 to be in the RIB
– Floating static route to Null0 is common
– Originates 103.103.0.0/16
• Easy to determine/control what you are originating
router bgp 40
router-id 40.100.100.40
address-family ipv4 unicast
network 103.103.0.0/16

ip route 103.103.0.0 255.255.0.0 Null0 250

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 30
Route Origination – Network Statements

London-Internet# sh ip bgp 103.103.0.0


BGP routing table information for VRF default, address family IPv4 Unicast
BGP routing table entry for 103.103.0.0/16, version 27
Paths: (1 available, best #1)
Flags: (0x080002) on xmit-list, is not in urib • The Origin is IGP
Advertised path-id 1 • Weight is 32768
Path type: local, path is valid, is best path • “0.0.0.0 from 0.0.0.0”
AS-Path: NONE, path locally originated
0.0.0.0 (metric 0) from 0.0.0.0 (40.100.100.40)
Origin IGP, MED not set, localpref 100, weight 32768
Path-id 1 advertised to peers:
5.20.40.20 5.30.40.30
London-Internet#

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 31
Route Origination – Redistribution
• Routes can be redistributed into BGP
• Pros
– Easy to configure and setup
• Cons
– IGP instability is passed along to BGP
– Isn’t always obvious what routes you are originating
– “Redistribute static” is especially dangerous
• What if someone configures a static route for Google’s address space?
• You could blackhole Google’s traffic

 use route-maps to control what you’re redistributing

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 32
Route Origination – Redistribution
• Things to note
– The nexthop for the OSPF route is 10.1.1.14
– The OSPF metric is 11

R10#show ip route 10.1.1.3


Routing entry for 10.1.1.3/32
Known via "ospf 10", distance 110, metric 11, type intra area
Redistributing via bgp 10
Advertised by bgp 10
Last update from 10.1.1.14 on Ethernet0/2, 02:56:42 ago
Routing Descriptor Blocks:
* 10.1.1.14, from 10.1.1.3, 02:56:42 ago, via Ethernet0/2
Route metric is 11, traffic share count is 1
R1#

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 33
Route Origination – Redistribution
• NEXTHOP uses the IGP nexthop of 10.1.1.14
• ORIGIN is set to “Incomplete”
• “metric” here means MED
– Uses the IGP metric of 11
• Weight is 32768

R10#show ip bgp 10.1.1.3


BGP routing table entry for 10.1.1.3/32, version 5
Paths: (1 available, best #1, table default)
Advertised to update-groups:
9
Local
10.1.1.14 from 0.0.0.0 (10.1.1.2)
Origin incomplete, metric 11, localpref 100, weight 32768, valid, sourced, best
TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 34
Route Origination – Aggregation
• Typically used by ISPs to summarize their address space
• Reduces number of routes in global BGP table
 Adds AGGREGATOR attribute
 Contains Router-ID and AS of the router that did the aggregation
 Used for troubleshooting

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 35
Route Origination – Aggregation
 Configure an “aggregate-
address” statement AS 100 Check for component route(s)
R11#show ip bgp 10.1.0.0 255.255.0.0 longer
 BGP table must have R11 10.1.1.0/24, 10.1.2.0/24, etc listed here
component route(s) R1#

– Components are the longer NLRI: 10.1.1.0/24,


10.1.2.0/24, etc
AS-PATH: 10 200 300 400 router bgp 10
length prefixes that fall within
NLRI: 10.1.0.0/16 aggregate-address 10.1.0.0 255.255.0.0
the aggregate’s range AS-PATH: 10
AGGREGATOR AS: 10
AGGREGATOR ID: 10.1.1.1
!
– Use “show ip bgp x.x.x.x
y.y.y.y longer” to check for
components
 Component routes are still R12
advertised
AS 200
TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 36
Route Origination – Aggregation
• Adding the keyword AS 100 Check for component route(s)
summary-only R11#show ip bgp 10.1.0.0 255.255.0.0 longer
causes BGP to suppress R11 10.1.1.0/24, 10.1.2.0/24, etc listed here
the components of R1#
the aggregate NLRI: 10.1.1.0/24,
10.1.2.0/24, etc

– Suppressed route: use it,


AS-PATH: 10 200 300 400
router bgp 10
NLRI: 10.1.0.0/16
but do not advertise it to AS-PATH: 10 aggregate-address 10.1.0.0 255.255.0.0 summary-only
AGGREGATOR AS: 10
any peer AGGREGATOR ID: 10.1.1.1 !

R12

AS 200
TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 37
Address-Family-Identifier Syntax
• In the beginning BGP only supported IPv4 Unicast
• IPv4 is the AFI, Unicast is the SAFI
• Today there are many supported AFI/SAFIs
– IPv6 Unicast, VPNv4, IPv4 Multicast, etc
• AFI/SAFI specific configuration happens in a sub-context
– Network statements, route-maps on neighbors, etc
• Non AFI/SAFI configuration still happens directly under ‘router bgp’
– Remote-as, update-source, keepalive timers, etc

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 38
Address-Family-Identifier Syntax

Original syntax AFI/SAFI syntax


router bgp 20
router bgp 20 bgp router-id 20.100.100.20
bgp router-id 20.100.100.20 bgp log-neighbor-changes
bgp log-neighbor-changes neighbor 5.20.40.40 remote-as 40
neighbor 5.20.40.40 remote-as 40 !
neighbor 5.20.40.40 send-community address-family ipv4
network 20.100.100.20 mask 255.255.255.255 network 20.100.100.20 mask 255.255.255.255
neighbor 5.20.40.40 activate
neighbor 5.20.40.40 send-community
exit-address-family
!
address-family vpnv4
neighbor 5.20.40.40 activate
neighbor 5.20.40.40 send-community extended

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 39
AS-PATH
AS-PATH
• The AS-PATH tells the story of what ASes a route has been through
• AS-Path is used for loop detection on the border of the AS
– BGP drops an external update if it sees its own AS in the path
• The most recent AS is on the left, the originating AS is on the far right
• BGP prepends his own AS# to the AS-PATH when advertising to an eBGP
peer
• Shortest AS-PATH is often the tie-breaker for best path selection

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 41
AS-PATH

AS 10 AS 20 AS 40
R2 R2
0

Internet

R3 R30

AS 30

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 42
AS-PATH
R2#sh ip bgp 40.1.1.0
BGP routing table entry for 40.1.1.0/24, version 6
Paths: (2 available, best #2, table default)
Advertised to update-groups:
14
Refresh Epoch 1
30 40
10.100.100.3 (metric 2) from 10.100.100.3 (10.100.100.3)
Origin IGP, metric 0, localpref 100, valid, internal
rx pathid: 0, tx pathid: 0
Refresh Epoch 1
20 40
20.2.20.20 from 20.2.20.20 (20.100.100.20)
Origin IGP, localpref 100, valid, external, best
rx pathid: 0, tx pathid: 0x0
R2#
TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 43
NEXTHOP
NEXTHOP
• NEXTHOP is the address that we must route towards in order to reach the
BGP prefix
– Paths where the next-hop is unreachable are not considered for best-path calculation
• eBGP does “next-hop-self” automatically
– Multiple eBGP peers on the same subnet is an exception
• iBGP does not modify the NEXTHOP by default
– NEXTHOP will remain as the IP of the eBGP peer
– Forces BGP speakers in an AS to have routes for the eBGP facing links
– Would need to carry many /30 eBGP facing links in our IGP
– Best practice is to use “next-hop-self” to avoid this

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 45
iBGP without next-hop-self
 NEXTHOP does not
change AS 10
 AS 10’s IGP must have R2
route to 30.1.1.9
 Adds many /30s to IGP
R1

E0/0
30.1.1.9
R3 R5

AS 30

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 46
iBGP with next-hop-self
 R3 changes NEXTHOP
to his “update-source” AS 10
interface R2
 iBGP should always
use loopback peering
 AS 10’s IGP has a R1
route to R3’s loopback
10.1.1.3
E0/0
30.1.1.9
R3 R5

Loop0
10.1.1.3 AS 30

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 47
3rd Party NEXTHOP
• Three BGP routers on a
common subnet Traffic
AS 10
• Peering is a partial-mesh BGP
– Not too common a situation Peering R1

• R1 does NOT modify the E0/0


10.1.1.1
NEXTHOP here
• Prevents R1 from
AS 20 AS 30
becoming a hairpin router E0/0 E0/0
10.1.1.2 10.1.1.3
• Traffic flows from R3 R2 R3
straight to R2

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 48
Communities
Communities
• A COMMUNITY is an attribute that stores a number
– 4-byte number that is usually displayed in X:Y notation
– “ip bgp-community new-format” triggers X:Y notation
• Set communities via a route-map
• Communities are not advertised by default
• A community by itself does nothing
– Tagging a prefix with 100:1 or 100:2 will not change routing in any way

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 50
Sending Communities
LA#
router bgp 20
neighbor 10.1.1.2 remote-as 10 AS 20
neighbor 10.1.1.2 send-community
neighbor 10.1.1.2 route-map TAG_MY_ROUTES out LA
!
ip bgp-community new-format AS 10
!
R1
route-map TAG_MY_ROUTES permit 10
set community 10:1
! NY
LA#

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 51
Receiving Communities
• Applying Policy towards communities does impact routing
• Use route-maps and community-list to
– Match against a certain community
– Modify a BGP attribute as a result
• LOCALPREF, ASPATH prepending, etc

• You can impact 1000s of prefixes by applying policy based on a single


community

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 52
Communities
R1#
router bgp 10
neighbor 20.1.1.1 description LA_PEER
neighbor 20.1.1.1 route-map NY_OR_LA in
AS 20
neighbor 30.1.1.1 description NY_PEER
neighbor 30.1.1.1 route-map NY_OR_LA in
! LA
ip community-list standard VIA_LA permit 100:1
ip community-list standard VIA_NY permit 100:2 AS 10
!
route-map NY_OR_LA permit 10 R1
match community VIA_LA
set local-preference 120
!
NY
route-map NY_OR_LA permit 20
match community VIA_NY
set local-preference 130
AS 30
!
route-map NY_OR_LA permit 30
TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 53
Post-It Note Analogy
• By themselves the pink, yellow, or blue post-it
notes have no meaning
• “Policy” says what each color signifies
– Pink post-it for Dogs
– Yellow post-it for Cats
– Blue post-it for Aardvarks
• Gives you a way to quickly reference/find all pages
on Aardvarks, or Aardvarks and Cats, etc.

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 54
Well Known Communities
• “A community by itself does nothing”
– There are exceptions to every rule 
• Well Known Communities do have an automatic impact

Community Impact
local-AS Do not send to EBGP peers

no-advertise Do not advertise to any peer

no-export Do not export outside AS/confed

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 55
Controlling Outbound Traffic
Local Preference
• An attribute used to influence outbound traffic
• Higher LOCAL_PREF is preferred
• Is compared very early in the Best Path Algorithm
• Is local to an AS
– Local preference is never transmitted to an eBGP peer
– A default LP of 100 is applied to routes from eBGP peers

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 57
Local Preference
AS 10 AS 20 AS 40
R2 R4

R1
R6

R3 R5

AS 30
• Default behavior…LOCALPREF 100
• R2 and R3 prefer eBGP path
• R1 prefers path from R2 over R3 (lower neighbor IP)
TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 58
Local Preference
AS 10 AS 20 AS 40
R2 R4

R1
R6

R3 R5

AS 30
• R2 advertises LOCALPREF of 200
• R1, R2, and R3 all prefer the R2 exit

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 59
Local Preference

AS 10 R2#
!
R2 router bgp 10
neighbor 10.1.1.1 remote-as 10
neighbor 10.1.1.1 route-map SET_LOCAL_PREF out
neighbor 10.1.1.3 remote-as 10
neighbor 10.1.1.3 route-map SET_LOCAL_PREF out
R1 !
route-map SET_LOCAL_PREF permit 10
set local-preference 200
!

Or: set localpref inbound on eBGP session.. There are


R3 always multiple ways to skin a cat :-}

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 60
Alternatives to Local Preference
BGP
• Local preference is a very “heavy” attribute to influence routing, as it is Attribute

evaluated very early in best path algorithm Weight

• Especially with Internet routing, AS path length is very important (how Local
Preference
“far” is the destination away from me) Locally
Originated

• Hence, evaluate other attributes like MED for best path manipulation AS-PATH

ORIGIN

• No one size fits all, there are lots of ways to implement BGP routing
MED

policies… eBGP vs. iBGP

NEXTHOP IGP
Cost

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 61
Applying BGP Policy
• Policy based on various attributes:
– ASPATH
– Community
– Destination prefix
– Many, many others…
• Reject/accept selected routes
• Set attributes to influence path selection
• Tools (IOS):
– Distribute-list or prefix-list
– Filter-list (as-path access-list)
– Community-list
– Route-maps (the Swiss army knife)
TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 62
Policy Control - Prefix List
• Per-peer prefix filter, inbound or router bgp 200
outbound neighbor 220.200.1.1 remote-as 210
neighbor 220.200.1.1 prefix-list PEER-IN in
• Allows coverage for ranges of prefix
neighbor 220.200.1.1 prefix-list PEER-OUT out
lengths (ge, le)
!
• Based upon network numbers in ip prefix-list PEER-IN deny 218.10.0.0/16
NLRI (using familiar IPv4 ip prefix-list PEER-IN permit 0.0.0.0/0 le 32
address/mask format) ip prefix-list PEER-OUT permit 215.7.0.0/16

• Example configuration: ip prefix-list PEER-OUT deny 0.0.0.0/0 le 32

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 63
Policy Control - Prefix List
a.b.c.d/x [ge | eq | le] y

care vs. don’t care bits


operator
base prefix length to match operand

• ip prefix-list PEER-IN permit 10.0.0.0/8 le 32


All 10.x.x.x subnets, regardless of mask length (e.g. 10.1.2.4/24,
10.1.1.1/32, 10.1.0.0/16)
• 0.0.0.0/0 eq 32 = All /32 prefixes (e.g. 1.2.3.4/32)
• 192.168.1.0/24 = 192.168.1.0/24 eq 24 (ONLY 192.168.1.0/24)
• 172.16.0.0/16 ge 28 = all subnets from 172.16.0.0/16 that have a mask
length of /28 or greater (e.g. 172.16.4.0/28)

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 64
Policy Control - Filter List
• Filter routes based on AS path
• Inbound or Outbound
• Example Configuration:
!
router bgp 100
neighbor 220.200.1.1 filter-list 5 out
neighbor 220.200.1.1 filter-list 6 in
!
ip as-path access-list 5 permit ^200$
ip as-path access-list 6 permit ^150$

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 65
Policy Control - Regular Expressions
• Simple Examples
.* Match anything
^$ Match routes local to this AS (as-path is empty)
_1800$ Originated by 1800 (as-path ends with 1800)
^1800_ Received from 1800 (as-path starts with 1800)
_1800_ AS 1800 is somewhere in the as-path
_790_1800_ Passing through 790 then 1800
1800 Literal “1800” is somewhere, also matches 21800, 18001, etc.

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 66
BGP Multipath
BGP Multipath
• R1 receives two paths from AS20 (via R2
and R3) AS 20
• Best-path algorithm selects one and
installs it in routing table R4
– Assuming all attributes are equal, uses the
one from the lower neighbour IP address
•  By default, all of the traffic goes via one R1
link only 
• We could do some manual load-sharing R5
via localpref/MED, but that’s cumbersome
AS 10

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 68
eBGP Multipath
• Enable eBGP multipath on R1 to install both
paths
AS 20
router bgp 10
maximum-paths 2
R4

• Multipath selection is part of the Best Path


algorithm
– Evaluated before the more arbitrary tie breakers like R1
IP address/etc.

• Only paths with identical ASPATH will be


considered R5
– Hidden knob “bgp bestpath as-path multipath relax”
changes this, but be aware of what you’re doing AS 10

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 69
iBGP Multipath
• In this topology, eBGP Multipath
will not help
AS 10 AS 20
• R1 will choose one of the internal
paths, and will select one R2 or R3
R2 R4
• When R1’s IGP cost to R2 and are
R3 is equal, and all other path
attributes are the same, iBGP R1
multipath can be used
router bgp 10
maximum-paths ibgp 2
R3 R5

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 70
iBGP Multipath and Route Reflectors
• BGP multipath allows to install
multiple paths to be installed in the
routing table AS 10 AS 20
• BGP will still select one best path
R2 R4
and advertises it to its peers
• With RR between R1 and R2/R3,
R1 will only receive one path, no R1
load-sharing possible RR

• Multiple solutions:
– Enhance BGP protocol behaviour
R3 R5
using “ADD-PATH”
– Use multiple iBGP sessions on RR
(“shadow-RR”)
– Details are beyond this session, visit
BRKRST-3371 - Advances in BGP
TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 71
Controlling Inbound Traffic
Controlling Inbound Traffic
• The first rule of controlling inbound traffic…
– You do not have ultimate control of how traffic enters your AS
– Your peers may have outbound policies that will override all of your attempts to
influence inbound traffic
• That said, what are your options?
– Leaking more-specific routes
– MED
– AS-PATH Prepending
– Community/Local Pref agreement

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 73
Leaking Specific Routes
• A RIB lookup always looks for the most specific match
– A route for 10.1.1.1/32 will be used over 10.1.1.0/24
• You can leak more specific routes to one ISP but not the other
• If the routes are not filtered this will draw the traffic in through the preferred ISP

• Some argue: Advertising more specifics to the global Internet is not “nice” as it
causes the Internet BGP table to bloat, and everyone has to bear the costs..
• Many ISPs filter routes that are too specific
– You can’t advertise /32s for your entire address space
– These will obviously be filtered

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 74
Leaking Specific Routes

• You are AS 10 AS 10 10.1.1.0/24


AS 20
R2 R4
• AS 10 owns 10.1.1.0/24
• AS 20 only uses one link
to send traffic to AS 10 
R1
• You want to utilize both
links

R3
Traffic

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 75
Leaking Specific Routes
• Split your /24 in two /25s 10.1.1.0/25
AS 20
AS 10
• R2 R2 R4
– advertise 10.1.1.0/25
– suppress 10.1.1.128/25
• R3 R1
– suppress 10.1.1.0/25
– advertise 10.1.1.128/25
• AS 20 will now send R3
traffic on both links
Traffic

• Q: What are the


problems with this policy?
TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 76
Leaking Specific Routes
• Will inbound traffic split 10.1.1.0/25
AS 20
50/50 on your two links?
AS 10
www.espn.com
R2 R4
• Maybe…maybe not 10.1.1.10

• In this case the R2 link


will receive much more R1
traffic than the R3 link

www.watching-
paint-dry.com R3
10.1.1.140
Traffic

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 77
MED
• Officially “Multi Exit 10.1.2.0/24

Discriminator” AS 10 MED: 1 AS 20
10.1.3.0/24
– Everyone calls it MED MED: 2
R1 R2 R5
• An attribute used to
influence inbound traffic
10.1.2.0/24
• Lower MED is better
– MED is designed to be a 10.1.3.0/24
reflection of IGP metrics
– A lower IGP metric is always
preferred
R4 R3
– Therefore lower MED is
preferred

• Used to bring traffic into the


AS on the eBGP speaker
closest to the destination
TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 78
MED
R2#
10.1.2.0/24
router bgp 10
neighbor 10.1.1.5 remote-as 20 AS 10 MED: 1 AS 20
neighbor 10.1.1.5 SET_MED out 10.1.3.0/24
MED: 2
! R1 R2 R5
route-map SET_MED permit 10
set metric-type internal
!
10.1.2.0/24
• MEDs can be set manually
• “set metric-type internal” 10.1.3.0/24
sets MED dynamically
– Uses IGP cost to prefix as
the MED value R4 R3
– R2 has an IGP cost of 1 to
10.1.2.0
– R2 has an IGP cost of 2 to
10.1.3.0

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 79
MED
• Traffic for 10.1.2.0/24 10.1.2.0/24

uses the R2 link AS 10 MED: 1 AS 20


10.1.3.0/24
MED: 2
R1 R2 R5
• Traffic for 10.1.3.0/24 10.1.2.1
uses the R3 link
10.1.2.0/24

10.1.3.0/24

R4 R3
Traffic

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 80
MED – bgp always-compare-med
AS 10 AS 20 AS 40
R2 R4

R1
R6

R3 R5

AS 30

• MEDs are only compared if received from the same AS


• Makes sense as you can’t necessarily compare routing policies across different AS
• R6 does not compare MEDs for the paths received from AS20 and AS30 unless
“bgp always-compare-med” is configured
TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 81
AS-PATH Prepending
AS 10 AS 20 AS 40
R2 R4

R1
R6

R3 R5

AS 30
• AS 10 can force traffic into R3 by prepending from R2  R4
• A shorter ASPATH is preferred
TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 82
AS-PATH Prepending

AS 10 AS 20
R2 R4

R2#
router bgp 10
neighbor 10.1.1.4 remote-as 20
neighbor 10.1.1.4 route-map PREPEND_3X out
!
route-map PREPEND_3X permit 10
set as-path prepend 10 10 10
!

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 83
Community/Local Pref Agreement
• Many providers accept
communities from their customers AS 10 AS 20
to give customers some control on R1 R3
inbound traffic.
• Example
– Customer sends community 20:80,
ISP sets the LOCALPREF to 80
– Customer sends community 20:120,
ISP sets the LOCALPREF to 120

R2 R4

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 84
Community/LOCALPREF Agreement
R1#
router bgp 10 AS 10 AS 20
neighbor 10.1.1.3 remote-as 20
R1 R3
neighbor 10.1.1.3 route-map SET_COMMUNITY out
neighbor 10.1.1.3 send-community
!
route-map SET_COMMUNITY permit 10
set community 20:120
!

R2 R4

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 85
Community/LOCALPREF Agreement
R3#
router bgp 20
neighbor 10.1.1.1 remote-as 10 AS 10 AS 20
neighbor 10.1.1.1 route-map COMMUNITY_TO_LOCALPREF in R1 R3
!
ip community-list standard LP_80 permit 20:80
ip community-list standard LP_120 permit 20:120
!
route-map COMMUNITY_TO_LOCALPREF permit 10
match community LP_80
set local-preference 80
!
route-map COMMUNITY_TO_LOCALPREF permit 20
R2 R4
match community LP_120
set local-preference 120
!
route-map COMMUNITY_TO_LOCALPREF permit 30
!

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 86
Route Reflectors
BGP Route Reflectors
• Route Reflector Basics
• Hierarchical Route Reflectors
• Deploying Route Reflectors
• Route Reflector Redundancy

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 88
Route Reflector Basics
• BGP cannot advertise a path from one iBGP to another. Why?
– eBGP uses ASPATH for loop detection
– Loop detection in iBGP is more difficult 
A B
• RFC 2796 defines two attributes for loop detection within an AS:
• Originator ID C D
– Set to the Router ID of the client who injects the route into the AS
• Cluster List
– Each route reflector the route passes through adds their Cluster-ID to
this list.
– Cluster-ID = Router ID by default this looks very similar to AS-PATH
loop detection

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 89
Route Reflector Basics
• A route reflector is an iBGP speaker Route reflectors
that reflects routes learned from iBGP
peers to other iBGP peers
• Route reflectors are designated by
configuring some of their iBGP peers
as route reflector clients

neighbor <A> route-reflector-client


neighbor <B> route-reflector-client
TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 90
Route Reflector Basics
• A route reflector client is just Route reflectors
an iBGP speaker
• There is no special configuration
for a route reflector client

Route reflector client

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 91
Route Reflector Basics
• A cluster is a route reflector and Route reflectors
its clients
• Route reflector clusters may
overlap Cluster

Route reflector client

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 92
Route Reflector Basics
• A non-client is any route reflector Route reflectors
iBGP peer that is not a route
reflector client Non-client
• Each route reflector is also a non- Cluster
client of each other route reflector in
this network
• Route reflectors must be fully iBGP
meshed A

Route reflector client

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 93
Hierarchical Route Reflectors - Motivation
• All of the route reflectors will need Full iBGP
to be fully meshed mesh
– Reflectors still follow the normal rules between
of iBGP route propagation between reflectors
themselves
• This full iBGP mesh between
reflectors can still contain so many
routers that it presents a scaling
problem as well

Cluster Cluster

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 94
Hierarchical Route Reflectors
• To resolve this, route reflectors Client and reflector
can be deployed in a hierarchy
• A single router can be a reflector
client and a reflector Cluster

Cluster Cluster

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 95
Hierarchical Route Reflectors
• An unlimited number of tiers can be used
– But very rare to see more than 2 levels
• Edges of route reflector tiers are a natural place to reduce the amount of routing
information being carried in the lower tiers
– RRs would be ABRs in “textbook” network design
• The same topology rule applies: The reflector topology should follow the
physical topology to prevent loops and black holes
• RRs can lead to suboptimal routing because they can hide full path information
from clients (RRs can advertise a single best path).

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 96
Route Reflector – Advertisement Rules
Non-client
• If a Route Reflector Receives a Route
from an eBGP Peer eBGP peer iBGP peer
– Send the route to all clients
– Send the route to all non-clients (incl.
other eBGP peers) Send

Send
Send
Non-client
iBGP peer
Client
Client

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 97
Route Reflector – Advertisement Rules
Non-client
• If a Route Reflector Receives a eBGP peer iBGP peer
Route from a Client
– Reflect the route to all clients Send
• Unless “no client-to-client reflection”,
which is rarely deployed Reflect
– Reflect the route to all non-clients
– Send the route to all eBGP peers
Reflect
Non-client
iBGP peer
Client
Client

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 98
Route Reflector – Advertisement Rules
Non-client
• If a Route Reflector Receives a Route iBGP peer
eBGP peer
from a Non-Client
– Reflect the route to all clients Send
– Send the route to all eBGP peers

Reflect
Reflect
Non-client
iBGP peer
Client
Client

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 99
Route Reflector Design and Redundancy
• A client may peer with more than one reflector, in different
clusters
– A client that peers to only one reflector has a single point of failure
– Clients should peer to at least two reflectors to provide redundancy
• How many reflectors should a single client be peered to?
• Where should the RRs be placed in the network?
• How many RRs are needed?

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 105
Route Reflector Design and Redundancy
• Redundancy is needed but….
• Too much burns memory on RRCs because the client learns the same
information from each RR
• Also burns memory on the RRs because they learn multiple paths for each
route introduced by a RRC
• Two route reflectors per client should be plenty…
• …but this is not a hard and fast rule
• As with everything else…”it depends”
– PEs, RRs, SLAs, network size, network topology, etc.

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 106
Route Reflector Topology Examples
Distribution Routers as RR Scaling the Core RR Deployment in MPLS

RRC
RRC

Full mesh in the


core, middle core
boxes are
regular iBGP
speakers

RRC

RRC

All core and edge nodes


are RR-clients

Not all required iBGP connections are shown in these diagrams, but you should get the idea…
TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 107
A word of reason
• Moore’s Law also applies to routers/route processors
• Most routers sold in the last decade can run 100 or more sessions (all depends
on number of prefixes carried)
• ASR1000-RP2 scales to thousands of sessions (Isocore tested 20 Million
routes with 1000 RR clients)
• So RP performance is often not the limiting factor of a full iBGP mesh, it’s
rather the manageability adding/removing nodes from the mesh
• So don’t over-engineer it…

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 108
Initial Convergence
BGP Convergence
• Hey—Who are you calling slow? 
• Two general convergence situations
– Initial startup
– Reaction to network failure events

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 117
Convergence
Initial Startup
• Initial convergence happens when:
– A router boots
– RP failover
– clear ip bgp *
• How long initial convergence takes is a
factor of the amount of work to be done and
the router/network’s ability to do this fast and
efficiently

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 118
Convergence
Initial Startup

Initial convergence can be


stressful…if you are approaching
BGP scalability limits this is when
you will see issues.

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 119
Convergence
Initial Startup
What work needs to be done?
1) Accept routes from all peers
• Not too difficult

2) Calculate bestpaths
• This is easy

3) Install bestpaths in the RIB


• Also fairly easy

4) Advertise bestpaths to all peers


• This can be difficult and may take several minutes depending on the following variables…

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 120
Convergence
Key Variables
• BGP Variables
– The number of routes
– The number of peers
– The number of update-groups
– The ability to advertise routes to each update-group efficiently
• Router Variables
– CPU horsepower
– Code version
– Outbound Interface Bandwidth

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 121
Convergence
UPDATE Packing
• UPDATE contains a set of Attributes and a  “UPDATE Packing” refers to how efficiently
list of prefixes (NLRI) an implementation packs NLRIs into
• BGP starts an UPDATE by building an UPDATEs
attribute set – Least efficient: BGP only puts one NLRI
• BGP then packs as many destinations per UPDATE
(NLRIs) as it can into the UPDATE – Most efficient: BGP puts all NLRI with a
• Only NLRI with a matching attribute set can certain Attribute set in one UPDATE
be placed in the UPDATE
• NLRI are added to the UPDATE until it is full
(4096 bytes max)
Least Efficient MED 50 10.1.1.0/24 MED 50 10.1.2.0/24 MED 50 10.1.3.0/24
Origin IGP Origin IGP Origin IGP

Most Efficient MED 50 10.1.1.0/24


Origin IGP 10.1.2.0/24
10.1.3.0/24

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 122
Convergence
UPDATE Packing
• The fewer attribute sets you have the better
– More NLRI will share an attribute set
– Fewer UPDATEs to converge

• Things you can do to reduce attribute sets


– next-hop-self for all iBGP sessions
– Don’t accept/send communities you don’t need

• To see how many attribute sets you have


– show ip bgp summary
– 190844 network entries using 21565372 bytes of memory
302705 path entries using 15740660 bytes of memory
57469/31045 BGP path/bestpath attribute entries using 6206652 bytes of memory

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 123
Convergence
TCP MSS – Max Segment Size
TCP MSS (max segment size) is also a factor in convergence times. The larger the
MSS the fewer TCP packets it takes to transport the BGP updates. Fewer packets
means less overhead and faster convergence.

BGP UPDATE Attribute NLRI ..NLRIs.. NLRI ..NLRIs.. NLRI

Default MSS IP Header TCP Header Attribute NLRI ..NLRIs..

BGP UDPATE is split IP Header TCP Header NLRI ..NLRIs.. NLRI


into two TCP packets

Increased MSS IP Header TCP Header Attribute NLRI ..NLRIs.. NLRI ..NLRIs.. NLRI

The entire BGP update


can fit in one TCP packet

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 124
Convergence
TCP MSS – Max Segment Size
• MSS – Max Segment Size
– Limit on packet size for a TCP socket
– 536 bytes by default
• Path MTU Discovery
– Finds smallest MTU between R1 and R2
– Subtract 40 bytes for TCP/IP overhead
– Enabled by default for BGP (at least in recent releases)
– In older releases enable via global cmd “ip tcp path-mtu-discovery”
• To find the MSS
R1#sh ip bgp neighbors
BGP neighbor is 2.2.2.2, remote AS 3, external link
Datagrams (max data segment is 1460 bytes):

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 125
Convergence
Update Groups
• BGP must create updates based on the policies
towards each peer Less Efficient – Two peers in different
update-groups
• Peers with a common outbound policy are Attribute NLRI NLRI
members of the same update-group
– iBGP vs. eBGP Attribute NLRI NLRI
– Outbound route-map, prefix-lists, etc

• UPDATEs are generated for one member of an More Efficient – Two peers in
update-group and then replicated to the other the same update-group
members
Attribute NLRI NLRI

• Back in the old days, these “update-groups” had to


be created specifically, using “peer-groups”.
They’re still widely deployed…

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 126
Convergence
Dropping TCP Acks
• Primarily an issue on RRs (Route Reflectors) with RR
– One or two interfaces connecting to the core
– Hundreds of RRCs (Route Reflector Clients) BGP UPDATEs

• RR sends out tons of UPDATES to RRCs


• RRCs send TCP ACKs
TCP ACKs
• RR core facing interface(s) receive huge wave of TCP
ACKs
RRCs

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 127
Convergence
Dropping TCP Acks
• Interface input queue fills up…TCP ACKs are dropped 
– Each time a TCP packet is dropped, the session goes into slow start
– It takes a good deal of time for a TCP session to come out of slow start
• Increase the input queue
– hold-queue 1000 in
• If you still see drops increase to 4096

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 128
Convergence
• How do you know if BGP has converged?
• Watch the global table version
– Increases by 1 for every bestpath change
– In the lab: Table version stabilizes
– In the real world: Reaches your “normal” rate of change
• Watch peer InQ and OutQs
– Wait for all InQ and OutQs to be empty
– To list peers with non-empty queues
– show ip bgp summ | e 0 0
• Watch peer table versions
– show ip bgp summ
– If peer table version == global table version and InQ/OutQ empty, BGP has converged
that peer

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 129
Convergence
Initial Convergence Summary
• Initial convergence time is a factor of the amount of work that needs to be done
and the router/network’s ability to do this fast and efficiently
• Reduce the number of attributes sets in BGP
– Use next-hop-self, don’t send/accept communities you don’t need, etc.
• Reduce the number of unique outbound policies towards all peers
– Try to find a small set of common policies, rather than individualizing policies per peer
– The fewer update-groups the better
• MSS/PMTU
– Efficient packaging of BGP messages in TCP
• Stop TCP ACK drops
– Increase interface input queues on RRs

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 130
BGP Routing Convergence
IGP vs. BGP Convergence
• IGP (OSPF/ISIS) deals with hundreds routes
– Max a few thousands, but only a few hundreds are really important/relevant
• BGP is designed to carry millions of routes
– and a few large customers carry that amount of prefixes!

• We can tune IGPs to converge in << 1 second


• But how about BGP?

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public
BGP Control-Plane Convergence Components I:
Core failure
1. Core Link or node goes down
2. IGP notices failure, computes new paths to PE1/PE2
3. IGP notifies BGP that a path to a next-hop has changed
4. PE3 identifies affected paths, runs best path, 3/8, NH 1.1.1.1
NH <PE2>
path to PE2 no longer as good as the one to PE1

5. Updates RIB/FIB, traffic continues
PE1

1.1.1.1
3.0.0.0/8

1.1.1.5
3/8, NH <PE1>
PE3
NH
NH <PE2>
<PE2>
… PE2

3/8, NH 1.1.1.5
NH <PE1>
TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public …
BGP Control-Plane Convergence Components II:
Edge Node Failure
Edge Node (PE1) goes down
2. IGP notices failure, update RIB, remove path to PE1
3. IGP notifies BGP that path to PE1 is now longer valid
4. PE3 identifies affected paths, runs best path, 3/8, NH 1.1.1.1
NH <PE2>
removes paths via PE1

5. Updates RIB/FIB, traffic continues
PE1

1.1.1.1
3.0.0.0/8

1.1.1.5
3/8, NH <PE1>
PE3
NH <PE2>
… PE2

3/8, NH 1.1.1.5
NH <PE1>
TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public …
BGP Control-Plane Convergence Components III:
Edge Neighbour Failure (with next-hop-self)
1. Edge link on PE1 goes down
2. eBGP session goes down
3. PE1 identifies affected paths, runs best path
4. PE1 sends withdraws to other PEs 3/8, NH 1.1.1.1
NH <PE2>
5. PE2 & 3 run best path, update …
RIB/FIB, traffic continues
Withdraw: 3/8, NH <PE1> PE1

1.1.1.1
3.0.0.0/8

1.1.1.5
3/8, NH <PE1>
PE3
NH <PE2>
… PE2

3/8, NH 1.1.1.5
NH <PE1>
TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public …
BGP Control-Plane Convergence Components
• Failure Detection
• Reaction to Failure
• Failure Propagation

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public
Failure Detection (Edge)
• Problem: Detect an eBGP
neighbour failure
• Available Methods router bgp …
[no] bgp fast-external-fallover
– Fast External Fallover – monitors interface …
line protocol for directly connected ip bgp fast-external-fallover {permit|deny}
neighbours (default behaviour)

– Fast Session Deactivation (FSD), router bgp …


neighbor x.x.x.x fall-over
monitors routing table for
reachability of next-hop address
(eBGP multi-hop) router bgp …
timers bgp <hello> <hold>
neighbor <..> timers <hello> <hold>
– “Hello”-type protocols: BFD and
BGP Hello neighbor <..> fall-over bfd

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public
Failure Detection – Next-Hop Failure
• Goal: Detect next-hop failures (as carried in IGP)
• Methods:
– Next-hop Tracking, enabled by default
– BGP scanner (legacy, very slow reaction)

• Note: On most cases, we do not want to use iBGP hellos to detect iBGP
neighbor failures, and rely on next-hop reachability checks

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public
Update Propagation
• Once failed paths are identified and best-path has been run,
updates/withdrawals need to be sent to peers
• Goal: maximize TCP throughput and update packing/replication
• Design Guidance:
– Reduce minimum advertisement interval to zero
(already default in many recent releases)
neighbor x.x.x.x advertisement-interval 0

– Enable TCP PMTUD (default in recent software) and increase window-size

ip tcp path-mtu-discovery
ip tcp window-size 65535
– Use peer-groups/peer-templates

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public
Control vs. Data Plane Convergence
• Control Plane Convergence
– For the topology after the failure, the optimal path is known and installed in the
dataplane
– May be extremely long (depends on number of prefixes carried)
• Data Plane Convergence
– Once IGP convergence has detected the failure, the packets are rerouted onto a valid
path to the BGP destination
– While valid, this path may not be the most optimum one from a control plane
convergence viewpoint
– We want this behaviour, in a prefix-independent way, no matter if BGP carries 1000 or
1,000,000 prefixes!

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public
BGP Prefix Independent P1
PE1

Convergence (BGP PIC) PE3 PE2

P2
BGP Net
110.0.0.0/24
IGP pathlist
BGP Net
110.1.0.0/24 PE1 via P1 Gig1, dmac=x
BGP pathlist PE1 via P2

… PE1
PE2 IGP pathlist
Gig2, dmac=y

BGP nexthop(s) PE2 via P2


BGP Net
110.5.0.0/24
IGP nexthop(s) Output Interface

• Pointer Indirection between BGP and IGP entries allow for immediate update of the multipath BGP pathlist at
IGP convergence

• Only the parts of FIB actually affected by a change needs to be touched

• Used in newer IOS and IOS-XR (all platforms), enables Prefix Independent Convergence
TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public
P1
Characterization PE1

BGP PIC Core


PE3 PE2

BGP Net P2
110.0.0.0/24
IGP pathlist
BGP Net Gig1,
110.1.0.0/24 PE1 via P1 dmac=x
BGP pathlist PE1 via P2

… PE1
PE2 IGP pathlist
Gig2,
dmac=y
Core
BGP nexthop(s) PE2 via P2
BGP Net
110.5.0.0/24 100000
IGP nexthop(s) Output Interface

10000
PIC

LoC (ms)
1000
no PIC
• As soon as IGP converges, the IGP pathlist memory is 100

updated, and hence all children BGP path-lists 10


leverage the new path immediately
1

12 0

15 0

17 0
20 0

22 0

25 0

27 0

30 0

32 0

35 0
00
• Optimum convergence, Optimum Load-Balancing,

10 0
1

0
0

0
00

00

00
00

50

00

50
00

50

00

50

00

50

00
25

50

75
Excellent Robustness Prefix

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public
P1
Characterization PE1

BGP PIC Edge


PE3 PE2

BGP Net P2
110.0.0.0/24
IGP pathlist
BGP Net Gig1,
110.1.0.0/24 PE1 via P1 dmac=x
BGP pathlist PE1 via P2

… PE1
PE2 IGP pathlist
Gig2,
dmac=y

BGP nexthop(s) PE2 via P2


BGP Net
110.5.0.0/24
IGP nexthop(s) Output Interface

msec
1000000

• At IGP Convergence time, the complete IGP pathlist to 100000

PE1 is deleted 10000


250k PIC
250k no PIC
500k PIC

• FIB updates the affected BGP path lists, traffic


1000
500k no PIC

converges to the alternate next-hop PE2 100

10

50000

100000

150000

200000

250000

300000

350000

400000

450000

500000
0
TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public Prefix
BGP Convergence Summary
• BGP Control Plane convergence can achieve fast convergence for small-ish
deployments only (a few 1000s of prefixes)
• Achieving fast convergence for large deployments (including full Internet
routes) requires BGP PIC
• BGP Convergence always require the underlying IGP (OSPF, ISIS, EIGRP) to
be tuned for fast convergence

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 145
High Availability
What is Routing High Availability?
Routing HA
• Set of technologies & features to
enable traffic to continue to flow
through a device during a fault
• Routing HA maintains the
logical network topology while
the faulty device recovers
• Routing HA helps to address
failures within the control plane
of a routing device
• Routing HA increases the
resiliency of a single system

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public
Behaviour without Non-Stop Forwarding (NSF)
• Router A loses its control plane for
some period of time
Control Data A
• It will take some time for Router B to
recognize this failure, and react to it

Control Data B

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public
Behaviour without Non-Stop Forwarding (NSF)
• During the time that A has failed, and B Reset
has not detected the failure, B will
continue forwarding traffic through A Control Data A
• Once the control plane resets, the data
plane will reset as well, and this traffic
will be dropped
• NSF reduces or eliminates the traffic
dropped while A’s control plane is down

Control Data B

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public
Graceful-Restart/NSF Fundamentals
• If A is NSF capable, the control plane No reset
will not reset the data plane when it
restart Control Data A
• Instead, the forwarding information in
the data plane is marked as stale
• Any traffic B sends to A will still be
switched based on the last known
forwarding information
• This is the Non-Stop Forwarding
Control Data B
behaviour

Mark forwarding
information as stale
TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public
GR/NSF Fundamentals
• While A’s control plane is down, the
routing protocol hold timer on B
Control Data A
counts down....
• A has to come back up and signal B
before B’s hold timer expires, or B will
route around it
• When A comes back up, it signals B
that it is still forwarding traffic, and
would like to resync Control Data B
• This is the first step in Graceful Restart
(GR) Hold Timer: 15
6
7
8
9
10
11
12
13
14

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public
GR/NSF Fundamentals
• The second GR phase deals with neighbors
updating the restarting router’s routing table
Control Data A
• This involves new protocol mechanisms
• BGP Protocol extensions detailed in RFC 4724

I’m restarting

send routes
Ok, fine, I’ll
• BGP GR has to be explicitly enabled on both
ends of the connection
router bgp …
bgp graceful-restart
....
Control Data B
• But neighbours could be outside my
administrative domain? Like other SPs or
customers?
TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public
Non-stop Routing – NSR
• Idea: Why not sync all routing
protocol state to the standby RP
(or standby process)?
• Restarting RP could pick up right
where the primary left off
• No need to refresh any
information, no need for the
neighbour to know that anything
happened
• Easy idea – challenging Routing
implementation  Forwarding No Link Flap Adjacency
– Now we absolutely need to avoid Continues Maintained to
anything to let the neighbour know Neighbours

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public
Non-stop Routing – NSR
• BGP RIB, TCP Session state is
synced to standby RP/process
• Router can restart without
neighbour’s interaction
• No upgrade/changes on peers
required
router bgp …
bgp graceful-restart
address-family ipv4 vrf ..
neighbor x.x.x.x ha-mode sso Routing
.... Forwarding No Link Flap Adjacency
Continues Maintained to
Neighbours

# show ip bgp vpnv4 all sso summary


# show tcp ha connections
TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public
“Show and Tell”
Loop0 Loop0
10.1.1.1 10.3.3.3
Fa 2/0 Fa 2/0
R1 R3
.1 .3
10.1.3.0 10.3.5.0
Fa 1/0
AS 3
.1
AS 5
AS 1 10.1.2.0
Internet

Loop0
Fa 1/0
.2 AS4 100.100.100.1

10.4.5.0
Fa 2/0 Fa 2/0
R2 R4
.2 .4
10.2.4.0
Loop0 Loop0
10.2.2.2 10.4.4.4

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 156
Call to Action
• Visit the World of Solutions for
– Cisco Campus
– Walk in Labs
– Technical Solution Clinics
• Meet the Engineer
• Lunch time Table Topics
• DevNet zone related labs and sessions
• Recommended Reading: for reading material and further resources for this
session, please visit www.pearson-books.com/CLMilan2015

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 157
Complete Your Online Session Evaluation
• Please complete your online session
evaluations after each session.
Complete 4 session evaluations
& the Overall Conference Evaluation
(available from Thursday)
to receive your Cisco Live T-shirt.

• All surveys can be completed via


the Cisco Live Mobile App or the
Communication Stations

TECRST-2310 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public 158

You might also like