Professional Documents
Culture Documents
Interdomain Routing
Interdomain Routing
Interdomain Routing
ROUTING
Lubak M. rosert2016@gmail.com
2022
Network Layer, Control Plane
2
Function:
Data Plane Set up routes between networks
Application Key challenges:
Implementing provider policies
Presentation
Creating stable paths
Session
Transport
Network RIP OSPF BGP Control Plane
Data Link
Physical
3 Outline
BGP Basics
Stable Paths Problem
BGP in the Real World
Debugging BGP Path Problems
ASs, Revisited
4
AS-1
AS-3
Interior
Routers
AS-2
BGP
Routers
AS Numbers
5
Provider
Peer 2 has no incentive to
Peers do not pay
route 1 3
each other
$
Customer
Peer 1 Provider
Peer 2 Customer
Peer 3
Customer pays
provider
Customer
Tier-1 ISP Peering
9
Inteliquent Centurylink
Verizon
AT&
Business
T
So you want to be a tier 1 network?
All you have to do is get all the other tier 1s to peer with you!
Level 3 (not that easy ) Sprint
XO Communications
Peering Wars
11
Improve
Peering end-to-end customers
struggles in the ISP world are extremely contentious
performance agreements are usually confidential
Peers are often
May be the only way to competitors
Example:
connect toIfparts
you are
of athe
customer ofmy peer why
Peering should I peer
agreements
with you? You should pay me too!
Internet require periodic
Incentive to keep relationships private!
renegotiation
Two Types of BGP Neighbors
12
Exterior
routers also
IGP speak IGP
eBGP eBGP
iBGP iBGP
Full iBGP Meshes
13
120.10.0.0/16: AS 2 AS 3 AS 4
AS 1 130.10.0.0/16: AS 2 AS 3
110.10.0.0/16: AS 2 AS 5
BGP Operations (Simplified)
15
Establish
session on TCP
port 179
AS-1
Exchange
active routes
n
io
ss
Se
AS-2
Exchange
P
BG
incremental
updates
Four Types of BGP Messages
16
Shortest AS Path
Lowest MED Traffic engineering
Lowest IGP Cost to BGP Egress
4 hops 9 hops
Source
4 ASs 2 ASs
?
?
Destination
Hot Potato Routing
20
Destination
Importing Routes
21
From ISP
Provider
Routes
From From
Peer Peer
From Customer
Exporting Routes
22
To To
Peer Peer
To Customer
Customers get
all routes
Modeling BGP
23
AS relationships
Customer/provider
Peer
Sibling, IXP
Gao-Rexford model
AS prefers to use customer path, then peer, then provider
Follow the money!
Valley-free routing
Hierarchical view of routing (incorrect but frequently used)
P-P P-P
C-P P-C
P-C
P-P
AS Relationships: It’s Complicated
24
AS_SET
Instead of a single AS appearing at a slot, it’s a set of Ases
Communities
Arbitrary number that is used by neighbors for routing decisions
Export this route only in Europe
Do not export to your peers
BGP Basics
Stable Paths Problem
BGP in the Real World
Debugging BGP Path Problems
27
What Problem is BGP Solving?
27
1 3
30
130
10
A Solution to the SPP
29
10
130 20
210
1 2 2
130
10 210
20
1 2 2
• Not every node gets preferred route
• Topology is still stable
0
• Only one stable configuration
• No matter which node chooses first!
3 4
30 430
420
SPP May Have Multiple Solutions
32
0 0 0
2 2 2
210 210 210
20 20 20
Bad Gadget
33
130
10 210
20
• That was only
1 one round of oscillation!
2 2
• This keeps going, infinitely
• Problem stems from:0
• Local (not global) decisions
• Ability of one
3 node to improve4its path selection
3420 420
30 430
SPP Explains BGP Divergence
34
Naughty Gadgets
BGP is Precarious
36
5 6 2
5310 6310
563120 643120 210
53120 63120 20
Can BGP Be Fixed?
37
Possible Solutions
Extend BGP to
detect and suppress
Automated Analysis policy-based oscillations?
of Routing Policies
Inter-AS
(This is very hard) coordination
BGP Basics
Stable Paths Problem
BGP in the Real World
Debugging BGP Path Problems
Motivation
39
AS 2117
How Many Announcements Does it Take
For an AS to Withdraw a Route?
50
Answer: up to 19
BGP Routing Table Convergence Times
100
90
80
Cumulative Percentage of Events
r
ove
r
ve
70
O
Sho oute
ail-
il-
60 Tup
Fa
rt F Tshort
R
50
g
Tlong
on
Lon New
40 Tdow n
>L
g->
rt-
30
o
20
Sh
e
ur
il
10
Fa
0
0 20 40 60 80 100 120 140 160
Se conds Until Converge nce
$
$ ISP 1
ISP 1
(peer)
Level 3
Stub
stub
(customer)
Level 3
(peer) $
Verizon
ISP 2
Wireless
$ ISP 2
(provider)
$
22394
(also VZW)
BGP: The Internet’s Routing Protocol (2)
$ ISP 1 ISP
Level 3 ISP
stub Loses $
X ISP 2
Verizon
Wireless
$
22394
$ ISP 1
66.174.161.0/24
VZW, 22394
66.174.161.0/24
stub Level 3
Verizon
$ ISP 2 Wireless
$ Export Policy:
$ Provider 1. Export customer path
to all neighbors.
2. Export peer/provider path
Customer to all customers.
Standard model of Internet routing
61
$ Export Policy:
$ Provider 1. Export customer path
to all neighbors.
2. Export peer/provider path
Customer to all customers.
More complex routing example
Cogent, Georgia Tech
Local ISP, AOL, Cogent, Georgia
Paths chosen based 130.207.0.0/16 U. Toronto
Techon business relationships and
length. 130.207.0.0/16
Georgia
Tech
BGP Basics
Stable Paths Problem
BGP in the Real World
Debugging BGP Path Problems
Control plane vs. Data Plane
64
Control:
Make sure that if there’s a path available, data is forwarded over
it
BGP sets up such paths at the AS-level
Data:
For a destination, send packet to most-preferred next hop
Routers forward data along IP paths
Visibility
IP provides no built-in monitoring
Economic disincentives to share information publicly
Control
Routing protocols optimize for policy, not reliability
Outage affecting your traffic may be caused by distant network
Challenges
What should we monitor?
What do we do with additional visibility?
How to use additional control?
A Practical Approach
We can do this already in today’s Internet
Crowdsourcing monitoring
Use existing protocols/systems in unintended ways
Lack of control
Reverse paths determined by possibly distant ASes
Limited means to affect such paths
Goals and Approach
Improve availability through:
Failure isolation and remediation
Identifying the AS(es) responsible for path changes
Key techniques:
Visibility
Active measurements from distributed vantage points
Passive collection of BGP feeds
Control
On-demand BGP prepending to route around outages
Active BGP measurements to identify alternative paths
LIFEGUARD: Locating Internet Failures Effectively and
Generating Usable Alternate Routes Dynamically
72
Building blocks for failure isolation
LIFEGUARD can use:
Ping to test reachability
Traceroute to measure forward path
Distributed vantage points (VPs)
PlanetLab for our experiments
Some can source spoof
Reverse traceroute to measure reverse path (NSDI ’10)
Atlas of historical forward/reverse paths between VPs and
targets
73
How does LIFEGUARD locate a failure?
Before outage:
Historical Current
Ping? Ping!
Fr:VP To:VP
Historical Current
NTT:Ping
?
Fr:GMU
GMU:Ping
Historical Current !
Fr:NTT
Forward path works
How does LIFEGUARD locate a failure?
During outage:
Rostele:
Ping?
Fr:GM
U
Historical Current
78
Our Approach and Outline
LIFEGUARD: Locating Internet Failures Effectively and
Generating Usable Alternate Routes Dynamically
Locate the ISP / link causing the problem
79
Our Goal for Failure Avoidance
Enable content / service providers to repair
persistent routing problems affecting them,
regardless of which ISP is causing them
Setting
Assume we can locate problem
Assume we are multi-homed / have multiple data centers
Assume we speak BGP
82
Ideal Self-Repair of Reverse Paths
AVOID(L3,WS)
AVOID(L3,WS)
AVOID(L3,WS)
Do paths exist that AVOID problem?
LIFEGUARD repairs outages by instructing others to avoid
particular routes.
84
Practical Self-Repair of Reverse Paths
UW → L3 → ATT → WS
L3 → ATT → WS
ATT → WS
Sprint → Qwest → WS WS
UW → Sprint
L3 → ATT
→ Qwest
→ WS→ WS → L3→ WS
?
L3 → ATT → WS
ATT → WS → L3→ WS
SprintSprint
→ Qwest
→ Qwest
→ WS→→WS
L3→ WS WS → L3→ WS
AVOID(L3,WS)
AISP → Qwest
AISP→→WS → L3→
Qwest WS
→ WS Qwest → WS → L3→ WS
LIFEGUARD’s scalability
Overhead and speed of failure location
Router update load if many ISPs deploy our approach
Alternatives to poisoning
Compatibility with secure routing (BGPSEC, etc.)
Comparing to other route control mechanisms
Can poisoning approximate AVOID effects?
89
What if some routes in an ISP still work?
92
What if some routes in an ISP still work?
10
1
Naive Poisoning Causes Transient
Loss
Some ISPs may have
working paths that
avoid problem ISP X
Naively, poisoning
causes path exploration
even for these ISPs
Path exploration causes
transient loss AVOID(X,P)
10
2
Naive Poisoning Causes Transient
Loss
Some ISPs may have
working paths that
avoid problem ISP X
Naively, poisoning
causes path exploration
even for these ISPs
Path exploration causes
transient loss AVOID(X,P)
10
3
Naive Poisoning Causes Transient
Loss
Some ISPs may have
working paths that
avoid problem ISP X
Naively, poisoning
causes path exploration
even for these ISPs
Path exploration causes
transient loss AVOID(X,P)
Naive Poisoning Causes Transient
Loss
Some ISPs may have
working paths that
avoid problem ISP X
Naively, poisoning
causes path exploration
even for these ISPs
Path exploration causes
transient loss AVOID(X,P)
10
5
Naive Poisoning Causes Transient
Loss
Some ISPs may have
working paths that
avoid problem ISP X
Naively, poisoning
causes path exploration
even for these ISPs
Path exploration causes
transient loss AVOID(X,P)
10
6
Naive Poisoning Causes Transient
Loss
Some ISPs may have
working paths that
avoid problem ISP X
Naively, poisoning
causes path exploration
even for these ISPs
Path exploration causes
transient loss AVOID(X,P)
Naive Poisoning Causes Transient
Loss
Some ISPs may have
working paths that
avoid problem ISP X
Naively, poisoning
causes path exploration
even for these ISPs
Path exploration causes
transient loss AVOID(X,P)
10
8
Prepend to Reduce Path Exploration
Most routing decisions
based on:
(1) next hop ISP
(2) path length
Keep these fixed to
speed convergence
Prepending prepares
ISPs for later poison AVOID(X,P)
10
9
Prepend to Reduce Path Exploration
Most routing decisions
based on:
(1) next hop ISP
(2) path length
Keep these fixed to
speed convergence
Prepending prepares
ISPs for later poison AVOID(X,P)
11
0
Prepend to Reduce Path Exploration
Most routing decisions
based on:
(1) next hop ISP
(2) path length
Keep these fixed to
speed convergence
Prepending prepares
ISPs for later poison AVOID(X,P)
11
1
Prepend to Reduce Path Exploration
Most routing decisions
based on:
(1) next hop ISP
(2) path length
Keep these fixed to
speed convergence
Prepending prepares
ISPs for later poison AVOID(X,P)
11
2
Prepend to Reduce Path Exploration
Most routing decisions
based on:
(1) next hop ISP
(2) path length
Keep these fixed to
speed convergence
Prepending prepares
ISPs for later poison AVOID(X,P)
11
3
Prepending Speeds Convergence
Security
BGPSEC, RPKI
Misconfigurations, inflexible policy
SDN
Policy Interactions
PoiRoot (root cause analysis)
Convergence
Consensus Routing
Inconsistent behavior
LIFEGUARD, among others
Why are these still issues?
118
Backward compatibility
Buy-in / incentives for operators
Stubbornness