Professional Documents
Culture Documents
Wong, T., Jacobson, V., Alaettinoglu, C. (2004) Making Sense of BGP
Wong, T., Jacobson, V., Alaettinoglu, C. (2004) Making Sense of BGP
NANOG 30
Miami, FL
February 9, 2004
Overview
Both algorithms are driven by raw BGP event streams. They have no built-in
models, statistical or otherwise. Their goal is to show the world as the
routers see it.
Static picture
Visualization
Algorithm Threshold and Merge Prefixes
Animation
(TAMP)
BGP
Event
Stream
Filtered events Stemming
Analysis
Algorithm
One−line diagnosis
For a particular BGP peer & time, that peer’s routes form a tree rooted at
the peer.
Construct this subtree for each peer and/or time then merge trees by
combining common prefixes.
128.32.0.70 7%
5% 701
UC Berkeley 83% 21% alternet
83% 78% 11423 209
8/30 22:34 128.32.1.3 128.32.0.66 7%
126177 prefixes
calrenN 6%
qwest
11% 7018
100% 128.32.1.200 128.32.0.90 at&t
11537
abilene
1239
sprint
4732
226 dion
los_nettos
UC Berkeley
32% 9824
12/15 18:35 100% 100% 100% 2152 9% athomej
Comm 2152:65297
128.32.1.222 137.164.23.29 cenic_dc 68% 6%
1002 prefixes 2516 7479
4%
kddi 4%
kdd_hk
4% 7679
qtnet
? 10010
tokai
Static picture
Visualization
Algorithm Threshold and Merge Prefixes
Animation
(TAMP)
BGP
Event
Stream
Filtered events Stemming
Analysis
Algorithm
One−line diagnosis
A million events does not mean a million different things happened. BGP
cannot say the actual incident, e.g. peering down or flaky router or prefix
loss. It can only tell you indirectly via prefix withdrawals & announcements.
Packet Design Inc. NANOG30–Making Sense of BGP 8
Extracting correlation with Stemming analysis
ATT
UCB CalrenN QWest UUNet
11423 209
Sprint
Because a branch of the original routing tree is cut. The common portion
tells you where the cut is, which is at the last edge. E.g. CalrenN-QWest.
ATT
UCB CalrenN QWest UUNet
11423 209
Sprint
ATT
UCB CalrenN QWest UUNet
Sprint
Abilene
Must also distinguish among multiple simultaneous failures: need to find all
of them and pull them apart. E.g. failure on CalrenN-QWest vs failure on
CalrenN-Abilene.
Packet Design Inc. NANOG30–Making Sense of BGP 11
Stemming challenges (cont’d)
where is number of possible path stems.
Many serious problems often do not result in distinctive event spikes. For
example, persistent route oscillations. Can Stemming help?
Persistent route flapping once a minute looks like BGP noise here (yellow
box). This lasted for at least 1.5 months, and resulted in a very unhappy
customer.
Packet Design Inc. NANOG30–Making Sense of BGP 16
Persistent customer route flapping at Tier1 ISP
http://www.packetdesign.com/technology/presentations/nanog-30/index.htm