Nokia Border Gateway Protocol SG v3.1.1
This course is part of the Nokia Service Routing Certification (SRC) Program. See www.networks.nokia.com/src for
more information on the SRC program.
To locate additional information relating to the topics presented in this manual, refer to the following:
Technical Practices for the specific product
Internet Standards documentation such as protocol standards bodies, RFCs, and IETF drafts
Technical support pages of the Nokia website located at: http://www.networks.nokia.com/support
An Internet exchange point (IX or IXP) is a physical infrastructure through which Internet service providers (ISPs)
exchange internet traffic between their networks.
Internet exchanges form key interconnection points that allow ISPs to form relationships with each other in a
neutral, low cost setting. Internet exchanges serve, not just ISPs, but also several different types of content
providers that want to get their services closer to end users.
Tier 2 providers serve large regional areas of a country or continent but may not have as extensive a global reach
as Tier 1 providers and probably pay Tier 1 providers to transit their networks.
Although, geographically speaking, Tier 1 networks span very large portions of the Internet and peer at every
major exchange point, many Tier 2’s are actually larger in terms of the number of nodes served and number of
customers served. Tier 2’s are also typically closer to the end system, or customer, or content provider. For
example, the top AS’s, in terms of number of prefixes originated, are mostly not Tier 1’s.
Increasingly, in terms of traffic volumes, Internet traffic originates from very large web hosting sites, which
consolidate traffic from thousands of individual Enterprises to much fewer shared facilities. Most Internet traffic
now originates directly from content providers, such as Google and Yahoo, and content delivery networks, such as
Akamai and LimeLight.
The descriptive terms Downstream and Upstream are references to where a specific customer, network, person,
or node sits, in relation to the overall Internet architecture.
Downstream indicates that, in that direction, network devices are closer to the edge of the Internet, where access
networks actually connect individuals, homes, and enterprise to the Internet. There are, of course, exceptions to
this; many Enterprises and Content Providers will actually locate their routers in carrier hotels and colo facilities to
be as close as possible to as many ISPs, and therefore their customers, as possible.
Upstream indicates that, in that direction, network devices are moving closer to the core of the Internet. If you are
a client of a Tier 2 network, upstream would be towards the Tier 1 networks that your Tier 2 provider either peers
with or buys transit services from.
Internet Exchanges are physical locations that bring together ISPs, Publishers, Hosting sites, Social Networking,
Government, and other types of BGP speakers that will peer with one another for mutual benefit. Exchanges were
once the domain of ISPs alone, but as more content providers began sending more data, it became increasingly
beneficial for ISPs to also peer with other types of organizations on the Internet. There are now hundreds of
exchanges worldwide that offer peering services at the local, regional, and continental levels.
A key consideration is that, architecturally speaking, the Internet does not impose any set architecture or set of
required devices to access it, apart from the standards published by the IETF and other Standards Development
Organizations (SDOs), such as the ITU and IEEE. The aforementioned ICANN, its member organizations, IETF,
exchange points, and ISPs will, however, insist that you follow some key best practices. Addressing and naming are
two of the most important of these, and the idea of peering is another major consideration.
If fees or tariffs are charged, the relationship between the two AS’s changes; it becomes a transit relationship,
where one AS charges the other to transit traffic across its backbone. This is typical for Tier 1 providers, which
charge lower tier providers for access to their global backbone.
Peering allows two networks to form a BGP neighbor relationship to their mutual benefit (saving money). For this to
succeed, some, or all, of the following policies may need to be met, depending on the size of the AS that you
are attempting to peer with (this list is not exhaustive but illustrates typical types of policy agreements used on the
Internet):
1. Cost to peer (BGP session) is zero; cost to operate peer network is non-zero
Access to the exchange point is the responsibility of the AS itself and the cost is not zero. Other operational
costs are typical for peering arrangements too. Peers will insist on 24/7 tech support capability.
3. Traffic Policies
These allow ISPs to push traffic from valuable sources, which could be publishers, government organizations,
and so on. First, a peer may insist on a certain sized backbone, in terms of the minimum bandwidth of core
links, so that there is sufficient capacity for the peering arrangement to work. Pushing traffic means that the AS
has something (apart from prefixes) to offer other AS’s that their customers will value. If the ratio between pull
and push becomes too high, it can mean that, as a peer, the AS is not pushing enough for the arrangement to
remain mutually beneficial. The sending peer is acting more as a transit provider rather than a peer.
In summary, peering is an extremely efficient way for similar sized/scoped networks to exchange prefixes and to
create value for their customers by getting closer to the sources of content on the Internet.
With a transit service (paying for a BGP neighbor in an upstream network), the conditions are quite simple:
The Transit provider will provide a layer 1 or 2 circuit, or a mutually agreed upon exchange point location.
The connection will be a certain size - GigE, TenGigE, and so on.
The Transit provider will provide up to a full Internet table to the transit customer’s AS using BGP, thereby
giving the customer full access to any network on the Internet.
The Transit provider will advertise the client’s prefixes (and those of its customers, including transit AS’s) to
the rest of the Internet.
The customer will follow the Transit provider’s Acceptable Use policy.
Optionally, the transit provider will provide the means for a transit customer to influence incoming and
outgoing BGP path selections (the labs in this course show examples of how to do this).
The customer is free to pull or push as much traffic as it can (extra charges may apply). Typically, the transit
customer will pull more traffic than it sends.
The ICANN/IANA governs the global allocation of all possible address space used in the Internet to the 5 regional
registries. Each Regional Internet Registry (RIR) assigns chunks of address space to end users (mostly ISP’s), based
on their specific regional policies. The five regional registries are African Network Information Center (AfriNIC),
Asia Pacific Network Information Centre (APNIC), American Registry for Internet Numbers (ARIN), Latin America
and Caribbean Network Information Centre (LACNIC), and Réseaux IP Européens Network Coordination
Centre (RIPE NCC).
ICANN also manages the top level domain for generic domains (gTLD) and country code domains (ccTLD), and now
enables International domain names (IDN) that allow for URLs to use international character sets (for example,
Chinese or Arabic).
The IANA function within ICANN tracks and manages all IP protocol related numbers, including AS numbers, BGP
well known communities, PDU’s, Wireless, MPLS and related protocol numbers. The IANA website is a useful
resource to help you to determine whether a protocol works properly, or that certain protocol parameters have
been set correctly.
Apart from the two major governance aspects of name space and address space, the rest of the technologies,
implementation, and methods of access are the responsibility of individual Autonomous Systems, which,
interconnected, form the Internet backbone.
BGPv4, defined in RFC 4271, provides reachability information to external networks (those outside the AS) by
enabling the exchange of routing information between AS’s to allow data flow between them.
Once an exchange has been enabled, the application of administrative policy onto the traffic flows becomes of
equal or greater concern. Policy implementation is a key strength of BGP as it allows the administration to
manipulate traffic, based on virtually any policy.
BGP also has proven scalability. Most implementations, including the Nokia Service Router (SR) product line, scale
to millions of routes and hold multiple Internet tables (each at least 300K routes in size). Therefore, BGP is the
fundamental building block of the Internet and is used by every ISP in the world for ISP interoperability.
BGP is the most feature-rich and scalable routing protocol in use today. It supports the current requirements of
the Internet, and with extended capabilities, such as multiple protocol families and extended AS numbers, is well-
positioned for the future.
1. Take the ASPLAIN number and divide by 65536 (2 to the power of 16). The whole number result is the
high order number.
For example, for AS 135000: 135000 / 65536 = 2.0599365…, so the ASDOT value is 2.x (notice this is from
APNIC space).
2. Multiply the whole number from step 1 by 65536 and subtract this from the original ASPLAIN number.
The result is the low order number.
For example, 65536 x 2 = 131072, and 135000 - 131072 = 3928, so AS 135000 is 2.3928 in ASDOT.
Alternatively, you can arrive at this number by calculating 135000 mod 65536 = 3928 (modulo).
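The two conversion steps above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function names are hypothetical and are not part of any router CLI or library.

```python
def asplain_to_asdot(asn: int) -> str:
    """Convert an ASPLAIN AS number to ASDOT notation (illustrative helper)."""
    if not 0 <= asn <= 0xFFFFFFFF:
        raise ValueError("AS number must fit in 32 bits")
    # Step 1 gives the high order number, step 2 (the modulo) the low order number.
    high, low = divmod(asn, 65536)
    # ASDOT leaves 16-bit AS numbers in their plain decimal form.
    return str(asn) if high == 0 else f"{high}.{low}"


def asdot_to_asplain(text: str) -> int:
    """Convert ASDOT notation back to ASPLAIN (illustrative helper)."""
    if "." not in text:
        return int(text)
    high, low = (int(part) for part in text.split("."))
    return high * 65536 + low


print(asplain_to_asdot(135000))      # 2.3928
print(asdot_to_asplain("2.3928"))    # 135000
```

A 16-bit AS number such as 65000 passes through unchanged, which matches the ASDOT convention of only using the dotted form for values above 65535.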
Almost no implementations support ASDOT+, where the 16-bit ASN is converted to 0.<16bit ASN>; most
existing peering policies would be completely broken by doing this.
• Recall that, as an Open entity, everything you do on the Internet can be inspected and studied by the rest of
the Internet community, so govern yourself accordingly. For example, do not use the public or private AS’s
shown in example configurations in your own networks. Mistakes can, and do, happen, but remember that
they happen publicly.
• Use of Private or Documentation AS ranges does not protect you from the consequences of bad or
unwanted routing policy in your organization’s networks. Typically, most BGP implementations (SR OS
included) will not prevent an AS from being advertised by default.
• There have been many security incidents involving the Internet's core routing protocol, the Border Gateway
Protocol. Some of these incidents were attacks; others were accidental misconfigurations. But all of them
disrupted traffic to Web sites or entire networks because of incorrect routing messages being propagated
across the Internet through BGP.
• Among the biggest security incidents were: Pakistan Telecom blocking YouTube in February 2008, a Malaysian
ISP blocking Yahoo in 2004, a Turkish ISP taking over the Internet in 2004, and a Brazilian carrier leaking its
BGP table in 2008.
Multi-Homed AS:
• Has multiple connections to 1 or more AS’s.
• Should use its own AS number and IP addressing.
• May also employ default routing.
• More complex policy is required.
• Usually medium-to-large enterprises or ISPs.
• A common requirement to run BGP with an ISP is that the AS be multi-homed.
• Can be either transit or non-transit.
Interior routing policies and protocols must be established within each AS, enabling it to route packets internally.
A Stub AS can usually have a default route to its parent. A Multi-homed AS may use either default-less routing, or
set up a default route to one of its neighbor AS's, but this will probably result in poor quality routing.
AS X is not providing transit services for any other AS’s. Recall that the definition of transit is when an AS
transports networks and traffic that are associated with another AS for a fee. In this case, the broadband and
enterprise services are not providing transit for an actual exterior AS. To the rest of the Internet, all of the
networks associated with those services have been originated by AS X, regardless of which entity those
networks serve.
AS X’s upstream transit providers (Tier 1 ISPs A and B) are providing transit services. They will advertise AS X’s
networks to the rest of the Internet. As a result, other AS’s in the Internet will send traffic to AS X via Tier 1
ISP’s A and B. Also, since both of the Tier 1’s are transit providers, the rest of the Internet still views AS X’s
networks as correctly originated from AS X (and not either ISP A or B).
It is typical for transit providers like AS X or AS Y to protect themselves, and the rest of the Internet, by limiting the
actual prefixes or networks that either stub AS can announce upstream. For example, what would happen if one of
these Enterprises announced a full Internet table to either AS X or AS Y? If you were running either AS X or AS Y,
would you want to send traffic for the rest of the Internet via either Enterprise? If you were running AS X or AS Y,
what would happen if an Enterprise sent you a prefix that they did not own or have any right to? Over the past 20
years, there have been several major outages caused on the Internet when an ISP announced prefixes that did not
belong to them, or that they had no right to originate or transit. When this happens, the Enterprise or ISP causing
the situation pulls or draws all of the traffic associated with that network towards it and away from the rightful
destination - in effect, it creates a blackhole for some traffic.
Neither AS X nor AS Y will want the other AS to announce their routes upstream. This would cause traffic to
transit the other AS from the upstream and would break the peering agreement. For example, AS X should not
announce AS Y’s prefixes (that it received via the peering session) to either of its upstream transit providers ISP A
or ISP B. Traffic should not flow from ISP A to AS X to AS Y, nor should traffic flow from AS Y to AS X to ISP A in
the other direction. In the latter case, AS X would have to be advertising prefixes from its upstream to AS Y.
Neither AS would want that as part of a typical peering agreement.
A session between 2 devices in different AS’s is referred to as an eBGP session. It is typical for devices that have
an eBGP session between them to be directly connected, to share a common data link, but it is not mandatory.
Because the devices are in different AS’s, the administration of each device is typically handled separately. Care
must be taken to ensure that the configuration parameters match, so that the peering will succeed.
A session between 2 devices in the same AS is referred to as an iBGP session. It is possible for devices that have
an iBGP session between them to not be directly connected. Because the devices are in the same AS, the
administration of each device is typically handled by the same organization. Care must still be taken, however, to
ensure that the configuration parameters match, so that the peering will succeed. With the devices locally
controlled, this is often an easier task.
To construct an AS with BGP, an ISP will typically implement a full mesh of BGP neighbors inside of the AS. This
will accomplish all of the benefits of an iBGP mesh, as illustrated on the slide above, and ensure smooth
operation. A full mesh will require N*(N-1)/2 sessions established in the AS. If you look closely above, ISP X will
have to run 15 separate sessions across its backbone.
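The N*(N-1)/2 growth can be checked with a short calculation. The sketch below assumes ISP X has 6 BGP routers, which is the value of N that yields the 15 sessions mentioned above; the function name is hypothetical.

```python
def full_mesh_sessions(routers: int) -> int:
    """Number of iBGP sessions needed for a full mesh of `routers` speakers."""
    if routers < 0:
        raise ValueError("router count cannot be negative")
    # Each of the N routers peers with the other N-1; every session is shared by two routers.
    return routers * (routers - 1) // 2


for n in (4, 6, 10, 50):
    print(n, "routers ->", full_mesh_sessions(n), "sessions")
```

The session count grows roughly with the square of the number of routers, which is why a full mesh becomes harder to operate as the AS grows.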
BGP carries reachability information for prefixes that need to traverse the AS, and the IGP decides how to switch
or route the packets across the backbone between BGP speakers. Theoretically, an AS can choose to not
advertise any prefixes associated with its internal network and still be a fully functioning AS. The global BGP table
would successfully show the external BGP routes that the AS chooses to advertise, but a ping or traceroute
directly to or across the internal backbone would be unsuccessful because the Internet would have no way of
routing to it.
The definition of internal versus external routes is interesting and significant when describing BGP operations.
An internal network is associated with the actual addressing and related subnets used to provide system,
loopback, and link addressing, for the backbone of the AS itself. By that internal definition, an external network
is, therefore, defined as any networks that are brought into the AS via BGP, even those associated with the AS’s
own prefixes assigned by their RIR. For example, from a BGP point of view, prefixes associated with a broadband
service would be considered external networks (in SR OS, BGP does not learn anything about prefixes that may
originate from the AS itself by default; you have to export these with explicit edge policy). The BGP protocol can
be configured, however, to stamp such prefixes with attributes that will make items such as origin and ownership
very clear to the rest of the Internet. Though the routes are external from a protocol point of view, the rest of
the Internet will see the prefixes as belonging to the originating AS.
Note that, at each of the 3 steps above, regional ISP X has the opportunity to impose restrictions and policy. It can
choose to limit the prefixes it allows from external sources (like directly connected and static based prefixes) at
Router C. Over the iBGP session with Router B (and all other iBGP peers for that matter), Router C can set further
policy to set different types of metrics or BGP attributes associated with the broadband prefix 10.5.0.0/16. And
finally at Router B, the AS can impose additional policy to ensure that its upstream providers (Router 1) are treating
the AS X prefixes correctly. Router B will, in turn, advertise any networks it learns from its external peers or transit
providers back to the AS. This is another primary method of getting prefixes into BGP (via BGP neighbors); by
default, any routes learned via iBGP or eBGP will be advertised to other BGP neighbors.
There is one major exception to this default behavior, namely iBGP Split Horizon. By the iBGP Split Horizon rule,
Router B will NOT re-advertise routes learned internally from any neighbor inside AS X to any other internal
neighbor. It is assumed that a full mesh exists and that all BGP speakers in the AS have received the prefix
themselves. For example, Router B will not re-advertise the 10.5.0.0/16 prefix back to either Router D or A or any
other internal peer.
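The default advertisement behavior and the iBGP Split Horizon exception can be summarized as a small decision function. This is a simplified sketch that ignores policy and other mechanisms; the function name and the "ibgp"/"ebgp" labels are assumptions made for illustration.

```python
def should_advertise(learned_from: str, advertise_to: str) -> bool:
    """Decide whether a best route is re-advertised, ignoring any policy.

    Both arguments are the session type, either "ibgp" or "ebgp".
    """
    for session in (learned_from, advertise_to):
        if session not in ("ibgp", "ebgp"):
            raise ValueError(f"unknown session type: {session}")
    # iBGP Split Horizon: a route learned from an internal peer is never
    # re-advertised to another internal peer (a full mesh is assumed).
    return not (learned_from == "ibgp" and advertise_to == "ibgp")


print(should_advertise("ibgp", "ibgp"))  # False: Router B does not pass it to Router D or A
print(should_advertise("ibgp", "ebgp"))  # True: Router B can advertise it to Router 1
print(should_advertise("ebgp", "ibgp"))  # True: externally learned routes go to internal peers
```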
Router D, however, can advertise the same prefix 10.5.0.0/16 to the rest of the AS if it is also adjacent to the same
network as Router C and is learning the prefix by some means other than iBGP. Each internal BGP neighbor will now receive
the same prefix from 2 routers. The BGP protocol is specifically designed to deal with this situation, where more
than one path exists to any given prefix. BGP will select a best path based on the local policy specified and the
default behaviors of its best path selection criteria.
1. The route for the broadband networks in AS X in Router B’s BGP table has a next-hop address of Router C
(BGP does not insist that neighbors be directly adjacent to each other).
2. The IGP (IS-IS or OSPF) resolves the remote next-hop (Router C) to either of the directly attached networks,
Routers E or F (with recursion).
3. If Router B receives a packet for 10.5.1.1/32 from the Tier 1 ISP A, it forwards the packet to the locally
attached next-hop.
4. This internal router (either E or F) now has a route to the external network because it received it directly via
iBGP from Router C (because AS X is running a full iBGP mesh).
5. This process continues until the packet arrives at Router C, which forwards it out towards the destination.
This is the default packet flow, wherein the IGP decides which directly connected next hop to use. However, this
may not be the desired behavior.
The policy of ISP X is to advertise prefix 10.5/16 to ISP A over this specific neighbor session. As a result of this policy,
traffic will now flow from ISP A towards ISP X. With no other policy specified, sending the NLRI for the prefix alone
is all that is required to get traffic flowing. If no other route for 10.5/16 exists in the Internet table, whether ISP X
intended it or not, the entire Internet will now send traffic for any host addresses inside of 10.5/16 via the
Router 1 – Router B link through transit ISP A. This is the simplest policy implementation possible, but is also
actually the most significant. Recall that the longest prefix match is always used, regardless of vendor router
implementation; if a more specific route exists, or if there is only a single route to any given prefix, the Internet
will use it.
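Longest prefix match can be illustrated with Python's standard ipaddress module. The route table below is hypothetical: only 10.5.0.0/16 comes from the example above, while the default route, the more specific 10.5.1.0/24, and the next-hop labels are assumptions added to show the selection.

```python
import ipaddress


def longest_prefix_match(routes: dict, destination: str):
    """Return (prefix, next_hop) for the most specific matching route, or None."""
    dest = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop in routes.items():
        network = ipaddress.ip_network(prefix)
        # Keep the matching route with the longest prefix length seen so far.
        if dest in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, next_hop)
    return None if best is None else (str(best[0]), best[1])


# Hypothetical table for illustration only.
ROUTES = {
    "0.0.0.0/0": "default via upstream",
    "10.5.0.0/16": "via ISP A (Router 1 to Router B)",
    "10.5.1.0/24": "via another path (assumed more specific route)",
}

print(longest_prefix_match(ROUTES, "10.5.1.1"))
print(longest_prefix_match(ROUTES, "10.5.200.1"))
```

With the more specific /24 present, traffic for 10.5.1.1 follows it; every other host in 10.5/16 follows the /16.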
The solution to this problem is either to inject the external routes learned from BGP into the IGP (not
practical), or to run BGP on all transit routers (R2 and R3) so that they learn the external destinations (an iBGP
full mesh, as explained earlier). The latter substantially increases the control processing requirements for the
transit routers and, in a large network, requires many more iBGP peering sessions. An alternative solution is to
use BGP shortcuts, as described in the following slide.
With MPLS shortcuts for BGP, MPLS tunnels are used to forward transit traffic across the network as shown in the
slide above. In this case, only the external facing routers (R1 and R4 in this case) need to run BGP and have
knowledge of the external routes. Transit traffic is sent in an MPLS tunnel to the next hop BGP router and is label-
switched across the AS. The transit routers (R2 and R3) do not need to know the external routes – they only label
switch the transit traffic.
The figure above shows only one MPLS tunnel – in reality there will be a full mesh of LSPs between all external
facing routers that have the full BGP routes. These LSPs can be signaled with either LDP or RSVP-TE.
If there is no valid tunnel to resolve the next-hop, the router uses native IP forwarding unless the disallow-igp
option is specified.
Neighbor relationships in BGP are somewhat different from what is normal in the IGP context. Traditionally, a
neighbor is always a directly connected router. With BGP, this is not the case. Neighbors may be directly
connected, but it is not required. Because of this, BGP relies on an IGP to route between peers that are not
directly connected.
BGP uses unicast TCP/IP for neighbor establishment. It is possible for neighbor relationships to be established
with any device that is IP-reachable. There is no guarantee that the neighbor relationship will succeed, because
factors such as firewalls or access control lists may prevent certain types of traffic from passing, but the
relationship is possible and likely to occur.
At the application layer, BGP functions similarly to TCP/IP applications such as Telnet, FTP, and HTTP. BGP is
viewed as an application because it runs over TCP, using well-known port number 179.
Generic TCP/IP applications use a 3-way handshake for session establishment. After the session is established,
the applications exchange or negotiate a set of parameters for the session. In Telnet, for example, parameters
such as terminal types and passwords are typically negotiated. If application-level parameters are also
acceptable, a session is established at the application layer and data is exchanged. Periodic user data keeps the
session alive and, when the session is to be terminated, either user input or an inactivity timeout will cause the
application session to be torn down. TCP/IP initiates the 4-way session teardown.
Adding to the complexity of BGP is the size of the topology and routing tables, which are much larger than in an
IGP environment. The increased size of these tables means that factors such as CPU loading, memory utilization,
update generation, and route processing, have a far greater implication in BGP.
These factors, and others, affect convergence. Convergence may be viewed in two ways. Local convergence is the
time taken for a single router to receive and process all outstanding messages, and settle on a stable topology.
Network convergence is the time taken for all routers in the system to settle on a stable topology. In IGP terms,
the system is usually the local AS. In BGP terms, the system is the Internet.
Because the entire Internet is the scope of BGP, the administration is typically more complex than the
administration for a single AS.
An open message is used to initially request a BGP session with a peer and is the message that exchanges the
BGP parameters so that peers can determine whether their configuration parameters are compatible.
Update messages are used to exchange the routing information between peers.
Notification is the BGP term for error and is used to close down a peer session.
A keepalive message manages the TCP session in the case of inactivity and is also used to respond to an
open message from a peer.
A route-refresh message is an additional BGP capability that is negotiated by peers in the open exchange by
using the BGPv4 capabilities advertisement, as defined in RFC 3392. It is used to request that a BGP peer
resends the routes it advertised at the session establishment time.
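The message types listed above share a common fixed header. The sketch below builds and parses that header in Python. The type codes and the 19-octet layout (16-octet marker, 2-octet length, 1-octet type) are taken from RFC 4271 and RFC 2918 rather than from this guide, and the function names are hypothetical.

```python
import struct
from enum import IntEnum


class BgpMessageType(IntEnum):
    OPEN = 1
    UPDATE = 2
    NOTIFICATION = 3
    KEEPALIVE = 4
    ROUTE_REFRESH = 5


# Network byte order: 16-octet marker, 2-octet length, 1-octet type (19 octets total).
HEADER = struct.Struct("!16sHB")
MARKER = b"\xff" * 16


def build_keepalive() -> bytes:
    """A KEEPALIVE consists of the fixed header only."""
    return HEADER.pack(MARKER, HEADER.size, BgpMessageType.KEEPALIVE)


def parse_header(data: bytes):
    """Return (message type, total length) from the start of a BGP message."""
    if len(data) < HEADER.size:
        raise ValueError("truncated BGP header")
    marker, length, msg_type = HEADER.unpack_from(data)
    if marker != MARKER:
        raise ValueError("bad marker")
    if not HEADER.size <= length <= 4096:
        raise ValueError("bad message length")
    return BgpMessageType(msg_type), length


msg_type, length = parse_header(build_keepalive())
print(msg_type.name, length)
```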
Authentication is performed between neighboring routers before setting up the BGP session by verifying the
password. Authentication is performed using the MD5 message digest. The authentication key can be any
combination of ASCII characters up to 255 characters long.
The command authentication-key [authentication-key | hash-key] [hash | hash2] is used to configure the BGP
authentication key.
If both routers have each established a session to the other (a connection collision), the session initiated by
the device with the lower BGP router ID is terminated.
Periodic keepalive messages are exchanged to maintain the session.
Hold Time – This timer specifies the maximum time that BGP waits between successive messages (KEEPALIVE
or UPDATE) from its peer, before closing the connection. The default value is 90 seconds. The Hold Time is
sent in the BGP Open Message. Unlike some protocols, such as OSPF, if the hold time value does not match
between prospective neighbors, the value is negotiated to the lowest value proposed by either neighbor.
KeepAlive – A KEEPALIVE message is sent every time this timer expires. The keepalive timer is not negotiated
between BGP peers; it is configured locally. The keepalive value is generally one-third of the hold-time
interval. The default value is 30 seconds. Under the following circumstances, the configured keepalive value
is overridden by the hold-time value:
• If the specified keepalive value is greater than the configured hold-time, the specified value is ignored,
and the keepalive is set to one third of the current hold-time value.
• If the specified hold-time value is less than the configured keepalive value, the keepalive value is reset to
one third of the current hold-time value.
• If the hold-time interval is set to zero, the configured value of the keepalive value is ignored. This means
that the connection with the peer is up permanently and no keepalive packets are sent to the peer.
All of the above timers can be configured at the global level (applies to all peers), group level (applies to all peers
in the peer group), and neighbor level (applies only to the specified peer). The most specific value is used.
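The timer rules described above can be sketched as two small functions. This is an illustration of the stated rules only, not SR OS code; the function names are hypothetical, and the check that a non-zero hold time is at least 3 seconds is taken from RFC 4271 rather than from this guide.

```python
def negotiate_hold_time(local_hold: int, peer_hold: int) -> int:
    """The lowest hold time proposed in the two OPEN messages is used."""
    for value in (local_hold, peer_hold):
        # Assumption from RFC 4271: a hold time is either 0 or at least 3 seconds.
        if value != 0 and value < 3:
            raise ValueError("hold time must be 0 or at least 3 seconds")
    return min(local_hold, peer_hold)


def effective_keepalive(configured_keepalive: int, hold_time: int) -> int:
    """Apply the override rules; a return value of 0 means no keepalives are sent."""
    if hold_time == 0:
        # Hold time of zero: the configured keepalive is ignored.
        return 0
    if configured_keepalive > hold_time:
        # Keepalive larger than hold time: fall back to one third of the hold time.
        return hold_time // 3
    return configured_keepalive


hold = negotiate_hold_time(90, 30)
print(hold, effective_keepalive(30, 90), effective_keepalive(30, 15))
```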
The above slide illustrates a successful set of transitions leading up to an established BGP peer or neighbor
session. Once this state is achieved, NLRI can be passed back and forth between peers. The “rec/act/sent”
indicates received, active, and sent prefixes to/from this established peer in the show router bgp summary
command.
Under normal neighbor establishment procedures, BGP peers can exist in one of six defined states. Idle is the
initial state, established is the operational state, and all other states are transitional. Peers that exist in one of
these transitional states for extended intervals usually indicate a connection problem.
Below is the list of possible Last Events in the show router bgp neighbor output. These events trigger
BGP state transitions.
Start – BGP has initialized the BGP neighbor
Stop – BGP has disabled the BGP neighbor
Open – BGP transport connection opened
Close – BGP transport connection closed
openFail – BGP transport connection failed to open
Error – BGP transport connection error
connectRetry – Connect retry timer expired
holdTime – Hold time timer expired
keepAlive – Keepalive timer expired
recvOpen – Receive an OPEN message
revKeepalive – Receive a KEEPALIVE message
recvUpdate – Receive an UPDATE message
recvNotify – Receive a NOTIFICATION message
None – No events have occurred
When neighbors initially establish a session, there is an exchange of BGP tables. After this exchange, it is desirable
to have as little routing-update activity as possible. In the absence of updates, the devices send periodic
keepalive messages to maintain the session.
A BGP update or keepalive message is expected so that the session will not be torn down. The receipt of either an
update or keepalive restarts the Hold timer.
If neither message arrives within the Hold-timer interval, both the BGP and TCP sessions are terminated.
The loss of a neighbor in BGP is a significant event. When the session is terminated, all routing information
learned from the neighbor is discarded, and the entire network (in BGP’s case, potentially the entire Internet)
must converge. When the neighbor session is reestablished, the TCP and BGP sessions are set up, a bidirectional
exchange of routes occurs, any inbound or outbound policy is applied, and the route-selection criteria are
evaluated for all entries. Best routes are then offered to the RTM, where a final decision based on preference is
made on each route before it is sent to the FIB.
Each BGP attribute is categorized into one of two main categories: well-known and optional.
The categorization of the attributes defines their behavior and handling in BGP.
The mandatory or discretionary classification relates to their presence in a particular BGP update message.
Mandatory attributes must be present in every BGP update; if a well-known mandatory attribute is missing, a
notification results.
A discretionary attribute can be present in the update; it is the sender’s choice to include it, based on its
meaning.
There are 3 well-known mandatory attributes defined in BGP, so there is always a minimum of 3 attributes in
every BGP update message.
If a device receives a recognized optional attribute, the update is accepted and processed, based on the meaning
of the attribute.
If a device receives an unrecognized transitive attribute, the update is accepted, even though the local router is
not aware of the meaning of the attribute. The router propagates the update and attribute (“transits” the
attribute) and sets the partial bit in the BGP message to 1, if not done previously.
If a device receives an unrecognized non-transitive attribute, the router quietly drops the attribute. The router
propagates the update, but not the attribute.
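The handling rules for recognized, transitive, and non-transitive attributes can be expressed as a decision on the attribute flags. In this sketch the flag bit values (0x80 optional, 0x40 transitive, 0x20 partial) come from RFC 4271; the function name and the action labels are hypothetical.

```python
OPTIONAL = 0x80    # 0 = well-known, 1 = optional
TRANSITIVE = 0x40  # 1 = attribute is passed on to other peers
PARTIAL = 0x20     # set when a router forwarded the attribute without understanding it


def handle_attribute(flags: int, recognized: bool):
    """Return (action, outgoing_flags) for one path attribute."""
    if recognized:
        # Recognized attributes are processed according to their meaning.
        return "process", flags
    if not flags & OPTIONAL:
        # An unrecognized well-known attribute is an error and results in a notification.
        return "notify", flags
    if flags & TRANSITIVE:
        # Unrecognized optional transitive: forward it and set the partial bit.
        return "forward", flags | PARTIAL
    # Unrecognized optional non-transitive: propagate the update without the attribute.
    return "drop", flags


print(handle_attribute(OPTIONAL | TRANSITIVE, recognized=False))
print(handle_attribute(OPTIONAL, recognized=False))
```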
The attribute propagates in all future BGP updates for this prefix (in this example, across AS 65200 and AS
65250) and should never be modified.
This attribute list may contain zero, one, or more entries. The leftmost entry in the list is the neighboring AS that
sent the prefix into your AS. The rightmost entry in the list is the originating AS for the prefix. Intermediate
entries are transit AS’s that the update has passed through on its way to you.
The AS number of the sender is prepended to the list whenever the update crosses an AS boundary. If you view
the update inside of the originating AS, the list will be empty or null because the update has not yet passed
through an AS.
If a router receives an update that contains its local AS number, the update is flagged as a loop.
AS_PATH acts as the hop count of BGP. Note that this hop count is not an indication of the
number of routers that the update has passed through, but of the number of AS’s that the update has passed
through, regardless of the actual number of routers.
Two new attributes, AS4_PATH and AS4_AGGREGATOR, are introduced in RFC 4893 to propagate four-octet AS path
information across BGP speakers that do not support four-octet AS numbers (OLD BGP speakers). AS4_PATH is an
optional transitive attribute that contains the AS path encoded with 4-octet AS numbers. It has the same
semantics as the AS_PATH attribute, except that it is optional transitive and carries 4-octet AS numbers.
The attribute propagates in all future BGP updates for this prefix (in this example, across AS 65200 and AS
65250), and each time the update crosses an AS boundary, the AS number of the sender is prepended to the
AS_PATH list.
The update crosses an AS boundary to arrive in AS 65200, so the AS_PATH attribute now contains 65100, the AS
number of the sender.
Similarly, when it arrives in AS 65250, the AS_PATH attribute now contains the sequence 65200 65100.
The AS_PATH, read from left to right, represents the sequence of AS’s that lead to the origin of the route.
To BGP, a hop is a single AS. Because the update is still in the same hop, there is no change to the AS_PATH
attribute.
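The walk-through above (AS 65100 to AS 65200 to AS 65250) can be reproduced with a minimal Python sketch. The function names are hypothetical, and the model ignores AS_SET segments and confederations.

```python
def advertise(as_path: list[int], sender_as: int, receiver_as: int) -> list[int]:
    """Return the AS_PATH as seen by the receiver of the update."""
    if sender_as == receiver_as:
        # iBGP: the update stays in the same AS (same hop), so no change.
        return list(as_path)
    # eBGP: the sender prepends its own AS number on the left.
    return [sender_as] + as_path


def has_loop(as_path: list[int], local_as: int) -> bool:
    """A received path that already contains the local AS is a loop."""
    return local_as in as_path


path: list[int] = []                   # empty inside the originating AS 65100
path = advertise(path, 65100, 65200)   # crosses an AS boundary
path = advertise(path, 65200, 65200)   # iBGP inside AS 65200, unchanged
path = advertise(path, 65200, 65250)   # crosses an AS boundary again
print(path)
```

Reading the final list from left to right gives the neighboring AS first and the originating AS last.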
1. When sending AS_PATH between 32-bit capable AS’s, the AS_PATH carries full 32-bit AS numbers.
2. When sending AS_PATH from a 32-bit capable AS to a 16-bit capable AS, the AS_PATH carries only 16-bit AS
numbers, and 32-bit AS’s are converted to AS 23456. The optional transitive attribute AS4_PATH carries
the 32-bit AS numbers in sequence (optional transitive means that the receiving AS has to carry it and send
it, even if it does not understand it).
3. When sending AS_PATH from a 16-bit capable AS to a 32-bit capable AS, the AS_PATH carries only 16-bit AS
numbers (and AS4_PATH is also sent). The 16-bit only capable AS ignores the AS4_PATH attribute but sends
it anyway (transitive).
4. When receiving the AS4_PATH in a 32-bit capable AS, the AS recombines the AS_PATH and AS4_PATH into a
single AS_PATH, which comprises only 32-bit AS numbers. For example, AS 230000 merges the AS_PATH
and AS4_PATH into: 250 150 235000 135000
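The recombination in step 4 can be sketched as follows. This is a simplified model that treats both attributes as flat lists of AS numbers (no AS_SET segments). The input values are assumptions chosen to reproduce the merged result quoted above, because the source shows only the result.

```python
AS_TRANS = 23456  # placeholder AS number used in place of 32-bit AS numbers


def merge_as_paths(as_path: list[int], as4_path: list[int]) -> list[int]:
    """Rebuild a 32-bit AS path from AS_PATH and AS4_PATH."""
    if len(as4_path) > len(as_path):
        # AS4_PATH is longer than AS_PATH: treat it as inconsistent and ignore it.
        return list(as_path)
    # Keep the leading entries added after AS4_PATH was built,
    # then substitute AS4_PATH for the trailing entries.
    keep = len(as_path) - len(as4_path)
    return as_path[:keep] + as4_path


# Assumed inputs as received by AS 230000:
received_as_path = [250, 150, AS_TRANS, AS_TRANS]
received_as4_path = [235000, 135000]
print(merge_as_paths(received_as_path, received_as4_path))
```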
When a BGP speaker advertises a route to an iBGP peer, the advertising speaker does not modify the NEXT_HOP
attribute associated with the route by default, but it can be configured to do so with next-hop-self. Setting
next-hop-self ensures that iBGP peers can always reach the next-hop address, because it becomes an address of the
advertising iBGP neighbor.
When a BGP speaker advertises a route to an eBGP peer, the advertising speaker will modify the NEXT_HOP
attribute associated with the route.
The typical behavior is to set the NEXT_HOP attribute to the IP address of the egress interface used to send the
update to the remote neighbor. There is no restriction to this action, so other scenarios are possible.
If the network is directly connected to the router that originated the prefix, the next-hop is not relevant locally (it
is directly connected), and it is not present in the local BGP table. If the prefix was learned from another router in
the same AS (not shown in the figure), the next-hop is the IP address of the originating router.
In either case, the border router sets the next-hop address to the interface used to reach the router in AS 65200
when it propagates the update.
The NEXT_HOP attribute propagates in all future BGP updates for this prefix (in this example, across AS 65200
and AS 65250), and each time the update crosses an AS boundary, the NEXT_HOP attribute is set to the IP
address of the egress interface used to send the update to the remote neighbor.
When the update is sent between routers in AS 65200, NEXT_HOP is unmodified by default; it remains the
address of the router in AS 65100.
When the update arrives in AS 65250, it crossed an AS boundary to get there, so the NEXT_HOP attribute now
contains the IP address of the eBGP router that sent the update to AS 65250.
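The default NEXT_HOP behavior described above can be summarized in a short Python sketch. The function name, the session labels, and the internal address 10.16.10.1 are hypothetical; only the typical defaults are modeled, not policy-based overrides.

```python
def next_hop_to_advertise(received_next_hop: str, session: str,
                          local_address: str, next_hop_self: bool = False) -> str:
    """Return the NEXT_HOP placed in the outgoing update.

    session is "ebgp" or "ibgp" (labels used only in this sketch).
    """
    if session == "ebgp" or next_hop_self:
        # eBGP (or next-hop-self on iBGP): use an address of the sender.
        return local_address
    # iBGP default: leave the NEXT_HOP unmodified.
    return received_next_hop


# A route learned from an external neighbor with next-hop 10.1.1.1:
print(next_hop_to_advertise("10.1.1.1", "ibgp", "10.16.10.1"))
print(next_hop_to_advertise("10.1.1.1", "ibgp", "10.16.10.1", next_hop_self=True))
```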
When the update propagates into the receiving AS, NEXT_HOP is not modified. When the update is received by
Routers B, C, or D, the first check performed, before selecting a route as best, is whether the next-hop is
reachable. If the next-hop is unreachable, the route is never evaluated in the route-selection criteria.
10.1.1.1 may not be reachable because this network is external to the AS and is, therefore, unknown to the IGP.
Router B sets the next-hop address to the interface used to reach router Y in AS 65250 when it propagates the
update over the eBGP session.
Changing the iBGP default behavior, such that the next-hop address sent in the iBGP update becomes an internal
address, may be done by:
Configuring the neighbor or group with the next-hop-self command. This changes the next-hop to an
address of the sending router.
Configuring an export route policy for the sending peer, to modify the next-hop of the update to any desired
address. This policy configuration may become unmanageable as a result of the volume of updates and
policy complexity.
If the external next-hop address is to remain unchanged in the update, that network must be reachable internal
to the AS. This may be done by:
Configuring an export route policy to advertise (redistribute) the external next-hop interfaces into the IGP.
Configuring the external next-hop interfaces as passive interfaces in the IGP configuration.
Static routing, which is possible but not scalable.
The external next-hop interfaces are usually directly connected to the edge router or routers, and configuration
is required on each one. Filters should also be used to allow only the interfaces that are used as next-hops to be
redistributed.
The next-hop-self command is primarily used by an AS edge router when propagating routes received via eBGP to its
iBGP peers, if the eBGP next-hop is unreachable for BGP routers within the AS. It can also be used to set the
next-hop to a system address in iBGP to load-share between redundant physical paths, or to avoid third-party
next-hops when connected to a multi-access network.
Router B sets the next-hop address to the interface used to reach router Y in AS 65250 when it propagates the
update over the eBGP session.
The TTL Security Hack (TSH) implementation supports configuring TTL security per BGP peer and evaluating
the incoming TTL value against the configured TTL value. If the incoming TTL value is less than the configured TTL
value, the packets are discarded and a log is generated.
The ttl-security command is used to configure TTL security parameters for incoming packets. When the feature is
enabled, BGP will accept incoming IP packets from a peer only if the TTL value in the packet is greater than or
equal to the minimum TTL value configured for that peer.
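The acceptance check reduces to a single comparison. The sketch below uses a hypothetical function name and example TTL values; it illustrates the rule stated above and is not SR OS code.

```python
def accept_packet(incoming_ttl: int, min_ttl: int) -> bool:
    """Accept the packet only if its TTL is at least the configured minimum."""
    return incoming_ttl >= min_ttl


# Example values (assumed): a minimum of 254 accepts a packet that arrives
# with TTL 255 and discards one that arrives with TTL 250.
print(accept_packet(255, 254))
print(accept_packet(250, 254))
```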
The TTL of iBGP sessions is set to 64 to allow the BGP control packets to reach neighbors that are not directly
connected to the sending router.
LOCAL_PREF is only used in iBGP. A BGP speaker does not include this attribute in update messages that it sends
to its eBGP neighbors. If LOCAL_PREF is contained in an update message that is received from an eBGP neighbor,
this attribute is ignored by the receiving speaker.
The purpose of the ATOMIC_AGGREGATE attribute is to alert BGP speakers along the path that some information
has been lost due to the route aggregation process and that the aggregate path might not be the best path to
the destination. When an aggregator aggregates routes, it attaches its router ID to the aggregated route in the
AGGREGATOR attribute. It sets the ATOMIC_AGGREGATE attribute only if the AS_PATH information of the
aggregated routes was not preserved.
When a BGP speaker aggregates several routes for the purpose of advertisement to a particular peer, the
AS_PATH of the aggregated route normally includes an AS_SET formed from the set of ASes from which the
aggregate was formed. In many cases, the network administrator can determine if the aggregate can safely be
advertised without the AS_SET, and without forming route loops. If an aggregate excludes at least some of the AS
numbers present in the AS_PATH of the routes that are aggregated as a result of dropping the AS_SET, the
aggregated route, when advertised to the peer, should include the ATOMIC_AGGREGATE attribute.
Note on RFC 1997 communities and 4 byte ASNs: The original community attribute defined in RFC 1997 specified
only 2 bytes for the ASN part of the community. It has since been decided by the IETF that if an AS utilizing a 4
byte ASN wants to send a community, it must use extended communities (RFC 4360) to do so; extended
communities allow for 4 Byte ASN’s. Most public peering and transit policies still rely on the older, 2 byte based
communities in their policies. Also, extended communities are not interpreted like 2 byte based communities
(RFC 1997). For example, when aggregating networks, it was typical to see any communities associated with the
more specific routes reflected in the aggregate. This does not happen if you use extended communities.
• Note that NOPEER does not have a keyword in SR OS, but you can specify it manually with the
community “nopeer” “65535:65284” command. All well-known communities can be specified this
way.
“no-peer” is used where traffic engineering requires advertising a more specific prefix, but its propagation
should be constrained to transit providers and not peers. That is, the prefix is advertised from AS to AS
provided there is a transit/customer relationship, unlike “no-export”, which restricts propagation of the prefix to
only the adjacent AS.
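The value 65535:65284 is simply the 32-bit NOPEER community written as two 16-bit halves. The sketch below converts between the two forms; the helper names and dictionary keys are hypothetical, and the numeric values are the standard well-known community codes.

```python
WELL_KNOWN = {
    "no-export": 0xFFFFFF01,
    "no-advertise": 0xFFFFFF02,
    "no-export-subconfed": 0xFFFFFF03,
    "nopeer": 0xFFFFFF04,
}


def to_asn_value(community: int) -> str:
    """Format a 32-bit RFC 1997 community as 'high:low'."""
    return f"{community >> 16}:{community & 0xFFFF}"


def from_asn_value(text: str) -> int:
    """Parse 'high:low' into a 32-bit community value."""
    high, low = (int(part) for part in text.split(":"))
    if not (0 <= high <= 0xFFFF and 0 <= low <= 0xFFFF):
        # A 4-byte ASN does not fit in the 2-byte ASN field.
        raise ValueError("RFC 1997 communities carry two 16-bit fields")
    return (high << 16) | low


print(to_asn_value(WELL_KNOWN["nopeer"]))
```

The range check also shows why a 4-byte ASN cannot be carried in the ASN half of an RFC 1997 community.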
The above summary does not cover all possible BGP attributes; the IANA defines other BGP attributes.
6. What do the terms upstream and downstream mean, from an Internet Architecture point of view?
The terms upstream and downstream indicate the relative proximity of a network to the core of the
Internet, formed by Tier 1 providers and Internet Exchanges. For example, DSL and Enterprise customers
are downstream of their Tier 3 or Tier 2 providers, which are themselves downstream from their own
upstream Tier 1 providers.
13. How are AS’s identified on the Internet? How are AS numbers allocated and assigned?
AS’s are identified using either a 2 or 4 byte number. AS numbers are allocated by ICANN/IANA to regional
registries, which assign them to actual ISPs and Enterprises according to RFC 1930.
20. What steps are necessary for BGP to propagate external routes across the AS? What should iBGP peers
avoid doing?
Step 1 - The edge router brings the NLRI or prefix into BGP (in SR OS via an export policy or via another BGP
neighbor). Step 2 - The same edge router announces the prefix to all of its iBGP peers. Step 3 - Those
receiving iBGP peers announce the route to their eBGP peers. According to BGP Split Horizon, receiving iBGP
peers should not re-advertise the prefix to each other.
24. What are the five BGP message types and their basic functions?
The 5 message types are Open (exchanges and negotiates neighbor capability), Updates (transfers or
withdraws NLRI), Notification (transmits errors), Keepalive (sustains the session), Route-Refresh (allows BGP
to selectively request re-sends of NLRI).
25. Describe the neighbor establishment phases and parameters required for established neighbors.
The two main phases are TCP (bringing up the transport session) and the BGP OPEN exchange, which includes
capabilities negotiation. The BGP version and AS numbers must match what each side expects, the hold time is
negotiated to the lower of the two proposed values, the router IDs must be valid, and any other optional
parameters must be acceptable to both sides.
28. What is necessary for the successful transition from OpenSent to OpenConfirm, and OpenConfirm to
Established?
To reach an OpenConfirm state, the BGP neighbor must receive an OPEN message with the correct neighbor
capability parameters. To reach Established, the neighbor must receive a keepalive message.
30. What does the transitive property of some optional BGP attributes provide?
The transitive property provides a way for implementations that have not implemented certain optional BGP
attributes to at least be able to either preserve (transitive), or not preserve (non-transitive), the attribute
when sending updates on to other BGP speakers. In either case, the implementation still accepts and sends
the NLRI associated with the optional attribute which it ignores.
32. What is the significance of the location of an AS number inside the AS_Path (from left to right)?
The leftmost part of the AS_PATH is where each transit AS prepends its own AS number. The originating AS
is located at the rightmost part of the AS_PATH.
35. What is a common method to stabilize and scale BGP, so it does not create dependencies on other
networks?
By implementing the next-hop-self feature and using only internal (to the AS) address space to move
packets between iBGP peers.
For each destination in the RIB, the routing protocol selects the best route based on the lowest metric. These
best routes are sent to the RTM.
Multiple routes to the same destination can be learned by the router. If these routes are learned from the same
routing protocol, the metric for the protocol is used as a selection criterion. The route with the lowest metric is
selected as the best route and is sent to the RTM.
If multiple routing protocols are in use, each protocol selects its best route, based on the lowest metric from its
RIB. At this point, there are multiple best routes, (one from each protocol), and each protocol sends its best route
to the RTM.
The RTM can choose only one of these best routes because there can be only one best route for each destination
in the routing table.
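The two-stage selection described above (lowest metric within a protocol, then one winner per destination in the RTM) can be sketched as follows. The class and function names are hypothetical, the preference numbers are illustrative assumptions, and the sketch assumes that a lower preference value wins.

```python
from dataclasses import dataclass


@dataclass
class Route:
    prefix: str
    protocol: str
    metric: int
    preference: int  # lower is assumed to be better


def protocol_best(routes: list[Route]) -> list[Route]:
    """Each protocol offers its lowest-metric route per prefix."""
    best: dict[tuple[str, str], Route] = {}
    for r in routes:
        key = (r.prefix, r.protocol)
        if key not in best or r.metric < best[key].metric:
            best[key] = r
    return list(best.values())


def rtm_best(candidates: list[Route]) -> dict[str, Route]:
    """The RTM keeps one route per prefix: the lowest preference."""
    best: dict[str, Route] = {}
    for r in candidates:
        if r.prefix not in best or r.preference < best[r.prefix].preference:
            best[r.prefix] = r
    return best


routes = [
    Route("10.0.0.0/8", "ospf", 20, 10),
    Route("10.0.0.0/8", "ospf", 10, 10),
    Route("10.0.0.0/8", "bgp", 0, 170),
]
winner = rtm_best(protocol_best(routes))["10.0.0.0/8"]
print(winner.protocol, winner.metric)
```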
Different protocols should not be configured with the same preference. If this occurs, the tiebreaker is based on
the default preference table (shown on the following page).
If the RTM learns multiple routes from the same protocol, and the metrics are equal, the best route decision is
determined by the configuration of ECMP in the config>router context.
The best routes from the RTM are placed in the FIB and RT (Routing Table).
The FIB is distributed to the various IOMs on the Nokia 7750 SR.
Adj-RIBs-In (abbreviated to RIB-IN above) – This database comprises updates received from BGP
neighbors as input to the BGP decision process (prior to applying ingress policies).
Loc-RIB – This database holds the paths that BGP selects as best and submits to the RTM.
Adj-RIBs-Out (abbreviated to RIB-OUT above) – This database comprises only the subset of best paths
from Loc-RIB that remain after the export policies applied to BGP neighbors have been processed.
Export policy controls both the routes that are sent into BGP from other protocols and the routes that are
propagated to BGP neighbors.
For the local router, strict control is required to ensure that only public networks are reachable externally and are
exported from the IGP. This helps to ensure that restricted or private internal networks are not compromised by
packets originating from outside the domain.
With export route policies, the BGP neighbor also benefits from the reduction of BGP updates. These benefits
include:
Reduced control plane traffic on the physical links between the neighbors.
Reduced control plane processing for the neighbor that manages the BGP updates.
Less memory is required, because tables are smaller.
For the local router, the BGP overhead should be reduced because there are fewer updates to process. Also, less
control plane processing is required, and table sizes are reduced.
The physical links between the routers also experience decreased control plane traffic.
Proper configuration of the import policy also protects the local AS from invalid or unwanted updates that may be
propagated from networks as a result of neighbor misconfigurations or a potential attack by a hacker attempting
a flooding or DoS attack on the BGP router.
It is important to understand that a longer prefix match is always preferred over a shorter prefix that covers the
same destination, regardless of which BGP attributes are set. The longest prefix match is always used,
regardless of vendor router implementation; if a more specific route exists, or if there is only a single route to any
given prefix, the Internet will use it.
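Longest prefix match can be illustrated with Python's standard ipaddress module. The function name and the example prefixes are hypothetical; a real router uses a trie or hardware lookup rather than a linear scan.

```python
import ipaddress


def longest_match(destination: str, prefixes: list[str]):
    """Return the most specific prefix that contains the destination."""
    addr = ipaddress.ip_address(destination)
    matches = [ipaddress.ip_network(p) for p in prefixes
               if addr in ipaddress.ip_network(p)]
    if not matches:
        return None
    # The longest mask wins, regardless of any route attributes.
    return str(max(matches, key=lambda n: n.prefixlen))


table = ["0.0.0.0/0", "192.168.0.0/16", "192.168.1.8/29"]
print(longest_match("192.168.1.10", table))
```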
A route is not considered if it does not have the valid flag associated with it, if it contains an AS_PATH loop, or if
the next-hop is unreachable.
For each prefix in the BGP route table, all entries for that prefix are compared, using the route selection criteria,
to choose the best route for that prefix.
The “Multipath” command can be used to allow BGP to load share traffic across multiple links. Multipath can be
configured to load share traffic across a maximum of 32 routes. If more equal-cost routes are available than
the configured value, the routes with the lowest next-hop IP address values are chosen.
As a result of the last statement, an export policy must be specified for local networks to be reachable by external
AS’s.
With only these defaults set, the BGP selection process cannot take advantage of any of the key BGP attributes
that can produce the most desirable outcomes, as described on the slide above. The following modules describe how
to plan, design, and configure BGP and the IGP to accomplish all of these outcomes.
BGP instances, groups, and neighbors are all created in the administratively-enabled state.
If a BGP router ID is not specified, BGP uses the router system interface address as the router ID. Although
this serves as a valid router ID for BGP, best practice is to explicitly configure a router ID value in the BGP
instance.
The Nokia 7750 SR OS BGP timer defaults are the values recommended in IETF drafts and RFCs. Timer
settings may be found in the BGP section of Nokia 7750 SR OS Routing Protocols Guide.
If no import route-policy statements are specified, all BGP routes are accepted.
If no export route-policy statements are specified, all BGP routes are advertised, and all non-BGP routes are
not advertised.
An export route policy must be defined to explicitly allow local networks, for example, IGP, static, direct, and
aggregate, to be reachable outside the AS.
Prepare a plan that describes the AS. Keep a diagram and documentation available, with information such as
AS numbers, router IDs, IP addresses, physical links, and peering arrangements.
The BGP speaking router must have a router ID. Remember that if the router ID is not explicitly configured,
BGP uses the router’s system interface address.
Define at least one peer group containing at least one neighbor. Define neighbors and associate each
neighbor with a peer group (each neighbor must belong to a group). The local IP address used for session
establishment with the group or neighbor is optional; the default address is used if the local IP address is not
configured.
When defining neighbors, specify the AS number associated with each remote neighbor.
Within the three levels, many configuration commands are repeated. For repeated commands, the command that
is most specific to the neighboring router is used. In other words, neighbor settings take precedence over group
settings, and group settings take precedence over BGP global settings.
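The most-specific-wins rule can be modeled as a lookup that falls back from neighbor to group to global level. The function name, parameter names, and values below are hypothetical and are not SR OS configuration syntax.

```python
def effective_setting(name: str, neighbor: dict, group: dict, global_: dict):
    """Return the value from the most specific level that defines it."""
    for level in (neighbor, group, global_):
        if name in level:
            return level[name]
    return None  # not configured at any level


global_cfg = {"hold-time": 90, "keepalive": 30}
group_cfg = {"hold-time": 30}
neighbor_cfg = {}

print(effective_setting("hold-time", neighbor_cfg, group_cfg, global_cfg))
print(effective_setting("keepalive", neighbor_cfg, group_cfg, global_cfg))
```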
Is associated with the network entity, such as a specific router or switch, and not a specific interface.
Is used to preserve connectivity when routing reconvergence is possible and when an interface fails or is
removed.
At the group level, changing the AS number causes BGP to re-establish peer relationships with all peers in the
group with the new local AS number.
At the neighbor level, changing the AS number causes BGP to re-establish a peer relationship with the new local
AS number.
If the router ID is not manually configured, the system interface IP address acts as the router ID.
If neither the router ID nor the system interface address is configured, the BGP peering will not be established.
The best practice is to have a unique BGP router ID value configured in the BGP instance.
If you configure a new router ID in the config>router-id context, protocols are not automatically restarted
with the new router ID. The next time a protocol is initialized or reinitialized, the new router ID is used. Therefore,
there may be a period when different protocols use different router IDs.
Recall that there are mandatory BGP configurations on the Nokia 7750 SR:
a minimum of one group must be defined
the group must contain at least one neighbor
all neighbors must belong to a group.
Individual parameters may also be applied to a specific neighbor to override group settings (because the most
specific configuration applies) or to assign a unique parameter to one member of the group.
The system address of router R5 is 10.16.10.5/32. When router R5 originates a BGP update to router R1 across
the iBGP session, the next-hop of the update is set to the sending router’s system address. When router R1
forwards a packet to this destination, the packet is sent to the next-hop, which is the link between routers R1 and
R5.
The IGP may have multiple paths to 10.16.10.5/32 in the routing table. If the links in the above slide are all equal-
metric and ECMP is configured, there are two available paths from router R2 to 10.16.10.5: one via router R6 to
router R5, and the other via router R1 to router R5. Packets sent to this next-hop share the available paths,
based on the ECMP algorithm.
The same behavior can be extended to customers and networks that are using AS 65540 to transit their data.
Routers in AS 65540 must simply reset the next-hops associated with external AS networks to themselves (using
the next-hop-self command).
Neighbors defined in the group will inherit all group parameters. However, group parameters may be overridden
by explicit configuration at the lower neighbor level.
The local-address command configures the local IP address used by the group or neighbor when
communicating with BGP peers.
Outgoing connections use the local-address as the source of the TCP connection when initiating connections with
a peer.
When a local address is not specified, the router uses the system IP address to communicate with iBGP peers and
uses the interface address to communicate with directly connected eBGP peers. The no form of this command, used
at the neighbor level, reverts to the value defined at the group level.
Description: This command displays BGP neighbor information. The command can be entered with or
without parameters. When the command is issued without parameters, information about all
BGP peers is displayed.
When the command is issued with a specific IP address or AS number, only information about
the specific peer, or peers with the same AS number, is displayed.
The State field displays the BGP peer’s protocol state. In addition to standard protocol
states, this field can also display the Disabled operational state, which indicates when the peer
is operationally disabled and must be restarted by the operator.
The “State” column indicates whether the session is established. If the session is established, the number of
routes “Received,” “Active,” and “Sent” will be displayed. Until the neighbors are established, the “State” column
will show the state of the BGP neighbor (Idle, Connect, or Active).
The output in the slide shows that there are no BGP routes received or advertised between the iBGP neighbors.
Remember to use the commit command for the policy modification to take effect.
To display all committed route policies, use the command show router policy.
To display information for the specified policy, use the command show router policy <policy name>.
To display information for all policies, use the command show router policy admin.
Locally exported entries (from IGP or directly connected networks) do not appear in the Loc-RIB; they appear in the RIB-OUT.
Therefore, to see the locally exported routes on R5, you can use either show router bgp routes < > hunt or
show router bgp neighbor < > advertised-routes, as shown below:
*A:R5# show router bgp neighbor 10.16.10.6 advertised-routes
===============================================================================
BGP Router ID:10.16.10.5 AS:65540 Local AS:65540
===============================================================================
BGP IPv4 Routes
===============================================================================
Flag  Network                 LocalPref   MED
      Nexthop (Router)        Path-Id     Label
      As-Path
-------------------------------------------------------------------------------
i     10.16.0.0/30            100         None
      10.16.10.5              None        -
      No As-Path
i     10.16.0.20/30           100         None
      10.16.10.5              None        -
      No As-Path
i     10.16.10.5/32           100         None
      10.16.10.5              None        -
      No As-Path
i     192.168.1.8/29          100         None
      10.16.10.5              None        -
      No As-Path
-------------------------------------------------------------------------------
Routes : 4
The above slide illustrates the show router route-table command output, which displays the Nokia 7750
SR FIB.
The following are included in the list:
the destination prefix
the next-hop IP address
the type of route
the protocol that provided the route to the FIB
the time since the route was learned or refreshed
the metric of the route (specific to each protocol)
the assigned preference value.
The Protocol field shows the routing protocol that provided the best route to the RTM. Routes are installed in the
FIB if they are selected by the RTM as used.
Note that for local, that is directly connected entries, the next-hop indicates an interface of the local Nokia 7750
SR without an IP next-hop. Packets for these destinations are not sent to another router; they are sent on the
local broadcast domain to the destination itself. For non-local entries, the next-hop field shows the IP next-hop
address, that is the next router toward the destination.
Note that a router potentially has several next-hops available toward a given destination. If ECMP is configured to
a non-default value, as in this case where ecmp = 2, and more than one route is learned for the same destination,
from the same protocol, and with the same metric value, the router places the configured number of routes into
the FIB. OSPF, IS-IS, and BGP support up to 16 equal-cost paths per destination.
The configure router ecmp <max-ecmp-routes> command enables ECMP and configures the
number of routes for path sharing. For example, the value 2 means that two equal-cost routes will be used for
load sharing. ECMP can only be used for routes learned with the same preference and protocol. When more
ECMP routes are available at the best preference than are configured in max-ecmp-routes, the lowest next-
hop IP address algorithm is used to select the number of routes configured in max-ecmp-routes.
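The lowest next-hop IP address rule can be sketched with the standard ipaddress module. The function name and the addresses are hypothetical. The comparison must be numeric, because a plain string sort would rank 10.16.0.13 ahead of 10.16.0.5.

```python
import ipaddress


def select_ecmp(next_hops: list[str], max_ecmp_routes: int) -> list[str]:
    """Keep at most max_ecmp_routes next-hops, lowest IP address first."""
    ordered = sorted(next_hops, key=ipaddress.ip_address)
    return ordered[:max_ecmp_routes]


# Three equal-cost next-hops, ecmp = 2:
print(select_ecmp(["10.16.0.9", "10.16.0.13", "10.16.0.5"], 2))
# ECMP disabled behaves like a limit of one route:
print(select_ecmp(["10.16.0.9", "10.16.0.13", "10.16.0.5"], 1))
```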
The [no] form of the command disables ECMP. If ECMP is disabled and multiple routes are available at the best
preference and equal cost, the route with the lowest next-hop IP address is used.
A prefix list can be used as a further discriminator. With it, only the direct interfaces that match the pre-defined
prefix-list will be allowed.
The exact keyword specifies that the entry should be interpreted as an exact match of the prefix. The
longer keyword matches prefixes that fall within the specified prefix and have a longer mask (more
network/subnet bits) than the prefix specified.
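The difference between the two keywords can be illustrated in Python. This is one interpretation of the description above, with hypothetical function and argument names; it is not the SR OS policy engine.

```python
import ipaddress


def matches(route: str, entry: str, mode: str) -> bool:
    """Check a route against one prefix-list entry.

    mode "exact": same network and same mask length.
    mode "longer": inside the entry, with a strictly longer mask.
    """
    r = ipaddress.ip_network(route)
    e = ipaddress.ip_network(entry)
    if mode == "exact":
        return r == e
    if mode == "longer":
        return r.prefixlen > e.prefixlen and r.subnet_of(e)
    raise ValueError(f"unknown mode: {mode}")


print(matches("10.16.10.5/32", "10.16.10.5/32", "exact"))
print(matches("10.16.10.5/32", "10.16.10.0/24", "longer"))
```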
The policy in the slide above will advertise directly connected networks to BGP peers, but only those qualified
through prefix-list “Loop_0”. You can see the beginnings of the routing policy that will be used in AS 65540 here.
In labs, Router R5 is a service edge router that is responsible for bringing NLRI (customer networks) into BGP.
If the egress interface is not known, a second routing table lookup is performed. This additional lookup is called a
recursive lookup. This time, the lookup is not performed on the packet’s destination IP address, but on the next-
hop address. The next-hop address returned from the original packet lookup is matched to the FIB, based on
longest-match routing. If the lookup of the next-hop address returns an interface, the packet is encapsulated and
forwarded via the specified interface.
The Nokia 7750 SR uses a two-step process to resolve the directly connected next-hop address associated with a
BGP next-hop.
Every BGP route is advertised with a BGP next-hop. The BGP process determines which egress interface to use to
get to this next-hop. This is the resolved next-hop address.
The network processor on the Nokia 7750 SR is able to resolve the BGP next-hop in real time on the fast path.
The advantage of resolving the next-hop on the fast path is that updates take effect instantaneously. This offers
Nokia a significant performance advantage in next-hop resolution and convergence.
The show router route-table command output shows that the next-hop is not a direct interface, because
the route to 10.16.10.5/32 is remote and was learned via IS-IS. This means that the BGP next-hop is a remote or
indirect router. BGP must determine the physical next-hop address before installing the route for
192.168.1.8/29 in the FIB.
BGP does this by performing a FIB lookup on the BGP next-hop. The FIB shows that the next-hop for the route to
10.16.10.5/32 is 10.16.0.5. Because a direct interface was not returned, address 10.16.0.5 must be looked up in
the FIB. The FIB then shows that it is connected to a local physical interface, toR6.
This recursion process resolves the BGP next-hop of 10.16.10.5 to the physical next-hop of 10.16.0.5 (address of
the IGP neighbor on interface toR6).
Therefore, the BGP route 192.168.1.8/29 is installed in the BGP Route Table, with a next-hop of 10.16.10.5. The
route is then offered to the Route Table Manager, which installs 192.168.1.8/29 into the FIB with a next-hop of
10.16.0.5.
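The recursion above can be traced with a small Python sketch. For simplicity it keeps every route in one table, whereas the text describes BGP resolving the next-hop before the route is installed. The table layout and function names are hypothetical, and the 10.16.0.4/30 subnet for interface toR6 is an assumption.

```python
import ipaddress

FIB = {
    "192.168.1.8/29": {"next_hop": "10.16.10.5"},  # BGP route, indirect next-hop
    "10.16.10.5/32": {"next_hop": "10.16.0.5"},    # learned via IS-IS
    "10.16.0.4/30": {"interface": "toR6"},         # direct (subnet assumed)
}


def lookup(address: str):
    """Longest-match lookup of one address in the table."""
    addr = ipaddress.ip_address(address)
    nets = [ipaddress.ip_network(p) for p in FIB
            if addr in ipaddress.ip_network(p)]
    if not nets:
        return None
    return FIB[str(max(nets, key=lambda n: n.prefixlen))]


def resolve(address: str, max_depth: int = 8):
    """Follow next-hops until a local interface is found."""
    next_hop = None
    for _ in range(max_depth):
        entry = lookup(address)
        if entry is None:
            return None                      # unreachable
        if "interface" in entry:
            return (next_hop or address, entry["interface"])
        next_hop = entry["next_hop"]         # recurse on the next-hop
        address = next_hop
    return None                              # give up on deep recursion


print(resolve("192.168.1.10"))
```

The result pairs the physical next-hop with the egress interface, matching the resolution of 10.16.10.5 to 10.16.0.5 on toR6 described above.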
If there are multiple local next-hops resolved by the IGP (Equal Cost Multipath in effect), the router will load share
packets to a remote BGP neighbor, even though BGP has only one best path.
Note that the actual load sharing performed is flow-based. A specific flow is calculated by performing a hash of
source/destination IP addresses (in case of IP packets) to cause the router to forward packets out of a specific
local interface. With BGP networks, care should be taken to design the IGP so that flows take the correct paths
through the local Autonomous System. The following slides illustrate this concept.
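Flow-based load sharing can be illustrated with a deterministic hash of the source and destination addresses. The CRC32 hash, the function name, the interface name toR1, and the addresses are all assumptions for illustration; the actual hashing algorithm used by the router is not described in this text.

```python
import zlib


def pick_interface(src_ip: str, dst_ip: str, interfaces: list[str]) -> str:
    """Map a flow (source, destination) to one egress interface."""
    key = f"{src_ip}->{dst_ip}".encode()
    # The same flow always hashes to the same index, so its packets
    # stay on one path and are not reordered.
    return interfaces[zlib.crc32(key) % len(interfaces)]


links = ["toR1", "toR6"]
first = pick_interface("10.1.1.1", "192.168.1.9", links)
again = pick_interface("10.1.1.1", "192.168.1.9", links)
print(first == again)
```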
The loop-detect discard-route command discards routes that are received from a peer with the same AS
number as the router. This option prevents routes that have been looped back to the router from being
added to the RIB and consuming memory. When this option is changed, the change is not active for an
established peer until the connection is re-established. Loop detection is covered in more detail in the
following slides.
If the loop-detect discard-route was not configured, then router R2 would have the following routes in the RIB:
Since the origin, MED and AS paths are all equal, the router prefers the eBGP peer over the iBGP peer.
Recall the BGP route selection criteria. If the entry is valid and loop-free and the next-hop is reachable, prefer
the route with:
1. Higher Local Preference
2. Higher sum of aigp metric and cost, if aigp metric applies
3. Shorter AS path
4. Lower origin code
5. Lower MED
6. Route that was learned from an eBGP peer before one learned from an iBGP peer
7. Lower IGP cost to the next-hop
8. Lower next hop type
9. Lower BGP router ID
10. Shorter cluster list
11. Lower peer IP address
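The ordering of these criteria can be modeled as a comparison key. The sketch below is a simplification under stated assumptions: it covers only a subset of the steps (AIGP, next-hop type, cluster list, and peer address are omitted), it compares MED across all routes, and the Route class and peer names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Route:
    peer: str
    local_pref: int = 100
    as_path: tuple = ()
    origin: int = 0      # 0 = IGP, 1 = EGP, 2 = incomplete
    med: int = 0
    ebgp: bool = False
    igp_cost: int = 0
    router_id: int = 0

def preference_key(route):
    """Smaller tuples win; local preference is negated because higher is better."""
    return (
        -route.local_pref,
        len(route.as_path),
        route.origin,
        route.med,
        0 if route.ebgp else 1,   # eBGP-learned before iBGP-learned
        route.igp_cost,
        route.router_id,
    )

def best_route(routes):
    return min(routes, key=preference_key)

# Same origin, MED, and AS path: the eBGP-learned route is preferred.
candidates = [
    Route(peer="ibgp-peer", as_path=(65550,), ebgp=False),
    Route(peer="ebgp-peer", as_path=(65550,), ebgp=True),
]
print(best_route(candidates).peer)  # ebgp-peer
```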
Context: config>router>bgp
config>router>bgp>group
config>router>bgp>group>neighbor
Description: This command configures how the BGP peer session handles loop detection in the AS path.
The configuration parameter can be set at three levels:
• global level (applies to all peers)
• group level (applies to all peers in peer-group)
• neighbor level (only applies to specified peer).
The most specific value is used. Note that dynamic configuration changes of loop-detect are
not recognized.
The [no] form of the command at the global level reverts to the default, which is loop-detect ignore-loop.
The [no] form of the command at the group level reverts to the value defined at the global level.
The [no] form of the command at the neighbor level reverts to the value defined at the group level.
Parameters: drop-peer — Sends a notification message to the remote peer and drops the session
discard-route — Discards routes that are received from a peer with the same AS number as the
router itself. This option prevents routes that have been looped back to the router from being
added to the RIB and consuming memory. When this option is changed, the change is not active
for an established peer until the connection is reestablished.
ignore-loop — Ignores routes with loops in the AS path, but maintains peering.
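As a rough illustration of the three options, the function below checks a received AS path for the local AS number and returns a label for the resulting behavior. The function name and the returned labels are invented for this sketch and are not SR OS output.

```python
def handle_update(local_as, as_path, mode="ignore-loop"):
    """Return a label for what happens to a route received with this AS path."""
    if local_as not in as_path:
        return "accept"                    # no loop detected
    if mode == "drop-peer":
        return "notify-and-drop-session"   # NOTIFICATION sent, session dropped
    if mode == "discard-route":
        return "discard"                   # route is not kept in the RIB
    return "ignore"                        # ignore-loop: route not used, peering kept

print(handle_update(65540, (65550, 65540, 1239), mode="discard-route"))  # discard
```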
Description: This command creates an aggregate route. It is used to group a number of routes with common prefixes into a
single entry in the routing table. This reduces the number of routes advertised by the router and the number of
routes in the routing tables of downstream routers.
Both the original components and the aggregated route (source-protocol aggregate) are offered
to the RTM. Subsequent policies can be configured to assign protocol-specific (BGP, IS-IS, or OSPF)
characteristics, such as the route type or OSPF tag, to aggregate routes.
Multiple entries with the same prefix, but a different mask, can be configured; routes are aggregated to the
longest matching mask. For example, if one aggregate is configured as 10.0.0.0/16 and another as 10.0.0.0/24, route
10.0.128.0/17 would be aggregated into 10.0.0.0/16 and route 10.0.0.128/25 would be aggregated into
10.0.0.0/24. If multiple entries are made with the same prefix and mask, the previous entry is overwritten.
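The longest-mask rule can be checked with Python's standard ipaddress module. This is a sketch of the selection logic only; the function name is hypothetical.

```python
import ipaddress

aggregates = [
    ipaddress.ip_network("10.0.0.0/16"),
    ipaddress.ip_network("10.0.0.0/24"),
]

def covering_aggregate(route):
    """Return the configured aggregate with the longest mask that contains the route."""
    net = ipaddress.ip_network(route)
    matches = [agg for agg in aggregates if net.subnet_of(agg)]
    return max(matches, key=lambda agg: agg.prefixlen, default=None)

print(covering_aggregate("10.0.128.0/17"))  # 10.0.0.0/16
print(covering_aggregate("10.0.0.128/25"))  # 10.0.0.0/24
```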
Parameters: ip-prefix — The destination address of the aggregate route, in dotted-decimal notation.
mask — The mask associated with the network address, length range 0 to 32.
summary-only — This optional parameter suppresses the advertisement of more specific
component routes for the aggregate. To remove the summary-only option, enter the aggregate command
without the summary-only parameter.
as-set — This optional parameter is only applicable to BGP. It creates an aggregate in which the path that is
advertised for the route is an AS_SET, which consists of all elements contained in all paths that are being
summarized. Use this feature carefully. Aggregating several paths can result in the constant withdrawal and
insertion of AS paths as associated component routes of the aggregate that are experiencing changes.
aggregator as-number:ip-address — This optional parameter specifies the BGP aggregator path attribute
for the aggregate route. When configuring the aggregator, enter a 2-octet AS number to form
the aggregate route, followed by the IP address of the BGP system that created the aggregate route.
Note that RFC 6472, Recommendation for Not Using AS_SET and AS_CONFED_SET in BGP, recommends that operators not use
AS_SETs in aggregate routes, to simplify the design and implementation of BGP and to make the semantics of the originator of a
route clearer. Refer to the RFC for more detail.
Description: This command configures the BGP authentication key. Authentication is performed before setting
up the BGP session. It is done between neighboring routers by verifying a password.
Authentication uses the MD5 message-based digest. The authentication key can be any combination of ASCII
characters, up to 255 characters.
Parameters: authentication-key — The authentication key: any combination of ASCII characters up to 255
characters (unencrypted). If spaces are used in the string, the entire string is enclosed in double
quotation marks.
hash-key — The hash key: any combination of ASCII characters up to 342 characters
(encrypted). If spaces are used in the string, the entire string is enclosed in double quotation
marks. This is useful when a user must configure the parameter, but for security purposes, the
actual unencrypted key value is not provided.
hash — Specifies that the key is entered in an encrypted form. If the hash parameter is not
used, the key is assumed to be in a non-encrypted, clear-text form. For security, all keys stored
in the configuration file are encrypted, with the hash parameter specified.
hash2 — Specifies that the key is entered in a more complex encrypted form. If the hash2
parameter is not used, the less-encrypted hash form is assumed.
Description: This command triggers route policy re-evaluation. By default, when a change is made to a
policy in the config router policy options context and is then committed, it is
effective immediately. However, there may be circumstances in which the changes should or
must be delayed. For example, if a policy change that would affect every BGP peer on the 7750
SR is implemented, the consequences could be dramatic. It is more effective to control changes on a peer-by-
peer basis.
If the triggered-policy command is enabled, a given peer is established, and you want that
peer to remain up for a change to a route policy to take effect, you must use a clear
command with the soft or soft-inbound option. In other words, when triggered-policy is
enabled, a routing policy change or policy-assignment change in the protocol will not take
effect until the protocol is reset, or a clear command is issued to re-evaluate route policies,
for example, clear router bgp neighbor x.x.x.x soft. This keeps the peer up, and
the change made to a route policy is applied only to that peer or group of peers.
Parameters: ip-addr — Resets the BGP neighbor with the specified IP address
as as-number — Resets all BGP neighbors with the specified peer AS number
external — Resets all eBGP neighbors
all — Resets all BGP neighbors
soft — The specified BGP neighbors reevaluate all routes in the Local-RIB against the
configured export policies.
soft-inbound — The specified BGP neighbors reevaluate all routes in the RIB-In against the
configured import policies.
statistics — The BGP neighbor statistics
Syntax: protocol
Context: clear>router>bgp
IP version 6 (IPv6) is a new version of the Internet Protocol, designed to be the successor to IP version 4 (IPv4).
Deploying IPv6 is the solution to the IPv4 address shortage. IPv6 is endorsed and implemented by all Internet
technical standards bodies and network equipment vendors. It encompasses many design improvements,
including the replacement of the 32-bit IPv4 address format with a 128-bit address for a capacity of about
3.4×10^38 addresses. IPv6 has been actively deployed since June 2006.
There are three types of IPv6 addresses: Unicast, Anycast, and Multicast.
A Unicast address identifies a single interface. A packet destined for a Unicast address is delivered to the
interface identified by that Unicast address.
An Anycast address identifies a set of interfaces. A packet destined for an Anycast address is delivered to
the nearest interface identified by that Anycast address.
A Multicast address identifies a set of interfaces. A packet destined for a Multicast address is delivered to
all the interfaces identified by that Multicast address. There are no broadcast addresses in IPv6.
Since the IPv6 address is 128 bits, there are a number of conventions used to shorten them as much as possible.
1. Addresses are written in groups of four hex digits, separated by a single colon. For example,
2001:0db8:0000:0000:0021:0000:4ab9:0300.
2. One or more consecutive groups of zeroes can be replaced by two colons. The address above becomes
2001:0db8::0021:0000:4ab9:0300.
3. Only one run of zero groups can be replaced with double colons; otherwise it would not be possible to tell where the
zeroes are located. However, leading zeroes in a group can also be omitted. The address above becomes
2001:db8::21:0:4ab9:300.
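These shortening conventions can be verified with Python's standard ipaddress module, which parses any of the written forms to the same 128-bit value. A short check:

```python
import ipaddress

addr = ipaddress.IPv6Address("2001:0db8:0000:0000:0021:0000:4ab9:0300")

print(addr.exploded)    # 2001:0db8:0000:0000:0021:0000:4ab9:0300
print(addr.compressed)  # 2001:db8::21:0:4ab9:300

# The partially shortened form is the same address.
print(addr == ipaddress.IPv6Address("2001:0db8::0021:0000:4ab9:0300"))  # True
```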
Globally routed IPv6 addresses are allocated from the address space 2000::/3. An ISP is typically allocated a network
assignment of /32 or larger. (A larger assignment means the prefix length will be shorter, such as /31 or /30, and hence
will cover a larger network range.) An individual enterprise typically receives an assignment of /48 or larger. Since
an assignment of /48 has 16 bits available for the local subnet, this provides 65,536 individual subnets.
The interface ID portion of the address is assigned locally, but can be automatically derived from the 48 bit MAC
address. It may also be assigned by a DHCPv6 server, through an auto-discovery mechanism, or assigned
manually.
To derive an IPv6 interface ID from the MAC address, create a modified EUI-64 (Extended Unique Identifier-64).
To do this, flip the seventh most significant bit of the OUI (Organizationally Unique Identifier) and insert the hex
string ff:fe between the 3 bytes of the OUI and the 3 bytes of the NIC-specific component.
For example, assume an organization is assigned the prefix 2001:db8::/48. The organization has 16 bits for
subnetting. Perhaps they have 30 locations and decide to assign the first 8 bits based on the location, and the
next 8 based on the subnet at that location. Subnet 10 at location 3 gives a subnet value of 030a, for a routing
prefix of 2001:db8:0:30a::/64. With the modified EUI-64 assignment, the host with MAC address
00:16:4d:13:5c:ae has an interface ID of 0216:4dff:fe13:5cae. The resulting IPv6 address is
2001:db8::30a:216:4dff:fe13:5cae.
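The modified EUI-64 derivation in this example can be reproduced with a short Python sketch. The function names are hypothetical; the prefix and MAC address are the ones used in the text.

```python
import ipaddress

def eui64_interface_id(mac):
    """Build the 8-byte modified EUI-64 interface ID from a 48-bit MAC address."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    octets[0] ^= 0x02  # flip the seventh most significant bit of the OUI
    return bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])

def build_address(prefix, mac):
    """Combine a /64 routing prefix with the interface ID derived from the MAC."""
    network = ipaddress.IPv6Network(prefix)
    iid = int.from_bytes(eui64_interface_id(mac), "big")
    return ipaddress.IPv6Address(int(network.network_address) | iid)

print(eui64_interface_id("00:16:4d:13:5c:ae").hex())              # 02164dfffe135cae
print(build_address("2001:db8:0:30a::/64", "00:16:4d:13:5c:ae"))  # 2001:db8:0:30a:216:4dff:fe13:5cae
```

Python prints the single zero group as 0 rather than shortening it to a double colon; both spellings denote the same address.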
::/128 is the unspecified host address (all zeroes). This address may be used until an address is assigned
to the device.
::1/128 is the loopback address (all zeroes except the last bit). This corresponds to the address 127.0.0.1
in IPv4.
::/0 is the default unicast route (the same as 0.0.0.0/0 in IPv4).
fe80::/64 is the prefix for the link-local address (binary 1111111010 followed by 54 zeroes). IPv6
requires that every IPv6 interface have a link-local address. This is not a valid routing prefix and is only
used for communications on the local link.
Typically, the link-local interface ID is assigned the same value as the global interface ID, which means using the
modified EUI-64 address. For the global address 2001:db8:0:30a:216:4dff:fe13:5cae, the link-local address
would be fe80::216:4dff:fe13:5cae.
fc00::/7 defines a range known as Unique Local Addresses (ULA, RFC 4193). These are addresses
intended to be used on a private network and not routed on the global Internet (similar to private
addresses in IPv4). The ULA range is split into two ranges, depending on the value of the eighth bit.
fd00::/8 is intended to be used as a 48-bit prefix with the remaining 40 bits self-assigned using a
pseudo-random generator. This means that even though addresses are self-assigned, the probability of
two networks sharing the same prefix is very small. This is intended to make it easier to interconnect
privately-addressed networks.
fc00::/8 addresses are intended to have the remaining 40 bits allocated by a registrar to provide globally
unique private addresses, although the mechanism is not yet defined (draft-hain-ipv6-ulac-02) at the
time of writing.
::ffff:0:0/96 is a prefix for IPv4-mapped IPv6 addresses. This provides an IPv6 address space that can be
used by native IPv4 applications. It is acceptable to use the standard IPv4 notation for the low order 32
bits of the address. For example, 192.168.0.1 is mapped to the IPv6 address ::ffff:192.168.0.1.
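As a quick check of the IPv4-mapped format, Python's standard ipaddress module accepts the mixed notation and exposes the embedded IPv4 address:

```python
import ipaddress

mapped = ipaddress.IPv6Address("::ffff:192.168.0.1")
mapped_range = ipaddress.IPv6Network("::ffff:0:0/96")

print(mapped in mapped_range)  # True
print(mapped.ipv4_mapped)      # 192.168.0.1

# The same address built from the 96-bit prefix plus the 32-bit IPv4 value.
built = ipaddress.IPv6Address(
    int(mapped_range.network_address) | int(ipaddress.IPv4Address("192.168.0.1"))
)
print(built == mapped)         # True
```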
To use IPv6 on the Nokia 7750 SR, you must first enable chassis mode “c” or higher. IPv6 is only supported on
IOM2s or newer. As soon as we enable IPv6 on the interface, a link-local address is automatically assigned based
on the modified EUI-64 address. If it is not necessary to route to the interfaces, we do not need to assign them
global routing addresses; we can simply use the link-local addresses.
IPv6 defines state for prefixes. They can be tentative, preferred, duplicated or deprecated. An address should be
preferred, which means that it can be used without restrictions. An IPv6 interface performs duplicate address
detection and the state of the prefix is Tentative until the address has been confirmed as unique.
Multiprotocol BGP extensions support the advertisement of IPv6 prefixes over the BGP sessions established
between two BGP speakers using either the IPv4 or the IPv6 address. Similar to IPv4 networks, IPv6 networks
should also be injected into BGP for a BGP speaker to advertise the network to its peers.
The BGP Open message contains a field for the router ID. This field is 4 bytes long. There is no particular
requirement that this address be reachable or even an actual IPv4 address, only that it be a unique 32-bit
number.
The router generates the router ID automatically based on IPv4 addressing configured on the router: the address
configured on the loopback interface or, if there is no loopback, the highest IPv4 address on any of the
interfaces.
In a pure IPv6 deployment, no IPv4 addressing is configured. This provides nothing for the router to use to build a
router ID. In this case, the router ID must be manually configured under the BGP process. If there is no router ID,
BGP sessions do not form.
The other component of BGP that requires a unique 4-byte number is the cluster ID, used on route reflectors.
The cluster ID is carried with the NLRI in the BGP UPDATE messages. If a router ID is configured, this value is used
for the cluster ID. The cluster ID can also be configured independent of the router ID. The originator ID attribute
is also a 4-byte value that is used with route reflection. The manual configuration of a 4-byte router ID provides
the value for the originator ID.
IPv6 needs to be enabled for the IGP used to support BGP for IPv6.
When configuring an eBGP session, either a link-local or a global IPv6 address can be used. When a link-local address is used,
next-hop-self is not required because the BGP speaker that advertises a route to its internal peer automatically
changes the next-hop address from the link-local to the global address. When a global address is used to
configure the eBGP session, next-hop-self is required, as in the IPv4 configuration.
Notice that we still have the IPv4 configuration, therefore we do not need to configure the BGP router ID. The
router generates the router ID automatically based on IPv4 addressing configured on the router.
Note that IPv6 needs to be enabled for the IGP within each AS. In this case IPv6 for ISIS is enabled.
Notice also that next-hop-self is not required when link local addresses are used for the eBGP sessions.
Modified Attributes
Network : 2001:DB8:A:301::1/128
Nexthop : 2001:DB8:13::3
Path Id : None
From : 2001:DB8:13::3
Res. Nexthop : 2001:DB8:13::3
Local Pref. : None Interface Name : toR3
<output omitted>
Flags : Used Valid Best IGP
Route Source : External
AS-Path : 65550
-------------------------------------------------------------------------------
Routes : 1
Notice that the -interface suffix is required when configuring a BGP neighbor using a link-local address.
In this case next-hop-self is not required on iBGP sessions of routers R1 and R2.
3. What does BGP send from its Loc-RIB to the BGP export policy process?
BGP sends its best used routes as input to the export policy process (if any). If no export policy exists, the
routes are simply installed into RIB-OUT and sent to both internal and external peers.
4. Assuming a single prefix is the longest match and has the highest local preference, what must also be true for
the route to be installed into Loc-RIB?
The route must be valid and must not have any AS-Path loops present.
6. What guarantees the most traffic control with BGP? What is the danger associated with this activity?
Announcing routes with longer prefix sizes. This is not necessarily the best method because it increases the
BGP table size and some ISPs will not accept longer prefixes.
8. If no export policy is specified, which routes are sent to external BGP peers?
BGP routes are advertised; IGP routes are not advertised.
9. What does the RTM send to the Routing Table and Forwarding Information Base (FIB) on the CPM?
The RTM sends its best used routes (from BGP and IGP) to the routing table and FIB.
12. Where is the FIB distributed on the Service Router platform, and which command allows you to examine
the FIB?
The FIB is distributed by the CPM to each of the installed IOMs. The contents of the distributed FIBs can be
examined with the show router fib <slot> command.
14. Which variant of the show router bgp <prefix> command is used to display the contents of the
various BGP databases and the results of the IGP lookup to the next-hop?
The show router bgp route <prefix> hunt command enables the examination of BGP databases
and the results of an IGP next-hop resolution.
18. What are some of the main configuration tasks associated with bringing local networks into BGP?
Prefix-lists associated with the AS’s address plan are configured, then a policy that accepts directly
connected or static routes is created and applied to BGP, using the export command.
19. Why is the longer keyword used with prefix-lists suitable for the AS’s CIDR space?
Because the longer keyword includes all subnets possible for any given CIDR prefix. Therefore, if
the AS/ISP assigns smaller address spaces to specific customers out of that CIDR prefix, the CIDR
prefix-lists will match any of these assignments.
True
21. The next-hop-self command is not required when eBGP sessions are configured using link-local addresses. True or
False?
True
West Region – mostly associated with edge router R4 and core router R1; serves a larger Enterprise presence.
Central Region – smaller region with edge router R5 and core router R3.
East Region – larger region servicing DSL and GPON customers, in addition to Enterprises. Associated with edge
router R6 and core router R2.
AS 65540’s Core – High speed links between routers R1, R2 and R3. Notice that these routers have no directly-
connected customer networks.
Knowing this, you can form policies that best serve each region.
Much of the routing policy formed will depend on this basic information. The BGP application will allow you to classify
each prefix or customer or region, and communicate this information across the entire AS. No other routing protocol
allows for this sort of arbitrary and flexible policy specification. Most ISPs use a best practice that tries to make traffic
flow into and out of a region without traversing other regions; for example, the ISP will create policy on router R1 that
will attempt to make the upstream peers send traffic to the West region, and perhaps the Central region, and may
additionally try to influence the upstream AS to send traffic associated with the East region to another entry point, like
router R2.
Keep the address spaces separate. Assign prefixes and networks so that they are easily known as internal networks
to the rest of the AS. That is, do not mix customer assigned networks with the internal networks used to perform
internal routing by the IGP.
Customer or service assigned address space goes into BGP and becomes BGP NLRI for the AS.
In most cases, the networks associated with the external links of eBGP peers are not exported into BGP and,
therefore, do not become BGP NLRI. This is particularly true if you use next-hop-self.
Use the next-hop-self command to force BGP to use the resources provided by the IGP. Essentially, you are
taking advantage of any and all IGP design features by having internal BGP peers use the IGP to route between
system addresses exclusively.
A sound address plan, with defined address space for internal and external networks, and a good aggregation plan,
help to make configuration, troubleshooting, and administration easier.
Sound address planning, combined with features such as prefix-lists and unicast Reverse Path Forwarding (uRPF),
prevents much miscreant activity on the Internet. uRPF is a data path feature that will not allow a packet to be received
if the source IP of the packet is not consistent with the routing table. ISPs typically implement both prefix-lists and
uRPF.
Bogus networks are sometimes referred to as Bogons, which actually refers to Martians and un-allocated, un-assigned
address space. Most ISPs will maintain lists of standard prefixes that would never be allowed into or out of the AS,
including Martians: RFC 1918 and RFC 5735 special-use addresses. Other lists that have been identified in the Internet
community as associated with miscreant activity typically use address space that has not been assigned yet. There is a
multi-hop eBGP peering service available that allows an ISP to maintain an up-to-date list of bogus BGP prefixes that
should always be prevented from entering or leaving an AS. See http://www.team-cymru.org/Services/Bogons/ for more
information on bogons.
Determine the core links and adjust metrics so that the IGP prefers to route packets crossing the AS through the core
(and not edge-to-edge).
Once routing is optimized, you can make BGP automatically set the egress MED values to each iBGP prefix’s IGP cost
(MED-OUT command). As a result, prefixes announced and originated from one region will have lower MED values than
the same prefixes announced by other regions in the same AS. For example, if router R4 originates a prefix and
advertises it to both routers R1 and R2, and the MED-OUT feature is enabled, router R1’s announcement to its external
peer in AS 1239 will contain a lower MED value than the same prefix announced by router R2. All other factors being
equal, AS 1239 will choose router R1 as an entry point to AS 65540 for router R4’s prefixes, if MEDs are used as the tie
breaker. This is an example of geo-routing policy.
As much as possible, the IGP infrastructure should be stable. If, for instance, an IGP network flaps, a neighbor or
neighbors are lost, or a router fails, the IGP must detect the failure and generate an update to its peers for route
recalculation to take place on all affected devices. If instability is restricted to the IGP, the impact is contained within the
AS.
Prior to implementing any policy, however, it is important to understand the policy definition and what it means. First
consider the language of the policy and whether there are any points that require clarification; do this before
attempting to design or implement the new policy.
Then, ensure that you understand the current conditions and logic before attempting to change a particular behavior.
For example, before you use a policy to modify BGP route selection so that it chooses a different route as best, it is
important to recognize why the current route is perceived as best.
Also, identify whether it is the control plane or data plane that is an issue. Recall that data plane traffic is generally
modified by control plane manipulation, but that they are inverse to each other. Therefore, modifying outbound routing
updates affects inbound traffic flow, and modifying inbound routing updates affects outbound traffic flow.
Planning also involves carefully considering the impact of the new policy on existing traffic flows. Often, the new policy
must be integrated into the existing configuration, so scalability of policy design becomes critical. Ensure that you
understand the existing traffic flow and that you test the new configuration thoroughly before committing any changes.
Core activities:
From a BGP standpoint, minimal activity required.
Core needs to move traffic according to IGP and Traffic Engineering design.
Border activities:
Safely advertise and receive appropriate networks to/from peers.
Analyze path attributes set by the service edges to implement export policies.
Implement policies that adhere to business and traffic flow goals for the AS.
Most of the export policy is based on known entities and objects that the AS itself controls; therefore, the AS is able to
exert more precise control on how upstream providers and other BGP peers treat both its own and the customer
networks.
Some customers will have very specific requirements regarding how CIDR and customer-owned networks are advertised
to the rest of the Internet. Service level agreements may also point out service up-time requirements, maximum
latencies, and other performance-related service levels, including response time by operations staff.
NOTE: Some enterprises are so sophisticated that they can measure network performance based on other, seemingly
unrelated, business activities, such as credit card transaction rates and time sensitive financial trading applications. ISPs
should expect customer support calls when these types of activities are impeded or slowed down.
Sending longer prefixes is one way to guarantee that certain traffic will always take a specific entry point into the AS.
However, most larger Tier 1s and Tier 2s will also impose minimum prefix length policies. The Internet community is
usually quick to notice an ISP that tries to advertise very small address blocks (prefixes longer than /24, for example). Recall that many
legacy /24 prefixes were assigned to Enterprises and organizations before the introduction of CIDR or SWIP, used by the
RIRs. Basically, this means that there are still quite a few /24 length prefixes in the Internet table, but these should
mostly be legacy prefixes. Many of the newer prefixes that were assigned by ISPs (using SWIP processes described in RFC
2050) are part of larger aggregates and, therefore, are harder (in theory) to break up; these would draw more attention
from the Internet community if they were broken up.
IPv4 and MPLS traffic data flows through a router. Cflowd enables traffic sampling and analysis by ISPs and allows
network engineers to support capacity planning, trends analysis, and characterization of workloads in a network service
provider environment.
Notice that if more than one item is specified, a logical AND is used.
Also, if no default action is specified (in any given policy statement), the next policy statement specified in the export or
import command will also be processed.
If there are no further policy statements, and no default action is specified, the default BGP behavior ― to either accept
all BGP routes into the RTM for consideration, according to the route-selection criteria or announce all used BGP learned
routes to other BGP peers ― is used.
1. If a given route does not fully match the criteria of a policy entry, the defined actions for that entry are not applied,
and the policy evaluation proceeds to the next entry in the current policy statement, or to the next defined policy
statement.
2. When a given route fully matches the match criteria of a policy entry:
If reject is specified, the given route is not modified, policy evaluation is ended, and the routing protocol is
signaled to not accept the route on import or to block the route from being announced on export.
If accept is specified, the given route is modified, based on the action items in the action context, policy
evaluation is ended, and the routing protocol is signaled to accept the route on import, or to announce the
route on export.
If next-entry is specified, the given route is modified, based on the action items in the action context, and
policy evaluation continues with the next entry in the same policy. If the current entry is the last in the policy,
evaluation continues with the first entry in the next policy statement, or, if no remaining policy statements are
defined, route policy evaluation ends.
The use of next-entry and next-policy statements ensures sequential processing, regardless of matches found.
Notice that you will obtain similar results if the action of entry 20 is next-entry.
If a given route reaches the end of the defined policy statements without an explicit accept or reject action specified,
the default route policy action for the calling protocol is used. However, if previous matches to defined route policies
resulted in modifications to the route attributes, these changes are kept and passed to the calling protocol.
If one or more matches occur in the policy entries, the default action is not used, whether or not it is defined.
Each route-policy statement can have a default-action clause defined. If a default-action is defined for one or more of
the configured route policies, it is handled in the following ways:
If the action is “accept” or “reject”, policy evaluation ends and the appropriate result is returned.
If no match occurs in the policy entries, and a default action is defined, the default-action clause is used.
NOTE: Take care when specifying a default-action of reject, as all policy processing stops at this point. This means
that, if a particular route did not match any of the preceding policy entries, no further policies specified will be
processed, and the route itself will be rejected.
If no match occurs in the policy entries, and a default action is not defined, the default action of the protocol
occurs.
If, in a given policy-statement, a specific prefix does not match any entry that makes up the policy statement, one of
the following actions will result:
1. If there is a default action of “accept” or “reject” in the policy, that action will be taken, the policy processing
will stop, and no further policies will be processed.
2. If there is a default action of “next-policy” or “next-entry” in the policy, the next entry or policy specified will
be processed.
4. If there are no remaining policies, no match has occurred, and no default action has resulted, the default BGP
behaviors ― to send or receive routes from neighbors ― are used.
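The evaluation rules described above can be summarized in a small model. This is a sketch under assumptions, not SR OS syntax or behavior in every detail: policies are plain dictionaries, the match functions and the local_pref attribute are hypothetical, and the protocol default is taken to be accept.

```python
def evaluate(route, policies, protocol_default="accept"):
    """Run a route through a chain of policies and return (result, route)."""
    route = dict(route)
    for policy in policies:
        matched = False
        for entry in policy["entries"]:
            if not entry["match"](route):
                continue                         # no match: try the next entry
            matched = True
            if entry["action"] == "reject":
                return "reject", route           # route is not modified
            route.update(entry.get("set", {}))   # apply the action items
            if entry["action"] == "accept":
                return "accept", route
            if entry["action"] == "next-policy":
                break                            # move on to the next policy
            # "next-entry": continue with the following entry
        default = policy.get("default_action")
        if not matched and default in ("accept", "reject"):
            return default, route                # default action ends evaluation
    return protocol_default, route               # earlier modifications are kept

policies = [{
    "entries": [
        {"match": lambda r: r["prefix"].startswith("10.17."),
         "action": "next-entry", "set": {"local_pref": 200}},
        {"match": lambda r: r["prefix"] == "10.17.100.0/24", "action": "accept"},
    ],
    "default_action": "reject",
}]

print(evaluate({"prefix": "10.17.100.0/24"}, policies))  # accepted, local_pref 200
print(evaluate({"prefix": "192.168.1.8/29"}, policies))  # rejected by the default action
print(evaluate({"prefix": "10.17.200.0/24"}, policies))  # protocol default, local_pref kept
```

The third call shows the case where an entry matched with next-entry but nothing accepted or rejected the route: the policy's default action is skipped, and the protocol default applies to the already modified route.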
Note: SR OS Release 10.0.R4 increased the number of policies that may be applied to BGP (group or neighbor) as well as
VRF import or export statements from five (5) to fifteen (15).
Keep address spaces separate. Assign prefixes and networks so that they are easily recognizable as internal networks to
the rest of the AS. That is, do not mix customer assigned networks with the internal networks used to perform internal
routing by the IGP.
Customer or service assigned address space goes into BGP and becomes BGP NLRI for the AS.
External links with other customers or BGP peers are not exported into BGP and, in most cases, do not become BGP
NLRI. In any event, it is unnecessary for external link subnets to become NLRI if you stamp all prefixes learned with
next-hop-self.
The next-hop-self command compels BGP to use the resources provided by the IGP. This will, essentially, take
advantage of any and all IGP design features by having internal BGP peers use the IGP to route between system
addresses exclusively.
A sound address plan, with defined address space for internal and external networks; a good aggregation plan; and
subnets allocated for functions such as the router ID help to make configuration, troubleshooting, and administration
easier.
The prefix-list filter is most useful when the prefixes are known and are not likely to change.
In the above diagram, an export policy is configured on router R5 to advertise prefix 10.17.100.0/24.
NOTE: We use the 10.0.0.0/8 prefix during the labs in this course, so we do not filter all RFC 1918 space in our practice
labs. However, that would be the standard practice in global Internet policies.
In the example above, AS 65550 would accept anything shorter than or equal to length /24. Anything longer, such as
the /29s on router R5, would not match and, therefore, would not appear in AS 65550’s network.
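The length check described above can be sketched in Python. This is only an illustration of the logic, not SR OS policy syntax; the function name accepts_prefix and the max_len parameter are assumptions made for the example.

```python
import ipaddress

def accepts_prefix(prefix: str, max_len: int = 24) -> bool:
    """Return True if the prefix length is max_len or shorter."""
    network = ipaddress.ip_network(prefix, strict=True)
    return network.prefixlen <= max_len

# A /24 passes the filter; a /29 is longer than /24 and is filtered out.
print(accepts_prefix("10.17.100.0/24"))
print(accepts_prefix("10.17.100.8/29"))
```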
Many ISPs implement policies associated with prefix lengths. Most of the time, an ISP will insist on prefixes of length /24
or shorter but, depending on the transit or peering agreements, it can specify a much shorter /18-/22 range. This is
because higher tier upstream providers insist that downstream networks aggregate as much as possible, especially for
more recently assigned (within the last 15 years) CIDR prefixes.
These policies exist also because some prefixes are non-portable, that is, any given customer of an upstream provider
using that provider’s own address space cannot use another foreign provider to announce the same prefix. If a
customer were to announce a prefix that was a smaller chunk of a much larger prefix to a foreign provider, the original
upstream provider would see its own address space fragmented from many different ISPs. This is clearly an
unacceptable situation for the ISP that owns the address block. Also, the longest prefix match rule always applies, so the
foreign provider advertising the longer prefix will always be the preferred access point for the entire Internet. Such a
scenario raises issues of ownership and acceptable use.
Notice that the default action of this policy is “reject.” This means that, unless a route is accepted by the preceding
parts of the policy, no further policies will apply and the route will be rejected in all cases.
When multiple policy names are specified, the policies are evaluated in the order in which they were specified, and the
first policy that matches is applied. A maximum of five policy names can be configured (fifteen as of SR OS Release 10.0.R4).
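The evaluation order can be sketched as a simple loop. This is a rough model only: the names evaluate, reject_long_prefixes, and accept_tagged are hypothetical, and each policy is reduced to a function that returns an action on a match or None when nothing matched.

```python
from typing import Callable, List, Optional

# A policy returns "accept" or "reject" on a match, or None when nothing matched.
Policy = Callable[[dict], Optional[str]]

def evaluate(route: dict, policies: List[Policy], protocol_default: str) -> str:
    """Apply policies in configured order; the first match decides."""
    for policy in policies:
        action = policy(route)
        if action is not None:
            return action
    # No policy matched: fall back to the default behavior of the protocol.
    return protocol_default

def reject_long_prefixes(route: dict) -> Optional[str]:
    return "reject" if route["prefix_len"] > 24 else None

def accept_tagged(route: dict) -> Optional[str]:
    return "accept" if "65200:100" in route["communities"] else None

chain = [reject_long_prefixes, accept_tagged]
print(evaluate({"prefix_len": 29, "communities": ["65200:100"]}, chain, "protocol-default"))
print(evaluate({"prefix_len": 24, "communities": []}, chain, "protocol-default"))
```

In the first call, the first policy matches and the second is never consulted, even though it would also have matched.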
By default, all routes are members of the Internet community; this requires no configuration. All other communities are
explicitly configured.
A prefix may have one or more community attributes appended to it by a BGP router.
An individual community value is a 32-bit number, commonly expressed as two 16-bit numbers separated by a colon,
for example, 65200:12345. Generally, the high-order 16-bit number is the local AS number, or, with the peer’s consent,
the neighbor’s AS number may be used for policy relating to that peer’s AS. The low-order 16-bit number is usually
locally defined and administered to maintain one unique value per policy definition.
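The relationship between the 32-bit value and the colon notation can be sketched as follows. The helper names encode_community and decode_community are assumptions made for this example.

```python
def encode_community(text: str) -> int:
    """Pack 'high:low' (two 16-bit numbers) into one 32-bit value."""
    high, low = (int(part) for part in text.split(":"))
    if not (0 <= high <= 0xFFFF and 0 <= low <= 0xFFFF):
        raise ValueError("each half must fit in 16 bits")
    return (high << 16) | low

def decode_community(value: int) -> str:
    """Unpack a 32-bit community value into 'high:low' notation."""
    return f"{value >> 16}:{value & 0xFFFF}"

value = encode_community("65200:12345")
print(value, decode_community(value))
```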
The specific values have no fixed meaning, except for well-known community values. Implementing the policy defines
the action related to a particular community value.
The community attribute is more appropriately called a community-list attribute because there may be none, one,
or many community values associated with a particular route.
Description: This command creates a route-policy community list to use in route-policy entries. Up to 15
community IDs can be specified.
The no form of the command deletes the community list or the provided community ID.
Parameters: name — Community-list name. It can be any string up to 32 characters, comprising printable, 7-bit
ASCII characters, and excluding double quotation marks. If the string contains spaces, double
quotation marks delimit the start and end of the string.
type {target | origin} :as-num:comm.-value — The keyword target or origin denotes the community
as an extended community of type route target or route origin. The as-num and comm.-value variables allow
the same values as described above for regular community values.
NOTE: Well-known communities can also be set manually. That is, you can create a community called “NO-PEER” and set
its value to “65535:65284”.
Note that without any other policy applied, router R1 would pass on these same communities to AS 65550. AS 65550 is
under no obligation to do anything with them, however, and will simply ignore them in most cases. The details of setting
community values for certain prefixes are shown on the following slides.
Up to 15 policies may be applied at the same time, for both import and export. Two architectural approaches to
applying additional policies are possible: the new policy requirement may be integrated into the existing policy
statement as a new entry or an additional policy statement may be applied.
A:R5>config>router>policy-options# info
----------------------------------------------
            prefix-list "Client-CIDR"
                prefix 10.17.0.0/16 longer
                prefix 10.18.0.0/16 longer
                prefix 10.19.0.0/16 longer
                prefix 10.20.0.0/14 longer
                prefix 10.24.0.0/13 longer
            exit
Parameters: standard — Specifies standard communities that existed before VPRNs or RFC 2547bis
extended — Specifies BGP communities that were extended after the concepts of RFC 2547 were
introduced, to include handling of the VRF target.
BGP now supports the ability to enable or disable sending regular or extended BGP communities to an associated peer
at the global, group, or neighbor level, for the base router and VPRN BGP instances. This feature overrides communities
that are already associated with a given route or that may have been added using an export route policy. In other words,
even if the export policy leaves BGP communities attached to a given route, if this feature is enabled, no BGP
communities are advertised to the associated BGP peers.
A policy (advertise aggregate) is configured on router R1 to send the aggregate route to AS 65540. If the policy were not
applied, the aggregate route 10.16.0.0/12 would not be advertised; it would still appear as a black-hole route in the routing
table.
Router R7 needs to be aware of the fact that the actual path to destinations, as specified in the NLRI of the route, while
having the loop-free property, may not be the path specified in the AS_PATH attribute of the route.
As a string, the character sequence of the AS-Path may now be matched with logical functions.
AS-Paths appear in a variety of formats. A prefix that has not propagated outside of the originating AS has a null
AS-Path (an AS-Path of zero length). After it has propagated outside the originating AS, the AS-Path contains at least
one AS number, and possibly many numbers in sequence, as the prefix propagates across multiple ASes.
Inside a confederation, the sequence of AS-Paths may also contain entries in parentheses, for the AS members of
the confederation.
A range term, composed of two elementary terms separated by the “-” character, such as “65200-65300”
A regular expression enclosed in square brackets, used to specify a set of choices of elementary or range terms.
For example, “[65100-65300 65400]” matches any AS number between 65100 and 65300, or AS number 65400.
A regular expression enclosed in parentheses “( )” provides a logical grouping of terms and should not be
interpreted as a confederation path. “(65000|65100)” matches AS number 65000 or 65100.
The “.” dot wildcard character is a match for any elementary term. “(65000|65100).” matches AS number 65000 or
65100, followed by any other AS number.
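The elementary, range, bracketed-set, and wildcard terms above can be illustrated with a small Python sketch that tests a single AS number against a single term. It is not the SR OS regular-expression engine: the function name term_matches is hypothetical, ranges are assumed to be inclusive, and parenthesized groups and the "|" operator are deliberately left out.

```python
def term_matches(term: str, asn: int) -> bool:
    """Match one AS number against an elementary, range, set, or '.' term."""
    term = term.strip()
    if term == ".":
        return True  # wildcard: any AS number
    if term.startswith("[") and term.endswith("]"):
        # Set of choices: any inner elementary or range term may match.
        return any(term_matches(inner, asn) for inner in term[1:-1].split())
    if "-" in term:
        low, high = (int(bound) for bound in term.split("-"))
        return low <= asn <= high  # range term, assumed inclusive
    return int(term) == asn  # elementary term

print(term_matches("[65100-65300 65400]", 65200))
print(term_matches("[65100-65300 65400]", 65350))
```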
The left column lists the AS-Path that is to be matched, and the right column lists a regular expression that may be used
for matching.
The nature of regular expression terms and operators is such that more than one regular expression is often possible to
match a particular AS-Path sequence.
For example, an AS-Path of 65100 65250 or 65100 65300 can be matched by “(65100 65250)|(65100 65300)”.
An AS-Path-based filter is most useful when the policy is specific to an AS, as opposed to specific prefixes.
as-path-prepend
Context config>router>policy-options>policy-statement>default-action
config>router>policy-options>policy-statement>entry>action
Description The command prepends a BGP AS number once or numerous times to the AS-Path attribute of
routes matching the route policy statement entry.
If an AS number is not configured, the AS-Path is not changed.
If the optional number is specified, then the AS number is prepended as many times as indicated by
the number.
The no form of the command disables the AS-Path prepend action from the route policy entry.
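The effect of the prepend action on an AS-Path can be sketched in Python. The function as_path_prepend and its arguments are illustrative only; they model the behavior described above, not the router implementation.

```python
from typing import List, Optional

def as_path_prepend(as_path: List[int], asn: Optional[int] = None, count: int = 1) -> List[int]:
    """Return a new AS-Path with asn prepended count times."""
    if asn is None:
        return list(as_path)  # no AS number configured: path is unchanged
    return [asn] * count + list(as_path)

# Prepending 65540 three times makes the path four hops long.
print(as_path_prepend([65540], 65540, 3))
```

A longer AS-Path makes the route less attractive to neighbors that compare path lengths, which is why prepending is used to steer inbound traffic away from a link.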
There are two entries for the prefix originated in AS 65540: one learned directly from the neighbor router R1, and the
other via iBGP from router R4. Note that the next-hops are different.
The current best path is via the eBGP link, from router R3 to router R1.
Description: This command creates a route-policy AS-Path regular-expression statement to use in route-policy
entries.
The no form of the command deletes the AS-Path regular-expression statement.
Parameters: name — The AS-Path regular-expression name. It can be any string up to 32 characters, comprising
printable, 7-bit ASCII characters, and excluding double quotation marks. If the string contains
spaces, double quotation marks delimit the start and end of the string.
reg-exp — The AS-Path regular expression. It can be any string up to 256 characters, comprising
printable, 7-bit ASCII characters, and excluding double quotation marks. If the string contains
spaces, double quotation marks delimit the start and end of the string.
Verify the presence of an AS-Path list with the show router policy as-path command. All AS-Path lists that are
configured on the router are summarized and displayed.
To view the logic details for a specified list, use the list name as a command-line parameter. The specified list name and
its contents are displayed.
The optional nature of MED means that it does not have to be a supported attribute. Non-transitive means that the
attribute does not propagate outside of the receiving AS.
If it is received over external links, the MED attribute may be propagated over internal links to other BGP speakers in the
same AS. If it is received over internal links, the MED attribute is never propagated to other BGP speakers in a
neighboring AS.
As a result of this behavior, MED must be configured on the edge routers of the domain and propagated externally.
Unless the MED attribute is explicitly set by some mechanism, it is not propagated to neighbors.
By default, the Multi-Exit Discriminator (MED) path attribute is used in the decision process only if both routes in the
comparison come from the same neighbor AS. However, this rule can be modified using the “best-path-selection
always-compare-med” command.
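The comparison rule can be sketched as follows. This is a simplified model: the Route class and prefer_by_med function are hypothetical, the neighbor AS is taken to be the leftmost AS in the path, the lower MED is preferred, and routes with no MED set are not modeled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Route:
    as_path: Tuple[int, ...]  # leftmost entry is the neighbor AS
    med: int

def prefer_by_med(a: Route, b: Route, always_compare_med: bool = False) -> Optional[Route]:
    """Return the route preferred by MED, or None when MED does not decide."""
    neighbor_a = a.as_path[0] if a.as_path else None
    neighbor_b = b.as_path[0] if b.as_path else None
    if not always_compare_med and neighbor_a != neighbor_b:
        return None  # different neighbor AS: MED is skipped by default
    if a.med == b.med:
        return None  # tie: later decision steps must break it
    return a if a.med < b.med else b  # lower MED is preferred

r1 = Route(as_path=(65550, 65545), med=50)
r2 = Route(as_path=(65550,), med=100)
r3 = Route(as_path=(65560,), med=10)
print(prefer_by_med(r1, r2))
print(prefer_by_med(r1, r3))
print(prefer_by_med(r1, r3, always_compare_med=True))
```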
In this case, router R2 is the closest IGP next-hop to the BGP prefix.
Notice that the MED value received by AS 65540 is not propagated to any other AS; router R6 resets the MED to null
before sending the route to AS 65545.
The MED, also called the metric, is an attribute with limited influence. It tells the receiving AS the exit point (on the
receiving AS) that the sending AS prefers. The MED discriminates among multiple exit points from a neighbor AS, but in
the best interest of the sending AS.
In either case, route selection is changed in the neighbor AS, so that traffic flows to the local AS over the specified path.
Some ISPs choose not to support MED because traffic flow in their domain can be manipulated by a direct neighbor that
propagates a MED value to them. If the neighbor AS does not support MED, however, the effort is unnecessary.
In many cases, MED support is agreed upon in the peering policy between ASes.
Even if MED is supported and recognized, a local policy change can easily override the MED.
Description: This command enables the advertisement of the MED and assigns the MED value that is advertised to
BGP peers if the MED is not already set. The specified value can be overridden by another that is set
using a route policy. This can be set at three levels: global level (applies to all peers), group level (applies to all
peers in a peer group), or neighbor level (only applies to specified peer). The most specific value is used.
The no form of the command at the global level reverts to the default, in which the MED is not
advertised.
The no form of the command at the group level reverts to the value defined at the global level.
The no form of the command at the neighbor level reverts to the value defined at the group level.
Default: no med-out
igp-cost — The MED is set to the IGP cost of the given IP prefix.
As the name of the attribute implies, the preference is local to the AS. It should never be sent in an eBGP update to a
foreign AS. However, sending the preference to an eBGP peer in another confederation member AS is acceptable.
The default Local-Preference value is 100. This value, or the configured default value, is applied to all routes that do not
have a Local-Preference set when propagated over an iBGP or confederation eBGP session.
In most cases, Local-Preference should be applied on the edge router that receives the route in the local AS, and
should be left unchanged after it is propagated internally to the AS.
In the example above, the route received over the eBGP connection directly from AS 65550 is preferred over the iBGP
route received from router R2.
It is the best route based on BGP route-selection criteria: an eBGP learned route is preferred over iBGP.
The border routers do not modify the default Local-Preference; therefore, the edge routers receive two routes to
10.65.102.0/24. All other attributes being equal, the edge routers of AS 65540 use the route with the lowest IGP metric
to the next-hop address advertised by the border routers.
Description: This command configures an AS-Path regular-expression statement as a match criterion for the
route-policy entry. If no AS-Path criterion is specified, any AS-Path is considered to match. AS-Path
regular-expression statements are configured at the global route-policy level
(config>router>policy-options>as-path name).
The no form of the command removes the AS-Path regular-expression statement as a match
criterion.
Parameters: name — The AS-Path regular-expression name. It can be any string up to 32 characters, comprising
printable, 7-bit ASCII characters, and excluding double quotation marks. If the string contains
spaces, double quotation marks delimit the start and end of the string.
The route from router R2 is the best route across AS 65540 for the prefix 10.65.102.0/24, as long as no other border
router sends a route with a higher Local-Preference for the same prefix.
Notice that router R1 sets the Local-Preference to 80, which is less than the current default of 100 in AS 65540. With
this policy applied, router R2 will attract all traffic for 10.65.102.0/24 inside of AS 65540. Router R1 does not send an
iBGP update for 10.65.102.0/24 anymore as it can no longer generate the best route to that prefix.
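The selection behavior in this example can be sketched with a deliberately simplified model. Only the three criteria named in these notes are considered, in this order: highest Local-Preference, eBGP over iBGP, lowest IGP metric to the next-hop. The full BGP decision process has additional steps (such as AS-Path length and MED) that are omitted here, and the Candidate class and best_route function are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Candidate:
    name: str
    local_pref: int = 100   # default Local-Preference
    is_ebgp: bool = False
    igp_metric: int = 0

def best_route(candidates: List[Candidate]) -> Candidate:
    """Pick by highest Local-Preference, then eBGP over iBGP, then lowest IGP metric."""
    return min(candidates, key=lambda c: (-c.local_pref, not c.is_ebgp, c.igp_metric))

# Router R1's view after its import policy sets Local-Preference 80 on the eBGP route.
via_ebgp = Candidate("eBGP route on R1", local_pref=80, is_ebgp=True)
via_r2 = Candidate("iBGP route from R2", local_pref=100, igp_metric=10)
print(best_route([via_ebgp, via_r2]).name)
```

With the default Local-Preference of 100 on both routes, the eBGP route would win instead, which matches the behavior before the policy is applied.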
Router R5 sends traffic for 10.65.102.0/24 correctly via router R2. This traffic also happens to transit router R1
because router R1 is the closest IGP next-hop to router R2.
AS 65550 asks AS 65540 to use its own backbone to reach prefixes on router R8.
Description: This command sets the BGP local-preference attribute for incoming routes (if not specified), and
configures the default value for the attribute. The value is used if a BGP route arrives from a BGP
peer without the local-preference integer set. The specified value can be overridden by any value
that is set using a route policy. The parameter can be set at three levels: global level (applies to all
peers), group level (applies to all peers in a peer group), or neighbor level (only applies to the
specified peer). The most-specific value is used.
The no form of the command at the global level specifies that incoming routes with a set local
preference are not overridden, and incoming routes without a set Local-Preference are interpreted
as having a Local-Preference value of 100.
The no form of the command at the group level reverts to the value defined at the global level.
The no form of the command at the neighbor level reverts to the value defined at the group level.
Default: no local-preference — Do not override the local-preference value set in arriving routes, and
interpret routes without a set Local-Preference as having a value of 100.
Parameters: local-preference — The local-preference value to be used as the override value, expressed as a
decimal integer.
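The three configuration levels and the fallback to 100 can be sketched as a lookup from most specific to least specific. The function effective_local_pref is a hypothetical helper; None stands for a level at which the command is not configured.

```python
from typing import Optional

DEFAULT_LOCAL_PREF = 100  # used when no level configures a value

def effective_local_pref(global_value: Optional[int] = None,
                         group_value: Optional[int] = None,
                         neighbor_value: Optional[int] = None) -> int:
    """Return the value applied to routes that arrive without Local-Preference."""
    # The most specific configured level wins: neighbor, then group, then global.
    for value in (neighbor_value, group_value, global_value):
        if value is not None:
            return value
    return DEFAULT_LOCAL_PREF

print(effective_local_pref())
print(effective_local_pref(global_value=150, group_value=120))
print(effective_local_pref(global_value=150, group_value=120, neighbor_value=80))
```

A value set by a route policy would still override the result, as noted in the description above.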
2. What are the minimum activities associated with setting routing policy in an AS?
Activities associated with setting routing policy in an AS include: laying out geographical or business regions in the
network; creating address plans; optimizing the IGP for stability; creating export and import BGP policies; and
optimizing the IGP for routing and redundancy.
4. What are some common address spaces that an administrator will want to recognize in the AS? Which are typically
suited to become BGP NLRI?
Typical address spaces include: internal links; customer address space, using the AS’s own CIDR assignments;
external customer address space; and various links external to the AS itself. It is typical for the assigned customer (or
service, etc.) CIDR spaces and external customer address space to be brought into BGP as NLRI.
5. What is a Bogon? Why would an AS want to create a prefix-list that defines Bogon space?
Bogon prefixes are the combination of Martians (RFC 1918 or RFC 5735) and prefixes that are not allocated or
assigned on the Internet yet. These prefixes should, in theory, never appear in a global Internet routing table. Invalid
and Bogon prefix-lists allow the AS to sanitize networks that are advertised into or out of the AS.
7. Which is better: exporting the IGP internal networks into BGP, or exporting BGP networks into IGP? What is an
exception?
Neither is considered a best practice, and neither would produce a very stable network. The one exception is when
you bring directly-connected or static routes (associated with customer or services) into BGP for BGP to advertise
those networks as NLRI.
8. What three major activities are associated with deploying BGP policy?
IP service edge (bringing NLRI into the AS); core activities; border activities.
9. What are the three main activities at the border of the network?
The three main activities at the border are advertising/receiving appropriate NLRI/PATH, analyzing PATH
attributes, and implementing policy that adheres to the business and traffic flow goals of the AS.
10. What are some basic criteria to know before deploying either import or export BGP policies?
Import and export policies are based on customer requirements, traffic flow reports, network and AS origins,
communities of interest, and attributes (tools) that the AS can use to influence policy.
11. List six policy options that are typical for BGP export policies.
The six main options for BGP export policies are: rejecting bogus/invalid NLRI, sending more specifics, sending
aggregates, using local-pref policies, using AS-path prepending, and setting an inbound metric using the MED
attribute.
13. What can be done in the core of the network to control service edge to border traffic flow? What effect does
this have on overall traffic flow?
The AS can adjust metrics lower in the core. This forces traffic that enters the network and flows to its edges
to use the core rather than other edges of the network.
15. If a routing update is manipulated when it is received from a neighbor, in which direction will the change in traffic
flow be noticed?
If the routing update is manipulated in the inbound direction, outbound traffic flow will change. Similarly, if a
routing update is manipulated in the outbound direction, inbound traffic flow will change.
23. If a match occurs, and the specified action is “reject,” will any route modifications occur?
No, modifications do not occur if the route is rejected.
24. If a match occurs and the specified action is “accept,” which route modifications can occur?
Any supported action may occur on a successful match, including the modification of BGP attributes in the update.
25. What will happen if there are no matches in a policy and no default action is defined?
If a match does not occur and a default action is not defined, the action associated with the protocol to which the
route policy applies is performed.
28. If multiple policies are applied to a protocol, in which order are they evaluated?
Multiple policies are evaluated in the order in which they are configured.
29. Which commands on the local router verify the results of an export-policy filter?
The results of an export-policy filter on the local router can be verified with the show router bgp neighbor
<neighbor ip> advertised-routes command, or, on the receiving router, you can use the show router
bgp neighbor <neighbor ip> received-routes command, or any number of other show router BGP
routes variants such as hunt, community, or aspath-regexp. All are useful commands to verify that policies are
applied properly in either direction.
34. Can a prefix list and an AS-Path list be applied in the same policy?
Yes, by combining both a prefix list and an AS-Path list into the same policy, the policy becomes even more explicit
about which routes should be matched.
35. What command can be used to view a prefix that has been denied by a policy?
A prefix that has been denied by a policy can be viewed with the show router bgp routes 192.168.3.0/29
detail command.
46. Which command is used to transfer the IGP metric to the BGP MED?
The med-out command with the igp-cost parameter.
49. What command is used to display the BGP table entries that contain a particular community value?
The show router bgp routes community <community value> command can be used.
The number of sessions in a full mesh is n*(n-1)/2, where n is the number of routers. For six routers, 15 logical or
physical connections are required. If the number of routers increases by four, to ten, 45 sessions are required, an
increase of 30.
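The arithmetic can be checked with a short Python sketch; the function name full_mesh_sessions is chosen for this example only.

```python
def full_mesh_sessions(n: int) -> int:
    """Number of sessions needed to fully mesh n routers: n*(n-1)/2."""
    return n * (n - 1) // 2

six = full_mesh_sessions(6)
ten = full_mesh_sessions(10)
print(six, ten, ten - six)
```

Because the count grows with the square of n, adding routers to a large full mesh costs far more sessions than adding them to a small one.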
They are useful to subdivide ASes that have a large number of BGP speakers into smaller domains, to control
route policy using information contained in the BGP, or to alleviate full-mesh requirements.
Merged ISPs can be viewed as a single entity externally, but can also maintain some separation internally.
Internally, up to 15 member ASes can comprise a confederation. Each member requires an AS number, typically
selected from the private range, and each member AS is treated as if it were a stand-alone AS. Each member AS
must either maintain a full mesh of iBGP sessions or use route reflection.
BGP sessions between member ASes in the same confederation are referred to as intra-confederation eBGP
sessions; sessions inside the same member AS are referred to as iBGP sessions. Sessions between the
confederation and an external AS remain eBGP sessions.
eBGP sessions are maintained between member ASes and external ASes.
If the update is sent to a neighbor in the same member AS, no modification is performed.
If the update is sent to a neighbor in a different member AS within the confederation, the member AS
number is used.
If the update is sent to a neighbor outside the confederation, the confederation AS number is used.
When the update propagates in member AS 65202, the AS_PATH remains unmodified because it does not cross
an AS boundary.
When the update passes from a router in member AS 65202 to member AS 65204 or 65206, the AS_PATH is
modified to include the member AS that it has passed through. This is part of the same confederation and is not
a foreign AS, so a distinction is noted in AS_PATH with the use of parentheses around the confederation AS
sequence. As a result, the path received in member AS 65204 or 65206 is “(65202) 65100”.
Each member AS performs the same manipulation, so when the update received in member AS 65204
propagates to member AS 65206, the path is “(65204 65202) 65100”.
Loop detection for confederation AS paths is performed in the same way as for regular AS paths. A router that receives
an update checks for the presence of its own AS number and discards the route if it is present in the list.
A router that propagates an update to a foreign AS must never allow the confederation path to be visible. When
the update propagates to AS 65250, the confederation member portion of the path is replaced with the
confederation AS number. In this example, in the path “(65204 65202) 65100,” “(65204 65202)” is replaced with
“65200.” The route is then propagated externally with a path of “65200 65100.”
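The AS_PATH handling in this walk-through can be sketched in Python. The model is an assumption made for illustration: the AsPath class keeps the confederation member sequence separate from the regular AS sequence, and the advertise function and its receiver labels are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class AsPath:
    confed: Tuple[int, ...] = ()   # member ASes, shown in parentheses
    regular: Tuple[int, ...] = ()  # ordinary AS sequence

    def __str__(self) -> str:
        parts = []
        if self.confed:
            parts.append("(" + " ".join(str(asn) for asn in self.confed) + ")")
        parts.extend(str(asn) for asn in self.regular)
        return " ".join(parts)

def advertise(path: AsPath, sender_member_as: int, confed_as: int, receiver: str) -> AsPath:
    """receiver is 'same-member', 'other-member', or 'external'."""
    if receiver == "same-member":
        return path  # no AS boundary crossed: path is unmodified
    if receiver == "other-member":
        # Prepend the sending member AS to the confederation sequence.
        return AsPath((sender_member_as,) + path.confed, path.regular)
    if receiver == "external":
        # Hide the member ASes behind the confederation AS number.
        return AsPath((), (confed_as,) + path.regular)
    raise ValueError(f"unknown receiver type: {receiver}")

received = AsPath(regular=(65100,))                          # as seen in member AS 65202
in_65204 = advertise(received, 65202, 65200, "other-member")
in_65206 = advertise(in_65204, 65204, 65200, "other-member")
external = advertise(in_65206, 65206, 65200, "external")
print(in_65204, "|", in_65206, "|", external)
```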
TTL values in the BGP control packets are treated as if they were eBGP.
According to RFC 3065, all BGP speakers that participate in a confederation must support the confederation
extensions to the AS_PATH attribute. There can be a maximum of 15 member ASes per confederation.
Description: This command creates a confederation AS within an AS. It reduces the number of iBGP sessions
required in an AS. Route reflection is another technique that is commonly used to reduce the
number of iBGP sessions.
The no form of the command deletes the specified member AS from the confederation.
When no members are specified in the no statement, the entire list is removed and
confederation is disabled. When the last member of the list is removed, confederation is disabled.
Values: 1 to 65 535
Members: member-as-num — AS numbers of members that are part of the confederation, expressed as
a decimal integer. Up to 15 members per confed-as-num can be configured.
Values: 1 to 65 535
Note that in confederations, there is no requirement that member autonomous systems use the same IGP. It is
not necessary for each member AS to reveal its internal topology to other member autonomous systems. When
different IGPs are used, however, BGP next-hop reachability must be guaranteed within each member AS.
Route reflectors (RRs) are used to reduce the number of iBGP sessions required in an AS. Normally, every BGP
speaker in an AS must have a BGP peering with every other BGP speaker in the AS. An RR relaxes these
requirements by disabling iBGP split horizon for its clients.
Confederations can also be used to remove the full iBGP mesh requirement in an AS. Route reflection may be
configured as stand-alone, or inside a confederation.
With route reflection, the full iBGP mesh is required only between RRs and between RRs and non-clients.
If the best route is received from a non-client peer, the RR reflects the route to all its defined client peers and
propagates the route to all eBGP peers.
On the Nokia 7750 SR, when a best route is received from an eBGP peer (or an RR client) it is advertised back to
that same peer as well as to other peers. In prior releases, the route was not reflected back to the sending peer.
To suppress this behavior in SR OS release 9.0, a BGP export policy can be configured for each neighbor to which
we do not wish to re-advertise routes. Starting in SR OS release 10.0.R4, the "split-horizon" CLI command
allows the user to turn off this behavior globally, under a group, or under a neighbor.
This behavior was introduced in 7x50 SR OS release 9.0 as a result of some optimizations. It does not violate any
RFCs, and BGP loop detection using AS-Path will ensure that there are no loops. This feature improves
performance on the SR, as the processing used to reject the looped routes is less than that required to keep
track of which routes should not be re-advertised.
split-horizon
Syntax: [no] split-horizon
Context: config>router>bgp>group group-name>neighbor ip-int-name
Description: This command enables the use of split-horizon. Split-horizon prevents routes from being
reflected back to a peer that sends the best route. It applies to routes of all address families and to any
type of sending peer: confed eBGP, eBGP, and iBGP.
The configuration default is no split-horizon, meaning that no effort is made to prevent a
best route from being reflected back to the sending peer.
NOTE: Use of the split-horizon command may have a detrimental impact on peer and
route scaling and therefore operators are encouraged to use it only when absolutely needed.
Default: no split-horizon
If the best route is received from a non-client peer, the RR reflects the route to all its defined client peers and
propagates the route to any external peers. The route is not propagated to other non-clients because they are
part of the full iBGP mesh and will have received it from the original non-client peer.
If the best route is received from a client peer, the RR reflects the route to all its defined client peers, including
the originator, and propagates the route to all other peers, whether they are non-clients or eBGP peers. Non-
client peers may also include other RRs.
A best and used route received from an eBGP peer is propagated to all iBGP peers and all eBGP peers, including
the peer that sent it. The peer will reject this looped route.
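The three advertisement rules above can be summarized in a small Python sketch. The function advertise_to and the peer-type labels are assumptions made for the example; the sketch models the default behavior described here, with split-horizon not enabled.

```python
from typing import Set

def advertise_to(source: str) -> Set[str]:
    """Return the peer types that receive a best route learned from source.

    source is 'client', 'non-client', or 'ebgp'.
    """
    if source == "non-client":
        # Other non-clients already have the route through the full iBGP mesh.
        return {"client", "ebgp"}
    if source in ("client", "ebgp"):
        # Sent to every peer type; this includes the peer that sent the route.
        return {"client", "non-client", "ebgp"}
    raise ValueError(f"unknown peer type: {source}")

for peer_type in ("non-client", "client", "ebgp"):
    print(peer_type, "->", sorted(advertise_to(peer_type)))
```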
In the absence of iBGP split horizon, loop detection must be performed in another way. The AS-PATH attribute
would not be useful because it is not modified in an iBGP update.
Therefore, there are two additional optional non-transitive attributes introduced in an RR environment for loop
detection and prevention.
If a router receives an update that contains its own Router-ID in the Originator-ID field, it discards the update.
If a route is received by an RR and the local Cluster-ID is already contained in the CLUSTER_LIST, the update is
discarded.
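The two checks can be sketched as a single acceptance function. The name accept_update and its parameters are hypothetical, and the addresses in the usage lines are example values only.

```python
from typing import List, Optional

def accept_update(router_id: str,
                  originator_id: Optional[str],
                  cluster_list: List[str],
                  local_cluster_id: Optional[str] = None) -> bool:
    """Apply Originator-ID and CLUSTER_LIST loop checks to a received iBGP update.

    local_cluster_id is set only on a route reflector.
    """
    if originator_id is not None and originator_id == router_id:
        return False  # the update originated from this router
    if local_cluster_id is not None and local_cluster_id in cluster_list:
        return False  # the update already passed through this cluster
    return True

# An RR that finds its own Cluster-ID in the list discards the update.
print(accept_update("10.16.10.2", "10.16.10.5", ["10.16.10.1"], local_cluster_id="10.16.10.1"))
# An RR with a different Cluster-ID accepts the same update.
print(accept_update("10.16.10.2", "10.16.10.5", ["10.16.10.1"], local_cluster_id="10.16.10.2"))
```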
Router 1.1.1.1 originates a route into BGP. It sets the Cluster-ID to "no cluster members" and the Originator-
ID to "None", then sends a route update to its route reflector.
When the route reflector propagates/reflects this route to its iBGP peers, it adds its Cluster-ID 10.10.10.10 to
the CLUSTER_LIST, and sets router 1.1.1.1’s Router-ID as the Originator-ID.
When the route reflector propagates this route to its eBGP peer in AS 65100, it does not add the Cluster-ID nor
does it set the Originator-ID.
When the RR’s non-client sends this route to its eBGP peer in AS 65250, it resets the CLUSTER_LIST and
Originator-ID.
Clients in a cluster should have iBGP sessions with all RRs within their cluster. eBGP sessions are also
acceptable.
There are now two RRs in the cluster, and each client peer has an iBGP session to both. The RRs themselves are
fully meshed.
When router B originates a BGP route, it sends the route update to both RR 1 and RR 2. RR 1 then reflects the
route to its two other clients, and propagates the route to RR 2. RR 2 also reflects this route to its two other
clients, and propagates the route to RR 1. RR 1 flags the route it receives from RR 2 as invalid, as it sees its own
Cluster-ID in the CLUSTER_LIST of the route. RR 2 does the same.
This design eliminates a single point of failure for the RRs or for a single client session, although not in all cases.
For example, assume that router A receives the same route to 172.16.5.0/24 from both RR 1 and RR 2, and picks
one as best. If the IGP costs to router B via RR 1 and RR 2 are equal, router A starts to load share across both RRs
to get to the 172.16.5.0/24 network. There is a risk in this design because either RR is prevented from telling the
other about the prefixes in the cluster. Therefore, if one of the iBGP sessions from RR 1 or RR 2 to router B were
to go down, the RR itself could end up with no route to a previously-known destination and would start to drop
packets.
Network : 192.168.3.0/29
Nexthop : 10.16.10.5
From : 10.16.10.1
…
Cluster : 10.16.10.1
Originator Id : 10.16.10.5 Peer Router Id : 10.16.10.1
Fwd Class : None Priority : None
Flags : Invalid IGP Cluster-Loop
Route Source : Internal
AS-Path : 65545
There is only one RR in each cluster, but each client peer has an iBGP session to both RRs. The RRs themselves are
fully meshed.
In the example above, when the leftmost client of RR 1 and RR 2 originates a BGP route, it sends the route update
to both RR 1 and RR 2. RR 1 then reflects the route to its clients, and propagates the route to RR 2. RR 2 also
reflects this route to its clients, and propagates the route to RR 1. RR 1 flags the route it received from RR 2 as
valid, as the Cluster-ID in the received route’s CLUSTER_LIST is different from its own. RR 2 does the same.
This eliminates a single point of failure for the RRs and improves redundancy for a single client session failure.
Both RRs learn routes from their clients and from the other RR, because each is using a different Cluster-ID. In
some situations, this is advantageous for both logical iBGP session redundancy and physical redundancy. There
is, however, a slight overhead associated with carrying the extra path information.
Using multiple cluster-IDs is considered more redundant than using a single cluster-ID. Assume that the session
between router B and RR 1 goes down, and that the session between router C and RR 2 also goes down. If a single
cluster-ID is used, router C does not receive an update for a route advertised by router B. Router B sends the update
to RR 2, which sends it to RR 1 and to its clients (but not to C, because of the session failure). When RR 1
receives the update, it flags it as invalid because the CLUSTER_LIST contains its own cluster-ID; therefore, RR 1 does
not send the update to router C. This can be avoided if RR 1 and RR 2 use different cluster-IDs.
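The double-failure scenario above can be traced with a small simulation. This is a minimal sketch under stated assumptions: three clients (A, B, C), two RRs, only the CLUSTER_LIST check is modeled, and the second Cluster-ID (10.16.10.2) and all function and variable names are hypothetical.

```python
from collections import deque

CLIENTS = ["A", "B", "C"]


def routers_with_route(cluster_ids, down_sessions, originator="B"):
    """Return the routers that hold a valid copy of the originator's route.

    cluster_ids maps each of the two RR names to its Cluster-ID.
    down_sessions lists failed iBGP sessions as (router, router) pairs.
    """
    rrs = list(cluster_ids)
    # Every client peers with every RR, and the RRs peer with each other.
    sessions = {frozenset((c, rr)) for c in CLIENTS for rr in rrs}
    sessions.add(frozenset(rrs))
    sessions -= {frozenset(pair) for pair in down_sessions}

    learned = {originator}
    # Each queue entry is (sender, receiver, cluster_list).
    queue = deque((originator, rr, ()) for rr in rrs
                  if frozenset((originator, rr)) in sessions)
    while queue:
        sender, receiver, cluster_list = queue.popleft()
        if receiver in CLIENTS:
            learned.add(receiver)  # clients do not re-advertise iBGP routes
            continue
        if cluster_ids[receiver] in cluster_list:
            continue  # Cluster-Loop: the RR treats the route as invalid
        learned.add(receiver)
        new_list = (cluster_ids[receiver],) + cluster_list
        # Routes from a client go to the other clients and to the other RR;
        # routes from a non-client (the other RR) go to clients only.
        targets = [c for c in CLIENTS if c != originator]
        if sender in CLIENTS:
            targets += [rr for rr in rrs if rr != receiver]
        for target in targets:
            if frozenset((receiver, target)) in sessions:
                queue.append((receiver, target, new_list))
    return learned


down = [("B", "RR1"), ("C", "RR2")]
same = routers_with_route({"RR1": "10.16.10.1", "RR2": "10.16.10.1"}, down)
diff = routers_with_route({"RR1": "10.16.10.1", "RR2": "10.16.10.2"}, down)
print(sorted(same))  # ['A', 'B', 'RR2']
print(sorted(diff))  # ['A', 'B', 'C', 'RR1', 'RR2']
```

With a shared Cluster-ID, RR 1 discards the copy reflected by RR 2, so router C never learns the route. With distinct Cluster-IDs, RR 1 accepts it and reflects it to router C.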
Network : 192.168.3.0/29
Nexthop : 10.16.10.5
From : 10.16.10.1
…
Cluster : 10.16.10.1
Originator Id : 10.16.10.5 Peer Router Id : 10.16.10.1
Fwd Class : None Priority : None
Flags : valid IGP
Route Source : Internal
AS-Path : 65545
To further reduce the number of sessions, RR hierarchies can be used. Hierarchical route reflection architecture is
characterized by having more than one level of RRs, with lower-level RRs serving as the clients of the RRs that are
one level above. There is no limit on the number of levels, but two or three levels have proven to make the most
practical sense. The diagram above shows a two-level RR architecture in which Level 1 RRs are also clients of Level 2 RRs.
Because they are clients themselves, Level 1 RRs do not need to be fully meshed with each other. This reduces
the number of iBGP sessions within the domain. The top-level RRs, Level 2 RRs in the diagram, must be fully
meshed, because they are not clients of any RRs. There are 15 iBGP sessions in the diagram above, compared to
55 in full mesh.
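The full-mesh figure follows from the standard n(n-1)/2 formula for the number of sessions among n routers. The short Python sketch below (the function name is chosen for illustration) reproduces the value of 55 for the 11 routers, R1 to R11, in this example. The count of 15 for the hierarchy depends on the sessions drawn in the diagram and is not derived here.

```python
def full_mesh_sessions(routers: int) -> int:
    """Number of iBGP sessions needed to fully mesh the given number of routers."""
    return routers * (routers - 1) // 2


print(full_mesh_sessions(11))  # 55
print(full_mesh_sessions(22))  # 231: doubling the routers roughly quadruples the sessions
```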
Rules for prefix advertisement for the hierarchical RRs are the same as for single-level RRs. In the above example,
when router R4 receives a BGP route update from its eBGP peer in AS 65100, it propagates the route to router R2
as a client of R2 and, as a Level 1 RR, advertises the route to its clients, routers R8 and R9.
Router R2 propagates the route to routers R1 (a non-client of R2 and R3) and R3 (another Level 2 RR), and reflects
it to its clients, routers R4 and R5. Note that routers R1 and R3 do not advertise the route to each other, because
each learned it from a regular (non-client) iBGP peer, router R2. As an RR, router R3 propagates the route to its
clients, routers R6 and R7. In turn, routers R6 and R7 advertise the prefix to their clients, routers R10 and R11.
Router R7 then propagates this route to AS 65250.
In most cases, the size of the top-level mesh is the main factor when considering the use of hierarchical route
reflection. If the number of full-mesh sessions is considered administratively unmanageable, you should consider
RR hierarchy.
The confederation member AS is treated like a stand-alone AS, which requires a full mesh. Route reflection can
simplify the meshing requirement inside a confederation member AS by reducing the number of iBGP peers.
In the example above, AS 65202 and AS 65204 have implemented route reflection. AS 65206 still uses the full
mesh.
After route reflection is configured, the full mesh is no longer required. However, the selective removal of
neighbors must be carefully managed. If too few neighbors are removed and unnecessary sessions between client
peers remain, routing loops may still occur. If too many neighbors are removed, the transit AS design may break
down.
Context: config>router>bgp
config>router>bgp>group
config>router>bgp>group>neighbor
The no form of the command deletes the Cluster-ID and effectively disables route reflection for the given group.
Because the route has not yet passed through the RR, the Cluster- and Originator-ID attributes are not set.
The RR sets the Cluster- and Originator-ID attributes, but they are only viewable in RIB-OUT.
Client router R6 receives a route for prefix 192.168.1.8/29 from its RR. A client in a different cluster originates
the route. The CLUSTER_LIST attribute is updated by each RR that propagates the route. Because the route has
passed through two RRs, the CLUSTER_LIST contains two Cluster-IDs and the Originator-ID identifies router R5.
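How the two attributes build up as a route crosses successive RRs can be sketched as follows. This is an illustration of the RFC 4456 behavior, not of SR OS code: the function name, the dictionary layout, and all router IDs and Cluster-IDs below are hypothetical.

```python
def reflect(route, rr_cluster_id, advertising_router_id):
    """Return a copy of the route as a route reflector would re-advertise it."""
    reflected = dict(route)
    # ORIGINATOR_ID is set only once, by the first RR that reflects the route.
    if reflected.get("originator_id") is None:
        reflected["originator_id"] = advertising_router_id
    # Each RR prepends its own Cluster-ID to the CLUSTER_LIST.
    reflected["cluster_list"] = [rr_cluster_id] + list(route.get("cluster_list", []))
    return reflected


# Hypothetical addresses: R5 (10.10.10.5) originates, then two RRs reflect in turn.
from_r5 = {"prefix": "192.168.1.8/29"}
after_first_rr = reflect(from_r5, "10.10.10.2", "10.10.10.5")
after_second_rr = reflect(after_first_rr, "10.10.10.3", "10.10.10.2")

print(after_second_rr["originator_id"])  # 10.10.10.5
print(after_second_rr["cluster_list"])   # ['10.10.10.3', '10.10.10.2']
```

The route as sent by the client carries neither attribute; after two reflections it carries one Originator-ID and a two-entry CLUSTER_LIST, with the most recent RR first.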
It is more common for confederations to run separate IGPs inside of each member AS, and thereby have more
control over both BGP and IGP scales. When different IGPs are used, however, BGP next-hop reachability must be
guaranteed within each member AS.
ISPs typically try to regionalize traffic so that metrics are lower inside each cluster in the case of route
reflection, or inside each member AS in the case of confederations. This is to avoid routing information loops. The
overall objective is to increase IGP metrics between clusters or member ASes. See RFC 3345 for specific examples
of where these techniques are useful.
Both confederations and route reflection increase the chances of routing loops, so it is important that IGP
metrics are handled correctly to keep traffic flows optimal and packet loss at zero.
The scaling techniques also reduce overall iBGP session counts; route reflection is considered easier from a
migration standpoint while confederations require more operational and migration effort.
8. What optional non-transitive attributes can be found in an RR environment, and what are they used for?
Originator-ID and CLUSTER_LIST. Both attributes are used for loop detection in an RR environment.
In certain topologies, best-external can improve convergence times, reduce route oscillation, and allow better load
sharing. This is achieved because routers internal to the AS have knowledge of more exit paths from the AS.
When two exits are available to reach a particular destination and one is preferred over the other, the availability of an
alternate path provides fast connectivity restoration when the primary path fails. Restoration can be quick since the
alternate path is already at hand. The border router could precompute the backup route and preinstall it in the FIB, ready to
be switched in when the primary goes away.
In certain topologies involving either route reflectors or confederations, the partial visibility of the available exit points
into a neighboring AS may result in an inconsistent best path selection decision as the routers don't have all the relevant
information. If the inconsistencies span more than one peering router, they may result in a persistent route oscillation.
Advertising the best external route will reduce the possibility of route oscillation by introducing additional information
into the iBGP system.
Enabling the best-external feature is supported only at the config>router>bgp level. This feature can be
enabled or disabled on a per-address-family basis, with IPv4 and IPv6 as the only options supported initially. Enabling
best-external for IPv4 causes the new advertisement rules to apply to both regular IPv4 unicast routes and labeled-IPv4
(SAFI 4) routes. Similarly, enabling best-external for IPv6 causes the new advertisement rules to apply to both regular
IPv6 unicast routes and labeled-IPv6 (SAFI 4) routes.
R3>config>router>policy-options# info
----------------------------------------------
            prefix-list "LAN3"
                prefix 192.168.1.0/27 exact
            exit
            policy-statement "exportlan"
                entry 10
                    from
                        protocol direct
                        prefix-list "LAN3"
                    exit
                    action accept
                    exit
                exit
            exit
Network : 192.168.1.0/27
Nexthop : 10.10.10.5
Path Id : None
From : 10.10.10.1
Res. Nexthop : 10.10.10.5
Local Pref. : 200 Interface Name : system
<output omitted>
Cluster : 10.10.10.1
Originator Id : 10.10.10.5 Peer Router Id : 10.10.10.1
Flags : Invalid IGP
Route Source : Internal
AS-Path : 65541
ASBR3 receives the route from R1 and advertises it to its eBGP peer router R4. Router R4 in AS65542 receives the
route from ASBR3 as shown below.
Nothing changes on ASBR2; the best route is still the route received from ASBR1.
In order for a BGP speaker to advertise multiple paths for the same address prefix, a new identifier known as "Path
Identifier" is used so that a particular path for an address prefix can be identified by the combination of the address
prefix and the Path Identifier.
The assignment of the Path Identifier for a path by a BGP speaker is purely a local matter. However, the Path Identifier
must be assigned in such a way that the BGP speaker is able to use the (prefix, path identifier) to uniquely identify a
path advertised to a neighbor. A BGP speaker that re-advertises a route must generate its own Path Identifier to be
associated with the re-advertised route.
In order to carry the Path Identifier in an UPDATE message, the existing NLRI encodings are extended by prepending
the Path Identifier field, which is four octets long.
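The extended encoding can be illustrated with a few lines of Python. This is a sketch rather than a BGP implementation: the function name is hypothetical, and the layout assumed here (a four-octet Path Identifier, followed by the usual one-octet prefix length and the minimum number of prefix octets) is the one published in RFC 7911.

```python
import ipaddress
import struct


def encode_add_path_nlri(path_id: int, prefix: str) -> bytes:
    """Encode one NLRI entry as Path Identifier + prefix length + prefix octets."""
    network = ipaddress.ip_network(prefix)
    # Only the octets needed to hold the prefix length are carried on the wire.
    prefix_octets = (network.prefixlen + 7) // 8
    header = struct.pack("!IB", path_id, network.prefixlen)
    return header + network.network_address.packed[:prefix_octets]


encoded = encode_add_path_nlri(2, "192.168.1.0/27")
print(encoded.hex())  # 000000021bc0a80100
```

The first four octets (00000002) are the Path Identifier, 1b is the prefix length of 27, and c0a80100 is 192.168.1.0.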
The benefits of using BGP Add-Paths include faster convergence (a reduction in restoration time after a failure) and load
sharing (the availability of multiple paths to reach the same destination enables load balancing of traffic).
The RIB-IN may have multiple paths for a prefix D. The path selection mode refers to the algorithm used to decide which
of these paths to advertise to an Add-Paths peer. In the current implementation, SR supports only one path selection
algorithm, essentially the Add-N algorithm described in draft-ietf-idr-add-paths-guidelines-00.txt, Best Practices for
Advertisement of Multiple Paths in BGP. The Add-N algorithm implemented in SROS selects, as candidates for
advertisement, the N best overall paths for each prefix, regardless of path type (internal vs. external), degree of
difference between the paths, or use in forwarding. If this set of N best overall paths includes multiple paths with the
same BGP NEXT_HOP, only the best route with a particular NEXT_HOP is advertised and the others are suppressed.
In the SROS implementation, N is configurable, per address family, at the BGP instance, group, and neighbor levels; N has
a minimum value of 1 and a maximum value of 16.
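The selection described above can be sketched in Python. This is an interpretation of the text, not the SROS code: it assumes the paths arrive already ranked best-first by the BGP decision process, takes the N best, and then suppresses duplicates of a NEXT_HOP within that set. The function name, the dictionary keys, and the example addresses are hypothetical.

```python
def add_n_candidates(paths_best_first, n):
    """Pick the paths to advertise to an Add-Paths peer for one prefix."""
    if not 1 <= n <= 16:
        raise ValueError("N must be between 1 and 16")
    advertised = []
    seen_next_hops = set()
    # Consider only the N best overall paths.
    for path in paths_best_first[:n]:
        # Within that set, keep only the best path per NEXT_HOP.
        if path["next_hop"] in seen_next_hops:
            continue
        seen_next_hops.add(path["next_hop"])
        advertised.append(path)
    return advertised


paths = [
    {"next_hop": "10.10.10.5", "local_pref": 200},
    {"next_hop": "10.10.10.5", "local_pref": 150},
    {"next_hop": "10.10.10.6", "local_pref": 100},
    {"next_hop": "10.10.10.7", "local_pref": 100},
]
chosen = add_n_candidates(paths, 3)
print([p["next_hop"] for p in chosen])  # ['10.10.10.5', '10.10.10.6']
```

With N = 3, the second path is suppressed because it shares a NEXT_HOP with the best path, and the fourth path falls outside the N best, so two paths are advertised.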
If the combination of NLRI and path identifier in an advertisement from a peer is unique (does not match an existing
route in the RIB-IN from that peer) then the route is added to the RIB-IN. If the combination of NLRI and path identifier
in a received advertisement is the same as an existing route in the RIB-IN from the peer then the new route replaces the
existing one. If the combination of NLRI and path identifier in a received withdrawal matches an existing route in the RIB-
IN from the peer then that route is removed from the RIB-IN.
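These three rules amount to keying the per-peer RIB-IN on the (NLRI, path identifier) pair. A minimal Python sketch, with a hypothetical function name and a plain dictionary standing in for the RIB-IN of one peer:

```python
def apply_update(rib_in, prefix, path_id, attributes=None):
    """Apply one advertisement or withdrawal received from a single peer.

    attributes is None for a withdrawal; otherwise it holds the path attributes.
    """
    key = (prefix, path_id)
    if attributes is None:
        rib_in.pop(key, None)     # withdrawal removes the matching route, if any
    else:
        rib_in[key] = attributes  # a new key adds a route; an existing key is replaced


rib_in = {}
apply_update(rib_in, "192.168.1.0/27", 1, {"next_hop": "10.10.10.5"})
apply_update(rib_in, "192.168.1.0/27", 2, {"next_hop": "10.10.10.6"})  # second path added
apply_update(rib_in, "192.168.1.0/27", 2, {"next_hop": "10.10.10.7"})  # same key: replaced
apply_update(rib_in, "192.168.1.0/27", 1)                              # withdrawal
print(rib_in)  # {('192.168.1.0/27', 2): {'next_hop': '10.10.10.7'}}
```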
In order to receive multiple paths from a peer for a particular address family, the BGP capability advertisements must
indicate that the remote peer is willing to send multiple paths (remote Add-Paths capability) and that the local router
is willing to receive more than one path (local Add-Paths capability).
Context: config>router
Description: This command enables ECMP and configures the number of routes for path sharing. For example,
the value 2 means that two equal-cost routes will be used for cost-sharing. ECMP can only be used for routes
learned with the same preference and protocol. When more ECMP routes are available at the best preference than
are configured in max-ecmp-routes, the lowest next-hop IP address algorithm is used to select the
number of routes configured in max-ecmp-routes.
The no form of the command disables ECMP. If ECMP is disabled and multiple routes are available at
the best preference and equal cost, the route with the lowest next-hop IP address is used.
Default: no ecmp
Parameters: max-ecmp-routes — The maximum number of equal-cost routes allowed in this routing table
instance, expressed as a decimal integer. Setting max-ecmp-routes to 1 is the same as configuring no ecmp.
Values: 0 to 16
When the number of equal-cost routes configured for multipath does not equal the ecmp value, the lower of the two
values is used as the number of routes to be installed in the route table. For example, if there are three available paths,
and ecmp = 2 and multipath = 3, then two paths are installed in the route table.
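The rule reduces to taking a minimum. In the Python sketch below, the function and parameter names are chosen for illustration; capping the result at the number of available paths is an assumption that follows from the fact that no more paths can be installed than exist.

```python
def installed_paths(available: int, ecmp: int, multipath: int) -> int:
    """Number of BGP paths installed in the route table for one prefix."""
    return min(available, ecmp, multipath)


print(installed_paths(available=3, ecmp=2, multipath=3))  # 2
print(installed_paths(available=3, ecmp=4, multipath=4))  # 3
```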
Network : 192.168.1.0/27
Nexthop : 10.10.10.6
Path Id : 2
From : 10.10.10.1
Res. Nexthop : 10.1.2.1
Local Pref. : 100 Interface Name : toR1
<output omitted>
Flags : Used Valid Backup IGP
TieBreakReason : LocalPref
Route Source : Internal
AS-Path : 65541
3. BGP Add-Paths can be configured at the BGP level only, true or false?
False
4. In addition to BGP Add-Paths, what configuration is required to allow BGP to install multiple best paths in the routing
table?
Multipath and ECMP should also be configured to allow BGP to install multiple best paths in the routing table.