(Version 1.0 - Reposted from LinkedIn Post on 2014-08-25)
No! Of course, the Internet does not end at 500K routes. On August 13, 2014, there was a lot of news about instability on the Internet that might have been caused by a surge of new Internet routes (see articles like "Internet routers hitting 512K limit, some become unreliable" - http://arstechnica.com/security/2014/08/internet-routers-hitting-512k-limit-some-become-unreliable/). The most accurate write-up can be found here:
"What caused today's Internet hiccup" by Andree Toonk (http://www.bgpmon.net/what-caused-todays-internet-hiccup/)
Is this instability something to worry about? Yes! But please worry productively. What follows is a checklist recommended for any organization that is connected to the Internet with its own Autonomous System Number (ASN).
First, please understand the real problem. One service provider de-aggregated thousands of routes and leaked them into the global routing table (see Andree's post). Some routers did not have enough forwarding memory to store all these additional routes and became unpredictable. This resulted in some networks being disconnected from the Internet. Why did this happen? Routers and switches have forwarding tables that are used to route packets from one interface to another. In modern routers, these forwarding tables use high-speed memory that allows for extremely fast lookups. We need these fast lookups to handle 100G interfaces and their packet-per-second forwarding rates. If this high-speed memory "overloads," the router's programming tries to keep some of the forwarding working as normal but passes the new routes to the slow path (details vary between vendors). As a consequence, operators need to understand how their routers behave during these overloads.
Understand the key points:
1. De-aggregating route leaks will happen. While they are not normal, they will happen. Any ASN (network) that is connected to the Internet should prepare for route leaks.
2. The Internet is not coming to an end. In fact, the growth of the Internet route table is not forecast to be a major concern over the next five years. Please download and watch Geoff Huston's NANOG 60 talk "BGP in 2013" (https://www.nanog.org/meetings/abstract?id=2270). Geoff walks through an easy-to-understand analysis of the global Internet route table's growth.
3. Do worry about malicious route leaks! There is little to prevent someone from de-aggregating routes and injecting them into the Internet. Anyone connecting to the Internet must have this contingency as part of their routing policy.
This last point is the critical item. What can you do about it? Start with this "Check List" (or the conversation you need to have with your network engineer) .....
! Have you documented your router's configuration? You would be surprised how many organizations have never saved a copy of their router's configuration. Some will screen scrape the configuration and save it. Others will use tools like Rancid to maintain an up-to-date copy. Still others will have tools that build the configuration offline and push the full configuration to the router. The key is to have an offline copy of the configuration. It is obvious, but half the operators that engage in BGP consulting cannot provide a current offline copy of their configuration (they need to log in and get a copy).
! Write down the inbound and outbound routing policy in plain English so that anyone in the company can understand it. Gateway routers that connect to the global Internet have two policies. The first is the set of rules for accepting routes from the Internet (inbound). These routes govern the packets you send to the Internet. The second is the set of routes you send to the Internet (outbound). These govern how the Internet gets to your network. Most routing policy mistakes have their root cause in the way the policy is expressed. Too many network engineers just write the BGP configuration without writing an overall policy. Writing the policy down before you configure a router is similar to flow charting before programming, or writing the "test" in TDD (Test Driven Development) before coding. Here is one example that uses the Routing Resilience Manifesto guidelines as a foundation for a multi-homed organization (two Internet connections):
Inbound Internet Route Policy (Example)
Only accept routes using the minimum practical allocation set by each Regional Internet Registry (RIR). We will filter all more specific routes. For example, the /24s inside the /19 will be filtered. Our two upstream providers will have the more specific routes. We just need the core aggregate route.
Drop all Documented Special Use Addresses (DSUA). We should never see 0.0.0.0 or 127.0.0.0 come to our network, but we need to filter to guard against malicious intent.
Set the Max Prefix Limit to alarm at 25% below the maximum number of prefixes that can be processed on our routers. If there is a prefix leak on the Internet, we need an alarm to let us know what is happening. The SNMP trap from the BGP feature should go to the NOC and trigger an immediate escalation.
Consequences & Risk of inaction: Too many prefixes can overload the gateway router and cause network instability.
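The inbound rules above can be sketched as a small check. This is a minimal illustration, not router configuration: the function names, the `max_len` parameter, the extra special-use blocks beyond the two named in the policy, and the capacity figure are all assumptions made for the example. Real values would come from RIR policy documents and from the vendor's test data.

```python
import ipaddress

# A few special-use IPv4 blocks. The policy text names 0.0.0.0 and 127.0.0.0;
# the private and link-local blocks are added here for illustration only.
# This is NOT a complete list.
SPECIAL_USE = [ipaddress.ip_network(block) for block in (
    "0.0.0.0/8", "127.0.0.0/8", "10.0.0.0/8",
    "172.16.0.0/12", "192.168.0.0/16", "169.254.0.0/16",
)]


def accept_inbound(prefix, max_len):
    """Return True if an IPv4 route passes the inbound policy sketch.

    max_len is the longest prefix length we accept for this address
    block (hypothetical parameter standing in for the RIR minimum
    allocation size).
    """
    net = ipaddress.ip_network(prefix)
    # Drop anything that overlaps a special-use block.
    # Note that this also drops the default route 0.0.0.0/0.
    if any(net.overlaps(block) for block in SPECIAL_USE):
        return False
    # Filter more specifics: keep only prefixes no longer than max_len.
    return net.prefixlen <= max_len


def alarm_threshold(max_prefixes, margin=0.25):
    """Prefix count at which the max-prefix alarm should fire."""
    return int(max_prefixes * (1 - margin))


# Arbitrary example prefixes; 1_000_000 is a made-up router capacity.
print(accept_inbound("20.0.0.0/19", max_len=19))   # True
print(accept_inbound("20.0.1.0/24", max_len=19))   # False (more specific)
print(accept_inbound("127.0.0.0/8", max_len=24))   # False (special use)
print(alarm_threshold(1_000_000))                  # 750000
```

Writing the rules this way makes each sentence of the plain-English policy map to one testable line, which is the point of writing the policy down first.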
Outbound Internet Route Policy (Example)
Only advertise our prefixes to each of our upstream providers. Tag our advertisements with a BGP community.
Set an outbound prefix filter that explicitly permits only our prefixes. All other prefixes will be denied with a deny all and a log set on the deny. This will be used to spot issues with our outbound policy.
Set an outbound BGP community filter that only allows prefixes with the designated BGP community to be passed to our upstream providers. This is a safeguard filter in case the prefix filter is broken.
Set a Documented Special Use Addresses (DSUA) filter to ensure our network is not a source of problems with outbound special-use prefixes. It would be really bad to advertise a default route to the Internet.
Our outbound prefix list should contain only the aggregate. Any more specifics should never be longer than /24 (IPv4).
Consequences & Risk of inaction: Leaking routes to the Internet will cause unwanted traffic to be pulled into our network. This will cause a self-inflicted DDoS.
This example routing policy can be turned into slides and explained to management, used with a vendor to create specific configurations, or used for team consultation on changes to the route policy. The key is to have something that many people can read, address, and consult. IOS or JUNOS configurations are not the type of route policy that facilitates consultation.
! Do you really need the full Internet Routing Table? When asked, most multi-homed enterprise networks cannot give a coherent, explicit reason why they need full Internet routes on their gateway router. Most can live with partial routes, or with routes filtered so that the more specific routes are not accepted. Edge enterprise networks can save money (no upgrade of forwarding table memory) and reduce risk (less chance of being hit with a prefix explosion attack).
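The outbound rules in the example policy can be sketched the same way. This is an illustrative model only, under stated assumptions: the prefix `192.0.2.0/24`, the community value `64500:100`, and the function name are hypothetical placeholders (both values come from ranges reserved for documentation), and the returned reason string stands in for the log entry on a deny.

```python
import ipaddress

# Hypothetical values for illustration: a documentation prefix standing in
# for "our aggregate" and a documentation ASN in the community tag.
OUR_PREFIXES = {ipaddress.ip_network("192.0.2.0/24")}
OUR_COMMUNITY = "64500:100"


def permit_outbound(prefix, communities):
    """Return (permitted, reason) for one candidate IPv4 advertisement.

    The reason string models the log line the policy asks for on a deny.
    """
    net = ipaddress.ip_network(prefix)
    # Never advertise a default route to the Internet.
    if net.prefixlen == 0:
        return False, "deny: default route"
    # Nothing longer than /24 leaves the network.
    if net.prefixlen > 24:
        return False, "deny: more specific than /24"
    # Explicit permit list; everything else hits the deny-all.
    if net not in OUR_PREFIXES:
        return False, "deny: not one of our prefixes"
    # Safeguard in case the prefix filter is broken: require our community.
    if OUR_COMMUNITY not in communities:
        return False, "deny: missing our BGP community"
    return True, "permitted"


print(permit_outbound("192.0.2.0/24", {"64500:100"}))
print(permit_outbound("192.0.2.0/24", set()))
print(permit_outbound("0.0.0.0/0", {"64500:100"}))
```

The prefix check and the community check are deliberately independent, so a mistake in one filter is still caught by the other.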
! Get the empirical data from your router vendor - how many routes will the chips hold? The vendors need to supply empirical tests on the number of routes their equipment can process. This needs to be engineering data. Expect the vendors to comply, at a minimum, with the guidelines set forward in the IETF's Benchmarking Methodology Working Group (bmwg) (see http://datatracker.ietf.org/doc/draft-ietf-bmwg-bgp-basic-convergence/). The number of routes that can be safely processed in the router's forwarding table will determine how the router is configured, where it is used, and when it will need to be upgraded or replaced.
Do not be distracted into thinking this issue is a Cisco problem. The problem is that network engineers are not demanding details from their vendors, so they lack accurate dimensioning data and correct forecasts for when their routers need action (configuration, upgrade, or replacement).
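Once the vendor has supplied a tested capacity figure, the forecasting step can be a back-of-the-envelope calculation. The sketch below assumes simple linear growth and reuses a 25% safety margin; the function name and every number in the usage line are hypothetical and are not measurements of any real router or of the real routing table.

```python
def years_until_action(capacity, current_routes, growth_per_year,
                       safety_margin=0.25):
    """Estimate years until the route count reaches the safe limit.

    capacity        -- routes the forwarding table holds (vendor test data)
    current_routes  -- routes carried today
    growth_per_year -- assumed linear growth in routes per year
    safety_margin   -- fraction of capacity kept in reserve
    """
    usable = capacity * (1 - safety_margin)
    if current_routes >= usable:
        return 0.0  # already past the safe limit: act now
    if growth_per_year <= 0:
        raise ValueError("growth_per_year must be positive")
    return (usable - current_routes) / growth_per_year


# Made-up figures: 1,000,000-route table, 500,000 routes today,
# growing by 50,000 routes per year.
print(years_until_action(1_000_000, 500_000, 50_000))  # 5.0
```

A route leak is a step change rather than steady growth, so this estimate informs upgrade planning only; the max-prefix alarm remains the protection against sudden surges.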
! Know your Peers. "Do you have the phone numbers and e-mail addresses of your upstream providers?" is one of the first questions I ask any dual-homed enterprise. The majority answer no. This contact information needs to be on your phone, in your NOC, and tested at least once a quarter (contacts change). If you are connected to an Internet Exchange Point (IXP), then you need the contact information for everyone you peer with, plus the IXP operator. Accurate contact information matters in the reverse direction too. All these peers need your contact information. The community of engineers who maintain global connectivity will look after each other. They will call each other. But they need the numbers to call. Don't wait for something to happen. Proactively get this information. The BGP instability issue on August 13, 2014 was primarily a non-issue for those networks that had the contact information in their address book.
! Sign up to the BGP Reports. The only way to really know what is going on with your BGP interconnectivity is to see your network from the inside and outside. Outside means using tools that monitor your network. These could range from commercial tools to academic projects. Start with these tools:
CIDR Report - http://www.cidr-report.org/as2.0/. Shows how well you are aggregating.
Hurricane Electric's BGP Toolkit - http://bgp.he.net/. Excellent tool to explore how the world sees your BGP advertisement.
BGP Mon - http://www.bgpmon.net/. Real-time monitor that is free for the first few prefixes. This is perfect for the average multi-homed enterprise.
There are other tools, but these basic ones get people started on the right path.
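Before looking at the outside view, a rough local check of aggregation is possible with Python's standard `ipaddress` module. This is only an approximation under simple assumptions: it merges adjacent and overlapping prefixes and ignores AS paths and routing policy, so it is not how CIDR Report computes its figures. The function name and the sample prefixes (documentation ranges) are hypothetical.

```python
import ipaddress


def aggregation_summary(announced):
    """Compare announced IPv4 prefixes with their collapsed form.

    Returns (announced_count, aggregated_count, aggregated_prefixes).
    """
    nets = [ipaddress.ip_network(p) for p in announced]
    # collapse_addresses merges adjacent and overlapping networks.
    collapsed = list(ipaddress.collapse_addresses(nets))
    return len(nets), len(collapsed), [str(n) for n in collapsed]


announced = ["192.0.2.0/25", "192.0.2.128/25", "198.51.100.0/24"]
print(aggregation_summary(announced))
# (3, 2, ['192.0.2.0/24', '198.51.100.0/24'])
```

If the aggregated count is much smaller than the announced count, that is a prompt to ask why the more specifics are being advertised at all.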
The key objective is to ensure the network operations team is looking at the data on the global Internet routing table, how the organization impacts that table, and whether there are things that can be done to protect the organization's interests. Note that the Internet's well-being is in every organization's interest.
! Sign up to the appropriate Network Operations Group (NOG). The network engineers in your organization should be on the appropriate network operations groups. These groups are the first places people will bring up instability issues and problems that are impacting everyone. They are regionalized, with various levels of participation. Look through the master list maintained by the North American Network Operators' Group (NANOG) - https://www.nanog.org/resources/orgs. Sign up and set up a mail filter. Check the mailing list every day or several times a day. If there is an instability problem with your Internet connection, check the NOG list to see if anything is going on with the Internet's stability.
What if you do not have a local NOG? E-mail bgreene@senki.org for help. We just started IDNOG (http://www.idnog.or.id/). The team was persistent and found there were plenty of people and organizations who would help.
Summary. No, the Internet is not in trouble (see Geoff Huston's talk). What this incident should teach all network engineers is that they cannot take the routers that connect them to the Internet for granted. If you are connected to the Internet through BGP, then due diligence, monitoring, and good policy are needed to maintain a healthy connection to the Internet.
Barry Greene is a 30-year veteran who has spent 20 of those years focusing on expanding the Internet's vision. He is a Telecommunication Business Development Executive, Internet Technologist, CyberSecurity Specialist, and mentor of new talent. Connect to Barry via LinkedIn (www.linkedin.com/in/barryrgreene/), follow on Twitter (@BarryRGreene), catch his blogs on Packet Pushers (http://packetpushers.net/), and Senki (www.senki.org).