For a little more than 90 minutes yesterday, internet service for millions of users in the U.S. and around the world slowed to a crawl. Was this widespread service degradation caused by the latest botnet threat? Not this time. The cause was yet another BGP routing leak — a router misconfiguration directing internet traffic from its intended path to somewhere else.
On Nov. 6, our network experienced a disruption affecting some IP customers due to a configuration error. All are restored.
— Level 3 Network Ops (@Level3NOC) November 6, 2017
While not a day goes by without a routing leak or misconfiguration of some sort on the internet, it is an entirely different matter when the error is committed by the largest telecommunications network in the world.
In this blog post, I’ll describe what happened in this routing leak and some of the impacts. Unfortunately, there is no silver bullet to completely remove the possibility of these occurring in the future. As long as we have humans configuring routers, mistakes will take place.
At 17:47:05 UTC yesterday (6 November 2017), Level 3 (AS3356) began globally announcing thousands of BGP routes that had been learned from customers and peers and that were intended to stay internal to Level 3. As a result, internet traffic bound for large eyeball networks like Comcast and Bell Canada, as well as major content providers like Netflix, was mistakenly sent through Level 3’s misconfigured routers. Traffic engineering is a delicate process, so sending a large amount of traffic down an unexpected path is a recipe for service degradation. Unfortunately, many of these leaked routes stayed in circulation until 19:24 UTC, leading to over 90 minutes of problems on the internet.
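To make the idea of routes "intended to stay internal" concrete, here is a minimal sketch of an export check. It is not Level 3's actual policy or configuration: the `Route` class, the `internal_only` tag, and the neighbor categories are hypothetical names chosen for illustration, the prefixes come from documentation address space, and the rule for peers and providers is the common convention of passing along only customer-learned routes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    learned_from: str            # "customer", "peer", or "provider"
    internal_only: bool = False  # hypothetical tag: never announce externally


def may_export(route: Route, neighbor_kind: str) -> bool:
    """Return True if `route` may be announced to a neighbor of this kind."""
    if route.internal_only:
        # Routes tagged as internal must not leave the network at all.
        return False
    if neighbor_kind == "customer":
        # Customers may receive every route that is not internal-only.
        return True
    # Peers and providers receive only routes learned from customers.
    return route.learned_from == "customer"


# Hypothetical routes using documentation prefixes.
aggregate = Route("198.51.100.0/24", learned_from="customer")
more_specific = Route("198.51.100.0/25", learned_from="customer", internal_only=True)
peer_route = Route("203.0.113.0/24", learned_from="peer")

for r in (aggregate, more_specific, peer_route):
    print(r.prefix, "-> export to peer:", may_export(r, "peer"))
```

In this sketch a leak corresponds to the `internal_only` check being skipped or the tag being lost, so the more-specific route is announced to peers alongside the aggregate.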
Bell Canada (AS577)
— Andrew J Dow (@andrewjdow) November 6, 2017
Bell Canada (AS577) typically sends Level 3 a little more than 2,400 prefixes for circulation into Level 3’s customer cone. During the routing leak yesterday, that number jumped to 6,459 prefixes – most of which were more-specifics of existing routes and, equally important, were announced to Level 3’s Tier 1 peers like NTT (AS2914) and XO (AS2828, now a part of Verizon).
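A jump of this size is the kind of signal a simple monitoring check could flag. The sketch below uses the two Bell Canada figures from this post; the function name and the 1.5x headroom threshold are hypothetical choices for illustration, not a recommendation or a description of any operator's tooling.

```python
def prefix_count_anomalous(observed: int, baseline: int, headroom: float = 1.5) -> bool:
    """Flag an origin whose prefix count exceeds its baseline by the given factor."""
    return observed > baseline * headroom


baseline = 2400     # "a little more than 2,400" prefixes in normal operation
during_leak = 6459  # prefixes seen during the leak

growth = during_leak / baseline
print(f"growth factor: {growth:.2f}")
print("anomalous:", prefix_count_anomalous(during_leak, baseline))
```

Because the baseline is rounded down slightly, the true growth factor would be a little lower than the one computed here, but still well above any plausible headroom.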
Below is a visualization of the latency impact of the routing leak.
Comcast, the largest internet service provider in the United States, was also directly impacted by yesterday’s routing leak.
— Modiv (@ModivMusic) November 6, 2017
Comcast uses numerous ASNs to operate its network, and Level 3 leaked prefixes from quite a few of them, diverting and slowing internet traffic bound for Comcast. According to our data, Level 3 leaked over 3,000 prefixes from 18 of Comcast’s ASNs, listed below.
- AS33491 (356 leaked prefixes)
- AS7725 (252 leaked prefixes)
- AS7015 (248 leaked prefixes)
- AS33287 (241 leaked prefixes)
- AS33651 (235 leaked prefixes)
- AS22909 (198 leaked prefixes)
- AS33657 (178 leaked prefixes)
- AS33668 (176 leaked prefixes)
- AS20214 (176 leaked prefixes)
- AS7016 (161 leaked prefixes)
- AS33650 (152 leaked prefixes)
- AS33667 (145 leaked prefixes)
- AS33652 (142 leaked prefixes)
- AS33490 (117 leaked prefixes)
- AS13367 (117 leaked prefixes)
- AS33660 (101 leaked prefixes)
- AS33659 (97 leaked prefixes)
- AS33662 (89 leaked prefixes)
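As a quick cross-check, the per-ASN counts in the list above can be summed to confirm the totals quoted for Comcast. The dictionary simply transcribes the list; nothing else is assumed.

```python
# Leaked prefix counts per Comcast ASN, transcribed from the list above.
leaked = {
    "AS33491": 356, "AS7725": 252, "AS7015": 248, "AS33287": 241,
    "AS33651": 235, "AS22909": 198, "AS33657": 178, "AS33668": 176,
    "AS20214": 176, "AS7016": 161, "AS33650": 152, "AS33667": 145,
    "AS33652": 142, "AS33490": 117, "AS13367": 117, "AS33660": 101,
    "AS33659": 97, "AS33662": 89,
}

asn_count = len(leaked)
total_prefixes = sum(leaked.values())
print(asn_count, total_prefixes)  # 18 3181
```

The result, 3,181 prefixes across 18 ASNs, matches the "over 3,000" figure.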
Our traceroute measurements into Comcast reveal the impact of the leak from a performance standpoint. The two visualizations below show a bulge of internet traffic headed for the leaked IP address space diverted through Level 3, and the increase in observed latency.
Level 3 also leaked 81 prefixes from RCN, which appears to have pulled the plug on its Level 3 connection at 18:34 UTC, once it figured out what was causing a slowdown in its network.
It is important to keep in mind that the internet is still a best-effort endeavor, held together by a community of technicians in constant coordination. In this particular case, initial clues as to the origin of this incident were first reported in a technical forum (the outages list) when Job Snijders astutely observed new prefixes being routed between Comcast and Level 3 yesterday.
Peer leaks are a continuing risk to the internet without any silver bullet solution. We previously suggested using protection when peering promiscuously, but even a well-run network like Google has been both the leaker and the leaked.
Networks share more-specific routes with a peer to ensure that return traffic comes directly back over the peering link. But there is always the risk that the peer could leak those routes and adversely affect your network. When the leaker is the biggest telecom in the world (and only getting bigger), the impact is likely to be significant.