VolumeDrive is a Pennsylvania-based hosting company that uses Cogent and (since late May of this year) Atrato for Internet transit. This morning, VolumeDrive leaked routes to Atrato, which passed them on to the global Internet, disrupting traffic in places as far-flung from the USA as Pakistan and Bulgaria.
The way Internet transit is supposed to work in BGP is that a provider announces the global routing table to its customers (i.e., a large number of routes). Then, in turn, the customers announce local routes to their respective providers (generally a small number of routes). Each customer selects the routes it prefers from the options it receives. When a transit customer accidentally announces the global routing table back to one of its providers, things get messy. This is what happened earlier today and it had far-reaching consequences.
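The export rule described above can be sketched in a few lines. This is a minimal illustration rather than any router's actual policy engine: the `Relationship` enum and `should_export` function are hypothetical names, and a network's own originated routes are assumed to be treated like customer routes.

```python
from enum import Enum


class Relationship(Enum):
    """How a BGP neighbor relates to us (hypothetical labels)."""
    CUSTOMER = "customer"
    PEER = "peer"
    PROVIDER = "provider"


def should_export(learned_from: Relationship, export_to: Relationship) -> bool:
    """Return True if a route learned from one kind of neighbor may be
    announced to another kind under the conventional transit policy."""
    # Routes learned from customers may be announced to everyone.
    if learned_from is Relationship.CUSTOMER:
        return True
    # Routes learned from peers or providers are announced only to customers.
    return export_to is Relationship.CUSTOMER


# A provider sending the global table down to a customer is allowed.
print(should_export(Relationship.PROVIDER, Relationship.CUSTOMER))
# A customer sending one provider's routes up to another provider is a leak.
print(should_export(Relationship.PROVIDER, Relationship.PROVIDER))
```

Under this rule, the only routes a transit customer should ever send to a provider are its own and those of its downstream customers.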
At 06:49 UTC this morning (18-September), VolumeDrive (AS46664) began announcing to Atrato (AS5580) nearly all the BGP routes it learned from Cogent (AS174). The resulting AS paths were of the following format:
… 5580 46664 174 …
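One way to pick such paths out of a routing feed is to look for that three-AS sequence anywhere in the AS path. The sketch below assumes paths arrive as space-separated strings of AS numbers. The helper names are hypothetical, and the sample paths use AS numbers reserved for documentation (64496 and 64511) on either side of the sequence, so they are illustrative rather than observed data.

```python
from itertools import groupby

# The sequence reported above: Atrato, then VolumeDrive, then Cogent.
LEAK_PATTERN = [5580, 46664, 174]


def parse_as_path(text: str) -> list[int]:
    """Turn a space-separated AS path string into a list of AS numbers."""
    return [int(token) for token in text.split()]


def collapse_prepends(as_path: list[int]) -> list[int]:
    """Drop consecutive repeats so that AS path prepending cannot hide a match."""
    return [asn for asn, _ in groupby(as_path)]


def contains_sequence(as_path: list[int], pattern: list[int]) -> bool:
    """Return True if pattern occurs as a contiguous run inside as_path."""
    n = len(pattern)
    return any(as_path[i:i + n] == pattern for i in range(len(as_path) - n + 1))


def is_leaked(text: str) -> bool:
    """Check one AS path string for the leak sequence."""
    return contains_sequence(collapse_prepends(parse_as_path(text)), LEAK_PATTERN)


# Illustrative paths only; 64496 and 64511 are documentation AS numbers.
print(is_leaked("64496 5580 46664 174 64511"))
print(is_leaked("64496 5580 46664 46664 174 64511"))
print(is_leaked("64496 174 64511"))
```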
Normally, VolumeDrive announces 39 prefixes (networks) to Atrato: 27 it originates itself and 12 it transits for two of its downstream customers, Visperad Networks (AS15351) and DataWagon (AS27176). During this leak, however, Atrato propagated over 400,000 routes learned from VolumeDrive, or nearly the entire global routing table. A full table is currently hovering at around 500,000 routes, a figure on the minds of many due to the 512k limit on many older routers. (Note that this particular routing leak resulted in no new routes and therefore didn’t increase the size of the global routing table.)
The following graphic shows how the Internet has reached VolumeDrive over the past couple of days, using either Atrato or Cogent. VolumeDrive experienced a brief outage earlier in the week. Then just before midnight UTC, Atrato dropped out entirely as a VolumeDrive provider.
To recap, a major routing leak occurred, one that was entirely preventable with some common-sense limits. So what? How much impact could this small Pennsylvania hosting company have on the global Internet? Well, quite a lot, in fact. Such is the nature of our trust-based Internet routing: pretty much anyone can mess it up.
According to Dyn’s IP Transit Intelligence tool (shown below), Atrato transits around 5,000 prefixes and has over 600 peering connections (not simply BGP adjacencies, but “peering” as opposed to “transit”) — at least they did at the start of the day. That much peering can act as a very loud amplifier for leaked routes.
The following graphics depict the number of our traceroute measurements completing from both Islamabad, Pakistan and Los Angeles to China Unicom, China’s second-largest ISP after China Telecom. The dips in completion rate begin at 06:49 UTC when, instead of going through Singtel (in the case of Islamabad) or Telia (in the case of Los Angeles), traffic was diverted to Atrato. Many of these traces never reached their intended destinations.
These graphics are generated from thousands of measurements, but examining individual traceroutes reveals the exact details of the path changes during this incident. The traceroute shown below was performed yesterday and takes a path from Islamabad to Karachi, where it boards a submarine cable en route to Singapore before finally reaching Zhengzhou, China. Geographically, it’s a reasonable route even if the recorded latencies are quite high.
trace from Islamabad, Pakistan to China Unicom Henan Province at 09:08 Sep 17, 2014
2 126.96.36.199 (PTCL, Islamabad, PK) 0.66ms
3 188.8.131.52 s10-0-3-0.rwp44d1.pie.net.pk 0.523ms
4 184.108.40.206 (ITI, Rawalpindi, PK) 3.907ms
5 220.127.116.11 (ITI, Karachi, PK) 30.205ms
6 18.104.22.168 (PTCL, Karachi, PK) 26.221ms
7 22.214.171.124 (SingTel IX, Singapore) 226.13ms
8 126.96.36.199 (Singtel, Singapore) 521.436ms
9 188.8.131.52 (China Unicom, China) 522.932ms
10 184.108.40.206 (China Unicom, China) 472.701ms
11 220.127.116.11 (Backbone of China Unicom) 519.328ms
12 18.104.22.168 (China Unicom Henan province) 500.175ms
13 22.214.171.124 (China Unicom, Zhengzhou) 484.009ms
14 126.96.36.199 (China Unicom, Zhengzhou) 494.929ms
Next is a traceroute from the same server to the same IP in Zhengzhou during the routing leak. Instead of passing through Singapore en route to China, the path goes first to Atrato in Amsterdam (Atrato peers with PTCL at AMS-IX) and then on to Telia, which carries it to San Jose, California before it finally arrives in China. By exporting the leaked routes from VolumeDrive, Atrato, in effect, inserted itself into the path between Pakistan and China! Atrato could easily have overwhelmed its own capacity at this time, as many of our measurements did not reach their intended destinations.
trace from Islamabad, Pakistan to China Unicom Henan Province at 06:59 Sep 18, 2014
2 188.8.131.52 (PTCL, Islamabad, PK) 0.484ms
3 184.108.40.206 s10-0-3-0.rwp44d1.pie.net.pk 0.567ms
4 220.127.116.11 (ITI, Rawalpindi, PK) 2.838ms
5 18.104.22.168 (ITI, Karachi, PK) 27.867ms
6 22.214.171.124 (PTCL, Karachi, PK) 29.401ms
7 126.96.36.199 khi77.pie.net.pk 164.01ms
8 188.8.131.52 eth15-2.r1.ams2.nl.atrato.net 165.093ms
9 184.108.40.206 eth1-1.core1.ams2.nl.as5580.net 170.78ms
10 220.127.116.11 eth1-7.core1.ams1.nl.as5580.net 172.437ms
11 18.104.22.168 (Atrato, Amsterdam, NL) 159.247ms
12 22.214.171.124 adm-b5-link.telia.net 252.254ms
13 126.96.36.199 adm-bb3-link.telia.net 237.843ms
14 188.8.131.52 ldn-bb1-link.telia.net 243.867ms
15 184.108.40.206 nyk-bb1-link.telia.net 246.865ms
16 220.127.116.11 sjo-bb1-link.telia.net 316.894ms
17 18.104.22.168 chinaunicom-ic-141282-sjo-bb1.c.telia.net 356.87ms
18 22.214.171.124 (China Unicom, China) 355.425ms
19 126.96.36.199 (China Unicom, China) 350.21ms
20 188.8.131.52 (Backbone of China Unicom) 356.632ms
21 184.108.40.206 (Backbone of China Unicom) 644.495ms
22 220.127.116.11 (China Unicom Henan province) 590.476ms
23 18.104.22.168 (China Unicom, Zhengzhou) 537.887ms
24 22.214.171.124 (China Unicom, Zhengzhou) 543.934ms
In the newly released Dyn Internet Intelligence tool, such impairments in the flow of traffic show up as gaps in completed latency measurements, illustrated below:
Not all routing leaks are origination leaks, as in the Indosat leak earlier this year or the China Telecom leak of 2010. In an origination leak, a provider announces the global routing table claiming that it is the “origin” (and therefore the destination) for every single routed network in the Internet. Routing leaks can also occur when routes are simply passed in the wrong direction between providers, as happened today.
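The two kinds of leak look different in the AS path, and a rough sketch can make the distinction concrete. The function below is hypothetical and assumes you already know the legitimate origin of a prefix and which AS you suspect. The AS numbers in the examples are reserved for documentation. Note that an AS appearing mid-path is only a leak if that AS is not supposed to provide transit for the route, which this sketch takes on faith.

```python
def classify_route(as_path: list[int], expected_origin: int, suspect: int) -> str:
    """Label one observed route relative to a suspected leaker."""
    if not as_path:
        raise ValueError("empty AS path")
    origin = as_path[-1]  # the rightmost AS in the path is the origin
    # The suspect claims to be the destination for someone else's prefix.
    if origin == suspect and origin != expected_origin:
        return "origination leak"
    # The origin is correct, but the suspect sits in the path as transit.
    if origin == expected_origin and suspect in as_path[:-1]:
        return "re-announcement leak"
    return "no leak attributed to suspect"


# Documentation AS numbers: 64500 is the suspect, 64511 the legitimate origin.
print(classify_route([64496, 64500], 64511, 64500))
print(classify_route([64496, 64500, 64501, 64511], 64511, 64500))
print(classify_route([64496, 64501, 64511], 64511, 64500))
```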
While basic route hygiene (e.g. using MAXPREF to limit the number of routes accepted from a customer or peer) could have prevented this incident, it underscores a larger point: we’re all in this together. The Internet is our electronic commons and its proper functioning depends on everyone in control of an Internet router.
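To make the idea of a maximum-prefix limit concrete, here is a toy model of a session that shuts itself down once a neighbor announces more prefixes than expected. It is a sketch, not router configuration: the `BgpSession` class is hypothetical, the limit of 100 is an assumed amount of headroom over the 39 prefixes VolumeDrive normally announces, and the prefixes are synthetic.

```python
class PrefixLimitExceeded(Exception):
    """Raised when a neighbor announces more prefixes than allowed."""


class BgpSession:
    """Toy model of one BGP session with a maximum-prefix limit."""

    def __init__(self, neighbor_asn: int, max_prefixes: int) -> None:
        self.neighbor_asn = neighbor_asn
        self.max_prefixes = max_prefixes
        self.prefixes: set[str] = set()
        self.established = True

    def receive(self, prefix: str) -> None:
        """Accept one announcement, tearing the session down past the limit."""
        if not self.established:
            raise PrefixLimitExceeded(f"session with AS{self.neighbor_asn} is down")
        self.prefixes.add(prefix)
        if len(self.prefixes) > self.max_prefixes:
            # Drop the session and discard everything learned over it.
            self.established = False
            self.prefixes.clear()
            raise PrefixLimitExceeded(
                f"AS{self.neighbor_asn} exceeded {self.max_prefixes} prefixes"
            )


def simulate(announced: int, limit: int) -> tuple[bool, int]:
    """Feed synthetic prefixes to a session; return (still up, routes kept)."""
    session = BgpSession(neighbor_asn=46664, max_prefixes=limit)
    try:
        for i in range(announced):
            session.receive(f"10.{i // 256}.{i % 256}.0/24")
    except PrefixLimitExceeded:
        pass
    return session.established, len(session.prefixes)


print(simulate(39, 100))   # a normal day: the session stays up
print(simulate(400, 100))  # a leak: the session is torn down
```

The trade-off in choosing the limit is between leaving room for a customer's legitimate growth and catching a leak before many routes are accepted. A limit in the low hundreds would have stopped a leak of over 400,000 routes almost immediately.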
Routing goof-ups like this don’t need to involve your IP address space to impact you. If they affect the routes of someone you are trying to communicate with, or of one of the ISPs along the way, then they are your problem too. By understanding how the Internet works and gathering real-time Internet Intelligence on your assets and those of your customers or suppliers, you can work with your providers to mitigate the damage caused by the inevitable mistakes, even when the source is on the other side of the world.