A special Halloween edition of the Renesys Blog: That which was whole is now torn asunder, and cries of grief ring out throughout the land. Cogent (AS174) and Sprint (AS1239) are no longer connected to each other. Customers of each network who do not have other providers—namely single-homed customers—cannot reach each other. Two large portions of the Internet are separated.
Cogent is frequently involved in peering disputes. In the last three years, the only significant peering dispute (one that caused a temporary partition of the Internet) that did not involve Cogent was between Level 3 and XO. That one was settled very quickly. All of the others (Cogent depeering Telia, Level 3 depeering Cogent, and earlier disputes going back years with Teleglobe (now Tata, AS6453) and France Telecom (AS5511)) involved Cogent.
But in this case, Cogent may have picked the wrong sparring partner. In the past, Cogent won peering disputes simply because its customer base was less sensitive to the outage than the other party’s. Ultimately, the one whose customers complain the loudest loses. This time it may be very different. Sprint hasn’t paid any particular attention to its IP product and network at a senior management level for a very long time. They are clearly focused on wireline and wireless telecom services, and Overland Park management seems to remain mostly unaware that they even operate an IP network. In other words, Cogent has picked a fight with a zombie here. They may even rip off a limb or two, but that doesn’t mean the zombie will notice.
Sprint and Cogent only started peering recently, back in November of 2006. Prior to that, the two networks reached each other via NTT Communications (AS2914). Now, almost exactly two years later, it appears that Sprint has disconnected Cogent and chosen to divide the Internet. Cogent has stated that they will litigate this issue, so this one is unlikely to get resolved quickly. In the meantime, over 200 downstream autonomous system customers of each organization cannot reach the networks in the other. This is ugly and will remain so.
Let’s take a quick look at what we know so far and set the stage for a story that will likely continue for several days, if not weeks. I’ll also try to set this in a larger context about the evolution of each of these networks and the evolution of Internet interconnection on the whole.
Timeline: Cogent lost access to Sprint’s prefixes between 20:00:11 UTC (4pm EDT yesterday, 30 October 2008) and 20:00:22. Sprint lost access to Cogent’s prefixes between 20:00:22 and 20:00:27. The timing, right on the hour, suggests that the event was human-initiated. After the adjacency was lost, the only workable paths between AS174 and AS1239 were leaks: unintentional readvertisements of paths. For example:
- 1239 6327 19752 19752 27168 27168 577 174
That is not a good path. There are just so many things wrong with it. Its sheer length. The number of hops between AS1239 and AS174. The fact that AS577 (Bell Canada) appears in the role of transit provider to Cogent. The fact that Shaw Communications (AS6327) shows up as providing transit for Sprint. Aside from clear mistakes like this one, all of them Canadian, there’s really no reachability between the two organizations.
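To make the complaints about that path concrete, here is a minimal Python sketch that collapses the repeated ASNs (prepending) and lists the networks sitting between Sprint and Cogent. The function names are made up for this illustration; the only real data is the AS path quoted above.

```python
from itertools import groupby


def collapse_prepends(as_path):
    """Drop consecutive duplicate ASNs, which is how prepending shows up in a path."""
    return [asn for asn, _ in groupby(as_path)]


def intermediate_ases(as_path, a, b):
    """Return the ASNs between a and b on the de-prepended path, or None if either is absent."""
    path = collapse_prepends(as_path)
    if a not in path or b not in path:
        return None
    i, j = sorted((path.index(a), path.index(b)))
    return path[i + 1:j]


# The leaked path quoted in the post.
leaked = [1239, 6327, 19752, 19752, 27168, 27168, 577, 174]

print(len(leaked))                           # 8
print(collapse_prepends(leaked))             # [1239, 6327, 19752, 27168, 577, 174]
print(intermediate_ases(leaked, 1239, 174))  # [6327, 19752, 27168, 577]
```

Two networks that peer directly would have nothing in between, so four intermediate ASes between AS1239 and AS174 is a strong hint that something is being readvertised that should not be.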
One common reaction to this schism is to say: “So what?” This only affects people who are either single-homed to Cogent or single-homed to Sprint, and given all of the high-profile depeering that has gone on over the past three to five years, how many people can that seriously be?
Throwing caution to the wind, 289 autonomous systems are completely single-homed behind Cogent (that is, they have no connectivity to the Internet through anyone else). 214 autonomous systems are completely single-homed behind Sprint. These numbers actually significantly understate the impact of this outage on the Internet, though. Due to Cogent’s aggressive pricing, there are a large number of service providers who are multi-homed but who default all of their outbound traffic through Cogent. This is true for Renesys’s deployment in Boston, and it’s also true for a number of other ISPs. In these cases, although those ASes and prefixes show up as unaffected, traffic originating from those users bound for Sprint-connected users will simply not work.
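For readers wondering what "completely single-homed behind" a provider means in practice, the following Python sketch shows one way to test it on a toy provider graph. This is an illustration under stated assumptions, not the method used to produce the counts above: the customer ASNs are from the range reserved for documentation, the provider relationships are invented, and the helper names are hypothetical. An AS counts as single-homed behind X if it can climb its provider links to some top-level network, but cannot do so once X is taken out of the graph.

```python
def reaches_root(asn, providers, roots, removed=None):
    """True if asn can climb provider links to any root without passing through `removed`."""
    stack, seen = [asn], set()
    while stack:
        cur = stack.pop()
        if cur == removed or cur in seen:
            continue
        seen.add(cur)
        if cur in roots:
            return True
        stack.extend(providers.get(cur, ()))
    return False


def single_homed_behind(x, providers, roots):
    """ASes that have connectivity today but lose all of it when x is removed."""
    result = []
    for asn in providers:
        if asn in roots:
            continue
        connected = reaches_root(asn, providers, roots)
        survives = reaches_root(asn, providers, roots, removed=x)
        if connected and not survives:
            result.append(asn)
    return sorted(result)


# Invented toy topology: customer AS -> set of its transit providers.
providers = {
    64496: {174},        # buys transit only from Cogent
    64497: {174, 1239},  # multi-homed to both
    64498: {64496},      # transitively behind Cogent only
    64499: {1239},       # buys transit only from Sprint
    64500: {64497},      # customer of the multi-homed AS
}
roots = {174, 1239}

print(single_homed_behind(174, providers, roots))   # [64496, 64498]
print(single_homed_behind(1239, providers, roots))  # [64499]
```

Note that AS64497 in the toy data is reported as unaffected, which is exactly the blind spot described above: a graph test like this cannot see that a multi-homed network may still send all of its outbound traffic through one provider.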
Another way to look at the scope of this event is to identify the number, size and ownership of the network prefixes affected by the outage. The most common way of measuring the size of a network is to look at all of the prefixes in its “downstream cone”, that is, the set of networks that are transitively downstream of a given ASN. Sprint has approximately 100,000 prefixes in its downstream cone, of which at least 1,989 are single-homed (they are not advertised in such a way that they are reachable via any other provider). Cogent has over 30,000 prefixes in its downstream cone, of which at least 1,544 are single-homed. So, in total, at least 3,500 networks on the Internet (1,989 + 1,544 = 3,533) have less than full connectivity right now. But for the reasons that I cited above, the impact is probably significantly worse than that. We’re also looking at the analysis of “single-homed” in this case to see if we can identify prefixes that are missing transit even when it appears that they are not. Expect a follow-up post on this issue.
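As a rough illustration of the downstream cone idea, the sketch below walks provider-to-customer links on an invented topology and collects the prefixes originated inside the cone. Everything here is an assumption made for the example: the ASNs and prefixes come from ranges reserved for documentation, the function names are hypothetical, and the root's own prefixes are left out of the count, which is only one of several possible conventions.

```python
def downstream_cone(root, customers):
    """Set of ASNs transitively downstream of root (root itself excluded)."""
    cone, stack = set(), list(customers.get(root, ()))
    while stack:
        asn = stack.pop()
        if asn == root or asn in cone:
            continue
        cone.add(asn)
        stack.extend(customers.get(asn, ()))
    return cone


def cone_prefixes(root, customers, originated):
    """Union of the prefixes originated by every AS in root's downstream cone."""
    return {
        prefix
        for asn in downstream_cone(root, customers)
        for prefix in originated.get(asn, ())
    }


# Invented toy data: provider AS -> its direct customers.
customers = {
    64505: [64506, 64507],
    64506: [64508],
    64507: [64508],  # 64508 buys from both 64506 and 64507
}
# Invented toy data: AS -> prefixes it originates.
originated = {
    64506: ["192.0.2.0/24"],
    64508: ["198.51.100.0/24", "203.0.113.0/24"],
}

print(sorted(downstream_cone(64505, customers)))         # [64506, 64507, 64508]
print(len(cone_prefixes(64505, customers, originated)))  # 3
```

The single-homed subset of a cone would then be the prefixes that are visible only through paths containing the root, which takes routing data rather than a static graph to establish.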
One might suspect that these single-homed autonomous systems are simply incautious or insignificant networks. After all, given the history of Internet partitions, who would be rash enough to have important network services located on a single-homed prefix in this day and age?
The following prefixes are some of the more interesting networks single-homed behind Sprint:
- 126.96.36.199/21 Expedia, Inc.
- 188.8.131.52/16 Federal Trade Commission
- 184.108.40.206/24 Federal Aviation Administration
- 220.127.116.11/24 National Aeronautics and Space Administration
- 18.104.22.168/24 Occidental Petroleum Corporation
- 22.214.171.124/16 Pfizer Inc.
- 126.96.36.199/16 Rutgers University
- 188.8.131.52/16 Sprint PCS (lots of networks here, of course)
- 184.108.40.206/23 SUNGARD HIGHER EDUCATION INC.
And those are just a few.
The following prefixes are some of the more interesting networks single-homed behind Cogent:
- 220.127.116.11/24 Joost Production Benelux Network
- 18.104.22.168/24 Loopt, Inc.
- 22.214.171.124/23 National Aeronautics and Space Administration
- 126.96.36.199/21 NTT America, Inc. (and many more like it, from the T1 and hosting customers acquired from NTT/Verio)
- 188.8.131.52/24 Skynet Access (this might actually be good news, if the loss of connectivity to Skynet prevents or delays sentience)
- 184.108.40.206/16 St. Lawrence College
- 220.127.116.11/16 University of Toronto (and a bunch of other colleges and universities)
Notice NASA single-homed on both sides of this division? I have no idea what that is about. The point here is that this is a big deal. There are lots of significant organizations that appear to have lost connectivity due to this dispute.
This dispute is unlikely to be resolved quickly. We’ll revisit it over the course of the weekend and into next week to see how it develops. In particular, it will be interesting to watch the public positioning from both parties, including whether Sprint issues any kind of a statement or indicates any attention to the matter at all. If Sprint really doesn’t care, then Cogent will lose.