
UK Level (3) Leak

According to the intrepid reporters over at The Register, Level (3) suffered a significant outage at its London Braham Street facility due to a leak.

Water was discharged from sprinklers inside the building, and operations were affected only briefly. Water. Not routes. That probably explains why no evidence of this leak showed up in Renesys Routing Intelligence. 🙂

For those who aren’t getting the joke: in routing terminology, a leak is any instance of someone readvertising routes in a way they didn’t intend to. For example, an Autonomous System (AS) that buys transit from two providers will normally not readvertise routes learned from one to the other. If those providers are big players, they already have good paths between themselves that don’t run through some little edge AS. But sometimes people make mistakes in their routing policies that cause them to learn a path from one provider and announce it to the other. And occasionally someone actually selects that path (even though it’s longer and less attractive than the path the traffic would normally take), and some traffic actually gets routed that way. We see evidence of this happening quite frequently.
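To make the routing sense of “leak” concrete, here’s a minimal sketch (in Python, with invented AS numbers and relationships, not anyone’s real policy) of the export rule a multihomed edge network wants to enforce: routes learned from a provider or peer should only ever be readvertised to customers, never to another provider or peer.

```python
# Minimal sketch of a "no leak" export policy for a multihomed edge AS.
# The AS numbers and relationship table are invented for illustration.

# How we relate to each neighbor: customer, peer, or transit provider.
RELATIONSHIPS = {
    64501: "provider",   # transit provider A
    64502: "provider",   # transit provider B
    64510: "customer",   # a downstream customer
}

def should_export(learned_from_as: int, export_to_as: int) -> bool:
    """Return True if a route learned from `learned_from_as` may be
    advertised to `export_to_as` without leaking.

    Rule of thumb ("valley-free" routing):
      - routes learned from customers may go to anyone;
      - routes learned from providers or peers may only go to customers.
    """
    source = RELATIONSHIPS.get(learned_from_as, "unknown")
    target = RELATIONSHIPS.get(export_to_as, "unknown")

    if source == "customer":
        return True                    # customer routes are fair game
    return target == "customer"        # everything else: customers only

# Re-advertising provider A's routes to provider B is the classic leak:
assert should_export(64501, 64510) is True    # provider route -> customer: fine
assert should_export(64501, 64502) is False   # provider route -> provider: leak!
```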

Small-scale routing leaks are not a big deal. They happen. They mostly impact the leaker more than anyone else. And they either cause enough disruption to get fixed quickly, or they don’t, in which case they aren’t a serious problem anyway. But there are lots of other kinds of route leaks. The most spectacular involve networks that learn huge numbers of routes from external sources (providers and peers), munge them up, and readvertise them as if they were their own. The most famous example is AS7007 back in 1997. More recently, there was TTNet (AS9121) on Christmas Eve, 2004.
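For a rough idea of how that flavor of leak shows up in the data, here’s a hedged sketch: compare the origin AS observed for each prefix against a baseline, and flag any AS that suddenly claims to originate lots of prefixes it never originated before. The prefixes, AS numbers, and threshold are all made up for illustration; this isn’t the Renesys detection logic.

```python
from collections import defaultdict

# Baseline: which origin ASes we normally see for each prefix (invented data).
BASELINE_ORIGINS = {
    "192.0.2.0/24":    {64496},
    "198.51.100.0/24": {64497},
    "203.0.113.0/24":  {64498},
}

def find_suspect_origins(observed, threshold=2):
    """Group prefixes by any origin AS that isn't in the baseline for that
    prefix, and flag ASes newly originating at least `threshold` prefixes.
    One oddball announcement is usually noise; tens of thousands of them
    from a single AS looks like AS7007 or AS9121."""
    newly_originated = defaultdict(set)
    for prefix, origin_as in observed:
        if origin_as not in BASELINE_ORIGINS.get(prefix, set()):
            newly_originated[origin_as].add(prefix)
    return {asn: prefixes
            for asn, prefixes in newly_originated.items()
            if len(prefixes) >= threshold}

# One AS suddenly claiming to originate everyone else's prefixes:
observed_updates = [
    ("192.0.2.0/24", 64666),
    ("198.51.100.0/24", 64666),
    ("203.0.113.0/24", 64498),   # normal announcement
]
print(find_suspect_origins(observed_updates))
# -> {64666: {'192.0.2.0/24', '198.51.100.0/24'}}
```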

The real problem with large-scale leaks is how far they propagate. It has long been known that the vast majority of Internet failures are caused by human error, not hardware, infrastructure, or software failures. The trouble is that at the core of the Internet there is relatively little protection against human error. Basically, the scale of operations is such that large networks do not (and possibly cannot) adequately protect themselves against the errors of other large networks. Trying to find a way to do that is the subject of some work I’ve been involved with, along with some other interesting work. But it’s certainly not going to get solved any time soon. It probably won’t even get much better any time soon, for a bunch of reasons that mostly come down to money and inertia.
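About the only blunt guardrail in common use today is a per-session limit on how many prefixes a neighbor is allowed to announce: blow past it and the session comes down. Here’s a minimal sketch of the idea, with invented neighbor names and limits; real routers implement this as a session parameter rather than application code.

```python
# Sketch of a per-neighbor maximum-prefix guardrail. The numbers and
# neighbor names are made up for illustration.

PREFIX_LIMITS = {
    "small-customer-1": 50,       # a stub customer announces a handful of routes
    "big-peer-1":       800_000,  # a large peer announces most of the table
}

def check_session(neighbor: str, announced_prefix_count: int) -> str:
    """Return the action to take when a neighbor's announced route count
    is compared against its configured limit."""
    limit = PREFIX_LIMITS.get(neighbor)
    if limit is None:
        return "no-limit-configured"
    if announced_prefix_count > limit:
        # A customer that normally sends 20 routes suddenly sending the whole
        # table is almost certainly a leak; dropping the session contains it.
        return "shutdown-session"
    return "ok"

print(check_session("small-customer-1", 20))       # ok
print(check_session("small-customer-1", 450_000))  # shutdown-session
```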

Level (3) got lucky in London: a water leak is way easier to deal with than a route leak.


