This post was originally published on Network World on December 20, 2013.
There has been a surge of questions lately about IPv6 and whether it can be used to better identify individuals on the Internet. Everyone from marketers to law enforcement officials seems to hold the same misconception: that IPv6 will make it possible to expose people in a way that IPv4 does not.
It is true that IPv6 will change addressing on the Internet. Many of us hope it restores the ability to identify an actual network endpoint — a feature that we lost a number of years ago in IPv4. But some appear to be imagining a future where each machine has its very own address, and that these addresses will be easily traced whenever a person visits a website, plays a game online, or even opens an email.
In fact, IPv6 has features that are designed to foil these sorts of plans. Also, because of the enormous IPv6 address space, it’s rather unlikely that a single machine will have just one IPv6 address.
To make sense of the discussion, we need some history.
The World With IPv4
As the world started to run out of IPv4 addresses (which was some time ago now), two things happened. First, we changed the way that addresses were given out, so that fewer addresses would be allocated at a time. Second, NAT (Network Address Translation) was invented.
A NAT is a mechanism where one network address is mapped to another address. For example, in your home network you might have a cable modem. It probably has one “public” IPv4 address: an address that is routable on the Internet. You probably have some sort of gateway or router (like a wireless access point). That gateway gives out addresses to your tablet, your phone, your Xbox, and so on. Each of these devices gets an address, usually one from a special “private” range specified in RFC 1918.
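To make the “private” ranges concrete, here is a minimal Python sketch that checks whether an address falls inside one of the three blocks RFC 1918 sets aside. The function name is_rfc1918 is ours, chosen for illustration; it is not part of any standard library.

```python
import ipaddress

# The three private IPv4 blocks defined in RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address: str) -> bool:
    """Return True if `address` is an IPv4 address inside an RFC 1918 block.

    `is_rfc1918` is a hypothetical helper name used only for this sketch.
    """
    ip = ipaddress.ip_address(address)
    if ip.version != 4:
        # RFC 1918 only covers IPv4.
        return False
    return any(ip in network for network in RFC1918_NETWORKS)


if __name__ == "__main__":
    for candidate in ["192.168.1.10", "172.32.0.1", "8.8.8.8"]:
        print(candidate, is_rfc1918(candidate))
```

An address your gateway hands to your tablet will typically pass this check, while the address on the Internet-facing side of your cable modem typically will not.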
When one of your devices wants to connect to a service on the Internet, the gateway takes the connection from the device, remembers the device’s private address, and connects to the Internet service using the public address. The gateway translates between the private address and the public one, keeping track so that the different devices can all use the same public address. So each device in your network has its own address, but as far as the rest of the Internet is concerned they’re all at the same address. You don’t have to use NAT this way, but it’s a common way to use it.
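The bookkeeping described above can be sketched in a few lines of Python. This is a toy model, not how any particular gateway is implemented: it assumes the common port-based flavor of NAT, where the gateway tells devices apart by assigning each connection its own public port. The class name Nat, its method names, and the starting port 40000 are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Nat:
    """Toy port-based NAT (hypothetical names, illustration only)."""

    public_ip: str
    next_port: int = 40000  # arbitrary starting port, an assumption
    # (private_ip, private_port) -> public_port
    outbound: dict = field(default_factory=dict)
    # public_port -> (private_ip, private_port)
    inbound: dict = field(default_factory=dict)

    def translate_out(self, private_ip: str, private_port: int):
        """Map a device's private endpoint to the shared public address."""
        key = (private_ip, private_port)
        if key not in self.outbound:
            # First packet of this flow: remember the mapping both ways.
            self.outbound[key] = self.next_port
            self.inbound[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.outbound[key]

    def translate_in(self, public_port: int):
        """Find which private endpoint a reply belongs to, or None."""
        return self.inbound.get(public_port)


if __name__ == "__main__":
    nat = Nat(public_ip="203.0.113.5")
    tablet = nat.translate_out("192.168.1.10", 51000)
    phone = nat.translate_out("192.168.1.11", 51000)
    # Both devices appear at the same public address, on different ports.
    print(tablet, phone)
    print(nat.translate_in(tablet[1]))
```

From the outside, the tablet and the phone share one address. Only the gateway’s table can say which device a given reply belongs to.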
As IPv4 addresses get more scarce, NATs are getting larger. We have a NAT in our office. Some ISPs are now running what are called “carrier grade” or “large scale” NAT so there can be hundreds or thousands of machines behind a single address. And unlike the household case above, those “hidden” nodes often have no relation to one another. So yes, in most networks today, it’s difficult to identify someone by their address.
NATs are a problem on the Internet because they’re in the way. Suppose you want to make a voice call over the Internet to your mother. The way you think of this might be that your computer connects to your mother’s computer. What actually happens is that you pass your data through your NAT, and your mother passes her data through her NAT, and the only machines that are actually talking to each other across the Internet are the two NATs. If they get anything wrong, packets get lost and the voice quality degrades.
Now, there is no scarcity of IPv6 addresses: there are more than enough IPv6 addresses for every atom on the face of the earth. So there’s no need to have NAT. Certainly every device that wants one can have an IPv6 address. Doesn’t this mean that identification of users (by marketers or governments or whatever) will be easier? Aren’t we giving up privacy even as we gain the benefits of getting rid of NAT? No.
To begin with, the way that IPv6 addresses are usually issued means that most devices won’t have just one address. Instead, they’re likely to get various ranges, which means that each time you see a different IPv6 address you don’t know whether it is a distinct device. This is sort of the reverse of the IPv4 problem. Under IPv4 and NAT, one address corresponds to multiple machines. Under IPv6, one machine may correspond to multiple addresses.
Moreover, there are standard techniques (like those specified in RFC 4941 and RFC 3972) designed to enable a node to change its address. The goal is to conceal that the same node is involved in different transactions, by using different addresses for different transactions. Such techniques are not available under IPv4. So while it is true that under IPv4 nobody can tell which of the boxes behind your NAT is responsible for a given connection, they can certainly associate all of the traffic with that single NAT.
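The idea of a node changing its address can be illustrated with a simplified Python sketch. It assumes the node sits on a /64 subnet and simply picks a fresh random 64-bit interface identifier each time. This captures the spirit of temporary addresses but does not follow the exact generation procedure in RFC 4941, and the function name temporary_address and the documentation prefix 2001:db8:1:2::/64 are chosen for illustration.

```python
import ipaddress
import secrets


def temporary_address(prefix: str) -> ipaddress.IPv6Address:
    """Return a random address inside an IPv6 /64 prefix.

    Simplified illustration only: a real RFC 4941 implementation also
    handles address lifetimes, duplicate detection, and reserved
    identifiers. `temporary_address` is a hypothetical helper name.
    """
    network = ipaddress.ip_network(prefix)
    if network.version != 6 or network.prefixlen != 64:
        raise ValueError("expected an IPv6 /64 prefix")
    # The low 64 bits of the address form the interface identifier.
    interface_id = secrets.randbits(64)
    return network.network_address + interface_id


if __name__ == "__main__":
    # The same machine can present a different address each time.
    for _ in range(3):
        print(temporary_address("2001:db8:1:2::/64"))
```

An observer who sees three of these addresses has no direct way to tell whether they belong to one device or three, although the shared prefix still points at the same network.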
Currently, IPv6 also provides a lot less geolocation data than IPv4 does. This is really just a temporary state of affairs, however. There is so much more IPv4 penetration that it is easy for geolocation database builders to identify the geographical location associated with an IPv4 address. And there are only about four billion IPv4 addresses, so it is feasible to store information about every one of them. The low use rates of IPv6 so far, and the enormous size of the address space, mean that building geolocation databases for IPv6 addresses is not currently commercially viable.
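A rough back-of-the-envelope calculation shows the difference in scale. The sketch below assumes a naive table with one fixed-size record per address, and the 16-byte record size is a number we made up for illustration. Real geolocation databases store ranges rather than individual addresses, so treat this only as a way to compare sizes.

```python
RECORD_BYTES = 16  # hypothetical size of one per-address record


def table_bytes(address_bits: int, record_bytes: int = RECORD_BYTES) -> int:
    """Bytes needed for one record per address in a 2**address_bits space."""
    return (2 ** address_bits) * record_bytes


if __name__ == "__main__":
    ipv4_table = table_bytes(32)        # every IPv4 address
    one_subnet_table = table_bytes(64)  # a single IPv6 /64 subnet
    print(ipv4_table / 2**30, "GiB to cover all of IPv4")
    print(one_subnet_table / 2**60, "EiB to cover one IPv6 /64")
```

Under these assumptions, a record for every IPv4 address fits on an ordinary disk, while a record for every address in just one IPv6 /64 subnet would run to hundreds of exbibytes.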
In any case, an address is not the best way to track someone’s behavior, because people change networks too often. Smartphones and tablets move back and forth between mobile networks and Wi-Fi networks throughout the day. Even many laptops move through different Wi-Fi networks frequently. But someone who wants to track a user doesn’t want that tracking to fail every time the user leaves home and changes networks. This is why social networks are so beloved of marketers: they reveal additional information about where users are and the networks through which they travel.
Ultimately, building a profile for a potential or current customer using an IP address — whether v4 or v6 — is both tricky and unsatisfying. That doesn’t mean that IPv6 will usher in a new era of anonymity, but the worry that IPv6’s lack of NAT means it reveals much more about users is mistaken.