The $19B WhatsApp acquisition brings Facebook what they desperately need: 250M daily users in high-growth international mobile markets, many of them in the key under-20 demographic. But how do you design a service delivery infrastructure that can reliably reach the far corners of the earth to keep all these mobile eyeballs connected to each other? Looking at the surprisingly lean visible Internet footprint of WhatsApp, it seems that Facebook’s global content distribution expertise may have arrived just in time.
If you want to tap the growth potential of the global consumer market, you’ll probably end up serving customers in many of the same markets where WhatsApp has eyeballs: countries like India, Brazil, Mexico, Indonesia, Turkey, and Russia. That means relying on unfamiliar service providers in each of those countries for international and last-mile connectivity, and that in turn means you’ll be unfairly blamed for a lot of service variability: poor customer experience, inexplicable traffic routing, intermittent packet loss, spikes in latency, congested key-provider handoffs… a whole host of traffic ailments stemming from the poor and unpredictable performance of the average long-haul Internet path between clients and the Cloud.
How did WhatsApp solve this challenge as they grew to global scale?
Perhaps the secret is: they hadn’t yet, and the Facebook deal came along just in time.
Learning from WhatsApp
WhatsApp’s backend infrastructure is “classic American cloud” — concentrated, scalable, locally reliable, but located on the wrong side of the planet for too many users, across an Internet connection that’s fragile and slow. WhatsApp serves most of their content out of a relatively small handful of SoftLayer IP addresses in Northern Virginia (see maps at right). To move beyond the simple messaging model, WhatsApp was going to have to find a partner who knows how to deploy global infrastructure, to get the bits closer to the users.
By all accounts, the WhatsApp founders are great hands-on founders who ran a textbook lean startup campaign and know how to stretch a dollar. They leveraged open standards on the back end, like XMPP. They even used Erlang, an obscure programming language designed at Ericsson for concurrency and scaling.
WhatsApp.net: SoftLayer Northern Virginia
Facebook.com: Worldwide content distribution via Akamai
The thing is, all that good engineering was targeted at server-side scalability. This is how you grow to 250M daily users without running out of compute resources in the datacenter. It doesn’t help you manage the challenges of long-distance around-the-world service delivery; in fact, it leaves you blind to the performance problems that can arise when the Internet gets flaky near your customers (as opposed to near your Cloud presence).
The WhatsApp founders had one important thing going for them: they chose to solve a customer problem that doesn’t absolutely demand the lowest latency (or the highest reliability) from the underlying network.
For most of its users, WhatsApp is simply a replacement for sending and receiving expensive international SMS text messages. International SMS delivery is expensive and, to be charitable, “latency challenged.” If WhatsApp takes an extra second or two to deliver a message around the world because of a poor Internet connection, it’s not a big deal. WhatsApp is speedy enough to carry on back-and-forth conversations between distant users, because they’re exchanging messages at human-readable speeds.
What Comes Next
What happens, though, when WhatsApp decides to offer more synchronous forms of interaction — like, say, voice calling (already on its way in the next-generation application) or even video chat?
Suddenly, all those 300-400ms round trips that every packet has to take to Washington DC don’t seem so attractive. As the application’s constraints on latency, jitter, and packet loss get stricter, Internet performance engineering comes into the picture. The Big American Cloud strategy reaches its limits, and you have to begin to consider building local data center presence around the world. You’re going to want to stand up global caches for the imagery, local nodes to mediate local conversations, aggregation points in the big Internet exchange cities of Europe, Asia, and the Americas …
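To see why those round trips hurt, it helps to run the voice-quality arithmetic. ITU-T G.114 recommends keeping one-way mouth-to-ear delay under roughly 150 ms; a rough sketch (the per-hop costs below are illustrative assumptions, not WhatsApp measurements):

```python
# Rough one-way delay budget for a voice call; all figures illustrative.
# ITU-T G.114 suggests keeping mouth-to-ear delay under ~150 ms one way.

BUDGET_MS = 150  # target one-way mouth-to-ear delay

# Assumed per-hop costs in milliseconds (not measured values):
CODEC_AND_JITTER_MS = 30   # encoding, packetization, jitter buffer
ACCESS_NETWORK_MS = 40     # mobile last mile, both ends combined

def network_headroom(one_way_path_ms):
    """Budget remaining after a given one-way network path delay."""
    return BUDGET_MS - CODEC_AND_JITTER_MS - ACCESS_NETWORK_MS - one_way_path_ms

# A 300-400 ms round trip implies a 150-200 ms one-way path:
for rtt in (300, 400):
    print(rtt, "ms RTT ->", network_headroom(rtt / 2), "ms headroom")
```

Under these assumptions the headroom goes sharply negative at both ends of that RTT range, which is exactly why a call routed through a single faraway datacenter sounds like a satellite link.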
At this point, Facebook and Akamai enter the picture, and the timing couldn’t be better for WhatsApp. These companies are no strangers to the challenge of global content distribution and latency reduction. As you might imagine, Facebook poses a pretty significant global optimization challenge, since people like to look at “local content” (popular within their region, popular among their friends) but also “global content” (things everyone on earth likes to look at). And they don’t want to wait for it to load from the other side of the planet. Reducing latency from “all over” requires a lot of caches, placed intelligently around the world, and users need to be steered to the right places to get the content they need quickly.
The emergence of new datacenter capacity and commoditized server virtualization around the world means that whether you roll your own, or buy service delivery from someone like Akamai, you can make this work when the time is right. But it’s still up to you to map, manage, and monitor the performance of the Internet paths that carry content to your users, and connect them with each other.
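That mapping and monitoring can start small. As a minimal sketch — the node hostnames are hypothetical placeholders, and a TCP handshake time is only a coarse proxy for path latency — you can time connections from a vantage point and steer clients toward the lowest-latency cache:

```python
import socket
import time

def tcp_connect_ms(host, port=443, timeout=3.0):
    """Time a TCP handshake as a coarse round-trip latency estimate.

    Returns milliseconds, or None if the host is unreachable
    (which is itself a signal worth alerting on).
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        return None

# Hypothetical cache nodes; a real deployment probes from many vantage points.
nodes = ["cache-ashburn.example.net", "cache-frankfurt.example.net",
         "cache-singapore.example.net"]

samples = {n: tcp_connect_ms(n, timeout=1.0) for n in nodes}
reachable = {n: ms for n, ms in samples.items() if ms is not None}
if reachable:
    best = min(reachable, key=reachable.get)
    print("steer clients toward", best)
```

Production systems layer on repeated sampling, percentile tracking, and DNS- or anycast-based steering, but the core loop — measure the paths, pick the best one, notice when it degrades — is the same.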
WhatsApp’s initial strategic choice (hosting nearly everything behind a few IP addresses in the USA) made lots of sense for a latency-tolerant handset app in its “acquire users” growth mode. As WhatsApp diversifies its services and integrates with Facebook, they’ll naturally switch to a model in which content is hosted closer to the end users, and the latencies and stability of Internet paths they use will improve significantly. In their new home, they have access to a deep pool of talent and relationships to make that happen. Well played, WhatsApp.
18 Mar 2014 Corrected text to agree with data exhibits; most of WhatsApp’s local hosting in the US is provided by SoftLayer, primarily out of Northern Virginia.
For more reflections on the challenges of global performance and availability, see my article in InformationWeek, “Six Lessons CIOs Can Learn from WhatsApp.” Or stop by the Renesys booth this week at Gigaom Structure Data and say hi in person. –jim