A while back, I started a series of articles on my blog about the importance of a solid CDN in an ad-serving infrastructure. As I kept looking at the guts of it, I realized that an important part of any online presence is the DNS service, and as such decided this deserves attention of its own.
Even more so, I ran into some of these issues myself when setting up our infrastructure at Cognitive Match, and I realized that these aspects are occasionally unknown to (or maybe just wrongly ignored by) engineers. Hence this initial piece, which the folks at Dyn were kind enough to host for me.
Whatever your online presence, DNS is an essential part of it: without it, your audience either cannot reach your servers at all, or takes forever to find out "where" your servers are. While for your average web solution this might be acceptable, once you step into the online advertising space it is not, as you will find yourself losing audience – and ultimately clients.
In order to figure out how to eliminate these issues, we need to start by understanding how DNS works and fits in with your infrastructure. So let’s start with the basics and dive into the more complicated matters as we go along.
The above diagram shows a user who wants to access your web infrastructure. As you would expect, upon typing "www.myserver.com" into their web browser, the OS performs a DNS lookup to find out that the domain name www.myserver.com maps to IP address 188.8.131.52. In doing so, it hits a DNS server somewhere "out there" (be it the ISP's or one of the public DNS servers, etc.) which performs the resolution and returns the IP address.
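To make that first step concrete, here is a minimal Python sketch of the blocking resolver call the OS makes before any HTTP traffic can flow. (The www.myserver.com domain from the diagram is just a placeholder; the sketch resolves localhost so it works anywhere.)

```python
import socket
import time

def resolve(hostname):
    """Perform the blocking stub-resolver lookup the OS does for the browser,
    and time it -- this delay happens before a single HTTP byte is sent."""
    start = time.perf_counter()
    ip = socket.gethostbyname(hostname)  # hits the configured DNS server
    elapsed_ms = (time.perf_counter() - start) * 1000
    return ip, elapsed_ms

# Only after this returns can the browser open a TCP connection to `ip`.
ip, ms = resolve("localhost")
print(f"localhost resolved to {ip} in {ms:.2f} ms")
```

Run it against a real domain you own and you will see the lookup cost vary wildly with how far away the answering DNS server is – which is exactly the problem discussed below.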
While simple, the above diagram already reveals a bit of the importance of a solid DNS setup in your infrastructure. Because of the sequence of actions (DNS lookup followed by HTTP communication), the total time it takes the user to retrieve the content from your website is the sum of:
- DNS resolution time
- Application response time
The application response time consists of the time it takes the user's TCP/IP packets to reach your servers, be assembled and processed by your app, and for the response to travel back to the user. However, we're assuming here that your infrastructure and your application are both kick-ass, and as such the time for routing + processing the user request is negligible.
Your major concern, therefore, is the DNS resolution time. This can be broken down into:
1. Time to reach the DNS service
2. DNS lookup time
3. Time for the response to travel back to the user
Since BIND is the most commonly used DNS package on Unix/Linux, let's assume we're using it in the above diagram as well. BIND is very fast when it comes to DNS lookups, so we can dismiss number 2 in the above steps as negligible. (Note: even if you're not using BIND as your DNS software package, chances are that with any of the industry-standard servers – yes, that includes the Microsoft ones! – the above still applies: the actual DNS lookup time is a matter of microseconds, which in the context of the timings we're going to encounter later on is indeed negligible.)
Now, let's say that to tackle numbers 1 and 3 and minimize the distance to your users, you decide to implement your DNS service in a location geographically close to them, so that the trips required in those two steps are negligible. Let's assume for argument's sake that your initial application targets only users around New York; you set up a DNS service (maybe even in your own data centre) and away you go. Since you know the routing to your data centre is dead fast, the same applies to users reaching your DNS service, and as such your user experience is going to be uber-fast. Right?
However, if you work in the advertising space, it's unlikely you will stay confined to the very same geographical region all the time – this is a global business and opportunities appear all over the globe. As such, having set yourself up with a super-fast DNS service in your own super-fast data centre in New York, you find out the very next day that your sales guys signed a deal with some big publisher in L.A. Excellent!
You start planning the expansion of your data centre presence to the west coast right away, then proceed to set up your zones on your DNS server such that queries from the west coast return the IP addresses of that data centre, while queries from the east coast return the IP addresses of your New York data centre. (If you are not familiar with this, BIND allows you to implement so-called "geolocation-aware DNS", where you can return different IP addresses for the same domain name based on the IP address the DNS lookup request originated from.)
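Conceptually, what BIND's views give you is a lookup table keyed by the client's source address. Here is a toy Python sketch of the idea – the client networks and data centre IPs below are made up for illustration, and real geolocation databases are far more granular:

```python
import ipaddress

# Hypothetical client networks mapped to the data centre that should serve them.
GEO_ZONES = {
    ipaddress.ip_network("203.0.113.0/24"): "192.0.2.10",   # "west coast" -> LA
    ipaddress.ip_network("198.51.100.0/24"): "192.0.2.20",  # "east coast" -> NYC
}
DEFAULT_ANSWER = "192.0.2.20"  # clients we can't place fall back to NYC

def geo_resolve(client_ip):
    """Return a different A record for the same name based on who is asking."""
    addr = ipaddress.ip_address(client_ip)
    for network, answer in GEO_ZONES.items():
        if addr in network:
            return answer
    return DEFAULT_ANSWER

print(geo_resolve("203.0.113.55"))   # west-coast client gets the LA answer
print(geo_resolve("198.51.100.7"))   # east-coast client gets the NYC answer
```

The same logic is what a `view` + `match-clients` ACL pair expresses in a BIND configuration: same zone name, different zone data per client population.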
Then it hits you: your DNS server is still in New York!
By the way, my geography is not that bad, I know that the users in the diagram above are not in New York and California. However, they are still on the east and west coasts – my drawing skills are not that great unfortunately so please bear with me!
As you can see, once the users from the west coast resolve the DNS, they have a very short trip to your data centre and as such will be served content very quickly (as per before, the red arrows represent DNS lookups while the orange ones represent the web/HTTP traffic). The trouble is finding out about your west coast data centre – which, as you can see, involves a trip across the continent!
For those of you who forgot: Internet traffic ultimately relies on electromagnetic waves traveling through cables and air (up to satellites and back, or until they reach a ground antenna). At best, these travel at the speed of light – 300,000 km/s. Taking a direct distance between L.A. and New York of approx. 4,000 kilometers, a signal from LA reaches NY in about 4,000 / 300,000 ≈ 0.013 seconds. That is approximately 13 milliseconds if the signal travels in a straight line!
The trouble is that the signal travels through fiber optic cables in most cases (where light propagates at roughly two-thirds of its vacuum speed) and has a few hops along the way where the signal is processed (assembled into IP packets) and re-routed. With that in mind, and being generous, I will say it takes the DNS request about 20 ms, once issued in LA, to reach NY. Your DNS server then processes the request and sends back the response... which takes another 20 ms! So your poor west coast users, before even getting to your data centre there, encounter a delay of 40 ms!
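The back-of-the-envelope numbers above are easy to reproduce. A quick sketch, using the approximate speeds and distance from the text rather than any measurement:

```python
DISTANCE_KM = 4_000       # LA to NYC, straight line (approx.)
C_VACUUM_KM_S = 300_000   # speed of light in vacuum
C_FIBER_KM_S = 200_000    # roughly two-thirds of c in fiber optic cable

one_way_ideal_ms = DISTANCE_KM / C_VACUUM_KM_S * 1000
one_way_fiber_ms = DISTANCE_KM / C_FIBER_KM_S * 1000

print(f"ideal one-way trip:   {one_way_ideal_ms:.1f} ms")                 # ~13.3 ms
print(f"fiber one-way trip:   {one_way_fiber_ms:.1f} ms")                 # 20.0 ms
print(f"DNS round trip, fiber: {2 * one_way_fiber_ms:.1f} ms")            # 40.0 ms
```

And this is the floor imposed by physics alone, before any queueing or routing overhead is added on top.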
40 ms is huge in the online advertising world, where most ad servers give themselves 100-200 ms to display content in order to capture the user's attention. So with a generous overall budget of 200 ms from the moment the user initiates a request, you are wasting a fifth of it on DNS resolution alone. And I'm not even taking into account local routing issues and all the "goodness" that comes with Internet routing, which can see your DNS time spiraling up to more than 100 ms.
At this point, you'll find that publishers start dropping your tags, ad exchanges shut you down (getting back into an ad exchange after you've been cut off is twice as difficult as getting in in the first place, by the way!) and your revenue diminishes, not to mention your reputation.
Throw into the equation a few other physical locations like Europe, Asia and Australia, and your uber-fast server in your uber-fast data centre is not worth a dime! You can have the fastest servers in the world, but if your DNS doesn't perform as it should, your solution won't work, since users are being cut off at the very first step: finding out where your application servers are so they can access them.
What To Do?
Of course, you start looking at things like routing tables and routing protocols, BGP and the like, and start rolling out your own DNS infrastructure. You set up servers in LA and servers in NYC such that users in LA are served DNS responses from your LA servers, whereas users in NYC are served from your NYC servers. You set it all up and off you go – hooray, now you have an in-house DNS service which is very fast for LA and NYC users. Job done!... for now.
The moment you get another client – let's say in Dallas – you find yourself in the same situation again. (I'm not going to compute the times and distances again, but it's worth remembering that Dallas is a couple of thousand km away from both your data centres, and as such you WILL incur a lag in reaching either of them.) You probably go ahead and add more servers to your infrastructure, reconfigure the others so they are aware of one another and announce themselves accordingly, etc. etc. You also pay for racking, networking, and so on.
Before you know it, just to serve three regions of the entire USA, you are paying for hardcore networking pipes coming out of your data centre plus paying for racking physical servers in there and this is before you even had your users reach your application servers. Every new location that’s not in the geo-vicinity of your existing DNS servers requires more racking, more configuration, more money spent – and this is not to actually expand your servers capacity, this is simply to ensure your users reach your application servers in an acceptable timeframe.
If you're dealing with advertising on the Internet, the truth is that in 99% of cases your traffic can come from anywhere – one publisher will target US users, another will deal with Europe, then APAC, South America... Really, you have to be prepared to be "global", because online opportunities are everywhere nowadays. You can of course go and set up a server in every major geographical region in the world, safe in the knowledge that once you have those, you can at any point take traffic for that region.
However, that opportunity might never arise – you might find, 2-3 years in, that the APAC side of your business never took off, yet you've been paying for racked servers that whole time. Even more, you had to pay sysadmins to look after them, upgrade the hardware, patch the kernel, re-cable them, re-rack them and so on. Before you know it, an infrastructure like this becomes expensive to maintain – and you quite likely want to focus on developing your core application rather than worrying about infrastructure.
Outsourcing Your DNS
So, why not outsource your DNS? Why pay for an army of sysadmins and tens of machines that are a headache to maintain and upgrade, when you can pay a monthly fee and outsource all of your DNS? After all, you outsource your content delivery to a CDN provider (you do that, right?) for the same reason: to serve user requests from very close to their locations.
If you are Google or AppNexus or of similar size, granted, you have a huge infrastructure anyway and lots of resources to manage and optimize it precisely for your needs. However, at the company I work for, we're not quite there yet, so we opted to outsource this to Dyn.
The setup was actually a breeze: we just set our external IPs in the system, "described" our traffic (in terms of US, Europe, APAC, etc.) and off we went! This gave us enough juice to start going into ad exchanges like Yahoo and Google – our platform is pretty much rocket fuel when it comes to serving an ad, but DNS was holding us back a bit, and this outsourcing sorted that issue for us.
However, it doesn’t stop there – like with most other outsourced DNS providers, we got some really cool things out of the box as well like automatic failover. This is an awesome feature, and, again, if you work in online advertising, 24/7/365 is the key – you always need to have contingency plans for failures. (Believe me, disks crash all the time, network timeouts occur and you need to be able to have a system which ticks away throughout all of these!)
Let's say that in the above example your LA data centre goes down (power outage, your load balancers hit a kernel fault, the cabling has a bad day and so on). If you don't do anything about it, then until you get the problem solved, users who are normally served from the LA data centre will see nothing – in advertising this quite likely translates into "holes" in publishers' pages (and they don't like that!). Instead, you can choose to redirect all the LA users to your NYC data centre. As we have seen before, this means a delay in dealing with those requests; however, we are talking about a disaster scenario – it's not something that happens every day – and in such cases it's accepted that serving requests slower than usual beats not serving them at all.
The nice thing about Dyn is that you simply configure the conditions which determine that your data centre is down – for instance, your load balancer's externally-facing IP is no longer reachable, or a certain page is no longer accessible. Once these conditions are met, their solution automatically updates the DNS to exclude that data centre from the "pool", so that until it comes back up, no user is directed to that location! You can then go ahead and start working on bringing the data centre back up, safe in the knowledge that your users are being directed for a while to your (sane) NYC data centre; when your LA setup is back up, it automagically reappears in the DNS pool and starts seeing traffic right away.
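The core of that monitoring logic is conceptually simple. Here is a hedged Python sketch of the kind of check-and-exclude loop a managed provider runs on your behalf – the data centre names, IPs and the `probe` callable are all hypothetical, and Dyn's actual implementation is of course far more involved:

```python
POOL = {
    "LA":  "192.0.2.10",   # made-up externally-facing data centre IPs
    "NYC": "192.0.2.20",
}

def active_answers(pool, probe):
    """Return only the IPs of data centres whose health probe passes.
    `probe(ip)` stands in for the configured conditions (reachability
    of the load balancer IP, a test page responding, and so on)."""
    healthy = {name: ip for name, ip in pool.items() if probe(ip)}
    # If *everything* looks down, keep serving the full pool rather than
    # returning no answers at all -- an empty DNS response helps nobody.
    return healthy or dict(pool)

# Simulate LA going dark: the probe fails for its IP only.
la_is_down = lambda ip: ip != "192.0.2.10"
print(active_answers(POOL, la_is_down))   # only NYC remains in the DNS pool
```

When LA's probe starts passing again, the next evaluation of `active_answers` naturally puts it back in the pool – which is the "automagic" recovery described above.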
Of course, you can use this facility manually too and fail traffic over from one data centre to another. This is useful when you need to perform something like a critical hardware upgrade (which can impact live traffic – and you don't want to take that risk!): simply mark, say, your NYC data centre as temporarily down and let the DNS automatically reconfigure everything to run from LA while you carry out your upgrade; once everything is OK, simply bring the NYC data centre back into the pool.
So there you have it. At the end of the day, why spend hundreds of thousands of dollars on hardware and staffing for an ever-fluctuating need? Take the guesswork (and a large chunk of the cost) out of your DNS management and look into outsourcing your DNS today.