(Editor’s Note: Phil Stanhope previously appeared on the Software Engineering Podcast. We have transcribed some of his interview below. Go here to listen to the entire podcast. Go here to read the first part of the transcript.)
[0:08:37.7] Jeff Meyerson: I think one way that we can look at DNS is it’s essentially a database, so anybody who is looking up a domain name is essentially accessing a database and anybody who stands up a website is performing an insert on that database. You set up the website, you assign the domain name to it. It’s this globally distributed database where, anytime you enter a domain name in a browser or on a console or whatever, you get an IP address and you can do stuff with that IP address. Give me an idea for how that big distributed database works. How are things propagated? How does an insert work, how does an update work, and how does a read work?
[0:09:29.5] Phil Stanhope: Okay. That’s a great set of questions. DNS has been around for a very long time. You’re right to refer to it as a database. I would also go further to say it’s an eventually consistent database. In some ways, with all of the rage that has been going on for the last 10 years or so around NoSQL, it is the sort of granddaddy of NoSQL databases.
I say that because it is, and so how does that work under the covers? You’re right. If a browser doesn’t understand what example.com is, it’s going to have to make a DNS lookup, which is a query. It’s a read request to find out what is the actual address of the server that I need to communicate with. That request is going to go to the edge of a DNS provider’s systems. It could be your ISP. It could be Dyn itself. It could be one of our competitors. It could be something where you tend to talk to — People think of Google as a DNS provider, and they are. But there are two core types of DNS systems involved in getting that answer in the first place.
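The read request described above can be sketched with Python's standard library. This is only an illustration: the helper name `lookup_a` is made up here, and the call goes through the operating system's resolver, which may consult local sources such as a hosts file before it forwards the question to whatever recursive the machine is configured to use.

```python
import socket


def lookup_a(hostname: str) -> list[str]:
    """Return the IPv4 addresses the system resolver reports for hostname."""
    # getaddrinfo hands the question to the operating system's resolver.
    # Restricting the family to AF_INET asks only for IPv4 answers.
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr),
    # and for IPv4 the sockaddr is an (ip, port) pair.
    return sorted({info[4][0] for info in infos})
```

On a machine with network access, `lookup_a("example.com")` would return whatever addresses the resolver currently holds for that name.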
First is the recursive. You, as an internet provider, you must provide a recursive service to your customers because otherwise they won’t be able to get anywhere. At Google — For many years, those open recursives or the recursives that you would see, they weren’t necessarily open. They might be closed just to your ISP. For Comcast, which is my home provider, they give me a recursive and I can communicate through it. You don’t think much about it.
A recursive is actually a cache. It’s kind of like a CDN edge node in the sense that it’s going to remember answers for a period of time. Every DNS record has a time to live associated with it, a TTL, and that recursive will find out the answer from an authority, cleverly called an authority, the master. That authority is the definitive place where you can store and manage your records as a user of DNS systems. You do not manage your records in a recursive. Google’s 8.8.8.8, one of Google’s well-known Anycasted IP addresses globally, is a recursive endpoint, and they allow you, anybody, to communicate through them.
At Oracle Dyn, we have offered our own open recursive for many, many years, called Internet Guide, and there are a number of others. In fact, there are thousands of open recursives on the internet that you can communicate through. They in turn go to the authority, and the authority is really where the core of the DNS master records are kept. That’s typically going to be managed via an API or some form of a user interface. As you make changes to your records, they will be propagated out — They’ll be available to be answered from the authority.
I’ll actually hold off on a deeper answer on how authorities are often structured and how we structure ours here. The recursive, if it doesn’t have it in the cache, it’s going to hit the authority, and the authority is going to give up the answer for, for example, an A record, which is a v4 address. You’ll get that answer. The recursive will hold it for its time to live, which is — In the past, time to lives were very long: hours, maybe even days.
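The interaction between a recursive and an authority described above can be sketched as a small in-memory simulation. Everything here is hypothetical and simplified: the `Authority` and `Recursive` classes, the `set_a` method, and the injected clock are illustrative names, not how any real DNS server is built. The point is only to show a cache miss going to the authority, a cache hit being served locally, and an update at the authority staying invisible until the TTL runs out, which is the eventual consistency mentioned earlier.

```python
from dataclasses import dataclass


@dataclass
class Record:
    address: str  # IPv4 address carried by an A record
    ttl: int      # seconds the answer may be cached


class Authority:
    """Hypothetical authority: the place where records are managed."""

    def __init__(self):
        self.records = {}

    def set_a(self, name, address, ttl):
        # Stands in for an insert or update made through an API or UI.
        self.records[name] = Record(address, ttl)

    def query(self, name):
        return self.records[name]


class Recursive:
    """Hypothetical recursive: a cache in front of the authority."""

    def __init__(self, authority, clock):
        self.authority = authority
        self.clock = clock            # injected so the demo can control time
        self.cache = {}               # name -> (Record, expires_at)
        self.authority_queries = 0

    def resolve(self, name):
        now = self.clock()
        cached = self.cache.get(name)
        if cached is not None and now < cached[1]:
            return cached[0].address  # cache hit: the authority is not asked
        record = self.authority.query(name)  # miss or expired entry
        self.authority_queries += 1
        self.cache[name] = (record, now + record.ttl)
        return record.address


now = [0.0]  # fake clock, in seconds
authority = Authority()
authority.set_a("example.com", "192.0.2.10", ttl=30)
recursive = Recursive(authority, clock=lambda: now[0])

first = recursive.resolve("example.com")   # miss: fetched from the authority
authority.set_a("example.com", "192.0.2.20", ttl=30)  # record is updated
now[0] = 10.0
stale = recursive.resolve("example.com")   # still inside the TTL: old answer
now[0] = 31.0
fresh = recursive.resolve("example.com")   # TTL expired: new answer fetched
```

With a 30-second TTL the old address lingers in the recursive for at most 30 seconds after the change. With a TTL of hours or days, it would linger for that long instead.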
In the modern internet, time to lives tend to be very short. Technically, they could be zero seconds, but in reality, on the open internet, they never really work when they’re less than 20 or 30 seconds. By having a short TTL, what you’re allowing is the ability to fail over or load balance through DNS mechanisms to another point and location. The authority is the one that’s going to tell the recursive how long the answer is good for, and then the recursive is responsible for keeping its cache warm and remembering that answer. Just like a CDN cache, however, if there’s a lot of traffic, that cached answer might get flushed out of cache to answer the next set of 10,000 unique things that came along, and that’s the art of running a recursive versus running an authority.
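The point about answers being flushed before their TTL expires can be illustrated with a bounded cache. This is a minimal sketch under stated assumptions: the `BoundedCache` class is hypothetical, and least-recently-used eviction is chosen here only because it is simple to show. The interview does not say which eviction policy real recursives use.

```python
from collections import OrderedDict


class BoundedCache:
    """Hypothetical fixed-size answer cache with TTLs and LRU eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # name -> (address, expires_at)

    def get(self, name, now):
        entry = self.entries.get(name)
        if entry is None:
            return None               # never cached, or already evicted
        address, expires_at = entry
        if now >= expires_at:
            del self.entries[name]    # TTL ran out
            return None
        self.entries.move_to_end(name)  # mark as recently used
        return address

    def put(self, name, address, ttl, now):
        self.entries[name] = (address, now + ttl)
        self.entries.move_to_end(name)
        # Over capacity: drop the least recently used answers first.
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)


cache = BoundedCache(capacity=3)
cache.put("example.com", "192.0.2.10", ttl=300, now=0)

# A burst of unique names arrives and fills the cache.
for i in range(3):
    cache.put(f"host{i}.example.net", f"198.51.100.{i}", ttl=300, now=0)

evicted = cache.get("example.com", now=1)          # TTL is still valid, but gone
survivor = cache.get("host2.example.net", now=1)   # recent entry still cached
```

The first answer had 299 seconds of TTL left, yet the recursive would have to go back to the authority for it, because the burst of unique names pushed it out.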