When a nameserver isn’t providing an answer to the question it received, how does it respond?
The goal of this post is to explain the circumstances that produce a number of different responses that are not positive answers, like REFUSED, NAME ERROR (NXDOMAIN), or NO DATA, and how these can impact name resolution.
When you operate a large authoritative DNS platform, having someone delegate a domain to your service and not create a corresponding zone comes with the territory. It could have been caused by a merger or acquisition where the DNS configuration was overlooked, your brand might have gone around and swept up the .xxx, .ru, and other TLD instances of names related to your business, or there could be any number of other reasons. The end result for the authoritative nameserver is the same: some other DNS server believes that Dyn will provide authoritative answers for a domain name, but we don’t know about that name.
What happens when a domain is requested and there is no corresponding zone / resource record hosted on your authoritative server?
From the nameserver’s perspective, it is being asked to answer a question outside of its configured response-ability (DNS pun!). It has no zone file for that domain name and, therefore, it has nothing to respond with. Following RFC 1035, a conforming nameserver should issue an RCODE 5 response. This is a refusal because “the nameserver refuses to perform the specified operation for policy reasons”.
In principle, it should be really strange that a nameserver receives a query for a name for which it is not authoritative. After all, the very act of delegating a nameserver from a parent involves claiming (authoritatively) that the nameservers named by the NS records are the right nameservers. So, historically, many nameservers responded with a referral to the root.
It appears today that this answer is widely scorned by DNS operators (partly because it can be used in amplification attacks), and many nameservers these days will return an error. The error is often RCODE 5 (Refused), on the grounds that the nameserver refuses to perform the specified operation for policy reasons. Sometimes, you will see an RCODE 2 (Server Failure, or SERVFAIL), for the same reason you see that when a zone is in the process of being loaded by a nameserver: the server can’t actually answer the query yet, and does not know whether it ever will be able to do so.
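For reference, the RCODE sits in the low four bits of the flags word in the DNS header (RFC 1035, section 4.1.1). Here is a minimal sketch of reading it from a raw message. The `Rcode` enum and the `rcode_of` helper are illustrative names made up for this post, not part of any library, and the header bytes are hand-built rather than captured from a real server.

```python
import struct
from enum import IntEnum

class Rcode(IntEnum):
    """Response codes defined in RFC 1035, section 4.1.1."""
    NOERROR = 0
    FORMERR = 1
    SERVFAIL = 2
    NXDOMAIN = 3
    NOTIMP = 4
    REFUSED = 5

def rcode_of(message: bytes) -> Rcode:
    """Return the RCODE of a raw DNS message (hypothetical helper).

    Codes above 5 come from later RFCs and are out of scope here;
    Rcode() raises ValueError for them.
    """
    if len(message) < 12:
        raise ValueError("a DNS header is 12 bytes long")
    _msg_id, flags = struct.unpack("!HH", message[:4])
    return Rcode(flags & 0x000F)  # RCODE is the low 4 bits of the flags

# Hand-built header: ID 0x1234, QR=1, RD=1, RCODE=5, all section counts zero.
refused = struct.pack("!HHHHHH", 0x1234, 0x8105, 0, 0, 0, 0)
print(rcode_of(refused).name)  # REFUSED
```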
We see the failure to configure a zone or remove a zone pretty often. The effect on our platform varies depending on who is requesting the zone and how often. A particularly bad example was when a home router manufacturer didn’t have a zone configured for the domain each of its 100,000 plus devices used to verify Internet connectivity. The router would request a certain domain and use the processing of the query and the response to monitor its own connectivity. The idea behind this behavior was that if the device could issue the query and see a response with an RCODE 0 (No Error), then it must be connected to the Internet. So when these many devices got errors even while properly connected, things did not go beautifully.
A refused response generally isn’t cached, so the distributed cache that is the recursive layer doesn’t help: every client query creates a query on the authoritative servers. So, each time a device issued a query for the domain (which, as far as the Internet was concerned, didn’t exist), it queried the recursive resolver, which had no cached answer and so then queried us! Refused responses often aren’t cached because the resolver has nothing from which to set a negative TTL: a REFUSED response does not return the SOA record, and that’s usually where one gets the negative TTL. To be fair, some resolvers are a little smarter about this, using other values they have learned in order to permit negative caching. But if the zone doesn’t exist, there is not much to base that negative TTL upon.
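To make the difference concrete, here is a toy model of a single recursive resolver, assuming it caches a negative answer only when the response gives it a negative TTL. The class name, the client rate, and the 900-second TTL are all invented for illustration; real resolvers are considerably more nuanced.

```python
class ToyResolver:
    """Caches one name and counts queries sent to the authoritative side."""

    def __init__(self):
        self.cached_until = None   # time (s) at which the cached answer expires
        self.upstream_queries = 0

    def lookup(self, now, negative_ttl):
        """negative_ttl is None for REFUSED: nothing to base a TTL on."""
        if self.cached_until is not None and now < self.cached_until:
            return  # answered from cache, the authoritative server sees nothing
        self.upstream_queries += 1
        if negative_ttl is not None:
            self.cached_until = now + negative_ttl

def upstream_load(negative_ttl, clients_per_second=100, seconds=3600):
    """Queries reaching the authoritative server over the simulated period."""
    resolver = ToyResolver()
    for second in range(seconds):
        for _ in range(clients_per_second):
            resolver.lookup(second, negative_ttl)
    return resolver.upstream_queries

print("REFUSED :", upstream_load(None))  # 360000, every client query goes upstream
print("NXDOMAIN:", upstream_load(900))   # 4, one query per 900 s window
```

In this model, one hour of client traffic turns into 360,000 authoritative queries when nothing can be cached, and into 4 when a 15-minute negative TTL is available.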
That covers how nameservers deal with requests for domains they don’t have zones for. What happens if a request is received by the authoritative nameserver for a domain in a zone that does exist, but the domain itself doesn’t exist? (For full detail, see https://www.ietf.org/rfc/rfc2308.txt)
Scenario: example.com exists and is delegated to Dyn.
Someone requests norecord.example.com and this domain, norecord, is not configured. In this case, an RFC-conforming nameserver should issue an RCODE 3 NXDomain (or Name Error) response. The domain you’re looking for does not exist. This type of failure condition is cacheable, because the response normally carries the zone’s SOA record in the authority section, which shows the server is authoritative to provide this negative answer. That record supplies the value you can use as the time to live (TTL) for such negative responses (the details are in RFC 2308).
The negative TTL is not like the positive TTL you set on every RRset. The positive TTL can be different for every RRset, which means different names can have different cache lifetimes. But there is no way to express these differences for names that do not exist, so the negative TTL applies to every name that could be in the zone. If you have a generally static zone, you can cut down on the number of queries per second wasted on NXDomain requests by setting a higher negative cache duration. The trade-off is that if you want to make use of that name in the future, you will need to wait for the negative TTL to expire before a new name can be assumed to be visible to the whole Internet.
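RFC 2308 (section 5) sets the negative TTL to the smaller of two values: the TTL of the SOA record itself and the SOA MINIMUM field. A minimal sketch of that calculation follows. The `Soa` class and the optional `resolver_cap` parameter are assumptions for illustration; many resolvers let the operator cap negative caching, but the three-hour figure here is only a placeholder.

```python
from dataclasses import dataclass

@dataclass
class Soa:
    ttl: int       # TTL of the SOA record itself, in seconds
    minimum: int   # SOA MINIMUM field, in seconds

def negative_ttl(soa: Soa, resolver_cap: int = 10800) -> int:
    """Seconds a resolver may cache NXDOMAIN/NODATA for this zone.

    RFC 2308: min(SOA TTL, SOA MINIMUM). resolver_cap is a hypothetical
    local limit applied on top of that.
    """
    return min(soa.ttl, soa.minimum, resolver_cap)

print(negative_ttl(Soa(ttl=3600, minimum=60)))      # 60
print(negative_ttl(Soa(ttl=300, minimum=86400)))    # 300
print(negative_ttl(Soa(ttl=86400, minimum=86400)))  # 10800, the local cap wins
```

Note the second case: raising MINIMUM alone does nothing if the SOA record’s own TTL is lower.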
However, it gets a bit more complex in the case that a domain exists but there isn’t a resource record which matches what is being requested.
Scenario: We create an A record for norecord.example.com so that it now resolves to 203.0.113.1.
The domain now exists, so the authoritative server can’t say “NXDOMAIN” when it gets a query, since that would be false. Suppose a AAAA (IPv6 address) query arrives for norecord.example.com, for instance because the default of your operating system is to try IPv6 first and then fall back to IPv4. The RFC-conforming response is then a NODATA response. That’s not as easy as it sounds.
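The three cases covered so far can be summed up as one decision on the authoritative side. What follows is a deliberately simplified sketch, not how any particular nameserver is implemented: the `ZONES` table, `find_zone`, and `respond` are hypothetical, and wildcards, CNAMEs, delegations, and empty non-terminals are all ignored.

```python
# Hypothetical hosted data: zone name -> owner name -> record types present.
ZONES = {
    "example.com": {
        "example.com": {"SOA", "NS"},
        "norecord.example.com": {"A"},
    },
}

def normalise(name):
    return name.rstrip(".").lower()

def find_zone(qname):
    """Longest-suffix match of the query name against the hosted zones."""
    labels = normalise(qname).split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in ZONES:
            return candidate
    return None

def respond(qname, qtype):
    zone = find_zone(qname)
    if zone is None:
        return "REFUSED (RCODE 5)"               # no zone for this name at all
    types = ZONES[zone].get(normalise(qname))
    if types is None:
        return "NXDOMAIN (RCODE 3)"              # zone exists, name does not
    if qtype not in types:
        return "NODATA (RCODE 0, empty answer)"  # name exists, type does not
    return "ANSWER (RCODE 0)"

print(respond("example.org", "A"))
print(respond("missing.example.com", "A"))
print(respond("norecord.example.com", "AAAA"))
print(respond("norecord.example.com", "A"))
```

The four calls walk through REFUSED, NXDOMAIN, NODATA, and a positive answer, in that order.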
“NODATA responses have to be algorithmically determined from the response’s contents as there is no RCODE value to indicate NODATA. In some cases to determine with certainty that NODATA is the correct response it can be necessary to send another query.”
After reading the section from the RFC above, imagine the potential for confusion and the issues your DNS resolver might run into, especially seeing that one part of the behavior is to send another query. We have seen issues with a number of resolvers which, from their volume and frequency of queries, appear to be fundamentally handling NODATA responses or their caching incorrectly. Some seem to get caught in loops, continuously requesting a record for which there is NO DATA.
One NODATA traffic anomaly we saw involved a popular mobile messaging / social application. We were receiving an abnormally large volume of AAAA queries from a well-connected national telco. The first thought that came to mind was that maybe they were running a recursive farm behind a NAT’ed IP and the recursive servers were not sharing a cache. This would result in a single IP asking for a record at a rate which didn’t correlate with the TTL. I wouldn’t say this is common practice, but it is one of the design patterns to keep in mind when hunting logs to identify recursives which don’t respect TTLs. After looking at a distribution of AAAA queries by domain, it seemed that this domain in particular was an outlier.
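To show what “algorithmically determined” means in practice, here is a rough sketch of how a resolver might tell a NODATA response from a referral when the RCODE is 0, following the response types in RFC 2308, section 2.2. The function name and its inputs (sets of record types per section) are assumptions made for illustration, and CNAME chains are not modelled.

```python
def classify_noerror(answer_types, authority_types):
    """Classify an RCODE 0 response from the record types in each section."""
    if answer_types:
        return "answer"      # something usable came back (CNAMEs not modelled)
    if "SOA" in authority_types:
        return "nodata"      # RFC 2308 NODATA types 1 and 2: SOA present
    if "NS" in authority_types:
        return "referral"    # NS without SOA: go and ask those servers
    return "nodata"          # type 3: empty authority section, no TTL to cache with

print(classify_noerror(set(), {"SOA", "NS"}))  # nodata
print(classify_noerror(set(), {"NS"}))         # referral
print(classify_noerror(set(), set()))          # nodata
```

The second and third cases are where things get fragile: the only thing separating a referral from a NODATA answer is whether an SOA record is present, and a NODATA response with an empty authority section gives the resolver no negative TTL at all.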
I’m not normally one for making predictions, but if I had to guess, the volume of NO DATA responses in the DNS is going to increase drastically in the near future. Why, you may ask? iOS 9 and OS X El Capitan are going to prefer IPv6 to IPv4. If the mailing list is correct, the devices’ default DNS resolver will issue both A and AAAA queries. Each resolution event will be a race: if a AAAA response is received first, the application will continue as intended; if an A response is received first, a 25 ms timer will start. If the 25 ms pass without a AAAA response, the application will continue using IPv4. In a sense, IPv4-only applications will now have a 25 ms penalty! It will be interesting to see how the stub resolver parses the response body: what if the first response received is a NO DATA response to the AAAA query?
Hopefully this post has helped clarify the different questions without answers and how they can impact your customers’ experience of your application. Whenever you’re setting up a new zone or modifying DNS resource records, review your configuration and think about how it impacts negative cache settings. When you are testing your application, make sure you consider NO DATA responses if you aren’t operating a dual stack, and how they might impact your end users’ experience.
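The race as described can be written down as a small function. This is only a sketch of the behavior reported on the mailing list, not Apple’s implementation: `pick_family` is a made-up name, arrival times are in milliseconds, and it assumes any AAAA response that arrives carries a usable address. The NO DATA case is exactly the open question and is not modelled.

```python
def pick_family(a_ms, aaaa_ms, grace_ms=25.0):
    """Return (family, time_ms) at which the application proceeds.

    a_ms / aaaa_ms: arrival time of each response, or None if it never arrives.
    A tie is treated as the AAAA response arriving first.
    """
    if aaaa_ms is not None and (a_ms is None or aaaa_ms <= a_ms):
        return ("IPv6", aaaa_ms)   # AAAA won the race outright
    if a_ms is None:
        return (None, None)        # nothing came back at all
    deadline = a_ms + grace_ms     # A arrived first: start the timer
    if aaaa_ms is not None and aaaa_ms <= deadline:
        return ("IPv6", aaaa_ms)   # AAAA made it inside the grace period
    return ("IPv4", deadline)      # timer expired: fall back to IPv4

print(pick_family(10, 8))     # ('IPv6', 8)
print(pick_family(10, 30))    # ('IPv6', 30)
print(pick_family(10, None))  # ('IPv4', 35.0)
```

In the last call the A response arrived after 10 ms, but the application does not proceed until 35 ms: that is the penalty an IPv4-only name pays in this model.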