Our continued analysis of patterns in the DNS has led us to make some changes in how we track requests and requestors. These anomalies show just how complicated the DNS can be, and they explain why we must understand this behavior to drive improvement and optimization of our platform, as well as our customers’. The following post is an executive summary of what we discovered; it will be followed by specific blog posts examining each individual cause. We hope this clarifies the reasoning behind any customer impact and also educates the general public on the complexities of the DNS.
To succeed in our mission to monitor, control, and optimize online infrastructure, we need to maintain an understanding of how Internet protocols should perform and, sometimes conversely, how they are actually implemented. Bridging that gap requires reconciling observed behavior with RFC-conforming behavior.
The feedback loop starts with monitoring global telemetry to build a base understanding of four questions: “Who is asking?” “What are they asking for?” “Why are they asking for it?” and “What paths did the request traverse?” With the answers to these questions we establish and maintain an understanding of the possible levers of control, which in turn facilitate optimization.
Maintaining this model requires understanding the trends and patterns that appear in the DNS, which includes forming hypotheses about the root cause of anomalies. Whenever questions come up about anomalous DNS traffic, there are some usual suspects: broken resolvers, security appliances, malware, or denial of service attacks, depending on what makes the traffic seem anomalous.
The DNS has a long history. It’s more than 20 years old and is defined in well over 100 RFCs. During this time the number of pieces of software that implement their own DNS resolution has grown, as has the complexity of processing responses. The result is a handful of RFC-conforming resolvers and a far larger number of non-conforming ones.
What effect does this have on your application or website’s performance?
This depends on a number of different details and defaults: “What type of device is interacting with your infrastructure?” “How does the networking stack on these devices work?” and so on. One example of how this can affect both your application and your DNS usage comes from thinking about IPv6 implementations.
Do you have AAAA (IPv6) resource records configured for frequently requested domains that you operate? If you don’t have AAAA records configured and requests for them are being made anyway, your DNS usage (the number of queries per second) will increase, depending on what is asking. More impactful than the usage is the potential effect on your end users’ experience: how long are they waiting for the device or application to fall back from IPv6 to IPv4? Can it process the response an RFC-conforming nameserver sends to it?
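To make the cost concrete, here is a minimal sketch in Python of an IPv6-preferring client looking up a name that has an A record but no AAAA record. The zone contents and function names are illustrative assumptions, not a real resolver API: the point is that the missing AAAA record still costs the authoritative server a query, and the client only connects after falling back.

```python
# Hypothetical stub-resolver sketch (illustrative names, not a real API).
# A dual-stack client asks for AAAA first; when the zone has no AAAA
# record, that query still counts against the authoritative server's
# query volume, and a misbehaving client may stall before falling back.

ZONE = {
    ("www.example.com", "A"): ["192.0.2.10"],
    # Note: no AAAA record configured for www.example.com
}

def lookup(qname, qtype, query_log):
    """Simulate one query against the zone; log it and return answers.
    An empty list models a NODATA response (name exists, type doesn't)."""
    query_log.append((qname, qtype))
    return ZONE.get((qname, qtype), [])

def dual_stack_connect(qname, query_log):
    """IPv6-preferring client: try AAAA first, fall back to A on NODATA."""
    answers = lookup(qname, "AAAA", query_log)
    family = "IPv6"
    if not answers:  # the fallback itself costs a second query
        answers = lookup(qname, "A", query_log)
        family = "IPv4"
    return family, answers

log = []
family, addrs = dual_stack_connect("www.example.com", log)
print(family, addrs)   # the client ends up on IPv4
print(len(log))        # two queries were sent for one usable record
```

Configuring the AAAA record (or knowing how clients behave when it is absent) halves the query count in this toy model; real clients add timeouts and retries on top of it.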
The history of the DNS and the 100-plus RFCs that define conforming behavior make holistic compliance a serious challenge. The problem space spans the deployment of the DNS on devices and objects (stub resolvers), the distributed caching layer (recursive resolvers), and the authoritative layer (the services that Dyn provides).
In a series of posts, which we will be adding directly to this blog, we will provide a deep dive into RFC-conforming DNS behavior and the impacts of non-conforming behavior. Those will include:
- Non-Conforming Resolvers / varying implementations of the DNS resolution stack
- Confusions during innovation -or- the attractive menace of ANY
- Nuances in caching failure responses: NXDOMAIN / NODATA
- 0x20 Bit DNS Names (Mixed Case for security)
- Security Appliances, Recursive Farms, and Domain Monitors
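As a preview of one of those topics, 0x20 mixed-case encoding can be sketched in a few lines of Python (illustrative function names; real resolvers apply this at the DNS wire-format level). The resolver randomizes the letter case of the query name; because an RFC-conforming server copies the question section back verbatim, any response that fails a case-sensitive comparison can be discarded as potentially forged.

```python
import random

def encode_0x20(qname, rng):
    """Randomize the ASCII letter case of a query name ('DNS 0x20').
    DNS name matching is case-insensitive, so servers answer normally,
    but a conforming server echoes the question back exactly as sent."""
    return "".join(
        ch.upper() if ch.isalpha() and rng.random() < 0.5 else ch.lower()
        for ch in qname
    )

def response_matches(sent_qname, echoed_qname):
    """Case-sensitive check of the echoed question section: a spoofed
    response that guesses the name but not the casing fails here."""
    return sent_qname == echoed_qname

rng = random.Random(42)  # seeded only to make the sketch repeatable
sent = encode_0x20("www.example.com", rng)
print(sent)                                     # mixed-case variant
print(response_matches(sent, sent))             # legitimate echo passes
print(response_matches(sent, sent.swapcase()))  # casing mismatch fails
```

The extra entropy per letter raises the cost of blind spoofing, but, as the upcoming post will cover, it only works when every nameserver in the path echoes the question verbatim; non-conforming servers that lowercase the name break this scheme.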