DNS performance (measured in terms of latency) is one of the primary drivers prompting companies to seek an outsourced DNS provider. Latency is certainly not the only motivator, but it is a compelling and generally well-understood one; how to measure that latency, however, is not well understood.
Therefore I wanted to take a few minutes to cover some of the considerations with regard to measuring Authoritative DNS performance. Hopefully this will enable readers to reach a better outcome with regard to their monitoring objectives overall.
Firstly, it is important to establish whether or not your current DNS monitoring draws an accurate picture of nameserver performance. The primary reason to question this relates to the differing methods that might be employed in ‘DNS’ monitoring. The term DNS monitoring is ambiguous and does not imply a standard or agreed set of principles or methods.
Some monitoring methods may factor in other elements that could be considered irrelevant, particularly when seeking accurate data for establishing the performance of an Authoritative DNS provider such as Dyn.
The following video depicts the process of recursion quite nicely, particularly if you are unfamiliar with all the steps in the query chain:
The above representation provides some insight into the fact that there are several pieces to this equation. When evaluating the performance that Dyn provides, it serves little purpose for us to consider the performance of the Root servers, or to evaluate the latency of the recursive layer. However, some DNS monitoring methods may incorporate these values into the overall picture.
Whilst insight into these is arguably important, especially when establishing a picture of end-user experience, they do not provide us with any assurance of Authoritative DNS performance. More importantly, depending on the DNS monitoring method employed, they could paint, quite incorrectly, a less than positive picture.
Measure The Right Thing:
When measuring an Authoritative nameserver’s performance, we need to ensure that our method is structured around, or at least clearly delineates, direct polling measurements. A basic example of direct polling is detailed below using ‘dig’ in a terminal (please see this page for details regarding ‘dig’ on your platform).
A quick look at dig syntax:
dig rrtype hostname @servername
So firstly, we call dig. We then specify the type of record we want to query and the associated hostname we’re interested in. Lastly, we can specify the DNS server we wish to query – if this were omitted, we would query the same DNS server we would use if we were opening a web page in our web browser (or any other application that requires DNS resolution).
David:~ dig a dyn.com @ns1.p01.dynect.net
; <<>> DiG 9.8.3-P1 <<>> a dyn.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;dyn.com. IN A
;; ANSWER SECTION:
dyn.com. 60 IN A 18.104.22.168
;; Query time: 7 msec
;; SERVER: 10.10.0.8#53(10.10.0.8)
;; WHEN: Tue Sep 10 22:47:04 2013
;; MSG SIZE rcvd: 43
In this very basic example, the response latency, shown as ‘Query time: 7 msec’, provides an accurate measure of Dyn’s performance or, more specifically, of ns1.p01.dynect.net’s performance from the location I am querying from, just south of London.
Direct polling might be considered a more accurate measure for Authoritative nameservers because we are forcing a direct query against one of the Authoritative nameservers for the domain in question (the @servername part of the syntax). Performing a query in this way circumvents other parts of the query chain, i.e. recursion.
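For anyone wanting to script this kind of direct poll, the following is a minimal Python sketch rather than a definitive tool. It assumes dig is installed and on the PATH, and the function names (build_dig_command, parse_dig_output, poll_direct) are hypothetical. It builds the same `dig rrtype hostname @servername` command described above and reads the ‘Query time’ and ‘SERVER’ lines from the text output. The SERVER value is returned as well, because it shows which server actually answered: if it is a local resolver and not the Authoritative nameserver, the measurement includes recursion.

```python
import re
import subprocess

# Patterns for the two dig output lines discussed in the text.
QUERY_TIME_RE = re.compile(r"^;; Query time: (\d+) msec", re.MULTILINE)
SERVER_RE = re.compile(r"^;; SERVER: ([^#\s]+)#(\d+)", re.MULTILINE)


def build_dig_command(rrtype, hostname, servername):
    """Return the argument list for `dig rrtype hostname @servername`."""
    return ["dig", rrtype, hostname, "@" + servername]


def parse_dig_output(text):
    """Extract (query_time_ms, answering_server_address) from dig output.

    Either value is None if the corresponding line is missing.
    """
    time_match = QUERY_TIME_RE.search(text)
    server_match = SERVER_RE.search(text)
    query_time_ms = int(time_match.group(1)) if time_match else None
    server = server_match.group(1) if server_match else None
    return query_time_ms, server


def poll_direct(rrtype, hostname, servername, timeout_s=10):
    """Run one direct query with dig and return the parsed result."""
    completed = subprocess.run(
        build_dig_command(rrtype, hostname, servername),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=False,
    )
    return parse_dig_output(completed.stdout)
```

Calling poll_direct("a", "dyn.com", "ns1.p01.dynect.net") at regular intervals, and comparing the returned server address with the nameserver's known address, gives a series of latency samples that can be attributed to that nameserver alone.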
Moving Beyond The Basics
Hopefully at this stage we understand some of the inherent pitfalls when utilizing DNS monitoring to assess an Authoritative DNS provider’s performance. We’re now cognizant of the guidelines and have some data. Now what?
Essentially, when monitoring DNS performance we are trying to gauge the end-user experience. We add this to an overall picture, allowing for optimizations and improvements to our service or platform.
Within that picture we may be looking to improve Time-To-First-Byte, service continuity, availability and delivery speed (of information). Although DNS is only one portion of this equation, it is arguably the most important, as it is the system upon which many things rely. If DNS performance degrades then so does everything else within your infrastructure that relies upon it (there are of course some caveats here, which I’ll cover in another post).
How Can We Gain Greater Insight?
A good place to start is to structure your analysis into different buckets, ranking their importance relative to your objectives. Service availability will likely be top of the list; however, if performance degrades and the deviation curve is sharp, then availability may become irrelevant, as you have failed to meet user-experience expectations or you meet them inconsistently. An inconsistent service may be worse than no service at all in some cases.
Let’s look at some common measurement buckets:
Availability: How many queries are dropped, and where? Is the loss isolated to a region? What regional patterns can we establish?
Performance: As with availability, it is typically important to take account of regional factors. For example, performance degradation may be higher in areas where carrier availability is limited or inconsistent, or where physical infrastructure is sub-standard.
When looking at global ‘averages’, these considerations need to be accounted for; measuring performance regionally provides more accurate data. You may choose several different monitoring providers to help you establish an accurate regional picture.
Consistency: This last bucket takes account of the variance in performance (latency) over a given period of time. If you experience sub-X response times several times a day, but above-XX response times the rest of the time, then this constitutes an area for further investigation and possible improvement.
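As a rough illustration of how these measurements (dropped queries, regional latency and variance over time) might be computed from raw samples, here is a minimal Python sketch. The sample layout, the function name summarize_by_region and the fast_threshold_ms parameter (standing in for ‘X’ above) are all assumptions made for the example, not part of any particular monitoring product.

```python
from collections import defaultdict
from statistics import median, pstdev


def summarize_by_region(samples, fast_threshold_ms):
    """Summarize (region, latency_ms) samples per region.

    latency_ms is None when a query was dropped (no answer received).
    """
    grouped = defaultdict(list)
    for region, latency_ms in samples:
        grouped[region].append(latency_ms)

    summary = {}
    for region, latencies in grouped.items():
        answered = [ms for ms in latencies if ms is not None]
        sent = len(latencies)
        summary[region] = {
            # Availability: share of queries that received no answer.
            "sent": sent,
            "drop_rate": (sent - len(answered)) / sent,
            # Performance: the median is less skewed by outliers than a mean.
            "median_ms": median(answered) if answered else None,
            # Consistency: spread of latency, and share of fast answers.
            "stdev_ms": pstdev(answered) if answered else None,
            "share_fast": (
                sum(ms < fast_threshold_ms for ms in answered) / len(answered)
                if answered
                else None
            ),
        }
    return summary


# Hypothetical samples, purely for illustration: (region, latency in ms).
example_samples = [
    ("eu", 10), ("eu", 20), ("eu", None), ("eu", 30),
    ("apac", 100), ("apac", 100),
]
example_summary = summarize_by_region(example_samples, fast_threshold_ms=25)
```

Keeping the figures per region rather than folding them into one global number makes it easier to see whether a problem is isolated to one part of the world or affects every location.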
The above merely scratches the surface and is intended as a primer. The hope is that after absorbing it you will have established some basic principles and developed some interest in further investigation of the subject.
The depths to which some companies go to gain accurate insight into Authoritative DNS performance are significantly greater and more complex than anything outlined in this post. For the majority, though, ensuring that the correct things are measured and accurately reported is enough; this allows you to make more informed choices and focus your improvement scope.