Earlier this week, we hosted our latest webinar: a look at a specific application of an advanced feature of our DynECT Managed DNS platform called Active/Active Failover. Our CTO Cory von Wallenstein and iovation’s Principal Infrastructure Architect Eric Rosenberry rocked the hour-long session like Vedder and Springsteen to the sweet melody of DNS excellence.
As the moderator of these webinars, I filter through the questions and try to fit in as many as possible within a short timeframe. Luckily, we always have more questions than time so I grabbed Eric, Cory and our reigning King of Concierge Chris Gonyea for 30 minutes to help clean up the spillover.
As always, these are real questions from real people. Be sure to sign up for our email list and we’ll make sure you get alerted about our next monthly webinar.
Let’s get to it!
How strong is Dyn’s infrastructure in Europe? What DNS response times could we expect with a DNS geolocation service? Would you be able to differentiate between certain Scandinavian cities?
Out of our 17 data centers that are currently part of our DynECT anycast network, four are located in Europe: London, Amsterdam, Frankfurt and Warsaw.
With our anycast network, DNS queries are automatically routed to the fastest data center on our network. This ensures sub-30ms lookup times in many parts of Europe, even when you are using advanced features such as Traffic Management and Geo Traffic Management.
This also ensures that if one or more data centers in Europe were to become unavailable, traffic is re-routed to the next fastest data center on the network with a negligible increase in latency.
Currently our Traffic Management service can differentiate between three regions in Europe (EU West, EU Central and EU East). Our Geo Traffic Management service, which uses geolocation data to determine where the recursive DNS server is located, can get down to the country level in Europe.
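To picture what country-level steering means in practice, here is a minimal sketch of a lookup table mapping a resolver’s country to a regional endpoint. The region names, country assignments and hostnames are all illustrative placeholders, not Dyn’s actual configuration or API:

```python
# Illustrative sketch only: the real steering happens on Dyn's anycast edge.
# Region groupings and endpoint hostnames here are made up for the example.
EU_REGIONS = {
    "GB": "eu-west", "IE": "eu-west", "FR": "eu-west",
    "DE": "eu-central", "NL": "eu-central", "SE": "eu-central",
    "PL": "eu-east", "CZ": "eu-east",
}

ENDPOINTS = {
    "eu-west": "lb-london.example.com",
    "eu-central": "lb-frankfurt.example.com",
    "eu-east": "lb-warsaw.example.com",
}

def answer_for(resolver_country: str) -> str:
    """Return the endpoint a query from this country would be steered to."""
    region = EU_REGIONS.get(resolver_country, "eu-west")  # fallback region
    return ENDPOINTS[region]
```

Note that the granularity is the country of the recursive resolver, so Sweden and Denmark can get different answers, but Stockholm and Gothenburg cannot.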
If a customer is using Dyn as Secondary DNS, would it still be necessary for them to utilize Active Failover?
These are actually two separate concepts that are easy to confuse with each other.
DNS has a concept of primary and secondary DNS where one server (usually called the “master”) hosts the master copy of your DNS zone while other DNS servers run in Secondary DNS (often called “slave”) mode. The idea is that whenever the primary copy of your zone is updated, a zone transfer occurs to all authorized Secondary DNS servers to keep their copies of the zone in sync with the latest changes.
All DNS servers are then listed in your domain’s nameserver delegation side by side (what you see when doing a WHOIS on your domain name) and DNS queries directed at the domain will be divided up among all nameservers, whether they are primary or secondary. If one or more DNS servers are unavailable, DNS queries are redirected to another of the remaining DNS servers in the delegation. This is a very different scenario compared to what Active Failover addresses.
As for the Active Failover question, DNS servers in Secondary mode essentially have a “read-only” copy of the zone as it must remain in sync with the primary or “master” server. As a result, DynECT Managed DNS does not support use of advanced services such as Active Failover when in Secondary DNS mode since we cannot modify the contents of the zone if a failover event were to occur.
However, you can have DynECT be the primary DNS provider with Active Failover and have other DNS servers act as Secondary DNS. If we were to modify a DNS record due to a failover event, the Secondary DNS server will receive an updated zone file with the committed changes.
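The mechanics above can be sketched as a toy model: an Active Failover event on the primary rewrites a record and bumps the SOA serial, and a secondary pulls a fresh copy whenever it sees a newer serial. This is a simplified stand-in for real NOTIFY/AXFR behavior, not Dyn’s implementation:

```python
class PrimaryZone:
    """Toy primary: holds records and an SOA-style serial number."""
    def __init__(self, records):
        self.records = dict(records)
        self.serial = 1

    def failover(self, name, backup_ip):
        # Active Failover event: rewrite the record and bump the serial
        # so secondaries know a zone transfer is needed.
        self.records[name] = backup_ip
        self.serial += 1

class SecondaryZone:
    """Toy secondary: a read-only copy kept in sync with the primary."""
    def __init__(self, primary):
        self.primary = primary
        self.records = {}
        self.serial = 0
        self.sync()

    def sync(self):
        # Analogous to checking the SOA serial and doing an AXFR if newer.
        if self.primary.serial > self.serial:
            self.records = dict(self.primary.records)
            self.serial = self.primary.serial

primary = PrimaryZone({"www.example.com": "192.0.2.10"})
secondary = SecondaryZone(primary)
primary.failover("www.example.com", "198.51.100.20")  # endpoint went down
secondary.sync()  # secondary now serves the failover address
```

The key point the toy model captures: the failover decision lives only on the primary; the secondary just replicates whatever the primary commits.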
Can you discuss the various TTLs (infrastructure node TTLs, endpoint TTLs, etc.)? What are the rules of thumb?
We have a great blog post that describes what TTLs are and some suggestions for different record types.
Essentially, the choice of TTL comes down to the balance you want between fast propagation times and keeping your DNS Queries Per Second (QPS) down. Need fast propagation times in an emergency, or are you using a service such as DynECT Traffic Management? You will need TTLs below 5 minutes and possibly even as low as 30 seconds (which we support).
Do you have a DNS record that rarely, if ever, changes AND you are willing to wait for a change to propagate? Then you can set a TTL of 1 hour or even as high as a day. This basically comes down to what you are most comfortable with.
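A rough back-of-the-envelope for the QPS side of the trade-off, assuming every caching resolver re-queries exactly once per TTL (a simplification that ignores cache evictions, non-caching clients and uneven traffic):

```python
def authoritative_qps(caching_resolvers: int, ttl_seconds: int) -> float:
    """Steady-state queries/second hitting the authoritative servers,
    assuming each caching resolver re-queries exactly once per TTL."""
    return caching_resolvers / ttl_seconds

# Suppose 100,000 resolvers worldwide cache your record:
fast = authoritative_qps(100_000, 30)    # 30 s TTL -> roughly 3,300 QPS
slow = authoritative_qps(100_000, 3600)  # 1 h TTL  -> under 30 QPS
```

Dropping the TTL from an hour to 30 seconds multiplies authoritative query volume by 120x in this model, which is exactly why long TTLs are attractive for records that never change.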
Do you provide service in South America?
Wouldn’t there be a lot of traffic replicating session state?
It depends on how much data you store in your session state and how much traffic you have hitting the site. The vast majority of our (iovation) traffic does not involve any session state, so it is minor in the context of what we do. Also, 10 gigabit per second pipes are capable of a LOT of traffic. 😉
What’s iovation’s load balancing strategy in their data centers? How is Dyn configured to mark a DNS endpoint down?
Our load balancing strategy is two-tiered. We use Dyn to direct traffic to iovation’s closest “node” location (for services we GSLB) or we use Dyn to 50/50 split traffic to our two main data processing facilities for our main API services. Then at each data center/node, we have internal HA pairs of load balancers that further split the traffic amongst servers.
This allows us to disable a server in one data center at the local load balancer without taking the entire site down (and it is instantaneous, as we do not have to wait for DNS propagation or broken resolvers). Taking one node out at a time in each data center is how we do hot code upgrades.
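The two tiers described above can be sketched like this: a deterministic ~50/50 DNS-level split across two data centers, then a local load balancer doing round-robin over healthy members, with the ability to drain one server for an upgrade. Everything here (hostnames, hashing choice, pool layout) is illustrative, not iovation’s actual configuration:

```python
import hashlib

DATA_CENTERS = {
    "dc-east": ["app1.east", "app2.east", "app3.east"],
    "dc-west": ["app1.west", "app2.west", "app3.west"],
}
drained = set()  # servers pulled at the local load balancer (e.g. for upgrades)

def pick_data_center(client_ip: str) -> str:
    """Tier 1 (DNS): deterministic ~50/50 split across the two DCs."""
    h = int(hashlib.sha256(client_ip.encode()).hexdigest(), 16)
    return "dc-east" if h % 2 == 0 else "dc-west"

def pick_server(dc: str, request_id: int) -> str:
    """Tier 2 (local HA load balancer): round-robin over healthy members."""
    healthy = [s for s in DATA_CENTERS[dc] if s not in drained]
    return healthy[request_id % len(healthy)]

drained.add("app2.east")  # hot upgrade: drain one server, site stays up
```

The drain is instantaneous because it only touches the local pool membership; no DNS record changes and no TTL expiry is involved.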
From the health checking standpoint, the local load balancers check each of the member nodes to determine which is “healthy” and it uses that data to determine whether to put it in the pool or not. Then, we also have Dyn pointed at the external URL of the “virtual server” pool at each of the global nodes and Dyn will health check each node to ensure network connectivity is not lost, or that the entire node has not failed.
In general, it is extremely rare for a global node to get pulled out of use by Dyn as all of our nodes are highly available within themselves. Generally node failures are caused by service provider instability (i.e. black hole routing).
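A common pattern behind this kind of health checking is to require several consecutive failed probes before declaring an endpoint down, so a single dropped check does not trigger a failover. The threshold below is illustrative, not Dyn’s actual monitoring setting:

```python
class HealthMonitor:
    """Mark an endpoint down only after several consecutive failed probes.
    The threshold here is an illustrative default, not a Dyn setting."""
    def __init__(self, fail_threshold: int = 3):
        self.fail_threshold = fail_threshold
        self.consecutive_failures = 0
        self.up = True

    def record_probe(self, succeeded: bool):
        if succeeded:
            self.consecutive_failures = 0  # any success resets the count
            self.up = True
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.fail_threshold:
                self.up = False  # pull the node out of the DNS answer pool

m = HealthMonitor()
for ok in (True, False, True, False, False, False):
    m.record_probe(ok)
# only the final run of three consecutive failures marks the node down
```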
Could this multiple-active-nodes setup work with different hosting/cloud environments? For instance, if I host the same application in Azure and AWS, would it redirect to either host for availability?
Yes, this is absolutely a viable strategy. As has been seen time after time in Amazon AWS, when bad things happen, there are cascading effects across the environment. If your application is capable of running in different clouds (and you validate that any data replication you need will work between them over the open Internet), then that would provide you a high degree of fault tolerance.
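One way to picture the DNS side of such a multi-cloud setup: keep one endpoint per cloud, and hand out only the ones currently passing health checks. The hostnames below are placeholders, and the fail-open fallback is one possible design choice rather than a recommendation:

```python
CLOUD_ENDPOINTS = {
    "aws":   "app-aws.example.com",
    "azure": "app-azure.example.com",
}

def dns_answers(health: dict) -> list:
    """Return the endpoints DNS should hand out, given per-cloud health.
    With both clouds up, both answers are served (active/active); if one
    cloud fails its checks, all traffic shifts to the other."""
    up = [host for cloud, host in sorted(CLOUD_ENDPOINTS.items())
          if health.get(cloud)]
    # If everything looks down, fail open rather than return nothing.
    return up or sorted(CLOUD_ENDPOINTS.values())

dns_answers({"aws": True, "azure": True})   # both clouds serve traffic
dns_answers({"aws": False, "azure": True})  # AWS outage: all traffic to Azure
```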
Want to see the webinar? Here you go!