For customers using the Dyn Standard DNS platform who were impacted by a DDoS attack on our service today, the following is an account of what happened and the steps we’re taking to improve. No outages were observed on the DynECT Managed DNS platform (served using an Anycast network) during the course of the event.
11:52 UTC: The Dyn Operations team began to see traffic increase to various data centers across the network. Over the next 15 minutes, the traffic increased to the point that it was clear there was a Distributed Denial of Service (DDoS) attack against all five Dyn Standard DNS name servers, and the team immediately began investigating. The attack brought in a tremendous amount of traffic and overwhelmed the name servers. It quickly became apparent that this was a very large attack coming in from nearly all of our network providers, and that a great deal of effort and care would be needed to get services back up and running quickly without making mistakes that compounded the issue.
12:40 UTC: The exact nature of the attack was identified and mitigation techniques were finalized. Operations began deploying DDoS countermeasures at the impacted sites. Over the next 20 minutes, as rules were updated, the servers began to drop attack traffic, to the point where they appeared to be fully responsive to requests again. Meanwhile, our DynStatus site became overwhelmed with people looking for updates. We took to Twitter to post updates while we shifted providers to bring up extra resources to handle the increased load.
13:30 UTC: We continued to monitor the traffic and evaluate how well our mitigation techniques worked. Because of their physical locations, ns2 and ns5 both continued to take on too much traffic, so we decided that they would remain offline while the attack continued.
20:00 UTC: ns5 was brought back online after additional mitigation techniques were put in place. ns2 remained offline due to specifics of its site.
We continue to see increased traffic at this time and will continue to monitor the attack. We deeply regret the amount of time it took to recover from this incident and have already begun postmortem discussions about what we can do better next time. We will be evaluating our server infrastructure, physical locations, DDoS mitigation techniques and personnel training, along with methods for recognizing particular attacks.
We appreciate our customers’ continued trust in us, and we feel it is important to keep everyone updated on exactly what happens at Dyn and how we try to improve every day as a company. We had a similar outage back in 2011 and we’re sad to see it happen again. As the newly hired Director of Operations, it’s my job to keep these services up and operational for you, our customers. We’ve more than doubled the size of the team in recent months, worked to improve documentation and training, and tried to do our best at prioritizing what needs attention and when.
Today’s outage highlights an area that we as a team need to focus on sooner rather than later. As we progress through 2012 and plan for 2013, I’m focused on making sure we do everything within our power to address this and minimize the risk of it happening again.