This morning, our DynECT Platform monitoring system noticed a problem, a big one. From our global perspective, it appears that many online services, including amazon.com (the store, AWS, and S3), salesforce.com, advertising.com, and petco.com, had some serious DNS troubles. As many of our readers know, DNS is the glue that binds domain names, like dynect.com, to their respective servers' IP addresses (18.104.22.168). Without DNS, nothing works: no web, e-mail, VoIP, IM, file sharing, etc.
The DynECT Team quickly began to analyze the situation by checking the resolution chains for these popular web sites. Our analysis revealed that multiple UltraDNS PDNS (a special class of UltraDNS server) nodes were failing to respond to all DNS queries. Amazon.com clearly knew about the problem, as our monitoring then detected a change in the delegation for amazon.com from UltraDNS' PDNS nodes to their UDNS nodes at approximately 8:50 am Eastern.
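For readers who want to try this kind of check themselves, here is a minimal sketch of probing a single nameserver and timing its answer. It is not the DynECT monitoring code; the function names `build_query` and `probe`, the 2-second timeout, and the choice of a plain UDP query for an A record are all assumptions made for illustration. It uses only the Python standard library.

```python
import random
import socket
import struct
import time


def build_query(name, qtype=1, query_id=None):
    """Build a minimal DNS query packet (qtype 1 = A, 2 = NS)."""
    if query_id is None:
        query_id = random.randint(0, 0xFFFF)
    # Header: ID, flags (0 = standard query, recursion not requested because
    # we are asking the authoritative server directly), 1 question,
    # 0 answer/authority/additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    # Question name: each label prefixed by its length, ending with a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE followed by QCLASS 1 (IN).
    return header + qname + struct.pack("!HH", qtype, 1)


def probe(server_ip, name, timeout=2.0):
    """Return the response time in seconds, or None on timeout or socket error."""
    packet = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        try:
            sock.sendto(packet, (server_ip, 53))
            while True:
                data, _ = sock.recvfrom(512)
                # Accept only a reply whose ID matches our query.
                if data[:2] == packet[:2]:
                    return time.monotonic() - start
        except OSError:
            # Covers socket timeouts as well as unreachable hosts.
            return None
```

Calling `probe` once for each nameserver IP in a domain's delegation (for example `probe("192.0.2.53", "amazon.com")`, where the address is a documentation placeholder, not a real UltraDNS node) gives a rough picture of which nodes answer, which answer slowly, and which return `None` because they never answered.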
We continued our analysis by checking our global performance trending system – some graphs showing the problems are below.
You can see more graphs from our monitoring on our Flickr photostream.
We speculate that the source of the problem may have been a large-scale denial-of-service attack against UltraDNS, or an internal operations problem. When we were able to successfully query UltraDNS servers, responses were slow to come back, or largely timed out. The problem began to clear itself up around 10:00 am Eastern, when we saw DNS responses returning quickly again, and our favorite sites coming back online.