
Would the Public Cloud Have Saved Delta from Its No Good, Horrible, Very Bad Week?

Last week’s data center outage at Delta dominated the IT news cycle and provided a “ripped from the headlines” example of A) IT resiliency gone awry and B) the link between IT resilience, business continuity, and business impact. (While we don’t yet know the full fiscal impact of this outage on Delta’s business, a similar outage caused by a server failure at Southwest Airlines has reportedly cost that airline $54 million.)

While we don’t have a full view into what went wrong, Tony Baer’s tweet summary of an excellent ZDNet piece gives us a good picture:

[Image: Tony Baer’s tweet summarizing the ZDNet piece]

The outage also gave rise to a host of punditry on how the cloud might have saved Delta from its no good, horrible, very bad week. Writing in Forbes, for example, Kalev Leetaru extols the benefits of public clouds from Google and Amazon for providing redundancy. He writes:

“In this day and age it is extremely surprising that not a week goes by without news of another major corporate data center outage causing a critical disruption in operations. There is simply no technical excuse not to have geographic redundancy with automated failover. Most major cloud providers like Google and Amazon offer high levels of out-of-the-box redundancy. For example, Google’s standard disk storage writes all data 3.3 times automatically and disk snapshots are globally available, while virtual machines automatically migrate on hardware failure and new machines can be spun up from snapshots into any Google data center worldwide. Numerous offerings by both Google and Amazon and their peers allow applications to automatically span data centers, automatically transferring traffic in case of an outage or other issue.”

While Leetaru makes some very relevant points, his summary misses a much larger and more important one: redundancy within a single cloud is not enough to prevent outages. That point was underscored by Google Compute Engine’s outage late last week, which showed that even the most robust systems can go down – and take you down with them.

Simply moving assets to cloud infrastructure won’t ensure availability. First, as noted above, relying on a single cloud service provider (or a single CDN for content-centric applications) simply shifts risk from one single-entity point of failure to another. If the cloud or CDN goes out, so does your application. Second, cloud applications introduce new challenges for visibility and control of internet infrastructure (data centers and cloud included), because you no longer directly control how users connect to your applications (or, more accurately, you control those connections less than you would from a corporate data center).

The simple fact is that many organizations still rely on legacy applications running in corporate data centers. This is almost certainly the case with some of Delta’s core business processes, like scheduling and airport operations. Many organizations are migrating these applications to cloud or hybrid cloud environments, and will be bridging from their on-premises solutions to the cloud for years to come.

The cloud offers many benefits, including strategies to de-risk applications from outages. But simply moving to the cloud without strategies to monitor and manage traffic across multiple clouds, CDNs and data centers won’t solve the problem.

More and more, we’re seeing customers leverage their DNS infrastructure (including advanced DNS management that can detect issues and bottlenecks and proactively steer traffic around them) to support mission-critical hybrid- and multi-cloud environments.
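To make the idea concrete, here is a minimal sketch of the health-check-and-steer loop that underlies this kind of DNS management. The endpoint URLs and origin names are hypothetical, and a managed DNS platform does this at global scale with far more sophistication; the sketch simply probes each origin and answers only with those that pass their health checks.

```python
# Minimal sketch of health-check-driven DNS steering across providers.
# The origin names and health-check URLs below are hypothetical examples.
from urllib.request import urlopen
from urllib.error import URLError

# One application, with origins spread across two clouds and a CDN.
ORIGINS = {
    "aws-us-east": "https://origin-aws.example.com/healthz",
    "gcp-us-central": "https://origin-gcp.example.com/healthz",
    "cdn-edge": "https://origin-cdn.example.com/healthz",
}

def healthy_origins(origins, timeout=2.0):
    """Return the names of origins whose health endpoint answers HTTP 200."""
    healthy = []
    for name, url in origins.items():
        try:
            with urlopen(url, timeout=timeout) as resp:
                if resp.status == 200:
                    healthy.append(name)
        except (URLError, OSError):
            pass  # unreachable or failing origin: steer traffic away from it
    return healthy

def answer_query(origins, healthy):
    """Choose which origins to return in a DNS answer.

    Fails open (returns every origin) if all probes fail, so a broken
    monitoring path doesn't itself become an outage.
    """
    return healthy if healthy else list(origins)
```

The fail-open choice in `answer_query` is one judgment call among several; the broader point from the surrounding discussion is that this steering logic must span multiple clouds and CDNs, not live inside any single provider.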

Enterprises wishing to mitigate risk and get the most out of their internet performance must adopt a multi-cloud and/or multi-CDN approach within a hybrid environment, combined with appropriate monitoring and control – or face the very real risk of outages and degraded service, with the significant fiscal and customer-goodwill costs they bring.


Whois: Phil Francisco

Phil Francisco is VP of Product Management and Product Marketing at the Oracle Dyn Global Business Unit, a pioneer in managed DNS and a leader in cloud-based infrastructure that connects users with digital content and experiences across a global internet.
