Amazon’s Cloud Cascade

April 29th, 2011 by Rob Powell

Amazon today finally offered up a detailed failure report itemizing what went wrong last week when its Elastic Compute Cloud went down and took a chunk of the internet with it. It’s quite a read. That is, it’s quite a read if you can get through all the definitions and jargon. Data Center Knowledge does a pretty good job of simplifying it, but let’s make it simpler still.

Amazon’s EC2 is a complicated beast made up of many automated processes that all interact with each other in predictable, well-engineered ways. What brought it all crashing down was process complexity. An unanticipated mistake during a capacity upgrade sent traffic down the wrong network path. That caused problems for other components of the cloud, which responded automatically as they were designed to. Simultaneously. That overloaded more stuff, which caused more automatic responses, which overloaded more stuff. Simultaneously. Which…

Well, you get the idea now: it was a cascading failure. If this were a single server, the obvious fix would be the ubiquitous reboot: just wipe the slate clean and start fresh. But for EC2, and probably for any large distributed cloud, it isn’t nearly so easy, and it took days to get back to normal.
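To make the "overloaded more stuff, simultaneously" loop a little more concrete, here is a toy simulation of a cascading failure in Python. It is emphatically not a model of Amazon’s actual internals: the function name `simulate_cascade`, the rule that a failed node’s load is spread evenly across all survivors, and the load and capacity numbers are all hypothetical assumptions chosen for illustration. The one idea it borrows from the story is that every surviving component reacts in the same round.

```python
def simulate_cascade(loads, capacity, first_failure):
    """Toy cascade model (hypothetical, for illustration only).

    loads:         starting load on each node
    capacity:      load above which a node fails
    first_failure: index of the node that fails first

    Assumption: a failed node's load is split evenly across all surviving
    nodes, and every survivor reacts in the same round (simultaneously).
    Returns (number of surviving nodes, number of redistribution rounds).
    """
    loads = list(loads)
    alive = [True] * len(loads)
    alive[first_failure] = False
    shed = loads[first_failure]  # load that must go somewhere else
    rounds = 0

    while shed > 0:
        survivors = [i for i, up in enumerate(alive) if up]
        if not survivors:
            break  # nothing left to absorb the load: total outage
        share = shed / len(survivors)
        shed = 0.0
        # Every survivor picks up its share at the same time...
        for i in survivors:
            loads[i] += share
        # ...and every survivor pushed past capacity fails in the same round,
        # shedding its whole (now larger) load onto whoever is left.
        for i in survivors:
            if loads[i] > capacity:
                alive[i] = False
                shed += loads[i]
        rounds += 1

    return sum(alive), rounds


# Plenty of headroom: losing one of ten nodes is absorbed.
print(simulate_cascade([0.5] * 10, 1.0, 0))
# Little headroom: the same single failure takes down everything.
print(simulate_cascade([0.95] * 10, 1.0, 0))
# Mixed loads: the failure spreads over two rounds before it is total.
print(simulate_cascade([0.9, 0.95, 0.5, 0.5], 1.0, 1))
```

The first call ends with nine of ten nodes still up, while the second leaves none, even though the initial fault is identical. In this toy model the difference is spare capacity, which is one way to read the "growing pains" point below: each incident teaches engineers where the headroom was too thin.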

So what did Amazon do wrong? Well, nothing we won’t see happen again to them and other similarly large, complicated systems. Nothing, and I mean nothing, will work out all such issues except time and usage. The more complex things get, the harder it becomes to predict how they will respond to a wide variety of stimuli. Each time this sort of thing happens, their engineers will adjust the design so that their cloud becomes that much more stable. Just call it growing pains.

Categories: Cloud Computing

1 Comment

Anon says:

May 3, 2011 at 7:59 pm

mom said you get what you pay for
