I’m an old-schooler; I have been programming and dealing with computer systems since the mid-1990s. Back then we did not have all of the cool tools that we have today, and everything was handled with paranoia.
In other words, what I’m trying to say is that, from experience, there are ways to deal with situations when the sh** hits the fan.
Today’s Amazon AWS outage (I can’t call something that lasts more than half an hour a hiccup) either brought down or slowed down sites all over the world. This could have been avoided if businesses had a simple contingency plan in place.
Most of today’s mid- and large-sized web applications use a third-party service to host the application, databases, etc. With increasing demand, cloud services became more and more popular, and migrating in-house services to a third party became only natural. If a server died at the third party, they could replace it faster than it could be done in-house. For similar reasons, in-house servers became relics and the contingency plan was forgotten. No one would guess that a whole data center would go down, right!?
For whatever reason, today Amazon AWS went down in US-East-1 (Northern Virginia), and it stayed down (or still is down) for a considerable time. The question you should be asking right now is: what had to be done to prevent my web application from going offline or slowing down because one or more servers (the type doesn’t matter, app or db) are offline?
I’m not a DevOps person and I really can’t go deep on the subject, but based on experience (or simply paranoia), here is an idea. Everything today is distributed, and so should be your server instances behind the load balancer. The truth is that whoever doesn’t have that today probably had their web application go fully offline.
In defense of Amazon AWS, you can choose the regions your servers run in and, with the exception of Virginia, all the other regions are working just fine. Even though it is pretty bad that a whole region got knocked out, your application should not go down with it if you have your server instances set up correctly.
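To make the idea a bit more concrete, here is a minimal sketch of what "don't depend on a single region" could look like in Python. It is not how AWS or any real load balancer does it; it only shows the logic of probing several regions and routing to the first one that answers. The health-check URLs under `example.com`, the `/health` path, and the function names `http_probe`, `healthy_regions`, and `pick_region` are all my own assumptions for illustration.

```python
import urllib.request
from typing import Callable, Dict, List, Optional

# Hypothetical health-check URLs, one per region, listed in order of preference.
# The hostnames and the /health path are placeholders, not real endpoints.
REGION_ENDPOINTS: Dict[str, str] = {
    "us-east-1": "https://us-east-1.example.com/health",
    "us-west-2": "https://us-west-2.example.com/health",
    "eu-west-1": "https://eu-west-1.example.com/health",
}


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError):
        # Covers URLError/HTTPError, timeouts, refused connections, bad URLs.
        return False


def healthy_regions(
    endpoints: Dict[str, str],
    probe: Callable[[str], bool] = http_probe,
) -> List[str]:
    """Return the regions whose health check passes, in preference order."""
    return [region for region, url in endpoints.items() if probe(url)]


def pick_region(
    endpoints: Dict[str, str],
    probe: Callable[[str], bool] = http_probe,
) -> Optional[str]:
    """Return the first healthy region, or None if every region is down."""
    healthy = healthy_regions(endpoints, probe)
    return healthy[0] if healthy else None


if __name__ == "__main__":
    # Simulate today's outage with a fake probe: Virginia is down, the rest are up.
    def virginia_is_down(url: str) -> bool:
        return "us-east-1" not in url

    print(pick_region(REGION_ENDPOINTS, virginia_is_down))
```

The probe is passed in as a parameter so the failover logic can be exercised without touching the network. In a real setup this job usually belongs to DNS failover or the load balancer itself rather than to application code.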
I might be stating the obvious here, but I recall a friend telling me that he was at a conference and all the other participants who worked on very large websites were spooked when he mentioned that he had all his server instances with a single service provider.
Obvious or not, every business should prepare a contingency plan: a plan that gives answers and restores the application to full working condition in little or no time when the worst-case scenario happens.