Outages Bring Cloud Monitoring into Focus!

The outage of Amazon's cloud service impacted many businesses hosted in its cloud data center.

Over the last week, Amazon’s cloud service had a serious outage that caused many popular web businesses to go offline for several hours and resulted in significant loss of business.

All of a sudden, many in the press (and many users as well!) are beginning to realize that applications hosted in the cloud still run on servers in data centers and are hence prone to the same kinds of problems as servers in their own data centers. Just because you have not had to purchase a server, provision it, or provide power and space for it does not mean that the server is failure-proof. As this article indicates, failures can happen for any number of reasons: a hardware failure, a network outage, an application coding error, etc. Even a configuration error inadvertently made by an administrator can cause catastrophic failures.

Many have gone overboard, predicting the end of cloud computing! If you look at the service contracts from these cloud service providers, they do not guarantee that the infrastructure will be 100% failure-proof. With cloud computing, as with everything else, you get what you pay for. Not every business that used Amazon suffered during this outage. The outage was limited to Amazon’s east coast (Northern Virginia) data center, and for enterprises that had paid for Amazon Web Services’ redundant cloud architecture, it was business as usual. Netflix, the popular movie rental site, was one such enterprise.

"You get what you pay for" applies to monitoring tools as well!

Outages like the one Amazon had bring cloud monitoring tools into focus. The saying “You get what you pay for” applies to monitoring tools as well. If all you want is to be alerted once a problem has happened, a simple low-cost up/down monitoring tool suffices. On the other hand, if you want to be proactive like Netflix and detect problems before they affect revenue, you need a monitoring tool that can alert you to abnormal situations well before users notice the problem. More sophisticated cloud monitoring tools can also help you rapidly triage where the root cause of a problem lies: Is it in the cloud data center? Is it in your application? Is it in the infrastructure services (DNS, Active Directory, etc.)?
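
To make the contrast between the two kinds of tools concrete, here is a minimal Python sketch. It is not taken from any particular monitoring product: the function names, the sample response times, and the three-standard-deviation threshold are all assumptions chosen purely for illustration.

```python
from statistics import mean, stdev


def is_down(response_ms):
    """Up/down check: fires only once the service has stopped responding.

    A response time of None stands for "no response" (an assumption of
    this sketch).
    """
    return response_ms is None


def is_abnormal(history_ms, latest_ms, sigmas=3.0):
    """Baseline check: fires when the latest sample is far above the
    recent norm, even though the service is still answering."""
    if latest_ms is None or len(history_ms) < 2:
        return False  # already down, or not enough data for a baseline
    baseline = mean(history_ms)
    spread = stdev(history_ms)
    return latest_ms > baseline + sigmas * spread


# Hypothetical response times in milliseconds: a healthy history,
# followed by one much slower sample.
history = [120, 118, 125, 122, 119, 121]
latest = 310

print(is_down(latest))               # False: the service still responds
print(is_abnormal(history, latest))  # True: but it is far slower than usual
```

The up/down check stays silent here because the service still answers, while the baseline check flags the slowdown before it turns into an outage. Real tools use far richer baselines (time of day, seasonality, multiple metrics), but the principle is the same.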

Monitoring tools provide insurance cover for your infrastructure. Just as Netflix would have weighed the cost of redundancy against the benefit of keeping its business up during the outage, you should assess the return on investment from a monitoring tool.

There are several ways to assess the ROI from a monitoring tool:

  1. By the number of times the monitoring tool can help you avert a problem by proactively alerting you and enabling you to take action before users notice the issue;
  2. By the time the monitoring tool saves by helping you to pinpoint the root-cause of a problem;
  3. By the amount of time that the monitoring tool saves for your key IT personnel by allowing your first level support teams to handle user complaints;
  4. By the savings that the tool provides by enabling you to optimize your infrastructure and to get more out of your existing investment, without having to buy additional hardware or to use additional cloud services.
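
As a rough illustration of how these four factors could be combined into a single figure, here is a small Python sketch. The function name, the parameters, and every number in the example are hypothetical. You would substitute estimates from your own environment, and the simple formula (benefit minus cost, divided by cost) is only one of several ways to express ROI.

```python
def monitoring_roi(
    outages_averted, cost_per_outage,
    triage_hours_saved, expert_hours_saved, hourly_rate,
    infrastructure_savings, tool_cost,
):
    """Return ROI as a fraction: (total benefit - tool cost) / tool cost."""
    benefit = (
        outages_averted * cost_per_outage                          # factor 1
        + (triage_hours_saved + expert_hours_saved) * hourly_rate  # factors 2 and 3
        + infrastructure_savings                                   # factor 4
    )
    return (benefit - tool_cost) / tool_cost


# All figures below are made up for the sake of the example.
roi = monitoring_roi(
    outages_averted=2, cost_per_outage=50_000,
    triage_hours_saved=100, expert_hours_saved=200, hourly_rate=100,
    infrastructure_savings=20_000, tool_cost=30_000,
)
print(f"{roi:.0%}")  # prints 400%
```

With these made-up inputs the total benefit is 150,000 against a tool cost of 30,000, which gives an ROI of 400%.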

Related links:

The top requirements for a Cloud-Ready Monitoring Solution