Do Monitoring Solutions Have to be Rearchitected to Monitor the Cloud?

Anything cloud is hot these days, and monitoring is no exception. Search for “cloud monitoring” on Google and you’ll see how many vendors have jumped on this bandwagon. Adding to the confusion are two similar-sounding terms: “monitoring the cloud” and “monitoring in the cloud”. “Monitoring the cloud” refers to monitoring applications that are hosted in the cloud, while “monitoring in the cloud” refers to a management solution that is itself hosted in the cloud and offered as a pay-per-use service.

Is either of these approaches fundamentally new? The short answer is NO! Many managed service providers have offered multi-tenant, pay-per-use monitoring as a service for several years, and that is nothing but “monitoring in the cloud”.

Consider “monitoring the cloud” next. Is a fundamentally new approach needed to monitor the cloud? Again, the answer is NO. Just consider the features that one cloud monitoring provider advertises:

• Monitoring Frequency – from 1 minute to 60 minutes
• Multiple Check Locations – America, Europe, Asia and Australia
• Monitors Websites, Email Servers, Firewalls, VoIP, Databases, Domain Name Servers, Routers and Web Servers from the end-user perspective
• Supported Protocols – HTTP, HTTPS, FTP, SMTP, POP3, IMAP, SSH, PING, TCP, UDP, SIP, MySQL, DNS
• OS – CPU, RAM, Disk Usage, Processes, System Events, Installed Software
• Transaction Recorder – check the load time of each component of your web application’s pages
• Instant Failure Alerts
• Scheduled Maintenance – define downtime periods during maintenance windows
• Escalation – escalate continuing problems to different staff members
• Alerting Periods – specify alerting periods per contact
• SLA Reporting – detailed reporting with SLA metrics
• Public Reports – show your uptime to your customers

How many of these are specific to the cloud? Very few!
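Take SLA reporting with scheduled maintenance as an example: the arithmetic behind it does not care where the monitored application runs. Here is a minimal Python sketch, assuming availability is measured as the share of successful checks and that checks falling inside a maintenance window are excluded (some products weight by elapsed time instead of by check count). The names `CheckResult`, `in_maintenance` and `sla_availability` are illustrative, not any vendor’s API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Window = Tuple[datetime, datetime]


@dataclass
class CheckResult:
    """One probe result: when the check ran and whether the target was up."""
    timestamp: datetime
    up: bool


def in_maintenance(ts: datetime, windows: List[Window]) -> bool:
    """True if ts falls inside any [start, end) maintenance window."""
    return any(start <= ts < end for start, end in windows)


def sla_availability(results: List[CheckResult], windows: List[Window]) -> Optional[float]:
    """Percentage of successful checks, ignoring checks run during maintenance."""
    counted = [r for r in results if not in_maintenance(r.timestamp, windows)]
    if not counted:
        return None  # every check fell inside maintenance; nothing to report
    return 100.0 * sum(r.up for r in counted) / len(counted)


if __name__ == "__main__":
    start = datetime(2024, 1, 1)
    # One check per minute for an hour; minutes 10-12 and 40-41 failed.
    failed = {10, 11, 12, 40, 41}
    results = [CheckResult(start + timedelta(minutes=i), i not in failed) for i in range(60)]
    # Minutes 10-12 were a scheduled maintenance window, so those failures do not count.
    maintenance = [(start + timedelta(minutes=10), start + timedelta(minutes=13))]
    print(f"{sla_availability(results, maintenance):.2f}%")  # 96.49% (55 of 57 counted checks)
```

The same function works unchanged whether the checks target an on-premises server or a cloud-hosted one.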

To expand on this, let’s look at the different aspects of monitoring and how each is affected by the cloud:

The monitoring architecture: A business service could conceivably use several applications, some hosted in a public cloud and others in a private cloud, with stringent firewall rules prohibiting communication across these clouds except over standard ports using standard protocols. Many of the large monitoring frameworks were designed to use SNMP or proprietary communication protocols, and they are well suited to networks governed by a single administrative domain. As services start to span multiple clouds and multiple domains of control, these single-domain monitoring frameworks fail to function effectively. Hence, if you are using one of the large management frameworks that were architected decades ago, then yes – you need to consider a new approach for cloud monitoring!

In contrast, next-generation monitoring solutions like eG Enterprise have been designed from the ground up with multiple domains of control in mind. Administrators can choose between agent-based and agentless monitoring. All communication happens over the standard web protocols HTTP and HTTPS, and the agents do not listen on any ports. Not only is this architecture secure, it also lends itself well to operating across cloud providers. Hence, if you are using a next-generation monitoring solution like eG Enterprise, the advent of private and public clouds does not mandate an architectural change.
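To make the outbound-only pattern concrete, here is a minimal Python sketch of an agent that pushes metrics to a collector over HTTP(S) and never opens a listening port. It is not eG Enterprise’s actual agent or wire format: the collector URL, the JSON payload layout and the function names are all assumptions made for illustration.

```python
import json
import logging
import socket
import time
import urllib.request
from typing import Callable, Dict, Optional


def build_payload(metrics: Dict[str, float]) -> dict:
    """Wrap metric name -> value pairs with the reporting host and a timestamp."""
    return {"host": socket.gethostname(), "timestamp": time.time(), "metrics": metrics}


def push_metrics(collector_url: str, payload: dict, timeout: float = 10.0) -> int:
    """POST the payload as JSON and return the HTTP status code.

    The agent only makes outbound requests and never binds a listening socket,
    so the firewall needs nothing beyond egress to the collector's HTTP(S) port.
    """
    request = urllib.request.Request(
        collector_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def run_agent(collector_url: str, collect: Callable[[], Dict[str, float]],
              interval_s: float = 60.0, iterations: Optional[int] = None) -> None:
    """Collect and push metrics every interval_s seconds (forever if iterations is None)."""
    done = 0
    while iterations is None or done < iterations:
        try:
            push_metrics(collector_url, build_payload(collect()))
        except OSError as exc:  # URLError, HTTPError and socket timeouts all derive from OSError
            logging.warning("push failed, retrying next cycle: %s", exc)
        done += 1
        if iterations is None or done < iterations:
            time.sleep(interval_s)
```

A call such as `run_agent("https://collector.example.com/ingest", lambda: {"cpu_pct": 12.5})` (hypothetical URL) would report once a minute. Because the agent initiates every connection, it can sit behind a firewall in one cloud and report to a manager in another without any inbound rule.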

The metrics collected: Internal monitoring of applications (by deploying agents that integrate with the application) still has value, as it can provide additional detail about malfunctioning applications (e.g., which SQL query is slow, or which Java method is taking time). Moving an application to the cloud does not diminish the value of internal monitoring.
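Commercial application agents typically hook into the runtime (for example, Java bytecode instrumentation or database driver wrappers) to time individual methods and queries. The Python sketch below is a much simpler analogue of that idea: a decorator records per-function call durations, and a helper reports the slowest functions on average. The names `instrument`, `timings` and `slowest` are hypothetical.

```python
import functools
import time
from collections import defaultdict
from typing import Dict, List, Tuple

# In-process "agent" state: observed call durations (seconds) per function name.
timings: Dict[str, List[float]] = defaultdict(list)


def instrument(func):
    """Record the wall-clock duration of every call to func, even if it raises."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            timings[func.__qualname__].append(time.perf_counter() - start)
    return wrapper


def slowest(n: int = 3) -> List[Tuple[str, float]]:
    """Return up to n (function name, average seconds per call), slowest first."""
    averages = {name: sum(d) / len(d) for name, d in timings.items() if d}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Decorating, say, a data-access function with `@instrument` and calling `slowest()` periodically answers the “which call is taking time?” question from inside the application, wherever it happens to be hosted.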

On the other hand, the cloud has made external monitoring of applications more important. The best way to determine whether an application hosted in the cloud is working well is to periodically measure its availability and responsiveness. Tools and techniques for such external monitoring already exist (we have used external monitoring for years to check on the performance of hosted applications and servers) – you just need to make sure you deploy them effectively and pay attention to the results.
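A synthetic availability and response-time check of this kind needs nothing cloud-specific. The minimal Python sketch below assumes that a plain HTTP(S) GET against the application’s URL is a reasonable proxy for user experience and that a 2-second response time is acceptable; both the threshold and the names (`probe`, `is_healthy`, `monitor`) are illustrative choices, not part of any particular product.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProbeResult:
    url: str
    available: bool
    status: Optional[int]      # HTTP status, or None if no response arrived
    response_time_s: float     # time until the full body arrived (or the failure)
    error: Optional[str] = None


def probe(url: str, timeout: float = 10.0) -> ProbeResult:
    """Fetch url once, as an external user would, and time the whole exchange."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()  # include body transfer in the measured time
            status = response.status
        return ProbeResult(url, 200 <= status < 400, status, time.perf_counter() - start)
    except urllib.error.HTTPError as exc:  # the server answered with 4xx/5xx
        return ProbeResult(url, False, exc.code, time.perf_counter() - start, str(exc))
    except OSError as exc:  # DNS failure, connection refused, timeout, TLS error
        return ProbeResult(url, False, None, time.perf_counter() - start, str(exc))


def is_healthy(result: ProbeResult, max_response_time_s: float = 2.0) -> bool:
    """Available and responsive enough, per an assumed response-time threshold."""
    return result.available and result.response_time_s <= max_response_time_s


def monitor(url: str, interval_s: float = 60.0, checks: int = 5) -> List[ProbeResult]:
    """Run `checks` probes spaced interval_s seconds apart and return the results."""
    results = []
    for i in range(checks):
        results.append(probe(url))
        if i < checks - 1:
            time.sleep(interval_s)
    return results
```

Running the same probe on a schedule from several geographic locations yields exactly the multi-location, fixed-frequency checks in the feature list earlier in this post.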

Based on the above discussion, we can conclude that the cloud has not fundamentally changed the types of metrics you need to collect. Of course, to know whether the cloud is working well and what portion of the resources you have paid for is being used, new metrics will have to be incorporated. Most cloud providers publish APIs from which these metrics can be obtained.
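As a hedged illustration of one such cloud-specific metric, the sketch below takes CPU utilization samples for a single instance (for example, the percentages returned for Amazon CloudWatch’s `CPUUtilization` metric) and estimates how much of the paid-for capacity is actually used. Fetching the samples from the provider’s API is left out, and the 20% and 90% p95 thresholds for flagging over- or under-provisioning are hypothetical rules of thumb, not provider guidance.

```python
import math
from typing import Dict, List, Union


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def assess_capacity(cpu_utilization_pct: List[float], provisioned_vcpus: float,
                    low_pct: float = 20.0, high_pct: float = 90.0) -> Dict[str, Union[float, str]]:
    """Estimate how much of the paid-for CPU capacity an instance actually uses.

    cpu_utilization_pct: utilization samples (0-100) fetched from the provider's API.
    provisioned_vcpus: vCPUs in the instance size being paid for.
    low_pct / high_pct: hypothetical p95 thresholds for over/under-provisioning.
    """
    if not cpu_utilization_pct:
        raise ValueError("need at least one utilization sample")
    avg_pct = sum(cpu_utilization_pct) / len(cpu_utilization_pct)
    p95 = percentile(cpu_utilization_pct, 95)
    if p95 > high_pct:
        verdict = "under-provisioned"
    elif p95 < low_pct:
        verdict = "over-provisioned"
    else:
        verdict = "ok"
    return {
        "avg_vcpus_used": provisioned_vcpus * avg_pct / 100,
        "fraction_of_paid_capacity_used": avg_pct / 100,
        "p95_utilization_pct": p95,
        "verdict": verdict,
    }
```

Using the 95th percentile rather than the average for the verdict keeps short bursts from being averaged away, while the average still answers the “how much of what I pay for do I use?” question.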

Metric analysis and root-cause diagnosis: The analysis of metrics is no different in a cloud environment. The same cannot be said about root-cause diagnosis. Just as virtualization added complexity that root-cause diagnosis had to handle, the cloud forces root-cause diagnosis to take into account how the cloud itself is functioning. For instance, if an application is exhibiting poor response times, is it because the application is malfunctioning, because the workload is unusually high, because the cloud provider is not delivering the resources you requested for the application, or because you have not provisioned the cloud server correctly and are hitting a resource crunch? Root-cause diagnosis technology must evolve to handle this additional complexity. As with any new technology, most early cloud monitoring solutions emphasize metrics rather than root-cause diagnosis.
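The diagnostic questions in the paragraph above can be expressed as a simple decision procedure. The rule-based Python sketch below is only an illustration: it assumes you already have baselines for response time and request rate, the vCPUs you requested versus what the provider actually delivered, and the utilization of the delivered capacity. The thresholds, field names and function names are hypothetical, and real root-cause engines correlate many more metrics across tiers.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical snapshot of one cloud-hosted application tier."""
    response_time_ms: float
    baseline_response_time_ms: float
    request_rate: float            # requests per second now
    baseline_request_rate: float   # typical requests per second
    vcpus_requested: float         # capacity you asked (and pay) for
    vcpus_delivered: float         # capacity the provider actually gave you
    utilization_pct: float         # busy percentage of the delivered capacity


def diagnose(obs: Observation, slow_factor: float = 1.5, load_factor: float = 1.5,
             saturation_pct: float = 90.0, delivery_tolerance: float = 0.95) -> str:
    """Return a best-guess explanation for slow responses, checking cloud causes first."""
    if obs.response_time_ms <= slow_factor * obs.baseline_response_time_ms:
        return "no problem detected"
    if obs.vcpus_delivered < delivery_tolerance * obs.vcpus_requested:
        return "cloud provider is not delivering the requested resources"
    if obs.utilization_pct >= saturation_pct:
        if obs.request_rate > load_factor * obs.baseline_request_rate:
            return "workload is unusually high"
        return "server is under-provisioned for normal load (resource crunch)"
    return "application malfunction (resources are available but responses are slow)"
```

The ordering is a design choice: provider-side shortfalls are ruled out first, then capacity saturation (split by whether load is abnormal), and only when resources are plentiful is the application itself blamed.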

To summarize, “monitoring in the cloud” is nothing more than a remote hosting model for the management system. “Monitoring the cloud” requires adaptations to monitoring solutions to accommodate the additional infrastructure tiers involved. The changes required are evolutionary and not radically new in nature.