I was just going through an article in which a prominent virtualization monitoring vendor asked its prospects to check which other solutions monitored what it claimed were 20 key metrics for a virtual infrastructure. Far too many vendors look at monitoring as a numbers game – “I can collect these 1,000 metrics. Can you?” A few years ago, one of the popular agentless monitoring solutions was even priced based on the number and type of metrics collected. That was probably taking pricing to an extreme!
These days, many vendors of applications and virtualization platforms publish APIs that expose hundreds of metrics, so it is relatively easy to incorporate these metrics directly into a monitoring tool. Customers do not care how many metrics the monitoring tool collects. What they use to judge whether their investment in a monitoring tool was worthwhile is its effectiveness:
- How soon is it able to alert me to problems that are occurring? Am I alerted to problem situations before customers call?
- How accurate is its diagnosis? Can the monitoring tool help me identify where the problem originated? Does it provide me with sufficient evidence with which I can confront a domain expert and demand that the problem be fixed?
- How proactive is the solution? Has it alerted me to conditions that I was not aware of, and was I able to avert the problem before it became service-impacting?
- How quickly can I arrive at conclusions using the tool? Do I need to hire an expert to operate the tool or to interpret what the tool is indicating?
What differentiates one monitoring tool from another is its effectiveness. To be truly effective, a monitoring tool should collect the “right” metrics. Often, the right metrics are not exposed through the vendor’s APIs. Real-world experience with the technology is key: the monitoring tool vendor must understand how the technology works, what its common failure modes are, and how to monitor for those failure modes. Analysis techniques such as baselining, trending, and correlation, applied to the right metrics, then convert the raw data into actionable information. A monitoring tool that automates this analysis, so that lower-skilled IT staff can use it and take action to avert or fix problems, can result in significant cost savings for an organization. First, the organization does not need to use its expensive expert staff for routine firefighting. Second, if the problem diagnosis is accurate, only the right experts need to be pulled in to solve a problem. By enabling this, the monitoring solution helps you achieve greater operational efficiency.
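To make the point about analysis concrete, here is a minimal sketch of baselining in Python. It is only an illustration, not how any particular monitoring product works: the function names, the three-standard-deviation threshold, and the sample "CPU ready" values are all assumptions made up for this example.

```python
from statistics import mean, stdev


def baseline(history):
    """Return (mean, sample standard deviation) of past samples for one metric."""
    return mean(history), stdev(history)


def is_anomalous(value, history, k=3.0):
    """Flag a value that lies more than k standard deviations from the baseline.

    The k=3.0 default is an assumed threshold, not a recommendation.
    """
    mu, sigma = baseline(history)
    if sigma == 0:
        # A perfectly flat history: any change at all is a deviation.
        return value != mu
    return abs(value - mu) > k * sigma


# Hypothetical "CPU ready" percentages observed at the same hour on past days.
cpu_ready_history = [2.1, 1.8, 2.4, 2.0, 2.2, 1.9, 2.3]

print(is_anomalous(2.5, cpu_ready_history))  # within the normal range
print(is_anomalous(9.0, cpu_ready_history))  # far outside the normal range
```

The raw number 9.0 means little by itself; it becomes actionable only once it is compared against what is normal for that metric. That comparison is the kind of analysis that matters, however many metrics are collected.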
The next time you see a vendor touting how many metrics they collect, remember to tell them, “It’s not a numbers game any more!”