Performance Monitoring Is Not Just About Diagnosis

Performance Monitoring vs. Diagnosis

Monitoring is one of the most critical aspects of IT administration and management, and it can be done in several ways: many administrators use a commercial monitoring solution, others rely on open source tools such as Nagios or Zabbix, and still others develop and run home-grown scripts.

The primary use of a performance monitoring solution is problem diagnosis: when a problem occurs, determining what caused it, where it started, what else is affected, and how it can be resolved. Just as a doctor uses symptoms to identify the cause of an illness and prescribes the appropriate medicine, a monitoring system collects metrics from an IT infrastructure, analyzes them, and identifies the causes and effects of problems, helping administrators troubleshoot and fix them quickly.

However, using a monitoring solution only for problem diagnosis seriously under-utilizes its capabilities, because the metrics it collects can serve many other purposes. Here are some ways in which monitoring your IT infrastructure can help beyond diagnosis:

Proactive detection of problems
While diagnosis helps find issues after they have happened, performance monitoring tools can track early warning indicators to identify anomalies and warning conditions well before they affect the business. Trend and pattern analysis of the collected data are key methods for predicting problems that may be in the offing.
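
As one illustration of pattern analysis, the sketch below flags metric samples that deviate sharply from their recent baseline using a trailing-window z-score. This is a simple statistical technique chosen here for illustration, not a method the article prescribes; the function name `find_anomalies`, the window size, the threshold, and the sample data are all hypothetical.

```python
from statistics import mean, stdev


def find_anomalies(samples, window=5, threshold=3.0):
    """Return the indexes of samples that lie more than `threshold` standard
    deviations away from the mean of the preceding `window` samples.

    The defaults are illustrative, not tuned values.
    """
    if window < 2:
        raise ValueError("window must contain at least two samples")
    anomalies = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]  # trailing window of recent samples
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # Perfectly flat baseline: any change at all is a deviation.
            if samples[i] != mu:
                anomalies.append(i)
        elif abs(samples[i] - mu) > threshold * sigma:
            anomalies.append(i)
    return anomalies


# Hypothetical response times (ms) collected from one server.
response_ms = [120, 118, 125, 122, 119, 121, 124, 310, 123, 120]
print(find_anomalies(response_ms))  # [7]: the 310 ms spike
```

Note that the spike also widens the baselines of the samples that follow it, which is one reason production tools tend to use more robust statistics than a plain mean and standard deviation.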

Service-level reporting
Monitoring helps you measure and report on the service levels you deliver to your users. Management can use this information to see how well those service levels are being met, and you can use it to document your service levels for customers. Reports can also be created to highlight compliance with service level agreements (SLAs).
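
A minimal sketch of an SLA compliance check, assuming availability is measured with periodic synthetic probes that either succeed or fail. The function names, the one-minute probe interval, and the 99.9% target are hypothetical; real SLAs often define availability differently (for example, by excluding scheduled maintenance windows).

```python
def availability_pct(probe_results):
    """Percentage of probes that succeeded (True means the service responded)."""
    if not probe_results:
        raise ValueError("no probe results to evaluate")
    return 100.0 * sum(probe_results) / len(probe_results)


def sla_report(probe_results, target_pct):
    """Summarize measured availability against an SLA target."""
    achieved = availability_pct(probe_results)
    return {
        "achieved_pct": round(achieved, 3),
        "target_pct": target_pct,
        "compliant": achieved >= target_pct,
    }


# Hypothetical day of one-minute probes: 1,440 checks, 3 of them failed.
probes = [True] * 1437 + [False] * 3
print(sla_report(probes, target_pct=99.9))
# {'achieved_pct': 99.792, 'target_pct': 99.9, 'compliant': False}
```

Three failed one-minute probes in a day are enough to miss a 99.9% target, since that target allows only about 1.44 minutes of downtime per day.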

Optimization
When monitoring your infrastructure, you can see usage patterns across all of its tiers. Analyzing these patterns can reveal ways to optimize the infrastructure, either to improve performance or to accommodate a higher workload. For example, you may find that one of the servers in a cluster is handling a larger number of user sessions than the other servers, which suggests that the load balancing scheme could be tuned to get the most out of the entire infrastructure. Unexpected usage patterns are another source of optimization. In one of our customers’ virtual desktop deployments, for instance, a specific (older) version of the browser was found to consume excessive CPU and memory. Upgrading the browser reduced resource usage and improved the infrastructure’s performance.
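
The load-balancing example can be checked mechanically. The sketch below, assuming you already collect a current session count per server, flags servers whose load exceeds the cluster average by more than a tolerance. The function name, the 25% tolerance, and the server names and counts are hypothetical.

```python
from statistics import mean


def overloaded_servers(sessions_by_server, tolerance=0.25):
    """Return the sorted names of servers whose session count exceeds the
    cluster average by more than `tolerance` (0.25 means 25% above average).
    """
    if not sessions_by_server:
        return []
    average = mean(list(sessions_by_server.values()))
    limit = average * (1 + tolerance)
    return sorted(name for name, count in sessions_by_server.items() if count > limit)


# Hypothetical session counts for a four-server cluster.
sessions = {"srv-a": 42, "srv-b": 45, "srv-c": 88, "srv-d": 41}
print(overloaded_servers(sessions))  # ['srv-c']
```

A flagged server is a prompt to investigate the load balancing scheme rather than a diagnosis in itself.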

Compliance and audit
A monitoring tool also plays a key role in regulatory compliance and auditing of an IT infrastructure. For example, healthcare organizations are required to report who accessed which system and which application, at what time, and for how long. Because a monitoring tool already collects this infrastructure usage data, it can help streamline compliance and audit activities.
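
To illustrate the kind of summary an auditor might request, the sketch below aggregates session records into who accessed which system and application, how many times, when first, and for how long in total. It assumes the monitoring tool can export sessions as (user, system, application, login, logout) tuples; that record layout, the function name, and the sample data are hypothetical and do not reflect any particular regulation's reporting format.

```python
from datetime import datetime, timedelta


def access_report(records):
    """Aggregate (user, system, application, login, logout) session records.

    Returns a dict keyed by (user, system, application) with the number of
    sessions, the earliest login, and the total time connected.
    """
    report = {}
    for user, system, app, login, logout in records:
        if logout < login:
            raise ValueError(f"logout precedes login for {user} on {system}")
        entry = report.setdefault(
            (user, system, app),
            {"sessions": 0, "first_login": login, "total": timedelta(0)},
        )
        entry["sessions"] += 1
        entry["first_login"] = min(entry["first_login"], login)
        entry["total"] += logout - login
    return report


# Hypothetical session records exported from a monitoring tool.
day = datetime(2024, 3, 1)
sessions = [
    ("dr_lee", "ehr-01", "PatientRecords", day.replace(hour=9), day.replace(hour=9, minute=45)),
    ("dr_lee", "ehr-01", "PatientRecords", day.replace(hour=14, minute=10), day.replace(hour=14, minute=25)),
    ("nurse_kim", "ehr-02", "Pharmacy", day.replace(hour=8, minute=30), day.replace(hour=9)),
]

for (user, system, app), entry in sorted(access_report(sessions).items()):
    print(user, system, app, entry["sessions"], entry["first_login"], entry["total"])
```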

Management reporting
Management often expects administrators to do more with less! By documenting infrastructure usage over time and reporting where performance bottlenecks lie, administrators can give managers quantitative data for making informed decisions about future upgrades and growth of the infrastructure.

Right-sizing
Data on which resources are available and how much of each is consumed helps determine how to right-size the infrastructure. This is particularly useful in a virtual infrastructure, where resources can easily be shared across virtual machines. Right-sizing lets you get the most out of your current infrastructure.
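
One common right-sizing heuristic, assumed here rather than taken from the article, is to size a virtual machine for a high percentile of its observed demand plus headroom instead of for its absolute peak. The sketch below uses the nearest-rank 95th percentile and a 70% target utilization; both figures, the function names, and the sample data are hypothetical.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of `values`, for 0 < pct <= 100."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based nearest rank
    return ordered[max(rank, 1) - 1]


def recommend_vcpus(cores_used, pct=95, target_util=0.7):
    """Smallest vCPU count that keeps the pct-th percentile CPU demand
    at or below `target_util` of the VM's capacity."""
    demand = percentile(cores_used, pct)
    return max(1, math.ceil(demand / target_util))


# Hypothetical CPU demand samples (cores in use) from a VM allocated 8 vCPUs.
cores_used = [1.2, 1.5, 1.1, 2.0, 1.8, 1.4, 1.3, 2.4, 1.6, 1.7,
              1.9, 1.5, 1.2, 1.4, 1.8, 2.1, 1.6, 1.3, 1.5, 2.8]
print(recommend_vcpus(cores_used))  # 4
```

Here the 95th-percentile demand is 2.4 cores, so 4 vCPUs would run that demand at 60% utilization, and the brief 2.8-core peak would still fit, freeing 4 vCPUs for other virtual machines.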

Capacity planning
Relatedly, management often asks administrators to project future requirements. For example, how many servers are needed to accommodate 30% growth in the user base? Or, when budgeting for the future, where should additional resources go: CPU, memory, or disk capacity? By measuring application workloads and the infrastructure resources they consume, you can predict when you will run out of capacity (network, storage, or compute) and when to invest in new resources.
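
The server-count question and the run-out-of-capacity prediction above can be sketched under simple assumptions: that load scales linearly with the number of users (so a measured users-per-server figure can be reused), and that resource usage keeps growing along a straight-line trend fitted by least squares. Real workloads often have seasonality that a linear fit ignores. The function names and all numbers below are hypothetical.

```python
def servers_needed(current_users, growth_pct, users_per_server):
    """Servers needed after `growth_pct` percent user growth, assuming each
    server comfortably handles `users_per_server` users."""
    projected = current_users * (100 + growth_pct)  # users, scaled by 100
    per_server = users_per_server * 100             # same scale
    return -(-projected // per_server)              # integer ceiling division


def days_until_full(daily_usage, capacity):
    """Fit a least-squares line to daily usage samples and return how many
    days after the last sample the trend reaches `capacity`.

    Returns None when the trend is flat or shrinking.
    """
    n = len(daily_usage)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(daily_usage) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in enumerate(daily_usage))
             / sum((x - x_mean) ** 2 for x in range(n)))
    if slope <= 0:
        return None
    intercept = y_mean - slope * x_mean
    day_full = (capacity - intercept) / slope  # day index where the line hits capacity
    return max(0.0, day_full - (n - 1))


# Hypothetical: 2,000 users today, 30% growth, 250 users per server.
print(servers_needed(2000, 30, 250))  # 11

# Hypothetical daily storage usage (TB) against a 20 TB array.
usage_tb = [10.0, 10.5, 11.0, 11.5, 12.0, 12.5, 13.0]
print(days_until_full(usage_tb, capacity=20.0))  # 14.0
```

With 2,000 users, 30% growth means 2,600 users, which needs 11 servers at 250 users each; the storage trend of 0.5 TB per day reaches 20 TB 14 days after the last sample. The same trend function can be run per resource (CPU, memory, disk) to see which one runs out first.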

These are just some of the often under-emphasized uses of a monitoring tool. If you have seen other use cases for monitoring, please share them with us in the comments section below.