Challenges in Monitoring Virtual Environments

While virtualization technologies let businesses do more with less, using virtualization to consolidate physical servers does not mean your management duties shrink proportionally. For example, if a company that runs 300 operating systems on 300 servers moves to running 10 virtual servers on each of 30 physical servers, it still has 300 operating systems, and their related software applications, to manage!

In reality, for a technology that is supposed to make computing easier, virtualization is becoming quite complicated to monitor and manage. First, besides the physical server, each of the guests running on the server needs to be monitored. Conventional monitoring approaches involve deploying agents (or agentless monitors) for each guest operating system. The additional cost of deploying an agent for each guest operating system is likely to make this approach impractical for virtual desktop environments, where a single physical server is used to host tens of desktops. The resources consumed by the agents themselves (one agent per operating system) are another disadvantage of this approach.

Furthermore, an agent deployed on a guest virtual machine (VM) can indicate only the comparative resource usage of the applications executing within that guest VM. That is, this approach cannot indicate how the guest VM itself is performing relative to other guest VMs on the same physical server: for example, how much of the server's physical CPU the guest VM is consuming, or whether there is memory contention among the guest VMs. Since multiple guest VMs share the physical resources of the server, even one malfunctioning guest VM (or one malfunctioning application executing on a guest VM) can impact the performance seen by all the other guest VMs hosted on the same physical server.
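As a minimal sketch of the host-level view discussed above, the following Python snippet computes each guest's share of the physical CPU from per-guest usage figures. The guest names, the MHz values, and the function `cpu_share_by_guest` are hypothetical and used only for illustration; a real monitor would have to obtain such figures from the virtualization platform itself rather than from agents inside the guests.

```python
def cpu_share_by_guest(guest_cpu_mhz, host_capacity_mhz):
    """Return each guest's CPU consumption as a percentage of host capacity."""
    if host_capacity_mhz <= 0:
        raise ValueError("host capacity must be positive")
    return {
        guest: 100.0 * used / host_capacity_mhz
        for guest, used in guest_cpu_mhz.items()
    }


# Hypothetical sample: three guests on a host with 10,000 MHz of CPU capacity.
sample = {"vm-a": 1000.0, "vm-b": 6500.0, "vm-c": 500.0}
shares = cpu_share_by_guest(sample, 10000.0)

# List guests from the largest consumer to the smallest.
for guest, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{guest}: {pct:.1f}% of host CPU")
```

Ranking guests this way makes a guest that consumes a disproportionate share of the host stand out, which an agent confined to a single guest cannot reveal.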

Resource contention may also occur because of the way an administrator has chosen to allocate resources to the guest VMs. For example, the disk space allocated to a guest VM or the CPU allocated to a guest VM may not be sufficient for the workload that the guest is supporting.

At the same time, it is important to also identify situations when the physical server itself is overloaded. In such situations, administrators should be able to identify whether the excessive loading of the server is because of a malfunctioning VM, or because of genuine load on the physical server. In the latter case, by comparing the relative loading of the guest VMs, administrators can determine how to load balance the guest VMs across different physical servers to ensure optimal usage of the infrastructure.
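The distinction drawn above can be sketched roughly in Python as follows. This is an illustrative heuristic only, not a definitive method: the function name `classify_host_overload`, the 85% overload threshold, and the 50% dominance ratio are all assumptions chosen for the example, and a guest that dominates the load is merely a candidate for investigation, not proof of a malfunction.

```python
def classify_host_overload(guest_cpu_pct, overload_threshold=85.0, dominance_ratio=0.5):
    """Classify host CPU load from per-guest usage.

    guest_cpu_pct maps guest name -> percentage of host CPU consumed.
    Returns (verdict, guest), where guest is the top consumer or None.
    Both threshold defaults are assumed values, not recommendations.
    """
    total = sum(guest_cpu_pct.values())
    if not guest_cpu_pct or total < overload_threshold:
        return ("not_overloaded", None)
    top_guest = max(guest_cpu_pct, key=guest_cpu_pct.get)
    # One guest accounts for most of the load: investigate that guest first.
    if guest_cpu_pct[top_guest] / total >= dominance_ratio:
        return ("single_guest_dominates", top_guest)
    # Load is spread across guests: a candidate for rebalancing across hosts.
    return ("load_spread_across_guests", top_guest)


print(classify_host_overload({"vm-a": 70.0, "vm-b": 10.0, "vm-c": 10.0}))
print(classify_host_overload({"vm-a": 30.0, "vm-b": 30.0, "vm-c": 30.0}))
```

In the first call a single guest accounts for most of the load, so that guest is the natural place to look first; in the second the load is evenly spread, which points toward moving guests to other physical servers.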

The different virtualization deployment models have different requirements. For instance, while the server application virtualization approach typically involves a small number of VMs running on a physical server, in the virtual desktop approach, tens of desktop VMs run on a physical server. The scale of desktop deployments means that it is not practical to deploy an agent for monitoring each desktop. Since the number of VMs hosting server applications on a physical server is likely to be smaller, an agent-based approach to monitoring is more feasible in that scenario.

While in-depth monitoring of each application is important in the server application virtualization approach, a virtual desktop runs only client applications, so monitoring of the desktop need not be as in-depth. Furthermore, in a virtual desktop environment, it is essential to identify which guest a user logged on to, how long the user was logged in, and which applications the user ran. This information is critical for planning the capacity of the virtual desktop environment. Note also that in a virtual desktop environment, virtual desktops may come and go dynamically (e.g., as a user logs on and logs off, respectively), whereas in a server application virtualization approach, the guest operating systems are likely to be more static (i.e., they start and stop less frequently).

To effectively monitor and plan the capacity of virtual desktop environments, it is also important that the activity of individual users be monitored. At any point, the administrator should be able to determine which of many servers a specific user is logged on to, when the user logged on, and which application(s) the user is accessing. This information may have to be available on a historical basis to allow for auditing of virtual desktop accesses. The duration and frequency of user accesses must also be monitored to determine who the most frequent or the heaviest users are.
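To illustrate how the duration and frequency of user accesses might be derived from historical session records, here is a short Python sketch. The record layout (user, desktop, logon time, logoff time), the user and desktop names, and the functions `summarize_sessions` and `heaviest_users` are assumptions made for this example, not part of any particular product.

```python
from collections import defaultdict
from datetime import datetime


def summarize_sessions(sessions):
    """Aggregate logon count and total logged-in seconds per user.

    Each session is a (user, desktop, logon_time, logoff_time) tuple.
    """
    summary = defaultdict(lambda: {"logons": 0, "seconds": 0.0})
    for user, _desktop, logon, logoff in sessions:
        summary[user]["logons"] += 1
        summary[user]["seconds"] += (logoff - logon).total_seconds()
    return dict(summary)


def heaviest_users(summary, top_n=3):
    """Rank users by total logged-in time, longest first."""
    return sorted(summary, key=lambda u: summary[u]["seconds"], reverse=True)[:top_n]


# Hypothetical session history for two users across three virtual desktops.
sessions = [
    ("alice", "vdi-01", datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 12, 0)),
    ("bob", "vdi-02", datetime(2024, 1, 8, 9, 30), datetime(2024, 1, 8, 10, 0)),
    ("alice", "vdi-03", datetime(2024, 1, 8, 13, 0), datetime(2024, 1, 8, 17, 0)),
]
summary = summarize_sessions(sessions)
print(heaviest_users(summary))
```

Keeping the desktop name in each record also supports the auditing need mentioned above, since the same history answers which guest a given user was logged on to and when.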

Since virtualization is being deployed in different ways (e.g., using EMC’s VMware infrastructure, Microsoft’s Virtual Server, AIX LPAR, or Solaris Zones), it is important to design the monitoring solution to be easily extensible to virtual environments other than those based on VMware.

From the above discussion, it is clear that for effectively monitoring a virtual infrastructure, it is important to be able to:

assess the resource usage on the host and identify resource bottlenecks;
compare the resource usage of the different guest VMs to identify malfunctioning guests;
identify resource constraints in each guest VM;
pin-point malfunctioning applications running on each guest VM;
handle the diverse monitoring and management requirements of virtual desktops and hosted server applications;
correlate the performance of the physical server, the guest VMs, and the applications executing on these VMs;
be extensible to support virtualization platforms other than VMware.