Challenges in Monitoring Virtual Infrastructures
Your Troubleshooting Bad Dream?
Multi-tier IT infrastructures are a pain to troubleshoot because of the dependencies that exist between application tiers. For example, a failure in the database tier can lead to a slowdown in the application and web server tiers. So, if your server and application monitoring solution views the infrastructure as independent silos, you can't effectively monitor and diagnose problems in a multi-tier infrastructure. Add virtualization to such an infrastructure and your monitoring and management task suddenly becomes far more complex!
Adding Virtualization? From Bad Dream to Nightmare
Because a single VMware® vSphere ESX/ESXi server hosts multiple virtual machines (VMs), one malfunctioning application on a VM can degrade the performance seen by applications hosted on the other VMs.
Fig 1: A problem in one application can affect all the other applications involved in the service delivery.
Fig 2: Excessive disk reads by the media server slow down Oracle database accesses.
A Real World Example
Look at Figures 1 and 2. In this example, users of a web-based business service are complaining about slow access. The service topology in Figure 1 makes it clear that the database server at the backend is causing web access to be slow. Figure 2 shows why: the database server is hosted on the same ESX server as a media server, and a sudden surge in requests to the media server is causing high I/O activity on the media server's VM, which in turn creates an I/O bottleneck on the ESX server. The bottleneck affects every VM on that ESX server, so the database server sees slow disk accesses.
To accurately diagnose this problem, your monitoring solution must do two things:
- Consider the inter-dependencies between applications that are involved in service delivery, and
- Consider the hosting relationships between applications, virtual machines, and physical machines, that is, which application runs in which VM and which VM runs on which physical machine. Since VMs can move between physical machines (e.g., using VMware's vMotion live migration technology), the relationships between VMs and physical machines must be continuously tracked and updated in real time.
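The two requirements above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the component names, host names, and I/O figures are hypothetical values modeled loosely on the example in Figures 1 and 2, and a real monitoring product would discover them automatically rather than hard-code them.

```python
# Hypothetical service topology: each component maps to what it depends on.
DEPENDS_ON = {
    "web_server": ["app_server"],
    "app_server": ["database"],
    "database": [],
    "media_server": [],
}

# Hypothetical VM-to-host placement. This map must be refreshed whenever
# a VM migrates to another physical server.
VM_HOST = {
    "web_server": "esx-01",
    "app_server": "esx-01",
    "database": "esx-02",
    "media_server": "esx-02",
}

# Illustrative disk I/O per VM (operations per second).
DISK_IOPS = {
    "web_server": 40,
    "app_server": 55,
    "database": 300,
    "media_server": 4200,
}


def deepest_slow_dependency(component, slow):
    """Follow the dependency chain down to the last component that is slow."""
    for dep in DEPENDS_ON.get(component, []):
        if dep in slow:
            return deepest_slow_dependency(dep, slow)
    return component


def busiest_vm_on_same_host(component):
    """Return the VM with the highest disk I/O on the component's host."""
    host = VM_HOST[component]
    neighbors = [vm for vm, h in VM_HOST.items() if h == host]
    return max(neighbors, key=lambda vm: DISK_IOPS[vm])


def diagnose(entry_point, slow):
    """Combine both views: service dependencies first, then host placement."""
    suspect = deepest_slow_dependency(entry_point, slow)
    culprit = busiest_vm_on_same_host(suspect)
    return suspect, VM_HOST[suspect], culprit


slow_components = {"web_server", "app_server", "database"}
result = diagnose("web_server", slow_components)
print(result)
```

With these sample values, the dependency walk stops at the database, and the placement lookup then points to the media server as the heaviest I/O consumer on the same host. Neither view alone would connect slow web access to the media server.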
Besides resource contention among guest virtual machines, applications executing on the ESX/ESXi service console can also affect the performance of the virtual infrastructure.
You Must Know What's "Normal"
While knowing which VM is consuming excessive resources is helpful, it is even more important to understand whether the VM's behavior is normal. For instance, a memory leak in one of the applications executing inside a VM may be causing the VM's memory usage to increase over time. In such cases, it is essential that the VM monitoring solution be able to look in depth into each guest VM and detect abnormalities.
Of course, you could deploy individual agents inside each VM to provide this in-depth visibility, but at what cost?
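As a rough illustration of what "knowing what's normal" can mean for the memory-leak case, the sketch below flags a VM whose memory usage climbs steadily across a window of samples. The function names, thresholds, and sample values are assumptions made for this example; they are not taken from any particular monitoring product, and a production system would use a more robust baseline.

```python
def slope_per_sample(values):
    """Least-squares slope of the samples, in units per sample interval."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def looks_like_leak(values, min_slope_mb=5.0, min_rise_fraction=0.9):
    """Flag steady growth: a positive trend with almost no dips.

    Both thresholds are illustrative assumptions, not recommended settings.
    """
    if len(values) < 3:
        return False
    rises = sum(1 for a, b in zip(values, values[1:]) if b > a)
    steady = rises / (len(values) - 1) >= min_rise_fraction
    return steady and slope_per_sample(values) >= min_slope_mb


# Hypothetical memory samples in MB, taken at a fixed interval.
leaking_vm = [1000 + 12 * i for i in range(20)]
normal_vm = [1000, 1040, 990, 1025, 1005, 1030, 995, 1010]

print(looks_like_leak(leaking_vm), looks_like_leak(normal_vm))
```

The point of the sketch is that neither VM looks alarming from a single reading; only the trend over time separates a leak from ordinary fluctuation.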
- Additional resource overhead
- Licensing fees
- Maintenance costs
Performance degradation in a virtual infrastructure may also occur when a virtual machine has not been configured with sufficient resources to handle its workload. A vSphere and ESX monitoring solution must be able to tell the difference between a problem caused by an inadequately configured virtual machine and hot spots created by uneven load distribution across vSphere ESX/ESXi servers.
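One simple way to picture this distinction is to compare how close a VM is to its own configured allocation with how busy its host is relative to the other hosts. The sketch below assumes hypothetical CPU percentages and arbitrary thresholds; it is meant to show the reasoning, not how any specific product makes the call.

```python
def classify_bottleneck(vm_pct_of_allocation, host_pct, all_host_pcts,
                        high=85.0, spread=30.0):
    """Separate an undersized VM from a host hot spot.

    vm_pct_of_allocation: VM usage as a percentage of its configured allocation.
    host_pct: utilization of the physical server hosting the VM.
    all_host_pcts: utilization of every host in the group, including this one.
    The `high` and `spread` thresholds are illustrative assumptions.
    """
    host_saturated = host_pct >= high
    vm_at_limit = vm_pct_of_allocation >= high
    uneven = max(all_host_pcts) - min(all_host_pcts) >= spread

    if host_saturated and uneven:
        # The host is overloaded while other hosts sit idle.
        return "host_hot_spot"
    if vm_at_limit and not host_saturated:
        # The VM has hit its own ceiling although the host has headroom.
        return "undersized_vm"
    if host_saturated:
        return "all_hosts_busy"
    return "no_bottleneck"


undersized = classify_bottleneck(95.0, 40.0, [40.0, 45.0, 50.0])
hot_spot = classify_bottleneck(60.0, 92.0, [92.0, 30.0, 35.0])
print(undersized, hot_spot)
```

In the first call the fix is to give the VM more resources; in the second it is to rebalance VMs across hosts, which is why the two cases need to be told apart.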