Introduction
In the last three years, virtualization and cloud computing have consistently been among the top CIO priorities. Of the two, cloud computing has received more attention because the technology has the potential to radically change the way IT services are provisioned, operated, and managed. Cloud computing is more than just another technology -- it is a set of business processes, supported by enabling technology components, that allows systems to be provisioned on demand without human intervention and with response times not experienced in the past. Agility, automation, and repeatability are the cornerstones of cloud computing.
Virtualization, for its part, is a key enabling technology for cloud computing. It is virtualization that allows new virtual machines to be spun up or decommissioned in a few minutes, thereby enabling IT services to be offered on demand. Although it has received less attention than cloud computing, virtualization has the more significant impact on how performance management for IT infrastructures is done. This article explores why this is the case.
Performance Management for Cloud Computing
From a performance management perspective, cloud computing has two new requirements that were not previously mandatory:
- Agility is a key attribute of cloud computing. To be cloud-ready, management systems should allow rapid installation (without any human intervention), provisioning and administration.
- Cloud infrastructures are multi-domain in nature. The infrastructure is provisioned and managed by the cloud service provider; the enterprise owns and operates the applications that run on the cloud instances. When a performance issue arises, the question most often asked is: where is the root cause of the problem? Is it the cloud or is it the application? Is the problem in the network or in the client? Performance management systems for new IT infrastructures must be capable of working in infrastructures with multiple domains where complete end-to-end visibility may not be available across all domains.
Although neither of these requirements was mandatory in the past, they are not fundamentally new requirements. For several years, many large enterprises have been looking to automate their operational processes in order to minimize human errors and make the processes repeatable. Solutions such as HP Orchestration, BMC BladeLogic, Dynamic Ops, and RES Automation Manager have emerged to enable operational automation. Most performance management systems today offer interfaces that allow the management system to be installed, provisioned, and administered automatically without human intervention.
Likewise, management across multi-domain infrastructures is not completely new. It’s an issue large enterprises have had to handle for years, even without the use of cloud computing. Most large enterprises have siloed organizations; one team is responsible for the network, another for the databases, another for infrastructure services (DNS, DHCP, etc.), and still another for the applications.
Siloed organizations often result in the “it’s not me” syndrome (see Figure 1) -- when a performance issue arises, the help desk often struggles to determine who is responsible for the problem because the different silos all point to their respective performance monitoring tools and indicate that the problem is not in their domain.
Figure 1: Siloed IT organizations result in the “it’s not me” syndrome
Performance management systems designed for multi-domain enterprises have had to deal with limited visibility across the silos. Metrics collected from one tier can provide indicators of the performance of other tiers of the infrastructure. For example, by tracking TCP retransmissions recorded on the application servers, a management system can assess the quality of the network connecting the server to end users.
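As a rough illustration of the retransmission example, the sketch below derives a retransmission ratio from two samples of cumulative TCP counters taken on an application server. The `TcpCounters` structure, the function names, and the 2 percent threshold are assumptions made for this illustration; a real system would read the counters from the operating system (for example, the segments-sent and segments-retransmitted counters defined in the standard TCP MIB) and tune the threshold to its own network.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TcpCounters:
    """Cumulative TCP counters sampled on a server (hypothetical structure)."""
    segments_sent: int
    segments_retransmitted: int


def retransmission_ratio(previous: TcpCounters, current: TcpCounters) -> float:
    """Return the fraction of segments retransmitted between two samples."""
    sent = current.segments_sent - previous.segments_sent
    retransmitted = (
        current.segments_retransmitted - previous.segments_retransmitted
    )
    if sent <= 0:
        return 0.0  # no traffic in the interval, so nothing can be inferred
    return retransmitted / sent


def network_looks_degraded(ratio: float, threshold: float = 0.02) -> bool:
    """Flag the network path as suspect; the 2% default is an assumption."""
    return ratio > threshold
```

A ratio that stays above the threshold on the server side suggests that the network between the server and its users deserves a closer look, even when the network tier itself is not directly visible to the monitoring system.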
Transaction tracing is another technology that has allowed management systems to look at the performance across IT tiers and to surmise which tier of the infrastructure is responsible for a slowdown, even though complete visibility across every tier is not possible.
Virtualization is a Disruptive Technology for Performance Management
Although cloud computing does introduce some complications from a performance management standpoint, it does not radically change the way we need to manage the IT infrastructure.
On the other hand, virtualization is a disruptive technology, particularly as it relates to performance management. There are several reasons for this.
Reason #1: Performance management systems for virtualization must provide visibility into multiple levels of resource usage.
In a physical infrastructure, each machine has CPU, memory, and disk resources that are dedicated to it, whereas in a virtual infrastructure, virtual machines (VMs) often share the resources of the physical machine on which they are hosted. A single malfunctioning VM can take all the resources of a physical machine, thereby impacting the performance of all the other VMs operating on the same physical machine.
In such a scenario, it is necessary to know how the resources of the physical machine are being used by the VMs and to determine why a VM is taking up resources -- i.e., which application or process running inside the VM is responsible for the resource usage. Performance management systems for virtualization must be capable of providing this multi-level view of performance -- i.e., resource usage across VMs as well as a breakdown of resource usage within a VM (see Figure 2).
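The multi-level view described above can be sketched in a few lines. The example assumes two hypothetical inputs: the share of host CPU consumed by each VM (as seen from the virtualization platform) and the share of CPU consumed by each process inside a VM (as seen from within the guest). The names and data shapes are illustrative only and do not correspond to any particular product's API.

```python
from typing import Dict, List, Tuple

# Hypothetical samples: percent of host CPU used by each VM, and percent of
# a VM's CPU used by each process running inside that VM.
HostUsage = Dict[str, float]
GuestUsage = Dict[str, Dict[str, float]]


def top_consumer(usage: Dict[str, float]) -> Tuple[str, float]:
    """Return the (name, value) pair with the highest usage."""
    name = max(usage, key=lambda key: usage[key])
    return name, usage[name]


def multi_level_view(host_usage: HostUsage, guest_usage: GuestUsage) -> List[str]:
    """Describe the heaviest VM on the host and the heaviest process in it."""
    vm, vm_pct = top_consumer(host_usage)
    lines = [f"VM {vm} uses {vm_pct:.0f}% of host CPU"]
    processes = guest_usage.get(vm)
    if processes:
        process, process_pct = top_consumer(processes)
        lines.append(f"process {process} uses {process_pct:.0f}% of CPU inside {vm}")
    else:
        lines.append(f"no in-guest data available for {vm}")
    return lines
```

The point of the sketch is that neither input is sufficient alone: the host-level data identifies which VM is consuming resources, and the in-guest data explains why.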
Reason #2: Virtualization introduces new types of dependencies that need to be tracked and managed.
As we saw in the previous example, one VM can impact the performance of all other VMs running on the same physical machine. Thus, virtualization introduces new types of dependencies which did not exist when we only had physical machines. Performance management systems for virtual infrastructures need to discover such inter-VM dependencies and use these dependencies for root-cause diagnosis. If a performance management system does not consider these dependencies, it may not be able to provide accurate root-cause diagnosis for problems.
Reason #3: Virtualization dependencies are dynamic.
Traditional performance management systems have been designed with physical infrastructures in mind. Such infrastructures are static -- i.e., applications run on the same physical machine unless IT manually reconfigures them. These systems discovered dependencies statically and it was often sufficient to discover these dependencies periodically (e.g., once a day).
Virtualization breaks this very basic ground rule for performance management systems. Most virtualization platforms allow VMs to migrate from one physical machine to another in real time. Such live migration is used to support high availability and improve load distribution. Performance management systems for virtual infrastructures need to adapt to this new world: virtual-machine-to-physical-machine dependencies must be discovered and updated in real time.
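One way to picture dynamic dependency tracking is a placement map that is updated whenever a migration is reported, rather than rebuilt once a day. The sketch below is a minimal illustration under that assumption; the `PlacementMap` class and its `on_migration` hook are hypothetical and stand in for whatever event feed a given virtualization platform provides.

```python
from typing import Dict, Set


class PlacementMap:
    """Tracks which physical host each VM currently runs on."""

    def __init__(self) -> None:
        self._host_of: Dict[str, str] = {}
        self._vms_on: Dict[str, Set[str]] = {}

    def place(self, vm: str, host: str) -> None:
        """Record that a VM runs on a host, replacing any earlier placement."""
        old_host = self._host_of.get(vm)
        if old_host is not None:
            self._vms_on[old_host].discard(vm)
        self._host_of[vm] = host
        self._vms_on.setdefault(host, set()).add(vm)

    def on_migration(self, vm: str, destination: str) -> None:
        """Handle a (hypothetical) live-migration event from the platform."""
        self.place(vm, destination)

    def co_residents(self, vm: str) -> Set[str]:
        """Return the other VMs that share a physical host with this VM."""
        host = self._host_of.get(vm)
        if host is None:
            return set()
        return self._vms_on[host] - {vm}
```

Because the map changes as soon as a migration event arrives, a diagnosis made a minute after a VM moves uses the VM's current neighbors instead of yesterday's.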
Reason #4: Virtualization cannot be managed as yet another infrastructure silo.
A common practice in many enterprises is to view virtualization as just another infrastructure silo. This works fine only as long as the problem does not relate to the virtualization platform. To understand why virtualization cannot be managed as an independent infrastructure silo, consider the example of a simple multi-tier e-business Web service that is hosted on a virtual infrastructure (see Figure 3).
Figure 3: An example of a multi-tier e-business service hosted in a virtual infrastructure. A problem in the Oracle database tier is causing a slowdown of the J2EE application tier and the Web tier, ultimately impacting the end-user experience.
In this example, the Web service has a Microsoft IIS Web server front-end, J2EE application server middleware, and a backend Oracle database server. Users access the Web service through a firewall. Because the application server depends on the database server to function properly, when the database server is operating 50 percent slower than normal, the application server is affected as well. Because the Web server depends on the application server, it is slowed by the database server problem. Ultimately, the user experience is affected.
This example shows how in a multi-tier infrastructure, a problem in one of the tiers can have an impact on all the other tiers as well. If all the infrastructure tiers were hosted on physical servers, our root-cause diagnosis would have been that the database server is the root cause of the performance problem with the e-business service.
Let us now see how virtualization changes our root-cause diagnosis.
Suppose the database server from Figure 3 is hosted on a VM that is running on a physical machine, and suppose the same physical machine is also hosting a media server on another VM. A sudden spike in requests for videos from the media server can result in this VM taking up a lot of the disk resources of the physical machine (see Figure 4), ultimately resulting in a disk bottleneck on the physical machine. In turn, this can impact the performance of all the VMs on the physical machine, thereby affecting the VM hosting the Oracle database server as well.
Figure 4: Heavy disk usage by the media server VM causes the physical machine to choke, thereby impacting the performance of all the applications/VMs hosted on the physical machine.
From this example, we can see that with the introduction of virtualization, we now have an application -- a media server that had no relation to the e-business service we were looking at -- that is impacting the performance of the e-business service. In our example, a performance management solution that was not virtualization-aware would have pointed to the Oracle database server as the root-cause of the problem.
However, a virtualization-aware performance management system can provide the right diagnosis -- that is, the physical machine choking as a result of the load from the media server is the real root-cause of the problem. From this example, we can conclude that to be effective in the new virtualization world, performance management systems need to be virtualization-aware.
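The difference between the two diagnoses can be illustrated with a small sketch. It assumes three hypothetical inputs: a VM-to-host mapping, the disk busy percentage of each physical host, and each VM's share of its host's disk activity. The `diagnose` function and the 90 percent saturation threshold are assumptions made for this illustration, not a description of how any particular product works.

```python
from typing import Dict, Optional


def diagnose(
    slow_vm: str,
    host_of: Dict[str, str],
    host_disk_busy_pct: Dict[str, float],
    vm_disk_share_pct: Dict[str, float],
    saturation_threshold: float = 90.0,
) -> str:
    """Name the likely root cause for a slow VM, checking its host first."""
    host: Optional[str] = host_of.get(slow_vm)
    if host is None or host_disk_busy_pct.get(host, 0.0) < saturation_threshold:
        # Host is unknown or healthy: keep the silo-level conclusion.
        return slow_vm
    # Host disk is saturated: point to the VM on that host using the most disk.
    residents = [vm for vm, vm_host in host_of.items() if vm_host == host]
    return max(residents, key=lambda vm: vm_disk_share_pct.get(vm, 0.0))
```

With the database VM and the media server VM on the same saturated host, the function points to the media server; with a healthy host, it falls back to the database VM, which is the answer a tool that is not virtualization-aware would give in both cases.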
Summary
Industry experts believe that there are now more virtual machines than physical machines deployed in IT. As IT infrastructures become more virtualized, there is an increasing need to look closely at what performance management systems and practices must be in place to deal with this new world. In this article, we have offered four reasons why virtualization is changing the ground rules used to design traditional performance management systems and why your enterprise needs to consider new virtualization-aware approaches to performance management.
Srinivas Ramanathan is the president, founder, and chief executive officer of eG Innovations. You can contact the author at srinivas@eginnovations.com.