July 24, 2013
IT environments are changing dramatically and becoming too complex and dynamic for traditional, manual and fragmented management approaches. Whether companies are extending the benefits of virtualization to the next level of business-critical applications, virtualizing the desktop layer, adopting multiple virtualization platforms or extending their cloud exposure, they have no tolerance for user experience issues and cost overruns.
IT operations teams tend to focus on the nuts and bolts of the infrastructure. Their key concerns are: How hot are my Linux servers? How many IOPS are the disks handling? Is Active Directory working? Is DNS working?
At the same time, end users are focused on the business services that they are accessing. That’s what they see and care about to get their jobs done. So a user complaint always relates to the service - bill payment is not working, the CRM service is slow, my online reservation crashed, etc.
This disconnect threatens the success of any IT infrastructure transformation initiative to deliver on the user experience and ROI promise. The disconnect is partly caused by the way IT operations teams are organized and managed. A single business service involves multiple infrastructure silos - the firewall tier, the web server tier, the database tier, the server tier, the application tier, and so on. Virtualization is yet another tier added to the mix.
In most organizations, these tiers are staffed by different administrators. These administrators use different, independent tools for administering and monitoring their silos. When a user calls and complains about a service slowdown, very often IT - using a variety of separate silo tools - cannot quickly diagnose the problem.
In this very typical situation, one team is responsible for the network, another for the databases, another for infrastructure services (DNS, DHCP, etc.), and still another for the applications. Siloed organizations often suffer from the “it’s not me” syndrome: when a performance issue arises, the help desk struggles to determine who is responsible, because each silo points to its own performance monitoring tools and indicates that the problem is not in its domain.
Performance management systems designed for multi-domain enterprises have had to deal with limited visibility across the silos. Metrics collected from one tier can provide indicators of the performance of other tiers of the infrastructure. For example, by tracking TCP retransmissions recorded on the application servers, a management system can assess the quality of the network connecting the server to end users. This type of cross-tier analysis has allowed management systems to look at performance across IT tiers and to surmise which tier of the infrastructure is responsible for a slowdown, even though complete visibility across every tier is not possible.
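As a rough illustration of the TCP retransmission example, the sketch below compares two snapshots of cumulative TCP counters taken on an application server and flags the network path when the retransmission ratio over that interval is high. The counter names (`sent`, `retransmitted`) and the 2% threshold are assumptions made for the example, not values from any particular monitoring product.

```python
def retransmission_ratio(segments_sent: int, segments_retransmitted: int) -> float:
    """Fraction of sent TCP segments that were retransmissions."""
    if segments_sent <= 0:
        return 0.0
    return segments_retransmitted / segments_sent


def network_looks_degraded(prev: dict, curr: dict, threshold: float = 0.02) -> bool:
    """Compare two cumulative counter snapshots from an application server.

    `prev` and `curr` are hypothetical snapshots with the keys "sent" and
    "retransmitted". The threshold is an illustrative assumption.
    """
    sent = curr["sent"] - prev["sent"]
    retransmitted = curr["retransmitted"] - prev["retransmitted"]
    return retransmission_ratio(sent, retransmitted) > threshold


if __name__ == "__main__":
    earlier = {"sent": 1000, "retransmitted": 5}
    later = {"sent": 2000, "retransmitted": 55}
    # 50 retransmissions out of 1000 new segments is 5%, above the 2% threshold.
    print(network_looks_degraded(earlier, later))
```

A metric like this only hints at network trouble as seen from the server tier; it does not replace direct visibility into the network itself.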
The IT service manager then often has to guess where the problem originates: is it the network, database, application, storage or virtualization tier? Finding the root cause is the first step toward remediating the problem and bringing the service back to normal. Troubleshooting is often the most time-consuming and expensive step in the incident management process.
Finding the root cause of a problem, especially in a virtual infrastructure, is easier said than done! Here is why: Let’s take the example of a business service delivered over a typical multi-tier infrastructure - a user connects through a firewall to a web server. The user request is forwarded to a middleware application tier, which then uses a database server to service the request. Suppose the database tier has become 50% slower than normal. Since the application server depends on the database, a slowdown of the database impacts the application server. In turn, the application server problem impacts the web server and, ultimately, the user experience.
Evidently, a single problem in one tier can impact all the other tiers, and the IT service manager then has to determine where the problem originated. If all the applications were hosted on physical servers, we could use the inter-dependency information to conclude fairly quickly that the database tier is the root cause of the problem.
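The reasoning for the physical case can be sketched in a few lines: given which tier depends on which, a degraded tier is a likely root cause if none of the tiers it directly depends on are degraded. The tier names and the `DEPENDS_ON` map below are illustrative assumptions based on the web, application and database example above.

```python
# Hypothetical dependency map: each tier lists the tiers it directly relies on.
DEPENDS_ON = {
    "web": ["app"],
    "app": ["database"],
    "database": [],
}


def root_causes(depends_on: dict, degraded: set) -> list:
    """Return degraded tiers whose direct dependencies are all healthy."""
    return sorted(
        tier
        for tier in degraded
        if not any(dep in degraded for dep in depends_on.get(tier, []))
    )


if __name__ == "__main__":
    # All three tiers look slow, but only the database has no slow dependency.
    print(root_causes(DEPENDS_ON, {"web", "app", "database"}))
```

With all three tiers reporting slowdowns, only the database is returned, because the web and application tiers each depend on something that is itself degraded.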
Virtualization makes the diagnosis problem more challenging. Suppose the virtualized Oracle database server from the previous example is hosted on the same physical machine as a virtualized Citrix server and a media server. If there is a sudden surge of requests to stream videos from the media server, this can cause a lot of requests to the server’s disk, choking the physical server. This in turn will impact the performance of the virtualized database server sharing the same physical resource, and result in poor performance for database queries.
Queries handled by the database server start to take longer and longer. Thus the database slowdown in the above graphic may actually be caused by a sudden increase in workload to the media server in the figure below. In this case, the root cause of the problem is a disk bottleneck on the physical server caused by an increase in workload for the media server application.
From this example, it is clear that root-cause diagnosis technologies for virtual environments need to go beyond how they operate in a physical world. For true root-cause diagnosis, virtual machines running on each physical server must be auto-discovered. Applications running inside each of the virtual machines need to be detected and the monitoring system should automatically determine which applications coexist on the same physical server. That information is used to determine where the root cause of a problem lies.
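To make the co-location idea concrete, here is a minimal sketch that takes an inventory of virtual machines, finds the ones sharing a physical host with a slow VM, and reports neighbors whose disk I/O is far above their normal level. The `VM` record, the host names, the sample numbers and the 3x factor are all assumptions made for illustration; a real system would populate this inventory through auto-discovery.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    host: str             # physical server the VM runs on
    disk_iops: float      # current disk I/O rate
    baseline_iops: float  # normal disk I/O rate for this VM


def noisy_neighbors(vms: list, victim_name: str, factor: float = 3.0) -> list:
    """Return names of VMs on the victim's host whose disk I/O exceeds
    `factor` times their own baseline. The factor is an assumed threshold."""
    victim = next(vm for vm in vms if vm.name == victim_name)
    return [
        vm.name
        for vm in vms
        if vm.host == victim.host
        and vm.name != victim.name
        and vm.disk_iops > factor * vm.baseline_iops
    ]


if __name__ == "__main__":
    inventory = [
        VM("oracle-db", "host1", 400, 380),
        VM("citrix", "host1", 150, 140),
        VM("media", "host1", 5000, 600),
        VM("web", "host2", 9000, 100),
    ]
    # Only the media server shares host1 and is far above its baseline.
    print(noisy_neighbors(inventory, "oracle-db"))
```

In this sample inventory the busy web server is ignored because it runs on a different physical host, which is exactly the information a purely physical-world diagnosis would lack.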
From these examples, we see that the enterprise needs to emphasize business service management - how a business service is performing and which domains are working and which are not. We’re seeing that it’s no longer sufficient to monitor the uptime or resource usage levels of virtual machines and physical servers and believe that the entire IT infrastructure is working well.
Instead, a truly contemporary enterprise monitoring and management solution should be able to do the following things:
- Provide a single view of the virtual and physical infrastructures
- Support multiple virtualization technologies
- Track physical resource availability, configuration and usage by VMs
- Provide an inside view of virtual machines with clear problem identification
- Automatically establish performance baselines and norms
- Perform automatic correlation for true root-cause diagnosis
- Scale as the infrastructure monitored grows
- Support virtualized desktop environments
- Offer personalized views for the various stakeholders in an organization to enable collaborative management
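As one possible reading of the "performance baselines and norms" item in the list above, the sketch below derives a normal range from historical samples of a metric and flags values outside it. Using the mean plus or minus three standard deviations is an assumption chosen for simplicity; real products may use time-of-day profiles or other statistical models.

```python
from statistics import mean, stdev


def baseline(samples: list, k: float = 3.0) -> tuple:
    """Return (lower, upper) bounds as mean -/+ k sample standard deviations."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a baseline")
    m = mean(samples)
    s = stdev(samples)
    return m - k * s, m + k * s


def is_anomalous(value: float, samples: list, k: float = 3.0) -> bool:
    """True if `value` falls outside the baseline derived from `samples`."""
    low, high = baseline(samples, k)
    return value < low or value > high


if __name__ == "__main__":
    history_ms = [100, 102, 98, 101, 99]  # hypothetical response times
    print(is_anomalous(150, history_ms))
    print(is_anomalous(101, history_ms))
```

Deriving the range from observed history, rather than from a fixed threshold, is what lets the same check apply to metrics with very different normal levels.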
In summary, traditional performance management tools are not keeping pace with the rapid rate of change in virtualized IT environments - they are too silo-driven, lack integration and do not quickly and precisely pinpoint the cause of performance problems. IT managers need intelligent performance management solutions that deliver complete transparency and can identify the exact source of a problem in minutes rather than hours - ideally before the problem manifests itself to the user.
This is where intelligent, virtualization-aware performance management delivers unique value because it:
- Boosts user satisfaction and productivity by significantly reducing downtime and improving application performance.
- Reduces IT support cost and complexity.
- Reduces infrastructure cost, or slows its growth, through better hardware utilization and right-sizing.
- Delivers new IT initiatives on time, on budget and on target, and reduces the risk, cost and complexity associated with new technologies.
Today, CIOs are focused on balancing the realities of conflicting IT forces. Looking for greater visibility and performance assurance across their business and IT environments, CIOs will be interested in deploying technology to correlate, diagnose, predict and analyze virtualized environments in order to better manage performance, improve the user experience and improve the ROI of their IT investments.