Monitoring the RHEV Manager

eG Enterprise provides a 100%, web-based RHEV Manager monitoring model, which periodically runs availability and health checks on the RHEV manager and proactively reports abnormalities.

layermodel1

Figure 1 : The layer model of the RHEV Manager

Each layer depicted by above figure is mapped to tests, which employ agent-based or agentless mechanisms (depending upon how you want the RHEV manager to be monitored by the eG Enterprise system) to pull out a variety of metrics from the RHEV manager. The metrics so collected enable administrators to quickly find accurate answers to the following performance queries:

Is the RHEV manager available over the network? If so, how quickly is it responding to requests?
Have any error/warning events occurred on the RHEV manager? What are these errors/warnings?
Has the RHEV manager log captured any new errors/warnings? If so, what are they?
How many data centers have been configured on the RHEV manager? What are they, and what is the compatibility level of each one of them?
Is any data center in a problematic state currently?
Which data center is running short of disk space? How many clusters, RHEV servers, and VMs have been configured in that data center, and which ones are they?
How many storage domains are operational in each datacenter? Which ones are they?
Is any storage domain unavailable? If so, which one? Which VMs are using this storage domain?
Is any storage domain running out of space? Which one is it, and which VMs will be impacted by this space crunch?
Is any logical network currently down? Which clusters and RHEV servers are using this logical network?
Which logical network is experiencing heavy network traffic?
Have any errors occurred on a logical network? If so, which one is it, and when did these errors occur - while transmitting data or while receiving it?
Is any cluster using CPU resources excessively? If so, which cluster is it? Are any CPU-hungry VMs operating within that cluster? What are they?
Are all clusters rightly sized in terms of memory, or are there any clusters that are currently experiencing a memory contention? If so, which cluster is it, and what is causing the memory crunch on that cluster - is it owing to improperly sized hosts or memory-starved VMs?
Which cluster has too many hosts and VMs that are powered off?
What is the compatibility level of each cluster?