Continuous monitoring of your key applications is imperative to eliminate prolonged delays in the delivery of business-critical services and to minimize service downtime. However, not all service performance issues can be attributed strictly to the applications. Often, errors at the operating system level ripple upward and affect the performance of the applications executing on it; this, in turn, can cause service quality to deteriorate. For instance, if one or more operating system processes are consuming a critical resource excessively, the performance of any application requiring that resource will suffer. Focusing on application performance metrics alone will not necessarily lead to the source of the problem. If host-level abnormalities are to be captured and reported before they impact application performance, the underlying operating system must also be monitored 24 x 7.
The eG Enterprise suite includes a 100% web-based performance monitoring solution for Solaris servers. Besides measuring overall operating system health, it periodically checks the availability of the host and of any critical processes and services, resource contention, I/O activity, network traffic, and network latency. Performance monitoring of the Solaris hardware is also included in eG Enterprise.
Performance Monitoring for Server Farms
The Solaris operating system is known for its scalability and is commonplace in many large server farms. As a result, many business-critical services are configured to use applications executing on Solaris-based servers. The continuous availability of these end-user services depends upon the error-free functioning of the Solaris servers and the applications deployed on them.
For round-the-clock Solaris monitoring that proactively alerts administrators to potential performance bottlenecks, eG Enterprise provides a dedicated Solaris performance monitoring model for servers. Inspired by the traditional OSI model, this representation of a Solaris server is composed of hierarchical layers derived from the Solaris architecture, with upper layers depending on the performance of lower layers. Each layer, in turn, is mapped to a variety of tests, which report critical metrics related to the Solaris server.
You can have this easy-to-use model up and running in no time, as it only requires that a single eG agent be installed on each Solaris server that has to be monitored. Alternatively, an agentless monitoring option is also available, where a single agent on a Windows system can be configured to monitor any number of remote Solaris servers via SSH/Rexec.
Once the agent is operational, it uses secure mechanisms to extract critical statistics from the target Solaris systems. A wide variety of metrics related to resource usage, network health, TCP retransmissions, and the availability of critical Solaris processes are reported. In addition, a wealth of hardware metrics revealing processor availability, memory partition availability, the count of system faults, etc., can be extracted from Solaris servers by the same agent with no additional configuration. The value of each metric is compared against pre-defined thresholds set by the administrator, or against automatically computed norms that eG Enterprise determines from past performance. Administrators are proactively alerted to any deviations, enabling them to initiate corrective action quickly. In addition, the eG Enterprise intelligent correlation engine automatically correlates the metrics reported at each layer and accurately isolates the source of the problem. Accurate root-cause diagnosis ensures that administrators do not waste time and effort on symptoms and can instead attend directly to the source of the performance issue.
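To make the two kinds of comparison concrete, here is a minimal Python sketch of a fixed, administrator-set limit next to a norm computed from past samples. It is only an illustration: the function names, the sample data, and the "mean plus k standard deviations" rule are assumptions made for this example, and the source does not say how eG Enterprise actually computes its norms.

```python
from statistics import mean, stdev


def static_violation(value, upper_limit):
    """Return True when the value exceeds a fixed, administrator-set limit."""
    return value > upper_limit


def baseline_violation(value, history, k=3.0):
    """Return True when the value is more than k standard deviations above
    the mean of past samples (an assumed, simplified notion of a 'norm')."""
    if len(history) < 2:
        raise ValueError("need at least two past samples to compute a norm")
    norm = mean(history)
    spread = stdev(history)
    return value > norm + k * spread


# Hypothetical CPU utilization samples (percent) from earlier measurement periods.
cpu_history = [22.0, 25.0, 24.0, 23.0, 26.0, 24.0]

print(static_violation(91.0, upper_limit=90.0))
print(baseline_violation(40.0, cpu_history))
print(baseline_violation(26.5, cpu_history))
```

A reading of 40% would never trip a 90% static limit, yet it is far outside what this host normally does. That is the kind of case a norm derived from past performance is meant to catch.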
What the eG Solaris Monitor Reveals
Is the Solaris server available over the network?
Which disk partition on the server is being actively used?
Is load balanced across all the disk partitions, or are too many requests pending on any partition?
Are disk partitions able to swiftly process all read and write requests? Is any partition experiencing an I/O bottleneck?
Is any disk partition running out of space?
Which processor is utilizing CPU resources excessively? Are any CPU-intensive processes executing on this host? If so, what are they?
Is swap memory usage optimal?
Does the host have adequate free memory, or is memory being paged out frequently?
Did the system reboot as per schedule? Has any of the systems been running without a reboot for an unusually long time?
What is the server load in terms of TCP connections to and from the server?
At what rate is the page daemon scanning memory pages? Does the rate suggest a memory shortage?
How many processes are currently running on the system? What is their current state? Are any of these processes using CPU / memory resources excessively?
Have the inode and buffer caches been used effectively?
Is any network interface using bandwidth excessively? Which one is it?
Is the quality of the network connection to the host good, or are too many packets getting lost in transit?
Are users able to connect to the host quickly via the network link?
Is any processor supported by the host currently unavailable for use?
Are all disk and memory partitions on the host available?
Is the fan available and being used? Is the power supply unit available for use?
Have too many system faults occurred?
Were any sudden spikes in temperature detected?
Benefits of the eG Monitor for Solaris Servers
In-depth monitoring of operating system health: eG Enterprise reports real-time metrics related to the Solaris server and its hardware, and thus provides a true picture of operating system health.
Internal and external perspectives: A single agent on the Solaris server, or a single remote agent, is capable of monitoring both the internal health of the server and its network health.
Flexible monitoring options: Choose between agentless and agent-based Solaris performance monitoring on a per-server basis, so that some servers can be managed in an agentless manner while others are managed in an agent-based manner.
Automatic top-down correlation: eG Enterprise uses a patented correlation algorithm that correlates performance across layers and assigns the highest priority to the layer that is the root cause of problems in the host. This way, system administrators can focus on the source of a problem and not be distracted by its effects.
Intelligent thresholding: The automatic thresholding capability of eG Enterprise ensures that thresholds vary dynamically based on normal usage patterns. This way, problems are identified before they affect service, and false alerts are reduced.
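The idea behind correlating across layers can be illustrated with a small Python sketch. It assumes only what the text states, namely that upper layers depend on lower layers, and it treats the lowest abnormal layer as the probable root cause. The layer names are hypothetical, and this is a deliberate simplification rather than eG Enterprise's patented algorithm.

```python
# Hypothetical layer names, ordered from the bottom of the stack to the top.
LAYERS = ["hardware", "operating_system", "network", "tcp", "processes"]


def rank_alarms(abnormal_layers, layers=LAYERS):
    """Split abnormal layers into a probable root cause and likely effects.

    Because upper layers depend on lower ones, the lowest abnormal layer is
    treated as the probable root cause; abnormal layers above it are
    reported as likely side effects.
    """
    ordered = [name for name in layers if name in abnormal_layers]
    if not ordered:
        return None, []
    return ordered[0], ordered[1:]


root, effects = rank_alarms({"processes", "operating_system", "tcp"})
print("probable root cause:", root)
print("likely effects:", effects)
```

With three layers in an abnormal state, the sketch prioritizes the operating system layer and demotes the two layers above it, which is the behavior the benefit describes: attention goes to the source rather than the symptoms.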
Multi-tier IT infrastructures are a nightmare to troubleshoot because of the dependencies that exist between application tiers. For instance, a failure in the database tier could result in slowdowns in the application and web server tiers. Hence, monitoring solutions that view the infrastructure as independent silos cannot effectively monitor and diagnose problems in such infrastructures. Adding virtualization makes monitoring and managing these infrastructures even more challenging.
Fig 1: A problem in one application can affect all the other applications involved in the service delivery.
Fig 2: Excessive disk reads by the media server slow down Oracle database accesses
Since a single VMware® ESX/ESXi Server is used to host multiple virtual machines (VMs), a single malfunctioning application on a VM can degrade the performance seen by applications hosted on the other VMs. Figures 1 and 2 illustrate such an example. In this scenario, users are experiencing slowness in their access to a web-based service. From the service topology, it is clear that the database server is the cause of the slowdown. Figure 2 illustrates that since the database server is hosted on the same ESX/ESXi server as a media server, high I/O activity due to increased access to the media server is resulting in the database server seeing slow disk accesses. To accurately diagnose the problem in this example, a monitoring solution must not only consider the inter-dependencies between applications that are involved in service delivery, but it must also consider the existential relationships between applications, virtual machines, and physical machines. Besides resource contention among guest virtual machines, applications executing on the ESX/ESXi service console can also affect the performance of the virtual infrastructure.
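The scenario above can be sketched in a few lines of Python: given which physical host each VM runs on and how much disk I/O each VM generates, list the busy VMs that share a host with the slow one. The VM names, host names, read rates, and the 100 MB/s limit are all hypothetical values chosen for illustration. A real diagnosis would draw this data from the virtualization platform.

```python
# Hypothetical inventory: which physical host each VM runs on.
VM_HOST = {"db-vm": "esx-01", "media-vm": "esx-01", "web-vm": "esx-02"}

# Hypothetical disk read rates (MB/s) sampled for each VM.
DISK_READ_MBPS = {"db-vm": 4.0, "media-vm": 180.0, "web-vm": 2.5}


def noisy_neighbours(slow_vm, vm_host, disk_read, limit_mbps):
    """List VMs that share a host with slow_vm and read more than limit_mbps."""
    host = vm_host[slow_vm]
    return sorted(
        vm
        for vm, vm_on_host in vm_host.items()
        if vm_on_host == host
        and vm != slow_vm
        and disk_read.get(vm, 0.0) > limit_mbps
    )


print(noisy_neighbours("db-vm", VM_HOST, DISK_READ_MBPS, limit_mbps=100.0))
```

The lookup only works because the VM-to-host mapping is available alongside the per-VM metrics. Without that relationship, the database VM and the media VM would look like unrelated systems.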
While knowing which VM is consuming excessive resources is helpful, it is even more important to understand whether the VM's behavior is normal. For instance, a memory leak in one of the applications executing inside a VM may be causing the VM's memory usage to increase over time. In such cases, it is essential that the monitoring solution be able to look in-depth into each guest VM and detect abnormalities. While deploying individual agents inside each VM provides this level of visibility, this can result in additional resource overhead, licensing fees, and maintenance cost.
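One simple way to tell "high but normal" from "steadily growing" is to fit a straight line to recent memory samples and look at the slope. The Python sketch below does this with an ordinary least-squares fit. The function names, sample values, and slope limit are assumptions made for the example. A sustained upward trend is only a hint of a leak and not proof of one.

```python
def growth_per_sample(samples):
    """Least-squares slope of the samples against their index (units per sample)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    x_mean = (n - 1) / 2
    y_mean = sum(samples) / n
    numerator = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(samples))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator


def looks_like_leak(samples, min_slope):
    """Flag a series whose fitted growth per sample is at least min_slope."""
    return growth_per_sample(samples) >= min_slope


# Hypothetical memory usage of one VM (MB), sampled at equal intervals.
vm_memory_mb = [512, 530, 551, 569, 590]

print(growth_per_sample(vm_memory_mb))
print(looks_like_leak(vm_memory_mb, min_slope=10))
```

Fitting a slope, rather than comparing the latest value against a limit, lets a gradual increase show up long before the VM actually runs short of memory.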
Performance degradations in a virtual infrastructure may also occur because a virtual machine has not been configured with sufficient resources to handle its workload. A monitoring solution must be able to differentiate between problems resulting from inadequate virtual machine configuration and those resulting from hot-spots created by uneven distribution of load across ESX/ESXi servers.