The IBM AIX operating system, a variant of Unix, is widely used by enterprises to power their data centers. Monitoring the server hardware and the critical parts of the operating system, including processors, memory, disks, and network interfaces, is essential to ensure that the applications running on these servers perform efficiently at all times.
eG Enterprise offers 100% web-based AIX server performance monitoring. To monitor a server, you deploy the eG agent software. Deployment takes at most a couple of minutes, and as soon as the agent is started, it begins monitoring the server hardware, operating system, and application processes with little configuration. Baselines for all the collected metrics are predefined in eG Enterprise based on industry-standard best practices, so you can start receiving alerts when a process fails, a critical event is logged in the server log, or a disk fills up. If required, the same eG agent can be upgraded to monitor critical applications such as IBM UDB, IBM WebSphere, Oracle, and others running on your AIX systems.
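As a rough illustration of how a predefined baseline turns a collected metric into an alert, the sketch below compares samples against fixed upper limits. The metric names, the threshold values, and the `Baseline` and `check_baselines` helpers are hypothetical and are not taken from eG Enterprise, whose actual baselines are not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Baseline:
    """Hypothetical static baseline: alert when a value exceeds `maximum`."""
    metric: str
    maximum: float


def check_baselines(samples: dict[str, float], baselines: list[Baseline]) -> list[str]:
    """Return one alert message per metric whose sample exceeds its baseline."""
    alerts = []
    for b in baselines:
        value = samples.get(b.metric)
        # Metrics without a sample are skipped rather than treated as failures.
        if value is not None and value > b.maximum:
            alerts.append(f"{b.metric}={value} exceeds baseline {b.maximum}")
    return alerts


# Illustrative values only, not product defaults.
baselines = [Baseline("disk_used_pct", 90.0), Baseline("cpu_util_pct", 85.0)]
samples = {"disk_used_pct": 94.5, "cpu_util_pct": 40.0}
alerts = check_baselines(samples, baselines)
```

With these sample values, only the disk metric crosses its limit, so `alerts` holds a single message.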
eG Enterprise can monitor servers in an agent-based or an agentless manner. Administrators can choose which servers are monitored with agents (e.g., critical production servers) and which are monitored agentlessly (e.g., staging servers). The monitoring system is licensed per server OS, not by the number of CPU cores or sockets or by the applications running on the server.
With its ability to monitor 10+ operating systems, including Microsoft Windows 2008, 2003, and 2000, Oracle Solaris, Red Hat Linux, SuSE Linux, HP-UX, OS/400, and OpenVMS, eG's AIX system monitoring software provides a single pane of glass from which administrators can monitor their heterogeneous, multi-vendor data center servers.
| Capability | Metric | Description |
|---|---|---|
| CPU Monitoring | CPU utilization per processor of a server | Know if a server is sized correctly in terms of processing power; determine times of day when CPU usage is high |
| | Run queue length of a server | Determine how many processes are contending for CPU resources simultaneously |
| | Top 10 CPU consuming processes on a server | Know which processes are causing a CPU spike on the server |
| | Top 10 servers by CPU utilization | Know which servers have high CPU utilization and which are under-utilized |
| Memory Monitoring | Free memory availability | Track free memory availability on your servers; determine if your servers are adequately sized in terms of memory |
| | Swap memory usage | Determine servers with high swap usage |
| | Top 10 processes consuming memory on the server | Know which processes are taking up memory on a server |
| | Top 10 servers by memory usage | Know which servers have the lowest free memory available and hence may be candidates for memory upgrades |
| I/O Monitoring | Blocked processes | Track the number of processes blocked on I/O; determine if there is an I/O bottleneck on the server |
| | Disk activity | Track the percentage of time that the disks on a server are heavily used; compare the relative busy times of the disks on a server to determine if the load can be balanced better across them |
| | Top 10 processes by disk activity | Determine which processes are causing disk reads/writes |
| Uptime Monitoring | Current uptime | Determine how long a server has been up; track times when a server was rebooted; determine times when unplanned reboots happened |
| | Top 10 servers by uptime | Know which servers have not been rebooted for a long time |
| Disk Space Monitoring | Total capacity | Know the total capacity of each of the disk partitions of a server |
| | Free space | Track the free space on each of the disk partitions of a server; be alerted proactively when disk space usage on a server is high |
| Network Traffic Monitoring | Incoming and outgoing traffic | Track the traffic into and out of a server through each interface; identify servers and network interfaces with maximum traffic |
| Network Monitoring | Packet loss | Track the quality of a network connection to a server; identify times when excessive packet loss happens |
| | Average delay | Determine the average delay of packets to a server |
| | Availability | Determine times when a server is not reachable over the network |
| TCP Monitoring | Current connections | Track currently established TCP connections to a server |
| | Incoming/outgoing TCP connection rate | Monitor the server workload by tracking the rate of TCP connections to and from a server |
| | TCP retransmissions | Track the percentage of TCP segments retransmitted from the server to clients; be alerted when TCP retransmits are high and therefore likely to cause significant slowdowns in application performance |
| Process Monitoring | Processes running | Track the number of processes of a specific application that are running simultaneously; identify times when a specific application process is not running |
| | CPU usage | Monitor the CPU usage of an application over time; determine times when an application is taking excessive CPU resources |
| | Memory usage | Track the memory usage of an application over time; identify whether an application has a memory leak |
| Server Log Monitoring | New events | Obtain details of the events in the system log files (/var/adm/messages, sulog, syslog, etc.) |

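Two of the metric styles listed above, a percentage derived from counters (TCP retransmissions) and a "Top 10" ranking, can be sketched as follows. This is a minimal illustration under stated assumptions, not how the eG agent computes them: the counter keys `segments_sent` and `segments_retransmitted` and both helper functions are hypothetical, and the cumulative counters are assumed to come from two successive samples.

```python
def retransmit_percent(prev: dict[str, int], curr: dict[str, int]) -> float:
    """Percentage of TCP segments retransmitted between two counter samples.

    `prev` and `curr` hold cumulative counters under the hypothetical keys
    "segments_sent" and "segments_retransmitted".
    """
    sent = curr["segments_sent"] - prev["segments_sent"]
    retrans = curr["segments_retransmitted"] - prev["segments_retransmitted"]
    if sent <= 0:
        return 0.0  # no traffic in the interval, or the counters were reset
    return 100.0 * retrans / sent


def top_n(values: dict[str, float], n: int = 10) -> list[tuple[str, float]]:
    """Rank servers (or processes) by a metric, highest first."""
    return sorted(values.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Illustrative samples only.
prev = {"segments_sent": 1000, "segments_retransmitted": 10}
curr = {"segments_sent": 3000, "segments_retransmitted": 60}
pct = retransmit_percent(prev, curr)

cpu_by_server = {"aix01": 10.0, "aix02": 90.0, "aix03": 50.0}
busiest = top_n(cpu_by_server, n=2)
```

Working from counter deltas rather than raw totals keeps the percentage tied to the most recent interval, which is what makes it useful for alerting.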
Multi-tier IT infrastructures are a nightmare to troubleshoot because of the dependencies between application tiers. For instance, a failure in the database tier can cause slowdowns in the application and web server tiers. Monitoring solutions that treat the infrastructure as independent silos therefore cannot effectively monitor and diagnose problems in such environments. Adding virtualization makes monitoring and management even more challenging.
Fig 1: A problem in one application can affect all the other applications involved in the service delivery.
Fig 2: Excessive disk reads by the media server slow down Oracle database accesses.
Since a single VMware® ESX/ESXi server hosts multiple virtual machines (VMs), one malfunctioning application on a VM can degrade the performance seen by applications hosted on the other VMs. Figures 1 and 2 illustrate such an example. In this scenario, users are experiencing slow access to a web-based service. From the service topology, it is clear that the database server is the cause of the slowdown. Figure 2 shows that because the database server is hosted on the same ESX/ESXi server as a media server, high I/O activity caused by increased access to the media server is slowing disk accesses for the database server. To diagnose the problem in this example accurately, a monitoring solution must consider not only the inter-dependencies between the applications involved in service delivery, but also the relationships between applications, virtual machines, and physical machines. Besides resource contention among guest virtual machines, applications executing on the ESX/ESXi service console can also affect the performance of the virtual infrastructure.
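The reasoning in this scenario can be sketched in two steps: follow the service dependencies down to the deepest unhealthy component, then look for busy VMs that share its physical host. The topology, the VM placement, the I/O figures, and both helper functions below are hypothetical illustrations, not eG Enterprise's correlation logic, and the dependency graph is assumed to be acyclic.

```python
# Hypothetical service topology: the components each component depends on.
depends_on = {"web": ["app"], "app": ["db"], "db": [], "media": []}
# Hypothetical placement of VMs on physical ESX/ESXi hosts.
host_of = {"web": "esx1", "app": "esx1", "db": "esx2", "media": "esx2"}


def root_causes(component, unhealthy, depends_on):
    """Follow dependencies from a slow component to the deepest unhealthy ones."""
    bad_deps = [d for d in depends_on.get(component, []) if d in unhealthy]
    if not bad_deps:
        return [component]  # nothing below it is unhealthy: it is a root cause
    causes = []
    for dep in bad_deps:
        for cause in root_causes(dep, unhealthy, depends_on):
            if cause not in causes:
                causes.append(cause)
    return causes


def noisy_neighbors(vm, host_of, disk_io, threshold):
    """VMs on the same host as `vm` whose disk I/O exceeds `threshold`."""
    host = host_of[vm]
    return [other for other, h in host_of.items()
            if h == host and other != vm and disk_io.get(other, 0) > threshold]


unhealthy = {"web", "app", "db"}
causes = root_causes("web", unhealthy, depends_on)
neighbors = noisy_neighbors("db", host_of, {"media": 950, "db": 120}, threshold=500)
```

Here the slowdown seen at the web tier traces to the database, and the media server stands out as the co-located VM generating heavy disk I/O.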
While knowing which VM is consuming excessive resources is helpful, it is even more important to understand whether the VM's behavior is normal. For instance, a memory leak in one of the applications executing inside a VM may cause the VM's memory usage to increase over time. In such cases, the monitoring solution must be able to look in depth into each guest VM and detect abnormalities. Deploying individual agents inside each VM provides this level of visibility, but it can add resource overhead, licensing fees, and maintenance costs.
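One simple way to flag the steadily increasing memory usage described above is to fit a straight line to recent samples and check its slope. The sketch below assumes evenly spaced samples; the `slope` and `looks_like_leak` helpers and the `min_slope` cutoff are hypothetical, and a sustained upward trend is only a hint of a leak, not proof.

```python
def slope(samples: list[float]) -> float:
    """Least-squares slope of evenly spaced samples (units per interval)."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def looks_like_leak(samples: list[float], min_slope: float) -> bool:
    """True when memory grows by at least `min_slope` per interval on average."""
    return slope(samples) >= min_slope


# Illustrative memory readings in MB, one per measurement interval.
growing = [100.0, 110.0, 120.0, 130.0]
steady = [200.0, 200.0, 200.0]
```

A trend fitted over many samples is less sensitive to a single spike than comparing only the first and last readings.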
Performance degradation in a virtual infrastructure may also occur because a virtual machine has not been configured with sufficient resources to handle its workload. A monitoring solution must be able to distinguish problems caused by inadequate virtual machine configuration from those caused by hot spots created by uneven distribution of load across ESX/ESXi servers.