Introduction

Most present-day IT infrastructures are heterogeneous environments that include a mix of different server hardware and operating systems. Sometimes, administrators might not want to monitor any of the applications executing on these operating systems, but would still be interested in knowing how healthy the operating system hosting the application is. To cater to these needs, eG Enterprise offers 100% web-based, integrated monitoring of heterogeneous IT infrastructures. Administrators can monitor and manage a variety of Unix, Windows, and legacy operating systems from a common console. A novel layer model representation is used to analyze and depict the performance of different protocol layers of the infrastructure – network, operating system, TCP/IP stack, critical application processes and services, etc. By using a common performance model representation across heterogeneous infrastructures, eG Enterprise ensures that administrators are not exposed to the differing nature of each operating system and hence have a short learning curve.

Monitoring can be done in an agent-based or an agentless manner. Administrators can choose which servers are monitored with agents (e.g., critical production servers) and which are monitored in an agentless manner (e.g., staging servers).

A single agent license suffices to monitor a server, and the agent license is transportable across operating systems. Both agent-based and agentless monitoring are supported for Microsoft Windows 2000/2003, Sun Solaris, Red Hat Linux, FreeBSD, SuSE Linux, HP-UX, Tru64, and AIX operating systems. Agentless monitoring is also available for Novell NetWare, OpenVMS, and OS/400 operating systems.

The following table summarizes the system monitoring capabilities of eG Enterprise.

Capability Metric Description

CPU Monitoring

CPU utilization per processor of a server

  • Know if a server is sized correctly in terms of processing power;
  • Determine the times of day when the CPU usage level is high

Run queue length of a server

Determine how many processes are contending for CPU resources simultaneously

Top 10 CPU consuming processes on a server

Know which processes are causing a CPU spike on the server

Top 10 servers by CPU utilization

Know which servers have high CPU utilization, and which ones are under-utilized
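
As a rough illustration of the "Top 10 servers by CPU utilization" metric, the sketch below ranks servers by a CPU percentage and lists the under-utilized ones. It is not how eG Enterprise computes the report; the function names, the sample server names, and the 10% under-utilization threshold are all assumptions made for this example.

```python
import heapq


def top_n_by_cpu(cpu_by_server, n=10):
    """Return the n (server, cpu_percent) pairs with the highest CPU utilization."""
    return heapq.nlargest(n, cpu_by_server.items(), key=lambda item: item[1])


def under_utilized(cpu_by_server, threshold=10.0):
    """Return the names of servers whose CPU utilization is below threshold (assumed cutoff)."""
    return sorted(name for name, pct in cpu_by_server.items() if pct < threshold)


# Hypothetical measurements: server name -> CPU utilization in percent.
sample = {"web01": 91.5, "db01": 64.0, "stage01": 3.2, "app01": 47.8}

print(top_n_by_cpu(sample, n=2))
print(under_utilized(sample))
```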

Memory Monitoring

Free memory availability

  • Track free memory availability on your servers;
  • Determine if your servers are adequately sized in terms of memory availability

Swap memory usage

Determine servers with high swap usage

Top 10 processes consuming memory on the server

Know which processes are taking up memory on a server

Top 10 servers by memory usage

Know which servers have the lowest free memory available and hence, may be candidates for memory upgrades

I/O Monitoring

Blocked processes

  • Track the number of processes blocked on I/O;
  • Determine if there is an I/O bottleneck on the server

Disk activity

  • Track the percentage of time that the disks on a server are heavily used.
  • Compare the relative busy times of the disks on a server to know if you can better balance the load across the disks of a server

Disk read/write times

Monitor disk read and write times to detect instances when a disk is slowing down (Windows only)

Disk queue length

Track the number of processes queued on each disk drive to determine which disk drives may be responsible for slowdowns

Top 10 processes by disk activity

Determine which processes are causing disk reads/writes

Uptime Monitoring

Current uptime

  • Determine how long a server has been up;
  • Track times when a server was rebooted;
  • Determine times when unplanned reboots happened

Top 10 servers by uptime

Know which servers have not been rebooted for a long time

Disk Space Monitoring

Total capacity

Know the total capacity of each of the disk partitions of a server

Free space

  • Track the free space on each of the disk partitions of a server;
  • Be proactively alerted to high disk space usage on a server
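
A free-space check of the kind described above could be sketched as follows. This is a minimal example that uses Python's standard `shutil.disk_usage` call, not the product's own implementation; the function names and the 10% minimum-free threshold are assumptions chosen for illustration.

```python
import shutil


def free_percent(total_bytes, free_bytes):
    """Return free space as a percentage of total capacity."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * free_bytes / total_bytes


def low_space_partitions(paths, min_free_percent=10.0):
    """Return (path, free_percent) for each path whose free space is below the assumed threshold."""
    alerts = []
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple with total, used, free (in bytes)
        pct = free_percent(usage.total, usage.free)
        if pct < min_free_percent:
            alerts.append((path, round(pct, 1)))
    return alerts


# Check the partition that holds the current directory.
print(low_space_partitions(["."]))
```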

Page File Usage

Current usage

Monitor and alert on the page file usage of a Windows server

Network Traffic Monitoring

Bandwidth usage

  • Track the bandwidth usage of each of the network interfaces of a server (Windows only);
  • Identify network interfaces that have excessive usage

Outbound queue length

  • Determine queuing on each of the network interfaces of a server;
  • Identify network interfaces that may be causing a slowdown

Incoming and outgoing traffic

  • Track the traffic into and out of a server through each interface;
  • Identify servers and network interfaces with maximum traffic

Network Monitoring

Packet loss

  • Track the quality of a network connection to a server;
  • Identify times when excessive packet loss happens

Average delay

Determine the average delay of packets to a server

Availability

Determine times when a server is not reachable over the network
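
The three network metrics above (packet loss, average delay, and availability) can all be derived from one batch of probe results. The sketch below assumes that each probe yields either a round-trip time in milliseconds or `None` when no reply was received. The function name and the rule that a server counts as available if at least one probe was answered are assumptions for this example, not the product's documented behavior.

```python
def summarize_probes(rtts_ms):
    """Summarize a batch of probes; each entry is a round-trip time in ms, or None if lost."""
    sent = len(rtts_ms)
    if sent == 0:
        raise ValueError("at least one probe is required")
    replies = [rtt for rtt in rtts_ms if rtt is not None]
    return {
        # Share of probes that received no reply.
        "loss_percent": 100.0 * (sent - len(replies)) / sent,
        # Average delay is computed over answered probes only.
        "avg_delay_ms": sum(replies) / len(replies) if replies else None,
        # Assumed rule: reachable if any probe was answered.
        "available": bool(replies),
    }


print(summarize_probes([10.0, None, 20.0, 30.0]))
```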

TCP Monitoring

Current connections

Track currently established TCP connections to a server

Incoming/outgoing TCP connection rate

Monitor the server workload by tracking the rate of TCP connections to and from a server

TCP retransmissions

  • Track the percentage of TCP segments retransmitted from the server to clients;
  • Be alerted when TCP retransmits are high and are therefore likely to cause significant slowdowns in application performance
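
A retransmission percentage is commonly derived from two snapshots of cumulative TCP counters taken some interval apart. The sketch below takes that approach. The dictionary keys `out_segs` and `retrans_segs` are hypothetical names for the "segments sent" and "segments retransmitted" counters, and using segments sent as the denominator is an assumption rather than something the source specifies.

```python
def retransmit_percent(prev, curr):
    """Return retransmitted segments as a percentage of segments sent between two snapshots."""
    sent = curr["out_segs"] - prev["out_segs"]
    retrans = curr["retrans_segs"] - prev["retrans_segs"]
    # No traffic in the interval, or counters were reset (e.g., by a reboot): report 0.
    if sent <= 0 or retrans < 0:
        return 0.0
    return 100.0 * retrans / sent


# Hypothetical counter snapshots taken one measurement period apart.
earlier = {"out_segs": 1000, "retrans_segs": 10}
later = {"out_segs": 3000, "retrans_segs": 60}
print(retransmit_percent(earlier, later))
```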

Process Monitoring

Processes running

  • Track the number of processes of a specific application that are running simultaneously;
  • Identify times when a specific application process is not running

CPU usage

  • Monitor the CPU usage of an application over time;
  • Determine times when an application is taking excessive CPU resources.

Memory usage

  • Track the memory usage of an application over time;
  • Identify whether an application has a memory leak

Threads

Track the number of threads running for an application’s process (Windows only)

Handles

  • Track the number of handles held by an application over time (Windows only);
  • Identify if a process has handle leaks

Windows Services Monitoring

Availability

Determine if a service is running or not

Server Log Monitoring

New events

  • Track the number of information, warning, and error events logged in the Microsoft Windows System and Application event logs;
  • Correlate events in the Windows event logs with other activity on the server (e.g., service failure);
  • Obtain details of the events in the event logs

Security success and failure events

  • Monitor all events logged in the Microsoft Windows Security log;
  • Obtain details of all failure events

Events in /var/adm/messages log

Track, and be alerted to, all errors logged in the /var/adm/messages log of a Unix system

Auto-correction

Automatic restart of failed services

Determine which Windows services should be running automatically, monitor whether these services are up, and restart any failed service automatically