| Capability | Metric | Description |
| --- | --- | --- |
| CPU Monitoring | CPU utilization per processor of a server | Know if a server is sized correctly in terms of processing power; determine times of day when CPU usage is high |
| | Run queue length of a server | Determine how many processes are contending for CPU resources simultaneously |
| | Top 10 CPU consuming processes on a server | Know which processes are causing a CPU spike on the server |
| | Top 10 servers by CPU utilization | Know which servers have high CPU utilization, and which ones are under-utilized |
| Memory Monitoring | Free memory availability | Track free memory availability on your servers; determine if your servers are adequately sized in terms of memory availability |
| | Swap memory usage | Determine servers with high swap usage |
| | Top 10 processes consuming memory on the server | Know which processes are taking up memory on a server |
| | Top 10 servers by memory usage | Know which servers have the lowest free memory available and hence may be candidates for memory upgrades |
| I/O Monitoring | Blocked processes | Track the number of processes blocked on I/O; determine if there is an I/O bottleneck on the server |
| | Disk activity | Track the percentage of time that the disks on a server are heavily used; compare the relative busy times of the disks on a server to determine if you can better balance the load across them |
| | Disk read/write times | Monitor disk read and write times to detect instances when a disk is slowing down (Windows only) |
| | Disk queue length | Track the number of processes queued on each disk drive to determine disk drives that may be responsible for slowdowns |
| | Top 10 processes by disk activity | Determine which processes are causing disk reads/writes |
| Uptime Monitoring | Current uptime | Determine how long a server has been up; track times when a server was rebooted; determine times when unplanned reboots happened |
| | Top 10 servers by uptime | Know which servers have not been rebooted for a long time |
| Disk Space Monitoring | Total capacity | Know the total capacity of each of the disk partitions of a server |
| | Free space | Track the free space on each of the disk partitions of a server; be alerted proactively when disk space usage on a server is high |
| Page File Usage | Current usage | Monitor and alert on page file usage of a Windows server |
| Network Traffic Monitoring | Bandwidth usage | Track the bandwidth usage of each of the network interfaces of a server (Windows only); identify network interfaces that have excessive usage |
| | Outbound queue length | Determine queuing on each of the network interfaces of a server; identify network interfaces that may be causing a slowdown |
| | Incoming and outgoing traffic | Track the traffic into and out of a server through each interface; identify servers and network interfaces with maximum traffic |
| Network Monitoring | Packet loss | Track the quality of a network connection to a server; identify times when excessive packet loss happens |
| | Average delay | Determine the average delay of packets to a server |
| | Availability | Determine times when a server is not reachable over the network |
| TCP Monitoring | Current connections | Track currently established TCP connections to a server |
| | Incoming/outgoing TCP connection rate | Monitor the server workload by tracking the rate of TCP connections to and from a server |
| | TCP retransmissions | Track the percentage of TCP segments retransmitted from the server to clients; be alerted when TCP retransmissions are high and therefore likely to cause significant slowdowns in application performance |
| Process Monitoring | Processes running | Track the number of processes of a specific application that are running simultaneously; identify times when a specific application process is not running |
| | CPU usage | Monitor the CPU usage of an application over time; determine times when an application is taking excessive CPU resources |
| | Memory usage | Track the memory usage of an application over time; identify whether an application has a memory leak |
| | Threads | Track the number of threads running for an application's process (Windows only) |
| | Handles | Track the number of handles held by an application over time (Windows only); identify if a process has handle leaks |
| Windows Services Monitoring | Availability | Determine whether a service is running |
| Windows Event Log and Unix System Log Monitoring | New events | Track the number of information, warning, and error events logged in the Microsoft Windows System and Application event logs; correlate events in the Windows event logs with other activity on the server (e.g., service failure); obtain details of the events in the event logs |
| | Security success and failure events | Monitor all events logged in the Microsoft Windows Security log; obtain details of all failure events |
| | Events in /var/adm/messages log | Track and be alerted to all errors logged in the /var/adm/messages log of a Unix system |
| Auto-correction | Automatic restart of failed services | Determine which Windows services should be running automatically; monitor whether these services are up, and restart any failed service automatically |
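
Two kinds of metric in the table, the TCP retransmission percentage and the "Top 10" rankings, reduce to simple arithmetic over sampled values. The following is a minimal Python sketch of that arithmetic, not the product's implementation. The `TcpSample` shape, the function names, and the server names in the usage note are hypothetical, and it assumes the underlying counters are cumulative, so a percentage is computed from the difference between two samples.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class TcpSample:
    """Cumulative TCP counters read at one point in time (hypothetical shape)."""
    segments_sent: int
    segments_retransmitted: int


def retransmit_percent(prev: TcpSample, curr: TcpSample) -> float:
    """Percentage of segments sent between two samples that were retransmissions."""
    sent = curr.segments_sent - prev.segments_sent
    retransmitted = curr.segments_retransmitted - prev.segments_retransmitted
    if sent <= 0:
        # No traffic in the interval (or a counter reset): report 0 rather than divide by zero.
        return 0.0
    return 100.0 * retransmitted / sent


def top_n(values: Dict[str, float], n: int = 10) -> List[Tuple[str, float]]:
    """Rank names by value, highest first (e.g. servers by CPU utilization)."""
    return sorted(values.items(), key=lambda item: item[1], reverse=True)[:n]
```

For example, `retransmit_percent(TcpSample(1000, 10), TcpSample(3000, 50))` returns `2.0`, since 40 of the 2000 segments sent in the interval were retransmissions. Likewise, `top_n({"web01": 91.5, "db01": 42.0, "app01": 73.25}, n=2)` returns `[("web01", 91.5), ("app01", 73.25)]`. An alert threshold on the retransmission percentage would be a deployment-specific choice; the table does not state one.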