System Details Test

This operating system-specific test relies on native measurement capabilities of the operating system to collect various metrics pertaining to the CPU and memory usage of a host system. The details of this test are as follows:

Target of the test : Any host system

Agent deploying the test : An internal agent

Outputs of the test : One set of results for each host monitored

  1. Test period - How often the test should be executed.
  2. Host - The host for which the test is to be configured.
  3. Duration - This parameter is of significance only while monitoring Unix hosts, and indicates how frequently within the specified test period, the agent should poll the host for CPU usage statistics.
  4. summary - This attribute is applicable to multi-processor systems only. If the Yes option is selected, the eG agent will report not only the CPU and memory utilization of each processor, but also the summary (i.e., average) of the CPU and memory utilization across the processors. If the No option is selected, the eG agent will report only the CPU usage of the individual processors.
  5. useiostat - This parameter is of significance to Solaris platforms only. By default, the useiostat flag is set to No. This indicates that, by default, SystemTest reports the CPU utilization of every processor on the system being monitored, and also provides the average CPU utilization across the processors. However, if you want SystemTest to report only the average CPU utilization across processors and across user sessions, set the useiostat flag to Yes. In such a case, the processor-wise breakup of CPU utilization will not be available.
  6. useps - This flag is applicable only for AIX LPARs. By default, this flag is set to No.
  7. include wait - This flag is applicable to Unix hosts alone. On Unix hosts, CPU time is also consumed when I/O waits occur on the host. By default, this test does not count the CPU time consumed by I/O waits when calculating the value of the CPU utilization measure. Accordingly, the include wait flag is set to No by default. To include the CPU time consumed by I/O waits in CPU usage computations on Unix hosts, set this flag to Yes.
  8. Detailed diagnosis - To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option.

    The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

    • The eG manager license should allow the detailed diagnosis capability.
    • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
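
The effect of the include wait flag on the CPU utilization measure can be illustrated with a minimal sketch. It assumes the agent has two successive samples of cumulative CPU time counters; the `CpuSample` structure, its field names, and the `cpu_utilization` function are hypothetical and are not part of the eG agent.

```python
from dataclasses import dataclass


@dataclass
class CpuSample:
    """Cumulative CPU time counters in ticks (hypothetical structure)."""
    user: int
    system: int
    idle: int
    iowait: int


def cpu_utilization(prev: CpuSample, curr: CpuSample,
                    include_wait: bool = False) -> float:
    """Return CPU utilization (percent) over the interval between two samples.

    Assumption: with include_wait=False, time spent waiting on I/O is
    treated as non-busy time; with include_wait=True it counts as busy.
    """
    d_user = curr.user - prev.user
    d_system = curr.system - prev.system
    d_idle = curr.idle - prev.idle
    d_iowait = curr.iowait - prev.iowait
    total = d_user + d_system + d_idle + d_iowait
    if total <= 0:
        # No time elapsed between samples, so nothing can be reported.
        return 0.0
    busy = d_user + d_system
    if include_wait:
        busy += d_iowait
    return 100.0 * busy / total


# Usage with made-up counter values
prev = CpuSample(user=100, system=50, idle=800, iowait=50)
curr = CpuSample(user=160, system=70, idle=900, iowait=70)
print(cpu_utilization(prev, curr))                     # 40.0
print(cpu_utilization(prev, curr, include_wait=True))  # 50.0
```

In this illustration, the same interval yields a higher utilization figure once I/O wait time is counted, which is the behavior the flag description suggests.
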

Measurements made by the test

Each measurement is listed with its description, measurement unit, and interpretation, in that order.

CPU utilization:

Indicates the percentage of CPU time utilized on the host system.

Percent

A high value could signify a CPU bottleneck. The CPU utilization may be high because a few processes are consuming a lot of CPU, or because there are too many processes contending for a limited resource. Check the currently running processes to see the exact cause of the problem.

System CPU utilization:

Indicates the percentage of CPU time spent for system-level processing.

Percent

An unusually high value indicates a problem and may be due to too many system-level tasks executing simultaneously.

Run queue length:

Indicates the instantaneous length of the queue in which threads are waiting for processor cycles. This length does not include the threads that are currently being executed.

Number

A value consistently greater than 2 indicates that many processes could be simultaneously contending for the processor.
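
One way to turn "consistently greater than 2" into a check is sketched below. The text does not define "consistently", so the 80% fraction used here is an assumption, as are the function and parameter names.

```python
from typing import Sequence


def run_queue_is_contended(samples: Sequence[int],
                           threshold: int = 2,
                           min_fraction: float = 0.8) -> bool:
    """Return True if most recent run queue samples exceed the threshold.

    Assumption: "consistently" means at least `min_fraction` of the
    samples are above `threshold`; the source gives no exact definition.
    """
    if not samples:
        return False
    above = sum(1 for value in samples if value > threshold)
    return above / len(samples) >= min_fraction


# Usage with made-up run queue lengths from successive polls
print(run_queue_is_contended([3, 4, 5, 3, 3]))  # True
print(run_queue_is_contended([0, 1, 5, 0, 1]))  # False
```

Checking a window of samples rather than a single reading avoids flagging a momentary spike, since the measure is an instantaneous value.
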

Blocked processes:

Indicates the number of processes blocked for I/O, paging, etc.

Number

A high value could indicate an I/O problem on the host (e.g., a slow disk).

Swap memory:

On Windows systems, this measurement denotes the committed amount of virtual memory. This corresponds to the space reserved for virtual memory in the paging file(s) on disk. On Solaris systems, this metric corresponds to the swap space currently available. On HP-UX and AIX systems, this metric corresponds to the amount of active virtual memory (this computation assumes that one virtual page corresponds to 4 KB of memory).

MB

An unusually high value for swap usage can indicate a memory bottleneck. Check the memory utilization of individual processes to identify the processes that consume the most memory, and tune their memory usage and allocations accordingly.

Free memory:

Indicates the amount of memory (including standby and free memory) that is immediately available for use by processes, drivers, or the operating system.

MB

This measure typically indicates the amount of memory available for use by applications running on the target host.

On Unix operating systems (AIX and Linux), the operating system tends to use parts of the available memory for caching files, objects, etc. When applications require additional memory, it is released from the operating system cache. Hence, to reflect the true free memory that is available to applications, the eG agent reports the sum of the free physical memory and the operating system cache memory size as the value of the Free memory measure while monitoring AIX and Linux operating systems.
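
The calculation described above can be sketched for Linux as follows. This is an illustration only: it assumes the values come from `/proc/meminfo`-style text and that the `MemFree` and `Cached` fields stand in for "free physical memory" and "operating system cache". The source does not state which fields the eG agent actually reads, and the function names are hypothetical.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into a {field: kilobytes} mapping."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only well-formed "Name:   12345 kB" style lines.
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def free_memory_mb(meminfo: Dict[str, int]) -> float:
    """Return free physical memory plus OS cache, in MB.

    Assumption: MemFree + Cached approximates the sum the text describes.
    """
    free_kb = meminfo.get("MemFree", 0) + meminfo.get("Cached", 0)
    return free_kb / 1024.0


# Usage with made-up sample text instead of a live /proc/meminfo read
sample = (
    "MemTotal:        8192000 kB\n"
    "MemFree:         1048576 kB\n"
    "Cached:          2097152 kB\n"
)
print(free_memory_mb(parse_meminfo(sample)))  # 3072.0
```
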

Steal time:

Indicates the percentage of time a virtual processor waits for a real CPU while the hypervisor is servicing another virtual processor.

Percent

This measure is applicable only to Windows VMs that are provisioned via VMware vSphere ESX.

A low value is desired for this measure.

A high value for this measure indicates that a virtual processor is waiting longer for real CPU resources. If this condition is left unattended, it can stall the tasks performed by the virtual processor, significantly degrade its overall performance, and adversely impact the user experience with the target server.

The impact of stolen CPU always manifests in slowness but can have more profound effects on your infrastructure. Here are some examples:

  • Slower page load times
  • Slower database query times
  • Slower processing of reports
  • Increased queue size of asynchronous tasks because of an inability to process them quickly
  • Increased IaaS bill due to launching more servers to handle the same amount of load

To avoid such eventualities, administrators should either immediately terminate the virtual machine and launch a replacement, or upgrade the VM with more CPU resources.

Note:

For multi-processor systems, where the CPU statistics are reported for each processor on the system, the statistics that are system-specific (e.g., run queue length, free memory, etc.) are only reported for the "Summary" descriptor of this test.
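
The relationship between the per-processor results and the "Summary" descriptor can be sketched as below, taking the summary to be the average across processors as described for the summary parameter. The descriptor names `Processor_0` and `Processor_1` and the `with_summary` function are hypothetical.

```python
from typing import Dict


def with_summary(per_cpu: Dict[str, float]) -> Dict[str, float]:
    """Return per-processor CPU utilization plus a "Summary" average entry."""
    result = dict(per_cpu)
    if per_cpu:
        # The summary is the plain average of the per-processor values.
        result["Summary"] = sum(per_cpu.values()) / len(per_cpu)
    return result


# Usage with made-up per-processor utilization percentages
print(with_summary({"Processor_0": 20.0, "Processor_1": 40.0}))
# {'Processor_0': 20.0, 'Processor_1': 40.0, 'Summary': 30.0}
```
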