Not Responding Processes - OS Test

When processes running on a system become unresponsive, they can cause significant problems, ranging from performance slowdowns to data loss and security risks. Unresponsive processes can interrupt workflows and reduce productivity, leading to delays, unnecessary resource consumption, and poor system reliability. Administrators should therefore monitor the unresponsive processes on a system and take the necessary action before they trigger disk errors, file system corruption, Windows crashes, BSOD (Blue Screen of Death), or forced reboots. The Not Responding Processes - OS test serves this purpose.

This test auto-discovers the Windows systems in the target Windows Systems Group and, for each Windows system, reports the count of processes that are stuck and unresponsive and the count of unresponsive processes that were ended. The detailed diagnosis reported for these measures helps administrators quickly identify the processes that are not currently responding and determine how long each process was in an unresponsive state. With these details, administrators can troubleshoot the underlying issues and recover or end the unresponsive processes to restore system performance.
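To make the notion of an unresponsive process concrete, the sketch below parses the CSV output of the Windows command `tasklist /FI "STATUS eq NOT RESPONDING" /FO CSV /NH` and counts the processes it lists. This is only an illustration and not a description of how the eG agent collects the measure. The function names and the sample output are assumptions made for the example.

```python
import csv
import io
import subprocess

# Windows command that lists processes whose status is "Not Responding".
TASKLIST_CMD = ["tasklist", "/FI", "STATUS eq NOT RESPONDING", "/FO", "CSV", "/NH"]


def parse_tasklist_csv(output):
    """Return (image_name, pid) pairs parsed from tasklist CSV output."""
    processes = []
    for row in csv.reader(io.StringIO(output)):
        # Skip blank lines and informational lines that carry no PID column.
        if len(row) < 2 or not row[1].strip().isdigit():
            continue
        processes.append((row[0], int(row[1])))
    return processes


def not_responding_processes():
    """Run tasklist (Windows only) and return the unresponsive processes."""
    result = subprocess.run(TASKLIST_CMD, capture_output=True, text=True, check=True)
    return parse_tasklist_csv(result.stdout)


# Hypothetical sample output, used here so the parser can be tried on any platform.
SAMPLE_OUTPUT = (
    '"notepad.exe","4321","Console","1","12,345 K"\n'
    '"excel.exe","9876","Console","1","210,004 K"\n'
)

sample = parse_tasklist_csv(SAMPLE_OUTPUT)
print(len(sample), sample)
```

On a Windows system, calling `not_responding_processes()` instead of parsing the sample would return the live list.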

Target of the test : A Windows Systems Group

Agent deploying the test : A remote agent

Outputs of the test : One set of results for every Windows system

Configurable parameters for the test
Parameter | Description

Test Period

How often the test should be executed.

Host

The nickname of the Windows Systems Group component for which this test is to be configured.

Port

The port at which the specified host listens. By default, this is NULL.

Inside View Using

To obtain the 'inside view' of performance of the systems - i.e., to measure the internal performance of the systems - this test uses a light-weight eG VM Agent software deployed on each of the systems. Accordingly, this parameter is by default set to eG VM Agent.

Report By User

This flag is set to No by default. This implies that the Windows systems in the environment will always be identified by system name; in other words, this test will, by default, report measures for every systemname. If you instead want this test to report measures for every user on a system, set this flag to Yes. In that case, this test will report measures for every username_on_systemname.

Report Powered OS

By default, this flag is set to Yes. In this case, the 'inside view' tests will report measures even for those Windows systems that do not have any users logged in currently. Such systems will be identified by their name and not by username_on_systemname. If this flag is set to No, this test will not report measures for systems to which no users are currently logged in.

Is Cloud VMs?

This flag is set to Yes by default, and its value cannot be changed. This implies that cloud-based Windows systems will always be identified using the login name of the user. In other words, in cloud environments, this test will report measures for every username_on_systemname.

Show Whitelist Process

In some highly secure environments, administrators whitelist an index of business-critical and commonly used applications that are permitted to be present and active on the Windows system. The goal of whitelisting is to protect the target system from potentially harmful applications and to prevent unauthorized files from executing. Application whitelisting places control over which applications are permitted to run on the target system in the hands of administrators rather than end users. In such environments, administrators may wish to monitor only the processes of applications that are whitelisted on the target system. To cater to this need, the Show Whitelist Process flag is set to Yes by default.

Setting the Show Whitelist Process flag to Yes makes this test monitor only the processes of the applications listed against the WhiteListProcesses parameter in the [EXCLUDE_APPLICATIONS] section of the eg_tests.ini file, which is available in the <eG_INSTALL_DIR>/manager/config folder. By default, eG Enterprise provides a comma-separated list of pre-defined applications against this parameter. To add applications to or remove applications from this pre-defined list, edit the application names specified against the WhiteListProcesses parameter. If you want this test to monitor the processes of all applications executing on the target system, set this flag to No.

DD Frequency

Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency if you wish. To disable the detailed diagnosis capability for this test, specify none against DD Frequency.

Detailed Diagnosis

To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability.
  • Neither the normal nor the abnormal frequency configured for the detailed diagnosis measures should be 0.
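
The whitelist filtering described under the Show Whitelist Process parameter can be sketched as follows. The snippet reads a comma-separated WhiteListProcesses entry from an [EXCLUDE_APPLICATIONS] section and keeps only the matching process names. The INI excerpt, the application names in it, the case-insensitive matching, and the function names are all assumptions made for illustration; they are not taken from the eg_tests.ini file that eG Enterprise ships, and this is not the product's own implementation.

```python
import configparser

# Hypothetical excerpt; the real eg_tests.ini ships its own pre-defined list.
SAMPLE_INI = """
[EXCLUDE_APPLICATIONS]
WhiteListProcesses=winword.exe,excel.exe,outlook.exe
"""


def load_whitelist(ini_text):
    """Return the set of whitelisted process names, lower-cased."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    raw = parser.get("EXCLUDE_APPLICATIONS", "WhiteListProcesses", fallback="")
    return {name.strip().lower() for name in raw.split(",") if name.strip()}


def processes_to_monitor(process_names, whitelist, show_whitelist_process=True):
    """Keep only whitelisted processes when the flag is Yes; keep all when it is No."""
    if not show_whitelist_process:
        return list(process_names)
    return [name for name in process_names if name.lower() in whitelist]


whitelist = load_whitelist(SAMPLE_INI)
running = ["WINWORD.EXE", "notepad.exe", "excel.exe"]
print(processes_to_monitor(running, whitelist))
print(processes_to_monitor(running, whitelist, show_whitelist_process=False))
```

With the flag left at Yes, only the two whitelisted names are kept; with the flag set to No, all three names are returned.
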

Measurements made by the test
Measurement | Description | Measurement Unit | Interpretation

Not responding processes

Indicates the number of processes that are currently stuck and unresponsive on this system.

Number

Ideally, the value of this measure should be very low. A high value for this measure may indicate memory leaks, CPU overload, incompatible updates, or software conflicts.

The detailed diagnosis of this measure lists the names and IDs of the unresponsive processes.

Not responding processes end

Indicates the number of unresponsive processes that were ended during the last measurement period.

Number

Use the detailed diagnosis of this measure to find the name and ID of each process, the time stamp at which the process became unresponsive, the time stamp at which the process was ended, and the duration for which the process was unresponsive.
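
The duration reported in the detailed diagnosis is the gap between the two time stamps. As a minimal sketch, the snippet below computes that gap for one record. The time stamp format, the record layout, and the function name are assumptions made for the example and may not match what the detailed diagnosis actually displays.

```python
from datetime import datetime

# Assumed time stamp layout; adjust it to match your detailed diagnosis output.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def unresponsive_duration(became_unresponsive, ended):
    """Return the number of seconds between the two time stamps."""
    start = datetime.strptime(became_unresponsive, TIMESTAMP_FORMAT)
    end = datetime.strptime(ended, TIMESTAMP_FORMAT)
    if end < start:
        raise ValueError("end time stamp precedes the unresponsive time stamp")
    return (end - start).total_seconds()


# Hypothetical record, for illustration only.
record = {
    "name": "notepad.exe",
    "pid": 4321,
    "unresponsive_at": "2024-05-01 10:15:00",
    "ended_at": "2024-05-01 10:17:30",
}
seconds = unresponsive_duration(record["unresponsive_at"], record["ended_at"])
print(f"{record['name']} ({record['pid']}) was unresponsive for {seconds:.0f} s")
```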