Application Process Test

The Processes test monitors server daemon processes and their resource usage. Often, the unavailability of a server daemon is an error condition. In some cases, however, the presence of specific processes, or of too many instances of them, may itself indicate an error condition. For example, in a Citrix environment, a process called cmstart.exe is part of the Citrix login process. When logins are working well, very few cmstart.exe processes run on a server. However, when users experience slow logins or have difficulty launching applications on a Citrix Presentation Server, many cmstart.exe processes are observed. The Application Process test is used to monitor processes like cmstart.exe, which are not expected to run in significant numbers on a server, but whose count or resource usage changes unusually when problem situations occur.

The Application Process test is disabled by default. To enable the test, open the Enable/Disable Tests page using the menu sequence Agents -> Tests -> Enable/Disable, pick the desired Component type, set Performance as the Test type, choose the test from the DISABLED TESTS list, and click the << button to move the test to the ENABLED TESTS list. Finally, click the Update button.

Target of the test : Unix/Windows server

Agent deploying the test : An internal agent

Outputs of the test : One set of results per process pattern specified

Configurable parameters for the test
  1. TEST PERIOD - How often the test should be executed
  2. Host - The host for which the test is to be configured
  3. Port - The port at which the specified host listens
  4. Process - In the PROCESS text box, enter a comma-separated list of processName:processPattern pairs that identify the process(es) associated with the server being considered. processName is a string used for display purposes only. processPattern is an expression of the form *expr*, expr, *expr, expr*, *expr1*expr2*..., expr1*expr2, and so on. A leading '*' signifies any number of leading characters, while a trailing '*' signifies any number of trailing characters. For example, since many instances of cmstart.exe on a Citrix Presentation Server indicate slow logins or application launch problems, this process requires monitoring. Similarly, users might want to be alerted whenever any instance of drwatson.exe (the Windows Dr. Watson error-diagnosis tool, whose presence usually means that an application has failed) is executing on the system. The PROCESS configuration in this case will be: Citrixstartprocess:*cmstart*,DrWatson:*drwatson*. Other special characters, such as backslashes (\), can also be used while defining the process pattern. Typically, backslashes (\) are used when the configured process pattern includes the full directory path to the process to be monitored.

    To determine the process pattern to use for your application on Windows environments, look for the process name(s) in the Processes tab of the Task Manager. On Unix environments, use the ps command instead (for example, the command "ps -e -o pid,args" lists the processes running on the target system; from its output, choose the processes of interest to you).

    Also, note that the process parameter is case-sensitive in Unix environments.

  5. user - By default, this parameter is set to "none", which means that the test monitors all processes that match the configured patterns, regardless of the user executing them. If you want the test to monitor the processes of specific users only, then, on Unix platforms, specify a comma-separated list of the users to be monitored in the user text box. For instance: john,elvis,sydney

    While monitoring Windows hosts, on the other hand, your user configuration should be a comma-separated list of domain name-user name pairs, where every pair is expressed in the format Domainname\Username. For example, to monitor the processes of users john and elvis, who belong to the domain mas, your user specification should be: mas\john,mas\elvis. Also, on a Windows host, system processes run under the following user accounts: SYSTEM, LOCAL SERVICE, and NETWORK SERVICE. While configuring these user accounts, make sure that the Domainname is always NT AUTHORITY. In this case, therefore, your user specification will be: NT AUTHORITY\SYSTEM,NT AUTHORITY\LOCAL SERVICE,NT AUTHORITY\NETWORK SERVICE.

    If multiple processes and multiple users are configured, the test checks whether the first process is run by the first user, the second process by the second user, and so on. For instance, if the processes configured are java:java.exe,apache:*httpd* and the users configured are john,elvis, then the test checks whether user john is running the process java and user elvis is running the process apache. Similarly, if multiple processes are configured but only a single user is configured, the test checks whether that user runs each of the configured processes. However, to check whether a single process, say java.exe, is run by multiple users, say james and jane, do the following:

    • Your user specification should be: james,jane (if the target host is a Unix host), or <Domainname>\james,<Domainname>\jane (if the target host is a Windows host)
    • Your process configuration should be: Process1:java.exe,Process2:java.exe. The number of processes in this case should match the number of users.
    • Such a configuration ensures that the test checks for the java.exe process for both users, james and jane.
  6. CORRECT - Increased uptime and lower mean time to repair are critical to ensuring that IT infrastructures deliver a high quality of service to users. Towards this end, eG Enterprise embeds an optional auto-correction capability that enables eG agents to automatically correct problems in the environment as soon as they occur. With this capability, as soon as an abnormal situation is detected, an eG agent can initiate corrective actions automatically to resolve the problem. Automatic correction without manual intervention by IT operations staff reduces service downtime and improves operational efficiency. By default, eG Enterprise offers the auto-correction capability for the Number of processes running measure of the Processes test and the Service availability measure of the WindowsServices test. You can also enable this capability for the ApplicationProcess test, to correct a problem condition pertaining to a particular measure reported by that test. To do so, first select the TRUE option against the CORRECT parameter on this page (by default, FALSE is selected).
  7. ALARMTYPE - Upon selecting the TRUE option, three new parameters, namely ALARMTYPE, USERPARAMS, and CORRECTIVESCRIPT, will appear. You can set the corrective script to execute when a specific type of alarm is generated by selecting an option from the ALARMTYPE list box. For example, if the Critical option is chosen from the ALARMTYPE list box, the corrective script will run only when a critical alarm for the ApplicationProcess test is generated. Similarly, if the Critical/Major option is chosen, the corrective script will execute only when eG Enterprise generates critical or major alarms for the ApplicationProcess test. To ensure that the corrective script executes regardless of the alarm type, select the Critical/Major/Minor option.
  8. USERPARAMS - The user-defined parameters that are to be passed to the corrective script are specified in the USERPARAMS text box. One of the following formats can be applied to the USERPARAMS specification:

    • exec@processName:command: In this specification, processName is the display name of the process pattern specified against the PROCESS parameter, and command is the command to be executed by the default script when there is a problem condition pertaining to the processName.
    • command: In this specification, command signifies the command to be executed when there is a problem condition pertaining to any of the configured processes. This format best suits situations where only a single process has been configured for monitoring, or where a single command is capable of starting all the configured processes.

    Note:

    • Place the USERPARAMS specification within double quotes if the value includes one or more blank spaces.
    • If a processName configured in the PROCESS parameter does not have a corresponding exec@processName:command entry in USERPARAMS, then the auto-correction capability will not be enabled for that process.
  9. CORRECTIVESCRIPT - Administrators must build the auto-correction capability for this test by writing their own corrective script to address probable issues. To learn how to create custom auto-correction scripts, refer to the eG User Manual. Specify the full path to the corrective script here.
  10. wide - This parameter is valid on Solaris and Windows systems only.

    On Solaris environments, if the wide parameter is set to true, the eG agent uses /usr/ucb/ps instead of /usr/bin/ps to search for processes executing on the host. /usr/ucb/ps provides a long output (more than 80 characters), whereas /usr/bin/ps outputs only the first 80 characters of the process path and its arguments. However, some Solaris systems are configured with tightened security, which prevents arbitrary users from executing the /usr/ucb/ps command; only pre-designated users are allowed to execute it. The sudo (superuser do) utility (see http://www.gratisoft.us/sudo/) can be used to allow designated users to execute this command. If your system uses sudo to restrict access to the /usr/ucb/ps command, then set the wide parameter to sudo. This ensures that the agent not only uses the /usr/ucb/ps command to monitor processes (as it would if the wide parameter were set to true), but also uses sudo to execute this command.

    On Windows environments, by default, the eG agent uses perfmon to search for the processes that match the configured patterns. Accordingly, the wide parameter is set to false by default. Typically, a process definition in Windows includes the full path to the process, the process name, and the process arguments (if any). Perfmon, however, scans the system only for process names that match the configured patterns; the process path and arguments are ignored. This implies that if multiple processes on a Windows host have the same name as specified against processpattern, perfmon can only report the overall resource usage across all these processes; it cannot point to the exact process that is eroding the host's resources. For example, Windows typically represents any Java application executing on it as java.exe. Say two Java applications are executing on a Windows host from different locations. If java.exe has been configured for monitoring, then by default, perfmon will report the availability and average resource usage of both Java applications together. If one of them goes down, perfmon will not be able to indicate which of the two Java applications is inaccessible. Therefore, to enable administrators to differentiate between processes with the same name, and to accurately determine which process is unavailable or resource-hungry, the eG agent should be configured to search for processes based on the process path and/or process arguments, not just the process name; in other words, the eG agent should be configured not to use perfmon.

    To achieve this, first set the wide parameter to true. This instructs the eG agent not to use perfmon to search for the configured process patterns. Then, configure a processpattern that includes the process arguments and/or the process path in addition to the process name. For instance, if the Remote Access Connection Manager service and the Terminal Services service on a Windows host, which share the same name (svchost), are to be monitored as two different processes, then your processpattern specification should be as follows:

    Terminal:C:\WINDOWS\System32\svchost -k DcomLaunch,Remote:C:\WINDOWS\system32\svchost.exe -k netsvcs 

    You can also use wildcard characters, wherever required. For instance, in the above case, your processpattern can also be:

    Terminal:*svchost -k DcomLaunch,Remote:*svchost.exe -k netsvcs

    Similarly, to distinctly monitor two processes having the same name, but operating from different locations, your specification can be:

    JavaC:c:\javaapp\java.exe,JavaD:d:\app\java.exe

    Note:

    • Before including process paths and/or arguments in your processpattern configuration, make sure that the wide parameter is set to true. If not, the test will not work.
    • If your processpattern configuration includes a process path that refers to the Program Files directory, make sure that you do not include a ~ (tilde) while specifying this directory name. For instance, your processpattern specification should not be, say, Adobe:C:\Progra~1\Adobe\AcroRd32.exe; spell out the directory name in full instead (Adobe:C:\Program Files\Adobe\AcroRd32.exe).
  11. useps - This flag applies only to AIX LPARs. By default, on AIX LPARs, this test uses the tprof command to compute the CPU usage of processes. Accordingly, the useps flag is set to No by default. On some AIX LPARs, however, the tprof command may not function properly (this is an AIX issue). When monitoring such AIX LPARs, you can configure the test to use the ps command for metrics collection instead. To do so, set the useps flag to Yes.

    Note:

    Alternatively, you can set the AIXusePS flag in the [agent_settings] section of the eg_tests.ini file (in the <eg_install_dir>\manager\config directory) to yes (default: no) to enable the eG agent to use the ps command for CPU usage computations on AIX LPARs. If this global flag and the useps flag for a specific component are both set to no, then the test will use the default tprof command to compute CPU usage of processes executing on AIX LPARs. If either of these flags is set to yes, then the ps command will perform the CPU usage computations for such processes.

    In some high-security environments, the tprof command may require special privileges to execute on an AIX LPAR (e.g., sudo may be needed to run tprof). In such cases, you can prefix the tprof command with another command (like sudo) or with the full path to a script that grants the required privileges to tprof. To achieve this, edit the eg_tests.ini file (in the <eg_install_dir>\manager\config directory), provide the prefix of your choice against the AixTprofPrefix parameter in the [agent_settings] section, and save the file. For instance, if you set the AixTprofPrefix parameter to sudo, the eG agent will call the tprof command as sudo tprof.
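
The interaction between the per-component useps flag, the global AIXusePS flag, and the AixTprofPrefix setting can be summarized in a short Python sketch. It assumes that eg_tests.ini uses plain key=value lines that Python's configparser can read, and it returns only the command name (ps, or tprof with its prefix); the actual arguments the eG agent passes to these commands are not covered here. cpu_usage_command is a hypothetical helper, not part of eG Enterprise.

```python
import configparser
import shlex
from typing import List


def cpu_usage_command(ini_text: str, component_useps: str) -> List[str]:
    """Hypothetical helper: pick the command used for CPU usage on an AIX LPAR.

    ini_text        -- contents of eg_tests.ini (assumed key=value format)
    component_useps -- value of the test's useps flag ("Yes" or "No")
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    settings = parser["agent_settings"] if parser.has_section("agent_settings") else {}

    # configparser treats option names case-insensitively; values are normalized here.
    global_useps = settings.get("AIXusePS", "no").strip().lower()
    if "yes" in (global_useps, component_useps.strip().lower()):
        # Either flag set to yes switches CPU computations to ps.
        return ["ps"]

    # Otherwise tprof is used, optionally preceded by the configured prefix
    # (for example sudo, or the full path to a privilege-granting script).
    prefix = settings.get("AixTprofPrefix", "").strip()
    return shlex.split(prefix) + ["tprof"]
```

For example, with AIXusePS=no and AixTprofPrefix=sudo in [agent_settings], a component whose useps flag is No yields ["sudo", "tprof"], matching the sudo tprof invocation described above, while setting either flag to yes yields ["ps"].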
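
The PROCESS syntax described under parameter 4 (comma-separated processName:processPattern pairs, where '*' stands for any number of characters) can be illustrated with a small Python sketch. It is not the eG agent's implementation. It assumes that a pattern must match the entire process string, that commas never occur inside a pattern, and that matching is case-sensitive unless told otherwise (the documentation states case sensitivity only for Unix). The function names are hypothetical.

```python
import re
from typing import List, Tuple


def parse_process_spec(spec: str) -> List[Tuple[str, str]]:
    """Split a PROCESS value into (processName, processPattern) pairs.

    Only the first ':' separates the name from the pattern, so Windows
    paths such as C:\\WINDOWS\\... stay intact inside the pattern.
    """
    pairs = []
    for entry in spec.split(","):
        name, sep, pattern = entry.partition(":")
        if not sep or not name.strip() or not pattern.strip():
            raise ValueError(f"expected processName:processPattern, got {entry!r}")
        pairs.append((name.strip(), pattern.strip()))
    return pairs


def pattern_to_regex(pattern: str, case_sensitive: bool = True) -> "re.Pattern":
    # Every '*' becomes ".*"; all other characters, including backslashes,
    # are matched literally.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body, 0 if case_sensitive else re.IGNORECASE)


def matches(pattern: str, process: str, case_sensitive: bool = True) -> bool:
    """True if the whole process string matches the wildcard pattern."""
    return pattern_to_regex(pattern, case_sensitive).fullmatch(process) is not None
```

With this sketch, *cmstart* matches any process string containing cmstart, while java* matches only strings that begin with java.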
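
On Unix hosts, the ps-based discovery step suggested under parameter 4 can be scripted. The Python sketch below runs ps -e -o pid,args, parses its output into (PID, command line) pairs, and filters them with a case-sensitive substring search so that you can pick a suitable pattern. The helper names are hypothetical, and the parser assumes the usual ps layout of one header line followed by a PID and the full argument list on each line.

```python
import subprocess
from typing import List, Tuple

Row = Tuple[int, str]  # (PID, full command line)


def parse_ps_output(text: str) -> List[Row]:
    """Parse the output of `ps -e -o pid,args` into (pid, args) pairs."""
    rows: List[Row] = []
    for line in text.splitlines()[1:]:  # skip the "PID COMMAND" header line
        fields = line.split(None, 1)
        if not fields:
            continue  # ignore blank lines
        args = fields[1].strip() if len(fields) > 1 else ""
        rows.append((int(fields[0]), args))
    return rows


def list_processes() -> List[Row]:
    """Run ps on the local Unix host and return its parsed output."""
    result = subprocess.run(
        ["ps", "-e", "-o", "pid,args"], capture_output=True, text=True, check=True
    )
    return parse_ps_output(result.stdout)


def find_candidates(rows: List[Row], keyword: str) -> List[Row]:
    """Keep rows whose command line contains keyword (case-sensitive, as on Unix)."""
    return [row for row in rows if keyword in row[1]]
```

For example, find_candidates(list_processes(), "httpd") lists the Apache processes on the host, from which a pattern such as *httpd* can be derived.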
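
The user-to-process pairing rules described under parameter 5 can be expressed as a short Python function. The function name is hypothetical, and the value none is treated as "any user", as the text describes. The documentation does not say what happens when more than one user is configured and the number of users differs from the number of processes, so this sketch simply rejects that case.

```python
from typing import List, Optional, Tuple


def pair_processes_with_users(
    processes: List[str], users: List[str]
) -> List[Tuple[str, Optional[str]]]:
    """Return the (processName, user) pairs the test would check.

    - users == ["none"]: every process, run by any user (None)
    - a single user:     that user for every process
    - several users:     positional pairing (first with first, and so on)
    """
    if users == ["none"]:
        return [(p, None) for p in processes]
    if len(users) == 1:
        return [(p, users[0]) for p in processes]
    if len(users) != len(processes):
        # Assumption: a length mismatch is treated as a configuration error.
        raise ValueError("number of users must match number of processes")
    return list(zip(processes, users))
```

To check java.exe for both james and jane, the process list is Process1,Process2 (both with pattern java.exe) and the user list is james,jane, giving the pairs (Process1, james) and (Process2, jane).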
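
The ALARMTYPE choices under parameter 7 amount to a simple severity filter, sketched below in Python. The option strings mirror the list box values quoted above; the function itself is hypothetical. It also honours the CORRECT flag, since no corrective action is taken unless CORRECT is TRUE.

```python
# Alarm priorities that trigger the corrective script for each ALARMTYPE option.
ALARMTYPE_OPTIONS = {
    "Critical": {"Critical"},
    "Critical/Major": {"Critical", "Major"},
    "Critical/Major/Minor": {"Critical", "Major", "Minor"},
}


def should_run_corrective_script(correct: bool, alarmtype: str, alarm_priority: str) -> bool:
    """Hypothetical check: does an alarm of alarm_priority trigger the script?"""
    if not correct:
        return False  # auto-correction is disabled (CORRECT = FALSE)
    return alarm_priority in ALARMTYPE_OPTIONS[alarmtype]
```

With ALARMTYPE set to Critical/Major, a Major alarm triggers the script but a Minor alarm does not.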
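
The two USERPARAMS formats under parameter 8 can be told apart as in the following Python sketch, which maps each configured processName to the command that would be handed to the corrective script. Two points are assumptions not stated in the documentation: multiple exec@ entries are separated by commas, and any value that does not start with exec@ is treated as the single-command format. The function name is hypothetical.

```python
from typing import Dict, List


def commands_for_processes(userparams: str, process_names: List[str]) -> Dict[str, str]:
    """Map each configured processName to its corrective command (hypothetical helper)."""
    value = userparams.strip()
    # Values containing blanks are wrapped in double quotes; remove them.
    if len(value) >= 2 and value[0] == value[-1] == '"':
        value = value[1:-1]

    if not value.startswith("exec@"):
        # Format 2: one command covers every configured process.
        return {name: value for name in process_names}

    # Format 1: exec@processName:command entries (assumed comma-separated).
    commands: Dict[str, str] = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry.startswith("exec@"):
            raise ValueError(f"expected exec@processName:command, got {entry!r}")
        name, sep, command = entry[len("exec@"):].partition(":")
        if sep and name in process_names:
            commands[name] = command
    # Processes missing from the result get no auto-correction.
    return commands
```

Because only the first ':' after the process name is used as the separator, Windows commands such as C:\scripts\restart.bat survive intact.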
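
To see why the wide parameter matters on Windows, the following Python sketch contrasts name-only matching (roughly what perfmon does when wide is false) with matching against the full path and arguments (wide set to true). Processes are modelled as hypothetical (path, arguments) pairs, '*' is the only wildcard, a pattern must match the whole string, and matching is assumed to be case-insensitive on Windows; none of these details comes from the eG documentation. The sketch also flags short names such as Progra~1, which the note on the Program Files directory says must not be used.

```python
import ntpath
import re
from typing import List, Tuple

Process = Tuple[str, str]  # (full executable path, arguments); hypothetical model


def wildcard_match(pattern: str, text: str) -> bool:
    """'*' matches any run of characters; everything else is literal."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, text, re.IGNORECASE) is not None


def command_line(proc: Process) -> str:
    path, args = proc
    return f"{path} {args}" if args else path


def match_by_name(pattern: str, procs: List[Process]) -> List[Process]:
    """wide = false: only the executable name is compared, as perfmon does."""
    return [p for p in procs if wildcard_match(pattern, ntpath.basename(p[0]))]


def match_by_command_line(pattern: str, procs: List[Process]) -> List[Process]:
    """wide = true: the full path and arguments are compared."""
    return [p for p in procs if wildcard_match(pattern, command_line(p))]


def uses_short_name(pattern: str) -> bool:
    """Flag short directory names such as Progra~1, which must be avoided."""
    return "~" in pattern
```

With two java.exe processes started from c:\javaapp and d:\app, match_by_name("java.exe", ...) returns both, whereas match_by_command_line(r"c:\javaapp\java.exe", ...) returns only the first, which is what lets the test report each application separately.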

Measurements made by the test

Processes running
  Description: Number of instances of the process(es) matching the configured pattern that are currently executing on the host.
  Measurement unit: Number
  Interpretation: A significant change in the value of this measure indicates a problem situation.

CPU utilization
  Description: Percentage of CPU used by the executing process(es) corresponding to the specified pattern.
  Measurement unit: Percent
  Interpretation: A very high value could indicate that processes corresponding to the specified pattern are consuming excessive CPU resources.

Memory utilization
  Description: For one or more processes corresponding to a specified set of patterns, the ratio of the resident set size of the processes to the physical memory of the host system, expressed as a percentage.
  Measurement unit: Percent
  Interpretation: A sudden increase in memory utilization for the process(es) may indicate memory leaks in the application.
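
As a worked example of the Memory utilization measure, the Python sketch below sums the resident set sizes of the matching process instances and divides the total by the host's physical memory. Summing the RSS values of all instances is an assumption about how the test combines multiple processes; the function name and the sample figures are hypothetical.

```python
from typing import Iterable


def memory_utilization(rss_bytes: Iterable[int], physical_memory_bytes: int) -> float:
    """Total RSS of the matching processes as a percentage of physical memory."""
    if physical_memory_bytes <= 0:
        raise ValueError("physical memory must be positive")
    return 100.0 * sum(rss_bytes) / physical_memory_bytes
```

For three instances using 512 MiB, 256 MiB, and 256 MiB on a host with 8 GiB of RAM, the total RSS is 1 GiB, so memory_utilization returns 12.5 (percent).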