Processes Test

Application processes can be identified based on specific patterns. For example, web server processes can be identified by the pattern *httpd*, while DNS server processes can be specified by the pattern *named*, where * denotes zero or more characters. For each such pattern, the Processes test reports a variety of CPU and memory statistics.

Target of the test : Any application server

Agent deploying the test : An internal agent

Outputs of the test : One set of results per process pattern specified

Configurable parameters for the test
TEST PERIOD - How often should the test be executed
Host - The host for which the test is to be configured
Port - The port to which the specified host listens
Process - In the Process text box, enter a comma-separated list of name:pattern pairs, each of the form processName:processPattern, which identify the process(es) associated with the server being considered. processName is a string that will be used for display purposes only. processPattern is an expression of the form *expr* or expr or *expr or expr* or *expr1*expr2*... or expr1*expr2, etc. A leading '*' signifies any number of leading characters, while a trailing '*' signifies any number of trailing characters. The pattern(s) used vary from one application to another and must be configured per application.
For example, for an iPlanet application server (Nas_server), there are three processes named kcs, kjs, and kxs associated with the application server. For this server type, in the Process text box, enter "kcsProcess:*kcs*, kjsProcess:*kjs*, kxsProcess:*kxs*", where * denotes zero or more characters. Other special characters such as slashes (/ or \) can also be used while defining the process pattern. For example, if a server’s root directory is /home/egurkha/apache and the server executable named httpd exists in the bin directory, then the process pattern is “/home/egurkha/apache/bin/httpd”.
Note: The process parameter supports process patterns containing the ~ character.
To determine the process pattern to use for your application on Windows environments, look for the process name(s) in the Task Manager -> Processes selection. To determine the process pattern to use on Unix environments, use the ps command (e.g., the command "ps -e -o pid,args" lists the processes running on the target system; from this list, choose the processes of interest to you).
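The name:pattern matching described above can be illustrated with a minimal Python sketch. It is not the eG agent's implementation: the function names, the sample process list, and the choice to require a pattern to match the whole command line are assumptions made for illustration. It also assumes that neither names nor patterns contain commas.

```python
import re


def parse_process_spec(spec):
    """Split 'name:pattern,name:pattern' into (name, pattern) tuples."""
    pairs = []
    for item in spec.split(","):
        # Split on the first colon only, so a drive letter in a Windows path survives.
        name, _, pattern = item.strip().partition(":")
        pairs.append((name.strip(), pattern.strip()))
    return pairs


def pattern_to_regex(pattern, ignore_case=True):
    """Translate a pattern in which '*' means zero or more characters."""
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    flags = re.DOTALL
    if ignore_case:
        flags |= re.IGNORECASE
    return re.compile(body, flags)


def count_matches(pattern, command_lines, ignore_case=True):
    """Count the command lines that the pattern matches in full (assumed semantics)."""
    regex = pattern_to_regex(pattern, ignore_case)
    return sum(1 for line in command_lines if regex.fullmatch(line))


# Hypothetical process list, e.g. as gathered from "ps -e -o pid,args".
running = [
    "/opt/nas/bin/kcs -x",
    "/opt/nas/bin/kjs",
    "/opt/nas/bin/KJS -y",
    "/usr/sbin/sshd",
]
spec = "kcsProcess:*kcs*, kjsProcess:*kjs*, kxsProcess:*kxs*"
for name, pattern in parse_process_spec(spec):
    print(name, count_matches(pattern, running))
```

Only '*' is treated as a wildcard here; every other character, including slashes, is matched literally.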
Also, while monitoring processes on Windows, if the wide parameter of this test is set to true, then your process patterns can include the full path to the process and/or the arguments supported by the process. For instance, your processpattern specification can be as follows:
Terminal:C:\WINDOWS\System32\svchost -k DcomLaunch,Remote:C:\WINDOWS\system32\svchost.exe -k netsvcs
To save the time and effort involved in such manual process specification, eG Enterprise offers an easy-to-use auto-configure option in the form of a View/Configure button that is available next to the PROCESS text box. Refer to Auto-configuring the Process Patterns to be Monitored to know how to use the auto-configure option.
ignorecase - This parameter is applicable to Unix environments alone. By default, this parameter is set to Yes, indicating that the test will monitor the process names/patterns configured against the process parameter in a case-insensitive manner. In other words, the test will report the count and resource usage of all processes that match the configured process name/pattern, even if their cases do not match. For instance, if the process parameter is configured with Apache:apache, then the test will monitor the process named apache and the one named APACHE by default. If, on the other hand, you want process monitoring to be performed in a case-sensitive manner, then set this flag to No.
user - By default, this parameter has a value "none"; this means that the test monitors all processes that match the configured patterns, regardless of the user executing them. If you want the test to monitor the processes for specific users alone, then, on Unix platforms, specify a comma-separated list of users to be monitored in the user text box.
For instance: john,elvis,sydney
While monitoring Windows hosts, on the other hand, your user configuration should be a comma-separated list of "domain name-user name" pairs, where every pair is expressed in the following format: Domainname\Username. For example, to monitor the processes of users john and elvis, who belong to domain mas, your user specification should be: mas\john,mas\elvis. Also, on a Windows host, you will find system processes running on the following user accounts: SYSTEM, LOCAL SERVICE, and NETWORK SERVICE. While configuring these user accounts, make sure the Domainname is always NT AUTHORITY. In this case, therefore, your user specification will be: NT AUTHORITY\SYSTEM,NT AUTHORITY\LOCAL SERVICE,NT AUTHORITY\NETWORK SERVICE.
If multiple processes are configured for monitoring and multiple users are also configured, then the test will check whether the first process is run by the first user, the second process by the second user, and so on. For instance, if the processes configured are java:java.exe,apache:httpd and the users configured are john,elvis, then the test will check whether user john is running the process java, and user elvis is running the process apache. Similarly, if multiple processes are configured, but a single user alone is configured, then the test will check whether the specified user runs each of the configured processes.
However, if you want to check whether a single process, say java.exe, is run by multiple users - say, james and jane - then you have to do the following:
- Your user specification should be: james,jane (if the target host is a Unix host), or <Domainname>\james,<Domainname>\jane (if the target host is a Windows host)
- Your process configuration should be: Process1:java.exe,Process2:java.exe. The number of processes in this case should match the number of users.
Such a configuration will ensure that the test checks for the java.exe process for both the users, james and jane.
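The pairing rules for the process and user parameters can be summarised in a short Python sketch. The function name is hypothetical, and raising an error when the two lists differ in length is an assumption; the text only says that the counts should match.

```python
def pair_processes_with_users(process_names, users):
    """Return (process, user) pairs; a user of None means 'any user'."""
    if not users or users == ["none"]:
        # Default value "none": monitor matching processes regardless of owner.
        return [(name, None) for name in process_names]
    if len(users) == 1:
        # A single user is checked against every configured process.
        return [(name, users[0]) for name in process_names]
    if len(users) != len(process_names):
        # Assumed behaviour: reject mismatched lists instead of guessing.
        raise ValueError("number of processes must match number of users")
    # First process with first user, second with second, and so on.
    return list(zip(process_names, users))


print(pair_processes_with_users(["java", "apache"], ["john", "elvis"]))
print(pair_processes_with_users(["Process1", "Process2"], ["james", "jane"]))
```

The second call mirrors the java.exe example: the same pattern is configured twice under different display names so that each user gets a pair of its own.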
CORRECT - Increased uptime and lower mean time to repair are critical to ensuring that IT infrastructures deliver a high quality of service to users. Towards this end, eG Enterprise embeds an optional auto-correction capability that enables eG agents to automatically correct problems in the environment as soon as they occur. With this capability, as and when an abnormal situation is detected, an eG agent can initiate corrective actions automatically to resolve the problem. Automatic correction without the need for manual intervention by IT operations staff reduces service downtime and improves operational efficiency. By default, the auto-correction capability is available in eG Enterprise for the Processes running measure of the Processes test, and the Service availability measure of the WindowsServices test. eG Enterprise includes a default auto-correction script for the Processes test. When a process that has been configured for monitoring stops, this script automatically executes and starts the process. To enable the auto-correction capability for the Processes test, first select the TRUE option against the CORRECT parameter in this page (by default, FALSE will be selected here).
ALARMTYPE - Upon selecting the TRUE option, three new parameters, namely ALARMTYPE, USERPARAMS, and CORRECTIVESCRIPT, will appear. You can set the corrective script to execute when a specific type of alarm is generated, by selecting an option from the ALARMTYPE list box. For example, if the Critical option is chosen from the ALARMTYPE list box, then the corrective script will run only when a critical alarm for the Processes test is generated. Similarly, if the Critical/Major option is chosen, then the corrective script will execute only when the eG Enterprise system generates critical or major alarms for the Processes test. To ensure that the corrective script executes regardless of the alarm type, select the Critical/Major/Minor option.
USERPARAMS - The user-defined parameters that are to be passed to the corrective script are specified in the USERPARAMS text box. One of the following formats can be applied to the USERPARAMS specification:
- exec@processName:command: In this specification, processName is the display name of the process pattern specified against the PROCESS parameter, and command is the command to be executed by the default script when the process(es) represented by the processName stops. For example, assume that the PROCESS parameter of the Processes test has been configured in the following manner: Apache:/opt/egurkha/manager/apache/bin/httpd,Tomcat:javatomcat, where Apache and Tomcat are the processNames or display names of the configured patterns. If auto-correction is enabled for these processes, then the USERPARAMS specification can be as follows:
exec@Apache:/opt/egurkha/manager/apache/bin/apachectl start,Tomcat: /opt/tomcat/bin/catalina.sh start
This indicates that if the processes configured under the processName "Apache" stop (i.e., /opt/egurkha/manager/apache/bin/httpd), then the script will automatically execute the command "/opt/egurkha/manager/apache/bin/apachectl start" to start the processes. Similarly, if the "Tomcat" processes (i.e., javatomcat) stop, the script will execute the command "/opt/tomcat/bin/catalina.sh start" to start the processes.
- command: In this specification, command signifies the command to be executed when any of the processes configured for monitoring stops. Such a format best suits situations where only a single process has been configured for monitoring, or a single command is capable of starting all the configured processes. For example, assume that the PROCESS parameter has been configured to monitor IISWebSrv:inetinfo. Since only one process requires monitoring, the first format need not be used for configuring the USERPARAMS. Therefore, simply specify the command, "net start World Wide Web Publishing Service".
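To make the two USERPARAMS formats concrete, here is a minimal Python sketch of how such a value could be interpreted. The function names are hypothetical and this is not the logic of the default corrective script; it also assumes that commands contain no commas.

```python
def parse_userparams(value):
    """Map display names to start commands; the key None means 'all processes'."""
    value = value.strip().strip('"')
    if not value.startswith("exec@"):
        # Second format: one command for every configured process.
        return {None: value}
    commands = {}
    for entry in value[len("exec@"):].split(","):
        # Split on the first colon only, so commands may contain colons.
        name, _, command = entry.partition(":")
        commands[name.strip()] = command.strip()
    return commands


def command_for(commands, process_name):
    """Return the start command for a display name, or None if there is none."""
    if None in commands:
        return commands[None]
    return commands.get(process_name)


params = parse_userparams(
    "exec@Apache:/opt/egurkha/manager/apache/bin/apachectl start,"
    "Tomcat: /opt/tomcat/bin/catalina.sh start"
)
print(command_for(params, "Tomcat"))
```

In this sketch, a display name without an entry in the first format yields no command at all.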
Note: The USERPARAMS specification should be placed within double quotes if this value includes one or more blank spaces (e.g., "Apache:/opt/egurkha/bin/apachectl start"). Note that if a processName configured in the PROCESS parameter does not have a corresponding entry in USERPARAMS (as discussed in format 1), then the auto-correction capability will not be enabled for these processes.
CORRECTIVESCRIPT - Specify none in the CORRECTIVESCRIPT text box to use the default auto-correction script. Administrators can build new auto-correction capabilities to address probable issues with other tests, by writing their own corrective scripts. To know how to create custom auto-correction scripts, refer to the eG User Manual.
wide - This parameter is valid on Solaris, Windows, and Linux systems only. On Solaris systems (before v11), if the value of the wide parameter is Yes, the eG agent will use /usr/ucb/ps instead of /usr/bin/ps to search for processes executing on the host. In Solaris 11, the eG agent uses the /usr/bin/ps auxwww command to perform the process search. The /usr/ucb/ps and the /usr/bin/ps auxwww commands provide a long output (> 80 characters), whereas /usr/bin/ps only outputs the first 80 characters of the process path and its arguments. However, some Solaris systems are configured with tightened security, which prevents the /usr/ucb/ps and/or the /usr/bin/ps auxwww command from being executed by any and every user of the system; in other words, only pre-designated users will be allowed to execute this command. The sudo (superuser do) utility (see http://www.gratisoft.us/sudo/) can be used to allow designated users to execute this command. If your system uses sudo to restrict access to the commands that return a long output, then set wide to Yes and then specify the value sudo for the keonizedservercmd parameter.
This will ensure that not only does the agent use the /usr/ucb/ps and/or the /usr/bin/ps auxwww command (as the case may be) to monitor processes (as it would if the wide parameter were set to Yes), but it also uses sudo to execute this command.
Note: If the Processes test on Solaris 11 fails, then do the following:
- Check whether the wide parameter is set to Yes.
- If so, then make sure that the keonizedservercmd parameter is set to sudo.
- If the test still fails, then look for the following error in the error_log file (that resides in the /opt/egurkha/agent/logs directory) on the eG agent host:
ERROR ProcessTest: ProcessTest failed to execute [sudo: pam_authenticate: Conversation failure]
The aforesaid error occurs if the sudo command prompts for a password at runtime. If you find such an error in the error_log file, then open the sudoers file on the target host and append an entry of the following format to it:
Defaults:<eG_Install_Username> !authenticate
For instance, if eguser is the eG install user, then your entry will be:
Defaults:eguser !authenticate
This entry will make sure that you are no longer prompted for a password. Save the file and restart the eG agent.
On Windows environments, by default, the eG agent uses perfmon to search for the processes that match the configured patterns. Accordingly, the wide parameter is set to false by default. Typically, a process definition in Windows includes the full path to the process, the process name, and process arguments (if any). Perfmon, however, scans the system only for process names that match the configured patterns; in other words, the process path and arguments are ignored by perfmon. This implies that if multiple processes on a Windows host have the same name as specified against processpattern, then perfmon will only be able to report the overall resource usage across all these processes; it will not provide any pointers to the exact process that is eroding the host’s resources.
To understand this better, consider the following example. Typically, Windows represents any Java application executing on it as java.exe. Say, two Java applications are executing on a Windows host, but from different locations. If java.exe has been configured for monitoring, then by default, perfmon will report the availability and average resource usage of both the Java applications executing on the host. If, say, one Java application goes down, then perfmon will not be able to indicate accurately which of the two Java applications is currently inaccessible.
Therefore, to enable administrators to easily differentiate between processes with the same name, and to accurately determine which process is unavailable or resource-hungry, the eG agent should be configured to perform its process searches based on the process path and/or process arguments, and not just on the process name; in other words, the eG agent should be configured not to use perfmon. To achieve this, first set the wide parameter to Yes. This will instruct the eG agent to not use perfmon to search for the configured process patterns. Once this is done, you can proceed to configure a processpattern that includes the process arguments and/or the process path, in addition to the process name. For instance, if both the Remote Access Connection Manager service and the Terminal Services service on a Windows host, which share the same name (svchost), are to be monitored as two different processes, then your processpattern specification should be as follows:
Terminal:C:\WINDOWS\System32\svchost -k DcomLaunch,Remote:C:\WINDOWS\system32\svchost.exe -k netsvcs
You can also use wildcard characters, wherever required.
For instance, in the above case, your processpattern can also be:
Terminal:*svchost -k DcomLaunch,Remote:*svchost.exe -k netsvcs
Similarly, to distinctly monitor two processes having the same name, but operating from different locations, your specification can be:
JavaC:c:\javaapp\java.exe,JavaD:d:\app\java.exe
Note:
- Before including process paths and/or arguments in your processpattern configuration, make sure that the wide parameter is set to Yes. If not, the test will not work.
- If your processpattern configuration includes a process path that refers to the Program Files directory, then make sure that you do not include a ~ (tilde) while specifying this directory name. For instance, your processpattern specification should not be, say, Adobe:C:\Progra~1\Adobe\AcroRd32.exe.
keonizedservercmd - On Solaris hosts, this test takes an additional KEONIZEDSERVERCMD parameter. Keon is a security mechanism that can be used with a multitude of operating systems to provide a centralized base for user account and password management, user access and inactivity control, system integrity checking, and auditing. If the Keon security model is in use on the Solaris host being monitored, then this test may require special user privileges for executing the operating system commands. In such a case, specify the exact command that the test is permitted to execute, in the KEONIZEDSERVERCMD text box. For example, if the keon command to be executed by the test is sudo, specify sudo in the KEONIZEDSERVERCMD text box. Alternatively, you can even specify the full path to the sudo command in the KEONIZEDSERVERCMD text box. On the other hand, if a Keon security model is not in place, then set the KEONIZEDSERVERCMD parameter to none.
useglance - This flag applies only to HP-UX systems. HP GlancePlus/UX is Hewlett-Packard’s online performance monitoring and diagnostic utility for HP-UX based computers.
There are two user interfaces of GlancePlus/UX -- Glance is character-based, and gpm is motif-based. Each contains graphical and tabular displays that depict how primary system resources are being utilized. In environments where Glance is run, the eG agent can be configured to integrate with Glance to pull out the process status and resource usage metrics from the HP-UX systems that are being monitored. By default, this integration is disabled. This is why the useglance flag is set to No by default. You can enable the integration by setting the flag to Yes. If this is done, then the test polls the Glance interface of the HP GlancePlus/UX utility to pull out the desired metrics.
useps - This flag is applicable only for AIX LPARs. By default, on AIX LPARs, this test uses the tprof command to compute CPU usage of the processes on the LPARs. Accordingly, the useps flag is set to No by default. On some AIX LPARs, however, the tprof command may not function properly (this is an AIX issue). While monitoring such AIX LPARs, therefore, you can configure the test to use the ps command instead for metrics collection. To do so, set the useps flag to Yes.
Note: Alternatively, you can set the AIXusePS flag in the [AGENT_SETTINGS] section of the eg_tests.ini file (in the <EG_INSTALL_DIR>\manager\config directory) to yes (default: no) to enable the eG agent to use the ps command for CPU usage computations on AIX LPARs. If this global flag and the useps flag for a specific component are both set to no, then the test will use the default tprof command to compute CPU usage of processes executing on AIX LPARs. If either of these flags is set to yes, then the ps command will perform the CPU usage computations for such processes.
In some high-security environments, the tprof command may require some special privileges to execute on an AIX LPAR (e.g., sudo may need to be used to run tprof).
In such cases, you can prefix the tprof command with another command (like sudo) or the full path to a script that grants the required privileges to tprof. To achieve this, edit the eg_tests.ini file (in the <EG_INSTALL_DIR>\manager\config directory), and provide the prefix of your choice against the AixTprofPrefix parameter in the [AGENT_SETTINGS] section. Finally, save the file. For instance, if you set the AixTprofPrefix parameter to sudo, then the eG agent will call the tprof command as sudo tprof.
use top - This parameter is applicable only to Linux platforms. By default, this parameter is set to No. This indicates that, by default, this test will report process health metrics by executing the /usr/bin/ps command on Linux. In some Linux environments, however, this command may not function properly. In such cases, set the USE TOP parameter to Yes. This will enable this test to collect metrics using the /usr/bin/top command.
ISPASSIVE - If the value chosen is Yes, then the server under consideration is a passive server in a cluster. No alerts will be generated if the server is not running. Measures will be reported as "Not applicable" by the agent if the server is not up.
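Among the parameters above, the AIX settings interact with each other: the per-test useps flag, the global AIXusePS flag, and the AixTprofPrefix setting together decide which command computes CPU usage. The following Python sketch restates that decision. The function and argument names are hypothetical, and it only returns the command string rather than running anything.

```python
def aix_cpu_usage_command(useps=False, global_aix_use_ps=False, tprof_prefix=None):
    """Return the command used to compute per-process CPU usage on an AIX LPAR."""
    # ps is used if either the test-level or the global flag is set to yes.
    if useps or global_aix_use_ps:
        return "ps"
    # Otherwise tprof is used, optionally behind a privilege-granting prefix.
    if tprof_prefix:
        return f"{tprof_prefix} tprof"
    return "tprof"


print(aix_cpu_usage_command())
print(aix_cpu_usage_command(global_aix_use_ps=True))
print(aix_cpu_usage_command(tprof_prefix="sudo"))
```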

Measurements made by the test
Measurement	Description	Measurement Unit	Interpretation
Processes running:	Number of instances of a process(es) currently executing on a host.	Number	This value indicates if too many or too few processes corresponding to an application are executing on the host.
CPU utilization:	Percentage of CPU used by executing process(es) corresponding to the pattern specified.	Percent	A very high value could indicate that processes corresponding to the specified pattern are consuming excessive CPU resources.
Memory utilization:	For one or more processes corresponding to a specified set of patterns, this value represents the ratio of the resident set size of the processes to the physical memory of the host system, expressed as a percentage.	Percent	A sudden increase in memory utilization for a process(es) may be indicative of memory leaks in the application.
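The Memory utilization measure is defined as a ratio, which the following Python sketch computes. The function name and the use of bytes as the unit are assumptions; the resident set sizes of all processes matching one pattern are summed before dividing by the host's physical memory.

```python
def memory_utilization_percent(rss_bytes_per_process, physical_memory_bytes):
    """Resident set size of the matched processes as a percentage of physical memory."""
    if physical_memory_bytes <= 0:
        raise ValueError("physical memory must be positive")
    return 100.0 * sum(rss_bytes_per_process) / physical_memory_bytes


# Two matched processes of 512 MiB each on a host with 8 GiB of memory.
mib = 1024 ** 2
print(memory_utilization_percent([512 * mib, 512 * mib], 8 * 1024 * mib))
```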

Note:

The default configurations of the Processes test are applicable for JRun server 4.0. However, if you are monitoring a JRun server 3.0, you would have to modify the default configurations.
In JRun server 3.0, 2 processes are associated with the admin and default servers. They are, "jrun.exe" and "javaw.exe" respectively in Windows and "jrun" and "javaw" in Unix.
Similarly, the JRun Server 4.0 has two default processes, one running for the admin server and the other for the default server. These processes are, namely, "jrun.exe" in Windows and "jrun" in Unix. When you add a new server instance, these processes get created automatically with the same names as mentioned above.
Special characters that are not allowed as part of your manual pattern specifications are as follows:
- ` (Grave Accent)
- | (Vertical bar)
- < (less than)
- > (greater than)
- ~ (tilde)
- @ (at)
- # (hash)
- % (Percent)
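A manually typed pattern can be checked against the list of disallowed characters above before it is saved. The Python sketch below is only an illustration of such a check; the function name is hypothetical and it is not part of eG Enterprise.

```python
# Characters listed above as not allowed in manual pattern specifications.
DISALLOWED = set("`|<>~@#%")


def disallowed_characters(pattern):
    """Return the disallowed characters that occur in a pattern, sorted."""
    return sorted(set(pattern) & DISALLOWED)


print(disallowed_characters("C:\\Progra~1\\Adobe\\AcroRd32.exe"))
print(disallowed_characters("*httpd*"))
```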

Note:

Administrators can extend the built-in auto-correction capabilities to address probable issues with the other measures of the Processes test, by writing their own corrective scripts for the same. The custom-defined script can be associated with the Processes test in the same manner discussed above.
The name of the custom-defined script should be of the following format: InternalTestName_InternalMeasureName. For example, a script that is written to correct problems with the CPU utilization measure (of the Processes test) should be named "ProcessTest_Cpu_util", where ProcessTest is the internal name of the Processes test, and Cpu_util is the internal name for the CPU utilization measure. To know the internal names of tests and measures, use any of the eg_lang*.ini files in the <EG_INSTALL_DIR>\manager\config directory. The script extension will differ according to the operating system on which the script will execute. The extensions supported by Windows environments are: .bat, .exe, .com, and .cmd. Scripts to be executed on Unix environments do not require any extension; the most commonly used extension is .sh.
At any given point of time, only one script can be specified in the CORRECTIVESCRIPT text box.
As already stated, the sample script for the Processes test will be available for every operating system. If the script is uploaded to the eG manager once for an operating system, it will automatically apply to all the agents executing on the same operating system. For example, say that an environment comprises 3 agents, all executing on Windows 2000 environments. While configuring the Processes test for one of the agents, if the administrator uploads the sample script, then he/she will not have to repeat the process for the other 2 agents.
Once the eG agent downloads a corrective script from the eG manager, any changes made to the script on the manager side will not be reflected on the agent side immediately. This is because the eG agent checks the manager for an updated version of the corrective script only once a day. If an update is available, the agent downloads it and overwrites the existing script.
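The naming rule for custom corrective scripts given in this note can be expressed as a small Python helper. This is a sketch with hypothetical names; rejecting any Windows extension outside the listed four is an assumption drawn from the list of supported extensions.

```python
WINDOWS_EXTENSIONS = (".bat", ".exe", ".com", ".cmd")


def corrective_script_name(internal_test, internal_measure, os_family, extension=""):
    """Build a script name of the form InternalTestName_InternalMeasureName[.ext]."""
    if os_family == "windows" and extension.lower() not in WINDOWS_EXTENSIONS:
        raise ValueError("Windows scripts need one of: " + ", ".join(WINDOWS_EXTENSIONS))
    # On Unix an extension is optional, so an empty string is accepted.
    return f"{internal_test}_{internal_measure}{extension}"


print(corrective_script_name("ProcessTest", "Cpu_util", "unix", ".sh"))
print(corrective_script_name("ProcessTest", "Cpu_util", "windows", ".bat"))
```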

Note:

The Processes test of LDAP servers takes an additional parameter named ispassive. If the value chosen against this parameter is Yes, then the LDAP server under consideration is a passive server in an LDAP cluster. No alerts will be generated if the server is not running. Measures will be reported as “Not applicable” by the agent if the server is not up.

Auto-configuring the Process Patterns to be Monitored

To save the time and effort involved in manual process specification, eG Enterprise offers an easy-to-use auto-configure option in the form of a button that is available next to the PROCESS text box.

To auto-configure the processes to be monitored, do the following:

Click on the icon next to the process text box in the Processes test configuration page (see Figure 79).

Figure 79 : Configuring the Processes test

Note:

The icon will appear only if the following conditions are fulfilled:
- The Processes test must be executed in an agent-based manner.
- The eG agent executing the test should be of version 5.2 or above.
- In case the eG manager in question is part of a redundant manager setup, then the agent executing the test must be reporting metrics to the primary manager only.
When the icon is clicked, a process configuration page will appear (see Figure 80).

Figure 80 : Auto-configuring the processes to be monitored
Upon clicking the Get Processes button in the process configuration page (see Figure 80), a pop up window with a list of processes that are running on the host will be displayed (see Figure 81).

Figure 81 : List of auto-discovered processes

Note:

The processes that are already configured for monitoring will not be listed in Figure 81.
By default, Figure 81 provides a 'concise' view of the process list - i.e., only the process names will be listed in the pop-up window, and not the detailed description of the processes. You can click on the Detailed View button in the pop up window to switch to the detailed view (see Figure 82).

Figure 82 : The detailed view of processes
As you can see, in the detailed view, the complete process path and process arguments accompany each auto-discovered process.
Regardless of the view you are in, select the process or list of processes that require monitoring and click the Submit button in the pop-up window. Note that you can select processes from both the views. If there are too many processes running in the host to choose from, you can narrow your search further by using the Search text box. Specify the whole/part of the process name to search for in this text box, and click the icon next to it.

Note:

The Processes test includes a wide flag that is set to Yes by default. In this case, your process specification can include the process path and arguments (if any). Therefore, if the wide flag is set to Yes, then, the eG agent will report metrics for the process(es) that are selected in both the concise manner and detailed manner. If the WIDE flag is set to No, the eG agent will collect metrics only for the process(es) that are selected in a concise manner.
Clicking the Submit button in the pop-up will automatically populate the name and pattern of the chosen process in the Process Name and Process Pattern columns available in the process configuration page (see Figure 83).

Figure 83 : One/Multiple auto-discovered processes configured for monitoring
You can add more name:pattern pairs in the process configuration page by clicking on the Add button present at the right corner of the page. To remove a specification that pre-exists, just click on the icon that corresponds to it. You can also modify contents of the Process Name and Process Pattern columns by clicking on the icon that corresponds to it.

Note:

Duplicate processes will appear in the list of processes pop-up, provided the process description is different - for instance, if a 'cmd.exe' process and a 'cmd.bat' process execute on the same host, then both processes will be listed as 'cmd' in the 'concise' view of the process list. If such duplicate processes are chosen for monitoring, then, each process will appear as a separate Process Name and Process Pattern pair in the process configuration page. To proceed, the user must enter a different name for each process using the icon, so that every distinct pattern can be identified in a unique manner.
If you want to clear all the processes that you have added for monitoring, select all the processes by clicking the check box provided in the header row, and then click on the icon at the right corner of the table. This action will remove all the processes in a single click. If you want to remove only some of the processes, then select those processes by checking the check boxes that correspond to them and click on the icon. This will remove the chosen processes alone.