eG Helper Process Test

To ensure enterprise-class monitoring, the eG manager includes the capability to monitor its various components and to recover from their failure. When the eG manager is started, a separate recovery process named eGmon is also started. Likewise, when the eG agent is started, a recovery process named eGagentmon starts simultaneously.

The eGmon process periodically connects to the eG manager and accesses its various components, including the eG database. If it detects any problem during such access, the recovery process diagnoses it further. Specifically, the recovery process performs the following actions:

  • If the eG manager is not accessible, the recovery process attempts to restart it. If three successive restart attempts fail, the recovery process sends an alert message to the eG administrator (using the MAIL SENDER ID specified in the Mail Configuration settings of the administration interface).
  • If the eG manager is accessible, the recovery process tests the connections from the eG manager to the database server that it uses. If it detects a problem, it alerts the administrator to potential problems with database server access. By connecting directly to the database server (i.e., without using any other eG manager components), the recovery process then determines whether the database access problem is caused by a database failure or by the eG manager's pool of database connections being insufficient for the current load on the manager.
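The checks described above can be summarised as a single check cycle. The following Java code is a minimal sketch under stated assumptions, not eGmon's actual implementation: the Hooks interface and all of its methods are hypothetical placeholders for the real accessibility test, restart routine, database probes, and mail alert, and for simplicity the three restart attempts are made within one cycle, whereas eGmon may spread them across successive cycles.

```java
/**
 * Hypothetical sketch of one eGmon-style check cycle, following the behaviour
 * described above. Hooks is an illustrative interface, not an eG API.
 */
final class RecoveryCycle {
    static final int MAX_RESTART_ATTEMPTS = 3; // alert after three failed restarts in succession

    interface Hooks {
        boolean managerAccessible();
        boolean restartManager();              // true if the restart succeeded
        boolean databaseReachableViaManager(); // uses the manager's connection pool
        boolean databaseReachableDirectly();   // bypasses all eG manager components
        void alert(String message);            // e.g. mail sent using the MAIL SENDER ID
    }

    /** Runs one check cycle and returns a short description of the outcome. */
    static String runOnce(Hooks hooks) {
        if (!hooks.managerAccessible()) {
            for (int attempt = 1; attempt <= MAX_RESTART_ATTEMPTS; attempt++) {
                if (hooks.restartManager()) {
                    return "manager restarted on attempt " + attempt;
                }
            }
            hooks.alert("eG manager could not be restarted after "
                    + MAX_RESTART_ATTEMPTS + " attempts");
            return "restart failed; administrator alerted";
        }
        if (!hooks.databaseReachableViaManager()) {
            // A direct connection separates a database failure from an
            // undersized connection pool inside the eG manager.
            String cause = hooks.databaseReachableDirectly()
                    ? "connection pool insufficient for current load"
                    : "database server failure";
            hooks.alert("Database access problem: " + cause);
            return "database problem: " + cause;
        }
        return "healthy";
    }
}
```

Separating the pooled check from the direct check is what lets the sketch tell a failed database server apart from an undersized connection pool.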

When the eG manager is stopped manually, the eG recovery process is also shut down.

In the same way, the eGagentmon process attempts to connect to the eG agent and, if it detects accessibility issues, restarts the agent. Note, however, that if the eG agent is stopped manually, the agent recovery process is also shut down.

This test reports the health of the eGmon and eGagentmon processes. Using this test, you can determine whether or not these helper processes are running and, if they are, whether or not they are performing their programmed checks at the pre-configured intervals. This way, you can be proactively alerted to the inadvertent termination of these critical helper processes and to errors in their operations.

Target of the test : The eG Manager

Agent deploying the test : An internal/remote agent

Outputs of the test : One set of results for the eGmon and eGagentmon processes.

Configurable parameters for the test
Parameter Description

Test period

How often the test should be executed.

Host

The host for which the test is to be configured.

Port

The port number at which the specified host listens.

JMX Remote Port

Specify the port at which JMX listens for requests from remote hosts. The <EG_MANAGER_INSTALL_DIR>\manager directory of the eG manager (on Windows; on Unix, this is the /opt/egurkha/manager directory) contains a management.properties file. Set the JMX Remote Port to the port defined against the com.sun.management.jmxremote.port parameter in this file.

User, Password, and Confirm Password

By default, JMX requires no authentication or security. Therefore, the User, Password, and Confirm Password parameters are set to none by default.

JNDIName

The JNDIName is the lookup name used to connect to the JMX connector. By default, this is jmxrmi. If you have registered the JMX connector in the RMI registry using a different lookup name, change this default value accordingly.

JMX Provider

This test uses a JMX Provider to access the MBean attributes of the eG manager and collect metrics. Specify the package name of this JMX Provider here. By default, this is set to com.sun.jmx.remote.protocol.

Timeout

Specify the duration (in seconds) for which this test should wait for a response from the eG manager. If the eG manager does not respond within this duration, the test times out. By default, this is set to 240 seconds.
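The JMX parameters above map onto the standard javax.management.remote API roughly as follows. This is a minimal sketch, assuming the eG manager exposes a standard RMI connector: it builds the usual service:jmx:rmi:///jndi/rmi://<host>:<port>/<JNDIName> URL, passes the JMX Provider value as the jmx.remote.protocol.provider.pkgs environment entry, sends credentials only when User is not none, and enforces the Timeout by waiting on a Future. The class name EgManagerJmxProbe is hypothetical, and the eG agent's own connection code may differ.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

/** Hypothetical helper showing how the JMX test parameters fit together. */
final class EgManagerJmxProbe {
    /** Standard RMI connector URL built from Host, JMX Remote Port and JNDIName. */
    static JMXServiceURL serviceUrl(String host, int jmxRemotePort, String jndiName)
            throws java.net.MalformedURLException {
        return new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":" + jmxRemotePort + "/" + jndiName);
    }

    /** Connector environment; credentials are sent only when User is not "none". */
    static Map<String, Object> environment(String jmxProvider, String user, String password) {
        Map<String, Object> env = new HashMap<>();
        env.put(JMXConnectorFactory.PROTOCOL_PROVIDER_PACKAGES, jmxProvider);
        if (user != null && !user.equalsIgnoreCase("none")) {
            env.put(JMXConnector.CREDENTIALS, new String[] {user, password});
        }
        return env;
    }

    /** Connects, failing with a TimeoutException after timeoutSeconds (the Timeout parameter). */
    static JMXConnector connect(JMXServiceURL url, Map<String, Object> env, long timeoutSeconds)
            throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<JMXConnector> pending =
                    executor.submit(() -> JMXConnectorFactory.connect(url, env));
            return pending.get(timeoutSeconds, TimeUnit.SECONDS);
        } finally {
            executor.shutdownNow(); // interrupts the connect attempt if it is still running
        }
    }

    /** Usage: java EgManagerJmxProbe <host> <jmxRemotePort> */
    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: EgManagerJmxProbe <host> <jmxRemotePort>");
            return;
        }
        JMXServiceURL url = serviceUrl(args[0], Integer.parseInt(args[1]), "jmxrmi");
        Map<String, Object> env = environment("com.sun.jmx.remote.protocol", "none", null);
        try (JMXConnector connector = connect(url, env, 240)) {
            MBeanServerConnection mbeans = connector.getMBeanServerConnection();
            System.out.println("Connected; MBean count = " + mbeans.getMBeanCount());
        }
    }
}
```

For example, running the class with a host name and the JMX Remote Port as arguments would connect using the default JNDIName (jmxrmi), JMX Provider, User (none), and Timeout (240 seconds) described above.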

DD Frequency

Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency, if you so desire. Also, if you intend to disable the detailed diagnosis capability for this test, you can do so by specifying none against DD frequency.
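The DD Frequency value can be modelled as a pair of normal and abnormal frequencies. In this hypothetical Java sketch, DdFrequency and shouldCollect are illustrative names, and the rule that a value N means "once every N runs in that state" (with 0 meaning never) is an assumption; the text above only confirms that the default 1:1 collects detailed diagnosis every time the test runs and every time it detects a problem, and that none disables the capability.

```java
import java.util.Optional;

/**
 * Hypothetical model of the DD Frequency parameter ("normal:abnormal" or "none").
 * Assumption: a value N means "once every N runs" in that state; 0 means never.
 */
final class DdFrequency {
    final int normal;   // used while the test detects no problem
    final int abnormal; // used while the test detects a problem

    private DdFrequency(int normal, int abnormal) {
        this.normal = normal;
        this.abnormal = abnormal;
    }

    /** Parses "1:1"-style values; returns empty for "none" (detailed diagnosis disabled). */
    static Optional<DdFrequency> parse(String value) {
        String v = value.trim();
        if (v.equalsIgnoreCase("none")) {
            return Optional.empty();
        }
        String[] parts = v.split(":");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected normal:abnormal, got: " + value);
        }
        // NumberFormatException is a subclass of IllegalArgumentException
        int normal = Integer.parseInt(parts[0].trim());
        int abnormal = Integer.parseInt(parts[1].trim());
        if (normal < 0 || abnormal < 0) {
            throw new IllegalArgumentException("frequencies must not be negative: " + value);
        }
        return Optional.of(new DdFrequency(normal, abnormal));
    }

    /** Decides whether to collect DD on a run; runNumber counts test runs from 1. */
    boolean shouldCollect(long runNumber, boolean problemDetected) {
        int every = problemDetected ? abnormal : normal;
        return every > 0 && runNumber % every == 0;
    }
}
```

Under this sketch the default 1:1 collects detailed diagnosis on every run, matching the description above; how eG schedules other ratios is not stated in this section.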

Detailed Diagnosis

To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability
  • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.

Measurements made by the test
Measurement Description Measurement Unit Interpretation

Process count

Indicates the number of instances of this process that are currently running.

Number

The value 1 is desired for this measure. A value above 1 indicates that additional instances of the process are running unnecessarily and draining resources. Use the detailed diagnosis of this measure to identify the process IDs of the additional processes, so that you can kill them and conserve resources.
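The interpretation of Process count can be sketched with Java's ProcessHandle API (Java 9 or later). This is an illustration only: it assumes a helper process can be recognised by a marker string such as eGmon in its command line, which may not match how the eG agent actually identifies the process, and the command lines of processes owned by other users may not be visible to the caller.

```java
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch of the Process count check. Assumption: a helper
 * process can be recognised by a marker string (such as "eGmon") in its
 * command line; the eG agent's real matching rule may differ.
 */
final class HelperProcessCount {
    /** PIDs of visible processes whose command line contains the marker. */
    static List<Long> findPids(String marker) {
        long self = ProcessHandle.current().pid();
        return ProcessHandle.allProcesses()
                .filter(p -> p.pid() != self) // skip this checker, whose arguments may contain the marker
                // commandLine() is empty for processes the caller cannot inspect
                .filter(p -> p.info().commandLine().map(cmd -> cmd.contains(marker)).orElse(false))
                .map(ProcessHandle::pid)
                .sorted()
                .collect(Collectors.toList());
    }

    /** Interprets the measure: 1 is desired, 0 means not running, more means extra instances. */
    static String interpret(int count) {
        if (count == 0) {
            return "not running";
        }
        if (count == 1) {
            return "ok";
        }
        return (count - 1) + " extra instance(s)";
    }

    public static void main(String[] args) {
        String marker = args.length > 0 ? args[0] : "eGmon";
        List<Long> pids = findPids(marker);
        System.out.println(marker + ": " + pids.size() + " instance(s) " + pids
                + " -> " + interpret(pids.size()));
    }
}
```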

Time since last check

Indicates the time (in minutes) that has elapsed since this process last checked the connection to the eG manager or agent (as the case may be).

Mins

Typically, this value should be the same as or close to the frequency configured for the check in the configuration files of the eG manager or agent (as the case may be). If it is not, the process may not be performing its checks as per the configured schedule, which could be a cause for concern.
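The comparison described above amounts to checking the elapsed time against the configured frequency. This minimal Java sketch assumes the time of the last check and the configured frequency are already known; the class name LastCheckMonitor and the grace period that absorbs small scheduling delays are hypothetical, not eG settings.

```java
import java.time.Duration;
import java.time.Instant;

/**
 * Hypothetical sketch of interpreting Time since last check. Where eG records
 * the last check time and the configured frequency is outside this sketch.
 * The grace period is an assumed tolerance, not an eG parameter.
 */
final class LastCheckMonitor {
    /** Whole minutes elapsed since the last check (the value of the measure). */
    static long minutesSinceLastCheck(Instant lastCheck, Instant now) {
        return Duration.between(lastCheck, now).toMinutes();
    }

    /** True when the elapsed time exceeds the configured frequency plus the grace period. */
    static boolean isOverdue(Instant lastCheck, Instant now,
                             Duration configuredFrequency, Duration grace) {
        return Duration.between(lastCheck, now).compareTo(configuredFrequency.plus(grace)) > 0;
    }
}
```

With a 10-minute frequency and a 2-minute grace period, a last check 11 minutes ago is treated as on schedule, whereas one 15 minutes ago is flagged.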

Has process restarted?

Indicates whether or not this process has restarted.

 

The values that this measure can report and their corresponding numeric values have been listed in the table below:

Measure Value    Numeric Value
Yes              1
No               0

Note:

By default, the test reports the Measure Values listed in the table above to indicate whether or not this process has restarted. In the graph of this measure, however, the same is represented using the numeric equivalents only.
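The mapping between the measure values and their numeric equivalents can be expressed as a small Java enum. The infer method is purely hypothetical: the documentation does not state how the test detects a restart, and this sketch simply assumes a restart when the process start time (which Java exposes through ProcessHandle.Info.startInstant()) differs between two test runs.

```java
import java.time.Instant;

/**
 * Measure values of "Has process restarted?" and their graph equivalents.
 * infer() is a hypothetical detection rule (comparing process start times
 * between two test runs); the documentation does not state eG's actual rule.
 */
enum RestartState {
    NO(0),
    YES(1);

    final int numericValue; // value plotted in the graph of this measure

    RestartState(int numericValue) {
        this.numericValue = numericValue;
    }

    static RestartState fromNumeric(int value) {
        for (RestartState state : values()) {
            if (state.numericValue == value) {
                return state;
            }
        }
        throw new IllegalArgumentException("unknown numeric value: " + value);
    }

    /** Assumption: a changed start time between runs means the process restarted. */
    static RestartState infer(Instant previousStart, Instant currentStart) {
        if (previousStart == null || currentStart == null) {
            return NO; // nothing to compare (first run, or process not found)
        }
        return previousStart.equals(currentStart) ? NO : YES;
    }
}
```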