eG Helper Process Test
To ensure enterprise-class monitoring, the eG manager includes the capability to monitor its various components and to recover from failures of these components. When the eG manager starts, a separate recovery process called eGmon is also started. Likewise, when the eG agent starts, a recovery process named eGagentmon starts simultaneously.
The eGmon process periodically attempts to connect to the eG manager and to access the manager's various components, including the eG database. If it detects any problems during such access, the recovery process performs further diagnosis. The specific actions performed by the recovery process are as follows:
- If the eG manager is not accessible, the recovery process attempts to restart the eG manager. If it fails to restart the eG manager three times in succession, the recovery process sends an alert message to the eG administrator (using the MAIL SENDER ID specified in the Mail Configuration settings of the administration interface).
- If the eG manager is accessible, the recovery process tests the connections from the eG manager to the database server that it uses. If it detects problems, it alerts the administrator to potential problems with database server access. By connecting directly to the database server (i.e., without using any other eG manager components), the recovery process then determines whether the problem is caused by a database failure or by the eG manager's pool of database connections being insufficient for the current load on the manager.
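The decision flow of a single eGmon check, as described in the list above, can be sketched as follows. This is a hypothetical illustration, not eG code: the `Probes` container, its field names, and the returned labels are invented for this sketch, and the three restart attempts are simplified to happen within one check cycle.

```python
"""Hypothetical sketch of one eGmon-style recovery cycle.

None of these names come from eG Enterprise; the probe and action
callables are placeholders supplied by the caller.
"""
from dataclasses import dataclass
from typing import Callable

MAX_RESTART_ATTEMPTS = 3  # "fails to restart the eG manager three times in succession"


@dataclass
class Probes:
    manager_reachable: Callable[[], bool]  # can the eG manager be reached?
    restart_manager: Callable[[], bool]    # True if the restart succeeded
    manager_db_ok: Callable[[], bool]      # manager -> database connection test
    direct_db_ok: Callable[[], bool]       # direct connection, bypassing the manager
    alert: Callable[[str], None]           # e.g. mail sent from the MAIL SENDER ID


def recovery_cycle(p: Probes) -> str:
    """Run one check and return a short label describing the outcome."""
    if not p.manager_reachable():
        # Simplification: retry up to three times within this cycle.
        for _ in range(MAX_RESTART_ATTEMPTS):
            if p.restart_manager():
                return "manager restarted"
        p.alert("eG manager could not be restarted after 3 attempts")
        return "restart failed, alert sent"

    if p.manager_db_ok():
        return "healthy"

    # The manager cannot reach the database: connect directly to tell
    # a database failure apart from an exhausted connection pool.
    if p.direct_db_ok():
        p.alert("Database reachable directly; manager connection pool may be too small")
        return "connection pool insufficient"
    p.alert("Database server appears to be down")
    return "database failure"
```

Keeping the probes as injected callables lets the decision logic be exercised without a real manager or database.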
When the eG manager is stopped manually, the eG recovery process is also shut down.
In the same way, the eGagentmon process attempts to connect to the eG agent and, if it detects accessibility issues, restarts the agent. Note, however, that if the eG agent is stopped manually, the agent recovery process is also shut down.
This test reports the health of the eGmon and eGagentmon processes. Using this test, you can determine whether or not these helper processes are running and, if they are, whether they are performing their programmed checks at the pre-configured intervals. This way, you can be proactively alerted to the inadvertent termination of these critical helper processes and to errors in their operation.
Target of the test : The eG Manager
Agent deploying the test : An internal/remote agent
Outputs of the test : One set of results for each of the eGmon and eGagentmon processes.
Parameter | Description |
---|---|
Test period | How often the test should be executed. |
Host | The host for which the test is to be configured. |
Port | The port number at which the specified host listens. |
JMX Remote Port | Specify the port at which the JMX connector listens for requests from remote hosts. The eG manager's <EG_MANAGER_INSTALL_DIR>\manager directory (on Windows) or /opt/egurkha/manager directory (on Unix) contains a management.properties file. Set the JMX Remote Port to the port defined against the com.sun.management.jmxremote.port parameter in that file. |
User, Password, and Confirm Password | By default, JMX requires no authentication or security. Therefore, the User, Password, and Confirm Password parameters are set to none by default. |
JNDIName | The JNDIName is the lookup name for connecting to the JMX connector. By default, this is jmxrmi. If you have registered the JMX connector in the RMI registry using a different lookup name, change this default value accordingly. |
JMX Provider | This test uses a JMX Provider to access the MBean attributes of the eG manager and collect metrics. Specify the package name of this JMX Provider here. By default, this is set to com.sun.jmx.remote.protocol. |
Timeout | Specify the duration (in seconds) for which this test should wait for a response from the eG manager. If the eG manager does not respond within this duration, the test times out. By default, this is set to 240 seconds. |
DD Frequency | Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency if you so desire. To disable the detailed diagnosis capability for this test, specify none against DD frequency. |
Detailed Diagnosis | To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option. The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled: |
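The JMX Remote Port and JNDIName parameters together identify the JMX connector the test talks to. The sketch below shows how they might combine, assuming the test uses the standard JMX RMI connector URL form (`service:jmx:rmi:///jndi/rmi://host:port/name`) and that management.properties follows the usual Java properties `key=value` syntax. The helper names `read_properties` and `jmx_service_url` are hypothetical, and the parser is deliberately minimal.

```python
"""Sketch: derive the JMX service URL from management.properties.

Assumptions: standard JMX RMI connector URL form, and a properties
file using key=value or key:value lines with '#' or '!' comments.
"""
from typing import Dict


def read_properties(text: str) -> Dict[str, str]:
    """Minimal parser: skips comments and blank lines; no escapes,
    line continuations, or whitespace-only separators."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        # Split on the first '=' or ':' found in the line.
        for i, ch in enumerate(line):
            if ch in "=:":
                props[line[:i].strip()] = line[i + 1:].strip()
                break
        else:
            props[line] = ""
    return props


def jmx_service_url(host: str, props_text: str, jndi_name: str = "jmxrmi") -> str:
    """Build the RMI connector URL from the jmxremote port and JNDI name."""
    port = int(read_properties(props_text)["com.sun.management.jmxremote.port"])
    return f"service:jmx:rmi:///jndi/rmi://{host}:{port}/{jndi_name}"
```

For example, a file containing `com.sun.management.jmxremote.port=8999` on host `egmgr` yields `service:jmx:rmi:///jndi/rmi://egmgr:8999/jmxrmi`. In a Java client, a provider package such as the JMX Provider value would typically be passed to `JMXConnectorFactory.connect` under the `jmx.remote.protocol.provider.pkgs` environment key; whether the eG test does exactly this is an assumption.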
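The DD Frequency value can be read as a pair of numbers. The documentation only confirms that 1:1 means "every run, and every time a problem is detected" and that none disables detailed diagnosis. The sketch below additionally assumes, purely for illustration, that N:M means "every N-th run under normal conditions, every M-th run when a problem is detected". The functions `parse_dd_frequency` and `should_collect_dd` are hypothetical.

```python
"""Hypothetical interpretation of the DD Frequency setting.

Assumption: "N:M" = every N-th run normally, every M-th run when a
problem is detected. Only "1:1" and "none" are documented.
"""
from typing import Optional, Tuple


def parse_dd_frequency(value: str) -> Optional[Tuple[int, int]]:
    v = value.strip().lower()
    if v == "none":
        return None  # detailed diagnosis disabled
    normal, abnormal = (int(part) for part in v.split(":"))
    if normal < 0 or abnormal < 0:
        raise ValueError(f"invalid DD frequency: {value!r}")
    return normal, abnormal


def should_collect_dd(freq: Optional[Tuple[int, int]], run_number: int, problem: bool) -> bool:
    """run_number counts test runs starting from 1."""
    if freq is None:
        return False
    every = freq[1] if problem else freq[0]
    return every > 0 and run_number % every == 0
```

Under this assumed reading, 5:1 would collect detailed diagnosis on every fifth normal run but on every run that detects a problem.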
Measurement | Description | Measurement Unit | Interpretation |
---|---|---|---|
Process count | Indicates the number of instances of this process that are currently running. | Number | The value 1 is desired for this measure. A value greater than 1 indicates that additional instances of the process are running unnecessarily and draining resources. Use the detailed diagnosis of this measure to find the process IDs of the additional instances, so that you can kill them to conserve resources. |
Time since last check | Indicates the time (in minutes) that has elapsed since this process last checked the connection to the eG manager or agent (as the case may be). | Mins | This measure is applicable only to the eGmon process. Typically, its value should be the same as or close to the frequency configured for the check in the eG manager's or agent's (as the case may be) configuration files. If it is not, the process may not be running as per the configured schedule, which could be a cause for concern. |
Has process restarted? | Indicates whether or not this process has restarted. | | The values that this measure can report and their corresponding numeric values have been listed in the table below: Note: By default, the test reports the Measure Values listed in the table above to indicate whether or not the process has restarted. In the graph of this measure, however, the same is represented using the numeric equivalents only. |
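The interpretation guidance for Process count and Time since last check can be turned into simple checks, as sketched below. The function `assess_helper_process` and its `tolerance` factor are hypothetical: the documentation only says the elapsed time should be "the same as or close to" the configured check frequency, so the 1.5× threshold is an arbitrary assumption. The Has process restarted measure is left out because its value mapping is not given here.

```python
"""Sketch: flag the conditions described in the Interpretation column.

The tolerance factor is an assumption, not an eG setting.
"""
from typing import List, Optional


def assess_helper_process(process_count: int,
                          minutes_since_last_check: Optional[float] = None,
                          configured_interval_min: Optional[float] = None,
                          tolerance: float = 1.5) -> List[str]:
    findings: List[str] = []
    if process_count == 0:
        findings.append("process not running")
    elif process_count > 1:
        findings.append(f"{process_count - 1} extra instance(s) running; "
                        "check detailed diagnosis for PIDs")
    # Time since last check is reported only for eGmon, so it may be absent.
    if minutes_since_last_check is not None and configured_interval_min is not None:
        if minutes_since_last_check > configured_interval_min * tolerance:
            findings.append("checks are lagging behind the configured schedule")
    return findings
```

For eGagentmon, call it with only the process count, since the time-based measure does not apply.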