eG Cluster Test

To ensure high availability of the eG monitoring solution, eG Enterprise offers a licensed Redundant Manager option. If the eG license enables this capability, two managers can be set up to operate in an Active-Active or an Active-Passive manager cluster; that is, a secondary manager can act as an active or passive standby for the primary manager. If the primary fails, the secondary automatically assumes the primary’s role and performs all of its functions: receiving performance data from all eG agents, correlating the metrics, performing state computations, sending email/SMS alerts to users (if configured), and providing real-time performance and problem updates via the eG management console. Because this failover is seamless, eG administrators cannot readily tell whether the eG manager they are using is operating in a redundant cluster and, if it is, whether it is the primary or the secondary manager of the cluster.

Moreover, while the primary is unavailable, the secondary stores the performance metrics it collects in a local data folder; when the primary comes back up, the secondary automatically replicates this data to the primary. The maximum capacity of this data folder is configurable. To avoid data loss, administrators should periodically check whether the maximum size setting of the data folder is sufficient, and for this they need to closely track the growth in size of the data folder. The eG Cluster test makes all of this possible.

This test periodically checks whether the eG manager is operating in a cluster and, if so, reports the type of cluster. The test also reveals whether the monitored eG manager is the primary or the secondary manager in the cluster. Regardless of manager type, the test reports the number of agents explicitly assigned to the manager and the number of agents actually reporting to it; a gap between the two alerts administrators to agents that are mapped to the manager but are not actively reporting metrics, so that they can investigate. The test also enables administrators to track the usage of the data folder and determine whether the maximum amount of data that can be stored in that folder needs to be increased to avoid data loss during failover.

Target of the test: The eG manager

Agent deploying the test: An internal/remote agent

Outputs of the test: One set of results for the eG manager being monitored.

Configurable parameters for the test
Parameter Description

Test period

How often the test should be executed.

Host

The host for which the test is to be configured.

Port

The port number at which the specified host listens.

JMX Remote Port

Specify the port at which the JMX connector listens for requests from remote hosts. The eG manager has a management.properties file in the <EG_MANAGER_INSTALL_DIR>\manager directory on Windows, or in the /opt/egurkha/manager directory on Unix. Use the port defined against the com.sun.management.jmxremote.port parameter of this file as the JMX Remote Port.
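To look up the port programmatically, a minimal sketch like the one below can read it from the contents of management.properties. It assumes simple key=value or key: value lines; the full Java .properties syntax (line continuations, escapes) is not handled. The function name and the sample port 8989 are hypothetical.

```python
from typing import Optional

JMX_PORT_KEY = "com.sun.management.jmxremote.port"


def read_jmx_remote_port(properties_text: str) -> Optional[int]:
    """Return the JMX remote port set in management.properties, or None.

    Handles simple 'key=value' and 'key: value' lines only; line
    continuations and escape sequences of the full .properties format
    are not supported.
    """
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines ('#' or '!').
        if not line or line[0] in "#!":
            continue
        # The key ends at the first '=' or ':' separator.
        separators = [i for i in (line.find("="), line.find(":")) if i != -1]
        if not separators:
            continue
        idx = min(separators)
        key, value = line[:idx].strip(), line[idx + 1:].strip()
        if key == JMX_PORT_KEY:
            return int(value)
    return None


# Hypothetical file contents; the port number 8989 is only an example.
sample = """
# JMX settings
com.sun.management.jmxremote.port=8989
com.sun.management.jmxremote.authenticate=false
"""
print(read_jmx_remote_port(sample))  # 8989
```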

User, Password, and Confirm Password

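By default, JMX requires no authentication or security, so the User, Password, and Confirm Password parameters are set to none by default. If JMX authentication is enabled on the eG manager, specify valid JMX credentials in the User and Password parameters, and confirm the password by retyping it in Confirm Password.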

JNDIName

The JNDIName is the lookup name used to connect to the JMX connector. By default, this is jmxrmi. If you have registered the JMX connector in the RMI registry using a different lookup name, change this default value accordingly.
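For context, JDK clients such as JConsole address an RMI connector registered in an RMI registry with a JMX service URL of the form service:jmx:rmi:///jndi/rmi://host:port/name, where name is the lookup name (jmxrmi for the JDK's built-in agent). The sketch below shows how Host, JMX Remote Port, and JNDIName combine into such a URL; whether the eG test builds its URL exactly this way is not stated here, and the host, port, and function name are hypothetical.

```python
def jmx_service_url(host: str, port: int, jndi_name: str = "jmxrmi") -> str:
    """Build a JMX service URL for an RMI connector registered under jndi_name."""
    return f"service:jmx:rmi:///jndi/rmi://{host}:{port}/{jndi_name}"


# Hypothetical host and port values.
print(jmx_service_url("egmanager.example.com", 8989))
# service:jmx:rmi:///jndi/rmi://egmanager.example.com:8989/jmxrmi
```

Changing JNDIName only changes the final path segment of the URL.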

JMX Provider

This test uses a JMX Provider to access the MBean attributes of the eG manager and collect metrics. Specify the package name of this JMX Provider here. By default, this is set to com.sun.jmx.remote.protocol.

Timeout

Specify the duration (in seconds) for which this test should wait for a response from the eG manager. If the eG manager does not respond within this duration, the test times out. By default, this is set to 240 seconds.
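Before configuring the test, it can help to confirm that the JMX Remote Port accepts connections within a given timeout. The sketch below is a TCP-level check only; it does not perform a JMX/RMI handshake, so a True result does not guarantee that the test will succeed. The function name is hypothetical.

```python
import socket


def jmx_port_reachable(host: str, port: int, timeout_seconds: float = 240.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout_seconds.

    This checks only that something is listening on the port; it does not
    perform a JMX/RMI handshake.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Covers refused connections, unresolvable hosts, and timeouts.
        return False


# Example (hypothetical host and port; a shorter timeout suits an interactive check):
# jmx_port_reachable("egmanager.example.com", 8989, timeout_seconds=10)
```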

DD Frequency

Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency, if you so desire. Also, if you intend to disable the detailed diagnosis capability for this test, you can do so by specifying none against DD frequency.

Detailed Diagnosis

To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability
  • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
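The DD Frequency format and the two conditions above can be sketched as follows. This sketch makes three assumptions that are interpretations rather than documented behavior: in a value such as 1:1, the first number is the normal frequency and the second the abnormal frequency; the second condition means that neither frequency may be 0; and a DD Frequency of none makes the option unavailable. Function names are hypothetical.

```python
from typing import Optional, Tuple


def parse_dd_frequency(value: str) -> Optional[Tuple[int, int]]:
    """Parse a DD Frequency such as '1:1' into (normal, abnormal).

    Returns None when detailed diagnosis is disabled with 'none'.
    """
    value = value.strip().lower()
    if value == "none":
        return None
    normal_str, abnormal_str = value.split(":")
    return int(normal_str), int(abnormal_str)


def dd_option_available(license_allows_dd: bool, dd_frequency: str) -> bool:
    """Assumed reading of the two conditions: licensed, and neither frequency is 0."""
    if not license_allows_dd:
        return False
    frequency = parse_dd_frequency(dd_frequency)
    if frequency is None:
        # Assumption: 'none' is treated as making the option unavailable.
        return False
    normal, abnormal = frequency
    return normal != 0 and abnormal != 0


print(dd_option_available(True, "1:1"))  # True
```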
Measurements made by the test
Measurement Description Measurement Unit Interpretation

Cluster type of the manager 

Indicates whether/not the eG manager is operating within a redundant cluster and, if so, the type of cluster.

 

The values that this measure reports and the numeric values that correspond to them have been discussed in the table below:

Measure Value Numeric Value
Not supported 0
Active-Active 1
Active-Passive 2

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the current cluster type of the manager. In the graph of this measure, however, the cluster type is represented using the numeric equivalents only.
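When reading numeric values off the graph (or from exported graph data), the table above can be applied in reverse. A minimal sketch; the dictionary and function names and the sample data points are hypothetical, and the same pattern works for the other measures in this section that map Measure Values to numbers.

```python
# Measure Value <-> numeric value mapping for 'Cluster type of the manager'.
CLUSTER_TYPE_LABELS = {0: "Not supported", 1: "Active-Active", 2: "Active-Passive"}


def cluster_type_label(numeric_value: int) -> str:
    """Translate a numeric graph value back into its Measure Value."""
    return CLUSTER_TYPE_LABELS.get(numeric_value, f"Unknown ({numeric_value})")


# Hypothetical graph data points for this measure.
points = [2, 2, 0]
print([cluster_type_label(v) for v in points])
# ['Active-Passive', 'Active-Passive', 'Not supported']
```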

Is this primary manager ?

Indicates whether/not this eG manager is the primary manager in the cluster.

 

This measure will not be reported if the ‘Cluster type of the manager’ measure reports ‘Not supported’.

The values that this measure reports and the numeric values that correspond to them have been discussed in the table below:

Measure Value Numeric Value
Yes 1
No 0

Note:

By default, this measure reports the Measure Values listed in the table above to indicate whether/not the manager is the primary manager. In the graph of this measure, however, this is represented using the numeric equivalents only.

Is this manager running ?

Indicates whether/not this manager is currently running.

 

The values that this measure reports and the numeric values that correspond to them have been discussed in the table below:

Measure Value Numeric Value
Yes 1
No 0

Note:

By default, this measure reports the Measure Values listed in the table above to indicate whether/not the manager is running. In the graph of this measure, however, this is represented using the numeric equivalents only.

Is other manager in the cluster running ?

Indicates whether/not the other manager in the cluster is currently running.

 

The values that this measure reports and the numeric values that correspond to them have been discussed in the table below:

Measure Value Numeric Value
Yes 1
No 0

Note:

By default, this measure reports the Measure Values listed in the table above to indicate whether/not the other manager in the cluster is running. In the graph of this measure, however, this is represented using the numeric equivalents only.
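Read together, the cluster type, primary-manager, and other-manager measures answer the question raised at the start of this section: which role this manager plays and whether failover has occurred. The sketch below combines their numeric values into a one-line summary. The wording of the summaries and the function name are hypothetical; the failover reading follows the description that the secondary assumes the primary’s role when the primary fails.

```python
from typing import Optional

CLUSTER_KINDS = {1: "Active-Active", 2: "Active-Passive"}


def describe_cluster_state(
    cluster_type: int,
    is_primary: Optional[int],
    other_running: Optional[int],
) -> str:
    """Summarize one test run from the numeric values of the cluster measures."""
    if cluster_type == 0:
        # 'Is this primary manager ?' is not reported for this value.
        return "Standalone manager: not part of a redundant cluster"
    kind = CLUSTER_KINDS.get(cluster_type, f"Unknown cluster type {cluster_type}")
    # Any value other than 1 is treated as 'secondary' in this sketch.
    role = "primary" if is_primary == 1 else "secondary"
    if other_running == 1:
        return f"{kind}: this is the {role} manager and the other manager is running"
    if role == "secondary":
        # Failover case: the secondary should have taken over the primary's role.
        return f"{kind}: the primary is down; this secondary should have assumed the primary's role"
    return f"{kind}: this is the primary manager and the secondary is down (no standby)"


print(describe_cluster_state(2, 0, 0))
```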

Data storage on this manager

Indicates whether/not data is currently stored in this manager for transmission to the other manager in the cluster.

 

 

The values that this measure reports and the numeric values that correspond to them have been discussed in the table below:

Measure Value Numeric Value
No 0
Yes 1
Uploading 2

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the data storage status of the manager. In the graph of this measure, however, the status is represented using the numeric equivalents only.

Files stored for the other manager

Indicates the number of files that are currently waiting to be sent by this manager to the other manager.

Number

The amount of data that a manager can store for transmission to other managers is controlled by two configuration settings, maxStoragePerFile and filesPerManager, in the eg_managers.ini file in the <EG_INSTALL_DIR>\manager\config directory.

The maxStoragePerFile setting defines the maximum amount of data (in MB) that each temporary file used to hold data for transmission to a manager can store. An eG manager can store data in multiple files for transmission to another manager; multiple files are used (rather than a single file) to minimize data read/write operations to memory when the data is transmitted to the other manager. The filesPerManager setting defines the maximum number of temporary data files that can be used per manager.

By default, both maxStoragePerFile and filesPerManager are set to 0, which means that a manager does not store the data it receives directly from agents for transmission to another manager. If maxStoragePerFile is 10 and filesPerManager is 20, up to 10 × 20 = 200 MB of data can be stored for transmission to another manager.
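The storage limit described above is the product of the two settings. A minimal sketch of that arithmetic; the function name is hypothetical.

```python
def temp_storage_capacity_mb(max_storage_per_file_mb: int, files_per_manager: int) -> int:
    """Maximum data (in MB) a manager can hold for transmission to one other manager."""
    return max_storage_per_file_mb * files_per_manager


print(temp_storage_capacity_mb(10, 20))  # 200
print(temp_storage_capacity_mb(0, 0))    # 0 (default: no data is stored for the other manager)
```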

If the values of these two measures are consistently close to the limits set by the filesPerManager and maxStoragePerFile settings of the monitored eG manager, it is a clear indication that the manager is generating and readying a large volume of data for transmission, but that its temporary storage has not been tuned adequately to handle this load. Unless these settings are increased accordingly, significant data loss may occur in the event of a manager failure.

Data stored for the other manager

Indicates the amount of data currently waiting to be sent by this manager to the other manager.

MB
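The guidance on the Files stored for the other manager and Data stored for the other manager measures can be turned into a simple check over recent samples. In the sketch below, "consistently close" is approximated as every sample being at or above a threshold fraction of the limit; the 90% default threshold and the function name are hypothetical choices, not eG defaults.

```python
from typing import List, Sequence, Tuple


def storage_tuning_warnings(
    samples: Sequence[Tuple[int, float]],
    max_storage_per_file_mb: int,
    files_per_manager: int,
    threshold: float = 0.9,
) -> List[str]:
    """Flag storage limits that recent samples are consistently close to.

    Each sample is (Files stored for the other manager,
    Data stored for the other manager in MB) from one test run.
    """
    warnings: List[str] = []
    capacity_mb = max_storage_per_file_mb * files_per_manager
    if not samples or capacity_mb == 0:
        # No samples, or temporary storage is disabled (both settings default to 0).
        return warnings
    if all(files >= threshold * files_per_manager for files, _ in samples):
        warnings.append(
            f"Files stored is consistently >= {threshold:.0%} of filesPerManager "
            f"({files_per_manager}); consider raising it"
        )
    if all(mb >= threshold * capacity_mb for _, mb in samples):
        warnings.append(
            f"Data stored is consistently >= {threshold:.0%} of the {capacity_mb} MB "
            "capacity; consider raising maxStoragePerFile and/or filesPerManager"
        )
    return warnings


# Hypothetical samples with maxStoragePerFile=10 and filesPerManager=20.
print(storage_tuning_warnings([(19, 150.0), (20, 160.0)], 10, 20))
```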

Agents assigned to the manager

Indicates the number of agents that have been explicitly assigned to this manager.

Number

 

Agents reporting to this manager

Indicates the number of agents that are currently reporting metrics to this manager.

Number
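These two measures report counts only. If you also have the lists of agent names behind them (however you obtain them; this sketch does not cover that), the agents that are assigned to the manager but not reporting are a set difference. The function name and agent names below are hypothetical.

```python
from typing import Iterable, List


def silent_agents(assigned: Iterable[str], reporting: Iterable[str]) -> List[str]:
    """Return the agents assigned to this manager that are not reporting to it."""
    return sorted(set(assigned) - set(reporting))


# Hypothetical agent names.
assigned = ["web-01", "web-02", "db-01"]
reporting = ["web-01", "db-01"]
print(len(assigned), len(reporting))       # 3 2
print(silent_agents(assigned, reporting))  # ['web-02']
```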

Test data in queue

Indicates the number of test data items that are currently waiting to be sent by this manager to the other manager.

Number

 

DDD data in queue

Indicates the number of DDD data items that are currently waiting to be sent by this manager to the other manager.

Number

 

Maximum number of allocated threads

Indicates the maximum number of threads allocated for processing the test and DDD data that are currently waiting to be sent by this manager to the other manager.

Number

 

Test data thread usage

Indicates the percentage of threads used for processing the test data to be sent by this manager to the other manager.

Percent

This measure is computed as the ratio of the value of the Test data in queue measure to the value of the Maximum number of allocated threads measure, expressed as a percentage. A high value for this measure indicates that the eG manager is using more threads to process test data. This implies that the manager is rapidly running out of threads, and requests for processing DDD data may not be serviced, or may be deferred until the number of active requests for processing test data drops.

In such cases, consider increasing the maximum number of threads allocated to handle test and DDD data. However, exercise caution when altering the maximum thread count: increasing it may consume excessive memory and slow down the eG manager, while setting it too low will cause requests to block or time out.
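The ratio described for Test data thread usage can be sketched as below; the same function yields DDD data thread usage when passed the DDD data in queue value instead. The documentation does not say whether the product caps the result at 100%, so this sketch does not cap it. The function name and sample values are hypothetical.

```python
def thread_usage_percent(data_in_queue: int, max_allocated_threads: int) -> float:
    """Return data_in_queue as a percentage of max_allocated_threads."""
    if max_allocated_threads <= 0:
        raise ValueError("max_allocated_threads must be positive")
    # Not capped at 100: a queue longer than the thread pool yields > 100.
    return 100.0 * data_in_queue / max_allocated_threads


# Hypothetical measure values from one test run.
max_threads = 50
print(thread_usage_percent(45, max_threads))  # 90.0 (Test data thread usage)
print(thread_usage_percent(5, max_threads))   # 10.0 (DDD data thread usage)
```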

DDD data thread usage

Indicates the percentage of threads used for processing the DDD data to be sent by this manager to the other manager.

Percent

This measure is computed as the ratio of the value of the DDD data in queue measure to the value of the Maximum number of allocated threads measure, expressed as a percentage. A high value for this measure indicates that the eG manager is using more threads to process DDD data. This implies that the manager is rapidly running out of threads, and requests for processing test data may not be serviced, or may be deferred until the number of active requests for processing DDD data drops.

In such cases, consider increasing the maximum number of threads allocated to handle test and DDD data. However, exercise caution when altering the maximum thread count: increasing it may consume excessive memory and slow down the eG manager, while setting it too low will cause requests to block or time out.