Cluster Overview - Linux Test

A Linux cluster is a connected array of Linux computers or nodes that work together and can be viewed and managed as a single system. The redundancy of cluster components eliminates single points of failure. The nodes of a Linux cluster may be servers, storage devices or virtualized containers. Compared to a single computer, a Linux cluster can provide faster processing speed, larger storage capacity, better data integrity, greater reliability and wider availability of resources. Clusters are usually dedicated to specific functions, such as load balancing, high availability, high performance, storage or large-scale processing.

Pacemaker is an open source, high-availability cluster resource manager that runs on a set of nodes. It provides a framework for managing the availability of resources, which are services on a host that need to be kept highly available. Pacemaker maximizes the availability of your cluster services/resources by detecting and recovering from node-level and resource-level failures. It uses the messaging and membership capabilities provided by Corosync to keep the resources available on any of the cluster nodes. Pacemaker supports a maximum of 16 nodes per cluster. If nodes are offline or under maintenance for a long duration, delays may be noticed during failover. This may lead to a poor user experience and, in some cases, loss of data. To avoid this, it is essential to monitor the overall status of the target cluster round the clock. The Cluster Overview - Linux test helps administrators do exactly that.

This test reports the current status of the target Linux cluster and the total number of nodes configured in it. Using this test, administrators can precisely identify the number of Pacemaker nodes and Pacemaker remote nodes that are online, offline, in standby mode, or under maintenance. This test also reports the number of resource groups available in the cluster and the number of resources configured in them. Over time, these measures help administrators isolate Pacemaker nodes that are frequently offline and those that are frequently put under maintenance.
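The per-state node counts described above can be illustrated with a small sketch. It is not the eG agent's actual collection logic, which this page does not describe. It assumes a simplified "Node List" layout in the style printed by `crm_mon -1` on recent Pacemaker releases; the sample text, the `LIST_LABELS` mapping, and the `count_nodes` function are hypothetical names introduced only for illustration, and remote nodes in standby or maintenance mode are not distinguished here.

```python
import re

# Illustrative sample only: a simplified "Node List" section in the style
# printed by `crm_mon -1`. Real output varies by Pacemaker version, so this
# layout is an assumption.
SAMPLE = """\
Node List:
  * Node node3: standby
  * Node node4: maintenance
  * Online: [ node1 node2 ]
  * OFFLINE: [ node5 ]
  * RemoteOnline: [ remote1 ]
  * RemoteOFFLINE: [ remote2 ]
"""

# Maps the label of a bracketed list line to a hypothetical bucket name.
LIST_LABELS = {
    "Online": "online",
    "OFFLINE": "offline",
    "RemoteOnline": "remote_online",
    "RemoteOFFLINE": "remote_offline",
}


def count_nodes(text):
    """Group node names by state from a crm_mon-style node list."""
    buckets = {name: [] for name in LIST_LABELS.values()}
    buckets["standby"] = []
    buckets["maintenance"] = []
    for raw in text.splitlines():
        line = raw.strip().lstrip("*").strip()
        # Lines such as "Online: [ node1 node2 ]"
        m = re.match(r"^(\w+): \[(.*)\]$", line)
        if m and m.group(1) in LIST_LABELS:
            buckets[LIST_LABELS[m.group(1)]].extend(m.group(2).split())
            continue
        # Lines such as "Node node3: standby"
        m = re.match(r"^Node (\S+): (standby|maintenance)\b", line)
        if m:
            buckets[m.group(2)].append(m.group(1))
    return buckets


nodes = count_nodes(SAMPLE)
summary = {state: len(names) for state, names in nodes.items()}
print(summary)
```

The name lists kept in `nodes` correspond loosely to what the detailed diagnosis of each measure shows, while `summary` holds the counts.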

This test is disabled by default. To enable the test, go to the enable / disable tests page using the menu sequence: Agents -> Tests -> Enable/Disable. Pick the desired Component type, set Performance as the Test type, choose the test from the DISABLED TESTS list, and click the << button to move the test to the ENABLED TESTS list. Finally, click the Update button.

Target of the test : A Linux cluster

Agent deploying the test : An internal agent

Outputs of the test : One set of results for the Linux cluster being monitored.

Job Name Description

Test Period

How often the test should be executed.

Host

The host for which the test is to be configured.

Port

The port at which the specified host listens. By default, this is Null.

Report by Owner Node Only

If this flag is set to Yes, this test reports metrics only for the owner node and not for the other nodes in the cluster. If the flag is set to No, the test reports metrics for all the nodes in the cluster. By default, this flag is set to No.

Use SUDO

By default, this flag is set to Yes, indicating that the test uses the sudo command to collect the daemon-related metrics. If this flag is set to No, the test will not use the sudo command to collect the metrics.

SUDO Path

This parameter is relevant only when the Use SUDO parameter is set to Yes. By default, the SUDO Path is set to none. This implies that the sudo command is in its default location - i.e., in the /usr/bin or /usr/sbin folder of the target host. In this case, once the Use SUDO flag is set to Yes, the eG agent automatically runs the sudo command from its default location to allow access to the daemon process. However, if the sudo command is available in a different location in your environment, you will have to explicitly specify the full path to the sudo command in this text box to enable the eG agent to run the sudo command.
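The lookup order described for Use SUDO and SUDO Path can be sketched as follows. This is only an illustration of the rule stated above, not the eG agent's code: the function name `resolve_sudo`, the `DEFAULT_SUDO_DIRS` tuple, and the assumption that SUDO Path holds the full path of the sudo binary itself are all hypothetical.

```python
import os

# Default locations named in the parameter description.
DEFAULT_SUDO_DIRS = ("/usr/bin", "/usr/sbin")


def resolve_sudo(use_sudo, sudo_path, exists=os.path.exists):
    """Return the sudo binary to run, or None when sudo is not used.

    `exists` is injectable so the lookup can be exercised without
    touching the real file system.
    """
    if not use_sudo:
        return None
    # An explicit path (anything other than "none") wins over the defaults.
    if sudo_path and sudo_path.strip().lower() != "none":
        return sudo_path.strip()
    # SUDO Path is "none": fall back to the default locations.
    for directory in DEFAULT_SUDO_DIRS:
        candidate = directory + "/sudo"
        if exists(candidate):
            return candidate
    raise FileNotFoundError("sudo not found in " + ", ".join(DEFAULT_SUDO_DIRS))
```

For example, `resolve_sudo(True, "/opt/tools/sudo")` returns the explicit path unchanged, whereas `resolve_sudo(True, "none")` searches the two default folders.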

DD Frequency

Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency, if you so desire. Also, if you intend to disable the detailed diagnosis capability for this test, you can do so by specifying none against DD frequency.
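As a rough illustration of how a DD Frequency value such as 1:1 might be read, the sketch below parses the setting and decides whether a given test run produces detailed measures. It assumes that the first number applies to normal runs and the second to runs that detect a problem, and that a value of N means "every Nth run"; that interpretation, along with the names `parse_dd_frequency` and `should_generate_dd`, is an assumption made for illustration and is not taken from eG Enterprise itself.

```python
def parse_dd_frequency(value):
    """Parse 'normal:abnormal' into a tuple, or return None for 'none'."""
    text = value.strip().lower()
    if text == "none":
        return None  # detailed diagnosis disabled for this test
    normal, abnormal = text.split(":")
    return int(normal), int(abnormal)


def should_generate_dd(value, run_number, problem_detected):
    """Decide whether run `run_number` (1-based) produces detailed measures."""
    freq = parse_dd_frequency(value)
    if freq is None:
        return False
    # Pick the frequency that matches the outcome of this run.
    every = freq[1] if problem_detected else freq[0]
    # Treat 0 as "never", and N as "every Nth run" (assumed semantics).
    return every > 0 and run_number % every == 0
```

With the default of 1:1, `should_generate_dd("1:1", n, flag)` is true for every run, which matches the behaviour described above.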

Detailed Diagnosis

To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability
  • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.

Measurements made by the test
Measurement Description Measurement Unit Interpretation

Cluster status

Indicates the current status of the cluster.

 

The values that this measure can report and their corresponding numeric values are listed in the table below:

Measure Value    Numeric Value
Ok               100
Abnormal         0

Note:

By default, this measure reports one of the Measure Values listed above to indicate the current status of the cluster. In the graph of this measure, however, the status is represented using the numeric equivalents only, i.e., 0 or 100.

Total nodes

Indicates the total number of nodes configured in the cluster.

Number

 

Pacemaker online nodes

Indicates the number of Pacemaker nodes that are currently online in the cluster.

Number

The detailed diagnosis of this measure reveals the names of the Pacemaker nodes that are online.

Pacemaker standby nodes

Indicates the number of Pacemaker nodes that are in standby mode in the cluster.

Number

The detailed diagnosis of this measure displays the names of the Pacemaker nodes that are in standby mode.

Pacemaker nodes in maintenance

Indicates the number of Pacemaker nodes that are in maintenance mode in the cluster.

Number

The detailed diagnosis of this measure displays the names of the Pacemaker nodes that are in maintenance mode.

Pacemaker offline nodes

Indicates the number of Pacemaker nodes that are offline in the cluster.

Number

The detailed diagnosis of this measure reveals the names of the Pacemaker nodes that are offline.

Pacemaker remote online nodes

Indicates the number of Pacemaker remote nodes that are online in the cluster.

Number

The detailed diagnosis of this measure displays the names of the Pacemaker remote nodes that are online.

Pacemaker remote standby nodes

Indicates the number of Pacemaker remote nodes that are in standby mode in the cluster.

Number

The detailed diagnosis of this measure displays the names of the Pacemaker remote nodes that are in standby mode.

Pacemaker remote nodes in maintenance

Indicates the number of Pacemaker remote nodes that are in maintenance mode in the cluster.

Number

The detailed diagnosis of this measure displays the names of the Pacemaker remote nodes that are in maintenance mode.

Pacemaker remote offline nodes

Indicates the number of Pacemaker remote nodes that are offline in the cluster.

Number

The detailed diagnosis of this measure displays the names of the Pacemaker remote nodes that are offline.

Available resource groups

Indicates the number of available resource groups in the cluster.

Number

The detailed diagnosis of this measure lists the names of all the available resource groups.

Configured resources

Indicates the total number of resources available in the resource groups.

Number

The detailed diagnosis of this measure lists the names of the configured resources.
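For anyone post-processing exported values of the Cluster status measure, the Ok/Abnormal mapping listed earlier can be captured in a few lines. The sketch below simply encodes the two documented pairs (Ok = 100, Abnormal = 0); the names `STATUS_TO_NUMERIC`, `to_graph_value`, and `to_label` are hypothetical helpers, not part of any eG Enterprise API.

```python
# The two Measure Value / Numeric Value pairs documented for Cluster status.
STATUS_TO_NUMERIC = {"Ok": 100, "Abnormal": 0}
NUMERIC_TO_STATUS = {number: label for label, number in STATUS_TO_NUMERIC.items()}


def to_graph_value(status):
    """Return the numeric equivalent plotted in the measure's graph."""
    return STATUS_TO_NUMERIC[status]


def to_label(numeric):
    """Return the Measure Value for a numeric graph point."""
    return NUMERIC_TO_STATUS[numeric]


# Convert a short, made-up series of graph points back to readable labels.
readings = [100, 100, 0, 100]
labels = [to_label(value) for value in readings]
print(labels)
```

Any value outside the two documented ones raises a `KeyError`, which makes unexpected data visible instead of silently mapping it to a status.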