Cluster Nodes Status - Linux Test

A high availability Linux cluster is a group of Linux computers (nodes) and storage devices that work together and are managed as a single system. With Linux clustering, an application runs on one node, and clustering software monitors its operation. If the software detects an issue, it moves the application to a secondary node in a process called failover. Because the secondary node shares storage with the primary node, the application can resume quickly, helping meet very short recovery time and recovery point objectives (seconds to minutes).

In a Linux cluster, users can connect to any node and perform any operation. If a node fails, users can reconnect to a different node, recover their topology, and continue working. Regardless of which node is serving user requests, administrators should be able to tell the operational state of each node in the cluster at any point in time. Administrators should also be aware of frequent failovers between the nodes. The Cluster Nodes Status - Linux test helps administrators in this regard.

This test auto-discovers the nodes in the target cluster and reports the current status of each node. The test also reports whether or not each node is the owner node, and whether the owner node changed during the last measurement period. Using this test, administrators can identify nodes that have been offline or under maintenance for long periods and investigate why. The test also helps administrators detect frequent failovers between the nodes, so that they can initiate steps to analyze why the nodes are failing over so often.

Target of the test: A Linux cluster

Agent deploying the test: An internal agent

Outputs of the test: One set of results for each node in the Linux cluster being monitored.

Configurable parameters for the test

Parameter Description

Test Period

How often the test should be executed.

Host

The host for which the test is to be configured.

Port

The port at which the specified host listens. By default, this is Null.

Report by Owner Node Only

If this flag is set to Yes, the test reports metrics only for the owner node and not for the other nodes in the cluster. If the flag is set to No, the test reports metrics for all the nodes in the cluster. By default, this flag is set to No.
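The behavior of this flag amounts to a simple filter over the discovered nodes. The Python sketch below only illustrates that rule; the `Node` type and the `nodes_to_report` function are hypothetical names and do not reflect how the eG agent is implemented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    # Hypothetical representation of a discovered cluster node.
    name: str
    is_owner: bool


def nodes_to_report(nodes: List[Node], report_by_owner_node_only: bool = False) -> List[Node]:
    """Return the nodes whose metrics would be reported.

    True (Yes): only the owner node is reported.
    False (No, the default): every discovered node is reported.
    """
    if report_by_owner_node_only:
        return [node for node in nodes if node.is_owner]
    return list(nodes)


cluster = [Node("node1", is_owner=True), Node("node2", is_owner=False)]
owner_only = [n.name for n in nodes_to_report(cluster, True)]
all_nodes = [n.name for n in nodes_to_report(cluster)]
```

With the default setting (No), both `node1` and `node2` are returned; with Yes, only the owner `node1` is.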

Use SUDO

By default, this flag is set to Yes, indicating that the test uses the sudo command to collect the daemon-related metrics. If this flag is set to No, the test does not use the sudo command to collect the metrics.

SUDO Path

This parameter is relevant only when the Use SUDO parameter is set to Yes. By default, SUDO Path is set to none. This implies that the sudo command is in its default location, that is, in the /usr/bin or /usr/sbin folder of the target host. In this case, once the Use SUDO flag is set to Yes, the eG agent automatically runs the sudo command from its default location to access the daemon process. However, if the sudo command is in a different location in your environment, you must explicitly specify the full path to the sudo command in this text box so that the eG agent can run it.
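The lookup rule described above can be sketched as follows. This is an illustrative Python sketch, not the eG agent's actual logic: `resolve_sudo` is a hypothetical helper, and a missing, empty, or `none` value stands for the default SUDO Path setting.

```python
import os
from typing import Optional

# Default locations named in the SUDO Path description.
DEFAULT_SUDO_DIRS = ("/usr/bin", "/usr/sbin")


def resolve_sudo(use_sudo: bool, sudo_path: Optional[str] = None) -> Optional[str]:
    """Return the sudo executable to run, or None when Use SUDO is No.

    `sudo_path` mirrors the SUDO Path parameter: None, empty, or "none"
    means "look in the default locations"; any other value is treated as
    the full path to the sudo command.
    """
    if not use_sudo:
        return None  # Use SUDO = No: metrics are collected without sudo.

    if sudo_path and sudo_path.strip().lower() != "none":
        candidates = [sudo_path.strip()]
    else:
        candidates = [os.path.join(d, "sudo") for d in DEFAULT_SUDO_DIRS]

    for candidate in candidates:
        # Accept only an existing regular file that is executable.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate

    raise FileNotFoundError(
        "sudo not found; set SUDO Path to the full path of the sudo command"
    )
```

Raising an error when no executable is found, rather than silently falling back to running without sudo, keeps a misconfigured SUDO Path visible.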

DD Frequency

Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency if required. To disable the detailed diagnosis capability for this test, set DD Frequency to none.

Detailed Diagnosis

To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license must allow the detailed diagnosis capability.
  • Both the normal and abnormal frequencies configured for the detailed diagnosis measures must be non-zero.
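Read together, the DD Frequency setting and the two conditions above can be expressed as a small check. The Python sketch below assumes that a value such as 1:1 is a normal:abnormal pair, where N means "once every N test runs" when no problem is detected and when a problem is detected respectively, and that none disables detailed diagnosis. These semantics and all function names are assumptions for illustration, not eG Enterprise APIs.

```python
from typing import Optional, Tuple


def parse_dd_frequency(value: str) -> Optional[Tuple[int, int]]:
    """Parse a DD Frequency value such as '1:1' into (normal, abnormal).

    Returns None when detailed diagnosis is disabled ('none').
    """
    text = value.strip().lower()
    if text == "none":
        return None
    normal_str, abnormal_str = text.split(":")
    normal, abnormal = int(normal_str), int(abnormal_str)
    if normal < 0 or abnormal < 0:
        raise ValueError("frequencies must be non-negative")
    return normal, abnormal


def dd_option_available(license_allows_dd: bool, dd_frequency: str) -> bool:
    """The On/Off choice is offered only if the license allows detailed
    diagnosis and both frequencies are non-zero."""
    freq = parse_dd_frequency(dd_frequency)
    return license_allows_dd and freq is not None and 0 not in freq


def should_run_dd(run_number: int, problem_detected: bool, dd_frequency: str) -> bool:
    """Assumed semantics: run detailed diagnosis on every Nth test run
    (run_number starts at 1), where N is the abnormal frequency if a
    problem was detected and the normal frequency otherwise."""
    freq = parse_dd_frequency(dd_frequency)
    if freq is None:
        return False
    normal, abnormal = freq
    every = abnormal if problem_detected else normal
    return every > 0 and run_number % every == 0
```

Under this reading, the default 1:1 makes `should_run_dd` return True on every run, with or without a problem, which matches the description of the default given for DD Frequency.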

Measurements made by the test

Measurement Description Measurement Unit Interpretation

Node status

Indicates the current status of this node.

The values that this measure can report and their corresponding numeric values are listed in the table below:

Measure Value Numeric Value
Online 100
Standby 75
Maintenance 50
Offline 0

Note:

By default, this measure reports one of the Measure Values listed above to indicate the current state of each node. In the graph of this measure, however, each state is represented by its numeric equivalent from the table above.
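The mapping in the table above can be expressed directly in code, for example when post-processing exported measure values. The Python sketch below is illustrative: `NodeStatus`, `to_numeric`, and `from_numeric` are hypothetical names, and the only facts the sketch encodes are the value pairs from the table.

```python
from enum import Enum


class NodeStatus(Enum):
    # Measure values of "Node status" and their numeric (graph) equivalents.
    ONLINE = 100
    STANDBY = 75
    MAINTENANCE = 50
    OFFLINE = 0


def to_numeric(status_text: str) -> int:
    """Map a reported state such as 'Online' to its graph value."""
    return NodeStatus[status_text.strip().upper()].value


def from_numeric(value: int) -> str:
    """Map a graph value back to its display label, e.g. 75 -> 'Standby'."""
    return NodeStatus(value).name.capitalize()
```

An unrecognized state string raises `KeyError` instead of silently mapping to 0, which would otherwise be indistinguishable from Offline in the graph.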

Has the owner node been changed?

Indicates whether or not the owner node changed during the last measurement period.

The values that this measure can report and their corresponding numeric values are listed in the table below:

Measure Value Numeric Value
Yes 100
No 0

Note:

By default, this measure reports one of the Measure Values listed above to indicate whether or not the owner node has changed. In the graph of this measure, however, the values are represented by their numeric equivalents, that is, 100 for Yes and 0 for No.

The detailed diagnosis of this measure reveals the name of the owner node.
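One way to derive this measure is to compare the owner node seen in the current measurement period with the one seen in the previous period. Counting such changes over a window of recent periods is one way to spot the frequent failovers this test is meant to surface. The Python sketch below is a hypothetical illustration: the `OwnerTracker` class, its window size, and its treatment of the first observation are assumptions, not the eG agent's implementation.

```python
from collections import deque
from typing import Deque, Optional


class OwnerTracker:
    """Hypothetical tracker for the 'Has the owner node been changed?' measure."""

    def __init__(self, window: int = 12) -> None:
        self.previous_owner: Optional[str] = None
        # 1 for each period in which the owner changed, 0 otherwise;
        # only the most recent `window` periods are kept.
        self.recent_changes: Deque[int] = deque(maxlen=window)

    def observe(self, current_owner: str) -> int:
        """Record this period's owner and return 100 (Yes) or 0 (No)."""
        # Assumption: the first observation has nothing to compare against,
        # so it is reported as "No".
        changed = (
            self.previous_owner is not None and current_owner != self.previous_owner
        )
        self.previous_owner = current_owner
        self.recent_changes.append(1 if changed else 0)
        return 100 if changed else 0

    def changes_in_window(self) -> int:
        """Number of owner changes (failovers) within the retained periods."""
        return sum(self.recent_changes)


tracker = OwnerTracker(window=5)
readings = [tracker.observe(owner) for owner in ["node1", "node1", "node2", "node1", "node1"]]
```

Here the owner moves to `node2` and back, so `readings` is `[0, 0, 100, 100, 0]` and `changes_in_window()` returns 2. An alert on this count, rather than on a single change, would distinguish repeated failovers from a one-off planned switchover.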

Is owner node?

Indicates whether or not this node is the owner node.

The values that this measure can report and their corresponding numeric values are listed in the table below:

Measure Value Numeric Value
Yes 100
No 0

Note:

By default, this measure reports one of the Measure Values listed above to indicate whether or not this node is the owner node. In the graph of this measure, however, the values are represented by their numeric equivalents, that is, 100 for Yes and 0 for No.