Solace Redundancy by Node Test

Solace PubSub+ appliances can operate in high-availability (HA) redundant pairs for fault tolerance. Redundancy provides 1:1 appliance sparing to increase overall service availability. HA redundancy eliminates the potential for a single point of failure by allowing a network administrator to define two appliances as a redundant pair. If one of the appliances is taken out of service or fails, the other appliance automatically takes over responsibility for the clients typically served by the out-of-service appliance.

The redundancy feature is largely transparent to clients and other appliances in the network. Only the two appliances that are paired as mates require explicit configuration to take advantage of the feature. To support redundancy, each appliance uses a primary and backup virtual router. To enable the backup virtual router to assume the role of its mate’s primary virtual router when a failure occurs, the configuration of the virtual routers on each appliance must mirror one another. That is, the backup virtual routers must have the same configuration as the primary virtual routers they back up.

For an active/standby redundant pair, the primary virtual router is on the primary appliance, and the backup virtual router is on the standby appliance. If the primary appliance goes out of service, the backup virtual router of the standby appliance changes to an active state, and it provides service for clients and handles the data and messages that typically use the primary virtual router of the primary appliance that has gone out of service.

For an active/active redundant pair, the primary virtual routers on both appliances are active, but the backup virtual routers are idle. If one of the appliances in the redundant pair goes out of service, the backup virtual router of the other appliance changes to an active state, and it provides service for clients and handles the data and messages that typically use the primary virtual router of the appliance that is out of service.
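The takeover behavior described above can be sketched in a few lines of Python. This is only an illustrative model, not Solace code: the Appliance class, its fields and the fail_over function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class Appliance:
    """Hypothetical model of one appliance in a redundant pair."""
    name: str
    in_service: bool = True
    primary_vr_active: bool = False
    backup_vr_active: bool = False


def fail_over(failed: Appliance, mate: Appliance) -> None:
    """Take `failed` out of service and activate the mate's backup virtual router."""
    failed.in_service = False
    failed.primary_vr_active = False
    failed.backup_vr_active = False
    # The mate's backup virtual router mirrors the failed appliance's primary
    # virtual router, so it takes over the clients of that router.
    mate.backup_vr_active = True


# Active/active pair: both primary virtual routers are active, backups idle.
a = Appliance("A", primary_vr_active=True)
b = Appliance("B", primary_vr_active=True)
fail_over(a, b)
print(b)  # B now runs its own primary virtual router and the backup for A
```

For an active/standby pair, the same function applies if the standby appliance starts with both virtual routers idle; after the failover only its backup virtual router is active.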

If the target node in a redundant HA pair is down, or if its redundancy configuration is in the shutdown state, the target node cannot handle data and messages. Messages may then go undelivered to clients, which disrupts the business delivery cycle. Administrators must also be able to quickly determine whether the role of the target node has changed from primary virtual router to backup virtual router or vice versa, and which component of the primary or backup virtual router has faulted: the message spool, the ADB, the flash memory module, the power module, or the routing interface. The Solace Redundancy by Node test helps administrators quickly identify the pain points encountered by the target node.

Using this test, administrators can determine the redundancy state, the configuration state and the role of the target node. The test also sheds light on when the role of the target node changed from primary virtual router to backup virtual router or vice versa. In addition, it helps administrators identify the exact module on the primary or backup virtual router that has faulted and is not ready: the message spool, the ADB, the flash memory module, the power module, or the routing interface.
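As a rough illustration of how the faulted module might be picked out from the values this test reports, the sketch below scans the measures that use the Ready/Not ready or Up/Down scale (numeric value 0 for Not ready or Down). The measure names come from this page; the function name and the dictionary layout are assumptions made for the example. The Messagespool status measure is left out because it uses a different 0 to 5 scale.

```python
# Measures of the primary virtual router that report 1 (Ready/Up) or 0 (Not ready/Down).
PRIMARY_MODULE_MEASURES = [
    "Routing interface",
    "SMRP status",
    "ADM card status",
    "ADM datapath status",
    "Flash module status",
    "Power module status",
    "ADM content status",
    "Disk status",
    "Disk content status",
    "DB sync status",
    "DB build status",
]


def faulted_modules(measures):
    """Return the names of the measures whose numeric value is 0."""
    return [name for name in PRIMARY_MODULE_MEASURES if measures.get(name) == 0]


# Hypothetical sample of collected values for one node.
sample = {
    "Routing interface": 1,
    "Flash module status": 0,
    "Power module status": 0,
    "Disk status": 1,
}
print(faulted_modules(sample))
```

Measures that are missing from the input are skipped rather than treated as faults, since several of them are reported only when the node holds the matching role.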

Target of the test : A Solace Cluster

Agent deploying the test : An external agent

Outputs of the test : One set of results for each node in the target cluster that is to be monitored.

Configurable parameters for the test
Parameter Description

Test Period

How often should the test be executed.

Host

The IP address of the target host for which this test is to be configured.

Port

The port on which the Solace Cluster listens.

UserName, Password and Confirm Password

The eG agent uses the SEMP API to collect metrics from all the nodes in the Solace Cluster. To enable the eG agent to access the SEMP API and collect metrics, a user with read-only privileges must exist on every node in the cluster that requires monitoring. If no such user exists, create one manually; for the procedure, refer to: Creating a New User for Monitoring Solace PubSub+ Event Broker.

Specify the credentials of such a user against the User Name and Password parameters. Confirm the Password by retyping it in the Confirm Password text box.
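This page does not show what a SEMP request looks like. Purely as an illustration, the sketch below builds (but does not send) an HTTP request of the kind a monitoring client might issue with the read-only credentials. The /SEMP endpoint path and the XML request body follow the legacy SEMP convention as an assumption; they are not taken from this page, and the eG agent may query the broker differently. The function name and the sample credentials are hypothetical.

```python
import base64
import urllib.request


def build_semp_request(host, port, user, password, use_ssl=False):
    """Build a POST request object for a SEMP query without sending it."""
    scheme = "https" if use_ssl else "http"
    url = f"{scheme}://{host}:{port}/SEMP"  # assumed endpoint path
    # Assumed request body asking for redundancy details.
    body = b"<rpc><show><redundancy><detail/></redundancy></show></rpc>"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Content-Type", "application/xml")
    return request


# Hypothetical read-only monitoring user.
req = build_semp_request("172.16.8.233", 8080, "monitor", "secret")
print(req.get_method(), req.full_url)
```

Passing use_ssl=True switches the URL scheme to https, which corresponds to setting the SSL flag of this test to Yes.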

Total Cluster Nodes

In this text box, provide a comma-separated list of both the primary and backup nodes in the cluster that requires monitoring. Specify the nodes in the following format: HOSTNAME1:PORT1,HOSTNAME2:PORT2,... . For example, 172.16.8.233#8080,172.16.8.235#8080,....
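A node list of this kind can be split into host and port pairs as sketched below. The function name is hypothetical. Because the format string above uses a colon while the example uses a hash, this sketch accepts either separator; that tolerance is an assumption of the example, not documented behavior of the eG agent.

```python
def parse_cluster_nodes(value):
    """Split a comma-separated node list into (host, port) tuples."""
    nodes = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # ignore empty items such as a trailing comma
        for separator in (":", "#"):
            host, found, port = entry.rpartition(separator)
            if found and host and port.isdigit():
                nodes.append((host, int(port)))
                break
        else:
            raise ValueError(f"cannot parse node entry: {entry!r}")
    return nodes


print(parse_cluster_nodes("172.16.8.233#8080,172.16.8.235#8080"))
```

An entry without a numeric port raises a ValueError, so a malformed list is caught when it is parsed rather than when the first connection attempt fails.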

SSL

By default, this flag is set to No, indicating that the Solace Cluster is not SSL-enabled. Set this flag to Yes if the Solace Cluster is SSL-enabled.

Detailed Diagnosis

To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability
  • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
Measurements made by the test
Measurement Description Measurement Unit Interpretation

Configuration status

Indicates the current state of the redundancy configuration of this node.

 

The values reported by this measure and its numeric equivalents are mentioned in the table below:

Measure values Numeric values
Shutdown 0
Released 1
Enabled 2
Enabled-Released 3

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the current state of the redundancy configuration of this node. The graph of this measure however, is represented using the numeric equivalents only i.e., 0 to 3.

The detailed diagnosis of this measure lists the name of the mate router, the operation mode, switchover mechanism of the broker, the redundancy mode and the failover criteria of the target broker.
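When graph values are post-processed outside the console, the numeric equivalents need to be mapped back to the measure values in the table above. A minimal sketch of such a lookup follows; the class and function names are illustrative and are not part of eG Enterprise.

```python
from enum import IntEnum


class ConfigStatus(IntEnum):
    """Numeric equivalents of the Configuration status measure."""
    SHUTDOWN = 0
    RELEASED = 1
    ENABLED = 2
    ENABLED_RELEASED = 3


# Display labels exactly as listed in the table above.
LABELS = {
    ConfigStatus.SHUTDOWN: "Shutdown",
    ConfigStatus.RELEASED: "Released",
    ConfigStatus.ENABLED: "Enabled",
    ConfigStatus.ENABLED_RELEASED: "Enabled-Released",
}


def label_for(graph_value):
    """Translate a numeric graph value into its measure value."""
    # ConfigStatus(...) raises ValueError for numbers outside 0 to 3.
    return LABELS[ConfigStatus(graph_value)]


print(label_for(3))
```

The same pattern applies to the other state measures of this test, each with its own table of values.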

Redundancy status

Indicates the current redundancy status of the target node.

 

The values reported by this measure and its numeric equivalents are mentioned in the table below:

Measure values Numeric values
Down 0
Up 1

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the current redundancy status of the target node. The graph of this measure however, is represented using the numeric equivalents only i.e., 0 or 1.

Is auto revert enabled?

Indicates whether/not the auto revert option is enabled on this node.

 

The auto-revert option controls what happens when the primary appliance comes back online after a failover has occurred. When auto-revert is not enabled (which is the default and recommended state), the primary appliance stays as a standby after it comes back online, allowing the backup appliance to remain active. In this case, the primary appliance becomes active only if the backup appliance fails or gives up activity.

If auto-revert is enabled, as soon as the primary appliance comes back online, it becomes active and switches the backup appliance from active to standby.
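The effect of the auto-revert option can be illustrated with a small simulation. This is a simplified sketch of the behavior described above, not Solace logic: the function name, the event format and the appliance labels are assumptions made for the example.

```python
def resolve_active(events, auto_revert=False):
    """Return which appliance is active after a sequence of up/down events.

    `events` is a list of (appliance, state) tuples, where appliance is
    "primary" or "backup" and state is "up" or "down".
    """
    up = {"primary": True, "backup": True}
    active = "primary"
    for appliance, state in events:
        up[appliance] = state == "up"
        mate = "backup" if appliance == "primary" else "primary"
        if state == "down":
            # The mate takes over only if the failed appliance was active.
            if active == appliance and up[mate]:
                active = mate
        elif appliance == "primary" and auto_revert:
            # With auto-revert, the returning primary reclaims activity.
            active = "primary"
        elif not up[active]:
            # Nobody was serving clients, so the returning appliance takes over.
            active = appliance
    return active


failover_and_return = [("primary", "down"), ("primary", "up")]
print(resolve_active(failover_and_return, auto_revert=False))
print(resolve_active(failover_and_return, auto_revert=True))
```

Without auto-revert, the primary appliance becomes active again only after a later event takes the backup appliance down.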

The values reported by this measure and its numeric equivalents are mentioned in the table below:

Measure values Numeric values
No 0
Yes 1

Note:

By default, this measure reports the Measure Values listed in the table above to indicate whether/not auto revert is enabled on this node. The graph of this measure however, is represented using the numeric equivalents only i.e., 0 or 1.

Active standby role

Indicates the role of the target node in an Active/Standby redundant pair.

 

The values reported by this measure and its numeric equivalents are mentioned in the table below:

Measure values Numeric values
Primary 1
Backup 0

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the Active-Standby role of the node. The graph of this measure however, is represented using the numeric equivalents only i.e., 0 or 1.

Has virtual router activity state changed?

Indicates whether/not the state of the target node has changed from primary virtual router to backup virtual router or vice versa.

 

The values reported by this measure and its numeric equivalents are mentioned in the table below:

Measure values Numeric values
Yes 1
No 0

Note:

By default, this measure reports the Measure Values listed in the table above to indicate whether/not the state of the target node has changed. The graph of this measure however, is represented using the numeric equivalents only i.e., 0 or 1.

ADB link state

Indicates whether/not the ADB link to the mate is connected from the target node.

 

An Assured Delivery Blade (ADB) is a card in a Solace appliance that enables guaranteed delivery of messages. ADBs have non-volatile memory where critical data-structures are stored and mirrored to an HA mate.

The values reported by this measure and its numeric equivalents are mentioned in the table below:

Measure values Numeric values
Yes 1
No 0

Note:

By default, this measure reports the Measure Values listed in the table above to indicate whether/not the ADB link is connected from the target node. The graph of this measure however, is represented using the numeric equivalents only i.e., 0 or 1.

The detailed diagnosis of this measure lists the time at which the ADB link failed at the last instance and the reason for the failure.

ADB hello state

Indicates whether/not the ADB hello message was received by the target node.

 

ADB hello refers to the periodic hello messages that the ADBs of the two mate appliances exchange over the ADB link to confirm that the mate is reachable.

The values reported by this measure and its numeric equivalents are mentioned in the table below:

Measure values Numeric values
Yes 1
No 0

Note:

By default, this measure reports the Measure Values listed in the table above to indicate whether/not the ADB hello message was received by the target node. The graph of this measure however, is represented using the numeric equivalents only i.e., 0 or 1.

The detailed diagnosis of this measure lists the time at which the ADB hello message failed at the last instance and the reason for the failure.

ADB hello avg latency

Indicates the average time taken by the target node to receive the ADB hello message.

Milliseconds

 

ADB hello max latency

Indicates the maximum time taken by the target node to receive the ADB hello message.

Milliseconds

 

Primary activity

Indicates the current status of the target node if the broker is the primary virtual router in a redundant setup.

 

The values reported by this measure and its numeric equivalents are mentioned in the table below:

Measure values Numeric values
Shutdown 0
Subscriptions Pending 1
Local Inactive 2
Local Active 3
Master Active 4
Mate Active 5

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the current status of the target node as the primary virtual router. The graph of this measure, however, is represented using the numeric equivalents only, i.e., 0 to 5.

The detailed diagnosis of this measure lists the name of the VRRP interface, the VRRP address, the VRRP interface role and the routing interface.

Routing interface

Indicates the state of the routing interface connecting the target node.

 

This measure is reported only if the target node is the primary virtual router.

The values reported by this measure and its numeric equivalents are mentioned in the table below:

Measure values Numeric values
Up 1
Down 0

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the state of the routing interface connecting the target node. The graph of this measure, however, is represented using the numeric equivalents only, i.e., 0 or 1.

Messagespool status

Indicates the current status of the message spool in the target node that is to provide Guaranteed Messaging.

 

This measure is reported only if the target node is the primary virtual router.

The values reported by this measure and its numeric equivalents are mentioned in the table below:

Measure values Numeric values
AD-Disable 0
AD-Not Ready 1
AD-Standby 2
AD-Active 3
AD-Activating 4
Unknown 5

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the current status of the message spool in the target node. The graph of this measure however, is represented using the numeric equivalents only i.e., 0 to 5.

SMRP status

Indicates the current status of the Subscription Management Routing Protocol (SMRP) on the target node.

 

This measure is reported only if the target node is the primary virtual router.

The values reported by this measure and its numeric equivalents are mentioned in the table below:

Measure values Numeric values
Ready 1
Not ready 0

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the current status of the Subscription Management Routing Protocol (SMRP) on the target node. The graph of this measure however, is represented using the numeric equivalents only i.e., 0 or 1.

ADM card status

Indicates the current status of the ADB on the target node.

 

This measure is reported only if the target node is the primary virtual router.

The values reported by this measure and its numeric equivalents are mentioned in the table below:

Measure values Numeric values
Ready 1
Not ready 0

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the current status of the ADB on the target node. The graph of this measure however, is represented using the numeric equivalents only i.e., 0 or 1.

ADM datapath status

Indicates the current status of the ADB datapath of the target node.

 

This measure is reported only if the target node is the primary virtual router.

This measure is a good indicator to figure out whether the ADB datapath of the primary virtual router is able to spool messages.

The values reported by this measure and its numeric equivalents are mentioned in the table below:

Measure values Numeric values
Ready 1
Not ready 0

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the current status of the ADB datapath of the target node. The graph of this measure however, is represented using the numeric equivalents only i.e., 0 or 1.

Flash module status

Indicates the current status of the Flash Memory Module on the ADB linked to the target node.

 

This measure is reported only if the target node is the primary virtual router.

The values reported by this measure and its numeric equivalents are mentioned in the table below:

Measure values Numeric values
Ready 1
Not ready 0

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the current status of the Flash Memory Module on the ADB linked to the target node. The graph of this measure however, is represented using the numeric equivalents only i.e., 0 or 1.

Power module status

Indicates the current status of the power module on the ADB linked to the target node.

 

This measure is reported only if the target node is the primary virtual router.

The values reported by this measure and its numeric equivalents are mentioned in the table below:

Measure values Numeric values
Ready 1
Not ready 0

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the current status of the power module on the ADB linked to the target node. The graph of this measure however, is represented using the numeric equivalents only i.e., 0 or 1.

ADM content status

Indicates the current status of the contents of the ADB linked to the target node.

 

This measure is reported only if the target node is the primary virtual router.

The values reported by this measure and its numeric equivalents are mentioned in the table below:

Measure values Numeric values
Ready 1
Not ready 0

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the current status of the contents of the ADB linked to the target node. The graph of this measure however, is represented using the numeric equivalents only i.e., 0 or 1.

Disk status

Indicates the current status of the external disk array of the target node.

 

This measure is reported only if the target node is the primary virtual router.

The values reported by this measure and its numeric equivalents are mentioned in the table below:

Measure values Numeric values
Ready 1
Not ready 0

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the current status of the external disk array of the target node. The graph of this measure however, is represented using the numeric equivalents only i.e., 0 or 1.

Disk content status

Indicates the current status of the spool file directory on the external disk storage array of the target node.

 

This measure is reported only if the target node is the primary virtual router.

The values reported by this measure and its numeric equivalents are mentioned in the table below:

Measure values Numeric values
Ready 1
Not ready 0

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the current status of the spool file directory on the external disk storage array of the target node. The graph of this measure, however, is represented using the numeric equivalents only, i.e., 0 or 1.

DB sync status

Indicates the synchronization status of the database of the target node.

 

This measure is reported only if the target node is the primary virtual router.

When an event broker is restarted while running Multi-Node Routing, it must synchronize its database with its neighbor event brokers to learn of the subscriptions it will become active for. This value indicates the SMRP synchronization status.

The values reported by this measure and its numeric equivalents are mentioned in the table below:

Measure values Numeric values
Ready 1
Not ready 0

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the synchronization status of the database of the target node. The graph of this measure however, is represented using the numeric equivalents only i.e., 0 or 1.

DB build status

Indicates the current status of the database on the target node.

 

This measure is reported only if the target node is the primary virtual router.

Whenever redundancy is enabled on an event broker, it can take up to a minute to ready the database for taking activity from its mate event broker on demand.

The values reported by this measure and its numeric equivalents are mentioned in the table below:

Measure values Numeric values
Ready 1
Not ready 0

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the current status of the database on the target node. The graph of this measure however, is represented using the numeric equivalents only i.e., 0 or 1.

DB build

Indicates the percentage of time taken by the target node to ready the database for taking activity from the mate event broker (backup virtual router) on demand.

Percent

This measure is reported only if the target node is the primary virtual router.

A value close to 100 percent indicates that the database is not ready and is taking too long to take activity.

Backup activity

Indicates the current status of the target node if the broker is the backup virtual router in a redundant setup.

 

The values reported by this measure and its numeric equivalents are mentioned in the table below:

Measure values Numeric values
Shutdown 0
Subscriptions Pending 1
Local Inactive 2
Local Active 3
Master Active 4
Mate Active 5

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the current status of the target node as the backup virtual router. The graph of this measure, however, is represented using the numeric equivalents only, i.e., 0 to 5.

The detailed diagnosis of this measure lists the name of the VRRP interface, the VRRP address, the VRRP interface role and the routing interface.

Backup routing interface

Indicates the state of the routing interface connecting the target node.

 

This measure is reported only if the target node is the backup virtual router.

The values reported by this measure and its numeric equivalents are mentioned in the table below:

Measure values Numeric values
Up 1
Down 0

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the state of the routing interface connecting the target node. The graph of this measure, however, is represented using the numeric equivalents only, i.e., 0 or 1.

Backup messagespool status

Indicates the current status of the message spool in the target node that is to provide Guaranteed Messaging.

 

This measure is reported only if the target node is the backup virtual router.

The values reported by this measure and its numeric equivalents are mentioned in the table below:

Measure values Numeric values
AD-Disable 0
AD-Not Ready 1
AD-Standby 2
AD-Active 3
AD-Activating 4
Unknown 5

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the current status of the message spool in the target node. The graph of this measure however, is represented using the numeric equivalents only i.e., 0 to 5.

Backup SMRP status

Indicates the current status of the Subscription Management Routing Protocol (SMRP) on the target node.

 

This measure is reported only if the target node is the backup virtual router.

The values reported by this measure and its numeric equivalents are mentioned in the table below:

Measure values Numeric values
Ready 1
Not ready 0

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the current status of the Subscription Management Routing Protocol (SMRP) on the target node. The graph of this measure however, is represented using the numeric equivalents only i.e., 0 or 1.

Backup ADM card status

Indicates the current status of the ADB on the target node.

 

This measure is reported only if the target node is the backup virtual router.

The values reported by this measure and its numeric equivalents are mentioned in the table below:

Measure values Numeric values
Ready 1
Not ready 0

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the current status of the ADB on the target node. The graph of this measure however, is represented using the numeric equivalents only i.e., 0 or 1.

Backup ADM datapath status

Indicates the current status of the ADB datapath of the target node.

 

This measure is reported only if the target node is the backup virtual router.

This measure is a good indicator to figure out whether the ADB datapath of the backup virtual router is able to spool messages.

The values reported by this measure and its numeric equivalents are mentioned in the table below:

Measure values Numeric values
Ready 1
Not ready 0

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the current status of the ADB datapath of the target node. The graph of this measure however, is represented using the numeric equivalents only i.e., 0 or 1.

Backup flash module status

Indicates the current status of the Flash Memory Module on the ADB linked to the target node.

 

This measure is reported only if the target node is the backup virtual router.

The values reported by this measure and its numeric equivalents are mentioned in the table below:

Measure values Numeric values
Ready 1
Not ready 0

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the current status of the Flash Memory Module on the ADB linked to the target node. The graph of this measure however, is represented using the numeric equivalents only i.e., 0 or 1.

Backup power module status

Indicates the current status of the power module on the ADB linked to the target node.

 

This measure is reported only if the target node is the backup virtual router.

The values reported by this measure and its numeric equivalents are mentioned in the table below:

Measure values Numeric values
Ready 1
Not ready 0

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the current status of the power module on the ADB linked to the target node. The graph of this measure however, is represented using the numeric equivalents only i.e., 0 or 1.

Backup ADM content status

Indicates the current status of the contents of the ADB linked to the target node.

 

This measure is reported only if the target node is the backup virtual router.

The values reported by this measure and its numeric equivalents are mentioned in the table below:

Measure values Numeric values
Ready 1
Not ready 0

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the current status of the contents of the ADB linked to the target node. The graph of this measure however, is represented using the numeric equivalents only i.e., 0 or 1.

Backup disk status

Indicates the current status of the external disk array of the target node.

 

This measure is reported only if the target node is the backup virtual router.

The values reported by this measure and its numeric equivalents are mentioned in the table below:

Measure values Numeric values
Ready 1
Not ready 0

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the current status of the external disk array of the target node. The graph of this measure however, is represented using the numeric equivalents only i.e., 0 or 1.

Backup disk content status

Indicates the current status of the spool file directory on the external disk storage array of the target node.

 

This measure is reported only if the target node is the backup virtual router.

The values reported by this measure and its numeric equivalents are mentioned in the table below:

Measure values Numeric values
Ready 1
Not ready 0

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the current status of the spool file directory on the external disk storage array of the target node. The graph of this measure however, is represented using the numeric equivalents only i.e., 0 or 1.

Backup DB sync status

Indicates the synchronization status of the database of the target node.

 

This measure is reported only if the target node is the backup virtual router.

When an event broker is restarted while running Multi-Node Routing, it must synchronize its database with its neighbor event brokers to learn of the subscriptions it will become active for. This value indicates the SMRP synchronization status.

The values reported by this measure and its numeric equivalents are mentioned in the table below:

Measure values Numeric values
Ready 1
Not ready 0

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the synchronization status of the database of the target node. The graph of this measure however, is represented using the numeric equivalents only i.e., 0 or 1.

Backup DB build status

Indicates the current status of the database on the target node.

 

This measure is reported only if the target node is the backup virtual router.

Whenever redundancy is enabled on an event broker, it can take up to a minute to ready the database for taking activity from its mate event broker on demand.

The values reported by this measure and its numeric equivalents are mentioned in the table below:

Measure values Numeric values
Ready 1
Not ready 0

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the current status of the database on the target node. The graph of this measure however, is represented using the numeric equivalents only i.e., 0 or 1.

Backup DB build

Indicates the percentage of time taken by the target node to ready the database for taking activity from the mate event broker (backup virtual router) on demand.

Percent

This measure is reported only if the target node is the backup virtual router.

A value close to 100 percent indicates that the database is not ready and is taking too long to take activity.