Solace System Replications Test
Solace offers a disaster recovery (DR) solution for Solace PubSub+ event brokers based on data center replication. Replication provides business continuity, allowing mission-critical applications to keep functioning during a major service outage at a data center. Replication protects against more catastrophic events in a data center, and a failover requires manual intervention. When replication is enabled, guaranteed messages published to a Message VPN with an active replication state at one data center are automatically propagated to the matching Message VPNs with a standby replication state at another data center.
A typical customer deployment model for replicated data center infrastructure is a pair of replication sites in different geographic locations, usually at least a few hundred miles apart. These sites are replication mates and are collectively known as a replication group. The main, or primary, site uses a high-availability (HA) pair of event brokers to protect against the local failure of an event broker or other equipment at that site. The secondary, or standby, site may have a single event broker or an HA pair of event brokers. The primary site provides service unless it fails, in which case service is failed over to the backup site. Once the primary site is restored, service can be failed back to it.
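To make the active/standby relationship concrete, the following is a toy Python model of one replication group. Everything in it (the `MessageVpn`, `Site`, and `ReplicationGroup` classes, the `publish` and `fail_over` methods, and the "active"/"standby" state strings) is hypothetical and is not a Solace API. It only mirrors the behavior described above: a guaranteed message published to a Message VPN in the active state is propagated to the matching Message VPN in the standby state on the mate, and a manual failover (or failback) swaps the two roles. HA pairs, WAN latency, and synchronous/asynchronous modes are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class MessageVpn:
    name: str
    replication_state: str  # hypothetical values: "active" or "standby"
    stored: list = field(default_factory=list)


@dataclass
class Site:
    name: str
    vpns: dict  # Message VPN name -> MessageVpn


class ReplicationGroup:
    """Toy model of two replication mates (hypothetical, not a Solace API)."""

    def __init__(self, primary: Site, backup: Site):
        self.primary = primary
        self.backup = backup

    def mate_of(self, site: Site) -> Site:
        return self.backup if site is self.primary else self.primary

    def publish(self, site: Site, vpn_name: str, message: str) -> None:
        vpn = site.vpns[vpn_name]
        # Simplification of this model: only an active VPN accepts publishes.
        if vpn.replication_state != "active":
            raise RuntimeError(f"{vpn_name} on {site.name} is not active")
        vpn.stored.append(message)
        # Propagate to the VPN with the same name on the mate if it is standby.
        mate_vpn = self.mate_of(site).vpns.get(vpn_name)
        if mate_vpn is not None and mate_vpn.replication_state == "standby":
            mate_vpn.stored.append(message)

    def fail_over(self, vpn_name: str) -> None:
        """Manual failover or failback: swap the active/standby roles of one VPN."""
        a = self.primary.vpns[vpn_name]
        b = self.backup.vpns[vpn_name]
        a.replication_state, b.replication_state = b.replication_state, a.replication_state
```

For example, after publishing to the primary site, calling `fail_over` makes the backup site's VPN active, and messages published there flow back to the (now standby) primary.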
The success of any replication system depends on how quickly backup sites reconnect with the primary site after a failure, and how rapidly data is synchronized between the primary and backup sites. If backup sites take too long to reconnect with the primary site after losing contact, or if too many messages are queued on the primary site awaiting transmission to the backup site, replication becomes sluggish. To avoid this, administrators should continuously monitor the replication process, proactively identify pain points, and promptly take measures to eliminate them, so that the data on the primary and backup sites stays in sync at all times. This is where the Solace System Replications test helps!
This test first determines whether or not replication is enabled on the target broker. If the target broker is at the active/primary site, the test reports the number of synchronous and asynchronous messages that were queued to the standby/backup replication site and the number of messages that were transmitted to the standby/backup replication site. If the target broker is on standby, the test reports the number of transaction replication requests received by the target broker; transaction replication request failures on the target broker alert administrators to potential problems that need immediate attention. With the help of these metrics, administrators can quickly spot anomalies in the replication process and initiate measures to resolve them.
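The role-dependent reporting described above can be sketched as a small decision function. This is an illustration, not the eG agent's code: the function name `measures_to_report`, the role strings "active" and "standby", and the grouping of measures by role are assumptions inferred from the measure descriptions in the table below (only a representative subset of measures is listed per role).

```python
from typing import Optional

# Hypothetical grouping, inferred from the measure descriptions in this document.
ACTIVE_MEASURES = [
    "Sync messages queued to standby",
    "Async messages queued to standby",
    "Messages transmitted to standby",
]
STANDBY_MEASURES = [
    "Transactions requests",
    "Transactions requests success",
    "Transactions requests fail",
    "Messages received from active",
]


def measures_to_report(replication_enabled: bool, role: Optional[str]) -> list[str]:
    """Return the measures that apply to a broker, given its replication status."""
    if not replication_enabled:
        # Only the replication status itself is meaningful.
        return ["Is replication enabled?"]
    if role == "active":
        return ["Is replication enabled?"] + ACTIVE_MEASURES
    if role == "standby":
        return ["Is replication enabled?"] + STANDBY_MEASURES
    raise ValueError(f"unknown replication role: {role!r}")
```

Raising an error for an unknown role, rather than silently reporting nothing, makes a misread broker state visible instead of producing an empty result.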
Target of the test : A Solace PubSub+ Event Broker
Agent deploying the test : A remote agent
Outputs of the test : One set of results for the target Solace PubSub+ Event Broker being monitored
Parameter | Description |
---|---|
Test Period | How often should the test be executed. |
Host | The IP address of the target host for which this test is to be configured. |
Port | The port on which the Solace PubSub+ Event Broker listens. |
UserName and Password | By default, the eG agent executes SEMP (Solace Element Management Protocol) APIs on the target broker to collect the required metrics. To execute the SEMP APIs, the eG agent requires a special user with read-only privileges. Specify the credentials of such a user in the UserName and Password text boxes. To know how to create such a user, refer to Creating a New User for Monitoring Solace PubSub+ Event Broker. |
Confirm Password | Confirm the Password by retyping it in the Confirm Password text box. |
SSL | By default, this flag is set to No, indicating that the Solace PubSub+ Event Broker is not SSL-enabled. Set this flag to Yes if the Solace PubSub+ Event Broker is SSL-enabled. |
Detailed Diagnosis | To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option; to disable it, choose the Off option. The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled: |
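To show how the Host, Port, SSL, and credential parameters fit together, the following sketch builds (but does not send) an HTTP request to a SEMP endpoint. It assumes the SEMP v2 monitoring path `/SEMP/v2/monitor/msgVpns/{msgVpnName}` with HTTP Basic authentication; this document does not state which SEMP version or endpoints the eG agent actually calls. The function name `build_semp_request`, the host, port, user, and Message VPN values are all hypothetical.

```python
import base64
import urllib.request
from urllib.parse import quote


def build_semp_request(host: str, port: int, username: str, password: str,
                       confirm_password: str, ssl: bool,
                       msg_vpn: str) -> urllib.request.Request:
    """Build a SEMP v2 monitor request from the test parameters (sketch only)."""
    # Mirrors the Confirm Password check on the configuration page.
    if password != confirm_password:
        raise ValueError("Password and Confirm Password do not match")
    # The SSL flag selects HTTPS instead of HTTP.
    scheme = "https" if ssl else "http"
    url = f"{scheme}://{host}:{port}/SEMP/v2/monitor/msgVpns/{quote(msg_vpn, safe='')}"
    # HTTP Basic authentication with the read-only monitoring user.
    credentials = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {credentials}")
    request.add_header("Accept", "application/json")
    return request


req = build_semp_request("192.0.2.10", 8080, "egmonitor", "s3cret", "s3cret",
                         ssl=False, msg_vpn="default")
print(req.full_url)  # http://192.0.2.10:8080/SEMP/v2/monitor/msgVpns/default
```

Passing the returned object to `urllib.request.urlopen` would send it; that step is omitted here so the sketch runs without a broker.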
Measurement | Description | Measurement Unit | Interpretation |
---|---|---|---|
Is replication enabled? | Indicates whether or not replication is enabled on the target event broker. | | The values reported by this measure and their numeric equivalents are mentioned in the table below: Note: By default, this measure reports the Measure Values listed in the table above to indicate whether or not replication is enabled on the target event broker. The graph of this measure, however, is represented using the numeric equivalents only, i.e., 0 or 1. The detailed diagnosis of this measure lists the name of the Mate router, the Mate connection, the Mate uncompressed port, the Mate compressed port, and the Mate SSL port. |
Transitions to ineligible | Indicates the number of times the target broker transitioned to an ineligible state for receiving messages from a particular topic or queue during the last measurement period. | Number | A broker could transition to an ineligible state for various reasons, such as network connectivity issues, subscription lag, flow control issues, or WAN bandwidth issues, which the system administrator needs to address. |
Async messages queued to standby | Indicates the number of asynchronous messages that were queued from the target broker to the standby replication mate. | Number | |
Promoted messages queued to standby | Indicates the number of promoted messages that were queued from the target broker to the standby replication mate. | Number | |
Pruned locally consumed messages | Indicates the number of messages that were pruned by the target broker during the last measurement period. | Number | The event broker automatically prunes the oldest messages, paving the way for newer messages to arrive. |
Sync messages queued to standby | Indicates the number of synchronous messages that were queued from the target broker to the standby replication mate. | Number | |
Sync messages queued to standby as async | Indicates the number of synchronous messages that were queued from the target broker to the standby replication mate as asynchronous messages. | Number | If the WAN link between the replication sites fails, or cannot keep up with the ingress rate of messages to be replicated to Message VPNs in a standby state on the replication mate, the system switches to asynchronous replication for synchronous topics so that publishing applications can continue to publish during the WAN link impairment or outage. The system automatically transitions back to synchronous replication once the WAN link is restored and the backlog of messages that were queued for replication to the standby Message VPNs on the replication mate while the event broker was in this state has been transmitted. |
Messages transmitted to standby | Indicates the total number of messages that were transmitted from the target broker to the standby replication mate during the last measurement period. | Number | |
Reconcile requests received from standby | Indicates the number of reconcile requests received by the target broker from a standby replication mate during the last measurement period. | Number | |
Transactions requests | Indicates the total number of transaction requests received by the target broker while in standby mode during the last measurement period. | Number | When transactions are used, the replication mode is set at the Message VPN level, and all local and XA transactions in the Message VPN use the same replication mode. Synchronous transactions must be stored on the standby site before the broker responds to the client, whereas asynchronous transactions only need to be stored in the replication queue. The replication mode of replicated topic subscriptions is ignored when transactions are used. |
Transactions requests success | Indicates the number of transaction requests that were successful while the target broker was in standby mode during the last measurement period. | Number | A high value is desired for this measure. A sudden or gradual decrease in its value is a cause for concern, as it may indicate replication issues caused by, for example, connectivity problems or a non-responsive replication mate. |
Transactions requests success commit | Indicates the number of transaction requests that were committed successfully while the target broker was in standby mode during the last measurement period. | Number | A high value is desired for this measure. |
Transactions requests success prepare | Indicates the number of transaction requests that were prepared successfully while the target broker was in standby mode during the last measurement period. | Number | A high value is desired for this measure. |
Transactions requests success rollback | Indicates the number of transaction requests that were rolled back successfully while the target broker was in standby mode during the last measurement period. | Number | |
Transactions requests fail | Indicates the number of transaction requests that failed while the target broker was in standby mode during the last measurement period. | Number | Ideally, the value of this measure should be 0. |
Transactions requests fail commit | Indicates the number of transaction requests that failed to commit while the target broker was in standby mode during the last measurement period. | Number | Ideally, the value of this measure should be 0. |
Transactions requests fail prepare | Indicates the number of transaction requests that failed to be prepared while the target broker was in standby mode during the last measurement period. | Number | Ideally, the value of this measure should be 0. |
Transactions requests fail rollback | Indicates the number of transaction requests that failed to roll back while the target broker was in standby mode during the last measurement period. | Number | Ideally, the value of this measure should be 0. |
Messages received from active | Indicates the total number of messages received from the active replication mate while the target broker was in standby mode during the last measurement period. | Number | |
ACK propagation messages received | Indicates the number of acknowledgment propagation messages that were received while the target broker was in standby mode during the last measurement period. | Number | |
Out of sequence ACK received | Indicates the number of out-of-sequence acknowledgments received while the target broker was in standby mode during the last measurement period. | Number | |
Reconcile requests sent to active | Indicates the number of reconcile requests sent to the active broker while the target broker was in standby mode during the last measurement period. | Number | |
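Many of the measures above are reported "during the last measurement period", and the Interpretation column gives simple rules: the fail counts should ideally be 0, a drop in successful transaction requests is a cause for concern, and synchronous messages queued as async point to WAN link impairment. The sketch below applies those rules. It assumes, as a hypothesis not confirmed by this document, that the broker exposes cumulative counters that are differenced between two polls; the dictionary keys reuse the measure names from the table, while the function names are hypothetical.

```python
# Measures whose Interpretation says the ideal value is 0.
ZERO_IS_IDEAL = (
    "Transactions requests fail",
    "Transactions requests fail commit",
    "Transactions requests fail prepare",
    "Transactions requests fail rollback",
)


def period_delta(previous: int, current: int) -> int:
    """Per-period value from two samples of an (assumed) cumulative counter.

    A smaller current value is treated as a counter reset, for example after
    a broker restart, so the current value itself is used.
    """
    return current - previous if current >= previous else current


def period_deltas(previous: dict[str, int], current: dict[str, int]) -> dict[str, int]:
    return {name: period_delta(previous.get(name, 0), value)
            for name, value in current.items()}


def failure_ratio(deltas: dict[str, int]) -> float:
    """Share of transaction requests in the period that failed."""
    ok = deltas.get("Transactions requests success", 0)
    bad = deltas.get("Transactions requests fail", 0)
    total = ok + bad
    return bad / total if total else 0.0


def findings(deltas: dict[str, int]) -> list[str]:
    """Flag the conditions the Interpretation column calls out."""
    out = []
    for name in ZERO_IS_IDEAL:
        if deltas.get(name, 0) > 0:
            out.append(f"{name} = {deltas[name]} (ideally 0)")
    if deltas.get("Sync messages queued to standby as async", 0) > 0:
        out.append("synchronous messages fell back to async; check the WAN link")
    return out
```

With two polls where successes grow from 100 to 130 and failures from 2 to 12, the period deltas are 30 and 10, the failure ratio is 0.25, and `findings` reports the non-zero "Transactions requests fail" value.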