Maria Cluster Replication Transactions Test

In a MariaDB Cluster, replication ensures that data changes made on the primary node are consistently propagated to other nodes in near real time. Smooth replication is critical to guarantee data availability, consistency, and fault tolerance across the cluster. However, issues such as replication lag, transaction errors, or frequent flow control messages can directly affect application performance and lead to stale reads or even service disruptions.

This test helps administrators track the replication performance of a MariaDB cluster by reporting transaction volumes, replication errors, flow control activities, and replication lag times. By doing so, the test provides insights into the efficiency of replication, highlights potential bottlenecks, and assists in ensuring high availability and data consistency across all nodes.

Target of the test : A MariaDB Cluster

Agent deploying the test : An external agent

Outputs of the test : One set of results for each node on the MariaDB Cluster being monitored.

Configurable parameters for the test
Parameter Description

Test Period

How often the test should be executed.

Host

The IP address of the MariaDB Cluster.

Port

The port on which the server is listening. By default, this is set to 3306.

Database

Specify the name of the database that is to be monitored on the target MariaDB Cluster.

User and Password

The eG agent must be configured with the credentials of a user who has server-wide PROCESS and SELECT privileges on the monitored MariaDB Cluster database server. To learn how to create such a user, refer to the Configuring the eG Agent with Access Privileges section.

Confirm Password

Confirm the Password (if any) by retyping it here.

SSL

Indicates whether or not the eG agent will communicate with the MariaDB Cluster via HTTPS. By default, this flag is set to No, as the target MariaDB database is not SSL-enabled by default. If the target cluster is SSL-enabled, set this flag to Yes.

Verify CA

If the eG agent is required to establish an encrypted connection with the target MariaDB Cluster by authenticating the server's identity through verifying the server CA certificate, set Verify CA flag to Yes. By default, this flag is set to No.

Available Nodes

In the Available Nodes text box, provide a comma-separated list of all the available nodes to be included for monitoring. This way, the test monitors and collects metrics from all the available nodes in the cluster. By default, this parameter is set to none. Each entry takes the format HOSTNAME:PORT, for example: 172.16.8.136:3306,172.16.8.139:3306
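
As an illustration of the HOSTNAME:PORT list format described above, the following minimal sketch splits such a value into host and port pairs. The function name parse_available_nodes is hypothetical and is not part of the eG agent; treating the documented default of none as an empty list is also an assumption.

```python
def parse_available_nodes(value):
    """Parse a comma-separated HOSTNAME:PORT list into (host, port) tuples."""
    text = value.strip()
    # Assumption: the documented default "none" means no nodes were listed.
    if not text or text.lower() == "none":
        return []
    nodes = []
    for entry in text.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a stray or trailing comma
        # Split on the last colon so the port is always the final field.
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected HOSTNAME:PORT, got {entry!r}")
        nodes.append((host, int(port)))
    return nodes


print(parse_available_nodes("172.16.8.136:3306,172.16.8.139:3306"))
# prints [('172.16.8.136', 3306), ('172.16.8.139', 3306)]
```

An entry without a port is rejected rather than silently defaulted, since the format above requires both parts.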

Detailed Diagnosis

To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability.
  • Neither the normal nor the abnormal frequency configured for the detailed diagnosis measures should be 0.

Measurements made by the test
Measurement Description Measurement Unit Interpretation

Transactions

Indicates the total number of replication transactions processed by this node during the last measurement period.

Number

This measure is not reported for the Summary descriptor.

Transaction errors

Indicates the number of transactions that experienced errors on this node during the last measurement period.

Number

This measure is not reported for the Summary descriptor.

A high value is a cause for concern, as too many error-prone transactions can significantly damage the user experience.

Use the detailed diagnosis of this measure to find out the Master Host, Master User, Master Binary File, Last Error Number, Last Error Message, and Slave SQL State.

Flow control sent

Indicates the number of flow control messages sent by this node to other nodes in the cluster.

Number

This measure is not reported for the Summary descriptor.

Flow control in a MariaDB cluster is a mechanism that regulates the pace of transaction replication between nodes. If a node falls behind in applying replicated transactions, flow control signals are triggered to pause or slow down transaction processing on other nodes until the lagging node catches up.

A high value indicates the node is struggling to apply transactions and is signaling others to slow down, which could cause cluster-wide slowdowns.

Flow controls received

Indicates the number of flow control messages received by this node from other nodes in the cluster.

Number

This measure is not reported for the Summary descriptor.

A high value means other nodes are struggling with replication throughput, potentially slowing cluster-wide performance.
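
To make the flow control counts more concrete, here is a minimal sketch of how per-interval values could be derived from two samples of a node's status counters. It assumes the counts come from the Galera status variables wsrep_flow_control_sent and wsrep_flow_control_recv (as returned by SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_%'); the source does not state which counters this test reads, and the function name flow_control_delta is hypothetical.

```python
FLOW_CONTROL_COUNTERS = ("wsrep_flow_control_sent", "wsrep_flow_control_recv")


def flow_control_delta(previous, current):
    """Return the change in each flow control counter between two snapshots.

    Both arguments are dicts mapping status variable names to values, which
    may be strings because status queries typically return text.
    """
    deltas = {}
    for name in FLOW_CONTROL_COUNTERS:
        before = int(previous.get(name, 0))
        after = int(current.get(name, 0))
        # A counter that went backwards suggests a restart or FLUSH STATUS;
        # in that case report the current value on its own.
        deltas[name] = after - before if after >= before else after
    return deltas


earlier = {"wsrep_flow_control_sent": "12", "wsrep_flow_control_recv": "40"}
later = {"wsrep_flow_control_sent": "15", "wsrep_flow_control_recv": "47"}
print(flow_control_delta(earlier, later))
# prints {'wsrep_flow_control_sent': 3, 'wsrep_flow_control_recv': 7}
```

A node whose sent delta keeps growing between samples is the one asking the rest of the cluster to slow down, which matches the interpretation given for the Flow control sent measure.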

Replication lag time

Indicates the time difference in execution between the primary and this secondary node; i.e., the amount of time the replica is lagging behind the current state of the primary instance.

Seconds

This measure is applicable only for the secondary node.

A higher lag duration implies that the standby node is taking longer to apply changes, which may impact failover readiness. Ideally, this value should remain low to ensure minimal data loss and quick recovery during failover.

Use the detailed diagnosis of this measure to find out the Master Host, Master User, Master Binary File, Last Error Number, Last Error Message, and Slave SQL State.
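
The lag value and the detailed diagnosis fields listed above resemble columns of MariaDB's SHOW SLAVE STATUS output. The sketch below shows one way such a row could be reduced to a lag figure plus diagnosis details. The mapping from the labels in this document to column names is an assumption, not a documented behavior of the test, and summarize_replica_status is a hypothetical helper.

```python
# Assumed mapping from the diagnosis labels in this document to
# SHOW SLAVE STATUS column names.
DETAIL_COLUMNS = {
    "Master Host": "Master_Host",
    "Master User": "Master_User",
    "Master Binary File": "Master_Log_File",
    "Last Error Number": "Last_Errno",
    "Last Error Message": "Last_Error",
    "Slave SQL State": "Slave_SQL_Running_State",
}


def summarize_replica_status(row):
    """Return (lag_seconds, details) from one status row given as a dict."""
    lag = row.get("Seconds_Behind_Master")
    # Seconds_Behind_Master is NULL when replication is not running, so
    # None is passed through instead of being coerced to 0.
    lag_seconds = None if lag is None else int(lag)
    details = {label: row.get(column) for label, column in DETAIL_COLUMNS.items()}
    return lag_seconds, details


sample_row = {  # illustrative values only
    "Master_Host": "172.16.8.136",
    "Master_User": "repl_user",
    "Master_Log_File": "mariadb-bin.000042",
    "Last_Errno": 0,
    "Last_Error": "",
    "Seconds_Behind_Master": "7",
    "Slave_SQL_Running_State": "Slave has read all relay log; waiting for more updates",
}
lag_seconds, details = summarize_replica_status(sample_row)
print(lag_seconds, details["Master Host"])
```

Keeping a missing lag as None rather than 0 avoids reporting a stopped replica as fully caught up.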

Maximum replication lag time

Indicates the maximum replication lag noticed between the primary (master) node and the secondary (slave) nodes in this cluster.

Seconds

This measure is reported only for the Summary descriptor.

For the Summary descriptor, this measure will report the maximum replication lag across all the nodes in the cluster. A high maximum lag indicates that at least one standby node is significantly behind the primary, posing a risk to data consistency and high availability. Monitoring this helps isolate lagging nodes and take corrective action.

Use the detailed diagnosis of this measure to find out the Master Host, Master User, Master Binary File, Last Error Number, Last Error Message, and Slave SQL State.

Minimum replication lag time

Indicates the minimum replication lag noticed between the primary node and the secondary nodes in this cluster.

Seconds

This measure is reported only for the Summary descriptor.

For the Summary descriptor, this measure will report the minimum replication lag across all the nodes in the cluster.

Use the detailed diagnosis of this measure to find out the Master Host, Master User, Master Binary File, Last Error Number, Last Error Message, and Slave SQL State.
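
The Summary descriptor aggregation described for the maximum and minimum replication lag measures can be sketched as follows. This is an illustration only: the function name summarize_lag and the choice to skip nodes with no lag reading are assumptions, and the third node address in the example is hypothetical.

```python
def summarize_lag(lag_by_node):
    """Return max/min lag and the nodes that reported them, or None."""
    # Assumption: nodes without a lag reading (for example, the primary)
    # are excluded from the summary.
    known = {node: lag for node, lag in lag_by_node.items() if lag is not None}
    if not known:
        return None
    slowest = max(known, key=known.get)
    fastest = min(known, key=known.get)
    return {
        "max_lag": known[slowest],
        "max_node": slowest,
        "min_lag": known[fastest],
        "min_node": fastest,
    }


per_node_lag = {
    "172.16.8.136:3306": 2,
    "172.16.8.139:3306": 45,
    "172.16.8.140:3306": None,  # hypothetical node with no reading
}
summary = summarize_lag(per_node_lag)
print(summary["max_lag"], summary["max_node"])  # prints 45 172.16.8.139:3306
print(summary["min_lag"], summary["min_node"])  # prints 2 172.16.8.136:3306
```

Returning the node alongside each value makes it easier to isolate the lagging node, which is the corrective step the maximum lag measure is meant to support.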