MySQL Cluster Replication Transactions Test

Replication lag is the difference between the current time and the original timestamp at which the primary database committed the transaction that is currently being applied on the replica. In a MySQL Cluster database, replication lag can be caused by long-running queries or transactions, transactions containing multiple update queries, missing primary keys, lock contention due to DDL on the replicas, and overloaded replicas. Because MySQL replication is single-threaded, any long-running write query or explicit transaction clogs the replication stream. Small, fast updates cannot proceed in the meantime, because all the updates from a transaction are buffered together and then written to the binary log at once. Keeping track of the transactions happening on the nodes of the cluster therefore helps administrators to be alerted promptly to potential replication lag before it affects the user experience.
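The definition of replication lag given above can be illustrated with a short Python sketch. It is not how the eG agent computes the measure; the function name replication_lag_seconds and its parameters are hypothetical, and clamping negative values to zero is an assumption made here to tolerate clock skew between hosts.

```python
from datetime import datetime, timezone


def replication_lag_seconds(original_commit_ts, now=None):
    """Return the lag, in seconds, for the transaction a replica is applying.

    original_commit_ts: timezone-aware time at which the primary committed
    the transaction (hypothetical input; the caller must obtain it).
    now: timezone-aware current time; defaults to the current UTC time.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    lag = (now - original_commit_ts).total_seconds()
    # Assumption: a negative result can only come from clock skew, so report 0.
    return max(lag, 0.0)


# Example: a transaction committed on the primary 7 seconds before "now".
committed = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
current = datetime(2024, 1, 1, 12, 0, 7, tzinfo=timezone.utc)
print(replication_lag_seconds(committed, current))
```

Both timestamps are kept timezone-aware so that a primary and a replica running in different time zones are compared on the same clock.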

This test continuously monitors each node in the cluster and reports the number of transactions happening on each node. In addition, the test reports the replication lag time and the number of transaction errors that occurred on each node in the cluster. Using this test, administrators can identify long-running transactions and transaction errors that lead to replication lag.

Target of the test : A MySQL Cluster

Agent deploying the test : An external agent

Outputs of the test : One set of results for each node on the target MySQL Cluster database server being monitored.

Configurable parameters for the test
Parameter	Description
Test Period	How often should the test be executed.
Host	Specify the host name of the server for which the test is to be configured in this text box.
Port	Specify the port to which the specified host listens in this text box.
Database(DB)	Specify the name of a database on the target MySQL Cluster database server being monitored in the Database text box.
Username and Password	In the Username and Password text boxes, specify the credentials of a user who has server-wide Process and Select privileges on the monitored MySQL Cluster database server. To know how to create such a user, refer to Pre-requisites for Monitoring the MySQL Cluster.
Confirm Password	Confirm the Password by retyping it in the Confirm Password text box.
Allow Public Key	By default, the Allow Public Key flag is set to No. However, if the specified user was created with caching_sha2_password as the authentication plugin, the eG agent can connect to the target database cluster using an RSA public key. To enable this, set the Allow Public Key flag to Yes.
SSL	By default, the SSL flag is set to No, indicating that the target MySQL Cluster server is not SSL-enabled by default. To enable the test to connect to an SSL-enabled MySQL Cluster server, set the SSL flag to Yes.
Verify CA	If the eG agent is required to establish an encrypted connection with the target MySQL Cluster server and to authenticate the server's identity by verifying the server CA certificate, set the Verify CA flag to Yes. By default, this flag is set to No.
Truststore Password	This parameter is applicable only if the Verify CA parameter is set to Yes. To verify the target server certificate, provide the password of the truststore file which contains the server CA certificate in the Truststore Password text box. By default, this parameter is set to none.
Confirm Password	Confirm the Truststore Password by retyping it in the Confirm Password text box.
Keystore Password	This parameter is applicable only if the Verify CA parameter is set to Yes. To establish a connection with the target MySQL Cluster server, the eG agent needs access to the client certificate. For this, provide the password of the keystore file that contains the client certificate in the Keystore Password text box. By default, this parameter is set to none.
Confirm Password	Confirm the Keystore Password by retyping it in the Confirm Password text box.
Include Available Nodes	In the Include Available Nodes text box, provide a comma-separated list of all the available nodes to be included for monitoring. This way, the test monitors and collects metrics from all the available nodes in the cluster. By default, this parameter is set to none. Each entry takes the format HOSTNAME:PORT, for example: 172.16.8.136:3306,172.16.8.139:3306
Detailed Diagnosis	To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option. The option to selectively enable/disable the detailed diagnosis capability will be available only if both of the following conditions are fulfilled: (1) the eG manager license allows the detailed diagnosis capability; (2) neither the normal nor the abnormal frequency configured for the detailed diagnosis measures is 0.
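The HOSTNAME:PORT format described for the Include Available Nodes parameter can be checked before it is entered. The following Python sketch shows one way to validate such a list; the function parse_node_list is hypothetical and is not part of eG Enterprise, and treating the default value none as an empty list is an assumption.

```python
def parse_node_list(value):
    """Parse 'HOST:PORT,HOST:PORT' into a list of (host, port) tuples.

    Assumption: the default value 'none' (or an empty string) means that
    no nodes were listed, so an empty list is returned.
    """
    if value is None or value.strip().lower() in ("", "none"):
        return []
    nodes = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing comma
        # Split on the last colon so the host part is kept whole.
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected HOSTNAME:PORT, got {entry!r}")
        port_number = int(port)
        if not 1 <= port_number <= 65535:
            raise ValueError(f"port out of range in {entry!r}")
        nodes.append((host, port_number))
    return nodes


# Example using the value shown in the parameter description.
print(parse_node_list("172.16.8.136:3306,172.16.8.139:3306"))
```

An entry without a port, such as 172.16.8.136, raises a ValueError instead of being silently accepted.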

Measurements made by the test
Measurement	Description	Measurement Unit	Interpretation
Number of transactions	Indicates the number of transactions that occurred on this node during the last measurement period.	Number	The detailed diagnosis of this measure reveals the start time and end time of the transactions on this node.
Maximum replication lag time	Indicates the time difference in execution between the primary/master node and the standby/slave node, i.e., the time by which the replica's state lags behind the state of the primary instance.	Seconds	This measure is applicable only to the standby/slave node. Ideally, the value of this measure should be 0. High replication lag can be caused by bad queries being replicated (for example, queries on tables that lack primary keys or have poor indexes), poor network hardware or a malfunctioning network card, replicas located in distant regions or zones, or processes such as physical backups that delay the MySQL database in applying the current replicated transaction.
Number of transaction errors	Indicates the number of transactions that experienced errors on this node during the last measurement period.	Number	A high value is a cause for concern, as too many error-prone transactions can significantly damage the user experience. Use the detailed diagnosis of this measure to find out the Channel name, Worker ID, Thread id, Service state, Last applied transaction, Last error time, Last Error number, Last Error message and Last transient error message.
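The detailed diagnosis fields listed for the Number of transaction errors measure resemble columns of the MySQL performance_schema.replication_applier_status_by_worker table. This document does not state where the test obtains its data, so the Python sketch below is only an illustration of how such rows could be summarized. The function summarize_worker_errors and the sample rows are hypothetical, and the rows are assumed to be dictionaries keyed by column name.

```python
def summarize_worker_errors(rows):
    """Count applier workers that report an error, grouped by channel.

    rows: iterable of dicts keyed by column name, for example as fetched
    from performance_schema.replication_applier_status_by_worker
    (assumption: this is not confirmed as the test's data source).
    """
    # A worker with no error reports LAST_ERROR_NUMBER = 0.
    failed = [r for r in rows if int(r.get("LAST_ERROR_NUMBER") or 0) != 0]
    return {
        "error_count": len(failed),
        "channels": sorted({r["CHANNEL_NAME"] for r in failed}),
    }


# Hypothetical sample rows: one healthy worker and one that hit error 1062.
sample_rows = [
    {"CHANNEL_NAME": "ch1", "WORKER_ID": 1, "SERVICE_STATE": "ON",
     "LAST_ERROR_NUMBER": 0, "LAST_ERROR_MESSAGE": ""},
    {"CHANNEL_NAME": "ch1", "WORKER_ID": 2, "SERVICE_STATE": "OFF",
     "LAST_ERROR_NUMBER": 1062, "LAST_ERROR_MESSAGE": "Duplicate entry"},
]
print(summarize_worker_errors(sample_rows))
```

A non-zero error count on a replica is worth correlating with the Maximum replication lag time measure, since a stopped applier worker lets lag grow.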