MySQL Replication Slave Test

Replication enables data from one MySQL database server (the source) to be copied to one or more MySQL database servers (the replicas). Replication is asynchronous by default; replicas do not need to be connected permanently to receive updates from the source. Depending on the configuration, you can replicate all databases, selected databases, or even selected tables within a database.

MySQL replication capabilities are implemented using three main threads, one on the source server and two on the replica:

  • Binary log dump thread: The binary log dump thread acquires a lock on the source's binary log for reading each event that is to be sent to the replica. As soon as the event has been read, the lock is released, even before the event is sent to the replica.

  • Replication I/O thread: The replication I/O thread reads the updates that the source's Binlog Dump thread sends (see previous item) and copies them to local files that comprise the replica's relay log.

  • Replication SQL thread: The replica creates an SQL thread to read the relay log that is written by the replication I/O thread and execute the transactions contained in it.

There are three main threads for each source/replica connection. A source that has multiple replicas creates one binary log dump thread for each currently connected replica, and each replica has its own replication I/O and SQL threads.

A replica uses two threads to separate reading updates from the source and executing them into independent tasks. Thus, the task of reading transactions is not slowed down if the process of applying them is slow. For example, if the replica server has not been running for a while, its I/O thread can quickly fetch all the binary log contents from the source when the replica starts, even if the SQL thread lags far behind. If the replica stops before the SQL thread has executed all the fetched statements, the I/O thread has at least fetched everything so that a safe copy of the transactions is stored locally in the replica's relay logs, ready for execution the next time that the replica starts.

The success of any replication system rests on how quickly slaves reconnect with master when the link goes down, and how rapidly data synchronization occurs between the masters and the slaves. If slaves take too long to reconnect with the master after losing contact, the complete replication process may be sluggish. To avoid this, administrators should continuously monitor the steps in the replication process, proactively identify pain points, and promptly initiate measures to eliminate them, so that the data on the master and slave sides are in-sync at all times. This is where the MySQL  Replication Slave test helps!

This test auto-discovers the slaves/replica servers that are connected to the target database server and for each slave/replica server, this test reports the current status. Additionally, this test reports the status of the I/O thread executing on each slave. I/O errors and SQL thread errors noticed on the replica servers are also promptly captured and reported. The replication lag duration between the master and the replica servers helps administrators figure out the replica server that is currently out of date. Periodical monitoring of the heartbeats help administrators determine if a replication connection between the master and the replica server is active at all times. With the help of these metrics, administrators can quickly spot anomalies in the replication process and initiate measures to resolve them.

Target of the test : A MySQL server

Agent deploying the test : An internal/remote agent

Outputs of the test : One set of results for each slave/replica server connected to the target database being monitored

Configurable parameters for the test
Parameter Description

Test Period

How often should the test be executed.

Host

The IP address of the MySQL server.

Port

The port at which the specified host listens.

Database

Specify the name of a database on the target MySQL database server being monitored

Username and Password

The eG agent has to be configured with the credentials of a user who has server-wide Process and Select privileges on the monitored MySQL server. To know how to create such a user, refer to Pre-requisites for Monitoring the MySQL Server topic.

For this test to run, the user should additionally possess the following privileges. To configure the above-mentioned user with the specific privileges, issue the commands given below:

GRANT REPLICATION CLIENT on *.* to '<username>'@'<IP_address_of_eG_agent>';

GRANT REPLICATION SLAVE on *.* to '<username>'@'<IP_address_of_eG_agent>';

Confirm Password

Confirm the password by retyping it here.

Measurements made by the test
Measurement Description Measurement Unit Interpretation

Slave IO process

Indicates the current status of this replica server.

 

The values reported by this measure and its numeric equivalents are mentioned in the table below:

Measure Value Numeric Value
Waiting for master to send event 1
Checking master version 2
Connecting to master 3
Queueing master event to the relay log 4
Reconnecting after a failed binlog dump request 5
Reconnecting after a failed master event read 6
Registering slave on master 7
Requesting binlog dump 8
Waiting for its turn to commit 9
Waiting for master update 10
Waiting for slave mutex on exit 11
Waiting for the slave SQL thread to free enough relay log space 12
Waiting to reconnect after a failed binlog dump request 13
Waiting to reconnect after a failed master event read 14

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the current status of a replica server. The graph of this measure however, represents the status of the replica server using the numeric equivalents only - 1 to 14.

Slave IO running

Indicates whether/not the I/O thread is running on this replica server. This thread reads the Master MySQL server's binary logs.

 

The values reported by this measure and its numeric equivalents are mentioned in the table below:

Measure Value Numeric Value
No 0
is Connecting 1
Yes 2

Note:

By default, this measure reports the Measure Values listed in the table above to indicate whether/not the I/O thread is running on the replica server. The graph of this measure however, is represented using the numeric equivalents only - 0 to 2.

Seconds behind master

Indicates the current replication lag of this replica server.

Seconds

The value of this measure depends on the state of the MySQL slave in general, and the states of MySQL slave SQL_THREAD and IO_THREAD in particular. While IO_THREAD connects with the master and reads the updates, SQL_THREAD applies these updates on the slave.

If the slave/replica server is in sync with the master, then the Retrieved_GTID_Set and the Executed_GTID_Set values will be the same. In such cases, the value of this measure be zero. If these values are different, then, the value of this measure will represent the difference between the timestamp of the latest transaction processed by the SQL Thread and the timestamp of the same transaction when it was processed on the master.

Ideally, the value of this measure should be zero. A consistent ‘0’ value or a constant value indicates that the slave is executing at the same speed as the master. On the other hand, a sudden/gradual increase in the value of this measure indicates that the slave is performing slower than the master.

Compare the value of this measure across the replica servers to figure out the replica server that is lagging far behind the master in replication.

Is last IO error happened?

Indicates whether/not I/O errors were noticed on this replica server.

 

The values reported by this measure and its numeric equivalents are mentioned in the table below:

Measure Value Numeric Value
No 0
Yes 1

Note:

By default, this measure reports the Measure Values listed in the table above to indicate whether/not I/O errors were noticed on the replica server. The graph of this measure however, is represented using the numeric equivalents only i.e., 0 or 1.

Is last SQL error happened?

Indicates whether/not SQL thread errors were noticed on this replica server.

 

The values reported by this measure and its numeric equivalents are mentioned in the table below:

Measure Value Numeric Value
No 0
Yes 1

Note:

By default, this measure reports the Measure Values listed in the table above to indicate whether/not the SQL thread errors were noticed on the replica server. The graph of this measure however, is represented using the numeric equivalents only i.e., 0 or 1.

Slave received heartbeat

Indicates the total number of replication heartbeats received by this replica server from the master during the last measurement period.

Number

The replication heartbeat information lets you check if the replication connection is active even if the source has not sent events to the replica recently. The source/master sends a heartbeat signal to a replica if there are no updates to, and no unsent events in, the binary log for a longer period than the heartbeat interval.

Slave last heartbeat

Indicates the time elapsed since the last replication heartbeat received by this replica server.

Seconds

 

Slave retrieved heartbeat

Indicates the number of times that the SQL thread of this replica server has retried the transactions during the last measurement period.

Number