ApsaraDB MySQL Engine Monitoring - RDS Test

ApsaraDB for RDS is a stable, reliable, and scalable online database service. It is designed based on the Apsara Distributed File System and high-performance SSD storage media of Alibaba Cloud. It supports five database engines: MySQL, SQL Server, PostgreSQL, PPAS (compatible with Oracle), and MariaDB.

ApsaraDB RDS for MySQL is developed based on a branch of the MySQL source code. It provides basic features, such as instance management, account management, database management, backup and restoration, control access, Transparent Data Encryption (TDE), and data migration. ApsaraDB RDS for MySQL also provides advanced features and functions such as ApsaraDB for MyBase dedicated cluster, Read-only RDS instances to which read requests can be offloaded in times of an overload, Read/write splitting, a dedicated proxy, and the Database Autonomy Service that supports intelligent diagnostics and optimization at the instance level.

At the architecture level, ApsaraDB RDS for MySQL supports the InnoDB storage engine of MySQL. InnoDB is a general-purpose storage engine that balances high reliability and high performance. Quick query processing is a quality that any user would expect from a storage engine that promises high performance levels. To ensure that users stay loyal to the MySQL engine therefore, administrators must first measure the query processing ability of the engine, proactively capture slowness (if any), and accurately identify the type of SQL queries/statements (select, insert, update, replace, delete etc.) that are the slowest. If SELECT statements - i.e., read requests - are being processed slowly, then the administrators should try to fathom the reason for this slowness and fix it!

Poor buffer pool usage is one of the common causes for read requests to be processed slowly. The buffer pool is an area in main memory where InnoDB caches table and index data as it is accessed. The buffer pool permits frequently used data to be accessed directly from memory, which speeds up processing. If the buffer pool is inadequately sized, it will not be able to hold many frequently accessed objects in memory, owing to which, it will be unable to service read requests promptly. This can result in a spike in direct disk accesses; these are resource-intensive operations that can significantly slowdown query processing. To avoid this, administrators need to monitor buffer usage continuously, proactively detect sub-optimal usage, and promptly initiate measures to improve buffer usage and reduce disk accesses.

Besides query performance, the resilience of the storage engine to failures should also be checked from time-to-time. This is why, it is important that administrators monitor reads and writes to redo logs on a regular basis.

To perform all of the key monitoring tasks discussed above, administrators can periodically run the ApsaraDB MySQL Engine Monitoring - RDS Test. For every RDS instance for MySQL, this test monitors the speed with which the InnoDB storage engine executes queries, and in the process, reveals slowness (if any) in query processing. Moreover, by monitoring the SQL statements executed by the engine on each instance, the test reveals the level of SQL activity on the engine and how quickly the engine processes these statements. The types of SQL queries/statements that are being processed slowly can be pinpointed in the process. Also, the test periodically monitors the buffer pool and alerts administrators if a majority of read requests are serviced by direct disk accesses and not by the buffer pool. This way, the test enables administrators to determine whether poor buffer pool usage is the reason why read queries are processed slowly by an instance. Furthermore, reads and writes to redo logs per instance are also monitored, and abnormalities (if any) - eg., sluggishness in reading from or writing to logs, forced writes etc. - are promptly brought to the attention of administrators.

Target of the test : An Alibaba Cloud Account

Agent deploying the test : A remote agent

Outputs of the test : One set of results for every RDS instance for MySQL

Configurable parameters for the test
Parameters	Description
Test period	How often should the test be executed
Host	The host for which the test is to be configured.
Alibaba Access Key and Alibaba Secret Key	This test makes REST API requests to the Alibaba cloud to pull the metrics. For this purpose, the test needs to be configured with an AccessKey pair. An AccessKey pair is typically used to call an operation of an Alibaba Cloud service. It is also used to initiate an API request or use a cloud service SDK to manager cloud resources. An AccessKey pair is characterized by an AccessKey ID and an AccessKey Secret. The AccessKey ID is used to identify a user/cloud account. The AccessKey Secret is used to verify a user/cloud account. The first step to configuring the eG agent with an AccessKey pair is to create an AccessKey pair for the target cloud acount. To achieve this, follow the steps below: Log on to the RAM console by using an Alibaba Cloud account. In the left-side navigation pane, click Users under Identities. On the Users page, click the username of the RAM user for which you want to create an AccessKey pair in the User Logon Name/Display Name column. On the page that appears, click Create AccessKey in the User AccessKeys section. Note: You must enter a verification code if you create an AccessKey pair for the first time. Click Close. Note: The AccessKey secret is displayed only when you create an AccessKey pair. If the AccessKey pair is leaked or lost, you must create a new one. You can create a maximum of two AccessKey pairs. Make note of the AccessKey ID and AccessKey secret, once they are displayed. Then, configure the Alibaba Access Key parameter of the test with the AccessKey ID, and the Alibaba Secret Key parameter with the AccessKey Secret you made note of. If you failed to make note of the AccessKey ID and AccessKey Secret at the time of creating the AccessKey pair, then you can obtain the same at a later point in time. Similarly, if an AccessKey pair pre-exists for the target cloud account, then you do not have to create another one. Instead, you can obtain the AccessKey ID and AccessKey Secret of the existing AccessKey pair and configure the eG agent with the same. For this, follow the steps below: Use an Alibaba Cloud account to log on to the Alibaba Cloud Management console. Move the pointer over the profile picture in the upper-right corner, and click AccessKey. In the Security Tips message that appears, click Continue to manage AccessKey. AccessKey ID and AccessKey Secret are displayed. Make note of the displayed ID and secret. Then, configure the Alibaba Access Key parameter of the test with the AccessKey ID, and the Alibaba Secret Key parameter with the AccessKey Secret you made note of.
Detailed Diagnosis	To make diagnosis more efficient and accurate, the eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, click on the Off option. The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled: The eG manager license should allow the detailed diagnosis capability Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.

Measurements made by the test
Measurement	Description	Measurement Unit	Interpretation
Average query rate	Indicates the rate at which this instance processes queries.	Queries/Sec	A high value is desired for this measure. A low value signifies slowness in query processing. Compare the value of this measure across RDS instances to know which instance is processing queries slowly.
Average transactions rate	Indicates the rate at which transactions are processed by this instance.	Transactions/Sec	A high value is desired for this measure. A low value signifies slowness in transaction processing. Compare the value of this measure across RDS instances to know which instance is processing transactions slowly.
Read hit rate of InnoDB buffer pool	Indicates the rate at which this instance read from the InnoDB buffer pool .	Reads/Sec
Utilization rate of InnoDB buffer pool	Indicates the percentage of read requests to this instance that were serviced by the buffer pool.	Percent	Ideally, the value of this measure should be over 80%. A low value is indicative of poor cache usage. This in turn can cause queries to be served by directly accessing the disk, thereby degrading query performance and overall database health. To avoid this, the buffer pool should be well-sized, so that it can store many objects in memory. This will reduce the likelihood of cache misses, thereby improving query performance significantly.
Dirty data blocks in InnoDB buffer pool	Indicates the percentage of dirty data blocks in the buffer pool of this instance.	Percent	InnoDB performs certain tasks in the background, including flushing of dirty pages from the buffer pool. Dirty pages are those that have been modified but are not yet written to the data files on disk. Ideally, very few dirty data blocks should only remain in the buffer pool at a given point in time. This is because, if changes are not written to the data files in disk at a brisk pace, the risk of data loss becomes higher if the database instance crashes suddenly. Buffer pool flushing is initiated when the percentage of dirty pages reaches the low water mark value defined by the innodb_max_dirty_pages_pct_lwm variable. The default low water mark is 10% of buffer pool pages. A innodb_max_dirty_pages_pct_lwm value of 0 disables this early flushing behaviour. The purpose of the innodb_max_dirty_pages_pct_lwm threshold is to control the percentage dirty pages in the buffer pool and to prevent the amount of dirty pages from reaching the threshold defined by the innodb_max_dirty_pages_pct variable, which has a default value of 90. InnoDB aggressively flushes buffer pool pages if the percentage of dirty pages in the buffer pool reaches the innodb_max_dirty_pages_pct threshold. By tuning these variables, you can make sure that dirty page flushing happens frequently, thereby ensuring data integrity.
InnoDB average volume of data reads rate	Indicates the rate at which data this instance reads data.	KB/Sec
InnoDB average volume of data write rate	Indicates the rate at which this instance writes data.	KB/Sec
Average read data rate from InnoDB buffer pool	Indicates the rate at which this instance reads data from the buffer pool.	Reads/Sec	A high value for this measure is indicative of a healthy buffer pool.
Average write data rate to the InnoDB buffer pool	Indicates the rate at which this instance writes data to the buffer pool.	Writes/Sec	Changes should be written quickly to the in-memory buffer, so that the buffer pool serves read requests using updated data, and not obsolete data. So, the value of this measure should be high.
Average log writes requests rate	Indicates the rate at which write requests to the redo log files on disk are processed by this instance.	Requests/Sec
Average physical writes rate on log files	Indicates the rate of writes performed by this instance on the redo log files on disk.	Writes/Sec
Average fsync writes completed on log files	Indicates the rate at which this instance wrote to the log files on disk by calling fsync.	Writes/Sec	Using fsync, you can force redo logs to be written to disk. An fsync is required to keep changes to your data safe, so it’s not in an in memory buffer when power is lost. This process should ideally be fast. That is why, a high value is desired for this measure.
Temporary tables created	Indicates the number of temporary tables automatically created in the hard drive during MySQL statement execution.	Number	In MySQL, a temporary table is a special type of table that allows you to store a temporary result set, which you can reuse several times in a single session. A temporary table is very handy when it is impossible or expensive to query data that requires a single SELECT statement with the JOIN clauses. In this case, you can use a temporary table to store the immediate result and use another query to process it.
Average of delete statement execution rate	Indicates the rate at which DELETE statements were executed by this instance.	Executions/Sec	If the value of the Average query rate measure increases consistently for an instance, then compare the value of this measure with that of the Average of insert statement execution rate, Average of insert select statement execution rate, Average of replace statement execution rate, Average of replace select statement execution rate, Average of select statement execution rate, and Average of update statement execution rate measures of that instance to know what type of queries/statements are slowing down query processing.
Average of insert statement execution rate	Indicates the rate at which INSERT statements were executed by this instance.	Executions/Sec	If the value of the Average query rate measure increases consistently for an instance, then compare the value of this measure with that of the Average of delete statement execution rate, Average of insert select statement execution rate, Average of replace statement execution rate, Average of replace select statement execution rate, Average of select statement execution rate, and Average of update statement execution rate measures of that instance to know what type of queries/statements are slowing down query processing.
Average of insert select statement execution rate	Indicates the rate at which INSERT_SELECT statements were executed by this instance.	Executions/Sec	If the value of the Average query rate measure increases consistently for an instance, then compare the value of this measure with that of the Average of delete statement execution rate, Average of insert statement execution rate, Average of replace statement execution rate, Average of replace select statement execution rate, Average of select statement execution rate, and Average of update statement execution rate measures of that instance to know what type of queries/statements are slowing down query processing.
Average of replace statement execution rate	Indicates the rate at which REPLACE statements were executed by this instance.	Executions/Sec	If the value of the Average query rate measure increases consistently for an instance, then compare the value of this measure with that of the Average of delete statement execution rate, Average of insert statement execution rate, Average of insert select statement execution rate, Average of replace select statement execution rate, Average of select statement execution rate, and Average of update statement execution rate measures of that instance to know what type of queries/statements are slowing down query processing.
Average of replace select statement execution rate	Indicates the rate at which REPLACE_SELECT statements were executed by this instance.	Executions/Sec	If the value of the Average query rate measure increases consistently for an instance, then compare the value of this measure with that of the Average of delete statement execution rate, Average of insert statement execution rate, Average of insert select statement execution rate, Average of replace statement execution rate, Average of select statement execution rate, and Average of update statement execution rate measures of that instance to know what type of queries/statements are slowing down query processing.
Average of select statement execution rate	Indicates the rate at which SELECT statements were executed by this instance.	Executions/Sec	If the value of the Average query rate measure increases consistently for an instance, then compare the value of this measure with that of the Average of delete statement execution rate, Average of insert statement execution rate, Average of insert select statement execution rate, Average of replace statement execution rate, Average of replace select statement execution rate, and Average of update statement execution rate measures of that instance to know what type of queries/statements are slowing down query processing.
Average of update statement execution rate	Indicates the rate at which UPDATE statements were executed by this instance.	Executions/Sec	If the value of the Average query rate measure increases consistently for an instance, then compare the value of this measure with that of the Average of delete statement execution rate, Average of insert statement execution rate, Average of insert select statement execution rate, Average of replace statement execution rate, Average of replace select statement execution rate, and Average of select statement execution rate measures of that instance to know what type of queries/statements are slowing down query processing.
Average rows read rate from InnoDB table	Indicates the rate at which rows are read by this instance from the InnoDB tables.	Rows/Sec	Use these measures to understand whether/not any instance experiences slowness when working with InnoDB tables. If so, then these metrics will also point you to the exact type of operation that was the slowest - insert, delete, update, or select.
Average rows updated rate from InnoDB table	Indicates the rate at which rows in InnoDB tables are updated by this instance.	Rows/Sec
Average rows deleted rate from InnoDB table	Indicates the rate at which this instance deletes rows from InnoDB tables.	Rows/Sec
Average rows inserted rate into InnoDB table	Indicates the rate at which this instance inserts rows into InnoDB tables.	Rows/Sec
Average physical writes rate on log file	Indicates how many rows are written to log files in disk every second by this instance.	Rows/Sec	A high value is desired for this measure.