ApsaraDB for RDS Test

ApsaraDB for RDS is a stable, reliable, and scalable online database service. Based on Apsara Distributed File System and high-performance SSD storage of Alibaba Cloud, ApsaraDB for RDS supports the MySQL, SQL Server, PostgreSQL, PPAS (highly compatible with Oracle), and MariaDB database engines. It provides a portfolio of solutions for disaster recovery, backup, restoration, monitoring, and migration to facilitate database operations and maintenance.

The first step to using RDS is to create an RDS instance. An instance is a virtualized database server on which you can create and manage multiple databases. If a cloud user complains that they are unable to access their database on an RDS instance, administrators need to quickly figure out why. Is the instance hosting the database down? Is the instance rebooting? Is it being deleted? Or is it locked? Administrators also need to ensure that each instance is sized with adequate CPU, memory, network, and storage resources, so that no instance suffers performance degradation. If an instance does, administrators should be able to identify the resource-starved instances and right-size them before users notice any slowness. The ApsaraDB for RDS test helps with this and much more!

This test tracks the availability, operational state, and lock mode of every RDS instance, and alerts administrators to instances that are unavailable, currently in an abnormal state, or locked. Additionally, the test reports the CPU, memory, connection, disk space, and I/O capacity of each instance, and measures how every instance uses its allocated capacity. In the process, the test pinpoints which instance is hogging which resource. With the help of these diagnostics, administrators can proactively identify and promptly eliminate issues that hamper the performance of the database instances and the user experience they deliver.

Target of the test : An Alibaba Cloud Account

Agent deploying the test : A remote agent

Outputs of the test : One set of results for each RDS instance

Configurable parameters for the test
Parameters	Description
Test period	How often the test should be executed.
Host	The host for which the test is to be configured.
Alibaba Access Key and Alibaba Secret Key	This test makes REST API requests to Alibaba Cloud to pull the metrics. For this purpose, the test needs to be configured with an AccessKey pair. An AccessKey pair is typically used to call an operation of an Alibaba Cloud service, either by initiating an API request directly or by using a cloud service SDK to manage cloud resources. An AccessKey pair consists of an AccessKey ID and an AccessKey Secret. The AccessKey ID identifies a user/cloud account, and the AccessKey Secret is used to verify that user/cloud account.

The first step to configuring the eG agent with an AccessKey pair is to create an AccessKey pair for the target cloud account. To achieve this, follow the steps below:

1. Log on to the RAM console by using an Alibaba Cloud account.
2. In the left-side navigation pane, click Users under Identities.
3. On the Users page, in the User Logon Name/Display Name column, click the username of the RAM user for which you want to create an AccessKey pair.
4. On the page that appears, click Create AccessKey in the User AccessKeys section.
   Note: You must enter a verification code if you create an AccessKey pair for the first time.
5. Make note of the AccessKey ID and AccessKey Secret once they are displayed, and then click Close.
   Note: The AccessKey Secret is displayed only when you create an AccessKey pair. If the AccessKey pair is leaked or lost, you must create a new one. You can create a maximum of two AccessKey pairs.
6. Configure the Alibaba Access Key parameter of the test with the AccessKey ID, and the Alibaba Secret Key parameter with the AccessKey Secret you made note of.

If you failed to make note of the AccessKey ID and AccessKey Secret at the time of creating the AccessKey pair, you can obtain them later. Similarly, if an AccessKey pair already exists for the target cloud account, you do not have to create another one. Instead, you can obtain the AccessKey ID and AccessKey Secret of the existing AccessKey pair and configure the eG agent with them. For this, follow the steps below:

1. Use an Alibaba Cloud account to log on to the Alibaba Cloud Management console.
2. Move the pointer over the profile picture in the upper-right corner, and click AccessKey.
3. In the Security Tips message that appears, click Continue to manage AccessKey. The AccessKey ID and AccessKey Secret are displayed.
4. Make note of the displayed ID and secret.
5. Configure the Alibaba Access Key parameter of the test with the AccessKey ID, and the Alibaba Secret Key parameter with the AccessKey Secret you made note of.
Detailed Diagnosis	To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, click the Off option. The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled: the eG manager license allows the detailed diagnosis capability, and both the normal and abnormal frequencies configured for the detailed diagnosis measures are non-zero.
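The Alibaba Access Key and Alibaba Secret Key parameters are needed because Alibaba Cloud's RPC-style APIs, such as the RDS DescribeDBInstances operation (API version 2014-08-15), authenticate every request with a signature derived from the AccessKey pair. The sketch below illustrates that signing scheme (signature version 1.0, HMAC-SHA1) as we understand it from Alibaba Cloud's public API documentation. It is not the eG agent's implementation: the helper names (`percent_encode`, `build_describe_instances_params`, `sign_rpc_request`, `signed_url`) are hypothetical, the `rds.aliyuncs.com` endpoint is an assumption, and the code only builds a URL without sending anything. For production use, prefer an official Alibaba Cloud SDK.

```python
import base64
import hashlib
import hmac
import uuid
from datetime import datetime, timezone
from typing import Dict, Optional
from urllib.parse import quote

# Assumed public RDS endpoint for this sketch; regional endpoints also exist.
RDS_ENDPOINT = "https://rds.aliyuncs.com/"


def percent_encode(value: str) -> str:
    """RFC 3986 encoding: keep A-Z a-z 0-9 - _ . ~ and encode everything else.

    With safe="~", quote() also encodes "/", turns a space into %20
    and "*" into %2A, which is the form the signature expects.
    """
    return quote(str(value), safe="~")


def build_describe_instances_params(access_key_id: str, region_id: str,
                                    timestamp: Optional[str] = None,
                                    nonce: Optional[str] = None) -> Dict[str, str]:
    """Common RPC parameters plus the DescribeDBInstances action."""
    return {
        "Action": "DescribeDBInstances",
        "Version": "2014-08-15",
        "Format": "JSON",
        "RegionId": region_id,
        "AccessKeyId": access_key_id,
        "SignatureMethod": "HMAC-SHA1",
        "SignatureVersion": "1.0",
        # A unique nonce per request guards against replay.
        "SignatureNonce": nonce or uuid.uuid4().hex,
        # ISO 8601 timestamp in UTC.
        "Timestamp": timestamp or datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


def sign_rpc_request(params: Dict[str, str], access_key_secret: str,
                     http_method: str = "GET") -> str:
    """Return the Base64-encoded HMAC-SHA1 signature for an RPC-style request."""
    # 1. Sort the parameters by name and join the encoded key=value pairs.
    canonical_query = "&".join(
        f"{percent_encode(k)}={percent_encode(v)}" for k, v in sorted(params.items())
    )
    # 2. StringToSign = METHOD & encode("/") & encode(canonical query).
    string_to_sign = "&".join(
        [http_method, percent_encode("/"), percent_encode(canonical_query)]
    )
    # 3. The HMAC key is the AccessKey Secret followed by "&".
    digest = hmac.new((access_key_secret + "&").encode("utf-8"),
                      string_to_sign.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def signed_url(params: Dict[str, str], access_key_secret: str) -> str:
    """Attach the Signature parameter and build the request URL (nothing is sent)."""
    signed = dict(params, Signature=sign_rpc_request(params, access_key_secret))
    query = "&".join(f"{percent_encode(k)}={percent_encode(v)}" for k, v in signed.items())
    return RDS_ENDPOINT + "?" + query
```

Because the parameters are sorted before signing, the signature does not depend on the order in which they were added. The AccessKey Secret itself never appears in the URL; only the AccessKey ID and the signature do.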

Measurements made by the test

Measurement

Description

Measurement Unit

Interpretation

Instance status

Indicates the current status of this RDS instance.

The values that this measure can report and their corresponding numeric values are discussed in the table below:

Measure Value	Numeric Value
Creating	1
Running	2
DBInstanceClassChanging	3
Transing	4
EngineVersionUpgrading	5
TransingToOthers	6
GuardDBInstanceCreating	7
Expired and being recycled	8
Importing	9
ImportingFromOthers	10
DBInstanceNetTypeChanging	11
GuardSwitching	12
Ins_cloning	13
Rebooting	14
Deleting	15

The Measure Values discussed in the table are described in detail below:

Creating: The instance is being created.
Running: The instance is running.
DBInstanceClassChanging: The instance is being upgraded or downgraded.
Transing: The instance is being migrated.
EngineVersionUpgrading: The database engine version of the instance is being upgraded.
TransingToOthers: The data of the instance is being migrated to another instance.
GuardDBInstanceCreating: A disaster recovery instance is being created for the instance.
Expired and being recycled: The instance has expired and is being recycled.
Importing: Data is being imported into the instance.
ImportingFromOthers: Data is being imported into the instance from another instance.
DBInstanceNetTypeChanging: The network type of the instance is being changed.
GuardSwitching: The instance is undergoing a disaster-triggered failover.
Ins_cloning: The instance is being cloned.
Rebooting: The instance is restarting.
Deleting: The instance is being deleted.

Note:

This measure reports the Measure Values listed in the table above to indicate the current state of an RDS instance. In the graph of this measure, however, the state is indicated using the numeric equivalents only.

The detailed diagnosis of this measure reveals additional details of the RDS instance, such as its type, version, instance class, port number, connection address, network type, VPC, and the name of the zone to which it belongs.
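The relationship between the reported Instance status values and the numbers plotted in the graph can be sketched as a simple lookup table. This is an illustration only: `INSTANCE_STATUS_CODES`, `status_to_numeric`, and `needs_attention` are hypothetical names, the case-insensitive matching (so that `TRANSING` and `Transing` map to the same code) is a design choice of this sketch, and treating every state other than Running as "needs attention" is an assumption, not a rule the test documents.

```python
from typing import Dict

# Hypothetical lookup table that mirrors the Instance status table above.
INSTANCE_STATUS_CODES: Dict[str, int] = {
    "Creating": 1,
    "Running": 2,
    "DBInstanceClassChanging": 3,
    "Transing": 4,
    "EngineVersionUpgrading": 5,
    "TransingToOthers": 6,
    "GuardDBInstanceCreating": 7,
    "Expired and being recycled": 8,
    "Importing": 9,
    "ImportingFromOthers": 10,
    "DBInstanceNetTypeChanging": 11,
    "GuardSwitching": 12,
    "Ins_cloning": 13,
    "Rebooting": 14,
    "Deleting": 15,
}

# Case-insensitive view, so "TRANSING" and "Transing" resolve to the same code.
_CODES_BY_LOWER = {name.lower(): code for name, code in INSTANCE_STATUS_CODES.items()}


def status_to_numeric(status: str) -> int:
    """Return the numeric value that the graph uses for a reported status."""
    try:
        return _CODES_BY_LOWER[status.strip().lower()]
    except KeyError:
        raise ValueError(f"unrecognized instance status: {status!r}") from None


def needs_attention(status: str) -> bool:
    """Assumption: any state other than Running is worth a closer look."""
    return status_to_numeric(status) != INSTANCE_STATUS_CODES["Running"]
```

For example, `status_to_numeric("Rebooting")` returns 14, and `needs_attention("Rebooting")` returns True.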

Instance type

Indicates the type of this instance.

The values that this measure can report and their corresponding numeric values are discussed in the table below:

Measure Value	Numeric Value	Description
Primary	1	The primary instance role
Readonly	2	The read-only instance role
Guard	3	The disaster recovery instance role
Temp	4	The temporary instance role

Note:

This measure reports the Measure Values listed in the table above to indicate the role assigned to the RDS instance. In the graph of this measure, however, the role is indicated using the numeric equivalents only.

Instance class type

Indicates the instance family/class to which this instance belongs.

The values that this measure can report and their corresponding numeric values are discussed in the table below:

Measure Value	Numeric Value
Shared DB	1
General instance	2
Dedicated instance	3
Dedicated host	4

The Measure Values discussed in the table are described in detail below:

Measure Value	Description
Shared DB	A shared instance exclusively occupies the allocated memory resources, but shares CPU and storage resources with the other shared instances that are deployed on the same physical host. CPU resources are highly reused among shared instances that are deployed on the same physical host. This maximizes cost-effectiveness. Shared instances may compete for resources.
General instance	A general-purpose instance exclusively occupies the allocated memory resources, but shares CPU and storage resources with the other general-purpose instances that are deployed on the same physical host. CPU resources are moderately reused among general-purpose instances that are deployed on the same physical host. This increases cost-effectiveness. The storage capacity of a general-purpose instance is independent of the number of CPU cores and memory capacity. You can flexibly configure the storage capacity based on your business requirements.
Dedicated instance	A dedicated instance exclusively occupies the allocated CPU and memory resources. Its performance remains stable and is not affected by the other instances that are deployed on the same physical host.
Dedicated host	The dedicated host is the top configuration of the dedicated instance family. A dedicated host instance occupies all the resources of the physical host on which it is deployed.

Note:

This measure reports the Measure Values listed in the table above to indicate the instance family. In the graph of this measure, however, the instance family is indicated using the numeric equivalents only.

Lock mode

Indicates the lock mode of this instance.

The values that this measure can report and their corresponding numeric values are discussed in the table below:

Measure Value	Numeric Value
Unlock	1
Manual lock	2
Lock by expiration	3
Lock by restoration	4
Lock by disk quota	5

The Measure Values discussed in the table are described in detail below:

Measure Value	Description
Unlock	The instance is not locked.
Manual lock	The instance has been manually locked.
Lock by expiration	The instance has been automatically locked upon expiration.
Lock by restoration	The instance has been automatically locked before a rollback.
Lock by disk quota	The instance has been automatically locked because the storage capacity is exhausted.

Note:

This measure reports the Measure Values listed in the table above to indicate the lock mode of an instance. In the graph of this measure, however, the lock mode is indicated using the numeric equivalents only.
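Alerting on locked instances amounts to picking out every instance whose lock mode is anything other than Unlock. The sketch below shows that idea. `LOCK_MODE_CODES` and `locked_instances` are hypothetical names, and the instance IDs in the example are made up.

```python
from typing import Dict

# Hypothetical lookup table that mirrors the Lock mode table above.
LOCK_MODE_CODES: Dict[str, int] = {
    "Unlock": 1,
    "Manual lock": 2,
    "Lock by expiration": 3,
    "Lock by restoration": 4,
    "Lock by disk quota": 5,
}


def locked_instances(lock_modes: Dict[str, str]) -> Dict[str, int]:
    """Map each locked instance ID to the numeric value of its lock mode.

    Instances in the Unlock mode are left out; an unknown mode raises KeyError.
    """
    return {
        instance_id: LOCK_MODE_CODES[mode]
        for instance_id, mode in lock_modes.items()
        if LOCK_MODE_CODES[mode] != LOCK_MODE_CODES["Unlock"]
    }
```

For instance, `locked_instances({"rm-a": "Unlock", "rm-b": "Lock by disk quota"})` returns `{"rm-b": 5}`. An instance locked by disk quota is one whose storage is exhausted, so its Percent usage measure is worth checking next.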

Connection mode

Indicates the access mode of this instance.

The values that this measure can report and their corresponding numeric values are discussed in the table below:

Measure Value	Numeric Value
Standard	1
High security	2

Note:

This measure reports the Measure Values listed in the table above to indicate the connection mode of an instance. In the graph of this measure, however, the connection mode is indicated using the numeric equivalents only.

Total memory

Indicates the memory configuration of this instance.

Total capacity

Indicates the total storage capacity of this instance.

Maximum database can be created

Indicates the maximum number of databases that can be created on this instance.

Number

Maximum account can be created

Indicates the maximum number of accounts that can be created on this instance.

Number

Availability

Indicates whether or not this instance is currently available.

Percent

While the value 100 indicates that the instance is available, the value 0 denotes that the instance is unavailable.

Maximum I/O requests

Indicates the maximum number of I/O requests this instance can process per second.

Number

Maximum concurrent connections

Indicates the maximum number of concurrent connections this instance can handle.

Number

Total CPU

Indicates the total number of CPU cores allocated to this instance.

Number

Used space

Indicates the amount of disk space that this instance is currently utilizing.

Ideally, the value of this measure should be much lower than the value of the Total capacity measure. If this measure value is close to, or rapidly approaching, the value of the Total capacity measure, the instance is fast running out of storage space, which can degrade its performance. To prevent a storage crunch, you may want to configure the instance with additional storage space. Alternatively, you can compare the values of the Space occupied by data files, Space occupied by log files, Space occupied by backups, Space occupied by SQL data, and Cold backup size measures to understand what type of data is consuming storage space. You can then see if data of any of these types can be deleted, so as to make more storage space available for critical data.
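Whether Used space is "rapidly approaching" Total capacity can be judged from two successive samples. The sketch below computes the usage percentage and a rough time-to-full estimate. It assumes both measures are in the same unit and that growth is linear between samples, which real workloads rarely are; `percent_used` and `hours_until_full` are hypothetical helpers, not part of the test.

```python
from typing import Optional


def percent_used(used: float, total: float) -> float:
    """Used space as a percentage of Total capacity (both in the same unit)."""
    if total <= 0:
        raise ValueError("total capacity must be positive")
    return 100.0 * used / total


def hours_until_full(prev_used: float, curr_used: float, total: float,
                     interval_hours: float) -> Optional[float]:
    """Linear projection of when Used space will reach Total capacity.

    Returns None when usage is flat or shrinking between the two samples.
    """
    growth_per_hour = (curr_used - prev_used) / interval_hours
    if growth_per_hour <= 0:
        return None
    return max(total - curr_used, 0.0) / growth_per_hour
```

For example, an instance that grew from 30 to 40 units over 10 hours on a 50-unit volume is 80% full and, at that pace, would fill up in about 10 more hours.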

Space occupied by data files

Indicates the amount of storage space of this instance that is occupied by data files.

If the Percent usage measure of an instance is close to 100%, you can compare the value of this measure with the values of the Space occupied by log files, Space occupied by backups, Space occupied by SQL data, and Cold backup size measures for that instance to find out which type of file is contributing to the storage crunch: data files, log files, backup files, SQL data files, or cold backup files.
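The comparison described above boils down to ranking the space measures of one instance from largest to smallest. A minimal sketch follows. `rank_space_consumers` is a hypothetical helper, and the shares it reports are relative to the sum of the measures passed in, not to Total capacity.

```python
from typing import Dict, List, Tuple


def rank_space_consumers(breakdown: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (measure, share of the listed space in %) pairs, largest first."""
    listed_total = sum(breakdown.values())
    ranked = sorted(breakdown.items(), key=lambda item: item[1], reverse=True)
    return [
        (name, round(100.0 * size / listed_total, 1) if listed_total else 0.0)
        for name, size in ranked
    ]
```

Passing in the values of the Space occupied by data files, Space occupied by log files, Space occupied by backups, Space occupied by SQL data, and Cold backup size measures puts the biggest consumer at the head of the list, which is usually the first place to look for deletable data.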

Space occupied by log files

Indicates the amount of storage space of this instance that is occupied by log files.

Space occupied by backups

Indicates the amount of storage space of this instance that is occupied by backups.

Space occupied by SQL data

Indicates the amount of storage space of this instance that is occupied by SQL data.

Cold backup size

Indicates the amount of storage space of this instance that is occupied by cold backups.

I/O requests rate

Indicates the rate at which this instance processes I/O operations.

Operations/Sec

If the value of this measure is close to the value of the Maximum I/O requests measure for an instance, the I/O load on that instance is very high. To ensure that the instance does not reject or drop I/O requests, first make sure that it has sufficient resources (CPU, memory, storage space, and so on) to meet the demand, and then increase the limit on the number of I/O requests that the instance can process per second.
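The check described above compares I/O requests rate against Maximum I/O requests. The sketch below does this and also reports the spare capacity in operations per second. It assumes that IOPS utilization is the ratio of the two measures, and the 90% threshold is an illustrative value rather than a documented default. `iops_headroom` is a hypothetical helper.

```python
from typing import Tuple


def iops_headroom(io_rate: float, max_iops: float,
                  threshold_pct: float = 90.0) -> Tuple[float, float, bool]:
    """Compare I/O requests rate with Maximum I/O requests.

    Returns (utilization %, spare operations/sec, True if at or above threshold).
    """
    if max_iops <= 0:
        raise ValueError("maximum I/O requests must be positive")
    utilization = 100.0 * io_rate / max_iops
    return round(utilization, 1), max(max_iops - io_rate, 0.0), utilization >= threshold_pct
```

For example, an instance processing 4,500 operations/sec against a limit of 5,000 is at 90% with 500 operations/sec to spare, so it would be flagged.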

Average inbound traffic

Indicates the average amount of data traffic flowing into this instance.

Average outbound traffic

Indicates the average amount of data traffic flowing out of this instance.

Network throughput

Indicates the network throughput of this instance.

Compare the value of this measure across instances to identify the precise instance that is consuming bandwidth excessively.
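One way to compare Network throughput across instances is to flag the ones that sit well above the typical (median) value. In the sketch below, the factor of 2 is an arbitrary, illustrative cut-off, and `bandwidth_outliers` is a hypothetical helper; tune both to your own environment.

```python
from statistics import median
from typing import Dict, List


def bandwidth_outliers(throughput: Dict[str, float], factor: float = 2.0) -> List[str]:
    """Instances whose Network throughput is at least `factor` times the median.

    Results are sorted from the heaviest consumer down.
    """
    if not throughput:
        return []
    typical = median(throughput.values())
    flagged = [
        instance_id
        for instance_id, value in throughput.items()
        # With a zero median, any instance that moves data at all stands out.
        if (value >= factor * typical if typical > 0 else value > 0)
    ]
    return sorted(flagged, key=throughput.get, reverse=True)
```

Using the median rather than the mean keeps a single heavy instance from raising the baseline it is being compared against.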

Total current connections

Indicates the current number of connections to this instance.

Number

If the value of this measure is close to that of the Maximum concurrent connections measure for an instance, the instance may soon be unable to accept new connections. Under such circumstances, check whether there are any idle connections to the instance and terminate them, so that the instance can handle more connections. The count of idle connections is the difference between the value of the Total current connections measure and that of the Total currently active connections measure. Alternatively, you can increase the concurrent connection limit of the instance.
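The idle-connection arithmetic above can be combined with a check against the connection limit. In this sketch, idle connections are computed exactly as described (current minus active), clamped at zero in case the two measures were sampled at slightly different moments. Treating utilization as current connections divided by Maximum concurrent connections is an assumption, as is the 90% threshold; `connection_summary` is a hypothetical helper.

```python
from typing import Dict, Union


def connection_summary(current: int, active: int, max_connections: int,
                       threshold_pct: float = 90.0) -> Dict[str, Union[int, float, bool]]:
    """Summarize idle connections and how close an instance is to its limit."""
    if max_connections <= 0:
        raise ValueError("maximum concurrent connections must be positive")
    # Idle = Total current connections - Total currently active connections.
    idle = max(current - active, 0)
    # Assumption: utilization is current connections as a share of the limit.
    utilization = 100.0 * current / max_connections
    return {
        "idle": idle,
        "utilization_pct": round(utilization, 1),
        "near_limit": utilization >= threshold_pct,
    }
```

For example, 1,900 current connections of which 400 are active, against a limit of 2,000, gives 1,500 idle connections at 95% utilization: terminating idle sessions would free far more capacity than raising the limit.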

Total currently active connections

Indicates the count of connections to this instance that are currently active.

Number

Ideally, the value of this measure should be equal to that of the Total current connections measure. If it is much lower, many connections to the instance are idle. By identifying and removing such connections, you can increase the connection handling capacity of the instance.

Used memory

Indicates the amount of memory currently used by this instance.

Ideally, the value of this measure should be much lower than that of the Total memory measure.

Free memory

Indicates the amount of memory that this instance is not using currently.

For best performance, the value of this measure should be high.

Free space

Indicates the amount of storage space that this instance is not using currently.

For best performance, the value of this measure should be high.

CPU utilization

Indicates the percentage of allocated CPU resources that are used by this instance.

Percent

A value close to 100% is a cause for concern, as it denotes that the instance is hogging CPU resources. If the instance is a shared or general-purpose instance, excessive CPU utilization by that instance can cause the other instances on the same physical host to contend for the remaining CPU resources. In this case, you may want to increase the CPU capacity of the host.
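Since only shared and general-purpose instances share CPU with their neighbours, the contention risk described above can be found by combining the Instance class type and CPU utilization measures. The sketch below illustrates that filter. The `InstanceCpu` record, the `contention_risks` helper, and the 90% threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

# Instance class types whose CPU is shared with other instances on the same host.
SHARED_CPU_CLASS_TYPES = {"Shared DB", "General instance"}


@dataclass
class InstanceCpu:
    instance_id: str
    class_type: str         # value of the Instance class type measure
    cpu_utilization: float  # value of the CPU utilization measure, in percent


def contention_risks(instances: List[InstanceCpu], threshold_pct: float = 90.0) -> List[str]:
    """IDs of shared/general-purpose instances whose CPU utilization is near 100%."""
    return [
        inst.instance_id
        for inst in instances
        if inst.class_type in SHARED_CPU_CLASS_TYPES and inst.cpu_utilization >= threshold_pct
    ]
```

A dedicated instance running hot is still worth investigating, but it is left out here because, as described under Instance class type, its performance does not affect the other instances on the same host.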

Memory utilization

Indicates the percentage of allocated memory resources that are used by this instance.

Percent

A value close to 100% is a cause for concern, as it denotes that the instance is rapidly running out of memory. Without enough memory, the instance may fail to service user requests to it. To avoid this, make sure you size the instance with sufficient memory resources.

Percent usage

Indicates the percentage of allocated disk space that is used by this instance.

Percent

If the Percent usage measure of an instance is close to 100%, you can compare the values of the Space occupied by data files, Space occupied by log files, Space occupied by backups, Space occupied by SQL data, and Cold backup size measures for that instance to find out which type of file is contributing to the storage crunch: data files, log files, backup files, SQL data files, or cold backup files.

IOPS utilization

Indicates the percentage of I/O resources that are used by this instance.

Percent

A value close to 100% is a cause for concern, as it denotes that the instance is rapidly approaching the I/O request limit configured for it. To ensure that the instance services I/O requests to it without rejecting them, you may want to consider increasing the maximum number of I/O requests that instance can handle.

Connection utilization

Indicates the percentage of the allowed concurrent connections that are currently used by this instance.

Percent

A value close to 100% is a cause for concern, as it implies that the instance may soon be unable to accept new connections. Under such circumstances, check whether there are any idle connections to the instance and terminate them, so that the instance can handle more connections. The count of idle connections is the difference between the value of the Total current connections measure and that of the Total currently active connections measure. Alternatively, you can increase the concurrent connection limit of the instance.