Relational Database Service - RDS Test

Amazon Relational Database Service (Amazon RDS) is a web service that makes it easier to set up, operate, and scale a relational database in the cloud. It provides cost-efficient, resizeable capacity for an industry-standard relational database and manages common database administration tasks. It also manages backups, software patching, automatic failure detection, and recovery.

The basic building block of Amazon RDS is the DB instance. A DB instance is an isolated database environment in the cloud. A DB instance can contain multiple user-created databases, and you can access it by using the same tools and applications that you use with a stand-alone database instance.

Each DB instance runs a DB engine. Amazon RDS currently supports the MySQL, MariaDB, PostgreSQL, Oracle, and Microsoft SQL Server DB engines.

The computation and memory capacity of a DB instance is determined by its DB instance class. For each DB instance, you can select from 5 GB to 6 TB of associated storage capacity. Each DB instance class has minimum and maximum storage requirements for the DB instances that are created from it. You can select the DB instance that best meets your needs. If your needs change over time, you can change DB instances. For example, initially, you may have launched a standard (previous generation) DB instance, which provides a balance of compute, memory, and network resources for your applications. However later, based on usage, you may have realized that a burst capable - current generation DB instance, with the capability to burst to full CPU usage, is ideal for your needs. In such circumstances, RDS facilitates the switch from one type DB instance to another. But to understand which DB instance class best suits your needs and make timely and accurate adjustments to your DB instance class selection, you will have to constantly track the CPU, memory, network, and space usage of each active DB instance on the cloud and derive usage patterns. Also, to ensure optimal storage performance, you additionally need to keep an eye on the I/O operations performed on the DB instances and identify latent DB instances. This is exactly what the Relational Database Service - RDS test enables you to achieve.

This test closely tracks the current status, resource usage, and I/O activity of every active DB instance on a monitored region, and brings the following to light:

  • Is any DB instance in an abnormal state presently?
  • How are the DB instances using the CPU resources they have been configured with? Is any DB instance consuming high levels of CPU consistently? Should the DB instance class be changed?
  • Does the DB instance have enough RAM? Will changing the DB instance class help in reducing the memory pressure on the instance?
  • Do any db.t2 instances have a poor CPU credit balance?
  • Is the disk I/O queue of any DB instance abnormally high? Which instance is this and when is I/O latency on that instance very high - when reading from or writing to the instance?
  • Which DB instance is hungry for network bandwidth?
  • Do all DB instances have enough free space? If not, which ones are rapidly running short of space?

Target of the test : Amazon Region

Agent deploying the test: A remote agent

Output of the test: One set of results for each active DB instance / DB instance class / DB engine name (depending upon the option you choose from the RDS Filter drop-down) in the monitored region

Configurable parameters for the test
Parameter Description

Test Period

How often should the test be executed.

Host

The host for which the test is to be configured.

AWS Access Key, AWS Secret Key, Confirm AWS Access Key, Confirm AWS Secret Key

To monitor an Amazon instance, the eG agent has to be configured with the access key and secret key of a user with a valid AWS account. For this purpose, we recommend that you create a special user on the AWS cloud, obtain the access and secret keys of this user, and configure this test with these keys. The procedure for this has been detailed in the Obtaining an Access key and Secret key topic. Make sure you reconfirm the access and secret keys you provide here by retyping it in the corresponding Confirm text boxes.

Proxy Host and Proxy Port

In some environments, all communication with the AWS cloud and its regions could be routed through a proxy server. In such environments, you should make sure that the eG agent connects to the cloud via the proxy server and collects metrics. To enable metrics collection via a proxy, specify the IP address of the proxy server and the port at which the server listens against the Proxy Host and Proxy Port parameters. By default, these parameters are set to none , indicating that the eG agent is not configured to communicate via a proxy, by default.

Proxy User Name, Proxy Password, and Confirm Password

If the proxy server requires authentication, then, specify a valid proxy user name and password in the proxy user name and proxy password parameters, respectively. Then, confirm the password by retyping it in the CONFIRM PASSWORD text box. By default, these parameters are set to none, indicating that the proxy sever does not require authentication by default.

Proxy Domain and Proxy Workstation

If a Windows NTLM proxy is to be configured for use, then additionally, you will have to configure the Windows domain name and the Windows workstation name required for the same against the proxy domain and proxy workstation parameters. If the environment does not support a Windows NTLM proxy, set these parameters to none.

RDS Filter Name

By default, this test reports metrics for each active DB instance on the cloud. This is why, this flag is set toDBInstanceIdentifier, by default. If needed, you can pick either of the following options from this drop-down:

  • DatabaseClass: The computation and memory capacity of a DB instance is determined by its DB instance class. If you select this option, then this test will report metrics for each DB instance class. In other words, eG will aggregate metrics for all databases that belong to a DB intance class, and will present these metrics at the macro class-level.
  • EngineName: Each DB instance runs a DB engine. Amazon RDS currently supports the MySQL, MariaDB, PostgreSQL, Oracle, and Microsoft SQL Server DB engines. Each DB engine has its own supported features, and each version of a DB engine may include specific features. If you select this option, then this test will report metrics for every DB engine. In this case, eG will aggregate metrics for all databases using a particular engine, and will present these metrics at the macro engine-level.
Measures reported by the test:

Measurement

Description

Measurement Unit

Interpretation

RDS instance status:

Indicates the current status of this DB instance.

 

This measure is reported only for a DB instance descriptor.

The values that this measure reports, the significance of each of these values, and the numeric values that correspond to them are discussed in the table below:

Measure Value Description Numeric Value
Failed The instance has failed and Amazon RDS was unable to recover it. Perform a point-in-time restore to the latest restorable time of the instance to recover the data. 0
Available The instance is healthy and available 1
Backing-up The instance is currently being backed up. 2
Creating The instance is being created. The instance is inaccessible while it is being created. 3
Inaccessible-encryption-credentials The KMS key used to encrypt or decrypt the DB instance could not be accessed. 4
Incompatible-credentials The supplied CloudHSM username or password is incorrect. Please update the CloudHSM credentials for the DB instance. 5
Incompatible-network Amazon RDS is attempting to perform a recovery action on an instance but is unable to do so because the VPC is in a state that is preventing the action from being completed. This status can occur if, for example, all available IP addresses in a subnet were in use and Amazon RDS was unable to get an IP address for the DB instance. 6
Incompatible-option-group Amazon RDS attempted to apply an option group change but was unable to do so, and Amazon RDS was unable to roll back to the previous option group state. Consult the Recent Events list for the DB instance for more information. This status can occur if, for example, the option group contains an option such as TDE and the DB instance does not contain encrypted information. 7
Incompatible-parameters Amazon RDS was unable to start up the DB instance because the parameters specified in the instance's DB parameter group were not compatible. Revert the parameter changes or make them compatible with the instance to regain access to your instance. Consult the Recent Events list for the DB instance for more information about the incompatible parameters. 8
Incompatible-restore Amazon RDS is unable to do a point-in-time restore. Common causes for this status include using temp tables or using MyISAM tables. 9
Maintenance Amazon RDS is applying a maintenance update to the DB instance. 10
Modifying The instance is being modified because of a customer request to modify the instance. 11
Rebooting The instance is being rebooted because of a customer request or an Amazon RDS process that requires the rebooting of the instance. 12
Renaming The instance is being renamed because of a customer request to rename it. 13
Resetting-master-credentials The master credentials for the instance are being reset because of a customer request to reset them. 14
Restore-error The DB instance encountered an error attempting to restore to a point-in-time or from a snapshot. 15
Upgrading The database engine version is being upgraded. 16
Storage-full The instance has reached its storage capacity allocation. This is a critical status and should be remedied immediately; you should scale up your storage by modifying the DB instance. Set alarms to warn you when storage space is getting low so you don't run into this situation. 17
Deleting The instance is being deleted. 18

Note:

This measure reports the Measure Values listed in the table above to indicate the current status of a DB instance. In the graph of this measure however, the same will be represented using the corresponding numeric equivalents only.

CPU credit usage:

Indicates the number of CPU units consumed by this T2 DB instance/ all DB instances that belong to this T2 DB instance class / all T2 DB instances using this DB engine, during the last measurement period.

Number

These measures are reported only for individual T2 instances, instances that belong to T2 DB instance classes, and the DB engines used only by T2 instances.

A CPU Credit provides the performance of a full CPU core for one minute. Traditionalinstance types provide fixed performance, while T2 instances provide a baseline level of CPU performance with the ability to burst above that baseline level. The baseline performance and ability to burst are governed by CPU credits.

One CPU credit is equal to one vCPU running at 100% utilization for one minute. Other combinations of vCPUs, utilization, and time are also equal to one CPU credit; for example, one vCPU running at 50% utilization for two minutes or two vCPUs running at 25% utilization for two minutes.

Each T2 instance starts with a healthy initial CPU credit balance and then continuously (at a millisecond-level resolution) receives a set rate of CPU credits per hour, depending on instance size.

When a T2 instance uses fewer CPU resources than its base performance level allows (such as when it is idle), the unused CPU credits (or the difference between what was earned and what was spent) are stored in the credit balance for up to 24 hours, building CPU credits for bursting. When your T2 instance requires more CPU resources than its base performance level allows, it uses credits from the CPU credit balance to burst up to 100% utilization. The more credits your T2 instance has for CPU resources, the more time it can burst beyond its base performance level when more performance is needed. This implies that ideally, the value of the CPU credit usage measure should be low for an instance and the value of the CPU credit balance for that instance should be high, as that way, an instance is assured of more CPU resources when performance demands increase. By comparing the value of this measure across instances, you can precisely identify the instance that has used up a sizeable portion of its CPU credits.

 

CPU credit balance:

Indicates the number of CPU credits that an instance has accumulated.

Number

CPU utilization:

Indicates the percentage of CPU utilized by this DB instance / DB instance class / DB engine

Percent

A value close to 100% for this measure for any DB instance is indicative of excessive CPU usage by that instance. Track the variations to the value of this measure for such an instance closely, and figure out whether CPU usage is consistently high and close to 100%. If so , you can conclude that the instance requires more CPU than what's been allocated to it. You may want to change to the DB instance class definition to allot more CPU resources to all instances it governs.

Binlog disk usage:

Indicates the amount of disk space occupied by binary logs on this DB instance / all DB instances of this DB instance class / all DB instances using this DB engine

KB

The binary log on MySQL has two important purposes:

  • For replication, the binary log on a master replication server provides a record of the data changes to be sent to slave servers. The master server sends the events contained in its binary log to its slaves, which execute those events to make the same data changes that were made on the master. .
  • Certain data recovery operations require use of the binary log. After a backup has been restored, the events in the binary log that were recorded after the backup was made are re-executed. These events bring databases up to date from the point of the backup.

Typically, MySQL uses several logging formats to record information in the binary log. There are three logging formats:

  • Replication capabilities in MySQL originally were based on propagation of SQL statements from master to slave. This is called statement-based logging.
  • In row-based logging, the master writes events to the binary log that indicate how individual table rows are affected.
  • A third option is also available: mixed logging. With mixed logging, statement-based logging is used by default, but the logging mode switches automatically to row-based in certain cases.

MySQL on Amazon RDS supports both the row-based and mixed binary logging formats for MySQL version 5.6. The default binary logging format is mixed. For DB instances running MySQL versions 5.1 and 5.5, only mixed binary logging is supported.

If the value of this measure grows consistently, it could mean that large binary files are being created. At this juncture, you may want to check the logging format configured for MySQL on Amazon RDS. This is because, very often, row-based binary logging format can result in very large binary log files. If you do not change the logging mode, then such files will continue to be created, thereby reducing the amount of storage space available for a DB instance. This in turn can increase the amount of time to perform a restore operation of a DB instance.

Database connections:

Indicates the number of database connections currently used by this instance / all instances that belong to this DB instance class / all instances using this DB engine

Number

 

Disk queue depth:

Indicates the number of outstanding IOs (read/write requests) waiting to access this DB instance / all DB instances that belong to this DB instance class / all DB instances using this DB engine.

Number

If the value of this measure keeps increasing steadily and significantly for a DB instance, it could indicate that the DB instance is latent, and is unable to process I/O requests quickly.

The value of this measure therefore should be low at all times.

Freeable memory:

Indicates the amount of available random access memory for this DB instance / all DB instances that belong to this DB instance class / all DB instances using this DB engine

MB

A high value is desired for this measure to ensure peak performance of a DB instance.

Replica lag time:

Indicates the amount of time a Read Replica DB Instance lags behind this source DB Instance / all source DB instances that belong to this DB instance class / all source DB instances using DB engine

Secs

This measure applies to MySQL read replicas only.

f your system runs on Amazon Relational Database Service (RDS) you may have opted to configure one or more replicas for your main MySQL database(s). This means you have a master RDS instance and at least one slave RDS instance which receives updates from the master. This process is called replication.

Replication ensures that changes made on the master database also happen on the slave after some period of time. For a variety of reasons this period of time can increase. For example, a long-running query or erroneous query can cause replication to slow down or stop entirely. This results in replication lag: changes made on your main database aren't showing up on the slave replica because the replica is lagging behind.

If the value of this measure is increasing consistently for a DB instance, it is a cause for concern, as it indicates that the slave is not in sync with the master and will take a long time to catch up. If for any reason the master DB instance fails at this juncture, there is bound to be significant data loss owing to the master-slave non-sync.

When there is a replication issue the output of show slave status; is quite useful in debugging and resolving it.

You need to review the values of:

Slave_SQL_Running

Last_Error

Last_SQL_Error

When a particular SQL query failed on the slave it could be that execution of queries in general has stopped. This is indicated by Slave_SQL_Running having the value No.

In that case you'll either need to:

  • Remedy the error by fixing the issue that caused the SQL query to fail.
  • Decide to resume replication by letting the slave ignore that error.

The former situation can be tricky as it requires you to figure out what data or query is problematic based on the values of Last_Error and Last_SQL_Error. These fields may provide enough information to determine any incorrect records but this is not always the case.

In the latter case you would execute the following command on the slave:

CALL mysql.rds_skip_repl_error;

You should only run this command when you've determined that skipping the SQL query won't lead to inconsistent data or incorrect data on the slave (or, at least, that this is allowed to occur by skipping that particular SQL query).

Swap usage:

 

Indicates the amount of swap space used on this DB instance / all DB instances that belong to this DB instance class / all DB instances using this DB engine.

KB

 

Read IOPS:

Indicates the rate at which disk read I/O operations were performed by this DB instance / all DB instances that belong to this DB instance class / all DB instances using this DB engine

Reads/Sec

Ideally, the value of this measure should be high. A consistent drop in this value could indicate a read latency.

Write IOPS:

Indicates the rate at which disk write I/O operations were performed by this DB instance / all DB instances that belong to this DB instance class / all DB instances using this DB engine

Writes/Sec

Ideally, the value of this measure should be high. A consistent drop in this value could indicate a write latency.

Read latency:

Indicates the average amount of time this DB instance / all DB instances of this instance class / all DB instances using this engine, took to service read requests.

Secs

Ideally, the value of this measure should be low. A consistent rise in this value could indicate a read latency. Compare the value of this measure across DB instances to know which instance is the slowest in servicing read requests.

Write latency:

Indicates the average amount of time this DB instance / all DB instances of this instance class / all DB instances using this engine, took to service write requests.

Secs

Ideally, the value of this measure should be low. A consistent rise in this value could indicate a write latency. Compare the value of this measure across DB instances to know which instance is the slowest in servicing write requests.

Read throughput:

Indicates the rate at which data was read from the disk by this DB instance / all DB instances of this instance class / all DB instances using this DB engine.

KB/Sec

Ideally, the value of this measure should be high. A steady decrease in this value could indicate a read latency. Compare the value of this measure across DB instances to know which instance is the slowest in servicing read requests.

Write throughput:

Indicates the rate at which data was written to the disk by this DB instance / all DB instances of this instance class / all DB instances using this DB engine.

KB/Sec

Ideally, the value of this measure should be high. A steady decrease in this value could indicate a write latency. Compare the value of this measure across DB instances to know which instance is the slowest in servicing write requests.

Network receive throughput:

Indicates the incoming network traffic on this DB instance / all DB instances that belong to this instance class / all DB instances using this engine.

KB/Secs

The value of these measures includes both customer database traffic and Amazon RDS traffic used for monitoring and replication.

A high value for these measures is indicative of high bandwidth usage by a DB instance. Under such circumstances, compare the value of the Network receive throughput measure with that of th Network transmit throughput measure to determine when the maximum bandwidth was consumed - when sending data or when receiving it?

Network transmit throughput:

 

Indicates the outgoing network traffic on this DB instance / all DB instances that belong to this instance class / all DB instances using this engine.

KB/Secs

Total storage space:

Indicates the total amount of storage space allocated to this DB instance / all DB instances that belong to this instance class / all DB instances using this DB engine.

MB

 

Used storage space:

Indicates the amount of storage space used by this DB instance / all DB instances that belong to this instance class / all DB instances using this DB engine.

MB

Compare the value of this measre across DB instances to know which instance is consuming storage space excessively.

Free storage space:

Indicates the amount of storage space still unused by this DB instance / all DB instances that belong to this instance class / all DB instances using this DB engine.

MB

A high value for this measure is ideal. Compare the value of this measure across DB instances to know which instance is left with very little free space.

Free storage space:

Indicates the percentage of storage space allocated to this DB instance / all DB instances that belong to this instance class / all DB instances using this DB engine, which is still available for use.

Percent

A value close to 100% is desired for this measure. If the value of this measure is below 50% consistently, it indicates that the DB instance is not sized with adequate resources. You may want to consider changing the DB instance class of that instance, so that more storage resources are available to it.