AWS DynamoDB Replication Gap Test

Amazon DynamoDB global tables are a fully managed, multi-region, multi-active database option that delivers fast, localized read and write performance for massively scaled global applications. A DynamoDB global table comprises multiple replica tables. Each replica table exists in a different region, but all replicas have the same name and primary key. When data is written to any replica table, DynamoDB automatically replicates that data to all other replica tables in the global table. However, write throttling or the degradation of an AWS region can affect the replication process, which can in turn cause loss of critical data, eventually leading to performance degradation and a poor user experience. To avoid this, it is imperative to keep track of the replication process and promptly identify such issues.

This test monitors every AWS region to which DynamoDB tables are replicated and reports the replication latencies and pending replication counts. If either of these measures stays elevated for an extended period, it indicates throttling or AWS region degradation. Using these metrics, administrators can proactively identify problems in global table replication and remediate them before they affect the user experience.
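The idea of a measure being "elevated for an extended period" can be illustrated with a minimal sketch. The function name, the thresholds, and the sample readings below are hypothetical and are not part of the test itself; the sketch only shows one way to tell a sustained rise apart from a momentary spike.

```python
from typing import Sequence


def sustained_breach(samples: Sequence[float], threshold: float,
                     min_consecutive: int) -> bool:
    """Return True if the readings stay above `threshold` for at least
    `min_consecutive` consecutive samples."""
    run = 0
    for value in samples:
        # Extend the current run on a breach, otherwise start over.
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False


# Hypothetical readings for one replication region, one per test period.
latency_seconds = [0.8, 0.9, 4.2, 5.1, 6.3, 0.7]
pending_records = [0, 2, 0, 1, 0, 0]

print(sustained_breach(latency_seconds, threshold=3.0, min_consecutive=3))  # True
print(sustained_breach(pending_records, threshold=100, min_consecutive=3))  # False
```

Requiring several consecutive breaches, rather than a single one, avoids raising an alarm on a brief spike that clears by the next test period.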

Target of the test : An AWS DynamoDB server

Agent deploying the test : A remote agent

Outputs of the test : One set of results for each AWS replication region on which the monitored DynamoDB tables are replicated.

Configurable parameters for the test
Parameter Description

Test Period

How often should the test be executed.

Host

The IP address of the AWS DynamoDB server that is being monitored.

AWS Region

This test uses the AWS SDK to interact with AWS DynamoDB and pull relevant metrics. To enable the test to connect to AWS, you need to configure the test with the name of the region to which all requests for metrics should be routed by default. Specify the name of this AWS Region in this text box.

AWS Access Key ID, AWS Secret Access Key and Confirm Password

To monitor AWS DynamoDB, the eG agent has to be configured with the access key and secret key of a user with a valid AWS account. For this purpose, we recommend that you create a special user on the AWS cloud, obtain the access and secret keys of this user, and configure this test with these keys. The procedure for this has been detailed in the Obtaining an Access key and Secret key topic. Make sure you reconfirm the access and secret keys you provide here by retyping them in the corresponding Confirm Password text box.

Timeout Seconds

Specify the maximum duration (in seconds) for which the test will wait for a response from the server. The default is 120 seconds.

Detailed Diagnosis

To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, click on the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability
  • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
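
As a rough illustration of how the AWS Region, access key, secret key, and Timeout Seconds parameters might come together in an SDK connection, here is a minimal Python sketch. It assumes the metrics are read from Amazon CloudWatch through boto3, and that the timeout applies to both connecting and reading; the eG agent's actual implementation is not documented here, and the helper names `cloudwatch_client_settings` and `make_client` are hypothetical.

```python
from typing import Any, Dict


def cloudwatch_client_settings(region: str, access_key_id: str,
                               secret_access_key: str,
                               timeout_seconds: int = 120) -> Dict[str, Any]:
    """Validate the test parameters and collect them as connection settings."""
    if not region:
        raise ValueError("an AWS Region name is required")
    if not access_key_id or not secret_access_key:
        raise ValueError("both the access key and the secret key are required")
    if timeout_seconds <= 0:
        raise ValueError("the timeout must be a positive number of seconds")
    return {
        "service_name": "cloudwatch",
        "region_name": region,
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
        "timeout_seconds": timeout_seconds,
    }


def make_client(settings: Dict[str, Any]):
    """Build a boto3 CloudWatch client from the settings above.

    boto3 is imported lazily so the validation helper works without it.
    """
    import boto3
    from botocore.config import Config

    kwargs = dict(settings)
    timeout = kwargs.pop("timeout_seconds")
    # Assumption: one timeout value covers both connect and read phases.
    kwargs["config"] = Config(connect_timeout=timeout, read_timeout=timeout)
    return boto3.client(**kwargs)


# Placeholder credentials for illustration only.
settings = cloudwatch_client_settings("us-east-1", "AKIAEXAMPLE", "secret-example")
print(settings["region_name"], settings["timeout_seconds"])
```

Keeping validation separate from client creation lets a misconfigured parameter be reported before any network call is attempted.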
Measurements made by the test

Measurement

Description

Measurement Unit

Interpretation

Pending replication tables count

Indicates the number of tables having pending replications on this region.

Number

 

Maximum pending replication by records

Indicates the maximum number of item updates that are written to one replica table, but that have not yet been written to another replica in the global table on this region.

Number

During normal operation, the value of this measure should be very low. If the value increases for extended periods, investigate whether your replica tables' provisioned write capacity settings are sufficient for your current workload. The value can also increase if an AWS Region becomes degraded and you have a replica table in that Region. In this case, you can temporarily redirect your application's read and write activity to a different AWS Region.

Use the detailed diagnosis of this measure to find out the table name and the number of pending records.

Maximum replication latency

Indicates the maximum elapsed time between when an updated item appears in the DynamoDB stream for one replica table, and when that item appears in another replica in the global table on this region.

Seconds

During normal operation, Replication Latency should be fairly constant. An elevated value for Replication Latency could indicate that updates from one replica are not propagating to other replica tables in a timely manner. Over time, this could result in other replica tables falling behind because they no longer receive updates consistently. In this case, you should verify that the read capacity units (RCUs) and write capacity units (WCUs) are identical for each of the replica tables.

Replication Latency can increase if an AWS Region becomes degraded and you have a replica table in that Region. In this case, you can temporarily redirect your application's read and write activity to a different AWS Region.

The detailed diagnosis of the Maximum replication latency measure shows the table name and the latency in seconds.

Minimum replication latency

Indicates the minimum elapsed time between when an updated item appears in the DynamoDB stream for one replica table, and when that item appears in another replica in the global table on this region.

Seconds
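
The measures above can be thought of as per-region roll-ups of per-table values. The sketch below is an illustration under stated assumptions, not the test's actual logic: it assumes per-table readings have already been collected (for example, from the CloudWatch `ReplicationLatency` metric, which is reported in milliseconds, and `PendingReplicationCount`), and the names `RegionMeasures`, `summarize_region`, and `detailed_diagnosis` are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class RegionMeasures(NamedTuple):
    pending_tables: int          # Pending replication tables count
    max_pending_records: int     # Maximum pending replication by records
    max_latency_seconds: float   # Maximum replication latency
    min_latency_seconds: float   # Minimum replication latency


def summarize_region(pending_by_table: Dict[str, int],
                     latency_ms_by_table: Dict[str, float]) -> RegionMeasures:
    """Roll per-table readings for one region up into the four measures."""
    pending_tables = sum(1 for count in pending_by_table.values() if count > 0)
    max_pending = max(pending_by_table.values(), default=0)
    # Convert milliseconds to seconds, the unit reported by the test.
    latencies = [ms / 1000.0 for ms in latency_ms_by_table.values()]
    return RegionMeasures(
        pending_tables,
        max_pending,
        max(latencies, default=0.0),
        min(latencies, default=0.0),
    )


def detailed_diagnosis(values_by_table: Dict[str, float]) -> List[Tuple[str, float]]:
    """List (table name, value) pairs, worst first, as a diagnosis view might."""
    return sorted(values_by_table.items(), key=lambda item: item[1], reverse=True)


# Hypothetical per-table readings for a single replication region.
pending = {"orders": 0, "users": 12, "sessions": 3}
latency_ms = {"orders": 850.0, "users": 2400.0, "sessions": 1200.0}

print(summarize_region(pending, latency_ms))
print(detailed_diagnosis(pending))
```

Sorting the per-table values in descending order puts the table that contributes the maximum at the top, which is typically the first one to investigate.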