AWS DynamoDB Replication Gap Test
Amazon DynamoDB global tables are a fully managed, multi-region, multi-active database option that delivers fast, localized read and write performance for massively scaled global applications. A DynamoDB global table consists of multiple replica tables. Each replica table exists in a different region, but all replicas have the same name and primary key. When data is written to any replica table, DynamoDB automatically replicates that data to all other replica tables in the global table. However, write throttling or degradation of an AWS region can disrupt the replication process, which can in turn cause loss of critical data, eventually leading to performance degradation and a poor user experience. To avoid this, it is imperative to track the replication process and promptly identify such issues.
This test monitors every AWS replication region to which DynamoDB tables are replicated and reports the replication latencies and pending replication counts. If either of these measures is elevated for an extended period, it indicates throttling or AWS region degradation. Using these metrics, administrators can proactively identify problems in global table replication and remediate them before they affect the user experience.
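The idea of a measure being "elevated for an extended period" can be made concrete with a small check over recent readings. The following is a minimal sketch, not the logic the eG agent uses: the function name `is_sustained`, the threshold, and the sample values are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def is_sustained(samples: Sequence[float], threshold: float, min_consecutive: int) -> bool:
    """Return True if the most recent `min_consecutive` samples all exceed `threshold`.

    `samples` is ordered oldest first. A single spike does not count;
    only an unbroken run at the end of the series does.
    """
    if min_consecutive <= 0:
        raise ValueError("min_consecutive must be positive")
    if len(samples) < min_consecutive:
        return False
    return all(value > threshold for value in samples[-min_consecutive:])


# Hypothetical replication latency readings in seconds, oldest first.
steady_rise = [0.8, 0.9, 4.2, 5.1, 6.3]
isolated_spikes = [0.8, 4.2, 0.9, 5.1, 0.7]

print(is_sustained(steady_rise, threshold=3.0, min_consecutive=3))      # True
print(is_sustained(isolated_spikes, threshold=3.0, min_consecutive=3))  # False
```

Requiring an unbroken run of readings above the threshold avoids flagging short spikes, which matches the emphasis on extended periods rather than momentary peaks.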
Target of the test : An AWS DynamoDB server
Agent deploying the test : A remote agent
Outputs of the test : One set of results for each AWS replication region to which the monitored DynamoDB tables are replicated.
Parameter | Description |
---|---|
Test Period | How often should the test be executed. |
Host | The IP address of the AWS DynamoDB server that is being monitored. |
AWS Region | This test uses the AWS SDK to interact with AWS DynamoDB and pull relevant metrics. To enable the test to connect to AWS, you need to configure the test with the name of the region to which all requests for metrics should be routed by default. Specify the name of this AWS Region in this text box. |
AWS Access Key ID, AWS Secret Access Key and Confirm Password | To monitor AWS DynamoDB, the eG agent has to be configured with the access key and secret key of a user with a valid AWS account. For this purpose, we recommend that you create a special user on the AWS cloud, obtain the access and secret keys of this user, and configure this test with these keys. The procedure for this is detailed in the Obtaining an Access key and Secret key topic. Make sure you confirm the access and secret keys you provide here by retyping them in the corresponding Confirm Password text box. |
Timeout Seconds | Specify the maximum duration (in seconds) for which the test will wait for a response from the server. The default is 120 seconds. |
Detailed Diagnosis | To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option. The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled: |
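The parameters above describe an SDK-based connection to AWS but do not say which API calls the test makes. As an illustration only, the sketch below assumes the replication metrics are read from Amazon CloudWatch, where DynamoDB publishes `ReplicationLatency` under the `AWS/DynamoDB` namespace with the `TableName` and `ReceivingRegion` dimensions. The helper `build_metric_request`, the table name, and the region are hypothetical; the dictionary keys follow the CloudWatch `GetMetricStatistics` request shape.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional


def build_metric_request(
    table_name: str,
    receiving_region: str,
    metric_name: str = "ReplicationLatency",
    window_minutes: int = 5,
    end_time: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for a CloudWatch GetMetricStatistics call.

    Assumption: the metric is queried per table and per receiving region.
    """
    end = end_time or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/DynamoDB",
        "MetricName": metric_name,
        "Dimensions": [
            {"Name": "TableName", "Value": table_name},
            {"Name": "ReceivingRegion", "Value": receiving_region},
        ],
        "StartTime": end - timedelta(minutes=window_minutes),
        "EndTime": end,
        "Period": 60,  # one datapoint per minute
        "Statistics": ["Minimum", "Maximum"],
    }


# Hypothetical table and region names, for illustration only.
request = build_metric_request("orders", "eu-west-1")
print(request["MetricName"], request["Dimensions"])

# With boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
#   response = cloudwatch.get_metric_statistics(**request)
```

Keeping the request construction separate from the network call makes it possible to inspect the parameters without AWS credentials.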
Measurement | Description | Measurement Unit | Interpretation |
---|---|---|---|
Pending replication tables count | Indicates the number of tables having pending replications in this region. | Number | |
Maximum pending replication by records | Indicates the maximum number of item updates that have been written to one replica table but not yet written to another replica in the global table in this region. | Number | During normal operation, pending replication tables count should be very low. If pending replication tables count increases for extended periods, investigate whether your replica tables' provisioned write capacity settings are sufficient for your current workload. Pending replication tables count can increase if an AWS Region becomes degraded and you have a replica table in that Region. In this case, you can temporarily redirect your application's read and write activity to a different AWS Region. Use the detailed diagnosis of this measure to find out the table name and pending records. |
Maximum replication latency | Indicates the maximum elapsed time between when an updated item appears in the DynamoDB stream for one replica table and when that item appears in another replica in the global table in this region. | Seconds | During normal operation, replication latency should be fairly constant. An elevated value for replication latency could indicate that updates from one replica are not propagating to other replica tables in a timely manner. Over time, this could result in other replica tables falling behind because they no longer receive updates consistently. In this case, you should verify that the read capacity units (RCUs) and write capacity units (WCUs) are identical for each of the replica tables. Replication latency can also increase if an AWS Region becomes degraded and you have a replica table in that Region. In this case, you can temporarily redirect your application's read and write activity to a different AWS Region. The detailed diagnosis of the Maximum replication latency measure shows the table name and latency in seconds. |
Minimum replication latency | Indicates the minimum elapsed time between when an updated item appears in the DynamoDB stream for one replica table and when that item appears in another replica in the global table in this region. | Seconds | |
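The four measures above are per-region rollups of per-table readings. The sketch below shows one way such a rollup could be computed. It is an illustration under stated assumptions, not the eG agent's implementation: the `TableSample` type, the `summarize` function, and the sample values are hypothetical, and latency is assumed to arrive in milliseconds and is converted to seconds to match the measurement unit in the table.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class TableSample(NamedTuple):
    region: str           # receiving region of the replica
    table: str            # replica table name
    pending_records: int  # item updates not yet replicated
    latency_ms: float     # replication latency, assumed to be in milliseconds


def summarize(samples: Iterable[TableSample]) -> Dict[str, Dict[str, float]]:
    """Roll per-table samples up into one set of measures per region."""
    by_region: Dict[str, List[TableSample]] = defaultdict(list)
    for sample in samples:
        by_region[sample.region].append(sample)

    summary: Dict[str, Dict[str, float]] = {}
    for region, rows in by_region.items():
        latencies_s = [row.latency_ms / 1000.0 for row in rows]
        summary[region] = {
            # Tables with at least one update still waiting to replicate.
            "pending_tables": sum(1 for row in rows if row.pending_records > 0),
            "max_pending_records": max(row.pending_records for row in rows),
            "max_latency_s": max(latencies_s),
            "min_latency_s": min(latencies_s),
        }
    return summary


# Hypothetical readings, for illustration only.
readings = [
    TableSample("eu-west-1", "orders", 0, 850.0),
    TableSample("eu-west-1", "sessions", 12, 2300.0),
    TableSample("us-west-2", "orders", 3, 1200.0),
]
for region, measures in summarize(readings).items():
    print(region, measures)
```

The per-table rows that feed each maximum correspond to what the detailed diagnosis is described as showing: the table name together with its pending records or latency.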