AWS DynamoDB Replication Gap Test
Amazon DynamoDB global tables are a fully managed, multi-region and multi-active database option that delivers fast and localized read and write performance for massively scaled global applications. A DynamoDB global table consists of multiple replica tables. Each replica table exists in a different region, but all replicas have the same name and primary key. When data is written to any replica table, DynamoDB automatically replicates that data to all other replica tables in the global table. However, write throttling or degraded AWS region performance can disrupt the replication process, which can, in turn, cause the loss of critical data, eventually leading to performance degradation and a poor user experience. To avoid this, it is imperative to keep track of the replication process and promptly identify these issues.
This test monitors every AWS replication region on which DynamoDB tables are replicated and reports the replication latencies and pending replication counts. If either of these measures stays elevated for an extended period, it may indicate throttling or AWS region degradation. Using these metrics, administrators can proactively identify problems in global table replication and remediate them before they affect the user experience.
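
The "elevated for an extended period" condition can be illustrated with a small check over recent samples. This is only a sketch: the function name `is_sustained_elevation`, the threshold, and the number of consecutive samples are hypothetical and are not settings of the test itself.

```python
from typing import Sequence


def is_sustained_elevation(
    samples: Sequence[float], threshold: float, min_consecutive: int
) -> bool:
    """Return True if the latest `min_consecutive` samples all exceed `threshold`.

    `samples` is ordered oldest to newest, one value per measurement period.
    All names and limits here are illustrative assumptions.
    """
    if min_consecutive <= 0:
        raise ValueError("min_consecutive must be positive")
    if len(samples) < min_consecutive:
        # Not enough history yet to call the elevation "extended".
        return False
    return all(value > threshold for value in samples[-min_consecutive:])


# A single spike is ignored; three elevated periods in a row are flagged.
spike = [0.4, 0.5, 6.0, 0.5, 0.4]
sustained = [0.4, 0.5, 5.2, 6.1, 7.0]
print(is_sustained_elevation(spike, threshold=5.0, min_consecutive=3))      # False
print(is_sustained_elevation(sustained, threshold=5.0, min_consecutive=3))  # True
```

Requiring several consecutive elevated samples, rather than reacting to one, keeps short-lived spikes from being reported as throttling or region degradation.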
Target of the test : An AWS DynamoDB server
Agent deploying the test : A remote agent
Outputs of the test : One set of results for each monitored AWS replication region on which DynamoDB tables are replicated.
| Parameter | Description |
|---|---|
| Test Period | How often the test should be executed. |
| Host | The IP address of the AWS DynamoDB server that is being monitored. |
| AWS Region | This test uses the AWS SDK to interact with AWS DynamoDB and pull relevant metrics. To enable the test to connect to AWS, you need to configure the test with the name of the region to which all requests for metrics should be routed by default. Specify the name of this AWS Region in this text box. |
| AWS Access Key ID, AWS Secret Access Key and Confirm Password | To monitor AWS DynamoDB, the eG agent has to be configured with the access key and secret key of a user with a valid AWS account. For this purpose, we recommend that you create a special user on the AWS cloud, obtain the access and secret keys of this user, and configure this test with these keys. The procedure for this has been detailed in the Obtaining an Access key and Secret key topic. Make sure you reconfirm the access and secret keys you provide here by retyping them in the corresponding Confirm Password text box. |
| Timeout Seconds | Specify the maximum duration (in seconds) for which the test will wait for a response from the server. The default is 120 seconds. |
| Detailed Diagnosis | To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, click on the Off option. The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled: |

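
The source states only that the test uses the AWS SDK; it does not say which API calls it makes. As a rough sketch of how the AWS Region, access key, secret key and timeout parameters could map onto an SDK call, the Python below assumes the replication figures are read from the CloudWatch metrics `ReplicationLatency` and `PendingReplicationCount` in the `AWS/DynamoDB` namespace. The helper names `build_metric_request` and `fetch_datapoints` and the lookback window are hypothetical, and `boto3` is a third-party package that must be installed separately.

```python
import datetime as dt


def build_metric_request(table_name, receiving_region, metric_name, end_time,
                         lookback_minutes=15, period_seconds=60):
    """Build keyword arguments for CloudWatch GetMetricStatistics.

    Assumption: replication metrics are published per replica table with the
    TableName and ReceivingRegion dimensions.
    """
    return {
        "Namespace": "AWS/DynamoDB",
        "MetricName": metric_name,
        "Dimensions": [
            {"Name": "TableName", "Value": table_name},
            {"Name": "ReceivingRegion", "Value": receiving_region},
        ],
        "StartTime": end_time - dt.timedelta(minutes=lookback_minutes),
        "EndTime": end_time,
        "Period": period_seconds,
        "Statistics": ["Minimum", "Maximum"],
    }


def fetch_datapoints(region_name, access_key_id, secret_access_key,
                     timeout_seconds, request):
    """Send the request with the configured region, keys and timeout."""
    # Imported here so the request builder above works without boto3 installed.
    import boto3
    from botocore.config import Config

    client = boto3.client(
        "cloudwatch",
        region_name=region_name,
        aws_access_key_id=access_key_id,
        aws_secret_access_key=secret_access_key,
        config=Config(connect_timeout=timeout_seconds,
                      read_timeout=timeout_seconds),
    )
    return client.get_metric_statistics(**request)["Datapoints"]


# Example request for a hypothetical table "orders" replicated to eu-west-1.
now = dt.datetime.now(dt.timezone.utc)
request = build_metric_request("orders", "eu-west-1", "ReplicationLatency", now)
print(request["MetricName"], request["Dimensions"])
```

CloudWatch reports `ReplicationLatency` in milliseconds, so a caller following this sketch would need to divide by 1000 to obtain seconds.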

| Measurement | Description | Measurement Unit | Interpretation |
|---|---|---|---|
| Pending replication tables count | Indicates the number of tables having pending replications in this region. | Number | |
| Maximum pending replication by records | Indicates the maximum number of item updates that have been written to one replica table but have not yet been written to another replica in the global table in this region. | Number | During normal operation, pending replication tables count should be very low. If pending replication tables count increases for extended periods, investigate whether your replica tables' provisioned write capacity settings are sufficient for your current workload. Pending replication tables count can increase if an AWS Region becomes degraded and you have a replica table in that Region. In this case, you can temporarily redirect your application's read and write activity to a different AWS Region. Use the detailed diagnosis of this measure to find out the table name and pending records. |
| Maximum replication latency | Indicates the maximum elapsed time between an updated item appearing in the DynamoDB stream for one replica table and that item appearing in another replica in the global table in this region. | Seconds | During normal operation, Replication Latency should be fairly constant. An elevated value for Replication Latency could indicate that updates from one replica are not propagating to other replica tables in a timely manner. Over time, this could result in other replica tables falling behind because they no longer receive updates consistently. In this case, you should verify that the read capacity units (RCUs) and write capacity units (WCUs) are identical for each of the replica tables. Replication Latency can increase if an AWS Region becomes degraded and you have a replica table in that Region. In this case, you can temporarily redirect your application's read and write activity to a different AWS Region. The detailed diagnosis of the Maximum replication latency measure shows the table name and latency in seconds. |
| Minimum replication latency | Indicates the minimum elapsed time between an updated item appearing in the DynamoDB stream for one replica table and that item appearing in another replica in the global table in this region. | Seconds | |
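
To make the relationship between the four measures concrete, the sketch below rolls per-table readings for one region up into the values described in the table above. It is an illustration only: the `TableReading` type, the `summarize_region` function and the sample table names are hypothetical, and the way the test itself aggregates readings is not documented here.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class TableReading:
    """One replica table's readings for a single region (hypothetical type)."""
    table_name: str
    pending_records: int    # item updates not yet written to another replica
    latency_seconds: float  # replication latency observed for this table


def summarize_region(readings: Iterable[TableReading]) -> Optional[dict]:
    """Aggregate per-table readings into the four per-region measures."""
    readings = list(readings)
    if not readings:
        return None  # nothing replicated in this region, so nothing to report
    return {
        "pending_replication_tables_count": sum(
            1 for r in readings if r.pending_records > 0
        ),
        "max_pending_replication_records": max(r.pending_records for r in readings),
        "max_replication_latency_seconds": max(r.latency_seconds for r in readings),
        "min_replication_latency_seconds": min(r.latency_seconds for r in readings),
    }


# Sample data for three hypothetical tables in one region.
region_readings = [
    TableReading("orders", pending_records=0, latency_seconds=0.8),
    TableReading("users", pending_records=12, latency_seconds=2.5),
    TableReading("carts", pending_records=3, latency_seconds=1.1),
]
print(summarize_region(region_readings))
# {'pending_replication_tables_count': 2, 'max_pending_replication_records': 12,
#  'max_replication_latency_seconds': 2.5, 'min_replication_latency_seconds': 0.8}
```

The per-table rows kept in `region_readings` correspond to what the detailed diagnosis exposes: the table name together with its pending records or latency.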