Mongo Replication Throughput Test
Replication provides redundancy and increases data availability. With multiple copies of data on different database servers, replication provides a level of fault tolerance against the loss of a single database server.
A replica set is a group of mongod instances that maintain the same data set. A replica set contains several data-bearing nodes and, optionally, one arbiter node. Of the data-bearing nodes, one and only one member is deemed the primary node, while the other nodes are deemed secondary nodes. The primary node receives all write operations. A replica set can have only one primary capable of confirming writes with { w: "majority" } write concern. The primary records all changes to its data sets in its operation log (oplog). The oplog is a limited-size (capped) collection that keeps track of all the write operations. Secondary members replicate this log and apply the operations to their own data sets.
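To make the effect of a limited-size oplog concrete, here is a toy model in Python. It is only a sketch: the `ToyOplog` class, its sequence numbers, and its entry-count capacity are all hypothetical simplifications (a real oplog is sized in bytes and ordered by timestamps), and nothing here talks to MongoDB.

```python
from collections import deque


class ToyOplog:
    """Fixed-capacity log: appending beyond capacity drops the oldest entry."""

    def __init__(self, capacity):
        # deque(maxlen=...) silently discards the oldest item when full,
        # which mimics how a capped collection overwrites old entries.
        self.entries = deque(maxlen=capacity)
        self.next_seq = 1

    def record(self, op):
        """Record a write on the 'primary' with an increasing sequence number."""
        self.entries.append((self.next_seq, op))
        self.next_seq += 1

    def entries_after(self, last_applied_seq):
        """Return the ops a secondary still needs, or None if they were overwritten."""
        if not self.entries or last_applied_seq >= self.entries[-1][0]:
            return []  # nothing to replicate: log is empty or secondary is caught up
        oldest_seq = self.entries[0][0]
        if last_applied_seq + 1 < oldest_seq:
            return None  # the next entry it needs is gone: a full resync is required
        return [(seq, op) for seq, op in self.entries if seq > last_applied_seq]


oplog = ToyOplog(capacity=3)
for op in ["insert a", "update a", "insert b", "delete a"]:
    oplog.record(op)

print(oplog.entries_after(2))  # secondary that has applied up to entry 2
print(oplog.entries_after(0))  # secondary that has applied nothing yet
```

The second call returns `None` because entry 1 has already been pushed out of the log, which is the situation the rest of this page is concerned with detecting early.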
If a secondary is unable to apply changes as fast as they are written to the primary's oplog, then the changes it has not yet replicated may be lost if the primary crashes. Similarly, if the oplog is not sized right, it will not be able to hold many changes, which can cause significant data loss in the event of a primary failure. This is why administrators should constantly measure the level of activity on a replica set, check whether the oplog is sized according to this workload, and ensure that there is little to no time lag in data replication between the primary and the secondaries of the replica set. This can be achieved using the Mongo Replication Throughput test.
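The time lag mentioned above can be estimated from the output of MongoDB's `replSetGetStatus` command, whose `members` array reports a `stateStr` and an `optimeDate` for each data-bearing member. The sketch below is one way to do that in Python and is not necessarily how this test computes it; the helper name `replication_lag_seconds` and the sample host names are hypothetical. With PyMongo, a real status document could be fetched with `client.admin.command("replSetGetStatus")`.

```python
from datetime import datetime, timezone


def replication_lag_seconds(status):
    """Map each secondary's name to its lag behind the primary, in seconds."""
    members = status["members"]
    primary = next((m for m in members if m["stateStr"] == "PRIMARY"), None)
    if primary is None:
        return {}  # no primary at the moment, so lag is undefined
    return {
        m["name"]: max(0.0, (primary["optimeDate"] - m["optimeDate"]).total_seconds())
        for m in members
        if m["stateStr"] == "SECONDARY"  # skips arbiters and other states
    }


# Hand-written sample shaped like a replSetGetStatus reply (values are made up).
sample_status = {
    "members": [
        {"name": "db1:27017", "stateStr": "PRIMARY",
         "optimeDate": datetime(2024, 1, 1, 12, 0, 10, tzinfo=timezone.utc)},
        {"name": "db2:27017", "stateStr": "SECONDARY",
         "optimeDate": datetime(2024, 1, 1, 12, 0, 10, tzinfo=timezone.utc)},
        {"name": "db3:27017", "stateStr": "SECONDARY",
         "optimeDate": datetime(2024, 1, 1, 12, 0, 4, tzinfo=timezone.utc)},
        {"name": "arb:27017", "stateStr": "ARBITER"},
    ]
}

print(replication_lag_seconds(sample_status))
```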
This test tracks the operations performed on the replica set and, in the process, reveals the load on the replica set. The test also tracks the usage of the oplog and alerts administrators if the oplog is about to run out of space for recording changes. Additionally, the test watches for long gaps between when a change is recorded in the primary's oplog and when it is actually applied on the secondary, and promptly notifies administrators of such gaps. This way, the test brings inconsistencies in data replication to the immediate attention of administrators and helps avert the data loss that might occur if the primary crashes.
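As an illustration of how per-second operation rates can be derived, MongoDB's `serverStatus` command exposes cumulative replicated-operation counters under `opcountersRepl` (`insert`, `query`, `update`, `delete`, `getmore`, `command`). The sketch below turns two such samples into rates. Whether this test reads exactly these counters is an assumption, and the function name `ops_per_second` and the sample numbers are hypothetical.

```python
OP_TYPES = ("insert", "query", "update", "delete", "getmore", "command")


def ops_per_second(previous, current, elapsed_seconds):
    """Convert two cumulative counter samples into per-second rates."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    rates = {}
    for op in OP_TYPES:
        delta = current[op] - previous[op]
        # A negative delta means the counters were reset (for example, mongod
        # restarted); report None rather than a misleading negative rate.
        rates[op] = delta / elapsed_seconds if delta >= 0 else None
    return rates


# Two made-up samples taken 60 seconds apart.
earlier = {"insert": 1000, "query": 0, "update": 400, "delete": 50, "getmore": 0, "command": 10}
later = {"insert": 1600, "query": 0, "update": 520, "delete": 50, "getmore": 0, "command": 40}

print(ops_per_second(earlier, later, 60))
```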
Target of the test : A MongoDB server
Agent deploying the test : An internal/remote agent
Outputs of the test : One set of results for the Mongo database server being monitored.
Parameter | Description |
---|---|
Test period | How often the test should be executed. |
Host | The host for which the test is to be configured. |
Port | The port number at which the specified host listens. |
Database Name | The test connects to a specific Mongo database to run API commands and pull metrics of interest. Specify the name of this database here. The default value of this parameter is admin. |
Username and Password | If access control is enabled on the target MongoDB instance, the eG agent has to be configured with the credentials of a user who has the privileges required to monitor that instance. To know how to create such a user, refer to How to monitor access control enabled MongoDB database?. If access control is not enabled on the target MongoDB instance, specify none against the Username and Password parameters. |
Confirm Password | Confirm the password by retyping it here. |
Authentication Mechanism | MongoDB supports multiple authentication mechanisms that users can use to verify their identity. In environments where multiple authentication mechanisms are used, select the authentication mechanism of your interest from this list box. By default, this is set to None. You can modify this setting as required. |
SSL | By default, the SSL flag is set to No, indicating that the target MongoDB server is not SSL-enabled. To enable the test to connect to an SSL-enabled MongoDB server, set the SSL flag to Yes. |
CA File | A certificate authority (CA) file contains root and intermediate certificates that are electronically signed to affirm that a public key belongs to the owner named in the certificate. If you want to monitor the certificates contained within a CA file, provide the full path to this file in the CA File text box. For example, the location of this file may be: C:\cert\rootCA.pem. If you do not want to monitor the certificates in a CA file, set this parameter to none. |
Certificate Key File | The Certificate Key File parameter specifies the path on the server where your private key is stored. If you want to monitor the Certificate Key File, provide the full path to this file in the Certificate Key File text box. For example, the location of this file may be: C:\cert\mongodb.pem. If you do not want to monitor the Certificate Key File, set this parameter to none. |
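For readers who want to check the same settings outside the eG agent, the parameters above map fairly naturally onto a standard MongoDB connection string. The sketch below is an assumption about that mapping, not a description of how the agent connects: the function `build_mongo_uri` and its argument names are hypothetical, while `authMechanism`, `tls`, `tlsCAFile`, and `tlsCertificateKeyFile` are standard connection string options. A value of none is treated as "not set", mirroring the convention in the table.

```python
from urllib.parse import quote_plus, urlencode


def build_mongo_uri(host, port, database="admin", username="none", password="none",
                    auth_mechanism="None", ssl=False, ca_file="none", key_file="none"):
    """Assemble a mongodb:// URI from test-style parameters (illustrative only)."""

    def is_set(value):
        # The parameter table uses the literal 'none' to mean "not configured".
        return value is not None and str(value).strip().lower() != "none"

    credentials = ""
    if is_set(username) and is_set(password):
        # Percent-encode credentials so characters such as '@' or ':' are safe.
        credentials = f"{quote_plus(username)}:{quote_plus(password)}@"

    options = {}
    if is_set(auth_mechanism):
        options["authMechanism"] = auth_mechanism
    if ssl:
        options["tls"] = "true"
        if is_set(ca_file):
            options["tlsCAFile"] = ca_file
        if is_set(key_file):
            options["tlsCertificateKeyFile"] = key_file

    query = f"?{urlencode(options)}" if options else ""
    return f"mongodb://{credentials}{host}:{port}/{database}{query}"


print(build_mongo_uri("db1", 27017))
print(build_mongo_uri("db1", 27017, username="monitor", password="p@ss",
                      auth_mechanism="SCRAM-SHA-256"))
```

When no `authSource` option is given, the database named in the URI path also serves as the authentication database, which is why the sketch does not set it separately.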
Measurement | Description | Measurement Unit | Interpretation |
---|---|---|---|
Insert operations | Indicates the rate at which replicated insert operations are performed on the target server. | Inserts/Sec | A consistent increase in the value of these operation rate measures could indicate a high level of activity on the replica set. |
Query operations | Indicates the rate at which replicated query operations are performed on the target server. | Queries/Sec | |
Update operations | Indicates the rate at which replicated update operations are performed on the target server. | Updates/Sec | |
Delete operations | Indicates the rate at which replicated delete operations are performed on the target server. | Deletes/Sec | |
Get more operations | Indicates the rate at which replicated get more operations are performed on the target server. | Getmores/Sec | The value of this measure can be high even if the query count is low. Secondary nodes send getMore operations as part of the replication process. |
Command operations | Indicates the rate at which replicated commands are issued to the target server. | Commands/Sec | |
Replication lag | Indicates how far a secondary is behind the primary. | Secs | Ideally, the value of this measure should be 0. If it is very high, then the integrity of your data set might be compromised in case of failover (a secondary member taking over as the new primary because the current primary is unavailable). A high value also implies that write operations are not immediately propagated to secondaries; in this case, related changes might be lost if the primary fails. A high replication lag can be due to: |
Oplog window | Indicates the interval of time between the oldest and the latest entries in the oplog. | Secs | If a secondary is down longer than this oplog window, it will not be able to catch up unless it completely resyncs all data from the primary. The amount of time it takes to fill the oplog varies: during periods of heavy traffic, the window shrinks because the oplog receives more operations per second. If the oplog window for a primary node is getting too short, you should consider increasing the size of your oplog. |
Replication head room | Indicates the time difference between the primary's oplog window and the replication lag of the secondary. | Secs | If the replication headroom is rapidly shrinking and is about to become negative, the replication lag is becoming larger than the oplog window. In that case, write operations recorded in the oplog will be overwritten before secondary nodes have time to replicate them. MongoDB will then have to resync the entire data set on this secondary, which takes much longer than just fetching new changes from the oplog. Properly monitoring and alerting on Replication lag and Oplog window should allow you to prevent this. |
Total oplog size | Indicates the amount of space allocated to the oplog. | MB | |
Used oplog size | Indicates the amount of space currently used by operations stored in the oplog. | MB | If this value grows closer to the Total oplog size, it is a cause for concern, because it implies that the oplog may soon not have enough space to record any more changes. To avoid this, you may want to consider resizing your oplog. Before that, check the level of activity on the replica set and work out the ideal size setting for the oplog, so that it is able to capture all the changes that occur on the replica set. Before mongod creates an oplog, you can specify its size with the oplogSizeMB option. If you can predict your replica set's workload to resemble one of the following patterns, then you might want to create an oplog that is larger than the default. Conversely, if your application predominantly performs reads with a minimal amount of write operations, a smaller oplog may be sufficient. The following workloads might require a larger oplog size. |
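The relationship between the Oplog window, Replication lag, and Replication head room measures is simple arithmetic, sketched below in Python: the window is the time between the oldest and latest oplog entries, and the headroom is that window minus the lag. The function name `oplog_metrics`, its arguments, and the sample values are hypothetical; in a real deployment the two timestamps would come from the first and last entries of the oplog.

```python
from datetime import datetime, timezone


def oplog_metrics(first_entry_time, last_entry_time, lag_seconds, total_mb, used_mb):
    """Derive window, headroom, and usage figures from raw oplog readings."""
    window_seconds = (last_entry_time - first_entry_time).total_seconds()
    return {
        "oplog_window_secs": window_seconds,
        # Headroom = oplog window - replication lag; a negative value means the
        # secondary needs entries that have already been overwritten.
        "replication_headroom_secs": window_seconds - lag_seconds,
        "used_oplog_pct": used_mb / total_mb * 100 if total_mb else 0.0,
    }


# Made-up readings: a 6-hour oplog window, 30 seconds of lag, 768 MB of 1024 MB used.
metrics = oplog_metrics(
    first_entry_time=datetime(2024, 1, 1, 0, 0, 0, tzinfo=timezone.utc),
    last_entry_time=datetime(2024, 1, 1, 6, 0, 0, tzinfo=timezone.utc),
    lag_seconds=30,
    total_mb=1024,
    used_mb=768,
)
print(metrics)
```

A monitoring script built on this could alert when the headroom trends toward zero rather than waiting for it to become negative.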