Mongo Replication Throughput Test

Replication provides redundancy and increases data availability. With multiple copies of data on different database servers, replication provides a level of fault tolerance against the loss of a single database server.

A replica set is a group of mongod instances that maintain the same data set. A replica set contains several data-bearing nodes and, optionally, one arbiter node. Of the data-bearing nodes, one and only one member is deemed the primary node, while the other nodes are deemed secondary nodes. The primary node receives all write operations; a replica set can have only one primary capable of confirming writes with { w: "majority" } write concern. The primary records all changes to its data sets in its operation log, i.e. the oplog. The oplog is a fixed-size (capped) collection that keeps track of all the write operations. Secondary members replicate this log and apply the operations to their own data sets.
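For illustration, the sketch below, which assumes Python with the pymongo driver and hypothetical host names and replica set name, lists the members of a replica set together with the role each one currently holds, using the replSetGetStatus command.

    from pymongo import MongoClient

    # Hypothetical hosts and replica set name; adjust to your deployment.
    client = MongoClient("mongodb://node1:27017,node2:27017,node3:27017/?replicaSet=rs0")

    # replSetGetStatus reports every member of the replica set along with its
    # current state (PRIMARY, SECONDARY, ARBITER, ...).
    status = client.admin.command("replSetGetStatus")
    for member in status["members"]:
        print(f'{member["name"]}: {member["stateStr"]}')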

If a secondary is unable to apply changes as fast as they are written to the primary's oplog, those changes will be lost if the primary crashes. Similarly, if the oplog is not sized appropriately, it will not be able to hold enough changes, again causing significant data loss in the event of a primary failure. This is why administrators should constantly measure the level of activity on a replica set, check whether the oplog is sized for this workload, and ensure that there is little to no time lag in data replication between the primary and the secondaries of the replica set. This can be achieved using the Mongo Replication Throughput test.

This test tracks the operations performed on the replica set and, in the process, reveals the load on the replica set. The test also tracks the usage of the oplog and alerts administrators if the oplog is about to run out of space for recording changes. Additionally, the test keeps an eye out for long gaps between when a change is recorded in the primary's oplog and when it is actually applied on the secondary, and promptly notifies administrators of the same. This way, the test brings inconsistencies in data replication to the immediate attention of administrators and averts the data loss that might occur if the primary crashes.

Target of the test : A MongoDB server

Agent deploying the test : An internal/remote agent

Outputs of the test : One set of results for the Mongo database server being monitored.

Configurable parameters for the test
Parameter Description

Test period

How often should the test be executed?

Host

The host for which the test is to be configured.

Port

The port number at which the specified host listens.

Database Name

The test connects to a specific Mongo database to run API commands and pull metrics of interest. Specify the name of this database here. The default value of this parameter is admin.

Username and Password

If the target MongoDB instance is access control enabled, the eG agent has to be configured with the credentials of a user who has the privileges required to monitor that instance. To know how to create such a user, refer to How to monitor access control enabled MongoDB database?. If the target MongoDB instance is not access control enabled, then specify none against the Username and Password parameters.
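As an illustration only (the linked article remains the authoritative reference for the privileges the eG agent needs), the built-in clusterMonitor role is typically sufficient for the status commands this test relies on. The sketch below, assuming Python with the pymongo driver and hypothetical administrator credentials, shows one way such a monitoring user could be created.

    from pymongo import MongoClient

    # Connect as an existing administrative user (hypothetical credentials).
    client = MongoClient("mongodb://admin:adminpassword@mongodb-host:27017/?authSource=admin")

    # Create a dedicated monitoring user with the built-in clusterMonitor role,
    # which grants read-only access to monitoring commands such as
    # serverStatus and replSetGetStatus.
    client.admin.command(
        "createUser",
        "eg_monitor",
        pwd="a-strong-password",
        roles=[{"role": "clusterMonitor", "db": "admin"}],
    )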

Confirm Password

Confirm the password by retyping it here.

Authentication Mechanism

MongoDB typically supports multiple authentication mechanisms that users can use to verify their identity. In environments where multiple authentication mechanisms are used, this list box enables you to select the authentication mechanism of interest. By default, this is set to None. However, you can modify this setting as per your requirement.
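For example, if SCRAM-SHA-256 were the mechanism in use, a connection using it could be sketched as follows (Python with pymongo, hypothetical credentials; the mechanism chosen must match what your MongoDB deployment is configured to accept).

    from pymongo import MongoClient

    # The authMechanism option selects how the supplied credentials are verified;
    # SCRAM-SHA-256 is used here purely as an example.
    client = MongoClient(
        "mongodb://mongodb-host:27017/",
        username="eg_monitor",
        password="a-strong-password",
        authSource="admin",
        authMechanism="SCRAM-SHA-256",
    )
    print(client.admin.command("ping"))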

SSL

By default, the SSL flag is set to No, indicating that the target MongoDB server is not SSL-enabled. To enable the test to connect to an SSL-enabled MongoDB server, set the SSL flag to Yes.

CA File

A certificate authority (CA) file contains root and intermediate certificates that are electronically signed to affirm that a public key belongs to the owner named in the certificate. If you are looking to monitor the certificates contained within a CA file, then provide the full path to this file in the CA File text box. For example, the location of this file may be: C:\cert\rootCA.pem. If you do not want to monitor the certificates in a CA file, set this parameter to none.

Certificate Key File

A Certificate Key File specifies the path on the server where your private key is stored. If you are looking to monitor the Certificate Key File, then provide the full path to this file in the Certificate Key File text box. For example, the location of this file may be: C:\cert\mongodb.pem. If you do not want to monitor the Certificate Key File, set this parameter to none.
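To illustrate how the SSL, CA File and Certificate Key File settings relate, the sketch below (Python with pymongo, reusing the hypothetical file paths from the examples above) opens a TLS connection that validates the server against the CA file and presents the certificate from the certificate key file.

    from pymongo import MongoClient

    # tls=True corresponds to setting the SSL flag to Yes; the CA file and
    # certificate key file paths mirror the example locations given above.
    client = MongoClient(
        "mongodb://mongodb-host:27017/",
        tls=True,
        tlsCAFile=r"C:\cert\rootCA.pem",
        tlsCertificateKeyFile=r"C:\cert\mongodb.pem",
    )
    print(client.admin.command("ping"))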

Measurements made by the test
Measurement Description Measurement Unit Interpretation

Insert operations

Indicates the rate at which replicated insert operations are performed on the target server.

Inserts/Sec

A consistent increase in the value of these measures could indicate a high level of activity on the replica set. A sketch showing how such per-second rates can be derived from MongoDB's own counters appears after the Command operations measure below.

Query operations

Indicates the rate at which replicated query operations are performed on the target server.

Queries/Sec

Update operations

Indicates the rate at which replicated update operations are performed on the target server.

Updates/Sec

Delete operations

Indicates the rate at which replicated delete operations are performed on the target server.

Deletes/Sec

Get more operations

Indicates the rate at which replicated get more operations are performed on the target server.

Getmores/Sec

The value of this measure can be high even if the query count is low. Secondary nodes send getMore operations as part of the replication process.

Command operations

Indicates the rate at which replicated commands are issued to the target server.

Commands/Sec
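These per-second figures correspond to the replicated-operation counters that MongoDB itself exposes (the opcountersRepl section of the serverStatus output). The sketch below, assuming Python with pymongo and a hypothetical connection string, shows how such rates could be derived by sampling those cumulative counters twice; it is an independent illustration, not the test's actual implementation.

    import time
    from pymongo import MongoClient

    client = MongoClient("mongodb://mongodb-host:27017/")  # hypothetical connection string

    def replicated_opcounters(client):
        # serverStatus exposes cumulative counts of replicated operations,
        # broken down by operation type, under "opcountersRepl".
        return client.admin.command("serverStatus")["opcountersRepl"]

    INTERVAL = 60  # seconds between the two samples
    first = replicated_opcounters(client)
    time.sleep(INTERVAL)
    second = replicated_opcounters(client)

    for op in ("insert", "query", "update", "delete", "getmore", "command"):
        rate = (second[op] - first[op]) / INTERVAL
        print(f"{op}: {rate:.2f}/sec")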

 

Replication lag

Indicates how far a secondary is behind a primary.

Secs

Ideally, the value of this measure should be 0. If it is very high, then the integrity of your data set might be compromised in case of failover (secondary member taking over as the new primary because the current primary is unavailable). A high value also implies that write operations are not immediately propagated to secondaries; in this case, related changes might be lost if the primary fails.

A high replication lag can be due to any of the following (a sketch for measuring the lag per secondary follows this list):

  • A networking issue between the primary and secondary, making nodes unreachable
  • A secondary node applying data more slowly than the primary writes it
  • Insufficient write capacity, in which case you should add more shards
  • Slow operations on the primary node blocking replication
  • Heavy write operations on the primary node, or an under-provisioned secondary; the latter can be prevented by scaling up the secondary to match the primary's capacity
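The documentation does not state how the test derives this value, but MongoDB reports, for every replica set member, the timestamp of the last oplog entry it has applied (optimeDate in replSetGetStatus). The sketch below, assuming Python with pymongo and a hypothetical connection string, estimates each secondary's lag as the difference between its optimeDate and the primary's.

    from pymongo import MongoClient

    client = MongoClient("mongodb://mongodb-host:27017/")  # hypothetical connection string

    status = client.admin.command("replSetGetStatus")
    primary = next(m for m in status["members"] if m["stateStr"] == "PRIMARY")

    for member in status["members"]:
        if member["stateStr"] == "SECONDARY":
            # optimeDate is the wall-clock time of the last applied oplog entry.
            lag = (primary["optimeDate"] - member["optimeDate"]).total_seconds()
            print(f'{member["name"]}: {lag:.0f} seconds behind the primary')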

Oplog window

Indicates the interval of time between the oldest and the latest entries in the oplog.

Secs

If a secondary is down for longer than this oplog window, it will not be able to catch up unless it completely resyncs all data from the primary. The amount of time it takes to fill the oplog varies: during heavy traffic periods, the window will shrink, since the oplog receives more operations per second. If the oplog window for a primary node is getting too short, you should consider increasing the size of your oplog.
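The oplog window can also be read directly from the oplog itself, which lives in the capped collection local.oplog.rs. The sketch below (Python with pymongo, hypothetical connection string) takes the difference between the timestamps of the newest and oldest oplog entries.

    from pymongo import MongoClient

    client = MongoClient("mongodb://mongodb-host:27017/")  # hypothetical connection string

    # Every oplog entry carries a "ts" timestamp; the window is the span between
    # the oldest and newest entries in local.oplog.rs.
    oplog = client.local["oplog.rs"]
    oldest = oplog.find_one(sort=[("$natural", 1)])
    newest = oplog.find_one(sort=[("$natural", -1)])

    window_secs = newest["ts"].time - oldest["ts"].time
    print(f"Oplog window: {window_secs} seconds ({window_secs / 3600:.1f} hours)")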

Replication head room

Indicates the time difference between the primary’s oplog window and the replication lag of the secondary.

Secs

If the replication headroom is rapidly shrinking and is about to become negative, the replication lag is getting higher than the oplog window. In that case, write operations recorded in the oplog will be overwritten before secondary nodes have time to replicate them, and MongoDB will constantly have to resync the entire data set on this secondary, which takes much longer than just fetching new changes from the oplog. Properly monitoring and alerting on the Replication lag and Oplog window measures should allow you to prevent this.
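In other words, the headroom is simply the oplog window minus the replication lag. A minimal sketch, reusing window and lag values computed as in the two previous sketches (the argument names and the one-hour threshold are assumptions here), could flag a shrinking headroom like this:

    def replication_headroom(oplog_window_secs, worst_lag_secs, warn_below_secs=3600):
        # Headroom = oplog window - replication lag of the slowest secondary.
        # A negative value means oplog entries are being overwritten before
        # that secondary has had a chance to replicate them.
        headroom = oplog_window_secs - worst_lag_secs
        if headroom < warn_below_secs:
            print(f"WARNING: replication headroom down to {headroom:.0f} seconds")
        return headroom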

Total oplog size

Indicates the amount of space allocated to the oplog.

MB

 

Used oplog size

Indicates the amount of space currently used by operations stored in the oplog.

MB

If this value grows closer to the Total oplog size, it is a cause for concern, because it implies that the oplog may soon not have enough space to record any more changes. To avoid this, you may want to consider resizing your oplog. Before that, check the level of activity on the replica set and figure out the ideal size setting for the oplog, so that it is able to capture all the changes that occur on the replica set. A sketch that checks oplog usage and resizes the oplog appears after the workload list below.

Before mongod creates an oplog, you can specify its size with the oplogSizeMB option. If you can predict your replica set’s workload to resemble one of the following patterns, then you might want to create an oplog that is larger than the default. Conversely, if your application predominantly performs reads with a minimal amount of write operations, a smaller oplog may be sufficient.

The following workloads might require a larger oplog size.

  • Updates to multiple documents at once: the oplog must translate multi-updates into individual operations in order to maintain idempotency. This can use a great deal of oplog space without a corresponding increase in data size or disk use.
  • Deletions of roughly the same amount of data as is inserted: the database will not grow significantly in disk use, but the size of the operation log can be quite large.
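As a rough illustration of both points, the sketch below (Python with pymongo, hypothetical connection details) reads the used and total oplog sizes from collStats on local.oplog.rs and, if usage is high, resizes the oplog with the replSetResizeOplog admin command, which recent MongoDB releases support on a running member. The 80% threshold and the doubled size are arbitrary example values; the oplogSizeMB option mentioned above applies only when the oplog is first created.

    from pymongo import MongoClient

    client = MongoClient("mongodb://mongodb-host:27017/")  # hypothetical connection string

    # collStats on local.oplog.rs reports the space used by oplog entries ("size")
    # and the cap on the collection ("maxSize"), both in bytes.
    stats = client.local.command("collStats", "oplog.rs")
    used_mb = stats["size"] / (1024 * 1024)
    total_mb = stats["maxSize"] / (1024 * 1024)
    print(f"Oplog usage: {used_mb:.0f} MB of {total_mb:.0f} MB")

    if used_mb / total_mb > 0.8:  # arbitrary example threshold
        # replSetResizeOplog changes the oplog size (in MB) of the member the
        # client is connected to; run it on each member whose oplog should grow.
        client.admin.command("replSetResizeOplog", 1, size=total_mb * 2)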