Mongo Replication Status Test

A replica set is a group of mongod instances that maintain the same data set. A replica set contains several data-bearing nodes and, optionally, one arbiter node. Of the data-bearing nodes, one and only one member is deemed the primary node, while the other nodes are deemed secondary nodes. The primary node receives all write operations. A replica set can have only one primary capable of confirming writes with { w: "majority" } write concern. The primary records all changes to its data sets in its operation log (oplog). Secondary members replicate this log and apply the operations to their data sets.

When a primary does not communicate with the other members of the set for longer than the configured election timeout (electionTimeoutMillis, 10 seconds by default), an eligible secondary calls an election to nominate itself as the new primary. The first secondary to hold an election and receive a majority of the members’ votes becomes primary. When such a switch happens, administrators will want to be notified of it, because they may then need to troubleshoot the failure of the former primary and bring it back up.

Also, the current primary must be able to reach a majority of the voting members in the replica set. If it cannot, the primary steps down and becomes a secondary, and the replica set cannot accept writes until a new primary is elected. To avoid this, administrators should be able to instantly detect when any member of the replica set becomes inaccessible or unavailable, and quickly restore it to normal operation.
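The majority rule can be made concrete with a little arithmetic. The following Python sketch is illustrative only: it assumes every voting member carries one vote (the default for voting members, and always the case for arbiters) and that the primary counts itself among the members it can reach. The helper names `majority` and `primary_keeps_role` are hypothetical and are not part of eG Enterprise or MongoDB.

```python
def majority(voting_members: int) -> int:
    """Smallest number of votes that forms a strict majority of the voting members."""
    if voting_members < 1:
        raise ValueError("a replica set needs at least one voting member")
    return voting_members // 2 + 1


def primary_keeps_role(voting_members: int, reachable_voting_members: int) -> bool:
    """A primary stays primary only while it can reach a majority of voting members.

    reachable_voting_members counts the primary itself plus every voting
    member it can currently communicate with.
    """
    return reachable_voting_members >= majority(voting_members)


if __name__ == "__main__":
    # Three voting members (for example, primary, secondary, arbiter): majority is 2.
    print(majority(3), primary_keeps_role(3, 2), primary_keeps_role(3, 1))  # 2 True False
    # Four voting members: majority is 3, so losing two members forces a step-down.
    print(majority(4), primary_keeps_role(4, 2))  # 3 False
```

With an even number of voting members, losing half of them already leaves no majority, which is one reason an arbiter is used to keep the number of voting members odd.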

The Mongo Replication Status test helps administrators achieve these goals.

Using this test, administrators can track the status of each member node and be promptly alerted if a node stops running or switches to an abnormal state. The test also monitors the heartbeats received from each member node, pinpoints any member from which no heartbeat has been received for a long time, and thus proactively alerts administrators to the potential unavailability of that node. This helps administrators ensure that quorum is maintained in the replica set and that the primary node can communicate with every member node. In addition, the test notifies administrators if the primary of the replica set has switched; its detailed diagnosis reveals when the switch occurred and which node is the current primary.

Target of the test: A MongoDB server

Agent deploying the test: An internal/remote agent

Outputs of the test: One set of results for each member node in a replica set.

Configurable parameters for the test
Parameter Description

Test period

How often the test should be executed.

Host

The host for which the test is to be configured.

Port

The port number at which the specified host listens.

Database Name

The test connects to a specific MongoDB database to run API commands and pull metrics of interest. Specify the name of this database here. The default value of this parameter is admin.

Username and Password

If access control is enabled on the target MongoDB instance, the eG agent has to be configured with the credentials of a user who has the privileges required to monitor that instance. To learn how to create such a user, refer to the How to monitor access control enabled MongoDB database? topic. If access control is not enabled on the target MongoDB instance, specify none against the Username and Password parameters.

Confirm Password

Confirm the password by retyping it here.

Authentication Mechanism

MongoDB supports multiple authentication mechanisms that users can use to verify their identity. In environments where multiple authentication mechanisms are in use, select the mechanism the test should use from this list box. By default, this is set to None; you can change this setting as required.

SSL

By default, the SSL flag is set to No, indicating that the test connects to a MongoDB server that is not SSL-enabled. To enable the test to connect to an SSL-enabled MongoDB server, set the SSL flag to Yes.

CA File

A certificate authority (CA) file contains root and intermediate certificates that are electronically signed to affirm that a public key belongs to the owner named in the certificate. If you want to monitor the certificates contained in a CA file, provide the full path to this file in the CA File text box; for example, C:\cert\rootCA.pem. If you do not want to monitor the certificates in a CA file, set this parameter to none.

Certificate Key File

A Certificate Key File is the file (typically a .pem file) in which your private key is stored, together with its certificate. If you want to monitor the Certificate Key File, provide the full path to this file in the Certificate Key File text box; for example, C:\cert\mongodb.pem. If you do not want to monitor a Certificate Key File, set this parameter to none.

Detailed Diagnosis

To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability
  • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
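For readers who want to reproduce the test's connection outside eG Enterprise, the connection parameters above map roughly onto a standard MongoDB connection string. The sketch below is an assumption about that mapping, not a description of how the eG agent connects internally: `build_mongo_uri` is a hypothetical helper, and the literal value none is treated as "not configured", matching the convention the parameters above use. The query options `authMechanism`, `tls`, `tlsCAFile`, and `tlsCertificateKeyFile` are standard MongoDB connection string options.

```python
from urllib.parse import quote_plus, urlencode


def _is_set(value) -> bool:
    # The test's parameters use the literal "none" (any case) to mean "not configured".
    return value is not None and str(value).strip().lower() != "none"


def build_mongo_uri(host: str, port: int, database: str = "admin",
                    username: str = "none", password: str = "none",
                    auth_mechanism: str = "None", ssl: bool = False,
                    ca_file: str = "none", cert_key_file: str = "none") -> str:
    """Hypothetical mapping of the test parameters onto a mongodb:// connection string."""
    credentials = ""
    if _is_set(username) and _is_set(password):
        # Reserved characters such as @ : / in credentials must be percent-encoded.
        credentials = f"{quote_plus(username)}:{quote_plus(password)}@"

    options = {}
    if _is_set(auth_mechanism):
        options["authMechanism"] = auth_mechanism
    if ssl:
        options["tls"] = "true"
        if _is_set(ca_file):
            options["tlsCAFile"] = ca_file
        if _is_set(cert_key_file):
            options["tlsCertificateKeyFile"] = cert_key_file

    # The path names the database; it is also the default authentication database.
    uri = f"mongodb://{credentials}{host}:{port}/{database}"
    if options:
        uri += "?" + urlencode(options)
    return uri


if __name__ == "__main__":
    print(build_mongo_uri("mongo1.example.com", 27017))  # mongodb://mongo1.example.com:27017/admin
    print(build_mongo_uri("mongo1.example.com", 27017, username="monitor",
                          password="p@ss:word", auth_mechanism="SCRAM-SHA-256",
                          ssl=True, ca_file=r"C:\cert\rootCA.pem"))
```

If the PyMongo driver is available, such a string can be passed to `MongoClient`, and `client.admin.command("replSetGetStatus")` then returns the replica set status document, including a `members` array with per-member fields such as `health`, `state`, `uptime`, and `lastHeartbeat`.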
Measurements made by the test
Measurement Description Measurement Unit Interpretation

Is node running?

Indicates whether this node is currently running or not.

Boolean

The values that this measure can report and their corresponding numeric values are listed in the table below:

Measure Value Numeric Value
Yes 1
No 0

Note:

By default, the measure reports only the Measure Values provided in the table above to indicate whether/not a node is running. In the graph of this measure, however, the same is indicated using the numeric equivalents only.

If a member node is down, it will not respond to requests from any of the available nodes in the replica set.
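MongoDB's `replSetGetStatus` command reports a `health` field for each member: 1 when the member is up and 0 when it is down. Assuming this measure is derived from that field (an assumption, not something this document confirms), the mapping to the measure values could look like this; `is_node_running` is a hypothetical helper and the member entries are hand-written samples.

```python
def is_node_running(member: dict) -> tuple:
    """Return (measure value, numeric value) for the 'Is node running?' measure.

    Assumes `member` is one entry of the `members` array returned by
    replSetGetStatus, where `health` is 1 for up and 0 for down.
    A missing `health` field is treated as down.
    """
    running = member.get("health", 0) == 1
    return ("Yes", 1) if running else ("No", 0)


if __name__ == "__main__":
    members = [
        {"name": "mongo1:27017", "health": 1, "stateStr": "PRIMARY"},
        {"name": "mongo2:27017", "health": 0, "stateStr": "(not reachable/healthy)"},
    ]
    for m in members:
        print(m["name"], *is_node_running(m))
```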

Current status

Indicates the current status of this node.

 

The values that this measure can report and their corresponding numeric values are listed in the table below:

Measure Value Description Numeric Value
STARTUP

Not yet an active member of any set. All members start up in this state. The mongod parses the replica set configuration document while in STARTUP.

0
PRIMARY

The member in state primary is the only member that can accept write operations. Eligible to vote.

1
SECONDARY

A member in state secondary is replicating the data store. Eligible to vote.

2
RECOVERING

The member is either performing startup self-checks or transitioning after completing a rollback or resync. Eligible to vote.

3
STARTUP2

The member has joined the set and is running an initial sync.

5
UNKNOWN

The member’s state, as seen from another member of the set, is not yet known.

6
ARBITER

Arbiters do not replicate data and exist solely to participate in elections.

An arbiter does not have a copy of the data set and cannot become a primary. Replica sets may have arbiters to add a vote in elections for primary. Arbiters always have exactly 1 election vote, and thus allow replica sets to have an uneven number of voting members without the overhead of an additional member that replicates data.

7
DOWN

The member, as seen from another member of the set, is unreachable.

8
ROLLBACK

This member is actively performing a rollback. A rollback reverts write operations on a former primary when the member rejoins its replica set after a failover.

Data is not available for reads.

9
REMOVE

This member was once in a replica set but was subsequently removed.

10

Note:

By default, the measure reports only the Measure Values provided in the table above to indicate the current status of a node. In the graph of this measure, however, the same is indicated using the numeric equivalents only.
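The numeric values in this table match the member state codes that MongoDB reports in the `state` field of each `replSetGetStatus` member entry. A lookup such as the following is one way to turn a raw code into a measure value; it is an illustrative sketch, the dictionary and helper names are hypothetical, and the fallback for codes not listed in the table is an assumption.

```python
# MongoDB member state codes mapped to the measure values in the table above.
MEMBER_STATES = {
    0: "STARTUP",
    1: "PRIMARY",
    2: "SECONDARY",
    3: "RECOVERING",
    5: "STARTUP2",
    6: "UNKNOWN",
    7: "ARBITER",
    8: "DOWN",
    9: "ROLLBACK",
    10: "REMOVE",  # MongoDB's own stateStr for code 10 is "REMOVED".
}


def current_status(member: dict) -> tuple:
    """Return (measure value, numeric value) for the 'Current status' measure."""
    code = member.get("state")
    if code in MEMBER_STATES:
        return MEMBER_STATES[code], code
    # Assumption: any code not in the table is reported as UNKNOWN (numeric value 6).
    return "UNKNOWN", 6


if __name__ == "__main__":
    for sample in ({"name": "a:27017", "state": 1}, {"name": "c:27017", "state": 7}):
        print(sample["name"], *current_status(sample))
```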

Is primary switched?

Indicates whether/not a failover occurred, that is, whether a secondary node has taken over as the new primary of the replica set.

Boolean

The values that this measure can report and their corresponding numeric values are listed in the table below:

Measure Value Numeric Value
Yes 1
No 0

Note:

By default, the measure reports only the Measure Values provided in the table above to indicate whether/not a failover has occurred. In the graph of this measure, however, the same is indicated using the numeric equivalents only.

The detailed diagnosis of this measure reveals the current primary, the previous primary, and when the failover occurred. With the help of these details, administrators will be able to rapidly identify which primary failed, and which secondary node has now been elected as the primary.
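One way to derive this measure is to remember which member was primary at the previous poll and compare it with the member whose `state` is 1 in the latest `replSetGetStatus` output. The sketch below assumes exactly that polling approach; it is not necessarily how the eG agent detects a switch, and `find_primary` and `check_primary_switch` are hypothetical helpers. For the time of the switch it reads `electionDate`, which MongoDB reports on the current primary's member entry.

```python
from datetime import datetime, timezone
from typing import Optional


def find_primary(members: list) -> Optional[dict]:
    """Return the member entry whose state is 1 (PRIMARY), or None if there is none."""
    for member in members:
        if member.get("state") == 1:
            return member
    return None


def check_primary_switch(previous_primary: Optional[str], members: list) -> dict:
    """Compare the primary seen at the previous poll with the current one."""
    primary = find_primary(members)
    current = primary.get("name") if primary else None
    # A switch is counted only when both polls saw a primary and the names differ.
    switched = (previous_primary is not None and current is not None
                and current != previous_primary)
    return {
        "measure": ("Yes", 1) if switched else ("No", 0),
        "previous_primary": previous_primary,
        "current_primary": current,
        # electionDate appears on the current primary's entry in replSetGetStatus.
        "switched_at": primary.get("electionDate") if switched else None,
    }


if __name__ == "__main__":
    # Hand-written sample: a:27017 went down and b:27017 was elected primary.
    members = [
        {"name": "a:27017", "state": 8},
        {"name": "b:27017", "state": 1,
         "electionDate": datetime(2024, 1, 2, 9, 30, tzinfo=timezone.utc)},
    ]
    result = check_primary_switch("a:27017", members)
    print(result["measure"], result["previous_primary"], "->", result["current_primary"])
    # Keep the last primary actually seen for the next poll; do not overwrite it with None.
```

Because a poll that finds no primary is ignored, the caller should carry forward the last primary it actually saw; otherwise a switch that spans a leaderless interval would go unreported.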

Uptime

Indicates the total uptime of this member node.

Minutes

A very low uptime could imply that the member node restarted recently.
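The `uptime` field that MongoDB's `replSetGetStatus` reports for each member is expressed in seconds, whereas this measure is reported in minutes. For the member on which the command runs, it is the time that member has been online; for the other members, it is how long they have been in continuous communication with that member, so a low value can also follow a brief network interruption. The sketch below converts the value and flags a possible recent restart; the 15-minute threshold and the helper names are illustrative assumptions, not eG defaults.

```python
def uptime_minutes(member: dict) -> float:
    """Convert the member's replSetGetStatus `uptime` (seconds) into minutes."""
    return member.get("uptime", 0) / 60


def restarted_recently(member: dict, threshold_minutes: float = 15) -> bool:
    """Flag a possible recent restart; the 15-minute default is an arbitrary example."""
    return uptime_minutes(member) < threshold_minutes


if __name__ == "__main__":
    print(uptime_minutes({"uptime": 7200}))     # 120.0
    print(restarted_recently({"uptime": 300}))  # True (5 minutes)
```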

Last heartbeat time

Indicates the time that has elapsed since the last heartbeat was received from this node.

Secs

If the value of this measure increases consistently, it could imply a prolonged non-availability of the member node.
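MongoDB's `replSetGetStatus` output includes, for every member other than the one running the command, a `lastHeartbeat` timestamp: the last time a heartbeat response was received from that member. Assuming this measure is computed from that field, the sketch below derives the elapsed seconds and, as one possible reading of "increases consistently", checks whether the last few readings keep growing. The helper names and the three-sample window are hypothetical.

```python
from datetime import datetime, timezone
from typing import List, Optional


def seconds_since_heartbeat(member: dict, now: datetime) -> Optional[float]:
    """Elapsed seconds since the last heartbeat response from this member.

    The entry for the member that ran replSetGetStatus (self: true) has no
    lastHeartbeat field, so None is returned for it. `now` and lastHeartbeat
    must both be timezone-aware or both naive (PyMongo returns naive UTC
    datetimes unless the client is created with tz_aware=True).
    """
    last = member.get("lastHeartbeat")
    if last is None:
        return None
    return (now - last).total_seconds()


def steadily_increasing(samples: List[float], min_samples: int = 3) -> bool:
    """True if the last `min_samples` readings keep growing, hinting at a prolonged outage."""
    recent = samples[-min_samples:]
    return len(recent) == min_samples and all(b > a for a, b in zip(recent, recent[1:]))


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, 30, tzinfo=timezone.utc)
    member = {"name": "b:27017",
              "lastHeartbeat": datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)}
    print(seconds_since_heartbeat(member, now))                # 30.0
    print(steadily_increasing([2.0, 2.0, 30.0, 60.0, 90.0]))  # True
```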

Is primary?

Indicates whether/not this node is the primary node of the replica set.

 

The values that this measure can report and their corresponding numeric values are listed in the table below:

Measure Value Numeric Value
Yes 1
No 0

Note:

By default, the measure reports only the Measure Values provided in the table above to indicate whether/not a node is the primary node. In the graph of this measure, however, the same is indicated using the numeric equivalents only.
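Assuming this measure corresponds to a member whose `state` is 1 (PRIMARY) in `replSetGetStatus`, the per-member flag and a set-wide check that exactly one member reports itself as primary could be sketched as follows. The helper names are hypothetical and the member entries are hand-written samples.

```python
def is_primary(member: dict) -> tuple:
    """'Is primary?' measure: state 1 (PRIMARY) maps to Yes/1, anything else to No/0."""
    return ("Yes", 1) if member.get("state") == 1 else ("No", 0)


def primary_summary(members: list) -> dict:
    """Per-member flags plus the members currently reported as primary."""
    flags = {m["name"]: is_primary(m)[0] for m in members}
    primaries = [name for name, flag in flags.items() if flag == "Yes"]
    return {
        "flags": flags,
        "primaries": primaries,
        "exactly_one_primary": len(primaries) == 1,
    }


if __name__ == "__main__":
    members = [
        {"name": "a:27017", "state": 2},
        {"name": "b:27017", "state": 1},
        {"name": "c:27017", "state": 7},
    ]
    print(primary_summary(members))
```

If no member reports Yes, the replica set has no primary at that moment and cannot accept writes.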