Mongo Top Collections Test

Administrators often need to track the level of activity on Mongo databases, so that they can identify the busy databases and the type of activity that is keeping them busy. The time spent on each activity should also be determined, so that activities taking an abnormally long time can be isolated and the reasons investigated. Administrators also need help in understanding how well each database handles concurrency. Are locks held for long or released quickly? Which types of locks are held for too long? These are some of the questions on concurrency for which administrators often need quick and accurate answers.

The Mongo Top Collections test provides useful insights into database activity. For each database on the server, the test reports the rate at which operations such as queries, inserts, updates, and deletes are performed on that database. This points to busy databases and what is keeping them busy. Additionally, the test reveals where the database spent most of its time: query execution, updates, inserts, or command execution. The detailed diagnostics of the test also point you to the exact collection in a database that prolonged each of these operations, thus enabling administrators to zero in on problem collections.

Furthermore, the test focuses on the locking activity in each database. The rate, type, and duration of locks held are reported, and administrators are proactively alerted to locking-related irregularities. In the event of an alert, you can use the detailed diagnostics of the test to identify which collections hold the maximum locks and which ones have held locks for the maximum time.

Target of the test : A MongoDB server

Agent deploying the test : An internal/remote agent

Outputs of the test : One set of results for each database in the server being monitored.
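The following is a minimal sketch of how per-collection statistics could be rolled up into one set of results per database. It assumes the raw data resembles the output of MongoDB's top admin command, which keys its totals by namespace (database.collection); this page does not state which command the test actually uses, and the function name and sample values are hypothetical.

```python
from collections import defaultdict


def group_by_database(totals):
    """Group per-collection entries (keyed by 'db.collection') by database."""
    grouped = defaultdict(dict)
    for namespace, stats in totals.items():
        # Skip keys that are not namespaces, such as an informational note
        if not isinstance(stats, dict) or "." not in namespace:
            continue
        # Split on the first dot only; collection names may contain dots
        database, collection = namespace.split(".", 1)
        grouped[database][collection] = stats
    return dict(grouped)


# Hypothetical sample shaped like the "totals" document of the top command.
# With pymongo it could be obtained via client.admin.command("top")["totals"].
sample_totals = {
    "note": "all times in microseconds",
    "shop.orders": {"total": {"time": 9000, "count": 30}},
    "shop.customers": {"total": {"time": 1000, "count": 10}},
    "admin.system.version": {"total": {"time": 50, "count": 2}},
}

by_db = group_by_database(sample_totals)
collection_counts = {db: len(colls) for db, colls in by_db.items()}
print(collection_counts)
```

Here, the number of entries per database corresponds loosely to what the Collections measure reports for each database.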

Configurable parameters for the test
Parameter Description

Test period

How often should the test be executed.

Host

The host for which the test is to be configured.

Port

The port number at which the specified host listens.

Database Name

The test connects to a specific Mongo database to run API commands and pull metrics of interest. Specify the name of this database here. The default value of this parameter is admin.

Username and Password

The eG agent has to be configured with the credentials of a user who has the required privileges to monitor the target MongoDB instance, if the MongoDB instance is access control enabled. To know how to create such a user, refer to How to monitor access control enabled MongoDB database?. If the target MongoDB instance is not access control enabled, then, specify none against the Username and Password parameters.

Confirm Password

Confirm the password by retyping it here.

Authentication Mechanism

MongoDB supports multiple authentication mechanisms that users can use to verify their identity. In environments where multiple authentication mechanisms are used, you can select the authentication mechanism of interest from this list box. By default, this is set to None. However, you can modify this setting as required.

SSL

By default, the SSL flag is set to No, indicating that the target MongoDB server is not SSL-enabled. To enable the test to connect to an SSL-enabled MongoDB server, set the SSL flag to Yes.

CA File

A certificate authority (CA) file contains root and intermediate certificates that are electronically signed to affirm that a public key belongs to the owner named in the certificate. If you are looking to monitor the certificates contained within a CA file, then provide the full path to this file in the CA File text box. For example, the location of this file may be: C:\cert\rootCA.pem. If you do not want to monitor the certificates in a CA file, set this parameter to none.

Certificate Key File

A Certificate Key File specifies the path on the server where your private key is stored. If you are looking to monitor the Certificate Key File, then provide the full path to this file in the Certificate Key File text box. For example, the location of this file may be: C:\cert\mongodb.pem. If you do not want to monitor the Certificate Key File, set this parameter to none.

Detailed Diagnosis

To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability
  • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
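As an illustration of how the connection parameters above relate to one another, the sketch below maps them to keyword arguments that pymongo's MongoClient accepts (username, password, authSource, authMechanism, tls, tlsCAFile, tlsCertificateKeyFile). How the eG agent actually connects is not described on this page; the function name, the treatment of none as "not set", and the use of Database Name as the authentication database are all assumptions.

```python
def build_client_options(host, port, database="admin", username="none",
                         password="none", auth_mechanism="None",
                         ssl="No", ca_file="none", key_file="none"):
    """Translate the test parameters into MongoClient-style keyword arguments."""
    options = {"host": host, "port": int(port)}

    # Credentials are only needed when access control is enabled
    if username.lower() != "none":
        options["username"] = username
        options["password"] = password
        # Assumption: the configured database doubles as the auth database
        options["authSource"] = database

    if auth_mechanism.lower() != "none":
        options["authMechanism"] = auth_mechanism

    # TLS settings are only applied when the SSL flag is Yes
    if ssl.lower() == "yes":
        options["tls"] = True
        if ca_file.lower() != "none":
            options["tlsCAFile"] = ca_file
        if key_file.lower() != "none":
            options["tlsCertificateKeyFile"] = key_file
    return options


# Hypothetical host name; the CA file path is the example given above
opts = build_client_options("mongo-host", 27017, ssl="Yes",
                            ca_file=r"C:\cert\rootCA.pem")
print(opts)
```

The resulting dictionary could be passed to a client as MongoClient(**opts) when pymongo is installed.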
Measurements made by the test
Measurement Description Measurement Unit Interpretation

Collections

Indicates the number of collections in this database.

Number

 

Modified collections

Indicates the number of collections in this database that were modified since the last measurement period.

Number

 

Read locks

Indicates the rate at which read locks were held on this database during the last measurement period.

Locks/Sec

A read lock is typically granted when multiple clients are attempting to issue a query to the same collection in a database or get more data from the same cursor.

If the value of this measure is consistently increasing for a database, it hints at a probable read lock contention on that database. Subsequent read requests to that database will hence be blocked. Lock contention may also lead to high CPU usage on the database server.

At this juncture, you may want to know which collection in that database is locked and for how long, so that collections holding the maximum locks can be identified. The detailed diagnosis of this measure provides this information. The collections that are locked, the count of read locks on each collection, and the duration of the read locks are reported as part of the detailed diagnosis. From this, you can quickly identify the collection that is holding an unusually large number of read locks. The abnormal locking activity on that collection can then be investigated and the reasons for the same diagnosed, so that read locks are released and server performance is enhanced.

Average read lock time

Indicates the average duration for which read locks were held by collections on this database.

Millisecs

If the value of this measure is consistently increasing for a database, it implies that one/more collections in that database are probably locked for an unduly long time. This can prevent subsequent requests from accessing those collections. Lock contention may also lead to high CPU usage on the database server.

At this juncture, you may want to know which collection in that database is locked and for how long, so that collections holding locks for the maximum time can be identified. The detailed diagnosis of this measure provides this information. The collections that are locked, the count of read locks on each collection, and the duration of the read locks are reported as part of the detailed diagnosis. From this, you can quickly identify the collection that has been holding locks for the longest time. The abnormal locking activity on that collection can then be investigated and the reasons for the same diagnosed, so that read locks are released and server performance is enhanced.

Write locks

Indicates the number of write locks on this database during the last measurement period.

Number

Typically, a write lock is granted on a collection if multiple clients are attempting to access that collection for inserting, modifying, or removing data.

If the value of this measure is consistently increasing for a database, it implies that many collections in that database are probably locked or one/more collections are holding many locks. This can prevent subsequent requests from writing data into those collections. Lock contention may also lead to high CPU usage on the database server.

At this juncture, you may want to know which collections in that database are locked and how many locks are held by each collection, so that collections holding the maximum number of locks can be identified. The detailed diagnosis of this measure provides this information. The collections that are locked, the count of write locks on each collection, and the duration of the write locks are reported as part of the detailed diagnosis. From this, you can quickly identify the collection that is holding an unusually large number of write locks. The abnormal locking activity on that collection can then be investigated and the reasons for the same diagnosed, so that write locks are released and server performance is enhanced.

Average write lock time

Indicates the average duration for which write locks were held by collections on this database.

Millisecs

If the value of this measure is consistently increasing for a database, it implies that one/more collections in that database are probably locked for an unduly long time. This can prevent subsequent requests from accessing those collections. Lock contention may also lead to high CPU usage on the database server.

At this juncture, you may want to know which collection in that database is locked and for how long, so that collections holding locks for the maximum time can be identified. The detailed diagnosis of this measure provides this information. The collections that are locked, the count of write locks on each collection, and the duration of the write locks are reported as part of the detailed diagnosis. From this, you can quickly identify the collection that has been locked for an unduly long time. The abnormal locking activity on that collection can then be investigated and the reasons for the same diagnosed, so that write locks are released and server performance is enhanced.

Query operations

Indicates the rate at which query operations are performed on this database.

Queries/Sec

 

Insert operations

Indicates the rate at which data insert operations are performed on this database.

Inserts/Sec

 

Update operations

Indicates the rate at which update operations are performed on this database.

Updates/Sec

 

Delete operations

Indicates the rate at which delete operations are performed on this database.

Deletes/Sec

 

Get more operations

Indicates the rate at which get more operations are performed on this database.

Get more/Sec

A get more command is typically used in conjunction with commands that return a cursor, e.g. find and aggregate, to return subsequent batches of documents currently pointed to by the cursor.

Command operations

Indicates the rate at which command operations are performed on this database.

Commands/Sec

Certain administrative commands can exclusively lock the database for extended periods of time. Examples include db.collection.createIndex() when issued without setting background to true, reIndex, compact, db.repairDatabase(), db.createCollection() when creating a very large (i.e., many gigabytes) capped collection, db.collection.validate(), and db.copyDatabase().

Some administrative commands lock the database but only hold the lock for a very short time, e.g., db.collection.dropIndex(), db.getLastError(), db.addUser(), etc.

Some other commands can even lock multiple databases. These include db.copyDatabase(), db.repairDatabase(), etc.

Total locks

Indicates the rate at which locks (both read and write) are held by this database.

Locks/Sec

If the value of this measure is increasing consistently for a database, then, check the values of the Read locks and Write locks measures for that database. This will indicate the type of locks that are held more by the database. You can then use the detailed diagnosis of the corresponding measure to figure out which collection is holding the locks and for how long.

Average lock time

Indicates the average duration for which this database held locks (both read and write).

Millisecs

If the value of this measure is increasing consistently for a database, then, check the values of the Average read lock time and Average write lock time measures for that database. This will indicate the type of locks that are held for the maximum time. You can then use the detailed diagnosis of the corresponding measure to figure out which collection is holding the locks for the maximum time.

Maximum query time

Indicates the maximum time taken by query operations on this database.

Millisecs

If the value of this measure is abnormally high for a database, use the detailed diagnosis of the measure to identify the collection in that database that executed queries slowly. The detailed diagnosis displays the top-10 collections in terms of query execution time. Besides the names of collections and the total time each took to execute queries, the detailed diagnosis also displays the number of query operations performed by each collection and the average query time per collection (i.e., Execution rate). This information will enable you to figure out what could have caused query execution to be slow on a collection - is it because of too many queries to that collection? or is it owing to a few long running queries on that collection?

Maximum get more time

Indicates the maximum time taken by get more operations on this database.

Millisecs

If the value of this measure is abnormally high for a database, use the detailed diagnosis of the measure to identify the collection in that database that executed get more commands slowly. The detailed diagnosis displays the top-10 collections in terms of get more command execution time. Besides the names of collections and the total time each took to execute get more commands, the detailed diagnosis also displays the number of get more operations performed by each collection and the average get more time per collection (i.e., Execution rate). This information will enable you to figure out what could have caused get more command execution to be slow on a collection - is it because of a command overload on that collection? or is it because a few get more commands took significantly longer to execute?

Maximum insert time

Indicates the maximum time taken by insert operations on this database.

Millisecs

If the value of this measure is abnormally high, use the detailed diagnosis of the measure to identify the collection that took the maximum time for performing inserts. The detailed diagnosis of this measure displays the top-10 collections in terms of insert time. Besides the names of collections and the total time each took to insert data, the detailed diagnosis also displays the number of insert operations performed by each collection and the average insert time per collection (i.e., Execution rate). This information will enable you to figure out what could have caused inserts to be slow on a collection - is it because of too many inserts on that collection? or is it owing to a few long running inserts on that collection?

Maximum update time

Indicates the maximum time taken by update operations on this database.

Millisecs

If the value of this measure is abnormally high for a database, use the detailed diagnosis of the measure to identify the exact collection in that database on which updates were slowest. The detailed diagnosis displays the top-10 collections in terms of update time. Besides the names of collections and the total time each took to perform update operations, the detailed diagnosis also displays the number of update operations performed by each collection and the average update time per collection (i.e., Execution rate). This information will enable you to figure out what could have caused updates to be slow on a collection - is it because of too many updates to that collection? or is it because a few updates took too long a time?

Maximum delete time

Indicates the maximum time taken by delete operations on this database.

Millisecs

If the value of this measure is abnormally high for a database, use the detailed diagnosis of the measure to identify the collection that performed deletes most slowly. The detailed diagnosis displays the top-10 collections in terms of deletion time. Besides the names of collections and the total time each took to delete data, the detailed diagnosis also displays the number of delete operations performed by each collection and the average delete time per collection (i.e., Execution rate). This information will enable you to figure out what could have caused deletion to be slow on a collection - is it because of too many delete requests to that collection? or is it because a few delete operations took too long a time?

Maximum command time

Indicates the maximum time taken by this database to execute commands.

Millisecs

If the value of this measure is abnormally high for a database, use the detailed diagnosis of the measure to identify the collection that was the slowest in executing commands. The detailed diagnosis displays the top-10 collections in terms of command time. Besides the names of collections and the total time each took to perform command operations, the detailed diagnosis also displays the number of command operations performed by each collection and the average command time per collection (i.e., Execution rate). This information will enable you to figure out what could have caused a collection to execute commands slowly - is it because of a command overload? or is it because of a few long running commands?

Maximum query time rate

Indicates the maximum time this database took to perform a single query operation.

Millisecs/execution

If the value of this measure is abnormally high for a database, use the detailed diagnosis of the measure to zero-in on the exact collection in that database that took the maximum time to perform a single query operation. Typically, the detailed diagnosis of this measure displays the top-5 collections in terms of Execution rate - i.e., the average time taken by a collection to execute query operations. Besides the names of collections and the Execution rate, the detailed diagnosis also displays the number of query operations performed by each collection and the total query time per collection. If the Execution rate is equal to or close to the total Query time of a collection, you can conclude that one or very few query operations are taking too long to execute on that collection.

Maximum get more time rate

Indicates the maximum time this database took to perform a get more operation.

Millisecs/execution

If the value of this measure is abnormally high for a database, use the detailed diagnosis of the measure to zero-in on the exact collection in that database that took the maximum time to perform a single get more operation. Typically, the detailed diagnosis of this measure displays the top-5 collections in terms of Execution rate - i.e., the average time taken by a collection to execute get more operations. Besides the names of collections and the Execution rate, the detailed diagnosis also displays the number of get more operations performed by each collection and the total get more time per collection. If the Execution rate is equal to or close to the total Get more time of a collection, you can conclude that one or very few get more operations are taking too long to execute on that collection.

Maximum insert time rate

Indicates the maximum time this database took to perform a single insert operation.

Millisecs/execution

If the value of this measure is abnormally high for a database, use the detailed diagnosis of the measure to zero-in on the exact collection in that database that took the maximum time to perform a single insert operation. Typically, the detailed diagnosis of this measure displays the top-5 collections in terms of Execution rate - i.e., the average time taken by a collection to execute insert operations. Besides the names of collections and the Execution rate, the detailed diagnosis also displays the number of insert operations performed by each collection and the total insert time per collection. If the Execution rate is equal to or close to the total Insert time of a collection, you can conclude that one or very few insert operations are taking too long to execute on that collection.

Maximum update time rate

Indicates the maximum time this database took to perform a single update operation.

Millisecs/execution

If the value of this measure is abnormally high for a database, use the detailed diagnosis of the measure to zero-in on the exact collection in that database that took the maximum time to perform a single update operation. Typically, the detailed diagnosis of this measure displays the top-5 collections in terms of Execution rate - i.e., the average time taken by a collection to execute update operations. Besides the names of collections and the Execution rate, the detailed diagnosis also displays the number of update operations performed by each collection and the total update time per collection. If the Execution rate is equal to or close to the total Update time of a collection, you can conclude that one or very few update operations are taking too long to execute on that collection.

Maximum delete time rate

Indicates the maximum time this database took to perform a single delete operation.

Millisecs/execution

If the value of this measure is abnormally high for a database, use the detailed diagnosis of the measure to zero-in on the exact collection in that database that took the maximum time to perform a single delete operation. Typically, the detailed diagnosis of this measure displays the top-5 collections in terms of Execution rate - i.e., the average time taken by a collection to execute delete operations. Besides the names of collections and the Execution rate, the detailed diagnosis also displays the number of delete operations performed by each collection and the total delete time per collection. If the Execution rate is equal to or close to the total delete time of a collection, you can conclude that one or very few delete operations are taking too long to execute on that collection.

Maximum command time rate

Indicates the maximum time this database took to perform a single command operation.

Millisecs/execution

If the value of this measure is abnormally high for a database, use the detailed diagnosis of the measure to zero-in on the exact collection in that database that took the maximum time to perform a single command operation. Typically, the detailed diagnosis of this measure displays the top-5 collections in terms of Execution rate - i.e., the average time taken by a collection to execute command operations. Besides the names of collections and the Execution rate, the detailed diagnosis also displays the number of command operations performed by each collection and the total command time per collection. If the Execution rate is equal to or close to the total Command time of a collection, you can conclude that one or very few command operations are taking too long to execute on that collection.
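The rate measures in the table above (Query operations, Insert operations, and so on) can be understood as the change in a cumulative operation counter divided by the length of the measurement period. The sketch below illustrates that arithmetic only; it assumes counters shaped like the per-operation entries of MongoDB's top command, and the function name and sample values are hypothetical.

```python
def operation_rates(previous, current, interval_seconds):
    """Return operations per second for each operation type."""
    rates = {}
    for operation, now in current.items():
        before = previous.get(operation, {"count": 0})
        delta = now["count"] - before["count"]
        # A negative delta suggests the counters were reset, e.g. by a restart
        rates[operation] = max(delta, 0) / interval_seconds
    return rates


# Two hypothetical samples of cumulative counters taken 60 seconds apart
previous_sample = {"queries": {"count": 100}, "insert": {"count": 50}}
current_sample = {"queries": {"count": 400}, "insert": {"count": 80}}

rates = operation_rates(previous_sample, current_sample, interval_seconds=60)
print(rates)
```

With these sample values, 300 additional queries in 60 seconds yield 5 queries per second, and 30 additional inserts yield 0.5 inserts per second.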

The detailed diagnosis of the Read locks measure helps find out collections that are locked, the count of read locks on each collection, and the duration of the read locks.

Figure 5 : The detailed diagnosis of the Read locks measure

The detailed diagnosis of the Average read lock time measure reveals the collections that are locked, the count of read locks on each collection, and the duration of the read locks. Using these statistics, you can quickly identify the collection that has been holding locks for the longest time.

Figure 6 : The detailed diagnosis of the Average read lock time measure
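To make the idea of an average lock time per collection concrete, here is a small sketch that divides the total lock time by the lock count for each collection and picks the collection with the longest average. It assumes input shaped like the readLock entries of MongoDB's top command, whose times are in microseconds; the function name and sample figures are hypothetical.

```python
def average_lock_ms(collections, lock_type="readLock"):
    """Return the average lock duration in milliseconds per collection."""
    averages = {}
    for name, stats in collections.items():
        lock = stats.get(lock_type, {})
        count = lock.get("count", 0)
        if count > 0:
            # Assumed input unit is microseconds; convert to milliseconds
            averages[name] = lock["time"] / count / 1000.0
    return averages


# Hypothetical per-collection lock statistics for one database
shop_collections = {
    "orders": {"readLock": {"time": 120000, "count": 40}},
    "customers": {"readLock": {"time": 5000, "count": 10}},
    "archive": {"readLock": {"time": 0, "count": 0}},
}

averages = average_lock_ms(shop_collections)
longest = max(averages, key=averages.get)
print(averages, longest)
```

Collections with no locks in the period are left out rather than reported as zero, so that they cannot be mistaken for collections with fast locks.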

The detailed diagnosis of the Maximum query time measure displays the top-10 collections in terms of query execution time. Besides the names of collections and the total time each took to execute queries, the detailed diagnosis also displays the number of query operations performed by each collection and the average query time per collection (i.e., Execution rate).

Figure 7 : The detailed diagnosis of the Maximum query time measure
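A ranking of this kind can be sketched as follows. The example sorts collections by total query time and computes the Execution rate as total time divided by the number of operations. It assumes input shaped like the queries entries of MongoDB's top command (times in microseconds); the function name, field names of the result rows, and sample data are hypothetical.

```python
def top_collections(collections, operation="queries", limit=10):
    """Rank collections by total time spent on the given operation."""
    rows = []
    for name, stats in collections.items():
        entry = stats.get(operation, {})
        count = entry.get("count", 0)
        total_ms = entry.get("time", 0) / 1000.0  # microseconds to millisecs
        rows.append({
            "collection": name,
            "total_ms": total_ms,
            "count": count,
            # Execution rate: average time per operation
            "execution_rate": total_ms / count if count else 0.0,
        })
    rows.sort(key=lambda row: row["total_ms"], reverse=True)
    return rows[:limit]


# Hypothetical per-collection query statistics for one database
query_stats = {
    "orders": {"queries": {"time": 800000, "count": 400}},
    "reports": {"queries": {"time": 900000, "count": 3}},
    "customers": {"queries": {"time": 20000, "count": 100}},
}

ranking = top_collections(query_stats, limit=2)
for row in ranking:
    print(row)
```

In this sample, reports ranks first with only 3 queries, which hints at a few long running queries, whereas orders owes its total time to a large number of fast queries.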

The detailed diagnosis of the Maximum command time measure displays the top-10 collections in terms of command time. Besides the names of collections and the total time each took to perform command operations, the detailed diagnosis also displays the number of command operations performed by each collection and the average command time per collection (i.e., Execution rate).

Figure 8 : The detailed diagnosis of the Maximum command time measure

The detailed diagnosis of the Maximum query time rate measure displays the top-5 collections in terms of Execution rate - i.e., the average time taken by a collection to execute query operations. Besides the names of collections and the Execution rate, the detailed diagnosis also displays the number of query operations performed by each collection and the total query time per collection.

Figure 9 : The detailed diagnosis of the Maximum query time rate measure
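The interpretation rule for the time rate measures (an Execution rate equal to or close to the total time implies one or very few slow operations) can be expressed as a simple check. The sketch below is illustrative only; the function name, the labels it returns, and the closeness threshold of 0.5 are hypothetical choices, not values used by the test.

```python
def diagnose(total_ms, count, closeness=0.5):
    """Classify why an operation type is slow on a collection."""
    if count == 0 or total_ms == 0:
        return "no measurable activity"
    execution_rate = total_ms / count
    # An execution rate close to the total time means very few operations
    # account for nearly all of the time spent
    if execution_rate >= closeness * total_ms:
        return "few long-running operations"
    return "many operations"


single_slow_query = diagnose(total_ms=900.0, count=1)
heavy_query_load = diagnose(total_ms=800.0, count=400)
print(single_slow_query, heavy_query_load)
```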

The detailed diagnosis of the Maximum command time rate measure displays the top-5 collections in terms of Execution rate - i.e., the average time taken by a collection to execute command operations. Besides the names of collections and the Execution rate, the detailed diagnosis also displays the number of command operations performed by each collection and the total command time per collection.

Figure 10 : The detailed diagnosis of the Maximum command time rate measure