Mongo Collection Statistics Test

Administrators often need to track the level of activity on Mongo databases, so that they can identify the busy databases and the type of activity that is keeping them busy. The time spent on each activity should also be determined, so that activities taking an abnormally long time can be isolated and the reasons investigated. Administrators also need help in understanding how well each database handles concurrency. Are locks held for long or released quickly? Which types of locks are held for too long? These are some of the questions on concurrency for which administrators often need quick and accurate answers.

The Mongo Collection Statistics test provides useful insights into database activity. For each database on the server, the test reports the rate at which operations such as querying, inserting, updating, and deleting are performed on that database. This points to busy databases and what is keeping them busy. Additionally, the test reveals where the database spent most of its time: query execution, updates, inserts, or command execution. Detailed diagnostics of the test also point you to the exact collection in a database that prolonged each of these operations, enabling administrators to zero in on problem collections.

Furthermore, the test focuses on the locking activity in each database. The rate, type, and duration of locks held are reported, and administrators are proactively alerted to locking-related irregularities. In the event of an alert, you can use the detailed diagnostics of the test to identify which collections hold the maximum locks and which ones have held locks for the maximum time.

Target of the test : A MongoDB server

Agent deploying the test : An internal/remote agent

Outputs of the test : One set of results for each database in the server being monitored.

Configurable parameters for the test
Parameter	Description
Test period	How often the test should be executed.
Host	The host for which the test is to be configured.
Port	The port number at which the specified host listens.
Database Name	The test connects to a specific Mongo database to run API commands and pull metrics of interest. Specify the name of this database here. The default value of this parameter is admin.
Username and Password	If the target MongoDB instance is access control enabled, the eG agent has to be configured with the credentials of a user who has the privileges required to monitor that instance. To know how to create such a user, refer to How to monitor access control enabled MongoDB database? If the target MongoDB instance is not access control enabled, specify none against the Username and Password parameters.
Confirm Password	Confirm the password by retyping it here.
Authentication Mechanism	MongoDB supports multiple authentication mechanisms that users can use to verify their identity. In environments where multiple authentication mechanisms are used, select the mechanism that this test should use from this list box. By default, this is set to None. You can modify this setting as per your requirement.
SSL	By default, the SSL flag is set to No, indicating that the target MongoDB server is not SSL-enabled by default. To enable the test to connect to an SSL-enabled MongoDB server, set the SSL flag to Yes.
CA File	A certificate authority (CA) file contains root and intermediate certificates that are electronically signed to affirm that a public key belongs to the owner named in the certificate. If you are looking to monitor the certificates contained within a CA file, then provide the full path to this file in the CA File text box. For example, the location of this file may be: C:\cert\rootCA.pem. If you do not want to monitor the certificates in a CA file, set this parameter to none.
Certificate Key File	A Certificate Key File specifies the path on the server where your private key is stored. If you are looking to monitor the Certificate Key File, then provide the full path to this file in the Certificate Key File text box. For example, the location of this file may be: C:\cert\mongodb.pem. If you do not want to monitor the Certificate Key File, set this parameter to none.
CA PEM File	A .pem file is a container that may just include the public certificate or the entire certificate chain (private key, public key and root certificates). If the connection requires server authentication and the server certificate is in the .pem format, then, the target instance presents the CA PEM File that contains the server certificate to its clients to establish the instance's identity. Therefore, you should specify the full path to the CA PEM file available in the target MongoDB server in the CA PEM File text box. For example, the location of this file may be: C:\app\openSSL\SSLcert\test-ca.pem.
Client PEM File	If the target instance requires a certificate key file that is in .pem format from the client to verify the client's identity, then, to establish a connection with the target server, the eG agent should access the client certificate. For this, specify the full path to the Client PEM file in the Client PEM File text box. For example, the location of this file may be: C:\app\openSSL\SSLcert\test-client.pem.
CA Cert File	This parameter is applicable only if the target MongoDB server is SSL-enabled and the CA PEM File parameter is set to none. The certificate file is a public-key certificate following the X.509 standard. It contains information about the identity of the server, such as its name, geolocation, and public key. Essentially, it is a certificate that the server presents to connecting users to prove that it is what it claims to be. Therefore, specify the full path to the server root certificate or certificate file that is signed by the CA in .crt file format for the server in the CA Cert File text box. For example, the location of this file may be: C:\app\eGurkha\JRE\lib\security\mongodb-test-ca.crt. By default, this parameter is set to none.
Client Cert File	This parameter is applicable only if the target MongoDB server is SSL-enabled and the Client PEM File parameter is set to none. In order to collect metrics from the target MongoDB server, the eG agent requires a client certificate in .p12 format. Hence, specify the full path to the client certificate file in .p12 format in the Client Cert File text box. For example, the location of this file may be: C:\app\eGurkha\JRE\lib\security\test-client.p12. To know how to generate a .p12 file from the Client PEM file, refer to How to import a Certificate that is in the PEM Format? By default, this parameter is set to none.
Client Cert Password	Provide the password for .p12 Client certificate file in the Client Cert Password text box.
AWS Key ID, AWS Secret Key, Confirm Password	If you are monitoring a MongoDB server hosted on the AWS cloud, the eG agent has to be configured with the AWS Access Key ID and Secret Key to connect with the AWS cloud and collect the required metrics. Therefore, specify the AWS Key ID and AWS Secret Key, and confirm the password by retyping it in the Confirm Password text box. To obtain the AWS access key and secret key, refer to Obtaining AWS Access Key and Obtaining AWS Secret Key.
Atlas URI	MongoDB Atlas is a NoSQL Database-as-a-Service offering in the public cloud. If the target MongoDB server is deployed and managed in MongoDB Atlas, then the eG agent has to be configured with the MongoDB Atlas connection URI, a unique identifier for connecting to a MongoDB server. Specify this URI in the Atlas URI text box so that the eG agent can access the target MongoDB server hosted on Atlas and collect the required metrics.
DD Frequency	Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency, if you so desire. Also, if you intend to disable the detailed diagnosis capability for this test, you can do so by specifying none against DD frequency.
Detailed Diagnosis	To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, click on the Off option. The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled: (a) the eG manager license should allow the detailed diagnosis capability; (b) both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
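
To illustrate how the connection-related parameters above fit together, here is a minimal sketch that maps them onto a standard MongoDB connection string. It is not the eG agent's actual implementation; the function name build_mongo_uri and its argument names are hypothetical, and the only MongoDB-specific pieces are the standard connection string options authMechanism, tls, tlsCAFile, and tlsCertificateKeyFile. As in the table, the value none means "not configured".

```python
from urllib.parse import quote_plus, urlencode


def build_mongo_uri(host, port, database="admin", username="none",
                    password="none", auth_mechanism="None", ssl=False,
                    ca_file="none", key_file="none"):
    """Assemble a MongoDB connection string from test-style parameters.

    Hypothetical helper for illustration only; "none" means "not set".
    """
    credentials = ""
    if username.lower() != "none":
        # Credentials must be percent-encoded inside a connection string.
        credentials = f"{quote_plus(username)}:{quote_plus(password)}@"

    options = {}
    if auth_mechanism.lower() != "none":
        options["authMechanism"] = auth_mechanism
    if ssl:
        options["tls"] = "true"
        if ca_file.lower() != "none":
            options["tlsCAFile"] = ca_file
        if key_file.lower() != "none":
            options["tlsCertificateKeyFile"] = key_file

    query = f"?{urlencode(options)}" if options else ""
    # The database in the path is the default authentication database.
    return f"mongodb://{credentials}{host}:{port}/{database}{query}"


if __name__ == "__main__":
    print(build_mongo_uri("db1.example.com", 27017))
    print(build_mongo_uri("db1.example.com", 27017, username="monitor",
                          password="p@ss", auth_mechanism="SCRAM-SHA-256",
                          ssl=True))
```

For an Atlas deployment, the URI supplied in the Atlas URI parameter would take the place of a string assembled this way.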

Measurements made by the test
Measurement	Description	Measurement Unit	Interpretation
Collections	Indicates the number of collections in this database.	Number
Modified collections	Indicates the number of collections in this database that were modified since the last measurement period.	Number
Read locks	Indicates the rate at which read locks were held on this database during the last measurement period.	Locks/Sec	A read lock is typically granted when multiple clients are attempting to issue a query to the same collection in a database or get more data from the same cursor. If the value of this measure is consistently increasing for a database, it hints at a probable read lock contention on that database. Subsequent read requests to that database will hence be blocked. Lock contention may also lead to high CPU usage on the database server. At this juncture therefore, you may want to know which collection in that database is locked and for how long, so that collections holding the maximum locks can be identified. The detailed diagnosis of this measure provides this information. The collections that are locked, the count of read locks on each collection, and the duration of the read locks is reported as part of the detailed diagnosis. From this, you can quickly identify the collection that is holding an unusually large number of read locks. The abnormal locking activity on that collection can then be investigated and the reasons for the same diagnosed, so that read locks are released and server performance is enhanced.
Average read lock time	Indicates the average duration for which read locks were held by collections on this database.	Seconds	If the value of this measure is consistently increasing for a database, it implies that one/more collections in that database are probably locked for an unduly long time. This can prevent subsequent requests from accessing those collections. Lock contention may also lead to high CPU usage on the database server. At this juncture therefore, you may want to know which collection in that database is locked and for how long, so that collections holding locks for the maximum time can be identified. The detailed diagnosis of this measure provides this information. The collections that are locked, the count of read locks on each collection, and the duration of the read locks is reported as part of the detailed diagnosis. From this, you can quickly identify the collection that has been holding locks for the longest time. The abnormal locking activity on that collection can then be investigated and the reasons for the same diagnosed, so that read locks are released and server performance is enhanced.
Write locks	Indicates the number of write locks on this database during the last measurement period.	Number	Typically, a write lock is granted on a collection if multiple clients are attempting to access that collection for inserting, modifying, or removing data. If the value of this measure is consistently increasing for a database, it implies that many collections in that database are probably locked or one/more collections are holding many locks. This can prevent subsequent requests from writing data into those collections. Lock contention may also lead to high CPU usage on the database server. At this juncture therefore, you may want to know which collections in that database are locked and how many locks are held by each collection, so that collections holding the maximum number of locks can be identified. The detailed diagnosis of this measure provides this information. The collections that are locked, the count of write locks on each collection, and the duration of the write locks is reported as part of the detailed diagnosis. From this, you can quickly identify the collection that is holding an unusually large number of write locks. The abnormal locking activity on that collection can then be investigated and the reasons for the same diagnosed, so that write locks are released and server performance is enhanced.
Average write lock time	Indicates the average duration for which write locks were held by collections on this database.	Seconds	If the value of this measure is consistently increasing for a database, it implies that one/more collections in that database are probably locked for an unduly long time. This can prevent subsequent requests from accessing those collections. Lock contention may also lead to high CPU usage on the database server. At this juncture therefore, you may want to know which collection in that database is locked and for how long, so that collections holding locks for the maximum time can be identified. The detailed diagnosis of this measure provides this information. The collections that are locked, the count of write locks on each collection, and the duration of the write locks is reported as part of the detailed diagnosis. From this, you can quickly identify the collection that has been locked for an unduly long time. The abnormal locking activity on that collection can then be investigated and the reasons for the same diagnosed, so that write locks are released and server performance is enhanced.
Query operations	Indicates the rate at which query operations are performed on this database.	Queries/Sec
Insert operations	Indicates the rate at which data insert operations are performed on this database.	Inserts/Sec
Update operations	Indicates the rate at which update operations are performed on this database.	Updates/Sec
Delete operations	Indicates the rate at which delete operations are performed on this database.	Deletes/Sec
Get more operations	Indicates the rate at which get more operations are performed on this database.	Get more/Sec	A get more command is typically used in conjunction with commands that return a cursor, e.g. find and aggregate, to return subsequent batches of documents currently pointed to by the cursor.
Command operations	Indicates the rate at which command operations are performed on this database.	Commands/Sec	Certain administrative commands can exclusively lock the database for extended periods of time. Examples include db.collection.createIndex() when issued without setting background to true, reIndex, compact, db.repairDatabase(), db.createCollection() when creating a very large (i.e., many gigabytes) capped collection, db.collection.validate(), and db.copyDatabase(). Some administrative commands lock the database but hold the lock only for a very short time, e.g., db.collection.dropIndex(), db.getLastError(), and db.addUser(). Some other commands can even lock multiple databases. These include db.copyDatabase() and db.repairDatabase().
Total locks	Indicates the rate at which locks (both read and write) are held by this database.	Locks/Sec	If the value of this measure is increasing consistently for a database, then, check the values of the Read locks and Write locks measures for that database. This will indicate the type of locks that are held more by the database. You can then use the detailed diagnosis of the corresponding measure to figure out which collection is holding the locks and for how long.
Average lock time	Indicates the average duration for which this database held locks (both read and write).	Seconds	If the value of this measure is increasing consistently for a database, then, check the values of the Average read lock time and Average write lock time measures for that database. This will indicate the type of locks that are held for the maximum time. You can then use the detailed diagnosis of the corresponding measure to figure out which collection is holding the locks for the maximum time.
Maximum query time	Indicates the maximum time taken by query operations on this database.	Seconds	If the value of this measure is abnormally high for a database, use the detailed diagnosis of the measure to identify the collection in that database that executed queries slowly. The detailed diagnosis displays the top-10 collections in terms of query execution time. Besides the names of collections and the total time each took to execute queries, the detailed diagnosis also displays the number of query operations performed by each collection and the average query time per collection (i.e., Execution rate). This information will enable you to figure out what could have caused query execution to be slow on a collection - is it because of too many queries to that collection? or is it owing to a few long running queries on that collection?
Maximum get more time	Indicates the maximum time taken by get more operations on this database.	Seconds	If the value of this measure is abnormally high for a database, use the detailed diagnosis of the measure to identify the collection in that database that executed get more commands slowly. The detailed diagnosis displays the top-10 collections in terms of get more command execution time. Besides the names of collections and the total time each took to execute get more commands, the detailed diagnosis also displays the number of get more operations performed by each collection and the average get more time per collection (i.e., Execution rate). This information will enable you to figure out what could have caused get more command execution to be slow on a collection - is it because of a command overload on that collection? or is it because a few get more commands took significantly longer to execute?
Maximum insert time	Indicates the maximum time taken by insert operations on this database.	Seconds	If the value of this measure is abnormally high, use the detailed diagnosis of the measure to identify the collection that took the maximum time for performing inserts. The detailed diagnosis of this measure displays the top-10 collections in terms of insert time. Besides the names of collections and the total time each took to insert data, the detailed diagnosis also displays the number of insert operations performed by each collection and the average insert time per collection (i.e., Execution rate). This information will enable you to figure out what could have caused inserts to be slow on a collection - is it because of too many inserts on that collection? or is it owing to a few long running inserts on that collection?
Maximum update time	Indicates the maximum time taken by update operations on this database.	Seconds	If the value of this measure is abnormally high for a database, use the detailed diagnosis of the measure to identify the exact collection in that database on which updates were slowest. The detailed diagnosis displays the top-10 collections in terms of update time. Besides the names of collections and the total time each took to perform update operations, the detailed diagnosis also displays the number of update operations performed by each collection and the average update time per collection (i.e., Execution rate). This information will enable you to figure out what could have caused updates to be slow on a collection - is it because of too many updates to that collection? or is it because a few updates took too long a time?
Maximum delete time	Indicates the maximum time taken by delete operations on this database.	Seconds	If the value of this measure is abnormally high for a database, use the detailed diagnosis of the measure to identify the collection that performed deletes most slowly. The detailed diagnosis displays the top-10 collections in terms of deletion time. Besides the names of collections and the total time each took to delete data, the detailed diagnosis also displays the number of delete operations performed by each collection and the average delete time per collection (i.e., Execution rate). This information will enable you to figure out what could have caused deletion to be slow on a collection - is it because of too many delete requests to that collection? or is it because a few delete operations took too long a time?
Maximum command time	Indicates the maximum time taken by this database to execute commands.	Seconds	If the value of this measure is abnormally high for a database, use the detailed diagnosis of the measure to identify the collection that was the slowest in executing commands. The detailed diagnosis displays the top-10 collections in terms of command time. Besides the names of collections and the total time each took to perform command operations, the detailed diagnosis also displays the number of command operations performed by each collection and the average command time per collection (i.e., Execution rate). This information will enable you to figure out what could have caused a collection to execute commands slowly - is it because of a command overload? or is it because of a few long running commands?
Maximum query time rate	Indicates the maximum time this database took to perform a single query operation.	Seconds/execution	If the value of this measure is abnormally high for a database, use the detailed diagnosis of the measure to zero-in on the exact collection in that database that took the maximum time to perform a single query operation. Typically, the detailed diagnosis of this measure displays the top-5 collections in terms of Execution rate - i.e., the average time taken by a collection to execute query operations. Besides the names of collections and the Execution rate, the detailed diagnosis also displays the number of query operations performed by each collection and the total query time per collection. If the Execution rate is equal to or close to the total Query time of a collection, you can conclude that one or very few query operations are taking too long to execute on that collection.
Maximum get more time rate	Indicates the maximum time this database took to perform a get more operation.	Seconds/execution	If the value of this measure is abnormally high for a database, use the detailed diagnosis of the measure to zero-in on the exact collection in that database that took the maximum time to perform a single get more operation. Typically, the detailed diagnosis of this measure displays the top-5 collections in terms of Execution rate - i.e., the average time taken by a collection to execute get more operations. Besides the names of collections and the Execution rate, the detailed diagnosis also displays the number of get more operations performed by each collection and the total get more time per collection. If the Execution rate is equal to or close to the total Get more time of a collection, you can conclude that one or very few get more operations are taking too long to execute on that collection.
Maximum insert time rate	Indicates the maximum time this database took to perform a single insert operation.	Seconds/execution	If the value of this measure is abnormally high for a database, use the detailed diagnosis of the measure to zero-in on the exact collection in that database that took the maximum time to perform a single insert operation. Typically, the detailed diagnosis of this measure displays the top-5 collections in terms of Execution rate - i.e., the average time taken by a collection to execute insert operations. Besides the names of collections and the Execution rate, the detailed diagnosis also displays the number of insert operations performed by each collection and the total insert time per collection. If the Execution rate is equal to or close to the total Insert time of a collection, you can conclude that one or very few insert operations are taking too long to execute on that collection.
Maximum update time rate	Indicates the maximum time this database took to perform a single update operation.	Seconds/execution	If the value of this measure is abnormally high for a database, use the detailed diagnosis of the measure to zero-in on the exact collection in that database that took the maximum time to perform a single update operation. Typically, the detailed diagnosis of this measure displays the top-5 collections in terms of Execution rate - i.e., the average time taken by a collection to execute update operations. Besides the names of collections and the Execution rate, the detailed diagnosis also displays the number of update operations performed by each collection and the total update time per collection. If the Execution rate is equal to or close to the total Update time of a collection, you can conclude that one or very few update operations are taking too long to execute on that collection.
Maximum delete time rate	Indicates the maximum time this database took to perform a single delete operation.	Seconds/execution	If the value of this measure is abnormally high for a database, use the detailed diagnosis of the measure to zero-in on the exact collection in that database that took the maximum time to perform a single delete operation. Typically, the detailed diagnosis of this measure displays the top-5 collections in terms of Execution rate - i.e., the average time taken by a collection to execute delete operations. Besides the names of collections and the Execution rate, the detailed diagnosis also displays the number of delete operations performed by each collection and the total delete time per collection. If the Execution rate is equal to or close to the total delete time of a collection, you can conclude that one or very few delete operations are taking too long to execute on that collection.
Maximum command time rate	Indicates the maximum time this database took to perform a single command operation.	Seconds/execution	If the value of this measure is abnormally high for a database, use the detailed diagnosis of the measure to zero-in on the exact collection in that database that took the maximum time to perform a single command operation. Typically, the detailed diagnosis of this measure displays the top-5 collections in terms of Execution rate - i.e., the average time taken by a collection to execute command operations. Besides the names of collections and the Execution rate, the detailed diagnosis also displays the number of command operations performed by each collection and the total command time per collection. If the Execution rate is equal to or close to the total Command time of a collection, you can conclude that one or very few command operations are taking too long to execute on that collection.
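
The per-database operation rates in the table above can be thought of as deltas between two counter snapshots, summed over the collections of each database. The sketch below assumes, purely for illustration, that the counters look like the totals section of MongoDB's top admin command (namespace keys such as shop.orders, each with per-operation count and time fields). This page does not state how the test actually collects its data, and the function name per_database_rates is hypothetical.

```python
from collections import defaultdict

# Operation keys as they appear in the output of MongoDB's `top` command.
OPS = ("queries", "insert", "update", "remove", "getmore", "commands")


def per_database_rates(previous, current, interval_secs):
    """Return {database: {operation: operations per second}}.

    `previous` and `current` map "db.collection" namespaces to per-operation
    {"time": microseconds, "count": n} dicts, taken interval_secs apart.
    """
    rates = defaultdict(lambda: dict.fromkeys(OPS, 0.0))
    for namespace, stats in current.items():
        if not isinstance(stats, dict):
            continue  # skip non-namespace entries such as a "note" string
        database = namespace.split(".", 1)[0]
        before = previous.get(namespace, {})
        for op in OPS:
            now_count = stats.get(op, {}).get("count", 0)
            old_count = before.get(op, {}).get("count", 0)
            # Counters reset on restart; never report a negative rate.
            rates[database][op] += max(now_count - old_count, 0) / interval_secs
    return dict(rates)


if __name__ == "__main__":
    def snapshot(count):
        return {op: {"time": 0, "count": count} for op in OPS}

    earlier = {"shop.orders": snapshot(10)}
    later = {"shop.orders": snapshot(70), "shop.users": snapshot(30)}
    print(per_database_rates(earlier, later, 60)["shop"])
```

Summing per-collection deltas before dividing by the interval gives one figure per database, which matches the "one set of results for each database" output of the test.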

The detailed diagnosis of the Read locks measure helps find out collections that are locked, the count of read locks on each collection, and the duration of the read locks.

Figure 1 : The detailed diagnosis of the Read locks measure
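
A listing of this kind can be approximated offline from per-collection lock counters. The following sketch assumes counters shaped like the readLock entries in the totals section of MongoDB's top command (count plus time in microseconds); whether the test uses that command is an assumption, and the function name read_lock_diagnosis and the output field names are hypothetical.

```python
def read_lock_diagnosis(totals, database):
    """List the collections of `database` with their read lock count and
    read lock duration in seconds, busiest collection first."""
    rows = []
    for namespace, stats in totals.items():
        if not isinstance(stats, dict):
            continue  # skip non-namespace entries such as a "note" string
        db_name, _, collection = namespace.partition(".")
        if db_name != database:
            continue
        lock = stats.get("readLock", {"time": 0, "count": 0})
        rows.append({
            "collection": collection,
            "read_locks": lock["count"],
            "lock_secs": lock["time"] / 1_000_000,  # microseconds to seconds
        })
    return sorted(rows, key=lambda row: row["read_locks"], reverse=True)


if __name__ == "__main__":
    sample = {
        "shop.orders": {"readLock": {"time": 2_500_000, "count": 40}},
        "shop.users": {"readLock": {"time": 500_000, "count": 90}},
        "hr.staff": {"readLock": {"time": 100_000, "count": 5}},
    }
    for row in read_lock_diagnosis(sample, "shop"):
        print(row)
```

Sorting on lock_secs instead of read_locks would give the view needed for the Average read lock time measure, where the collection that held locks longest is of interest.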

The detailed diagnosis of the Average read lock time measure reveals the collections that are locked, the count of read locks on each collection, and the duration of the read locks. Using these statistics, you can quickly identify the collection that has been holding locks for the longest time.

Figure 2 : The detailed diagnosis of the Average read lock time measure

The detailed diagnosis of the Maximum query time measure displays the top-10 collections in terms of query execution time. Besides the names of collections and the total time each took to execute queries, the detailed diagnosis also displays the number of query operations performed by each collection and the average query time per collection (i.e., Execution rate).

Figure 3 : The detailed diagnosis of the Maximum query time measure

The detailed diagnosis of the Maximum command time measure displays the top-10 collections in terms of command time. Besides the names of collections and the total time each took to perform command operations, the detailed diagnosis also displays the number of command operations performed by each collection and the average command time per collection (i.e., Execution rate).

Figure 4 : The detailed diagnosis of the Maximum command time measure

The detailed diagnosis of the Maximum query time rate measure displays the top-5 collections in terms of Execution rate - i.e., the average time taken by a collection to execute query operations. Besides the names of collections and the Execution rate, the detailed diagnosis also displays the number of query operations performed by each collection and the total query time per collection.

Figure 5 : The detailed diagnosis of the Maximum query time rate measure
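
The Execution rate used in these detailed diagnoses is the total time divided by the number of operations. A small sketch of the ranking and of the "Execution rate close to total time" check described for the time rate measures follows. The function name slowest_by_execution_rate and the 0.8 closeness threshold are assumptions chosen for illustration, not values taken from the product.

```python
def slowest_by_execution_rate(rows, limit=5, closeness=0.8):
    """Rank collections by average time per operation (Execution rate).

    `rows` is an iterable of (collection, total_secs, operations) tuples.
    `closeness` is an assumed threshold: when the Execution rate is at least
    this fraction of the total time, only one or very few operations can
    account for that time.
    """
    report = []
    for collection, total_secs, operations in rows:
        if operations == 0:
            continue  # no operations, so no meaningful average
        rate = total_secs / operations
        report.append({
            "collection": collection,
            "execution_rate": rate,
            "operations": operations,
            "total_secs": total_secs,
            "few_slow_operations": total_secs > 0
            and rate >= closeness * total_secs,
        })
    report.sort(key=lambda row: row["execution_rate"], reverse=True)
    return report[:limit]


if __name__ == "__main__":
    sample = [("orders", 12.0, 1), ("users", 30.0, 600), ("logs", 0.0, 0)]
    for row in slowest_by_execution_rate(sample):
        print(row)
```

In the sample data, orders has the smaller total time but the highest Execution rate, which is the pattern the text attributes to a single long-running operation, whereas users accumulates its time across many fast operations.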

The detailed diagnosis of the Maximum command time rate measure displays the top-5 collections in terms of Execution rate - i.e., the average time taken by a collection to execute command operations. Besides the names of collections and the Execution rate, the detailed diagnosis also displays the number of command operations performed by each collection and the total command time per collection.

Figure 6 : The detailed diagnosis of the Maximum command time rate measure