Mongo Cache Test

With WiredTiger, MongoDB utilizes both the WiredTiger internal cache and the filesystem cache.

Starting in MongoDB 3.4, the WiredTiger internal cache, by default, uses the larger of either:

  • 50% of (RAM minus 1 GB), or
  • 256 MB
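The default-size rule above reduces to a small calculation. The following Python sketch assumes RAM is given in bytes. It ignores container memory limits and any explicit cacheSizeGB override, and the function name is hypothetical.

```python
GB = 1024 ** 3
MB = 1024 ** 2


def default_wiredtiger_cache_bytes(ram_bytes: int) -> int:
    """Default WiredTiger internal cache size (MongoDB 3.4+):
    the larger of 50% of (RAM - 1 GB) or 256 MB."""
    half_of_ram_minus_1gb = (ram_bytes - GB) // 2
    return max(half_of_ram_minus_1gb, 256 * MB)


if __name__ == "__main__":
    for ram_gb in (1, 2, 16, 64):
        size = default_wiredtiger_cache_bytes(ram_gb * GB)
        print(f"{ram_gb:>3} GB RAM -> {size / GB:g} GB cache")
```

On a host with 16 GB of RAM, for instance, the default works out to 7.5 GB. On hosts with 1 GB of RAM or less, the 256 MB floor applies.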

This internal cache should be sized such that the working set of your application fits into it. If the internal cache is poorly sized, or if the working set outgrows the cache, the cache will be unable to hold additional data, forcing more expensive disk reads.

Likewise, if changes to cached data are not written to disk fast enough, the cache size can grow, leaving little room for additional data; once again, direct disk reads become inevitable, degrading database performance.

Additionally, if stale data is not evicted from the cache in a timely manner, it can increase cache size and consequently disk reads.

On the other hand, if the internal cache is set too high, very little RAM will be left outside of this cache for aggregations, sorting, connection management, and the like. If there is insufficient RAM for these operations, MongoDB can get killed by the operating system's out-of-memory (OOM) killer! Over-sizing the internal cache also considerably reduces the free memory that would otherwise be available to the filesystem cache. This, too, can adversely impact performance!

This is why administrators should continuously monitor the RAM usage of the WiredTiger internal cache, proactively detect excessive RAM usage by the cache, and accurately isolate its root cause. The Mongo Cache test helps with this.

This test reports the maximum cache size and how much of it is presently occupied by cached data; this reveals whether or not the cache has enough RAM to hold additional data. In the process, inconsistencies in cache sizing can be detected and their impact on performance analyzed. Writes from cache to disk and cache evictions are also monitored, so that administrators can quickly detect bottlenecks in these processes and fine-tune them to curb cache growth.

Target of the test : A MongoDB server

Agent deploying the test : An internal/remote agent

Outputs of the test : One set of results for the MongoDB server monitored.

Configurable parameters for the test
Parameter Description

Test period

How often should the test be executed.

Host

The host for which the test is to be configured.

Port

The port number at which the specified host listens.

Database Name

The test connects to a specific Mongo database to run API commands and pull metrics of interest. Specify the name of this database here. The default value of this parameter is admin.

Username and Password

If the target MongoDB instance is access control enabled, the eG agent has to be configured with the credentials of a user who has the required privileges to monitor it. To know how to create such a user, refer to the How to monitor access control enabled MongoDB database? topic. If the target MongoDB instance is not access control enabled, specify none against the Username and Password parameters.

Confirm Password

Confirm the password by retyping it here.

Authentication Mechanism

MongoDB supports multiple authentication mechanisms that users can use to verify their identity. In environments where multiple authentication mechanisms are used, this list box lets you select the authentication mechanism of interest. By default, this is set to None; you can modify this setting as required.

SSL

By default, the SSL flag is set to No, indicating that the target MongoDB server is not SSL-enabled by default. To enable the test to connect to an SSL-enabled MongoDB server, set the SSL flag to Yes.

CA File

A certificate authority (CA) file contains root and intermediate certificates that are electronically signed to affirm that a public key belongs to the owner named in the certificate. If you are looking to monitor the certificates contained within a CA file, then provide the full path to this file in the CA File text box. For example, the location of this file may be: C:\cert\rootCA.pem. If you do not want to monitor the certificates in a CA file, set this parameter to none.

Certificate Key File

A Certificate Key File specifies the path on the server where your private key is stored. If you are looking to monitor the Certificate Key File, then provide the full path to this file in the Certificate Key File text box. For example, the location of this file may be: C:\cert\mongodb.pem. If you do not want to monitor the Certificate Key File, set this parameter to none.
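How the eG agent itself opens the connection is not documented here. As a rough, hypothetical sketch, the parameters above map onto standard MongoDB connection string options (authMechanism, tls, tlsCAFile, tlsCertificateKeyFile). The Database Name goes in the URI path, where drivers treat it as the default authentication database. The helper names and the treatment of none as "not set" are assumptions.

```python
from urllib.parse import quote_plus, urlencode


def _is_set(value: str) -> bool:
    """Treat empty values and the literal 'none' (any case) as not configured."""
    return bool(value) and value.lower() != "none"


def build_mongo_uri(host: str, port: int, database: str = "admin",
                    username: str = "none", password: str = "none",
                    auth_mechanism: str = "None", ssl: bool = False,
                    ca_file: str = "none", cert_key_file: str = "none") -> str:
    """Map the test's parameters onto a standard MongoDB connection string."""
    credentials = ""
    if _is_set(username):
        # Credentials must be percent-encoded inside the URI.
        credentials = f"{quote_plus(username)}:{quote_plus(password)}@"

    options = {}
    if _is_set(auth_mechanism):
        options["authMechanism"] = auth_mechanism
    if ssl:
        options["tls"] = "true"
        if _is_set(ca_file):
            options["tlsCAFile"] = ca_file
        if _is_set(cert_key_file):
            options["tlsCertificateKeyFile"] = cert_key_file

    # urlencode percent-encodes option values such as file paths.
    query = "?" + urlencode(options) if options else ""
    return f"mongodb://{credentials}{host}:{port}/{database}{query}"
```

A driver such as PyMongo accepts the resulting string directly, for example MongoClient(uri).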

Measurements made by the test
Measurement Description Measurement Unit Interpretation

Cache used ratio

Indicates the percentage of the Maximum cache size that is currently occupied by cached data.

Percent

A value close to 100% is a cause for concern, as it indicates that the cache is consuming RAM excessively. You may want to consider resizing the cache to avoid direct disk reads. To adjust the size of the WiredTiger internal cache, use the storage.wiredTiger.engineConfig.cacheSizeGB parameter.

Avoid increasing the WiredTiger internal cache size above its default value, as this may erode the memory resources required by the filesystem cache and other critical MongoDB operations.

Used cache size

Indicates the amount of RAM used by the cached data.

MB

Ideally, this value should be well below the Maximum cache size. A steady and significant rise in this value could mean that cached data is not being written to disk frequently enough and/or least-used data is not being evicted properly. You may want to fine-tune these operations and then check whether cache memory usage drops.

Maximum cache size

Indicates the maximum size of the cache that WiredTiger will use for all data.

GB
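The three measures above can be derived from the wiredTiger.cache section of the serverStatus command output. The sketch below assumes the serverStatus fields "bytes currently in the cache" and "maximum bytes configured"; the eG agent's exact data source is not stated in this document. Against a live server, the status document could be fetched with PyMongo's MongoClient(...).admin.command("serverStatus"). Here a hand-built dictionary with illustrative values stands in for it.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def cache_usage(server_status: dict) -> dict:
    """Derive cache-usage figures from a serverStatus-style document.

    Assumes the WiredTiger field names reported by serverStatus; the
    eG agent may compute these measures differently.
    """
    cache = server_status["wiredTiger"]["cache"]
    used = cache["bytes currently in the cache"]
    maximum = cache["maximum bytes configured"]
    return {
        "cache_used_ratio_pct": 100.0 * used / maximum,
        "used_cache_size_mb": used / MB,
        "maximum_cache_size_gb": maximum / GB,
    }


# Hand-built stand-in for serverStatus output (values are illustrative only).
sample = {"wiredTiger": {"cache": {
    "bytes currently in the cache": 3 * GB,
    "maximum bytes configured": 4 * GB,
}}}
print(cache_usage(sample))
```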

Dirty cache ratio

Indicates what percentage of the Maximum cache size is used by dirty data.

Percent

Dirty data designates data in the cache that has been modified but not yet applied (flushed) to disk. A steady and significant growth in this percentage represents a bottleneck, because it means that cached data is not being written to the disk fast enough.

When writing to disk, WiredTiger writes all the data in a snapshot to disk in a consistent way across all data files. The now-durable data acts as a checkpoint in the data files. The checkpoint ensures that the data files are consistent up to and including the last checkpoint; i.e. checkpoints can act as recovery points.

Using WiredTiger, even without journaling, MongoDB can recover from the last checkpoint; however, to recover changes made after the last checkpoint, run with journaling.

By default, MongoDB sets checkpoints to occur in WiredTiger on user data at an interval of 60 seconds or when 2 GB of journal data has been written, whichever occurs first.

This means that the amount of dirty data is expected to grow until the next checkpoint.

Scaling out by adding more shards can help you reduce the amount of dirty data.

Dirty cache size

Indicates the amount of dirty data in the cache.

MB

A consistent increase in this value is a cause for concern.
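Both dirty-data measures can be sketched the same way, assuming they come from the serverStatus field "tracked dirty bytes in the cache". Dirty data is expected to grow until the next checkpoint, so a single high sample says less than a trend. The hypothetical consistently_increasing helper therefore assumes its samples are spaced further apart than the checkpoint interval.

```python
from typing import Sequence, Tuple

MB = 1024 ** 2


def dirty_cache(server_status: dict) -> Tuple[float, float]:
    """Return (dirty cache size in MB, dirty cache ratio in percent)."""
    cache = server_status["wiredTiger"]["cache"]
    dirty = cache["tracked dirty bytes in the cache"]
    maximum = cache["maximum bytes configured"]
    return dirty / MB, 100.0 * dirty / maximum


def consistently_increasing(samples: Sequence[float]) -> bool:
    """True if every sample is larger than the one before it.

    Samples are assumed to be taken further apart than the checkpoint
    interval, so normal growth between checkpoints is not flagged.
    """
    return len(samples) >= 2 and all(b > a for a, b in zip(samples, samples[1:]))
```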

Data evicted rate from cache

Indicates the rate at which data was evicted from cache.

MB/Sec

Data read rate into cache

Indicates the rate at which data was read into cache from disk.

MB/Sec

Data written rate from cache

Indicates the rate at which data was written from cache into disk.

MB/Sec
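The rate measures are per-second values, whereas the corresponding serverStatus counters are cumulative since server startup. A common way to turn such counters into rates is to take two samples and divide the difference by the sampling interval. The sketch below assumes serverStatus-style nested dictionaries and uses the "bytes read into cache" counter as its example. The counters the eG agent actually samples are not specified here, and the helper names are hypothetical.

```python
from typing import Optional, Sequence

MB = 1024 ** 2


def _lookup(doc: dict, path: Sequence[str]):
    """Walk a nested serverStatus-style dictionary along `path`."""
    for key in path:
        doc = doc[key]
    return doc


def counter_rate(prev: dict, curr: dict, path: Sequence[str],
                 interval_s: float, scale: float = 1.0) -> Optional[float]:
    """Per-second rate of a cumulative counter between two samples.

    Returns None if the counter went backwards (for example, because a
    server restart reset it), since no meaningful rate can be derived.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta = _lookup(curr, path) - _lookup(prev, path)
    if delta < 0:
        return None
    return delta / scale / interval_s


READ_PATH = ("wiredTiger", "cache", "bytes read into cache")
prev = {"wiredTiger": {"cache": {"bytes read into cache": 100 * MB}}}
curr = {"wiredTiger": {"cache": {"bytes read into cache": 400 * MB}}}
print(counter_rate(prev, curr, READ_PATH, interval_s=60, scale=MB))
```

The same helper applies to other cumulative counters, such as "bytes written from cache", "pages read into cache", "pages written from cache", or extra_info.page_faults (reported on Linux). Only path and scale change.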

Pages evicted rate from cache

Indicates the rate at which pages were evicted from cache.

Pages/Sec

A high value is desired for this measure. If the Cache used ratio is very high and the value of this measure is very low, it likely means that data is not being evicted frequently enough to control cache growth. You may then have to fine-tune eviction to ensure that the cache does not grow uncontrollably.

Typically, when a MongoDB server approaches its maximum cache size, WiredTiger begins eviction to stop memory use from growing too large, approximating a least-recently-used algorithm. WiredTiger provides several configuration options for tuning how pages are evicted from the cache.

The eviction_trigger configuration value is the occupied percentage of the total cache size that causes eviction to start. By default, WiredTiger begins evicting pages when the cache is 95% full. An application concerned about a latency spike as the cache becomes full might want to begin eviction earlier.

The eviction_target configuration value is the overall target for eviction, expressed as a percentage of total cache size; that is, once eviction begins, it will proceed until the target percentage of bytes in the cache is reached. Note the eviction_target configuration value is ignored until eviction is triggered.

The eviction_dirty_target configuration value is the overall dirty byte target for eviction, expressed as a percentage of total cache size; that is, once eviction begins, it will proceed until the target percentage of dirty bytes in the cache is reached. Note the eviction_dirty_target configuration value is ignored until eviction is triggered.
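The following toy model makes the interplay of eviction_trigger and eviction_target concrete. Once occupancy reaches the trigger percentage, it evicts least-recently-used pages until occupancy falls to the target percentage. It is a deliberate simplification, not WiredTiger's eviction algorithm: it ignores dirty bytes (and therefore eviction_dirty_target), eviction threads, and I/O. The class name and the 80% target used as a default are assumptions; the 95% trigger matches the default stated above.

```python
from collections import OrderedDict


class ToyEvictingCache:
    """Toy LRU cache showing trigger/target eviction semantics.

    Sizes are in bytes; percentages are of total capacity.
    Not WiredTiger's actual implementation.
    """

    def __init__(self, capacity: int, trigger_pct: float = 95, target_pct: float = 80):
        self.trigger = capacity * trigger_pct / 100
        self.target = capacity * target_pct / 100
        self.pages = OrderedDict()  # page_id -> size, least recently used first
        self.used = 0
        self.evicted = 0

    def access(self, page_id, size: int) -> None:
        if page_id in self.pages:
            self.pages.move_to_end(page_id)  # mark as most recently used
            return
        self.pages[page_id] = size
        self.used += size
        if self.used >= self.trigger:
            # Eviction starts at the trigger and continues down to the target.
            while self.used > self.target and self.pages:
                _, victim_size = self.pages.popitem(last=False)
                self.used -= victim_size
                self.evicted += 1


cache = ToyEvictingCache(capacity=100)
for page in range(10):
    cache.access(page, 10)
print(cache.used, cache.evicted, list(cache.pages))
```

Filling the 100-byte cache with ten 10-byte pages reaches the 95% trigger on the last page. The two oldest pages are then evicted, bringing occupancy back to the 80% target.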

By default, WiredTiger cache eviction is handled by a single, separate thread. In a large, busy cache, a single thread will be insufficient (especially when the eviction thread must wait for I/O). The eviction=(threads_min) and eviction=(threads_max) configuration values can be used to configure the minimum and maximum number of additional threads WiredTiger will create to keep up with the application eviction load.
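These options are expressed as a WiredTiger configuration string. The hypothetical helper below only assembles such a string from the option names mentioned above, and the thread counts in the example are arbitrary. MongoDB can pass WiredTiger configuration strings through the storage.wiredTiger.engineConfig.configString setting or, at runtime, through the wiredTigerEngineRuntimeConfig server parameter (setParameter). Which options each MongoDB version honors varies, so treat this as a sketch and verify against your version's documentation before applying anything like it in production.

```python
from typing import Optional


def wiredtiger_eviction_config(threads_min: Optional[int] = None,
                               threads_max: Optional[int] = None,
                               eviction_trigger: Optional[int] = None,
                               eviction_target: Optional[int] = None,
                               eviction_dirty_target: Optional[int] = None) -> str:
    """Assemble a WiredTiger config string from eviction-related options."""
    if threads_min is not None and threads_max is not None and threads_min > threads_max:
        raise ValueError("threads_min must not exceed threads_max")

    parts = []
    # Thread settings are nested inside the eviction=(...) group.
    threads = [f"{name}={value}"
               for name, value in (("threads_min", threads_min), ("threads_max", threads_max))
               if value is not None]
    if threads:
        parts.append("eviction=(" + ",".join(threads) + ")")
    # Percentage-based settings are top-level keys.
    for name, value in (("eviction_trigger", eviction_trigger),
                        ("eviction_target", eviction_target),
                        ("eviction_dirty_target", eviction_dirty_target)):
        if value is not None:
            parts.append(f"{name}={value}")
    return ",".join(parts)


print(wiredtiger_eviction_config(threads_min=4, threads_max=8))
```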

Pages read rate into cache

Indicates the rate at which pages were read into cache from disk.

Pages/Sec

A high value for this measure indicates effective usage of the cache.

Pages written rate from cache

Indicates the rate at which pages were written from cache into disk.

Pages/Sec

Typically, at configured intervals, data modified in the cache is flushed to disk, so that the data in the cache and on disk stay in sync. Flushing also frees up memory in the cache and controls its abnormal growth. This is why, ideally, the value of this measure should be high.

Page faults rate

Indicates the rate at which page faults requiring disk operations occurred.

Faults/Sec

Page faults refer to operations that require the database server to access data which isn’t available in active memory. The page faults counter may increase dramatically during moments of poor performance and may correlate with limited memory environments and larger data sets. Limited and sporadic page faults do not necessarily indicate an issue.

Note that this measure will be reported only if the target MongoDB server runs on Unix/Linux.