Cassandra Cache Test

Cassandra includes integrated caching and distributes cache data around the cluster. When a node goes down, the client can read from another cached replica of the data. The integrated architecture also facilitates troubleshooting because there is no separate caching tier, and cached data matches what is in the database exactly. The integrated cache alleviates the cold start problem by saving the cache to disk periodically. Cassandra reads contents back into the cache and distributes the data when it restarts. The cluster does not start with a cold cache.

The partition key cache is a cache of the partition index for a Cassandra table. Using the key cache instead of relying on the OS page cache decreases seek times. With only the key cache enabled, Cassandra must still go to disk (or the OS page cache) to read the requested data rows, but without the key cache it performs even more reads from disk.

To cache rows, if the row key is not already in the cache, Cassandra reads the first portion of the partition and puts the data in the cache. If the newly cached data does not include all the cells configured by the user, Cassandra performs another read. The actual size of the row cache depends on the workload. You should benchmark your application to determine the best row cache size to configure.

There are two row cache options, the old serializing cache provider and a new off-heap cache (OHC) provider. The new OHC provider has been benchmarked as performing about 15% better than the older option.

Typically, you enable either the partition key or row cache for a table.
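As an illustration of how the partition key cache or row cache is chosen per table, the sketch below builds a CQL ALTER TABLE statement using the table-level caching option (the 'keys' and 'rows_per_partition' sub-options, as available in Cassandra 2.1 and later). The helper name caching_statement and the keyspace and table names are hypothetical and are not part of this test; treat this as a minimal sketch rather than a recommended setting.

```python
def caching_statement(keyspace, table, cache_keys=True, rows_per_partition="NONE"):
    """Build a CQL statement that sets the caching option for one table.

    cache_keys          -- True caches partition keys ('ALL'), False disables it ('NONE').
    rows_per_partition  -- 'ALL', 'NONE', or a positive integer number of rows.
    The helper itself is hypothetical; only the CQL it emits is Cassandra syntax.
    """
    keys = "ALL" if cache_keys else "NONE"

    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(rows_per_partition, int) and not isinstance(rows_per_partition, bool):
        if rows_per_partition <= 0:
            raise ValueError("rows_per_partition must be a positive integer")
        rows = str(rows_per_partition)
    else:
        rows = str(rows_per_partition).upper()
        if rows not in ("ALL", "NONE"):
            raise ValueError("rows_per_partition must be 'ALL', 'NONE' or an integer")

    return (
        f"ALTER TABLE {keyspace}.{table} "
        f"WITH caching = {{'keys': '{keys}', 'rows_per_partition': '{rows}'}};"
    )


if __name__ == "__main__":
    # Key cache only (hypothetical keyspace and table names).
    print(caching_statement("shop", "orders"))
    # Row cache holding the first 100 rows of each partition, key cache off.
    print(caching_statement("shop", "orders", cache_keys=False, rows_per_partition=100))
```

The resulting statement can be run from cqlsh or any CQL driver. Note that the row cache also has to be given a non-zero size at the node level before a per-table row cache setting has any effect.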

If the caches are not sized appropriately, frequent disk accesses may occur, causing severe disk overhead. To avoid this, administrators need to size the caches appropriately and also identify any cache that is infrequently used. The Cassandra Cache test helps administrators in this regard.

This test auto-discovers the caches on the target database server. For each cache discovered, this test reports the maximum cache size and how much of this size is presently occupied by cached data; this reveals whether or not the cache has enough RAM to hold additional data. Inconsistencies in cache sizing can be detected in the process and their impact on performance analyzed. This test also sheds light on how well the cache services requests. Using this test, administrators can identify the cache that has serviced the fewest requests and analyze the reason behind such poor responsiveness.

Target of the test : A Cassandra Database

Agent deploying the test : An external/remote agent.

Outputs of the test : One set of results for each cache of the target Cassandra Database node being monitored.

Configurable parameters for the test
Parameters Description

Test Period

How often the test should be executed.

Host

The host for which the test is to be configured.

Port

The port on which the specified host listens. By default, this is 9042.

JMX Remote Port

Here, specify the port on which JMX listens for requests from remote hosts. Ensure that you specify the same port that you configured in the cassandra-env.sh file (if the target Cassandra Database node is installed on a Unix host) or the cassandra-env.ps1 file (if the target Cassandra Database node is installed on a Windows host) in the <CASSANDRA_HOME> directory used by the target Cassandra Database node. To know how to specify the remote port, refer to Enabling JMX Support for JRE.

JMX User and JMX Password

If JMX requires authentication only (but no security), then ensure that the JMX User and JMX Password parameters are configured with the credentials of a user with read-write access to JMX. To know how to create this user, refer to Configuring the eG Agent to Support JMX Authentication.

Confirm Password

Confirm the Password by retyping it in this text box.
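To make the JMX parameters above more concrete, the sketch below shows how a remote client would typically address a node's JMX endpoint and name the cache metric MBeans that Cassandra publishes. The URL follows the standard JMX RMI connector format, and the MBean pattern follows Cassandra's documented metrics naming. That this test reads exactly these MBeans is an assumption, and the helper names, the host address, and the default port of 7199 (Cassandra's usual JMX port) are illustrative only; use the port actually configured in cassandra-env.sh or cassandra-env.ps1.

```python
# Cache scopes exposed under Cassandra's metrics domain (assumed set for this sketch).
CACHE_SCOPES = ("KeyCache", "RowCache", "CounterCache")


def jmx_service_url(host, jmx_port=7199):
    """Return the standard JMX RMI service URL for a host and JMX remote port."""
    if not 0 < jmx_port < 65536:
        raise ValueError("jmx_port must be between 1 and 65535")
    return f"service:jmx:rmi:///jndi/rmi://{host}:{jmx_port}/jmxrmi"


def cache_metric_mbean(scope, name):
    """Return the object name of one cache metric, e.g. Capacity, Size, Hits, Requests."""
    if scope not in CACHE_SCOPES:
        raise ValueError(f"unknown cache scope: {scope}")
    return f"org.apache.cassandra.metrics:type=Cache,scope={scope},name={name}"


if __name__ == "__main__":
    # 10.0.0.5 is a hypothetical node address.
    print(jmx_service_url("10.0.0.5"))
    for metric in ("Capacity", "Size", "Hits", "Requests"):
        print(cache_metric_mbean("KeyCache", metric))
```

A JMX client such as jconsole, or any tool that accepts a service URL, can use these strings together with the JMX User and JMX Password credentials.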

Measurements made by the test
Measurement Description Measurement Unit Interpretation

Total size

Indicates the total size of this cache i.e., the total space allocated to this cache.

MB

 

Used size

Indicates the amount of space that is already utilized in this cache.

MB

Ideally, this value should be well below the Total size measure. A steady and significant rise in this value could mean that cached data is not being written to the disks frequently and/or least-used data is not being evicted properly. You may want to fine-tune these operations and then check to see if it reduces cache memory usage.

Cache usage

Indicates the percentage of space that is already utilized in this cache.

Percent

A value close to 100% is a cause for concern, as it indicates that the cache is consuming RAM excessively. You may want to consider resizing the cache to avoid direct disk reads.

Hit rate

Indicates the rate at which requests were serviced by this cache during the last measurement period.

Hits/sec

 

Hit ratio

Indicates the percentage of requests serviced by this cache without having to read from the disk during the last measurement period.

Percent

A value of 85% or more is desired for this measure. Temporary dips below this level are expected directly after a large bulk update. If the value stays low over the longer term, however, it may point to data modeling issues or configuration problems, such as JNA not being correctly installed, which leaves the key cache residing on the heap. In theory, when combined with heap pressure, this can result in the cache being flushed too often.

Request rate

Indicates the rate at which requests were made to this cache during the last measurement period.

Requests/sec

A high value is desired for this measure.
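The measures in the table above can be related to one another with simple arithmetic. The sketch below assumes that two consecutive samples of a cache are available, each holding the capacity and used size in bytes plus cumulative hit and request counters; the CacheSample type, its field names, and the cache_measures helper are hypothetical and do not describe how the test itself is implemented. The 85% threshold is the one given in the Hit ratio interpretation.

```python
from dataclasses import dataclass

BYTES_PER_MB = 1024 * 1024
DESIRED_HIT_RATIO_PCT = 85.0  # threshold taken from the Hit ratio interpretation


@dataclass
class CacheSample:
    """One reading of a cache (hypothetical structure for this sketch)."""
    capacity_bytes: int   # maximum size of the cache
    size_bytes: int       # space currently occupied
    hits: int             # cumulative hits since node start
    requests: int         # cumulative requests since node start


def cache_measures(prev, curr, interval_s):
    """Derive the six measures for one measurement period of interval_s seconds."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")

    # Counters are cumulative, so the period's activity is the difference.
    hits = max(curr.hits - prev.hits, 0)
    requests = max(curr.requests - prev.requests, 0)

    usage_pct = (100.0 * curr.size_bytes / curr.capacity_bytes
                 if curr.capacity_bytes else 0.0)
    # With no requests in the period there is no meaningful ratio; report 0.
    hit_ratio_pct = 100.0 * hits / requests if requests else 0.0

    return {
        "total_size_mb": curr.capacity_bytes / BYTES_PER_MB,
        "used_size_mb": curr.size_bytes / BYTES_PER_MB,
        "cache_usage_pct": usage_pct,
        "hit_rate_per_s": hits / interval_s,
        "hit_ratio_pct": hit_ratio_pct,
        "request_rate_per_s": requests / interval_s,
        "hit_ratio_ok": hit_ratio_pct >= DESIRED_HIT_RATIO_PCT,
    }


if __name__ == "__main__":
    before = CacheSample(100 * BYTES_PER_MB, 20 * BYTES_PER_MB, hits=1000, requests=1200)
    after = CacheSample(100 * BYTES_PER_MB, 25 * BYTES_PER_MB, hits=1900, requests=2200)
    for name, value in cache_measures(before, after, interval_s=10).items():
        print(f"{name}: {value}")
```

Working from counter differences rather than lifetime totals keeps the hit ratio tied to the last measurement period, so a temporary dip after a bulk update shows up and then recovers instead of being averaged away.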