Cassandra Keyspaces Test

A keyspace in Cassandra is a namespace that defines data replication on nodes. A keyspace is defined for the cluster as a whole, not for an individual node. CQL stores data in tables (held in memtables in memory and in SSTables on disk), whose schema defines the layout of the data in the table, and those tables are grouped in keyspaces. A keyspace defines a number of options that apply to all the tables it contains, the most prominent of which is the replication strategy used by the keyspace. It is generally encouraged to use one keyspace per application, and thus many clusters may define only one keyspace.

The keyspace is the top-level database object that controls the replication of the objects it contains at each datacenter in the cluster. Keyspaces contain tables, materialized views, user-defined types, functions, and aggregates.
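To illustrate how a keyspace carries the replication settings for everything it contains, the following minimal Python sketch builds a CQL CREATE KEYSPACE statement that sets a replication factor per datacenter. The keyspace name, the datacenter names, and the helper function itself are hypothetical; the resulting statement would still have to be executed against the cluster with cqlsh or a driver.

```python
def create_keyspace_cql(keyspace, replicas_per_dc):
    """Build a CREATE KEYSPACE statement that uses NetworkTopologyStrategy.

    keyspace and replicas_per_dc (datacenter name -> replication factor)
    are illustrative inputs, not values read from a real cluster.
    """
    options = ["'class': 'NetworkTopologyStrategy'"]
    # One entry per datacenter, sorted so the output is deterministic.
    for dc, rf in sorted(replicas_per_dc.items()):
        options.append("'%s': %d" % (dc, int(rf)))
    body = ", ".join(options)
    return "CREATE KEYSPACE IF NOT EXISTS " + keyspace + " WITH replication = {" + body + "};"


# Hypothetical application keyspace replicated to two datacenters.
statement = create_keyspace_cql("orders_app", {"dc1": 3, "dc2": 2})
print(statement)
```

Every table later created in this keyspace would inherit these replication settings, which is why per-keyspace monitoring is a natural unit for this test.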

In the read path, Cassandra merges data on disk (in SSTables) with data in RAM (in memtables). To avoid checking every SSTable data file for the partition being requested, Cassandra employs a data structure known as a bloom filter. Bloom filters are maintained per SSTable, i.e. each SSTable on disk gets a corresponding bloom filter in memory.

Bloom filters are a probabilistic data structure that allows Cassandra to determine one of two possible states:

  • The data definitely does not exist in the given file, or
  • The data probably exists in the given file.

While bloom filters cannot guarantee that the data exists in a given SSTable, they can be made more accurate by allowing them to consume more RAM. As accuracy improves (as bloom_filter_fp_chance, the bloom filter false positive chance, gets closer to 0), memory usage increases non-linearly. For example, the bloom filter for a table with bloom_filter_fp_chance = 0.01 requires about three times as much memory as the same table with bloom_filter_fp_chance = 0.1.

A higher false positive chance saves memory, but every false positive makes Cassandra read an SSTable that does not hold the requested data, so a rapid increase in false positives can multiply the disk overhead. Therefore, it is essential to contain bloom filter false positives before the disk is bombarded with requests. Similarly, the read and write requests to each keyspace should be monitored closely so that administrators can ensure that the data is available in the keyspace and that disk overhead for the requests received stays low. The Cassandra Keyspaces test helps administrators monitor each keyspace and contain bloom filter false positives.
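The two possible answers of a bloom filter can be seen in a small, self-contained Python sketch. This is a toy illustration only, not Cassandra's implementation: the class name, the bit-array size, the number of hashes, and the use of SHA-256 are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter: answers 'definitely absent' or 'probably present'."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a SHA-256 digest.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(("%d:%s" % (i, key)).encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False: the key was never added. True: the key was probably added.
        return all(self.bits[pos] for pos in self._positions(key))


# Hypothetical partition keys stored in one SSTable.
sstable_filter = BloomFilter()
for partition_key in ("user:1", "user:2", "user:3"):
    sstable_filter.add(partition_key)

print(sstable_filter.might_contain("user:2"))  # True: added keys are never missed
```

A key that was never added usually yields False, but it can occasionally yield True when its bit positions collide with those of other keys. Such a collision is a false positive, and giving the filter more bits makes it less likely.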

This test auto-discovers the keyspaces in the target Cassandra Database node and, for each keyspace, reports the count of SSTables and memory tables available. In addition, this test reveals the count of bloom filter false positives on each keyspace and the space utilization of the bloom filters. The test also provides insights into the read and write latency of each keyspace so that administrators can identify the keyspace that is slow in servicing requests.

Target of the test : A Cassandra Database

Agent deploying the test : An external/remote agent.

Outputs of the test : One set of results for each keyspace in the target Cassandra Database node being monitored.

Configurable parameters for the test
Parameters Description

Test Period

How often should the test be executed.

Host

The host for which the test is to be configured.

Port

The port on which the specified host listens. By default, this is 9042.

JMX Remote Port

Specify the port on which JMX listens for requests from remote hosts. Ensure that you specify the same port that you configured in the cassandra-env.sh file (if the target Cassandra Database node is installed on a Unix host) or the cassandra-env.ps1 file (if the target Cassandra Database node is installed on a Windows host) in the <CASSANDRA_HOME> directory used by the target Cassandra Database node. To know how to specify the remote port, refer to Enabling JMX Support for JRE.

JMX User and JMX Password

If JMX requires authentication only (but no security), ensure that the JMX User and JMX Password parameters are configured with the credentials of a user with read-write access to JMX. To know how to create this user, refer to Configuring the eG Agent to Support JMX Authentication.

Confirm Password

Confirm the Password by retyping it in this text box.

Detailed Diagnosis

To make diagnosis more efficient and accurate, the eG Enterprise suite embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, click on the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability.
  • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
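
Before entering the JMX Remote Port described among the parameters above, it can help to confirm that the port is reachable from the host running the remote agent. The following is a minimal Python sketch of such a check; the function name is hypothetical and the check is not part of the eG agent. A successful TCP connection shows only that the port is open, not that JMX authentication will succeed.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, port_is_reachable("cassandra-node.example.com", 7199) would probe Cassandra's usual default JMX port on a hypothetical host; substitute the host and the port configured in cassandra-env.sh or cassandra-env.ps1.
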
Measurements made by the test
Measurement Description Measurement Unit Interpretation

Bloom filter false positives

Indicates the number of bloom filter false positives in this keyspace.

Number

Values for bloom_filter_fp_chance are typically between 0.01 (1%) and 0.1 (10%). A false positive occurs when Cassandra scans an SSTable for a row, only to find that the row does not exist on disk. The parameter should be tuned by use case:

  • Users with more RAM and slower disks may benefit from setting bloom_filter_fp_chance to a numerically lower value (such as 0.01) to avoid excess IO operations.
  • Users with less RAM, denser nodes, or very fast disks may tolerate a higher bloom_filter_fp_chance in order to save RAM at the expense of excess IO operations.
  • In workloads that rarely read, or that only perform reads by scanning the entire data set (such as analytics workloads), setting bloom_filter_fp_chance to a much higher value is acceptable.

Bloom filter false positive rate

Indicates the bloom filter false positive ratio in this keyspace.

Percent

A low value is desired for this measure.

Bloom filter space used

Indicates the disk space used by the bloom filter in this keyspace.

MB

A high value typically reflects a large number of partitions in the SSTables of this keyspace, a low bloom_filter_fp_chance setting, or both.

Live SSTables

Indicates the number of SSTables that are currently live/active in this keyspace.

Number

Compare the value of this measure across the keyspaces to figure out the keyspace on which there are too many SSTables that are active/live.

Disk space used by live SSTables

Indicates the disk space utilized by the SSTables that are live/active in this keyspace.

MB

A continuously increasing value of this measure indicates that the SSTables are up to date with the data.

Memory table column count

Indicates the number of columns present in the memory table available in this keyspace.

Number

 

Memory table switch count

Indicates the rate at which the memory tables in this keyspace were flushed, causing them to be switched out.

Switches/second

 

Memory table live data size

Indicates the size of the data stored in the memory table available in this keyspace.

MB

A continuously increasing value of this measure indicates that the memory tables are not flushing their data to the SSTables. Administrators should therefore check if adequate space is allocated to the SSTables.

Memory table off-heap size

Indicates the off-heap memory size of the memory table available in this keyspace.

MB

 

Memory table on-heap size

Indicates the on-heap memory size of the memory table available in this keyspace.

MB

 

Recent Bloom filter false positives

Indicates the number of bloom filter false positives encountered recently in this keyspace.

Number

 

Recent Bloom filter false positive rate

Indicates the bloom filter false positive ratio observed recently in this keyspace.

Percent

 

Avg read latency

Indicates the average time taken by this keyspace to respond to read requests.

Milliseconds/request

Compare the value of this measure across the keyspaces to determine the keyspace that is taking too long to respond to read requests.

Read latency 99th percentile

Indicates the 99th percentile of the time taken by this keyspace to respond to read requests.

Milliseconds

 

Avg write latency

Indicates the average time taken by this keyspace to write the data for the requests.

Milliseconds/request

Compare the value of this measure across keyspaces to figure out the keyspace that is taking too long to write the data for the requests received.

Write latency 99th percentile

Indicates the 99th percentile of the time taken by this keyspace to respond to write requests.

Milliseconds

 

Avg range latency

Indicates the average time taken by this keyspace to respond to range requests (requests that scan a range of data).

Milliseconds/request

 

Range latency 99th percentile

Indicates the 99th percentile of the time taken by this keyspace to respond to range requests.

Milliseconds
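
The average and 99th percentile latency measures above can tell different stories about the same keyspace. The following minimal Python sketch, using hypothetical latency samples and a simple nearest-rank percentile, shows how a few slow requests barely move the average but dominate the 99th percentile. It is an illustration only and is not how Cassandra or the test computes these values.

```python
import math


def mean(values):
    return sum(values) / len(values)


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least
    pct percent of all samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical read latencies in milliseconds: 98 fast requests, 2 slow ones.
latencies_ms = [2.0] * 98 + [250.0] * 2

print(mean(latencies_ms))            # 6.96
print(percentile(latencies_ms, 99))  # 250.0
```

Here the average stays under 7 ms while the 99th percentile is 250 ms, which is why it is worth comparing both measures across keyspaces rather than relying on the average alone.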