Cassandra Keyspaces Test
A keyspace in Cassandra is a namespace that defines how data is replicated across nodes. A keyspace is defined once for the whole cluster, not per node. CQL stores data in tables, whose schema defines the layout of the data in the table, and those tables are grouped in keyspaces. On each node, table data is held in memtables in memory and in SSTables on disk. A keyspace defines a number of options that apply to all the tables it contains, the most prominent of which is the replication strategy used by the keyspace. It is generally encouraged to use one keyspace per application, and thus many clusters may define only one keyspace.
The keyspace is the top-level database object that controls the replication for the objects it contains at each datacenter in the cluster. Keyspaces contain tables, materialized views, user-defined types, functions, and aggregates.
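To make the per-datacenter replication setting concrete, the following minimal sketch builds a CQL `CREATE KEYSPACE` statement that uses `NetworkTopologyStrategy`. The helper name `create_keyspace_cql`, the keyspace name `shop`, and the datacenter names `dc1` and `dc2` are hypothetical and used only for illustration. The sketch only produces the statement text; running it against a cluster would require cqlsh or a driver.

```python
def create_keyspace_cql(name, replicas_per_dc):
    """Build a CREATE KEYSPACE statement with per-datacenter replication.

    `replicas_per_dc` maps a datacenter name to its replication factor.
    """
    if not replicas_per_dc:
        raise ValueError("at least one datacenter is required")
    # The replication map always starts with the strategy class.
    parts = ["'class': 'NetworkTopologyStrategy'"]
    # One entry per datacenter: 'dc_name': replication_factor
    parts += [f"'{dc}': {rf}" for dc, rf in replicas_per_dc.items()]
    body = ", ".join(parts)
    return f"CREATE KEYSPACE IF NOT EXISTS {name} WITH replication = {{{body}}};"


if __name__ == "__main__":
    # Hypothetical keyspace with three replicas in dc1 and two in dc2.
    print(create_keyspace_cql("shop", {"dc1": 3, "dc2": 2}))
```

Every table created in such a keyspace inherits this replication setting, which is why the keyspace is the natural unit for the measures this test reports.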
In the read path, Cassandra merges data on disk (in SSTables) with data in RAM (in memtables). To avoid checking every SSTable data file for the partition being requested, Cassandra employs a data structure known as a bloom filter. Bloom filters are maintained per SSTable, i.e. each SSTable on disk gets a corresponding bloom filter in memory.
Bloom filters are a probabilistic data structure that allows Cassandra to determine one of two possible states:
- The data definitely does not exist in the given file, or
- The data probably exists in the given file.

While bloom filters cannot guarantee that the data exists in a given SSTable, they can be made more accurate by allowing them to consume more RAM. As accuracy improves (as bloom_filter_fp_chance, the bloom filter false-positive chance, gets closer to 0), memory usage increases non-linearly. For example, the bloom filter of a table with bloom_filter_fp_chance = 0.01 requires about three times as much memory as the same table with bloom_filter_fp_chance = 0.1.

Conversely, a bloom filter that uses less memory produces more false positives, and each false positive causes a needless SSTable read, so disk overhead can increase manifold. Therefore, it is essential to contain bloom filter false positives before the disk is bombarded with requests. Similarly, the read and write requests in each keyspace should also be monitored closely so that administrators can ensure that the data is available in the keyspace. This helps reduce the disk overhead of the requests received. The Cassandra Keyspaces test helps administrators monitor the keyspaces and contain bloom filter false positives.
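The two bloom filter outcomes and the memory/accuracy trade-off described above can be illustrated with a toy filter. This is a minimal sketch, not Cassandra's implementation: the class `ToyBloomFilter`, the helper `false_positive_rate`, the choice of SHA-256 for hashing, and the key names are all assumptions made for illustration.

```python
import hashlib


class ToyBloomFilter:
    """Minimal bloom filter: a bit array probed by k hash functions."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive k positions by salting a SHA-256 digest with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "probably present".
        return all(self.bits[pos] for pos in self._positions(key))


def false_positive_rate(num_bits, num_hashes=3, stored=1000, probes=5000):
    """Measure how often keys that were never added are reported as present."""
    bf = ToyBloomFilter(num_bits, num_hashes)
    for n in range(stored):
        bf.add(f"partition-{n}")
    # Every hit on a key that was never added is a false positive.
    hits = sum(bf.might_contain(f"missing-{n}") for n in range(probes))
    return hits / probes


if __name__ == "__main__":
    for size in (4000, 20000):
        print(size, "bits ->", false_positive_rate(size))
```

Running the sketch with a larger bit array yields a noticeably lower false positive rate for the same number of stored keys, which mirrors the trade-off that bloom_filter_fp_chance controls.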
This test auto-discovers the keyspaces in the target Cassandra Database node and, for each keyspace, reports the count of SSTables and memory tables available. In addition, this test reveals the count of bloom filter false positives on each keyspace and the space utilization of the bloom filters in depth. The test also provides insights into the read and write latency of each keyspace so that administrators can identify the keyspace that is lagging behind in serving requests.
Target of the test : A Cassandra Database
Agent deploying the test : An external/remote agent.
Outputs of the test : One set of results for each keyspace in the target Cassandra Database node being monitored.
Parameters | Description |
---|---|
Test Period | How often should the test be executed. |
Host | The host for which the test is to be configured. |
Port | The port on which the specified host listens. By default, this is 9042. |
JMX Remote Port | Specify the port on which JMX listens for requests from remote hosts. Ensure that you specify the same port that you configured in the cassandra-env.sh file (if the target Cassandra Database node is installed on a Unix host) or the cassandra-env.ps1 file (if the target Cassandra Database node is installed on a Windows host) in the <CASSANDRA_HOME> directory used by the target Cassandra Database node. To know how to specify the remote port, refer to Enabling JMX Support for JRE. |
JMX User and JMX Password | If JMX requires authentication only (but no security), then ensure that the JMX User and JMX Password parameters are configured with the credentials of a user with read-write access to JMX. To know how to create this user, refer to Configuring the eG Agent to Support JMX Authentication. |
Confirm Password | Confirm the JMX Password by retyping it in this text box. |
Detailed Diagnosis | To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, click on the Off option. The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled: |
Measurement | Description | Measurement Unit | Interpretation |
---|---|---|---|
Bloom filter false positives | Indicates the number of bloom filter false positives in this keyspace. | Number | Typical values for bloom_filter_fp_chance are between 0.01 (1%) and 0.1 (10%) false-positive chance, where Cassandra may scan an SSTable for a row, only to find that it does not exist on the disk. The parameter should be tuned by use case: |
Bloom filter false positive rate | Indicates the bloom filter false positive ratio in this keyspace. | Percent | A low value is desired for this measure. |
Bloom filter space used | Indicates the disk space used by the bloom filter in this keyspace. | MB | A high value indicates that the data is available in the keyspace. |
Live SSTables | Indicates the number of SSTables that are currently live/active in this keyspace. | Number | Compare the value of this measure across the keyspaces to figure out the keyspace on which there are too many SSTables that are active/live. |
Disk space used by live SSTables | Indicates the disk space utilized by the SSTables that are live/active in this keyspace. | MB | A continuously increasing value of this measure indicates that the SSTables are up to date with the data. |
Memory table column count | Indicates the number of columns present in the memory table available in this keyspace. | Number | |
Memory table switch count | Indicates the rate at which flushes resulted in the memory table available in this keyspace being switched out. | Switches/second | |
Memory table live data size | Indicates the size of the data stored in the memory table available in this keyspace. | MB | A continuously increasing value of this measure indicates that the memory tables are not flushing the data to the SSTables. Administrators should therefore check if adequate space is allocated to the SSTables. |
Memory table off-heap size | Indicates the off-heap memory size of the memory table available in this keyspace. | MB | |
Memory table on-heap size | Indicates the on-heap memory size of the memory table available in this keyspace. | MB | |
Recent Bloom filter false positives | Indicates the number of bloom filter false positives recently encountered in this keyspace. | Number | |
Recent Bloom filter false positive rate | Indicates the bloom filter false positive ratio recently observed in this keyspace. | Percent | |
Avg read latency | Indicates the average time taken by this keyspace to respond to read requests. | Milliseconds/request | Compare the value of this measure across the keyspaces to determine the keyspace that is taking too long to respond to read requests. |
Read latency 99th percentile | Indicates the 99th percentile of the time taken by this keyspace to respond to read requests. | Milliseconds | |
Avg write latency | Indicates the average time taken by this keyspace to write the data for the requests. | Milliseconds/request | Compare the value of this measure across keyspaces to figure out the keyspace that is taking too long to write the data for the requests received. |
Write latency 99th percentile | Indicates the 99th percentile of the time taken by this keyspace to respond to write requests. | Milliseconds | |
Avg range latency | Indicates the average time taken by this keyspace to respond to range requests. | Milliseconds/request | |
Range latency 99th percentile | Indicates the 99th percentile of the time taken by this keyspace to respond to range requests. | Milliseconds | |
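The cross-keyspace comparison suggested for the latency measures can be sketched as follows. This is only an illustration: the keyspace names and latency values in `sample` are made up, and the rule of flagging anything above three times the median is an assumed heuristic, not a threshold defined by this test.

```python
from statistics import median


def rank_by_latency(latency_ms):
    """Return (keyspace, latency) pairs sorted from slowest to fastest."""
    return sorted(latency_ms.items(), key=lambda item: item[1], reverse=True)


def latency_outliers(latency_ms, factor=3.0):
    """Return keyspaces whose latency exceeds `factor` times the median."""
    baseline = median(latency_ms.values())
    return [
        keyspace
        for keyspace, value in rank_by_latency(latency_ms)
        if value > factor * baseline
    ]


# Hypothetical Avg read latency values (Milliseconds/request) per keyspace.
sample = {"orders": 2.1, "sessions": 1.8, "analytics": 14.6, "system_auth": 0.9}

if __name__ == "__main__":
    for keyspace, value in rank_by_latency(sample):
        print(f"{keyspace}: {value} ms/request")
    print("Needs attention:", latency_outliers(sample))
```

The same comparison applies to Avg write latency and Avg range latency; only the input values change.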