Cassandra Compactions Test

The concept of compaction is used for different kinds of operations in Cassandra, the common thing about these operations is that it takes one or more sstables and output new sstables.

The Cassandra write process stores data in files called SSTables. SSTables are immutable. Instead of overwriting existing rows with inserts or updates, Cassandra writes new timestamped versions of the inserted or updated data in new SSTables. Cassandra does not perform deletes by removing the deleted data: instead, Cassandra marks it with tombstones.

Over time, Cassandra may write many versions of a row in different SSTables. Each version may have a unique set of columns stored with a different timestamp. As SSTables accumulate, the distribution of data can require accessing more and more SSTables to retrieve a complete row.

To keep the database healthy, Cassandra periodically merges SSTables and discards old data. This process is called compaction.

Compaction works on a collection of SSTables. From these SSTables, compaction collects all versions of each unique row and assembles one complete row, using the most up-to-date version (by timestamp) of each of the row's columns. The merge process is performant, because rows are sorted by partition key within each SSTable, and the merge process does not use random I/O. The new versions of each row is written to a new SSTable. The old versions, along with any rows that are ready for deletion, are left in the old SSTables, and are deleted as soon as pending reads are completed.

Compaction causes a temporary spike in disk space usage and disk I/O while old and new SSTables co-exist. As it completes, compaction frees up disk space occupied by old SSTables. It improves read performance by incrementally replacing old SSTables with compacted SSTables. Cassandra can read data directly from the new SSTable even before it finishes writing, instead of waiting for the entire compaction process to finish.

As Cassandra processes writes and reads, it replaces the old SSTables with new SSTables in the page cache. The process of caching the new SSTable, while directing reads away from the old one, is incremental — it does not cause a the dramatic cache miss. Cassandra provides predictable high performance even under heavy load.

If there are too many pending reads that are serviced from the old SSTables, the old SStables cannot be deleted during compaction. This process may delay the number of compactions that need to be performed on the target Cassandra database node. If old SSTables are not deleted periodically, then, administrators may be alerted to potential space crunch in the disk which may impact the addition of new data into the SSTables. Therefore, it is necessary to monitor the compactions and the size of data compacted on the target database node round the clock. The Cassandra Compactions test helps administrators in this regard.

This test monitors the target Cassandra database node and reports the rate at which data was compacted. In addition, this test also reveals the count of compactions that are pending and the amount of data pending to be compacted per second. Using this test, administrators can be alerted to irregularities in compaction process if any, and rectify them at a faster pace before space crunch of the disk reaches abnormal limits.

Target of the test : A Cassandra Database

Agent deploying the test : An external/remote agent.

Outputs of the test : One set of results for the target Cassandra Database node being monitored.

Configurable parameters for the test
Parameters Description

Test Period

How often should the test be executed.

Host

The host for which the test is to be configured.

Port

The port on which the specified host listens. By default, this is 9042.

JMX Remote Port

Here, specify the port at which the JMX listens for requests from remote hosts. Ensure that you specify the same port that you configured in the cassandra-env.sh file (if the target Cassandra Database node is installed on a Unix host) or the cassandra-env.ps1 file (if the target Cassandra Database node is installed on a Windows host) in the <CASSANDRA_HOME> directory used by the target Cassandra Database node. To know how to specify the remote port, refer to Enabling JMX Support for JRE.

JMX User and JMX Password

If JMX requires authentication only (but no security), then ensure that the user and password parameters are configured with the credentials of a user with read-write access to JMX. To know how to create this user, refer to Configuring the eG Agent to Support JMX Authentication.

Confirm Password

Confirm the Password by retyping it in this text box.

Measurements made by the test
Measurement Description Measurement Unit Interpretation

Compacted size

Indicates the amount of data compacted per second (measured after compaction process) during the last measurement period.

MB/sec

 

Completed compactions

Indicates the number of compactions completed per second during the last measurement period.

Compactions/sec

A high value is desired for this measure.

Pending compactions

Indicates the number of compactions that are currently pending.

Number

Ideally, the value of this measure should be zero.

Total compactions

Indicates the total number of compactions performed per second during the last measurement period.

Compactions/sec