Cassandra Hints Test

Over time, data in a Cassandra replica can become inconsistent with other replicas due to the distributed nature of the database. Node repair corrects these inconsistencies so that eventually all nodes hold the same, most up-to-date data. It is an important part of regular maintenance for every Cassandra cluster.

Cassandra provides the following repair processes:

  • Hinted Handoff
  • Read Repair
  • Anti-Entropy Repair

Occasionally, a node may become unresponsive while data is being written. This unresponsiveness may be due to hardware problems, network issues, or overloaded nodes that experience long garbage collection (GC) pauses. If a node is unable to receive a particular write, the write's coordinator node preserves the data to be written as a set of hints. When the node comes back online, the coordinator effects repair by handing off the hints so that the node can catch up with the required writes. This type of repair process is termed Hinted Handoff. Hints are saved for an unresponsive node only for the period given by the max_hint_window_in_ms setting in cassandra.yaml. Once this window expires, nodes stop saving hints for that node.
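The hint window described above can be illustrated with a small toy model. This is a sketch only, not Cassandra's implementation: the HintStore class, its method names, and the hint_window_ms parameter (standing in for the hint window setting from cassandra.yaml) are all hypothetical and exist purely to show when a write is delivered, hinted, or dropped.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HintStore:
    """Toy model of a coordinator's hint buffer for ONE replica (hypothetical)."""
    hint_window_ms: int                   # how long hints keep being saved after the replica goes down
    down_since_ms: Optional[int] = None   # None means the replica is reachable
    hints: List[str] = field(default_factory=list)

    def mark_down(self, now_ms: int) -> None:
        # Remember when the replica first became unresponsive.
        if self.down_since_ms is None:
            self.down_since_ms = now_ms

    def write(self, mutation: str, now_ms: int) -> str:
        if self.down_since_ms is None:
            return "delivered"            # replica is up, no hint needed
        if now_ms - self.down_since_ms <= self.hint_window_ms:
            self.hints.append(mutation)   # still inside the window: save a hint
            return "hinted"
        return "dropped"                  # window expired: no more hints are saved

    def mark_up(self) -> List[str]:
        # Replica is back: hand off every saved hint and clear the buffer.
        replayed, self.hints = self.hints, []
        self.down_since_ms = None
        return replayed


store = HintStore(hint_window_ms=3_000)
store.mark_down(now_ms=0)
first = store.write("w1", now_ms=1_000)    # inside the window
second = store.write("w2", now_ms=5_000)   # after the window expired
replayed = store.mark_up()
print(first, second, replayed)
```

In this model the write that arrives after the window has expired is never replayed, which is why a node that stays down longer than the window needs one of the other repair processes to catch up.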

Hinted Handoff is an optional part of writes whose primary purpose is to provide extreme write availability when consistency is not required. Secondarily, Hinted Handoff can reduce the time required for a temporarily failed node to become consistent again with live ones. This is especially useful when a flaky network causes false-positive failures. If hinted handoff is not enabled, a node may hold outdated data for longer, so users may be served stale data and user experience may suffer. It is therefore necessary to monitor the status of hinted handoff round the clock. The Cassandra Hints test helps administrators in this regard.

By closely monitoring the Cassandra Database node, this test helps administrators determine whether hinted handoff is enabled. In addition, this test reports the total number of hints that the node needs to be updated with and the number of hints that are active for replay. An abnormal increase in the count of hints may point to potential database performance degradation.
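For a manual cross-check of the enabled/disabled status, the output of Cassandra's nodetool statushandoff command can be mapped to the Yes/No values and numeric equivalents (1/0) that this test reports. The sketch below is not how the test itself collects data. It assumes the command prints a line such as "Hinted handoff is running" or "Hinted handoff is not running"; that wording is an assumption and may differ between Cassandra versions. The handoff_measure function name is hypothetical.

```python
from typing import Tuple


def handoff_measure(status_line: str) -> Tuple[str, int]:
    """Map a status line to (Measure Value, Numeric Value).

    Assumes the wording 'Hinted handoff is running' / 'Hinted handoff is
    not running'; verify against the output of your Cassandra version.
    """
    text = status_line.strip().lower()
    if "not running" in text:      # check the negative form first
        return ("No", 0)
    if "running" in text:
        return ("Yes", 1)
    raise ValueError("unrecognized status line: %r" % status_line)


print(handoff_measure("Hinted handoff is running"))
print(handoff_measure("Hinted handoff is not running"))
```

Checking for the negative phrase first matters, because the positive phrase is a substring of it.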

Target of the test : A Cassandra Database

Agent deploying the test : An external/remote agent.

Outputs of the test : One set of results for the target Cassandra Database node being monitored.

Configurable parameters for the test
Parameters Description

Test Period

How often the test should be executed.

Host

The host for which the test is to be configured.

Port

The port on which the specified host listens. By default, this is 9042.

JMX Remote Port

Here, specify the port at which JMX listens for requests from remote hosts. Ensure that you specify the same port that you configured in the management.properties file in the <JAVA_HOME>\jre\lib\management folder used by the target application.

JMX User and JMX Password

If JMX requires authentication only (but no security), then ensure that the user and password parameters are configured with the credentials of a user with read-write access to JMX. To know how to create this user, refer to Configuring the eG Agent to Support JMX Authentication.

Confirm Password

Confirm the Password by retyping it in this text box.

Measurements made by the test
Measurement Description Measurement Unit Interpretation

Is hinted handoff enabled?

Indicates whether or not hinted handoff is enabled.

 

The values that this measure reports and its corresponding numeric values are mentioned in the table below:

Measure Value Numeric Value
No 0
Yes 1

Note:

By default, this measure reports the above-mentioned Measure Values to indicate whether or not hinted handoff is enabled. However, in the graph of this measure, the same will be represented using the numeric equivalents only, that is, 0 or 1.

Hints in progress

Indicates the number of hints that are currently active for replay.

Number

Ideally, the value of this measure should be 0.

Total hints

Indicates the total number of hints.

Number

Ideally, the value of this measure should be 0.

Though hints are part of Cassandra's failure management, a large number of hints logged within a short span may indicate a problem that is being ignored. A sudden or gradual increase in the value of this measure indicates network or availability issues.
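As an illustration of spotting a sudden increase in the Total hints measure, the sketch below compares consecutive samples against a threshold. The hint_growth_alerts function, the sample values, and the threshold of 100 are all hypothetical; a suitable threshold depends on the write load of the cluster being monitored.

```python
from typing import List, Sequence


def hint_growth_alerts(samples: Sequence[int], max_increase: int) -> List[int]:
    """Return the indices of samples that rose by more than max_increase
    compared with the sample immediately before them."""
    return [
        i
        for i in range(1, len(samples))
        if samples[i] - samples[i - 1] > max_increase
    ]


# Hypothetical Total hints readings from six consecutive test executions.
total_hints = [0, 0, 2, 150, 400, 410]
alerts = hint_growth_alerts(total_hints, max_increase=100)
print(alerts)  # [3, 4]
```

A gradual increase would not trip this per-interval check; comparing the latest sample with one taken several intervals earlier is one way to catch that case as well.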