Ignite Cache Rebalance Test

When a new node joins the cluster, some of the partitions are relocated to the new node so that the data remains distributed equally in the cluster. This process is called cache rebalancing. If an existing node permanently leaves the cluster and backups are not configured, you lose the partitions stored on this node. When backups are configured, one of the backup copies of the lost partitions becomes a primary partition and the rebalancing process is initiated.
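The relocation step described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the class `RebalanceSketch`, the method `join`, and the node names are hypothetical, and the "take from the most loaded node" rule is a simplification, not the affinity function that Apache Ignite actually uses.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of partition relocation when a node joins (hypothetical, not Ignite code).
final class RebalanceSketch {

    // Adds newNode (assumed not to be present yet) and moves partitions from the
    // most loaded nodes to it until it owns floor(total / nodeCount) partitions.
    // Returns the number of partitions that were relocated.
    static int join(Map<String, List<Integer>> owners, String newNode) {
        List<Integer> target = new ArrayList<>();
        owners.put(newNode, target);

        int total = 0;
        for (List<Integer> parts : owners.values()) {
            total += parts.size();
        }
        int fairShare = total / owners.size();

        int moved = 0;
        while (target.size() < fairShare) {
            // Pick the existing node that currently owns the most partitions.
            List<Integer> donor = null;
            for (List<Integer> parts : owners.values()) {
                if (parts != target && (donor == null || parts.size() > donor.size())) {
                    donor = parts;
                }
            }
            // Relocate the donor's last partition to the new node.
            target.add(donor.remove(donor.size() - 1));
            moved++;
        }
        return moved;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> owners = new LinkedHashMap<>();
        owners.put("node-A", new ArrayList<>(List.of(0, 1, 2, 3, 4, 5)));
        owners.put("node-B", new ArrayList<>(List.of(6, 7, 8, 9, 10, 11)));
        int moved = join(owners, "node-C");
        System.out.println(moved + " partitions relocated: " + owners);
    }
}
```

In this model, a third node joining two nodes that each hold six partitions receives four partitions, so every node ends up with an equal share.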

The rebalancing process is key to keeping the cluster nodes balanced and ensuring that data is properly spread across the nodes in the cluster. It is therefore important to monitor the rebalancing process to ensure that it is working efficiently and that any issues are addressed before they affect application performance.

This test monitors the rebalancing process and reports key statistics such as rebalancing speed, the number of keys already rebalanced, and the number of partitions being rebalanced. These statistics help administrators assess the efficiency of the rebalancing process.

Target of the test : Apache Ignite Server

Agent deploying the test : An internal or external agent

Outputs of the test : One set of results for each Apache Ignite Server

Configurable parameters for the test

Parameter

Description

Test period

Specify how often the test should be executed.

Host

Enter the IP address of the Apache Ignite server to be monitored.

Port

Enter the port number on which the JMX connector listens for incoming connection requests.

JMX Remote Port

In this text box, enter the port on which the JMX connector listens. The JMX connector listens on port 8686 by default. If it listens on a different port in your environment, specify that port here.

JMX User

Specify the name of the user who is authorized to use JMX.

JMX Password

Specify the password for the authorized user.

Confirm Password

Confirm the password by retyping it here.
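As a rough illustration of how the Host, JMX Remote Port, JMX User, and JMX Password values fit together, the sketch below builds a standard JMX-over-RMI service URL and a credentials map using the JDK's `javax.management.remote` API. The class `JmxTarget`, its method names, and the sample host and credentials are hypothetical. That the monitored server exposes JMX over RMI at the `/jmxrmi` path is an assumption, and this is not necessarily how the test itself connects.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import java.util.HashMap;
import java.util.Map;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper that turns the test parameters into JMX connection settings.
final class JmxTarget {

    // Builds the conventional JMX-over-RMI URL for a host and JMX remote port.
    static JMXServiceURL serviceUrl(String host, int jmxRemotePort) {
        try {
            return new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":" + jmxRemotePort + "/jmxrmi");
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Invalid host or port", e);
        }
    }

    // Packs the JMX user and password in the form expected by JMXConnectorFactory.
    static Map<String, Object> environment(String user, String password) {
        Map<String, Object> env = new HashMap<>();
        env.put(JMXConnector.CREDENTIALS, new String[] {user, password});
        return env;
    }

    // Opens the connection; this needs a reachable server, so main does not call it.
    static JMXConnector connect(String host, int port, String user, String password)
            throws IOException {
        return JMXConnectorFactory.connect(serviceUrl(host, port), environment(user, password));
    }

    public static void main(String[] args) {
        // 8686 is the default JMX port mentioned above; the host is a placeholder.
        System.out.println(serviceUrl("192.168.10.1", 8686));
    }
}
```

If the connection attempt fails with an authentication error, the JMX User and JMX Password values are the first things to recheck.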

Measurements made by the test

Measurement

Description

Measurement Unit

Interpretation

Rebalance clearing partitions left

Indicates the total number of partitions that must be cleared before the rebalancing process can start.

Number

This gives an indication of the time left before rebalancing starts. If many partitions must be cleared before the rebalancing process can start, the overall time taken by the process will be high.

Number of already rebalanced keys

Indicates the total number of cache keys that were already rebalanced during the last few rebalancing sessions.

Number

If a large percentage of keys has already been rebalanced, the rebalancing process should complete quickly. If the process still takes a long time, investigate the cause.

Estimated rebalancing speed

Indicates the average speed of data transfer between the nodes during the rebalancing process.

MB/Sec

If the rebalancing speed trends upwards over a number of consecutive measurements, it might be a cause for concern.

Estimated rebalancing speed in keys

Indicates the average number of keys transferred per second between the nodes.

Number

Rebalancing partitions on current node

Indicates the number of partitions on the current node that are being rebalanced.

Number

If there are too many partitions being rebalanced on a given node, the process might be slow.

Rebalancing start time

Indicates the time at which rebalancing of local partitions started for the cache. The value is 0 if the local partitions do not participate in rebalancing.

Number

If rebalancing has been running for a very long time, the start time helps administrators determine exactly how long the process has been in progress.
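The start time can be turned into an elapsed duration along the following lines. This is a minimal sketch that assumes the reported value is a timestamp in milliseconds since the Unix epoch, which the description above does not state. The class `RebalanceElapsed` and its method are hypothetical names. A value of 0 is treated as "not participating in rebalancing", as described for this measure.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;

// Hypothetical helper: converts the reported start time into an elapsed duration.
final class RebalanceElapsed {

    // Returns empty when startMillis is 0 (local partitions are not rebalancing);
    // otherwise returns the time elapsed between the start time and 'now'.
    // Assumption: startMillis is expressed in epoch milliseconds.
    static Optional<Duration> elapsed(long startMillis, Instant now) {
        if (startMillis == 0) {
            return Optional.empty();
        }
        return Optional.of(Duration.between(Instant.ofEpochMilli(startMillis), now));
    }

    public static void main(String[] args) {
        // Pretend rebalancing started 90 seconds ago.
        long start = Instant.now().minusSeconds(90).toEpochMilli();
        elapsed(start, Instant.now())
            .ifPresent(d -> System.out.println("Rebalancing for " + d.getSeconds() + " s"));
    }
}
```

Comparing the elapsed duration against a locally chosen threshold is one way to flag a rebalancing session that has run unusually long.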