Ignite Cluster Test

A node is assigned one of two roles: server node or client node. Server nodes are the workhorses of the cluster; they cache data, execute compute tasks, and so on. Client nodes join the topology as regular nodes, but they do not store data. Client nodes are used to stream data into the cluster and to execute user queries.
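To make the distinction between the two roles concrete, the Python sketch below models them. It is a hypothetical illustration, not Ignite's API: the Role, Node, and count_roles names are invented here. It shows that only server nodes hold data, and how the server/client split reported by the "Total server nodes" and "Total client nodes" measures could be tallied from a list of nodes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Tuple


class Role(Enum):
    SERVER = "server"
    CLIENT = "client"


@dataclass(frozen=True)
class Node:
    # Hypothetical model of a cluster member; not an Ignite class.
    node_id: str
    role: Role

    @property
    def stores_data(self) -> bool:
        # Server nodes cache data; client nodes join the topology but store none.
        return self.role is Role.SERVER


def count_roles(nodes: Iterable[Node]) -> Tuple[int, int]:
    """Return (server_count, client_count) for the given nodes."""
    servers = clients = 0
    for node in nodes:
        if node.role is Role.SERVER:
            servers += 1
        else:
            clients += 1
    return servers, clients


topology = [
    Node("n1", Role.SERVER),
    Node("n2", Role.SERVER),
    Node("n3", Role.CLIENT),
]
print(count_roles(topology))                           # (2, 1)
print([n.node_id for n in topology if n.stores_data])  # ['n1', 'n2']
```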

The cluster is the core of Apache Ignite: it provides the key benefits of distributed computing, such as load balancing, resilience, failover, and high availability. If there are problems with communication between cluster nodes or with job execution, Ignite cannot serve applications optimally. It is therefore important to monitor the Ignite cluster so that issues are detected before they affect cluster performance.

This test monitors the Ignite cluster and gathers key statistics related to jobs, CPU load, wait times, and messaging, which provide insight into its health. These statistics help administrators improve cluster performance, identify issues and failures, and add the right infrastructure at the right time, before application performance is affected.

Target of the test : Apache Ignite Server

Agent deploying the test : An internal or external agent

Outputs of the test : One set of results for each Apache Ignite Server

Configurable parameters for the test

Parameter

Description

Test period

How often the test should be executed.

Host

Enter the IP address of the Apache Ignite cluster.

Port

Enter the port number on which the JMX connector listens for incoming connection requests.

JMX Remote Port

In this text box, enter the port on which the JMX connector listens for remote connections. By default, the JMX connector listens on port 8686; if it listens on a different port in your environment, specify that port here.

JMX User

Specify the name of a user who is authorized to access JMX.

JMX Password

Specify the password for the authorized user.

Confirm Password

Confirm the password by retyping it here.
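The checks implied by these parameters can be sketched in Python as below. This is a minimal, hypothetical illustration (the IgniteTestConfig class and its field names are invented here): it validates the host, the JMX port, and the password confirmation, and builds a connection address. It assumes the Ignite JVM exposes JMX through the JDK's standard RMI connector, whose service URL takes the form service:jmx:rmi:///jndi/rmi://<host>:<port>/jmxrmi; your deployment may differ. Test period and Port are omitted to keep the sketch short.

```python
from dataclasses import dataclass

# Default JMX port as stated in this document; adjust for your environment.
DEFAULT_JMX_PORT = 8686


@dataclass
class IgniteTestConfig:
    # Hypothetical container for the parameters listed above.
    host: str
    jmx_remote_port: int = DEFAULT_JMX_PORT
    jmx_user: str = ""
    jmx_password: str = ""
    confirm_password: str = ""

    def validate(self) -> None:
        if not self.host:
            raise ValueError("Host must not be empty")
        if not 0 < self.jmx_remote_port < 65536:
            raise ValueError(f"Invalid JMX port: {self.jmx_remote_port}")
        if self.jmx_password != self.confirm_password:
            raise ValueError("JMX Password and Confirm Password do not match")

    def service_url(self) -> str:
        # Standard JMX-over-RMI URL used by the JDK's built-in management agent.
        # Assumes a hostname or IPv4 address (IPv6 literals would need brackets).
        return f"service:jmx:rmi:///jndi/rmi://{self.host}:{self.jmx_remote_port}/jmxrmi"


cfg = IgniteTestConfig(host="10.0.0.5", jmx_user="monitor",
                       jmx_password="secret", confirm_password="secret")
cfg.validate()
print(cfg.service_url())  # service:jmx:rmi:///jndi/rmi://10.0.0.5:8686/jmxrmi
```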

Measurements made by the test

Measurement

Description

Measurement Unit

Interpretation

Active baseline nodes

Indicates the number of nodes in the cluster's baseline topology that are currently active.

Number

The cluster's capacity to hold data depends on the number of nodes and their configuration. If too few nodes are available, the cluster may not be able to meet its data requirements.

Average active jobs

Indicates the average number of active jobs executing at any given time on each node.

Number

The number of active jobs determines how much of the cluster's CPU capacity is occupied. Ensure that the number of active jobs is optimal and that enough CPU capacity remains available for new jobs.

Average cancelled jobs

Indicates the average number of jobs cancelled on each node.

Number

 

Average CPU load

Indicates the average CPU load across all nodes, computed from all the metrics kept in history.

Percentage

The number of active jobs determines how much of the cluster's CPU capacity is occupied. Ensure that the number of active jobs is optimal and that enough CPU capacity remains available for new jobs, so that they do not have to wait too long.

Average job execute time

Indicates the average time a job takes to execute on any node in the cluster.

Seconds

This is an important metric for understanding how many jobs can be scheduled and for predicting job completion times.

Average job wait time

Indicates the average time a job waits in the queue before it is picked up for execution.

Seconds

A high wait time indicates that more compute capacity is required; adding nodes can help.

Average rejected jobs

Indicates the average number of jobs rejected across all nodes during collision resolution operations.

Number

 

Average waiting jobs

Indicates the average number of jobs waiting per node across all nodes in the cluster.

Number

A high number of waiting jobs indicates that more compute capacity is required; adding nodes can help.

Messages received rate

Indicates the rate at which messages are received by nodes in the cluster.

Messages/Sec

 

If this rate declines steadily over successive measurements, the communication between nodes needs to be investigated and improved.

Messages sent rate

Indicates the rate at which messages are sent by nodes in the cluster.

Messages/Sec

Idle time percentage

Indicates the percentage of time a node spends idle rather than executing jobs.

Percentage

Consistently high idle time means the cluster has more capacity than required. Administrators can consider removing nodes and using them for other purposes.

Total server nodes

Indicates the total number of server nodes in the cluster.

Number

 

Total client nodes

Indicates the total number of client nodes in the cluster.

Number
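Several of the measures above are averages or rates derived from raw per-node readings. The Python sketch below shows one plausible way to derive them; it is an illustration under stated assumptions, not how the test or Ignite actually computes them. It assumes that cluster-wide averages are the mean of per-node history means, that message rates come from two readings of a cumulative message counter, and that idle time percentage is idle time divided by idle plus busy time. All function names, and the thresholds in needs_more_compute, are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional


def cluster_average(history_by_node: Dict[str, List[float]]) -> float:
    """Average each node's history, then average the per-node means."""
    per_node = [mean(samples) for samples in history_by_node.values() if samples]
    return mean(per_node) if per_node else 0.0


def rate_per_second(prev_count: int, curr_count: int, interval_s: float) -> Optional[float]:
    """Messages/sec between two readings of a cumulative counter."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    delta = curr_count - prev_count
    if delta < 0:
        # Counter went backwards (e.g. node restart): no valid rate for this interval.
        return None
    return delta / interval_s


def is_steadily_declining(rates: List[float], min_points: int = 3) -> bool:
    """True if each of the last min_points rates is lower than the one before."""
    recent = rates[-min_points:]
    if len(recent) < min_points:
        return False
    return all(later < earlier for earlier, later in zip(recent, recent[1:]))


def idle_time_percentage(idle_ms: float, busy_ms: float) -> float:
    """Share of time spent idle rather than executing jobs, in percent."""
    total = idle_ms + busy_ms
    return 100.0 * idle_ms / total if total else 0.0


def needs_more_compute(avg_wait_s: float, avg_waiting_jobs: float,
                       max_wait_s: float = 5.0, max_waiting: float = 10.0) -> bool:
    # Hypothetical thresholds: tune them to your own workload.
    return avg_wait_s > max_wait_s or avg_waiting_jobs > max_waiting


cpu_history = {"n1": [40.0, 60.0], "n2": [20.0]}
print(cluster_average(cpu_history))               # 35.0
print(rate_per_second(1_000, 4_000, 60))          # 50.0
print(is_steadily_declining([80.0, 65.0, 50.0]))  # True
print(idle_time_percentage(750, 250))             # 75.0
print(needs_more_compute(avg_wait_s=8.0, avg_waiting_jobs=2))  # True
```

Requiring several consecutive declining readings, rather than reacting to a single low sample, reduces false alarms caused by short lulls in traffic.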