Ignite Cluster Test

Each node in an Ignite cluster is assigned one of two roles: server node or client node. Server nodes are the workhorses of the cluster; they cache data, execute compute tasks, and so on. Client nodes join the topology as regular nodes but do not store data. They are used to stream data into the cluster and execute user queries.

The cluster is the core of Apache Ignite, as it provides the key benefits of distributed computing such as load balancing, resilience, failover, and high availability. If there is a problem with communication between cluster nodes, with cluster job execution, or similar, Ignite cannot serve the application optimally. It is therefore important to monitor the Ignite cluster so that issues are highlighted before they affect cluster performance.

This test monitors the Ignite cluster and gathers key statistics related to jobs, CPU load, waiting time, etc., which provide key insights into its health. These statistics help administrators improve cluster performance, identify issues and failures, and add the right infrastructure at the right time, before application performance is affected.

Target of the test : Apache Ignite Server

Agent deploying the test : An internal or external agent

Outputs of the test : One set of results for each Apache Ignite Server

Configurable parameters for the test
Parameter	Description
Test period	How often the test should be executed.
Host	Enter the IP address of the Apache Ignite cluster.
Port	Enter the port number on which the JMX connector listens for incoming connection requests.
JMX Remote Port	In this text box, enter the port on which the JMX connector listens. The JMX connector listens on port 8686 by default. If it listens on a different port in your environment, specify that port instead.
JMX User	Specify the name of the user who is authorized to use JMX.
JMX Password	Specify the password for the authorized user.
Confirm Password	Confirm the password by retyping it here.
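
The Host and JMX Remote Port values above are what a JMX client combines into a connection address. The following is a minimal sketch, assuming the standard RMI-based JMX connector and the default port 8686 mentioned above. The function name jmx_service_url is hypothetical and is not part of the test or of Apache Ignite; it only illustrates how the two parameters fit together.

```python
def jmx_service_url(host: str, port: int = 8686) -> str:
    """Compose an RMI-based JMX service URL from the Host and JMX Remote Port values.

    Assumption: the target exposes the standard RMI JMX connector under /jmxrmi.
    """
    if not host:
        raise ValueError("host must not be empty")
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return f"service:jmx:rmi:///jndi/rmi://{host}:{port}/jmxrmi"


if __name__ == "__main__":
    # Example values only; substitute the address of your own Ignite server.
    print(jmx_service_url("192.168.10.15"))
    print(jmx_service_url("192.168.10.15", 49112))
```

If the connection fails with the composed address, the port value is the first thing to check, since the connector may not be listening on the default port in your environment.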

Measurements made by the test
Measurement	Description	Measurement Unit	Interpretation
Active base line nodes	Indicates the total number of nodes that are currently in the cluster baseline topology.	Number	The capacity of the cluster to hold data depends on the number of nodes and on the node configuration. If too few nodes are available, the data requirements might not be met.
Average active jobs	Indicates the average number of active jobs executing at any given time on each node.	Number	The number of active jobs dictates how much of the cluster's CPU capacity is occupied. Ensure that the number of active jobs is optimal and that enough CPU capacity is available for new jobs.
Average cancelled jobs	Indicates the average number of jobs cancelled on each node.	Number
Average CPU load	Indicates the CPU load averaged over all metrics kept in history, across all nodes.	Percentage	The number of active jobs dictates how much of the cluster's CPU capacity is occupied. Ensure that the number of active jobs is optimal and that enough CPU capacity is available, so that new jobs do not have to wait too long.
Average job executes time	Indicates the average time a job takes to execute on any node in the cluster.	Seconds	This is an important metric for understanding how many jobs can be scheduled and for predicting job timings.
Average job waits time	Indicates the average time a job waits in the queue before it is picked up for execution.	Seconds	If the wait time is too high, more compute capacity is required; adding new nodes can help.
Average rejected job	Indicates the average number of jobs rejected across all nodes during collision resolution operations.	Number
Average waiting jobs	Indicates the average number of jobs waiting per node across all nodes in the cluster.	Number	If the number of waiting jobs is too high, more compute capacity is required; adding new nodes can help.
Messages received rate	Indicates the rate at which messages are received by nodes in the cluster.	Messages/Sec	If the rate declines over a number of consecutive measurements, communication between the nodes needs to be improved.
Messages sent rate	Indicates the rate at which messages are sent by nodes in the cluster.	Messages/Sec
Idle time percentage	Indicates the percentage of time any given node is idle rather than executing jobs.	Percentage	Too much idle time means the system has more capacity than required. Administrators can consider removing nodes and using them for other purposes.
Total server nodes	Indicates the total number of server nodes in the cluster.	Number
Total client nodes	Indicates the total number of client nodes in the cluster.	Number
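
The interpretation guidance in the table above can be turned into simple checks on collected readings. The following is a minimal sketch, not part of the test itself. The function names and the threshold values (a 5-sample window, a 2-second wait limit, an 80% idle limit) are assumptions chosen for illustration, and they should be tuned to your own environment.

```python
from typing import Sequence


def is_declining(samples: Sequence[float], window: int = 5) -> bool:
    """Return True if each of the last `window` samples is lower than the one before it.

    Intended for the Messages received rate measurement; `window` is a
    hypothetical choice for "a number of measurements".
    """
    recent = list(samples)[-window:]
    if len(recent) < window:
        return False  # not enough history to call it a trend
    return all(later < earlier for earlier, later in zip(recent, recent[1:]))


def capacity_hint(avg_wait_s: float, idle_pct: float,
                  max_wait_s: float = 2.0, max_idle_pct: float = 80.0) -> str:
    """Map Average job waits time and Idle time percentage to a capacity hint.

    The two thresholds are illustrative assumptions, not product defaults.
    """
    if avg_wait_s > max_wait_s:
        return "add nodes"                # jobs queue too long: more compute needed
    if idle_pct > max_idle_pct:
        return "consider removing nodes"  # more capacity than required
    return "ok"


if __name__ == "__main__":
    received_rates = [120.0, 110.0, 100.0, 90.0, 80.0]  # made-up sample readings
    print(is_declining(received_rates))
    print(capacity_hint(avg_wait_s=5.0, idle_pct=10.0))
```

The wait-time check runs before the idle-time check, so a cluster that is both queueing jobs and showing idle nodes is flagged for added capacity first. That ordering is a design choice of this sketch.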