Ignite Cluster Test
Each node is assigned one of two roles: server node or client node. Server nodes are the workhorses of the cluster: they cache data, execute compute tasks, and so on. Client nodes join the topology as regular nodes but do not store data; they are used to stream data into the cluster and execute user queries.
A cluster is the core of Apache Ignite because it provides the key benefits of distributed computing, such as load balancing, resilience, failover, and high availability. If there is any issue with communication between cluster nodes, cluster job execution, and so on, Ignite will not be able to serve the application optimally. It is therefore important to monitor the Ignite cluster so that issues are highlighted before they affect cluster performance.
This test monitors the Ignite cluster and gathers key statistics related to jobs, CPU load, waiting time, and so on, which provide key insights into its health. These statistics help administrators improve cluster performance, identify issues and failures, and add the right infrastructure at the right time, before application performance is affected.
Target of the test : Apache Ignite Server
Agent deploying the test : An internal or external agent
Outputs of the test : One set of results for each Apache Ignite Server
Parameter | Description |
---|---|
Test period | Specify how often the test should be executed. |
Host | Enter the IP address of the Apache Ignite cluster. |
Port | Enter the port number on which the JMX connector listens for incoming connection requests. |
JMX Remote Port | Specify the remote port on which the JMX connector listens. The JMX connector listens on port 8686 by default. If it listens on a different port in your environment, specify that port here. |
JMX User | Specify the name of the user who is authorized to use JMX. |
JMX Password | Specify the password of the authorized user. |
Confirm Password | Confirm the password by retyping it here. |
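
To show how the Host, JMX Remote Port, JMX User, and JMX Password parameters fit together, the following is a minimal sketch of opening a JMX connection with the standard `javax.management.remote` API. It assumes the Ignite JVM exposes JMX through the JDK's built-in RMI connector (the usual `/jmxrmi` path); the class and method names (`IgniteJmxSketch`, `serviceUrl`, `credentials`, `countMBeans`) are hypothetical and are not part of the test itself.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper; assumes the JDK's standard RMI-based JMX connector.
final class IgniteJmxSketch {

    /** Builds the conventional RMI service URL from the Host and JMX Remote Port values. */
    static String serviceUrl(String host, int jmxPort) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + jmxPort + "/jmxrmi";
    }

    /** Builds the connection environment that carries the JMX User and JMX Password. */
    static Map<String, Object> credentials(String user, String password) {
        Map<String, Object> env = new HashMap<>();
        env.put(JMXConnector.CREDENTIALS, new String[] {user, password});
        return env;
    }

    /** Connects, reads the number of registered MBeans as a reachability check, and disconnects. */
    static int countMBeans(String host, int jmxPort, String user, String password)
            throws IOException {
        JMXServiceURL url = new JMXServiceURL(serviceUrl(host, jmxPort));
        try (JMXConnector connector = JMXConnectorFactory.connect(url, credentials(user, password))) {
            return connector.getMBeanServerConnection().getMBeanCount();
        }
    }
}
```

Calling `IgniteJmxSketch.countMBeans("192.168.10.1", 8686, "admin", "secret")` with values that match your environment is a quick way to check that the port and credentials entered for the test are accepted before the test is configured; the address and credentials shown here are placeholders.
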
Measurement | Description | Measurement Unit | Interpretation |
---|---|---|---|
Active base line nodes | Indicates the total number of nodes that are currently in the cluster baseline topology. | Number | The capacity of the cluster to hold data depends on the number of nodes and on the node configuration. If too few nodes are available, the data requirements might not be met. |
Average active jobs | Indicates the average number of active jobs executing at any given time on each node. | Number | The number of active jobs determines how much of the cluster's CPU capacity is occupied. Ensure that the number of active jobs is optimal and that enough CPU capacity is available for new jobs. |
Average cancelled jobs | Indicates the average number of jobs cancelled on each node. | Number | |
Average CPU load | Indicates the CPU load values averaged over all metrics kept in history across all nodes. | Percentage | The number of active jobs determines how much of the cluster's CPU capacity is occupied. Ensure that the number of active jobs is optimal and that enough CPU capacity is available, so that new jobs do not have to wait too long. |
Average job executes time | Indicates the average time a job takes to execute on any node in the cluster. | Seconds | This is an important metric for understanding how many jobs can be scheduled and for predicting job timings. |
Average job waits time | Indicates the average time a job waits in the queue before it is picked up for execution. | Seconds | If the wait time is too high, more compute capacity is required; adding new nodes can help. |
Average rejected job | Indicates the average number of jobs rejected across all nodes during collision resolution operations. | Number | |
Average waiting jobs | Indicates the average number of jobs waiting per node across all nodes in the cluster. | Number | If the number of waiting jobs is too high, more compute capacity is required; adding new nodes can help. |
Messages received rate | Indicates the rate at which messages are received by nodes in the cluster. | Messages/Sec | A rate that declines steadily across successive measurements points to a communication problem between nodes that should be investigated. |
Messages sent rate | Indicates the rate at which messages are sent by nodes in the cluster. | Messages/Sec | |
Idle time percentage | Indicates the percentage of time any given node is idle as opposed to executing jobs. | Percentage | If idle time is too high, the cluster has more capacity than it needs. Administrators can consider removing nodes and using them for other purposes. |
Total server nodes | Indicates the total number of server nodes in the cluster. | Number | |
Total client nodes | Indicates the total number of client nodes in the cluster. | Number | |
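
The interpretation guidance above can be sketched as simple rules. The example below is illustrative only: it derives a Messages/Sec rate from two counter samples and turns the wait time, waiting jobs, and idle time measures into capacity hints. The class name and the threshold values (`MAX_WAIT_SECONDS`, `MAX_WAITING_JOBS`, `MAX_IDLE_PERCENT`) are hypothetical assumptions, not values used or recommended by the test; choose thresholds that suit your own workload.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the interpretation column; thresholds are assumptions.
final class ClusterMetricHints {
    static final double MAX_WAIT_SECONDS = 5.0;   // assumed limit for average job wait time
    static final double MAX_WAITING_JOBS = 10.0;  // assumed limit for average waiting jobs
    static final double MAX_IDLE_PERCENT = 80.0;  // assumed limit for idle time percentage

    /** Converts two cumulative message counts into a per-second rate. */
    static double ratePerSecond(long previousCount, long currentCount, double intervalSeconds) {
        if (intervalSeconds <= 0) {
            throw new IllegalArgumentException("intervalSeconds must be positive");
        }
        return (currentCount - previousCount) / intervalSeconds;
    }

    /** Returns capacity hints based on the wait time, waiting jobs, and idle time measures. */
    static List<String> hints(double avgJobWaitSeconds, double avgWaitingJobs, double idlePercent) {
        List<String> result = new ArrayList<>();
        if (avgJobWaitSeconds > MAX_WAIT_SECONDS || avgWaitingJobs > MAX_WAITING_JOBS) {
            // Jobs are queuing: more compute capacity is likely required.
            result.add("Consider adding nodes: jobs are waiting too long or queuing up.");
        }
        if (idlePercent > MAX_IDLE_PERCENT) {
            // Nodes are mostly idle: the cluster may be over-provisioned.
            result.add("Consider removing nodes: the cluster is idle most of the time.");
        }
        return result;
    }
}
```

For example, a message counter that grows from 1000 to 1600 over a 300-second test period gives `ratePerSecond(1000, 1600, 300.0)`, that is, 2 messages per second.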