Ignite Always Failover SPI Test

Apache Ignite provides a high degree of fault tolerance and supports automatic job failover. If a node crashes or a job fails on a given node, the affected jobs are automatically transferred to other available nodes for re-execution. The Always Failover SPI (Service Provider Interface) ensures that when a job from a compute task fails, an attempt is made to reroute the failed job to a node that has not executed any other job from the same task. If no such node is available, an attempt is made to reroute the failed job to one of the nodes that may be running other jobs from the same task. If neither attempt succeeds, the job is not failed over.
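The rerouting order described above can be illustrated with a small, self-contained Java model. This is a sketch under stated assumptions, not Ignite's AlwaysFailoverSpi implementation: node IDs are plain strings, the node on which the job failed is assumed never to be chosen again, and among equally eligible nodes the sketch simply picks the first one in list order (Ignite itself may choose differently among candidates). The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

/**
 * Simplified model of the Always Failover rerouting order.
 * Hypothetical illustration only; NOT Ignite's AlwaysFailoverSpi source.
 */
final class FailoverTargetSketch {

    private FailoverTargetSketch() {}

    /**
     * @param liveNodes    nodes currently in the topology
     * @param failedNode   node on which the job just failed (assumed never reused)
     * @param busyWithTask nodes that have executed jobs from the same task
     * @return the node to reroute to, or empty if the job is not failed over
     */
    static Optional<String> pickTarget(List<String> liveNodes,
                                       String failedNode,
                                       Set<String> busyWithTask) {
        // First preference: a node that has not executed any job from this task.
        for (String node : liveNodes) {
            if (!node.equals(failedNode) && !busyWithTask.contains(node)) {
                return Optional.of(node);
            }
        }
        // Second preference: any other node, even one running jobs from this task.
        for (String node : liveNodes) {
            if (!node.equals(failedNode)) {
                return Optional.of(node);
            }
        }
        // No candidate left: the job is not failed over.
        return Optional.empty();
    }
}
```

For example, with nodes A, B and C, a job failing on A while B already runs jobs from the same task is rerouted to C; if only A and B exist, it falls back to B; if A is the only node, the job is not failed over.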

The Always Failover SPI is responsible for automatic failover and therefore needs to be monitored to make sure it is working as expected.

This test monitors the Always Failover SPI to verify that jobs are rerouted to other nodes when failover occurs. If jobs are failing but none are being failed over, administrators may need to check whether something is wrong with the SPI.
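The check described above (jobs failing while no jobs are being failed over) can be expressed as a comparison between two consecutive polls. This is a minimal sketch under assumptions: both values are treated as cumulative counters, the failed-jobs count is assumed to come from some other source (it is not one of this test's measurements), and a counter that drops below its previous value is assumed to mean the node restarted and the counter was reset. The class and method names are hypothetical.

```java
/**
 * Hypothetical health check between two polls of cumulative counters.
 * Not an eG or Ignite API.
 */
final class FailoverHealthSketch {

    private FailoverHealthSketch() {}

    /** Increase of a cumulative counter between polls; a drop is treated as a reset. */
    static long delta(long before, long after) {
        return after >= before ? after - before : after;
    }

    /** True when jobs failed during the interval but none of them were failed over. */
    static boolean needsInvestigation(long failedBefore, long failedAfter,
                                      long failoverBefore, long failoverAfter) {
        long newFailures = delta(failedBefore, failedAfter);
        long newFailovers = delta(failoverBefore, failoverAfter);
        return newFailures > 0 && newFailovers == 0;
    }
}
```

A check like this only flags a possible problem; some failures are legitimately not failed over, for example when no other node is available.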

Target of the test : Apache Ignite Server

Agent deploying the test : An internal or external agent

Outputs of the test : One set of results for each Apache Ignite Server

Configurable parameters for the test
Parameter	Description
Test period	How often the test should be executed.
Host	Enter the IP address of the Apache Ignite cluster.
Port	Enter the port number on which the JMX connector listens for incoming connection requests.
JMX Remote Port	Enter the port on which the JMX connector listens. By default, the JMX connector listens on port 8686; if it listens on a different port in your environment, specify that port here.
JMX User	Specify the name of the user who is authorized to use JMX.
JMX Password	Specify the password for the authorized user.
Confirm Password	Confirm the password by retyping it here.
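The Host, JMX Remote Port, JMX User and JMX Password parameters together describe a standard JMX remote connection. The following Java sketch shows how such settings typically map onto the javax.management.remote API. It assumes the server exposes the common RMI connector path /jmxrmi; the class and method names are hypothetical and do not reflect how the monitoring agent is actually implemented.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import java.util.HashMap;
import java.util.Map;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

/** Hypothetical mapping of the test parameters onto a JMX connection. */
final class IgniteJmxSettingsSketch {

    private IgniteJmxSettingsSketch() {}

    /** Builds the standard RMI connector URL (assumes the /jmxrmi path). */
    static JMXServiceURL serviceUrl(String host, int jmxRemotePort) {
        String url = "service:jmx:rmi:///jndi/rmi://" + host + ":" + jmxRemotePort + "/jmxrmi";
        try {
            return new JMXServiceURL(url);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Invalid host or port: " + url, e);
        }
    }

    /** Puts the JMX user and password into the connector environment, if a user is set. */
    static Map<String, Object> environment(String user, String password) {
        Map<String, Object> env = new HashMap<>();
        if (user != null && !user.isEmpty()) {
            env.put(JMXConnector.CREDENTIALS, new String[] {user, password});
        }
        return env;
    }

    /** Opens the connection; requires a reachable server with JMX remoting enabled. */
    static JMXConnector connect(String host, int port, String user, String password)
            throws IOException {
        return JMXConnectorFactory.connect(serviceUrl(host, port), environment(user, password));
    }
}
```

Once connected, a monitor would read the failover counter from the SPI's MBean. In Ignite, AlwaysFailoverSpiMBean exposes getTotalFailoverJobsCount(); the exact ObjectName under which it is registered depends on the Ignite version and instance name, so it is not hard-coded here.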

Measurements made by the test
Measurement	Description	Measurement Unit	Interpretation
Total failover jobs	Indicates the total number of jobs that were failed over to nodes other than the node on which they were originally executed.	Number	If jobs are failing but no jobs are being failed over, administrators may need to check whether something is wrong with the SPI.