Ignite Always Failover SPI Test

Apache Ignite offers a high degree of fault tolerance and supports automatic job failover. If a node crashes or a job fails on a given node, the job is automatically transferred to another available node for re-execution. The Always Failover SPI (Service Provider Interface) ensures that when a job from a compute task fails, an attempt is first made to reroute the failed job to a node that has not executed any other job from the same task. If no such node is available, an attempt is made to reroute the failed job to one of the nodes that may be running other jobs from the same task. If neither attempt succeeds, the job is not failed over.
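The two-step node selection described above can be illustrated with a small model. This is a simplified sketch, not Ignite's implementation: the class name FailoverRoutingSketch, the method pickFailoverNode and the node names are hypothetical, and the sketch picks the first matching node in list order purely to keep the result deterministic.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical model of the routing rules described in the text above.
final class FailoverRoutingSketch {

    /**
     * Chooses a node for a failed job.
     *
     * @param topology        all currently available nodes
     * @param failedNode      the node on which the job failed
     * @param nodesUsedByTask nodes that have already run jobs of the same task
     * @return the chosen node, or empty if the job cannot be failed over
     */
    static Optional<String> pickFailoverNode(List<String> topology,
                                             String failedNode,
                                             Set<String> nodesUsedByTask) {
        // First attempt: a node that has not executed any job of this task.
        for (String node : topology) {
            if (!node.equals(failedNode) && !nodesUsedByTask.contains(node)) {
                return Optional.of(node);
            }
        }
        // Second attempt: any other node, even one running jobs of this task.
        for (String node : topology) {
            if (!node.equals(failedNode)) {
                return Optional.of(node);
            }
        }
        // Neither attempt succeeded: the job is not failed over.
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> topology = Arrays.asList("node-a", "node-b", "node-c");
        Set<String> used = new HashSet<>(Arrays.asList("node-a", "node-b"));
        System.out.println(pickFailoverNode(topology, "node-a", used));
    }
}
```

In this model, a job that failed on node-a is routed to node-c when node-c has not yet run any job of the task, and to node-b only when no such unused node remains.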

The Always Failover SPI is responsible for automatic failover and must be monitored to confirm that it is working as expected.

This test monitors the Always Failover SPI to verify that jobs are rerouted to other nodes when they fail. If jobs are failing but none are being failed over, administrators may need to investigate whether something is wrong with the SPI.

Target of the test : Apache Ignite Server

Agent deploying the test : An internal or external agent

Outputs of the test : One set of results for each Apache Ignite Server

Configurable parameters for the test

Parameter : Description

Test period : How often the test should be executed.

Host : Enter the IP address of the Apache Ignite cluster.

Port : Enter the port number on which the JMX connector listens for incoming connection requests.

JMX Remote Port : Enter the remote port of the JMX connector. The JMX connector listens on port 8686 by default. If it listens on a different port in your environment, specify that port here.

JMX User : Specify the user name of a user who is authorized to use JMX.

JMX Password : Specify the password for the authorized user.

Confirm Password : Confirm the password by retyping it here.
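To show how the Host, JMX Remote Port, JMX User and JMX Password parameters typically fit together, the sketch below builds a JMX service URL and a credentials map using the standard javax.management.remote classes. It rests on assumptions: the class JmxConnectionSketch, its methods and the sample values are hypothetical, and the URL assumes the Ignite JVM exposes the standard RMI connector under the /jmxrmi path, which may not match your environment.

```java
import java.net.MalformedURLException;
import java.util.HashMap;
import java.util.Map;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper that maps the test parameters to JMX connection settings.
final class JmxConnectionSketch {

    /** Builds the service URL, assuming the standard RMI connector path. */
    static JMXServiceURL serviceUrl(String host, int jmxRemotePort) {
        String url = "service:jmx:rmi:///jndi/rmi://" + host + ":" + jmxRemotePort + "/jmxrmi";
        try {
            return new JMXServiceURL(url);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Invalid JMX host or port: " + url, e);
        }
    }

    /** Builds the environment map that carries the JMX user and password. */
    static Map<String, Object> credentials(String user, String password) {
        Map<String, Object> env = new HashMap<>();
        env.put(JMXConnector.CREDENTIALS, new String[] {user, password});
        return env;
    }

    public static void main(String[] args) {
        // Sample values only: 192.0.2.10 is a documentation address.
        System.out.println(serviceUrl("192.0.2.10", 8686));
    }
}
```

Passing the resulting URL and map to JMXConnectorFactory.connect(url, env) would open the connection. The sketch stops short of that call so that it runs without a live Ignite node.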

Measurements made by the test

Measurement : Total failover jobs

Description : Indicates the total number of jobs that were failed over to nodes other than the node on which they were originally executed.

Measurement Unit : Number

Interpretation : If jobs are failing but none are being failed over, administrators may need to investigate whether something is wrong with the SPI.
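The interpretation above can be expressed as a simple check over two consecutive readings of the measure. This is only a sketch under stated assumptions: the class FailoverHealthSketch and the method needsAttention are hypothetical, and the number of failed jobs in the period is assumed to come from another source, since this test reports only the failover total.

```java
// Hypothetical check that flags the "jobs failing, none failed over" condition.
final class FailoverHealthSketch {

    /**
     * @param failedJobsInPeriod jobs that failed since the previous reading
     *                           (assumed to be obtained from another source)
     * @param previousTotal      earlier reading of Total failover jobs
     * @param currentTotal       latest reading of Total failover jobs
     * @return true if jobs failed but no job was failed over in the period
     */
    static boolean needsAttention(long failedJobsInPeriod,
                                  long previousTotal,
                                  long currentTotal) {
        long failedOver = currentTotal - previousTotal;
        if (failedOver < 0) {
            // The counter went backwards, for example after a restart;
            // treat the current value as the count for this period.
            failedOver = currentTotal;
        }
        return failedJobsInPeriod > 0 && failedOver == 0;
    }

    public static void main(String[] args) {
        System.out.println(needsAttention(5, 10, 10));
    }
}
```

A true result is a prompt to inspect the SPI configuration, not proof of a fault. For example, the SPI may be unable to fail a job over simply because no other node is available.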