Job Templates Test

A job template is a definition and set of parameters for running an Ansible job. Job templates make it easy to run the same job many times without redefining its parameters each time. They also encourage the reuse of Ansible playbook content and collaboration between teams. A job template can be associated with multiple hosts and groups in the Ansible Tower, which enables administrators to run important jobs such as playbook runs, cloud inventory updates, and source control updates across those hosts and groups simultaneously. Predefined job templates thus save time and reduce the workload of administrators. Periodically, administrators should check whether the jobs executed using each template completed successfully, identify any failures and delays, investigate the reasons for them, and fix them, so that job failures do not adversely impact the performance of the Tower. The Job Templates test helps administrators rapidly capture job failures and promptly initiate remedial actions.

This test auto-discovers the job templates in each project in the Ansible Tower, and reports the total number of jobs that were launched using each job template in a particular project. In the process, the test reports the status of the job that was last run using each job template and the time that job took to complete. This enables administrators to identify a job template that causes more job failures and to find out if any job took longer than usual to complete. In addition, this test alerts administrators to the number of hosts and groups that failed during job execution using each job template. These statistics help administrators find out where the real problem lies (i.e., with the job template or with a host/group) and resolve it quickly.

Target of the test : Ansible Tower

Agent deploying the test : A remote agent

Outputs of the test : One set of results for each project:job template pair in the Ansible Tower being monitored.
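The per-descriptor reporting described above can be pictured with a short Python sketch. It is only an illustration under stated assumptions: the record keys (project, template, finished, status, elapsed) and the function name summarize_jobs are hypothetical and are not taken from the eG agent or from the Tower API.

```python
from collections import defaultdict


def summarize_jobs(jobs):
    """Group job records by (project, job template) and summarize each group.

    Each record is a dict with hypothetical keys: 'project', 'template',
    'finished' (a sortable timestamp string), 'status', 'elapsed' (seconds).
    """
    groups = defaultdict(list)
    for job in jobs:
        groups[(job["project"], job["template"])].append(job)

    summary = {}
    for (project, template), records in groups.items():
        # The "last" job is the one with the latest finish timestamp.
        last = max(records, key=lambda j: j["finished"])
        summary[f"{project}:{template}"] = {
            "total_jobs": len(records),
            "last_job_status": last["status"],
            "last_job_elapsed": last["elapsed"],
        }
    return summary


# Made-up sample data, for illustration only.
sample_jobs = [
    {"project": "web", "template": "deploy", "finished": "2024-01-01T10:00:00",
     "status": "successful", "elapsed": 42.5},
    {"project": "web", "template": "deploy", "finished": "2024-01-02T10:00:00",
     "status": "failed", "elapsed": 12.0},
    {"project": "db", "template": "backup", "finished": "2024-01-01T09:00:00",
     "status": "successful", "elapsed": 300.0},
]
report = summarize_jobs(sample_jobs)
```

Each key of report corresponds to one project:job template descriptor, carrying the job count plus the status and elapsed time of the most recent job.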

Configurable parameters for the test
Parameter Description

Test Period

How often should the test be executed. By default, this is set to 5 minutes.

Host

The IP address of the host for which this test is to be configured.

Port

The port at which the specified host listens. By default, this is NULL.

Username and Password

The eG agent makes REST API calls for pulling out performance metrics from the Ansible Tower. For this purpose, the eG agent should be allowed to connect to Ansible Tower's REST API. To enable this connection, administrators need to configure the valid credentials of a user who has administrator privileges on the Ansible Tower against the Username and Password parameters.
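To verify outside the eG agent that a set of credentials can reach the REST API, a request along the following lines could be prepared. This is a minimal sketch, not the agent's actual implementation: it assumes HTTPS, HTTP Basic authentication, and the /api/v2/job_templates/ endpoint of the Tower API. The function name build_tower_request, the sample address, and the sample credentials are hypothetical.

```python
import base64
import urllib.request


def build_tower_request(host, username, password, port=None):
    """Return an unsent GET request for the Tower job template listing.

    Assumptions: HTTPS, Basic authentication, API version v2.
    A port of None stands in for the NULL default and is left out of the URL.
    """
    netloc = f"{host}:{port}" if port else host
    url = f"https://{netloc}/api/v2/job_templates/"
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")

    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Accept", "application/json")
    return request


# Hypothetical host and credentials; nothing is sent over the network here.
req = build_tower_request("192.0.2.10", "admin", "secret")
```

Passing the returned object to urllib.request.urlopen would issue the call; the sketch stops short of that so that it can be inspected without a live Tower.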

Confirm Password

Confirm the password of the specified user by retyping it here.

Excluded Projects

Specify a comma-separated list of the projects that you want the eG agent to exclude from monitoring. By default, this is set to "none", indicating that this test will monitor all the projects in the Ansible Tower.

Excluded Job Templates

Specify a comma-separated list of the job templates that you want the eG agent to exclude from monitoring. By default, this is set to "none", indicating that this test will monitor all the job templates in the Ansible Tower.
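The effect of the two exclusion parameters can be sketched as follows. The matching rules here are assumptions: the sketch compares names exactly and case-sensitively, and treats an empty value like "none". The actual behavior of the eG agent may differ, and both function names are hypothetical.

```python
def parse_exclusions(value):
    """Turn a comma-separated parameter value into a set of names.

    "none" (in any letter case) or an empty value means nothing is excluded.
    """
    text = (value or "").strip()
    if not text or text.lower() == "none":
        return set()
    return {item.strip() for item in text.split(",") if item.strip()}


def is_monitored(project, template, excluded_projects, excluded_templates):
    """Return True if the project:job template pair should be monitored."""
    return (project not in parse_exclusions(excluded_projects)
            and template not in parse_exclusions(excluded_templates))
```

For example, is_monitored("web", "deploy", "none", "none") is true, while setting Excluded Projects to "web" would drop every job template of that project.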

Measurements made by the test
Measurement Description Measurement Unit Interpretation

Total jobs

Indicates the total number of jobs launched using this job template.

Number

 

Last job status

Indicates whether the job that was last executed using this job template succeeded or failed.

 

The values that this measure can report and their corresponding numeric equivalents are listed in the table below:

Measure Value Numeric Value
Failed 0
Success 1

Note:

By default, this measure reports one of the Measure Values listed in the table above to indicate whether or not the job was successful. In the graph of this measure, however, the status is represented using the corresponding numeric equivalents only.

Comparing the output reported for this measure across the job templates will reveal the job template that is most prone to failures.
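The numeric equivalents make this comparison easy to automate. The sketch below assumes that a history of reported values has been collected per job template; the history data, the template names, and the helper failure_rate are made up for illustration.

```python
# Mapping taken from the Measure Value / Numeric Value table above.
STATUS_TO_NUMERIC = {"Failed": 0, "Success": 1}


def failure_rate(status_history):
    """Fraction of measurement periods in which the last job had failed."""
    values = [STATUS_TO_NUMERIC[status] for status in status_history]
    return 1 - sum(values) / len(values)


# Hypothetical history of the measure for two job templates.
history = {
    "deploy": ["Success", "Failed", "Failed", "Success"],
    "backup": ["Success", "Success", "Success", "Success"],
}
most_failure_prone = max(history, key=lambda name: failure_rate(history[name]))
```

With this sample data, most_failure_prone is "deploy", because half of its reported values are Failed.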

Last job elapsed time

Indicates the time taken by the job that was last executed using this job template.

Seconds

 

Forks

Indicates the number of forks, i.e., the number of hosts that Ansible works on in parallel when jobs are launched using this job template.

Number

Ansible talks to remote hosts in parallel, and the level of parallelism can be set either by passing --forks or by editing the default in a configuration file. The default is a very conservative 5 forks; if you have a lot of RAM, you can easily set this to a value like 50 for increased parallelism.
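The effect of the forks setting can be approximated with simple arithmetic. The sketch below uses a deliberately simplified model in which hosts are processed in consecutive batches of at most forks hosts; real scheduling in Ansible also depends on the playbook strategy and on how long each host takes. The function name parallel_batches is hypothetical.

```python
import math


def parallel_batches(host_count, forks=5):
    """Estimate how many batches are needed to cover all hosts.

    Simplified model: at most `forks` hosts are handled at a time.
    """
    if forks < 1:
        raise ValueError("forks must be at least 1")
    return math.ceil(host_count / forks)


batches_default = parallel_batches(100)        # default of 5 forks
batches_raised = parallel_batches(100, forks=50)
```

Under this model, 100 hosts need 20 batches with the default of 5 forks, but only 2 batches with 50 forks.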

Total hosts

Indicates the total number of hosts with which this job template is associated.

Number

 

Host active failures

Indicates the number of hosts that failed while running the jobs using this job template.

Number

Host failures may occur while running jobs when host details such as new or changed IP addresses, variables, updates, etc., are missing from the playbooks of the job template.

Ideally, the value of this measure should be zero or very low. A high value for this measure is a cause for concern. Compare the value of this measure across the job templates to find out which job template is causing the most host failures. This way, administrators can accurately pinpoint the faulty job template and take corrective measures quickly.

Total groups

Indicates the total number of groups with which this job template is associated.

Number

A group consists of several hosts assigned to a pool that can be conveniently targeted together, and also given variables that they share in common.

Groups active failures

Indicates the number of groups that failed while running the jobs using this job template.

Number

Group failures during job execution can occur for many reasons. For instance, if the details of a new group, or of any host in the group, are not updated in the playbooks of a job template that is to be executed on the inventory, then the job will fail. This in turn will cause group failures during job launches.

Ideally, the value of this measure should be zero or very low. Compare the value of this measure across the job templates to find out which job template is causing more failures.
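The comparison recommended for the host and group failure measures could be sketched as below. The descriptor names, the dictionary keys, and the default threshold of zero are assumptions made for illustration; they do not reflect any eG Enterprise API.

```python
def flag_templates(measures, max_failures=0):
    """Return descriptors whose host or group failures exceed the threshold.

    `measures` maps a project:job template descriptor to a dict with the
    hypothetical keys 'host_active_failures' and 'groups_active_failures'.
    The result is ordered with the worst offender first.
    """
    flagged = [
        (name, m["host_active_failures"] + m["groups_active_failures"])
        for name, m in measures.items()
        if m["host_active_failures"] > max_failures
        or m["groups_active_failures"] > max_failures
    ]
    flagged.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in flagged]


# Made-up measure values for three descriptors.
measures = {
    "web:deploy": {"host_active_failures": 3, "groups_active_failures": 1},
    "db:backup": {"host_active_failures": 0, "groups_active_failures": 0},
    "web:patch": {"host_active_failures": 1, "groups_active_failures": 0},
}
to_investigate = flag_templates(measures)
```

With this sample data, to_investigate lists web:deploy first and web:patch second, while db:backup is not flagged because both of its failure counts are zero.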