Controller Manager Queue Test

The Kubernetes Controller Manager queue tracks work items (such as resource changes) for controllers to process. Each controller (e.g., Deployment, Node) watches the API Server, adds changes to its queue, and processes them to reconcile the actual cluster state with the desired state. Efficient queue handling ensures timely updates, while delays indicate bottlenecks, which can be detected via logs or metrics.

Monitoring the Controller Manager queue helps ensure timely processing of resource updates, identifies bottlenecks or delays, and maintains cluster stability by preventing workload synchronization issues and state inconsistencies.

The Controller Manager Queue Test continuously monitors the controller manager queues and reports a set of metrics for each queue. These metrics include the queue depth, the number of items added, the number of retries, and the time taken to process items. Using these metrics, administrators can make informed decisions about queue performance and take preventive action if required.
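The per-queue metrics described above are published by Kubernetes components in the Prometheus text format, with one sample line per queue. The sketch below shows one way such text could be grouped by queue name. It is an illustration only, not the test's actual implementation: the sample values are invented, and the function name parse_workqueue_metrics is hypothetical.

```python
import re
from collections import defaultdict

# Sample lines in the Prometheus text exposition format.
# The values are made up for illustration; only the metric names are real.
SAMPLE = """\
# HELP workqueue_depth Current depth of workqueue
# TYPE workqueue_depth gauge
workqueue_depth{name="deployment"} 3
workqueue_depth{name="node"} 0
workqueue_adds_total{name="deployment"} 1520
workqueue_retries_total{name="deployment"} 12
"""

# metric{labels} value
LINE = re.compile(r'^(workqueue_\w+)\{([^}]*)\}\s+(\S+)$')
# The "name" label identifies the queue (one queue per controller).
NAME_LABEL = re.compile(r'(?:^|,)name="([^"]*)"')


def parse_workqueue_metrics(text):
    """Group workqueue samples by queue name: {queue: {metric: value}}."""
    queues = defaultdict(dict)
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue  # comments, blank lines, and non-workqueue metrics
        metric, labels, value = match.groups()
        if metric.endswith("_bucket"):
            continue  # histogram buckets are ignored in this sketch
        name = NAME_LABEL.search(labels)
        if name:
            queues[name.group(1)][metric] = float(value)
    return dict(queues)


if __name__ == "__main__":
    for queue, metrics in parse_workqueue_metrics(SAMPLE).items():
        print(queue, metrics)
```

Each key of the returned dictionary corresponds to one queue, which matches the one-set-of-results-per-queue output of this test.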

Target of the test : A Kubernetes Master Node

Agent deploying the test : A remote agent

Outputs of the test : One set of results for each controller manager queue on the target Kubernetes Master node being monitored

Configurable parameters for the test

Parameter

Description

Test Period

How often the test should be executed.

Host

The IP address of the host for which this test is to be configured.

Port

Specify the port at which the specified Host listens. By default, this is 6443.

Timeout

In the Timeout text box, specify the duration (in seconds) beyond which the test will time out. The default value is 10 seconds.

Metric URL

Each Kubernetes system component exposes monitoring metrics through the /metrics endpoint of its HTTP server. For components that do not expose this endpoint by default, refer to the official documentation of your Kubernetes distribution. Specify the metric URL in the Metric URL text box.
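As a rough illustration of how the Host, Port, Timeout, and Metric URL parameters relate, the following sketch assembles a metric URL and reads it with a timeout. The helper names build_metrics_url and fetch_metrics, the bearer-token authentication, and the CA file argument are assumptions made for this example; the credentials and URL your cluster actually requires depend on its distribution and security settings.

```python
import ssl
import urllib.request


def build_metrics_url(host, port=6443, path="/metrics", scheme="https"):
    """Assemble a metric URL from the Host and Port parameters."""
    if not path.startswith("/"):
        path = "/" + path
    return f"{scheme}://{host}:{port}{path}"


def fetch_metrics(url, bearer_token=None, timeout=10, ca_file=None):
    """Return the raw text served at the metric URL.

    bearer_token and ca_file are assumptions: many clusters protect the
    endpoint with a service account token and a cluster-specific CA.
    """
    headers = {}
    if bearer_token:
        headers["Authorization"] = f"Bearer {bearer_token}"
    request = urllib.request.Request(url, headers=headers)
    context = ssl.create_default_context(cafile=ca_file)
    # timeout mirrors the Timeout parameter (10 seconds by default).
    with urllib.request.urlopen(request, timeout=timeout, context=context) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # 192.168.10.20 is a placeholder address, not a real host.
    print(build_metrics_url("192.168.10.20"))
```

Calling fetch_metrics against a live cluster returns the same text format that the test reads, so it can be a quick way to confirm that the configured URL is reachable before the test is enabled.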

Measurements made by the test

Measurement

Description

Measurement Unit

Interpretation

Current depth of workqueue

Indicates the number of work items waiting in a controller’s queue for processing.

Number

 

Total number of adds handled by workqueue

Indicates the total number of new items added to the queue during the last measurement period.

Number

 

Total number of retries handled by workqueue

Indicates the total number of retries caused by processing failures during the last measurement period.

Number

Monitoring retries helps identify underlying problems and prevent prolonged delays in reconciling the desired state of cluster resources.

Work items processed in queue

Indicates the total number of work items processed through the queue during the last measurement period.

Number

Each controller monitors its queue for changes and processes items to reconcile the actual cluster state with the desired state. Efficient processing ensures timely updates, while delays or failures may indicate performance bottlenecks, controller issues, or resource contention in the cluster.

Total time taken to process work items from queue

Indicates the total time taken to process work items in the queue during the last measurement period.

Milliseconds

 

Longest duration of time taken by a work in queue

Indicates the longest time taken by a work item in the queue.

Seconds

 

Workqueue unfinished work duration

Indicates the duration of time spent on unfinished work during the last measurement period.

Seconds

 

Work items present in queue

Indicates the number of work items present in the queue during the last measurement period.

Number

 

Total time taken by work items present in queue

Indicates the total time taken for processing the items in the queue.

Milliseconds

 

Average time taken to process a work in queue

Indicates the average time taken to process a single work item in the queue.

Milliseconds

 

Average time taken by a work item present in queue

Indicates the average time taken by all work items present in the queue.

Milliseconds
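The average measures above can be understood as a total time divided by an item count. The sketch below shows one plausible way to derive such an average, in milliseconds, from two successive readings of the cumulative workqueue_work_duration_seconds_sum and workqueue_work_duration_seconds_count values that Kubernetes exposes. This is an assumption about the arithmetic, not a statement of how the test computes its values; the function name average_ms and the sample numbers are hypothetical.

```python
def average_ms(prev, curr, base="workqueue_work_duration_seconds"):
    """Average time per item between two scrapes, in milliseconds.

    prev and curr are dictionaries holding the cumulative <base>_sum
    (seconds) and <base>_count (items) values from two scrapes.
    """
    delta_sum = curr[base + "_sum"] - prev[base + "_sum"]
    delta_count = curr[base + "_count"] - prev[base + "_count"]
    if delta_count <= 0:
        # Nothing was processed, or the counters were reset by a restart.
        return 0.0
    return delta_sum / delta_count * 1000.0


if __name__ == "__main__":
    # Invented readings for illustration only.
    previous = {
        "workqueue_work_duration_seconds_sum": 10.0,
        "workqueue_work_duration_seconds_count": 100,
    }
    current = {
        "workqueue_work_duration_seconds_sum": 12.0,
        "workqueue_work_duration_seconds_count": 104,
    }
    print(average_ms(previous, current))
```

Working with the difference between two readings, rather than the raw cumulative values, keeps the average specific to the last measurement period. Passing base="workqueue_queue_duration_seconds" applies the same arithmetic to the time items spend waiting in the queue.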