Controller Manager Queue Test

The Kubernetes Controller Manager queue tracks work items (like resource changes) for controllers to process. Each controller (e.g., Deployment, Node) watches the API Server, adds changes to its queue, and processes them to reconcile the desired state. Efficient queue handling ensures timely updates, and delays indicate bottlenecks, which can be monitored via logs or metrics.
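
The add, process, and retry cycle described above can be pictured with a toy queue. This is only an illustrative sketch in Python, not the Kubernetes implementation; the class name ToyWorkQueue and its methods are hypothetical and exist only to show how depth, adds, and retries relate to one another.

```python
from collections import deque


class ToyWorkQueue:
    """Toy stand-in for a controller work queue (hypothetical, not Kubernetes code)."""

    def __init__(self):
        self._items = deque()
        self._queued = set()  # keys currently waiting, used to avoid duplicates
        self.adds = 0         # total keys accepted into the queue
        self.retries = 0      # total keys re-queued after a failed reconcile

    def add(self, key):
        # A key that is already waiting is not queued a second time.
        if key in self._queued:
            return
        self._queued.add(key)
        self._items.append(key)
        self.adds += 1

    def depth(self):
        # Number of keys still waiting to be processed.
        return len(self._items)

    def process_one(self, reconcile):
        """Pop one key and reconcile it; re-queue the key if reconcile returns False."""
        key = self._items.popleft()
        self._queued.discard(key)
        if not reconcile(key):
            self.retries += 1
            self.add(key)
        return key
```

In this sketch a failed reconcile raises both the retry count and the add count, and the depth stays high until the item finally succeeds, which is why a growing depth together with a rising retry count usually points at a failing controller rather than at heavy load alone.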

Monitoring the Controller Manager queue ensures timely processing of resource updates, identifies bottlenecks or delays, and maintains cluster stability by preventing workload synchronization issues or state inconsistencies.

This Controller Manager Queue Test continuously monitors the controller manager queues and reports a set of metrics for each queue. These metrics include the queue depth, the number of items added and removed, and so on. Using these metrics, administrators can make informed decisions about queue performance and take preventive action if required.

Target of the test : A Kubernetes Master Node

Agent deploying the test : An internal agent

Outputs of the test : One set of results for each controller manager queue on the target Kubernetes Master node being monitored

Configurable parameters for the test
Parameter	Description
Test Period	How often the test should be executed.
Host	The IP address of the host for which this test is to be configured.
Port	Specify the port at which the specified Host listens. By default, this is 6443.
Timeout	Specify the duration (in seconds) beyond which the test will time out. The default value is 10 seconds.
Metric URL	Each Kubernetes system component exposes monitoring metrics through the /metrics endpoint of its HTTP server. For components that do not expose this endpoint by default, refer to the official documentation of your Kubernetes distribution. Specify the metric URL in the Metric URL text box.
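
To make the Metric URL parameter more concrete, the sketch below shows one way the text returned by a /metrics endpoint could be grouped per queue. It is a hedged illustration, not the logic used by this test. It assumes the endpoint returns the Prometheus text format and that queue metrics carry a workqueue_ prefix and a name label (for example workqueue_depth and workqueue_adds_total, as commonly exposed by Kubernetes components). The function name parse_workqueue_metrics is hypothetical, and fetching the text (TLS, bearer token) is left out.

```python
import re
from collections import defaultdict

# Matches sample lines such as: workqueue_depth{name="deployment"} 3
_SAMPLE = re.compile(r'^(workqueue_\w+)\{([^}]*)\}\s+(\S+)')
# Picks out the "name" label only (not labels that merely end in "name").
_NAME_LABEL = re.compile(r'(?:^|,)\s*name="([^"]*)"')


def parse_workqueue_metrics(text):
    """Group workqueue_* samples by queue name: {queue: {metric: value}}."""
    queues = defaultdict(dict)
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # HELP and TYPE comment lines
        match = _SAMPLE.match(line)
        if not match:
            continue  # not a labelled workqueue sample
        metric, labels, value = match.groups()
        if metric.endswith("_bucket"):
            continue  # histogram buckets are skipped in this sketch
        name = _NAME_LABEL.search(labels)
        if name is None:
            continue
        queues[name.group(1)][metric] = float(value)
    return dict(queues)


SAMPLE_TEXT = """\
# HELP workqueue_depth Current depth of workqueue
# TYPE workqueue_depth gauge
workqueue_depth{name="deployment"} 3
workqueue_adds_total{name="deployment"} 120
workqueue_depth{name="node"} 0
workqueue_work_duration_seconds_bucket{name="node",le="0.001"} 5
"""

per_queue = parse_workqueue_metrics(SAMPLE_TEXT)
```

Grouping by the name label mirrors the way this test reports one set of results per queue; the sample text is made up for illustration and does not come from a real cluster.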

Measurements made by the test
Measurement	Description	Measurement Unit	Interpretation
Current depth of workqueue	Indicates the number of work items waiting in a controller’s queue for processing.	Number
Total number of adds handled by workqueue	Indicates the total number of new items added to the queue during the last measurement period.	Number
Total number of retries handled by workqueue	Indicates the total number of retries because of processing failure in the last measurement period.	Number	Monitoring retries helps identify underlying problems and prevent prolonged delays in reconciling the desired state of cluster resources.
Work items processed in queue	Indicates the total number of work items processed through the queue in the last measurement period.	Number	Each controller monitors its queue for changes and processes items to reconcile the actual cluster state with the desired state. Efficient processing ensures timely updates, while delays or failures may indicate performance bottlenecks, controller issues, or resource contention in the cluster.
Total time taken to process work items from queue	Indicates the time taken to process work items in the queue during the last measurement period.	Milliseconds
Longest duration of time taken by a work in queue	Indicates the longest time taken by a work item in the queue.	Seconds
Workqueue unfinished work duration	Indicates the duration of time spent on unfinished work during the last measurement period.	Seconds
Work items present in queue	Indicates the number of work items present in the queue during the last measurement period.	Number
Total time taken by work items present in queue	Indicates the total time taken for processing the items in the queue.	Milliseconds
Average time taken to process a work in queue	Indicates the average time taken to process a single work item in the queue.	Milliseconds
Average time taken by a work item present in queue	Indicates the average time taken by all work items present in the queue.	Milliseconds
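
The two average measurements can be understood as a total time divided by an item count. The sketch below shows that arithmetic under stated assumptions: the totals come from a cumulative sum (in seconds) and a cumulative count, such as the _sum and _count series of a Prometheus histogram, sampled at the start and end of a measurement period. The function name average_ms is hypothetical, and this is not necessarily how the test itself computes its values.

```python
def average_ms(prev_sum_s, prev_count, curr_sum_s, curr_count):
    """Average per-item time in milliseconds between two samples.

    prev_* and curr_* are cumulative totals (seconds, item count) read at the
    start and end of a measurement period.
    """
    items = curr_count - prev_count
    elapsed_s = curr_sum_s - prev_sum_s
    if items <= 0 or elapsed_s < 0:
        # Nothing was processed, or the counters were reset (for example
        # after a restart), so no meaningful average exists.
        return 0.0
    return elapsed_s / items * 1000.0


# Hypothetical samples: 4 more items and 2.0 more seconds since the last period.
avg = average_ms(prev_sum_s=2.0, prev_count=10, curr_sum_s=4.0, curr_count=14)
```

Working from the difference between two samples, rather than from the raw cumulative values, keeps the average specific to the last measurement period, which matches how the other measurements in the table are described.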