Controller Manager Queue Test
The Kubernetes Controller Manager queue tracks work items (like resource changes) for controllers to process. Each controller (e.g., Deployment, Node) watches the API Server, adds changes to its queue, and processes them to reconcile the desired state. Efficient queue handling ensures timely updates, and delays indicate bottlenecks, which can be monitored via logs or metrics.
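The watch, queue, and reconcile cycle described above can be illustrated with a toy queue. This is only a sketch for intuition, not the controller manager's actual implementation: the `WorkQueue` class, its counters, and the `default/web` style keys are hypothetical names chosen for this example.

```python
from collections import deque


class WorkQueue:
    """Toy work queue that de-duplicates pending keys and counts activity."""

    def __init__(self):
        self._pending = deque()  # keys waiting to be processed, in arrival order
        self._queued = set()     # the same keys, kept for fast duplicate checks
        self.adds = 0            # keys accepted into the queue (including re-queues)
        self.retries = 0         # keys re-queued after a failed reconcile

    def add(self, key):
        # A key that is already waiting is not queued a second time.
        if key in self._queued:
            return
        self._queued.add(key)
        self._pending.append(key)
        self.adds += 1

    def depth(self):
        """Number of keys currently waiting for processing."""
        return len(self._pending)

    def process_next(self, reconcile):
        """Pop the oldest key and reconcile it; re-queue the key on failure."""
        key = self._pending.popleft()
        self._queued.discard(key)
        if not reconcile(key):
            self.retries += 1
            self.add(key)


queue = WorkQueue()
queue.add("default/web")
queue.add("default/web")  # duplicate change, coalesced with the first
queue.add("default/db")
queue.process_next(lambda key: False)  # simulated reconcile failure
print(queue.depth(), queue.adds, queue.retries)
```

A depth that keeps growing, or a retry count that climbs steadily, is the kind of signal that points to a bottleneck in this model.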
Monitoring the Controller Manager queue ensures timely processing of resource updates, identifies bottlenecks or delays, and maintains cluster stability by preventing workload synchronization issues or state inconsistencies.
This Controller Manager Queue Test continuously monitors the controller manager queues and reports a set of metrics for each queue. These metrics include the queue depth and the number of items added, retried, and processed. With these metrics, administrators can make informed decisions about queue performance and take preventive action if required.
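As a rough illustration of per-queue reporting, the sketch below groups `workqueue_*` samples from Prometheus text-format output by their `name` label. The metric names `workqueue_depth`, `workqueue_adds_total`, and `workqueue_retries_total` are the ones Kubernetes components publish; the sample values are made up, and whether this test reads exactly these series is an assumption.

```python
import re
from collections import defaultdict

# Made-up sample in Prometheus text exposition format.
SAMPLE = """\
# HELP workqueue_depth Current depth of workqueue
# TYPE workqueue_depth gauge
workqueue_depth{name="deployment"} 3
workqueue_depth{name="node"} 0
workqueue_adds_total{name="deployment"} 120
workqueue_adds_total{name="node"} 45
workqueue_retries_total{name="deployment"} 7
"""

LINE = re.compile(r'^(workqueue_\w+)\{([^}]*)\}\s+(\S+)$')
NAME = re.compile(r'(?:^|,)name="([^"]*)"')


def parse_workqueue_metrics(text):
    """Return {queue name: {metric name: value}} for workqueue_* samples."""
    queues = defaultdict(dict)
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue  # comments, blank lines, and unrelated metrics
        metric, labels, value = match.groups()
        if metric.endswith("_bucket"):
            continue  # histogram buckets are ignored in this sketch
        name = NAME.search(labels)
        if name:
            queues[name.group(1)][metric] = float(value)
    return dict(queues)


per_queue = parse_workqueue_metrics(SAMPLE)
for queue_name, metrics in sorted(per_queue.items()):
    print(queue_name, metrics)
```

Each key of `per_queue` corresponds to one queue, which matches the idea of one set of results per queue.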
Target of the test : A Kubernetes Master Node
Agent deploying the test : A remote agent
Outputs of the test : One set of results for each controller manager queue on the target Kubernetes Master Node being monitored
Parameter | Description |
---|---|
Test Period | How often the test should be executed. |
Host | The IP address of the host for which this test is to be configured. |
Port | The port at which the specified Host listens. By default, this is 6443. |
Timeout | The duration (in seconds) beyond which the test will time out. Specify this value in the Timeout text box. The default value is 10 seconds. |
Metric URL | Each Kubernetes system component exposes monitoring metrics through the /metrics endpoint of its HTTP server. For components that do not expose this endpoint by default, refer to the official Kubernetes distribution documentation. Specify the metric URL in the Metric URL text box. |
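To show how the Host, Port, Timeout, and Metric URL parameters relate, here is a minimal sketch that assembles a metrics URL and fetches it with a timeout. The helper names `build_metric_url` and `fetch_metrics`, the bearer-token header, and the `192.0.2.10` address are assumptions made for illustration; TLS trust configuration is omitted, and the correct port and path for the controller manager depend on the cluster.

```python
from urllib.parse import urlunsplit
from urllib.request import Request, urlopen


def build_metric_url(host, port=6443, path="/metrics", scheme="https"):
    """Combine the Host and Port parameters into a metrics URL."""
    return urlunsplit((scheme, f"{host}:{port}", path, "", ""))


def fetch_metrics(url, token, timeout=10):
    """Fetch the raw metrics text; give up after `timeout` seconds.

    The bearer token is an assumed authentication method, not something
    the parameter table specifies.
    """
    request = Request(url, headers={"Authorization": f"Bearer {token}"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example address from the documentation-reserved range (hypothetical host).
url = build_metric_url("192.0.2.10")
print(url)
```

`fetch_metrics` is defined but not called here, since calling it requires a reachable cluster and valid credentials.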
Measurement | Description | Measurement Unit | Interpretation |
---|---|---|---|
Current depth of workqueue | Indicates the number of work items waiting in a controller’s queue for processing. | Number | |
Total number of adds handled by workqueue | Indicates the total number of new items added to the queue during the last measurement period. | Number | |
Total number of retries handled by workqueue | Indicates the total number of retries caused by processing failures during the last measurement period. | Number | Monitoring retries helps identify underlying problems and prevent prolonged delays in reconciling the desired state of cluster resources. |
Work items processed in queue | Indicates the total number of work items processed through the queue during the last measurement period. | Number | Each controller monitors its queue for changes and processes items to reconcile the actual cluster state with the desired state. Efficient processing ensures timely updates, while delays or failures may indicate performance bottlenecks, controller issues, or resource contention in the cluster. |
Total time taken to process work items from queue | Indicates the total time taken to process work items from the queue during the last measurement period. | Milliseconds | |
Longest duration of time taken by a work in queue | Indicates the longest time taken by a work item in the queue. | Seconds | |
Workqueue unfinished work duration | Indicates the duration of time spent on unfinished work during the last measurement period. | Seconds | |
Work items present in queue | Indicates the number of work items present in the queue during the last measurement period. | Number | |
Total time taken by work items present in queue | Indicates the total time taken for processing the items present in the queue. | Milliseconds | |
Average time taken to process a work in queue | Indicates the average time taken to process a single work item in the queue. | Milliseconds | |
Average time taken by a work item present in queue | Indicates the average time taken by the work items present in the queue. | Milliseconds | |
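The average measures in the table can be related to the total-time and item-count measures by simple division. The sketch below assumes, without confirmation from this page, that the averages are derived from the change in a cumulative duration sum (in seconds) and a cumulative item count between two consecutive measurement periods; the function name `average_ms` and the sample numbers are hypothetical.

```python
def average_ms(prev_sum_s, prev_count, curr_sum_s, curr_count):
    """Average duration in milliseconds for items observed between two scrapes."""
    delta_count = curr_count - prev_count
    if delta_count <= 0:
        return 0.0  # nothing observed (or counters were reset) in this period
    delta_sum_s = curr_sum_s - prev_sum_s
    return delta_sum_s / delta_count * 1000.0  # seconds to milliseconds


# Hypothetical readings: 3 more items and 1.5 more seconds since the last scrape.
avg = average_ms(prev_sum_s=12.0, prev_count=400, curr_sum_s=13.5, curr_count=403)
print(f"average per item: {avg:.1f} ms")
```

With these sample readings the average works out to about 500 ms per item. Guarding against a non-positive count avoids a division by zero when a queue was idle during the period.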