Controller Manager Health Test
The Kubernetes Controller Manager maintains the desired state of cluster resources by running key controllers, such as the node, deployment, and replica controllers. Monitoring its health is vital for detecting failures in resource synchronization, scaling, or failover, any of which can disrupt cluster stability and application performance. Kubernetes provides a /healthz endpoint for the Controller Manager; regular checks against this endpoint help identify problems early, so that the cluster continues to adhere to its desired state.
The Controller Manager Health Test continuously monitors the Controller Manager on the target node and reports key metrics such as health status, endpoint availability, and response time. These metrics help administrators confirm that the service is up and catch problems before they affect the cluster.
Target of the test : A Kubernetes Master Node
Agent deploying the test : A remote agent
Outputs of the test : One set of results for the target Kubernetes Master node being monitored
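The kind of health check described above can be illustrated with a short Python sketch. It is not the eG agent's implementation: the names `ProbeResult`, `interpret`, and `probe` are hypothetical, and the rule that a healthy endpoint answers HTTP 200 with the body `ok` is an assumption based on how Kubernetes component /healthz endpoints commonly respond.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass


@dataclass
class ProbeResult:
    healthy: bool        # True when the endpoint answered HTTP 200 with body "ok"
    availability: float  # 100.0 if any HTTP response came back, otherwise 0.0
    response_ms: float   # wall-clock time spent on the request, in milliseconds


def interpret(status, body):
    """Assumed health rule: HTTP 200 and a body of 'ok'."""
    return status == 200 and body.strip().lower() == "ok"


def probe(url, timeout=10.0, fetch=urllib.request.urlopen):
    """Call a health URL once and report health, availability, and response time.

    `fetch` is injectable so that TLS settings (or a test double) can be supplied.
    """
    start = time.perf_counter()
    try:
        with fetch(url, timeout=timeout) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", "replace")
        reachable = True
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (for example 401 or 500).
        status, body, reachable = err.code, "", True
    except (urllib.error.URLError, OSError):
        # No HTTP response at all: refused connection, timeout, or TLS failure.
        status, body, reachable = None, "", False
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return ProbeResult(interpret(status, body), 100.0 if reachable else 0.0, elapsed_ms)
```

A call such as `probe("https://<host>:<port>/healthz", timeout=10)` would use the Host, Port, Timeout, and Health URL values configured for the test. Because these endpoints are normally served over HTTPS, a real check would likely also need the cluster's CA certificate and credentials, which this sketch leaves out.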
Parameter | Description |
---|---|
Test Period | How often the test should be executed. |
Host | The IP address of the host for which this test is to be configured. |
Port | The port at which the specified Host listens. By default, this is 6443. |
Timeout | The duration (in seconds) beyond which the test will time out. The default value is 10 seconds. |
Metric URL | Each Kubernetes system component exposes monitoring metrics through the /metrics endpoint of its HTTP server. For components that do not expose this endpoint by default, refer to the official documentation of your Kubernetes distribution. Specify the metric URL in the Metric URL text box. |
Health URL | Kubernetes provides health endpoints that can be used to monitor the health of Kubernetes worker nodes. Specify the URL of the health endpoint. |
Detailed Diagnosis | To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, click on the Off option. The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled: |
Measurement | Description | Measurement Unit | Interpretation |
---|---|---|---|
Health | Indicates whether the health of the Controller Manager is Ok or Not Ok. | | The values that this measure reports and their corresponding numeric values are detailed in the table below: Note: By default, this test reports the Measure Values listed in the table above to indicate the health of the Controller Manager on the target node. In the graph of this measure, however, the state is indicated using the numeric equivalents only. |
Endpoint availability | Indicates the percentage of APIs that are currently available for clients to call. | % | API availability is the ability of the Controller Manager to accept and respond to client requests. It is critical for cluster management, because it enables communication with Kubernetes components. If the endpoint is unavailable, resource creation, updates, and cluster operations will be disrupted. |
Endpoint response time | Indicates the time the Controller Manager takes to process a client's request and return a response. | Milliseconds | Response time is critical for system performance. High response times can indicate server overload, inefficient queries, or resource bottlenecks, all of which impair cluster management and operational efficiency. |
Metric endpoint availability | Indicates the percentage of Metric endpoints that are currently available for clients to call. | % | |
Metric endpoint response time | Indicates the time the Controller Manager takes to process a client's request on the Metric endpoints and return a response. | Milliseconds | |
Metric endpoint response size | Indicates the average size of the responses returned by the Metric endpoints. | GB | |
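
As a rough illustration of how the availability, response time, and response size measures relate to individual probes, the sketch below aggregates a list of samples. The names `Sample` and `summarize` are hypothetical. It also assumes that availability is the share of probes that returned a successful response, and that 1 GB equals 1024³ bytes; the test's actual calculation may differ.

```python
from dataclasses import dataclass
from statistics import mean

BYTES_PER_GB = 1024 ** 3  # assumption: binary gigabytes


@dataclass
class Sample:
    ok: bool            # did the endpoint return a successful response?
    response_ms: float  # time taken for this request, in milliseconds
    size_bytes: int     # size of the response body, in bytes


def summarize(samples):
    """Aggregate probe samples into availability (%), mean time (ms), and mean size (GB)."""
    if not samples:
        raise ValueError("at least one sample is required")
    answered = [s for s in samples if s.ok]
    availability = 100.0 * len(answered) / len(samples)
    # Time and size are averaged over successful responses only (None if there were none).
    avg_ms = mean(s.response_ms for s in answered) if answered else None
    avg_gb = mean(s.size_bytes for s in answered) / BYTES_PER_GB if answered else None
    return {
        "availability_pct": availability,
        "avg_response_ms": avg_ms,
        "avg_size_gb": avg_gb,
    }
```

Averaging only over successful responses keeps failed probes from skewing the response time and size figures; the failures are already reflected in the availability percentage.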