API Server Health Test

The Kubernetes API Server is a key component of the control plane, responsible for handling RESTful requests from users, applications, and internal cluster components. Its health is critical, as it orchestrates all operations in the cluster. To monitor its health, Kubernetes provides a /healthz endpoint, which indicates if the server is functioning correctly. The API Server's health checks evaluate its connectivity to etcd (the cluster's database) and other key components. Any issues can disrupt cluster management and workload scheduling.
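As a rough illustration of what such a health check returns, the sketch below interprets a health endpoint response. It assumes the upstream Kubernetes behaviour in which a healthy API Server answers a plain /healthz request with HTTP 200 and the body `ok`, and in which adding `?verbose` lists the individual checks prefixed with `[+]` (passed) or `[-]` (failed). The function names and the sample body are illustrative only and are not part of the eG test.

```python
from typing import List


def is_healthy(status_code: int, body: str) -> bool:
    """Treat HTTP 200 with a body of 'ok' as healthy (assumed upstream behaviour)."""
    return status_code == 200 and body.strip().lower() == "ok"


def failed_checks(verbose_body: str) -> List[str]:
    """Return the names of the checks marked '[-]' in a ?verbose response."""
    failed = []
    for line in verbose_body.splitlines():
        line = line.strip()
        if line.startswith("[-]"):
            # A failing line is assumed to look like '[-]etcd failed: reason withheld'.
            parts = line[3:].split()
            if parts:
                failed.append(parts[0])
    return failed


# Hypothetical verbose response in which only the etcd check fails.
sample = "[+]ping ok\n[-]etcd failed: reason withheld\n[+]log ok\nhealthz check failed"
print(is_healthy(200, "ok"))
print(failed_checks(sample))
```

A failing etcd check in the verbose output points at the connectivity problem described above, rather than at the API Server process itself.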

Monitoring the Kubernetes API Server's health is crucial as it ensures the cluster's control plane remains operational. The API Server manages resource deployments, scheduling, and interactions with etcd. Downtime or issues can disrupt workload operations and cluster stability. Regular health checks help detect problems early, enabling quick resolution to maintain seamless orchestration and application availability.

The API Server Health Test continuously monitors the API Server on the target node and reports key metrics such as health status, endpoint availability, and response time. These metrics help administrators confirm that the service is up and respond quickly when it is not.

Target of the test : A Kubernetes Master Node

Agent deploying the test : An internal agent

Outputs of the test : One set of results for the target Kubernetes Master node being monitored

Configurable parameters for the test
Parameter	Description
Test Period	How often should the test be executed.
Host	The IP address of the host for which this test is to be configured.
Port	Specify the port at which the specified Host listens. By default, this is 6443.
Timeout	Specify the duration (in seconds) beyond which the test will time out. The default value is 10 seconds.
Health URL	Kubernetes provides health endpoints that can be used to monitor the health of Kubernetes worker nodes. Specify the URL of the health endpoint.
Metric URL	Each Kubernetes system component exposes monitoring metrics through the /metrics endpoint of its HTTP server. For components that do not expose this endpoint by default, refer to the official documentation of your Kubernetes distribution. Specify the metric URL in the Metric URL text box.
Detailed Diagnosis	To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option. The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled: (1) the eG manager license should allow the detailed diagnosis capability; (2) both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
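To show how the Host, Port, Timeout, Health URL, and Metric URL parameters might fit together, here is a minimal sketch that assembles probe URLs from them. The `ProbeConfig` class is hypothetical, as is the assumption that the Health URL and Metric URL are given as paths served over HTTPS; how the eG agent combines these values internally is not described here. The address 192.0.2.10 is a documentation placeholder.

```python
from dataclasses import dataclass
from urllib.parse import urlunsplit


@dataclass
class ProbeConfig:
    """Hypothetical container mirroring the test parameters."""
    host: str
    port: int = 6443                # default of the Port parameter
    timeout_seconds: int = 10       # default of the Timeout parameter
    health_path: str = "/healthz"   # assumed value for Health URL
    metric_path: str = "/metrics"   # assumed value for Metric URL

    def url(self, path: str) -> str:
        """Build an HTTPS URL for the given path on the configured host and port."""
        return urlunsplit(("https", f"{self.host}:{self.port}", path, "", ""))


cfg = ProbeConfig(host="192.0.2.10")
print(cfg.url(cfg.health_path))
print(cfg.url(cfg.metric_path))
```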

Measurements made by the test

Measurement

Description

Measurement Unit

Interpretation

Health

Indicates if the health of API Server is Ok or Not Ok.

The values that this measure reports and their corresponding numeric values are detailed in the table below:

Measure Value	Numeric Value
Ok	0
Not Ok	5

Note:

By default, this test reports the Measure Values listed in the table above to indicate the health of the API Server on the target node. In the graph of this measure, however, the state is indicated using the numeric equivalents only.

The detailed diagnosis of this measure shows the exception message that helps you to identify the type and cause of error.
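When reading values from the graph of the Health measure, the numeric equivalents need to be translated back into their Measure Values. The sketch below encodes the table above as a lookup; the names `HEALTH_NUMERIC` and `label_for` are illustrative, and the handling of values outside the table is an assumption.

```python
# Measure Value to Numeric Value, as listed in the table above.
HEALTH_NUMERIC = {"Ok": 0, "Not Ok": 5}

# Reverse lookup for translating graphed values back into labels.
HEALTH_LABEL = {number: label for label, number in HEALTH_NUMERIC.items()}


def label_for(numeric_value: int) -> str:
    """Return the Measure Value for a numeric value, or flag it as unknown."""
    return HEALTH_LABEL.get(numeric_value, f"Unknown ({numeric_value})")


print(label_for(0))
print(label_for(5))
```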

Endpoint availability

Indicates the percentage of API endpoints that are currently available for clients to call.

API availability is the ability of the API Server to accept and respond to client requests. It is critical for cluster management because it enables communication with the Kubernetes components. If the endpoint is unavailable, resource creation, updates, and other cluster operations will be disrupted or fail.

The detailed diagnosis of this measure shows the exception message that helps you to identify the type and cause of error.
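One simple way to picture an availability percentage is as the share of probed endpoints that answered successfully. The sketch below computes that share. The exact formula used by the test is not documented here, so treat this as an assumption; the endpoint paths in the sample are illustrative.

```python
from typing import Dict


def availability_percent(results: Dict[str, bool]) -> float:
    """Return the percentage of probed endpoints that responded successfully."""
    if not results:
        return 0.0
    available = sum(1 for ok in results.values() if ok)
    return 100.0 * available / len(results)


# Hypothetical probe outcomes: True means the endpoint answered successfully.
probes = {"/healthz": True, "/livez": True, "/readyz": False, "/version": True}
print(availability_percent(probes))
```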

Endpoint response time

Indicates the duration it takes for the API Server to process and return a response to a client's request.

Milliseconds

Response time is a key indicator of API Server performance. High response times can indicate server overload, inefficient queries, or resource bottlenecks, all of which slow down cluster management operations.
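A response time in milliseconds can be obtained by timing the request from the client side. The sketch below does this with the Python standard library. The helper names `timed_ms` and `fetch_status` and the `ca_file` parameter are hypothetical, authentication is omitted, and the way the eG agent itself measures the value may differ.

```python
import ssl
import time
import urllib.request
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


def timed_ms(fn: Callable[[], T]) -> Tuple[T, float]:
    """Run fn and return its result together with the elapsed time in milliseconds."""
    start = time.perf_counter()
    result = fn()
    return result, (time.perf_counter() - start) * 1000.0


def fetch_status(url: str, timeout_seconds: float = 10.0,
                 ca_file: Optional[str] = None) -> int:
    """Request the URL and return the HTTP status code.

    ca_file is assumed to point at the cluster CA certificate; when it is None,
    the system trust store is used. urlopen raises an exception on timeouts and
    on HTTP error statuses, which a caller would treat as 'not available'.
    """
    context = ssl.create_default_context(cafile=ca_file)
    with urllib.request.urlopen(url, timeout=timeout_seconds, context=context) as response:
        return response.status
```

For example, `timed_ms(lambda: fetch_status("https://192.0.2.10:6443/healthz"))` would return the status code and the elapsed milliseconds for a single request, where the address is a placeholder.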

Metric endpoint availability

Indicates the percentage of Metric endpoints that are currently available for clients to call.

Metric endpoint response time

Indicates the duration it takes for the API Server to process and return a response to a client's request on Metric endpoints.

Milliseconds

Metric endpoint response size

Indicates the average size of the responses returned by the Metric endpoints.
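An average response size can be pictured as the mean length of the response bodies collected over a measurement period. The sketch below computes it in bytes. The unit and the averaging window are assumptions, since neither is stated above, and the function name is illustrative.

```python
from typing import Iterable


def average_response_size(bodies: Iterable[bytes]) -> float:
    """Return the mean size, in bytes, of the given response bodies."""
    sizes = [len(body) for body in bodies]
    if not sizes:
        return 0.0
    return sum(sizes) / len(sizes)


# Two hypothetical /metrics responses of 100 and 300 bytes.
samples = [b"a" * 100, b"b" * 300]
print(average_response_size(samples))
```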