Etcd Server Test
Etcd is a distributed key-value store used by Kubernetes to store and manage cluster configuration data, state, and metadata. It acts as the source of truth for the cluster, storing critical information such as pod configurations, service discovery data, and secrets. etcd servers run as a set of nodes, typically in an odd-numbered quorum for fault tolerance. They ensure consistency and high availability through the Raft consensus algorithm. Kubernetes components, like the API server, interact with etcd to read and write data. If etcd becomes unavailable or inconsistent, the cluster's operation can be disrupted.
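The preference for an odd number of nodes follows from majority arithmetic: Raft needs more than half of the members to agree, so adding one node to an odd-sized cluster raises the quorum without raising the number of failures the cluster can survive. The following is a minimal sketch of that arithmetic; the function names are illustrative and are not part of etcd or eG Enterprise.

```python
def quorum_size(members: int) -> int:
    """Smallest majority of a cluster with `members` voting nodes."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """How many members can be lost while a majority still remains."""
    return members - quorum_size(members)


# Compare cluster sizes: an even size tolerates no more failures
# than the odd size just below it.
for n in range(1, 8):
    print(f"{n} members: quorum={quorum_size(n)}, "
          f"tolerates {failures_tolerated(n)} failure(s)")
```

For example, a 3-node and a 4-node cluster both tolerate a single failure, which is why the extra node in an even-sized cluster adds cost without adding fault tolerance.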
Monitoring etcd is crucial because it stores Kubernetes' critical cluster data. If etcd experiences issues like unresponsiveness, data corruption, or high latency, it can lead to cluster instability or failure. Regular monitoring ensures etcd health, preventing disruptions and enabling timely recovery to maintain consistent cluster operations.
The Etcd Server Test continuously monitors the etcd server in the deployment and reports key metrics such as the total number of client requests, health check successes and failures, connection failures, write proposals (applied, committed, failed, and pending), the amount of data stored, watch streams, and expired leases. These metrics help administrators confirm that the etcd server is healthy and detect potential issues before they disrupt the cluster.
Target of the test : A Kubernetes Master Node
Agent deploying the test : A remote agent
Outputs of the test : One set of results for the target Kubernetes master node being monitored
Parameter |
Description |
---|---|
Test Period |
How often should the test be executed. |
Host |
The IP address of the host for which this test is to be configured. |
Port |
Specify the port at which the specified Host listens. By default, this is 6443. |
Timeout |
Specify, in the Timeout text box, the duration (in seconds) beyond which the test will time out. The default value is 10 seconds. |
Metric URL |
Each Kubernetes system component exposes monitoring metrics through the /metrics endpoint of its HTTP server. For components that do not expose this endpoint by default, refer to the official documentation of your Kubernetes distribution. Specify the metric URL in the Metric URL text box. |
Detailed Diagnosis |
To make diagnosis more efficient and accurate, the eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, click on the Off option. The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:
|
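To illustrate what a Metric URL returns, the sketch below fetches a /metrics endpoint and parses the Prometheus text format that Kubernetes components use for it. This is a simplified illustration under stated assumptions, not the eG agent's implementation: the helper names are hypothetical, the sample values are made up, labeled series are skipped for brevity, and a real etcd endpoint will usually require TLS client certificates that are not handled here.

```python
import urllib.request


def parse_metrics(text: str) -> dict[str, float]:
    """Parse unlabeled samples from Prometheus text exposition output."""
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, HELP/TYPE comments, and labeled series.
        if not line or line.startswith("#") or "{" in line:
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        try:
            samples[parts[0]] = float(parts[1])
        except ValueError:
            continue  # ignore values that are not numeric
    return samples


def fetch_metrics(url: str, timeout: float = 10.0) -> dict[str, float]:
    """Fetch and parse a metrics URL (hypothetical helper, plain HTTP only)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))


# Made-up sample in the same format, so the parser can be tried offline.
SAMPLE = """\
# HELP etcd_server_proposals_pending The current number of pending proposals.
etcd_server_proposals_committed_total 1520
etcd_server_proposals_applied_total 1518
etcd_server_proposals_pending 2
etcd_server_proposals_failed_total 0
"""

metrics = parse_metrics(SAMPLE)
print(metrics["etcd_server_proposals_pending"])
```

The metric names in the sample are ones etcd publishes, but which of them the test reads for each measurement is an assumption here.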
Measurement |
Description |
Measurement Unit |
Interpretation |
---|---|---|---|
Total client requests |
Indicates the total number of client requests hitting the server. |
Number |
A very high number of client requests may be difficult for the server to handle. |
Total health failures |
Indicates the total number of health failures in the given time period. |
Number |
|
Total health success |
Indicates the total number of successful health checks over a number of measurements. |
Number |
|
Total connection failures |
Indicates the total number of connection failures encountered by the server. |
Number |
|
Total proposals applied |
Indicates the total number of write proposals (or transactions) that have been successfully applied to the etcd store. |
Number |
You can use this metric to monitor the rate of changes in your cluster's state. |
Total proposals committed |
Indicates the total number of write proposals (or transactions) that have been successfully committed to the etcd store. |
Number |
You can use this metric to monitor the rate of changes in your cluster's state. |
Total proposals failed |
Indicates the total number of write proposals (or transactions) that failed to be applied to the etcd store. |
Number |
A high number of failed proposals needs to be investigated. |
Pending Proposals |
Indicates the total number of write proposals (or transactions) that are still pending. |
Number |
A high number of pending proposals might take a long time to resolve and can affect application performance. |
Total bytes |
Indicates the amount of data stored in the etcd store. |
Bytes |
If you are operating a Kubernetes cluster with a large number of services, pods, or config maps, the total bytes in etcd will naturally grow. Regular monitoring ensures the cluster remains healthy, with sufficient disk space and optimized data management. |
Total indexes failed |
Indicates the total number of failed attempts to access or create indexes. |
Number |
This can be used to alert administrators to potential issues that need to be investigated. |
Server |
|
|
|
Total watch stream |
Indicates the total number of watch streams that have been established between clients and the etcd server. |
Number |
A watch stream is a mechanism by which clients (such as Kubernetes components or other applications) can subscribe to real-time updates or notifications about changes to data stored in etcd. |
Total Expired lease |
Indicates the total number of leases that have expired in the system. |
Number |
A lease in etcd is a mechanism used to manage the TTL (Time to Live) for keys or resources. When a client stores data in etcd with a lease, the data is automatically deleted after the lease expires unless the lease is renewed. |
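Because the proposal measures are running totals, the rate of change mentioned in their interpretation has to be derived from two successive readings. The sketch below shows one way to do that, along with the gap between committed and applied proposals, which should normally stay small on a healthy server. The function names and sample numbers are illustrative assumptions, not values or logic taken from eG Enterprise.

```python
def per_second_rate(previous: float, current: float,
                    interval_seconds: float) -> float:
    """Average growth per second of a running total between two readings."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = current - previous
    if delta < 0:
        # The total went backwards, e.g. after a server restart.
        # Assume it restarted from zero and count only the new value.
        delta = current
    return delta / interval_seconds


def apply_lag(committed: float, applied: float) -> float:
    """Proposals that are committed but not yet applied to the store."""
    return committed - applied


# Two hypothetical readings of "Total proposals committed", 60 seconds apart.
print(per_second_rate(previous=1200, current=1500, interval_seconds=60))  # 5.0
# Hypothetical committed and applied totals from the same reading.
print(apply_lag(committed=1520, applied=1518))  # 2
```

A sustained rise in the rate points to heavier write activity in the cluster, while a lag that keeps growing suggests the server is struggling to apply what has already been committed.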