Etcd Server Test

Etcd is a distributed key-value store used by Kubernetes to store and manage cluster configuration data, state, and metadata. It acts as the source of truth for the cluster, storing critical information such as pod configurations, service discovery data, and secrets. etcd servers run as a set of nodes, typically in an odd-numbered quorum for fault tolerance. They ensure consistency and high availability through the Raft consensus algorithm. Kubernetes components, like the API server, interact with etcd to read and write data. If etcd becomes unavailable or inconsistent, the cluster's operation can be disrupted.
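The preference for an odd number of members follows from majority arithmetic: Raft needs more than half of the members to agree, so adding a fourth member to a three-member cluster raises the quorum without raising the number of failures it can survive. The sketch below only illustrates that arithmetic; the function names are hypothetical and are not part of etcd or eG Enterprise.

```python
def quorum_size(members: int) -> int:
    """Smallest majority of a cluster with `members` voting nodes."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """Members that can be lost while a majority is still reachable."""
    return members - quorum_size(members)


if __name__ == "__main__":
    # Print cluster size, quorum, and tolerated failures side by side.
    for n in range(1, 8):
        print(n, quorum_size(n), failures_tolerated(n))
```

For example, clusters of three and four members both tolerate a single failure, while a five-member cluster tolerates two.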

Monitoring etcd is crucial because it stores Kubernetes' critical cluster data. If etcd experiences issues like unresponsiveness, data corruption, or high latency, it can lead to cluster instability or failure. Regular monitoring ensures etcd health, preventing disruptions and enabling timely recovery to maintain consistent cluster operations.

The Etcd Server test continuously monitors the etcd server in the deployment and reports key metrics such as the number of client requests, health check successes and failures, connection failures, write proposals that were applied, committed, failed, or are still pending, the size of the data store, and the number of watch streams and expired leases. These metrics help administrators confirm that etcd is healthy and detect problems before they disrupt the cluster.

Target of the test : A Kubernetes Master Node

Agent deploying the test : An internal agent

Outputs of the test : One set of results for the target Kubernetes master node being monitored

Configurable parameters for the test
Parameter	Description
Test Period	How often should the test be executed.
Host	The IP address of the host for which this test is to be configured.
Port	Specify the port at which the specified Host listens. By default, this is 6443.
Timeout	In the Timeout text box, specify the duration (in seconds) beyond which the test will time out. The default value is 10 seconds.
Metric URL	Each Kubernetes system component exposes monitoring metrics through the /metrics endpoint of its HTTP server. For components that do not expose this endpoint by default, refer to the official documentation of your Kubernetes distribution. Specify the metric URL in the Metric URL text box.
Detailed Diagnosis	To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, click on the Off option. The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled: (a) the eG manager license should allow the detailed diagnosis capability; (b) both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
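
As an illustration of how the Host, Port, Timeout, and Metric URL parameters relate to one another, the sketch below builds and sends a request to a /metrics endpoint. It is not how the eG agent is implemented. It assumes that the endpoint is served over HTTPS, that Metric URL holds a path such as /metrics, and that the cluster accepts a bearer token; the function names and the token argument are hypothetical.

```python
import urllib.request


def build_metrics_request(host, port=6443, metric_path="/metrics", token=None):
    """Combine host, port, and metric path into a request object.

    Assumption: the endpoint is reachable over HTTPS and, where
    authentication is required, accepts a bearer token.
    """
    url = f"https://{host}:{port}{metric_path}"
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    return urllib.request.Request(url, headers=headers)


def fetch_metrics(request, timeout=10, context=None):
    """Send the request and return the response body as text.

    `timeout` is in seconds, mirroring the Timeout parameter; `context`
    may be an ssl.SSLContext that trusts the cluster's CA certificate.
    """
    with urllib.request.urlopen(request, timeout=timeout, context=context) as resp:
        return resp.read().decode("utf-8")
```

A call such as `fetch_metrics(build_metrics_request("192.168.10.5"), timeout=10)` would raise an error if the endpoint did not respond within ten seconds. The address here is a placeholder.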

Measurements made by the test
Measurement	Description	Measurement Unit	Interpretation
Total client requests	Indicates the total number of client requests hitting the server.	Number	A very high number of client requests can be difficult for the server to handle.
Total health failures	Indicates the total number of health failures in the given time period.	Number
Total health success	Indicates the total number of successful health checks over a number of measurements.	Number
Total connection failures	Indicates the total number of connection failures encountered by the server.	Number
Total proposals applied	Indicates the total number of write proposals (or transactions) that have been successfully applied to the etcd store.	Number	You can use this metric to monitor the rate of changes in your cluster's state.
Total proposals committed	Indicates the total number of write proposals (or transactions) that have been successfully committed to the etcd store.	Number	You can use this metric to monitor the rate of changes in your cluster's state.
Total proposals failed	Indicates the total number of write proposals (or transactions) that failed to be applied to the etcd store.	Number	A high number of failed proposals needs to be investigated.
Pending Proposals	Indicates the total number of write proposals (or transactions) that are still pending.	Number	A high number of pending proposals may take a long time to clear and can affect application performance.
Total bytes	Indicates the amount of data stored in etcd store.	Bytes	If you are operating a Kubernetes cluster with a large number of services, pods, or config maps, the total bytes in etcd will naturally grow. Regular monitoring ensures the cluster remains healthy, with sufficient disk space and optimized data management.
Total indexes failed	Indicates the total number of failed attempts to access or create indexes.	Number	This can be used to alert administrators to potential issues that need to be investigated.
Total watch stream	Indicates the total number of watch streams that have been established between clients and the etcd server.	Number	A watch stream is a mechanism by which clients (such as Kubernetes components or other applications) can subscribe to real-time updates or notifications about changes to data stored in etcd.
Total Expired lease	Indicates the total number of leases that have expired in the system.	Number	A lease in etcd is a mechanism used to manage the TTL (Time to Live) for keys or resources. When a client stores data in etcd with a lease, the data is automatically deleted after the lease expires unless the lease is renewed.
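
To show how proposal measures of this kind can be read from a /metrics response, the sketch below parses Prometheus-style text and computes how far applied proposals trail committed ones. The etcd metric names used (etcd_server_proposals_committed_total, etcd_server_proposals_applied_total, etcd_server_proposals_pending) are the names etcd exposes, but their mapping to the measurements in the table above is an assumption, as are the function names and sample values. The parser is deliberately simplified and assumes sample lines carry no trailing timestamp.

```python
def parse_metrics(text):
    """Parse Prometheus-style text into a {metric_name: value} dict.

    Simplifications: comment lines are skipped, labels are discarded,
    and samples that share a name are summed.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field on the line.
        name_part, _, value = line.rpartition(" ")
        name = name_part.split("{", 1)[0]
        samples[name] = samples.get(name, 0.0) + float(value)
    return samples


def apply_lag(samples):
    """Committed proposals that have not yet been applied."""
    return (samples["etcd_server_proposals_committed_total"]
            - samples["etcd_server_proposals_applied_total"])


# Hypothetical sample data, for illustration only.
SAMPLE = """\
# TYPE etcd_server_proposals_committed_total gauge
etcd_server_proposals_committed_total 1050
etcd_server_proposals_applied_total 1048
etcd_server_proposals_pending 2
"""

if __name__ == "__main__":
    metrics = parse_metrics(SAMPLE)
    print(apply_lag(metrics), metrics["etcd_server_proposals_pending"])
```

A gap between committed and applied proposals that stays large or keeps growing suggests that the server is struggling to keep up with writes, which is the same condition a persistently high number of pending proposals points to.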