Etcd Server Test

Etcd is a distributed key-value store used by Kubernetes to store and manage cluster configuration data, state, and metadata. It acts as the source of truth for the cluster, storing critical information such as pod configurations, service discovery data, and secrets. etcd servers run as a set of nodes, typically an odd number of them, so that a majority (quorum) of nodes remains available when some nodes fail. They ensure consistency and high availability through the Raft consensus algorithm. The Kubernetes API server reads and writes this data on behalf of the other cluster components. If etcd becomes unavailable or inconsistent, the cluster's operation can be disrupted.
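The reason for preferring an odd number of nodes can be illustrated with a little arithmetic. The sketch below is only an illustration and not part of the test: it assumes the usual majority rule of Raft-style consensus, where a cluster of n voting members needs floor(n/2) + 1 of them to agree. The function names are hypothetical.

```python
def quorum_size(members: int) -> int:
    """Smallest majority of a cluster with `members` voting members."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def failure_tolerance(members: int) -> int:
    """Number of members that can fail while a majority is still reachable."""
    return members - quorum_size(members)


if __name__ == "__main__":
    # Print the quorum and tolerated failures for small cluster sizes.
    for n in range(1, 8):
        print(f"members={n} quorum={quorum_size(n)} "
              f"tolerated failures={failure_tolerance(n)}")
```

Under this rule, a 4-node cluster tolerates the same single failure as a 3-node cluster, so the extra even-numbered node adds cost without adding fault tolerance.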

Monitoring etcd is crucial because it stores Kubernetes' critical cluster data. If etcd experiences issues like unresponsiveness, data corruption, or high latency, it can lead to cluster instability or failure. Regular monitoring ensures etcd health, preventing disruptions and enabling timely recovery to maintain consistent cluster operations.

The Etcd Server test continuously monitors the etcd server in the deployment and reports key metrics such as the number of client requests, health check failures and successes, connection failures, and the write proposals that were applied, committed, failed, or are still pending. It also reports the amount of data stored in etcd, along with watch stream and lease statistics. These metrics help administrators confirm that the etcd server is healthy and detect potential issues before they disrupt the cluster.

Target of the test : A Kubernetes Master Node

Agent deploying the test : A remote agent

Outputs of the test : One set of results for the target Kubernetes master node being monitored

Configurable parameters for the test

| Parameter | Description |
|---|---|
| Test Period | How often the test should be executed. |
| Host | The IP address of the host for which this test is to be configured. |
| Port | Specify the port at which the specified Host listens. By default, this is 6443. |
| Timeout | In the Timeout text box, specify the duration (in seconds) beyond which the test will time out. The default value is 10 seconds. |
| Metric URL | Each Kubernetes system component exposes monitoring metrics through the /metrics endpoint of its HTTP server. For components that do not expose this endpoint by default, refer to the official documentation of your Kubernetes distribution. Specify the metric URL in the Metric URL text box. |
| Detailed Diagnosis | To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, click on the Off option. The option to selectively enable/disable the detailed diagnosis capability will be available only if both of the following conditions are fulfilled: (1) the eG manager license allows the detailed diagnosis capability; (2) both the normal and abnormal frequencies configured for the detailed diagnosis measures are not 0. |
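To show how the Host, Port, Timeout, and Metric URL parameters fit together, here is a minimal sketch of reading a /metrics endpoint by hand. It is not how the eG agent is implemented. It assumes the endpoint returns the Prometheus text format that Kubernetes components commonly use, and the helper names (`build_metrics_url`, `parse_metrics`, `fetch_metrics`) are hypothetical. Real clusters usually require a client certificate or bearer token, which this sketch leaves to the caller through the `context` argument.

```python
from __future__ import annotations

import ssl
import urllib.request


def build_metrics_url(host: str, port: int, path: str = "/metrics",
                      scheme: str = "https") -> str:
    """Combine the Host, Port, and Metric URL path into one URL."""
    return f"{scheme}://{host}:{port}{path}"


def parse_metrics(text: str) -> dict[str, float]:
    """Parse Prometheus text-format lines into {series: value}.

    Comment lines (# HELP / # TYPE) and blank lines are skipped. The series
    key keeps any {label="..."} part, so labelled series stay distinct.
    """
    samples: dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            end = line.rfind("}")
            if end == -1:
                continue  # malformed line: labels never closed
            series, rest = line[:end + 1], line[end + 1:]
        else:
            series, _, rest = line.partition(" ")
        fields = rest.split()
        if not fields:
            continue  # no value present
        # The first field is the value; an optional timestamp may follow.
        samples[series] = float(fields[0])
    return samples


def fetch_metrics(url: str, timeout: float = 10.0,
                  context: ssl.SSLContext | None = None) -> dict[str, float]:
    """Fetch and parse the endpoint, giving up after `timeout` seconds."""
    with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

For example, `fetch_metrics(build_metrics_url("192.0.2.10", 6443))` would use the default port and the 10-second default timeout listed in the parameters; the address here is a placeholder.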
Measurements made by the test

| Measurement | Description | Measurement Unit | Interpretation |
|---|---|---|---|
| Total client requests | Indicates the total number of client requests received by the server. | Number | A very high number of client requests may be difficult for the server to handle. |
| Total health failures | Indicates the total number of failed health checks in the given time period. | Number | |
| Total health success | Indicates the total number of successful health checks recorded over a number of measurements. | Number | |
| Total connection failures | Indicates the total number of connection failures encountered by the server. | Number | |
| Total proposals applied | Indicates the total number of write proposals (or transactions) that have been successfully applied to the etcd store. | Number | You can use this metric to monitor the rate of changes in your cluster's state. |
| Total proposals committed | Indicates the total number of write proposals (or transactions) that have been successfully committed to the etcd store. | Number | You can use this metric to monitor the rate of changes in your cluster's state. |
| Total proposals failed | Indicates the total number of write proposals (or transactions) that failed to be applied to the etcd store. | Number | A high number of failed proposals needs to be investigated. |
| Pending Proposals | Indicates the total number of write proposals (or transactions) that are still pending. | Number | A high number of pending proposals may take a long time to resolve and can affect application performance. |
| Total bytes | Indicates the amount of data stored in the etcd store. | Bytes | If you are operating a Kubernetes cluster with a large number of services, pods, or config maps, the total bytes in etcd will naturally grow. Regular monitoring ensures the cluster remains healthy, with sufficient disk space and optimized data management. |
| Total indexes failed | Indicates the total number of failed attempts to access or create indexes. | Number | This can be used to alert administrators to potential issues that need to be investigated. |
| Server | | | |
| Total watch stream | Indicates the total number of watch streams that have been established between clients and the etcd server. | Number | A watch stream is a mechanism by which clients (such as Kubernetes components or other applications) can subscribe to real-time updates or notifications about changes to data stored in etcd. |
| Total Expired lease | Indicates the total number of leases that have expired in the system. | Number | A lease in etcd is a mechanism used to manage the TTL (Time to Live) for keys or resources. When a client stores data in etcd with a lease, the data is automatically deleted after the lease expires unless the lease is renewed. |
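The proposal measures are described as totals, so a rate of change has to be derived from two successive readings. The sketch below shows one way to do that, under the assumption that these measures behave like cumulative counters that may reset when the server restarts; the function names are hypothetical. It also computes the gap between committed and applied proposals, which the etcd documentation suggests should normally stay small, while a steadily growing gap can point to an overloaded server.

```python
def counter_rate(previous: float, current: float,
                 interval_seconds: float) -> float:
    """Per-second increase of a cumulative total between two readings.

    A total that went backwards is treated as a reset (for example, after a
    server restart), so the current value is taken as the increase.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    increase = current - previous if current >= previous else current
    return increase / interval_seconds


def apply_lag(committed: float, applied: float) -> float:
    """Proposals that have been committed but not yet applied."""
    return max(committed - applied, 0.0)


if __name__ == "__main__":
    # Two readings of "Total proposals applied", taken 60 seconds apart.
    print(counter_rate(previous=1000, current=1300, interval_seconds=60))
    # Gap between "Total proposals committed" and "Total proposals applied".
    print(apply_lag(committed=1300, applied=1290))
```

Here `interval_seconds` would correspond to the configured Test Period, expressed in seconds.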