K8s StatefulSets By Namespace Test

In Kubernetes, StatefulSets are used for managing stateful applications, providing guarantees around the ordering and uniqueness of pods. These guarantees make StatefulSets ideal for applications that require stable network identities, persistent storage, and ordered deployment and scaling (e.g., databases, distributed systems).

This test auto-discovers the StatefulSets in each namespace and, for each StatefulSet, reports its age and replica counts (ready, updated, current, and available replicas), so that you can verify that the application is fully functional and can handle the load with the expected number of healthy pods. This test also reports CPU throttling, which can lead to performance issues, and the under-utilization of the memory and CPU allocated to the pods in the StatefulSets.
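
As an illustration of the per-StatefulSet values described above, the following minimal Python sketch derives the age and replica counts from a single StatefulSet object in the JSON form returned by the Kubernetes API. The function name `summarize_statefulset` and the keys of the returned dictionary are hypothetical and are not part of eG Enterprise; how the eG agent computes these values internally is not documented here.

```python
from datetime import datetime, timezone


def summarize_statefulset(item, now=None):
    """Return the age (in minutes) and replica counts of one StatefulSet object.

    `item` is assumed to be one entry of the `items` list returned by the
    Kubernetes API for StatefulSets (apps/v1), already parsed from JSON.
    """
    now = now or datetime.now(timezone.utc)
    # creationTimestamp is an RFC 3339 UTC timestamp, e.g. 2024-01-01T00:00:00Z
    created = datetime.strptime(
        item["metadata"]["creationTimestamp"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    status = item.get("status", {})
    # spec.replicas defaults to 1 when it is not set in the StatefulSet definition
    desired = item.get("spec", {}).get("replicas", 1)
    summary = {
        "namespace": item["metadata"]["namespace"],
        "name": item["metadata"]["name"],
        "age_minutes": int((now - created).total_seconds() // 60),
        "replicas": desired,
        # Status counters may be absent from the JSON when zero, so default to 0
        "ready": status.get("readyReplicas", 0),
        "current": status.get("currentReplicas", 0),
        "updated": status.get("updatedReplicas", 0),
        "available": status.get("availableReplicas", 0),
        "collisions": status.get("collisionCount", 0),
    }
    # Healthy when every desired replica is ready
    summary["fully_ready"] = summary["ready"] == desired
    return summary
```

A StatefulSet with 3 desired replicas and only 2 ready replicas would be reported with `fully_ready` set to `False`.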

Target of the test : Azure Kubernetes Service Cluster

Agent deploying the test : A remote agent

Outputs of the test : One set of results for each StatefulSet in every namespace configured in the Azure Kubernetes Service Cluster being monitored.

Configurable parameters for the test
Parameter	Description
Test Period	How often should the test be executed.
Host	The IP address of the host for which this test is to be configured.
Port	Specify the port at which the specified Host listens. By default, this is 6443.
Load Balancer / Master Node IP	To run this test and report metrics, the eG agent needs to connect to the Kubernetes API on the master node and run API commands. To enable this connection, the eG agent has to be configured with one of the following: (a) if only a single master node exists in the cluster, the IP address of that master node; (b) if the target cluster consists of more than one master node, the IP address of the load balancer that is managing the cluster. In the latter case, the load balancer will route the eG agent's connection request to any available master node in the cluster, thus enabling the agent to connect with the API server on that node, run API commands on it, and pull metrics. By default, this parameter will display the Load Balancer / Master Node IP that you configured when manually adding the Kubernetes/OpenShift cluster for monitoring, using the Kubernetes Cluster Preferences page in the eG admin interface (see Figure 3). The steps for managing the cluster using the eG admin interface are discussed in detail in How to Monitor the Kubernetes/OpenShift Cluster Using eG Enterprise? Whenever the eG agent runs this test, it uses the IP address that is displayed (by default) against this parameter to connect to the Kubernetes API. If this IP address changes at a later point in time, then make sure that you update this parameter with the new address, by overriding its default setting.
SSL	By default, the Kubernetes/OpenShift cluster is SSL-enabled. This is why the eG agent, by default, connects to the Kubernetes API via an HTTPS connection. Accordingly, this flag is set to Yes by default. If the cluster is not SSL-enabled in your environment, then set this flag to No.
K8s Cluster API Prefix	By default, this parameter is set to none. Do not change this setting if you are monitoring a Kubernetes/OpenShift Cluster. To run this test and report metrics for Rancher clusters, the eG agent needs to connect to the Kubernetes API on the master node of the Rancher cluster and run API commands. The Kubernetes API of Rancher clusters is of the default format: http(s)://{IP Address of kubernetes}/{api endpoints}. The Server section of the kubeconfig.yaml file downloaded from the Rancher console helps in identifying the Kubernetes API of the cluster. For example, https://{IP address of Kubernetes}/k8s/clusters/c-m-bznxvg4w/ is usually the URL of the Kubernetes API of a Rancher cluster. For the eG agent to connect to the master node of a Rancher cluster and pull metrics, the eG agent should be made aware of the API endpoints in the Kubernetes API of the Rancher cluster. To do this, specify the API endpoints available in the Kubernetes API of the Rancher cluster against this parameter. In our example, this parameter can be specified as: /k8s/clusters/c-m-bznxvg4w/.
Authentication Token	The eG agent requires an authentication bearer token to access the Kubernetes API, run API commands on the cluster, and pull metrics of interest. The steps for generating this token have been detailed in How Does eG Enterprise Monitor a Kubernetes/OpenShift Cluster? Typically, once you generate the token, you can associate that token with the target Kubernetes/OpenShift cluster, when manually adding that cluster for monitoring using the eG admin interface. The steps for managing the cluster using the eG admin interface are discussed in detail in How to Monitor the Kubernetes/OpenShift Cluster Using eG Enterprise? By default, this parameter will display the Authentication Token that you provided in the Kubernetes Cluster Preferences page of the eG admin interface, when manually adding the cluster for monitoring (see Figure 3). Whenever the eG agent runs this test, it uses the token that is displayed (by default) against this parameter for accessing the API and pulling metrics. If, for any reason, you generate a new authentication token for the target cluster at a later point in time, then make sure you update this parameter with the change. To do so, copy the new token and paste it against this parameter.
Namespace to Monitor	To enable the eG agent to monitor a specific Namespace on the Kubernetes/OpenShift cluster, specify the name of that Namespace against this parameter (for instance, eshop). Doing so will enable the eG agent to monitor and report metrics specific to this Namespace.
Proxy Host	If the eG agent connects to the Kubernetes API on the master node via a proxy server, then provide the IP address of the proxy server here. If no proxy is used, then the default setting - none - of this parameter need not be changed.
Proxy Port	If the eG agent connects to the Kubernetes API on the master node via a proxy server, then provide the port number at which that proxy server listens here. If no proxy is used, then the default setting - none - of this parameter need not be changed.
Proxy Username, Proxy Password, Confirm Password	These parameters are applicable only if the eG agent uses a proxy server to connect to the Kubernetes/OpenShift cluster, and that proxy server requires authentication. In this case, provide a valid user name and password against the Proxy Username and Proxy Password parameters, respectively. Then, confirm the password by retyping it in the Confirm Password text box. If no proxy server is used, or if the proxy server used does not require authentication, then the default setting - none - of these parameters need not be changed.
Kubernetes version	The Version text box indicates the version of the Kubernetes/OpenShift cluster to be managed. The default value is none. If the value of this parameter is not "none", the test uses the value provided (e.g., 28.1) as the Kubernetes version.
Timeout	Specify the duration (in seconds) for which this test should wait for a response from the Kubernetes/OpenShift cluster. If there is no response from the cluster within the configured duration, the test will time out. By default, this is set to 5 seconds.
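
To show how the connection parameters above fit together, here is a minimal Python sketch that assembles the URL and request headers for listing StatefulSets through the Kubernetes API. The function `build_statefulsets_request` and its argument names are hypothetical; the sketch only assumes the standard Kubernetes REST paths for StatefulSets (`/apis/apps/v1/...`) and bearer-token authentication, and does not describe how the eG agent itself builds its requests.

```python
def build_statefulsets_request(host, port=6443, ssl=True, api_prefix="none",
                               namespace=None, token=""):
    """Build the URL and headers for listing StatefulSets.

    host       -- Load Balancer / Master Node IP
    port       -- port on which the API server listens (6443 by default)
    ssl        -- True for the SSL flag set to Yes, False for No
    api_prefix -- K8s Cluster API Prefix ("none" unless, e.g., a Rancher cluster)
    namespace  -- Namespace to Monitor; None lists StatefulSets in all namespaces
    token      -- Authentication Token (bearer token)
    """
    scheme = "https" if ssl else "http"
    # Treat the default value "none" as "no prefix"; normalize surrounding slashes
    if api_prefix in (None, "", "none"):
        prefix = ""
    else:
        prefix = "/" + api_prefix.strip("/")
    if namespace:
        path = f"/apis/apps/v1/namespaces/{namespace}/statefulsets"
    else:
        path = "/apis/apps/v1/statefulsets"
    url = f"{scheme}://{host}:{port}{prefix}{path}"
    headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "application/json",
    }
    return url, headers
```

The returned pair could then be passed to `urllib.request.Request(url, headers=headers)` and opened with `urllib.request.urlopen(request, timeout=5)`, where the timeout mirrors the default value of the Timeout parameter. Proxy settings are omitted from this sketch.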

Measurements made by the test
Measurement	Description	Measurement Unit	Interpretation
Statefulset age	Indicates the age of this StatefulSet.	Minutes	A StatefulSet is used for stateful applications such as databases and message brokers (MySQL, PostgreSQL, Kafka, etc.). Unlike ReplicaSets, StatefulSets provide persistent storage, stable pod identities, and ordered scaling. As a StatefulSet ages, the applications in it have been running longer and may consume more resources, such as CPU, memory, and network. This might require scaling the StatefulSet up or allocating more resources.
Replicas	Indicates the number of replicas in this StatefulSet.	Number	A replica in Azure Kubernetes Service is simply an identical instance of a pod that Kubernetes manages to ensure that the application remains available, scalable, and resilient. The replica count in a StatefulSet should be chosen based on the availability, fault tolerance, scalability, resilience, and resource requirements of the stateful application. For distributed, high-availability applications, an odd number of replicas (typically 3 or 5) is a common and effective configuration. For less critical environments, a lower replica count (1 or 2) may suffice.
Ready replicas	Indicates the number of ready replicas in this StatefulSet.	Number	For most production use cases, the ready replica count should be equal to the desired replica count set in the StatefulSet definition. This ensures that the application is fully functional and can handle the load with the expected number of healthy pods. For example, if the number of replicas is 3 in the StatefulSet, then the ready replica count should ideally be 3.
Current replicas	Indicates the number of current replicas in this StatefulSet.	Number	Ideally, the current replica count should be equal to the desired replica count specified in the StatefulSet. This means that Kubernetes has successfully created and is maintaining the exact number of pods that were requested. For example, if the number of replicas is 3 in the StatefulSet, then the current replica count should ideally be 3.
Updated replicas	Indicates the number of updated replicas in this Statefulset.	Number	During the update process, the updated replica count should match the desired replica count once all pods are successfully updated to the new version. For example, if the number of replicas is 3 in StatefulSet, then updated replica count should gradually increase from 0 to 3 as each pod is updated one by one.
Available replicas	Indicates the number of available replicas in this StatefulSet.	Number	The available replica count should be equal to the desired replica count. This means that all the pods are both running and ready to serve traffic. For example, if the number of replicas is 3 in the StatefulSet, then the available replica count should ideally be 3, which means all 3 pods are running and ready.
Collisions count	Indicates the count of collisions in this StatefulSet.	Number	In the Kubernetes API, the collision count in the status of a StatefulSet is the count of hash collisions for that StatefulSet. The StatefulSet controller uses this count as a collision avoidance mechanism when it needs to create the name for the newest ControllerRevision. The value of this measure should ideally be zero. A value greater than zero indicates that the controller encountered name conflicts while creating revisions of the StatefulSet.
CPU usage	Indicates the amount of CPU resources used by the containers in this Statefulset.	Millicpu
CPU limits	Indicates the total amount of CPU resources that containers in this StatefulSet are allowed to use, as per the resource quota.	Millicpu	Resource requests/limits set using the ResourceQuota object govern the aggregate resource consumption of a namespace - i.e., the total resources that can be consumed/requested across all pods/containers in a namespace. A resource quota is violated only when the total consumption of a resource, across pods/containers in the namespace, exceeds the limits defined in the resource quota. For instance, say that the resource quota of a namespace enforces a CPU usage limit of 2 cores and a memory usage limit of 500Gi. In this case, Kubernetes will allow you to create 2 containers with 1 CPU core and 100Gi of memory each. However, if an attempt is made to create another container configured with 1 CPU core and 200Gi of memory, then the addition will fail. This is because the addition increases the total CPU usage of the namespace to 3 CPU cores, which violates the 2-core limit set by the resource quota, even though the total memory (400Gi) remains within the 500Gi limit.
CPU requests	Indicates the minimum amount of CPU resources that is guaranteed to the containers in this Statefulset, as per the resource quota.	Millicpu
Memory limits	Indicates the total amount of memory resources that containers in this Statefulset are allowed to use, as per the resource quota.	GB
Memory requests	Indicates the minimum amount of memory resources that is guaranteed to the containers in this Statefulset, as per the resource quota.	GB
CPU throttled as percent of Node CPU configured	Indicates the CPU throttled in this StatefulSet, expressed as a percentage of the CPU configured on the node.	Percent	Ideally, the value of this measure should be zero, indicating that the application or service running on the StatefulSet is able to fully utilize the resources allocated to it without being restricted.
CPU throttled	Indicates the amount of CPU throttled in this StatefulSet.	Millicpu	CPU throttling in a StatefulSet occurs when containers attempt to use more CPU than their configured CPU limit, and their CPU time is restricted to keep them within that limit. A high value of this measure indicates that the containers or pods are frequently hitting their allocated CPU limits, leading to performance issues.
CPU slack	Indicates the amount of CPU slack in this StatefulSet.	Millicpu	CPU slack in a StatefulSet refers to the CPU capacity that is available to a pod or container under its CPU limit but is not being used. A high value of this measure indicates that the pod or container is under-utilizing the allocated CPU.
Memory usage	Indicates the amount of memory resources used by the containers in this Statefulset.	GB
Memory slack	Indicates the amount of memory slack in this StatefulSet.	GB	Memory slack in a StatefulSet refers to the memory that is available to a pod or container under its memory limit but is not being used. A high value of this measure indicates that the pod or container is under-utilizing the allocated memory.
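
The slack and resource quota interpretations above can be illustrated with a short Python sketch. It assumes that slack is simply the limit minus the usage, which is one reading of the descriptions above and not a documented eG Enterprise formula. The helper names (`cpu_to_millicpu`, `memory_to_bytes`, `slack`, `fits_quota`) are hypothetical, and the quantity parsing is deliberately simplified: it handles only the `m` CPU suffix and the binary memory suffixes `Ki`, `Mi`, `Gi`, and `Ti`.

```python
def cpu_to_millicpu(quantity):
    """Convert a Kubernetes CPU quantity such as '500m', '2' or '0.5' to millicpu."""
    q = str(quantity)
    if q.endswith("m"):
        return int(q[:-1])
    return int(round(float(q) * 1000))


# Binary suffixes only; decimal suffixes (k, M, G, ...) are not handled here
_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def memory_to_bytes(quantity):
    """Convert a Kubernetes memory quantity such as '100Gi' to bytes."""
    q = str(quantity)
    for suffix, factor in _BINARY_SUFFIXES.items():
        if q.endswith(suffix):
            return int(float(q[:-2]) * factor)
    return int(q)  # a plain number is taken as bytes


def slack(limit, usage):
    """Unused capacity under a limit (assumed here to be limit minus usage)."""
    return max(limit - usage, 0)


def fits_quota(existing, new, quota):
    """True if adding `new` keeps the namespace total within `quota`."""
    return sum(existing) + new <= quota


# The resource quota example from the CPU limits interpretation:
# quota of 2 cores and 500Gi, two existing containers of 1 core / 100Gi each,
# and a new container requesting 1 core / 200Gi.
cpu_ok = fits_quota(
    [cpu_to_millicpu("1"), cpu_to_millicpu("1")],
    cpu_to_millicpu("1"),
    cpu_to_millicpu("2"),
)
mem_ok = fits_quota(
    [memory_to_bytes("100Gi"), memory_to_bytes("100Gi")],
    memory_to_bytes("200Gi"),
    memory_to_bytes("500Gi"),
)
```

In this example `cpu_ok` is `False` and `mem_ok` is `True`, which matches the explanation that the new container is rejected because of the CPU limit rather than the memory limit. Similarly, a container with a CPU limit of 1 core that uses 250 millicpu would have a CPU slack of 750 millicpu under the assumed formula.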