Containers Uptime Test

Container uptime in Kubernetes refers to the duration a container has been continuously running since its last start or restart. It is an important metric for assessing the reliability and stability of applications. Uptime can be monitored using kubectl commands, Prometheus, or container runtime tools like Docker or containerd. Frequent restarts or short uptime periods may indicate issues such as crashes or resource constraints. Properly monitoring container uptime helps in performance analysis, debugging, and ensuring overall system reliability. Implementing health checks and setting up alerts can enhance container stability and operational efficiency.
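As a rough illustration of how uptime can be derived from pod status data, the sketch below computes the current uptime and restart count of each running container. It assumes the input is a dictionary with the shape of the JSON returned by `kubectl get pod <name> -o json`, where `status.containerStatuses[].state.running.startedAt` and `restartCount` are standard Kubernetes fields. The function name `container_uptimes` and the sample pod are hypothetical, and this is not how the eG agent itself collects the data.

```python
from datetime import datetime, timezone


def container_uptimes(pod, now=None):
    """Return {container_name: (uptime_seconds, restart_count)} for running containers."""
    now = now or datetime.now(timezone.utc)
    result = {}
    for status in pod.get("status", {}).get("containerStatuses", []):
        running = status.get("state", {}).get("running")
        if not running:
            # Waiting or terminated containers have no current uptime.
            continue
        # startedAt is an RFC 3339 UTC timestamp, e.g. 2024-05-01T10:00:00Z
        started = datetime.strptime(
            running["startedAt"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        uptime = (now - started).total_seconds()
        result[status["name"]] = (uptime, status.get("restartCount", 0))
    return result


# Hypothetical sample data, for illustration only.
sample_pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "web",
                "restartCount": 2,
                "state": {"running": {"startedAt": "2024-05-01T10:00:00Z"}},
            },
            {
                "name": "init-job",
                "restartCount": 0,
                "state": {"terminated": {"exitCode": 0}},
            },
        ]
    }
}

fixed_now = datetime(2024, 5, 1, 11, 0, 0, tzinfo=timezone.utc)
print(container_uptimes(sample_pod, now=fixed_now))
```

A short uptime combined with a non-zero restart count is the pattern the paragraph above describes as a possible sign of crashes or resource constraints.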

The Container Uptime Test continuously monitors the containers on the current node and reports key metrics related to their uptime. These metrics help administrators verify that containers are restarted when expected and in a timely manner.

Target of the test : A Kubernetes Worker Node

Agent deploying the test : A remote agent

Outputs of the test : One set of results for the target Kubernetes Worker Node being monitored.

Configurable parameters for the test

Parameter

Description

Test Period

How often should the test be executed.

Host

The IP address of the host for which this test is to be configured.

Port

Specify the port at which the specified Host listens. By default, this is 6443.

Use Sudo

By default, the eG agent does not require any special permissions to parse and read messages from the log file to be monitored. This is why the Use Sudo parameter is set to No by default. In some highly secure Unix environments, however, the eG agent install user may not have permission to read the log file to be monitored. In such environments, follow the steps below to ensure that the test is able to read the log file and report metrics:

Edit the SUDOERS file on the target host and append an entry of the following format to it:

<eG_agent_install_user> ALL=(ALL) NOPASSWD: <Log_file_with_path>

For instance, if the eG agent install user is eguser, and the log file to be monitored is /usr/bin/logs/procs.log, then the entry in the SUDOERS file should be:

eguser ALL=(ALL) NOPASSWD: /usr/bin/logs/procs.log

Finally, save the file.

Then, when configuring this test using the eG admin interface, set the Use Sudo parameter to Yes. Once this is done, every time the test runs, it checks whether the eG agent install user has the necessary permissions to read the log file. If the user does not have the permissions, the test runs the sudo command to change the permissions of the user, so that the eG agent is able to read from the log file.

DD Frequency

Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 3:1. This indicates that, by default, detailed measures will be generated every third time this test runs, and also every time the test detects a problem. You can modify this frequency, if you so desire. Also, if you intend to disable the detailed diagnosis capability for this test, you can do so by specifying none against DD frequency.
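The `normal:abnormal` notation can be illustrated with a small sketch. The code below is only one possible reading of the description above, with hypothetical function names (`parse_dd_frequency`, `should_generate_dd`); it assumes test runs are counted from 1 and is not the eG agent's actual scheduling logic.

```python
def parse_dd_frequency(value):
    """Parse a 'normal:abnormal' string such as '3:1'; return None for 'none'."""
    if value.strip().lower() == "none":
        return None  # detailed diagnosis disabled
    normal, abnormal = (int(part) for part in value.split(":"))
    return normal, abnormal


def should_generate_dd(run_number, problem_detected, frequency):
    """Decide whether detailed measures are generated on this run.

    run_number counts test executions starting at 1 (an assumption).
    """
    if frequency is None:
        return False
    normal, abnormal = frequency
    every = abnormal if problem_detected else normal
    # A frequency of 0 never triggers detailed diagnosis.
    return every > 0 and run_number % every == 0


freq = parse_dd_frequency("3:1")
# Without problems, detailed measures appear on every third run.
print([n for n in range(1, 7) if should_generate_dd(n, False, freq)])
# With a problem detected, every run generates detailed measures.
print(all(should_generate_dd(n, True, freq) for n in range(1, 7)))
```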

Detailed Diagnosis

To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability.
  • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.

Measurements made by the test

Measurement

Description

Measurement Unit

Interpretation

Has the container been restarted?

Indicates if the container has ever been restarted.

 

The values that this measure reports and their corresponding numeric values are detailed in the table below:

Measure Value    Numeric Value
No               0
Yes              1

Note:

By default, this test reports the Measure Values listed in the table above to indicate whether the container has been restarted. In the graph of this measure, however, the same is indicated using the numeric equivalents only.
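The mapping between Measure Values and their numeric equivalents can be sketched as follows. The lookup table mirrors the one above; deriving the value from a restart count is an assumption made for illustration, and `restarted_measure` is a hypothetical helper, not part of the product.

```python
# Measure Value -> Numeric Value, as listed in the table above.
RESTART_MEASURE_VALUES = {"No": 0, "Yes": 1}


def restarted_measure(restart_count):
    """Return (measure_value, numeric_value) for a given restart count.

    Assumption: any restart count above zero means the container
    has been restarted at least once.
    """
    label = "Yes" if restart_count > 0 else "No"
    return label, RESTART_MEASURE_VALUES[label]


print(restarted_measure(0))
print(restarted_measure(4))
```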

Uptime of the container during the last measurement period.

Indicates how long the container was up during the last measurement period.

Seconds

 

Total uptime of the container

Indicates the total time the container has been up.

Seconds
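The relationship between the two uptime measures can be sketched as below. This is one plausible reading that the text above does not confirm: it assumes that the uptime during the last measurement period equals the test period when the container ran throughout it, and equals the total uptime when the container started partway through it. The function name `uptime_in_last_period` is hypothetical.

```python
def uptime_in_last_period(total_uptime_seconds, period_seconds):
    """Uptime attributable to the last measurement period (assumed semantics).

    If the container started before the period began, it was up for the
    whole period; otherwise it was up only since its (re)start.
    """
    return min(total_uptime_seconds, period_seconds)


# Container up for 2 hours, test period of 5 minutes: up for the full period.
print(uptime_in_last_period(7200, 300))
# Container restarted 90 seconds ago: only 90 seconds of uptime in the period.
print(uptime_in_last_period(90, 300))
```

Under this reading, a value lower than the test period would indicate that the container started or restarted during the last measurement period.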