Containers Uptime Test
Container uptime in Kubernetes refers to the duration a container has been continuously running since its last start or restart. It is an important metric for assessing the reliability and stability of applications. Uptime can be monitored using kubectl commands, Prometheus, or container runtime tools like Docker or containerd. Frequent restarts or short uptime periods may indicate issues such as crashes or resource constraints. Properly monitoring container uptime helps in performance analysis, debugging, and ensuring overall system reliability. Implementing health checks and setting up alerts can enhance container stability and operational efficiency.
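As an illustration of the kubectl-based approach mentioned above, the following minimal sketch derives per-container uptime and restart counts from pod status JSON of the kind produced by `kubectl get pods -o json`. It relies on the `containerStatuses[].state.running.startedAt` and `restartCount` fields of the Kubernetes pod status; the function name `container_uptimes` and the sample pod data are hypothetical, and this is not how the eG agent itself collects the metrics.

```python
import json
from datetime import datetime, timezone


def container_uptimes(pod_list_json, now=None):
    """Return (pod, container, uptime_seconds, restart_count) for running containers.

    pod_list_json is assumed to be the JSON text of a Kubernetes PodList.
    """
    now = now or datetime.now(timezone.utc)
    rows = []
    for pod in json.loads(pod_list_json).get("items", []):
        pod_name = pod["metadata"]["name"]
        for status in pod.get("status", {}).get("containerStatuses", []):
            running = status.get("state", {}).get("running")
            if not running:
                continue  # waiting or terminated containers have no current uptime
            # startedAt is a UTC timestamp such as 2024-05-01T10:00:00Z
            started = datetime.strptime(
                running["startedAt"], "%Y-%m-%dT%H:%M:%SZ"
            ).replace(tzinfo=timezone.utc)
            uptime = (now - started).total_seconds()
            rows.append(
                (pod_name, status["name"], uptime, status.get("restartCount", 0))
            )
    return rows


# Hypothetical sample data standing in for real kubectl output.
sample = json.dumps({
    "items": [{
        "metadata": {"name": "web-0"},
        "status": {"containerStatuses": [{
            "name": "nginx",
            "restartCount": 2,
            "state": {"running": {"startedAt": "2024-05-01T10:00:00Z"}},
        }]},
    }]
})
fixed_now = datetime(2024, 5, 1, 10, 5, 0, tzinfo=timezone.utc)
print(container_uptimes(sample, now=fixed_now))
# prints [('web-0', 'nginx', 300.0, 2)]
```

A short uptime combined with a non-zero restart count is the pattern that typically points to crashes or resource constraints.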
The Container Uptime Test continuously monitors the containers on the current node and reports key uptime-related metrics. These metrics help administrators track container restarts and verify that containers are restarted in a timely manner.
Target of the test : A Kubernetes Worker Node
Agent deploying the test : A remote agent
Outputs of the test : One set of results for the target Kubernetes Worker node being monitored.
Parameter |
Description |
---|---|
Test Period |
How often should the test be executed. |
Host |
The IP address of the host for which this test is to be configured. |
Port |
Specify the port at which the specified Host listens. By default, this is 6443. |
Use Sudo |
By default, the eG agent does not require any special permissions to parse and read messages from the log file to be monitored. This is why the Use Sudo parameter is set to No by default. In some highly secure Unix environments, however, the eG agent install user may not have permission to read the log file to be monitored. In such environments, follow the steps below to ensure that the test is able to read the log file and report metrics: Edit the SUDOERS file on the target host and append an entry of the following format to it: <eG_agent_install_user> ALL=(ALL) NOPASSWD: <Log_file_with_path> For instance, if the eG agent install user is eguser, and the log file to be monitored is /usr/bin/logs/procs.log, then the entry in the SUDOERS file should be: eguser ALL=(ALL) NOPASSWD: /usr/bin/logs/procs.log Finally, save the file. Then, when configuring this test using the eG admin interface, set the Use Sudo parameter to Yes. Once this is done, every time the test runs, it will check whether the eG agent install user has the necessary permissions to read the log file. If the user does not have the permissions, the test runs the sudo command to change the permissions of the user, so that the eG agent is able to read from the log file. |
DD Frequency |
Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 3:1. This indicates that, by default, detailed measures will be generated every third time this test runs, and also every time the test detects a problem. You can modify this frequency, if you so desire. Also, if you intend to disable the detailed diagnosis capability for this test, you can do so by specifying none against DD frequency. |
Detailed Diagnosis |
To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option. The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:
|
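The DD Frequency setting described in the table above can be illustrated with a short sketch. It assumes the value has the form `normal:problem`, where the first number is the run interval under normal conditions and the second is the interval once a problem is detected, which matches the description of the default 3:1. The function name and the 1-based `run_number` argument are hypothetical and do not reflect the eG agent's actual implementation.

```python
def should_run_detailed_diagnosis(dd_frequency, run_number, problem_detected):
    """Decide whether detailed measures are generated on a given test run.

    dd_frequency: a string such as "3:1", or "none" to disable detailed diagnosis.
    run_number: 1-based count of test executions (an assumption of this sketch).
    problem_detected: whether this run of the test detected a problem.
    """
    if dd_frequency.strip().lower() == "none":
        return False
    normal_every, problem_every = (int(part) for part in dd_frequency.split(":"))
    every = problem_every if problem_detected else normal_every
    return run_number % every == 0


# With the default 3:1, healthy runs produce detailed measures every third run,
# while a run that detects a problem always produces them.
for run in range(1, 7):
    print(run, should_run_detailed_diagnosis("3:1", run, problem_detected=False))
```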
Measurement |
Description |
Measurement Unit |
Interpretation |
---|---|---|---|
Has the container been restarted? |
Indicates if the container has ever been restarted. |
|
The values that this measure reports and their corresponding numeric values are detailed in the table below:
Note: By default, this test reports the Measure Values listed in the table above to indicate whether the container has been restarted. In the graph of this measure, however, the same is indicated using the numeric equivalents only. |
Uptime of the container during the last measurement period. |
Indicates how long the container was up during the last measurement period. |
Seconds |
|
Total uptime of the container |
Indicates the total time the container has been up. |
Seconds |
|