Containers Status Test

A Kubernetes container is the primary unit of deployment and contains the services that are the building blocks of an application. For the application to perform as expected, its containers have to function properly. This is the main reason administrators like to stay fully updated on the status of the containers running in the deployment.

The Containers Status Test continuously monitors the containers in the deployment, reports their status, and reveals key metrics such as the total number of containers, the number of running containers, and the number of added and removed containers. These metrics help administrators ensure that enough containers are up and prevent issues.

Target of the test : A Kubernetes Worker Node

Agent deploying the test : A remote agent

Outputs of the test : One set of results for the target Kubernetes Worker Node being monitored

Configurable parameters for the test

Parameter

Description

Test Period

How often should the test be executed.

Host

The IP address of the host for which this test is to be configured.

Port

Specify the port at which the specified Host listens. By default, this is 6443.

Use Sudo

By default, the eG agent does not require any special permissions to parse and read messages from the log file to be monitored. This is why the Use Sudo parameter is set to No by default. In some highly secure Unix environments, however, the eG agent install user may not have permission to read the log file to be monitored. In such environments, follow the steps below to ensure that the test is able to read the log file and report metrics:

Edit the SUDOERS file on the target host and append an entry of the following format to it:

<eG_agent_install_user> ALL=(ALL) NOPASSWD: <Log_file_with_path>

For instance, if the eG agent install user is eguser, and the log file to be monitored is /usr/bin/logs/procs.log, then the entry in the SUDOERS file should be:

eguser ALL=(ALL) NOPASSWD: /usr/bin/logs/procs.log

Finally, save the file.

Then, when configuring this test using the eG admin interface, set the Use Sudo parameter to Yes. Once this is done, every time the test runs, it checks whether the eG agent install user has the necessary permissions to read the log file. If the user does not have the permissions, the test runs the sudo command to change the permissions of the user, so that the eG agent is able to read from the log file.

Long run container days

Specify the number of days for which long running containers should run, after which the containers will be refreshed.

Show total containers in DD

By default, the Show total containers in DD flag is set to No, indicating that this test will not report detailed diagnostics for the Total containers measure. You can enable the detailed diagnosis capability of the Total containers measure by setting this flag to Yes.

Show stopped containers in DD

By default, the Show stopped containers in DD flag is set to No, indicating that this test will not report detailed diagnostics for the Crashed containers measure. You can enable the detailed diagnosis capability of the Crashed containers measure by setting this flag to Yes.

Time limit in weeks

For this test to report the number of containers that have not been started or running for a long time, specify a valid value for this parameter. For example, if you wish to report the containers that have not been started for more than 3 weeks, specify 3 in this text box.

DD Frequency

Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 3:1. This indicates that, by default, detailed measures will be generated every third time this test runs, and also every time the test detects a problem. You can modify this frequency, if you so desire. Also, if you intend to disable the detailed diagnosis capability for this test, you can do so by specifying none against DD frequency.

Detailed Diagnosis

To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, click on the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability
  • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
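
The DD Frequency setting described above can be pictured with a short sketch. It is a minimal illustration only, not the eG agent's implementation: the function names parse_dd_frequency and should_generate_dd are hypothetical, and the code assumes that the value has the form normal:abnormal, that each number means "every Nth run" in that state, and that 0 means "never".

```python
from typing import Optional, Tuple


def parse_dd_frequency(value: str) -> Optional[Tuple[int, int]]:
    """Parse a DD Frequency value such as "3:1"; "none" disables detailed diagnosis."""
    text = value.strip().lower()
    if text == "none":
        return None
    normal_text, abnormal_text = text.split(":")
    normal, abnormal = int(normal_text), int(abnormal_text)
    if normal < 0 or abnormal < 0:
        raise ValueError(f"frequencies must not be negative: {value!r}")
    return normal, abnormal


def should_generate_dd(
    freq: Optional[Tuple[int, int]], run_number: int, problem_detected: bool
) -> bool:
    """Decide whether a given test run (counted from 1) produces detailed measures."""
    if freq is None:
        return False
    normal, abnormal = freq
    # Assumption: the abnormal frequency applies while a problem is detected,
    # the normal frequency otherwise, and a frequency of 0 means "never".
    every = abnormal if problem_detected else normal
    return every > 0 and run_number % every == 0
```

With the default of 3:1, should_generate_dd(parse_dd_frequency("3:1"), 6, False) is true because 6 is a multiple of 3, while any run that detects a problem produces detailed measures.
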

Measurements made by the test

Measurement

Description

Measurement Unit

Interpretation

Total containers

Indicates the total number of containers in the current node.

Number

 

Running containers

Indicates the total number of containers which are running in the current node.

Number

Of the total containers, the running containers are the ones that provide resources to the application. Make sure that the number of running containers is always adequate and does not go beyond a threshold.

Exited containers

Indicates the total number of exited containers in the current node.

Number

 

Added containers

Indicates the total number of containers added to the current node.

Number

If containers are added, ensure that the available resource capacity is sufficient for the application.

Removed containers

Indicates the total number of containers removed from the current node.

Number

If containers are removed, ensure that the available resource capacity is sufficient for the application.

Paused containers

Indicates the total number of containers which are available but currently paused.

Number

Paused containers can lead to insufficient resource availability for the application. Make sure you fix the problem and start the paused containers.

Containers not started for long time

Indicates the total number of containers which have not been started for a long time.

Number

This can lead to inefficient resource utilization.

Long running containers

Indicates the total number of containers which have not been restarted in a long time.

Number

Long running containers may lead to memory leaks and resource exhaustion, and may also miss security updates. Best practice is to update the container images and recycle containers.
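
To make the relationship between the count measures concrete, here is a minimal sketch of how they could be derived from two successive snapshots of a node. It is an illustration under stated assumptions, not the test's actual logic: the summarize function, the snapshot dictionaries, and the state strings "running", "exited" and "paused" are all hypothetical.

```python
from collections import Counter
from typing import Dict


def summarize(previous: Dict[str, str], current: Dict[str, str]) -> Dict[str, int]:
    """Derive count measures from two snapshots mapping container ID to state."""
    states = Counter(current.values())  # missing states count as 0
    return {
        "Total containers": len(current),
        "Running containers": states["running"],
        "Exited containers": states["exited"],
        "Paused containers": states["paused"],
        # IDs present now but not in the previous snapshot were added.
        "Added containers": len(current.keys() - previous.keys()),
        # IDs present previously but missing now were removed.
        "Removed containers": len(previous.keys() - current.keys()),
    }
```

Comparing container IDs rather than names is a design choice made for this sketch: a container that is recreated under the same name then shows up as one removal and one addition.
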

The detailed diagnosis of the Running containers measure reveals the Container Name, Container IP, Container ID, Image ID and Created Time.

Figure 1 : The detailed diagnosis of Running containers measure

The detailed diagnosis of the Added containers measure reveals the Container Name, Container IP, Container ID, Image ID and Created Time.

Figure 2 : The detailed diagnosis of Added containers measure

The detailed diagnosis of the Removed containers measure reveals the Container Name, Container IP, Container ID, Image ID and Created Time.

Figure 3 : The detailed diagnosis of Removed containers measure

The detailed diagnosis of the Long running containers measure reveals the Container Name, Container IP, Container ID, Image ID and Created Time.

Figure 4 : The detailed diagnosis of Long running containers measure
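
The Long running containers and Containers not started for long time measures can be thought of as filters driven by the Long run container days and Time limit in weeks parameters. The sketch below is a hedged illustration only: the Container type, its since field (the time at which the container entered its current state), and both function names are assumptions introduced here, and the test may determine these values differently.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple


class Container(NamedTuple):
    name: str
    state: str        # hypothetical state string, e.g. "running" or "exited"
    since: datetime   # hypothetical: when the container entered this state


def long_running(containers: List[Container], long_run_days: int,
                 now: datetime) -> List[Container]:
    """Running containers whose uptime has reached the configured number of days."""
    limit = timedelta(days=long_run_days)
    return [c for c in containers
            if c.state == "running" and now - c.since >= limit]


def not_started_for_long(containers: List[Container], time_limit_weeks: int,
                         now: datetime) -> List[Container]:
    """Non-running containers that have been idle for the configured number of weeks."""
    limit = timedelta(weeks=time_limit_weeks)
    return [c for c in containers
            if c.state != "running" and now - c.since >= limit]
```

For example, with Long run container days set to 7, a container that has been running for 30 days is returned by long_running, while one started yesterday is not.
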