CRIO Containers - Uptime Test

In environments where the CRIO container engine is used extensively to launch containers and pods, it is essential to monitor the uptime of critical containers launched by the target CRIO engine. By tracking the uptime of each container, administrators can determine what percentage of time a container has been up. In some environments, administrators may schedule periodic reboots of their containers. If a specific container has been up for an unusually long time, an administrator can readily infer that the scheduled reboot task is not working for that container. The CRIO Containers - Uptime test helps administrators track such irregularities with ease.

Use this test to promptly detect unscheduled reboots and unexpected breaks in the availability of each container on the target CRIO container engine.

Target of the test : A CRIO Container Engine

Agent deploying the test : A containerized agent

Outputs of the test : One set of results for each container available in the CRIO Engine being monitored.

Configurable parameters for the test
Parameter Description

Test Period

How often the test should be executed.

Host

The IP address of the host for which this test is to be configured.

Port

The port number at which the specified host listens. The default is 2379.

Use SUDO

By default, this flag is set to No. This indicates that, by default, the test collects its metrics by executing the crictl command directly. However, in some highly secure environments, the eG agent install user may not have permission to execute this command directly. In such cases, do the following:

  • Edit the SUDOERS file on the target host and append an entry of the following format to it:

    <eG_agent_install_user> ALL=(ALL) NOPASSWD:<Command>

    For instance, if the eG agent install user is eguser, then the entry in the sudoers file should be:

    eguser ALL=(ALL) NOPASSWD: crictl

  • Save the file.
  • Then, when configuring the test using the eG admin interface, set this parameter to Yes. This enables the eG agent to execute the sudo crictl command and retrieve the relevant metrics for this test.
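The effect of the Use SUDO flag can be pictured with a minimal Python sketch. It is not the eG agent's actual implementation: the function names are hypothetical, and the choice of the `crictl ps -a -o json` subcommand is an assumption, since this page does not state which crictl subcommand the agent runs.

```python
import subprocess
from typing import List


def build_crictl_command(use_sudo: bool) -> List[str]:
    """Return the argv used to list all containers as JSON (hypothetical helper)."""
    # Assumption: the container list is read with `crictl ps -a -o json`,
    # which lists running and exited containers in JSON form.
    base = ["crictl", "ps", "-a", "-o", "json"]
    # `sudo -n` fails immediately instead of prompting for a password,
    # which suits an unattended agent relying on a NOPASSWD entry.
    return ["sudo", "-n"] + base if use_sudo else base


def run_crictl(use_sudo: bool, timeout: float = 30.0) -> str:
    """Run the command and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(
        build_crictl_command(use_sudo),
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout,
    )
    return result.stdout
```

With Use SUDO set to No, the command is run as the agent install user; with Yes, the same command is simply prefixed with sudo, which is why the sudoers entry has to cover crictl.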

DD Frequency

Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency, if you so desire. Also, if you intend to disable the detailed diagnosis capability for this test, you can do so by specifying none against DD frequency.
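As an illustration only, the following Python sketch shows one way a normal:abnormal frequency such as 1:1 could be interpreted. The helper names are hypothetical, and the "every Nth run" reading of each number is an assumption; the text above only confirms the meaning of 1:1 and of none.

```python
from typing import Optional, Tuple


def parse_dd_frequency(value: str) -> Optional[Tuple[int, int]]:
    """Parse 'normal:abnormal' into two ints; 'none' disables detailed diagnosis."""
    text = value.strip().lower()
    if text == "none":
        return None
    normal, abnormal = text.split(":")
    return int(normal), int(abnormal)


def should_generate_dd(
    freq: Optional[Tuple[int, int]], run_number: int, problem_detected: bool
) -> bool:
    """Decide whether detailed measures are produced on this run (assumed semantics)."""
    if freq is None:
        return False
    normal, abnormal = freq
    # Assumption: the first number applies to normal runs, the second to
    # runs in which a problem was detected, each meaning "every Nth run".
    every = abnormal if problem_detected else normal
    # A frequency of 0 is treated as "never" in this sketch.
    return every > 0 and run_number % every == 0
```

Under these assumptions, 1:1 yields detailed measures on every run whether or not a problem is detected, which matches the default behaviour described above.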

Detailed Diagnosis

To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability
  • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
Measurements made by the test
Measurement Description Measurement Unit Interpretation

Has the container been restarted?

Indicates whether or not this container was rebooted.

 

The values reported by this measure and their numeric equivalents are available in the table below:

Measure Value    Numeric Value
No               0
Yes              1

Note:

This measure reports one of the Measure Values listed in the table above to indicate whether or not this container was rebooted. However, in the graph of this measure, the value is represented using only the Numeric Values listed in the table above.

For each container, the detailed diagnosis of this measure lists the time, shutdown date, restart date, duration of shutdown, and whether or not the container is in maintenance.
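A minimal Python sketch of how such a restart flag could be derived is shown below. It assumes, hypothetically, that the container's start time is recorded on every test run and compared with the previous run; how the eG agent actually detects a reboot is not described here, and the names used are illustrative.

```python
from datetime import datetime
from typing import Optional

# Measure Values and their numeric equivalents, as listed in the table above.
MEASURE_VALUES = {"No": 0, "Yes": 1}


def restarted_since_last_run(
    previous_start: Optional[datetime], current_start: datetime
) -> str:
    """Return 'Yes' if the container's start time moved forward between runs."""
    if previous_start is None:
        # First observation of this container: nothing to compare against.
        return "No"
    return "Yes" if current_start > previous_start else "No"
```

The string is what the measure would report, while `MEASURE_VALUES[...]` gives the number plotted in the graph.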

Uptime of the container during the last measure period

Indicates the duration for which this container has been up since the last time this test ran.

Secs

A low value implies that the container was recently rebooted. From the measure value, you can figure out if the reboot was scheduled or unscheduled.

A high value could indicate that a scheduled reboot has failed.
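One plausible way to compute this value is sketched below in Python. This is an assumption rather than the documented algorithm: it presumes the container is currently running, takes its most recent start time as input, and caps the result at the length of the measure period. The function name is hypothetical.

```python
from datetime import datetime


def uptime_in_last_period(
    started_at: datetime, now: datetime, period_secs: float
) -> float:
    """Seconds the container was up within the last measure period (assumed logic)."""
    # Time elapsed since the container's most recent start.
    up_secs = (now - started_at).total_seconds()
    # A container that started before the period began was up for the whole
    # period; one that restarted within it is credited only since that restart.
    return max(0.0, min(up_secs, float(period_secs)))
```

Under these assumptions, a container restarted two minutes before a test run with a five-minute period would report 120 seconds, while one that has been up for days would report the full 300.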

Total uptime of the container

Indicates the total time that this container has been up since its last reboot.

 

This measure displays the number of years, months, days, hours, minutes and seconds since the last reboot of each container. Administrators may wish to be alerted if a container has been running without a reboot for a very long period. Setting a threshold for this metric allows administrators to detect such conditions.
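The breakdown into years, months, days, hours, minutes and seconds, and a simple threshold check, can be sketched in Python as follows. The helper names are hypothetical, and the sketch assumes fixed-length units (365-day years, 30-day months), which may differ from how eG Enterprise performs the conversion.

```python
from typing import List

# Assumption: fixed-length approximations for calendar units.
UNITS = [
    ("year", 365 * 86400),
    ("month", 30 * 86400),
    ("day", 86400),
    ("hour", 3600),
    ("minute", 60),
    ("second", 1),
]


def format_uptime(total_seconds: int) -> str:
    """Render an uptime in seconds as years, months, days, hours, minutes, seconds."""
    remaining = int(total_seconds)
    parts: List[str] = []
    for name, size in UNITS:
        count, remaining = divmod(remaining, size)
        parts.append(f"{count} {name}{'' if count == 1 else 's'}")
    return ", ".join(parts)


def exceeds_uptime_threshold(total_seconds: int, max_days: int) -> bool:
    """True if the container has run longer than max_days without a reboot."""
    return total_seconds > max_days * 86400
```

For example, `format_uptime(90061)` returns `0 years, 0 months, 1 day, 1 hour, 1 minute, 1 second`, and `exceeds_uptime_threshold(90061, 1)` is true because the container has been up for slightly more than one day.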