Uptime - OS Test

In most virtualized environments, it is essential to monitor the uptime of logical domains hosting critical server applications in the infrastructure. By tracking the uptime of each of the guest domains, administrators can determine what percentage of time a domain has been up. Comparing this value with service level targets, administrators can determine the most trouble-prone areas of the virtualized infrastructure.
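The uptime percentage above comes from simple arithmetic over successive test runs. The sketch below assumes that you have collected one Uptime of VM during the last measure period value per run and that consecutive runs are exactly one test period apart. Time not covered by reported uptime is treated as downtime. The function name `availability_percent` and the 99 percent target are hypothetical illustrations, not eG settings.

```python
from typing import Sequence


def availability_percent(uptime_samples_secs: Sequence[float], test_period_secs: float) -> float:
    """Percentage of the observed time that a guest domain was up.

    Hypothetical helper: assumes one per-period uptime sample per test run
    and that every run covered exactly one test period.
    """
    if test_period_secs <= 0:
        raise ValueError("test period must be positive")
    if not uptime_samples_secs:
        raise ValueError("need at least one sample")
    observed_secs = len(uptime_samples_secs) * test_period_secs
    # A sample can never exceed the period it describes, so cap it defensively.
    up_secs = sum(min(s, test_period_secs) for s in uptime_samples_secs)
    return 100.0 * up_secs / observed_secs


# Hypothetical service level target to compare against.
SLA_TARGET_PERCENT = 99.0

samples = [300, 300, 120, 300]  # the domain rebooted 120 s before the third run
pct = availability_percent(samples, 300)
print(f"{pct:.2f}% up; meets target: {pct >= SLA_TARGET_PERCENT}")  # 85.00% up; meets target: False
```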

In some environments, administrators may schedule periodic reboots of their guest domains. If a specific domain has been up for an unusually long time, an administrator can infer that the scheduled reboot task is not working on that domain.

This test, included in the eG agent, monitors the uptime of critical logical domains on an Oracle LDoms server.

Target of the test : An Oracle LDoms server

Agent deploying the test : An internal agent

Outputs of the test : One set of results for every VM on an Oracle LDoms server.

Configurable parameters for the test
Parameter / Description

Test period

How often the test should be executed.

Host

The host for which the test is to be configured.

Port

The port number at which the specified host listens. By default, the port is NULL.

Domain

Specify the domain within which the virtual guests reside. Since the Oracle LDoms server supports only Oracle Solaris and Linux guests, this parameter should always be set to none.

Admin User

This test connects to each virtual guest and collects status and resource usage statistics from it. To do so, the test must be configured with the credentials of a user who has the privileges to connect remotely to the virtual guest from the Oracle LDoms server. If a single user has access to all the guest domains on the server, specify that user's name against Admin User and that user's password against Admin Password. If the user credentials vary from one guest to another, then multiple Admin Users and Admin Passwords might have to be specified for every Oracle LDoms server being monitored.

To help administrators provide these user details quickly and easily, the eG administrative interface embeds a special configuration page. To access this page, click the Click here hyperlink that appears just above the parameters of this test in the test configuration page. For instructions on using this page, refer to Configuring Users for VM Monitoring.

Admin Password

The password of the Admin User needs to be provided here. Here again, if multiple passwords need to be specified, then follow the procedure detailed in Configuring Users for VM Monitoring.

Confirm Password

Confirm the password by retyping it here. Here again, if multiple passwords need to be confirmed, then follow the procedure detailed in Configuring Users for VM Monitoring.

ReportManagerTime

By default, this flag is set to Yes, which means that the detailed diagnosis of this test, if enabled, reports the shutdown and reboot times of the LDoms in the eG manager’s time zone. If this flag is set to No, the shutdown and reboot times are shown in the time zone of the system on which the agent is running.

Detailed Diagnosis

To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability
  • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
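The ReportManagerTime flag only changes the time zone in which the detailed diagnosis displays shutdown and reboot times; the underlying instant is the same. A minimal sketch of that conversion, assuming the event times are held in UTC. The fixed offsets for the manager and agent zones and the function name `display_time` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical time zones, for illustration only.
MANAGER_TZ = timezone(timedelta(hours=5, minutes=30), "manager")
AGENT_TZ = timezone(timedelta(hours=-5), "agent")


def display_time(event_utc: datetime, report_manager_time: bool = True) -> str:
    """Render a shutdown/reboot time in the zone selected by the flag."""
    tz = MANAGER_TZ if report_manager_time else AGENT_TZ
    return event_utc.astimezone(tz).strftime("%Y-%m-%d %H:%M:%S %Z")


reboot = datetime(2024, 1, 15, 12, 0, 0, tzinfo=timezone.utc)
print(display_time(reboot))         # flag = Yes: 2024-01-15 17:30:00 manager
print(display_time(reboot, False))  # flag = No:  2024-01-15 07:00:00 agent
```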

Measurements made by the test
Measurement / Description / Measurement Unit / Interpretation

Has the VM been rebooted?

Indicates whether the logical domain has been rebooted during the last measurement period.

Boolean

If this measure reports the value 1, the guest domain was rebooted during the last measurement period; a value of 0 means that it was not. By checking the time periods when this metric changes from 0 to 1, an administrator can determine when this domain was rebooted.

Uptime of VM during the last measure period

Indicates the time period that the logical domain has been up since the last time this test ran.

Secs

If the guest has not been rebooted during the last measurement period and the agent has been running continuously, this value will be equal to the measurement period. If the guest was rebooted during the last measurement period, this value will be less than the measurement period of the test. For example, if the measurement period is 300 seconds and the guest was rebooted 120 seconds ago, this metric will report a value of 120 seconds. The accuracy of this metric depends on the measurement period: the smaller the measurement period, the greater the accuracy.

Total uptime of the VM

Indicates the total time that the logical domain has been up since its last reboot.

Mins

Administrators may wish to be alerted if a guest has been running without a reboot for a very long period. Setting a threshold for this metric allows administrators to detect such conditions.
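The three measures above can all be derived from a domain's uptime since boot. The following is a minimal sketch, assuming the agent can read that uptime in seconds and that consecutive runs are exactly one test period apart. All function names, sample values, and the weekly-reboot threshold are hypothetical and are not eG agent APIs. It reproduces the 300-second / 120-second example from the table.

```python
from typing import List, Tuple

MINUTES_PER_DAY = 24 * 60


def was_rebooted(uptime_since_boot_secs: float, test_period_secs: float) -> int:
    """'Has the VM been rebooted?': 1 if the domain booted after the previous run."""
    # A boot inside the window means the OS has been up for less than the window.
    return 1 if uptime_since_boot_secs < test_period_secs else 0


def uptime_in_last_period(uptime_since_boot_secs: float, test_period_secs: float) -> float:
    """'Uptime of VM during the last measure period', in seconds."""
    # Capped at the test period; shorter only if the domain booted inside the window.
    return min(uptime_since_boot_secs, test_period_secs)


def total_uptime_mins(uptime_since_boot_secs: float) -> float:
    """'Total uptime of the VM', in minutes."""
    return uptime_since_boot_secs / 60


def overdue_for_reboot(total_mins: float, reboot_interval_days: float, grace_mins: float = 60) -> bool:
    """Hypothetical threshold: alert if uptime exceeds the reboot schedule plus a grace period."""
    return total_mins > reboot_interval_days * MINUTES_PER_DAY + grace_mins


# Hypothetical samples: (run time in seconds, uptime since boot in seconds), 300 s test period.
PERIOD = 300
runs: List[Tuple[int, float]] = [(300, 5000), (600, 5300), (900, 120), (1200, 420)]
for t, up in runs:
    print(t, was_rebooted(up, PERIOD), uptime_in_last_period(up, PERIOD), round(total_uptime_mins(up), 1))

# Two weeks without a reboot on a weekly schedule exceeds the threshold.
print(overdue_for_reboot(total_mins=20160, reboot_interval_days=7))  # True
```

In this sketch, the reboot flag changes from 0 to 1 only at the run at 900 seconds, which is when the reboot is detected. For Unix guests that report uptime in whole minutes, the per-run values behave differently, as the following note explains.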

Note:

If the Test Period of the Uptime - OS test is set to less than a minute, the Uptime of VM during the last measure period measure will report the value 0 for Unix VMs (only) until the minute boundary is crossed. For instance, if you configure the Uptime - OS test to run every 10 seconds, then for the first 5 test execution cycles (i.e., 10 x 5 = 50 seconds), the Uptime of VM during the last measure period measure will report the value 0 for Unix VMs. However, the sixth time the test executes (i.e., when test execution reaches the 1-minute boundary), this measure will report the value 60 seconds for the same VMs. Thereafter, every sixth measurement period will report 60 seconds as the uptime of the Unix VMs. This is because Unix-based operating systems report uptime only in minutes, not in seconds.
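The arithmetic in the note can be reproduced with a small simulation. It assumes, as in the example, that the domain's uptime starts on a whole-minute boundary and that the measure is the change in the minute-granularity uptime between consecutive runs, expressed in seconds. The function name is hypothetical, and the code models the documented behavior rather than the eG agent's actual implementation.

```python
from typing import List


def unix_last_period_uptime(test_period_secs: int, runs: int, start_uptime_secs: int = 0) -> List[int]:
    """Simulate the per-run measure when the OS reports uptime in whole minutes only."""
    results = []
    prev_mins = start_uptime_secs // 60
    for i in range(1, runs + 1):
        true_uptime_secs = start_uptime_secs + i * test_period_secs
        reported_mins = true_uptime_secs // 60  # minute granularity drops the seconds
        results.append((reported_mins - prev_mins) * 60)
        prev_mins = reported_mins
    return results


values = unix_last_period_uptime(test_period_secs=10, runs=12)
print(values)       # [0, 0, 0, 0, 0, 60, 0, 0, 0, 0, 0, 60]
print(sum(values))  # 120, matching 12 runs x 10 seconds
```

In this model, the totals still add up over whole minutes; only the per-run values are uneven. A test period of at least 60 seconds always crosses at least one minute boundary, so the zero readings do not occur.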