Uptime - AHV Test

In most virtualized environments, it is essential to monitor the uptime of critical AHV servers in the infrastructure. By tracking the uptime of each server, administrators can determine the percentage of time that a server has been up. By comparing this value with service level targets, administrators can identify the most trouble-prone areas of the infrastructure.
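The percentage-of-time-up calculation described above is simple arithmetic. The following is a minimal sketch, not the eG agent's implementation; the function names and the 30-day window with 43 minutes of downtime are hypothetical values chosen for illustration.

```python
def availability_percent(uptime_seconds: float, period_seconds: float) -> float:
    """Return the share of an observation window during which the server was up."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    # Clamp so that clock skew can never yield more than 100%.
    return min(uptime_seconds, period_seconds) / period_seconds * 100.0


def meets_target(uptime_seconds: float, period_seconds: float, target_percent: float) -> bool:
    """Compare measured availability with a service level target (in percent)."""
    return availability_percent(uptime_seconds, period_seconds) >= target_percent


# Hypothetical example: a 30-day window with 43 minutes of downtime.
window = 30 * 24 * 3600
up = window - 43 * 60
print(f"{availability_percent(up, window):.3f}%")
print(meets_target(up, window, 99.9))
```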

In some environments, administrators schedule periodic reboots of their servers. If a specific server has been up for an unusually long time, this may indicate that the scheduled reboot task is not working on that server.
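A missed scheduled reboot can be detected by comparing the current uptime with the reboot interval. This is a minimal sketch under assumed inputs: the last boot time is taken as already known, and the one-hour grace period and the function name are hypothetical.

```python
from datetime import datetime, timedelta


def reboot_overdue(last_boot: datetime, now: datetime,
                   reboot_interval: timedelta,
                   grace: timedelta = timedelta(hours=1)) -> bool:
    """True if the server has been up longer than its reboot schedule allows."""
    uptime = now - last_boot
    # The grace period (an assumption) absorbs reboots that run slightly late.
    return uptime > reboot_interval + grace


# Hypothetical weekly reboot schedule, checked nine days after the last boot.
print(reboot_overdue(datetime(2024, 1, 1), datetime(2024, 1, 10), timedelta(days=7)))
```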

The Uptime - AHV test included in the eG agent monitors the uptime of critical AHV servers in a virtualized infrastructure. 

Target of the test : A Nutanix AHV server

Agent deploying the test : A remote agent

Outputs of the test : One set of results for the Nutanix AHV server monitored.

Configurable parameters for the test
Parameter Description

Test Period

How often should the test be executed

Host

The host for which the test is to be configured.

Port

The port at which the specified host listens. By default, this is NULL.

Console User

Provide the name of a valid user account on the AHV server being monitored.

Console Password

Specify the password of the Console User here. Confirm the password by retyping it in the Confirm Password text box.

High Security

By default, this flag is set to Yes, indicating that eG Enterprise connects to the target server over SSH in a more secure manner to collect performance metrics.

Log Location

Typically, the first time this test executes on a Windows host, it creates a sysuptime_<Nameofmonitoredcomponent>.log file in the <EG_AGENT_INSTALL_DIR>\agent\logs directory. This log file keeps track of system reboots; each time a reboot occurs, the file is updated with the corresponding details. During subsequent executions of this test, the eG agent on the Windows host reads this log file and reports the uptime and reboot-related metrics of the target.

On a physical Windows host, this log file ‘persists’ in the said location, regardless of how often the system is rebooted. However, if the target Windows system has been ‘provisioned’ by a Provisioning server, this log file is recreated in the <EG_AGENT_INSTALL_DIR>\agent\logs directory every time a reboot/refresh occurs. Without a ‘persistent’ log file, the test cannot track reboots or report uptime accurately.

To avoid this, when monitoring a provisioned Windows system/server, you can instruct the test to create the sysuptime_<Nameofmonitoredcomponent>.log file in an alternate, ‘persistent’ location, i.e., a directory that survives a restart. Specify the full path to this persistent location in the Log Location text box. For instance, if your Log Location is D:\eGLogs, then when the test executes, the sysuptime_<Nameofmonitoredcomponent>.log file will be created in the D:\eGLogs\eGagent\logs folder. By default, the Log Location parameter is set to none.
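The rule for where the log file ends up can be expressed as a small path-resolution function. This is a hedged sketch that only mirrors the two cases described above; the function name, the component name ahv01 and the install directory C:\eGurkha are hypothetical. PureWindowsPath is used so that the Windows-style paths can be built on any operating system.

```python
from pathlib import PureWindowsPath


def uptime_log_path(install_dir: str, component: str,
                    log_location: str = "none") -> PureWindowsPath:
    """Resolve where sysuptime_<component>.log would be written."""
    file_name = f"sysuptime_{component}.log"
    if log_location.strip().lower() == "none":
        # Default: <EG_AGENT_INSTALL_DIR>\agent\logs (recreated on provisioned systems).
        return PureWindowsPath(install_dir, "agent", "logs", file_name)
    # Persistent location: <Log Location>\eGagent\logs.
    return PureWindowsPath(log_location, "eGagent", "logs", file_name)


print(uptime_log_path(r"C:\eGurkha", "ahv01"))
print(uptime_log_path(r"C:\eGurkha", "ahv01", r"D:\eGLogs"))
```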

Report Manager Time

By default, this flag is set to Yes, indicating that the detailed diagnosis of this test, if enabled, will report the shutdown and reboot times of the target in the eG manager's time zone. If this flag is set to No, the shutdown and reboot times are shown in the time zone of the system on which the remote agent is running.

DD Frequency

Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency if you so desire. To disable the detailed diagnosis capability for this test, specify none against DD Frequency.
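The DD Frequency value pairs a normal frequency with an abnormal (problem) frequency. The sketch below parses that value and decides whether a given run should collect detailed diagnosis. It rests on an assumption not stated in this document: that a frequency of N means "every Nth run" and 0 means "never". The function names are hypothetical.

```python
from typing import Optional, Tuple


def parse_dd_frequency(value: str) -> Optional[Tuple[int, int]]:
    """Parse 'normal:abnormal' (e.g. '1:1'); return None when DD is disabled."""
    text = value.strip().lower()
    if text == "none":
        return None
    normal, abnormal = (int(part) for part in text.split(":"))
    if normal < 0 or abnormal < 0:
        raise ValueError("frequencies must be non-negative")
    return normal, abnormal


def should_collect_dd(freq: Optional[Tuple[int, int]], run_number: int,
                      problem_detected: bool) -> bool:
    """Assumption: a frequency of N means every Nth run; 0 means never."""
    if freq is None:
        return False
    normal, abnormal = freq
    n = abnormal if problem_detected else normal
    return n > 0 and run_number % n == 0


# With the default 1:1, every run collects detailed diagnosis.
print(should_collect_dd(parse_dd_frequency("1:1"), 7, False))
```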

Detailed Diagnosis

To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability
  • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
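The two conditions listed above combine into a single check. This is a minimal sketch with hypothetical parameter names; it simply restates the listed conditions and is not eG Enterprise's own logic.

```python
def dd_toggle_available(license_allows_dd: bool,
                        normal_freq: int, abnormal_freq: int) -> bool:
    """The On/Off choice is offered only if the license permits detailed
    diagnosis and neither configured frequency is 0."""
    return license_allows_dd and normal_freq != 0 and abnormal_freq != 0


print(dd_toggle_available(True, 1, 1))
```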

Measurements made by the test

Measurement

Description

Measurement Unit

Interpretation

Has the system been rebooted?

Indicates whether the server has been rebooted during the last measurement period or not.

Boolean

If this measure shows 1, it means that the server was rebooted during the last measurement period. By checking the time periods when this metric changes from 0 to 1, an administrator can determine when this server was rebooted.
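Finding reboot times from the history of this measure amounts to detecting rising edges in a 0/1 series. The sketch below assumes the history is available as (timestamp, value) pairs, one per measurement period; how that history is exported is not covered here, and the function name is hypothetical.

```python
from datetime import datetime
from typing import List, Sequence, Tuple


def reboot_times(samples: Sequence[Tuple[datetime, int]]) -> List[datetime]:
    """Return the timestamps at which the 'rebooted' flag rises from 0 to 1."""
    times: List[datetime] = []
    previous = 0  # treat the period before the first sample as 'no reboot'
    for timestamp, flag in samples:
        if flag == 1 and previous == 0:
            times.append(timestamp)
        previous = flag
    return times


history = [(datetime(2024, 1, 1, 0, 5 * i), v) for i, v in enumerate([0, 1, 1, 0, 0, 1])]
print(reboot_times(history))
```

Note that a rising-edge check reports one event for consecutive periods that both show 1; if back-to-back reboots matter, count every period whose value is 1 instead.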

The detailed diagnosis of this measure, if enabled, will provide you with the details of the last reboot of the AHV server. Such details will include the shutdown date/time, reboot date/time, the shutdown duration (in minutes), and whether the host has been configured for maintenance or not.
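The shutdown duration reported in the detailed diagnosis is the gap between the shutdown and reboot timestamps, expressed in minutes. A minimal sketch of that calculation, with a hypothetical function name and sample times:

```python
from datetime import datetime


def shutdown_duration_minutes(shutdown_at: datetime, reboot_at: datetime) -> float:
    """Minutes between the shutdown and the subsequent reboot."""
    if reboot_at < shutdown_at:
        raise ValueError("reboot time precedes shutdown time")
    return (reboot_at - shutdown_at).total_seconds() / 60.0


print(shutdown_duration_minutes(datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 4, 30)))
```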

Uptime during the last measure period

Indicates the time period that the system has been up since the last time this test ran.

Seconds

If the server has not been rebooted during the last measurement period and the agent has been running continuously, this value will be equal to the measurement period. If the server was rebooted during the last measurement period, this value will be less than the measurement period of the test. For example, if the measurement period is 300 seconds and the server was rebooted 120 seconds ago, this metric will report a value of 120 seconds. The accuracy of this metric depends on the measurement period: the smaller the measurement period, the greater the accuracy.
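The 300-second example above reduces to taking the smaller of the measurement period and the time since boot. This sketch assumes the boot time is known and uses a hypothetical function name; it illustrates the arithmetic only.

```python
from datetime import datetime, timedelta


def uptime_during_period(boot_time: datetime, now: datetime, period: timedelta) -> float:
    """Seconds the server was up within the window (now - period, now]."""
    since_boot = (now - boot_time).total_seconds()
    # Up for the whole window unless the boot happened inside it.
    return max(0.0, min(period.total_seconds(), since_boot))


now = datetime(2024, 1, 1, 12, 0, 0)
print(uptime_during_period(now - timedelta(seconds=120), now, timedelta(seconds=300)))
```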

Total uptime of the system

Indicates the total time that the server has been up since its last reboot.

Minutes

Although this measure is reported in minutes, it is displayed as the number of years, months, days, hours, minutes and seconds since the last reboot. Administrators may wish to be alerted if a server has been running without a reboot for a very long period. Setting a threshold for this metric allows administrators to detect such conditions.
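Converting a total uptime in minutes into years, months, days, hours, minutes and seconds, and checking it against a threshold, can be sketched as follows. The calendar conventions (a year of 365 days, a month of 30 days) and the threshold expressed in days are assumptions; the exact rules eG Enterprise uses for display are not stated here.

```python
from typing import Dict

# Assumption: 1 year = 365 days and 1 month = 30 days, all expressed in seconds.
UNITS = [
    ("years", 365 * 86400),
    ("months", 30 * 86400),
    ("days", 86400),
    ("hours", 3600),
    ("minutes", 60),
    ("seconds", 1),
]


def breakdown_uptime(total_minutes: float) -> Dict[str, int]:
    """Split an uptime in minutes into calendar-style components."""
    remaining = int(round(total_minutes * 60))
    parts: Dict[str, int] = {}
    for name, size in UNITS:
        parts[name], remaining = divmod(remaining, size)
    return parts


def exceeds_threshold(total_minutes: float, max_days: float) -> bool:
    """True if the server has run longer than the (hypothetical) threshold in days."""
    return total_minutes > max_days * 24 * 60


uptime = 400 * 1440 + 62.5  # 400 days, 1 hour, 2 minutes and 30 seconds
print(breakdown_uptime(uptime), exceeds_threshold(uptime, 30))
```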