Uptime Test

In most production environments, it is essential to monitor the uptime of critical servers in the infrastructure. By tracking the uptime of each server, administrators can determine what percentage of time that server has been up. By comparing this value with service level targets, administrators can identify the most trouble-prone areas of the infrastructure.
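The uptime percentage is simple arithmetic: time up divided by the length of the observation window. The following minimal Python sketch assumes you already know the up time and window length in seconds. The function names `uptime_percentage` and `meets_target`, the 30-day window and the 99.9% target are all hypothetical illustrations, not part of eG Enterprise.

```python
def uptime_percentage(up_seconds: float, window_seconds: float) -> float:
    """Share of the observation window during which the server was up, in percent."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return 100.0 * up_seconds / window_seconds


def meets_target(up_seconds: float, window_seconds: float, target_pct: float) -> bool:
    """Compare measured availability against a service level target such as 99.9."""
    return uptime_percentage(up_seconds, window_seconds) >= target_pct


if __name__ == "__main__":
    # Hypothetical example: a 30-day window with one hour of downtime.
    window = 30 * 24 * 3600      # 2,592,000 seconds
    up = window - 3600           # 2,588,400 seconds
    print(f"{uptime_percentage(up, window):.3f}%")   # 99.861%
    print(meets_target(up, window, 99.9))            # False
```

One hour of downtime in a month is enough to miss a 99.9% target, which is why even short outages are worth tracking per server.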

In some environments, administrators schedule periodic reboots of their servers. If a specific server has been up for an unusually long time, this may indicate that the scheduled reboot task is not working on that server.
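As a sketch, assuming the server's total uptime is already known (for example, from the Total uptime of the system measure) and that the reboot schedule is a fixed interval, the check reduces to comparing uptime against that interval plus some slack. The function name `reboot_overdue` and the one-hour grace period are hypothetical.

```python
from datetime import timedelta


def reboot_overdue(total_uptime: timedelta, reboot_interval: timedelta,
                   grace: timedelta = timedelta(hours=1)) -> bool:
    """Return True if the server has been up longer than its reboot schedule allows.

    reboot_interval is the period of the scheduled reboot task (e.g. 7 days);
    grace absorbs small delays in when the task actually runs.
    """
    return total_uptime > reboot_interval + grace


if __name__ == "__main__":
    weekly = timedelta(days=7)
    print(reboot_overdue(timedelta(days=3), weekly))  # False: within schedule
    print(reboot_overdue(timedelta(days=9), weekly))  # True: the weekly reboot was likely missed
```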

This test, included in the eG agent, monitors the uptime of critical Windows and Unix servers.

Target of the test: A Windows or Unix server

Agent deploying the test: An internal agent

Outputs of the test: One set of results for every server being monitored

Configurable parameters for the test
Parameter	Description
Test Period	How often the test should be executed.
Host	The host for which the test is to be configured.
Report Manager Time	By default, this flag is set to Yes, indicating that the detailed diagnosis of this test, if enabled, will report the shutdown and reboot times of the device in the manager’s time zone. If this flag is set to No, the shutdown and reboot times are shown in the time zone of the system where the agent is running (i.e., the system being managed, for agent-based monitoring, or the system on which the remote agent is running, for agentless monitoring).
Log location	This parameter applies only to Windows platforms. The first time this test executes on a Windows system/server, it typically creates a sysuptime_<Nameofmonitoredcomponent>.log file in the <eg_agent_install_dir>\agent\logs directory. This log file keeps track of system reboots: each time a reboot occurs, the file is updated with the corresponding details. During subsequent executions of this test, the eG agent on the Windows system/server reads this log file and reports the uptime and reboot-related metrics of the target. On a physical Windows system/server, this log file ‘persists’ in that location, regardless of how often the system is rebooted. However, on a Windows system/server that has been ‘provisioned’ by a Provisioning server, this log file is recreated in the <eg_agent_install_dir>\agent\logs directory every time a reboot/refresh occurs. Without a ‘persistent’ log file, the test cannot track reboots or report uptime accurately. To avoid this, when monitoring a provisioned Windows system/server, you can instruct the test to create the sysuptime_<Nameofmonitoredcomponent>.log file in an alternate, ‘persistent’ location, i.e., a directory that survives a restart. Specify the full path to this persistent location in the log location text box. For instance, if your log location is D:\eGLogs, the sysuptime_<Nameofmonitoredcomponent>.log file will be created in the D:\eGLogs\eGagent\logs folder when the test executes. By default, the log location parameter is set to none.
High Security	This flag applies only when the target Linux host is monitored in an agentless manner. In highly secure environments, eG Enterprise may not be able to perform agentless monitoring on a Linux host using SSH. To enable monitoring of Linux hosts in such environments, set the HIGH SECURITY flag to Yes. This indicates that eG Enterprise will connect to the target Linux host in a more secure way to collect performance metrics. By default, this flag is set to No.
Detailed Diagnosis	To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option. The option to selectively enable/disable the detailed diagnosis capability will be available only if both of the following conditions are fulfilled: the eG manager license allows the detailed diagnosis capability; and both the normal and abnormal frequencies configured for the detailed diagnosis measures are not 0.
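The Log location parameter determines where the sysuptime log file ends up. The following Python sketch resolves that path using only the conventions stated in the parameter description: <eg_agent_install_dir>\agent\logs by default, and <log location>\eGagent\logs when a persistent location is configured. The function name `sysuptime_log_path` and the install directory in the usage example are hypothetical; this is not the eG agent's own code.

```python
from pathlib import PureWindowsPath
from typing import Optional


def sysuptime_log_path(component_name: str, agent_install_dir: str,
                       log_location: Optional[str] = "none") -> PureWindowsPath:
    r"""Return the expected location of the sysuptime_<component>.log file.

    With the default log location ("none"), the file lives under
    <eg_agent_install_dir>\agent\logs. With a persistent location such as
    D:\eGLogs, it lives under <log_location>\eGagent\logs.
    """
    file_name = f"sysuptime_{component_name}.log"
    if log_location is None or log_location.strip().lower() == "none":
        return PureWindowsPath(agent_install_dir, "agent", "logs", file_name)
    return PureWindowsPath(log_location, "eGagent", "logs", file_name)


if __name__ == "__main__":
    # Hypothetical component name and install directory.
    print(sysuptime_log_path("web01", r"C:\eGAgent"))
    print(sysuptime_log_path("web01", r"C:\eGAgent", r"D:\eGLogs"))
```

PureWindowsPath is used so the path logic behaves the same even when the sketch runs on a non-Windows machine; it only manipulates strings and never touches the file system.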

Measurements made by the test
Measurement	Description	Measurement Unit	Interpretation
Has the system been rebooted?	Indicates whether the server has been rebooted during the last measurement period.		If this measure shows 1, the server was rebooted during the last measurement period. By checking the time periods when this metric changes from 0 to 1, an administrator can determine when this server was rebooted.
Uptime during the last measure period	Indicates the time period that the system has been up since the last time this test ran.	Seconds	If the server has not been rebooted during the last measurement period and the agent has been running continuously, this value will be equal to the measurement period. If the server was rebooted during the last measurement period, this value will be less than the measurement period of the test. For example, if the measurement period is 300 seconds and the server was rebooted 120 seconds ago, this metric will report a value of 120 seconds. The accuracy of this metric depends on the measurement period: the smaller the measurement period, the greater the accuracy.
Total uptime of the system	Indicates the total time that the server has been up since its last reboot.		This measure displays the number of years, months, days, hours, minutes and seconds since the last reboot. Administrators may wish to be alerted if a server has been running without a reboot for a very long period. Setting a threshold for this metric allows administrators to detect such conditions.
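The three measures are closely related. The following Python sketch shows one way to derive them from boot timestamps, reproducing the worked example from the table (300-second period, reboot 120 seconds ago). It assumes a reboot can be detected by a change in the boot timestamp and that the agent ran continuously; the names `UptimeSample` and `measure` are hypothetical, and eG Enterprise may compute these values differently.

```python
from dataclasses import dataclass


@dataclass
class UptimeSample:
    rebooted: int              # 1 if a reboot happened during the last period, else 0
    uptime_last_period: float  # seconds
    total_uptime: float        # seconds since the last reboot


def measure(now: float, boot_time: float, previous_boot_time: float,
            period: float) -> UptimeSample:
    """Derive the three measures from boot timestamps (all values in seconds)."""
    total = now - boot_time
    # A changed boot timestamp means the system restarted since the last run.
    rebooted = 1 if boot_time != previous_boot_time else 0
    # Without a reboot the system was up for the whole period; after a reboot,
    # only for the time elapsed since boot.
    last_period = min(float(period), total)
    return UptimeSample(rebooted, last_period, total)


if __name__ == "__main__":
    now = 10_000.0
    print(measure(now, boot_time=now - 120, previous_boot_time=2_000.0, period=300))
    # UptimeSample(rebooted=1, uptime_last_period=120.0, total_uptime=120.0)
```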

Note:

For a Unix host, if the test period of the Uptime test is set to less than a minute, the Uptime during the last measure period measure will report the value 0 until the minute boundary is crossed. This is because Unix hosts report uptime only in minutes, not in seconds. For instance, if you configure the Uptime test for a Unix host to run every 10 seconds, the Uptime during the last measure period measure will report 0 for the first 5 test execution cycles (i.e., 10 x 5 = 50 seconds); the sixth time the test executes (i.e., when test execution reaches the 1-minute boundary), this measure will report 60 seconds. Thereafter, every sixth measurement period will report 60 seconds as the uptime of the host.
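The minute-granularity effect on Unix hosts can be reproduced with a small simulation. This Python sketch models the host's reported uptime as the true uptime truncated to whole minutes and assumes the first run starts on a minute boundary; `reported_uptime` and `simulate` are hypothetical helpers, not eG agent code.

```python
from typing import List


def reported_uptime(true_uptime_seconds: int) -> int:
    """Uptime as a minute-granular source would report it, expressed in seconds."""
    return (true_uptime_seconds // 60) * 60


def simulate(test_period: int, runs: int, start_uptime: int = 0) -> List[int]:
    """Uptime-during-last-period values when the source only changes once a minute."""
    values = []
    previous = reported_uptime(start_uptime)
    for run in range(1, runs + 1):
        current = reported_uptime(start_uptime + run * test_period)
        values.append(current - previous)
        previous = current
    return values


if __name__ == "__main__":
    print(simulate(test_period=10, runs=12))
    # [0, 0, 0, 0, 0, 60, 0, 0, 0, 0, 0, 60]
```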
For systems running Windows 8 (or above), the Uptime test may sometimes report incorrect values. This is because of the 'Fast Startup' feature, which is enabled by default on Windows 8 (and above) operating systems. With this feature, the Windows operating system is NOT SHUT DOWN COMPLETELY when the host is shut down. Instead, upon shutdown, the operating system saves an image of the Windows kernel and loaded drivers to the file C:\hiberfil.sys. When the Windows host is later started, the operating system simply loads hiberfil.sys into memory to resume operations, instead of performing a clean start. Because of this, Windows does not record this event as an actual 'reboot'. As a result, the Uptime test cannot correctly report whether a reboot happened recently, nor can it accurately compute the time since the last reboot.

To avoid this, you need to disable the Fast Startup feature on Windows 8 (and above). The steps to achieve this are outlined below:
1. Log in to the target Windows system.
2. Open the Windows Registry Editor and look for the following registry key:
  
  HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Power
3. Locate the HiberbootEnabled value under the key mentioned above.
4. Change the data of this value to 0 to turn off Fast Startup. By default, it is set to 1, as Fast Startup is enabled by default.
  
  Also, note that Fast Startup does not apply when the system is “restarted”; it applies only when the system is shut down and then started.
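To check or change the setting from a script instead of the Registry Editor, the following minimal Python sketch uses the standard-library winreg module against the same key and HiberbootEnabled value described in the steps above. The function names are hypothetical, writing the value requires an elevated (administrator) prompt, and on non-Windows systems the read function simply returns None.

```python
from typing import Optional

try:
    import winreg  # standard library, available only on Windows
except ImportError:
    winreg = None

POWER_KEY = r"SYSTEM\CurrentControlSet\Control\Session Manager\Power"


def read_hiberboot_enabled() -> Optional[int]:
    """Return HiberbootEnabled (1 = Fast Startup on, 0 = off), or None if unavailable."""
    if winreg is None:
        return None
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, POWER_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, "HiberbootEnabled")
            return int(value)
    except FileNotFoundError:
        # The key or value does not exist on this system.
        return None


def disable_fast_startup() -> None:
    """Set HiberbootEnabled to 0 (REG_DWORD). Requires administrative privileges."""
    if winreg is None:
        raise OSError("winreg is only available on Windows")
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, POWER_KEY, 0,
                        winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, "HiberbootEnabled", 0, winreg.REG_DWORD, 0)


if __name__ == "__main__":
    # Read-only check; call disable_fast_startup() explicitly to change the setting.
    print("HiberbootEnabled =", read_hiberboot_enabled())
```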