Uptime Test
In most production environments, it is essential to monitor the uptime of critical servers in the infrastructure. By tracking the uptime of each of the servers, administrators can determine what percentage of time a server has been up. Comparing this value with service level targets, administrators can determine the most trouble-prone areas of the infrastructure.
In some environments, administrators may schedule periodic reboots of their servers. If a specific server has been up for an unusually long time, this can indicate to an administrator that the scheduled reboot task is not working on that server.
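The two checks described above (comparing uptime against a service level target, and spotting a server that has missed its scheduled reboot) can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not part of eG Enterprise; the function names and the example target and reboot interval are assumptions chosen for the sketch.

```python
from datetime import timedelta


def availability_percent(uptime_seconds, period_seconds):
    """Percentage of the observation period during which the server was up."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    return 100.0 * uptime_seconds / period_seconds


def misses_target(uptime_seconds, period_seconds, target_percent):
    """True if the measured availability is below the service level target."""
    return availability_percent(uptime_seconds, period_seconds) < target_percent


def reboot_overdue(total_uptime, max_expected_uptime):
    """True if the server has been up longer than the reboot schedule allows."""
    return total_uptime > max_expected_uptime


# Hypothetical figures: a 30-day period with one hour of downtime,
# checked against an assumed 99.9% target.
period = 30 * 24 * 3600
uptime = period - 3600
print(round(availability_percent(uptime, period), 3))  # 99.861
print(misses_target(uptime, period, 99.9))              # True

# Hypothetical weekly reboot schedule; the server has been up for 10 days.
print(reboot_overdue(timedelta(days=10), timedelta(days=7)))  # True
```

In practice the uptime figures would come from the measures this test reports, and the target and reboot interval from the site's own policies.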
This test, included in the eG agent, monitors the uptime of critical Windows and Unix servers.
Target of the test : A Windows or Unix server
Agent deploying the test : An internal agent
Outputs of the test : One set of results for every server being monitored
Parameter | Description |
---|---|
Test Period | How often the test should be executed. |
Host | The host for which the test is to be configured. |
Report Manager Time | By default, this flag is set to Yes, indicating that the detailed diagnosis of this test, if enabled, will report the shutdown and reboot times of the device in the manager’s time zone. If this flag is set to No, the shutdown and reboot times are shown in the time zone of the system where the agent is running (i.e., the system being managed for agent-based monitoring, and the system on which the remote agent is running for agentless monitoring). |
Log location | This is applicable only to Windows platforms. Typically, the first time this test executes on a Windows system/server, it creates a sysuptime_<Nameofmonitoredcomponent>.log in the <eg_agent_install_dir>\agent\logs directory. This log file keeps track of system reboots: each time a reboot occurs, the log file is updated with the corresponding details. During subsequent executions of this test, the eG agent on the Windows system/server reads this log file and reports the uptime and reboot-related metrics of the target. On a physical Windows system/server, this log file ‘persists’ in the said location, regardless of how often the system is rebooted. However, on a Windows system/server that has been ‘provisioned’ by a Provisioning server, this log file is recreated in the <eg_agent_install_dir>\agent\logs directory every time a reboot/refresh occurs. In the absence of a ‘persistent’ log file, the test will not be able to track reboots and report uptime accurately. To avoid this, when monitoring a provisioned Windows system/server, you can instruct the test to create the sysuptime_<Nameofmonitoredcomponent>.log file in an alternate location that is ‘persistent’, i.e., in a directory that will remain regardless of a restart. Specify the full path to this persistent location in the Log location text box. For instance, your log location can be D:\eGLogs. In this case, when the test executes, the sysuptime_<Nameofmonitoredcomponent>.log file will be created in the D:\eGLogs\eGagent\logs folder. By default, the Log location parameter is set to none. |
High Security | This flag is applicable only when the target Linux host is monitored in the agentless manner. In highly secure environments, eG Enterprise may not be able to perform agentless monitoring of a Linux host using SSH. To enable monitoring of Linux hosts in such environments, set the High Security flag to Yes. This indicates that eG Enterprise will connect to the target Linux host in a more secure way and collect performance metrics. By default, this flag is set to No. |
Detailed Diagnosis | To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option. The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled: |
Measurement | Description | Measurement Unit | Interpretation |
---|---|---|---|
Has the system been rebooted? | Indicates whether or not the server has been rebooted during the last measurement period. |  | If this measure shows 1, it means that the server was rebooted during the last measurement period. By checking the time periods when this metric changes from 0 to 1, an administrator can determine the times when this server was rebooted. |
Uptime during the last measure period | Indicates the time period that the system has been up since the last time this test ran. | Seconds | If the server has not been rebooted during the last measurement period and the agent has been running continuously, this value will be equal to the measurement period. If the server was rebooted during the last measurement period, this value will be less than the measurement period of the test. For example, if the measurement period is 300 seconds and the server was rebooted 120 seconds ago, this metric will report a value of 120 seconds. The accuracy of this metric depends on the measurement period: the smaller the measurement period, the greater the accuracy. |
Total uptime of the system | Indicates the total time that the server has been up since its last reboot. |  | This measure displays the number of years, months, days, hours, minutes and seconds since the last reboot. Administrators may wish to be alerted if a server has been running without a reboot for a very long period. Setting a threshold for this metric allows administrators to detect such conditions. |
Note:
|
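The relationship between the measures described in the table above can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not the eG agent's actual implementation: the function names are hypothetical, the agent is assumed to have run continuously, and the time since the last boot is assumed to be already known in seconds. The formatting helper stops at days, because years and months have no fixed length in seconds.

```python
def uptime_measures(seconds_since_boot, measurement_period):
    """Return (rebooted_flag, uptime_during_last_period_in_seconds).

    rebooted_flag is 1 if the last boot falls inside the last measurement
    period, otherwise 0. The per-period uptime can never exceed the period.
    """
    if seconds_since_boot < 0 or measurement_period <= 0:
        raise ValueError("expected non-negative uptime and a positive period")
    rebooted = 1 if seconds_since_boot < measurement_period else 0
    uptime_last_period = min(seconds_since_boot, measurement_period)
    return rebooted, uptime_last_period


def format_uptime(total_seconds):
    """Break a total uptime in seconds into days, hours, minutes, seconds."""
    days, rem = divmod(int(total_seconds), 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{days}d {hours}h {minutes}m {seconds}s"


# The example from the table: 300-second period, reboot 120 seconds ago.
print(uptime_measures(120, 300))    # (1, 120)

# No reboot in the last period: uptime equals the measurement period.
print(uptime_measures(86400, 300))  # (0, 300)

print(format_uptime(93784))         # 1d 2h 3m 4s
```

The sketch also shows why a smaller measurement period gives finer resolution: the reboot flag only says that a reboot happened somewhere within the period, so a shorter period narrows the window in which it could have occurred.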