Oracle RAC Uptime Test
In most production environments, it is essential to monitor the uptime of critical servers in the infrastructure. By tracking the uptime of each of the servers, administrators can determine what percentage of time a server has been up. Comparing this value with service level targets, administrators can determine the most trouble-prone areas of the infrastructure.
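
As a rough illustration of the percentage calculation described above, the following Python sketch computes an availability figure from per-interval uptime samples and compares it with a service level target. The function names, the sample values, and the 99.9% target are hypothetical and are not part of the test itself. The result is only an approximation, because a per-interval uptime sample may understate the time a server was actually up within an interval in which it rebooted.

```python
from typing import Sequence


def availability_percent(uptime_secs_per_period: Sequence[float],
                         measurement_period_secs: float) -> float:
    """Approximate percentage of time a server was up.

    Each sample is the uptime (in seconds) recorded for one measurement
    period. Hypothetical helper for illustration only.
    """
    total_secs = len(uptime_secs_per_period) * measurement_period_secs
    if total_secs <= 0:
        raise ValueError("need at least one sample and a positive period")
    return 100.0 * sum(uptime_secs_per_period) / total_secs


def meets_target(availability: float, target_percent: float) -> bool:
    """Return True if the availability reaches the service level target."""
    return availability >= target_percent


# Four 300-second periods; the server rebooted during the third one.
samples = [300, 300, 120, 300]
availability = availability_percent(samples, 300)
print(availability)                      # 85.0
print(meets_target(availability, 99.9))  # False
```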
In some environments, administrators schedule periodic reboots of their servers. If a specific server has been up for an unusually long time, an administrator can tell that the scheduled reboot task is not working on that server.
The Oracle RAC Uptime Test monitors the uptime of every node in an Oracle cluster.
Target of the test : Oracle Cluster
Agent deploying the test : An internal agent
Outputs of the test : One set of results for each node in the Oracle cluster being monitored
| Measurement | Description | Measurement Unit | Interpretation |
|---|---|---|---|
| Has Oracle server been restarted? | Indicates whether this node has been rebooted during the last measurement period or not. | Boolean | If this measure shows 1, it means that the node was rebooted during the last measurement period. By checking the time periods when this metric changes from 0 to 1, an administrator can determine the times when this node was rebooted. The detailed diagnosis of this measure, if enabled, indicates the date/time at which the node was shut down, the date on which it was restarted, the duration of the shutdown, and whether the node was shut down as part of a maintenance routine. |
| Uptime since the last measurement | Indicates the time period that this node has been up since the last time this test ran. | Secs | If the node has not been rebooted during the last measurement period and the agent has been running continuously, this value will be equal to the measurement period. If the node was rebooted during the last measurement period, this value will be less than the measurement period of the test. For example, if the measurement period is 300 secs and the node was rebooted 120 secs ago, this metric will report a value of 120 secs. The accuracy of this metric depends on the measurement period: the smaller the measurement period, the greater the accuracy. |
| Uptime | Indicates the total time that the node has been up since its last reboot. | Mins | Administrators may wish to be alerted if a node has been running without a reboot for a very long period. Setting a threshold for this metric allows administrators to detect such conditions. |
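
The relationship between the three measures can be sketched in Python as follows. This is a minimal illustration, not the agent's actual implementation: it assumes that the total uptime of the node (in seconds) is available at each run, that the agent has been running continuously, and that runs are spaced exactly one measurement period apart. The names `UptimeMeasures`, `derive_measures`, and `reboot_overdue` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UptimeMeasures:
    restarted: int                 # 1 if a reboot was detected in the last period, else 0
    uptime_since_last_secs: float  # "Uptime since the last measurement" (Secs)
    uptime_mins: float             # "Uptime" (Mins)


def derive_measures(current_uptime_secs: float,
                    measurement_period_secs: float) -> UptimeMeasures:
    """Derive the three measures from the node's total uptime at this run."""
    # Assumption: a node that has been up for less than one measurement
    # period must have rebooted since the previous run.
    restarted = 1 if current_uptime_secs < measurement_period_secs else 0
    # Without a reboot, the node was up for the whole period; after a
    # reboot, only the time since the reboot counts.
    uptime_since_last = min(current_uptime_secs, measurement_period_secs)
    return UptimeMeasures(restarted, uptime_since_last, current_uptime_secs / 60.0)


def reboot_overdue(uptime_mins: float, max_uptime_mins: float) -> bool:
    """Hypothetical threshold check: has the node run too long without a reboot?"""
    return uptime_mins > max_uptime_mins


# The example from the table: 300-second period, node rebooted 120 seconds ago.
print(derive_measures(120, 300))
# A node that has been up for a full day, checked against a 12-hour threshold.
day = derive_measures(86400, 300)
print(day, reboot_overdue(day.uptime_mins, 12 * 60))
```

With a 300-second measurement period and a reboot 120 seconds before the run, the sketch reports a restart flag of 1, 120 seconds of uptime since the last measurement, and a total uptime of 2 minutes, which matches the example in the table.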