Failover Cluster Disks Test

A cluster resource is any physical or logical component that has the following characteristics:

Can be brought online and taken offline.
Can be managed in a server cluster.
Can be hosted (owned) by only one node at a time.

One of the standard cluster resource types is the Physical Disk resource type. You use the Physical Disk resource type to manage disks that are on a cluster storage device. At any point in time, each cluster disk is owned by only a single node in the cluster. Moreover, when configuring a service or application for a cluster, you can select the cluster disk that the service/application should use.

If a cluster disk fails or remains offline for a long time, the services/applications that rely on that disk may stop functioning properly. Likewise, if a cluster disk suddenly runs short of space, the associated services/applications will be affected. To protect these critical services/applications from failure and to define robust fail-over policies for cluster disk resources, administrators have to continuously monitor the state and usage of each cluster disk resource. This can be achieved using the Failover Cluster Disks test. This test auto-discovers the cluster disks and tracks the state and usage of each disk, so that administrators are proactively alerted to abnormal states and excessive space usage on any disk.

Target of the test : A node in a Windows cluster

Agent deploying the test : An internal agent

Outputs of the test : One set of results for each cluster disk associated with every cluster created

Parameter	Description
Test Period	How often should the test be executed.
Host	The host for which the test is to be configured.
Port	The port at which the specified host listens. By default, this is Null.
DD frequency	Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency, if you so desire. Also, if you intend to disable the detailed diagnosis capability for this test, you can do so by specifying none against DD Frequency.
Detailed Diagnosis	To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, click on the Off option. The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled: (1) the eG manager license should allow the detailed diagnosis capability; (2) both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
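As an illustration of the DD frequency parameter described above, the following minimal Python sketch parses a value such as 1:1 or none. The function name parse_dd_frequency is hypothetical and is not part of eG Enterprise. The sketch also assumes that the first number is the frequency for normal runs and the second is the frequency for runs in which a problem is detected.

```python
from typing import Optional, Tuple


def parse_dd_frequency(value: str) -> Optional[Tuple[int, int]]:
    """Parse a DD frequency string such as "1:1".

    Returns None when detailed diagnosis is disabled ("none"); otherwise
    returns a (normal, abnormal) tuple. The order of the two numbers is an
    assumption made for this sketch.
    """
    text = value.strip().lower()
    if text == "none":
        return None
    normal_text, separator, abnormal_text = text.partition(":")
    if not separator:
        raise ValueError(f"expected 'normal:abnormal' or 'none', got {value!r}")
    return int(normal_text), int(abnormal_text)
```

For example, parse_dd_frequency("1:1") returns (1, 1), and parse_dd_frequency("none") returns None.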

Measurements made by the test

Measurement	Description	Measurement Unit	Interpretation

Cluster disk status

Indicates the current state of this cluster disk.

The values that this measure can report and the states they indicate have been listed in the table below:

State	Numeric Value
Online	100
Online pending	90
Inherited	80
Initializing	70
Pending	60
Offline Pending	50
Unknown	40
Offline	20
Failed	0
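When processing the values of this measure in a script, the state table above can be expressed as a simple lookup. The following Python sketch is only illustrative; the names CLUSTER_DISK_STATES and describe_state are hypothetical and do not belong to any eG Enterprise API.

```python
# Numeric values reported by the Cluster disk status measure,
# mapped to the states listed in the table above.
CLUSTER_DISK_STATES = {
    100: "Online",
    90: "Online pending",
    80: "Inherited",
    70: "Initializing",
    60: "Pending",
    50: "Offline Pending",
    40: "Unknown",
    20: "Offline",
    0: "Failed",
}


def describe_state(value: int) -> str:
    """Return the state name for a reported value, or a fallback label."""
    return CLUSTER_DISK_STATES.get(value, f"Unrecognized value ({value})")
```

For example, describe_state(50) returns "Offline Pending".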

If the cluster service detects that a cluster disk is not operational, it attempts to restart that cluster disk. You can specify the number of times the cluster service can attempt to restart a resource in a given time interval. If the cluster service exceeds the maximum number of restart attempts within the specified time period, and the disk is still not operational, the cluster service considers the disk to have failed. Typically, a failed disk will adversely impact the availability and performance of the services/applications to which that disk has been assigned.
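The restart rule described above can be modeled roughly as a count of attempts within a sliding time window. The Python sketch below is a simplified illustration of that idea, not the cluster service's actual implementation; the class RestartTracker and its parameters max_restarts and period_seconds are hypothetical names.

```python
from collections import deque


class RestartTracker:
    """Simplified model: too many restart attempts in a period means failure."""

    def __init__(self, max_restarts: int, period_seconds: float):
        self.max_restarts = max_restarts
        self.period_seconds = period_seconds
        self._attempts = deque()  # timestamps of recent restart attempts

    def record_attempt(self, now: float) -> bool:
        """Record a restart attempt at time `now` (in seconds).

        Returns True when the number of attempts inside the period exceeds
        max_restarts, that is, when the resource would be treated as failed.
        """
        # Forget attempts that fall outside the configured period.
        while self._attempts and now - self._attempts[0] >= self.period_seconds:
            self._attempts.popleft()
        self._attempts.append(now)
        return len(self._attempts) > self.max_restarts
```

With max_restarts set to 2 and period_seconds set to 60, a third attempt within the same 60 seconds returns True, whereas attempts spaced more than 60 seconds apart never do.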

To ensure high availability of services/applications, you can add the cluster disk and the services/applications that depend on that disk to a single cluster group and configure a fail-over policy for that group. Then, you can configure the failure of the cluster disk to trigger a group fail-over, so that the entire group is failed over to another node in the cluster.

The detailed diagnosis of this measure, if enabled, will indicate the path of the cluster disk, which node currently owns the cluster disk, the shared volume, and the owner group.

Total capacity of cluster disk

Indicates the total capacity of this cluster disk.

Space used in cluster disk

Indicates the space in this cluster disk that is in use currently.

Ideally, the value of this measure should be low. A high value is indicative of excessive space usage by a cluster disk.

Free space in cluster disk

Indicates the amount of space in this cluster disk that is currently unused.

A high value is desired for this measure.

Percentage of cluster disk space used

Indicates the percentage of the total capacity of this cluster disk that is utilized.

Percent

A value close to 100% is indicative of abnormal space usage. Compare the value of this measure across cluster disks to know which disk is using space excessively. Before assigning storage to a cluster service/application, you may want to check this comparison to figure out which cluster disks have enough space to accommodate more services/applications.
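The comparison described above can be sketched in Python as follows. The sketch assumes that the percentage is computed as used space divided by total capacity; the function names and the sample disk names and sizes are hypothetical.

```python
def percent_used(total: float, used: float) -> float:
    """Percentage of total capacity in use (same unit for both inputs)."""
    if total <= 0:
        raise ValueError("total capacity must be positive")
    return used * 100.0 / total


def rank_by_usage(disks: dict) -> list:
    """Sort disks by percentage used, highest first.

    `disks` maps a disk name to a (total, used) pair.
    """
    usage = {name: percent_used(total, used)
             for name, (total, used) in disks.items()}
    return sorted(usage.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample data: (total capacity, space used) per cluster disk.
sample = {"Cluster Disk 1": (100, 90), "Cluster Disk 2": (200, 50)}
ranking = rank_by_usage(sample)
```

In this sample, Cluster Disk 1 is ranked first at 90% used, which suggests that Cluster Disk 2 (25% used) is the better candidate for additional services/applications.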

Using the detailed diagnosis of the Cluster disk status measure, you can determine the path of the cluster disk, which node currently owns the cluster disk, the shared volume, and the owner group.

Figure 1 : The detailed diagnosis of the Cluster disk status measure