Disk Mirrors Test

Disk mirroring is a technique that uses data redundancy (two complete copies of all data stored on two separate disks) to protect against loss of data due to disk failure. One logical volume is duplicated on two separate disks (primary and secondary configuration). Whenever the operating system needs to write to a mirrored volume, both disks are updated. The disks are maintained at all times with exactly the same information. When the operating system needs to read from the mirrored volume, the operating system reads from whichever disk is more readily accessible at the moment, which can result in enhanced performance for read operations.

If a single disk fails, the data is still available from the surviving disk; if both disks fail, the data is lost. To prevent this, administrators need to keep track of the status of every disk mirror, rapidly identify failed or unavailable disks, and ensure that they are replaced quickly. This can be achieved using the Disk Mirrors test. This test monitors and reports the current state of each disk mirror, and thus points to disks that have failed, are unavailable, or have errors. In the process, the test warns administrators of any data loss that may occur owing to disk unavailability or disk errors.

This test is disabled by default. To enable the test, go to the enable/disable tests page using the menu sequence: Agents -> Tests -> Enable/Disable. Pick the desired Component type, set Performance as the Test type, choose the test from the DISABLED TESTS list, and click the << button to move the test to the ENABLED TESTS list. Finally, click the Update button.

Target of the test : A Solaris host

Agent deploying the test : An internal agent

Outputs of the test : One set of results for each disk mirror on the Solaris host

Configurable parameters for the test
Parameter Description

Test Period

How often the test should be executed.

Host

The host for which the test is to be configured.

Port

The port at which the specified host listens.

Use Sudo

By default, the Use Sudo parameter is set to No. This indicates that, by default, this test reports the health of every RAID volume by executing the raidctl -l command directly. However, in some highly secure environments, the eG agent install user may not have the permissions to execute this command directly. In such cases, do the following:

  • Edit the sudoers file on the target host and append an entry of the following format to it:

<eG_agent_install_user> ALL=(ALL) NOPASSWD: <Command>

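For instance, if the eG agent install user is eguser, then the entry in the sudoers file should be as shown below. Note that sudoers normally expects a command to be given by its full path, so if the short form is rejected, use the full path to raidctl on your host: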

eguser ALL=(ALL) NOPASSWD: raidctl -l

  • Save the file.

  • Then, when configuring the test using the eG admin interface, set the Use Sudo parameter to Yes. This will enable the eG agent to execute the sudo raidctl -l command and retrieve the desired metrics.

Sudo Path

This parameter is relevant only when the Use Sudo parameter is set to Yes. By default, Sudo Path is set to none. This implies that the sudo command is in its default location, i.e., in the /usr/bin or /usr/sbin folder of the target Solaris host. In this case, once the Use Sudo flag is set to Yes, the eG agent automatically runs the raidctl -l command with sudo from its default location. However, if the sudo command is available in a different location in your environment, you will have to explicitly specify the full path to the sudo command in the Sudo Path text box to enable the eG agent to run the sudo command.
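The interaction between the Use Sudo and Sudo Path parameters can be illustrated with a short Python sketch. This is not the eG agent's actual implementation; the function name build_command is hypothetical, and the sketch assumes that Sudo Path, when set, holds the full path to the sudo binary itself and that the value none means "rely on the default location".

```python
from typing import List

# Command the test runs, as stated in the parameter descriptions.
RAIDCTL_ARGS = ["raidctl", "-l"]


def build_command(use_sudo: bool, sudo_path: str = "none") -> List[str]:
    """Return the argument list for the given Use Sudo / Sudo Path settings.

    Hypothetical helper for illustration only.
    """
    if not use_sudo:
        # Use Sudo = No: run raidctl directly; Sudo Path is ignored.
        return list(RAIDCTL_ARGS)

    cleaned = sudo_path.strip()
    if cleaned.lower() == "none" or not cleaned:
        # Sudo Path = none: assume sudo is found in its default location.
        sudo = "sudo"
    else:
        # Assumption: Sudo Path is the full path to the sudo binary.
        sudo = cleaned
    return [sudo] + RAIDCTL_ARGS


if __name__ == "__main__":
    print(build_command(False))
    print(build_command(True))
    print(build_command(True, "/opt/custom/bin/sudo"))  # example path, not from the product
```

The sketch makes one point explicit: Sudo Path has no effect unless Use Sudo is set to Yes.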

Measurements made by the test
Measurement Description Measurement Unit Interpretation

State:

Indicates the current state of this disk mirror.

 

The values that this measure can take and their corresponding numeric values are as follows:

Measure Value Numeric Value
Okay 0
Needs Maintenance 1
Last Erred 2
Unavailable 3

If a disk fails, it typically switches to the Unavailable state. When a disk experiences errors, Solaris puts that disk in the Maintenance state. No further reads or writes are performed to a disk in the Maintenance state. Sometimes a disk goes into a Last Erred state. For a RAID-1 volume, this usually occurs with a one-sided mirror. The disk experiences errors; however, there are no redundant components to read from. For a RAID-5 volume, this occurs after one disk goes into Maintenance state, and another disk fails. The second disk to fail goes into the Last Erred state.

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the current state of the disk mirror. In the graph of this measure, however, states are represented using only their numeric equivalents.
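When reading the graph of this measure, or processing its values in a script, the numeric equivalents need to be translated back into state names. A minimal Python sketch of that lookup follows. The values come from the table above; the names STATE_TO_NUMERIC, describe_state, and needs_attention are hypothetical and are not part of the product.

```python
# Measure Value -> Numeric Value, as listed in the table above.
STATE_TO_NUMERIC = {
    "Okay": 0,
    "Needs Maintenance": 1,
    "Last Erred": 2,
    "Unavailable": 3,
}

# Reverse lookup for interpreting values plotted in the graph.
NUMERIC_TO_STATE = {number: name for name, number in STATE_TO_NUMERIC.items()}


def describe_state(value: int) -> str:
    """Return the state name for a numeric value, or 'Unknown' if unmapped."""
    return NUMERIC_TO_STATE.get(value, "Unknown")


def needs_attention(value: int) -> bool:
    """Treat anything other than Okay as worth investigating (an assumption)."""
    return value != STATE_TO_NUMERIC["Okay"]


if __name__ == "__main__":
    for value in (0, 1, 2, 3):
        print(value, describe_state(value), needs_attention(value))
```

Treating every non-zero value as needing attention is a simplification chosen for this sketch; alert thresholds in a real deployment may differ.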

Besides the above, hardware monitoring expertise can also be optionally built into the Operating System layer of a Solaris host. Please refer to the Hardware Monitoring document for further details.