VNX Disks Test

This test monitors the current state, overall health, and I/O activity levels of each disk in the EMC VNX Unified storage system. Using this test, administrators can identify not only failed disks, but also disks that are error-prone and may fail at any time, so that they can take steps to avert the failure. The test also points administrators to disks that are busy processing I/O requests almost all the time. This sheds light on irregularities in the distribution of I/O load across disks and prompts administrators to fine-tune the load-balancing algorithm, so as to prevent delays in data access. In addition, the test proactively alerts administrators to probable space contention on disks and to excessive bandwidth consumption by disks, so that they can take pre-emptive action.

Target of the test : An EMC VNX Unified Storage system

Agent deploying the test : A remote agent

Outputs of the test : One set of results for each disk on the EMC VNX Unified Storage system.

Configurable parameters for the test
Parameter Description

Test Period

How often the test should be executed.

Host

The IP address of the storage device for which this test is to be configured.

Port

The port number at which the storage device listens. The default is NULL.

Controller IP

Specify the IP address of the storage controller on the block-only storage system in the Controller IP text box. By default, this text box displays the IP address of the Host.

NaviseccliPath

The eG agent uses the command-line utility, NaviSecCli.exe, which is part of the NaviSphere Management Suite, to communicate with and monitor the storage device. To enable the eG agent to invoke the CLI, configure the full path to the CLI in the NaviseccliPath text box.

User Name and Password

Provide the credentials of a user with Administrator rights to the storage controller in the User Name and Password text boxes.

Confirm Password

Confirm the password by retyping it here.

User Scope

To use the NaviSphere CLI, the eG agent needs to be configured with a User Scope. The scope defines the extent of access of the user account (User Name and Password) that you have configured for this test. Set User Scope to Local if that user account applies only to the monitored storage system. Set User Scope to Global if the user account applies to all the storage systems within a domain.
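To check the credentials and scope outside eG, it can help to build the equivalent CLI command line by hand. The sketch below does that from the parameters above. It assumes that `-h`, `-user`, `-password`, and `-scope` are the Navisphere Secure CLI's connection switches and that `-scope 0` means global and `-scope 1` means local, as described in EMC's CLI reference. The `getdisk` command is used only as an example, because the exact commands the eG agent issues are not documented here. The function name and sample values are hypothetical.

```python
from typing import List

# Assumed mapping of the User Scope parameter to naviseccli's -scope switch
# (0 = global, 1 = local); confirm against the CLI reference for your release.
SCOPE_CODES = {"Global": "0", "Local": "1"}


def build_getdisk_command(cli_path: str, controller_ip: str, user: str,
                          password: str, user_scope: str) -> List[str]:
    """Return the argument list for a naviseccli 'getdisk' call without running it."""
    if user_scope not in SCOPE_CODES:
        raise ValueError(f"User Scope must be 'Local' or 'Global', not {user_scope!r}")
    return [
        cli_path,                      # value of NaviseccliPath
        "-h", controller_ip,           # value of Controller IP
        "-user", user,                 # value of User Name
        "-password", password,         # value of Password
        "-scope", SCOPE_CODES[user_scope],
        "getdisk",                     # example command: per-disk information
    ]


if __name__ == "__main__":
    # Hypothetical values; pass the list to subprocess.run() to execute it.
    argv = build_getdisk_command("naviseccli", "192.0.2.10", "admin", "secret", "Local")
    print(argv[:3] + argv[-3:])  # avoid echoing the password
```

Returning an argument list rather than a single string lets `subprocess.run()` pass the password without shell quoting issues.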

Timeout

Indicate the duration (in seconds) for which this test should wait for a response from the storage device. By default, this is set to 120 seconds. The Timeout value must be between 3 and 600 seconds.
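The Timeout rules can be expressed as a small validation helper. This is an illustrative sketch with a hypothetical function name, not how eG itself validates the field. It encodes only the default of 120 seconds and the 3 to 600 second range stated above.

```python
from typing import Optional

DEFAULT_TIMEOUT_SECS = 120  # default stated for this test
MIN_TIMEOUT_SECS = 3
MAX_TIMEOUT_SECS = 600


def validate_timeout(value: Optional[int] = None) -> int:
    """Return a usable Timeout value, applying the default and the 3-600 s range."""
    if value is None:
        return DEFAULT_TIMEOUT_SECS
    if not MIN_TIMEOUT_SECS <= value <= MAX_TIMEOUT_SECS:
        raise ValueError(
            f"Timeout must be between {MIN_TIMEOUT_SECS} and "
            f"{MAX_TIMEOUT_SECS} seconds, got {value}")
    return value
```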

Ignore Disabled Disks

If you do not wish to monitor the disks that are disabled in the target environment, set the Ignore Disabled Disks flag to Yes. By default, this flag is set to No.

Exclude Disks

Specify a comma-separated list of disks that you wish to exclude from the scope of monitoring in the Exclude Disks text box. By default, this text box is empty, so no disks are excluded.
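Ignore Disabled Disks and Exclude Disks together decide which disks produce results. The sketch below shows one way to combine them. The `Disk` record, its `enabled` flag, and the disk-name format are hypothetical, because this document does not say how disks are named or how a disk is judged to be disabled.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class Disk:
    name: str       # hypothetical identifier, e.g. "Bus 0 Enclosure 0 Disk 3"
    enabled: bool   # hypothetical flag; how "disabled" is decided is not specified


def parse_exclude_disks(text: str) -> Set[str]:
    """Split the comma-separated Exclude Disks value, ignoring blanks and extra spaces."""
    return {item.strip() for item in text.split(",") if item.strip()}


def disks_to_monitor(disks: Iterable[Disk], ignore_disabled: bool,
                     exclude_disks: str) -> List[str]:
    """Apply Ignore Disabled Disks and Exclude Disks to the discovered disks."""
    excluded = parse_exclude_disks(exclude_disks)
    return [
        d.name for d in disks
        if d.name not in excluded and (d.enabled or not ignore_disabled)
    ]
```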

DD Frequency

Refers to the frequency with which detailed diagnosis measures are to be generated for this test. For instance, if you set it to 1:1, detailed measures will be generated every time this test runs, and also every time the test detects a problem. By default, the DD Frequency is set to 4:1.

Measurements made by the test
Measurement Description Measurement Unit Interpretation

Busy ticks

Indicates the percentage of time for which this disk was busy.

Percent

A value close to 100% is a cause for concern, as it indicates a potential I/O overload on the disk. If the problem persists, it is a sign that serious load-balancing irregularities exist and need to be looked into.
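The measure name suggests that the percentage is derived from busy and idle tick counters reported for each disk. Assuming cumulative counters sampled at the start and end of each polling interval (the document does not confirm this), the busy percentage is the share of busy ticks among all ticks elapsed. The function name and the 90% alert threshold below are hypothetical.

```python
def busy_percent(prev_busy: int, prev_idle: int,
                 curr_busy: int, curr_idle: int) -> float:
    """Percentage of the polling interval during which the disk was busy.

    Assumes cumulative busy/idle tick counters sampled at both ends of the interval.
    """
    busy = curr_busy - prev_busy
    idle = curr_idle - prev_idle
    if busy < 0 or idle < 0:
        raise ValueError("tick counters went backwards (reset?); skip this interval")
    total = busy + idle
    if total == 0:
        return 0.0
    return 100.0 * busy / total


HIGH_BUSY_PERCENT = 90.0  # hypothetical alert threshold

if __name__ == "__main__":
    pct = busy_percent(1_000, 9_000, 1_950, 9_050)
    print(pct, pct >= HIGH_BUSY_PERCENT)  # 95.0 True
```

Using deltas rather than raw counter values matters: the raw ratio would describe the disk's whole uptime, not the current interval.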

Total capacity

Indicates the total size of this disk.

GB

 

Data reads

Indicates the rate at which data is read from this disk.

MB/Sec

Compare the values of the Data reads and Data writes measures across disks to identify the disk that is slowest at servicing read requests and write requests, respectively.

Data writes

Indicates the rate at which data is written to this disk.

MB/Sec
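Data reads and Data writes are rates, so if the underlying statistics are cumulative byte counters (an assumption, not stated here), each rate comes from two samples divided by the polling interval. The sketch also assumes that 1 MB is 2^20 bytes. The function names are hypothetical.

```python
from typing import Dict

BYTES_PER_MB = 1024 * 1024  # assumption: MB means 2**20 bytes


def mb_per_sec(prev_bytes: int, curr_bytes: int, interval_secs: float) -> float:
    """Convert two samples of a cumulative byte counter into MB/Sec."""
    if interval_secs <= 0:
        raise ValueError("interval must be positive")
    delta = curr_bytes - prev_bytes
    if delta < 0:
        raise ValueError("counter went backwards (reset?); skip this interval")
    return delta / interval_secs / BYTES_PER_MB


def slowest_disk(rates: Dict[str, float]) -> str:
    """Name of the disk with the lowest rate in a {disk: MB/Sec} mapping."""
    return min(rates, key=rates.get)
```

A low rate can also simply reflect low demand, so it is worth checking the Read requests and Write requests measures before concluding that a disk is slow.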

Hard read errors

Indicates the number of hard read errors in this disk.

Number

A consistent increase in the values of the Hard read errors, Hard write errors, Soft read errors, and Soft write errors measures indicates that the disk is nearing the end of its life or may fail. By comparing the values of these measures across disks, you can identify the disks that are likely to fail.

Hard write errors

Indicates the number of hard write errors in this disk.

Number

Soft read errors

Indicates the number of soft read errors in this disk.

Number

Soft write errors

Indicates the number of soft write errors in this disk.

Number
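Comparing the error measures across disks amounts to ranking disks by how many new errors they accumulated between polls. The sketch below does that. The per-disk dictionary layout and field names are hypothetical, and treating a drop in a counter (for example after a reset) as zero new errors is a choice made for this example.

```python
from typing import Dict, List

ERROR_FIELDS = ("hard_read", "hard_write", "soft_read", "soft_write")
Counts = Dict[str, int]


def error_growth(prev: Dict[str, Counts], curr: Dict[str, Counts]) -> Dict[str, int]:
    """New errors per disk between two polls, summed over all four error types.

    Negative deltas (e.g. after a counter reset) are treated as zero.
    """
    growth = {}
    for disk, counts in curr.items():
        before = prev.get(disk, {})
        growth[disk] = sum(
            max(counts.get(f, 0) - before.get(f, 0), 0) for f in ERROR_FIELDS)
    return growth


def disks_at_risk(growth: Dict[str, int]) -> List[str]:
    """Disks whose error counts grew, worst first."""
    return sorted((d for d, g in growth.items() if g > 0),
                  key=lambda d: growth[d], reverse=True)
```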

Read requests

Indicates the rate at which read requests were made to this disk.

Reqs/sec

Compare the values of the Read requests and Write requests measures across disks to isolate overloaded disks. This comparison also reveals irregularities in load balancing across disks.

Write requests

Indicates the rate at which write requests were made to this disk.

Reqs/sec
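One simple way to isolate overloaded disks from these two measures is to add the read and write request rates per disk and flag disks well above the average. The sketch below uses a hypothetical factor of 1.5 times the mean. It is only an illustration of the comparison, not the method eG uses.

```python
from statistics import mean
from typing import Dict, List


def overloaded_disks(read_rates: Dict[str, float], write_rates: Dict[str, float],
                     factor: float = 1.5) -> List[str]:
    """Disks whose combined Reqs/sec exceed `factor` times the average across disks."""
    totals = {d: read_rates.get(d, 0.0) + write_rates.get(d, 0.0)
              for d in set(read_rates) | set(write_rates)}
    if not totals:
        return []
    avg = mean(totals.values())
    return sorted(d for d, t in totals.items() if t > factor * avg)
```

For example, with combined rates of 150, 15, and 15 Reqs/sec the average is 60, so only the first disk exceeds the 90 Reqs/sec cut-off.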

LUNs

Indicates the number of LUNs that are sharing this disk.

Number

Use the detailed diagnosis of this measure to know which LUNs are sharing this disk.

Read retries

Indicates the number of times read requests to this disk were retried.

Number

A low value is desired for this measure.

Remapped sectors

Indicates the number of sectors on this disk that were remapped to new locations on the disk due to read/write errors.

Number

A low value is desired for this measure.

Request service time

Indicates the time taken by this disk to service requests.

Secs

A high value is typically indicative of an I/O processing bottleneck in the disk. Compare the value of this measure across disks to know which disks are experiencing significant latencies.

State

Indicates the current state of the disk.

 

The values that this measure can report and their corresponding numeric values are indicated in the table below:

Measure Value Numeric Value
Failed 0
Off 1
Removed 2
Binding 3
Empty 4
Enabled 5
Expanding 6
Unbound 7
Powering up 8
Ready 9
Reduced power, Transitioning 10
Hot spare ready 11
Unknown 12
Formatting 13
Equalizing 14
Rebuilding 15
Full power 16
Low power 17
Unformatted 18
Unsupported 19

Note:

By default, this measure reports one of the Measure Values listed above to indicate the state of the disk. In the graph of this measure, however, the states are represented by their numeric equivalents only (0 to 19).
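Because the graph plots numeric values, a lookup table helps when reading exported graph data. The sketch below transcribes the State table above into Python. The names are hypothetical, and falling back to Unknown for an unrecognized string is a choice made for this example, not documented eG behavior.

```python
# Measure Value -> Numeric Value, as listed in the State table.
DISK_STATE_CODES = {
    "Failed": 0,
    "Off": 1,
    "Removed": 2,
    "Binding": 3,
    "Empty": 4,
    "Enabled": 5,
    "Expanding": 6,
    "Unbound": 7,
    "Powering up": 8,
    "Ready": 9,
    "Reduced power, Transitioning": 10,
    "Hot spare ready": 11,
    "Unknown": 12,
    "Formatting": 13,
    "Equalizing": 14,
    "Rebuilding": 15,
    "Full power": 16,
    "Low power": 17,
    "Unformatted": 18,
    "Unsupported": 19,
}

# Reverse lookup, e.g. for labelling points on the State graph.
CODE_TO_STATE = {code: state for state, code in DISK_STATE_CODES.items()}


def state_to_code(state: str) -> int:
    """Numeric value for a reported State; unrecognized strings map to 'Unknown'."""
    return DISK_STATE_CODES.get(state, DISK_STATE_CODES["Unknown"])
```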

Total bandwidth

Indicates the total rate at which data is read from and written to this disk; that is, the sum of the Data reads and Data writes measures.

MB/Sec

Compare the value of this measure across disks to identify the disk that is consuming the maximum bandwidth.
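Since Total bandwidth is defined as the sum of Data reads and Data writes, the comparison can be sketched in a few lines. The function names are hypothetical, and disks missing from one of the inputs are treated as having a rate of 0 MB/Sec.

```python
from typing import Dict, Tuple


def total_bandwidth(reads: Dict[str, float],
                    writes: Dict[str, float]) -> Dict[str, float]:
    """Total bandwidth per disk (MB/Sec) = Data reads + Data writes."""
    return {d: reads.get(d, 0.0) + writes.get(d, 0.0)
            for d in set(reads) | set(writes)}


def top_bandwidth_disk(reads: Dict[str, float],
                       writes: Dict[str, float]) -> Tuple[str, float]:
    """Disk consuming the most bandwidth, with its total (raises on empty input)."""
    totals = total_bandwidth(reads, writes)
    disk = max(totals, key=totals.get)
    return disk, totals[disk]
```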

Usage

Indicates the percentage of space in this disk that is currently utilized.

Percent

Ideally, the value of this measure should be low. A consistent increase in this value could indicate a gradual but steady erosion of space on the disk. A value close to 100% indicates that the disk is about to run out of space.
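The two warning signs described above, a consistent increase and a value close to 100%, can be checked over a series of Usage samples. In this sketch, the 95% "near full" threshold, the three-sample minimum, and the function names are hypothetical choices.

```python
from typing import Sequence

NEAR_FULL_PERCENT = 95.0  # hypothetical threshold for "close to 100%"


def steadily_rising(samples: Sequence[float], min_points: int = 3) -> bool:
    """True if Usage never decreased across the samples and rose overall."""
    if len(samples) < min_points:
        return False
    non_decreasing = all(b >= a for a, b in zip(samples, samples[1:]))
    return non_decreasing and samples[-1] > samples[0]


def usage_alert(samples: Sequence[float]) -> str:
    """Classify a Usage history (percent, oldest first)."""
    if samples and samples[-1] >= NEAR_FULL_PERCENT:
        return "near full"
    if steadily_rising(samples):
        return "steady growth"
    return "ok"
```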

User capacity

Indicates the amount of space on this disk that is assigned to bound LUNs.

GB