Vnx LUNs Test

A logical unit number (LUN) is a unique identifier that designates an individual hard disk device, or a collection of such devices, so that it can be addressed by a protocol associated with a SCSI, iSCSI, Fibre Channel (FC), or similar interface. LUNs are central to the management of block storage arrays shared over a storage area network (SAN). LUN errors, poor LUN cache usage, and abnormal I/O activity on the LUNs can significantly degrade the performance of the block storage array if they are not promptly detected and resolved. It is therefore important to monitor LUN performance continuously. This can be achieved using the Vnx LUNs test. This test auto-discovers the LUNs in the VNX for Block system, reports the current state of each LUN, captures LUN errors, and measures the level of I/O activity on every LUN, so that administrators are notified of LUN-related problems well before they impact storage system performance.

Target of the test : An EMC VNX Unified Storage system

Agent deploying the test : A remote agent

Outputs of the test : One set of results for each LUN in the EMC VNX Unified Storage system.

Configurable parameters for the test
Parameter Description

Test Period

How often the test should be executed.

Host

The IP address of the storage device for which this test is to be configured.

Port

The port number at which the storage device listens. The default is NULL.

Controller IP

Specify the IP address of the storage controller on the block-only storage system in the Controller IP text box. By default, the IP address of the Host will be assigned in the Controller IP text box.

NaviseccliPath

The eG agent uses the command-line utility, NaviSecCli.exe, which is part of the NaviSphere Management Suite, to communicate with and monitor the storage device. To enable the eG agent to invoke the CLI, configure the full path to the CLI in the NaviseccliPath text box.

User Name and Password

Provide the credentials of a user with Administrator rights to the storage controller in the User Name and Password text boxes.

Confirm Password

Confirm the password by retyping it here.

User Scope

To use the NaviSphere CLI, the eG agent needs to be configured with a User Scope. The scope defines the access radius of the user account (User Name and Password) that you have configured for this test. Set User Scope to Local if the user account applies to the monitored storage system only. Set User Scope to Global if the user account applies to all the storage systems within a domain.

Timeout

Indicate the duration (in seconds) for which this test should wait for a response from the storage device. By default, this is set to 120 seconds. The Timeout value must be between 3 and 600 seconds.

Ignore disabled LUNs

If you do not wish to monitor the LUNs that are disabled in the target environment, set the Ignore disabled LUNs flag to Yes. By default, this flag is set to No.

Exclude LUNs

Specify a comma-separated list of LUNs that you wish to exclude from the scope of monitoring in the Exclude LUNs text box. By default, none is displayed here.

DD Frequency

Refers to the frequency with which detailed diagnosis measures are to be generated for this test. For instance, if you set this to 1:1, detailed measures will be generated every time this test runs, and also every time the test detects a problem. By default, the DD Frequency is set to 4:1.
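
The parameters above can be illustrated with a minimal Python sketch of how a set of values might be validated and turned into a CLI invocation. This is not the eG agent's actual implementation. The class LunTestConfig and the function build_getlun_command are hypothetical names, the IP address and path in the usage note are placeholders, and the option names (-h, -user, -password, -scope, -t) and the getlun command reflect a common understanding of the Navisphere Secure CLI; verify them against the CLI reference for your version before relying on them.

```python
from dataclasses import dataclass
from typing import List, Set

# Assumption: the CLI expects numeric scope codes, 0 for Global and 1 for Local.
SCOPE_CODES = {"global": "0", "local": "1"}


@dataclass
class LunTestConfig:
    """Hypothetical container for the test parameters described above."""
    controller_ip: str
    navisec_cli_path: str
    user_name: str
    password: str
    user_scope: str = "Local"
    timeout: int = 120           # documented default, in seconds
    exclude_luns: str = "none"   # comma-separated list, or "none"

    def validate(self) -> None:
        # The documentation requires a Timeout between 3 and 600 seconds.
        if not 3 <= self.timeout <= 600:
            raise ValueError("Timeout must be between 3 and 600 seconds")
        if self.user_scope.lower() not in SCOPE_CODES:
            raise ValueError("User Scope must be Local or Global")

    def excluded(self) -> Set[str]:
        # Turn the comma-separated Exclude LUNs value into a set of names.
        if self.exclude_luns.strip().lower() in ("", "none"):
            return set()
        return {n.strip() for n in self.exclude_luns.split(",") if n.strip()}


def build_getlun_command(cfg: LunTestConfig) -> List[str]:
    """Build an argument list for a LUN query; it does not run the command."""
    cfg.validate()
    return [
        cfg.navisec_cli_path,
        "-h", cfg.controller_ip,
        "-user", cfg.user_name,
        "-password", cfg.password,
        "-scope", SCOPE_CODES[cfg.user_scope.lower()],
        "-t", str(cfg.timeout),
        "getlun",
    ]
```

For example, LunTestConfig("192.0.2.10", "/path/to/naviseccli", "admin", "secret", user_scope="Global") yields a command whose -scope value is 0, and setting timeout to 700 causes validate() to raise a ValueError.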

Measurements made by the test
Measurement Description Measurement Unit Interpretation

LUN binding completion

Indicates the percentage of the binding process that is complete for this LUN.

Percent

A bind is an information organization, data security, and data integrity feature of a storage system. Binding a LUN involves the preparation of allocated storage space. This preparation is particularly important when storage capacity is being reallocated for reuse. This reuse of storage includes erasing any previous data found on the hard drives, and the setting of parity and metadata for the storage.

LUNs are typically available for use immediately after they are bound. However, the bind is not strictly complete until after all the bound storage has been prepared and verified. Depending on the LUN size and verify priority, these two steps may take several hours. Using the value of this measure, you will be able to track the progress of the binding function, and will be able to gauge how much longer it will take for the binding to complete.

Data reads

Indicates the rate at which data was read from this LUN.

Blocks/Sec

Comparing the values of the Data reads and Data writes measures across LUNs will indicate which LUN is the busiest in terms of the rate at which data is read and written. It could also shed light on irregularities in load balancing across the LUNs.

Data writes

Indicates the rate at which data was written to this LUN.

Blocks/Sec

Total hard errors

Indicates the number of hard errors on this LUN.

Number

An increase in the value of this measure indicates that the disk is nearing the end of its life or is about to fail.

LUN size

Indicates the size of this LUN, in blocks.

Blocks

 

LUN capacity

Indicates the total capacity of this LUN.

GB

 

Average queue requests

Indicates the average number of requests to this LUN that are in queue.

Number

A very high value could indicate a processing bottleneck on the LUN. By comparing the value of this measure across LUNs, you can quickly identify which LUN has too many pending requests. That LUN is probably the one with the processing bottleneck.

Current read cache hits

Indicates the number of times read requests to this LUN were fulfilled by the read cache.

Number

A high value is desired for this measure.

Read cache misses

Indicates the number of times read requests to this LUN were not serviced by the read cache.

Number

A low value is desired for this measure.

Read hit ratio

Indicates the percentage of read requests to this LUN that were serviced by the cache.

Percent

Ideally, the value of this measure should be high. A low value indicates that many read requests are serviced by direct disk accesses, which is a more expensive operation in terms of processing overheads.

Read requests

Indicates the number of read requests made per second to this LUN.

Reqs/Sec

Comparing the values of the Read requests and Write requests measures across LUNs will indicate which LUN is overloaded with requests, and whether those requests are reads or writes. It could also shed light on irregularities in load balancing across the LUNs.

Write requests

Indicates the number of write requests made per second to this LUN.

Reqs/Sec

Rebuild process completion

Indicates the percentage of this LUN that has been rebuilt.

Percent

A rebuild replaces a failed hard disk within a RAID group with an operational disk. If one or more LUNs are bound to the RAID group with the failed disk, all the LUNs affected by the failure are rebuilt. If a drive in one of the RAID groups fails, a rebuild restores a LUN to its fully assigned number of hard drives using an available hot spare. LUNs are rebuilt one by one. Each LUN is rebuilt by its owning Storage Processor (SP).

Using the value of this measure, you will be able to track the progress of the rebuild, and will be able to gauge how much longer it will take for the rebuilding to complete.

Total soft errors

Indicates the total number of uncorrected read and write errors on this LUN.

Number

An increase in the value of this measure indicates that the disk is nearing the end of its life or is about to fail.

State

Indicates the current state of this LUN.

 

If the state reported by this measure is Bound, it indicates that the LUN is currently in a bound state. A bind creates LUNs on a RAID group. Binding a LUN involves the preparation of allocated storage space. This preparation is particularly important when storage capacity is being reallocated for reuse.

LUNs are bound after RAID groups are created. LUNs are available for use immediately after they are created, but the bind is not strictly complete until after all the bound storage has been prepared and verified.

During the preparation step, the storage allocated to the LUN is overwritten with binary zeroes. These zeroes erase any previous data from the storage and set up for the parity calculation. When zeroing is complete, parity and metadata are calculated for the LUN sectors.

If the state reported by this measure is Not bound, it indicates that the LUN is currently in an unbound state.

The numeric values that correspond to each of the states discussed above are as follows:

State        Numeric Value
Bound        1
Not bound    0

Note:

By default, this measure reports the above-mentioned states as its values. The graph of this measure, however, represents the LUN state using the numeric equivalents, 0 or 1.

Use the detailed diagnosis of this measure to view additional details of a LUN.

Total I/O

Indicates the rate of the I/O activity on this LUN.

Reqs/Sec

A consistent increase in the value of this measure for a LUN could hint at a potential overload condition.

Current write cache hits

Indicates the number of times write requests to this LUN were fulfilled by the write cache.

Number

A high value is desired for this measure.

Write hit ratio

Indicates the percentage of write requests to this LUN that were serviced by the cache.

Percent

Ideally, the value of this measure should be high. A low value indicates that data is often directly written to the disk, which is a more expensive operation in terms of processing overheads.
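
Several of the interpretations above involve simple arithmetic on the reported values. The following Python sketch illustrates that arithmetic under stated assumptions; it is not how the test itself computes its measures. All function names are hypothetical. It assumes that a hit ratio is hits divided by the sum of hits and misses, and that bind or rebuild progress advances at a roughly constant rate, which may not hold on a loaded array.

```python
from typing import Dict, Optional

# Numeric equivalents of the State measure, as listed in the table above.
STATE_VALUES = {"Bound": 1, "Not bound": 0}


def hit_ratio(hits: int, misses: int) -> Optional[float]:
    """Percentage of requests served from cache (assumed formula)."""
    total = hits + misses
    if total == 0:
        return None  # no requests in the interval, so the ratio is undefined
    return 100.0 * hits / total


def busiest_lun(rates: Dict[str, float]) -> Optional[str]:
    """Name of the LUN with the highest rate, for example Total I/O in Reqs/Sec."""
    if not rates:
        return None
    return max(rates, key=lambda name: rates[name])


def remaining_seconds(percent_done: float, elapsed_seconds: float) -> Optional[float]:
    """Linear estimate of the time left for a bind or rebuild to complete."""
    if not 0 < percent_done <= 100:
        return None  # no progress yet, or an invalid percentage
    return elapsed_seconds * (100.0 - percent_done) / percent_done
```

For example, hit_ratio(90, 10) gives 90.0, and a rebuild that reports 25 percent completion after 600 seconds would be estimated to need another 1800 seconds under the constant-rate assumption.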