EMC RAID LUNs Test

A logical unit number (LUN) is a unique identifier used to designate an individual hard disk device, or a collection of such devices, for addressing by a protocol associated with a SCSI, iSCSI, Fibre Channel (FC), or similar interface. LUNs are central to the management of storage arrays shared over a storage area network (SAN). If LUN errors, poor LUN cache usage, and abnormal I/O activity on the LUNs are not promptly detected and resolved, they can significantly degrade the performance of the storage array. It is therefore important to monitor LUN performance continuously, which can be achieved using the EMC RAID LUNs test. This test auto-discovers the LUNs in the storage system, reports the current state of each LUN, captures LUN errors, and measures the level of I/O activity on every LUN, so that administrators are notified of LUN-related problems well before they impact storage system performance.

This test is disabled by default. To enable the test, go to the enable / disable tests page using the menu sequence: Agents -> Tests -> Enable/Disable, pick EMC Clariion SAN as the desired Component type, set Performance as the Test type, choose the test from the DISABLED TESTS list, and click the < button to move the test to the ENABLED TESTS list. Finally, click the Update button.

Target of the test : An EMC CLARiiON storage device

Agent deploying the test : A remote agent

Outputs of the test : One set of results for each LUN on the storage system.

Configurable parameters for the test
Parameter Description

Test Period

How often the test should be executed.

Host

The IP address of the storage device for which this test is to be configured.

Port

The port number at which the storage device listens. The default is NULL.

User Name and Password

The SMI-S Provider is paired with the EMC CIM Object Manager Server to provide an SMI-compliant interface for CLARiiON arrays. Against the User Name and Password parameters, specify the credentials of a user who has been assigned Monitor access to the EMC CIM Object Manager Server paired with EMC CLARiiON’s SMI-S provider.

Confirm Password

Confirm the Password by retyping it here.

SSL

Set this flag to Yes, if the storage device being monitored is SSL-enabled.

IsEmbedded

By default, this flag is set to False for an EMC CLARiiON device. Do not change this default setting.

SerialNumber

If the SMI-S provider has been implemented as a proxy, then it can be configured to manage multiple storage devices. In that case, you have to explicitly specify which storage system you want the eG agent to monitor. Since each storage system is uniquely identified by a serial number, specify that serial number here. The serial number for an EMC CLARiiON device will be of the format FCNMM094900059.

NameSpace

Specify the NameSpace that uniquely identifies the profiles specific to the provider in use. For EMC CLARiiON, this parameter will be set to root/emc by default.
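As a rough illustration of how the parameters above fit together, the sketch below groups them into a small Python structure and derives the URL of the CIM Object Manager. The class name LunTestConfig and the method cimom_url are hypothetical and are not part of eG Enterprise. The fallback ports 5988 (HTTP) and 5989 (HTTPS) are the conventional CIM-XML ports and are an assumption here, since the Port parameter itself defaults to NULL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LunTestConfig:
    """Hypothetical container mirroring the test parameters described above."""
    host: str
    user: str
    password: str
    serial_number: str
    port: Optional[int] = None      # NULL by default, as in the parameter table
    ssl: bool = False
    is_embedded: bool = False       # documented default for EMC CLARiiON
    namespace: str = "root/emc"     # documented default for EMC CLARiiON

    def cimom_url(self) -> str:
        """Build the provider URL; the fallback ports are an assumption."""
        scheme = "https" if self.ssl else "http"
        fallback = 5989 if self.ssl else 5988
        port = self.port if self.port is not None else fallback
        return f"{scheme}://{self.host}:{port}"


# Placeholder host and credentials; the serial number is the format example above.
config = LunTestConfig(
    host="192.0.2.10",
    user="monitor",
    password="changeme",
    serial_number="FCNMM094900059",
)
print(config.cimom_url(), config.namespace)
```

Keeping the port optional mirrors the NULL default: the URL is only completed with a conventional port when none has been configured.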

Measurements made by the test
Measurement Description Measurement Unit Interpretation

Health state

Indicates how healthy this LUN currently is.

 

The values that this measure can report and their corresponding numeric values are discussed in the table below:

Measure Value Numeric Value
OK 0
Unknown 1
Degraded/Warning 2
Minor failure 3
Major failure 4
Critical failure 5
Non-recoverable error 6

Note:

By default, this measure reports the Measure Values discussed above to indicate the state of a LUN. In the graph of this measure, however, states are represented using the numeric equivalents only.
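Since graphs show only the numeric equivalents, a small lookup can help when reading exported graph data. The following is a minimal sketch that encodes the Health state table above; the names HEALTH_STATES and health_state_label are hypothetical.

```python
# Numeric equivalents of the Health state measure, copied from the table above.
HEALTH_STATES = {
    0: "OK",
    1: "Unknown",
    2: "Degraded/Warning",
    3: "Minor failure",
    4: "Major failure",
    5: "Critical failure",
    6: "Non-recoverable error",
}


def health_state_label(value: int) -> str:
    """Translate a numeric health state from a graph into its Measure Value."""
    # Values outside the documented table are flagged rather than guessed.
    return HEALTH_STATES.get(value, f"Undocumented value {value}")


for numeric in (0, 2, 6):
    print(numeric, "->", health_state_label(numeric))
```

The same pattern applies to any measure whose graph uses numeric equivalents, provided the dictionary is filled from that measure's own table.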

Operational status

Indicates the current operational state of this LUN.

 

The values that this measure can report and their corresponding numeric values are discussed in the table below:

Measure Value Numeric Value
OK 0
In Service 1
Power Mode 2
Completed 3
Starting 4
Dormant 5
Other 6
Unknown 7
Stopping 8
Stressed 9
Stopped 10
Supporting Entity in Error 11
Degraded or Predicted Failure 12
Predictive Failure 13
Lost Communication 14
No Contact 15
Aborted 16
Error 17
Non-Recoverable Error 18

Note:

By default, this measure reports the Measure Values discussed above to indicate the operational state of a LUN. In the graph of this measure, however, operational states are represented using the numeric equivalents only.

Detailed operational state

Describes the current operational state of this LUN.

 

This measure will be reported only if the API provides a detailed operational state.

Typically, the detailed state will describe why the LUN is in a particular operational state. For instance, if the Operational status measure reports the value Stopping for a LUN, then this measure will explain why that LUN is being stopped.

The values that this measure can report and their corresponding numeric values are discussed in the table below:

Measure Value Numeric Value
Online 0
Success 1
Power Saving Mode 2
Write Protected 3
Write Disabled 4
Not Ready 5
Removed 6
Rebooting 7
Offline 8
Failure 9

Note:

By default, this measure reports the Measure Values discussed above to indicate the detailed operational state of a LUN. In the graph of this measure, however, detailed operational states are represented using the numeric equivalents only.

Data transmitted

Indicates the rate at which data was transmitted by this LUN.

MB/Sec

 

IOPS

Indicates the rate at which I/O operations were performed on this LUN.

IOPS

Compare the value of this measure across LUNs to know which LUN handled the maximum number of I/O requests and which handled the least. If the gap between the two is very high, then it indicates serious irregularities in load-balancing across LUNs.

You may then want to take a look at the Reads and Writes measures to understand what to fine-tune: the load-balancing algorithm for read requests or the one for write requests.
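The comparison described above can be sketched as follows. This is only an illustration with made-up sample values; the function name spread is hypothetical, and what counts as a "very high" gap is left to the administrator.

```python
def spread(rate_by_lun):
    """Return (busiest LUN, least-loaded LUN, gap between their rates)."""
    busiest = max(rate_by_lun, key=rate_by_lun.get)
    quietest = min(rate_by_lun, key=rate_by_lun.get)
    return busiest, quietest, rate_by_lun[busiest] - rate_by_lun[quietest]


# Hypothetical per-LUN samples, not measurements from a real array.
iops = {"LUN 0": 1200.0, "LUN 1": 300.0, "LUN 2": 950.0}
reads = {"LUN 0": 1000.0, "LUN 1": 200.0, "LUN 2": 450.0}
writes = {"LUN 0": 200.0, "LUN 1": 100.0, "LUN 2": 500.0}

# Checking Reads and Writes separately shows which kind of request is unbalanced.
for name, samples in (("IOPS", iops), ("Reads", reads), ("Writes", writes)):
    busiest, quietest, gap = spread(samples)
    print(f"{name}: busiest={busiest}, least loaded={quietest}, gap={gap}")
```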

Reads

Indicates the rate at which read operations were performed on this LUN.

Reads/Sec

Compare the value of this measure across LUNs to know which LUN handled the maximum number of read requests and which handled the least.

Writes

Indicates the rate at which write operations were performed on this LUN.

Writes/Sec

Compare the value of this measure across LUNs to know which LUN handled the maximum number of write requests and which handled the least.

Data reads

Indicates the rate at which data is read from this LUN.

MB/Sec

Compare the value of these measures across LUNs to identify the slowest LUN in terms of servicing read and write requests (respectively).

Data writes

Indicates the rate at which data is written to this LUN.

MB/Sec

LUN busy

Indicates the percentage of time this LUN was busy processing requests.

Percent

Compare the value of this measure across LUNs to know which LUN was the busiest and which was the least busy. If the gap between the two is very high, then it indicates serious irregularities in load-balancing across LUNs.

Average read size

Indicates the amount of data read from this LUN per I/O operation.

MB/Op

Compare the value of these measures across LUNs to identify the slowest LUN in terms of servicing read and write requests (respectively).

Average write size

Indicates the amount of data written to this LUN per I/O operation.

MB/Op
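To make the MB/Op unit concrete, the sketch below relates it to the rate measures reported earlier: a data rate in MB/Sec divided by an operation rate in operations per second gives the data moved per operation. Whether the test derives the value this way or reads it directly from the provider is not stated here, so treat the formula as an assumption; the function name is hypothetical.

```python
def average_io_size_mb(data_rate_mb_per_sec: float, ops_per_sec: float) -> float:
    """Data moved per operation (MB/Op), derived from two rate measures."""
    if ops_per_sec <= 0:
        # No operations in the period, so there is no meaningful average size.
        return 0.0
    return data_rate_mb_per_sec / ops_per_sec


# Hypothetical sample: 64 MB/Sec of reads spread over 1024 Reads/Sec.
print(average_io_size_mb(64.0, 1024.0), "MB per read")
```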

Read hit

Indicates the percentage of read requests that were serviced by the cache of this LUN.

Percent

A high value is desired for this measure. A very low value is a cause for concern, as it indicates that cache usage is very poor; this in turn implies that direct LUN accesses, which are expensive operations, are high.

Write hit

Indicates the percentage of write requests that were serviced by the cache of this LUN.

Percent

A high value is desired for this measure. A very low value is a cause for concern, as it indicates that cache usage is very poor; this in turn implies that direct LUN accesses, which are expensive operations, are high.
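As a minimal illustration of what the Read hit and Write hit percentages express, the sketch below computes the share of requests served from cache. The request counts are hypothetical inputs; the test itself reports only the resulting percentage.

```python
def cache_hit_percent(cache_hits: int, total_requests: int) -> float:
    """Percentage of requests that were serviced by the cache."""
    if total_requests == 0:
        # No requests in the period; report 0 rather than dividing by zero.
        return 0.0
    return 100.0 * cache_hits / total_requests


# Hypothetical sample: 900 of 1000 reads were served from cache.
hit = cache_hit_percent(900, 1000)
print(f"Read hit: {hit}%, direct LUN accesses: {100.0 - hit}%")
```

The remainder of the percentage corresponds to the direct LUN accesses that the interpretation above describes as expensive.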

Average response time

Indicates the time taken by this LUN to respond to I/O requests.

Microsecs

Ideally, this value should be low. If not, it implies that the LUN is slow.

EMC queue length

Indicates the number of requests that are in queue for this LUN.

Number

A consistent increase in this value indicates a potential processing bottleneck with the LUN.
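One simple way to check for a consistent increase is to test whether every sample in a recent window is higher than the one before it. This is a sketch under that assumption; the function name and the minimum window of three samples are hypothetical choices, not product behaviour.

```python
def is_consistently_increasing(samples, min_points=3):
    """True if every sample is strictly greater than the previous one."""
    if len(samples) < min_points:
        # Too few measurements to call it a trend.
        return False
    return all(later > earlier for earlier, later in zip(samples, samples[1:]))


# Hypothetical queue-length samples from successive measurement periods.
print(is_consistently_increasing([4, 6, 9, 15]))   # steadily growing queue
print(is_consistently_increasing([4, 9, 3, 5]))    # fluctuating queue
```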

EMC disk crossings

Indicates the number of times an I/O crossed a stripe boundary on a RAID 6, RAID 5, RAID 0, or RAID 1/0 LUN.

Number

A CLARiiON LUN appears to the host as an OS device. Typically, to use the disk device, it has to be formatted with disk partitions. OS file systems are then created in one or more of the formatted disk partitions.

Typically, formatting a striped LUN creates a partition with a disk partition header, and an OS file system is then created on that disk partition. Because the partition header occupies the beginning of the first stripe element, file data does not start on a stripe element boundary. As OS files are added to the file system, the first file will have a piece sitting on the first stripe element of the LUN (for example, 64 KB). So, if we try to do an I/O of 64 KB on this OS file, part of the data will end up going to the first stripe element, which belongs to one physical drive, and the rest to the second drive that makes up the striped LUN. This type of drive crossing is called a stripe crossing. Stripe crossings result in less efficient dispatches of I/O requests from the CLARiiON storage processors to the back-end disk drives, thereby reducing request service efficiency. This is why, ideally, the value of this measure should be very low.
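The stripe crossing described above can be checked with simple integer arithmetic. The sketch below assumes the 64 KB stripe element size used in the example; the function name and the 32 KB partition offset are hypothetical values chosen for illustration.

```python
def crosses_stripe_element(offset_kb: int, size_kb: int, element_kb: int = 64) -> bool:
    """True if an I/O starting at offset_kb touches more than one stripe element."""
    if size_kb <= 0:
        return False
    first_element = offset_kb // element_kb
    last_element = (offset_kb + size_kb - 1) // element_kb
    return first_element != last_element


# A 64 KB I/O that starts on an element boundary stays on one drive.
print(crosses_stripe_element(offset_kb=0, size_kb=64))
# The same I/O shifted by a hypothetical 32 KB partition header spans two drives.
print(crosses_stripe_element(offset_kb=32, size_kb=64))
```

The second call shows why an unaligned partition raises the value of this measure: every element-sized I/O lands on two drives instead of one.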

Prefetched

Indicates the amount of data prefetched in the read cache of this LUN.

KB

Prefetching is read-ahead caching. It lets the SP anticipate the data an application will request so that it can read it from disk into its read cache before the data is needed.

Prefetched not used

Indicates the amount of prefetched data in the read cache of this LUN that was not read during the last measurement period.

KB

If the value of this measure keeps growing for a LUN, you may want to fine-tune prefetching to ensure that the LUN’s read cache is not unnecessarily filled with data that is never used. For instance, you may want to reduce the Maximum Prefetch value for a LUN, so that the storage system does not allow too many disk blocks to be prefetched for variable-length prefetching.
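To judge how much of the prefetched data goes unused, the two prefetch measures can be related as a percentage. This is a minimal sketch; the function name is hypothetical, and it assumes both values are in KB and cover the same measurement period.

```python
def unused_prefetch_percent(prefetched_kb: float, not_used_kb: float) -> float:
    """Share of prefetched data (in percent) that was never read."""
    if prefetched_kb <= 0:
        # Nothing was prefetched, so nothing could be wasted.
        return 0.0
    return 100.0 * not_used_kb / prefetched_kb


# Hypothetical sample: 2048 KB prefetched, 512 KB of it never read.
print(unused_prefetch_percent(2048, 512), "% of prefetched data unused")
```

A percentage that stays high across measurement periods would support reducing the Maximum Prefetch value mentioned above.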

EMC queue arrivals

Indicates the number of times a user request arrived while at least one other request was being processed.

 

Number

 

Utilization through SPA

Indicates the amount of data that was utilized in this LUN through storage processor A.

KB

Compare the value of this measure across LUNs to identify the top data consumers through SP A.

Utilization through SPB

Indicates the amount of data that was utilized in this LUN through storage processor B.

KB

Compare the value of this measure across LUNs to identify the top data consumers through SP B.

Response through SPA

Indicates the time taken by this LUN to respond to I/O requests through storage processor A.

Microsec

Compare the value of this measure across LUNs to identify the least responsive LUN through SP A.

Response through SPB

Indicates the time taken by this LUN to respond to I/O requests through storage processor B.

Microsec

Compare the value of this measure across LUNs to identify the least responsive LUN through SP B.