v7000 MDisk Status Test

A managed disk (MDisk) refers to the unit of storage that IBM Storwize V7000 virtualizes. This unit could be a logical volume on an external storage array presented to the IBM Storwize V7000 or a RAID array consisting of internal drives. The IBM Storwize V7000 can then allocate these MDisks into various storage pools.

An MDisk is not visible to a host system on the storage area network, as it is internal or only zoned to the IBM Storwize V7000 system.

An MDisk has four modes:

  • Array: Array mode MDisks are constructed from drives using the RAID function. Array MDisks are always associated with storage pools.
  • Unmanaged: Unmanaged MDisks are not being used by the system. This situation might occur when an MDisk is first imported into the system, for example.
  • Managed: Managed MDisks are assigned to a storage pool and provide extents that volumes can use.
  • Image: Image MDisks are assigned directly to a volume with a one-to-one mapping of extents between the MDisk and the volume. This mode is normally used to import logical volumes that already contain data into the clustered system, because the one-to-one mapping ensures that the data is preserved during the import.
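The four modes can be modeled as a small enumeration. The sketch below is a hypothetical illustration: the names `MDiskMode` and `is_in_use` are not part of any IBM or eG API, and the code encodes only the rule stated above that unmanaged MDisks are not being used by the system.

```python
from enum import Enum


class MDiskMode(Enum):
    """The four MDisk modes described above (hypothetical model)."""
    ARRAY = "array"          # built from drives using RAID; always in a storage pool
    UNMANAGED = "unmanaged"  # not used by the system, e.g. right after import
    MANAGED = "managed"      # assigned to a storage pool; provides extents to volumes
    IMAGE = "image"          # mapped one-to-one to a single volume


def is_in_use(mode: MDiskMode) -> bool:
    """Return True unless the MDisk is unmanaged (not used by the system)."""
    return mode is not MDiskMode.UNMANAGED


if __name__ == "__main__":
    for mode in MDiskMode:
        print(f"{mode.value:<10} in use: {is_in_use(mode)}")
```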

If an MDisk assigned to a cluster cannot be accessed by any of the cluster nodes, the nodes cannot service the host I/O requests they receive, resulting in poor I/O performance of the storage system. To avoid this, the current state of every MDisk in the storage system should be tracked continuously, and abnormalities should be brought promptly to the attention of administrators, so that they can initiate corrective action and bring the MDisk back to its normal state. The v7000 MDisk Status test serves this purpose. It reports the current state of each MDisk and the status of the RAID array that hosts the MDisk. With this information, administrators can identify which MDisks and RAID arrays are inaccessible to cluster nodes, investigate the reasons for the anomaly, and resolve it before it affects the overall I/O performance of the storage system.
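The core check described above can be sketched as a filter over per-MDisk results: any MDisk whose own status, or whose hosting RAID array's status, is not online is flagged for an administrator. This is a minimal illustration under assumptions: the `MDiskResult` record and its field names are hypothetical, and how the values are collected from the storage system is not shown.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class MDiskResult:
    """One polling result for a single MDisk (hypothetical record layout)."""
    name: str
    status: str       # e.g. "online", "offline", "excluded", "degraded"
    raid_status: str  # status of the RAID array hosting the MDisk


def needs_attention(results: Iterable[MDiskResult]) -> List[str]:
    """Return the names of MDisks whose status or RAID status is not 'online'."""
    return [
        r.name
        for r in results
        if r.status != "online" or r.raid_status != "online"
    ]


if __name__ == "__main__":
    sample = [
        MDiskResult("mdisk0", "online", "online"),
        MDiskResult("mdisk1", "degraded", "online"),
        MDiskResult("mdisk2", "online", "syncing"),
    ]
    print(needs_attention(sample))  # ['mdisk1', 'mdisk2']
```

Treating every RAID state other than online (including transitional ones such as syncing) as abnormal is a deliberate simplification here; a real alerting policy might only warn on those states.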

Target of the test: An IBM Storwize V7000 storage system

Agent deploying the test: A remote agent

Outputs of the test: One set of results for every MDisk being monitored.

Configurable parameters for the test
Parameter Description

Test Period

How often the test should be executed.

Host

The host for which the test is to be configured.

Port

The port number at which the specified host listens. By default, this is NULL.

Timeout

Specify, in the Timeout text box, the duration (in seconds) beyond which the test times out. The default value is 60 seconds.

Detailed Diagnosis

To make diagnosis more efficient and accurate, the eG Enterprise suite embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability
  • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
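The parameters above and the availability rule for the detailed diagnosis option can be summarized in code. This sketch is hypothetical: `MDiskStatusTestConfig` and `dd_option_available` are illustrative names, not eG Enterprise APIs. Only the defaults (Port NULL, Timeout 60 seconds) and the two availability conditions come from the text; the Test Period unit and the Off default for detailed diagnosis are assumptions, and the second condition is read as "neither frequency is 0".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MDiskStatusTestConfig:
    """Hypothetical container for the test's configurable parameters."""
    host: str                         # host for which the test is configured
    test_period_minutes: int          # how often the test runs (unit is an assumption)
    port: Optional[int] = None        # NULL by default
    timeout_seconds: int = 60         # default timeout is 60 seconds
    detailed_diagnosis: bool = False  # On/Off switch (Off default is an assumption)


def dd_option_available(license_allows_dd: bool,
                        normal_frequency: int,
                        abnormal_frequency: int) -> bool:
    """Return True if the detailed diagnosis On/Off choice should be offered.

    Requires the license to allow detailed diagnosis and, under the assumed
    reading of the second condition, neither frequency to be 0.
    """
    return license_allows_dd and normal_frequency != 0 and abnormal_frequency != 0


if __name__ == "__main__":
    cfg = MDiskStatusTestConfig(host="v7000-01", test_period_minutes=5)  # hypothetical host
    print(cfg.port, cfg.timeout_seconds)    # None 60
    print(dd_option_available(True, 1, 1))  # True
```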
Measurements made by the test
Measurement Description Measurement Unit Interpretation

Status

Indicates the current status of this MDisk.

 

This measure reports any of the following values:

  • offline
  • online
  • excluded
  • degraded

The Measure values reported by Status measure table, later in this section, describes each of the aforesaid measure values.

The numeric values that correspond to the above-mentioned states are as follows:

Measure Value Numeric Value
offline 0
online 1
excluded 2
degraded 3

Note:

By default, this measure reports the above-mentioned Measure Values to indicate the status of the MDisk. However, in the graph of this measure, the status is represented using the corresponding numeric equivalents only.

The detailed diagnosis of this measure reveals the MDisk ID, the MDisk GROUP ID, the MDisk GROUP NAME, the RAID STATUS, the RAID LEVEL, the REDUNDANCY of this MDisk, the STRIP SIZE of the MDisk, and the TIER with which the MDisk is associated.

Capacity

Indicates the total capacity of this MDisk.

TB

 

Raid status

Indicates the current status of the RAID array hosting this MDisk.

This measure reports any one of the following values:

  • offline
  • degraded
  • syncing
  • initiating
  • online

The numeric values that correspond to the above-mentioned measure values are as follows:

Measure Value Numeric Value Description
offline 0 the array is offline on all nodes
degraded 1 the array has deconfigured or offline members; the array is not fully redundant
syncing 2 array members are all online, the array is syncing parity or mirrors to achieve redundancy
initiating 3 array members are all online, the array is initializing; the array is fully redundant
online 4 array members are all online, and the array is fully redundant

Note:

By default, this measure reports the above-mentioned Measure Values to indicate the status of the RAID array hosting this MDisk. However, in the graph of this measure, the status is represented using the corresponding numeric equivalents only.
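The text-to-number conversion used when graphing the Status and Raid status measures can be expressed as two lookup tables. The numeric codes are copied from the tables above; the names `STATUS_CODES`, `RAID_STATUS_CODES`, and `graph_value` are hypothetical. The two measures use different codes for the same word (online is 1 for Status but 4 for Raid status), so each measure needs its own table.

```python
from typing import Dict

# Status measure: value reported for the MDisk -> numeric value used in graphs
STATUS_CODES: Dict[str, int] = {
    "offline": 0,
    "online": 1,
    "excluded": 2,
    "degraded": 3,
}

# Raid status measure: value reported for the hosting RAID array -> numeric value
RAID_STATUS_CODES: Dict[str, int] = {
    "offline": 0,
    "degraded": 1,
    "syncing": 2,
    "initiating": 3,
    "online": 4,
}


def graph_value(measure_value: str, codes: Dict[str, int]) -> int:
    """Translate a reported measure value into its numeric graph equivalent.

    Raises KeyError for a value the table does not define, rather than
    silently plotting a wrong number.
    """
    return codes[measure_value.strip().lower()]


if __name__ == "__main__":
    print(graph_value("online", STATUS_CODES))       # 1
    print(graph_value("online", RAID_STATUS_CODES))  # 4
```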

The table below describes each of the measure values reported by the Status measure.

Measure values reported by Status measure
State Description

offline

The MDisk cannot be accessed by any of the online nodes. That is, none of the nodes that are currently working members of the cluster can access this MDisk. This state can be caused by a failure in the SAN, the storage system, or one or more physical disks connected to the storage system. The MDisk is reported as offline if all paths to the disk fail.

online

The MDisk can be accessed by all online nodes. The MDisk is online when the following conditions are met:

  • All timed-out error recovery procedures complete and report the disk as online.
  • The logical unit number (LUN) inventory of the target ports correctly reports the MDisk.
  • Discovery of this LUN completes successfully.
  • All of the MDisk target ports report this LUN as available with no fault conditions.

excluded

The MDisk has been excluded from use by the cluster after repeated access errors. Run the Directed Maintenance Procedures to determine the problem.

degraded

This state can be caused by degraded paths or degraded ports. A degraded path can render an MDisk inaccessible to one or more nodes in the cluster. A degraded path status is most likely the result of incorrect configuration of either the disk controller or the Fibre Channel fabric. However, hardware failures in the disk controller, the Fibre Channel fabric, or a node could also contribute to this state. Complete the following actions to recover from this state:

  • Verify that the fabric configuration rules for storage systems are correct.
  • Ensure that you have configured the storage system properly.
  • Correct any errors in the event log.

If the MDisk has one or more 1220 errors in the event log, the degraded state is caused by degraded ports. The 1220 error indicates that the remote Fibre Channel port has been excluded from the MDisk. This error might cause reduced performance on the storage controller and usually indicates a hardware problem with the storage controller. To fix this problem, resolve any hardware problems on the storage controller, and then fix the 1220 errors in the event log.
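The recovery guidance in the table above can be condensed into a lookup that suggests a first action for each Status value, including the check for 1220 errors that distinguishes degraded ports from degraded paths. This is a hypothetical sketch: `first_action` and its `error_codes` argument are illustrative, the event-log error codes must be supplied by the caller, and the returned strings only paraphrase the guidance above. They are not messages produced by the storage system or by eG Enterprise.

```python
from typing import Iterable


def first_action(status: str, error_codes: Iterable[int] = ()) -> str:
    """Suggest a first recovery step for an MDisk, based on its Status value.

    error_codes: event-log error codes recorded against this MDisk (assumed
    to be gathered elsewhere; 1220 means a remote Fibre Channel port was
    excluded from the MDisk).
    """
    status = status.strip().lower()
    if status == "online":
        return "No action needed."
    if status == "offline":
        return ("Check the SAN, the storage system, and its physical disks; "
                "all paths to the MDisk have failed.")
    if status == "excluded":
        return "Run the Directed Maintenance Procedures."
    if status == "degraded":
        if 1220 in set(error_codes):
            return ("Degraded ports: resolve hardware problems on the storage "
                    "controller, then fix the 1220 errors in the event log.")
        return ("Degraded paths: verify the fabric configuration rules, check "
                "the storage system configuration, and correct event-log errors.")
    raise ValueError(f"unknown MDisk status: {status!r}")


if __name__ == "__main__":
    print(first_action("degraded", [1220]))
```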