v7000 VDisk Status Test

A volume, or VDisk, is a logical disk that the clustered system presents to a host connected over a Fibre Channel or Ethernet network. VDisks enable administrators to manage storage resources more efficiently. If a VDisk is offline or degraded, modified write data can become pinned in the SAN Volume Controller cache. This prevents volume failover and causes a loss of input/output (I/O) access. I/O loss can also occur if the cache of a VDisk is corrupt. To prevent or at least minimize such losses, administrators need to detect the abnormal state of a VDisk and/or its cache quickly and act immediately to correct it. This is where the v7000 VDisk Status test helps. This test reports the current status and the cache state of each VDisk of the IBM Storwize v7000 storage system, so that an abnormal VDisk or cache state can be promptly detected and resolved.

Target of the test : An IBM Storwize v7000 storage system

Agent deploying the test : A remote agent

Outputs of the test : One set of results for each VDisk being monitored.

Configurable parameters for the test
Parameter    Description

Test Period

How often the test should be executed.

Host

The host for which the test is to be configured.

Port

The port number at which the specified host listens. By default, this is NULL.

Timeout

In the Timeout text box, specify the duration (in seconds) beyond which the test will time out. The default value is 60 seconds.

Detailed Diagnosis

To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability
  • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
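
The two conditions above amount to a simple conjunction. The following is a minimal sketch of that check; the function and parameter names are hypothetical and are not part of the eG Enterprise configuration interface. It also assumes that "should not be 0" means that neither frequency may be 0.

```python
def dd_option_available(license_allows_dd: bool,
                        normal_freq: int,
                        abnormal_freq: int) -> bool:
    """Return True when the On/Off detailed diagnosis option would be offered.

    All names are hypothetical; the function only restates the two
    documented conditions.
    """
    # Condition 1: the eG manager license permits detailed diagnosis.
    # Condition 2 (assumed reading): neither frequency is 0.
    return license_allows_dd and normal_freq != 0 and abnormal_freq != 0
```

For example, `dd_option_available(True, 1, 0)` evaluates to `False`, because the abnormal frequency is 0.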
Measurements made by the test
Measurement    Description    Measurement Unit    Interpretation

Status

Indicates the current status of this VDisk.

This measure can take any of the following values:

  • offline
  • online
  • degraded

A VDisk is offline and unavailable if one of the following takes place:

  • Both nodes in the I/O group are missing.
  • None of the nodes in the I/O group that are present can access the VDisk.
  • All synchronized copies for this VDisk are in storage pools that are offline.
  • The VDisk is formatting.

A VDisk is reported as degraded if any of the following occurs:

  • One of the nodes in the I/O group is missing.
  • One of the nodes in the I/O group cannot access all the MDisks in the storage pool that the VDisk spans. In this case MDisks are shown as degraded and the fix procedures for MDisks should be followed to resolve the problem.
  • The fast write cache pins data for one or more VDisks in the I/O group and is unable to perform a failback until the situation is resolved. An error log indicating that the cache has pinned data is displayed. Follow the fix procedures for this error log to resolve the problem. The most common causes of pinned data are the following:
      • One or more VDisks in an I/O group are offline due to an asymmetric failure and have pinned data in the cache. Asymmetric failures can occur because of Storwize V7000 fabric faults or misconfiguration, back-end controller faults or misconfiguration, or because repeated errors have led to the system excluding access to an MDisk through one or more nodes.
      • One or more VDisks in an I/O group are offline due to a problem with a FlashCopy mapping.

The numeric values that correspond to the above-mentioned measure values are as follows:

Measure Value    Numeric Value
offline          0
online           1
degraded         2
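
Because graphs of this measure show only the numeric equivalents, a small lookup can help when reading values off a graph or an export. The sketch below simply encodes the table above; the names `STATUS_CODES`, `label_for` and `needs_attention` are illustrative and do not come from the product.

```python
# Numeric equivalents of the Status measure, copied from the table above.
STATUS_CODES = {"offline": 0, "online": 1, "degraded": 2}

# Reverse lookup: numeric value -> measure value.
STATUS_LABELS = {code: label for label, code in STATUS_CODES.items()}


def label_for(code: int) -> str:
    """Translate a graphed numeric value back into its Status label."""
    return STATUS_LABELS.get(code, "unknown")


def needs_attention(code: int) -> bool:
    """Treat any value other than online (1) as an abnormal VDisk state."""
    return code != STATUS_CODES["online"]
```

With this mapping, `label_for(2)` returns `"degraded"` and `needs_attention(2)` returns `True`.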

Note:

By default, this measure reports the above-mentioned Measure Values while indicating the status of a VDisk. However, in the graph of this measure, VDisk status is represented using the corresponding numeric equivalents only.

The detailed diagnosis of this measure reveals the VDisk ID, the VDisk IO GROUP ID, the VDisk IO GROUP NAME, MDISK ID, MDISK NAME, the VDisk TYPE and the FAST WRITE STATUS of the VDisk. From the detailed diagnostics, you can determine the name of the I/O group to which the VDisk belongs and the MDisks (i.e., the managed disks) in the storage pool that the VDisk spans. If the VDisk is offline or degraded, you can use the I/O group and MDisk ID to investigate the cause: is it because the I/O group has a missing node, or because the MDisk is degraded?
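
One way to approach that investigation is to group the abnormal VDisks by I/O group and by MDisk and see which grouping they share. The sketch below is only an illustration of that idea: the dictionary keys (`vdisk_id`, `status`, `io_group_name`, `mdisk_name`) and the sample rows are hypothetical and do not reflect an actual export format of the detailed diagnosis.

```python
from collections import defaultdict


def group_abnormal_vdisks(rows):
    """Group offline/degraded VDisks by I/O group name and by MDisk name.

    `rows` is a list of dicts with hypothetical keys: 'vdisk_id',
    'status', 'io_group_name' and 'mdisk_name'.
    """
    by_io_group = defaultdict(list)
    by_mdisk = defaultdict(list)
    for row in rows:
        if row["status"] == "online":
            continue  # only abnormal VDisks are relevant here
        by_io_group[row["io_group_name"]].append(row["vdisk_id"])
        by_mdisk[row["mdisk_name"]].append(row["vdisk_id"])
    return dict(by_io_group), dict(by_mdisk)


# Illustrative rows only; not real output from a storage system.
rows = [
    {"vdisk_id": "0", "status": "degraded",
     "io_group_name": "io_grp0", "mdisk_name": "mdisk1"},
    {"vdisk_id": "1", "status": "degraded",
     "io_group_name": "io_grp0", "mdisk_name": "mdisk2"},
    {"vdisk_id": "2", "status": "online",
     "io_group_name": "io_grp1", "mdisk_name": "mdisk3"},
]
by_io_group, by_mdisk = group_abnormal_vdisks(rows)
```

In this made-up data, both degraded VDisks sit in the same I/O group but on different MDisks, which would suggest checking that I/O group for a missing node first. This is a heuristic for ordering the investigation, not a diagnosis.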

Fast write status

Indicates the cache status of this VDisk.

This measure reports any of the values listed below:

  • corrupt
  • repairing
  • empty
  • not empty

A cache state of corrupt indicates that the VDisk requires recovery by using one of the recovervdisk commands. A cache state of repairing indicates that repairs initiated by a recovervdisk command are in progress.

The numeric values that correspond to each of the measure values listed above are mentioned in the table below:

Measure Value    Numeric Value
corrupt          1
repairing        2
empty            3
not empty        4
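
The cache states and the follow-up described above can be summarized in a short lookup. This is a sketch only: `FAST_WRITE_CODES` and `cache_action` are illustrative names, and treating empty and not empty as requiring no recovery is an assumption, since the text describes a follow-up only for corrupt and repairing.

```python
# Numeric equivalents of the Fast write status measure, from the table above.
FAST_WRITE_CODES = {"corrupt": 1, "repairing": 2, "empty": 3, "not empty": 4}


def cache_action(state: str) -> str:
    """Return a short hint on what the given cache state calls for."""
    if state == "corrupt":
        # The VDisk requires recovery.
        return "recover the VDisk using one of the recovervdisk commands"
    if state == "repairing":
        # A recovervdisk command has already been issued.
        return "wait for the repair in progress to complete"
    if state in ("empty", "not empty"):
        # Assumption: these states need no recovery action.
        return "no recovery action"
    raise ValueError(f"unknown fast write state: {state!r}")
```

An unrecognized state raises `ValueError` instead of being silently treated as healthy.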

Note:

By default, this measure reports the above-mentioned Measure Values while indicating the cache status of a VDisk. However, in the graph of this measure, cache state is represented using the corresponding numeric equivalents only.