Raid Groups Test

Data ONTAP organizes disks into RAID groups, which are collections of data and parity disks that provide parity protection. From Data ONTAP 6.5 onwards, the following RAID types are supported for NetApp storage systems:

  • RAID4 technology: Within each RAID group, a single disk is assigned to hold parity data, which protects against data loss due to a single disk failure within the group.
  • RAID-DP™ technology (DP for double-parity): RAID-DP provides a higher level of RAID protection for Data ONTAP aggregates. Within each RAID group, it allots one disk to hold parity data and one disk to hold double-parity data. Double-parity protection guards against data loss due to a double disk failure within the group.
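
As a rough illustration of how single-parity protection lets a RAID group survive one disk failure, the sketch below uses byte-wise XOR parity. It is a simplified teaching example, not Data ONTAP's implementation: the helper name xor_blocks and the sample disk contents are hypothetical, and the second (double-parity) disk used by RAID-DP is not modeled.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data disks, each holding one 4-byte block of a stripe.
data_disks = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]

# The parity disk stores the XOR of all data blocks in the stripe.
parity = xor_blocks(data_disks)

# Simulate losing the second disk: XOR the survivors with the parity block
# to rebuild the missing data.
survivors = [data_disks[0], data_disks[2]]
rebuilt = xor_blocks(survivors + [parity])
```

With a single parity block, only one missing block per stripe can be rebuilt this way, which is why a second simultaneous failure requires the additional protection that RAID-DP provides.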

For native storage, Data ONTAP uses RAID-DP or RAID4 groups to provide parity protection. For third-party storage, Data ONTAP uses RAID0 groups to optimize performance and storage utilization. The storage arrays provide the parity protection for third-party storage. Data ONTAP RAID groups are organized into plexes, and plexes are organized into aggregates.

This test auto-discovers the RAID groups in the storage system and helps the administrator determine the following:

  • How many disks are in an abnormal state, i.e., prefailed or replacing?
  • What is the total size of each RAID group? Is any RAID group facing, or about to encounter, a space crunch?
  • What percentage of media scrubbing and parity scrubbing has been completed in each RAID group?

Target of the test : A NetApp Unified Storage system

Agent deploying the test : An external/remote agent

Outputs of the test : One set of results for each RAID group on the NetApp storage system being monitored.

Configurable parameters for the test
Parameter | Description

Test Period

Specifies how often the test should be executed.

Host

The host for which the test is to be configured.

Port

In the Port text box, specify the port at which the specified host listens. By default, this is NULL.

User

Specify the name of a user who possesses the following privileges:

login-http-admin,api-aggr-check-spare-low,api-aggr-list-info,api-aggr-mediascrub-list-info,api-aggr-scrub-list-info,api-cifs-status,api-clone-list-status,api-disk-list-info,api-fcp-adapter-list-info,api-fcp-adapter-stats-list-info,api-fcp-service-status,api-file-get-file-info,api-file-read-file,api-iscsi-connection-list-info,api-iscsi-initiator-list-info,api-iscsi-service-status,api-iscsi-session-list-info,api-iscsi-stats-list-info,api-lun-config-check-alua-conflicts-info,api-lun-config-check-cfmode-info,api-lun-config-check-info,api-lun-config-check-single-image-info,api-lun-list-info,api-nfs-status,api-perf-object-get-instances-iter*,api-perf-object-instance-list-info,api-quota-report-iter*,api-snapshot-list-info,api-vfiler-list-info,api-volume-list-info-iter*.

If such a user does not already exist, you can create a special user for this purpose using the steps detailed in Creating a New User with the Privileges Required for Monitoring the NetApp Unified Storage.

Password

Specify the password that corresponds to the above-mentioned User.

Confirm Password

Confirm the Password by retyping it here.

Authentication Mechanism

To collect metrics from the NetApp Unified Storage system, the eG agent connects to the ONTAP management APIs over HTTP or HTTPS. By default, this connection is authenticated using the LOGIN_PASSWORD authentication mechanism. This is why LOGIN_PASSWORD is displayed as the default authentication mechanism.

Use SSL

Set the Use SSL flag to Yes if SSL (Secure Sockets Layer) is to be used to connect to the NetApp Unified Storage system, and to No if it is not.

API Port

By default, in most environments, the NetApp Unified Storage system listens on port 80 (if not SSL-enabled) or on port 443 (if SSL-enabled). Accordingly, the API Port parameter is set to default: if the Use SSL flag above is set to No, the eG agent connects to the NetApp Unified Storage system on port 80, and if the Use SSL flag is set to Yes, it connects on port 443.

In some environments, however, the default ports 80 and 443 might not apply. In such a case, specify against the API Port parameter the exact port at which the NetApp Unified Storage system in your environment listens, so that the eG agent communicates with that port to collect metrics.
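
The port selection described for the API Port parameter can be sketched as a small function. This is only an illustration of the documented rule, not the eG agent's actual code; the function name resolve_api_port and its argument names are hypothetical.

```python
def resolve_api_port(use_ssl: bool, api_port: str = "default") -> int:
    """Return the port to connect to, following the rule described above."""
    # An explicitly configured port always takes precedence.
    if api_port.strip().lower() != "default":
        return int(api_port)
    # Otherwise fall back to 443 when SSL is enabled, and 80 when it is not.
    return 443 if use_ssl else 80


plain_port = resolve_api_port(use_ssl=False)
ssl_port = resolve_api_port(use_ssl=True)
custom_port = resolve_api_port(use_ssl=True, api_port="8443")
```

Here 8443 is an arbitrary example of a non-default port; substitute whatever port the storage system in your environment actually listens on.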

vFilerName

A vFiler is a virtual storage system you create using MultiStore, which enables you to partition the storage and network resources of a single storage system so that it appears as multiple storage systems on the network. If the NetApp Unified Storage system is partitioned to accommodate a set of vFilers, specify the name of the vFiler that you wish to monitor in the vFilerName text box. In some environments, the NetApp Unified Storage system may not be partitioned at all. In such a case, the NetApp Unified Storage system is monitored as a single vFiler and hence the default value of none is displayed in this text box.

Timeout

Specify the duration (in seconds) beyond which the test will time out if no response is received from the device. The default is 120 seconds.

DD Frequency

Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency if you wish. To disable the detailed diagnosis capability for this test, specify none against DD Frequency.
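
To make the format of the DD Frequency value concrete, here is a minimal parsing sketch. It assumes the value is either none or two integers separated by a colon, with the first applying to normal runs and the second to runs where a problem is detected; that ordering is inferred from the description above, and the function name parse_dd_frequency is hypothetical.

```python
from typing import Optional, Tuple


def parse_dd_frequency(value: str) -> Optional[Tuple[int, int]]:
    """Parse a DD Frequency string into (normal, abnormal); None means disabled."""
    text = value.strip().lower()
    if text == "none":
        return None  # detailed diagnosis is switched off for the test
    # Assumed layout: "<normal>:<abnormal>", for example the default "1:1".
    normal, abnormal = (int(part) for part in text.split(":"))
    return normal, abnormal


default_frequency = parse_dd_frequency("1:1")
disabled = parse_dd_frequency("none")
```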

Detailed Diagnosis

To make diagnosis more efficient and accurate, the eG Enterprise suite embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability.
  • Neither the normal nor the abnormal frequency configured for the detailed diagnosis measures should be 0.

Measurements made by the test
Measurement | Description | Measurement Unit | Interpretation

Prefailed disks

Indicates the number of prefailed disks in this RAID group.

Number

Disks that are manually failed due to excessive error logging are termed prefailed disks. The contents of these disks are copied to suitable replacement disks, i.e., the spare disks available in the storage system.

Ideally, the value of this measure should be 0.

Replacing disks

Indicates the number of replacing disks in this RAID group.

Number

Mismatched disks that are part of an aggregate can be replaced with more suitable spare disks without disrupting the data service. This operation uses the Rapid RAID Recovery process to copy the data from the disk being replaced to a specified spare disk. Frequent disk replacement degrades the system, so it should be avoided through proper initial configuration.

Total physical space

Indicates the total size of this RAID group.

MB

 

Used space

Indicates the total amount of space used by all disks in this RAID group.

MB

Ideally, the value of this measure should be low. If this value grows close to that of the Total physical space measure, consider adding more disks to the storage system or freeing up space on the disks by deleting unnecessary data.

Used space percentage

Indicates the percentage of space that is utilized across all disks in this RAID group.

Percent

A low value is desired for this measure. A value close to 100% indicates excessive disk space usage by a RAID group.
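
The relationship between the Used space, Total physical space, and Used space percentage measures can be shown with simple arithmetic. The sketch below assumes the percentage is used space divided by total physical space; the function name and the sample readings are hypothetical and not taken from a real system.

```python
def used_space_percentage(used_mb: float, total_mb: float) -> float:
    """Return used space as a percentage of total physical space."""
    if total_mb <= 0:
        raise ValueError("total_mb must be positive")
    return used_mb * 100.0 / total_mb


# Hypothetical readings for one RAID group, in MB.
pct = used_space_percentage(used_mb=450_000, total_mb=500_000)
print(f"{pct:.1f}% used")
```

A result this close to 100% would suggest that the RAID group is heading for a space crunch.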

Media scrub percentage

Indicates the percentage of media scrubbing that is currently completed in this RAID group.

Percent

Media scrubbing is a continuous background process. The purpose of the continuous media scrub is to detect and correct media errors in order to minimize the chance of storage system disruption due to a media error while a storage system is in degraded or reconstruction mode.

By default, Data ONTAP runs continuous background media scrubbing for media errors on all storage system disks. If a media error is found, Data ONTAP uses RAID to reconstruct the data and repairs the error.

Because of the media scrubbing process, the disk LEDs may blink on an apparently idle storage system, and some CPU activity may occur even when no user workload is present.

Parity scrub percentage

Indicates the percentage of parity scrubbing that is currently completed in this RAID group.

Percent

The purpose of the parity scrub is to detect and correct errors on the parity disk of the RAID group. Consistent parity is required for disk reconstruction.