NetApp USD Disks Test

Disks are the basic storage devices in NetApp storage systems. Depending on the storage system model, ATA, Fibre Channel, SCSI, SAS, or SATA disks are used.

Data ONTAP assigns and makes use of four different disk categories to support data storage, parity protection, and disk replacement. The disk category can be one of the following types:

  • Data disk - Holds data stored on behalf of clients within RAID groups (and any system management data).
  • Global hot spare disk - Does not hold usable data, but is available to be added to a RAID group in an aggregate. Any functioning disk that is not assigned to an aggregate acts as a hot spare disk.
  • Parity disk - Stores information required for data reconstruction within RAID groups.
  • Double-parity disk - Stores double-parity information within RAID groups, if RAID-DP is used.

Administrators should closely monitor the space usage and the level of I/O activity of each of these disks, so that they can proactively detect a space crunch or I/O latency and receive early warnings of inconsistencies in load-balancing across disks. The NetApp Unified Storage Disks test aids administrators in this endeavor. This test auto-discovers the disks used by the storage system and reports how well every disk uses the available space and processes I/O requests. This way, potential space contentions and I/O latencies can be isolated, and slow disks and disks that are running short of space can be identified. The test also reports the current state of each disk and how busy each disk is, thus pointing administrators to broken disks and over-used disks, and in the process revealing irregularities in load-balancing.

Target of the test : A NetApp Unified Storage

Agent deploying the test : An external/remote agent

Outputs of the test : One set of results for each disk on the NetApp storage system being monitored.

Configurable parameters for the test
Parameters Description

Test Period

How often the test should be executed.

Host

The host for which the test is to be configured.

Port

Specify the port at which the specified host listens in the Port text box. By default, this is NULL.

User

Here, specify the name of the user who possesses the following privileges:

login-http-admin,api-aggr-check-spare-low,api-aggr-list-info,api-aggr-mediascrub-list-info,api-aggr-scrub-list-info,api-cifs-status,api-clone-list-status,api-disk-list-info,api-fcp-adapter-list-info,api-fcp-adapter-stats-list-info,api-fcp-service-status,api-file-get-file-info,api-file-read-file,api-iscsi-connection-list-info,api-iscsi-initiator-list-info,api-iscsi-service-status,api-iscsi-session-list-info,api-iscsi-stats-list-info,api-lun-config-check-alua-conflicts-info,api-lun-config-check-cfmode-info,api-lun-config-check-info,api-lun-config-check-single-image-info,api-lun-list-info,api-nfs-status,api-perf-object-get-instances-iter*,api-perf-object-instance-list-info,api-quota-report-iter*,api-snapshot-list-info,api-vfiler-list-info,api-volume-list-info-iter*.

If such a user does not already exist, you can create a special user for this purpose using the steps detailed in Creating a New User with the Privileges Required for Monitoring the NetApp Unified Storage.
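As a rough illustration of verifying such a user, the Python sketch below checks whether a set of granted capabilities covers a required list. The function name missing_capabilities, the shortened REQUIRED list, and the interpretation of a trailing * as a prefix wildcard are assumptions made for this example; they are not part of eG Enterprise or Data ONTAP.

```python
# Shortened subset of the capability list above, for illustration only.
REQUIRED = [
    "login-http-admin",
    "api-disk-list-info",
    "api-perf-object-get-instances-iter*",
]


def missing_capabilities(required, granted):
    """Return the required capabilities not covered by any granted entry.

    Assumption: a granted entry ending in '*' covers every capability
    that starts with the text before the '*'.
    """
    missing = []
    for cap in required:
        # Compare on the stem so that 'api-x-iter*' is matched by either
        # an identical entry or a broader wildcard such as 'api-x-*'.
        stem = cap.rstrip("*")
        covered = any(
            g == cap or (g.endswith("*") and stem.startswith(g[:-1]))
            for g in granted
        )
        if not covered:
            missing.append(cap)
    return missing
```

For example, a user granted only login-http-admin and api-disk-* would still be missing the api-perf-object-get-instances-iter* capability from the shortened list.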

Password

Specify the password that corresponds to the above-mentioned User.

Confirm Password

Confirm the Password by retyping it here.

Authentication Mechanism

To collect metrics from the NetApp Unified Storage system, the eG agent connects to the ONTAP management APIs over HTTP or HTTPS. By default, this connection is authenticated using the LOGIN_PASSWORD authentication mechanism. This is why LOGIN_PASSWORD is displayed as the default authentication mechanism.

Use SSL

Set the Use SSL flag to Yes if SSL (Secure Sockets Layer) is to be used to connect to the NetApp Unified Storage system, and to No if it is not.

API Port

By default, in most environments, the NetApp Unified Storage system listens on port 80 (if not SSL-enabled) or on port 443 (if SSL-enabled) only. Accordingly, while monitoring the NetApp Unified Storage system, the eG agent connects to port 80 by default if the Use SSL flag above is set to No, and to port 443 by default if the Use SSL flag is set to Yes. This is why the API Port parameter is set to default by default.

In some environments, however, the default ports 80 or 443 might not apply. In such a case, against the API Port parameter, you can specify the exact port at which the NetApp Unified Storage system in your environment listens, so that the eG agent communicates with that port for collecting metrics from the NetApp Unified Storage system.
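The port selection described for the Use SSL and API Port parameters can be sketched as follows. This is a minimal Python illustration, not the agent's actual code; the function name resolve_api_port is hypothetical.

```python
def resolve_api_port(use_ssl, api_port="default"):
    """Return the TCP port to connect to, following the rules above.

    'default' maps to 443 when SSL is in use and to 80 otherwise;
    any other value is taken as an explicit port number.
    """
    if str(api_port).strip().lower() == "default":
        return 443 if use_ssl else 80
    port = int(api_port)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port
```

With Use SSL set to Yes and API Port left at default, the sketch yields 443; an explicit value such as 8443 is used unchanged.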

vFilerName

A vFiler is a virtual storage system you create using MultiStore, which enables you to partition the storage and network resources of a single storage system so that it appears as multiple storage systems on the network. If the NetApp Unified Storage system is partitioned to accommodate a set of vFilers, specify the name of the vFiler that you wish to monitor in the vFilerName text box. In some environments, the NetApp Unified Storage system may not be partitioned at all. In such a case, the NetApp Unified Storage system is monitored as a single vFiler and hence the default value of none is displayed in this text box.

Timeout

Specify the duration (in seconds) beyond which the test will timeout if no response is received from the device. The default is 120 seconds.

Disk Busy Threshold

A disk is termed Busy if there is at least one outstanding request that is awaiting a response. Alternatively, you can set a threshold value in terms of percentage of time to classify the disk as a Busy disk. Specify such a threshold value in the Disk Busy Threshold text box. By default, this value is set to 70 (percent). This parameter has been deprecated in v5.6.5 (and above).

Read Latency Threshold

Sometimes, read operations by users on a disk may take too long to complete. In the Read Latency Threshold text box, specify the read latency above which a disk is to be classified as a Slow disk (read), i.e., a disk is termed a slow disk (read) when a user's read operation exceeds the threshold specified here. By default, this value is set to 20 (milliseconds). This parameter has been deprecated in v5.6.5 (and above).

Write Latency Threshold

Sometimes, write operations by users on a disk may take too long to complete. In the Write Latency Threshold text box, specify the write latency above which a disk is to be classified as a Slow disk (write), i.e., a disk is termed a slow disk (write) when a user's write operation exceeds the threshold specified here. By default, this value is set to 20 (milliseconds). This parameter has been deprecated in v5.6.5 (and above).
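To make the three thresholds concrete, here is a minimal Python sketch of how a disk might be labelled using them. The function name classify_disk and the strict greater-than comparison for the busy threshold are assumptions for illustration; the defaults (70 percent, 20 milliseconds, 20 milliseconds) are the ones stated above.

```python
def classify_disk(busy_pct, read_latency_ms, write_latency_ms,
                  busy_threshold=70.0,
                  read_threshold_ms=20.0,
                  write_threshold_ms=20.0):
    """Return the labels that apply to a disk for the given thresholds."""
    labels = []
    # Assumption: a disk counts as Busy only when it exceeds the threshold.
    if busy_pct > busy_threshold:
        labels.append("Busy")
    if read_latency_ms > read_threshold_ms:
        labels.append("Slow (read)")
    if write_latency_ms > write_threshold_ms:
        labels.append("Slow (write)")
    return labels
```

A disk that is busy 85 percent of the time, with a 5 ms read latency and a 25 ms write latency, would be labelled Busy and Slow (write) under the defaults.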

DD Frequency

Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency, if you so desire. Also, if you intend to disable the detailed diagnosis capability for this test, you can do so by specifying none against DD frequency.
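The DD Frequency value can be read as a pair of numbers, as in the Python sketch below. The function name parse_dd_frequency is hypothetical, and the ordering of the two numbers (frequency during normal runs first, frequency when a problem is detected second) is an assumption inferred from the description above.

```python
def parse_dd_frequency(value):
    """Parse a DD Frequency string.

    Returns a (normal, abnormal) tuple of integers, or None when the
    value is 'none', which disables detailed diagnosis.
    Assumption: the format is '<normal>:<abnormal>'.
    """
    text = value.strip().lower()
    if text == "none":
        return None
    # Unpacking raises ValueError unless exactly two parts are present.
    normal, abnormal = (int(part) for part in text.split(":"))
    if normal < 0 or abnormal < 0:
        raise ValueError(f"frequencies must not be negative: {value!r}")
    return normal, abnormal
```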

Detailed Diagnosis

To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability
  • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
Measurements made by the test
Measurement Description Measurement Unit Interpretation

Number of disks

Indicates the total number of disks in this disk group.

Number

This measure is applicable only for disk groups and not individual disks. This measure has been deprecated in v5.6.5 (and above).

Raid state

Indicates the current RAID status of this disk in this Storage system.

 

The values that this measure reports and their corresponding numeric values have been discussed in the table below:

Measure Value     Numeric Value
partner           1
Present           2
Zeroing           3
Spare             4
Copy              5
Pending           6
Reconstructing    7
Broken            8

Note:

By default, this measure reports the above-mentioned Measure Values while indicating the current RAID status of this disk in this Storage system. However, in the graph of this measure, status will be represented using the corresponding numeric equivalents i.e., 1 to 8.
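When reading graph data for this measure, the numeric equivalents can be translated back to state names with a simple lookup. The Python sketch below mirrors the table above; the names RAID_STATE_LABELS, raid_state_label, and raid_state_value are hypothetical helpers, not part of the product.

```python
# Numeric value -> measure value, exactly as listed in the table above.
RAID_STATE_LABELS = {
    1: "partner",
    2: "Present",
    3: "Zeroing",
    4: "Spare",
    5: "Copy",
    6: "Pending",
    7: "Reconstructing",
    8: "Broken",
}


def raid_state_label(numeric_value):
    """Return the measure value for a numeric RAID state (1 to 8)."""
    return RAID_STATE_LABELS[int(numeric_value)]


def raid_state_value(label):
    """Return the numeric value for a RAID state name (case-insensitive)."""
    wanted = label.strip().lower()
    for number, name in RAID_STATE_LABELS.items():
        if name.lower() == wanted:
            return number
    raise KeyError(label)
```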

Free space

Indicates the amount of free space that is currently available for use in this disk of this Storage system.

MB

A high value is desired for this measure.

Physical space

Indicates the total amount of space available in this disk of this Storage system.

MB

 

Used space

Indicates the amount of space that is already utilized in this disk of this Storage system.

MB

A consistent increase in the value of these measures could indicate that the disk space is being slowly but steadily consumed.

Compare the value of these measures across all disks to identify the disks that are utilizing disk space excessively.

Used space percentage

Indicates the percentage of space that has been already utilized in this disk.

Percent
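As a worked illustration, the used space percentage can be derived from the Used space and Physical space measures. The Python sketch below assumes that the percentage is simply used space divided by physical space; the documentation does not state the exact formula the test uses, and the function name used_space_percentage is hypothetical.

```python
def used_space_percentage(used_mb, physical_mb):
    """Return used space as a percentage of physical space.

    Assumption: percentage = used / physical * 100, both values in MB.
    """
    if physical_mb <= 0:
        raise ValueError("physical space must be greater than zero")
    return 100.0 * used_mb / physical_mb
```

For instance, a disk with 250 MB used out of 1000 MB of physical space would report 25 percent under this assumption.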

Transfers

Indicates the rate at which data transfer is being initiated from this disk.

Ops/Sec

 

User reads

Indicates the rate at which data or metadata associated with user requests is being retrieved from this disk.

Ops/Sec

A consistent decrease in the value of this measure is indicative of a gradual slowdown in a user's ability to read from the disk. Compare the value of this measure across disks to know which disks service read requests slowly.

User writes

Indicates the rate at which data or metadata associated with user requests is being stored in this disk.

Ops/Sec

A consistent decrease in the value of this measure is indicative of a gradual slowdown in a user's ability to write to a disk. Compare the value of this measure across disks to know which disks are servicing write requests slowly.
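One way to spot a consistent decrease in User reads or User writes is to fit a straight line through recent samples and look at its slope. The Python sketch below uses an ordinary least-squares slope over equally spaced samples; this is an illustrative technique chosen for the example, not how the test itself evaluates trends, and the function name trend_slope is hypothetical.

```python
def trend_slope(samples):
    """Return the least-squares slope of equally spaced samples.

    A negative slope suggests that the rate (Ops/Sec) is declining
    over the sampled measurement periods.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("at least two samples are required")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    numerator = sum((i - mean_x) * (y - mean_y)
                    for i, y in enumerate(samples))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator
```

Computing the slope for each disk and comparing the results is one way to see which disks show the steepest decline.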

User read latency

Indicates the time taken for retrieving data or metadata associated with user requests from this disk during the last measurement period.

Msecs

Very high values for these measures indicate that something is preventing the storage system from reading or writing rapidly. By observing the variations in these measures over time, you can understand whether the latencies are sporadic or consistent. Consistent delays in reading/writing could indicate persistent bottlenecks to speedy I/O processing in the storage device.

User write latency

Indicates the time taken for a write operation on this disk during the last measurement period.

Msecs
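The distinction between sporadic and consistent latencies can be approximated by counting how many recent samples exceed a latency limit. In the Python sketch below, the function name latency_pattern, the 20 ms limit, and the 80 percent cut-off for "consistent" are all assumptions chosen for illustration.

```python
def latency_pattern(samples_ms, threshold_ms=20.0, consistent_fraction=0.8):
    """Classify latency samples as 'normal', 'sporadic', or 'consistent'.

    Assumption: delays are 'consistent' when at least consistent_fraction
    of the samples exceed threshold_ms, and 'sporadic' when only some do.
    """
    if not samples_ms:
        raise ValueError("at least one sample is required")
    over = sum(1 for sample in samples_ms if sample > threshold_ms)
    if over == 0:
        return "normal"
    fraction = over / len(samples_ms)
    return "consistent" if fraction >= consistent_fraction else "sporadic"
```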

Disk busy

Indicates the percentage of time when there is at least one outstanding request (i.e., read or write) to this disk.

Percent

By comparing the percentage of time that the different disks are busy, an administrator can determine whether the application load is properly balanced across the different disks.
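Such a comparison can be sketched in Python by flagging disks whose Disk busy value stands well above the average across all disks. The function name overloaded_disks, the 20-point tolerance, and the disk names used in the example are hypothetical.

```python
from statistics import mean


def overloaded_disks(busy_by_disk, tolerance_pct=20.0):
    """Return the names of disks whose busy percentage exceeds the
    average across all disks by more than tolerance_pct points."""
    if not busy_by_disk:
        return []
    average = mean(busy_by_disk.values())
    return sorted(
        name for name, pct in busy_by_disk.items()
        if pct - average > tolerance_pct
    )
```

For example, with busy percentages of 10, 12, and 80 for three disks, only the disk at 80 percent is flagged, which hints that load is concentrated on that disk.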