NetApp Volume Details Test

Volumes contain file systems that hold user data that is accessible using one or more of the access protocols supported by Data ONTAP, including NFS, CIFS, HTTP, FTP, FC, and iSCSI.

For users to be able to read data from and write data to volumes quickly, the volumes must have adequate free space and must process I/O requests rapidly. Slowdowns in data storage and retrieval can be attributed to space contention on one or more volumes, or to I/O processing bottlenecks. When such slowdowns occur, administrators need to quickly determine the following:

  • Which volumes are over-utilized?
  • Which volumes are overloaded?
  • Which volumes are experiencing serious latencies?
  • When were these latencies observed most frequently: while reading or while writing?
  • What type of operations registered the maximum latency: CIFS, NFS, or iSCSI?

The NetApp Volume Details test provides accurate answers to these questions. With the help of these answers, you can quickly diagnose the root cause of slowdowns when reading from or writing to a volume.

Target of the test : A NetApp Unified Storage

Agent deploying the test : An external/remote agent

Outputs of the test : One set of results for each volume on the NetApp storage system being monitored.

Configurable parameters for the test
Parameters Description

Test Period

How often the test should be executed.

Host

The host for which the test is to be configured.

Port

Specify the port at which the specified host listens. By default, this is NULL.

User

Here, specify the name of the user who possesses the following privileges:

login-http-admin,api-aggr-check-spare-low,api-aggr-list-info,api-aggr-mediascrub-list-info,api-aggr-scrub-list-info,api-cifs-status,api-clone-list-status,api-disk-list-info,api-fcp-adapter-list-info,api-fcp-adapter-stats-list-info,api-fcp-service-status,api-file-get-file-info,api-file-read-file,api-iscsi-connection-list-info,api-iscsi-initiator-list-info,api-iscsi-service-status,api-iscsi-session-list-info,api-iscsi-stats-list-info,api-lun-config-check-alua-conflicts-info,api-lun-config-check-cfmode-info,api-lun-config-check-info,api-lun-config-check-single-image-info,api-lun-list-info,api-nfs-status,api-perf-object-get-instances-iter*,api-perf-object-instance-list-info,api-quota-report-iter*,api-snapshot-list-info,api-vfiler-list-info,api-volume-list-info-iter*.

If such a user does not already exist, you can create one for this purpose using the steps detailed in Creating a New User with the Privileges Required for Monitoring the NetApp Unified Storage.

Password

Specify the password that corresponds to the above-mentioned User.

Confirm Password

Confirm the Password by retyping it here.

Authentication Mechanism

To collect metrics from the NetApp Unified Storage system, the eG agent connects to the ONTAP management APIs over HTTP or HTTPS. By default, this connection is authenticated using the LOGIN_PASSWORD authentication mechanism, which is why LOGIN_PASSWORD is displayed as the default Authentication Mechanism.

Use SSL

Set the Use SSL flag to Yes if SSL (Secure Sockets Layer) is to be used to connect to the NetApp Unified Storage system, and to No if it is not.

API Port

In most environments, the NetApp Unified Storage system listens on port 80 if it is not SSL-enabled, or on port 443 if it is SSL-enabled. Accordingly, by default the eG agent connects to port 80 when the Use SSL flag above is set to No, and to port 443 when the Use SSL flag is set to Yes. This is why the API Port parameter is set to default by default.

In some environments, however, the default ports 80 and 443 might not apply. In such cases, specify against the API Port parameter the exact port at which the NetApp Unified Storage system in your environment listens, so that the eG agent connects to that port to collect metrics from the system.
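The port-selection rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the eG agent's actual code: the helper names resolve_api_port and base_url and the host name filer01 are hypothetical, and the sketch assumes the API Port value is either the literal default or a numeric port.

```python
from typing import Optional


def resolve_api_port(use_ssl: bool, api_port: Optional[str] = "default") -> int:
    """Hypothetical helper: pick the port the agent would connect to."""
    # An empty value or "default" falls back to the standard port for the protocol.
    if api_port is None or api_port.strip().lower() in ("", "default"):
        return 443 if use_ssl else 80
    # Otherwise the administrator supplied an explicit port number.
    port = int(api_port)
    if not 1 <= port <= 65535:
        raise ValueError(f"API Port out of range: {api_port}")
    return port


def base_url(host: str, use_ssl: bool, api_port: Optional[str] = "default") -> str:
    """Hypothetical helper: build the scheme://host:port prefix for API requests."""
    scheme = "https" if use_ssl else "http"
    return f"{scheme}://{host}:{resolve_api_port(use_ssl, api_port)}"


print(base_url("filer01", use_ssl=True))                    # https://filer01:443
print(base_url("filer01", use_ssl=False, api_port="8080"))  # http://filer01:8080
```

An explicit API Port always wins; only when it is left at default does the Use SSL flag decide between 80 and 443.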

vFilerName

A vFiler is a virtual storage system that you create using MultiStore, which enables you to partition the storage and network resources of a single storage system so that it appears as multiple storage systems on the network. If the NetApp Unified Storage system is partitioned to accommodate a set of vFilers, specify the name of the vFiler that you wish to monitor in the vFilerName text box. In some environments, the NetApp Unified Storage system may not be partitioned at all. In such a case, the system is monitored as a single vFiler, and hence the default value none is displayed in this text box.

Timeout

Specify the duration (in seconds) beyond which the test will time out if no response is received from the device. The default is 120 seconds.

Used Percentage Threshold

This test reports metrics not only for each volume on the storage device, but also for the following descriptors: Busy volumes, Slow volumes, and Highly utilized volumes. By default, the Highly utilized volumes descriptor reports metrics for those volumes in which over 80% of the space has already been used; this is why the Used Percentage Threshold is set to 80 by default. You can change this threshold by specifying a different percentage value against Used Percentage Threshold. This parameter is deprecated in v5.6.5 (and above).

Operations Threshold

This test reports metrics not only for each volume on the storage device, but also for the following descriptors: Busy volumes, Slow volumes, and Highly utilized volumes. The Operations Threshold value (in operations/sec) that you set determines which volumes this test counts as Busy volumes: if the rate of operations to a volume exceeds the rate specified against Operations Threshold, the test considers that volume a Busy volume. This parameter is deprecated in v5.6.5 (and above).

Avg Latency Threshold

This test reports metrics not only for each volume on the storage device, but also for the following descriptors: Busy volumes, Slow volumes, and Highly utilized volumes. The Avg Latency Threshold value (in milliseconds) that you set determines which volumes this test counts as Slow volumes: if the latency registered by a volume exceeds the Avg Latency Threshold you specify, the test considers that volume a Slow volume. This parameter is deprecated in v5.6.5 (and above).
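The way the three threshold parameters above sort volumes into the Highly utilized, Busy, and Slow descriptors can be sketched as follows. This is an illustrative sketch, not the test's internal logic: the VolumeSample type and classify function are hypothetical, the strict "greater than" comparisons follow the wording "over" and "exceeds" above, and the example threshold values (1000 ops/sec, 20 ms) are arbitrary, not product defaults. Because the Avg latency measure is reported in microseconds while Avg Latency Threshold is in milliseconds, the sketch converts units before comparing.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VolumeSample:
    """Hypothetical per-volume snapshot of the measures this test reports."""
    name: str
    used_pct: float        # Used space percentage
    total_ops: float       # Total operations, ops/sec
    avg_latency_us: float  # Avg latency, microseconds


def classify(volumes: List[VolumeSample], used_pct_threshold: float,
             ops_threshold: float, latency_threshold_ms: float) -> Dict[str, List[str]]:
    """Group volume names under the three descriptors (illustrative logic only)."""
    groups: Dict[str, List[str]] = {
        "Highly utilized volumes": [],
        "Busy volumes": [],
        "Slow volumes": [],
    }
    for v in volumes:
        if v.used_pct > used_pct_threshold:
            groups["Highly utilized volumes"].append(v.name)
        if v.total_ops > ops_threshold:
            groups["Busy volumes"].append(v.name)
        # Avg latency is in microseconds, but the threshold is in milliseconds.
        if v.avg_latency_us / 1000.0 > latency_threshold_ms:
            groups["Slow volumes"].append(v.name)
    return groups


vols = [VolumeSample("vol0", 85.0, 50.0, 2000.0),
        VolumeSample("vol1", 40.0, 1500.0, 25000.0)]
groups = classify(vols, used_pct_threshold=80, ops_threshold=1000, latency_threshold_ms=20)
# The Number of volumes measure for each descriptor corresponds to the size of each group.
print({name: len(members) for name, members in groups.items()})
```

A volume can appear under more than one descriptor, since the three checks are independent.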

DD Frequency

Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency, if you so desire. Also, if you intend to disable the detailed diagnosis capability for this test, you can do so by specifying none against DD frequency.
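One way to read the DD Frequency setting is sketched below. It assumes the value takes the form normal:abnormal, where each number means "generate detailed diagnosis once every N test runs" under normal conditions and when a problem is detected, respectively, which is consistent with 1:1 meaning "every run" in both cases. The function names are hypothetical, and treating 0 as "never" for a condition is an assumption, not documented product behavior.

```python
from typing import Optional, Tuple


def parse_dd_frequency(value: str) -> Optional[Tuple[int, int]]:
    """Parse "normal:abnormal" (e.g. "1:1"); "none" disables detailed diagnosis."""
    text = value.strip().lower()
    if text == "none":
        return None
    normal, abnormal = text.split(":")  # raises ValueError if not exactly two parts
    return int(normal), int(abnormal)


def should_generate_dd(run_number: int, problem_detected: bool,
                       freq: Optional[Tuple[int, int]]) -> bool:
    """Decide whether this run (1-based) should produce detailed diagnosis."""
    if freq is None:
        return False
    normal, abnormal = freq
    every = abnormal if problem_detected else normal
    # Assumption: a frequency of 0 means "never" for that condition.
    if every <= 0:
        return False
    return run_number % every == 0


freq = parse_dd_frequency("3:1")
print([should_generate_dd(n, False, freq) for n in range(1, 7)])
```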

Detailed Diagnosis

To make diagnosis more efficient and accurate, the eG Enterprise suite embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, click on the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability
  • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
Measurements made by the test
Measurement Description Measurement Unit Interpretation

Number of volumes

Indicates the number of volumes that are currently highly utilized/slow/busy.

Number

  1. This measure appears only for the Highly utilized, Slow, and Busy volumes descriptors. In the case of Highly utilized volumes, the detailed diagnosis of this measure, if enabled, lists the names of the highly utilized volumes and the percentage of space that is used in each volume.
  2. In the case of Slow volumes, the detailed diagnosis of this measure, if enabled, lists the names of the slow volumes and the average latency (i.e., the time taken to perform read/write operations) on each volume.
  3. In the case of Busy volumes, the detailed diagnosis of this measure, if enabled, lists the names of the busy volumes and the rate at which operations were performed on each volume.
  4. The detailed diagnosis information therefore helps you quickly identify the highly utilized, slow, and busy volumes.

This measure is deprecated in v5.6.5 (and above).

State

Indicates the current state of this volume.

 

The values that this measure can report and their corresponding numeric equivalents are shown in the table below:

 

Measure Value Numeric Value
Online 0
Creating 1
Restricted 2
Offline 3
Partial 4
Unknown 5
Failed 6

Note:

By default, this measure reports the above-mentioned Measure Values while indicating the current state of a volume. However, in the graph of this measure, states will be represented using the corresponding numeric equivalents only.
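The State table above amounts to a simple lookup, sketched below. The dictionary values come straight from the table; the helper name state_to_numeric is hypothetical, and mapping an unrecognized state string to Unknown (5) is an assumption for illustration.

```python
# Numeric equivalents taken from the State table above.
VOLUME_STATE_CODES = {
    "Online": 0, "Creating": 1, "Restricted": 2, "Offline": 3,
    "Partial": 4, "Unknown": 5, "Failed": 6,
}


def state_to_numeric(state: str) -> int:
    """Map a reported state to its numeric equivalent; unrecognized values map to Unknown."""
    return VOLUME_STATE_CODES.get(state.strip().capitalize(), VOLUME_STATE_CODES["Unknown"])


print(state_to_numeric("offline"))  # 3
```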

Is volume in error?

Indicates whether or not this volume is in error.

 

Generally, errors occur when the volume is inconsistent, unrecoverable, or invalid. A volume is considered inconsistent if there are known inconsistencies in its file system. As the inconsistencies increase, the volume can become unrecoverable, and unrecoverable volumes cannot be accessed; if mirroring has been enabled, Data ONTAP will automatically access the mirrored data of the unrecoverable volume. A volume is said to be invalid if a vol copy or SnapMirror initial transfer has been aborted; such volumes are generally only partially created and cannot be fully recovered. If this volume is a Single Instance Storage (SIS) volume, SIS operation errors are also taken into account.

This measure reports the value Yes if the volume is in error, and the value No if it is error-free.

The numeric values that correspond to the above-mentioned values are represented in the table below:

Measure Value Numeric Value
Yes 1
No 0

Note:

By default, this measure reports the above-mentioned Measure Values while indicating whether or not this volume is in error. However, in the graph of this measure, the same will be represented using the corresponding numeric equivalents only.

The detailed diagnosis capability of this measure, if enabled, lists the type of the error. In the case of an SIS operation error, the actual SIS error message will also be displayed as part of the detailed diagnosis.

This measure is applicable only to individual volumes.
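The Yes/No logic and the detailed diagnosis described above can be sketched as follows. The VolumeHealth flags and the volume_in_error function are hypothetical; the actual ONTAP API field names and the way the test reads them may differ. The sketch only shows how the three error conditions and an optional SIS error message combine into the reported value, its numeric equivalent, and a list of error types.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class VolumeHealth:
    """Hypothetical flags; real field names in the ONTAP API may differ."""
    inconsistent: bool = False
    unrecoverable: bool = False
    invalid: bool = False
    sis_error: Optional[str] = None  # SIS error message, only for SIS volumes


def volume_in_error(h: VolumeHealth) -> Tuple[str, int, List[str]]:
    """Return (measure value, numeric value, detailed-diagnosis error types)."""
    errors: List[str] = []
    if h.inconsistent:
        errors.append("inconsistent")
    if h.unrecoverable:
        errors.append("unrecoverable")
    if h.invalid:
        errors.append("invalid")
    if h.sis_error:
        errors.append(f"SIS operation error: {h.sis_error}")
    return ("Yes", 1, errors) if errors else ("No", 0, errors)


print(volume_in_error(VolumeHealth()))              # ('No', 0, [])
print(volume_in_error(VolumeHealth(invalid=True)))  # ('Yes', 1, ['invalid'])
```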

Used space percentage

Indicates the percentage of space that is utilized in this volume.

Percent

Ideally, the value of this measure should be low. A high value or a consistent increase in the value of this measure is indicative of excessive space usage in a volume. 

This measure will be 0 for restricted and offline volumes.
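The Used space percentage measure can be approximated as shown below. This is a sketch assuming used and total sizes in MB are available for the volume; the helper name is hypothetical. It reproduces the documented rule that the measure is 0 for restricted and offline volumes.

```python
def used_space_percentage(used_mb: float, total_mb: float, state: str) -> float:
    """Used space as a percentage of the volume's total size (hypothetical helper)."""
    # The measure is documented as 0 for restricted and offline volumes.
    if state.strip().lower() in ("restricted", "offline") or total_mb <= 0:
        return 0.0
    return 100.0 * used_mb / total_mb


print(used_space_percentage(800, 1000, "online"))  # 80.0
```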

Total size

Indicates the total size of this volume.

MB

The value of this measure will not include the WAFL reserve and the volume snapshot reserve.

This measure will be 0 for restricted and offline volumes.

Reserve space

Indicates the space that is reserved for overwriting snapshotted data in this volume.

MB

This space can be used only by space-reserved LUNs and files, and only when the volume is full.

This measure will be 0 for restricted and offline volumes.

Actual reserved space used

Indicates the percentage of reserved space that is actually used by this volume.

Percent

A low value is desired for this measure.

This measure will be 0 for restricted and offline volumes.

Files used percentage

Indicates the percentage of inodes (i.e., files) that are currently used in this volume.

Percent

A high value indicates that the inodes in the volume may get exhausted soon.

This measure will be 0 for restricted and offline volumes.
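To judge how soon "soon" is when Files used percentage is high, a simple linear projection can be made from two readings of the used-inode count. This is an illustrative sketch, not a feature of the test: the function names are hypothetical, and it assumes the used and total inode counts are known and that growth is roughly linear over the projection window.

```python
from typing import Optional


def files_used_percentage(files_used: int, files_total: int) -> float:
    """Percentage of inodes in use; 0 when the total is unknown (e.g. offline volume)."""
    if files_total <= 0:
        return 0.0
    return 100.0 * files_used / files_total


def hours_until_inode_exhaustion(used_then: int, used_now: int,
                                 elapsed_hours: float, files_total: int) -> Optional[float]:
    """Linear projection of when the volume runs out of inodes; None if usage is not growing."""
    if elapsed_hours <= 0:
        raise ValueError("elapsed_hours must be positive")
    growth_per_hour = (used_now - used_then) / elapsed_hours
    if growth_per_hour <= 0:
        return None
    return (files_total - used_now) / growth_per_hour


print(files_used_percentage(900, 1000))                     # 90.0
print(hours_until_inode_exhaustion(800, 900, 10.0, 1000))   # 10.0
```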

Total operations

Indicates the rate at which operations (including read and write) were performed on this volume.

Ops/Sec

This measure is a good indicator of how busy the volume is.

Comparing the value of this measure across volumes will enable you to quickly detect load-balancing irregularities (if any).
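Rates such as Total operations are typically derived from cumulative counters sampled once per test period, and comparing each volume's share of the combined load makes imbalances easy to spot. The sketch below assumes two samples of a hypothetical cumulative operation counter per volume, taken 300 seconds apart; the helper names are illustrative and are not part of the product.

```python
from typing import Dict


def rate_per_sec(count_then: int, count_now: int, interval_s: float) -> float:
    """Convert two samples of a cumulative counter into a per-second rate."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    delta = count_now - count_then
    # A negative delta suggests the counter was reset; report 0 rather than a bogus rate.
    if delta < 0:
        return 0.0
    return delta / interval_s


def load_share(ops_per_volume: Dict[str, float]) -> Dict[str, float]:
    """Each volume's percentage of the combined operation rate."""
    total = sum(ops_per_volume.values())
    if total == 0:
        return {name: 0.0 for name in ops_per_volume}
    return {name: 100.0 * ops / total for name, ops in ops_per_volume.items()}


ops = {"vol0": rate_per_sec(10_000, 13_000, 300), "vol1": rate_per_sec(5_000, 6_000, 300)}
print(load_share(ops))
```

Here vol0 carries about three quarters of the load, which is the kind of skew this measure helps reveal.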

Write operations

Indicates the rate at which write operations were performed on this volume.

Ops/Sec

 

Read operations

Indicates the rate at which read operations were performed on this volume.

Ops/Sec

 

Avg latency

Indicates the average time taken by the WAFL filesystem to process all the operations performed on this volume.

Microseconds

The value of this measure excludes the request processing time and the network communication time of the volume.

A high value of this measure is a cause for concern, as it indicates a processing bottleneck.
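Average-latency values like this one are commonly derived from cumulative counters: the accumulated latency and the number of operations. The average over a test period is then the change in accumulated latency divided by the change in operation count. The sketch below assumes such a pair of hypothetical counters sampled at the start and end of the interval; it illustrates the arithmetic and is not the eG agent's code.

```python
def interval_avg_latency_us(latency_then_us: float, latency_now_us: float,
                            ops_then: int, ops_now: int) -> float:
    """Average latency per operation over the interval between two samples."""
    ops_delta = ops_now - ops_then
    # No operations (or a counter reset) in the interval: nothing to average.
    if ops_delta <= 0:
        return 0.0
    return (latency_now_us - latency_then_us) / ops_delta


# 600,000 microseconds of latency accumulated over 300 operations.
print(interval_avg_latency_us(1_000_000, 1_600_000, 1000, 1300))  # 2000.0
```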

Read latency

Indicates the average time taken by the WAFL filesystem to process the read requests of this volume.

Microseconds

The values of these measures exclude the request processing time and the network communication time of the volume.

If the Avg latency of a volume is high, compare the values of these measures for that volume to determine whether the latency occurred while reading or while writing.

Write latency

Indicates the average time taken by the WAFL filesystem to process the write requests made to this volume.

Microseconds
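The read/write comparison described for the Read latency and Write latency measures can be sketched as follows. The function name and the 10% tolerance used to call two latencies "comparable" are assumptions chosen for illustration; tune the tolerance to your environment.

```python
def latency_source(read_us: float, write_us: float, tolerance: float = 0.10) -> str:
    """Return "read", "write", "both", or "none" (hypothetical helper)."""
    high, low = max(read_us, write_us), min(read_us, write_us)
    if high == 0:
        return "none"
    # Treat latencies within `tolerance` (10% by default) of each other as comparable.
    if high - low <= tolerance * high:
        return "both"
    return "read" if read_us > write_us else "write"


print(latency_source(5000, 800))  # read
```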

Read data

Indicates the rate at which data bytes were read from this volume.

Bytes/Sec

 

Write data

Indicates the rate at which data bytes were written to this volume.

Bytes/Sec

 

CIFS operations

Indicates the rate at which the CIFS operations were performed on this volume.

Ops/Sec

This measure is inclusive of all the CIFS operations i.e., read, write and other miscellaneous CIFS operations.

By comparing the value of this measure with that of the NFS operations and SAN operations measures for a volume, you can figure out which type of operation imposed the maximum load on that volume.

NFS operations

Indicates the rate at which the NFS operations were performed on this volume.

Ops/Sec

This measure is inclusive of all the NFS operations i.e., read, write and other miscellaneous NFS operations.

By comparing the value of this measure with that of the CIFS operations and SAN operations measures for a volume, you can figure out which type of operation imposed the maximum load on that volume.

SAN operations

Indicates the rate at which the SAN operations were performed on this volume.

Ops/Sec

This measure is inclusive of all the SAN operations i.e., read, write and other miscellaneous SAN operations.

By comparing the value of this measure with that of the CIFS operations and NFS operations measures for a volume, you can figure out which type of operation imposed the maximum load on that volume.
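The comparison of CIFS operations, NFS operations, and SAN operations described above can be sketched as follows. The function names and sample rates are hypothetical; the sketch picks the protocol with the highest operation rate and also reports each protocol's share of the total, which is often more telling than the raw maximum.

```python
from typing import Dict


def dominant_protocol(ops_per_sec: Dict[str, float]) -> str:
    """Return the protocol with the highest operation rate (ties: first listed)."""
    return max(ops_per_sec, key=ops_per_sec.get)


def protocol_shares(ops_per_sec: Dict[str, float]) -> Dict[str, float]:
    """Each protocol's percentage of the volume's combined CIFS/NFS/SAN load."""
    total = sum(ops_per_sec.values())
    if total == 0:
        return {p: 0.0 for p in ops_per_sec}
    return {p: 100.0 * v / total for p, v in ops_per_sec.items()}


sample = {"CIFS": 120.0, "NFS": 340.0, "SAN": 40.0}
print(dominant_protocol(sample), protocol_shares(sample))
```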

CIFS latency

Indicates the average time taken for performing the CIFS operations (including read, write and other miscellaneous CIFS operations) on this volume.

Microseconds

The values of these measures exclude the request processing time and the network communication time of the volume.

Ideally, the values of these measures should be low. If the Avg latency of a volume is very high, compare the values of these measures for that volume to determine the cause of the latency: is it a processing bottleneck in CIFS operations, NFS operations, or SAN operations?
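A protocol with high latency but very few operations may matter less than one with moderate latency and heavy traffic. One way to weigh this is to multiply each protocol's operation rate by its latency and compare the resulting shares, as sketched below. This weighting is an assumption introduced for illustration, not something the test computes; the function name and sample values are hypothetical, and the inputs would be the CIFS/NFS/SAN operations and latency measures of one volume.

```python
from typing import Dict, Tuple


def latency_contribution(per_protocol: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """per_protocol maps protocol -> (ops/sec, latency in microseconds).

    Returns each protocol's share (percent) of the total ops-weighted latency.
    """
    weighted = {p: ops * lat for p, (ops, lat) in per_protocol.items()}
    total = sum(weighted.values())
    if total == 0:
        return {p: 0.0 for p in per_protocol}
    return {p: 100.0 * w / total for p, w in weighted.items()}


sample = {"CIFS": (100.0, 2000.0), "NFS": (300.0, 500.0), "SAN": (0.0, 0.0)}
shares = latency_contribution(sample)
worst = max(shares, key=shares.get)
print(worst, round(shares[worst], 1))  # CIFS 57.1
```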

 

 

NFS latency

Indicates the average time taken for performing the NFS operations (including read, write and other miscellaneous NFS operations) on this volume.

Microseconds

SAN latency

Indicates the average time taken for performing the block protocol operations (including read, write and other miscellaneous block protocol operations) on this volume.