NetApp System Components Test

This test periodically monitors the processors, spare disks, vFilers, and DMA channels used by the storage system, and proactively alerts you to abnormalities such as the following:

  • Excessive CPU usage by the storage system;
  • Over-utilization of processors supported by the storage system;
  • Write latencies experienced by the NVRAM DMA transactions;
  • Unavailability of spare disks.

Target of the test : A NetApp Unified Storage system

Agent deploying the test : An external/remote agent

Outputs of the test : One set of results for the NetApp storage system being monitored.

Configurable parameters for the test
Parameters Description

Test Period

How often the test should be executed.

Host

The host for which the test is to be configured.

Port

In the Port text box, specify the port at which the specified host listens. By default, this is NULL.

User

Specify the name of a user who possesses the following privileges:

login-http-admin,api-aggr-check-spare-low,api-aggr-list-info,api-aggr-mediascrub-list-info,api-aggr-scrub-list-info,api-cifs-status,api-clone-list-status,api-disk-list-info,api-fcp-adapter-list-info,api-fcp-adapter-stats-list-info,api-fcp-service-status,api-file-get-file-info,api-file-read-file,api-iscsi-connection-list-info,api-iscsi-initiator-list-info,api-iscsi-service-status,api-iscsi-session-list-info,api-iscsi-stats-list-info,api-lun-config-check-alua-conflicts-info,api-lun-config-check-cfmode-info,api-lun-config-check-info,api-lun-config-check-single-image-info,api-lun-list-info,api-nfs-status,api-perf-object-get-instances-iter*,api-perf-object-instance-list-info,api-quota-report-iter*,api-snapshot-list-info,api-vfiler-list-info,api-volume-list-info-iter*.

If such a user does not already exist, you can create one for this purpose by following the steps detailed in Creating a New User with the Privileges Required for Monitoring the NetApp Unified Storage.
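When a role is assembled by hand, it is easy to miss one of the privileges listed above. The Python sketch below shows one way to check a role's capabilities against that list. It is a hypothetical helper, not part of eG Enterprise or Data ONTAP; it assumes you already have the role's capabilities as a list of strings, and that granted entries may be shell-style wildcard patterns such as api-disk-*. Only a few of the required privileges are included, for brevity.

```python
from fnmatch import fnmatchcase


def missing_privileges(required, granted):
    """Return the required privileges not covered by any granted capability.

    Granted entries may be wildcard patterns such as "api-disk-*" (an
    assumption about how the role is defined); they are matched against each
    required entry using shell-style, case-sensitive rules.
    """
    return [
        req for req in required
        if not any(fnmatchcase(req, pattern) for pattern in granted)
    ]


# A few of the privileges listed above (not the full list).
REQUIRED = [
    "login-http-admin",
    "api-disk-list-info",
    "api-vfiler-list-info",
    "api-perf-object-get-instances-iter*",
    "api-quota-report-iter*",
]

# Hypothetical capabilities assigned to a monitoring role.
granted = ["login-http-admin", "api-disk-*", "api-perf-object-*"]

print(missing_privileges(REQUIRED, granted))
```

With the sample role above, the helper reports api-vfiler-list-info and api-quota-report-iter* as missing. A required entry ending in * is treated literally here, so it is covered only by a granted entry that matches it, such as the identical entry or a broader pattern like api-*.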

Password

Specify the password that corresponds to the above-mentioned User.

Confirm Password

Confirm the Password by retyping it here.

Authentication Mechanism

To collect metrics from the NetApp Unified Storage system, the eG agent connects to the ONTAP management APIs over HTTP or HTTPS. By default, this connection is authenticated using the LOGIN_PASSWORD authentication mechanism. This is why LOGIN_PASSWORD is displayed as the default authentication mechanism.
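As a rough illustration of a login/password style of authentication, the sketch below builds an HTTP Basic Authorization header from the User and Password parameters. That LOGIN_PASSWORD maps to HTTP Basic authentication is an assumption made for this example, and the function name is hypothetical.

```python
import base64


def basic_auth_header(user: str, password: str) -> dict:
    """Return an HTTP Basic Authorization header for the given credentials.

    Assumption: the LOGIN_PASSWORD mechanism is treated here as HTTP Basic
    authentication (user and password joined by a colon, then Base64-encoded).
    """
    credentials = f"{user}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Hypothetical monitoring credentials.
print(basic_auth_header("egmonitor", "s3cret"))
```

Base64 is an encoding, not encryption, so when Use SSL is set to No, credentials sent this way can be read by anyone able to observe the traffic.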

Use SSL

Set the Use SSL flag to Yes if SSL (Secure Sockets Layer) is to be used to connect to the NetApp Unified Storage system, and to No if it is not.

API Port

In most environments, the NetApp Unified Storage system listens on port 80 if it is not SSL-enabled, and on port 443 if it is SSL-enabled. Accordingly, by default, the eG agent connects to port 80 when the Use SSL flag above is set to No, and to port 443 when the Use SSL flag is set to Yes. This is why the API Port parameter is set to default by default.

In some environments, however, the default ports 80 and 443 might not apply. In such cases, specify against the API Port parameter the exact port at which the NetApp Unified Storage system in your environment listens, so that the eG agent connects to that port to collect metrics.
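The port-selection rule described for the Use SSL and API Port parameters fits in a few lines. This Python sketch is illustrative only: the function name and the (scheme, port) return value are assumptions, not the eG agent's actual implementation.

```python
def resolve_endpoint(use_ssl: bool, api_port: str = "default") -> tuple:
    """Return the (scheme, port) pair the agent would connect to.

    "default" selects 443 when SSL is used and 80 otherwise; any other value
    is treated as an explicit port number.
    """
    scheme = "https" if use_ssl else "http"
    if api_port.strip().lower() == "default":
        return scheme, (443 if use_ssl else 80)
    port = int(api_port)
    if not 1 <= port <= 65535:
        raise ValueError(f"API Port out of range: {api_port!r}")
    return scheme, port


print(resolve_endpoint(use_ssl=False))                  # default port for HTTP
print(resolve_endpoint(use_ssl=True))                   # default port for HTTPS
print(resolve_endpoint(use_ssl=True, api_port="8443"))  # hypothetical custom port
```

Keeping default as a distinct value, rather than storing 80 or 443 directly, means the port follows the Use SSL flag automatically if that flag is changed later.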

vFilerName

A vFiler is a virtual storage system you create using MultiStore, which enables you to partition the storage and network resources of a single storage system so that it appears as multiple storage systems on the network. If the NetApp Unified Storage system is partitioned to accommodate a set of vFilers, specify the name of the vFiler that you wish to monitor in the vFilerName text box. In some environments, the NetApp Unified Storage system may not be partitioned at all. In such a case, the NetApp Unified Storage system is monitored as a single vFiler and hence the default value of none is displayed in this text box.
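The following Python sketch shows how the vFilerName value might be interpreted: the default none means the whole storage system is monitored as a single vFiler, and any other value names the vFiler to monitor. The helper name, and treating a blank value the same as none, are assumptions for illustration.

```python
from typing import Optional


def target_vfiler(vfiler_name: str) -> Optional[str]:
    """Return the vFiler to monitor, or None to monitor the system as a whole.

    The value "none" (the default) means the storage system is not partitioned
    and is monitored as a single vFiler. Treating a blank value the same way
    is an assumption of this sketch.
    """
    name = vfiler_name.strip()
    if not name or name.lower() == "none":
        return None
    return name


print(target_vfiler("none"))      # None: monitor the whole storage system
print(target_vfiler("vf_sales"))  # hypothetical vFiler name
```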

DD Frequency

Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency if required. To disable the detailed diagnosis capability for this test, specify none against DD Frequency.
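A DD Frequency value is either none or two numbers separated by a colon. The Python sketch below parses it, assuming the first number is the normal frequency and the second the abnormal frequency (the two terms used under Detailed Diagnosis). The function is hypothetical and does not reproduce how eG Enterprise validates the field.

```python
from typing import Optional, Tuple


def parse_dd_frequency(value: str) -> Optional[Tuple[int, int]]:
    """Parse a DD Frequency value such as "1:1".

    Returns (normal, abnormal), or None when the value is "none", which
    disables detailed diagnosis for the test.
    """
    text = value.strip().lower()
    if text == "none":
        return None
    normal, sep, abnormal = text.partition(":")
    if not sep:
        raise ValueError(f"expected 'normal:abnormal', got {value!r}")
    normal_freq, abnormal_freq = int(normal), int(abnormal)
    if normal_freq < 0 or abnormal_freq < 0:
        raise ValueError("frequencies must not be negative")
    return normal_freq, abnormal_freq


print(parse_dd_frequency("1:1"))   # (1, 1), the default
print(parse_dd_frequency("none"))  # None, detailed diagnosis disabled
```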

Detailed Diagnosis

To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability
  • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
Measurements made by the test
Measurement Description Measurement Unit Interpretation

CPU busy

Indicates the percentage of time for which the CPU was busy performing system-level processing.

Percent

A high value indicates that the storage system is utilizing CPU resources excessively. A consistent increase in this value could indicate potential CPU contention on the storage system.
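Busy percentages such as CPU busy are typically derived from two samples of cumulative counters rather than read directly. The Python sketch below assumes the storage system exposes a cumulative busy-time counter and a matching cumulative elapsed-time (base) counter; the parameter names and sample values are hypothetical.

```python
def percent_busy(busy_prev: int, busy_curr: int,
                 base_prev: int, base_curr: int) -> float:
    """Percentage of the sampling interval spent busy.

    busy_* are cumulative busy-time counter samples and base_* the matching
    cumulative elapsed-time samples, taken at the start and end of the interval.
    """
    delta_base = base_curr - base_prev
    if delta_base <= 0:
        raise ValueError("base counter did not advance between samples")
    return 100.0 * (busy_curr - busy_prev) / delta_base


# Hypothetical samples taken one test period apart.
print(percent_busy(1_000_000, 1_450_000, 10_000_000, 11_000_000))  # 45.0
```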

Avg processor busy

Indicates the average percentage of time for which a processor was busy processing requests.

Percent

A high value indicates that processors have been over-utilized in more than one instance. This is a cause for concern, as it could reveal load-balancing irregularities or the need for additional processors to handle the load.

Total processor busy

Indicates the total percentage of time all the processors were actively serving requests.

Percent

A high value indicates that processors have been over-utilized in more than one instance. This is a cause for concern, as it could reveal load-balancing irregularities or the need for additional processors to handle the load.
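To see how Avg processor busy and Total processor busy relate, the Python sketch below computes both from per-processor busy percentages. It assumes those per-processor values are available and that the total is the sum across processors, so it can exceed 100 on multi-processor systems; the spread value is an extra, hypothetical indicator added to illustrate the load-balancing point.

```python
def processor_busy_summary(per_processor_busy: list) -> tuple:
    """Return (average, total, spread) for per-processor busy percentages.

    total is the sum across processors, so it can exceed 100 on systems with
    more than one processor (an assumption about how the total is defined).
    spread is the gap between the busiest and the least busy processor.
    """
    if not per_processor_busy:
        raise ValueError("no processor samples")
    total = sum(per_processor_busy)
    average = total / len(per_processor_busy)
    spread = max(per_processor_busy) - min(per_processor_busy)
    return average, total, spread


# Hypothetical busy percentages for a four-processor system.
print(processor_busy_summary([80.0, 60.0, 40.0, 20.0]))  # (50.0, 200.0, 60.0)
```

A large spread alongside a moderate average suggests uneven load across processors, whereas a high average with a small spread points more towards a need for additional processing capacity.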

NVRAM DMA write latency

Indicates the NVRAM DMA wait time per transaction in this storage system.

Milliseconds

When a consistency point (CP) is triggered, Data ONTAP reads the journal of write requests from the NVRAM and uses DMA (Direct Memory Access) to update the disk with the data. DMA is a feature that allows hardware subsystems to access system memory independently of the central processing unit (CPU).

Any latencies experienced by the DMA channel can slow down writes to the disk and consequently degrade the storage system's write performance. This is why a low value is desired for this measure.
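A per-transaction latency such as NVRAM DMA write latency can be obtained by dividing the growth in cumulative wait time by the growth in the cumulative transaction count over the same interval. This Python sketch assumes both counters are available and that wait time is recorded in microseconds; the names and numbers are hypothetical.

```python
from typing import Optional


def avg_dma_latency_ms(wait_us_prev: int, wait_us_curr: int,
                       ops_prev: int, ops_curr: int) -> Optional[float]:
    """Average NVRAM DMA wait per transaction over one interval, in milliseconds.

    wait_us_* are cumulative wait-time samples (assumed to be in microseconds)
    and ops_* the matching cumulative transaction counts.
    """
    delta_ops = ops_curr - ops_prev
    if delta_ops <= 0:
        return None  # no transactions in the interval, so no latency to report
    return (wait_us_curr - wait_us_prev) / delta_ops / 1000.0


# Hypothetical samples: 600,000 us of waiting spread over 300 transactions.
print(avg_dma_latency_ms(5_000_000, 5_600_000, 20_000, 20_300))  # 2.0
```

Returning None for an idle interval, instead of 0.0, avoids reporting an artificially perfect latency when no DMA transactions took place.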

NVRAM DMA transaction rate

Indicates the rate at which NVRAM DMA transactions were performed in this storage system.

Ops/sec

A consistent decrease in the value of this measure could indicate that the DMA channel is experiencing latencies. Any latencies experienced by the DMA channel can slow down writes to the disk and consequently degrade the storage system's write performance.
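The Python sketch below computes a transaction rate from two cumulative counter samples and flags a consistent decrease across recent rates. The function names and the window of four samples are hypothetical choices for illustration. Because a lower rate can also simply reflect a lighter write workload, a trend like this is best read together with NVRAM DMA write latency.

```python
def rate_per_sec(count_prev: int, count_curr: int,
                 t_prev: float, t_curr: float) -> float:
    """Transactions per second between two cumulative counter samples."""
    elapsed = t_curr - t_prev
    if elapsed <= 0:
        raise ValueError("samples must be taken at increasing times")
    return (count_curr - count_prev) / elapsed


def consistently_decreasing(rates: list, window: int = 4) -> bool:
    """True if each of the last `window` rates is lower than the one before it."""
    recent = rates[-window:]
    if len(recent) < window:
        return False
    return all(later < earlier for earlier, later in zip(recent, recent[1:]))


print(rate_per_sec(1_000, 4_000, 0.0, 60.0))               # 50.0
print(consistently_decreasing([500, 480, 450, 400, 390]))  # True
print(consistently_decreasing([500, 480, 490, 400]))       # False
```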

Are sufficient spare disks available?

Indicates whether or not sufficient spare disks are available.

 

A hot spare disk is a disk that is assigned to a storage system but is not in use by a RAID group. It does not yet hold data but is ready for use. If a disk failure occurs within a RAID group, Data ONTAP automatically assigns hot spare disks to RAID groups to replace the failed disks.

At a minimum, you should have one matching or appropriate hot spare available for each kind of disk installed in your storage system. However, having two hot spares available for all disks provides the best protection against disk failure.

This measure reports the value Yes if sufficient spare disks are available in the storage system, and the value No if they are not.

The numeric values that correspond to the above-mentioned measure values are as follows:

Measure Value Numeric Value
Yes 1
No 0

By default, Data ONTAP issues warnings to the console and logs if you have fewer than one hot spare disk that matches the attributes of each disk in your storage system. You can change the threshold value for these warning messages by using the raid.min_spare_count option.

To make sure that you always have two hot spares for every disk (a best practice), you can set the raid.min_spare_count option to 2.
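The warning rule tied to raid.min_spare_count amounts to a per-disk-type comparison. The Python sketch below models it; it does not query Data ONTAP, the function name is hypothetical, and keying disks by a single label such as SAS-900GB is a simplification, since Data ONTAP matches spares on several disk attributes.

```python
def low_spare_disk_types(spares_by_type: dict, min_spare_count: int = 1) -> list:
    """Return disk types with fewer matching hot spares than min_spare_count.

    min_spare_count mirrors raid.min_spare_count: 1 is the default, 2 is the
    recommended best practice, and 0 disables low spare warnings.
    """
    if min_spare_count < 0:
        raise ValueError("min_spare_count must not be negative")
    if min_spare_count == 0:
        return []
    return sorted(disk_type for disk_type, spares in spares_by_type.items()
                  if spares < min_spare_count)


# Hypothetical spare counts keyed by a simplified disk-type label.
spares = {"SAS-900GB": 1, "SATA-4TB": 0}
print(low_spare_disk_types(spares))                     # ['SATA-4TB']
print(low_spare_disk_types(spares, min_spare_count=2))  # ['SAS-900GB', 'SATA-4TB']
```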

Setting the raid.min_spare_count option to 0 disables low spare warnings. You might want to do this if you do not have enough disks to provide hot spares (for example, if your storage system does not support external disk shelves). You can disable the warnings only if the following requirements are met:

  • Your system has 16 or fewer disks.
  • You have no RAID groups that use RAID4.
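The two requirements above can be checked before setting raid.min_spare_count to 0. In this Python sketch, the function name and the RAID type labels (raid_dp, raid4) are illustrative inputs; gathering the disk count and RAID group types from the storage system is left out.

```python
def can_disable_low_spare_warnings(disk_count: int, raid_group_types: list) -> bool:
    """Check the two conditions for setting raid.min_spare_count to 0.

    The system must have 16 or fewer disks, and no RAID group may use RAID4.
    """
    uses_raid4 = any(raid_type.lower() == "raid4" for raid_type in raid_group_types)
    return disk_count <= 16 and not uses_raid4


print(can_disable_low_spare_warnings(12, ["raid_dp"]))           # True
print(can_disable_low_spare_warnings(12, ["raid_dp", "raid4"]))  # False
print(can_disable_low_spare_warnings(24, ["raid_dp"]))           # False
```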

Note:

By default, this measure reports the Measure Values listed above (Yes or No) to indicate whether sufficient spare disks are available in this storage system. In the graph of this measure, however, spare disk availability is represented using the corresponding numeric equivalents, i.e., 1 or 0.
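The mapping between the reported Measure Values and the numeric values used in the graph can be expressed as a small lookup table. This Python sketch is illustrative; the function names are hypothetical.

```python
SPARE_AVAILABILITY = {"Yes": 1, "No": 0}


def to_numeric(measure_value: str) -> int:
    """Map a measure value (Yes/No) to the numeric value used in graphs."""
    try:
        return SPARE_AVAILABILITY[measure_value]
    except KeyError:
        raise ValueError(f"unexpected measure value: {measure_value!r}") from None


def to_measure_value(numeric_value: int) -> str:
    """Map a graphed numeric value (1/0) back to its measure value."""
    for label, number in SPARE_AVAILABILITY.items():
        if number == numeric_value:
            return label
    raise ValueError(f"unexpected numeric value: {numeric_value!r}")


# Hypothetical sequence of results from successive test runs.
print([to_numeric(v) for v in ["Yes", "Yes", "No", "Yes"]])  # [1, 1, 0, 1]
```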

Num offline/inconsistent vFiler resources

Indicates the number of offline or inconsistent storage resources across all vFilers in this storage system.

Number

A vFiler unit is a virtual storage system created using MultiStore. A Unified Storage system's storage space can be divided into vFiler units. Each vFiler unit can be managed by a separate administrator and is available on a separate network interface. A vFiler unit cannot view the storage space owned by other vFiler units, except for the special vFiler unit “vFiler zero”, which is the actual physical machine.
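Counting offline or inconsistent resources across vFilers reduces to a filtered count over each vFiler's resources. The Python sketch below uses a hypothetical data model (a VfilerResource with a path and a status label); it is not the format in which Data ONTAP actually reports vFiler resources.

```python
from dataclasses import dataclass


@dataclass
class VfilerResource:
    path: str    # storage resource owned by the vFiler, e.g. a volume path
    status: str  # hypothetical status label: "online", "offline", "inconsistent"


UNHEALTHY = {"offline", "inconsistent"}


def count_offline_or_inconsistent(vfilers: dict) -> int:
    """Count offline or inconsistent resources across all vFilers."""
    return sum(
        1
        for resources in vfilers.values()
        for resource in resources
        if resource.status.lower() in UNHEALTHY
    )


# Hypothetical inventory of three vFilers.
inventory = {
    "vfiler0": [VfilerResource("/vol/vol0", "online")],
    "vf_sales": [VfilerResource("/vol/sales", "offline"),
                 VfilerResource("/vol/sales_logs", "online")],
    "vf_hr": [VfilerResource("/vol/hr", "inconsistent")],
}
print(count_offline_or_inconsistent(inventory))  # 2
```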