NetApp Aggregates Test
To support the differing security, backup, performance, and data sharing needs of your users, you group the physical data storage resources on your storage system into one or more aggregates. These aggregates provide storage to the volume or volumes that they contain. Each aggregate has its own RAID configuration, plex structure, and set of assigned disks or array LUNs.
You must periodically monitor the state, I/O activity, processing power, and space usage of each aggregate configured on your storage system, so that probable space contentions and I/O overloads are detected rapidly, and failed, inconsistent, or busy aggregates are identified easily. To accurately pinpoint failed checksum storage, problematic RAID groups, or plex resynchronization issues in an aggregate, you should also monitor the key components of each aggregate (RAID groups, plex structures, and checksum disks) from time to time. The NetApp Aggregates test provides all these performance insights. This test auto-discovers the aggregates configured on a storage system, and periodically reports the following:
- What is the current state of each aggregate?
- Which are the busy aggregates?
- Is any aggregate running short of storage space?
- Is I/O load uniformly distributed across all aggregates, or is any aggregate overloaded with read-write requests?
- What is the current status of the checksum storage in each aggregate?
- What is the current status of the plex structures in each aggregate?
- Are the RAID groups in an aggregate in a normal state?
- Did any aggregate experience issues during plex resynchronization?
Target of the test : A NetApp Unified Storage
Agent deploying the test : An external/remote agent
Outputs of the test : One set of results for each aggregate on the NetApp storage system being monitored.
Parameters | Description |
---|---|
Test Period |
How often the test should be executed. |
Host |
The host for which the test is to be configured. |
Port |
Specify the port at which the specified host listens in the Port text box. By default, this is NULL. |
User |
Specify the name of a user who possesses the following privileges: login-http-admin,api-aggr-check-spare-low,api-aggr-list-info,api-aggr-mediascrub-list-info,api-aggr-scrub-list-info,api-cifs-status,api-clone-list-status,api-disk-list-info,api-fcp-adapter-list-info,api-fcp-adapter-stats-list-info,api-fcp-service-status,api-file-get-file-info,api-file-read-file,api-iscsi-connection-list-info,api-iscsi-initiator-list-info,api-iscsi-service-status,api-iscsi-session-list-info,api-iscsi-stats-list-info,api-lun-config-check-alua-conflicts-info,api-lun-config-check-cfmode-info,api-lun-config-check-info,api-lun-config-check-single-image-info,api-lun-list-info,api-nfs-status,api-perf-object-get-instances-iter*,api-perf-object-instance-list-info,api-quota-report-iter*,api-snapshot-list-info,api-vfiler-list-info,api-volume-list-info-iter*. If no such user exists, you can create a special user for this purpose using the steps detailed in Creating a New User with the Privileges Required for Monitoring the NetApp Unified Storage. |
Password |
Specify the password that corresponds to the above-mentioned User. |
Confirm Password |
Confirm the Password by retyping it here. |
Authentication Mechanism |
To collect metrics from the NetApp Unified Storage system, the eG agent connects to the ONTAP management APIs over HTTP or HTTPS. By default, this connection is authenticated using the LOGIN_PASSWORD authentication mechanism, which is why LOGIN_PASSWORD is displayed as the default authentication mechanism. |
Use SSL |
Set the Use SSL flag to Yes if SSL (Secure Sockets Layer) is to be used to connect to the NetApp Unified Storage system, and to No if it is not. |
API Port |
By default, in most environments, the NetApp Unified Storage system listens on port 80 (if not SSL-enabled) or on port 443 (if SSL-enabled). Accordingly, the API Port parameter is set to default, which means that the eG agent connects to port 80 if the Use SSL flag above is set to No, and to port 443 if the Use SSL flag is set to Yes. In some environments, however, the default ports 80 and 443 might not apply. In such a case, specify against the API Port parameter the exact port at which the NetApp Unified Storage system in your environment listens, so that the eG agent communicates with that port when collecting metrics. |
vFilerName |
A vFiler is a virtual storage system you create using MultiStore, which enables you to partition the storage and network resources of a single storage system so that it appears as multiple storage systems on the network. If the NetApp Unified Storage system is partitioned to accommodate a set of vFilers, specify the name of the vFiler that you wish to monitor in the vFilerName text box. In some environments, the NetApp Unified Storage system may not be partitioned at all. In such a case, the NetApp Unified Storage system is monitored as a single vFiler and hence the default value of none is displayed in this text box. |
Timeout |
Specify the duration (in seconds) beyond which the test will time out if no response is received from the device. The default is 120 seconds. |
Transfers Threshold |
You can set a threshold value for the rate at which the transfers are serviced by an aggregate. Specifying such a value in the Transfers Threshold text box implies that the aggregates violating this threshold value will be termed as Busy aggregates. The default value is 15 (Transfers/Sec). This parameter is deprecated in v5.6.5 (and above). |
DD Frequency |
Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency, if you so desire. Also, if you intend to disable the detailed diagnosis capability for this test, you can do so by specifying none against DD frequency. |
Detailed Diagnosis |
To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option. The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:
|
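The defaulting rule described for the Use SSL and API Port parameters above can be sketched as follows. This is a minimal illustration, not the eG agent's actual code: the function name `resolve_api_port` and its arguments are hypothetical, and it assumes the parameter value is either the literal string default or a port number.

```python
from typing import Optional

DEFAULT_HTTP_PORT = 80    # used when Use SSL is No
DEFAULT_HTTPS_PORT = 443  # used when Use SSL is Yes


def resolve_api_port(use_ssl: bool, api_port: Optional[str] = "default") -> int:
    """Return the port to connect to (hypothetical helper).

    When api_port is left at "default" (or is empty), the port follows
    the Use SSL flag; otherwise the explicitly configured port wins.
    """
    if api_port is None or api_port.strip().lower() in ("", "default"):
        return DEFAULT_HTTPS_PORT if use_ssl else DEFAULT_HTTP_PORT

    port = int(api_port)  # raises ValueError for non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"API port out of range: {port}")
    return port
```

For example, `resolve_api_port(use_ssl=True)` yields 443, while `resolve_api_port(use_ssl=True, api_port="8443")` yields the explicitly configured 8443.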
Measurement | Description | Measurement Unit | Interpretation |
---|---|---|---|
NetApp aggregates |
Indicates the number of busy aggregates in the storage system. |
Number |
This measure is applicable only to the Busy Aggregates descriptor. The detailed diagnosis of this measure, if enabled, lists the name and the transfer rate of each aggregate, i.e., the rate at which data transfers are serviced by the aggregate. This measure is deprecated in v5.6.5 (and above). |
State |
Indicates the current state of this aggregate. |
|
The values that this measure can report and their corresponding numeric values have been listed in the table below. A brief description for each Measure Value is also provided:
Note: By default, this measure reports the above-mentioned Measure Values while indicating the current status of an aggregate. However, in the graph of this measure, states will be represented using the corresponding numeric equivalents i.e., 1 to 8. |
Is aggregate inconsistent? |
Indicates whether/not this aggregate is inconsistent. |
|
One reason an aggregate is marked as inconsistent or corrupted is that the Lost write protection feature has detected an issue. Lost write protection is a feature of Data ONTAP that runs on each WAFL read: data is checked against block checksum information (WAFL context) and RAID parity data. If an issue is detected, there are two possible outcomes:
If an aggregate is marked inconsistent, it will require the use of WAFL iron to be able to return the aggregate to a consistent state. This measure indicates a value of Yes if the aggregate is inconsistent and the value No if the aggregate is not inconsistent. The numeric values that correspond to the above-mentioned values are detailed in the table below:
Note: By default, this measure reports the above-mentioned Measure Values while indicating whether/not this aggregate is inconsistent. However, in the graph of this measure, the inconsistent state of an aggregate will be represented using the corresponding numeric equivalents i.e., 1 or 2. |
Mirror status |
Indicates the current mirror status of this aggregate. |
|
The values that this measure can report and their corresponding numeric values have been listed in the table below. A brief description for a few Measure Values is also provided:
Note: By default, this measure reports the above-mentioned Measure Values while indicating the current mirror status of this aggregate in this storage system. However, in the graph of this measure, the mirror status will be represented using the corresponding numeric equivalents - i.e., 1 to 10. |
Is Raid state abnormal? |
Indicates whether/not the RAID of this aggregate is currently in an abnormal state. |
|
This measure indicates a value of Yes if the RAID of this aggregate is in an abnormal state and the value No if the RAID of this aggregate is normal. The numeric values that correspond to the above-mentioned values are detailed in the table below:
Note: By default, this measure reports the above-mentioned Measure Values while indicating whether the RAID of this aggregate is in an abnormal state. However, in the graph of this measure, the RAID states will be represented using the corresponding numeric equivalents i.e., 1 or 2. |
Checksum status |
Indicates the current checksum status of this aggregate. |
|
The values that this measure can report and their corresponding numeric values have been listed in the table below.
Note: By default, this measure reports the above-mentioned Measure Values while indicating the current checksum status of this aggregate. However, the graph of this measure will be represented using the corresponding numeric equivalents i.e., 1 to 10. |
Are plexes offline? |
Indicates whether/not the plexes in this aggregate are currently offline. |
|
A plex is a collection of one or more RAID groups that together provide the storage for one or more WAFL file system volumes. Data ONTAP uses plexes as the unit of RAID-level mirroring when the SyncMirror feature is enabled. All RAID groups in one plex are of the same level, but may have a different number of disks. This measure reports the value Yes if the plexes in this aggregate are currently offline and the value No if the plexes are not offline. The numeric values that correspond to the above-mentioned values are detailed in the table below:
Note: By default, this measure reports the above-mentioned Measure Values while indicating whether the plexes in this aggregate are currently offline or not. However, in the graph of this measure, the state of the plexes will be represented using the corresponding numeric equivalents i.e., 1 or 2. |
Are plexes resyncing? |
Indicates whether/not the plexes of this aggregate are currently being resynchronized. |
|
Plex resynchronization is a process that ensures two plexes of a mirrored aggregate have exactly the same data. When plexes are unsynchronized, one plex contains data that is more up to date than that of the other plex. Plex resynchronization updates the out-of-date plex so that both plexes are identical. Data ONTAP resynchronizes the two plexes of a mirrored aggregate if one of the following situations occurs:
This measure reports the value Yes if the plexes in this aggregate are currently resyncing and the value No if the plexes are not resyncing. The numeric values that correspond to the above-mentioned values are detailed in the table below:
Note: By default, this measure reports the above-mentioned Measure Values while indicating whether the plexes in this aggregate are currently resyncing or not. However, in the graph of this measure, the resync state of the plexes will be represented using the corresponding numeric equivalents i.e., 1 or 2. |
Total size |
Indicates the total usable size of this aggregate. |
MB |
The size of this aggregate excludes the WAFL reserve and the aggregate snapshot reserve. This measure will report a value of 0 if the aggregate is restricted or offline. |
Aggregate used size |
Indicates the amount of space that is currently used in this aggregate. |
MB |
This measure will report a value of 0 if the aggregate is not usable, i.e., if it is offline. |
Percentage size used |
Indicates the percentage of space that is currently used in this aggregate. |
Percent |
A value close to 100% is an indication of space constraint in the aggregate. |
Total files |
Indicates the maximum number of files (inodes) that this aggregate can currently hold. |
Number |
|
Used files |
Indicates the total number of files that are currently stored in this aggregate. |
Number |
|
Transfers |
Indicates the rate at which the transfers are serviced by this aggregate. |
Ops/Sec |
Compare the value of this measure across aggregates to identify the busy aggregates. |
User reads |
Indicates the rate at which user read requests are serviced by this aggregate. |
Ops/Sec |
A consistent decrease in the value of this measure could indicate a bottleneck when processing read requests. Compare the value of this measure across aggregates to know which aggregates service read requests slowly. |
User writes |
Indicates the rate at which user write requests are serviced by this aggregate. |
Ops/Sec |
A consistent decrease in the value of this measure could indicate a bottleneck when processing write requests. Compare the value of this measure across aggregates to know which aggregates are servicing write requests slowly. |
CP reads |
Indicates the rate at which user read requests are serviced during a Consistency Point (CP) operation in this aggregate. |
Ops/Sec |
A consistent decrease in the value of this measure could indicate that CP operations are slowing down the processing of read requests. |
Block read rate |
Indicates the rate at which the blocks are read from this aggregate upon a user request. |
Ops/Sec |
A consistent decrease in the value of this measure could indicate a bottleneck when processing read requests. Compare the value of this measure across aggregates to know which aggregates service block read requests slowly. |
Block write rate |
Indicates the rate at which the blocks are written to this aggregate upon a user request. |
Ops/Sec |
A consistent decrease in the value of this measure could indicate a bottleneck when processing write requests. Compare the value of this measure across aggregates to know which aggregates are servicing block write requests slowly. |
Block read rate during CP |
Indicates the rate at which blocks are read from this aggregate during a Consistency Point (CP) operation. |
Ops/Sec |
A consistent decrease in the value of this measure could indicate that CP operations are slowing down the processing of read requests. |
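
The space and load measures in the table above lend themselves to simple cross-aggregate checks. The sketch below is illustrative only: the `AggregateSample` type, the sample values, and the 90 percent fullness limit are assumptions made for this example. Only the default of 15 transfers/sec comes from the Transfers Threshold parameter (deprecated in v5.6.5 and above), and treating "violating" as "strictly greater than" is likewise an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AggregateSample:
    """One set of measures for a single aggregate (hypothetical container)."""
    name: str
    total_mb: float           # "Total size" measure
    used_mb: float            # "Aggregate used size" measure
    transfers_per_sec: float  # "Transfers" measure


def percent_used(sample: AggregateSample) -> float:
    """Percentage of space used; 0 when no size is reported
    (restricted or offline aggregates report a total size of 0)."""
    if sample.total_mb <= 0:
        return 0.0
    return 100.0 * sample.used_mb / sample.total_mb


def busy_aggregates(samples: List[AggregateSample],
                    transfers_threshold: float = 15.0) -> List[str]:
    """Names of aggregates whose transfer rate exceeds the threshold."""
    return [s.name for s in samples
            if s.transfers_per_sec > transfers_threshold]


def nearly_full(samples: List[AggregateSample],
                percent_limit: float = 90.0) -> List[str]:
    """Names of aggregates at or above the assumed fullness limit."""
    return [s.name for s in samples if percent_used(s) >= percent_limit]


# Made-up sample values, for illustration only.
samples = [
    AggregateSample("aggr0", total_mb=1000.0, used_mb=250.0, transfers_per_sec=4.0),
    AggregateSample("aggr1", total_mb=2000.0, used_mb=1900.0, transfers_per_sec=22.5),
    AggregateSample("aggr2", total_mb=0.0, used_mb=0.0, transfers_per_sec=0.0),
]
print(busy_aggregates(samples))
print(nearly_full(samples))
```

With these sample values, only aggr1 is reported by both checks. The offline aggr2 reports 0 for its size measures and is therefore never flagged as nearly full.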