Storage Pools Test

A storage pool is a group of physical storage devices including PCIe SSD, SSD, and HDD devices for the cluster. The storage pool can span multiple Nutanix nodes and is expanded as the cluster scales. In most configurations, only a single storage pool is leveraged.

Because the VMs and nodes in a cluster depend on the storage pools for their availability and overall performance, the storage pools must be sized and tuned correctly. If they are not, the dependent VMs and nodes can suffer performance problems ranging from slowness to a complete standstill.

To determine whether a storage pool needs to be resized, an administrator must first know how much storage space is available to that pool, how that space has been used, what the typical I/O load on the pool is, and how well the pool processes this load. The Storage Pools test reports these statistics for each storage pool that is managed by Nutanix Prism. With the help of this information, administrators can proactively detect potential space contention, I/O overload, and processing latencies that may impact storage performance, and can initiate measures to avert them. The test also measures and reports the effectiveness of the storage optimization methodologies that are currently applied.

Target of the test : A Nutanix Acropolis Prism

Agent deploying the test : A remote agent

Outputs of the test : One set of results for every storage pool managed by the Nutanix Acropolis Prism

Configurable parameters for the test
  1. Test period - How often the test should be executed.
  2. Host - The host for which the test is to be configured.
  3. Port - The port at which the specified host listens. By default, this is NULL.
  4. Nutanix Prism User and Nutanix Prism Password - To connect to the Nutanix Prism and collect metrics from it, the eG agent should be configured with the credentials of a Prism user with the Viewer role. The steps for creating such a user are detailed in the Pre-requisites for Monitoring Nutanix Prism topic.
  5. Confirm Password - Confirm the Nutanix Prism password by retyping it here.
  6. SSL - By default, the Nutanix Prism server is SSL-enabled. Accordingly, the SSL flag is set to Yes by default. This indicates that the eG agent will communicate with the Prism server via HTTPS by default.

  7. Webport - By default, the Nutanix Prism server listens on port 9440. This implies that to monitor a Nutanix Prism server, the eG agent connects to the server via port 9440.

  8. DD FREQUENCY - Refers to the frequency with which detailed diagnosis measures are to be generated for this test. For a Nutanix Acropolis Prism server, this is set to 1:1 by default. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. It is recommended that you do not change the default setting of this parameter.

  9. DETAILED DIAGNOSIS - To make diagnosis more efficient and accurate, the eG suite embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, click on the Off option.

    The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

    • The eG manager license should allow the detailed diagnosis capability
    • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
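
The connection and detailed diagnosis parameters above can be sketched in code. This is only an illustration: the class and method names (StoragePoolsTestConfig, base_url, dd_frequencies, detailed_diagnosis_available) are hypothetical and are not part of the eG agent. Two assumptions are made: that setting the SSL flag to No means plain HTTP on the same web port, and that the condition on the DD frequencies means neither the normal nor the abnormal frequency is 0.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class StoragePoolsTestConfig:
    """Hypothetical container for the parameters described above."""
    host: str
    ssl: bool = True            # SSL flag is Yes by default
    webport: int = 9440         # default Prism web port
    dd_frequency: str = "1:1"   # "normal:abnormal", 1:1 by default

    def base_url(self) -> str:
        # Assumption: SSL = No means plain HTTP on the same port.
        scheme = "https" if self.ssl else "http"
        return f"{scheme}://{self.host}:{self.webport}"

    def dd_frequencies(self) -> Tuple[int, int]:
        # Split "normal:abnormal" into two integers.
        normal, abnormal = self.dd_frequency.split(":")
        return int(normal), int(abnormal)

    def detailed_diagnosis_available(self, license_allows_dd: bool) -> bool:
        # Assumption: the option is offered only if the license allows it
        # and neither frequency is 0.
        normal, abnormal = self.dd_frequencies()
        return license_allows_dd and normal != 0 and abnormal != 0


cfg = StoragePoolsTestConfig(host="prism.example.com")
print(cfg.base_url())
print(cfg.detailed_diagnosis_available(license_allows_dd=True))
```

With the defaults, the sketch builds an HTTPS URL on port 9440 and reports the detailed diagnosis option as available when the license permits it.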

Measurements made by the test

Columns: Measurement, Description, Measurement Unit, Interpretation

Physical disks

Indicates the number of disks pooled in this storage pool.

Number

The detailed diagnosis of this measure lists the disk ID, the disk status, and the host name.

Disk physical usage

Indicates the total amount of physical storage space used by the disks in this storage pool.

GB

Storage capacity

Indicates the amount of space in the cluster that is available to this storage pool.

GB

Where there are multiple storage pools, you can compare the value of this measure across the pools to know which pool has been sized with the maximum storage space.

Storage used space

Indicates the total amount of physical storage space used in this storage pool.

GB

A consistent increase in the value of this measure is indicative of rapid usage of space in the pool, which could lead to a storage space contention.

Storage free space

Indicates the total amount of physical storage space that is unused in this pool.

GB

Ideally, the value of this measure should be high. A very low value for this measure could indicate that the pool is running short of storage resources and may require expansion.

Storage space usage

Indicates the percentage of physical storage space used in this storage pool.

Percent

A value close to 100% is a cause for concern as it indicates a probable contention for storage space on the pool. You may want to consider resizing the pool to make sure that VM operations continue uninterrupted.

Storage free space

Indicates the percentage of physical storage space that is unused in this storage pool.

Percent

A value less than 50% is a cause for concern as it indicates a probable contention for storage space on the pool. You may want to consider resizing the pool to make sure that VM operations continue uninterrupted.
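
The two percentage measures above follow from the capacity and used-space figures. The sketch below shows the arithmetic; the function names and the 90% alert threshold are hypothetical choices for illustration, not values taken from the test itself.

```python
def space_usage_percent(capacity_gb: float, used_gb: float) -> float:
    """Percentage of the pool's physical capacity that is in use."""
    if capacity_gb <= 0:
        raise ValueError("capacity_gb must be positive")
    return 100.0 * used_gb / capacity_gb


def free_space_percent(capacity_gb: float, used_gb: float) -> float:
    """Percentage of the pool's physical capacity that is unused."""
    return 100.0 - space_usage_percent(capacity_gb, used_gb)


def nearing_contention(capacity_gb: float, used_gb: float,
                       used_threshold_pct: float = 90.0) -> bool:
    # Hypothetical threshold: flag the pool once usage approaches 100%.
    return space_usage_percent(capacity_gb, used_gb) >= used_threshold_pct


# Example: a 1000 GB pool with 750 GB used.
print(space_usage_percent(1000, 750))   # 75.0
print(free_space_percent(1000, 750))    # 25.0
print(nearing_contention(1000, 750))    # False
```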

Storage logical usage

Indicates the total amount of logical storage space used in this storage pool.

GB

Total I/O latency

Indicates the average I/O latency for physical disk requests in this storage pool.

Secs

Ideally, the value of this measure should be very low. A high value or a steady increase in this value could indicate an I/O processing bottleneck on the pool. In such a case, compare the values of the Read IO latency and Write IO latency measures to determine whether the slowness is worse when processing read requests or write requests.
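
The read/write comparison described above can be sketched as a small helper. The function name slower_io_path is hypothetical; the same comparison applies to the bandwidth and IOPS measures covered later in this table.

```python
def slower_io_path(read_latency_secs: float, write_latency_secs: float) -> str:
    """Report which request type shows the higher average latency."""
    if read_latency_secs > write_latency_secs:
        return "read"
    if write_latency_secs > read_latency_secs:
        return "write"
    return "equal"


# Example: reads average 2 ms, writes average 10 ms.
print(slower_io_path(0.002, 0.010))   # write
```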

Read IO latency

Indicates the average time taken by this storage pool to process read I/O requests.

Secs

If the Total I/O latency measure reports an abnormally high value, compare the values of the Read IO latency and Write IO latency measures to determine whether the slowness is worse when processing read requests or write requests.

Write IO latency

Indicates the average time taken by this storage pool to process write I/O requests.

Secs

Total IO bandwidth

Indicates the bandwidth per second used by this storage pool when processing I/O requests.

KB/Sec

A high value for this measure denotes that the storage pool is processing bandwidth-intensive I/O. In such situations, compare the values of the Read IO bandwidth and Write IO bandwidth measures to determine whether read requests or write requests account for most of the bandwidth consumed.

Read IO bandwidth

Indicates the bandwidth per second used by this storage pool when processing read I/O requests.

KB/Sec

If the value of the Total IO bandwidth measure is high, compare the values of the Read IO bandwidth and Write IO bandwidth measures to determine whether read requests or write requests account for most of the bandwidth consumed.

Write IO bandwidth

Indicates the bandwidth per second used by this storage pool when processing write I/O requests.

KB/Sec

Total IOPS

Indicates the number of I/O operations performed currently on this storage pool.

Number

This measure is a good indicator of the level of I/O activity on the storage pool. A steady and significant increase in the value of this measure could indicate a potential I/O overload. In such situations, compare the values of the Read IOPS and Write IOPS measures of the storage pool to determine which type of I/O operation is contributing to the overload.

Read IOPS

Indicates the number of read I/O operations performed currently on this storage pool.

Number

If the value of the Total IOPS measure is unusually high, compare the values of the Read IOPS and Write IOPS measures for that storage pool to determine whether read requests or write requests are driving the unusual I/O activity.

Write IOPS

Indicates the number of write I/O operations performed currently on this storage pool.

Number

Total transformed usage

Indicates the amount of actual usage of storage (i.e., usage after compression and deduplication) in the storage pool.

GB

The Nutanix platform incorporates a wide range of storage optimization technologies that work in concert to make efficient use of available capacity for any workload. Compression and Deduplication are two such technologies.

Compression can be inline or offline. Inline compression compresses sequential streams of data or large I/O sizes (>64K) in memory before they are written to the Extent Store. Offline compression initially writes the data as normal (in an uncompressed state) and then leverages the Curator framework to compress the data cluster-wide.

The Elastic Dedupe Engine in Nutanix allows for data deduplication in the capacity (Extent Store) and performance (Unified Cache) tiers. Streams of data are fingerprinted during ingest using a SHA-1 hash at a 16K granularity. Fingerprinting is done only on data ingest, and the fingerprint is then stored persistently as part of the written block’s metadata. For duplicate data that can be deduplicated in the capacity tier, the data does not need to be scanned or re-read; the duplicate copies can simply be removed.
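
The idea of fingerprinting data with SHA-1 at a 16K granularity can be illustrated with a minimal sketch. This is not Nutanix's implementation; it only shows how fixed-size chunks with identical content produce identical fingerprints, which is what lets duplicates be identified from metadata alone. The function names are hypothetical.

```python
import hashlib
from collections import Counter
from typing import List

CHUNK_SIZE = 16 * 1024  # 16K granularity, as described above


def fingerprint_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[str]:
    """Return one SHA-1 hex digest per fixed-size chunk of the data."""
    return [
        hashlib.sha1(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def duplicate_chunk_count(fingerprints: List[str]) -> int:
    """Count chunks whose fingerprint has already been seen."""
    counts = Counter(fingerprints)
    return sum(n - 1 for n in counts.values())


# Example: three identical chunks followed by one different chunk.
data = b"A" * (3 * CHUNK_SIZE) + b"B" * CHUNK_SIZE
prints = fingerprint_chunks(data)
print(len(prints))                     # 4
print(duplicate_chunk_count(prints))   # 2
```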

The true effectiveness of these optimization methodologies can be measured by determining how much storage space in the pool these technologies helped save. By comparing the value of this measure with the value of the Storage usage measure of the pool, you should be able to make an accurate assessment of the effectiveness of these methodologies.
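
The comparison described above amounts to a simple subtraction and ratio. The sketch below assumes you supply the pool's usage before optimization and its Total transformed usage, both in GB; the function name and parameter names are hypothetical, and which usage measure to take as the pre-optimization figure is left to the reader.

```python
from typing import Tuple


def optimization_savings(pre_optimization_gb: float,
                         transformed_gb: float) -> Tuple[float, float]:
    """Return (GB saved, percent saved) by compression and deduplication."""
    if pre_optimization_gb <= 0:
        raise ValueError("pre_optimization_gb must be positive")
    saved_gb = pre_optimization_gb - transformed_gb
    saved_pct = 100.0 * saved_gb / pre_optimization_gb
    return saved_gb, saved_pct


# Example: 500 GB of usage reduced to 350 GB after optimization.
print(optimization_savings(500, 350))   # (150, 30.0)
```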