Storage Containers Test

A container is a logical segmentation of the Storage Pool and contains a group of VMs or files (vDisks). Containers typically have a 1 to 1 mapping with a datastore (in the case of NFS/SMB).

Containers need to be adequately sized to handle the load imposed by the VMs attached to them. High-latency containers and containers with insufficient storage space adversely impact the performance of the dependent VMs and degrade the overall storage performance of the Nutanix environment. It is therefore important that administrators know which VMs are mapped to which containers, determine how much space each container is configured with, and track the current demand for space and processing power on every container, so that containers that are not configured to meet this demand can be proactively detected and resized (if required). This is where the Storage Containers test helps.

This test auto-discovers the containers managed by the Nutanix Prism Element, monitors the I/O load on and usage of each container, and precisely pinpoints those containers that are overloaded or under-sized. The test also lists the VMs that are mapped to each container, so that you can quickly identify the VMs whose performance will be significantly affected by the problematic containers. Furthermore, the test enables you to quickly review the capabilities that are turned on/off at the container level, so that you can go back and make changes to the overall configuration of the container, if required.

Target of the test : A Nutanix AHV Prism Element

Agent deploying the test : A remote agent

Outputs of the test : One set of results for every container managed by the Nutanix AHV Prism Element

Configurable parameters for the test
Parameter	Description
Test Period	How often the test should be executed.
Host	The host for which the test is to be configured.
Port	The port at which the specified host listens. By default, this is NULL.
Nutanix Prism Element User, Nutanix Prism Element Password and Confirm Password	To connect to the Nutanix Prism Element and collect metrics from it, the eG agent should be configured with the credentials of a Prism Element user with the Viewer role. The steps for creating such a user are detailed in the Pre-requisites for Monitoring Nutanix Prism Element topic. Confirm the Nutanix Prism Element password by retyping it in the Confirm Password text box.
SSL	By default, the Nutanix Prism Element server is SSL-enabled. Accordingly, the SSL flag is set to Yes by default. This indicates that the eG agent will communicate with the Prism Element server via HTTPS by default.
WebPort	By default, the Nutanix Prism Element server listens on port 9440. This implies that while monitoring a Nutanix AHV server via the Prism Element server, the eG agent connects to port 9440.
DD Frequency	Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency, if you so desire. Also, if you intend to disable the detailed diagnosis capability for this test, you can do so by specifying none against DD frequency.
Detailed Diagnosis	To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, click on the Off option. The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled: (a) the eG manager license should allow the detailed diagnosis capability; (b) both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
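The connection and detailed diagnosis parameters above can be pictured with a minimal sketch. The class and method names are hypothetical and are not part of eG Enterprise. Reading DD Frequency as "normal:abnormal" is an assumption inferred from the DD Frequency and Detailed Diagnosis descriptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ContainerTestConfig:
    """Hypothetical holder for a few parameters from the table above."""
    host: str
    web_port: int = 9440        # default WebPort stated in the table
    ssl: bool = True            # SSL flag is Yes by default
    dd_frequency: str = "1:1"   # default DD Frequency

    def base_url(self) -> str:
        # HTTPS when the SSL flag is Yes, plain HTTP otherwise.
        scheme = "https" if self.ssl else "http"
        return f"{scheme}://{self.host}:{self.web_port}"

    def dd_frequencies(self) -> Optional[Tuple[int, int]]:
        # Assumed format: "<normal>:<abnormal>"; "none" disables detailed diagnosis.
        if self.dd_frequency.strip().lower() == "none":
            return None
        normal, abnormal = self.dd_frequency.split(":")
        return int(normal), int(abnormal)
```

With the defaults, a host of 10.0.0.5 yields the base URL https://10.0.0.5:9440 and the frequency pair (1, 1).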

Measurements made by the test

Measurement

Description

Measurement Unit

Interpretation

Replication factor

Indicates the replication factor setting of this container.

Number

Each Nutanix container can be configured with a replication factor (RF) of two or three. RF=2 ensures that two copies of data are maintained at all times, allowing the cluster to survive the failure of one node or disk. Similarly, when RF is set to 3 (RF=3), three copies of the data are maintained in the cluster, providing resilience from two simultaneous failures. This level of flexibility allows administrators to dynamically configure data redundancy based on application SLAs and the criticality of the data set.

The value of this measure can be 2 or 3 depending upon the RF setting of the container.
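As a rough illustration of what an RF value implies, the sketch below derives the number of copies, the number of simultaneous failures tolerated, and a simplified effective capacity. The function name is hypothetical. The capacity figure assumes plain replication only and ignores compression, deduplication, erasure coding, and system overhead.

```python
def rf_summary(rf: int, raw_tb: float) -> dict:
    """Summarise what a replication factor of 2 or 3 implies."""
    if rf not in (2, 3):
        raise ValueError("this measure is expected to report 2 or 3")
    return {
        "copies": rf,                          # full copies kept in the cluster
        "failures_tolerated": rf - 1,          # RF=2 -> 1 failure, RF=3 -> 2 failures
        "effective_capacity_tb": raw_tb / rf,  # simplified: raw space divided by copies
    }
```

For example, 12 TB of raw space at RF=2 leaves about 6 TB for unique data under these assumptions, and about 4 TB at RF=3.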

Oplog replication factor

Indicates the replication factor setting for the Oplog of this container.

Number

The OpLog acts as a staging area to absorb incoming writes onto a low-latency SSD tier. Upon being written to the local OpLog, the data is synchronously replicated to the OpLog of one or two other Nutanix CVMs, depending upon the replication factor (RF) setting of the OpLog, before being acknowledged (Ack) as a successful write to the host. For instance, if the RF setting of the OpLog is 2, then data will be synchronously replicated to one other OpLog. If the RF is 3, then data will be replicated to two other OpLogs.

This ensures that the data exists in at least two or three independent locations and is fault tolerant. NOTE: For RF=3, a minimum of 5 nodes is required since metadata will be RF5.

The value of this measure can be 2 or 3, depending upon the RF setting for Oplog.
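The acknowledgement rule described above can be sketched as a small predicate: a write counts as successful only once the local OpLog and RF minus one remote OpLogs hold the data. This is an illustrative model with hypothetical names, not Nutanix code.

```python
def write_acknowledged(local_written: bool, remote_acks: int, oplog_rf: int) -> bool:
    """Return True once the write exists in `oplog_rf` OpLogs in total."""
    if oplog_rf not in (2, 3):
        raise ValueError("Oplog replication factor is expected to be 2 or 3")
    remote_copies_needed = oplog_rf - 1  # one copy is the local OpLog
    return local_written and remote_acks >= remote_copies_needed
```

With an Oplog RF of 3, a single remote acknowledgement is not enough; the write is acknowledged only after the second remote copy lands.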

Compression enabled

Indicates whether/not compression is enabled for this container.

Number

The Nutanix Capacity Optimization Engine (COE) is responsible for performing data transformations to increase data efficiency on disk. Currently compression is one of the key features of the COE to perform data optimization. DSF provides both inline and offline flavors of compression to best suit the customer’s needs and type of data.

Inline compression will compress sequential streams of data or large I/O sizes (>64K) in memory before it is written to the Extent Store (SSD + HDD). This includes data draining from OpLog as well as sequential data skipping it.

Offline compression will initially write the data as normal (in an un-compressed state) and then leverage the Curator framework to compress the data cluster wide. When inline compression is enabled but the I/Os are random in nature, the data will be written un-compressed in the OpLog, coalesced, and then compressed in memory before being written to the Extent Store.

The Google Snappy compression library is leveraged which provides good compression ratios with minimal computational overhead and extremely fast compression / decompression rates.

This measure reports the value On if compression is enabled for this container, and the value Off if compression is disabled. The numeric values that correspond to these measure values are listed below:

Measure Value	Numeric Value
On	1
Off	0

Note:

By default, this measure reports the values listed in the Measure Value column to indicate whether/not compression is enabled for a container. In the graph of the measure however, the same will be represented using the numeric equivalents only.
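If you post-process graph data for On/Off measures such as this one, the mapping table can be expressed as a simple lookup. The names below are hypothetical; only the On=1 and Off=0 pairing comes from the table.

```python
MEASURE_TO_NUMERIC = {"On": 1, "Off": 0}
NUMERIC_TO_MEASURE = {number: label for label, number in MEASURE_TO_NUMERIC.items()}


def to_numeric(label: str) -> int:
    """Convert a reported measure value to its graph equivalent."""
    return MEASURE_TO_NUMERIC[label]


def to_label(number: int) -> str:
    """Convert a graphed numeric value back to the reported measure value."""
    return NUMERIC_TO_MEASURE[number]
```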

On disk deduplication

Indicates whether/not on-disk deduplication is enabled for this container.

Deduplication is the process of eliminating duplicate data, so as to increase the effective capacity in the disk tier, and the system’s RAM and flash tiers.

Nutanix delivers two types of data deduplication to accelerate application performance and to optimize storage capacity: Performance tier deduplication and Capacity tier / MapReduce / On-disk deduplication. On-disk deduplication, if enabled for a container, reduces repetitive data in the capacity tier to increase the effective storage capacity of a cluster. This type of deduplication is global and distributed across all nodes in the cluster, minimizing any performance overhead. MapReduce deduplication is particularly useful for virtual desktops with full clones. Performance-tier deduplication can be used without Capacity-tier deduplication, but not the other way around.

This measure reports the value On if deduplication is enabled for this container, and the value Off if deduplication is disabled. The numeric values that correspond to these measure values are listed below:

Measure Value	Numeric Value
On	1
Off	0

Note:

By default, this measure reports the values listed in the Measure Value column to indicate whether/not on-disk deduplication is enabled for a container. In the graph of the measure however, the same will be represented using the numeric equivalents only.

Erasure coding

Indicates whether/not erasure coding is enabled for this container.

The Nutanix platform leverages a replication factor (RF) for data protection and availability. This method provides the highest degree of availability because it does not require reading from more than one storage location or data re-computation on failure. However, this does come at the cost of storage resources as full copies are required.

To balance availability against the amount of storage required, DSF provides the ability to encode data using erasure codes (EC).

Similar to the concept of RAID (levels 4, 5, 6, etc.) where parity is calculated, EC encodes a strip of data blocks on different nodes and calculates parity. In the event of a host and/or disk failure, the parity can be leveraged to calculate any missing data blocks (decoding). In the case of DSF, the data block is an extent group and each data block must be on a different node and belong to a different vDisk.

The number of data and parity blocks in a strip is configurable based upon the desired failures to tolerate. The configuration is commonly referred to as <number of data blocks>/<number of parity blocks>.

For example, “RF2 like” availability (e.g., N+1) could consist of 3 or 4 data blocks and 1 parity block in a strip (e.g., 3/1 or 4/1). “RF3 like” availability (e.g. N+2) could consist of 3 or 4 data blocks and 2 parity blocks in a strip (e.g. 3/2 or 4/2).

This measure reports the value On if erasure coding is enabled for this container, and the value Off if it is disabled. The numeric values that correspond to these measure values are listed below:

Measure Value	Numeric Value
On	1
Off	0

Note:

By default, this measure reports the values listed in the Measure Value column to indicate whether/not erasure coding is enabled for a container. In the graph of the measure however, the same will be represented using the numeric equivalents only.
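To see why erasure coding saves space relative to full replication, the following sketch compares the raw-to-usable storage multiplier of a strip with that of a replication factor. The function names are hypothetical, and the arithmetic is the standard (data + parity) / data ratio rather than anything reported by this test.

```python
def ec_overhead(data_blocks: int, parity_blocks: int) -> float:
    """Raw space consumed per unit of user data for a data/parity strip."""
    return (data_blocks + parity_blocks) / data_blocks


def rf_overhead(rf: int) -> float:
    """Raw space consumed per unit of user data with full replication."""
    return float(rf)
```

A 4/1 strip consumes 1.25 units of raw space per unit of data, against 2 units for RF=2. A 4/2 strip consumes 1.5 units, against 3 units for RF=3.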

Is marked for removal?

Indicates whether/not the container is marked for removal.

The values that this measure can report and their corresponding numeric values are listed in the table below:

Measure Value	Numeric Value
On	1
Off	0

Note:

By default, this measure reports the values listed in the Measure Value column to indicate whether/not the container is marked for removal. In the graph of the measure however, the same will be represented using the numeric equivalents only.

Attached VMs

Indicates the number of VMs that are attached to this container.

Number

Use the detailed diagnosis of this measure to know which VMs are attached to this container.

Total I/O latency

Indicates the average time taken by this container to process I/O requests.

Secs

Ideally, the value of this measure should be very low. A high value or a steady increase in this value could indicate an I/O processing bottleneck on the container. In such a case, compare the values of the Read IO latency and Write IO latency measures to figure out where the slowness is greatest: when processing read requests or write requests.

Read IO latency

Indicates the average time taken by this container to process read I/O requests.

Secs

If the Total I/O latency measure reports an abnormally high value, then compare the values of the Read IO latency and Write IO latency measures to figure out where the slowness is greatest: when processing read requests or write requests.

Write IO latency

Indicates the average time taken by this container to process write I/O requests.

Secs
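The read-versus-write comparison suggested for the latency measures can be sketched as follows. The function name is hypothetical, and both inputs are assumed to be in seconds, as reported by the test.

```python
def slower_path(read_latency_secs: float, write_latency_secs: float) -> str:
    """Name the I/O type that contributes the higher latency."""
    if read_latency_secs > write_latency_secs:
        return "read"
    if write_latency_secs > read_latency_secs:
        return "write"
    return "equal"
```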

Total IO bandwidth

Indicates the bandwidth per second used by this container when processing I/O requests.

KB/Sec

A high value for this measure denotes that the container is processing bandwidth-intensive I/O. In such situations, you may want to compare the values of the Read IO bandwidth and Write IO bandwidth measures to know what type of I/O requests are truly contributing to the excessive bandwidth consumption: read requests or write requests.

Read IO bandwidth

Indicates the bandwidth per second used by this container when processing read I/O requests.

KB/Sec

If the value of the Total IO bandwidth measure is high, then you may want to compare the value of the Read IO bandwidth and Write IO bandwidth measures to know what type of I/O requests are truly contributing to the excessive bandwidth consumption - read requests? or write requests?

Write IO bandwidth

Indicates the bandwidth per second used by this container when processing write I/O requests.

Total IOPS

Indicates the number of I/O operations performed currently on this container.

Number

This measure is a good indicator of the level of I/O activity on the container. A steady and significant increase in the value of this measure could indicate a potential I/O overload. In such situations, you may want to compare the value of the Read IOPS and Write IOPS measures of the container to know what type of IO operations are contributing to the overload.

Read IOPS

Indicates the number of read I/O operations performed currently on this container.

Number

If the value of the Total IOPS measure is unusually high, then compare the values of the Read IOPS and Write IOPS measures for that container to know what is contributing to the unusual I/O activity levels: read requests or write requests.

Write IOPS

Indicates the number of write I/O operations performed currently on this container.

Number
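One way to express the read/write mix behind a high Total IOPS value is as a percentage share. The sketch below assumes that total activity is the sum of the Read IOPS and Write IOPS values, which the test description does not state explicitly; the function name is hypothetical.

```python
def read_share_percent(read_iops: float, write_iops: float) -> float:
    """Percentage of I/O operations that are reads (0.0 when there is no I/O)."""
    total = read_iops + write_iops  # assumption: total = reads + writes
    if total == 0:
        return 0.0
    return 100.0 * read_iops / total
```

A container with 300 read and 100 write operations would show a 75% read share, pointing to reads as the main contributor.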

Max capacity

Indicates the maximum capacity configured for this container.

The maximum capacity value reflects total available storage regardless of how many containers are defined. Therefore, when you have two containers, it can appear you have twice as much capacity because the field values for both containers show the full amount. This will normally match the storage pool size.
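The double-counting effect described above can be illustrated with a short sketch. The names are hypothetical, and taking the largest reported value as a pool estimate is an assumption that holds only when every container reports the full pool size.

```python
from typing import Dict


def capacity_views(max_capacity_by_container: Dict[str, float]) -> Dict[str, float]:
    """Contrast a naive sum of Max capacity values with a pool-size estimate."""
    values = list(max_capacity_by_container.values())
    return {
        "naive_sum": sum(values),                         # overstates shared capacity
        "pool_estimate": max(values) if values else 0.0,  # assumed pool size
    }
```

Two containers that each report 20 units give a naive sum of 40 units, although only 20 units exist in the pool.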

Used space

Indicates the amount of storage space in this container that is being used currently.

Ideally, the value of this measure should be low.

Free space

Indicates the amount of storage space in this container that is still unused.

Ideally, the value of this measure should be high.

Space usage

Indicates the percentage of storage space in this container that is in use.

Percent

A value close to 100% is a cause for concern, as it indicates that the container is rapidly running out of free space. You may want to consider allocating more space to the container to avoid loss of data.

Free space

Indicates the percentage of storage space in this container that is available for use.

Percent

A value less than 50% could be a cause for concern, as it indicates that the storage space in the container is being over-utilized. You may want to consider allocating more space to the container to avoid loss of data.
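The two percentage measures are complementary. A minimal sketch, assuming they are derived from the Used space and Max capacity values in the same unit (the test description does not spell out the formula, and the function name is hypothetical):

```python
from typing import Tuple


def space_percentages(used: float, max_capacity: float) -> Tuple[float, float]:
    """Return (space usage %, free space %) for a container."""
    if max_capacity <= 0:
        raise ValueError("max_capacity must be positive")
    used_pct = 100.0 * used / max_capacity
    return used_pct, 100.0 - used_pct
```

A container using 750 of 1000 units reports 75% usage and 25% free, which falls below the 50% free-space level mentioned above.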

Reserved usage

Indicates the amount of space reserved for the use of this container.

If storage space is reserved for a container, then that container is guaranteed the availability of the reserved capacity. For instance, if the maximum usable capacity of a container is 40TB, but 10TB is its reserved capacity, then this 10TB of space will be available for the use of the container at any given point in time. Since containers are thin-provisioned and space is consumed on a first come first serve basis, there is no guarantee that storage will be available to this container once the reserved capacity of 10TB is consumed.
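The 40TB/10TB example above splits a container's capacity into a guaranteed part and a best-effort part. A small sketch with hypothetical names:

```python
from typing import Tuple


def guaranteed_and_best_effort(max_tb: float, reserved_tb: float) -> Tuple[float, float]:
    """Split capacity into the reserved (guaranteed) and unreserved (first come, first served) parts."""
    if reserved_tb > max_tb:
        raise ValueError("reserved capacity cannot exceed maximum capacity")
    return reserved_tb, max_tb - reserved_tb
```

For the example in the text, 10TB is guaranteed and the remaining 30TB is available only if other containers have not consumed it first.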

Logical usage

Indicates the amount of logical space used by this container.

Disk physical usage

Indicates the total amount of physical storage space used in the container.

Unreserved own usage

Indicates the amount of unreserved storage that can be used by the container.

The value of this measure is the difference between the value of the Disk physical usage and Reserved usage measures.
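The stated relationship can be written out as a one-line calculation. The function name is hypothetical, and both inputs are assumed to be in the same unit. The sketch applies the difference exactly as described and does not clamp negative results.

```python
def unreserved_own_usage(disk_physical_usage: float, reserved_usage: float) -> float:
    """Difference between Disk physical usage and Reserved usage, as described above."""
    return disk_physical_usage - reserved_usage
```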