Cluster Shared Volumes Test

A CSV is a disk or pool of disks that is accessible to every node in a Hyper-V cluster as if it were a local logical disk on that node. All nodes in the cluster can connect to the CSV simultaneously. This gives you a common storage location for VM disks and machine configuration, which can be passed to another node in the event of a node failure without the need to manually mount a volume or copy files.

To use a CSV, a Hyper-V VM is configured and its virtual hard disk(s) are created on, or copied to, the CSV. Multiple VHDs can be placed on a single CSV, and these VHDs can belong to multiple VMs running on different nodes in the cluster.

Since multiple VMs access a CSV simultaneously, the high availability of the CSV is crucial to the uptime of those VMs. Administrators should therefore be able to promptly detect when a CSV becomes unavailable, identify the VMs that will be impacted, and bring the CSV back up before VM operations suffer lasting damage. Also, the I/O load on a CSV grows with the number of VMs sharing it. To maximize CSV and VM performance, administrators should make sure that I/O load is always evenly distributed across the CSVs. To keep an eye on the state of and I/O load on each CSV, and to instantly identify unavailable and/or overloaded CSVs, administrators can use the Cluster Shared Volumes test.

This test auto-discovers the CSVs and reports the current state of each one, promptly alerting you if a CSV goes down. Additionally, the test closely monitors the I/O load on each CSV and measures the rate at which every CSV processes that load, thereby pointing to the CSVs that are overloaded or experiencing processing bottlenecks.

Note:

This test is only applicable to Microsoft Hyper-V servers running Windows Server 2012 (or above).

Target of the test : A Hyper-V / Hyper-V VDI server running Windows Server 2012 (or above)

Agent executing the test : An internal agent

Output of the test : One set of results will be reported for every CSV on the server

Configurable parameters for the test
  1. Test period - How often the test should be executed
  2. Host - The host for which the test is to be configured.

Measurements reported by the test

For each measurement below, the description, measurement unit, and interpretation (where available) are listed in that order.

Volume state

Indicates the current state of this CSV.

The values that this measure can report and their corresponding numeric values are listed in the table below:

Measure Value | Description | Numeric Value
Active | In this state, all I/O proceeds normally. | 100
Paused | In this state, the volume pauses any new I/O, and the down-level state is cleaned. | 60
Initializing | In this state, all files are invalidated, and all I/Os except volume I/Os fail. | 50
Draining | In this state, the volume pauses any new I/O, but down-level files are still open and some down-level I/Os might still be in progress. | 30
Down | In this state, the volume pauses any new I/O. The down-level state has already been reapplied. | 10

Note:

By default, this test reports the Measure Values displayed in the table above to indicate CSV state. In the graph of this measure, however, the state is indicated using only the numeric equivalents.
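The mapping between reported states and graphed values can be expressed as a small lookup. The sketch below is a hypothetical Python helper (the names `CSV_STATE_VALUES`, `state_to_graph_value`, and `graph_value_to_state` are illustrative, not part of the product); it simply encodes the state table above, which may be handy when post-processing exported graph data.

```python
# Numeric equivalents of each CSV state, taken from the "Volume state" table.
CSV_STATE_VALUES = {
    "Active": 100,
    "Paused": 60,
    "Initializing": 50,
    "Draining": 30,
    "Down": 10,
}


def state_to_graph_value(state: str) -> int:
    """Return the numeric value plotted for a reported CSV state."""
    try:
        return CSV_STATE_VALUES[state]
    except KeyError:
        raise ValueError(f"unknown CSV state: {state!r}") from None


def graph_value_to_state(value: int) -> str:
    """Translate a plotted numeric value back into the CSV state name."""
    for state, numeric in CSV_STATE_VALUES.items():
        if numeric == value:
            return state
    raise ValueError(f"no CSV state has numeric value {value}")
```

For example, a plotted value of 30 corresponds to the Draining state, and only the value 100 means the CSV is Active.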

Direct read throughput

Indicates the rate at which this CSV reads data from the disk in the Direct I/O Mode or in the Block Level Redirected I/O Mode.

Kbps

These measures include both Direct I/O and Block Level Redirected I/O. In Direct I/O Mode, I/O operations from the application on the cluster node can be sent directly to the storage, bypassing the NTFS or ReFS volume stack. In Block Level Redirected I/O Mode, I/O passes through the local CSVFS proxy file system stack and is written directly to Disk.sys on the coordinator node. As a result, it avoids traversing the NTFS/ReFS file system stack twice.

The technologies that let CSV-enabled volumes operate require one cluster node that's responsible for the coordination of file access. This cluster node is called the coordinator node, with each individual LUN having its own coordinator node.

If the node being monitored is a coordinator node, then these measures include the following:

  • the rate at which this CSV reads/writes (as the case may be) data directly to the storage, in the Direct I/O Mode.
  • the rate at which this CSV reads/writes I/O redirected by all slave nodes in the cluster directly to the storage, in the Block Level Redirected I/O Mode.

If the node being monitored is a non-coordinator node, then these measures include the following:

  • the rate at which this CSV reads/writes (as the case may be) data directly to the storage, in the Direct I/O Mode.
  • the rate at which this CSV reads/writes (as the case may be) I/O to the disk by redirecting the I/O to the coordinator node, in the Block Level Redirected I/O Mode.
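To summarize how the three I/O modes map onto this test's measures: Direct I/O and Block Level Redirected I/O are counted by the "Direct ..." measures, while File System Redirected I/O is counted by the "Redirected ..." measures. The following Python sketch encodes that rule; the `IoMode` enum and the `measure_family` function are hypothetical names used only for illustration.

```python
from enum import Enum


class IoMode(Enum):
    """The three CSV I/O modes described in this section."""
    DIRECT = "Direct I/O"
    BLOCK_REDIRECTED = "Block Level Redirected I/O"
    FS_REDIRECTED = "File System Redirected I/O"


def measure_family(mode: IoMode) -> str:
    """Return which group of per-CSV measures counts I/O issued in `mode`.

    Direct I/O and Block Level Redirected I/O feed the "Direct ..." measures;
    File System Redirected I/O feeds the "Redirected ..." measures. I/O in any
    mode also contributes to the combined measures (Read throughput, IOPS, ...).
    """
    if mode in (IoMode.DIRECT, IoMode.BLOCK_REDIRECTED):
        return "direct"
    return "redirected"


# Example: which measures would reflect block-level redirected reads?
print(measure_family(IoMode.BLOCK_REDIRECTED))  # direct
```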

Direct write throughput

Indicates the rate at which this CSV writes data to the disk in the Direct I/O Mode or in the Block Level Redirected I/O Mode.

Kbps

Total direct throughput

Indicates the rate at which this CSV reads data from and writes data to the disk in the Direct I/O Mode or in the Block Level Redirected I/O Mode.

Kbps

This is a good indicator of the level of direct I/O activity on a CSV. By comparing the value of this measure across CSVs, you can figure out which CSV is experiencing the most direct traffic. If that value is abnormally high, you may want to investigate the reasons for it.
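Comparing a measure across CSVs amounts to picking the CSV with the highest value and judging whether it stands out. The sketch below assumes a hypothetical definition of "abnormally high" (more than `factor` times the median of the other CSVs); the threshold, the function name `busiest_csv`, and the CSV names in the usage note are illustrative assumptions, not values used by the test.

```python
from statistics import median


def busiest_csv(values: dict[str, float], factor: float = 2.0) -> tuple[str, float, bool]:
    """Return (csv_name, value, is_outlier) for the CSV with the highest value.

    Hypothetical rule: the busiest CSV is flagged as an outlier when its value
    exceeds `factor` times the median of the remaining CSVs.
    """
    if not values:
        raise ValueError("no CSVs to compare")
    name = max(values, key=values.get)
    value = values[name]
    others = [v for n, v in values.items() if n != name]
    # With a single CSV there is nothing to compare against.
    baseline = median(others) if others else value
    return name, value, value > factor * baseline
```

For example, `busiest_csv({"Volume1": 1200.0, "Volume2": 5400.0, "Volume3": 1500.0})` returns `("Volume2", 5400.0, True)`, because 5400 exceeds twice the median (1350) of the other two volumes. The same comparison applies to the other measures this test suggests comparing across CSVs, such as Redirected total throughput, Throughput, and Total redirected latency.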

Redirected read throughput

If the node being monitored is a coordinator node, then this measure indicates the rate at which this CSV reads data from the physical disk via NTFS, in the File System Redirected Mode. If the node being monitored is a non-coordinator node, then this measure indicates the rate at which this CSV reads data from the disk by redirecting the I/O to the coordinator node via SMB, in the File System Redirected Mode.

Kbps

As explained earlier, each LUN has its own coordinator node, which is responsible for coordinating file access to that CSV.

That node can be any of your cluster hosts, with each host having an equal chance of being given the job. While this responsibility doesn't come into play often (typically, Hyper-V interacts with its disk files directly, not through a coordinator node), it is important for certain types of actions. One of those actions is copying VHD files to a LUN: Hyper-V transparently redirects the file copy through the coordinator node.

I/O redirection can also occur if slave nodes in a cluster are unable to access the disk directly. In this case, the slave nodes redirect the I/O to the coordinator node using the SMB Client protocol, and the coordinator node processes the redirected I/O it receives using the SMB Server protocol. This redirection is performed in the File System Redirected Mode only. In File System Redirected Mode, I/O on a cluster node is redirected at the top of the CSV pseudo-file system stack over SMB to the disk. This traffic is written to the disk via the NTFS or ReFS file system stack on the coordinator node.

From this, we can conclude that for a CSV attached to a coordinator node, the value of the Redirected read throughput measure represents the rate at which the read I/Os redirected by all slave nodes in the cluster are received and processed by this CSV in the File System Redirected Mode. For a CSV on a slave/non-coordinator node, the value of this measure indicates the rate at which that CSV redirected read I/Os to the coordinator node and read data from the disk. In the case of a slave node, the value of this measure also includes the rate at which VHD files are read from that CSV to be copied to a CSV on the coordinator node.

The value of the Redirected write throughput measure for a CSV attached to a coordinator node will include:

  • the rate at which the write I/Os redirected by all slave nodes in the cluster are received and processed by this CSV in the File System Redirected Mode.
  • the rate at which VHD files are copied to the LUN.

For a slave/non-coordinator node on the other hand, the value of the Redirected write throughput measure will represent only the rate at which that CSV redirects write I/Os to the coordinator node and writes data to the disk, in the File System Redirected Mode.

Redirected write throughput

If the node being monitored is a coordinator node, then this measure indicates the rate at which this CSV writes data to the physical disk via NTFS, in the File System Redirected Mode. If the node being monitored is a non-coordinator node, then this measure indicates the rate at which this CSV writes data to the disk by redirecting the I/O to the coordinator node via SMB, in the File System Redirected Mode.

Kbps

Redirected total throughput

If the node being monitored is a coordinator node, then this measure indicates the rate at which this CSV reads data from and writes data to the physical disk via NTFS, in the File System Redirected Mode. If the node being monitored is a non-coordinator node, then this measure indicates the rate at which this CSV reads data from and writes data to the disk by redirecting the I/O to the coordinator node via SMB, in the File System Redirected Mode.

Kbps

This is the sum of the values of the Redirected read throughput and Redirected write throughput measures.

This is a good indicator of the level of redirected I/O activity on a CSV. By comparing the value of this measure across CSVs, you can figure out which CSV is experiencing the most redirected traffic. If that value is abnormally high, you may want to investigate the reasons for it.

Read throughput

Indicates the rate at which data was read by this CSV, both directly and via redirection - i.e., in the Direct I/O, Block Level Redirected I/O, and File System Redirected I/O Modes.

Kbps

This is the sum of the values of the Direct read throughput and Redirected read throughput measures.

Write throughput

Indicates the rate at which data was written by this CSV, both directly and via redirection - i.e., in the Direct I/O, Block Level Redirected I/O, and File System Redirected I/O Modes.

Kbps

This is the sum of the values of the Direct write throughput and Redirected write throughput measures.

Throughput

Indicates the rate at which data was read and written by this CSV, both directly and via redirection - i.e., in the Direct I/O, Block Level Redirected I/O, and File System Redirected I/O Modes.

Kbps

This is the sum of the values of the Read throughput and Write throughput measures.

This is a good indicator of the level of I/O activity on a CSV. By comparing the value of this measure across CSVs, you can figure out which CSV is experiencing the most traffic. If that value is abnormally high, you may want to investigate the reasons for it.
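The relationships among the throughput measures (each combined measure being the sum of its read/write or direct/redirected components) can be written down directly. The dataclass below is a hypothetical Python sketch whose field names are illustrative; values are assumed to be in the Kbps unit reported by the test. The same structure applies to the IOPS measures, since Read IOPS, Write IOPS, and IOPS are defined as the analogous sums.

```python
from dataclasses import dataclass


@dataclass
class CsvThroughput:
    """One per-CSV throughput sample in Kbps (field names are illustrative)."""
    direct_read: float
    direct_write: float
    redirected_read: float
    redirected_write: float

    @property
    def total_direct(self) -> float:
        # Total direct throughput: Direct I/O plus Block Level Redirected I/O.
        return self.direct_read + self.direct_write

    @property
    def redirected_total(self) -> float:
        # Redirected total throughput: File System Redirected I/O only.
        return self.redirected_read + self.redirected_write

    @property
    def read(self) -> float:
        # Read throughput = Direct read throughput + Redirected read throughput.
        return self.direct_read + self.redirected_read

    @property
    def write(self) -> float:
        # Write throughput = Direct write throughput + Redirected write throughput.
        return self.direct_write + self.redirected_write

    @property
    def total(self) -> float:
        # Throughput = Read throughput + Write throughput.
        return self.read + self.write
```

Because every combined measure is a plain sum, Throughput can be reached either as Read + Write or as Total direct + Redirected total; both paths give the same number.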

Direct read rate

Indicates the rate at which this CSV performs disk reads in the Direct I/O Mode or in the Block Level Redirected I/O Mode.

Reads/Sec

These measures include both Direct I/O and Block Level Redirected I/O. In Direct I/O Mode, I/O operations from the application on the cluster node can be sent directly to the storage, bypassing the NTFS or ReFS volume stack. In Block Level Redirected I/O Mode, I/O passes through the local CSVFS proxy file system stack and is written directly to Disk.sys on the coordinator node. As a result, it avoids traversing the NTFS/ReFS file system stack twice.

If the node being monitored is a coordinator node, then these measures will include the following:

  • the rate at which this CSV performs reads/writes (as the case may be) directly on the storage, in the Direct I/O Mode.
  • the rate at which this CSV services read/write (as the case may be) requests redirected to it by all slave nodes in the cluster, in the Block Level Redirected I/O Mode.

If the node being monitored is a non-coordinator node, then these measures will include the following:

  • the rate at which this CSV performs read/write (as the case may be) operations directly on the storage, in the Direct I/O Mode.
  • the rate at which this CSV performs read/write (as the case may be) operations on the storage by redirecting read/write requests to the coordinator node, in the Block Level Redirected I/O Mode.

Direct write rate

Indicates the rate at which this CSV performs disk writes in the Direct I/O Mode or in the Block Level Redirected I/O Mode.

Writes/Sec

Total direct IOPS

Indicates the rate at which this CSV performs I/O operations (reads and writes) in the Direct I/O Mode or in the Block Level Redirected I/O Mode.

Operations/Sec

This is a good indicator of the level of I/O activity on the CSV in the Direct I/O Mode or in the Block Level Redirected I/O Mode.

Redirected read rate

If the node being monitored is a coordinator node, then this measure indicates the rate at which this CSV reads from the disk via NTFS, in the File System Redirected Mode. If the node being monitored is a non-coordinator node, then this measure indicates the rate at which this CSV reads from the disk by redirecting the read requests to the coordinator node via SMB, in the File System Redirected Mode.

Reads/Sec

Redirected write rate

If the node being monitored is a coordinator node, then this measure indicates the rate at which this CSV writes to the disk via NTFS, in the File System Redirected Mode. If the node being monitored is a non-coordinator node, then this measure indicates the rate at which this CSV writes to the disk by redirecting the write requests to the coordinator node via SMB, in the File System Redirected Mode.

Writes/Sec

Total redirected IOPS

Indicates the rate at which I/O reads and writes were performed by this CSV on the disk via NTFS, in the File System Redirected Mode.

Operations/Sec

This is a good indicator of the level of I/O activity in the File System Redirected Mode.

Read IOPS

Indicates the rate at which read I/O operations were performed on this CSV, both directly and via redirection - i.e., in the Direct I/O, Block Level Redirected I/O, and File System Redirected I/O Modes.

Reads/Sec

The value of this measure is the sum of the values of the Direct read rate and Redirected read rate measures.

Write IOPS

Indicates the rate at which write I/O operations were performed on this CSV, both directly and via redirection - i.e., in the Direct I/O, Block Level Redirected I/O, and File System Redirected I/O Modes.

Writes/Sec

The value of this measure is the sum of the values of the Direct write rate and Redirected write rate measures.

IOPS

Indicates the rate at which read and write I/O operations were performed on this CSV, both directly and via redirection - i.e., in the Direct I/O, Block Level Redirected I/O, and File System Redirected I/O Modes.

Operations/Sec

The value of this measure is the sum of the values of the Read IOPS and Write IOPS measures.

This is a good indicator of the level of I/O activity on the CSV.

Direct read latency

Indicates the average latency between the time a read request is sent to this CSV and when its response is received, in the Direct I/O Mode or Block Level Redirected I/O Mode.

Secs

If I/Os are sent using Block Level Redirected I/O alone, then the value of this measure will be close to the value of the Read latency measure of the SMB Client Share test for that CSV.

Direct write latency

Indicates the average latency between the time a write request is sent to this CSV and when its response is received, in the Direct I/O Mode or Block Level Redirected I/O Mode.

Secs

If I/Os are sent using Block Level Redirected I/O alone, then the value of this measure will be close to the value of the Write latency measure of the SMB Client Share test for that CSV.

Total direct latency

Indicates the latency of I/O operations performed in the Direct I/O Mode or Block Level Redirected I/O Mode since the last data collection.

Secs

A low value is desired for this measure.

Redirected read latency

If the node being monitored is a coordinator node, this measure indicates the average latency between the time a read request redirected by a slave node is received by this CSV in the File System Redirected Mode, and when its response is sent.

If the node being monitored is a non-coordinator node, this measure indicates the average latency between the time a read request is redirected by this CSV to the coordinator node in the File System Redirected Mode, and when its response is received.

Secs

Redirected write latency

If the node being monitored is a coordinator node, this measure indicates the average latency between the time a write request redirected by a slave node is received by this CSV in the File System Redirected Mode, and when its response is sent.

If the node being monitored is a non-coordinator node, this measure indicates the average latency between the time a write request is redirected by this CSV to the coordinator node in the File System Redirected Mode, and when its response is received.

Secs

Total redirected latency

Indicates the latency of I/O operations redirected to this CSV in the File System Redirected Mode since the last data collection.

Secs

Ideally, the value of this measure should be low. Compare the value of this measure across CSVs to know which CSV is taking the maximum time to process I/O in the File System Redirected Mode.

Read latency

Indicates the average latency of read operations performed by this CSV on the disk, both directly and via redirection - i.e., in the Direct I/O, Block Level Redirected I/O, and File System Redirected I/O Modes.

Secs

The value of this measure is the sum of the values of the Direct read latency and Redirected read latency measures.

Write latency

Indicates the average latency of write operations performed by this CSV on the disk, both directly and via redirection - i.e., in the Direct I/O, Block Level Redirected I/O, and File System Redirected I/O Modes.

Secs

The value of this measure is the sum of the values of the Direct write latency and Redirected write latency measures.

Latency

Indicates the average latency of both read and write operations performed by this CSV on the disk, both directly and via redirection - i.e., in the Direct I/O, Block Level Redirected I/O, and File System Redirected I/O Modes.

Secs

The value of this measure is the sum of the values of the Read latency and Write latency measures.

A low value is desired for this measure. A consistent rise in the value of the measure is an indicator of a processing bottleneck on the CSV.
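Spotting "a consistent rise" in latency requires some working definition of consistent. The Python sketch below assumes one: the latency rose at every step across the last `window` samples (ordered oldest first). The function name `consistently_rising` and the default window of five samples are hypothetical choices, not settings of this test.

```python
from typing import Sequence


def consistently_rising(samples: Sequence[float], window: int = 5) -> bool:
    """Return True if latency increased at every step of the last `window` samples.

    `samples` are latency values in seconds, ordered oldest first. Treating
    "consistent" as "strictly increasing over `window` samples" is an
    assumption made for this sketch.
    """
    if window < 2 or len(samples) < window:
        return False  # not enough history to judge a trend
    recent = list(samples[-window:])
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))
```

A single flat or falling sample resets the verdict, so short-lived spikes do not trigger it; a larger window trades faster detection for fewer false alarms.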

Direct read queue length

Indicates the number of read I/Os currently outstanding on this CSV, in the Direct I/O or Block Level Redirected I/O Mode.

Number

Direct write queue length

Indicates the number of write I/Os currently outstanding on this CSV, in the Direct I/O or Block Level Redirected I/O Mode.

Number

Total direct queue length

Indicates the number of read and write I/Os that are currently pending processing on this CSV, in the Direct I/O or Block Level Redirected I/O Mode.

Number

A zero value is desired for this measure. A consistent increase in the value of this measure is a cause for concern, as it indicates that the CSV is having difficulty processing I/O requests in the Direct I/O or Block Level Redirected I/O Mode.

Redirected read queue length

Indicates the number of reads currently outstanding on this CSV, in the File System Redirected Mode.

Number

Redirected write queue length

Indicates the number of writes currently outstanding on this CSV, in the File System Redirected Mode.

Number

Total redirected queue length

Indicates the number of reads and writes currently outstanding on this CSV, in the File System Redirected Mode.

Number

A zero value is desired for this measure. A consistent increase in the value of this measure is a cause for concern, as it indicates that the CSV is having difficulty processing I/O requests in the File System Redirected I/O Mode.

Read queue length

Indicates the count of read operations that are currently outstanding on this CSV, in the Direct I/O, Block Level Redirected I/O, and File System Redirected I/O Modes.

Number

Write queue length

Indicates the count of write operations that are currently outstanding on this CSV, in the Direct I/O, Block Level Redirected I/O, and File System Redirected I/O Modes.

Number

Queue length

Indicates the count of read and write operations that are currently outstanding on this CSV, in the Direct I/O, Block Level Redirected I/O, and File System Redirected I/O Modes.

Number

A zero value is desired for this measure. A consistent increase in the value of this measure is a cause for concern, as it indicates that the CSV is having difficulty processing I/O requests.
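Because a zero queue length is the desired steady state, one way to operationalize "a consistent increase" is to flag a CSV only when its queue never drained during the last few samples and ended higher than it started without ever shrinking. This is a hypothetical Python sketch; the function name `queue_concern` and the default window of three samples are illustrative assumptions, and the same check could be applied to the direct, redirected, or combined queue length measures.

```python
from typing import Sequence


def queue_concern(queue_lengths: Sequence[int], window: int = 3) -> bool:
    """Flag a CSV whose outstanding I/O count keeps building up.

    Hypothetical rule: within the last `window` samples (oldest first) the
    queue never reached zero, never shrank, and ended higher than it started.
    """
    if window < 2 or len(queue_lengths) < window:
        return False  # not enough samples yet
    recent = list(queue_lengths[-window:])
    never_drained = all(q > 0 for q in recent)
    never_shrank = all(b >= a for a, b in zip(recent, recent[1:]))
    return never_drained and never_shrank and recent[-1] > recent[0]
```

For example, a sequence ending in 2, 3, 5 is flagged, whereas a queue that holds steady at 4 or drops back to zero at any point within the window is not.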