vSAN Physical Disk Test

This test auto-discovers the physical disks in the vSAN cluster and reports the type and current health of each disk. This helps administrators quickly identify unhealthy disks and treat them proactively, before they cause prolonged delays in data access for users. The test also reveals the capacity and utilization of each disk, so that abnormalities can be detected before users start complaining of slowdowns and reduced cluster performance. In addition, the test measures the throughput of read and write operations at the physical and vSAN layers of each disk, which shows administrators how well or badly these operations are being performed on the physical disks. Finally, the test reports the time taken to perform read and write operations on each disk, so that administrators can identify any disk that is slow in processing I/O operations.

Note:

This test applies only to the vSAN-enabled clusters managed by the VMware vCenter server.

Target of the test : A VMware vCenter server

Agent deploying the test : An internal agent

Outputs of the test : One set of results for the VMware vCenter server being monitored.

Configurable parameters for the test
Parameter Description

Test Period

How often the test should be executed.

Host

The host for which this test is to be configured.

Port

The port at which the specified host listens.

VC User and VC Password

To connect to vCenter and extract metrics from it, this test should be configured with the name and password of a user with Administrator or Virtual Machine Administrator privileges to vCenter. However, if security constraints prevent you from using the credentials of such users for test configuration, you can configure this test with the credentials of a user with Read-only rights to vCenter. For this purpose, assign the ‘Read-only’ role to a local/domain user on vCenter, and then specify the name and password of this user against the VC User and VC Password text boxes. The steps for assigning this role to a user on vCenter are detailed in the Monitoring VMware Infrastructures chapter.

vCenter servers terminate user sessions based on timeout periods. The default timeout period is 30 minutes. When you stop an agent, the sessions currently in use by the agent remain open until vCenter times them out at the end of this period. If the agent is restarted within the timeout period, it will open a new set of sessions. If you want the eG agent to close already existing sessions on vCenter before it opens new sessions, then, instead of the ‘Read-only’ user, you can optionally configure the VC User and VC Password parameters with the credentials of a user with permissions to View and Stop Sessions on vCenter. For this purpose, you can create a special role on vCenter, grant the View and Stop Sessions privilege (prior to vCenter 4.1, this was called the View and Terminate Sessions privilege) to this role, and then assign the new role to a local/domain user on vCenter. The steps for this are discussed in the Monitoring VMware Infrastructures chapter.

Confirm Password

Confirm the password by retyping it in this text box.

SSL

By default, the vCenter server is SSL-enabled. Accordingly, the SSL flag is set to Yes by default. This indicates that the eG agent will communicate with the vCenter server via HTTPS by default.

Webport

By default, in most virtualized environments, vCenter listens on port 80 (if not SSL-enabled) or on port 443 (if SSL-enabled). Accordingly, while monitoring vCenter, the eG agent connects to port 80 or 443 by default, depending upon whether vCenter is SSL-enabled: if the SSL flag above is set to No, the eG agent connects to vCenter using port 80, and if the SSL flag is set to Yes, the agent-vCenter communication occurs via port 443. To enable this behavior, the Webport parameter carries the value default unless you change it.

In some environments, however, the default ports 80 and 443 might not apply. In such a case, against the Webport parameter, specify the exact port at which vCenter listens in your environment, so that the eG agent communicates with that port for collecting metrics from vCenter.
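The port selection described above can be sketched as a small function. This is only an illustration of the documented rule, not the eG agent's actual code; the function name `resolve_webport` and its arguments are hypothetical.

```python
def resolve_webport(ssl_enabled: bool, webport: str = "default") -> int:
    """Return the port an agent would use to reach vCenter.

    Hypothetical helper: mirrors the documented rule that 'default'
    means 443 when SSL is Yes and 80 when SSL is No.
    """
    value = webport.strip().lower()
    if value == "default":
        return 443 if ssl_enabled else 80
    port = int(value)  # raises ValueError for non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


print(resolve_webport(True))           # SSL flag Yes, Webport left at default
print(resolve_webport(False))          # SSL flag No, Webport left at default
print(resolve_webport(True, "8443"))   # explicit port overrides the default
```

An explicitly configured port always wins over the SSL-based default, which matches the behavior described for environments where vCenter does not listen on 80 or 443.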

DD Frequency

Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency if you wish. To disable the detailed diagnosis capability for this test, specify none against DD Frequency.
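To make the format of this setting concrete, the sketch below parses a DD Frequency string. It assumes that the value is written as normal:abnormal (the two frequencies mentioned under Detailed Diagnosis) and that none disables the capability; the function name `parse_dd_frequency` is hypothetical and is not part of the product.

```python
from typing import Optional, Tuple


def parse_dd_frequency(value: str) -> Optional[Tuple[int, int]]:
    """Parse a DD Frequency string such as '1:1' or 'none'.

    Assumption: the first number is the frequency used when the test
    is in a normal state, the second when a problem is detected.
    Returns None when detailed diagnosis is disabled.
    """
    text = value.strip().lower()
    if text == "none":
        return None
    normal, abnormal = text.split(":")  # ValueError unless exactly two parts
    return int(normal), int(abnormal)


print(parse_dd_frequency("1:1"))
print(parse_dd_frequency("none"))
```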

Detailed Diagnosis

To make diagnosis more efficient and accurate, the eG Enterprise suite embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, click on the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability.
  • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
Measurements made by the test
Measurement Description Measurement Unit Interpretation

Drive type

Indicates the drive type of this physical disk.

 

The values this measure reports and their numeric equivalents are provided in the table below:

Measure Value Numeric Value
FLASH 0
HDD 1

Note:

Typically, this measure reports one of the Measure Values listed in the table above. In the graph of this measure, however, the drive type of a physical disk is indicated by its numeric equivalent only.

Health

Indicates the current health of this physical disk.

 

The values this measure reports and their numeric equivalents are provided in the table below:

Measure Value Numeric Value
Healthy 0
Disk health is unknown 1
Permanent disk failure 2
Permanent disk loss 3
Disk decommissioned 4
Disk performance degraded, and components are evacuating 5
Disk performance degraded, and component evacuation failed 6
Disk performance degraded, and component evacuation is stuck 7
Disk performance degraded, and dying disk is ok to unmount 8

Note:

Typically, this measure reports one of the Measure Values listed in the table above. In the graph of this measure, however, the health of a physical disk is indicated by its numeric equivalent only.
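Because graphs show only the numeric equivalents, it can help to translate them back into the states listed in the table above. The following is a minimal sketch of such a lookup; the names `HEALTH_STATES`, `describe_health` and `needs_attention` are hypothetical, and treating every non-zero value as needing attention is an assumption made for illustration.

```python
# Numeric values and labels taken from the Health table above.
HEALTH_STATES = {
    0: "Healthy",
    1: "Disk health is unknown",
    2: "Permanent disk failure",
    3: "Permanent disk loss",
    4: "Disk decommissioned",
    5: "Disk performance degraded, and components are evacuating",
    6: "Disk performance degraded, and component evacuation failed",
    7: "Disk performance degraded, and component evacuation is stuck",
    8: "Disk performance degraded, and dying disk is ok to unmount",
}


def describe_health(numeric_value: int) -> str:
    """Translate a graphed numeric value back to its Measure Value."""
    return HEALTH_STATES.get(numeric_value, f"Unrecognized value {numeric_value}")


def needs_attention(numeric_value: int) -> bool:
    """Assumption: anything other than 0 (Healthy) deserves a closer look."""
    return numeric_value != 0


for value in (0, 2, 7):
    print(value, describe_health(value), needs_attention(value))
```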

Capacity

Indicates the total capacity of this disk.

GB

 

Used capacity

Indicates the amount of space utilized on this disk.

GB

 

Used utilization

Indicates the percentage of space utilized on this disk.

Percentage

 

Reserved capacity

Indicates the amount of space that is reserved on this disk for Thick Provisioning.

GB

Some of the objects on the vSAN datastore are assigned a storage policy with the Object Space Reservation (OSR) rule set to Thick Provisioning. vSAN reserves the configured capacity for objects with OSR. This capacity is commonly used for an important workload that dynamically consumes storage capacity.

Reserved utilization

Indicates the percentage of space that is reserved on this disk for Thick Provisioning.

Percentage
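As a worked example of how the capacity measures relate, the sketch below derives both percentage measures from the GB values. It assumes that Used utilization and Reserved utilization are each expressed relative to the Capacity measure of the same disk; the function name and the sample figures are hypothetical.

```python
def utilization_pct(part_gb: float, capacity_gb: float) -> float:
    """Express part_gb as a percentage of capacity_gb.

    Assumption: both percentage measures use the disk's total
    Capacity as the denominator.
    """
    if capacity_gb <= 0:
        raise ValueError("capacity must be positive")
    return round(100.0 * part_gb / capacity_gb, 2)


# Hypothetical disk: 1000 GB capacity, 420 GB used, 150 GB reserved.
capacity_gb = 1000.0
used_gb = 420.0
reserved_gb = 150.0

print(utilization_pct(used_gb, capacity_gb))      # Used utilization
print(utilization_pct(reserved_gb, capacity_gb))  # Reserved utilization
```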

 

Physical layer read IOPS

Indicates the number of read IO operations performed on the Physical layer of this disk.

IOPS

Compare the value of this measure across disks to know which disk handled the maximum number of read requests and which handled the least. A very wide gap between the two indicates that read load is not balanced evenly across disks.

Physical layer write IOPS

Indicates the number of write IO operations performed on the Physical layer of this disk.

IOPS

Compare the value of this measure across disks to know which disk handled the maximum number of write requests and which handled the least. A very wide gap between the two indicates that write load is not balanced evenly across disks.
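The cross-disk comparison suggested for the read and write IOPS measures can be sketched as follows. The disk names and values are made up for illustration, and the function `iops_spread` is hypothetical; what counts as a very wide gap is left to the administrator, since no threshold is defined here.

```python
from typing import Dict, Tuple


def iops_spread(iops_by_disk: Dict[str, float]) -> Tuple[str, str, float]:
    """Return (busiest disk, least busy disk, gap in IOPS)."""
    busiest = max(iops_by_disk, key=iops_by_disk.get)
    idlest = min(iops_by_disk, key=iops_by_disk.get)
    gap = iops_by_disk[busiest] - iops_by_disk[idlest]
    return busiest, idlest, gap


# Hypothetical read IOPS readings for three disks.
sample = {"disk-a": 1200.0, "disk-b": 950.0, "disk-c": 80.0}
busiest, idlest, gap = iops_spread(sample)
print(f"busiest={busiest} idlest={idlest} gap={gap} IOPS")
```

The same function can be applied to write IOPS readings, so that the read and write paths are compared separately.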

Physical layer read throughput

Indicates the rate at which the data was read from the Physical layer of this disk.

MB/sec

A high value is desired for this measure. A very low value is a cause for concern, as it indicates that the disk is handling read requests poorly.

Physical layer write throughput

Indicates the rate at which the data was written on the Physical layer of this disk.

MB/sec

A high value is desired for this measure. A very low value is a cause for concern, as it indicates that the disk is handling write requests poorly.

Physical layer read latency

Indicates the time taken for performing read operations on the Physical layer of this disk.

Seconds

Ideally, this value should be low. If not, it implies that the disk is slow in processing the read requests at the Physical layer.

Physical layer write latency

Indicates the time taken for performing write operations on the Physical layer of this disk.

Seconds

 

Guest IO latency

Indicates the time taken for performing IO operations on the guests that share this disk.

Seconds

 

Device IO latency

Indicates the time taken for performing IO operations on the devices that share this disk.

Seconds

 

vSAN layer read IOPS

Indicates the number of read IO operations performed on the vSAN layer of this disk.

IOPS

 

vSAN layer write IOPS

Indicates the number of write IO operations performed on the vSAN layer of this disk.

IOPS

 

vSAN layer read latency

Indicates the time taken for performing read operations on the vSAN layer of this disk.

Seconds

 

vSAN layer write latency

Indicates the time taken for performing write operations on the vSAN layer of this disk.

Seconds