vSAN Health Test

In a vSAN-enabled cluster, you can use the vSAN health checks to monitor the status of cluster components, diagnose issues, and troubleshoot problems. The health checks cover hardware compatibility, network configuration and operation, advanced vSAN configuration options, storage device health, and virtual machine objects. The vSAN health checks are divided into categories, and each category contains individual health checks.

Health Check Category Description

Hardware Compatibility

Monitor the cluster components to ensure that they are using supported hardware, software, and drivers.

Performance Service

Monitor the health of vSAN performance service.

Network

Monitor vSAN network health.

Physical disk

Monitor the health of physical devices in the vSAN cluster.

Data

Monitor vSAN data health.

Cluster

Monitor vSAN cluster health.

Capacity utilization

Monitor vSAN cluster capacity.

Online health

Monitor vSAN cluster health and send the health data to VMware’s analytics backend system for advanced analysis. You must participate in the Customer Experience Improvement Program to use online health checks.

vSAN Build Recommendation

Monitor vSAN build recommendations for vSphere Update Manager.

vSAN iSCSI target service

Monitor the iSCSI target service, including the network configuration and runtime status.

Encryption

Monitor vSAN encryption health.

Stretched cluster

Monitor the health of a stretched cluster, if applicable.

Hyperconverged cluster configuration compliance

Monitor the status of hosts and settings configured through the Quickstart workflow.

The health checks in the table above are executed periodically on the vSAN cluster to verify its health and performance. By continuously tracking these health checks, administrators can determine the current health of the cluster and quickly identify alerts on unhealthy conditions. Administrators can then act first on the health check alerts that indicate failure conditions or hardware incompatibility. To help administrators in this regard, eG Enterprise offers the vSAN Health test.

This test monitors the health checks under every health check category on the vSAN cluster and reports, for each category, the number of checks in each state. This helps administrators proactively identify failures and warnings raised by the health checks and reduces the effort involved in troubleshooting failure conditions.

Note:

This test is applicable only to vSAN-enabled clusters managed by the VMware vCenter server.

Target of the test : A VMware vCenter server

Agent deploying the test : An internal agent

Outputs of the test : One set of results for each vSAN cluster:health check category combination.
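
The output scheme above (one set of results per vSAN cluster:health check category combination, with a count for each state) can be illustrated with a small sketch. This is not how eG Enterprise implements the test: the input tuple layout, the function name count_states, the sample cluster and check names, and the choice to count unrecognized states as Unknown are all assumptions made for illustration.

```python
from collections import defaultdict

# The six states that this test reports as measures.
STATES = ("Passed", "Skipped", "Info", "Warning", "Failed", "Unknown")


def count_states(results):
    """Count health checks per state for each cluster:category descriptor.

    `results` is an iterable of (cluster, category, check_name, state)
    tuples; this layout is hypothetical and chosen only for the sketch.
    """
    counts = defaultdict(lambda: dict.fromkeys(STATES, 0))
    for cluster, category, _check_name, state in results:
        # Assumption: any state outside the known six is counted as Unknown.
        key = state if state in STATES else "Unknown"
        counts[f"{cluster}:{category}"][key] += 1
    return dict(counts)


# Hypothetical sample data, not real vSAN output.
sample = [
    ("ClusterA", "Network", "Connectivity check", "Passed"),
    ("ClusterA", "Network", "MTU check", "Warning"),
    ("ClusterA", "Physical disk", "Disk capacity", "Failed"),
    ("ClusterA", "Network", "Some new check", "Yellow"),
]
summary = count_states(sample)
print(summary["ClusterA:Network"])
```

Each key in the returned dictionary corresponds to one descriptor of the test, and each value holds the six counts that map to the measures listed later in this topic.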

Configurable parameters for the test
Parameter Description

Test Period

How often should the test be executed.

Host

The host for which this test is to be configured.

Port

The port at which the specified host listens.

VC User and VC Password

To connect to vCenter and extract metrics from it, this test should be configured with the name and password of a user with Administrator or Virtual Machine Administrator privileges to vCenter. However, if security constraints prevent you from using the credentials of such a user, you can configure this test with the credentials of a user with Read-only rights to vCenter. To do so, assign the ‘Read-only’ role to a local/domain user on vCenter, and then specify the name and password of this user against the VC User and VC Password text boxes. The steps for assigning this role to a user on vCenter are detailed in the Creating a Special Role on vCenter and Assigning the Role to a New User topic under the eG Monitoring Capabilities -> Virtualization and Containers -> Virtualization Technologies -> VMware vSphere ESX node sequence in the eG Enterprise documentation portal Monitoring VMware Infrastructures.

vCenter servers terminate user sessions based on timeout periods. The default timeout period is 30 minutes. When you stop an agent, the sessions currently in use by the agent remain open until vCenter times them out. If the agent is restarted within the timeout period, it opens a new set of sessions. If you want the eG agent to close existing sessions on vCenter before it opens new ones, then, instead of the ‘Read-only’ user, you can optionally configure the VC User and VC Password parameters with the credentials of a user with permissions to View and Stop Sessions on vCenter. To do so, create a special role on vCenter, grant the View and Stop Sessions privilege (prior to vCenter 4.1, this was called the View and Terminate Sessions privilege) to this role, and then assign the new role to a local/domain user on vCenter. The steps for assigning this role to a user on vCenter are detailed in the Creating a Special Role on vCenter and Assigning the Role to a New User topic under the eG Monitoring Capabilities -> Virtualization and Containers -> Virtualization Technologies -> VMware vSphere ESX node sequence in the eG Enterprise documentation portal Monitoring VMware Infrastructures.

Confirm Password

Confirm the password by retyping it in this text box.

SSL

By default, the vCenter server is SSL-enabled. Accordingly, the SSL flag is set to Yes by default. This indicates that the eG agent will communicate with the vCenter server via HTTPS by default.

Webport

By default, in most virtualized environments, vCenter listens on port 80 (if not SSL-enabled) or on port 443 (if SSL-enabled). This means that while monitoring vCenter, the eG agent connects to port 80 or 443 by default, depending on whether vCenter is SSL-enabled: if the SSL flag above is set to No, the agent connects to vCenter using port 80, and if the SSL flag is set to Yes, the agent communicates with vCenter via port 443. Accordingly, the Webport parameter is set to default by default.

In some environments, however, the default ports 80 or 443 might not apply. In such cases, specify against the Webport parameter the exact port at which vCenter listens in your environment, so that the eG agent communicates with that port to collect metrics from vCenter.
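
The way the SSL and Webport parameters combine can be sketched as follows. This is a minimal illustration of the rule described above, not the eG agent's code; the function name resolve_webport and its argument names are hypothetical.

```python
def resolve_webport(ssl_enabled, webport="default"):
    """Return the port to use for communicating with vCenter.

    Mirrors the rule in the text: "default" maps to 443 when SSL is
    enabled and to 80 otherwise; any other value is taken as an
    explicit port number.
    """
    text = str(webport).strip().lower()
    if text == "default":
        return 443 if ssl_enabled else 80
    port = int(text)  # raises ValueError for non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


print(resolve_webport(True))          # SSL flag set to Yes, Webport left at default
print(resolve_webport(False))         # SSL flag set to No, Webport left at default
print(resolve_webport(True, "8443"))  # explicit port overrides the default
```

Validating the range up front is a design choice of the sketch: a mistyped port then fails at configuration time rather than showing up later as a connection error.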

DDForPassedandInfo

By default, this flag is set to No, indicating that the test does not generate detailed diagnosis for the Passed and Info measures. If you want the test to generate and store detailed diagnosis for the Passed and Info measures, set the DDForPassedandInfo flag to Yes.

DD Frequency

Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency if you wish. To disable the detailed diagnosis capability for this test, specify none against DD Frequency.

Detailed Diagnosis

To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability.
  • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
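
The DD Frequency value described earlier can be read as a normal:abnormal pair. The sketch below shows one plausible reading, in which each number is an interval counted in test runs and a value of 0 (or none) means no detailed diagnosis. The function names and this interval interpretation are assumptions for illustration, not eG Enterprise's actual logic.

```python
def parse_dd_frequency(value):
    """Parse a DD Frequency string such as "1:1" or "none".

    Returns None when detailed diagnosis is disabled, otherwise a
    (normal, abnormal) tuple of integers.
    """
    text = value.strip().lower()
    if text == "none":
        return None
    normal, abnormal = (int(part) for part in text.split(":"))
    return normal, abnormal


def should_generate_dd(frequency, run_number, problem_detected):
    """Decide whether a given test run should produce detailed diagnosis.

    Assumption: the abnormal interval applies when a problem is
    detected, the normal interval otherwise, and 0 disables generation.
    """
    if frequency is None:
        return False
    normal, abnormal = frequency
    interval = abnormal if problem_detected else normal
    return interval > 0 and run_number % interval == 0


default = parse_dd_frequency("1:1")
print(should_generate_dd(default, run_number=7, problem_detected=False))
```

Under this reading, the default of 1:1 produces detailed measures on every run, whether or not a problem is detected, which matches the description of the default given for the DD Frequency parameter.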
Measurements made by the test
Measurement Description Measurement Unit Interpretation

Passed

Indicates the number of tests under this health check category that returned the Passed state.

Number

The detailed diagnosis of this measure, if enabled using the DDForPassedandInfo flag, lists the names of the tests under this health check category that returned the Passed state, along with the detailed message and health status of each test.

Skipped

Indicates the number of tests under this health check category that returned the Skipped state.

Number

The detailed diagnosis of this measure lists the names of the tests under this health check category that returned the Skipped state, along with the detailed message and health status of each test.

Info

Indicates the number of tests under this health check category that returned the Info state.

Number

The detailed diagnosis of this measure, if enabled using the DDForPassedandInfo flag, lists the names of the tests under this health check category that returned the Info state, along with the detailed message and health status of each test.

Warning

Indicates the number of tests under this health check category that returned the Warning state.

Number

The detailed diagnosis of this measure lists the names of the tests under this health check category that returned the Warning state, along with the detailed message and health status of each test.

Failed

Indicates the number of tests under this health check category that returned the Failed state.

Number

The detailed diagnosis of this measure lists the names of the tests under this health check category that returned the Failed state, along with the detailed message and health status of each test.

Unknown

Indicates the number of tests under this health check category that returned the Unknown state.

Number

The detailed diagnosis of this measure lists the names of the tests under this health check category that returned the Unknown state, along with the detailed message and health status of each test.