Nutanix NCC Health Checks Test

Nutanix Cluster Check (NCC) is a built-in utility provided by Nutanix to help administrators assess and maintain the health and performance of Nutanix clusters. NCC health checks are a set of diagnostic tests and best practice checks that examine various components, configurations, and software aspects of a cluster to identify potential issues. These checks are essential for preventive maintenance: they can identify issues before they become critical problems, helping to prevent downtime and performance degradation.

This test monitors the NCC checks and reports the number of checks that passed, raised warnings, resulted in errors, failed, and so on. This information helps administrators identify problem areas and act promptly when a problem is detected.
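The kind of tally this test reports can be illustrated with a minimal sketch. The record layout (a `status` field) and the status labels below are assumptions made for illustration only; they do not represent the actual Nutanix API response or the eG agent's implementation.

```python
from collections import Counter

# Hypothetical status labels, chosen to mirror the measures this test reports.
STATUSES = ("passed", "warning", "error", "failed", "off")


def summarize_checks(results):
    """Count check results per status.

    `results` is an iterable of dicts with a 'status' key (an assumed layout).
    """
    counts = Counter(r["status"].lower() for r in results)
    # Report every known status, even when no check ended in that state.
    summary = {status: counts.get(status, 0) for status in STATUSES}
    summary["all"] = sum(counts.values())
    return summary


# Made-up sample data, for illustration only.
sample = [
    {"name": "check_a", "status": "PASSED"},
    {"name": "check_b", "status": "PASSED"},
    {"name": "check_c", "status": "WARNING"},
    {"name": "check_d", "status": "FAILED"},
]

print(summarize_checks(sample))
```

Reporting a zero for statuses that did not occur keeps the set of measures stable from one run to the next, which makes it easier to compare results over time.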

Target of the test : A Nutanix Prism Central

Agent deploying the test : A remote agent

Outputs of the test : One set of results for the Prism Central node being monitored.

Configurable parameters for the test
Parameter Description

Test Period

How often the test should be executed. By default, this is 24 hours.

Host

The host for which the test is to be configured.

Port

The port at which the specified host listens. By default, this is 9440.

Nutanix Prism Central User, Nutanix Prism Central Password and Confirm Password

To connect to Nutanix Prism Central and collect metrics from it, the eG agent should be configured with the credentials of a Prism Central user with the Viewer role. The steps for creating such a user are detailed in the Pre-requisites for monitoring Nutanix Prism Central.

Confirm the password by retyping it in the Confirm Password text box.

SSL

The Nutanix Prism Central server is SSL-enabled by default. Accordingly, the SSL flag is set to Yes by default, which means that the eG agent communicates with the Prism Central server via HTTPS.

WebPort

By default, the Nutanix Prism Central server listens on port 9440. Accordingly, the eG agent connects to port 9440 by default when collecting metrics from the Prism Central server.

DD Frequency

Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency, if you so desire. Also, if you intend to disable the detailed diagnosis capability for this test, you can do so by specifying none against DD frequency.

Detailed Diagnosis

To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability.
  • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
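The connection and detailed diagnosis parameters described above can be modeled with a short sketch. The class and method names are hypothetical, and the reading of the last condition (neither the normal nor the abnormal frequency may be 0) is an interpretation of the text, not a statement of how the eG agent is implemented.

```python
from dataclasses import dataclass


@dataclass
class NccTestParams:
    """Hypothetical container for a few of the test parameters described above."""
    host: str
    web_port: int = 9440        # documented default for WebPort
    ssl: bool = True            # documented default: SSL flag set to Yes
    dd_frequency: str = "1:1"   # documented default for DD Frequency

    def base_url(self) -> str:
        # HTTPS when the SSL flag is Yes, plain HTTP otherwise.
        scheme = "https" if self.ssl else "http"
        return f"{scheme}://{self.host}:{self.web_port}"

    def dd_frequencies(self):
        """Return (normal, abnormal), or None when DD Frequency is 'none'."""
        value = self.dd_frequency.strip().lower()
        if value == "none":
            return None
        normal, abnormal = (int(part) for part in value.split(":"))
        return normal, abnormal

    def dd_enabled(self) -> bool:
        # Assumption: detailed diagnosis needs both frequencies to be nonzero.
        freqs = self.dd_frequencies()
        return freqs is not None and all(f != 0 for f in freqs)


params = NccTestParams(host="prism-central.example.com")
print(params.base_url())
print(params.dd_frequencies(), params.dd_enabled())
```

The host name used here is a placeholder. The license condition in the first bullet is not modeled, since it depends on the eG manager rather than on the test parameters.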
Measurements made by the test
Measurement Description Measurement Unit Interpretation

All Checks

Indicates the total number of NCC checks.

Number

Administrators should track the total number of checks, along with how many of them passed, failed, and so on.

Passed

Indicates the number of NCC checks that passed.

Number

The number of checks passed should be close to the value of the All Checks measure (that is, close to 100% of all checks), within some tolerance limit, for the system to be deemed fit for operation.

Warnings

Indicates the total number of NCC checks resulting in warnings.

Number

If the checks report a very high number of warnings, these need to be investigated before they turn into full-scale errors.

The detailed diagnosis of this measure lists the additional metrics including Check ID, Name, Entity Type, UUID and Entity Name.

Errors

Indicates the total number of NCC checks resulting in errors.

Number

Administrators need to review whether any critical checks have resulted in errors. If such checks are critical, the system may not work as expected.

The detailed diagnosis of this measure lists the additional metrics including Check ID, Name, Entity Type, UUID and Entity Name.

Failures

Indicates the total number of NCC checks resulting in failures.

Number

If checks have failed, administrators should investigate why they failed, fix the underlying issues, and run the checks again.

The detailed diagnosis of this measure lists the additional metrics including Check ID, Name, Entity Type, UUID and Entity Name.

Off

Indicates the total number of NCC checks that are turned off.

Number

If checks have been turned off, administrators should review the reason why they were switched off.

Scheduled

Indicates the total number of NCC checks scheduled.

Number

Administrators should be aware of the scheduled checks and when these are scheduled.

Not Scheduled

Indicates the total number of NCC checks not scheduled.

Number

Administrators should be aware of the checks that are not scheduled and the reason why they are not scheduled.

Event Triggered

Indicates the total number of NCC checks which have resulted in some event being triggered.

Number

If checks have triggered events, administrators need to keep an eye on those events.
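The interpretation of the Passed measure (close to 100% within some tolerance limit) can be expressed as a simple calculation. The sketch below is illustrative only: the function names and the 5% default tolerance are assumptions, and it assumes the pass rate is computed against the All Checks count.

```python
def pass_rate(passed, all_checks):
    """Return the percentage of checks that passed, or None if there are no checks."""
    if all_checks <= 0:
        return None
    return 100.0 * passed / all_checks


def within_tolerance(passed, all_checks, tolerance_pct=5.0):
    """True when the pass rate is no more than `tolerance_pct` below 100%.

    The 5% default is a placeholder; choose a limit that suits the environment.
    """
    rate = pass_rate(passed, all_checks)
    return rate is not None and rate >= 100.0 - tolerance_pct


# Made-up values standing in for the Passed and All Checks measures.
print(pass_rate(190, 200))
print(within_tolerance(190, 200))
```

Whether checks that are turned off should be excluded from the denominator is a policy decision that this sketch does not address.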