Nutanix NCC Health Checks Test
Nutanix Cluster Check (NCC) is a built-in Nutanix utility that helps administrators assess and maintain the health and performance of Nutanix clusters. NCC health checks are a set of diagnostic tests and best-practice checks that examine various components, configurations, and software aspects of a cluster to identify potential issues and confirm that the cluster is stable and functioning optimally. Because they can identify issues before these become critical problems, NCC health checks are essential for preventive maintenance and help prevent downtime and performance degradation.
This test monitors the NCC checks and collects the number of checks that passed, raised warnings, returned errors, failed, and so on. This information helps administrators pinpoint problem areas and act promptly when a problem occurs.
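The kind of aggregation this test performs can be illustrated with a minimal sketch that counts check results by status. It assumes each NCC check result is available as a simple status string; the status labels (`pass`, `warning`, `error`, `fail`) and the `tally_check_results` helper are hypothetical and are not taken from the NCC or eG Enterprise APIs. Properties such as Off, Scheduled, or Event Triggered describe the checks themselves rather than their results, so they are not counted here.

```python
from collections import Counter

# Hypothetical status labels; the actual values reported by NCC or
# Prism Central may differ.
STATUS_TO_MEASURE = {
    "pass": "Passed",
    "warning": "Warnings",
    "error": "Errors",
    "fail": "Failures",
}


def tally_check_results(statuses):
    """Count check results per status and add an 'All Checks' total.

    `statuses` is an iterable of status strings, one per NCC check result.
    Unrecognized statuses are counted only in 'All Checks'.
    """
    counts = Counter()
    for status in statuses:
        counts["All Checks"] += 1
        measure = STATUS_TO_MEASURE.get(status.lower())
        if measure is not None:
            counts[measure] += 1
    # Report every measure, including those whose count is zero.
    return {m: counts[m] for m in ["All Checks", *STATUS_TO_MEASURE.values()]}


results = ["PASS", "pass", "warning", "fail", "pass", "error"]
print(tally_check_results(results))
# {'All Checks': 6, 'Passed': 3, 'Warnings': 1, 'Errors': 1, 'Failures': 1}
```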
Target of the test: A Nutanix Prism Central server
Agent deploying the test: A remote agent
Outputs of the test: One set of results for the Prism Central node being monitored.
Parameter | Description |
---|---|
Test Period | How often the test should be executed. By default, this is 24 hours. |
Host | The host for which the test is to be configured. |
Port | The port at which the specified host listens. By default, this is 9440. |
Nutanix Prism Central User, Nutanix Prism Central Password and Confirm Password | To connect to Nutanix Prism Central and collect metrics from it, the eG agent should be configured with the credentials of a Prism Central user with the Viewer role. The steps for creating such a user are detailed in the Pre-requisites for monitoring Nutanix Prism Central section. Confirm the Nutanix Prism Central password by retyping it in the Confirm Password text box. |
SSL | By default, the Nutanix Prism Central server is SSL-enabled. Accordingly, the SSL flag is set to Yes by default. This indicates that, by default, the eG agent communicates with the Prism Central server via HTTPS. |
WebPort | By default, the Nutanix Prism Central server listens on port 9440. This implies that, while monitoring the Prism Central server, the eG agent connects to port 9440. |
DD Frequency | Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency if you so desire. To disable the detailed diagnosis capability for this test, specify none against DD Frequency. |
Detailed Diagnosis | To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option. The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled: |
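The parameters in the table above can be modeled as a small configuration object. This is a minimal sketch, not eG Enterprise's implementation: the `NccTestConfig` class and its field names are hypothetical, the URL is only the scheme/host/port prefix (no Prism Central API path is assumed), and the DD Frequency parsing assumes that "N:M" means detailed diagnosis runs every Nth test run when no problem is detected and every Mth run when one is, which is consistent with the documented meaning of the default 1:1.

```python
from dataclasses import dataclass


@dataclass
class NccTestConfig:
    """Hypothetical container for the test parameters."""
    host: str
    port: int = 9440             # documented default port
    ssl: bool = True             # SSL flag defaults to Yes
    test_period_hours: int = 24  # documented default test period
    dd_frequency: str = "1:1"    # "none" disables detailed diagnosis

    def base_url(self) -> str:
        # HTTPS when the SSL flag is set, plain HTTP otherwise.
        scheme = "https" if self.ssl else "http"
        return f"{scheme}://{self.host}:{self.port}"

    def dd_intervals(self):
        """Parse DD Frequency into (normal, abnormal) run intervals.

        Assumption: the first number applies to runs with no detected
        problem, the second to runs in which a problem is detected.
        Returns None when detailed diagnosis is disabled.
        """
        if self.dd_frequency.strip().lower() == "none":
            return None
        normal, abnormal = (int(part) for part in self.dd_frequency.split(":"))
        return normal, abnormal

    def should_collect_dd(self, run_number: int, problem_detected: bool) -> bool:
        """Decide whether detailed diagnosis is generated on this run."""
        intervals = self.dd_intervals()
        if intervals is None:
            return False
        normal, abnormal = intervals
        interval = abnormal if problem_detected else normal
        return interval > 0 and run_number % interval == 0


cfg = NccTestConfig(host="pc.example.com")
print(cfg.base_url())                   # https://pc.example.com:9440
print(cfg.should_collect_dd(5, False))  # True with the default "1:1"
print(NccTestConfig(host="pc.example.com", dd_frequency="none").dd_intervals())  # None
```

With a hypothetical setting of 3:1, this sketch would generate detailed diagnosis on every third run while the system is healthy, but on every run once a problem is detected.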
Measurement | Description | Measurement Unit | Interpretation |
---|---|---|---|
All Checks | Indicates the total number of NCC checks. | Number | Administrators should keep track of all checks and of how many of them passed, failed, and so on. |
Passed | Indicates the number of NCC checks that passed. | Number | For the system to be deemed fit for operation, the number of passed checks should be close to 100% of all checks, within some tolerance limit. |
Warnings | Indicates the total number of NCC checks that resulted in warnings. | Number | A very high number of warnings should be investigated before the underlying issues turn into full-scale errors. The detailed diagnosis of this measure lists additional metrics, including Check ID, Name, Entity Type, UUID, and Entity Name. |
Errors | Indicates the total number of NCC checks that resulted in errors. | Number | Administrators need to review whether any critical checks have resulted in errors. If such checks are critical, the system may not work properly. The detailed diagnosis of this measure lists additional metrics, including Check ID, Name, Entity Type, UUID, and Entity Name. |
Failures | Indicates the total number of NCC checks that resulted in failures. | Number | If any checks have failed, investigate why they failed, fix the underlying issues, and run the checks again. The detailed diagnosis of this measure lists additional metrics, including Check ID, Name, Entity Type, UUID, and Entity Name. |
Off | Indicates the total number of NCC checks that are turned off. | Number | If any checks have been turned off, review the reason for switching them off. |
Scheduled | Indicates the total number of NCC checks that are scheduled. | Number | Administrators should be aware of the scheduled checks and when they are scheduled to run. |
Not Scheduled | Indicates the total number of NCC checks that are not scheduled. | Number | Administrators should be aware of the checks that are not scheduled and the reason why they are not scheduled. |
Event Triggered | Indicates the total number of NCC checks that have triggered an event. | Number | If checks have triggered events, administrators need to keep an eye on those events. |
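The interpretations above can be turned into a simple review rule. The sketch below computes the pass rate as Passed divided by All Checks and flags results that need attention. The `needs_attention` helper and its 5% `tolerance_pct` default are hypothetical illustrations of the "some tolerance limit" mentioned for the Passed measure, not thresholds defined by eG Enterprise or Nutanix; choose a tolerance that suits your environment.

```python
def pass_rate(passed: int, all_checks: int) -> float:
    """Percentage of NCC checks that passed (0.0 when no checks ran)."""
    if all_checks <= 0:
        return 0.0
    return 100.0 * passed / all_checks


def needs_attention(measures: dict, tolerance_pct: float = 5.0) -> list:
    """Return human-readable reasons why the results need review.

    `tolerance_pct` is a hypothetical threshold: how far below 100% the
    pass rate may fall before it is flagged.
    """
    reasons = []
    threshold = 100.0 - tolerance_pct
    rate = pass_rate(measures.get("Passed", 0), measures.get("All Checks", 0))
    if rate < threshold:
        reasons.append(f"pass rate {rate:.1f}% below {threshold:.1f}%")
    # Any errors, failures, or disabled checks warrant a closer look.
    for key in ("Errors", "Failures", "Off"):
        if measures.get(key, 0) > 0:
            reasons.append(f"{key}: {measures[key]}")
    return reasons


sample = {"All Checks": 200, "Passed": 185, "Warnings": 10,
          "Errors": 3, "Failures": 2, "Off": 0}
print(needs_attention(sample))
# ['pass rate 92.5% below 95.0%', 'Errors: 3', 'Failures: 2']
```

Warnings are deliberately not flagged by count here, because what counts as a "very high" number of warnings depends on the cluster.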