Proxmox Cluster Nodes Test

The test auto-discovers the nodes in the target Proxmox Cluster and reports the current status of each node. It also reports the total CPU and memory capacity allocated to each node, and pinpoints the nodes on which memory and CPU resources are over-utilized. This way, the test warns administrators of probable resource contention on a node.

Target of the test : A Proxmox Cluster

Agent deploying the test : An internal/remote agent

Outputs of the test : One set of results for each node in the Proxmox Cluster being monitored

Configurable parameters for the test
Parameter	Description
Test period	How often the test should be executed.
Host	The IP address of the host for which this test is to be configured.
Port	The port at which the specified host listens. By default, this will be NULL.
DD Frequency	Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency, if you so desire. Also, if you intend to disable the detailed diagnosis capability for this test, you can do so by specifying none against DD frequency.
Detailed Diagnosis	To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option. The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled: (a) the eG manager license should allow the detailed diagnosis capability; (b) both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.

Measurements made by the test

Measurement

Description

Measurement Unit

Interpretation

Status

Indicates the current status of this node.

The values that this measure reports and their corresponding numeric values are detailed in the table below:

Measure Value	Numeric Value
Online	1
Offline	0

Note:

By default, this test reports the Measure Values listed in the table above to indicate the state of a node. In the graph of this measure, however, the state is indicated using the numeric equivalents only.
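The mapping between the Measure Values and their numeric equivalents can be illustrated with a minimal sketch. The values come from the table above; the function name status_to_numeric is hypothetical and is not part of eG Enterprise.

```python
# Measure Value -> Numeric Value, as listed in the table above.
STATUS_TO_NUMERIC = {"Online": 1, "Offline": 0}


def status_to_numeric(measure_value: str) -> int:
    """Return the numeric equivalent used in the graph of the Status measure.

    Hypothetical helper for illustration only.
    """
    try:
        return STATUS_TO_NUMERIC[measure_value]
    except KeyError:
        # Any state other than Online/Offline is not documented here.
        raise ValueError(f"unknown status: {measure_value!r}") from None


print(status_to_numeric("Online"))   # 1
print(status_to_numeric("Offline"))  # 0
```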

Total memory

Indicates the total amount of memory allocated to this node.

Used memory

Indicates the amount of memory used by this node.

Ideally, the value of this measure should be much lower than the value of the Total memory measure. If the value of this measure is equal to or is rapidly approaching the value of the Total memory measure, it means that the node is running out of memory resources.

Free memory

Indicates the amount of memory available for use in this node.

Ideally, the value of this measure should be high.

Memory utilized

Indicates the percentage of memory utilized by this node.

Percent

A value close to 100% is indicative of excessive memory usage by a node, and signals a potential memory contention on the node.
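How the memory measures relate to one another can be sketched as follows. This assumes that Free memory is Total memory minus Used memory, and that Memory utilized is Used memory expressed as a percentage of Total memory; the document does not state these formulas, so treat them as assumptions. The function name and sample figures are hypothetical, and the inputs can be in any one consistent unit.

```python
def memory_measures(total: float, used: float) -> dict:
    """Derive free memory and percent utilization from total and used memory.

    Assumed relationships (not confirmed by the product documentation):
      free         = total - used
      utilized_pct = used / total * 100
    """
    if total <= 0:
        raise ValueError("total memory must be positive")
    if not 0 <= used <= total:
        raise ValueError("used memory must lie between 0 and total")
    return {
        "free": total - used,
        "utilized_pct": used / total * 100,
    }


# Hypothetical node with 65536 units of memory, 49152 of them in use.
sample = memory_measures(total=65536, used=49152)
print(sample["free"])          # 16384
print(sample["utilized_pct"])  # 75.0
```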

Maximum CPU

Indicates the total number of CPUs allocated to this node.

Number

CPU utilization

Indicates the percentage of CPU used by this node.

Percent

A value close to 100% is indicative of excessive CPU usage by a node, and signals a potential CPU resource contention on the node.
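The idea of pinpointing over-utilized nodes from the two percentage measures can be sketched as below. The 90% threshold, the dictionary keys, and the node names are all hypothetical; in practice, thresholds are whatever the administrator configures for these measures.

```python
def overutilized_nodes(nodes, threshold_pct=90.0):
    """Return {node name: [measures at or above the threshold]}.

    `nodes` is a list of dicts with the hypothetical keys 'node',
    'memory_utilized' and 'cpu_utilization' (both in percent).
    """
    flagged = {}
    for node in nodes:
        hot = [
            measure
            for measure in ("memory_utilized", "cpu_utilization")
            if node[measure] >= threshold_pct
        ]
        if hot:
            flagged[node["node"]] = hot
    return flagged


# Made-up sample readings for three nodes.
readings = [
    {"node": "pve1", "memory_utilized": 95.2, "cpu_utilization": 41.0},
    {"node": "pve2", "memory_utilized": 60.0, "cpu_utilization": 30.0},
    {"node": "pve3", "memory_utilized": 91.0, "cpu_utilization": 97.5},
]
for name, measures in overutilized_nodes(readings).items():
    print(name, measures)
```

With these sample readings, pve1 is flagged for memory only, pve3 for both memory and CPU, and pve2 is not flagged.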

Total uptime

Indicates the total time that this node has been up since its last reboot.

This measure displays the number of years, months, days, hours, minutes and seconds since the last reboot. Administrators may wish to be alerted if a node has been running without a reboot for a very long period. Setting a threshold for this metric allows administrators to detect such conditions.
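As an illustration of an uptime threshold, the sketch below breaks an uptime value into days, hours, minutes and seconds, and checks it against a limit. It assumes the raw uptime is available in seconds, which this document does not confirm. Years and months are left out because their lengths vary and the exact convention used by the measure is not described here. The function names and the 180-day limit are hypothetical.

```python
SECONDS_PER_DAY = 86400


def format_uptime(seconds: int) -> str:
    """Break an uptime in seconds into days, hours, minutes and seconds."""
    if seconds < 0:
        raise ValueError("uptime cannot be negative")
    days, rem = divmod(seconds, SECONDS_PER_DAY)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{days}d {hours}h {minutes}m {secs}s"


def uptime_exceeds(seconds: int, max_days: int = 180) -> bool:
    """True if the node has been up longer than max_days (hypothetical limit)."""
    return seconds > max_days * SECONDS_PER_DAY


print(format_uptime(200_000))                     # 2d 7h 33m 20s
print(uptime_exceeds(200 * SECONDS_PER_DAY))      # True
```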