GPU Sensors Test

This test monitors each GPU available in the hardware unit of the host system and reports the voltage, temperature, and load handled by each GPU. In addition, this test reports the clock speed of each GPU and the average speed of the fans in each GPU. This alerts administrators to potential overload conditions and helps them identify issues that may affect the functioning of the GPU.

Target of the test: Any host system

Agent executing the test: An internal agent

Output of the test: One set of results for each GPU available in the hardware unit of the host system being monitored.

Configurable parameters for the test
Test Period: How often the test should be executed.

Host: The host for which the test is to be configured.

Measurements made by the test
Measurement: Voltage utilized
Description: Indicates the current voltage of this GPU.
Measurement Unit: Volts

Measurement: Clock Speed
Description: Indicates the current clock speed of this GPU.
Measurement Unit: MHz
Interpretation: A very low value for this measure indicates that the GPU is running slowly. Comparing the value of this measure across GPUs will point you to the GPU that is currently the slowest.

Measurement: Temperature
Description: Indicates the current temperature of this GPU.
Measurement Unit: Celsius
Interpretation: The value of this measure should be within permissible limits. A sudden or gradual increase in the value of this measure may affect the functioning of the server and needs to be attended to immediately.

Measurement: Load Utilized
Description: Indicates the percentage of load handled by this GPU.
Measurement Unit: Percent
Interpretation: Comparing the value of this measure across GPUs will help you identify the GPU that is handling the maximum load.

Measurement: Total revolutions
Description: Indicates the average speed of the fans in this GPU.
Measurement Unit: RPM
Interpretation: The speed of the fans must be within the permissible range. A sudden increase or decrease in the value of this measure is a cause for concern.
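
The cross-GPU comparisons and range checks described in the interpretations above can be sketched in Python. This is only an illustration and not how the test itself is implemented: the GpuReading type, its field names, the helper functions, and the sample values are all hypothetical, and the limits passed to out_of_range are assumptions, since the permissible limits depend on the GPU hardware.

```python
from dataclasses import dataclass


@dataclass
class GpuReading:
    """One set of results for a single GPU (hypothetical field names)."""
    name: str
    voltage_volts: float
    clock_speed_mhz: float
    temperature_celsius: float
    load_percent: float
    fan_speed_rpm: float


def slowest_gpu(readings):
    """Return the reading with the lowest Clock Speed value."""
    return min(readings, key=lambda r: r.clock_speed_mhz)


def busiest_gpu(readings):
    """Return the reading with the highest Load Utilized value."""
    return max(readings, key=lambda r: r.load_percent)


def out_of_range(readings, max_temp_celsius, fan_rpm_range):
    """Return (GPU name, measure) pairs that fall outside the assumed limits."""
    low, high = fan_rpm_range
    alerts = []
    for r in readings:
        if r.temperature_celsius > max_temp_celsius:
            alerts.append((r.name, "Temperature"))
        if not low <= r.fan_speed_rpm <= high:
            alerts.append((r.name, "Total revolutions"))
    return alerts


# Made-up sample values, for illustration only.
readings = [
    GpuReading("GPU0", 1.05, 1500.0, 62.0, 35.0, 1800.0),
    GpuReading("GPU1", 0.90, 300.0, 48.0, 5.0, 1200.0),
    GpuReading("GPU2", 1.10, 1700.0, 91.0, 97.0, 4200.0),
]
```

Calling slowest_gpu(readings) and busiest_gpu(readings) picks out the GPUs that the Clock Speed and Load Utilized comparisons point to, while out_of_range(readings, 85.0, (800.0, 3500.0)) lists the GPUs whose Temperature or Total revolutions values fall outside the assumed limits.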