PowerEdge System Health Test

The Dell PowerEdge VRTX chassis comprises many components, such as processors, memory devices, batteries, PSUs, amperage probes, voltage sensors, temperature probes, cooling units, and blade servers. Each of these components influences the availability and overall performance of the VRTX system. This is why, at any given point in time, administrators need to know not only how well the VRTX system as a whole is performing, but also which component could be adversely impacting its performance. The PowerEdge System Health test provides this insight. Besides revealing the current health of the VRTX system as a whole, this test also reports the collective state of each of the component types that form an integral part of the VRTX system. This way, administrators can determine whether or not the VRTX is healthy and, if not, where the source of the problem lies: the memory devices, the processors, the batteries, the PSUs, the cooling units, the amperage probes, the voltage sensors, the temperature probes, or the blade servers. Once the area of concern is isolated, administrators can use the eG test that deep dives into that area to accurately diagnose the root cause of the problem.

For instance, if the PowerEdge System Health test reveals that one or more batteries are adversely impacting the health of the VRTX system, administrators can use the PowerEdge System Battery test to find the defective battery.
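The drill-down described above can be sketched in a few lines, assuming the status measures of this test have already been collected into a dictionary keyed by measure name. The numeric codes come from the measure tables of this test; the function name `components_needing_attention` and the sample readings are hypothetical and only illustrate the idea of isolating the components that are not reporting Normal.

```python
# Numeric codes reported by every status measure of this test.
STATUS_NAMES = {
    1: "Other",
    2: "Unknown",
    3: "Normal",
    4: "NonCritical",
    5: "Critical",
    6: "NonRecoverable",
}
NORMAL = 3


def components_needing_attention(readings):
    """Return (measure, state) pairs for every measure not reporting Normal.

    `readings` maps a measure name to its numeric value. Results are ordered
    from the highest numeric value to the lowest, so NonRecoverable comes
    first and Other/Unknown come last.
    """
    flagged = [(name, code) for name, code in readings.items() if code != NORMAL]
    flagged.sort(key=lambda item: item[1], reverse=True)
    return [(name, STATUS_NAMES.get(code, "Invalid")) for name, code in flagged]


if __name__ == "__main__":
    # Hypothetical sample readings for a chassis with a battery problem.
    sample = {
        "Global system status": 4,
        "Overall system battery status": 5,
        "Overall cooling unit status": 3,
        "Overall memory device status": 3,
    }
    for measure, state in components_needing_attention(sample):
        print(f"{measure}: {state}")
```

With these sample readings, the battery measure is listed first, which points the administrator toward the PowerEdge System Battery test.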

Target of the test : A Dell PowerEdge VRTX

Agent deploying the test : An external agent

Outputs of the test : One set of results for the VRTX being monitored.

Configurable parameters for the test
Parameter Description

Test Period

How often the test should be executed.

Host

The IP address of the host for which this test is to be configured.

Port

The port at which the device listens. By default, this will be NULL.

SNMPPort

The port at which the monitored target exposes its SNMP MIB. The default value is 161.

SNMPVersion

By default, the eG agent supports SNMP version 1. Accordingly, the default selection in the SNMPVersion list is v1. However, if a different SNMP framework is in use in your environment, say SNMP v2 or v3, then select the corresponding option from this list.

SNMPCommunity

The SNMP community name that the test uses to communicate with the VRTX chassis. This parameter is specific to SNMP v1 and v2 only. Therefore, if the SNMPVersion chosen is v3, this parameter will not appear.
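To verify the same SNMP v1/v2 settings by hand, one option is the Net-SNMP `snmpget` utility. The sketch below only assembles the command line from the Host, SNMPPort, SNMPVersion, and SNMPCommunity values; it assumes that the v2 option corresponds to community-based SNMPv2c, and the default OID is the standard MIB-II sysUpTime.0 object, used here purely as a reachability check rather than a Dell-specific health object. It does not describe how the eG agent itself issues its queries, and the function name is hypothetical.

```python
import shlex

# Map the SNMPVersion choices of this test to Net-SNMP's -v values.
# Assumption: the "v2" option means community-based SNMPv2c.
NET_SNMP_VERSION = {"v1": "1", "v2": "2c"}


def community_snmpget_argv(host, community, version="v1", snmp_port=161,
                           oid="1.3.6.1.2.1.1.3.0"):
    """Build an snmpget argument list for community-based (v1/v2c) access."""
    if version not in NET_SNMP_VERSION:
        raise ValueError("community strings apply to v1 and v2 only")
    return [
        "snmpget",
        "-v", NET_SNMP_VERSION[version],
        "-c", community,
        f"{host}:{snmp_port}",  # Net-SNMP agent address in host:port form
        oid,
    ]


if __name__ == "__main__":
    argv = community_snmpget_argv("192.0.2.10", "public", version="v2")
    print(shlex.join(argv))
    # To actually send the request: subprocess.run(argv, check=True)
```

Building an argument list rather than a shell string keeps the community name from being interpreted by a shell if the command is later run through `subprocess.run`.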

UserName

This parameter appears only when v3 is selected as the SNMPVersion. SNMP version 3 (SNMPv3) is an extensible SNMP framework that supplements the SNMPv2 framework by additionally supporting message security, access control, and remote SNMP configuration capabilities. To extract performance statistics from the MIB using the highly secure SNMP v3 protocol, the eG agent has to be configured with the required access privileges; in other words, the eG agent should connect to the MIB using the credentials of a user with access permissions to the MIB. Therefore, specify the name of such a user against this parameter.

Context

This parameter appears only when v3 is selected as the SNMPVersion. An SNMP context is a collection of management information accessible by an SNMP entity. An item of management information may exist in more than one context, and an SNMP entity potentially has access to many contexts. A context is identified by the SNMPEngineID value of the entity hosting the management information (also called a contextEngineID) and a context name that identifies the specific context (also called a contextName). If the UserName provided is associated with a context name, the eG agent will be able to poll the MIB and collect metrics only if it is configured with that context name as well. In such cases, therefore, specify the context name of the UserName in the Context text box. By default, this parameter is set to none.
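In Net-SNMP terms, the UserName corresponds to the `-u` option and the Context to the `-n` option of tools such as `snmpget`. The hypothetical helper below shows that correspondence; treating the default value "none" as "send no context name" is an assumption about how that default behaves, not documented eG behavior.

```python
def v3_identity_args(username, context="none"):
    """Return the Net-SNMP options that identify an SNMPv3 user and context."""
    if not username:
        raise ValueError("SNMP v3 requires a UserName")
    args = ["-v", "3", "-u", username]
    # Assumption: the default value "none" means no context name is sent.
    if context and context.lower() != "none":
        args += ["-n", context]
    return args
```

For example, `v3_identity_args("monitor", "vrtx")` yields `["-v", "3", "-u", "monitor", "-n", "vrtx"]`, while leaving the Context at its default omits `-n` entirely.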

AuthPass

Specify the password that corresponds to the above-mentioned UserName. This parameter, too, appears only if the SNMPVersion selected is v3.

Confirm Password

Confirm the AuthPass by retyping it here.

AuthType

This parameter, too, appears only if v3 is selected as the SNMPVersion. From the AuthType list box, choose the hash algorithm that SNMP v3 uses, together with the password specified above, to authenticate SNMP messages and thereby ensure the security of SNMP transactions. You can choose between the following options:

  • MD5 – Message Digest Algorithm
  • SHA – Secure Hash Algorithm

EncryptFlag

This flag appears only when v3 is selected as the SNMPVersion. By default, the eG agent does not encrypt SNMP requests. Accordingly, this flag is set to No by default. To ensure that SNMP requests sent by the eG agent are encrypted, select the Yes option.

EncryptType

If the EncryptFlag is set to Yes, then you will have to specify the encryption type by selecting an option from the EncryptType list. SNMP v3 supports the following encryption types:

  • DES – Data Encryption Standard
  • AES – Advanced Encryption Standard

EncryptPassword

Specify the encryption password here.

Confirm Password

Confirm the encryption password by retyping it here.
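The AuthType, EncryptFlag, and EncryptType settings above line up with the SNMPv3 security levels that Net-SNMP tools accept through `-l`: with the EncryptFlag set to No, requests are authenticated but not encrypted (authNoPriv); with it set to Yes, they are both authenticated and encrypted (authPriv). The hypothetical helper below is a sketch of that translation for checking a configuration by hand; it is not how the eG agent is implemented.

```python
AUTH_TYPES = {"MD5", "SHA"}
ENCRYPT_TYPES = {"DES", "AES"}


def v3_security_args(auth_type, auth_pass, encrypt_flag=False,
                     encrypt_type=None, encrypt_password=None):
    """Translate AuthType/EncryptFlag/EncryptType settings into Net-SNMP options.

    EncryptFlag = No  -> authNoPriv (authenticated, not encrypted)
    EncryptFlag = Yes -> authPriv   (authenticated and encrypted)
    """
    if auth_type not in AUTH_TYPES:
        raise ValueError(f"AuthType must be one of {sorted(AUTH_TYPES)}")
    auth_args = ["-a", auth_type, "-A", auth_pass]
    if not encrypt_flag:
        return ["-l", "authNoPriv"] + auth_args
    if encrypt_type not in ENCRYPT_TYPES:
        raise ValueError(f"EncryptType must be one of {sorted(ENCRYPT_TYPES)}")
    if not encrypt_password:
        raise ValueError("an EncryptPassword is required when EncryptFlag is Yes")
    return ["-l", "authPriv"] + auth_args + ["-x", encrypt_type, "-X", encrypt_password]
```

Keeping the flag-to-level mapping in one place makes it easy to see that the EncryptType and EncryptPassword only matter once the EncryptFlag is Yes.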

Timeout

In this text box, specify the duration (in seconds) after which the SNMP query executed by this test should time out. The default is 10 seconds.

Data Over TCP

By default, SNMP traffic is transmitted over UDP. Some environments, however, may be specifically configured to offload a fraction of the data traffic (for instance, certain types of data traffic or traffic pertaining to specific components) to other protocols like TCP, so as to prevent UDP overloads. In such environments, you can instruct the eG agent to conduct the SNMP data traffic related to the monitored target over TCP instead of UDP. For this, set this flag to Yes. By default, this flag is set to No.
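When reproducing a query by hand, the Timeout and Data Over TCP settings map onto Net-SNMP's `-t` option and its `transport:host:port` agent address syntax. The hypothetical helper below sketches that mapping; it assumes an IPv4 address or hostname (IPv6 literals would need brackets) and says nothing about how the eG agent opens its own connections.

```python
def transport_args(host, snmp_port=161, timeout_seconds=10, data_over_tcp=False):
    """Build the Net-SNMP timeout option and agent address for this test's settings."""
    if timeout_seconds <= 0:
        raise ValueError("Timeout must be a positive number of seconds")
    transport = "tcp" if data_over_tcp else "udp"
    # Net-SNMP accepts agent addresses of the form transport:host:port.
    # Assumes an IPv4 address or hostname; IPv6 literals need brackets.
    return ["-t", str(timeout_seconds), f"{transport}:{host}:{snmp_port}"]
```

With the test's defaults, `transport_args("192.0.2.10")` yields `["-t", "10", "udp:192.0.2.10:161"]`; setting `data_over_tcp=True` switches only the transport prefix.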

Measurements made by the test
Measurement Description Measurement Unit Interpretation

Global system status

Indicates how healthy the VRTX system as a whole is.

The values that this measure can report and their corresponding numeric values are listed below:

Measure Value     Numeric Value
Other             1
Unknown           2
Normal            3
NonCritical       4
Critical          5
NonRecoverable    6

Note:

By default, this measure reports one of the Measure Values listed above to indicate the current health of the entire VRTX system. In the graph of this measure, however, the state is represented using its numeric equivalent only.
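Because the graph plots only the numeric equivalent, a small lookup helps when reading raw values back, for example from exported data. The sketch below simply encodes the table above; the class and function names are hypothetical. The same mapping applies to every status measure of this test.

```python
from enum import IntEnum


class HealthState(IntEnum):
    """Numeric values reported by the status measures of this test."""
    OTHER = 1
    UNKNOWN = 2
    NORMAL = 3
    NONCRITICAL = 4
    CRITICAL = 5
    NONRECOVERABLE = 6


DISPLAY_NAMES = {
    HealthState.OTHER: "Other",
    HealthState.UNKNOWN: "Unknown",
    HealthState.NORMAL: "Normal",
    HealthState.NONCRITICAL: "NonCritical",
    HealthState.CRITICAL: "Critical",
    HealthState.NONRECOVERABLE: "NonRecoverable",
}


def measure_value(numeric_value):
    """Convert a graphed numeric value back into its Measure Value label.

    Raises ValueError for numbers outside the documented range 1 to 6.
    """
    return DISPLAY_NAMES[HealthState(numeric_value)]
```

For example, a graphed value of 5 corresponds to `measure_value(5)`, which returns `"Critical"`.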

Chassis server status

Indicates the collective state of all blade servers in the VRTX chassis.

The values that this measure can report and their corresponding numeric values are listed below:

Measure Value     Numeric Value
Other             1
Unknown           2
Normal            3
NonCritical       4
Critical          5
NonRecoverable    6

Note:

By default, this measure reports one of the Measure Values listed above to indicate the current health of the blade servers in the chassis. In the graph of this measure, however, the state is represented using its numeric equivalent only.

Overall power unit status

Indicates the current collective status of all the power units of the VRTX system.

The values that this measure can report and their corresponding numeric values are listed below:

Measure Value     Numeric Value
Other             1
Unknown           2
Normal            3
NonCritical       4
Critical          5
NonRecoverable    6

Note:

By default, this measure reports one of the Measure Values listed above to indicate the current health of the power units. In the graph of this measure, however, the state is represented using its numeric equivalent only.

Overall power supply status

Indicates the current collective state of all the power supply points of the VRTX system.

The values that this measure can report and their corresponding numeric values are listed below:

Measure Value     Numeric Value
Other             1
Unknown           2
Normal            3
NonCritical       4
Critical          5
NonRecoverable    6

Note:

By default, this measure reports one of the Measure Values listed above to indicate the current health of the power supply points. In the graph of this measure, however, the state is represented using its numeric equivalent only.

Overall cooling unit status

Indicates the current collective state of all the cooling units of the VRTX system.

The values that this measure can report and their corresponding numeric values are listed below:

Measure Value     Numeric Value
Other             1
Unknown           2
Normal            3
NonCritical       4
Critical          5
NonRecoverable    6

Note:

By default, this measure reports one of the Measure Values listed above to indicate the current health of the cooling units. In the graph of this measure, however, the state is represented using its numeric equivalent only.

Overall cooling device status

Indicates the current collective state of all the cooling devices of the VRTX system.

The values that this measure can report and their corresponding numeric values are listed below:

Measure Value     Numeric Value
Other             1
Unknown           2
Normal            3
NonCritical       4
Critical          5
NonRecoverable    6

Note:

By default, this measure reports one of the Measure Values listed above to indicate the current health of the cooling devices. In the graph of this measure, however, the state is represented using its numeric equivalent only.

Overall voltage probe status

Indicates the current collective state of all the voltage probes of the VRTX system.

The values that this measure can report and their corresponding numeric values are listed below:

Measure Value     Numeric Value
Other             1
Unknown           2
Normal            3
NonCritical       4
Critical          5
NonRecoverable    6

Note:

By default, this measure reports one of the Measure Values listed above to indicate the current health of the voltage probes. In the graph of this measure, however, the state is represented using its numeric equivalent only.

Overall temperature probe status

Indicates the current collective state of all the temperature probes of the VRTX system.

The values that this measure can report and their corresponding numeric values are listed below:

Measure Value     Numeric Value
Other             1
Unknown           2
Normal            3
NonCritical       4
Critical          5
NonRecoverable    6

Note:

By default, this measure reports one of the Measure Values listed above to indicate the current health of the temperature probes. In the graph of this measure, however, the state is represented using its numeric equivalent only.

Overall amperage probe status

Indicates the current collective state of all the amperage probes of the VRTX system.

The values that this measure can report and their corresponding numeric values are listed below:

Measure Value     Numeric Value
Other             1
Unknown           2
Normal            3
NonCritical       4
Critical          5
NonRecoverable    6

Note:

By default, this measure reports one of the Measure Values listed above to indicate the current health of the amperage probes. In the graph of this measure, however, the state is represented using its numeric equivalent only.

Overall memory device status

Indicates the current collective state of all the DIMMs of the VRTX system.

The values that this measure can report and their corresponding numeric values are listed below:

Measure Value     Numeric Value
Other             1
Unknown           2
Normal            3
NonCritical       4
Critical          5
NonRecoverable    6

Note:

By default, this measure reports one of the Measure Values listed above to indicate the current health of the memory devices. In the graph of this measure, however, the state is represented using its numeric equivalent only.

Overall processor device status

Indicates the current collective state of all the processors of the VRTX system.

The values that this measure can report and their corresponding numeric values are listed below:

Measure Value     Numeric Value
Other             1
Unknown           2
Normal            3
NonCritical       4
Critical          5
NonRecoverable    6

Note:

By default, this measure reports one of the Measure Values listed above to indicate the current health of the processor devices. In the graph of this measure, however, the state is represented using its numeric equivalent only.

Overall system battery status

Indicates the current collective state of all the batteries of the VRTX system.

The values that this measure can report and their corresponding numeric values are listed below:

Measure Value     Numeric Value
Other             1
Unknown           2
Normal            3
NonCritical       4
Critical          5
NonRecoverable    6

Note:

By default, this measure reports one of the Measure Values listed above to indicate the current health of the batteries. In the graph of this measure, however, the state is represented using its numeric equivalent only.