Blade NICs Test

This test auto-discovers the NICs (Network Interface Cards) supported by the UCS Blade servers, monitors the overall health, operational state, and load on each NIC, and promptly notifies administrators when an NIC suddenly switches to an abnormal state, becomes overloaded, or encounters errors while sending/receiving data over the network. This way, you can easily isolate problematic, over-used, and error-prone NICs.

Target of the test : A Cisco UCS manager

Agent deploying the test : A remote agent

Outputs of the test : One set of results for each NIC supported by every blade server loaded in each chassis managed by the Cisco UCS manager being monitored.

Configurable parameters for the test
Parameter Description

Test Period

How often should the test be executed.

Host

The IP address of the host for which the test is being configured.

Port

The port at which the specified host listens.

UCS User and
UCS Password

Provide the credentials of a user with at least read-only privileges to the target Cisco UCS manager.

Confirm Password

Confirm the password by retyping it here.

SSL

By default, the Cisco UCS manager is SSL-enabled. Accordingly, the SSL flag is set to Yes by default.

Web Port

By default, in most virtualized environments, Cisco UCS manager listens on port 80 (if not SSL-enabled) or on port 443 (if SSL-enabled) only. Accordingly, while monitoring Cisco UCS manager, the eG agent connects to port 80 or 443 by default, depending upon the SSL-enabled status of Cisco UCS manager: if the SSL flag above is set to No, the eG agent connects using port 80; if the SSL flag is set to Yes, the agent-Cisco UCS manager communication occurs via port 443. To reflect this, the WebPort parameter is set to default by default.

In some environments, however, the default ports 80 and 443 might not apply. In such a case, specify against the WebPort parameter the exact port at which the Cisco UCS manager in your environment listens, so that the eG agent communicates with that port for collecting metrics from the Cisco UCS manager.
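The port selection described above can be summarized in a few lines of Python. This is only a sketch of the documented behavior, not the eG agent's actual code; the function name `resolve_web_port` and its parameters are hypothetical.

```python
def resolve_web_port(ssl_enabled: bool, web_port: str = "default") -> int:
    """Return the port an agent would connect to, per the rules described above.

    Hypothetical helper for illustration; not part of any eG Enterprise API.
    """
    if web_port.strip().lower() == "default":
        # SSL flag set to Yes -> 443, SSL flag set to No -> 80
        return 443 if ssl_enabled else 80

    # An explicit WebPort value overrides the SSL-based default.
    port = int(web_port)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port
```

For example, with the SSL flag set to Yes and WebPort left at default, the sketch resolves to port 443; setting WebPort to 8443 makes it return 8443 regardless of the SSL flag.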

Detailed Diagnosis

To make diagnosis more efficient and accurate, the eG Enterprise suite embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, click on the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability.
  • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
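
The two conditions above amount to a simple boolean check. The sketch below assumes the second condition means that neither frequency may be 0; the function and parameter names are hypothetical and do not correspond to any eG Enterprise setting names.

```python
def detailed_diagnosis_configurable(
    license_allows_dd: bool,
    normal_frequency: int,
    abnormal_frequency: int,
) -> bool:
    """Return True when the On/Off option would be offered for the test.

    Assumption: "should not be 0" is read as "neither frequency is 0".
    """
    return license_allows_dd and normal_frequency != 0 and abnormal_frequency != 0
```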
Measurements made by the test
Measurement Description Measurement Unit Interpretation

Overall status

Indicates the current state of this NIC.

 

The values reported by this measure and their corresponding numeric values are described in the table below:

Measure Value Numeric Value
Unknown 0
Operable 1
Inoperable 2
Degraded 3
Powered off 4
Power-problem 5
Removed 6
Voltage-problem 7
Thermal-problem 8
Performance-problem 9
Accessibility-problem 10
Identity-unestablishable 11
Bios-post-timeout 12
Disabled 13
Fabric-conn-problem 51
Fabric-unsupported-conn 52
Config 81
Equipment-problem 82
Decommissioning 83
Chassis-limit-exceeded 84
Discovery 101
Discovery-failed 102
Identify 103
Post-failure 104
Upgrade-problem 105
Peer-comm-problem 106
Auto-upgrade 107
Not Available -5

The detailed diagnosis of this measure provides the complete details of an NIC such as its ID, Vendor, vNIC, PCIE Address, MAC, Original MAC, Purpose, Name, and Type.

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the overall state of an NIC. However, in the graph of this measure, states will be represented using their corresponding numeric equivalents only.
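
Because graphs show only the numeric equivalents, it can help to translate a plotted value back into its state name. The following sketch copies the mapping from the table above; the names `OVERALL_STATUS` and `status_label` are hypothetical, and the fallback label for unlisted values is an assumption.

```python
# Numeric value -> Measure Value, copied from the Overall status table above.
OVERALL_STATUS = {
    0: "Unknown",
    1: "Operable",
    2: "Inoperable",
    3: "Degraded",
    4: "Powered off",
    5: "Power-problem",
    6: "Removed",
    7: "Voltage-problem",
    8: "Thermal-problem",
    9: "Performance-problem",
    10: "Accessibility-problem",
    11: "Identity-unestablishable",
    12: "Bios-post-timeout",
    13: "Disabled",
    51: "Fabric-conn-problem",
    52: "Fabric-unsupported-conn",
    81: "Config",
    82: "Equipment-problem",
    83: "Decommissioning",
    84: "Chassis-limit-exceeded",
    101: "Discovery",
    102: "Discovery-failed",
    103: "Identify",
    104: "Post-failure",
    105: "Upgrade-problem",
    106: "Peer-comm-problem",
    107: "Auto-upgrade",
    -5: "Not Available",
}


def status_label(value: int) -> str:
    """Translate a graphed numeric value into its state name."""
    # Values missing from the table get a placeholder label (an assumption).
    return OVERALL_STATUS.get(value, f"Unrecognized ({value})")
```

The same lookup pattern applies to the other state measures of this test, each with its own table of values.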

Operability

Indicates the current operational state of this NIC.

 

The values reported by this measure and their corresponding numeric values are described in the table below:

Measure Value Numeric Value
Unknown 0
Operable 1
Inoperable 2
Degraded 3
Powered-off 4
Power-problem 5
Removed 6
Voltage-problem 7
Thermal-problem 8
Performance-problem 9
Accessibility-problem 10
Identity-unestablishable 11
Bios-post-timeout 12
Disabled 13
Fabric-conn-problem 51
Fabric-unsupported-conn 52
Config 81
Equipment-problem 82
Decommissioning 83
Chassis-limit-exceeded 84
Discovery 101
Discovery-failed 102
Identify 103
Post-failure 104
Upgrade-problem 105
Peer-comm-problem 106
Auto-upgrade 107
Not Available -5

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the operational state of an NIC. However, in the graph of this measure, states will be represented using their corresponding numeric equivalents only.

Administrative state

Indicates the current administrative state of this NIC.

 

The values reported by this measure and their corresponding numeric values are described in the table below:

Measure Value Numeric Value
Enabled 0
Reset-connectivity-active 1
Reset-connectivity-passive 2
Reset-connectivity 3

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the administrative state of an NIC. However, in the graph of this measure, states will be represented using their corresponding numeric equivalents only.

Discovery state

Indicates the current discovery state of this NIC.

 

The values reported by this measure and their corresponding numeric values are described in the table below:

Measure Value Numeric Value
Absent 0
Present 1
Mis-connect 2
Missing 3
New 4

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the discovery state of an NIC. However, in the graph of this measure, states will be represented using their corresponding numeric equivalents only.

Presence state

Indicates the current presence state of this NIC.

 

The values reported by this measure and their corresponding numeric values are described in the table below:

Measure Value Numeric Value
Unknown 0
Empty 1
Equipped 10
Missing 11
Mismatch 12
Equipped-not-primary 13
Equipped-identity-unestablishable 20
Mismatch-identity-unestablishable 21
Inaccessible 30
Unauthorized 40
Not Available -5

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the presence state of an NIC. However, in the graph of this measure, states will be represented using their corresponding numeric equivalents only.

Data received

Indicates the amount of data received by this NIC during the last measurement period.

MB

These measures are good indicators of the load handled by an NIC. By comparing the value of each measure across NICs, you can quickly identify which NIC is experiencing heavy data traffic, and whether that traffic is heavy while receiving data or while transmitting it.

Data transmitted

Indicates the amount of data transmitted by this NIC during the last measurement period.

MB

Packets received

Indicates the number of packets received by this NIC during the last measurement period.

Packets

These measures are good indicators of the load handled by an NIC. By comparing the value of each measure across NICs, you can quickly identify which NIC is experiencing heavy data traffic, and whether that traffic is heavy while receiving data or while transmitting it.

Packets transmitted

Indicates the number of packets sent by this NIC during the last measurement period.

Packets
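
The cross-NIC comparison suggested for the data and packet measures can be sketched as follows. The NIC names, measure keys, and values are made-up sample data used purely for illustration; `busiest_nic` is a hypothetical helper, not an eG Enterprise function.

```python
from typing import Dict, Tuple

# Hypothetical per-NIC values for one measurement period; not real data.
SAMPLES: Dict[str, Dict[str, float]] = {
    "nic-1": {"data_received_mb": 120.0, "data_transmitted_mb": 35.5},
    "nic-2": {"data_received_mb": 480.0, "data_transmitted_mb": 20.0},
    "nic-3": {"data_received_mb": 60.0, "data_transmitted_mb": 310.0},
}


def busiest_nic(samples: Dict[str, Dict[str, float]], measure: str) -> Tuple[str, float]:
    """Return the NIC reporting the highest value for the given measure."""
    name = max(samples, key=lambda nic: samples[nic][measure])
    return name, samples[name][measure]
```

With this sample data, `busiest_nic(SAMPLES, "data_received_mb")` picks out `nic-2`, while `busiest_nic(SAMPLES, "data_transmitted_mb")` picks out `nic-3`, showing that the heaviest receiver and the heaviest transmitter need not be the same NIC.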

Dropped packets received

Indicates the number of packets dropped by this NIC while receiving data during the last measurement period.

Packets

 

Dropped packets transmitted

Indicates the number of packets dropped by this NIC while transmitting data during the last measurement period.

Packets

 

Errors received

Indicates the errors encountered by this NIC while receiving data during the last measurement period.

Errors

Ideally, the value of both these measures should be 0. A non-zero value indicates that one or more errors have occurred on an NIC. If these values increase with time, compare each measure across NICs to quickly zero in on the error-prone NICs and to determine whether most of the errors on those NICs occurred while transmitting data or while receiving it.

Errors transmitted

Indicates the errors encountered by this NIC while transmitting data during the last measurement period.

Errors
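
As a rough illustration of the error analysis described above, the sketch below flags NICs with non-zero error counts and reports the direction in which most errors occurred. The sample values and the names `ERROR_SAMPLES` and `error_prone_nics` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical error counts for one measurement period; not real data.
ERROR_SAMPLES: Dict[str, Dict[str, int]] = {
    "nic-1": {"errors_received": 0, "errors_transmitted": 0},
    "nic-2": {"errors_received": 14, "errors_transmitted": 2},
    "nic-3": {"errors_received": 1, "errors_transmitted": 30},
}


def error_prone_nics(samples: Dict[str, Dict[str, int]]) -> List[Tuple[str, str, int]]:
    """Return (nic, dominant direction, total errors), most errors first."""
    report = []
    for nic, counts in samples.items():
        rx = counts["errors_received"]
        tx = counts["errors_transmitted"]
        total = rx + tx
        if total == 0:
            continue  # ideal case: both measures are 0
        if rx > tx:
            direction = "receiving"
        elif tx > rx:
            direction = "transmitting"
        else:
            direction = "both"
        report.append((nic, direction, total))
    # Sort so the NIC with the most errors comes first.
    report.sort(key=lambda row: row[2], reverse=True)
    return report
```

On the sample data, `nic-1` is omitted because both of its measures are 0, `nic-3` is listed first with most of its errors on the transmit side, and `nic-2` follows with most of its errors on the receive side.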