Blade Overview Test

Blade servers are the core components of the Cisco UCS system, so an unavailable or inoperable blade server can bring the entire system to a standstill. Using this test, you can continuously monitor the overall health, operability, and availability of each blade server in each chassis managed by the Cisco UCS manager, and be alerted to anomalies as soon as they occur, so that you can take corrective action before your mission-critical services begin to suffer. The test also captures critical power and thermal failures experienced by the blade servers, and takes stock of the hardware (such as processors, cores, and NICs) supporting the operations of each blade server.

Target of the test : A Cisco UCS manager

Agent deploying the test : A remote agent

Outputs of the test : One set of results for each blade server in each chassis managed by the Cisco UCS manager being monitored.

Configurable parameters for the test
Parameter Description

Test Period

How often the test should be executed.

Host

The IP address of the host for which the test is being configured.

Port

The variable name of the port at which the specified host listens.

UCS User and
UCS Password

Provide the credentials of a user with at least read-only privileges to the target Cisco UCS manager.

Confirm Password

Confirm the password by retyping it here.

SSL

By default, the Cisco UCS manager is SSL-enabled. Accordingly, the SSL flag is set to Yes by default.

Web Port

By default, in most virtualized environments, Cisco UCS manager listens on port 80 (if not SSL-enabled) or on port 443 (if SSL-enabled) only. This implies that, while monitoring Cisco UCS manager, the eG agent connects to port 80 or 443 by default, depending on whether Cisco UCS manager is SSL-enabled: if the SSL flag above is set to No, the eG agent connects to Cisco UCS manager using port 80; if the SSL flag is set to Yes, the agent-Cisco UCS manager communication occurs via port 443. Accordingly, the WebPort parameter carries the value default unless you change it.

In some environments, however, the default ports 80 or 443 might not apply. In such a case, against the WebPort parameter, you can specify the exact port at which the Cisco UCS manager in your environment listens, so that the eG agent communicates with that port for collecting metrics from the Cisco UCS manager.
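The port selection described above can be sketched as a small helper. This is only an illustration of the documented rule, not the eG agent's actual code; the function names `resolve_web_port` and `base_url` and the example host address are hypothetical.

```python
from typing import Union


def resolve_web_port(ssl_enabled: bool, web_port: Union[str, int] = "default") -> int:
    """Return the port to connect to, following the WebPort rule described above."""
    # "default" means: pick 443 for SSL-enabled managers, 80 otherwise.
    if isinstance(web_port, str) and web_port.strip().lower() == "default":
        return 443 if ssl_enabled else 80
    # Any other value is treated as an explicit port number.
    port = int(web_port)
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid port: {port}")
    return port


def base_url(host: str, ssl_enabled: bool, web_port: Union[str, int] = "default") -> str:
    """Build a base URL from the Host, SSL and WebPort settings (illustrative only)."""
    scheme = "https" if ssl_enabled else "http"
    return f"{scheme}://{host}:{resolve_web_port(ssl_enabled, web_port)}"


if __name__ == "__main__":
    print(base_url("192.0.2.10", ssl_enabled=True))           # default SSL port
    print(base_url("192.0.2.10", ssl_enabled=False))          # default non-SSL port
    print(base_url("192.0.2.10", ssl_enabled=True, web_port=8443))  # custom port
```

Keeping the literal value default as a sentinel mirrors the parameter's behavior: the port follows the SSL flag until an explicit port is supplied.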

Detailed Diagnosis

To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability
  • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
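
The two conditions above can be expressed as a simple check. This is a minimal sketch that assumes the second condition means neither frequency may be 0; the function and parameter names are hypothetical and are not part of the eG Enterprise configuration interface.

```python
def detailed_diagnosis_selectable(
    license_allows_dd: bool,
    normal_frequency: int,
    abnormal_frequency: int,
) -> bool:
    """Return True if the On/Off option for detailed diagnosis would be offered.

    Assumption: "both frequencies should not be 0" is read as
    "neither the normal nor the abnormal frequency is 0".
    """
    return license_allows_dd and normal_frequency != 0 and abnormal_frequency != 0


if __name__ == "__main__":
    print(detailed_diagnosis_selectable(True, 1, 1))   # both conditions met
    print(detailed_diagnosis_selectable(True, 0, 1))   # normal frequency is 0
    print(detailed_diagnosis_selectable(False, 1, 1))  # license does not allow it
```
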
Measurements made by the test
Measurement Description Measurement Unit Interpretation

Overall status

Indicates the overall status of this blade server in this chassis.

 

The States reported by this measure and their corresponding numeric equivalents are described in the table below:

State Numeric Value
Indeterminate 0
Unassociated 1
Ok 10
Discovery 11
Config 12
Unconfig 13
Power-off 14
Restart 15
Maintenance 20
Test 21
Compute-mismatch 29
Compute-failed 30
Degraded 31
Discovery-failed 32
Config-failure 33
Unconfig-failed 34
Test-failed 35
Maintenance-failed 36
Removed 40
Disabled 41
Inaccessible 50
Thermal-problem 60
Power-problem 61
Voltage-problem 62
Inoperable 63
Decomissioning 101
Bios-restore 201
Cmos-reset 202
Diagnostics 203
Diagnostics-failed 204

Note:

By default, this measure reports the above-mentioned states while indicating the overall status of a blade server. However, in the graph of this measure, states will be represented using their corresponding numeric equivalents only.

The detailed diagnosis of this measure provides the Time, Slot ID, Chassis ID, PID, Revision, Serial Number, Vendor, Name, UUID, Service Profile, and Original UUID attributes for this blade server.
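
Because graphs of this measure show only the numeric equivalents, a lookup table is handy when reading exported graph data. The sketch below copies the Overall status table above into a dictionary; the names `OVERALL_STATUS` and `decode_overall_status` are hypothetical, and the state spellings are kept exactly as listed in the table.

```python
# Numeric equivalents copied from the Overall status table above.
OVERALL_STATUS = {
    0: "Indeterminate", 1: "Unassociated", 10: "Ok", 11: "Discovery",
    12: "Config", 13: "Unconfig", 14: "Power-off", 15: "Restart",
    20: "Maintenance", 21: "Test", 29: "Compute-mismatch",
    30: "Compute-failed", 31: "Degraded", 32: "Discovery-failed",
    33: "Config-failure", 34: "Unconfig-failed", 35: "Test-failed",
    36: "Maintenance-failed", 40: "Removed", 41: "Disabled",
    50: "Inaccessible", 60: "Thermal-problem", 61: "Power-problem",
    62: "Voltage-problem", 63: "Inoperable", 101: "Decomissioning",
    201: "Bios-restore", 202: "Cmos-reset", 203: "Diagnostics",
    204: "Diagnostics-failed",
}


def decode_overall_status(value: int) -> str:
    """Translate a graphed numeric value back into its state name."""
    # Values missing from the table are reported as unrecognized rather than guessed.
    return OVERALL_STATUS.get(value, f"Unrecognized ({value})")


if __name__ == "__main__":
    for sample in (10, 31, 63, 999):
        print(sample, "->", decode_overall_status(sample))
```

The same pattern applies to the other state measures of this test: swap in the table of the measure in question.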

Administrative state

Indicates the current administrative state of this blade server loaded in this chassis.

 

This measure reports either In-service or Out-of-service as the administrative state of the blade server. The numeric equivalents corresponding to these states are shown in the table below:

State Numeric Value
In-service 1
Out-of-service 2

Note:

By default, this measure reports the above-mentioned states while indicating the administrative state of a blade server. However, in the graph of this measure, states will be represented using their corresponding numeric equivalents only.

Association state

Indicates the current associative state of this blade server loaded in this chassis, i.e., whether the blade server is associated with the service profile that is preconfigured in the Cisco UCS Manager.

 

A service profile represents a logical view of a single blade server, without needing to know exactly which blade you are talking about. The profile object contains the server personality (identity and network information). The profile can then be associated with a single blade at a time.

Cisco UCS Manager uses service profiles to provision the blade servers and their I/O properties. The Cisco Unified Computing System has a form factor-neutral architecture, allowing administrators to centrally manage Cisco UCS blade servers or rack-mount servers, or incorporate both within a single management domain.

Service profiles are created by server, network, and storage administrators and are stored in the Cisco UCS Fabric Interconnects. Infrastructure policies needed to deploy applications, such as power and cooling, security, identity, hardware health, and Ethernet and storage networking, are encapsulated in the service profile. The policies coordinate and automate element management at every layer of the hardware stack, including RAID levels, BIOS settings, firmware revisions and settings, adapter identities and settings, VLAN and VSAN network settings, network quality of service (QoS), and data center connectivity. Cisco UCS Manager provides granular Cisco Unified Computing System visibility for higher-level management tools from BMC, CA, HP, IBM, and others, providing exceptional alignment of infrastructure management with OS and application requirements.

This measure reports the associative state of the blade servers and their numeric equivalents as shown in the table:  

State Numeric Value
None 0
Associated 1
Removing 2
Failed 3

Note:

By default, this measure reports the above-mentioned states while indicating the associative state of a blade server. However, in the graph of this measure, states will be represented using their corresponding numeric equivalents only.

Availability state

Indicates the current availability status of this blade server in this chassis.


This measure reports either Available or Unavailable as the availability status of the blade servers. The states and their corresponding numeric equivalents are shown in the table below:

State Numeric Value
Unavailable 0
Available 1

Note:

By default, this measure reports the above-mentioned states while indicating the availability state of a blade server. However, in the graph of this measure, states will be represented using their corresponding numeric equivalents only.

Checkpoint state

Indicates the current checkpoint status of this blade server loaded in this chassis.

 

The States reported by this measure and their corresponding numeric equivalents are described in the table below:

State Numeric Value
Unknown 0
Removing 1
Shallow-checkpoint 2
Deep-checkpoint 3
Discovered 4

Note:

By default, this measure reports the above-mentioned states while indicating the checkpoint state of a blade server. However, in the graph of this measure, states will be represented using their corresponding numeric equivalents only.

Discovery state

Indicates the current discovery status of this blade server loaded in this chassis.

 

The States reported by this measure and their corresponding numeric equivalents are described in the table below:

State Numeric Value
Undiscovered 0
In-progress 1
Malformed-fru-info 2
Fru-not-ready 3
Insufficiently-equipped 4
Failed 8
Complete 16
Retry 32
Throttled 64
Illegal-fru 128
Fru-identity-indeterminate 129
Fru-state-indeterminate 130
Diagnostics-in-progress 131
Efidiagnostics-in-progress 132
Diagnostics-failed 133
Diagnostics-complete 134

Note:

By default, this measure reports the above-mentioned states while indicating the discovery state of a blade server. However, in the graph of this measure, states will be represented using their corresponding numeric equivalents only.

Operability

Indicates the current operating state of this blade server loaded in this chassis.

 

The States reported by this measure and their corresponding numeric equivalents are described in the table below:

State Numeric Value
Unknown 0
Operable 1
Inoperable 2
Degraded 3
Powered-off 4
Power-problem 5
Removed 6
Voltage-problem 7
Thermal-problem 8
Performance-problem 9
Accessibility-problem 10
Identity-unestablishable 11
Bios-post-timeout 12
Disabled 13
Fabric-conn-problem 51
Fabric-unsupported-conn 52
Config 81
Equipment-problem 82
Decommissioning 83
Chassis-limit-exceeded 84
Discovery 101
Discovery-failed 102
Identify 103
Post-failure 104
Upgrade-problem 105
Peer-comm-problem 106
Auto-upgrade 107

Note:

By default, this measure reports the above-mentioned states while indicating the operational state of a blade server. However, in the graph of this measure, states will be represented using their corresponding numeric equivalents only.

Power state

Indicates the current power status of this blade server loaded in this chassis.

 

The States reported by this measure and their corresponding numeric equivalents are described in the table below:

State Numeric Value
Unknown 0
On 1
Test 2
Off 3
Online 4
Offline 5
Offduty 6
Degraded 7
Power-save 8
Error 9

Note:

By default, this measure reports the above-mentioned states while indicating the power state of a blade server. However, in the graph of this measure, states will be represented using their corresponding numeric equivalents only.

Slot state

Indicates the current slot status of this blade server loaded in this chassis.

 

The States reported by this measure and their corresponding numeric equivalents are described in the table below:

State Numeric Value
Unknown 0
Empty 1
Equipped 10
Missing 11
Mismatch 12
Equipped-not-primary 13
Equipped-identity-unestablishable 20
Mismatch-identity-unestablishable 21
Inaccessible 30
Unauthorized 40

Note:

By default, this measure reports the above-mentioned states while indicating the slot state of a blade server. However, in the graph of this measure, states will be represented using their corresponding numeric equivalents only.

Effective memory

Indicates the amount of memory that can be effectively used by this blade server present in this chassis.

MB

Ideally, the value of this measure should be high.

 

Total memory

Indicates the total memory available in this blade server present in this chassis.

MB

 

Number of processors

Indicates the number of central processing units (CPUs) available in this blade server loaded in this chassis.

Number

 

Number of cores

Indicates the total number of cores available on all the CPUs that are installed in this blade server in this chassis.

Number

 

Number of cores enabled

Indicates the number of processor cores that are enabled in this blade server in this chassis.

Number

 

Number of threads

Indicates the number of threads that can run simultaneously on this blade server in this chassis.

Number

The value of this measure should be equal to either the number of cores or, if the operating system supports hyperthreading, twice the number of cores.
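
The relationship between threads and cores stated above lends itself to a quick sanity check on collected values. This is a minimal sketch; the function name `thread_count_is_consistent` and the sample figures are illustrative assumptions, not values taken from a real blade server.

```python
def thread_count_is_consistent(cores: int, threads: int) -> bool:
    """Check that threads equals the core count or twice the core count."""
    # cores      -> hyperthreading not in use
    # 2 * cores  -> hyperthreading in use
    return threads in (cores, 2 * cores)


if __name__ == "__main__":
    print(thread_count_is_consistent(cores=16, threads=16))  # no hyperthreading
    print(thread_count_is_consistent(cores=16, threads=32))  # hyperthreading
    print(thread_count_is_consistent(cores=16, threads=24))  # unexpected value
```

A value that fails this check is worth comparing against the Number of cores enabled measure, since disabled cores may explain the difference.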

Number of adapters

Indicates the number of adapters available in this blade server in this chassis.

Number

 

Number of NICs

Indicates the number of physical Ethernet network interface cards (NICs) available in this blade server in this chassis.

Number

 

Number of HBAs

Indicates the number of physical host bus adapters (HBAs) available in this blade server in this chassis.

Number