Blade Server - Intersight Test

Blade servers are the core components of the Cisco equipment managed by Cisco Intersight. An unavailable or inoperable blade server can therefore bring the entire system to a standstill. Using this test, you can continuously monitor the overall health, operability, and availability of each blade server in each chassis managed by Cisco Intersight, and be alerted to anomalies as soon as they occur, so that you can take corrective action before your mission-critical services begin to suffer. The test also captures critical power and thermal failures experienced by the blade servers, and takes stock of the hardware (such as processors, cores, and NICs) that supports the operations of each blade server.

Target of the test : Cisco Intersight

Agent deploying the test : A remote agent

Outputs of the test : One set of results for each blade server managed by Cisco Intersight

Configurable parameters for the test

Parameter

Description

Test period

How often the test should be executed.

Host

The IP address of the host.

Port

The port number through which Cisco Intersight communicates.

API Key ID

Specify the ID of the API key that is used to authenticate the eG agent's connection to Cisco Intersight.

API Key Filepath

Specify the file path from which the eG agent can read the API key.

URL

Specify the URL for Cisco Intersight. The default is intersight.com.

Proxy Host

Specify the IP address of the proxy host server.

Proxy Port

Specify the port number of the proxy host server.

Proxy User Name

The user name for the proxy server authentication.

Proxy Password

The password for the proxy server authentication.

Confirm Password

Specify the password again to confirm it.

Detailed Diagnoses

To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability.
  • Neither the normal nor the abnormal frequency configured for the detailed diagnosis measures should be 0.
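
As an illustration only, the parameters described above could be sanity-checked before the test is configured. The sketch below is not part of eG Enterprise: the class name, the field names, and the validate function are hypothetical and simply mirror the parameter table. It assumes the URL default of intersight.com stated above and treats the proxy settings as optional.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class IntersightTestParams:
    # Hypothetical field names that mirror the parameter table above.
    host: str
    port: int
    api_key_id: str
    api_key_filepath: str
    url: str = "intersight.com"  # default stated in the parameter table
    proxy_host: str = ""         # empty means no proxy is used (assumption)
    proxy_port: int = 0
    proxy_user: str = ""
    proxy_password: str = ""
    confirm_password: str = ""


def validate(params: IntersightTestParams) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if not params.api_key_id.strip():
        problems.append("API Key ID is empty")
    if not Path(params.api_key_filepath).is_file():
        problems.append("API Key Filepath does not point to an existing file")
    if not 1 <= params.port <= 65535:
        problems.append("Port must be between 1 and 65535")
    # The proxy port is only checked when a proxy host has been given.
    if params.proxy_host and not 1 <= params.proxy_port <= 65535:
        problems.append("Proxy Port must be between 1 and 65535")
    if params.proxy_password != params.confirm_password:
        problems.append("Proxy Password and Confirm Password differ")
    return problems
```

Returning a list of problems rather than raising on the first one lets an administrator correct every field in a single pass.
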
Measurements made by the test

Measurement

Description

Measurement Unit

Interpretation

Overall Status

Indicates the current overall status of the blade server.

The states reported by this measure and their corresponding numeric equivalents are listed in the table below:

State Numeric Value
Unknown 0
Operable 1
Inoperable 2
Degraded 3
Powered-off 4
Power-problem 5
Removed 6
Voltage-problem 7
Thermal-problem 8
Performance-problem 9
Accessibility-problem 10
Identity-unestablishable 11
Bios-post-timeout 12
Disabled 13
Fabric-conn-problem 51
Fabric-unsupported-conn 52
Config 81
Equipment-problem 82
Decommissioning 83
Chassis-limit-exceeded 84
Not-supported 100
Discovery 101
Discovery-failed 102
Identify 103
Post-failure 104
Upgrade-problem 105
Peer-comm-problem 106
Auto-upgrade 107

Note:

By default, this measure reports the states listed above to indicate the overall status of each blade server being managed. In the graph of this measure, however, states are represented by their numeric equivalents only.

The detailed diagnosis of this measure provides the Model, Serial, Vendor, Platform type, Service profile, Revision, Slot ID, Scaled mode and Management IP address.

Operability status

Indicates the current operability status of the blade server.

The states reported by this measure and their corresponding numeric equivalents are listed in the table below:

State Numeric Value
Unknown 0
Operable 1
Inoperable 2
Degraded 3
Powered-off 4
Power-problem 5
Removed 6
Voltage-problem 7
Thermal-problem 8
Performance-problem 9
Accessibility-problem 10
Identity-unestablishable 11
Bios-post-timeout 12
Disabled 13
Fabric-conn-problem 51
Fabric-unsupported-conn 52
Config 81
Equipment-problem 82
Decommissioning 83
Chassis-limit-exceeded 84
Not-supported 100
Discovery 101
Discovery-failed 102
Identify 103
Post-failure 104
Upgrade-problem 105
Peer-comm-problem 106
Auto-upgrade 107

Note:

By default, this measure reports the states listed above to indicate the operability status of each blade server being managed. In the graph of this measure, however, states are represented by their numeric equivalents only.
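
Because the graphs of the Overall Status and Operability status measures show only numeric values, a lookup can help when reading exported graph data. The sketch below is an illustration, not an eG Enterprise API: the dictionary and function names are hypothetical, and the values are copied from the state table documented for these two measures.

```python
# Numeric value -> state name, copied from the state table documented
# for the Overall Status and Operability status measures.
BLADE_STATES = {
    0: "Unknown", 1: "Operable", 2: "Inoperable", 3: "Degraded",
    4: "Powered-off", 5: "Power-problem", 6: "Removed",
    7: "Voltage-problem", 8: "Thermal-problem", 9: "Performance-problem",
    10: "Accessibility-problem", 11: "Identity-unestablishable",
    12: "Bios-post-timeout", 13: "Disabled",
    51: "Fabric-conn-problem", 52: "Fabric-unsupported-conn",
    81: "Config", 82: "Equipment-problem", 83: "Decommissioning",
    84: "Chassis-limit-exceeded",
    100: "Not-supported", 101: "Discovery", 102: "Discovery-failed",
    103: "Identify", 104: "Post-failure", 105: "Upgrade-problem",
    106: "Peer-comm-problem", 107: "Auto-upgrade",
}


def state_name(value: int) -> str:
    """Translate a graphed numeric value back into its state name."""
    # Values missing from the documented table are reported as such
    # instead of being guessed.
    return BLADE_STATES.get(value, f"Unrecognized ({value})")


if __name__ == "__main__":
    for sample in (1, 8, 106):
        print(sample, state_name(sample))
```
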

Administrative power state

Indicates the current administrative power state of the blade server in the equipment chassis.

The states reported by this measure and their corresponding numeric equivalents are listed in the table below:

State Numeric Value
Policy 1

Note:

By default, this measure reports the states listed above to indicate the administrative power state of each blade server being managed. In the graph of this measure, however, states are represented by their numeric equivalents only.

Power state

Indicates the current power state of the blade server in the equipment chassis.

The states reported by this measure and their corresponding numeric equivalents are listed in the table below:

State Numeric Value
Off 0
On 1

Note:

By default, this measure reports the states listed above to indicate the power state of each blade server being managed. In the graph of this measure, however, states are represented by their numeric equivalents only.

BIOS POST completion status

Indicates the BIOS POST completion status of the blade server in the equipment chassis.

The states reported by this measure and their corresponding numeric equivalents are listed in the table below:

State Numeric Value
False 0
True 1

Note:

By default, this measure reports the states listed above to indicate the BIOS POST completion status of each blade server being managed. In the graph of this measure, however, states are represented by their numeric equivalents only.

Presence state

Indicates the presence state of the blade server in the equipment chassis.

The states reported by this measure and their corresponding numeric equivalents are listed in the table below:

State Numeric Value
Unknown 0
Empty 1
Equipped 10
Missing 11
Mismatched 12
Equipped-not-primary 13
Equipped-identity-unestablishable 20
Mismatch-identity-unestablishable 21
Inaccessible 30
Unauthorized 40
Not-supported 100

Note:

By default, this measure reports the states listed above to indicate the presence state of each blade server being managed. In the graph of this measure, however, states are represented by their numeric equivalents only.

Fault Summary

Indicates the total number of faults attributed to the blade server in the equipment chassis being managed.

Number

The number of faults should always stay within a tolerance limit. If the number is consistently high, investigate the cause.
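
One way to read "consistently high" is that the fault count exceeds the tolerance limit in several consecutive measurement periods. The sketch below illustrates that reading only. The function name, the limit, and the default window of three samples are assumptions, not values defined by eG Enterprise.

```python
from typing import Sequence


def consistently_high(samples: Sequence[int], limit: int, window: int = 3) -> bool:
    """Return True if each of the last `window` samples exceeds `limit`.

    `samples` holds fault counts in chronological order. Both `limit` and
    `window` are illustrative values chosen by the administrator.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    recent = samples[-window:]
    # Fewer samples than the window means there is not enough history yet.
    return len(recent) == window and all(count > limit for count in recent)


if __name__ == "__main__":
    history = [1, 6, 7, 8]
    print(consistently_high(history, limit=5))
```

Requiring a full window of readings above the limit avoids reacting to a single spike in the fault count.
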

Total memory

Indicates the total memory of the current blade server.

MB

The memory usage of the blade servers should be optimal. Very high memory usage over long periods may not leave enough memory for new operations, and the blade server may become unresponsive.

Available memory

Indicates the memory available to the blade server for use in various operations.

MB

Used memory

Indicates the total memory used by the blade server. Used memory is calculated by subtracting available memory from total memory.

MB

Memory usage

Indicates the percentage of total memory that is currently in use.

Percentage
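
The relationship between the four memory measures above can be shown as a short calculation. This is a minimal sketch with a hypothetical function name. It assumes that Memory usage is used memory expressed as a percentage of total memory, which is how the descriptions above read.

```python
def memory_metrics(total_mb: float, available_mb: float) -> tuple[float, float]:
    """Derive Used memory (MB) and Memory usage (%) from the two raw values."""
    if total_mb <= 0:
        raise ValueError("total memory must be positive")
    if not 0 <= available_mb <= total_mb:
        raise ValueError("available memory must lie between 0 and total memory")
    used_mb = total_mb - available_mb      # Used memory
    usage_pct = used_mb / total_mb * 100   # Memory usage
    return used_mb, usage_pct


if __name__ == "__main__":
    # Example: a blade with 262144 MB in total, of which 65536 MB is available.
    used, pct = memory_metrics(262144, 65536)
    print(used, pct)  # 196608 75.0
```
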

Memory speed

Indicates the speed at which memory is allocated.

MB/Sec

The speed of memory allocation should stay within a tolerance limit. If it drops too low, investigate the cause.

Adaptors

Indicates the total number of adaptors in the blade server.

Number

CPU cores

Indicates the total number of CPU cores in the blade server.

Number

Administrators need to ensure that enough CPUs and CPU cores are available to handle the operations in progress.

CPU cores enabled

Indicates the number of CPU cores currently enabled for use.

Number

CPUs

Indicates the total number of CPUs in the blade server.

Number

vNICs

Indicates the total number of virtual network interface cards (vNICs) available to the blade server.

Number

Make sure that enough network interfaces are available to the blade server so that the network does not become a bottleneck in the operation of the server.

vHBAs

Indicates the total number of virtual host bus adaptors (vHBAs) available to the blade server.

Number

Make sure that enough host bus adaptors are available to the blade server so that storage connectivity does not become a bottleneck in the operation of the server.

Threads

Indicates the total number of threads currently running in the blade server.

Number

Make sure that threads are completing and being recycled. There should not be too many long-running threads.