Memory Array Errors Test

A DIMM, or dual in-line memory module, comprises a series of dynamic random-access memory integrated circuits. The blade servers managed by the Cisco UCS Manager may each contain multiple DIMMs that serve as their main source of memory. The functioning of the blade servers depends heavily on these DIMMs, so when errors are detected on the DIMMs, the blade servers are the first to be affected. DIMM errors may occur for the following reasons:

  • Use of third-party DIMMs that are not certified by Cisco
  • Incorrect orientation of the DIMM in its slot
  • A DIMM that is reported as unrecognized, inoperable, degraded, or overheating

The memory errors reported by the Cisco UCS Manager are classified as follows:

  • Correctable and Uncorrectable Errors
  • Detected and Undetected Errors
  • Hard and Soft Errors

If left unattended, these errors may cause some of the virtual servers hosted on the blade servers managed by the Cisco UCS Manager to fail and, in the worst case, may cause the blade servers themselves to fail. To avoid such failures, administrators need to monitor the memory errors detected by the Cisco UCS Manager and rectify them before end users start complaining that the blade servers are inaccessible. The Memory Array Errors test helps in this regard.

This test continuously tracks the memory errors occurring in the DIMMs of the blade servers managed by the Cisco UCS Manager and reports the number of errors that occurred over several time windows, from the last minute to the last two weeks. By regularly analyzing the metrics reported by this test, administrators can determine when errors occurred most frequently and troubleshoot memory issues more effectively.

Target of the test : A Cisco UCS Manager

Agent deploying the test : A remote agent

Outputs of the test : One set of results for the Cisco UCS Manager that is being monitored.

Configurable parameters for the test
Parameter Description

Test Period

How often the test should be executed.

Host

The IP address of the host for which the test is being configured.

Port

The port at which the specified host listens. By default, it is set to NULL.

UCS User and
UCS Password

Provide the credentials of a user with at least read-only privileges to the target Cisco UCS Manager.

Confirm Password

Confirm the password by retyping it here.

SSL

Indicates whether the Cisco UCS Manager is SSL-enabled. Since the Cisco UCS Manager is SSL-enabled by default, the SSL flag is set to Yes by default.

WebPort

By default, in most environments, Cisco UCS Manager listens on port 80 if it is not SSL-enabled, and on port 443 if it is SSL-enabled. Accordingly, the eG agent connects to port 80 or 443 by default, depending on the SSL-enabled status of Cisco UCS Manager: if the SSL flag above is set to No, the eG agent connects to Cisco UCS Manager on port 80, and if the SSL flag is set to Yes, the agent communicates with Cisco UCS Manager via port 443. This is why the WebPort parameter is set to default by default.

In some environments, however, the default ports 80 and 443 might not apply. In such cases, specify the exact port at which the Cisco UCS Manager in your environment listens in the WebPort parameter, so that the eG agent connects to that port to collect metrics from the Cisco UCS Manager.
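The port-selection rule described for the SSL and WebPort parameters can be sketched as follows. This is a minimal illustration, not the eG agent's implementation: the helper names `resolve_web_port` and `base_url` are hypothetical, and the only behavior taken from this page is that a WebPort of default maps to 443 when SSL is Yes and to 80 when SSL is No, while any other WebPort value is used as given.

```python
def resolve_web_port(ssl_enabled: bool, web_port: str = "default") -> int:
    """Return the port the agent would connect to (hypothetical helper).

    WebPort left at "default": 443 if the SSL flag is Yes, 80 if it is No.
    Any other WebPort value is treated as an explicit port number.
    """
    if web_port.strip().lower() == "default":
        return 443 if ssl_enabled else 80
    port = int(web_port)
    # Reject values that cannot be valid TCP ports.
    if not 1 <= port <= 65535:
        raise ValueError(f"WebPort out of range: {port}")
    return port


def base_url(host: str, ssl_enabled: bool, web_port: str = "default") -> str:
    """Build a base URL for the Cisco UCS Manager (hypothetical helper)."""
    scheme = "https" if ssl_enabled else "http"
    return f"{scheme}://{host}:{resolve_web_port(ssl_enabled, web_port)}"
```

For example, `base_url("192.0.2.10", True)` yields an HTTPS URL on port 443, whereas `base_url("192.0.2.10", False, "8080")` uses plain HTTP on the explicitly configured port 8080. Keeping the SSL flag and the port override separate mirrors the two independent parameters on this page.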

Measurements made by the test
Measurement Description Measurement Unit Interpretation

Errors in last minute

Indicates the number of errors encountered during the last minute.

Number

Ideally, the value of this measure should be zero. A gradual or sudden increase in the value of this measure is a cause for concern.

Errors in last 15 minutes

Indicates the number of errors encountered during the last 15 minutes.

Number

 

Errors in last 1 hour

Indicates the number of errors encountered during the last 1 hour.

Number

 

Errors in last day

Indicates the number of errors encountered during the last day.

Number

 

Errors in last week

Indicates the number of errors encountered during the last week.

Number

 

Errors in last 2 weeks

Indicates the number of errors encountered during the last 2 weeks.

Number

Ideally, the value of this measure should be zero.
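To illustrate how the six measures relate to one another, the following sketch counts error timestamps that fall within each trailing window ending at the current time. It assumes, and this is only an assumption since this page does not describe how the counts are computed, that each measure is a simple count over a sliding window. The `WINDOWS` mapping and the `count_errors_by_window` function are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical mapping of each measure to the trailing window it covers.
WINDOWS = {
    "Errors in last minute": timedelta(minutes=1),
    "Errors in last 15 minutes": timedelta(minutes=15),
    "Errors in last 1 hour": timedelta(hours=1),
    "Errors in last day": timedelta(days=1),
    "Errors in last week": timedelta(weeks=1),
    "Errors in last 2 weeks": timedelta(weeks=2),
}


def count_errors_by_window(error_times: list[datetime], now: datetime) -> dict[str, int]:
    """Count errors whose timestamps fall in (now - window, now] for each measure."""
    counts = {}
    for name, span in WINDOWS.items():
        start = now - span
        counts[name] = sum(1 for t in error_times if start < t <= now)
    return counts
```

Because the windows are nested, each count is at least as large as the one for the next shorter window. Comparing them shows when errors clustered: for instance, errors 30 seconds, 10 minutes, 3 hours, and 10 days before `now` give counts of 1, 2, 2, 3, 3, and 4. This indicates one recent error, one burst earlier in the day, and one older event that appears only in the two-week measure.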