EMC ECS Nodes Test

Nodes are the most important building blocks of the EMC ECS storage system. A node is a server on which the EMC ECS software services are hosted and run. The performance of the entire EMC ECS storage system depends on the performance of each node in the system.

The health of each node is critical to the healthy functioning of the EMC ECS system. This test monitors each node and collects health-related measures such as the number of good and bad disks, memory availability, etc. With these metrics, administrators can drill down to the node level to investigate any issue with EMC Elastic Cloud Storage.

Target of the test : A Dell EMC Elastic Cloud Storage System

Agent deploying the test : A remote agent

Outputs of the test : One set of results for each node in the EMC ECS storage system

Configurable parameters for the test
Parameter	Description
Test period	How often the test should be executed.
Host	The host for which the test is to be configured. Since the storage device is managed using the IP address of its storage controller, the same will be displayed as host.
Port	The port number at which the specified host listens. By default, this is NULL.
ECS REST API Port	This is the port at which REST API connectivity is provided. By default, port 4443 is used.
Username and Password	To collect performance metrics from the target storage device, the eG agent should be configured with the credentials of a user who is vested with "read-only" privileges to access the REST API of the target storage device. Specify the credentials of such a user in the Username and Password text boxes.
Confirm Password	Confirm the password by retyping it here.
Timeout Seconds	Specify the duration, in seconds, for which this test should wait for a response from the storage system. By default, this is 60 seconds.
DD Frequency	Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency, if you so desire. Also, if you intend to disable the detailed diagnosis capability for this test, you can do so by specifying none against DD frequency.
Detailed Diagnosis	To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, click on the Off option. The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled: (a) the eG manager license should allow the detailed diagnosis capability; (b) both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
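
To illustrate how the Host, ECS REST API Port, Username and Password parameters fit together, here is a minimal Python sketch that only builds (and does not send) a login request. The /login path, the use of HTTP basic authentication, and the names build_login_request, monitor_user and 192.0.2.10 are assumptions made for illustration; this is not necessarily how the eG agent connects to the storage system.

```python
import base64
import urllib.request


def build_login_request(host, username, password, port=4443):
    """Build (but do not send) a GET request carrying basic-auth credentials.

    The "/login" path is an assumption; 4443 is the default ECS REST API Port
    named in the parameter table.
    """
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    url = f"https://{host}:{port}/login"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Basic {token}"},
        method="GET",
    )


# Hypothetical host and read-only user, for illustration only.
request = build_login_request("192.0.2.10", "monitor_user", "example-password")
print(request.full_url)
```

If such a request were actually sent, for example with urllib.request.urlopen(request, timeout=60), the timeout argument would play the role of the Timeout Seconds parameter.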

Measurements made by the test

Measurement

Description

Measurement Unit

Interpretation

Health Status

Indicates if the node is online or offline.

The values that this measure can report and their corresponding numeric values are tabulated below:

Measure Value	Numeric Value
Good	100
Suspect	99
Bad	0

Note:

By default, this measure reports the above-mentioned Measure Values while indicating the status. However, in the graph of this measure, the same will be represented using the corresponding numeric equivalents only.

The detailed diagnosis of health status provides additional details of node including Node ID, IP address, version and Rack ID.

Number of Disks

Indicates the total number of disks attached to the node.

Number

The number of disks attached to a node depends on client requirements. For example, an application with a high rate of inputs and outputs may have a setup with fewer disks per node.

Good Disks

Indicates the total number of online disks attached to the node.

Number

This measures the capacity available on the node. You may need to add more disks if the capacity available is not able to meet the user needs.

Bad Disks

Indicates the total number of offline disks attached to the node.

Number

Bad disks are most likely failed disks.

Maintenance Disks

Indicates the total number of disks attached to the node, which are under maintenance.

Number

Disks that are under maintenance cannot be used.

Disks Offline

Indicates the total offline disk capacity.

Number

CPU Utilization

Indicates the percentage of CPU utilization of a given node at the time of the measurement.

Percentage

A value near 100 indicates that the CPU is overloaded. If CPU utilization remains near 100%, node performance can degrade and additional load is placed on the other nodes in the system. Ideally, the load balancing algorithm should not send additional requests to a node with high CPU usage; if it does, the algorithm needs to be tweaked.

Relative Memory

Indicates the memory used by node processes as a percentage of total memory available on the node.

Percentage

A value near 100 is a cause for concern. Persistently high memory usage on the node host can degrade the performance of the node. The load balancing algorithm needs to be tweaked if it is still sending requests to a node with high memory usage.

Memory Usage

Indicates the memory used on the node in absolute terms.

Total Disk Space

Indicates the total online capacity provided by the online disks within the node. This is the total of the capacity already used and the capacity still free for allocation, added together.

Used Disk Space

Indicates the online capacity already used.

Free Disk Space

Indicates the online disk capacity across the disks available for use.
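
Two relationships in the measurements above lend themselves to a short worked example: the Health Status measure values map to fixed numeric values (Good 100, Suspect 99, Bad 0), and Total Disk Space is the sum of Used Disk Space and Free Disk Space. The Python sketch below restates both. The function names, the used_space_percent helper, and the sample figures are hypothetical, and the capacity unit is an assumption since the table does not state one.

```python
# Measure values and numeric equivalents, as tabulated for Health Status.
HEALTH_STATUS_VALUES = {"Good": 100, "Suspect": 99, "Bad": 0}


def health_status_to_numeric(status):
    """Return the numeric value used in the graph of Health Status."""
    return HEALTH_STATUS_VALUES[status]


def total_disk_space(used, free):
    """Total Disk Space is described as used plus free online capacity."""
    return used + free


def used_space_percent(used, free):
    """Hypothetical helper: share of online capacity already used, in percent."""
    total = total_disk_space(used, free)
    if total == 0:
        return 0.0
    return 100.0 * used / total


# Illustrative values only; the unit (for example GB) is an assumption.
sample = {"health": "Suspect", "used": 750.0, "free": 250.0}
print(health_status_to_numeric(sample["health"]))          # 99
print(total_disk_space(sample["used"], sample["free"]))    # 1000.0
print(used_space_percent(sample["used"], sample["free"]))  # 75.0
```

A derived percentage like this is not a measure reported by the test; it is only one way an administrator might relate Used Disk Space to Total Disk Space when deciding whether more disks are needed.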