EMC ECS Nodes Test
Nodes are the most important building block of the EMC ECS storage system. A node is a server that hosts and runs the EMC ECS software services. The performance of the entire EMC ECS storage system depends on the performance of each node in it.
The health of each node is therefore critical to the healthy functioning of the EMC ECS system. This test monitors each node and collects health-related measures such as the number of good and bad disks and memory availability. With these metrics, administrators can drill down to the node level to investigate any issue with EMC Elastic Cloud Storage.
Target of the test : A Dell EMC Elastic Cloud Storage System
Agent deploying the test : A remote agent
Outputs of the test : One set of results for each node in the EMC ECS storage system
Parameter | Description |
---|---|
Test period |
How often should the test be executed. |
Host |
The host for which the test is to be configured. Since the storage device is managed using the IP address of its storage controller, the same will be displayed as host. |
Port |
The port number at which the specified host listens. By default, this is NULL. |
ECS REST API Port |
This is the port at which REST API connectivity is provided. By default, port 4443 is used. |
Username and Password |
To collect performance metrics from the target storage device, the eG agent should be configured with the credentials of a user who has "read-only" privileges to access the REST API of the target storage device. Specify the credentials of such a user in the Username and Password text boxes. |
Confirm Password |
Confirm the password by retyping it here. |
Timeout Seconds |
Specify the duration, in seconds, for which this test should wait for a response from the storage system in the Timeout Seconds text box. By default, this is 60 seconds. |
DD Frequency |
Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency, if you so desire. Also, if you intend to disable the detailed diagnosis capability for this test, you can do so by specifying none against DD frequency. |
Detailed Diagnosis |
To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, click on the Off option. The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:
|
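As an illustration of how the parameters above fit together, the following is a minimal sketch rather than the eG agent's actual implementation. The class name `NodesTestConfig` and its methods are hypothetical, and the use of HTTPS for the REST API is an assumption that the parameter descriptions do not state. The sketch applies the documented defaults (port 4443, a 60-second timeout, a DD frequency of 1:1) and interprets a DD frequency of `none` as disabling detailed diagnosis.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class NodesTestConfig:
    """Hypothetical container for the test parameters described above."""

    host: str
    rest_api_port: int = 4443   # documented default for ECS REST API Port
    timeout_seconds: int = 60   # documented default for Timeout Seconds
    dd_frequency: str = "1:1"   # documented default for DD Frequency

    def base_url(self) -> str:
        # Assumption: the REST API is reached over HTTPS; the parameter
        # table does not name the scheme.
        return f"https://{self.host}:{self.rest_api_port}"

    def parsed_dd_frequency(self) -> Optional[Tuple[int, int]]:
        """Return (normal, problem) frequencies, or None when set to 'none'."""
        value = self.dd_frequency.strip().lower()
        if value == "none":
            return None
        normal, problem = value.split(":")
        return int(normal), int(problem)


if __name__ == "__main__":
    cfg = NodesTestConfig(host="192.0.2.10")  # example address only
    print(cfg.base_url())
    print(cfg.parsed_dd_frequency())
```

Keeping the defaults in one place makes it easy to see which values were overridden for a given storage system.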
Measurement | Description | Measurement Unit | Interpretation |
---|---|---|---|
Health Status |
Indicates if the node is online or offline. |
|
The values that this measure can report and their corresponding numeric values are tabulated below:
Note: By default, this measure reports the above-mentioned Measure Values while indicating the status. However, in the graph of this measure, the same will be represented using the corresponding numeric equivalents only. The detailed diagnosis of this measure provides additional details of the node, including the node ID, IP address, version and rack ID. |
||||||||
Number of Disks |
Indicates the total number of disks attached to the node. |
Number |
The number of disks attached to a node depends on client requirements. For example, an application with a high rate of inputs and outputs may have a setup with fewer disks per node. |
||||||||
Good Disks |
Indicates the total number of online disks attached to the node. |
Number |
This measure reflects the capacity available on the node. You may need to add more disks if the available capacity cannot meet user needs. |
||||||||
Bad Disks |
Indicates the total number of offline disks attached to the node. |
Number |
Bad disks are most likely failed disks. |
||||||||
Maintenance Disks |
Indicates the total number of disks attached to the node, which are under maintenance. |
Number |
These disks are under maintenance and cannot be used. |
||||||||
Disks Offline |
Indicates the total offline disk capacity. |
Number |
|
||||||||
CPU Utilization |
Indicates the percentage of CPU utilization of a given node at the time of the measurement. |
Percentage |
A value near 100 indicates that the CPU is overloaded. If CPU utilization remains near 100%, node performance can degrade and additional load falls on the other nodes of the system. Ideally, the load balancing algorithm should not send additional requests to a node with high CPU usage; if it does, the algorithm needs to be tweaked. |
||||||||
Relative Memory |
Indicates the memory used by node processes as a percentage of total memory available on the node. |
Percentage |
A value near 100 is a cause for concern. Persistently high memory usage on the node host can degrade the performance of the node. The load balancing algorithm needs to be tweaked if it is still sending requests to a node with high memory usage. |
||||||||
Memory Usage |
Indicates the memory used on the node in absolute terms. |
GB |
|
||||||||
Total Disk Space |
Indicates the total online capacity provided by the online disks within the node. This is the sum of the capacity already used and the capacity still free for allocation. |
GB |
|
||||||||
Used Disk Space |
Indicates the online capacity already used. |
GB |
|
||||||||
Free Disk Space |
Indicates the online disk capacity across the disks available for use. |
GB |
|