v7000 Node Traffic Test

A node is a single processing unit that provides virtualization, cache, and copy services for the cluster. SAN Volume Controller nodes are deployed in pairs called I/O groups. One node in the cluster is designated the configuration node. At any point in time, only the configuration node can operate as the focal point for configuration and monitoring requests. It is the only node that takes the active cluster IP addresses, and the only node that receives cluster management requests. You can use one or more of these addresses to access the system through the management GUI or the command-line interface (CLI).

To understand how well the cluster manages I/O requests, you need to monitor the data sent and received by each node in the cluster in response to these requests, and the time taken by every node to process them. This is exactly what the v7000 Node Traffic test does! This test auto-discovers the nodes configured on the IBM Storwize v7000 storage system and, for each node, reports the data sent and received by that node and the latency of the node when receiving and sending data. Besides revealing the load on each node, the test also points you to the nodes that are most likely to be overloaded soon, and to the nodes that send or receive data much more slowly than the others. Based on the results reported by this test, you can investigate why certain nodes delay I/O processing and take measures to eliminate the causes. You can also fine-tune the load-balancing algorithm of the cluster to ensure that load is uniformly distributed across nodes.

Target of the test : An IBM Storwize v7000 storage system

Agent deploying the test : A remote agent

Outputs of the test : One set of results for each node of the IBM Storwize v7000 storage system being monitored.

Configurable parameters for the test
Parameter Description

Test Period

How often the test should be executed.

Host

The host for which the test is to be configured.

Port

The port number on which the specified host listens. By default, this is NULL.

Timeout

In the Timeout text box, specify the duration (in seconds) beyond which the test will time out. The default value is 60 seconds.

Measurements made by the test
Measurement Description Measurement Unit Interpretation

Messages or bulk data received

Indicates the rate at which messages or bulk data is received on this node.

Msg/sec

Compare the value of these measures across the nodes to identify the node that is overloaded, i.e., the node that is busiest sending/receiving messages. This way, you can identify irregularities in load balancing across the nodes.

Messages or bulk data sent

Indicates the rate at which messages or bulk data is sent through this node.

Msg/sec

Data received

Indicates the rate at which data is received on this node.

MB/sec

Compare the value of these measures across the nodes to identify the node that is overloaded, i.e., the node that is busiest sending/receiving data. This way, you can identify irregularities in load balancing across the nodes.

Data sent

Indicates the rate at which data is sent through this node.

MB/sec

Average receive latency including inbound queue

Indicates the average time taken by this node to receive messages, including the time spent by the messages in the inbound queue.

Microsec/msg

Compare the value of each of these measures across nodes to identify the node with the highest latency, i.e., the node that is slowest when receiving messages.

You can then compare the Average receive latency including inbound queue and Average receive latency excluding inbound queue measures for that node to determine where the messages spent the most time: in the inbound queue or in the node itself. The difference between the two measures is the time spent in the queue, and it points you to where the bottleneck is.

Average receive latency excluding inbound queue

Indicates the average time taken by this node to receive messages, excluding the time spent by the messages in the inbound queue.

Microsec/msg

Average send latency including outbound queue

Indicates the average time taken by this node to send messages, including the time spent by the messages in the outbound queue.

Microsec/msg

Compare the value of each of these measures across nodes to identify the node with the highest latency, i.e., the node that is slowest when sending messages.

You can then compare the Average send latency including outbound queue and Average send latency excluding outbound queue measures for that node to determine where the messages spent the most time: in the outbound queue or in the node itself. The difference between the two measures is the time spent in the queue, and it points you to where the bottleneck is.

Average send latency excluding outbound queue

Indicates the average time taken by this node to send messages, excluding the time spent by the messages in the outbound queue.

Microsec/msg
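
The comparisons described in the Interpretation column can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample values, the dictionary keys (incl_queue, excl_queue) and the 50% threshold used to label the bottleneck are all hypothetical, and are not part of the test's output format or of any product API.

```python
# Hypothetical per-node latency samples in Microsec/msg.
# Key names and values are illustrative assumptions only.
samples = {
    "node1": {"incl_queue": 420.0, "excl_queue": 130.0},
    "node2": {"incl_queue": 150.0, "excl_queue": 120.0},
}


def slowest_node(samples, key):
    """Return the name of the node with the highest value for `key`."""
    return max(samples, key=lambda node: samples[node][key])


def queue_time(incl_queue, excl_queue):
    """Time spent in the queue: latency including queue minus latency excluding it."""
    return max(incl_queue - excl_queue, 0.0)


def bottleneck(incl_queue, excl_queue, threshold=0.5):
    """Label the bottleneck as 'queue' or 'node'.

    The threshold (share of total latency spent queued) is an
    assumed cut-off chosen for this sketch, not a documented value.
    """
    if incl_queue <= 0:
        return "node"
    share = queue_time(incl_queue, excl_queue) / incl_queue
    return "queue" if share > threshold else "node"


if __name__ == "__main__":
    worst = slowest_node(samples, "incl_queue")
    values = samples[worst]
    print(worst, bottleneck(values["incl_queue"], values["excl_queue"]))
```

With these made-up values, node1 has the highest latency including the queue, and most of that latency is queue time, so the queue rather than the node itself would be the first place to investigate. The same functions apply unchanged to the send-side measures.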