RabbitMQ Channels Test

A connection is a TCP connection between your application and the RabbitMQ broker. A channel is a virtual connection inside a connection; many channels can multiplex (share) a single TCP connection. Typically, each process creates only one TCP connection and uses multiple channels on that connection, for example one per thread. Publishing messages to a queue and consuming messages from a queue are always done over a channel.
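To make the multiplexing idea concrete, here is a toy Python sketch. It only models how frames arriving on one TCP connection can be sorted by channel number, which every AMQP 0-9-1 frame header carries (channel 0 is reserved for connection-level traffic). It is not how any particular client library is implemented, and the frame payloads are placeholder strings.

```python
from collections import defaultdict


def demultiplex(frames):
    """Group frames received on one TCP connection by channel number.

    Each frame is modeled as a (channel_number, payload) tuple; payloads
    are placeholder strings, not real AMQP method frames.
    """
    per_channel = defaultdict(list)
    for channel_number, payload in frames:
        per_channel[channel_number].append(payload)
    return dict(per_channel)


frames = [(1, "basic.publish A"), (2, "basic.consume"), (1, "basic.publish B")]
print(demultiplex(frames))
# {1: ['basic.publish A', 'basic.publish B'], 2: ['basic.consume']}
```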

Since unacknowledged messages drain resources, RabbitMQ lets you limit the number of unacknowledged messages that can be outstanding on a channel at any point in time, using a Prefetch count setting (configured through the basic.qos method). Depending upon the unacknowledged message traffic on your channels and the number of consumers for those messages, you may need to fine-tune this Prefetch count from time to time. By default, the Prefetch count applies to each consumer on a channel; by setting the global flag to true, you can instead configure a per-channel Prefetch count, which is shared across all consumers on that channel.

The RabbitMQ channels test reports useful metrics on the message traffic and message prefetching on a channel. This way, the test provides administrators with effective pointers on how to tweak the Prefetch count setting and optimize RabbitMQ performance.

Target of the test : A RabbitMQ Cluster

Agent deploying the test : A remote agent

Outputs of the test : One set of results for each channel on every node in the target cluster

First-level descriptor: Node name

Second-level descriptor: Channel name

Configurable parameters for the test
Parameters Description

Test period

How often should the test be executed

Host

The host for which the test is to be configured.

Port

The port at which the configured Host listens; by default, this is 15672.

Username, Password, and Confirm Password

The eG agent connects to the Management Interface of the rabbitmq-management plugin of the target node, and runs HTTP-based API commands on the node using the plugin to pull metrics of interest. To connect to the plugin and run the API commands, the eG agent requires the privileges of a user on the cluster who has been assigned the 'monitoring' tag. If such a user pre-exists, then configure this test with the Username and Password of that user. On the other hand, if no such user exists, then you will have to create a user for this purpose using the Management Interface. The steps for this have been detailed in How Does eG Enterprise Monitor a RabbitMQ Cluster? In this case, make sure you configure this test with the Username and Password of the new user. Finally, confirm the password by retyping it in the Confirm Password text box.

SSL

By default, this flag is set to No, as the target node is not SSL-enabled by default. If the node is SSL-enabled, then set this flag to Yes.
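The Host, Port, Username/Password, and SSL settings map naturally onto an HTTP request against the management plugin. The Python sketch below only builds such a request and does not send it. It assumes the /api/channels endpoint of the RabbitMQ management HTTP API with HTTP Basic authentication; the host name and credentials are hypothetical, and the exact calls the eG agent makes may differ.

```python
import base64
import urllib.request


def build_channels_request(host, port=15672, username="guest",
                           password="guest", use_ssl=False):
    """Build (but do not send) a GET request for the channel list.

    Assumes the /api/channels endpoint of the RabbitMQ management
    HTTP API; the eG agent's own calls are not documented here.
    """
    scheme = "https" if use_ssl else "http"  # mirrors the SSL flag
    url = f"{scheme}://{host}:{port}/api/channels"
    request = urllib.request.Request(url, method="GET")
    # The management API accepts HTTP Basic authentication.
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    return request


if __name__ == "__main__":
    # Hypothetical node and monitoring user.
    req = build_channels_request("rabbit-node1", port=15671,
                                 username="egmonitor", password="secret",
                                 use_ssl=True)
    print(req.full_url)  # https://rabbit-node1:15671/api/channels
    # Sending it would be: urllib.request.urlopen(req, timeout=10)
```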

Num DD Messages

By default, this parameter is set to 10. This means that the detailed diagnosis of the Number of channels connected measure will report the details of the top-10 channels in terms of their reduction count. To view the details of more channels as part of detailed metrics, increase the value of this parameter; to view the details of fewer than 10 channels, reduce it.
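A top-N list by reduction count can be derived with a simple sort. This minimal sketch assumes the channel records are the parsed JSON list from the management API's /api/channels endpoint, each with "name", "node", and "reductions" fields; the function name and the channel names in the example are hypothetical.

```python
def top_channels_by_reductions(channels, num_dd_messages=10):
    """Return the top-N channel records ranked by reduction count.

    `channels` is assumed to be a list of dicts with a "reductions"
    field; a missing count is treated as 0.
    """
    ranked = sorted(channels, key=lambda ch: ch.get("reductions", 0),
                    reverse=True)
    return ranked[:num_dd_messages]


sample = [
    {"name": "ch-a", "node": "rabbit@n1", "reductions": 500},
    {"name": "ch-b", "node": "rabbit@n1", "reductions": 9000},
    {"name": "ch-c", "node": "rabbit@n2", "reductions": 1200},
]
top = top_channels_by_reductions(sample, num_dd_messages=2)
print([ch["name"] for ch in top])  # ['ch-b', 'ch-c']
```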

Individual Channels

If you want the test to report metrics for every channel on a node, then set this flag to Yes. In this case, each channel will be a descriptor of this test. On the other hand, if you want to understand the overall message load across all channels and all nodes in a cluster, you can set this flag to No. In this case, the test will report metrics for a Summary descriptor alone.
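The two layouts this flag selects between can be sketched as follows. This is an illustration under assumptions, not the test's actual logic: the channel records are assumed to carry "node", "name", and a numeric field such as "messages_unacknowledged" (a field reported by the management API), and the function name is hypothetical.

```python
from collections import defaultdict


def channel_descriptors(channels, individual_channels=True,
                        field="messages_unacknowledged"):
    """Arrange one metric either per node/channel or as a single Summary."""
    if not individual_channels:
        # Individual Channels = No: one Summary value across the cluster.
        return {"Summary": sum(ch.get(field, 0) for ch in channels)}
    # Individual Channels = Yes: node name, then channel name.
    by_node = defaultdict(dict)
    for ch in channels:
        by_node[ch["node"]][ch["name"]] = ch.get(field, 0)
    return dict(by_node)


sample = [
    {"node": "rabbit@n1", "name": "ch-a", "messages_unacknowledged": 3},
    {"node": "rabbit@n1", "name": "ch-b", "messages_unacknowledged": 4},
    {"node": "rabbit@n2", "name": "ch-c", "messages_unacknowledged": 5},
]
print(channel_descriptors(sample, individual_channels=False))  # {'Summary': 12}
```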

Detailed Diagnosis

To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability
  • Neither the normal nor the abnormal frequency configured for the detailed diagnosis measures should be 0.
Measurements made by the test
Measurement Description Measurement Unit Interpretation

Number of channels connected

Indicates the number of channels connected.

For the Summary descriptor, this measure will indicate the total number of channels connected.

Number

Use the detailed diagnosis of this measure to view the top-10 (by default) channels, in terms of their reduction count.

Message prefetch rate for consumer

Indicates the rate at which each consumer on this channel prefetched messages.

For the Summary descriptor, this measure will indicate the rate at which the consumers across all channels on all nodes prefetched messages.


Messages/Sec

If each consumer on a channel consumes messages at a steady rate at all times (that is, if the value of this measure does not change much over time), it could be owing to a per-consumer Prefetch setting.

Prefetch allows you to limit the number of unacknowledged messages for a channel and/or a consumer. Once the number reaches the configured count, RabbitMQ stops delivering more messages on the channel and/or to that consumer until at least one of the outstanding messages is acknowledged.

With the default Prefetch setting, which gives consumers an unlimited buffer, RabbitMQ will push all messages in a queue to a consumer as fast as the network and the consumer allow. The consumer will balloon in memory as it buffers all the messages in its own RAM. If you ask RabbitMQ, the queue may appear empty, yet there may be millions of unacknowledged messages sitting in the consumer, waiting to be processed by the client application. If you add a new consumer, there are no messages left in the queue to send to it. Messages are just being buffered in the existing consumer, and may stay there for a long time, even if other consumers become available that could process them sooner. This means that with the default Prefetch setting, work is distributed poorly across consumers and RabbitMQ performance suffers. The goal is to keep the consumers saturated with work, but to minimise the client's buffer size so that more messages stay in RabbitMQ's queue and are thus available for new consumers, or can simply be sent out to consumers as they become free.
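The effect of an unlimited buffer can be seen in a toy model. The Python sketch below is not RabbitMQ's dispatcher; it simply assumes that a consumer receives as many messages as its prefetch limit allows, that a prefetch of 0 stands for "no limit" (as with prefetch_count=0 in the AMQP 0-9-1 basic.qos method), and that the second consumer joins before the first has acknowledged anything. The function names are hypothetical.

```python
def deliver(queue_depth, unacked, prefetch):
    """Return (sent, remaining_queue_depth) for one consumer.

    prefetch == 0 models "no limit"; otherwise the consumer can take
    only as many messages as it has free prefetch slots.
    """
    room = queue_depth if prefetch == 0 else max(0, prefetch - unacked)
    sent = min(room, queue_depth)
    return sent, queue_depth - sent


def scenario(prefetch, queue_depth=1_000_000):
    # Consumer A is attached first and takes as much as its limit allows.
    a_buffered, queue_depth = deliver(queue_depth, 0, prefetch)
    # Consumer B joins later, before A has acknowledged anything.
    b_buffered, queue_depth = deliver(queue_depth, 0, prefetch)
    return a_buffered, b_buffered, queue_depth


print(scenario(prefetch=0))   # (1000000, 0, 0): queue looks empty, B gets nothing
print(scenario(prefetch=10))  # (10, 10, 999980): work stays in the queue
```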

Let's say it takes 50ms for RabbitMQ to take a message from a queue, put it on the network, and deliver it to the consumer. It takes 4ms for the client to process the message. Once the consumer has processed the message, it sends an ack back to RabbitMQ, which takes a further 50ms to be sent to and processed by RabbitMQ. So the total round-trip time is 104ms. If the prefetch setting is 1 message, RabbitMQ will not send out the next message until this round trip completes. Thus the client will be busy for only 4ms of every 104ms, or 3.8% of the time. The goal is to keep the client busy 100% of the time.
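The arithmetic in this example can be expressed as a small formula. The Python sketch below assumes a simple steady-state model (not taken from RabbitMQ): with N messages in flight, the consumer can process at most N messages per round trip, so it is busy for N × processing time out of every round trip, capped at 100%. The function name is hypothetical.

```python
def consumer_utilisation(prefetch, network_ms_each_way, processing_ms):
    """Fraction of time a single consumer is busy in steady state.

    Toy model: each message needs one full round trip (deliver, process,
    ack); with `prefetch` messages in flight the consumer can process at
    most `prefetch` messages per round trip, capped at 100% busy.
    """
    round_trip_ms = 2 * network_ms_each_way + processing_ms
    return min(1.0, prefetch * processing_ms / round_trip_ms)


print(f"{consumer_utilisation(1, 50, 4):.1%}")   # 3.8%
print(f"{consumer_utilisation(26, 50, 4):.1%}")  # 100.0%
```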

Here are some guidelines for setting the correct prefetch value:

  • If you have a single consumer or a few consumers processing messages quickly, we recommend prefetching many messages at once, to keep your client as busy as possible. If the processing time stays about the same and the network behavior does not change, you can simply divide the total round-trip time by the client's processing time per message to get an estimated prefetch value.
  • If you have many consumers and a short processing time, we recommend a lower prefetch value than for a single consumer or a few consumers. Too low a value will keep the consumers idling a lot, since they must wait for messages to arrive. Too high a value may keep one consumer busy while other consumers are left idle.
  • If you have many consumers and/or a long processing time, we recommend setting the prefetch count to 1 so that messages are evenly distributed among all your workers.
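The rule of thumb in the first guideline (round-trip time divided by per-message processing time) can be sketched as below. Rounding up to a whole message and enforcing a minimum of 1 are assumptions made here, not part of the guideline; the function name is hypothetical.

```python
import math


def estimate_prefetch(round_trip_ms, processing_ms):
    """Estimate a prefetch count as round-trip time / processing time.

    Rounds up so the consumer is not starved, and never returns less
    than 1 (both are choices made for this sketch).
    """
    if processing_ms <= 0:
        raise ValueError("processing_ms must be positive")
    return max(1, math.ceil(round_trip_ms / processing_ms))


print(estimate_prefetch(104, 4))  # 26
```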

Please note that if your client auto-acknowledges messages, the prefetch value has no effect.

Global message prefetch rate

Indicates the rate at which this channel prefetched messages across all its consumers.

For the Summary descriptor, this measure will indicate the rate at which messages were prefetched across all channels and nodes.

Messages/Sec

If this measure reports a non-zero value, it could indicate that the global Prefetch count configuration is active.

By default, the Prefetch count configuration applies to each new consumer on a channel. If required, you can set the global flag in the basic.qos method to true, so that the Prefetch count configuration applies per channel (i.e., across all consumers of a channel).

For instance, say that there are two consumers of a channel, with a global prefetch count of 15. In this case, both these consumers will only ever have 15 unacknowledged messages between them.

Note that a per-channel Prefetch count and a per-consumer Prefetch count can even co-exist. For instance, say that there are two consumers of a channel. Also, assume that the per-channel Prefetch count is 15 and the per-consumer Prefetch count is 10. In this case, the two consumers will only ever have 15 unacknowledged messages between them, with a maximum of 10 messages for each consumer. This will be slower than the above example, due to the additional overhead of coordinating between the channel and the queues to enforce the global limit.
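The interplay of the two limits can be modeled in a few lines. This is a toy model under stated assumptions: global=false is treated as a per-consumer limit and global=true as a per-channel limit (RabbitMQ's interpretation of basic.qos), 0 means unlimited, and consumers are filled one after another rather than round-robin, which is a simplification. The class and function names are hypothetical.

```python
class Channel:
    """Toy model of per-consumer and per-channel prefetch limits (0 = unlimited)."""

    def __init__(self, per_consumer=0, per_channel=0):
        self.per_consumer = per_consumer  # basic.qos with global=false
        self.per_channel = per_channel    # basic.qos with global=true
        self.unacked = {}                 # consumer tag -> outstanding deliveries

    def can_deliver(self, consumer):
        mine = self.unacked.get(consumer, 0)
        total = sum(self.unacked.values())
        if self.per_consumer and mine >= self.per_consumer:
            return False
        if self.per_channel and total >= self.per_channel:
            return False
        return True

    def deliver(self, consumer):
        if not self.can_deliver(consumer):
            return False
        self.unacked[consumer] = self.unacked.get(consumer, 0) + 1
        return True

    def ack(self, consumer):
        if self.unacked.get(consumer, 0) > 0:
            self.unacked[consumer] -= 1


def fill(channel, consumers=("c1", "c2"), queue_depth=100):
    # Offer messages to each consumer in turn until it is blocked or the queue is empty.
    for consumer in consumers:
        while queue_depth and channel.deliver(consumer):
            queue_depth -= 1
    return dict(channel.unacked), queue_depth


print(fill(Channel(per_channel=15)))                   # ({'c1': 15}, 85)
print(fill(Channel(per_consumer=10, per_channel=15)))  # ({'c1': 10, 'c2': 5}, 85)
```

In both cases the two consumers never hold more than 15 unacknowledged messages between them; with both limits set, no single consumer holds more than 10.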

A per-consumer Prefetch setting is generally preferable. Because a single channel may consume from multiple queues, a global (per-channel) Prefetch setting requires coordination between the channel and the queue(s) for every message sent, to ensure they don't go over the limit. This coordination can be slow on a single machine, and very slow when consuming across a cluster. With a per-consumer setting, there is no need for this coordination.

Unacknowledged message rate

Indicates the rate of unacknowledged messages on this channel.

For the Summary descriptor, this measure will indicate the rate of unacknowledged messages across all channels and nodes.

Messages/Sec

Ideally, the channel should hold very few unacknowledged messages, as such messages are resource-hungry.

If the value of this measure keeps varying over time, it could imply that the default Prefetch count setting is at play. In this case, use the guidelines discussed in the Interpretation column of the Message prefetch rate and Global message prefetch rate measures to know how to fine-tune the Prefetch count configuration and limit the unacknowledged messages on a channel.

Unconfirmed message rate

Indicates the rate of unconfirmed messages on this channel.

For the Summary descriptor, this measure will indicate the rate of unconfirmed messages across all channels and nodes.

Messages/Sec

Unconfirmed messages are messages that a producer has published on this channel with publisher confirms enabled, but for which the broker has not yet sent a confirmation (basic.ack) back to the producer.
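Publisher confirms can be tracked on the producer side as shown in this sketch. It assumes the AMQP 0-9-1 publisher-confirm behavior as RabbitMQ implements it: after confirm.select, publishes on a channel are numbered 1, 2, 3, and so on, and a basic.ack with multiple=true confirms every tag up to and including the given delivery tag. basic.nack handling is omitted, and the class name is hypothetical.

```python
class ConfirmTracker:
    """Track which published messages the broker has not yet confirmed."""

    def __init__(self):
        self.next_tag = 1        # publish sequence numbers start at 1 per channel
        self.unconfirmed = set()

    def published(self):
        tag = self.next_tag
        self.unconfirmed.add(tag)
        self.next_tag += 1
        return tag

    def broker_ack(self, delivery_tag, multiple=False):
        if multiple:
            # Confirms every outstanding tag up to and including delivery_tag.
            self.unconfirmed = {t for t in self.unconfirmed if t > delivery_tag}
        else:
            self.unconfirmed.discard(delivery_tag)


tracker = ConfirmTracker()
for _ in range(5):
    tracker.published()
tracker.broker_ack(3, multiple=True)  # confirms 1, 2 and 3
print(sorted(tracker.unconfirmed))    # [4, 5]
```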

Uncommitted message rate

Indicates the rate of uncommitted messages on this channel.

For the Summary descriptor, this measure will indicate the rate of uncommitted messages across all channels and nodes.

Messages/Sec

Uncommitted messages are messages received on this channel as part of a transaction that has not yet been committed.

Uncommitted acknowledgement rate

Indicates the rate of uncommitted acknowledgements on this channel.

For the Summary descriptor, this measure will indicate the rate of uncommitted acknowledgements across all channels and nodes.

Messages/Sec

Uncommitted acknowledgements are acknowledgements that are received by the node for transactions that are not yet committed.
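A channel in transaction mode holds publishes and acknowledgements until the transaction is committed. The toy model below counts both while a transaction is open. It assumes that tx.commit applies the pending work and tx.rollback discards it, and models nothing else about RabbitMQ transactions; the class is hypothetical, and its counter names merely echo the messages_uncommitted and acks_uncommitted fields the management API reports for channels.

```python
class TxChannel:
    """Toy model of a channel in AMQP 0-9-1 transaction mode (tx.select)."""

    def __init__(self):
        self.pending_publishes = []
        self.pending_acks = []
        self.committed_publishes = []

    @property
    def messages_uncommitted(self):
        return len(self.pending_publishes)

    @property
    def acks_uncommitted(self):
        return len(self.pending_acks)

    def publish(self, body):
        self.pending_publishes.append(body)  # held until tx.commit

    def ack(self, delivery_tag):
        self.pending_acks.append(delivery_tag)  # held until tx.commit

    def commit(self):
        self.committed_publishes.extend(self.pending_publishes)
        self.pending_publishes.clear()
        self.pending_acks.clear()

    def rollback(self):
        # Discard everything done since the last commit.
        self.pending_publishes.clear()
        self.pending_acks.clear()


ch = TxChannel()
ch.publish("order-1")
ch.publish("order-2")
ch.ack(7)
print((ch.messages_uncommitted, ch.acks_uncommitted))  # (2, 1)
ch.commit()
print((ch.messages_uncommitted, ch.acks_uncommitted))  # (0, 0)
```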

Reduction rate

Indicates the rate at which reductions take place on this channel.

For the Summary descriptor, this measure will indicate the rate of reductions across all channels and nodes.

Reductions/Sec

The reduction count is a per-process counter that is normally incremented by one for each function call. The Erlang scheduler uses it to preempt a process and context-switch to another when the process's counter reaches the maximum number of reductions. For example, in Erlang/OTP R12B this maximum was 2000 reductions.

The value of this measure represents the rate at which a channel makes function calls. It is the real indicator of the workload generated by the channel on a particular node. To understand the workload of the cluster as a whole, use the value this measure reports for the Summary descriptor.
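A rate like this can be derived from two successive readings of a cumulative counter. The sketch below assumes the channel's reductions value is a monotonically increasing total (as reported for channel processes by the management API) and that both readings come from the same channel; the function name and sample numbers are hypothetical.

```python
def reduction_rate(prev_reductions, curr_reductions, interval_seconds):
    """Reductions per second between two polls of the same channel.

    Assumes `reductions` is a cumulative, non-decreasing counter.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return (curr_reductions - prev_reductions) / interval_seconds


# Two polls taken 300 seconds apart (hypothetical readings).
print(reduction_rate(120_000, 420_000, 300))  # 1000.0
```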