RabbitMQ Node Garbage Collection Test

The RabbitMQ server is written in the Erlang programming language. Each Erlang process has its own stack and heap, which are allocated in the same memory block and grow towards each other. When the stack and the heap meet, the garbage collector is triggered and memory is reclaimed. If the garbage collector does not reclaim enough memory, the heap grows to accommodate more data. If heap growth is not kept in check by efficient garbage collection, it can degrade the performance of the RabbitMQ node and, consequently, slow down cluster operations as well.

Using the RabbitMQ Node Garbage Collection test, you can keep tabs on garbage collection activity on each node of a cluster and identify the node from which the least memory was reclaimed. When a cluster underperforms, you can use this test to determine whether the dip in cluster performance is due to excessive heap growth on a node caused by inefficient garbage collection.

Target of the test : A RabbitMQ Cluster

Agent deploying the test : A remote agent

Outputs of the test : One set of results for each node in the monitored RabbitMQ Cluster

Configurable parameters for the test
Parameter | Description

Test period

How often the test should be executed.

Host

The host for which the test is to be configured.

Port

The port at which the configured Host listens; by default, this is 15672

Username, Password, and Confirm Password

The eG agent connects to the Management Interface of the rabbitmq-management plugin on the target node and uses the plugin to run HTTP-based API commands on the node to pull the metrics of interest. To connect to the plugin and run the API commands, the eG agent requires the privileges of a user on the cluster who has been assigned the 'monitoring' tag. If such a user already exists, configure this test with the Username and Password of that user. If no such user exists, create one for this purpose using the Management Interface; the steps are detailed in the topic "How Does eG Enterprise Monitor a RabbitMQ Cluster?". In that case, make sure you configure this test with the Username and Password of the new user. Finally, confirm the password by retyping it in the Confirm Password text box.
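The kind of HTTP API call described above can be approximated as follows. This is a minimal sketch, not the eG agent's actual implementation: it assumes the management plugin's `/api/nodes` endpoint with HTTP Basic authentication, and it assumes the node objects carry fields named `gc_num`, `gc_bytes_reclaimed`, and `context_switches`, which may vary across RabbitMQ versions. The helper names `build_nodes_url`, `fetch_nodes`, and `extract_gc_metrics` are hypothetical.

```python
import base64
import json
import urllib.request


def build_nodes_url(host, port=15672, use_ssl=False):
    """Build the management API URL for the node listing (assumed endpoint)."""
    scheme = "https" if use_ssl else "http"
    return f"{scheme}://{host}:{port}/api/nodes"


def fetch_nodes(host, port, username, password, use_ssl=False, timeout=10):
    """Fetch the node list as a user that carries the 'monitoring' tag."""
    request = urllib.request.Request(build_nodes_url(host, port, use_ssl))
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def extract_gc_metrics(nodes):
    """Pick out the garbage collection counters per node (assumed field names)."""
    metrics = {}
    for node in nodes:
        metrics[node["name"]] = {
            "gc_num": node.get("gc_num"),
            "gc_bytes_reclaimed": node.get("gc_bytes_reclaimed"),
            "context_switches": node.get("context_switches"),
        }
    return metrics
```

Calling `extract_gc_metrics(fetch_nodes("rabbit-host", 15672, "monitor_user", "secret"))` against a reachable node would return one dictionary entry per cluster node; the host name and credentials here are placeholders.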

SSL

By default, this flag is set to No, as the target node is not SSL-enabled by default. If the node is SSL-enabled, then set this flag to Yes.

Measurements made by the test
Measurement | Description | Measurement Unit | Interpretation

Garbage collects

Indicates the number of garbage collection operations that occurred on this node during the last measurement period.

Number

Compare the value of this measure across nodes to identify the node on which garbage collection has occurred most often. Such a node could have experienced rapid and abnormal heap growth, thus triggering garbage collection frequently. You may want to investigate the reasons for heap growth on that node. Typically, a heap grows in two stages. First, a variation of the Fibonacci sequence is used, starting at 233 words. Then, at about 1 mega word, the heap grows only in 20% increments. The heap grows on two occasions:

  • If the total size of the heap + message and heap fragments exceeds the current heap size;
  • If, after a fullsweep, the total amount of live objects is greater than 75%.

Either way, you may want to resize the heap to avoid frequent garbage collections. This is because every time garbage collection happens, the garbage collector must suspend the execution of the node to ensure the integrity of the object trees. The more live objects it finds, the longer the suspension, which has a direct impact on response time and throughput. This, in turn, may impact overall cluster performance as well.
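The two-stage heap growth described above can be illustrated with a short sketch. It is a simplified model, not the Erlang runtime's actual sizing code: the recurrence `next = previous + current + 1` and the predecessor value of 142 words are assumptions chosen so that the series continues from 233 words, and the switch to 20% increments is placed at exactly 1,000,000 words for simplicity.

```python
def heap_growth_steps(limit_words=2_000_000, switch_words=1_000_000):
    """Return illustrative heap sizes (in words) from 233 up to limit_words."""
    prev, size = 142, 233  # 142 is an assumed predecessor of 233 in the series
    sizes = [size]
    while size < limit_words:
        if size < switch_words:
            # Stage 1: Fibonacci-like growth (assumed variation: a + b + 1)
            prev, size = size, prev + size + 1
        else:
            # Stage 2: grow in 20% increments
            size = size + size // 5
        sizes.append(size)
    return sizes


if __name__ == "__main__":
    for words in heap_growth_steps():
        print(words)
```

The early steps grow quickly in relative terms, while the later 20% steps are much gentler, which is why a process that repeatedly outgrows a small heap goes through many collections before its heap settles.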

Garbage collects data reclaimed

Indicates the rate at which garbage collections take place on this node.

Operations/Sec

A high value indicates frequent garbage collections. This could be due to rapid and significant heap growth on the node. Frequent garbage collections on a node may degrade its performance. To avoid this, you may want to consider resizing the heap on that node.

Data reclaimed

Indicates the memory reclaimed by the garbage collector on this node.

MB

Compare the value of this measure across nodes to identify the node on which the garbage collector reclaimed the most memory and the node on which it reclaimed the least.
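The cross-node comparison suggested above can be sketched as follows, assuming the Data reclaimed values have already been collected into a dictionary keyed by node name. The function name `least_and_most_reclaimed` and the sample figures are hypothetical.

```python
def least_and_most_reclaimed(reclaimed_mb):
    """Return (node with least MB reclaimed, node with most MB reclaimed)."""
    if not reclaimed_mb:
        raise ValueError("no node measurements supplied")
    least = min(reclaimed_mb, key=reclaimed_mb.get)
    most = max(reclaimed_mb, key=reclaimed_mb.get)
    return least, most


if __name__ == "__main__":
    # Hypothetical per-node readings, in MB
    readings = {"rabbit@node1": 512.0, "rabbit@node2": 48.5, "rabbit@node3": 730.2}
    least, most = least_and_most_reclaimed(readings)
    print("least reclaimed:", least)
    print("most reclaimed:", most)
```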

Data reclamation rate

Indicates the rate at which memory was reclaimed by the garbage collector on this node.

MB/Sec

 

Context switch operations

Indicates the rate at which runtime context switching takes place on this node.

Operations/Sec
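The rate-style measures above (Operations/Sec and MB/Sec) can be derived from two successive readings of a cumulative counter. The sketch below assumes the underlying counters only ever increase and reset to zero when a node restarts; whether the eG agent computes its rates this way is an assumption, and the function names are hypothetical.

```python
def per_second_rate(previous, current, interval_seconds):
    """Rate of change of a cumulative counter between two samples."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = current - previous
    if delta < 0:
        # Counter went backwards: assume the node restarted and count from zero
        delta = current
    return delta / interval_seconds


def bytes_to_mb(byte_count):
    """Convert bytes to megabytes (1 MB taken as 1024 * 1024 bytes)."""
    return byte_count / (1024 * 1024)


if __name__ == "__main__":
    # Hypothetical samples taken 60 seconds apart
    gc_rate = per_second_rate(1000, 1600, 60)
    reclaim_rate = bytes_to_mb(per_second_rate(0, 629145600, 60))
    print("garbage collections per second:", gc_rate)
    print("MB reclaimed per second:", reclaim_rate)
```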