Message Queues Test

A message queue is a linked list of messages stored within the kernel and identified by a message queue identifier. Two (or more) processes can exchange information via access to a common system message queue.

The Linux kernel (2.6) implements two message queue mechanisms: System V IPC message queues and POSIX message queues.

IPC messaging lets processes send and receive messages, and queues messages so that they can be processed in an arbitrary order. A process invokes msgsnd() to send a message. It must pass the IPC identifier of the receiving message queue, the size of the message, and a message structure that includes the message type and text. On the other side, a process invokes msgrcv() to receive a message, passing the IPC identifier of the message queue, the buffer in which the message should be stored, the size of that buffer, and a value t. t selects the message returned from the queue: a positive value returns the first message whose type equals t; a negative value returns the first message of the lowest type that is less than or equal to the absolute value of t; and zero returns the first message on the queue. There are limits on the size of a single message (msgmax), the number of message queues on the system (msgmni), and the total size of all messages in a queue (msgmnb). If the number or size of the messages in a message queue reaches or approaches these limits, it could indicate a problem condition that should be investigated. To proactively capture such problem conditions, administrators should continuously monitor the growth in the length and size of each IPC message queue on a server. This is exactly what the Message Queues test does!
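The selection rule that msgrcv() applies to the value t can be illustrated with a small pure-Python model. This is only a sketch of the rule: it does not call the kernel, the function name pick_message is hypothetical, and flags such as IPC_NOWAIT and MSG_EXCEPT are not modeled.

```python
from typing import List, Optional, Tuple

# A queued message is modeled as (message type, message text); types are > 0.
Message = Tuple[int, bytes]


def pick_message(queue: List[Message], msgtyp: int) -> Optional[Message]:
    """Remove and return the message msgrcv() would select, or None if none matches."""
    if msgtyp == 0:
        # Zero: the first message on the queue, whatever its type.
        index = 0 if queue else None
    elif msgtyp > 0:
        # Positive: the first message whose type equals msgtyp.
        index = next((i for i, (t, _) in enumerate(queue) if t == msgtyp), None)
    else:
        # Negative: the first message of the lowest type <= abs(msgtyp).
        eligible = [i for i, (t, _) in enumerate(queue) if t <= -msgtyp]
        index = min(eligible, key=lambda i: queue[i][0], default=None)
    return None if index is None else queue.pop(index)


queue = [(3, b"c"), (1, b"a"), (2, b"b"), (1, b"a2")]
first = pick_message(queue, 0)    # head of the queue
typed = pick_message(queue, 2)    # first message of type 2
lowest = pick_message(queue, -2)  # first message of the lowest type <= 2
```

In this model, a call that finds no matching message returns None; the real msgrcv() would block or fail depending on its flags.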

This test auto-discovers the message queues on a monitored server, and closely tracks the number and size of the messages in each queue, thus instantly pointing administrators to those queues that have too many outstanding messages or very large messages. This way, potential bottlenecks in inter-process communication can be isolated and treated!
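The source does not say how the agent discovers queues. As an illustration only, one plausible approach on Linux is to parse the output of `ipcs -q` and group the queues by owner. The sketch below assumes the six-column layout printed by the util-linux `ipcs` command; the sample rows are made up for the example, and the function names are hypothetical. Other platforms print different columns.

```python
from collections import defaultdict

# Made-up sample in the layout assumed for Linux `ipcs -q`.
SAMPLE_IPCS_OUTPUT = """\
------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0x00001f58 0          oracle     660        2048         4
0x00001f59 32769      oracle     660        0            0
0x0000abcd 65538      www        600        512          1
"""


def parse_ipcs_queues(text):
    """Return one dict per queue found in `ipcs -q` style output."""
    queues = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have six columns and start with a hexadecimal key.
        if len(fields) != 6 or not fields[0].startswith("0x"):
            continue
        queues.append({
            "msqid": int(fields[1]),
            "owner": fields[2],
            "used_bytes": int(fields[4]),
            "messages": int(fields[5]),
        })
    return queues


def summarize_by_owner(queues):
    """Aggregate queue count, outstanding messages, and data (KB) per owner."""
    summary = defaultdict(lambda: {"queues": 0, "messages": 0, "data_kb": 0.0})
    for q in queues:
        entry = summary[q["owner"]]
        entry["queues"] += 1
        entry["messages"] += q["messages"]
        entry["data_kb"] += q["used_bytes"] / 1024
    return dict(summary)


summary = summarize_by_owner(parse_ipcs_queues(SAMPLE_IPCS_OUTPUT))
```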

Target of the test : A Linux, AIX, HP-UX, or Solaris server

Agent deploying the test : An internal agent

Outputs of the test : One set of results for every queue owner (by default) of the server being monitored

Configurable parameters for the test
  1. TEST PERIOD - How often should the test be executed
  2. Host - The host for which the test is to be configured
  3. report by - By default, this flag is set to Owner. This implies that, by default, the test reports metrics for every message queue owner on the target server. You can set this flag to Total if you want the test to report metrics for the Total descriptor alone; in this case, the test will aggregate measures across all the message queues on the server. Alternatively, you can pick the Owner and Total option. In this case, the test will report metrics per owner and also for the Total descriptor.
  4. DD frequency - Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency, if you so desire. Also, if you intend to disable the detailed diagnosis capability for this test, you can do so by specifying none against dd frequency.
  5. DETAILED DIAGNOSIS – To make diagnosis more efficient and accurate, the eG Enterprise suite embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, click on the Off option.

    The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

    • The eG manager license should allow the detailed diagnosis capability
    • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
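
The effect of the report by flag on the descriptors that get reported can be sketched as follows. This is a hypothetical helper, not the agent's code; it uses the number of queues per owner as the example metric and assumes that no owner is literally named Total.

```python
def select_descriptors(queues_per_owner, report_by="Owner"):
    """Return {descriptor: number of queues} for the chosen report by mode."""
    if report_by not in ("Owner", "Total", "Owner and Total"):
        raise ValueError("unsupported report by value: %r" % (report_by,))
    result = {}
    if report_by in ("Owner", "Owner and Total"):
        # One descriptor per queue owner.
        result.update(queues_per_owner)
    if report_by in ("Total", "Owner and Total"):
        # A single descriptor aggregated across all owners.
        result["Total"] = sum(queues_per_owner.values())
    return result


per_owner = {"oracle": 2, "www": 1}
owner_only = select_descriptors(per_owner)                    # default mode
total_only = select_descriptors(per_owner, "Total")
both = select_descriptors(per_owner, "Owner and Total")
```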

Measurements made by the test

Each measurement below is listed with its description, measurement unit, and interpretation, in that order, where applicable.

Number of queues:

Indicates the number of queues for this owner. For the Total descriptor, this measure indicates the total number of message queues on the server.

Number

This measure will be reported for the Total descriptor, only if the report by flag is set to Total or Owner and Total.

Outstanding messages in queue:

For each owner, this measure indicates the total number of outstanding messages in all queues owned by that owner. 

Number

A high value or a consistent increase in the value of this measure indicates that many messages have not yet been delivered to the receiver. Typically, this occurs if either or both of the following are true:

  • The number of bytes already on the queue is equal to the maximum number of bytes that the queue can handle.
  • The total number of messages on all queues system-wide is equal to the system-imposed limit.

In such cases, you may either have to remove messages from the queue, or reset the maximum limits, so that inter-process communication remains unaffected.

Data in message queue:

For each owner, this measure indicates the total number of bytes in outstanding messages across all queues owned by that owner. 

KB

Compare the value of this measure across owners to identify the owner whose queues hold the most data. If the maximum value is abnormally high, it could mean that one or more queues owned by that owner contain large messages or too many messages. You may then want to identify which queues are the largest and why. For this, you can use the detailed diagnosis of this measure. The detailed diagnosis, if enabled, reveals details of each queue owned by the owner. The details include the name of the creator of each message queue, the number of bytes of data that each queue contains, the number of messages in every queue, the sender process and receiver process for the last message to the queue, and more. From this, you can easily pick the queues with the maximum number of messages and those that are of the maximum size. If any queue contains very few messages but is of a large size, it could mean that those messages are large. On the other hand, if any queue contains many messages and is also of a large size, it could mean that messages on that queue are not being received and processed as quickly as they should be. This could signal a potential bottleneck in inter-process communication, which would require further investigation.
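The reasoning above can be expressed as a simple rule over the per-queue details. The sketch below is illustrative only: the function name and both thresholds are hypothetical and would need to be tuned for a real environment.

```python
def diagnose_queue(messages, used_bytes, many_messages=100, large_bytes=8192):
    """Classify a queue from its message count and byte count.

    many_messages and large_bytes are hypothetical thresholds.
    """
    if used_bytes < large_bytes:
        return "ok"
    if messages >= many_messages:
        # Many messages and a large size: the receiver may be falling behind.
        return "possible backlog"
    # Few messages but a large size: the individual messages are large.
    return "large messages"


few_but_big = diagnose_queue(messages=3, used_bytes=12000)
many_and_big = diagnose_queue(messages=500, used_bytes=16000)
small = diagnose_queue(messages=2, used_bytes=128)
```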

Maximum size allowed:

For each owner, this indicates the total number of bytes allowed in all message queues owned by that owner.

KB

 

Is message queue full?

For each owner, this indicates whether or not any queue owned by that owner has been used up to capacity, i.e., whether or not the number of bytes in the outstanding messages on that queue is equal to the maximum number of bytes allowed.

 

If any message queue owned by an owner is full, the value of this measure will be Yes. If no message queue is full, then the value of this measure will be No.

The numeric values that correspond to the above-mentioned measure values are described in the table below:

Measure Value    Numeric Value
Yes              0
No               1

Note:

By default, this measure reports one of the Measure Values listed in the table above. The graph of this measure, however, represents these values using their numeric equivalents only.
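As a rough illustration of how the Maximum size allowed and Is message queue full? measures relate, the sketch below derives both from per-queue byte counts. It assumes each queue is given as (owner, current bytes, maximum bytes), which correspond to the current byte count and the msg_qbytes limit that System V reports per queue; the function name is hypothetical.

```python
def owner_capacity(queues):
    """queues: iterable of (owner, current_bytes, max_bytes) tuples."""
    result = {}
    for owner, current_bytes, max_bytes in queues:
        entry = result.setdefault(owner, {"max_allowed_kb": 0.0, "is_full": "No"})
        # Maximum size allowed: sum of the byte limits of the owner's queues.
        entry["max_allowed_kb"] += max_bytes / 1024
        # One full queue is enough to report Yes for the owner.
        if max_bytes > 0 and current_bytes >= max_bytes:
            entry["is_full"] = "Yes"
    return result


capacity = owner_capacity([
    ("oracle", 16384, 16384),  # used up to capacity
    ("oracle", 0, 16384),
    ("www", 512, 16384),
])
```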

Number of non-zero message queues:

Indicates the total number of queues on the server that are of a size greater than 0.

Number

This measure is available only for the ‘Total’ descriptor.

To know which queues are of a non-zero size, use the detailed diagnosis of this measure.

Total data in message queue:

Indicates the total number of bytes in outstanding messages in all message queues on the server.

 

KB

This measure is available only for the ‘Total’ descriptor.

To know which queue contains the maximum number of bytes in outstanding messages, use the detailed diagnosis of this measure.
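
The two Total-only measures can be sketched from a map of queue identifier to bytes in outstanding messages. The helper below is hypothetical and only mirrors the definitions given above; it also picks out the queue holding the most data, which is what the detailed diagnosis is meant to help with.

```python
def total_measures(bytes_per_queue):
    """bytes_per_queue: {queue id: bytes in outstanding messages}."""
    non_zero = {qid: size for qid, size in bytes_per_queue.items() if size > 0}
    return {
        "non_zero_queues": len(non_zero),
        "total_data_kb": sum(bytes_per_queue.values()) / 1024,
        # None when every queue is empty.
        "largest_queue": max(non_zero, key=non_zero.get, default=None),
    }


totals = total_measures({0: 2048, 32769: 0, 65538: 512})
```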