Deferred Queue Test

If a message could not be delivered to some of its recipients because of a transient failure (one that may clear up later), the message is placed in the deferred queue. The queue manager scans the deferred queue periodically, and during each scan a fraction of the deferred queue is brought back to the active queue for retry. Each message in the deferred queue is assigned a cool-off time; once it expires, the message becomes eligible for another delivery attempt.

One of the common causes of a large deferred queue is the failure to validate recipients at the SMTP input stage. Spammers routinely launch dictionary attacks from unrepliable sender addresses, so the bounces generated for the invalid recipient addresses cannot be delivered and clog the deferred queue. Recipient validation is therefore strongly recommended.

Another common cause of congestion is unwarranted flushing of the entire deferred queue. The deferred queue holds messages that are likely to fail delivery and that are also likely to be slow to fail (that is, to time out). As a result, the most common reaction to a large deferred queue, flushing it, is usually counter-productive and tends to make the congestion worse. The deferred queue should not be flushed unless most of its content has recently become deliverable (e.g., the relayhost is back up after an outage)!

If the deferred queue grows without bound, its messages will be retried frequently, which can flood the active queue and cause bursts of congestion. To avoid this, administrators should continuously monitor the deferred queue and determine when the number of messages in it started to rise sharply. Administrators should also identify the domain to which most of the messages failed to be delivered, so that the legitimacy of that domain can be examined. To help administrators with these tasks, eG Enterprise provides the Deferred Queue test.
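The two monitoring tasks described above can be sketched in Python. This is a minimal, hypothetical sketch, not how eG Enterprise implements the test. `first_sharp_increase` assumes you already record periodic (timestamp, deferred message count) samples and treats a sample that is at least `factor` times the mean of the preceding `window` samples as the start of a surge; both values are arbitrary assumptions. `deferred_domain_counts` assumes the JSON Lines output of `postqueue -j` (Postfix 3.1 and later) and counts, for each recipient domain, how many deferred messages are addressed to it.

```python
import json
from collections import Counter
from typing import Optional, Sequence, Tuple


def first_sharp_increase(
    samples: Sequence[Tuple[int, int]], factor: float = 3.0, window: int = 6
) -> Optional[int]:
    """Return the timestamp of the first sample whose deferred count is at
    least `factor` times the mean of the `window` samples before it.

    `samples` is a time-ordered list of (timestamp, deferred_count) pairs.
    `factor` and `window` are illustrative assumptions, not eG defaults.
    """
    for i in range(window, len(samples)):
        baseline = sum(count for _, count in samples[i - window:i]) / window
        timestamp, count = samples[i]
        # max(..., 1.0) avoids a zero threshold when the baseline queue is empty.
        if count >= factor * max(baseline, 1.0):
            return timestamp
    return None


def deferred_domain_counts(postqueue_json: str) -> Counter:
    """Count deferred messages per recipient domain from `postqueue -j` output.

    Each non-empty line is assumed to be one JSON object with a "queue_name"
    field and a "recipients" list whose entries carry an "address" field.
    """
    counts: Counter = Counter()
    for line in postqueue_json.splitlines():
        if not line.strip():
            continue
        message = json.loads(line)
        if message.get("queue_name") != "deferred":
            continue
        # A message with several recipients in one domain counts once for it.
        domains = {
            rcpt["address"].rpartition("@")[2].lower()
            for rcpt in message.get("recipients", [])
        }
        counts.update(domains)
    return counts
```

Calling `deferred_domain_counts(...).most_common(5)` on the command's output would list the five domains with the most deferred messages, which is the kind of per-domain view the test's detailed diagnosis provides.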

This test periodically monitors the deferred queue of the target Postfix mail server and reports the total size of the deferred queue, as well as a breakdown of the message count by age, i.e., the number of messages that have been in the deferred queue for each of several time ranges.
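The age ranges reported by this test use upper bounds that double from 5 to 1280 minutes. The following minimal Python sketch shows one way to sort message ages into those ranges. It assumes that message arrival times are available as Unix timestamps (for example, the `arrival_time` field of `postqueue -j` output) and that an age falling exactly on a boundary belongs to the higher range; the boundary handling used by the test itself is not documented here.

```python
import bisect
import time
from typing import Dict, Iterable, Optional

# Upper bounds (in minutes) of the age ranges reported by this test.
BOUNDS = [5, 10, 20, 40, 80, 160, 320, 640, 1280]
LABELS = (
    ["Less than 5 mins"]
    + [f"Between {lo}-{hi} mins" for lo, hi in zip(BOUNDS, BOUNDS[1:])]
    + ["More than 1280 mins"]
)


def age_breakdown(
    arrival_times: Iterable[float], now: Optional[float] = None
) -> Dict[str, int]:
    """Count messages per age range, given their arrival times (Unix seconds)."""
    if now is None:
        now = time.time()
    counts = dict.fromkeys(LABELS, 0)
    for arrival in arrival_times:
        age_minutes = (now - arrival) / 60
        # bisect_right places an age equal to a bound in the higher range (assumption).
        counts[LABELS[bisect.bisect_right(BOUNDS, age_minutes)]] += 1
    return counts
```

The sum of all counts equals the total number of messages examined, which corresponds to the Queue size measure.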

Target of the test : A Postfix mail server

Agent deploying the test : A remote agent

Outputs of the test : One set of results for the target Postfix mail server being monitored

Configurable parameters for the test
Parameters Description

Test period

How often should the test be executed

Host

The host for which the test is to be configured.

Port

The port at which the specified host listens. By default, this is NULL.

UseSUDO

By default, this flag is set to False, indicating that the test does not use the sudo command to collect the queue-related statistics. If this flag is set to True, the test uses the sudo command to collect these statistics.

Timeout

In the TIMEOUT text box, specify the duration (in seconds) beyond which this test should time out. The default is 30 seconds.

High Security

In highly secure environments, eG Enterprise may not be able to perform agentless monitoring of a target Postfix server using SSH. To enable monitoring of the target host in such environments, administrators can use the High Security flag. By default, this flag is set to Yes, indicating that eG Enterprise will connect to the target host in a more secure way and collect performance metrics. Administrators can override this setting if required.

Detailed Diagnosis

To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability
  • Neither the normal nor the abnormal frequency configured for the detailed diagnosis measures should be 0.
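The UseSUDO and Timeout parameters can be illustrated with a short Python sketch. The command that the eG agent actually runs on the Postfix host is not documented here; the sketch assumes a queue-inspection command such as Postfix's `qshape deferred`, and the helper names are hypothetical. Using `sudo -n` makes sudo fail immediately instead of prompting for a password, which suits unattended collection; this is a design choice of the sketch, not an eG setting.

```python
import subprocess
from typing import List, Sequence

# Hypothetical default: the real command used by the test is not documented.
DEFAULT_QUEUE_COMMAND = ("qshape", "deferred")


def build_command(
    use_sudo: bool, base: Sequence[str] = DEFAULT_QUEUE_COMMAND
) -> List[str]:
    """Prefix the queue command with `sudo -n` when UseSUDO is True."""
    command = list(base)
    return ["sudo", "-n", *command] if use_sudo else command


def collect_queue_stats(use_sudo: bool = False, timeout: float = 30.0) -> str:
    """Run the queue command and return its standard output.

    Raises subprocess.TimeoutExpired if the command runs longer than
    `timeout` seconds (the Timeout parameter), and CalledProcessError
    if it exits with a non-zero status.
    """
    result = subprocess.run(
        build_command(use_sudo),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout
```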
Measures made by the test
Measurement Description Measurement Unit Interpretation

Queue size

Indicates the total number of messages in the deferred queue.

Number

An unusually high number of messages in the queue indicates a problem with the queue or its end points.

The Detailed diagnosis of this measure lists the message count for each domain.

Less than 5 mins

Indicates the number of messages that were in the queue for less than 5 minutes.

Number

 

Between 5-10 mins

Indicates the number of messages that were in the queue for between 5 and 10 minutes.

Number

 

Between 10-20 mins

Indicates the number of messages that were in the queue for between 10 and 20 minutes.

Number

 

Between 20-40 mins

Indicates the number of messages that were in the queue for between 20 and 40 minutes.

Number

 

Between 40-80 mins

Indicates the number of messages that were in the queue for between 40 and 80 minutes.

Number

 

Between 80-160 mins

Indicates the number of messages that were in the queue for between 80 and 160 minutes.

Number

 

Between 160-320 mins

Indicates the number of messages that were in the queue for between 160 and 320 minutes.

Number

 

Between 320-640 mins

Indicates the number of messages that were in the queue for between 320 and 640 minutes.

Number

 

Between 640-1280 mins

Indicates the number of messages that were in the queue for between 640 and 1280 minutes.

Number

 

More than 1280 mins

Indicates the number of messages that were in the queue for more than 1280 minutes.

Number

A low value is desired for this measure.

When a host with a lot of deferred mail is down for some time, the entire deferred queue may reach its retry time simultaneously. This can lead to a very full active queue once the host comes back up. The phenomenon can repeat approximately every maximal_backoff_time seconds if the messages are deferred again after a brief burst of congestion. Because these messages are retried again and again, administrators should keep a close watch on the value of this measure. If this measure is consistently high, the messages will be retried repeatedly, leading to congestion.
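The repeating burst described above follows from how retry intervals grow. The minimal Python sketch below assumes a simplified model in which the interval between delivery attempts doubles after each failed attempt, starting at minimal_backoff_time and capped at maximal_backoff_time; 300 and 4000 seconds are used as illustrative values, and queue-scan timing and other queue manager details are ignored. In this model, once the cap is reached, every still-failing message is retried once per maximal_backoff_time, which is why a backlog that builds up during one outage can keep returning to the active queue in bursts at roughly that interval.

```python
from typing import List


def retry_offsets(
    minimal_backoff: int = 300, maximal_backoff: int = 4000, attempts: int = 8
) -> List[int]:
    """Seconds after the first deferral at which each retry happens.

    Simplified model: the interval doubles after every failed attempt,
    starting at `minimal_backoff` and never exceeding `maximal_backoff`.
    """
    offsets: List[int] = []
    elapsed = 0
    interval = minimal_backoff
    for _ in range(attempts):
        elapsed += interval
        offsets.append(elapsed)
        interval = min(interval * 2, maximal_backoff)
    return offsets
```

For example, `retry_offsets(300, 4000, 6)` returns `[300, 900, 2100, 4500, 8500, 12500]`: from the fifth retry onward, the gap between retries stays at 4000 seconds.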