Deferred Queue Test
When a message has been delivered to all of its deliverable recipients but delivery to some recipients failed for a transient reason (that is, delivery may succeed later), the message is placed in the deferred queue. The queue manager scans the deferred queue periodically, and each scan brings only a fraction of the deferred queue back into the active queue for a retry. This is because each message in the deferred queue is assigned a cool-off time, bounded by the minimal_backoff_time and maximal_backoff_time parameters, and the message is not eligible for a retry until that time has passed.

One common cause of large deferred queues is the failure to validate recipients at the SMTP input stage. Spammers routinely launch dictionary attacks from unrepliable sender addresses, so the bounces generated for the invalid recipient addresses cannot be delivered and clog the deferred queue. Recipient validation is therefore strongly recommended. Another common cause of congestion is unwarranted flushing of the entire deferred queue. The deferred queue holds messages that are likely to fail delivery and are also likely to be slow to fail (that is, to time out). As a result, the most common reaction to a large deferred queue, flushing it, is more likely to be counter-productive and typically makes the congestion worse. The deferred queue should not be flushed unless most of its content is expected to have recently become deliverable (for example, the relayhost is back up after an outage)!

If the deferred queue keeps growing, its messages are retried often, which may flood the active queue and cause brief congestion. To avoid this, administrators should continuously monitor the deferred queue and determine when the number of messages in it started to rise sharply. Administrators should also identify the domain to which most of the failed messages were addressed, so that the legitimacy of that domain can be examined. To help administrators with these tasks, eG Enterprise provides the Deferred Queue test.
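As a rough illustration of the per-domain analysis described above, the sketch below counts deferred messages by recipient domain. It assumes the JSON listing printed by Postfix's `postqueue -j` (available in Postfix 3.1 and later), where each line is one message object with a `queue_name` field and a `recipients` list. The function name `deferred_domain_counts` is hypothetical, and this is not a description of how eG Enterprise itself collects the data.

```python
import json
from collections import Counter


def deferred_domain_counts(postqueue_json: str) -> Counter:
    """Count deferred messages per recipient domain.

    Assumes `postqueue_json` is the text printed by `postqueue -j`
    (Postfix 3.1+): one JSON object per line, each carrying a
    "queue_name" field and a "recipients" list of {"address": ...} objects.
    """
    counts: Counter = Counter()
    for line in postqueue_json.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        msg = json.loads(line)
        if msg.get("queue_name") != "deferred":
            continue  # ignore messages in the active, hold, incoming queues, etc.
        # Collect each recipient domain once per message, so a message with
        # several recipients at the same domain counts only once for it.
        domains = set()
        for rcpt in msg.get("recipients", []):
            address = rcpt.get("address", "")
            if "@" in address:
                domains.add(address.rsplit("@", 1)[1].lower())
        counts.update(domains)
    return counts
```

Calling `deferred_domain_counts(output).most_common(5)` would then list the five domains with the most deferred messages, which is the starting point for checking whether a given domain is legitimate.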
This test periodically monitors the deferred queue of the target Postfix mail server and reports the total number of messages in the deferred queue, along with a breakdown of that count by age, i.e., how many messages have been in the deferred queue for each of several time ranges.
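The age split is reported in ten buckets whose boundaries double from 5 minutes up to 1280 minutes, the same doubling layout that Postfix's own `qshape` utility prints by default. A minimal sketch of that bucketing follows. It assumes arrival times are Unix epoch seconds (as in the `arrival_time` field of `postqueue -j` output) and assumes half-open intervals at the bucket boundaries; the test's own boundary handling is not documented here, and the names `BUCKET_EDGES`, `BUCKET_LABELS`, and `age_histogram` are hypothetical.

```python
import bisect
import time
from typing import Iterable, List, Optional

# Upper bucket edges in minutes: 5, 10, 20, ..., 1280 (each edge doubles).
BUCKET_EDGES = [5 * 2 ** i for i in range(9)]

# Labels matching the per-age measures reported by the test.
BUCKET_LABELS = (
    [f"Less than {BUCKET_EDGES[0]} mins"]
    + [f"Between {lo}-{hi} mins" for lo, hi in zip(BUCKET_EDGES, BUCKET_EDGES[1:])]
    + [f"More than {BUCKET_EDGES[-1]} mins"]
)


def age_histogram(arrival_times: Iterable[float], now: Optional[float] = None) -> List[int]:
    """Count messages per age bucket.

    Assumption: intervals are half-open, so a message exactly 5 minutes old
    lands in "Between 5-10 mins".
    """
    if now is None:
        now = time.time()
    counts = [0] * (len(BUCKET_EDGES) + 1)
    for arrived in arrival_times:
        age_minutes = (now - arrived) / 60.0
        # bisect_right returns the index of the first edge strictly greater
        # than the age, which is exactly the bucket index.
        counts[bisect.bisect_right(BUCKET_EDGES, age_minutes)] += 1
    return counts
```

For example, `dict(zip(BUCKET_LABELS, age_histogram(times)))` pairs each count with the name of the corresponding measure.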
Target of the test: A Postfix mail server
Agent deploying the test: A remote agent
Outputs of the test: One set of results for the target Postfix mail server being monitored
Parameters | Description |
---|---|
Test period | How often the test should be executed. |
Host | The host for which the test is to be configured. |
Port | The port at which the specified host listens. By default, this is NULL. |
UseSUDO | By default, this flag is set to False, indicating that the test collects the queue-related statistics without using the sudo command. If this flag is set to True, the test uses the sudo command to collect the queue-related statistics. |
Timeout | In the TIMEOUT text box, specify the duration (in seconds) beyond which this test should time out. The default is 30 seconds. |
High Security | In highly secure environments, eG Enterprise may not be able to perform agentless monitoring of a target Postfix server using SSH. To enable monitoring of the target host in such environments, administrators can use the High Security flag. By default, this flag is set to Yes, indicating that eG Enterprise will connect to the target host in a more secure way to collect performance metrics. Administrators can override this setting if required. |
Detailed Diagnosis | To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option. The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled: |
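The UseSUDO and Timeout parameters map naturally onto how a collector might invoke the Postfix queue listing. The sketch below is hypothetical: the command eG Enterprise actually runs is not stated here. It assumes `postqueue -j` as the listing command, uses `sudo -n` so that a missing sudoers rule fails immediately instead of waiting for a password, and applies the timeout through `subprocess.run`. The function names are illustrative.

```python
import subprocess
from typing import List


def build_queue_command(use_sudo: bool) -> List[str]:
    """Build the argv for listing the Postfix queue (hypothetical command choice)."""
    cmd = ["postqueue", "-j"]
    if use_sudo:
        # "sudo -n" (non-interactive) fails at once if a password would be
        # required, instead of hanging an unattended collector.
        cmd = ["sudo", "-n"] + cmd
    return cmd


def read_queue(use_sudo: bool = False, timeout: float = 30.0) -> str:
    """Run the queue listing, mirroring the UseSUDO and Timeout parameters.

    Raises subprocess.TimeoutExpired if the command exceeds `timeout` seconds,
    and subprocess.CalledProcessError if it exits with a non-zero status.
    """
    result = subprocess.run(
        build_queue_command(use_sudo),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout
```

The default `timeout=30.0` simply mirrors the documented 30-second default of the Timeout parameter.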
Measurement | Description | Measurement Unit | Interpretation |
---|---|---|---|
Queue size | Indicates the total size of the deferred queue, i.e., the number of messages it currently holds. | Number | An unusually high number of messages in the queue may indicate a problem with the queue or its end points. The detailed diagnosis of this measure lists the message count for each domain. |
Less than 5 mins | Indicates the number of messages that have been in the queue for less than 5 minutes. | Number | |
Between 5-10 mins | Indicates the number of messages that have been in the queue for between 5 and 10 minutes. | Number | |
Between 10-20 mins | Indicates the number of messages that have been in the queue for between 10 and 20 minutes. | Number | |
Between 20-40 mins | Indicates the number of messages that have been in the queue for between 20 and 40 minutes. | Number | |
Between 40-80 mins | Indicates the number of messages that have been in the queue for between 40 and 80 minutes. | Number | |
Between 80-160 mins | Indicates the number of messages that have been in the queue for between 80 and 160 minutes. | Number | |
Between 160-320 mins | Indicates the number of messages that have been in the queue for between 160 and 320 minutes. | Number | |
Between 320-640 mins | Indicates the number of messages that have been in the queue for between 320 and 640 minutes. | Number | |
Between 640-1280 mins | Indicates the number of messages that have been in the queue for between 640 and 1280 minutes. | Number | |
More than 1280 mins | Indicates the number of messages that have been in the queue for more than 1280 minutes. | Number | A low value is desired for this measure. When a host with lots of deferred mail is down for some time, it is possible for the entire deferred queue to reach its retry time simultaneously. This can lead to a very full active queue once the host comes back up. The phenomenon can repeat approximately every maximal_backoff_time seconds if the messages are deferred again after a brief burst of congestion. Since these messages are retried constantly, administrators should keep a close watch on the value of this measure. If this measure remains high, the messages will be retried repeatedly, leading to congestion. |
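Because a persistently high count in the oldest bucket is the signal to watch, one simple way to act on it is to alert only when the value stays above a threshold for several consecutive test runs, rather than on a single spike. A minimal sketch follows; the class name `SustainedHighAlert`, the threshold, and the window size are illustrative assumptions, not eG Enterprise settings.

```python
from collections import deque
from typing import Deque


class SustainedHighAlert:
    """Raise an alert only when a measure stays above a threshold for
    `window` consecutive test runs (illustrative helper, not an eG API)."""

    def __init__(self, threshold: int, window: int) -> None:
        self.threshold = threshold
        # A deque with maxlen keeps only the most recent `window` readings.
        self.recent: Deque[int] = deque(maxlen=window)

    def observe(self, value: int) -> bool:
        """Record one reading and report whether the alert condition holds."""
        self.recent.append(value)
        return len(self.recent) == self.recent.maxlen and all(
            v > self.threshold for v in self.recent
        )
```

For instance, with `threshold=100` and `window=3`, three consecutive readings above 100 raise the alert, and any reading at or below 100 within the last three runs suppresses it.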