Active Directory DFS Replication Backlog Test

DFS Replication is an efficient, multiple-master replication engine that you can use to keep folders synchronized between servers across limited-bandwidth network connections.

To use DFS Replication, you must create replication groups and add replicated folders to the groups. Replication groups, replicated folders, and members are illustrated in the following figure.

Figure 1 : How does replication work?

A replication group is a set of servers, known as members, that participate in the replication of one or more replicated folders. A replicated folder is a folder that stays synchronized on each member.

Replicated folders should be in sync at all times to guard against data loss in the event of a disaster. It is therefore imperative that administrators keep an eye on the replication process and make sure that there is no replication backlog (i.e., pending file updates between the replication folders) at any given point in time. The Active Directory DFS Replication Backlog test helps administrators in this regard.

This test automatically discovers the replication groups configured on a target AD server and the replication folders within each group. For every replication folder, the test then reports the number of pending file updates. This way, the test proactively alerts administrators to a sudden/steady rise in the count of backlogged updates, and thus points them to replication issues that need to be addressed immediately.
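As an illustration of what a "sudden" versus a "steady" rise in backlogged updates could look like, the following minimal Python sketch classifies a series of backlog counts collected over successive test runs. The function name, the `spike_threshold` and `steady_window` values, and the labels it returns are hypothetical and are not part of eG Enterprise; actual alerting is governed by the thresholds configured for the test.

```python
from typing import Sequence


def classify_backlog_trend(
    counts: Sequence[int],
    spike_threshold: int = 100,  # hypothetical: jump that counts as "sudden"
    steady_window: int = 3,      # hypothetical: consecutive increases that count as "steady"
) -> str:
    """Classify the recent trend of one replication folder's backlog count.

    counts holds the backlog reported by successive test runs, oldest first.
    """
    if len(counts) < 2:
        return "insufficient data"

    # Sudden rise: the latest run added at least spike_threshold pending updates.
    if counts[-1] - counts[-2] >= spike_threshold:
        return "sudden rise"

    # Steady rise: every one of the last steady_window intervals shows an increase.
    recent = counts[-(steady_window + 1):]
    if len(recent) == steady_window + 1 and all(
        later > earlier for earlier, later in zip(recent, recent[1:])
    ):
        return "steady rise"

    return "stable"


if __name__ == "__main__":
    print(classify_backlog_trend([0, 2, 150]))
    print(classify_backlog_trend([5, 8, 12, 20]))
    print(classify_backlog_trend([4, 4, 3, 4]))
```

Either kind of rise is a signal to investigate the replication path for that folder before the backlog grows further.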

The test also supports a Summary descriptor. Check the metrics reported for the Summary descriptor to know the total number of replication groups, folders and servers participating in the replication, and the folders with backlogs. Detailed diagnostics reveal the names of the groups and folders.
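To make the relationship between the per-folder results and the Summary descriptor concrete, here is a small Python sketch that derives the four summary counts from a list of per-folder backlog records. The `BacklogRecord` type, its field names, and the sample server and folder names are assumptions made for illustration; the sketch does not reflect how the eG agent gathers or stores this data internally.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class BacklogRecord:
    """One hypothetical backlog reading for a folder between two members."""
    group: str
    folder: str
    sending_member: str
    receiving_member: str
    backlog: int


def summarize(records: Iterable[BacklogRecord]) -> Dict[str, int]:
    """Derive Summary-style counts from per-folder backlog records."""
    records = list(records)
    groups = {r.group for r in records}
    # A folder is identified by its group plus its own name.
    folders = {(r.group, r.folder) for r in records}
    members = {m for r in records for m in (r.sending_member, r.receiving_member)}
    backlogged = {(r.group, r.folder) for r in records if r.backlog > 0}
    return {
        "Replication groups": len(groups),
        "Replication folders": len(folders),
        "Sending and receiving member servers": len(members),
        "Replication folders with backlog": len(backlogged),
    }


if __name__ == "__main__":
    sample = [
        BacklogRecord("RG1", "Docs", "DC1", "DC2", 0),
        BacklogRecord("RG1", "Docs", "DC2", "DC1", 14),
        BacklogRecord("RG1", "Tools", "DC1", "DC2", 0),
        BacklogRecord("RG2", "Profiles", "DC1", "DC3", 3),
    ]
    for name, value in summarize(sample).items():
        print(f"{name}: {value}")
```

In this sample, a folder counts as backlogged if any sending/receiving pair reports pending updates for it.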

 

Target of the test : An Active Directory server or domain controller

Agent deploying the test : An internal agent

Outputs of the test : One set of results for every replication folder in every replication group of the target Active Directory server

First level descriptor: Replication group

Second level descriptor: Replication folder

Metrics are also reported for a Summary descriptor

Configurable parameters for the test
Parameters Description

Test period

This indicates how often the test should be executed.

Host

The IP address of the machine where the Active Directory is installed.

Port

The port number through which the Active Directory communicates. The default port number is 389.

DD Frequency

Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency if required. To disable the detailed diagnosis capability for this test, specify none against DD Frequency.

Detailed Diagnosis

To make diagnosis more efficient and accurate, the eG Enterprise suite embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability.
  • Neither the normal nor the abnormal frequency configured for the detailed diagnosis measures should be 0.
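The following Python sketch shows one way to read a DD Frequency value and to check the two conditions listed above. It assumes that the value has the form normal:abnormal (the first number applying to normal runs and the second to problem runs), which is inferred from the description of the 1:1 default; the function names are hypothetical and do not correspond to any eG Enterprise API.

```python
from typing import Optional, Tuple


def parse_dd_frequency(value: str) -> Optional[Tuple[int, int]]:
    """Parse a DD Frequency setting.

    Returns (normal, abnormal) for a value such as "1:1",
    or None when detailed diagnosis is disabled with "none".
    The normal:abnormal ordering is an assumption.
    """
    text = value.strip().lower()
    if text == "none":
        return None
    normal, abnormal = text.split(":")
    return int(normal), int(abnormal)


def dd_toggle_available(license_allows_dd: bool, normal: int, abnormal: int) -> bool:
    """Check the two documented conditions for the On/Off option:
    the license allows detailed diagnosis, and neither frequency is 0."""
    return license_allows_dd and normal != 0 and abnormal != 0


if __name__ == "__main__":
    print(parse_dd_frequency("1:1"))
    print(parse_dd_frequency("none"))
    print(dd_toggle_available(True, 1, 1))
    print(dd_toggle_available(True, 0, 1))
```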
    Measurements made by the test
    Measurement Description Measurement Unit Interpretation

    Total backlogs

    Indicates the total number of pending file updates for this replication folder.

    Number

    This measure is not reported for the Summary descriptor.

    A consistent rise in the value of this measure is a cause for concern, as it indicates that changes are not being replicated as fast as they are being made. If the situation persists, the replicated folders will stay out of sync, making complete data recovery impossible when disaster strikes. To avoid this, as soon as this measure starts exhibiting disturbing trends, administrators should quickly figure out why replication is slow and fix the problem. Some of the common causes for a replication slowdown are:

    • Missing Windows network connectivity-related hot fixes
    • Outdated DFSR service binary
    • Out-of-date network card and storage drivers
    • DFSR staging directory could be too small for the amount of data being modified
    • Bandwidth throttling or schedule windows could be too aggressive
    • Large amounts of sharing violations
    • RDC could have been disabled over a WAN link
    • Incompatible anti-virus software or other file system filter drivers
    • File Server Resource Manager (FSRM) could have been configured with quotas/screens that block replication
    • Un-staged or improperly pre-staged data leading to slow initial replication

    You can use the detailed diagnosis of this measure to know which server the updates were sent from and which server received them. In the event of slowness in replication, the detailed diagnostics will reveal which two servers participated in the slow replication.

    Replication groups

    Indicates the number of replication groups configured on this AD server.

    Number

    This measure is reported only for the Summary descriptor.

    Use the detailed diagnosis of this measure to know the names of the replication groups.

    Replication folders

    Indicates the number of replication folders configured on this AD server.

    Number

    This measure is reported only for the Summary descriptor.

    Use the detailed diagnosis of this measure to know the names of the replication folders.

    Sending and receiving member servers

    Indicates the total number of servers participating in the replication.

    Number

    This measure is reported only for the Summary descriptor.

    Use the detailed diagnosis of this measure to know which servers are participating in the replication.

    Replication folders with backlog

    Indicates the number of replication folders with backlogged updates.

    Number

    This measure is reported only for the Summary descriptor.

    Ideally, the value of this measure should be 0. If the measure reports a non-zero value, use the detailed diagnosis of this measure to identify the replication folders with backlogged updates.

    The detailed diagnosis of the Replication groups measure lists the replication groups configured on the monitored AD server.

    Figure 2 : Detailed diagnosis of the Replication groups measure

    The detailed diagnosis of the Replication folders measure lists all the replication folders on the member servers.

    Figure 3 : The detailed diagnosis of the Replication folders measure

    The detailed diagnosis of the Sending and receiving member servers measure lists the servers that are participating in the replication.

    Figure 4 : Detailed diagnosis of the Sending and receiving member servers measure

    To know which replication folders on which member servers had backlogged updates, use the detailed diagnosis of the Replication folders with backlog measure. The sending and receiving member servers, the replication folder on those servers, and the count of backlogged updates on that folder are displayed as part of detailed diagnostics.

    Figure 5 : Detailed diagnosis of the Replication folders with backlogs measure

    With the help of the detailed diagnosis of the Total backlogs measure, you can quickly identify the member servers on which the backlogs were detected, and which of these servers are the sending and receiving servers of the replication. This eases troubleshooting, as it reveals the two servers between which replication was slow.

    Figure 6 : Detailed diagnosis of the Total backlogs measure
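Once the detailed diagnosis has identified a sending and a receiving member, an administrator may want to cross-check the backlog between those two servers by hand. The Python sketch below only builds the argument list for the Windows `dfsrdiag backlog` command. It is offered as a manual verification aid and is not a description of how this test collects its measures; the function name and the sample group, folder, and server names are hypothetical.

```python
from typing import List


def dfsrdiag_backlog_command(
    group: str, folder: str, sending_member: str, receiving_member: str
) -> List[str]:
    """Build the argument list for a manual 'dfsrdiag backlog' check
    between one sending member and one receiving member."""
    return [
        "dfsrdiag",
        "backlog",
        f"/rgname:{group}",            # replication group name
        f"/rfname:{folder}",           # replication folder name
        f"/smem:{sending_member}",     # sending member
        f"/rmem:{receiving_member}",   # receiving member
    ]


if __name__ == "__main__":
    # Sample names are placeholders; substitute the values shown in the
    # detailed diagnosis.
    command = dfsrdiag_backlog_command("RG1", "Docs", "DC1", "DC2")
    print(" ".join(command))
```

The resulting list can be passed to `subprocess.run` on a Windows server where the DFS management tools are installed; running it is left out of the sketch so that it stays safe to execute anywhere.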