Exchange DAG Health Details Test

By tracking the status of mailbox database copies, administrators can receive early warning of data inconsistencies that may creep into the DAG. By also understanding how the mailbox database copies have been configured, administrators can gauge how the current configuration will affect failover and the speed of data replication in the DAG, and decide whether configuration changes are warranted to ensure high availability and zero data loss.

The Exchange DAG Health Details test provides the mailbox database copy-level insights that enable administrators to make such decisions. The test tracks the status of each mailbox database copy and alerts administrators to abnormalities in status. It also reports the activation preference configured for every mailbox database copy and reveals how the lag time configuration of each copy affects its copy and replay queue lengths. Using this information, administrators can determine whether changing the lag time configuration and activation preference will help enhance DAG performance.

Target of the test : A Microsoft Exchange 2013/2016 server

Agent deploying the test : An internal agent

Outputs of the test : One set of results for every mailbox database copy in the DAG being monitored

Configurable parameters for the test
  1. Test period - How often the test should be executed.
  2. Host - The host for which the test is to be configured.
  3. Port - The port at which the host listens.

Measurements made by the test
Each measurement below is listed with its description, measurement unit, and interpretation.

Activation preference:

Indicates the activation preference number configured for this mailbox database copy.

Number

When creating a mailbox database copy, you can specify its activation preference number, which is used as part of Active Manager's best copy selection process. It is also used to redistribute active mailbox databases throughout the DAG when using the RedistributeActiveDatabases.ps1 script. The activation preference is a number equal to or greater than one, where one is at the top of the preference order. It cannot be larger than the number of copies of the mailbox database.
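The numbering rule above can be illustrated with a short Python sketch. It is not part of the test or of Exchange; the function names and the sample server names are hypothetical, and the ordering shown reflects only the configured preference, not the full best copy selection process.

```python
def is_valid_activation_preference(preference: int, copy_count: int) -> bool:
    """Return True if the preference lies in 1..copy_count (1 = most preferred)."""
    return 1 <= preference <= copy_count


def order_by_preference(copies: dict[str, int]) -> list[str]:
    """Return copy names sorted so the lowest preference number comes first."""
    return sorted(copies, key=lambda name: copies[name])


# Hypothetical copies of one mailbox database, keyed by hosting server.
copies = {"MBX2": 2, "MBX1": 1, "MBX3": 3}
print(order_by_preference(copies))  # prints ['MBX1', 'MBX2', 'MBX3']
```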

Status:

Indicates the current status of this mailbox database copy.

The values that this measure can report, their descriptions, and their corresponding numeric values are detailed in the table below:

Measure Value | Description | Numeric Value
Healthy | The mailbox database copy is successfully copying and replaying log files, or it has successfully copied and replayed all available log files. | 0
Mounted | The active copy is online and accepting client connections. Only the active copy of the mailbox database can have a copy status of Mounted. | 1
FailedandSuspended | The Failed and Suspended states have been set simultaneously by the system because a failure was detected, and because resolution of the failure explicitly requires administrator intervention. | 2
Failed | The mailbox database copy is in a Failed state because it isn't suspended, and it isn't able to copy or replay log files. While the copy is in a Failed state and not suspended, the system periodically checks whether the problem that caused the copy status to change to Failed has been resolved. Once the system detects that the problem is resolved, and barring any other issues, the copy status automatically changes to Healthy. | 3
ServiceDown | The Microsoft Exchange Replication service isn't available or running on the server that hosts the mailbox database copy. | 4

Note:

By default, the test reports the Measure Values listed in the table above to indicate the current state of a mailbox database copy. In the graph of this measure, however, states are represented using only their numeric equivalents.
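The mapping between state names and their numeric equivalents can be expressed as a simple lookup. The sketch below is illustrative only: the numeric values come from the table above, while the function names and the choice to treat only Healthy and Mounted as normal states are assumptions made for this example.

```python
# Numeric equivalents as listed in the Status table above.
STATUS_VALUES = {
    "Healthy": 0,
    "Mounted": 1,
    "FailedandSuspended": 2,
    "Failed": 3,
    "ServiceDown": 4,
}

# Assumption for this sketch: only these two states count as normal.
NORMAL_STATES = {"Healthy", "Mounted"}


def status_to_numeric(status: str) -> int:
    """Translate a reported state name into the value plotted in the graph."""
    return STATUS_VALUES[status]


def is_abnormal(status: str) -> bool:
    """Return True for any state outside the assumed set of normal states."""
    return status not in NORMAL_STATES


print(status_to_numeric("Failed"), is_abnormal("Failed"))  # prints 3 True
```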

Copy queue:

Indicates the length of the copy queue of this mailbox database copy.

Number

The copy queue length signifies the number of logs still to be copied to the passive mailbox database copy.

Ideally, a passive mailbox database copy should not have a copy queue length of more than 10 logs. A consistent rise in the value of this measure could therefore indicate slowness in copying logs to the passive copy.
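As a rough illustration of the guideline above, the following Python sketch checks a copy queue length against the 10-log limit and tests whether a series of readings is rising. The function names are hypothetical, and defining "consistent rise" as every reading exceeding the previous one is an assumption of this sketch, not the test's alerting logic.

```python
COPY_QUEUE_LIMIT = 10  # logs; the guideline stated in the text above


def exceeds_limit(copy_queue_length: int, limit: int = COPY_QUEUE_LIMIT) -> bool:
    """Return True if the queue holds more logs than the guideline allows."""
    return copy_queue_length > limit


def is_consistently_rising(samples: list[int]) -> bool:
    """Return True if each reading is larger than the one before it."""
    return len(samples) >= 2 and all(b > a for a, b in zip(samples, samples[1:]))


readings = [2, 5, 9, 14]  # hypothetical successive copy queue lengths
print(exceeds_limit(readings[-1]), is_consistently_rising(readings))  # prints True True
```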

Replay queue:

Indicates the length of the replay queue of this database copy.

Number

A steady increase in the value of this measure could indicate a replication bottleneck, as it implies that log files are not being replayed into the database copy rapidly enough. This could be due to a high Replay lag time setting.

Replay lag time is a property of a mailbox database copy that specifies the amount of time, in minutes, to delay log replay for the database copy. The replay lag timer starts when a log file has been replicated to the passive copy and has successfully passed inspection. By delaying the replay of logs to the database copy, you have the capability to recover the database to a specific point in time in the past. A mailbox database copy configured with a replay lag time greater than 0 is referred to as a lagged mailbox database copy, or simply, a lagged copy.
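The definition above can be made concrete with a small Python sketch. It assumes a simplified model in which a log becomes eligible for replay exactly when the replay lag has elapsed since it passed inspection; the function names are hypothetical and this is not how Exchange is implemented internally.

```python
from datetime import datetime, timedelta


def is_lagged_copy(replay_lag_minutes: int) -> bool:
    """A copy with a replay lag time greater than 0 is a lagged copy."""
    return replay_lag_minutes > 0


def earliest_replay_time(inspected_at: datetime, replay_lag_minutes: int) -> datetime:
    """Time at which a log that passed inspection becomes eligible for replay."""
    return inspected_at + timedelta(minutes=replay_lag_minutes)


# Hypothetical log that passed inspection at noon, on a copy with a 90-minute lag.
inspected = datetime(2024, 1, 1, 12, 0)
print(is_lagged_copy(90), earliest_replay_time(inspected, 90))
```

Under this model, logs that arrive during the lag window wait in the replay queue, which is why a larger lag setting tends to go with a longer queue.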

To reduce the length of the replay queue, you may want to consider reducing the Replay lag time specification of the database copy.

A steady increase in the value of this measure could also indicate that log files that have been replayed into the database copy are not being truncated quickly enough. This could be because of a high Truncation lag time setting for the database copy.

Truncation lag time is a property of a mailbox database copy that specifies the amount of time, in minutes, to delay log deletion for the database copy after the log file has been replayed into the database copy. The truncation lag timer starts when a log file has been replicated to the passive copy, successfully passed inspection, and has been successfully replayed into the copy of the database. By delaying the truncation of log files from the database copy, you have the capability to recover from failures that affect the log files for the active copy of the database.

You may want to reduce this setting to minimize the copy queue length.

Is replay lagged?:

Indicates whether or not replay is lagged for this database copy.

The values that this measure can take and their corresponding numeric values are listed in the table below:

Measure Value | Numeric Value
Yes | 1
No | 0

Note:

By default, the test reports the Measure Values listed in the table above to indicate whether or not replay is lagged for the database copy. In the graph of this measure, however, these values are represented using only their numeric equivalents.

Is truncation lagged?:

Indicates whether or not truncation is lagged for this database copy.

The values that this measure can take and their corresponding numeric values are listed in the table below:

Measure Value | Numeric Value
Yes | 1
No | 0

Note:

By default, the test reports the Measure Values listed in the table above to indicate whether or not truncation is lagged for the database copy. In the graph of this measure, however, these values are represented using only their numeric equivalents.

Content index:

Indicates the content index status of this mailbox database copy.

The values that this measure can take and their corresponding numeric values are listed in the table below:

Measure Value | Numeric Value
Healthy | 0
Mounted | 1
FailedAndSuspended | 2
Crawling | 3
Failed | 4

Note:

By default, the test reports the Measure Values listed in the table above to indicate the content index status of the database copy. In the graph of this measure, however, these values are represented using only their numeric equivalents.