Exchange DAG Member Health Status Test

To ensure that all DAG members are correctly configured for continuous data replication between database copies and for instant failover in the event of anomalies, administrators must periodically check the status of all critical replication- and failover-related services on each DAG member. The Exchange DAG Member Health Status test enables this. The test proactively monitors continuous replication and the continuous replication pipeline, the availability of Active Manager, and the health and status of the underlying cluster service, quorum, and network components. In the process, it points administrators to any services that may not be running, or any settings that may be incorrect, on a DAG member and that could therefore impede replication or failover.

Target of the test : A Microsoft Exchange 2013/2016 server

Agent deploying the test : An internal agent

Outputs of the test : One set of results for every DAG member being monitored

Configurable parameters for the test
  1. Test period - How often the test should be executed
  2. Host - The host for which the test is to be configured
  3. Port - The port at which the host listens

Measurements made by the test
Measurement Description Measurement Unit Interpretation

Cluster service:

Verifies that the Cluster service is running and reachable on the specified DAG member, or if no DAG member is specified, on the local server.

The values that this measure can take and their corresponding numeric values are listed in the table below:

Measure Value    Numeric Value
Passed           1
Failed           0

Note:

By default, the test reports the Measure Values listed in the table above to indicate the cluster service status. In the graph of this measure, however, the status is represented using the numeric equivalents only.
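The relationship between the reported Measure Values and the numeric values plotted in the graph amounts to a simple lookup. The sketch below only illustrates that mapping; the names `NUMERIC_VALUES` and `to_numeric` are hypothetical and are not part of the monitoring product. The Yes/No pair is included because the Replay service measure uses those labels instead of Passed/Failed.

```python
# Hypothetical lookup: measure value reported by the test -> numeric value used in graphs.
# Passed/Failed applies to most measures; Yes/No applies to the Replay service measure.
NUMERIC_VALUES = {
    "Passed": 1,
    "Failed": 0,
    "Yes": 1,
    "No": 0,
}


def to_numeric(measure_value: str) -> int:
    """Return the numeric equivalent of a measure value, e.g. for plotting."""
    try:
        return NUMERIC_VALUES[measure_value]
    except KeyError:
        raise ValueError(f"unknown measure value: {measure_value!r}") from None


if __name__ == "__main__":
    # Illustrative series of Cluster service results over five test periods.
    history = ["Passed", "Passed", "Failed", "Passed", "Passed"]
    print([to_numeric(v) for v in history])  # [1, 1, 0, 1, 1]
```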

Replay service:

Verifies that the Microsoft Exchange Replication service is running and reachable on the specified DAG member, or if no DAG member is specified, on the local server.

The values that this measure can take and their corresponding numeric values are listed in the table below:

Measure Value    Numeric Value
Yes              1
No               0

Note:

By default, the test reports the Measure Values listed in the table above to indicate the replay service status. In the graph of this measure, however, the status is represented using the numeric equivalents only.

Active manager:

Verifies that the instance of Active Manager running on the specified DAG member, or if no DAG member is specified, the local server, is in a valid role (primary, secondary, or stand-alone).

The values that this measure can take and their corresponding numeric values are listed in the table below:

Measure Value    Numeric Value
Passed           1
Failed           0

Note:

By default, the test reports the Measure Values listed in the table above to indicate the Active Manager status. In the graph of this measure, however, the status is represented using the numeric equivalents only.
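The Active Manager check reduces to testing whether the reported role is one of the three valid roles named above. This sketch assumes the role has already been retrieved as a string (how it is obtained is out of scope here); the function name `check_active_manager_role` and the exact spelling of the role strings are assumptions.

```python
from typing import Optional

# Valid Active Manager roles, as listed in the measure description.
# The exact strings are an assumption; adjust them to whatever your data source returns.
VALID_ROLES = {"primary", "secondary", "stand-alone"}


def check_active_manager_role(role: Optional[str]) -> str:
    """Return "Passed" if the Active Manager role is valid, otherwise "Failed".

    A missing role (None) is treated as a failure, for example when
    Active Manager could not be queried at all.
    """
    if role is None:
        return "Failed"
    # Normalize case and accept "standalone" as a spelling of "stand-alone".
    normalized = role.strip().lower().replace("standalone", "stand-alone")
    return "Passed" if normalized in VALID_ROLES else "Failed"
```

Treating an unknown or missing role as Failed keeps the check conservative: anything the sketch cannot positively identify as a valid role is surfaced for investigation.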

Tasks RPC listener:

Verifies that the Tasks RPC listener is running and reachable on the specified DAG member, or if no DAG member is specified, on the local server.

The values that this measure can take and their corresponding numeric values are listed in the table below:

Measure Value    Numeric Value
Passed           1
Failed           0

Note:

By default, the test reports the Measure Values listed in the table above to indicate the Tasks RPC listener status. In the graph of this measure, however, the status is represented using the numeric equivalents only.

TCP listener:

Verifies that the TCP log copy listener is running and reachable on the specified DAG member, or if no DAG member is specified, on the local server.

The values that this measure can take and their corresponding numeric values are listed in the table below:

Measure Value    Numeric Value
Passed           1
Failed           0

Note:

By default, the test reports the Measure Values listed in the table above to indicate the TCP listener status. In the graph of this measure, however, the status is represented using the numeric equivalents only.
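A basic reachability test for a TCP listener is a connection attempt with a timeout. The sketch below is an illustration under assumptions: the function name `check_tcp_listener` is hypothetical, and the port is a parameter because the listener port depends on the DAG configuration (Exchange's default DAG replication port is 64327, but verify it for your DAG). A successful connection only shows that something is accepting connections on that port, not that the log copy listener itself is healthy.

```python
import socket


def check_tcp_listener(host: str, port: int, timeout: float = 3.0) -> str:
    """Return "Passed" if a TCP connection to host:port succeeds within timeout, else "Failed"."""
    try:
        # create_connection resolves the host and performs a TCP handshake;
        # the connection is closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return "Passed"
    except OSError:
        # Covers refused connections, timeouts (socket.timeout subclasses OSError)
        # and name-resolution failures (socket.gaierror subclasses OSError).
        return "Failed"
```

For example, `check_tcp_listener("dagmember01.example.com", 64327)` would probe a hypothetical DAG member on the default replication port.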

DAG members up:

Verifies that all DAG members are available, running, and reachable.

The values that this measure can take and their corresponding numeric values are listed in the table below:

Measure Value    Numeric Value
Passed           1
Failed           0

Note:

By default, the test reports the Measure Values listed in the table above to indicate whether all DAG members are up. In the graph of this measure, however, the status is represented using the numeric equivalents only.
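Because this measure passes only if every DAG member is up, a single unreachable member is enough to fail it. A minimal sketch of that aggregation, assuming per-member reachability results have already been collected; the function name `check_dag_members_up` and its input shape are hypothetical:

```python
from typing import List, Mapping, Tuple


def check_dag_members_up(member_status: Mapping[str, bool]) -> Tuple[str, List[str]]:
    """Return ("Passed", []) if all members are up, else ("Failed", [members that are down]).

    member_status maps each DAG member name to True if it is available,
    running, and reachable. An empty mapping is treated as a failure,
    because no member could be confirmed as up.
    """
    down = sorted(name for name, up in member_status.items() if not up)
    if not member_status or down:
        return "Failed", down
    return "Passed", []
```

Returning the list of down members alongside the result makes it easier to tell which server to investigate first when the measure reports Failed.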

Cluster network:

Verifies that all cluster-managed networks on the specified DAG member, or if no DAG member is specified, the local server, are available.

The values that this measure can take and their corresponding numeric values are listed in the table below:

Measure Value    Numeric Value
Passed           1
Failed           0

Note:

By default, the test reports the Measure Values listed in the table above to indicate the cluster network status. In the graph of this measure, however, the status is represented using the numeric equivalents only.

Quorum group:

Verifies that the default cluster group (quorum group) is in a healthy and online state.

The values that this measure can take and their corresponding numeric values are listed in the table below:

Measure Value    Numeric Value
Passed           1
Failed           0

Note:

By default, the test reports the Measure Values listed in the table above to indicate the quorum group state. In the graph of this measure, however, the state is represented using the numeric equivalents only.

File share quorum:

Verifies that the witness server, witness directory, and witness share configured for the DAG are reachable.

The values that this measure can take and their corresponding numeric values are listed in the table below:

Measure Value    Numeric Value
Passed           1
Failed           0

Note:

By default, the test reports the Measure Values listed in the table above to indicate the file share quorum state. In the graph of this measure, however, the state is represented using the numeric equivalents only.
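At its simplest, checking the witness directory and share means confirming that the path can be opened as a directory from the checking host. The sketch below is hypothetical: it assumes the witness share is exposed as a path that host can reach (on Windows, a UNC path pointing at the witness server's share), and it does not examine share or NTFS permissions. The function name `check_file_share_witness` is an assumption.

```python
import os


def check_file_share_witness(witness_path: str) -> str:
    """Return "Passed" if witness_path exists and is a directory, else "Failed".

    os.path.isdir returns False (rather than raising) when the path is
    missing or cannot be accessed, so an unreachable witness server or a
    missing share or directory is reported as "Failed".
    """
    # On Windows, witness_path may be a UNC path of the form \\server\share\directory.
    return "Passed" if os.path.isdir(witness_path) else "Failed"
```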

Database copy suspended:

Checks whether any mailbox database copies on the specified DAG member, or if no DAG member is specified, on the local server, are in the Suspended state.

The values that this measure can take and their corresponding numeric values are listed in the table below:

Measure Value    Numeric Value
Passed           1
Failed           0

Note:

By default, the test reports the Measure Values listed in the table above to indicate whether any database copies on the DAG member are suspended. In the graph of this measure, however, the state is represented using the numeric equivalents only.

Database initializing:

Checks whether any mailbox database copies on the specified DAG member, or if no DAG member is specified, on the local server, are in the Initializing state.

The values that this measure can take and their corresponding numeric values are listed in the table below:

Measure Value    Numeric Value
Passed           1
Failed           0

Note:

By default, the test reports the Measure Values listed in the table above to indicate whether any database copies on the DAG member are initializing. In the graph of this measure, however, the state is represented using the numeric equivalents only.

Database disconnected:

Checks whether any mailbox database copies on the specified DAG member, or if no DAG member is specified, on the local server, are in the Disconnected state.

The values that this measure can take and their corresponding numeric values are listed in the table below:

Measure Value    Numeric Value
Passed           1
Failed           0

Note:

By default, the test reports the Measure Values listed in the table above to indicate whether any database copies on the DAG member are disconnected. In the graph of this measure, however, the state is represented using the numeric equivalents only.
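The three database copy checks (Suspended, Initializing, Disconnected) can be pictured as scans over the status of each mailbox database copy on the member: a measure fails if any copy is in the corresponding state. In this sketch the status strings are modeled on the copy status values Exchange displays (for example in Get-MailboxDatabaseCopyStatus output), but the exact-match rule, the mapping table, and the function name `check_copy_states` are assumptions made for illustration.

```python
from typing import Dict, Iterable

# Hypothetical mapping from measure name to the copy state that fails it.
STATE_FOR_MEASURE = {
    "Database copy suspended": "Suspended",
    "Database initializing": "Initializing",
    "Database disconnected": "Disconnected",
}


def check_copy_states(copy_states: Iterable[str]) -> Dict[str, str]:
    """Return "Passed"/"Failed" for each database copy measure.

    copy_states holds one status string per mailbox database copy on the
    DAG member. A measure fails if any copy is exactly in its state;
    compound states are deliberately not matched in this sketch.
    """
    states = set(copy_states)
    return {
        measure: "Failed" if state in states else "Passed"
        for measure, state in STATE_FOR_MEASURE.items()
    }
```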

Database log copy keeping up:

Verifies that log copying and inspection by the passive copies of databases on the specified DAG member, or if no DAG member is specified, on the local server, are able to keep up with log generation activity on the active copy.

The values that this measure can take and their corresponding numeric values are listed in the table below:

Measure Value    Numeric Value
Passed           1
Failed           0

Note:

By default, the test reports the Measure Values listed in the table above to indicate whether or not database log copying is able to keep up with log generation. In the graph of this measure, however, the status is represented using the numeric equivalents only.
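Whether log copying keeps up can be reasoned about as a queue: the gap between the newest log generation produced on the active copy and the newest generation copied and inspected on the passive copy. Exchange reports this gap as the copy queue length. The sketch below computes it from two generation numbers and compares it to a threshold; the threshold of 10 and the function names are hypothetical and are not the rule the test actually applies.

```python
def copy_queue_length(last_generated: int, last_inspected: int) -> int:
    """Number of log generations created on the active copy but not yet copied and inspected."""
    return max(0, last_generated - last_inspected)


def check_log_copy_keeping_up(last_generated: int, last_inspected: int,
                              max_queue: int = 10) -> str:
    """Return "Passed" if the copy queue is within max_queue (a hypothetical threshold)."""
    queue = copy_queue_length(last_generated, last_inspected)
    return "Passed" if queue <= max_queue else "Failed"
```

For example, if the active copy has generated log 1050 and the passive copy has inspected up to 1045, the copy queue is 5 and the check passes with the default threshold.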

Database log replay keeping up:

Verifies that replay activity for the passive copies of databases on the specified DAG member, or if no DAG member is specified, on the local server, is able to keep up with log copying and inspection activity.

The values that this measure can take and their corresponding numeric values are listed in the table below:

Measure Value    Numeric Value
Passed           1
Failed           0

Note:

By default, the test reports the Measure Values listed in the table above to indicate whether or not replay activity is able to keep up with log copying and inspection. In the graph of this measure, however, the status is represented using the numeric equivalents only.
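Replay is the next stage of the same pipeline: a log file must be copied and inspected before it can be replayed into the passive database copy. The replay backlog, which Exchange reports as the replay queue length, is therefore the gap between the newest inspected generation and the newest replayed generation. The sketch models one passive copy's position in that pipeline; the `PipelinePosition` type, the threshold of 10, and the function name are hypothetical.

```python
from typing import NamedTuple


class PipelinePosition(NamedTuple):
    """Hypothetical snapshot of log generation numbers for one passive database copy."""
    last_generated: int  # newest log generation on the active copy
    last_inspected: int  # newest generation copied and inspected on the passive copy
    last_replayed: int   # newest generation replayed into the passive copy


def check_log_replay_keeping_up(pos: PipelinePosition, max_queue: int = 10) -> str:
    """Return "Passed" if the replay backlog is within max_queue (a hypothetical threshold)."""
    if not (pos.last_replayed <= pos.last_inspected <= pos.last_generated):
        # Replay cannot run ahead of inspection, nor inspection ahead of generation.
        raise ValueError(f"inconsistent pipeline position: {pos}")
    replay_queue = pos.last_inspected - pos.last_replayed
    return "Passed" if replay_queue <= max_queue else "Failed"
```

Validating the ordering first reflects the dependency described above: a replay figure that is ahead of inspection indicates bad input rather than a healthy copy.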