SIOS Data Replication Test

At the highest level, SIOS DataKeeper provides the ability to mirror a volume on one system (source) to a different volume on another system (target) across any network. When the mirror is created, all data on the source volume is initially replicated to the target volume by overwriting the target volume. When the initial synchronization (also referred to as a full resync of the data) of the volumes is complete, the target volume is an exact replica of the source volume in terms of size and data content. The mirroring is performed in one of the following ways:

  • Synchronous Mirroring - With synchronous mirroring, each write is intercepted and transmitted to the target system to be written on the target volume at the same time that the write is committed to the storage device on the source system. Once both the local and target writes are complete, the write request is acknowledged as complete and control is returned to the application that initiated the write.
  • Asynchronous Mirroring - In most cases, SIOS recommends using asynchronous mirroring. With asynchronous mirroring, each write is intercepted and a copy of the data is made on the source system. The copy is queued for transmission to the target system as soon as the network allows. Meanwhile, the original write request is committed to the storage device on the source system and control is immediately returned to the application that initiated the write.
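The difference between the two modes comes down to when the write is acknowledged. The following is a minimal sketch of that difference only, not SIOS DataKeeper's implementation; the class `MirrorSketch` and its methods are hypothetical names, and the network is replaced by an in-memory queue.

```python
from collections import deque


class MirrorSketch:
    """Toy model of a mirrored volume (hypothetical, for illustration only)."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.source = {}             # block number -> data on the source volume
        self.target = {}             # block number -> data on the target volume
        self.write_queue = deque()   # writes waiting to be sent to the target

    def write(self, block, data):
        """Handle one application write and return once it is acknowledged."""
        self.source[block] = data
        self.write_queue.append((block, data))
        if self.synchronous:
            # Synchronous: the target write must complete before acknowledging.
            self.drain()
        # Asynchronous: acknowledge now; the queued copy is sent later.
        return "acknowledged"

    def drain(self):
        """Send every queued write to the target (stands in for the network)."""
        while self.write_queue:
            block, data = self.write_queue.popleft()
            self.target[block] = data


sync_mirror = MirrorSketch(synchronous=True)
sync_mirror.write(0, b"a")
async_mirror = MirrorSketch(synchronous=False)
async_mirror.write(0, b"a")

print(sync_mirror.target == sync_mirror.source)  # True
print(len(async_mirror.write_queue))             # 1
```

In the asynchronous case the write is acknowledged while one copy is still queued, which is why the length of the write queue is worth monitoring.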

Once the mirror is established between the source and target volumes, SIOS DataKeeper intercepts all writes to the source volume and replicates the data to the target volume. If asynchronous or synchronous mirroring is interrupted or fails, for example because of a poor network connection, the source and target volumes fall out of sync. In such cases, SIOS DataKeeper uses an intent log (also referred to as a bitmap file) to resynchronize the source and target volumes. The intent log tracks the changes made to the source or target volume, which lets SIOS DataKeeper survive a source or target system failure or reboot without requiring a full mirror resync after the system recovers.
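The idea behind an intent log can be sketched as a set of dirty block numbers: only the blocks recorded there need to be copied after an interruption. This is an illustrative sketch, not the on-disk bitmap format; `IntentLogSketch` is a hypothetical name, and the 64 KB tracking granularity is an assumption taken from the 64K block size this test reports for resynchronization.

```python
BLOCK_SIZE = 64 * 1024  # assumed tracking granularity (64 KB blocks)


class IntentLogSketch:
    """Tracks which blocks changed while the mirror was interrupted."""

    def __init__(self):
        self.dirty = set()  # block numbers that differ from the target

    def record_write(self, offset, length):
        """Mark every block touched by a write as dirty and return them."""
        first = offset // BLOCK_SIZE
        last = (offset + length - 1) // BLOCK_SIZE
        touched = range(first, last + 1)
        self.dirty.update(touched)
        return touched

    def partial_resync(self, source, target):
        """Copy only the dirty blocks to the target, then clear the log."""
        copied = 0
        for block in sorted(self.dirty):
            target[block] = source[block]
            copied += 1
        self.dirty.clear()
        return copied


# Source and target start out identical (1000 blocks).
source = {n: b"old" for n in range(1000)}
target = dict(source)
log = IntentLogSketch()

# While the target is unreachable, two writes land on the source.
for offset, length in [(0, 100), (5 * BLOCK_SIZE - 10, 20)]:
    for block in log.record_write(offset, length):
        source[block] = b"new"

copied = log.partial_resync(source, target)
print(copied, target == source)  # 3 True
```

Only 3 of the 1000 blocks are copied, which is the saving a partial resync offers over a full one.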

If the source volume is unable to reach the target volume (e.g., due to network errors), the data on the source and target will not be in sync. If the source volume crashes in this state, the lack of synchronization can lead to data loss, reduce data integrity, and render the system unreliable. To avoid this, administrators should track the progress of mirroring and resynchronization between the source and target volumes, promptly detect failures and bottlenecks in the replication process, investigate their causes, and fix them in time. The SIOS Data Replication test helps administrators do this.

This test auto-discovers the volumes on the source system. For each volume, the test reports the mirror state, and thus sheds light on broken synchronization attempts. The test also captures network reconnects, which can delay replication, and reports the length of the write queue, so that you can tell whether too many writes are pending. Finally, the test reveals the count of dirty blocks in the intent log, which affects resynchronization time.

Target of the test : A SIOS DataKeeper

Agent deploying the test : An internal agent

Outputs of the test : One set of results for each volume on the source system being monitored.

Configurable parameters for the test
Parameter Description

Test period

How often the test should be executed.

Host

The host for which the test is to be configured.

Port

The port number at which the specified host listens.

Measurements made by the test
Measurement Description Measurement Unit Interpretation

Mirror elapsed time

Indicates the total time that this volume has been in the Mirror state.

Secs

If the value of this measure is zero, it indicates that the volume is not involved in mirroring.

Mirror state

Indicates the current mirroring status of this volume.

 

The values that this measure can report and their corresponding numeric values are listed below:

Measure Value Numeric Value
None 0
Mirror 1
Resync 2
Broken 3
Paused 4
Resync pending 5

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the current mirroring status of each volume on the source system. The graph of this measure, however, represents these states using their numeric equivalents only.
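When reading the graph or exported data, the numeric values have to be mapped back to state names. A minimal sketch of that mapping follows, using the values from the table above. The function `describe` and the choice of which states need attention are assumptions made for illustration, not part of the test.

```python
from enum import IntEnum


class MirrorState(IntEnum):
    """Numeric values of the Mirror state measure, as listed in the table."""
    NONE = 0
    MIRROR = 1
    RESYNC = 2
    BROKEN = 3
    PAUSED = 4
    RESYNC_PENDING = 5


# Assumption: states in which the target is not being kept up to date.
NEEDS_ATTENTION = {
    MirrorState.BROKEN,
    MirrorState.PAUSED,
    MirrorState.RESYNC_PENDING,
}


def describe(value):
    """Turn a numeric Mirror state value into a readable label."""
    state = MirrorState(value)
    flag = "needs attention" if state in NEEDS_ATTENTION else "ok"
    name = state.name.replace("_", " ").capitalize()
    return f"{name} ({flag})"


print(describe(1))  # Mirror (ok)
print(describe(3))  # Broken (needs attention)
print(describe(5))  # Resync pending (needs attention)
```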

Mirror type

Indicates the type of mirroring this volume is involved in.

 

The values that this measure can report and their corresponding numeric values are listed below:

Measure Value Numeric Value Description
None 0 The volume is not currently involved in a mirror.
Synchronous 1 Data is put on the Write Queue to be sent to the target volume, and written to the local volume, simultaneously. The write is not acknowledged as "complete" until operations on both the volumes complete.
Asynchronous 2 Data is put on the Write Queue to be sent to the target volume, and written to the local volume, simultaneously. The write is acknowledged when the write operation on the local volume is complete.

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the current mirroring type of each volume on the source system. The graph of this measure, however, represents these types using their numeric equivalents only.

Network number of reconnects

Indicates the number of network reconnections that have been made while this volume has been mirrored.

Number

A low value is desired for this measure. A high value indicates poor network connectivity between the source and target systems.

Queue byte limit

Indicates the maximum number of bytes that can be allocated to the write queue of this volume. This limit is set by the WriteQueueByteLimitMB registry value.

Number

If WriteQueueByteLimitMB is set to zero, there is no limit on the bytes that can be allocated to the write queue.

For example, suppose the byte limit is set to 200. If the Queue current bytes value reaches this limit, the SIOS DataKeeper driver momentarily pauses the mirror, drains the queue, and automatically starts a partial resync. If this happens, you may need to increase the WriteQueueByteLimitMB registry value and execute the READREGISTRY command so that SIOS DataKeeper starts using the new value immediately. This way, you can avoid repeated pauses and partial resyncs caused by an inadequate byte limit setting.

Queue current bytes

Indicates the number of bytes allocated for the Write Queue of this volume.

Number

If the value of this measure is close to the value of the Queue byte limit measure, it is a cause for concern. In that case, you may need to increase the WriteQueueByteLimitMB registry value.

Queue current length

Indicates the number of writes to be mirrored in the Write Queue for the selected mirror.

Number

 

Queue high water

Indicates the high water mark of the Write Queue that is set in the WriteQueueHighWater registry value.

Number

If the Queue current length reaches the value of this measure due to intensive I/O traffic, the SIOS DataKeeper driver momentarily pauses the mirror, drains the queue, and automatically starts a partial resync. This value represents the number of write requests in the queue, not the number of bytes. After you update this registry value, execute the READREGISTRY command so that SIOS DataKeeper starts using the new value immediately.
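The two write-queue limits can be checked together by expressing each current value as a fraction of its limit. The sketch below is illustrative only. The function names and the 0.8 warning threshold are assumptions, the byte values are assumed to be in the same unit as reported by the test, and treating a high water mark of 0 as "not evaluated" is an assumption made to avoid dividing by zero.

```python
def queue_headroom(current_bytes, byte_limit, current_length, high_water):
    """Return the fraction of each write-queue limit in use.

    A byte limit of 0 means "no limit", so no fraction is returned for it.
    """
    bytes_used = None if byte_limit == 0 else current_bytes / byte_limit
    length_used = None if high_water == 0 else current_length / high_water
    return bytes_used, length_used


def near_limit(fraction, threshold=0.8):
    """True if a limit exists and usage has reached the warning threshold."""
    return fraction is not None and fraction >= threshold


bytes_used, length_used = queue_headroom(
    current_bytes=180, byte_limit=200,
    current_length=5000, high_water=20000,
)
print(bytes_used, near_limit(bytes_used))    # 0.9 True
print(length_used, near_limit(length_used))  # 0.25 False
```

A queue that regularly approaches either limit is a candidate for a higher registry value, since reaching the limit pauses the mirror and triggers a partial resync.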

Resync current block

Indicates the number of blocks that are currently resynchronized to the target volume from this volume.

Number

 

Resync dirty blocks

Indicates the number of blocks that are marked as dirty in the intent log while resynchronizing this volume.

Number

Dirty blocks are the blocks in the intent log that need to be updated and transferred to the target volume before synchronization is complete.

Note that the value of this measure may actually increase during a mirror synchronization if a large number of incoming writes are made to the volume.
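The dirty block count can be turned into a rough estimate of the data still to be transferred. The sketch below assumes that each dirty block covers 64 KB, matching the 64K block size this test reports for resynchronization, and that the replication throughput is known and constant. Both are assumptions, and `remaining_resync` is a hypothetical helper.

```python
BLOCK_SIZE_BYTES = 64 * 1024  # assumed size covered by one dirty block


def remaining_resync(dirty_blocks, throughput_bytes_per_sec):
    """Estimate the bytes left to transfer and the time needed to do so."""
    remaining_bytes = dirty_blocks * BLOCK_SIZE_BYTES
    seconds = remaining_bytes / throughput_bytes_per_sec
    return remaining_bytes, seconds


# 16384 dirty blocks at an assumed 10 MiB/s of replication throughput.
remaining_bytes, seconds = remaining_resync(16384, 10 * 1024 * 1024)
print(remaining_bytes)     # 1073741824
print(round(seconds, 1))   # 102.4
```

Because incoming writes can add dirty blocks while the resync is running, an estimate like this is best read as a lower bound.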

Resync elapsed time

Indicates the time taken for resynchronizing this volume.

Secs

The value will be 0 for volumes that have never been synchronized or that were not synchronized during the last boot. Compare the value of this measure across volumes to identify the volume that took the longest to synchronize.

Resync new writes

Indicates the number of writes to this volume since the resynchronization operation has begun.

Number

A high value of this measure indicates that the synchronization process will take longer to finish.

Resync pass

Indicates the number of passes that have been made by this volume to update the target volume during the resynchronization process.

Number

The value of this measure depends on the value of the Resync new writes measure: as the value of Resync new writes increases, the value of this measure also increases.

Resync phase

Indicates the current resynchronization phase of this volume.

 

The values that this measure can report and their corresponding numeric values are listed below:

Measure Value Numeric Value
Unknown 0
Initial 1
Update 2
Done 3

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the current phase of the resynchronization process. The graph of this measure, however, represents these phases using their numeric equivalents only.

Resync reads

Indicates the maximum number of disk blocks that can be kept in progress during the resynchronization of this volume.

Number

 

Resync total blocks

Indicates the total number of 64K blocks used for resynchronization of this volume.

Number

The value of this measure is approximately equal to the file system size of this volume divided by 64K.
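As a worked example of this arithmetic, the sketch below computes the expected block count for a volume and a progress percentage. Rounding up to a whole block is an assumption (the relationship is only approximate), and so is reading Resync current block as a count that rises toward Resync total blocks. `total_blocks` and `resync_progress` are hypothetical helpers.

```python
BLOCK_SIZE = 64 * 1024  # 64K blocks, as used by this measure


def total_blocks(volume_size_bytes):
    """Approximate Resync total blocks for a volume (rounded up)."""
    return -(-volume_size_bytes // BLOCK_SIZE)  # integer ceiling division


def resync_progress(current_block, total):
    """Percentage of blocks processed so far, capped at 100."""
    if total == 0:
        return 0.0
    return min(current_block / total, 1.0) * 100


size = 100 * 1024**3          # a 100 GiB volume
total = total_blocks(size)
print(total)                            # 1638400
print(resync_progress(409600, total))   # 25.0
```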