Data Deduplication Volumes Test

Once the Data Deduplication feature is enabled for a volume, the Data Deduplication engine can potentially process all of the data on that volume, except files smaller than 32 KB, files in excluded folders, and files that do not meet the configured age settings. Deduplication finds and removes duplication within the volume's data without compromising its fidelity or integrity. After the volume is enabled for deduplication and the data is optimized, the volume contains the following:

  • Unoptimized files - For example, unoptimized files could include files that do not meet the selected file-age policy setting, system state files, alternate data streams, encrypted files, files with extended attributes, files smaller than 32 KB, other reparse point files, or files in use by other applications (the “in use” limit is removed in Windows Server 2012 R2).
  • Optimized files - Files that are stored as reparse points that contain pointers to a map of the respective chunks in the chunk store that are needed to restore the file when it is requested.
  • Chunk store - Location for the optimized file data.
  • Additional free space - The optimized files and chunk store occupy much less space than they did prior to optimization.
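
The relationship between optimized files and the chunk store can be illustrated with a toy model. The sketch below is not how the Windows deduplication engine is implemented; it assumes fixed-size chunks and SHA-256 hashes purely for illustration, and the names ChunkStore, optimize, and restore are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4  # tiny, illustrative chunk size (an assumption, not the engine's value)


class ChunkStore:
    """Toy chunk store: each unique chunk is kept once, keyed by its hash."""

    def __init__(self):
        self.chunks = {}

    def optimize(self, data: bytes) -> list:
        """Split data into chunks, store the unique ones, and return the chunk map."""
        chunk_map = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            key = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(key, chunk)  # a duplicate chunk is not stored again
            chunk_map.append(key)
        return chunk_map

    def restore(self, chunk_map: list) -> bytes:
        """Rebuild the original data by following the chunk map."""
        return b"".join(self.chunks[key] for key in chunk_map)


store = ChunkStore()
file_a = b"ABCDABCDABCD"
file_b = b"ABCDWXYZ"
map_a = store.optimize(file_a)  # plays the role of an optimized file's chunk map
map_b = store.optimize(file_b)
stored_bytes = sum(len(c) for c in store.chunks.values())
```

In this toy run the two files have a combined logical size of 20 bytes, while the store holds only the unique chunks. That gap is the kind of saving the space measures of this test report.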

Using this test, administrators can obtain statistics on the files and stores described above. The test also reveals the space utilization on each volume and the size of the optimized files, data stores, and chunks.

This test is disabled by default. To enable the test, select the Enable / Disable option from the Tests menu of the Agents tile in the Admin tile menu. Select Microsoft Windows as the Component type, and pick Performance as the Test type. From the list of disabled tests, pick this test and click the < button to enable it. Finally, click Update.

Target of the test : A Windows host

Agent deploying the test : An internal agent

Outputs of the test : One set of results for each deduplication volume on the target host.

Configurable parameters for the test
  1. Test period - How often the test should be executed.
  2. Host - The host for which the test is to be configured.
  3. Port - Refers to the port used by the specified host. Here it is NULL.
  4. Domain - Specify the name of the Windows domain to which the target host belongs.
  5. Username - Here, enter the name of a valid domain user with login rights to the target host.
  6. Password - Provide the password of the above-mentioned user in this text box.
  7. Confirm password - Confirm the password by retyping it here.
  8. DD Frequency - Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency, if you so desire. Also, if you intend to disable the detailed diagnosis capability for this test, you can do so by specifying none against DD frequency.
  9. Detailed Diagnosis - To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, click on the Off option. The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

    • The eG manager license should allow the detailed diagnosis capability
    • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
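
As a rough illustration of how a DD Frequency value such as 1:1 or none could be interpreted, the sketch below parses the setting and applies the two conditions listed above. It is not eG Enterprise code: the function names are hypothetical, the order normal:abnormal is assumed from the description of the default, and the second condition is assumed to mean that neither frequency may be 0.

```python
from typing import Optional, Tuple


def parse_dd_frequency(value: str) -> Optional[Tuple[int, int]]:
    """Return (normal, abnormal) frequencies, or None when set to 'none'."""
    text = value.strip().lower()
    if text == "none":
        return None  # detailed diagnosis is disabled for the test
    normal, abnormal = text.split(":")  # assumed order: normal:abnormal
    return int(normal), int(abnormal)


def dd_can_be_toggled(license_allows_dd: bool, value: str) -> bool:
    """Apply both listed conditions (assumption: neither frequency may be 0)."""
    freq = parse_dd_frequency(value)
    if freq is None:
        return False
    normal, abnormal = freq
    return license_allows_dd and normal != 0 and abnormal != 0
```

With the default of 1:1 and a license that allows detailed diagnosis, dd_can_be_toggled returns True under these assumptions.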
Measurements made by the test
Measurement | Description | Measurement Unit | Interpretation

Capacity

Indicates the total capacity of this volume.

GB

 

Free space

Indicates the amount of space available for use on this volume.

GB

A high value is desired for this measure.

Used space

Indicates the amount of space utilized on this volume.

GB

 

Unoptimized space

Indicates the total logical size of all (optimized and non-optimized) files on this volume.

GB

 

Saved space

Indicates the difference between the logical size of the optimized files and the logical size of the chunk store (i.e. the sum of the deduplicated user data and deduplication metadata).

GB

 

Saving rate

Indicates the percentage of space saved by deduplication on this volume.

Percent

 

Optimized files

Indicates the number of optimized files on this volume.

Number

 

Optimized files size

Indicates the total size of all the optimized files on this volume.

GB

 

Optimized file saving rate

Indicates the space saved by deduplication as a percentage of the logical size of the optimized files on this volume.

Percent

 

In-policy files

Indicates the number of files that currently qualify for optimization.

Number

 

In-policy files size

Indicates the total size of the files that currently qualify for optimization.

GB

 

Last optimization result

Indicates the result of the last optimization job that was run on this volume.

 

The values that this measure can report and their corresponding numeric values are listed in the table below:

Measure Value | Numeric Value
Success | 0
Failure | 1

Note:

By default, this measure reports the Measure Values listed in the table above. In the graph of this measure, however, these values are represented using their numeric equivalents only.

Last garbage collection result

Indicates the result of the last garbage collection job that was run on this volume.

 

The values that this measure can report and their corresponding numeric values are listed in the table below:

Measure Value | Numeric Value
Success | 0
Failure | 1

Note:

By default, this measure reports the Measure Values listed in the table above. In the graph of this measure, however, these values are represented using their numeric equivalents only.

Last scrubbing result

Indicates the result of the last scrubbing job that was run on this volume.

 

The values that this measure can report and their corresponding numeric values are listed in the table below:

Measure Value | Numeric Value
Success | 0
Failure | 1

Note:

By default, this measure reports the Measure Values listed in the table above. In the graph of this measure, however, these values are represented using their numeric equivalents only.

Usage type

Indicates the type of data to be stored in this volume.

 

The values that this measure can report and their corresponding numeric values are listed in the table below:

Measure Value | Numeric Value
Default | 1
HyperV | 2
Backup | 3

Note:

By default, this measure reports the Measure Values listed in the table above. In the graph of this measure, however, these values are represented using their numeric equivalents only.

Minimum file age

Indicates the minimum number of days that must pass since a file was last accessed before the deduplication engine optimizes it.

Number

 

Minimum files size

Specifies the minimum size threshold for files that are to be optimized.

GB

The deduplication engine optimizes the files that meet the minimum size threshold.

Is data compressed after deduplication?

Indicates whether or not the data on this volume is compressed after deduplication.

 

The values that this measure can report and their corresponding numeric values are listed in the table below:

Measure Value | Numeric Value
No | 0
Yes | 1

Note:

By default, this measure reports the Measure Values listed in the table above. In the graph of this measure, however, these values are represented using their numeric equivalents only.

Chunk redundancy threshold

Indicates the chunk redundancy threshold set for this volume.

 

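This threshold specifies the number of identical chunks of data that the deduplication engine must discover before it creates a redundant copy as a safeguard. For example, with a threshold of 50, the engine makes one redundant copy once it discovers 50 chunks of identical data.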

Is byte-by-byte verification performed?

Indicates whether or not byte-by-byte verification is performed for each duplicated chunk.

 

The values that this measure can report and their corresponding numeric values are listed in the table below:

Measure Value | Numeric Value
No | 0
Yes | 1

Note:

By default, this measure reports the Measure Values listed in the table above. In the graph of this measure, however, these values are represented using their numeric equivalents only.

Are files in use optimized?

Indicates whether or not files that are currently in use on this volume are optimized.

 

The values that this measure can report and their corresponding numeric values are listed in the table below:

Measure Value | Numeric Value
No | 0
Yes | 1

Note:

By default, this measure reports the Measure Values listed in the table above. In the graph of this measure, however, these values are represented using their numeric equivalents only.

Are files partially optimized?

Indicates whether the files in this volume are partially optimized.

 

The values that this measure can report and their corresponding numeric values are listed in the table below:

Measure Value | Numeric Value
No | 0
Yes | 1

Note:

By default, this measure reports the Measure Values listed in the table above. In the graph of this measure, however, these values are represented using their numeric equivalents only.

Data chunks

Indicates the number of data chunks in a container.

Number

 

Data containers

Indicates the number of containers in the data store.

Number

 

Average data chunk size

Indicates the average size of a data chunk in the data store.

GB

 

Data chunk median size

Indicates the median size of the data chunks in the data store.

GB

 

Data store uncompacted freespace

Indicates the amount of uncompacted space that is available for use on this volume.

GB

 

Stream map chunks

Indicates the number of stream map chunks in a container.

Number

 

Stream map containers

Indicates the number of containers in the stream map store.

Number

 

Average stream map chunks

Indicates the stream map store size divided by the total number of streams in the store.

GB

 

Median stream map chunks

Indicates the median number of stream map chunks on this volume.

Number

 

Maximum stream map chunks

Indicates the maximum number of stream map chunks that can be stored in this volume.

Number

 

Hotspot chunks

Indicates the number of hotspots in a container.

Number

 

Hotspot containers

Indicates the number of hotspot containers in the stream map store.

Number

 

Median hotspot references

Indicates the median number of hotspot references.

Number

 

Corruption log entries

Indicates the number of corruption log entries recorded for this volume.

Number

Some of the most common causes for deduplication to report corruption are:

  • Incompatible Robocopy options used when copying data
  • Incompatible Backup/Restore program used on a dedup volume
  • Migrating a deduplicated volume to a down-level Windows Server version
  • Enabling compression on volume roots also enabled with deduplication
  • Hardware issues
  • File System corruption

Ideally, the value of this measure should be low. A sudden or gradual increase in this value indicates a decrease in the data integrity of the volume.

Total chunk store size

Indicates the total size of the chunk store on this volume.

GB

The chunk store is an organized series of container files in the System Volume Information folder that Data Deduplication uses to uniquely store chunks.
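
The numeric equivalents listed for the state-type measures, and the relationship between the space measures, can be sketched as follows. The mapping values are taken from the tables in this section. The saving rate formula (saved space divided by unoptimized space) is an assumption based on how those two measures are described here, and the function names are hypothetical.

```python
# Numeric equivalents as listed for the measures in this section
JOB_RESULT = {0: "Success", 1: "Failure"}
USAGE_TYPE = {1: "Default", 2: "HyperV", 3: "Backup"}
YES_NO = {0: "No", 1: "Yes"}


def decode(mapping: dict, numeric_value: int) -> str:
    """Translate a graphed numeric value back to its Measure Value."""
    return mapping.get(numeric_value, "Unknown")


def saving_rate_percent(saved_gb: float, unoptimized_gb: float) -> float:
    """Assumed formula: saved space as a percentage of unoptimized space."""
    if unoptimized_gb <= 0:
        return 0.0  # avoid dividing by zero on an empty volume
    return 100.0 * saved_gb / unoptimized_gb
```

For instance, under this assumed formula a volume reporting 25 GB of saved space against 100 GB of unoptimized space would have a saving rate of 25 percent.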