Data Deduplication Jobs Test
Data deduplication works by finding portions of files that are identical and storing just a single copy of the duplicated data on the disk. The technology required to find and isolate duplicated portions of files on a large disk is pretty complicated. Microsoft uses an algorithm called chunking, which scans data on the disk and breaks it into chunks whose average size is 64KB. These chunks are stored on disk in a hidden folder called the chunk store. Then, the actual files on the disk contain pointers to individual chunks in the chunk store. If two or more files contain identical chunks, only a single copy of the chunk is placed in the chunk store and the files that share the chunk all point to the same chunk.
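To make the chunk-store idea concrete, here is a minimal Python sketch of the concept described above. It is a simplified model, not Microsoft's implementation: it splits data into fixed 64 KB chunks and keeps the chunk store in an in-memory dictionary, whereas the real chunking algorithm produces variable-size chunks averaging 64 KB and persists them in the hidden chunk store folder. The function and variable names (`deduplicate`, `read_back`, `chunk_store`, `file_table`) are purely illustrative.

```python
# Simplified illustration of the chunk-store idea (not Microsoft's actual
# implementation): files are split into chunks, each unique chunk is stored
# once (keyed by its hash), and files become lists of chunk references.
import hashlib

CHUNK_SIZE = 64 * 1024          # 64 KB, matching the average chunk size cited above

chunk_store = {}                # hash -> chunk bytes (the "chunk store")
file_table = {}                 # file name -> list of chunk hashes (pointers)

def deduplicate(name: str, data: bytes) -> None:
    """Split data into chunks, store each unique chunk once, record pointers."""
    pointers = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        chunk_store.setdefault(digest, chunk)   # only the first copy is kept
        pointers.append(digest)
    file_table[name] = pointers

def read_back(name: str) -> bytes:
    """Reassemble a file from its chunk pointers."""
    return b"".join(chunk_store[h] for h in file_table[name])

# Two files sharing most of their content end up sharing chunks in the store.
shared = b"A" * (128 * 1024)
deduplicate("report_v1.docx", shared + b"first ending")
deduplicate("report_v2.docx", shared + b"second ending")
assert read_back("report_v1.docx").startswith(shared)
print(f"{len(chunk_store)} unique chunks stored for 2 files")
```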
Microsoft has tuned the chunking algorithm sufficiently that in most cases, users will have no idea that their data has been deduplicated. Access to the data is as fast as if the data were not deduplicated. For performance reasons, data is not automatically deduplicated as it is written. Instead, regularly scheduled deduplication jobs scan the disk, applying the chunking algorithm to find chunks that can be deduplicated. Data deduplication works through the following jobs:
Job Name | Description |
---|---|
Optimization | The Optimization job deduplicates by chunking data on a volume per the volume policy settings, (optionally) compressing those chunks, and storing chunks uniquely in the chunk store. |
Garbage Collection | The Garbage Collection job reclaims disk space by removing unnecessary chunks that are no longer referenced by files that have been recently modified or deleted. |
Integrity Scrubbing | The Integrity Scrubbing job identifies corruption in the chunk store due to disk failures or bad sectors. When possible, Data Deduplication can automatically use volume features (such as mirror or parity on a Storage Spaces volume) to reconstruct the corrupted data. Additionally, Data Deduplication keeps backup copies of popular chunks (those referenced more than 100 times) in an area called the hotspot. |
Unoptimization | The Unoptimization job, which is a special job that should only be run manually, undoes the optimization done by deduplication and disables Data Deduplication for that volume. |
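For reference, the jobs in the table above can also be started on demand on a Windows Server host with the Start-DedupJob cmdlet from the Deduplication PowerShell module. The Python wrapper below is only an illustrative sketch: it assumes PowerShell and the Deduplication module are available on the host, and note that the -Type names differ slightly from the display names in the table (for example, "Scrubbing" for Integrity Scrubbing and "GarbageCollection" without a space).

```python
# Illustrative sketch: manually starting a Data Deduplication job on a
# Windows Server host by invoking the Deduplication PowerShell module.
# Assumes PowerShell and the Deduplication module are installed; error
# handling is intentionally minimal.
import subprocess

JOB_TYPES = ("Optimization", "GarbageCollection", "Scrubbing", "Unoptimization")

def start_dedup_job(volume: str, job_type: str) -> None:
    """Run Start-DedupJob for the given volume and job type."""
    if job_type not in JOB_TYPES:
        raise ValueError(f"Unknown job type: {job_type}")
    subprocess.run(
        ["powershell.exe", "-NoProfile", "-Command",
         f"Start-DedupJob -Volume {volume} -Type {job_type}"],
        check=True,
    )

# Example: kick off an on-demand Optimization job for volume E:
# start_dedup_job("E:", "Optimization")
```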
Data Deduplication uses a post-processing strategy to optimize and maintain a volume's space efficiency, so it is important that Data Deduplication jobs complete successfully and without delay. If, for any reason, these jobs do not complete quickly, the job queue keeps growing, resulting in an overload condition that in turn slows down the target host. When such abnormalities occur, administrators need to instantly know how many jobs are queued up. The Data Deduplication Jobs test helps administrators in this regard!
This test monitors the jobs on the target host, and reports the number of jobs that are currently running and the number of jobs that are in queue. Using these metrics, administrators can instantly assess the current workload on the host and detect an overload condition (if any).
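As a rough illustration of how such counts could be obtained outside of this test, the sketch below queries the job list with the Get-DedupJob cmdlet and tallies jobs by state. The PowerShell invocation from Python, the ConvertTo-Json round trip, and the State values checked ("Running", "Queued") are assumptions about the host environment; this is not a description of how the test itself collects its metrics.

```python
# Illustrative sketch only: count running and queued Data Deduplication jobs
# by querying Get-DedupJob through PowerShell. The State strings compared
# below ("Running", "Queued") are assumptions.
import json
import subprocess

def dedup_job_counts() -> dict:
    """Return the number of running and queued deduplication jobs."""
    command = "Get-DedupJob | ForEach-Object { $_.State.ToString() } | ConvertTo-Json"
    result = subprocess.run(
        ["powershell.exe", "-NoProfile", "-Command", command],
        capture_output=True, text=True, check=True,
    )
    output = result.stdout.strip()
    if not output:                      # no deduplication jobs at all
        return {"running": 0, "queued": 0}
    states = json.loads(output)
    if isinstance(states, str):         # ConvertTo-Json unwraps a single item
        states = [states]
    return {
        "running": sum(1 for s in states if s == "Running"),
        "queued": sum(1 for s in states if s == "Queued"),
    }

# Example:
# print(dedup_job_counts())   # e.g. {'running': 1, 'queued': 3}
```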
This test is disabled by default. To enable the test, select the Enable / Disable option from the Tests menu of the Agents tile in the Admin tile menu. Select Microsoft Windows as the Component type, and pick Performance as the Test type. From the list of disabled tests, pick this test and click the < button to enable it. Finally, click Update.
Target of the test : A Windows host
Agent deploying the test : An internal agent
Outputs of the test : One set of results for the target host being monitored.
Measurement | Description | Measurement Unit | Interpretation |
---|---|---|---|
Running jobs | Indicates the number of jobs that are currently running on the target host. | Number | This measure is a good indicator of the workload on the target host. |
Queued jobs | Indicates the number of jobs that are currently in queue. | Number | |