Disk Usage Test

Dataset is the generic name used to refer to the following ZFS components: clones, file systems, snapshots, and volumes. Each dataset is identified by a unique name in the ZFS namespace, using the following format:

pool/path[@snapshot]

pool - Identifies the name of the storage pool that contains the dataset

path - Is a slash-delimited path name for the dataset component

snapshot - Is an optional component that identifies a snapshot of a dataset
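For instance, assuming a hypothetical pool named tank containing a file system users/home, dataset names would look like the ones below, and the zfs list command could be used to view them on the host:

    tank/users/home                        # a file system in the pool 'tank'
    tank/users/home@monday                 # a snapshot of that file system
    zfs list -t all -r tank/users/home     # lists the file system and all its snapshots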

A snapshot is a read-only copy of a file system or volume. A clone, on the other hand, is a writable file system or volume whose initial contents are the same as those of the snapshot from which it was created. Neither snapshots nor clones consume any disk space initially, but as changes are made to the underlying dataset, they begin to use disk space. This implies that too many snapshots/clones, or the presence of very large snapshots and clones, can add significantly to the disk space consumption of a dataset, causing serious contention for disk space on the host!  To conserve disk space, therefore, administrators often resort to configuring a quota limit for each dataset or enabling compression on a dataset. But how will an administrator ascertain the effectiveness of these configurations? This is where the ZFS Disk Usage test helps!
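As an illustration, the commands below show how a snapshot and a clone might be created, and how a quota and compression could be configured for a dataset; the pool and dataset names used here (tank/projects) are purely hypothetical:

    zfs snapshot tank/projects@backup1                     # create a read-only snapshot of the dataset
    zfs clone tank/projects@backup1 tank/projects_clone    # create a writable clone from that snapshot
    zfs set quota=50G tank/projects                        # cap the space the dataset and its descendents can use
    zfs set compression=on tank/projects                   # enable compression on the dataset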

For every dataset on ZFS, this test reports the total space usage of the dataset, thus pointing you to those datasets that are rapidly eroding storage space. Alongside, the test enables administrators to keep track of the quota limit set for a dataset and the compression ratio achieved by it, so that the impact of these configurations on the total disk space usage of the dataset can be effectively assessed; the results of this analysis can later be used to fine-tune the configurations! In addition, the test monitors the count of snapshots and clones created from each dataset and reports the space usage of these snapshots and clones, thus revealing why a particular dataset is consuming too much space – is it because too many snapshots were created from that dataset? Is it because of the large size of the snapshots? Is it owing to incessant cloning of the snapshots? Or is it due to the large size of the snapshot clones?

Target of the test : A Solaris host

Agent deploying the test : An internal agent

Outputs of the test : One set of results for each dataset

Configurable parameters for the test
Parameter Description

Test Period

How often should the test be executed.

Host

The host for which the test is to be configured.

DD Frequency

Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency, if you so desire. Also, if you intend to disable the detailed diagnosis capability for this test, you can do so by specifying none against DD frequency.

Detailed Diagnosis

To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability
  • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
Measurements made by the test
Measurement Description Measurement Unit Interpretation

Available space:

Indicates the amount of disk space currently available to this dataset and all its children, assuming no other activity in the pool.

GB

A high value is desired for this measure. You can compare the value of this measure across datasets to know which dataset has very little space available.

Used space:

Indicates the amount of space currently consumed by this dataset and all its descendents.

GB

Ideally, the value of this measure should be low.

You can even compare the value of this measure across datasets to identify the dataset that is over-utilizing the disk space.

Referred space:

Indicates the amount of data that is accessible by this dataset, which may or may not be shared with other datasets in the pool.

GB

When a snapshot or clone is created, it initially references the same data as the file system or snapshot from which it was created, so its referred space is the same as that of its origin.

Percentage of space used:

Indicates the percentage of space used by this dataset.

Percent

A low value is desired for this measure. A consistent rise in the value of this measure is a cause for concern, as it indicates gradual erosion of disk space by a dataset.

Compare space usage across datasets to know which dataset is consuming disk space excessively. To know why this dataset is hogging disk space, check out the value reported by the Total space used by snapshots and Total space used by clones measures for that dataset. This will indicate what is causing the space crunch – snapshots of the dataset? Or clones of the snapshots of the dataset? Based on this analysis, you may want to consider identifying and destroying some snapshots and/or clones – say, the ones that are no longer used actively - so as to free disk space.

You may also want to take a look at the value of the Quota and the Compression ratio measures for that dataset to understand whether/not altering the quota and/or compression algorithm will help in reducing disk space usage of the dataset.
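If such an analysis points to snapshots or clones as the space hogs, commands along the following lines can help identify and, where appropriate, remove them; the dataset and snapshot names used here are hypothetical:

    zfs list -t snapshot -r -o name,used -s used tank/projects   # list the dataset's snapshots, sorted by space used
    zfs destroy tank/projects@backup1                            # destroy a snapshot that is no longer needed
                                                                  # (a snapshot cannot be destroyed while clones of it
                                                                  #  exist, unless those clones are destroyed or promoted)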

Snapshots count:

Indicates the number of snapshots currently available for this dataset.

Number

By correlating Snapshots count with Total space used by snapshots, you can understand whether the dataset has many small snapshots or only a few very large ones.

In the event of a space crunch, you can also compare the value of the Total space used by snapshots with that of the Total space used by clones measure to know what is occupying too much space – snapshots? Or clones?  Based on this analysis, you may want to consider identifying and destroying some snapshots and/or clones – say, the ones that are no longer used actively - so as to free disk space.

Total space used by snapshots:

Indicates the total amount of disk space currently used by the snapshots of this dataset.

GB

Clones count:

Indicates the number of clones currently associated with this dataset.

Number

By correlating Clones count with Total space used by clones, you can understand whether the dataset has many small clones or only a few very large ones.

In the event of a space crunch, you can also compare the value of the Total space used by snapshots measure with that of the Total space used by clones measure to know what is occupying too much space – snapshots? Or clones?  Based on this analysis, you may want to consider identifying and destroying some snapshots and/or clones – say, the ones that are no longer used actively - so as to free disk space.

Total space used by the clones:

Indicates the total amount of disk space currently used by the clones associated with this dataset.

GB

Compression status:

Indicates the current compression status of this dataset.

 

‘Compression’ is a feature of ZFS which, when turned on, saves disk space and improves the performance of the system. Internally, ZFS allocates data using multiples of the device's sector size, typically either 512 bytes or 4KB. When compression is enabled, a smaller number of sectors can be allocated for each block.

If compression is enabled for the dataset, this measure will report the value On. If compression is disabled, this measure will report the value Off.

The numeric values that correspond to these measure values are listed below:

Measure Value Numeric Value
On 1
Off 0

Note:

By default, this measure reports one of the Measure Values listed in the table above. The graph of this measure, however, will represent the compression status using the numeric equivalents only.
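On the host itself, the compression setting of a dataset can be verified or changed using the zfs command; tank/projects is a hypothetical dataset name:

    zfs get compression tank/projects      # shows the current compression setting (e.g., on, off, lzjb, gzip)
    zfs set compression=off tank/projects  # turns compression off for the dataset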

Compression ratio:

Indicates the current compression ratio of this dataset.

Ratio

A consistent drop in this value is disconcerting, as it indicates that data blocks are not being compressed efficiently, thereby increasing disk space consumption. Under such circumstances, you may want to change the compression algorithm in use. LZJB is the default compression algorithm for ZFS. Specifically, it provides fair compression, has a high compression speed, has a high decompression speed, and detects incompressible data quickly. The other options available are:

  • LZ4

  • GZIP

  • ZLE

A good alternative to LZJB would be LZ4. Tests have revealed that LZ4 averages a 2.1:1 compression ratio, while GZIP is much slower.
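The compression ratio achieved and the algorithm in use can be checked, and the algorithm changed, as shown below; the dataset name is hypothetical, LZ4 is available only where the pool version/feature supports it, and a new algorithm applies only to data written after the change:

    zfs get compressratio tank/projects     # reports the compression ratio achieved by the dataset, e.g. 1.85x
    zfs set compression=lz4 tank/projects   # switch to LZ4; blocks already on disk remain compressed with the old algorithm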

Quota:

Indicates the current quota limit set for this dataset.

GB

Quota limits the amount of disk space a dataset and its descendents can consume. This property enforces a hard limit on the amount of disk space used, including all space consumed by descendents, such as file systems and snapshots.

If the load on the dataset is consistently high, you may want to increase the quota limit to ensure that write operations do not fail for want of space. Likewise, if the dataset is consuming space excessively owing to too many unused snapshots/clones, you may want to reduce the quota limit, so that administrators are discouraged from needlessly creating snapshots and clones.
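To review or adjust the quota on a dataset, commands along the following lines can be used; tank/projects is a hypothetical dataset name:

    zfs get quota tank/projects         # shows the current quota limit, or 'none' if no quota is set
    zfs set quota=100G tank/projects    # raise the quota limit to 100 GB
    zfs set quota=none tank/projects    # remove the quota altogether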