
What is the order of operations for data written to a ZFS filesystem on ZFS on Linux?

The only specific documentation I found, at http://docs.oracle.com/cd/E36784_01/html/E36835/gkknx.html, says: "When a file is written, the data is compressed, encrypted, and the checksum is verified. Then, the data is deduplicated, if possible."

But if that were true, then dedup could not deduplicate blocks compressed with different compression algorithms.

I tested it myself, and I believe the order is the following: dedup, compress, encrypt.

My test setup:

zpool create tank /dev/sdb
zfs create tank/lz4
zfs create tank/gzip9
zfs set compression=lz4 tank/lz4
zfs set compression=gzip-9 tank/gzip9
zfs set dedup=on tank
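
To double-check that the properties were applied as intended (dedup inherited by both child datasets, a different compression algorithm on each), the settings can be listed recursively:

zfs get -r compression,dedup tank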

Output of zfs list

NAME         USED  AVAIL  REFER  MOUNTPOINT
tank         106K  19.3G    19K  /tank
tank/gzip9    19K  19.3G    19K  /tank/gzip9
tank/lz4      19K  19.3G    19K  /tank/lz4

Generate a random file with dd if=/dev/urandom of=random.txt count=128K bs=1024:

131072+0 records in
131072+0 records out
134217728 bytes (134 MB) copied, 12.8786 s, 10.4 MB/s

Output of zpool list on the empty pool:

NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
tank  19.9G   134K  19.9G         -     0%     0%  1.00x  ONLINE  -

Then copy the file to the datasets with different compression algorithms:

 cp random.txt /tank/lz4
 cp random.txt /tank/gzip9
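
Dedup matches blocks by checksum, so the copies must be byte-identical; as a quick sanity check (not strictly part of the test), the hashes can be compared:

sha256sum random.txt /tank/lz4/random.txt /tank/gzip9/random.txt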

Output of zfs list after copying:

NAME         USED  AVAIL  REFER  MOUNTPOINT
tank         257M  19.1G    19K  /tank
tank/gzip9   128M  19.1G   128M  /tank/gzip9
tank/lz4     128M  19.1G   128M  /tank/lz4

Output of zpool list after copying:

NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
tank  19.9G   129M  19.7G         -     0%     0%  2.00x  ONLINE  -

The dedup ratio is 2.00x after copying the same file to two datasets with different compression algorithms. In my opinion, this means that dedup is performed on the data blocks before compression and encryption.
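
As an extra check (not needed for the numbers above), the pool's dedup table statistics show how many blocks are referenced more than once:

zpool status -D tank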

Could someone please verify whether this is correct?

2 Answers


It turns out that http://docs.oracle.com/cd/E36784_01/html/E36835/gkknx.html is right:

When a file is written, the data is compressed, encrypted, and the checksum is verified. Then, the data is deduplicated, if possible.

My assumption about the random file was incorrect: ZFS aborts compression if it cannot achieve a certain minimum compression ratio, so the incompressible random data was stored uncompressed in both datasets, and the identical uncompressed blocks were deduplicated.

Quote from https://wiki.illumos.org/display/illumos/LZ4+Compression:

Another particular thing to note is that LZ4's performance on incompressible data is very high. It achieves this by incorporating an "early abort" mechanism which will trigger if LZ4 can't meet the expected minimum compression ratio (12.5% on ZFS).
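
This is easy to reproduce with the datasets from the question: writing incompressible data to the lz4 dataset leaves its compression ratio at 1.00x, because every block triggers the early abort and is stored uncompressed.

dd if=/dev/urandom of=/tank/lz4/random.bin bs=1M count=64
sync
zfs get compressratio tank/lz4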

For testing, I created a text file containing a listing of my filesystem with find / >> tree.txt.

After copying the file to both datasets, zpool get dedupratio returned:

NAME  PROPERTY    VALUE  SOURCE
tank  dedupratio  1.00x  -

Dedup really is the last step in this write chain. Choosing different compression algorithms will result in a poor dedup ratio!
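
The mismatch is visible directly in the dataset accounting: used reflects the compressed on-disk size while logicalused reflects the uncompressed size, so the two datasets store different blocks for the same file (logicalused is available on reasonably recent ZoL versions):

zfs get used,logicalused,compressratio tank/lz4 tank/gzip9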

Unfortunately, my ZoL version does not support encryption, but it seems that encrypting datasets differently could also ruin dedup. Info on encryption: https://docs.oracle.com/cd/E53394_01/html/E54801/gkkih.html
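
For reference, newer OpenZFS releases (0.8 and later) do support native encryption, and the experiment could be extended along these lines (untested here; note that encrypted blocks can only dedup against datasets sharing the same encryption key, so different keys should keep the ratio at 1.00x):

zfs create -o encryption=on -o keyformat=passphrase tank/enc1
zfs create -o encryption=on -o keyformat=passphrase tank/enc2
cp tree.txt /tank/enc1 && cp tree.txt /tank/enc2
zpool get dedupratio tank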


I confirm it: dedup really is the last step in this write chain, and choosing different compression algorithms will result in a poor dedup ratio!

I conducted a full experiment; here is the complete output:

zpool create -m /zpool/zp-test-dedup-compress zp-test-dedup-compress /dev/sde

zfs set dedup=on zp-test-dedup-compress

# Create datasets with different types of compression
zfs create zp-test-dedup-compress/lz4
zfs create zp-test-dedup-compress/gzip9
zfs set compression=lz4 zp-test-dedup-compress/lz4
zfs set compression=gzip-9 zp-test-dedup-compress/gzip9

# Check compression and dedup settings
root@test-zfs-pool-degr-debian11etalon:~# zfs get dedup
NAME                          PROPERTY  VALUE          SOURCE
zp-test-dedup-compress        dedup     on             local
zp-test-dedup-compress/gzip9  dedup     on             inherited from zp-test-dedup-compress
zp-test-dedup-compress/lz4    dedup     on             inherited from zp-test-dedup-compress
root@test-zfs-pool-degr-debian11etalon:~# zfs get compress
NAME                          PROPERTY     VALUE           SOURCE
zp-test-dedup-compress        compression  off             default
zp-test-dedup-compress/gzip9  compression  gzip-9          local
zp-test-dedup-compress/lz4    compression  lz4             local

root@test-zfs-pool-degr-debian11etalon:~# zfs list
NAME                           USED  AVAIL     REFER  MOUNTPOINT
zp-test-dedup-compress        1.49M  28.6G      208K  /zpool/zp-test-dedup-compress
zp-test-dedup-compress/gzip9   192K  28.6G      192K  /zpool/zp-test-dedup-compress/gzip9
zp-test-dedup-compress/lz4     192K  28.6G      192K  /zpool/zp-test-dedup-compress/lz4

# Get compression ratio for all datasets
root@test-zfs-pool-degr-debian11etalon:~# zfs list -o name,compressratio
NAME                          RATIO
zp-test-dedup-compress        1.00x
zp-test-dedup-compress/gzip9  1.00x
zp-test-dedup-compress/lz4    1.00x

root@test-zfs-pool-degr-debian11etalon:~# zpool list
NAME                     SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
zp-test-dedup-compress  29.5G  1.49M  29.5G        -         -     0%     0%  1.00x    ONLINE  -

# Create a text file to list paths to all files. A random file won't work as there will be little compression.
find / >> tree.txt

# Copy files to datasets with different compression algorithms:
cp tree.txt /zpool/zp-test-dedup-compress/gzip9
cp tree.txt /zpool/zp-test-dedup-compress/lz4

# Shows disk usage with compression applied (i.e., the size after compression; dedup is not reflected per file)
du -sh /path/to/file

# Shows the logical file size, without compression
ls -lh /path/to/file

root@test-zfs-pool-degr-debian11etalon:~# du -sh /zpool/zp-test-dedup-compress/gzip9/tree.txt
1009K   /zpool/zp-test-dedup-compress/gzip9/tree.txt
root@test-zfs-pool-degr-debian11etalon:~# ls -lh /zpool/zp-test-dedup-compress/gzip9/tree.txt
-rw-r--r-- 1 root root 7.6M Jun 23 17:53 /zpool/zp-test-dedup-compress/gzip9/tree.txt
root@test-zfs-pool-degr-debian11etalon:~# du -sh /zpool/zp-test-dedup-compress/lz4/tree.txt
1.7M    /zpool/zp-test-dedup-compress/lz4/tree.txt
root@test-zfs-pool-degr-debian11etalon:~# ls -lh /zpool/zp-test-dedup-compress/lz4/
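
As an aside (not from the original run), the same physical-versus-logical comparison can be pulled per dataset:

# used = size after compression, logicalused = size before compression
zfs list -o name,used,logicalused,compressratio -r zp-test-dedup-compress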

# Get compression ratio for all datasets
root@test-zfs-pool-degr-debian11etalon:~# zfs list -o name,compressratio
NAME                          RATIO
zp-test-dedup-compress        4.97x
zp-test-dedup-compress/gzip9  7.32x
zp-test-dedup-compress/lz4    4.44x
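
As a sanity check, these ratios line up with the du output above: 7.6M / 7.32 ≈ 1.0M for gzip-9 (du reported 1009K), and 7.6M / 4.44 ≈ 1.7M for lz4 (du reported 1.7M).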

# Check deduplication. We see that there is no deduplication, 1.00x.
zpool list
NAME                     SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
zp-test-dedup-compress  29.5G  4.30M  29.5G        -         -     0%     0%  1.00x    ONLINE  -

The dedup ratio is 1.00x, meaning deduplication did not find any identical blocks. This confirms that deduplication occurs AFTER compression: when a file is written, the data is compressed, encrypted, and the checksum is computed, and only then is the data deduplicated, if possible.

Thus, data is deduplicated AFTER compression. If you change the compression algorithm, the compressed blocks will be different and deduplication will not match them, so it is better to choose a compression algorithm up front and avoid changing it later.
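
Note that changing the compression property only affects newly written blocks; existing data stays compressed as it was. As a hypothetical follow-up (not run in this experiment), aligning the algorithms and rewriting the file should make the new copy dedup against the existing lz4 blocks:

# switch the gzip9 dataset to lz4, then rewrite; new blocks should now match the lz4 dataset's blocks
zfs set compression=lz4 zp-test-dedup-compress/gzip9
cp tree.txt /zpool/zp-test-dedup-compress/gzip9/3
zpool get dedupratio zp-test-dedup-compress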

Copy the file again into the same datasets, and the dedup ratio becomes 2.00x as expected.

# Copy files to datasets with different compression algorithms:
cp tree.txt /zpool/zp-test-dedup-compress/gzip9/2
cp tree.txt /zpool/zp-test-dedup-compress/lz4/2

# Check deduplication
root@test-zfs-pool-degr-debian11etalon:~# zpool list
NAME                     SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
zp-test-dedup-compress  29.5G  5.72M  29.5G        -         -     0%     0%  2.00x    ONLINE  -
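
For a closer look at what was actually deduplicated, the DDT histogram can be dumped with zdb (extra diagnostics, not part of the original output):

zdb -DD zp-test-dedup-compress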
  • Thanks a lot for confirmation! Commented Jun 23 at 20:08

