I've been tinkering with btrfs, with a view to moving to it from ext4.

However, while trying to compare read/write speeds, I came across what is (to me at least) unusual behavior from du on the btrfs disk: it apparently doesn't report file sizes the same way it does for files on my ext4 disk.

(Apologies for the Norwegian locale, though most readers are probably familiar enough with the English output to see what's going on. In the stat output, Størrelse is Size and Blokker is Blocks.)


Making a testfile

  1. I create a 5GB "testfile" with dd on the mounted btrfs disk:

    $ sudo dd if=/dev/urandom of=5G_dd_test_file.tmp bs=1 count=0 seek=5G
    0+0 oppføringer inn
    0+0 oppføringer ut
    0 byte kopiert, 0,00393248 s, 0,0 kB/s
    
  2. In a similar fashion, I create a test file using fallocate in the same location:

    $ sudo fallocate -l 5G 5G_fallocate_test_file.tmp
    
  3. ls confirms they're both there:

    $ ls
    5G_dd_test_file.tmp  5G_fallocate_test_file.tmp
    

du acting weird..(?)

The size reported by du <file>:

$ sudo du 5G_dd_test_file.tmp 
0   5G_dd_test_file.tmp

$ sudo du 5G_fallocate_test_file.tmp 
5242880 5G_fallocate_test_file.tmp

Note the size of 0 for the dd-generated file.

In comparison, ls and stat on the very same files:

$ ls -l *.tmp
-rw-r--r-- 1 root root 5368709120 mars  24 18:07 5G_dd_test_file.tmp
-rw-r--r-- 1 root root 5368709120 mars  24 18:12 5G_fallocate_test_file.tmp
$ stat *.tmp
  Fil: 5G_dd_test_file.tmp
  Størrelse: 5368709120   Blokker: 0          IO Blokk: 4096   vanlig fil
Device: 0,40    Inode: 258         Links: 1
Tilgang: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Tilgang: 2022-03-24 18:07:34.646755042 +0100
Omgjøring: 2022-03-24 18:07:34.646755042 +0100
Endring: 2022-03-24 18:07:34.646755042 +0100
 Fødsel: 2022-03-24 18:07:34.646755042 +0100
  Fil: 5G_fallocate_test_file.tmp
  Størrelse: 5368709120   Blokker: 10485760   IO Blokk: 4096   vanlig fil
Device: 0,40    Inode: 259         Links: 1
Tilgang: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Tilgang: 2022-03-24 18:12:11.768422242 +0100
Omgjøring: 2022-03-24 18:12:11.768422242 +0100
Endring: 2022-03-24 18:12:11.768422242 +0100
 Fødsel: 2022-03-24 18:12:11.768422242 +0100

If, however, I add the -b parameter to du (not usually needed) for the same dd-generated file that showed a size of 0, du seems to behave as usual.

$ sudo du -b 5G_dd_test_file.tmp
5368709120  5G_dd_test_file.tmp

Another oddity from du (?)

So, just out of curiosity, I decided to simply gzip the file from dd:

$ sudo gzip 5G_dd_test_file.tmp

$ sudo du 5G_dd_test_file.tmp.gz
5092    5G_dd_test_file.tmp.gz

Now it's showing a non-zero size:

$ sudo ls -l 5G_dd_test_file.tmp.gz
-rw-r--r-- 1 root root 5210230 mars  24 18:07 5G_dd_test_file.tmp.gz
$ sudo stat 5G_dd_test_file.tmp.gz
  Fil: 5G_dd_test_file.tmp.gz
  Størrelse: 5210230      Blokker: 10184      IO Blokk: 4096   vanlig fil
Device: 0,40    Inode: 260         Links: 1
Tilgang: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Tilgang: 2022-03-24 18:07:34.646755042 +0100
Omgjøring: 2022-03-24 18:07:34.646755042 +0100
Endring: 2022-03-24 18:43:41.061926016 +0100
 Fødsel: 2022-03-24 18:42:27.554141544 +0100

My questions are:

  • Is this normal behavior and actually to be expected?
  • If not, could this potentially break e.g. scripts or programs that rely on du's output?
    Good luck with btrfs. I've been using it for several years. There are a few downsides, but so far I'm happy with it as general storage. There were use cases where ext4 was significantly better for me (at least when tested against btrfs created and mounted with default options). As with du vs wc -c, one needs to choose the right tool for the job; here: the right filesystem type. And as in many other areas, the ability to choose the right tool comes from the experience of choosing the wrong one. Commented Mar 24, 2022 at 19:18

1 Answer

Is this normal behavior and actually to be expected?

Basically yes.

Using dd seek=… when creating a file is a way to create a sparse file. Using dd seek=… and writing nothing (count=0) is a way to create a fully sparse file.

[…] a sparse file is a type of computer file that attempts to use file system space more efficiently when the file itself is partially empty. This is achieved by writing brief information (metadata) representing the empty blocks to the data storage media instead of the actual "empty" space which makes up the block, thus consuming less storage space. The full block size is written to the media as the actual size only when the block contains "real" (non-empty) data.

My preferred way to create a sparse file is truncate. The main purpose of fallocate, on the other hand, is to actually allocate blocks, so fallocate created a non-sparse file for you.
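
As a rough sketch of the truncate approach, assuming GNU coreutils (truncate and stat -c) and a filesystem that supports sparse files: the temporary directory and the file name sparse.tmp are made up for illustration, and the size is kept at 1 MiB so it runs quickly.

```shell
# Work in a throwaway directory so no existing files are touched.
tmpdir=$(mktemp -d)
sparse="$tmpdir/sparse.tmp"   # hypothetical file name

# truncate extends the file to 1 MiB without writing any data,
# much like dd with seek=... and count=0.
truncate -s 1M "$sparse"

apparent=$(stat -c %s "$sparse")   # apparent size in bytes
blocks=$(stat -c %b "$sparse")     # number of blocks actually allocated

# On a filesystem with sparse-file support, blocks is typically 0 here.
echo "apparent size: $apparent bytes, allocated blocks: $blocks"

rm -r "$tmpdir"
```

The apparent size is what ls -l shows; the allocated block count is what du bases its figure on.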

du reports disk usage. A fully sparse file uses zero blocks for data: it is just an inode (plus its directory entry) with no data blocks allocated.
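
The du figures in the question can be reproduced from the Blokker (Blocks) column of the stat output. This is only a sketch of the arithmetic, assuming the GNU defaults: stat counts 512-byte blocks and du prints 1024-byte units. The function name to_du_units is made up for illustration.

```shell
# Block counts copied from the stat output in the question.
sparse_blocks=0            # 5G_dd_test_file.tmp
fallocate_blocks=10485760  # 5G_fallocate_test_file.tmp
gzip_blocks=10184          # 5G_dd_test_file.tmp.gz

# Convert a count of 512-byte blocks into the 1024-byte units du prints.
# (Assumption: GNU defaults for both stat and du.)
to_du_units() {
    echo $(( $1 * 512 / 1024 ))
}

sparse_kib=$(to_du_units "$sparse_blocks")
fallocate_kib=$(to_du_units "$fallocate_blocks")
gzip_kib=$(to_du_units "$gzip_blocks")

echo "$sparse_kib"      # 0
echo "$fallocate_kib"   # 5242880
echo "$gzip_kib"        # 5092
```

These match the three du outputs in the question, so du is consistent with stat; it just measures allocated space rather than file length.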

Your gzip created a non-sparse file. No fully sparse file can be a valid gzip archive, because a fully sparse file returns null bytes when read, but the gzip header alone contains non-null bytes. Additionally, I wouldn't expect any gzip archive to be (able to be) even partially sparse, because blocks of zeros (i.e. the hypothetical sparse parts) are highly compressible with virtually no effort, and their existence would mean gzip bodged its job.
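
A quick way to see the non-null header bytes, sketched here with gzip, head, od and tr as commonly found on Linux systems: compress empty input and dump the first two bytes of the result.

```shell
# Compress empty input and keep only the first two bytes of the archive,
# printed as hex with all whitespace removed.
magic=$(printf '' | gzip -c | head -c 2 | od -An -tx1 | tr -d ' \n')

# The gzip format starts with the magic bytes 1f 8b.
echo "$magic"   # 1f8b
```

Even an archive of nothing starts with non-zero bytes, so it has to occupy real blocks on disk.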


Could this potentially break e.g. scripts or programs reliant on du returns?

No, unless the script uses du when it should use du -b or wc -c; but then it's a bug in the script.
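
For a script that wants the file length rather than the disk usage, here is a minimal sketch of three ways to get it, assuming GNU coreutils (du -b and stat -c are GNU options). The temporary file is created only for the demonstration.

```shell
# Create a 1 MiB sparse file to measure.
f=$(mktemp)
truncate -s 1M "$f"

size_wc=$(wc -c < "$f")              # counts the bytes in the file
size_du=$(du -b "$f" | cut -f1)      # -b: apparent size in bytes
size_stat=$(stat -c %s "$f")         # %s: total size in bytes

allocated_kib=$(du -k "$f" | cut -f1)  # disk usage, which is a different question

echo "wc: $size_wc  du -b: $size_du  stat: $size_stat  du -k: $allocated_kib"

rm "$f"
```

All three length figures agree regardless of how sparse the file is, while the plain du figure depends on what the filesystem has allocated.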

Use du for what it is designed for. Some insight here: Why are there so many different ways to measure disk usage?


Ext4 supports sparse files as well. With your dd command I created a fully sparse file in my ext4 filesystem and, separately, in my btrfs filesystem, and du behaved the same way on both. The whole "issue" is absolutely not about ext4 vs btrfs.
