
I was copying ~3.7 TB of data from one 4 TB external drive with an HFS+ filesystem to another 4 TB external drive with an exFAT filesystem. The new HDD filled up after only ~75% of the data had been transferred, presumably because exFAT's allocation unit size uses up more space per file for small files.

I am copying a lot (~millions) of small files (~1.5 kB each), so I am trying to figure out how to do this.
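A quick back-of-the-envelope calculation shows where the space goes: each file occupies a whole number of allocation units, so a 1.5 kB file wastes most of a large cluster. A sketch in shell (the file count and the 128 KiB exFAT cluster size are assumptions for illustration; exFAT often defaults to large clusters on multi-TB volumes):

```shell
#!/bin/sh
# Slack-space estimate: each file occupies ceil(size/cluster) * cluster bytes.
file_size=1536        # ~1.5 kB per file
file_count=2000000    # "millions" of files (assumed figure)
for cluster in 4096 131072; do   # 4 KiB vs an assumed 128 KiB exFAT cluster
    per_file=$(( (file_size + cluster - 1) / cluster * cluster ))
    total_gib=$(( per_file * file_count / 1073741824 ))
    echo "cluster=${cluster}B on-disk=${per_file}B/file total=~${total_gib}GiB"
done
```

Even at 4 KiB clusters these files occupy almost three times their payload; at large exFAT clusters the overhead dwarfs the data itself.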

Therefore I need a filesystem that fulfills the following requirements:

  • Block size small enough that millions of 1.5 kB files fit while wasting minimal space. (here exFAT has a problem)

  • Read/write compatible with Linux. (here HFS+ has a problem)

  • Able to make a 4 TB partition on Linux. (here ext4 has problems)

Any alternative filesystem?

UPDATE: This question was flagged as already answered in another post (Optimizing file system for lots of small files?). However, the accepted answer does not work for me:

mkfs.ext4 -b 1024 /dev/your_partition

Warning: specified blocksize 1024 is less than device physical sectorsize 4096
/dev/sdc: Cannot create filesystem with requested number of inodes while setting up superblock

2 Answers


File systems that you can use

However, given the scale of your problem, which involves millions of files, ReiserFS/Reiser4, Btrfs and ZFS may be the best solutions.

For more details, read below.


Since your files are around 1.5 KB, the ideal block size for your case would be 512 bytes. However, your disk has a 4 KB physical sector size (a.k.a. Advanced Format), as can be seen from the error message:

Warning: specified blocksize 1024 is less than device physical sectorsize 4096

which means you can't create a filesystem with a block size smaller than that. You'll need block suballocation to reduce the wasted space. You can open Comparison of file systems - Allocation and layout policies and sort on Block suballocation / Tail packing / Variable block size to see which file systems support such features.



Another alternative is to store the data in metadata space, where multiple records are packed into a single block.

In NTFS, each file is represented by an MFT record, the analog of an inode in *nix. Small files are stored directly in the MFT record, saving space and also improving access time, because no extra disk read is needed to get the actual data. These are called resident files. Later, a similar feature called inline data was added to ext4:

The inline data feature was designed to handle the case that a file's data is so tiny that it readily fits inside the inode, which (theoretically) reduces disk block consumption and reduces seeks. If the file is smaller than 60 bytes, then the data are stored inline in inode.i_block.

The 60-byte value applies to the default inode size of 256 bytes: the inode structure consumes 156 bytes, and the other 40 bytes may be reserved for extended features. However, you can raise the inode size to 2 KB with the -I inode-size option while formatting, so that all of your 1.5 kB files fit inline.
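As a sketch of that idea (all paths are placeholders; this formats a throwaway loopback image rather than a real partition, and assumes a reasonably recent e2fsprogs with inline_data support):

```shell
# Create a small scratch image so nothing real gets formatted.
truncate -s 512M /tmp/inline-test.img

# 2 KiB inodes plus the inline_data feature: files around 1.5 kB can then
# be stored inside their own inode instead of consuming a 4 KiB data block.
# -F lets mkfs run on a regular file; point it at your partition for real use.
mkfs.ext4 -q -F -I 2048 -O inline_data -b 4096 /tmp/inline-test.img

# Verify the inode size and feature flags.
dumpe2fs -h /tmp/inline-test.img 2>/dev/null | grep -Ei 'inode size|features'
```

Before committing to this layout, it's worth copying a sample of the real files onto a test filesystem and comparing `du` against `df` to confirm the inlining actually kicks in.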

In Btrfs there's also a similar feature, where small files are written directly into the metadata stream:

max_inline=bytes

( default: min(2048, page size) )

Specify the maximum amount of space, that can be inlined in a metadata B-tree leaf. The value is specified in bytes, optionally with a K suffix (case insensitive).

https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs(5)
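As a sketch (the device and mount point below are placeholders, and this needs root to run), the default max_inline already covers files of this size, but it can be set explicitly at mount time:

```shell
# /dev/sdX1 and /mnt are placeholders for your partition and mount point.
mkfs.btrfs -f /dev/sdX1

# max_inline caps how large a file may be and still be inlined into metadata;
# the default of min(2048, page size) already covers ~1.5 kB files.
mount -o max_inline=2048 /dev/sdX1 /mnt
```

Note that the inline limit is a mount option rather than a format-time setting, so it can be tuned later without reformatting.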

It seems Reiser4 also has such a feature although I can't confirm it.

In NTFS the current default MFT record size is 1 KB, although it was 4 KB in NTFS 1.0 on Windows NT 3.1. A 1 KB record only allows files of roughly 600-900 bytes or less to be resident, so you'd have to change the MFT record size. That's possible, although you'll have a hard time finding formatting software that allows changing the default MFT record size.

Some people have roughly the same situation as yours


There are also a few misunderstandings on your side:

Read/write compatible with Linux. (here HFS+ has a problem)

There are several read/write HFS+ drivers available for Linux, so this shouldn't be a problem. The only real issue with HFS+ is that it's from the same era as ext2, so it's far inferior to modern file systems like ext4, NTFS, ZFS or Btrfs.

Able to make a 4T partition on Linux. (here ext4 has problems)

Neither ext4 nor exFAT has any issue creating a 4 TB partition with a 1 KB block size. In fact, any filesystem with 32-bit block addresses can reach 4 TiB at a 1 KB block size, because 2^32 blocks × 2^10 bytes/block = 2^42 bytes = 4 TiB; with the default 4 KB block size the maximum partition size is 16 TiB. ext4 uses 48-bit block addresses, so its maximum size is far larger.
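That arithmetic is easy to verify directly in the shell (pure integer math, nothing assumed beyond the quoted address widths; requires a 64-bit shell arithmetic, which is standard today):

```shell
# 2^32 blocks of 1 KiB each:
echo $(( (1 << 32) * 1024 ))   # 4398046511104 bytes = 4 TiB
# 2^32 blocks of 4 KiB each:
echo $(( (1 << 32) * 4096 ))   # 17592186044416 bytes = 16 TiB
```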

Block size small enough that I can fit millions of files sized 1.5kB wasting minimal space. (here exFAT has a problem)

In fact, the only issue with exFAT here is that the default block size is too big. The minimum block size on exFAT is 1 sector, so it can use a 512-byte block size on a disk with 512-byte sectors (see 9.2 Cluster Size Limits in the spec), and you can see 512-byte and 1 KB options in the format dialog below. Unfortunately, your disk has 4 KB sectors, so exFAT won't work.

[Screenshot: exFAT block size options in the format dialog]


"Able to make a 4T partition on Linux. (here ext4 has problems)"

ext4 doesn't have any problem with a 4 TB partition:

~# LANG=C; df -hT |grep T
Filesystem     Type      Size  Used Avail Use% Mounted on
/dev/sdb1      ext4       33T   12T   20T  37% /RAIDDATA
  • As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.
    – Community Bot
    Commented Aug 31, 2023 at 20:36
  • Submitting as comment would have been more appropriate IMO. Commented Aug 31, 2023 at 21:30
  • read my comments in the question, the problem is the tiny files, not the volume size
    – phuclv
    Commented Oct 26, 2023 at 3:06
