
I was playing around with a thumb drive and I noticed a counter-intuitive trend.

The bigger I make the cluster size (allocation unit size in Windows, block size in Linux), the less capacity gets reported.

Which is weird, because basic logic dictates the opposite: bigger clusters should mean less filesystem metadata, which should yield more usable space. This is also repeated by every page of advice on the "best" cluster size that I could find on the internet (more than a dozen at this point).

Here are some numbers for exFAT.

Capacity [Bytes]   Cluster size [KiB]   Difference [KiB]
15792537600                 64                  -
15792472064                128                 64
15792340992                256                128
15792078848                512                256
15791554560               1024                512
15789457408               2048               2048
15783165952               4096               6144

Also, the pattern in the difference column breaks in the last rows...

And now NTFS.

Capacity [Bytes]   Cluster size [KiB]   Difference [KiB]
15794679808                  4                  -
15794675712                  8                  4
15794667520                 16                  8
15794667520                 32                  0
15794634752                 64                 32
15794569216                128                 64
15794438144                256                128

Again we get one anomalous difference (the 0 at 32 KiB).
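
(The Difference column is just the delta between consecutive reported capacities; a few lines of Python reproduce both columns:)

# Sanity-check the Difference columns: deltas between consecutive capacities, in KiB.
exfat = [15792537600, 15792472064, 15792340992, 15792078848,
         15791554560, 15789457408, 15783165952]
ntfs = [15794679808, 15794675712, 15794667520, 15794667520,
        15794634752, 15794569216, 15794438144]

for name, caps in (("exFAT", exfat), ("NTFS", ntfs)):
    deltas = [(a - b) // 1024 for a, b in zip(caps, caps[1:])]
    print(name, deltas)
# exFAT [64, 128, 256, 512, 2048, 6144]
# NTFS  [4, 8, 0, 32, 64, 128]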

Method: formatting done via the Windows Explorer format utility. Capacity data collected via Windows Explorer properties. Partition table: GPT.

So why do bigger clusters yield less capacity?

Random trivia: exFAT got sort of "open sourced" in 2019.
exFAT File System Specification

  • Using a data recovery tool like DMDE you can get a 'clustermap' and determine the number of actual clusters. Take no. of clusters x cluster size and you can do the math. Commented May 6, 2023 at 13:24
  • What is the exact size of your partition in bytes or sectors? Commented May 6, 2023 at 13:52
  • @user1686 You're asking for a single answer, so I will assume it doesn't vary for any of the above. Using the software suggested above, the data partition starts at LBA 2048 and ends at 30851038. Total size of 30848990 sectors, or 15794682880 bytes, which is 2095 KiB less than Windows reports.
    – martixy
    Commented May 6, 2023 at 14:59
  • @JoepvanSteen Really neat software. But it uses LBA, and LBAs map to sectors, not clusters (and see comment above), so I'm not sure what kind of math I'm supposed to be doing here.
    – martixy
    Commented May 6, 2023 at 15:02
  • Create a cluster map; it will give the number of clusters. From the boot sector you can read sectors per cluster. I will 'answer' to explain what I mean. Commented May 6, 2023 at 16:12

4 Answers


The case of exFAT.

Given that exFAT has a public and easily accessible specification, it was the easiest example to tackle.

Using the DMDE tool suggested by Joep van Steen, we can examine the exact structure of the filesystem on disk.

Disks are divided into sectors in hardware^. The sector is the fundamental unit of data.

There are several types of metadata and effects that dictate the exact amount of usable space.

  1. Partitioning: The partitioning scheme and the partition boundaries themselves carve up disks into usable sections according to some very simple rules. Two of the most popular schemes are MBR and GPT. I have not looked into MBR in detail.

    • GPT (GUID Partition Table) overhead has a static size. It puts one partition table spanning 34 sectors at the start of the disk and a backup table spanning 33 sectors at the end. GUID Partition Table
    • The first partition is not obligated to start right after the partition table (i.e. LBA 34). Its starting (and ending) sector is set by the partition entry (one of those little boxes in LBA 2 in the diagram above). Leaving gaps may be done for alignment purposes^^.
  2. File system structure: At the start of a partition begin file-system related structures and metadata. The first sector is the filesystem's boot sector (this has nothing to do with booting your machine). It defines what type of filesystem it is and its layout. Different types of filesystems put different types of data in there, according to their needs and design. Here is the boot sector of my flash drive:

    exFAT boot sector

    exFAT is a FAT-based filesystem, so, for example, it says how long the FAT is and where it is located.
    It also says where the cluster heap is located. The cluster heap is all the space the user can put snowboarding videos and cat pictures in.

  3. File system metadata: So far we have measured data and location in sectors. The filesystem is a structure used to manage those sectors in an easy and consistent way. It creates its own units of data, according to its design, that it manages based on its internal mechanisms. (This is an example of adding a layer of abstraction.) Those units of data are called clusters. Each cluster can be one or more contiguous sectors. They only make sense within the scope of the filesystem. The boot sector above says how big they are (as a power of 2), where the cluster heap begins, and how many of them we have. (A short sketch after this list puts these fields into code.)

    • The first 2 clusters in an exFAT filesystem are always empty.
    • Additionally, exFAT maintains a cluster allocation bitmap ($BITMAP). Each bit in the bitmap specifies whether the corresponding cluster is free.
    • An up-case table (exFAT is a case-insensitive filesystem, and this table helps it implement that). It has a fixed size.
  4. Leftover space: We chop up the space on the drive, and the space in a partition, into little pieces. But sometimes the space doesn't neatly align with all the little pieces. Like when you tile your bathroom: you start with whole tiles, but at the end a whole tile won't fit, so you have to cut one. Imperfect tiling

    Except in our case, we can't cut it, so a little bit (any small amount of space that can't fit a whole cluster) is left unused. The larger the cluster size, the more space can potentially go unused.
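
To make items 2-4 concrete, here is a minimal sketch of the arithmetic, using the spec's formula ClusterCount = RoundDown((VolumeLength - ClusterHeapOffset) / SectorsPerCluster). VolumeLength is my partition's size from the comments under the question; the 2 MiB cluster heap offset is an assumption that is consistent with the reported capacity, not a value I read directly off the boot sector:

import math

bytes_per_sector    = 512
volume_length       = 30848990   # partition size in sectors (from the comments)
cluster_heap_offset = 4096       # sectors before the cluster heap (2 MiB here; assumed)
sectors_per_cluster = 128        # 64 KiB clusters

# ClusterCount per the spec: whole clusters that fit after the heap offset.
cluster_count = (volume_length - cluster_heap_offset) // sectors_per_cluster
usable_bytes  = cluster_count * sectors_per_cluster * bytes_per_sector
bitmap_bytes  = math.ceil(cluster_count / 8)   # $BITMAP: one bit per cluster

print(cluster_count, usable_bytes, bitmap_bytes)
# 240975 15792537600 30122  -- matches the 64 KiB row in the question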

Complete answer

All of the above form the necessary and sufficient explanation of the question for exFAT.

Broadly, the above applies to all filesystems, but the details may differ (for example, another filesystem might not leave the first 2 clusters empty like exFAT does).

I have created a spreadsheet to demonstrate these relationships:

Cluster size vs usable space in exFAT

Observations:

  • The numbers line up perfectly^^^ with the reported capacities of the drive.

  • It does not matter what kind of files you put in the filesystem in terms of metadata efficiency. (For exFAT anyway, and other static metadata filesystems like ext4.)

  • File size still matters, because of the tiling problem (see 4 above). Statistically, on average you will waste (number of files) * (half the cluster size) of space.

    • Some advanced file systems like btrfs are capable of using up this slack space.
      This is called Block Suballocation.
    • Also, some filesystems can store small files "inline" - along with the metadata block, instead of allocating clusters for it. (NTFS, btrfs, ext4)
  • Depending on the size of a particular drive/partition, there will be a sweet spot for metadata efficiency (the bolded column in the spreadsheet). Seek the lowest number.

  • You may look at the formulas in each cell to see how each value is calculated.

  • There are many additional comments in the sheet.

  • There are 2 hardcoded parameters (discovered by looking at the formatted results afterwards, rather than calculated from prior data): FAT length and Cluster heap offset.

    • I don't know what algorithms are used to calculate those. I tried looking through Rufus' source code for the answer, but it just calls into a native function (fmifs.dll::FormatEx) to perform the actual operation.
    • There is clearly a pattern to the FAT len values. See the end columns, and also note the same values cropping up in the min FAT len column. However I do not have the math chops to deduce it. I welcome help.
  • Edit: Bonus optimization time. There is a point at which the wasted space of a large cluster size becomes greater than the savings from metadata. This depends on the number of files on the filesystem. I've added a new column at the end to demonstrate this relationship, and a rough model is sketched below this list.
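
To put rough numbers on that trade-off, here is a coarse model. The 4-byte FAT entry and the one-bit-per-cluster bitmap are real exFAT overheads; the file count, and the omission of everything else (boot region, up-case table, heap offset), are assumptions:

# Coarse model: slack grows with cluster size, per-cluster metadata shrinks.
volume_bytes = 15_794_682_880   # my partition size in bytes
n_files = 1_000                 # assumed file count

for kib in (64, 128, 256, 512, 1024, 2048, 4096):
    cs = kib * 1024
    clusters = volume_bytes // cs
    metadata = clusters * 4 + clusters // 8   # FAT entries + allocation bitmap
    slack = n_files * cs // 2                 # ~half a cluster wasted per file
    print(f"{kib:>5} KiB: metadata ~{metadata / 1024:8.0f} KiB, "
          f"expected slack ~{slack / 1024:8.0f} KiB")

With even a thousand files, the slack term dwarfs the metadata term long before the larger cluster sizes, which is why the sweet spot depends so strongly on the file count.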

N.B. I welcome contributions to the spreadsheet. If you wish to contribute, request access and ye shall be granted.


^ There are abstraction layers here like Advanced Format, and NAND page sizes that we will not get into here. These abstractions are imposed by the devices themselves and are (mostly) transparent to the OS.
^^ OSes may break the abstraction when formatting to avoid alignment issues. See Advanced Format.
^^^ Except for 1 cluster = 1 sector, the reasons for which I have not explored in detail.

  • What an excellent and well researched answer! Commented May 7, 2023 at 14:09

This is not the answer, but it may help you determine what's going on, and it expands on my comment. It has been a while since I dug deep into this, so I'm a bit rusty...

First we need some values from boot sector:

(screenshot: boot sector values shown in DMDE)

We can now compute the data area, the area of the file system that remains after subtracting metadata such as the FATs:

data area start = reserved + (2 * big sectors per FAT), so data area start = 7166 + (2 * 513) = 8192.

Also, we can determine sectors per cluster: we read 8, and total sectors: 532480. So the data area size = 532480 - 8192 = 524288 sectors.

Total clusters from the cluster map for this same partition:

(screenshot: DMDE cluster map showing 65538 total clusters)

Multiply by sectors per cluster: 65538 * 8 = 524304.

So we see a discrepancy of 524304 - 524288 = 16 sectors, which actually accounts for 2 clusters. Hmm. This may be normal actually; I'd have to check.
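
The same arithmetic in a few lines of Python (all constants are the ones read off the screenshots above):

# Redo the boot-sector math from above in code.
reserved_sectors    = 7166
sectors_per_fat     = 513      # "big sectors per FAT"
number_of_fats      = 2
total_sectors       = 532480
sectors_per_cluster = 8

data_area_start = reserved_sectors + number_of_fats * sectors_per_fat   # 8192
data_area_size  = total_sectors - data_area_start                        # 524288
clusters_fit    = data_area_size // sectors_per_cluster                  # 65536

clustermap_clusters = 65538    # count reported by DMDE's cluster map
print(data_area_start, data_area_size, clusters_fit)
print("discrepancy (sectors):",
      clustermap_clusters * sectors_per_cluster - data_area_size)        # 16 = 2 clusters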

Now, what I was getting at is that you can experiment with different cluster sizes and see what happens to the numbers, and see where this weirdness you observe comes from.

My theory is/was that format will 'play' with the reserved sectors value, perhaps to align the data area at a 4k boundary (for example, in case the partition starts at an odd LBA), but it may also want to avoid an odd number of clusters, or FAT sectors that are not fully put to use.

The reserved sectors are largely 'lost space', and this could somehow affect how the math works out for the number of addressable clusters. But note that this is just a hypothesis. The larger this area, the less space for clusters. So by modifying its size, format could align clusters, could avoid an odd number of clusters, and could make sure all FAT entries correspond to an actual cluster.

So again, not the answer, but perhaps it helps narrow it down.

For NTFS it'd be an entirely different story, as it does not have a fixed set of filesystem metadata structures like FAT does. $MFT can grow/shrink (although I have never seen it do the latter), and $Bitmap I suppose could be largely sparse as long as there are large amounts of free clusters, just to name some differences between exFAT and NTFS.

For the 'data area' as a whole it will not matter, as the filesystem metadata structures are themselves considered files by NTFS. And the entire partition is divided into clusters, so the first sector is also the first sector of the first cluster.

  • I think DMDE gave me enough clues to solve the issue. Working on it right now; so far the numbers line up perfectly, which is encouraging. Ultimately it's an aliasing issue, but that's sort of an obvious deduction. The details are the hard part. brb :)
    – martixy
    Commented May 6, 2023 at 17:46
  • I believe I have solved single-partition exFAT. Some very interesting findings so far. (Even found a bug in the spec!) But I need to verify the results and I'm out of time today. Stay tuned. ;P
    – martixy
    Commented May 6, 2023 at 18:44

I've written a little Python script to give us some insight:

def meh(i):
    # i = (cluster size in KiB, reported capacity in bytes)
    cluster_size_in_byte = i[0] * 1024
    # Largest multiple of the cluster size that fits in the 15794682880-byte
    # partition (size taken from the comments under the question).
    cluster_size_divisible_volume_size = 15794682880 - 15794682880 % cluster_size_in_byte
    # Space left unaccounted for after rounding down to whole clusters.
    unknown_taken_up_size_in_byte = cluster_size_divisible_volume_size - i[1]
    unknown_taken_up_size_in_cluster = unknown_taken_up_size_in_byte / cluster_size_in_byte
    print((i[0], unknown_taken_up_size_in_byte, unknown_taken_up_size_in_cluster))

print("exfat:")
for i in [
        (64, 15792537600),
        (128, 15792472064),
        (256, 15792340992),
        (512, 15792078848),
        (1024, 15791554560),
        (2048, 15789457408),
        (4096, 15783165952)
]:
    meh(i)

print("ntfs:")
for i in [
        (4, 15794679808),
        (8, 15794675712),
        (16, 15794667520),
        (32, 15794667520),
        (64, 15794634752),
        (128, 15794569216),
        (256, 15794438144)
]:
    meh(i)

And here's the output:

exfat:
(64, 2097152, 32.0)
(128, 2097152, 16.0)
(256, 2097152, 8.0)
(512, 2097152, 4.0)
(1024, 2097152, 2.0)
(2048, 4194304, 2.0)
(4096, 8388608, 2.0)
ntfs:
(4, 0, 0.0)
(8, 0, 0.0)
(16, 0, 0.0)
(32, 0, 0.0)
(64, 0, 0.0)
(128, 0, 0.0)
(256, 0, 0.0)

The answer for the case of NTFS is simple: when the cluster size gets bigger, the unusable / "non-clusterable" "remainder" of the partition gets bigger.

For the exFAT case, that is one of the reasons as well, but it is more complicated: as per the reported capacities you got, at least 2 MiB is taken up for an unknown purpose, and it gets even more complicated as, apparently, that taken-up part is always at least 2 clusters big.

I'm not familiar with the internals of exFAT though, so I have no info about that 2 MiB / 2-cluster taken-up part to offer.


According to some research and tests I have done (with exfatprogs), it seems that the 2 MiB is a choice for the "Cluster Heap Offset", which includes a half-size "FAT Offset". (Basically, 1 MiB alignment, which is consistent with the partitioning behavior in Windows.)

Also, apparently the "FAT Length" is often the same size as the cluster size, and Microsoft seems to have chosen to make sure that the "FAT Offset" is always half of the "Cluster Heap Offset", so when the cluster size, and in turn the "FAT Length", exceeds 1 MiB, the "FAT Offset" will be equated to the "FAT Length", which results in the "Cluster Heap Offset" becoming 2 clusters big. (The behavior is NOT observed in / the default of exfatprogs' mkfs.exfat.)

EDIT: As I had thought of but not written: instead of having the "FAT Offset" be half of the "Cluster Heap Offset", the "FAT Offset" can be 1 MiB all / most of the time, i.e., the remaining padding / gap, if any, in the "Cluster Heap Offset" resides after the FAT instead of before it.

I haven't really checked the layouts produced by Windows with dump.exfat from exfatprogs though. In case you want to know the exact and confirmed details, you can try the program yourself in a Linux environment (maybe even WSL).
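
For what it's worth, here is a quick empirical check of this theory against the numbers in the question. The formula below is a fit to the observed capacities (partition size taken from the question's comments), not something read out of a boot sector:

# Theory: reported capacity = floor(partition / cluster) * cluster - heap offset,
# where the heap offset is 2 MiB, growing to 2 clusters once clusters exceed 1 MiB.
PARTITION = 15794682880
MIB = 1024 * 1024

for kib, reported in [(64, 15792537600), (128, 15792472064), (256, 15792340992),
                      (512, 15792078848), (1024, 15791554560),
                      (2048, 15789457408), (4096, 15783165952)]:
    cs = kib * 1024
    predicted = (PARTITION // cs) * cs - max(2 * MIB, 2 * cs)
    print(kib, predicted == reported)   # True for every row
# (For NTFS the pattern is just (PARTITION // cs) * cs, i.e. no fixed offset,
# per the zeroes in the script output above.)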


By the way, to state the obvious, the reported capacities in your tables are cluster size * number of clusters. In other words, the (sizes of the) data and metadata in any of the clusters are irrelevant to the numbers.

  • "the unusable / "non-clusterable" "remainder" of the partition gets bigger" - Right! IOW not all space within the partition is actually 'taken' by the file system. There seems to be some rounding or aligning taking place. Commented May 6, 2023 at 20:42

I'll attempt to answer this question, which truly seems at first sight to be contrary to logic.

First, to clarify the terminology used in the post: cluster size is not the same as block size. Block size is determined by the hardware; a cluster contains multiple blocks and is the allocation unit for the disk.

On the one hand, the larger the cluster size, the fewer clusters there are on the disk, so less overhead is required to manage the clusters via allocation bitmaps and FAT entries.

On the other hand, the exFAT disk format (really all formats) allocates space by clusters, so that if data (of any kind) does not occupy exactly a whole cluster, then the remaining space is wasted.

My idea is that not only files can waste space this way, but also the disk tables (or data structures) that are allocated as part of the exFAT format.

Looking at the exFAT file system specification, I tried to count the defined areas (or regions).

My count was around 15 regions that are allocated when the exFAT format is created and that make up its structure.

These areas do not contain more data when larger (and therefore fewer) clusters are defined; some are actually smaller. But the space occupied by some of these regions is counted in clusters, so enlarging the clusters also enlarges the wasted space.

This might explain some of the waste of usable space, but the irregularities in the waste measurements by the poster may also point to either errors in the allocation of these tables, or to missing information in the documentation.
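
To illustrate the point with a made-up example (the region size below is hypothetical, not taken from the spec): a fixed-size region that must be allocated in whole clusters wastes more space as the cluster size grows:

import math

region_bytes = 40000   # hypothetical fixed-size region
for kib in (64, 128, 256, 512, 1024, 2048, 4096):
    cs = kib * 1024
    # The region occupies whole clusters, so round its size up.
    allocated = math.ceil(region_bytes / cs) * cs
    print(f"{kib:>5} KiB clusters: {allocated - region_bytes:>9} bytes wasted")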

  • AFAICT, the reported capacities given in the OP's tables aren't even "usable capacities" (aka the "numerator"), but more like "total capacities" (aka the "denominator"). In other words, it isn't about the problem of small-file-big-cluster.
    – Tom Yan
    Commented May 6, 2023 at 18:54
  • @TomYan: This isn't what I'm saying - I'm not talking about files but about exFAT data structures. It actually explains the numbers found by Joep van Steen.
    – harrymc
    Commented May 6, 2023 at 18:56
  • Well, at least that's what one of your "hands" (paragraph 4) said, and it was even followed by "My idea is that not only files can waste space this way, but also disk tables that are allocated for exFAT."
    – Tom Yan
    Commented May 6, 2023 at 18:59
  • Right, if we consider the 'reserved sectors' and file allocation table regions as you say, the sizing of these affects the space remaining for 'data', i.e. the actual space for clusters. Commented May 6, 2023 at 20:46
