
Does it matter how many files I keep in a single directory? If so, how many files in a directory is too many, and what are the impacts of having too many files? (This is on a Linux server.)

Background: I have a photo album website, and every image uploaded is renamed to an 8-hex-digit id (say, a58f375c.jpg). This is to avoid filename conflicts (if lots of "IMG0001.JPG" files are uploaded, for example). The original filename and any useful metadata are stored in a database. Right now, I have somewhere around 1500 files in the images directory. This makes listing the files in the directory (through FTP or SSH client) take a few seconds. But I can't see that it has any effect other than that. In particular, there doesn't seem to be any impact on how quickly an image file is served to the user.

I've thought about reducing the number of files per directory by making 16 subdirectories: 0-9 and a-f. Then I'd move the images into the subdirectories based on the first hex digit of the filename. But I'm not sure there's any reason to do so except for the occasional listing of the directory through FTP/SSH.
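Roughly, the migration would look something like this (a Python sketch; 'images' stands in for my actual images directory):

import os
import shutil

IMAGES_DIR = 'images'  # stand-in for the real flat directory of a58f375c.jpg-style files

def bucketed_path(filename):
    # e.g. a58f375c.jpg -> images/a/a58f375c.jpg (bucket = first hex digit)
    return os.path.join(IMAGES_DIR, filename[0].lower(), filename)

def migrate_existing_files():
    # One-off migration: move every existing image into its bucket.
    for name in os.listdir(IMAGES_DIR):
        src = os.path.join(IMAGES_DIR, name)
        if not os.path.isfile(src):
            continue  # skip the bucket directories themselves
        dst = bucketed_path(name)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.move(src, dst)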


23 Answers

852
+50

FAT32:

  • Maximum number of files: 268,173,300
  • Maximum number of files per directory: 2¹⁶ - 1 (65,535)
  • Maximum file size: 2 GiB - 1 without LFS, 4 GiB - 1 with

NTFS:

  • Maximum number of files: 2³² - 1 (4,294,967,295)
  • Maximum file size
    • Implementation: 2⁴⁴ - 2¹⁶ bytes (16 TiB - 64 KiB)
    • Theoretical: 2⁶⁴ - 2¹⁶ bytes (16 EiB - 64 KiB)
  • Maximum volume size
    • Implementation: 2³² - 1 clusters (256 TiB - 64 KiB)
    • Theoretical: 2⁶⁴ - 1 clusters (1 YiB - 64 KiB)

ext2:

  • Maximum number of files: 10¹⁸
  • Maximum number of files per directory: ~1.3 × 10²⁰ (performance issues past 10,000)
  • Maximum file size
    • 16 GiB (block size of 1 KiB)
    • 256 GiB (block size of 2 KiB)
    • 2 TiB (block size of 4 KiB)
    • 2 TiB (block size of 8 KiB)
  • Maximum volume size
    • 4 TiB (block size of 1 KiB)
    • 8 TiB (block size of 2 KiB)
    • 16 TiB (block size of 4 KiB)
    • 32 TiB (block size of 8 KiB)

ext3:

  • Maximum number of files: min(volumeSize / 2¹³, numberOfBlocks)
  • Maximum file size: same as ext2
  • Maximum volume size: same as ext2

ext4:

  • Maximum number of files: 2³² - 1 (4,294,967,295)
  • Maximum number of files per directory: unlimited

16
  • 27
    I assume these are the maximum number of files for the entire partition, not a directory. Thus, this information isn't too useful regarding the problem, because there'd be an equal number of files regardless of the method (unless you count directories as files).
    – strager
    Commented Jan 21, 2009 at 19:28
  • 28
    Since we're in 2012 now, I think it's time to make clear that ext4 doesn't have any limit concerning the number of subdirectories. Also, the maximum file size grew to 16 TB. Furthermore, the overall size of the filesystem may be up to 1 EB = 1,048,576 TB.
    – devsnd
    Commented Jun 25, 2012 at 23:13
  • 9
    Apparently, ext3 also has a limit of 60,000 files (or directories or links) per directory. I found out about this the hard way.
    – stackular
    Commented Mar 13, 2014 at 20:59
  • 13
    Old answer, I know… but when you write EXT4Maximum number of files: 2³² - 1 (4,294,967,295) and Maximum number of files per directory: unlimited you really confused me because 2³² - 1 != “unlimited”. I guess I need a coffee now. ;) Nevertheless +1
    – e-sushi
    Commented Aug 21, 2015 at 22:32
  • 21
    hard filesystem limits do not answer the question "Does it matter how many files I keep in a single directory?"
    – Etki
    Commented Dec 30, 2016 at 10:30
214

I have had over 8 million files in a single ext3 directory. libc readdir() is what find, ls and most of the other methods discussed in this thread use to list large directories.

The reason ls and find are slow in this case is that readdir() only reads 32K of directory entries at a time, so on slow disks it takes many, many reads to list a directory. There is a solution to this speed problem. I wrote a pretty detailed article about it at: http://www.olark.com/spw/2011/08/you-can-list-a-directory-with-8-million-files-but-not-with-ls/

The key takeaway is: use getdents() directly (http://www.kernel.org/doc/man-pages/online/pages/man2/getdents.2.html) rather than anything based on libc readdir(), so you can specify the buffer size when reading directory entries from disk.
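If you want to try this from Python rather than C, here is a rough sketch that calls getdents64 directly through ctypes with a big buffer. It assumes Linux on x86_64 (syscall number 217) and Python 3; the buffer size and directory path are just examples:

import ctypes
import os
import struct

SYS_getdents64 = 217            # x86_64 only; other architectures use different numbers
BUF_SIZE = 5 * 1024 * 1024      # 5 MiB per syscall instead of readdir()'s ~32K

libc = ctypes.CDLL('libc.so.6', use_errno=True)

def list_huge_dir(path):
    # Yields entry names without building the whole list in memory.
    fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY)
    buf = ctypes.create_string_buffer(BUF_SIZE)
    try:
        while True:
            nread = libc.syscall(SYS_getdents64, fd, buf, BUF_SIZE)
            if nread == -1:
                err = ctypes.get_errno()
                raise OSError(err, os.strerror(err))
            if nread == 0:
                break                              # end of directory
            data = buf.raw[:nread]
            pos = 0
            while pos < nread:
                # struct linux_dirent64: u64 d_ino, s64 d_off, u16 d_reclen, u8 d_type, char d_name[]
                d_ino, d_off, d_reclen, d_type = struct.unpack_from('<QqHB', data, pos)
                name = data[pos + 19 : pos + d_reclen].split(b'\0', 1)[0]
                if name not in (b'.', b'..'):
                    yield name.decode(errors='surrogateescape')
                pos += d_reclen
    finally:
        os.close(fd)

if __name__ == '__main__':
    print(sum(1 for _ in list_huge_dir('.')), 'entries')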

3
  • 7
    Interesting read! Can I ask in what situation you had 8 million files in one directory? haha Commented Aug 3, 2016 at 10:52
  • I had the same. I migrated a table's blob column, exporting each blob as a file. It came to around 8 million files :)
    – RBB
    Commented Aug 26, 2019 at 14:42
  • 3
    Using ls -U can help at times (it lists entries in directory order) - it probably writes out entries as they are read, without waiting for the whole directory to be loaded first.
    – miku
    Commented Oct 12, 2020 at 12:43
86

I have a directory with 88,914 files in it. Like yours, it is used for storing thumbnails, and it is on a Linux server.

Listing files via FTP or a PHP function is slow, yes, but there is also a performance hit when displaying a file: e.g. www.website.com/thumbdir/gh3hg4h2b4h234b3h2.jpg has a wait time of 200-400 ms. By comparison, on another site of mine with around 100 files in a directory, the image is displayed after only ~40 ms of waiting.

I've given this answer because most people have just written about how directory search functions will perform, which you won't be using on a thumbnail folder - you'll just be statically displaying files - but you will be interested in the performance of actually using the files.

4
  • 7
    This is the only useful answer. We've had similar experiences. Our limit is 1,000 files to reduce problems with backups (too many directories slow things down, too).
    – mgutt
    Commented Aug 1, 2012 at 18:15
  • 1
    It can be useful to mount a drive with noatime as well: howtoforge.com/… and read this, too: serverfault.com/questions/354017/…
    – mgutt
    Commented Aug 1, 2012 at 18:29
  • 3
    What filesystem are you using where it slows down so much? XFS, for example, should be able to easily handle 100,000 files in a directory without any noticeable slowdown.
    – Ethan
    Commented Mar 21, 2013 at 15:38
  • 2
    Contradicting the opinion of most others, I want to confirm this answer. We have hundreds of thousands of images on our social network website. In order to improve performance we were forced to have 100 (or 1,000 for some files) subdirectories and distribute the files into them (ext3 on Linux + Apache for us).
    – wmac
    Commented Jul 24, 2014 at 17:58
57

It depends a bit on the specific filesystem in use on the Linux server. Nowadays the default is ext3 with dir_index, which makes searching large directories very fast.

So speed shouldn't be an issue, other than the one you already noted, which is that listings will take longer.

There is a limit to the total number of files in one directory. I seem to remember it definitely working up to 32000 files.

6
  • 4
    Gnome and KDE load large directories at a snail's pace; Windows will cache the directory so it's reasonable. I love Linux, but KDE and Gnome are poorly written.
    – rook
    Commented Apr 20, 2010 at 2:26
  • 1
    And ext4 seems to have the equivalent of dir_index on by default. Commented Feb 22, 2012 at 13:22
  • 24
    There is a limit of around 32K subdirectories in one directory in ext3, but the OP is talking about image files. There is no (practical?) limit on files in an ext3 file system with Dir Index enabled. Commented May 31, 2012 at 4:41
  • 3
    This answer is outdated, nowadays the default is ext4.
    – user3064538
    Commented Jan 11, 2019 at 9:44
  • 4
    "There is no (practical?) limit on files in an ext3 file system with Dir Index enabled" - I just ran out of file space in a directory on a 4TB ext4 filesystem, with dir_index enabled. I had about 17 million files in the directory. The answer was to turn on large_dir with tune2fs.
    – lunixbochs
    Commented Feb 6, 2020 at 20:09
50

Keep in mind that on Linux if you have a directory with too many files, the shell may not be able to expand wildcards. I have this issue with a photo album hosted on Linux. It stores all the resized images in a single directory. While the file system can handle many files, the shell can't. Example:

-shell-3.00$ ls A*
-shell: /bin/ls: Argument list too long

or

-shell-3.00$ chmod 644 *jpg
-shell: /bin/chmod: Argument list too long
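If you'd rather sidestep shell expansion entirely (the find/xargs approach from the comments below also works), a small Python loop never builds an argument list at all. A rough sketch, with a made-up directory path:

#!/usr/bin/env python3
# Rough equivalent of `chmod 644 *.jpg` that never hits the argument-list limit.
import fnmatch
import os

photo_dir = '/var/www/album/images'   # hypothetical path

with os.scandir(photo_dir) as entries:     # streams entries instead of expanding a glob
    for entry in entries:
        if entry.is_file() and fnmatch.fnmatch(entry.name, '*.jpg'):
            os.chmod(entry.path, 0o644)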
6
  • 34
    @Steve, use find(1) and/or xargs(1) for these cases. For the same reason it's a good idea to use such tools in scripts instead of command line expansion.
    – Dave C
    Commented Jan 21, 2009 at 21:25
  • 3
    @Steve do you see performance going down when the number of files in a folder increase? Or is there no relation?
    – Pacerier
    Commented Mar 5, 2012 at 16:40
  • 7
    This is a good point but to nitpick, the reason given is wrong. The Argument list too long is a limitation not of the shell, but of the system's exec implementation. The shell typically can expand the wildcard just fine - it's the call to exec with that many arguments that returns the error.
    – jw013
    Commented Nov 30, 2012 at 20:34
  • I had the same error last night (Fedora 15) with "rm" (somefiles*) with about ~400,000 files in a directory. I was able to trim the older files with "find" to the point where I could "rm" with a wildcard.
    – Jay Brunet
    Commented Mar 15, 2013 at 3:47
  • 1
    10,000,000 files in a directory on ext4 works fine. Not much of a performance hit when accessing. But it is rather slow with wildcards. Be careful with shell programs that like to sort filenames! :) Commented Dec 3, 2018 at 18:30
32

I'm working on a similar problem right now. We have a hierarchical directory structure and use image ids as filenames. For example, an image with id=1234567 is placed in

..../45/67/1234567_<...>.jpg

using the last 4 digits to determine where the file goes.

With a few thousand images, you could use a one-level hierarchy. Our sysadmin suggested no more than a couple of thousand files in any given directory (ext3) for efficiency / backup / whatever other reasons he had in mind.
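A rough Python sketch of that layout (the base directory and filename suffix here are placeholders, not our real ones):

import os

def image_path(image_id, base='images', suffix=''):
    # Builds e.g. images/45/67/1234567.jpg, using the last 4 digits
    # of the id for the two directory levels.
    s = f'{image_id:07d}'          # zero-pad so short ids still have 4 trailing digits
    return os.path.join(base, s[-4:-2], s[-2:], f'{image_id}{suffix}.jpg')

print(image_path(1234567))         # images/45/67/1234567.jpg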

2
  • 2
    This is a pretty nice solution. Every level of your directory down to the file would have at most 100 entries in it if you stick with the 2 digit breakdown, and the bottom most directory would just have 1 file.
    – RobKohr
    Commented May 31, 2016 at 12:51
  • PHP implementation: stackoverflow.com/a/29707920/318765
    – mgutt
    Commented Nov 19, 2019 at 12:21
30

For what it's worth, I just created a directory on an ext4 file system with 1,000,000 files in it, then randomly accessed those files through a web server. I didn't notice any penalty accessing those compared with (say) having only 10 files there.

This is radically different from my experience doing this on ntfs a few years back.

2
  • 1
    What kind of files? Text or images? I am on ext4 and have to import 80,000 images in a single directory under WordPress and would like to know if it will be okay.
    – Yvon Huynh
    Commented Aug 19, 2017 at 20:29
  • 5
    @YvonHuynh: The kind of file is completely irrelevant. The overhead in the directory of listing/tracking the file is the same regardless. Commented Aug 19, 2017 at 21:11
18

I've been having the same issue, trying to store millions of files on an Ubuntu server on ext4. I ended up running my own benchmarks, and found that a flat directory performs way better while being way simpler to use:

[benchmark chart]

Wrote an article.

3
  • 2
    A link to a solution is welcome, but please ensure your answer is useful without it: add context around the link so your fellow users will have some idea what it is and why it’s there, then quote the most relevant part of the page you're linking to in case the target page is unavailable. Answers that are little more than a link may be deleted. Commented Dec 22, 2018 at 4:32
  • 4
    Interesting. We found that after even 10,000 files the performance degraded very very quickly to the point of being unusable. We settled with breaking the files into subdirectories of about 100 at each level to achieve optimum performance. I guess the moral of the story is to always benchmark it for yourself on your own systems with your own requirements. Commented Oct 16, 2019 at 20:22
  • 3
    Reading the article, I didn't understand the conclusions. You conclude "use a flat structure" in the first part. Then "I could not run the benchmark with many more files" (thus, no real benchmark here?). Finally, "stick to common sense and use a deep directory structure". Also, you might want to write ms instead of s in the bars on the left on your plots. Commented Aug 21, 2020 at 7:51
13

The biggest issue I've run into is on a 32-bit system. Once you pass a certain number, tools like 'ls' stop working.

Trying to do anything with that directory once you pass that barrier becomes a huge problem.

0
10

It really depends on the filesystem used, and also some flags.

For example, ext3 can have many thousands of files, but after a couple of thousand it used to be very slow - mostly when listing a directory, but also when opening a single file. A few years ago, it gained the 'htree' option, which dramatically shortened the time needed to get an inode given a filename.

Personally, I use subdirectories to keep most levels under a thousand or so items. In your case, I'd create 256 directories, using the last two hex digits of the ID. Use the last digits rather than the first, so the load is balanced.

3
  • 6
    If the filenames were completely random, it wouldn't matter which digits were used.
    – strager
    Commented Jan 21, 2009 at 19:30
  • Indeed, these filenames are generated randomly.
    – Kip
    Commented Jan 21, 2009 at 19:58
  • 2
    Or use the first N bytes of the SHA-1 digest of the filename.
    – gawi
    Commented Mar 4, 2015 at 20:49
8

If the time involved in implementing a directory partitioning scheme is minimal, I am in favor of it. The first time you have to debug a problem that involves manipulating a 10,000-file directory via the console, you will understand.

As an example, F-Spot stores photo files as YYYY\MM\DD\filename.ext, which means the largest directory I have had to deal with while manually manipulating my ~20,000-photo collection is about 800 files. This also makes the files more easily browsable from a third-party application. Never assume that your software is the only thing that will be accessing your software's files.

2
  • 7
    I advise against partitioning by date because bulk imports might cluster files at a certain date.
    – max
    Commented Jan 27, 2009 at 21:31
  • A good point. You should definitely consider your use cases before picking a partitioning scheme. I happen to import photos over many days in a relatively broad distribution, AND when I want to manipulate the photos outside F-Spot date is the easiest way to find them, so it's a double-win for me.
    – Sparr
    Commented Jan 28, 2009 at 0:28
7

It absolutely depends on the filesystem. Many modern filesystems use decent data structures to store the contents of directories, but older filesystems often just added the entries to a list, so retrieving a file was an O(n) operation.

Even if the filesystem does it right, it's still absolutely possible for programs that list directory contents to mess up and do an O(n^2) sort, so to be on the safe side, I'd always limit the number of files per directory to no more than 500.

7

"Depends on filesystem"
Some users mentioned that the performance impact depends on the filesystem used. Of course. Filesystems like EXT3 can be very slow. But even if you use EXT4 or XFS, you cannot prevent listing a folder through ls or find, or through an external connection like FTP, from becoming slower and slower.

Solution
I prefer the same approach as @armandino. For that I use this little function in PHP to convert IDs into a file path that results in 1,000 files per directory:

function dynamic_path($int) {
    // 1000 = 1000 files per dir
    // 10000 = 10000 files per dir
    // 2 = 100 dirs per dir
    // 3 = 1000 dirs per dir
    return implode('/', str_split((string) ceil($int / 1000), 2)) . '/';
}

or you could use the second version if you want to use alpha-numeric characters:

function dynamic_path2($str) {
    // 26 alpha + 10 num + 3 special chars (._-) = 39 combinations
    // -1 = 39^2 = 1521 files per dir
    // -2 = 39^3 = 59319 files per dir (if every combination exists)
    $left = substr($str, 0, -1);
    return implode('/', str_split($left ? $left : $str[0], 2)) . '/';
}

results:

<?php
$files = explode(',', '1.jpg,12.jpg,123.jpg,999.jpg,1000.jpg,1234.jpg,1999.jpg,2000.jpg,12345.jpg,123456.jpg,1234567.jpg,12345678.jpg,123456789.jpg');
foreach ($files as $file) {
    echo dynamic_path(basename($file, '.jpg')) . $file . PHP_EOL;
}
?>

1/1.jpg
1/12.jpg
1/123.jpg
1/999.jpg
1/1000.jpg
2/1234.jpg
2/1999.jpg
2/2000.jpg
13/12345.jpg
12/4/123456.jpg
12/35/1234567.jpg
12/34/6/12345678.jpg
12/34/57/123456789.jpg

<?php
$files = array_merge($files, explode(',', 'a.jpg,b.jpg,ab.jpg,abc.jpg,ddd.jpg,af_ff.jpg,abcd.jpg,akkk.jpg,bf.ff.jpg,abc-de.jpg,abcdef.jpg,abcdefg.jpg,abcdefgh.jpg,abcdefghi.jpg'));
foreach ($files as $file) {
    echo dynamic_path2(basename($file, '.jpg')) . $file . PHP_EOL;
}
?>

1/1.jpg
1/12.jpg
12/123.jpg
99/999.jpg
10/0/1000.jpg
12/3/1234.jpg
19/9/1999.jpg
20/0/2000.jpg
12/34/12345.jpg
12/34/5/123456.jpg
12/34/56/1234567.jpg
12/34/56/7/12345678.jpg
12/34/56/78/123456789.jpg
a/a.jpg
b/b.jpg
a/ab.jpg
ab/abc.jpg
dd/ddd.jpg
af/_f/af_ff.jpg
ab/c/abcd.jpg
ak/k/akkk.jpg
bf/.f/bf.ff.jpg
ab/c-/d/abc-de.jpg
ab/cd/e/abcdef.jpg
ab/cd/ef/abcdefg.jpg
ab/cd/ef/g/abcdefgh.jpg
ab/cd/ef/gh/abcdefghi.jpg

As you can see, for the $int version every folder contains up to 1,000 files plus up to 99 directories, each of which contains another 1,000 files and 99 directories, and so on.

But do not forget that too many directories cause the same performance problems!

Finally, you should think about how to reduce the total number of files. Depending on your use case, you can use CSS sprites to combine multiple tiny images like avatars, icons, smilies, etc., or, if you have many small non-media files, consider combining them, e.g. in JSON format. In my case I had thousands of mini-caches, and finally I decided to combine them in packs of 10.

6

ext3 does in fact have directory size limits, and they depend on the block size of the filesystem. There isn't a per-directory "max number" of files, but a per-directory "max number of blocks used to store file entries". Specifically, the size of the directory itself can't grow beyond a b-tree of height 3, and the fanout of the tree depends on the block size. See this link for some details.

https://www.mail-archive.com/[email protected]/msg01944.html

I was bitten by this recently on a filesystem formatted with 2K blocks, which was inexplicably getting directory-full kernel messages (warning: ext3_dx_add_entry: Directory index full!) when I was copying from another ext3 filesystem. In my case, a directory with a mere 480,000 files could not be copied to the destination.

5

What most of the answers above fail to show is that there is no "One Size Fits All" answer to the original question.

In today's environment we have a large conglomerate of different hardware and software -- some is 32 bit, some is 64 bit, some is cutting edge and some is tried and true - reliable and never changing. Added to that is a variety of older and newer hardware, older and newer OSes, different vendors (Windows, Unixes, Apple, etc.) and a myriad of utilities and servers that go along with them. As hardware has improved and software has been converted to 64-bit compatibility, there has necessarily been considerable delay in getting all the pieces of this very large and complex world to play nicely with the rapid pace of change.

IMHO there is no one way to fix a problem. The solution is to research the possibilities and then by trial and error find what works best for your particular needs. Each user must determine what works for their system rather than using a cookie cutter approach.

I for example have a media server with a few very large files. The result is only about 400 files filling a 3 TB drive. Only 1% of the inodes are used, but 95% of the total space is used. Someone else, with a lot of smaller files, may run out of inodes before they come near to filling the space. (On ext4 filesystems, as a rule of thumb, 1 inode is used for each file/directory.) While theoretically the total number of files that may be contained within a directory is nearly infinite, practicality dictates that overall usage determines realistic units, not just filesystem capabilities.

I hope that all the different answers above have promoted thought and problem solving rather than presenting an insurmountable barrier to progress.

4

The question comes down to what you're going to do with the files.

Under Windows, any directory with more than 2k files tends to open slowly for me in Explorer. If they're all image files, more than 1k tend to open very slowly in thumbnail view.

At one time, the system-imposed limit was 32,767. It's higher now, but even that is way too many files to handle at one time under most circumstances.

1
  • 1
    The question is related to Linux but your answer only gives an answer to a different question on Windows. I don't consider this to be an answer to the question.
    – xdevs23
    Commented Apr 4, 2023 at 18:01
4

I ran into a similar issue. I was trying to access a directory with over 10,000 files in it. It was taking too long to build the file list and run any type of commands on any of the files.

I threw together a little PHP script to do this for myself and tried to figure out a way to prevent it from timing out in the browser.

The following is the php script I wrote to resolve the issue.

Listing Files in a Directory with too many files for FTP

Hope it helps someone.

4

I recall running a program that was creating a huge number of files as its output. The files were sorted into directories of 30,000 each. I do not recall having any read problems when I had to reuse the produced output. It was on a 32-bit Ubuntu Linux laptop, and even Nautilus displayed the directory contents, albeit after a few seconds.

ext3 filesystem: Similar code on a 64-bit system dealt well with 64000 files per directory.

3

I realize this doesn't totally answer your question of how many is too many, but an idea for solving the long-term problem is that, in addition to storing the original file metadata, you also store which folder on disk the file is kept in - normalize out that piece of metadata. Once a folder grows beyond some limit you are comfortable with, for performance, aesthetic or whatever other reason, you just create a second folder and start dropping files there...
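As a rough sketch of that idea in Python with SQLite (table and column names are made up, and the cap is arbitrary), the application keeps track of the current folder in the database and rolls over to a new one once it is full:

import os
import sqlite3

FILES_PER_FOLDER = 5000          # whatever limit you're comfortable with
STORAGE_ROOT = '/srv/photos'     # hypothetical storage root

def pick_folder(db):
    # Return the folder new uploads should go into, creating a new one when the
    # current one reaches the cap. Folders are named 0000, 0001, 0002, ...
    row = db.execute(
        'SELECT folder, COUNT(*) FROM images GROUP BY folder ORDER BY folder DESC LIMIT 1'
    ).fetchone()
    if row is None or row[1] >= FILES_PER_FOLDER:
        next_index = 0 if row is None else int(row[0]) + 1
        folder = f'{next_index:04d}'
        os.makedirs(os.path.join(STORAGE_ROOT, folder), exist_ok=True)
        return folder
    return row[0]

def store_image(db, image_id, data):
    # Write the file and record which folder it landed in.
    folder = pick_folder(db)
    path = os.path.join(STORAGE_ROOT, folder, image_id + '.jpg')
    with open(path, 'wb') as f:
        f.write(data)
    db.execute('INSERT INTO images (id, folder) VALUES (?, ?)', (image_id, folder))
    db.commit()
    return path

db = sqlite3.connect('album.db')
db.execute('CREATE TABLE IF NOT EXISTS images (id TEXT PRIMARY KEY, folder TEXT)')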

1

Not an answer, but just some suggestions.

Select a more suitable FS (file system). Historically, all of the issues you describe were important enough to be central to the evolution of file systems over the decades, so more modern file systems handle them better. Start by making a comparison/decision table based on your ultimate purpose, working from a list of file systems.

I also think it's time to shift your paradigm, so I personally suggest using a distributed-system-aware FS, which means no limits at all regarding size, number of files, etc. Otherwise you will sooner or later be challenged by new, unanticipated problems.

I'm not sure it will work, but if you don't mind some experimentation, give AUFS on top of your current file system a try. I believe it has facilities to present multiple folders as a single virtual folder.

To overcome hardware limits you can use RAID-0.

1

There is no single figure that is "too many", as long as it doesn't exceed the limits of the OS. However, the more files in a directory, regardless of the OS, the longer it takes to access any individual file, and on most OSes the performance is non-linear, so finding one file out of 10,000 takes more than 10 times longer than finding a file among 1,000.

Secondary problems associated with having a lot of files in a directory include wildcard expansion failures. To reduce the risks, you might consider organizing your directories by date of upload, or by some other useful piece of metadata.

0

The limits may depend on the length of filenames

Longer filenames generally mean fewer allowed files.

E.g. when I test 64-byte filenames:

#!/usr/bin/env python

import os
import shutil

tmpdir = 'tmp'
if os.path.isdir(tmpdir):
    shutil.rmtree(tmpdir)
os.mkdir(tmpdir)
for i in range(10000000):
    print(i)
    with open(os.path.join(tmpdir, f'{i:064}'), 'w') as f:
        pass

I get about 5.6 million files on an ext4 system as mentioned at: Python causing: IOError: [Errno 28] No space left on device: '../results/32766.html' on disk with lots of space

But if instead I just use:

    with open(os.path.join(tmpdir, f'{i}'), 'w') as f:

which has short filenames such as 0, 1, 2 ... then I can go way past 20 million.

So it might not be easy to put a single number to it.

-3

≈ 135,000 FILES

NTFS | WINDOWS 2012 SERVER | 64-BIT | 4TB HDD | VBS

Problem: Catastrophic hardware issues appear when a [single] specific folder amasses roughly 135,000 files.

  • "Catastrophic" = CPU Overheats, Computer Shuts Down, Replacement Hardware needed
  • "Specific Folder" = has a VBS file that moves files into subfolders
  • Access = the folder is automatically accessed/executed by several client computers

Basically, I have a custom-built script that sits on a file server. When something goes wrong with the automated process (i.e., file spill + dam), the specific folder gets flooded [with unmoved files]. The catastrophe takes shape when the client computers keep executing the script. The file server ends up reading through 135,000+ files, and doing so hundreds of times each day. This work overload ends up overheating my CPU (92°C, etc.), which ends up crashing my machine.

Solution: Make sure your file-organizing scripts never have to deal with a folder that has 135,000+ files.

Not the answer you're looking for? Browse other questions tagged or ask your own question.