
I need a file storage solution that provides read/write access to files for a collection of web servers. The space demands are modest: about 2 TiB right now, probably growing to twice that. NFS is what is used now, and it looked fine until I saw that almost all the files are in one single directory. With about 15 million files right now, and a total that could grow to 20 or 30 million, I am worried that a Linux filesystem might have a problem with that many files in one place.

I proposed that the application be modified to split the files up across several subdirectories, but the powers-that-be say "no" to that. That seems to leave me with two options:

  1. NFS. This would be the simplest, but I am not sure how well it can handle the number of files in the directory.

  2. Cloud storage -- here that means Azure. I don't know enough about cloud storage to have an opinion on expected performance, and I also do not know what kind of rewriting will be necessary. Can object storage in the cloud be made to appear as part of the local file system, the way I can do with NFS?


2 Answers


I just realized I never posted what I finally did to "solve" this.

I built a GlusterFS cluster consisting of four servers. Servers 1 and 2 mirror each other, and servers 3 and 4 mirror each other. New files are written alternately to the 1/2 pair and the 3/4 pair. Sort of like RAID 10 for file storage. I believe the GlusterFS folks call this a 2x2 (distributed-replicated) cluster.

The underlying volumes are managed by LVM and formatted as XFS.

So far it has held up well. We just passed the 25 million file mark and performance is still acceptable. It takes a while (about 3 hours) to get a full listing, but I only have to do that once per day for statistical purposes. According to df we are using about 5.2T of 8.0T total, though bear in mind the actual storage consumed is twice that because of the mirroring.
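
For reference, a daily statistics pass like that can be done without holding the whole listing in memory. This is only a rough sketch, assuming Python is available on one of the clients; the mount point below is made up for illustration:

    # Rough sketch (not from the original setup): stream a huge directory with
    # os.scandir() instead of "ls", so nothing is sorted or held in memory.
    # The mount point below is a made-up example.
    import os

    MOUNT = "/mnt/glustervol/files"  # hypothetical GlusterFS mount point

    def directory_stats(path):
        count = 0
        total_bytes = 0
        for entry in os.scandir(path):  # yields entries lazily, one at a time
            if entry.is_file(follow_symlinks=False):
                count += 1
                total_bytes += entry.stat(follow_symlinks=False).st_size
        return count, total_bytes

    if __name__ == "__main__":
        files, size = directory_stats(MOUNT)
        print(f"{files} files, {size / 2**40:.2f} TiB")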

Delayed thanks to all who answered. It helped me arrive at a compromise that should hold us for a while.


He seems to think this is a systems problem, and he is not entirely wrong.

To some extent, yes: older filesystems were really bad at handling millions of files, while newer ones handle it differently. For example, ext2 and FAT keep directory entries in simple linear lists, so the scalability problems really were problems with ext2 and FAT; their successors improved on this with HTree indexes in ext3/ext4 and B+trees in NTFS.

(Eventually, however, the design of the filesystem can only do so much – I suspect it's not easy to optimize a general-purpose filesystem to handle billions of files per directory on a server and still remain usable for tens of files per directory on a desktop computer without too much overhead...)

But the way you use the filesystem also matters a lot. Even if you have millions of files on e.g. XFS, chances are that direct lookups by exact path will remain reasonably fast, as they only involve reading a small part of the directory data; but trying to list the directory will be much slower in comparison. So your program should be designed to never need to list the entire directory, but to know exactly what files it needs.
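
As a rough sketch of the difference (assuming Python; the storage path and helper names below are made up), only the second function forces the filesystem to walk the whole directory:

    # Sketch of the two access patterns; the path and helper names are made up.
    import os

    STORE = "/srv/files"  # hypothetical single large directory

    def read_by_exact_path(name):
        # Direct lookup: the filesystem only resolves one name in its directory
        # index, which stays reasonably fast even with millions of entries.
        with open(os.path.join(STORE, name), "rb") as f:
            return f.read()

    def find_by_scanning(predicate):
        # Full listing: every entry in the directory has to be enumerated.
        # This is the pattern that gets slow and should be designed away.
        return [e.name for e in os.scandir(STORE) if predicate(e.name)]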

(As an analogy, if you use a SQL database, you already know that the correct way to search for data is to let server-side queries of "SELECT WHERE this=that" do the job – you don't usually try to retrieve the entire table every single time and then blame it on your network being too slow.)

"NFS. This would be the simplest, but I am not sure how well it can handle the number of files in the directory."

NFS doesn't store directory lists on its own; it only provides access to the capabilities of the remote "storage" filesystem.

So if all your operations deal only with exact paths (i.e. read this specific file, write that specific file), then the capabilities of NFS itself should be largely irrelevant to your problem, as NFS never needs to look at the complete list of files – it only forwards the exact requested paths to the file server, where the NFS server's on-disk filesystem (e.g. ZFS or ext4) has to worry about handling the whole directory.

In other words, you're only shifting the problem to a different machine, but it remains the exact same problem there. (The NFS file server certainly could use a filesystem that handles many files better than the one used on the web servers, but you could do that locally as well.)

"Any strategy I can devise to break up the files between several directories would require code changes, and the project manager is not willing to do anything beyond the most trivial of changes."

The most trivial change would be to use part of the file name itself as the subdirectory name. This makes it easy to find the files later – just apply the same transformation to the file name when reading it as you did when storing it.

Take a look at how .git/objects/ works. It can accumulate many object files (especially if you travel back in time to when Git didn't yet have packfiles), so they are separated into subdirectories based on the first two hex characters of the object ID.

For example, the Git object c813a148564a5.. is found at objects/c8/13a148564a5.., using one level of subdirectories with an 8-bit prefix – there are 256 possible subdirectories, and the number of files within each subdirectory is reduced roughly 256-fold (e.g. only ~40k files per directory in a 10-million-object repository) – and the software knows exactly where to find each object from its name alone.
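
As a rough sketch of that path computation (assuming Python; the base directory and function name below are made up):

    # Sketch of git-style sharding: the first two hex characters of the name
    # become the subdirectory. The base directory and function name are made up.
    import os

    BASE = "objects"  # e.g. .git/objects/

    def shard_path(object_id):
        return os.path.join(BASE, object_id[:2], object_id[2:])

    # shard_path("c813a148564a5..") -> "objects/c8/13a148564a5.."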

If you want to spread the files out even more, you can use longer subdirectory names (e.g. a 12-bit prefix for 1/4096 of the files per directory) or even add a second level of subdirectories.

This works best if the names are evenly distributed, like hash-based names usually are. If your file names tend to start with the same text over and over, you should hash the names to avoid that (and store the mapping of real name to hash name in a database).
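
A rough sketch of that variant, assuming Python with SHA-256 for the name hash and a two-character prefix per level; the storage root and function name are illustrative only:

    # Sketch of the hashed-name variant: hash the original file name so the
    # prefix is evenly distributed, then keep the real-name -> hashed-name
    # mapping elsewhere (e.g. a database table, as suggested above).
    # The storage root, hash choice and function name are assumptions.
    import hashlib
    import os

    BASE = "/srv/files"  # hypothetical storage root

    def stored_path(original_name, levels=1, chars=2):
        digest = hashlib.sha256(original_name.encode("utf-8")).hexdigest()
        parts = [digest[i * chars:(i + 1) * chars] for i in range(levels)]
        return os.path.join(BASE, *parts, digest)

    # stored_path("invoice-2021.pdf") gives something like
    # "/srv/files/3f/3f9a..." (one level, two-character prefix).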
