0

I'm working on a system where we need to refactor the upload service, which handles all file and image uploads. Nothing fancy, but it got me thinking about the folder structure we are using, how others do it, and whether there are any benefits to doing it one way or another.

For example, the folder structure we currently use in our server is:

- assets/
  - images/
    /- Images that are static, such as the website logo, favicon, etc.

  - gallery/
    /- Images uploaded by the users {jpg, png, webp}

  - files/
    /- Files uploaded by the users {pdf, txt, csv, xml}

Other systems, WordPress for example, store uploads in a folder structure split by year and month.

Are there any benefits to doing it one way or the other? Are there other structures not mentioned here that are worth considering? I'm interested in both code development and performance.


Just to mention, I'm using PHP and MySQL on the backend. For each file or image uploaded, we generate a hash to name the file on the server, so an image called cute-dog.jpg would be stored as something like 123456789abc.jpg. In the database we then store the original name, the hashed name, and the key and path used to remove the image when necessary.
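To make the naming scheme concrete, here is a minimal sketch in Python rather than PHP. It assumes the hash is taken over the file content (SHA-256, truncated to 12 hex characters) and that "the key" is a random token; the question states neither, and the function and column names are hypothetical.

```python
import hashlib
import os
import secrets


def stored_name(original_name: str, content: bytes) -> str:
    """Derive the on-disk name from the file content, keeping the extension."""
    ext = os.path.splitext(original_name)[1].lower()
    # Assumption: a truncated SHA-256 of the content, not of the name.
    digest = hashlib.sha256(content).hexdigest()[:12]
    return digest + ext


def build_record(original_name: str, content: bytes, folder: str) -> dict:
    """Build the row to persist alongside the file (column names are assumed)."""
    name = stored_name(original_name, content)
    return {
        "original_name": original_name,
        "hashed_name": name,
        "path": f"{folder}/{name}",
        # Assumption: the "key" is a random token used to authorize removal.
        "delete_key": secrets.token_hex(16),
    }
```

Hashing the content means two identical uploads map to the same stored name, which may or may not be what you want; hashing a random value or the name plus a timestamp avoids that.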

3 Answers

5

This is really a question of: What are your requirements?

It's quite likely that the content your team creates comes from a source control system and is assembled as part of a build process, so you probably want to keep it separate from any end-user content.

If you have a requirement to quickly identify all content uploaded by a particular customer, it may make sense to group customer content in directories named for that customer.

If customers have content that is either public (accessible without authentication) or private (requires some security checks), you may choose to group the two separately as part of a defense-in-depth strategy. This may also make sense if you plan to put a CDN in front of the public content or set different cache-control headers for it.

Do you have any SEO requirements for publicly accessible content, i.e. should the files have specific, meaningful file names rather than hashes?

How are you going to scale your application beyond a single server? Specifically, how is the user content going to be shared or replicated between servers? Using directory names that include upload dates may be a simple way to identify content that needs to be replicated (or rather, to ignore all directories older than a set age).
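As a rough illustration of the date-based idea, the sketch below builds year/month directory names and lists the ones recent enough to replicate. The `root/YYYY/MM` layout and both function names are assumptions made for the example, not something the question specifies.

```python
from datetime import date


def dated_dir(root: str, day: date) -> str:
    """Directory for uploads made on `day`, laid out as root/YYYY/MM."""
    return f"{root}/{day.year:04d}/{day.month:02d}"


def dirs_to_replicate(root: str, today: date, months_back: int) -> list[str]:
    """Year/month directories from this month back to `months_back` months ago."""
    dirs = []
    year, month = today.year, today.month
    for _ in range(months_back + 1):
        dirs.append(f"{root}/{year:04d}/{month:02d}")
        month -= 1
        if month == 0:  # step back across a year boundary
            year, month = year - 1, 12
    return dirs
```

A replication job could then sync only the directories returned by `dirs_to_replicate` and skip everything older.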

Another option would be not to store user uploaded content on the server at all, instead use a cloud storage option like AWS S3.

Finally, back in the day we used to have problems when a single directory held a huge number of files. You should verify what happens on your OS and filesystem when you store large numbers of files in a single directory. If that's a problem, you may choose to introduce additional structure so that no directory fills up with too many files. I can't predict what sort of issues you might encounter, but some things to check might be:

  • Can you still write new files?
  • Can you still read/list all the files?
  • Does a large directory have any impact on performance?
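One way to run these checks is a small probe script. This is only a sketch: the function name, the probe file name and the returned fields are made up for the example, and the timings it reports only mean something when compared across directories of different sizes on your own filesystem.

```python
import os
import time


def probe_directory(path: str) -> dict:
    """Time a full listing of `path` and a small write into it."""
    start = time.perf_counter()
    with os.scandir(path) as entries:
        count = sum(1 for _ in entries)
    list_seconds = time.perf_counter() - start

    probe = os.path.join(path, ".probe.tmp")  # hypothetical scratch file name
    start = time.perf_counter()
    try:
        with open(probe, "wb") as fh:
            fh.write(b"x")
        writable = True
    except OSError:
        writable = False
    finally:
        if os.path.exists(probe):
            os.remove(probe)
    write_seconds = time.perf_counter() - start

    return {
        "entries": count,
        "list_seconds": list_seconds,
        "writable": writable,
        "write_seconds": write_seconds,
    }
```

Running it against a test directory filled with a realistic number of dummy files should show whether listing or writing degrades before you commit to a flat layout.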
1

There are benefits, yes:

  • Store the files' metadata in a database to decouple searching for files from the operating system (OS), that is, to search for files without running OS-specific commands.

  • Store files by file type to support scaling, for example by moving media files (video, audio and images) to media servers.
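As a sketch of the first point, the query below finds files by type from metadata alone, without touching the filesystem. It uses SQLite in memory as a stand-in for the MySQL database, and the table, columns and sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the real MySQL database
conn.execute(
    "CREATE TABLE uploads ("
    "original_name TEXT, hashed_name TEXT, mime_type TEXT, path TEXT)"
)
conn.executemany(
    "INSERT INTO uploads VALUES (?, ?, ?, ?)",
    [
        ("cute-dog.jpg", "123456789abc.jpg", "image/jpeg", "gallery/123456789abc.jpg"),
        ("report.pdf", "abcdef012345.pdf", "application/pdf", "files/abcdef012345.pdf"),
    ],
)


def find_by_type(db: sqlite3.Connection, mime_prefix: str) -> list[str]:
    """Return stored paths whose MIME type starts with `mime_prefix`."""
    rows = db.execute(
        "SELECT path FROM uploads WHERE mime_type LIKE ? ORDER BY path",
        (mime_prefix + "%",),
    )
    return [row[0] for row in rows]
```

Because the lookup goes through the `path` column, moving all images to a media server later only requires updating the stored paths, not the code that searches for them.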

When requirements are lacking, the folder structure could follow the SOLID principles, which, although they are object-oriented programming principles, can also be applied to build a resilient setup. Other alternatives would be to use document-based databases or content repositories.

0

"stores it in a folder structure split by year and month."

The only thing to be aware of is that in most systems, the more items you have in a folder the more performance deteriorates. This probably won't be noticeable until you have tens of thousands of items, but if you put a million items in an NTFS folder you will definitely notice.

Hence scalable systems tend to be multi-level.
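A common way to get a multi-level layout with hashed file names is to use the first characters of the name as nested directory names. The sketch below assumes names like the 123456789abc.jpg example from the question; the function name and its defaults are hypothetical.

```python
import os


def sharded_path(root: str, hashed_name: str, levels: int = 2, width: int = 2) -> str:
    """Place a file under nested directories taken from the start of its name."""
    stem = os.path.splitext(hashed_name)[0]
    if len(stem) < levels * width:
        raise ValueError("name too short for the requested sharding")
    # Consecutive slices of the name become the directory levels.
    parts = [stem[i * width:(i + 1) * width] for i in range(levels)]
    return "/".join([root, *parts, hashed_name])
```

With hexadecimal names, two characters per level gives at most 256 subdirectories per level, so two levels spread the files over up to 65,536 leaf directories.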
