
I am generating several million JPG images with a Python script for a deep learning project and storing them on a 14 TB hard drive on my Mac. The image folder is projected to reach about 1.2 TB in total.

I notice that as the script proceeds (once it reaches the 400-500k range), it starts to slow down and the images don't immediately show up on the HD. For example, I stopped the script at 8am this morning, but the most recent image showing on the HD was saved at 03:46. If I relaunch Finder (it's a Mac), more of the most recently saved images appear, but not all of them.

Now, when I open the HD in Finder and look in the folder, nothing is shown, even though I can see all the images if I look at the folder in Terminal.
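A quick way to confirm, independently of Finder, that the files really are on disk is to count them directly. The sketch below is a minimal illustration assuming the images sit directly inside a single folder and use a `.jpg` extension; the function name `count_jpgs` is hypothetical.

```python
import os


def count_jpgs(folder: str) -> int:
    """Count .jpg files directly inside `folder` (non-recursive).

    os.scandir streams directory entries instead of building a full list,
    so memory use stays flat even with hundreds of thousands of files.
    """
    count = 0
    with os.scandir(folder) as entries:
        for entry in entries:
            # is_file() can usually answer from the type info returned while
            # reading the directory, avoiding an extra stat() per file.
            if entry.is_file() and entry.name.lower().endswith(".jpg"):
                count += 1
    return count
```

Calling it as `print(count_jpgs("/Volumes/MyDrive/images"))` (a hypothetical path) should report the same number of images that Terminal shows, whatever Finder is currently displaying.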

To me this looks like a bandwidth issue: the HD can't keep up with saving the images and is still writing them well after I stopped the script, i.e., it is trying to catch up.

I am no expert with this sort of thing, but I wonder if someone could help me:

  1. Pinpoint the issue (it may be simple...).

  2. Give me some advice on the best way to fix this problem. Is there a better way to format the HD so that one can store millions of small files?

Lastly, I should add that I have excluded the HD from Spotlight on my Mac (to prevent it from trying to index all the files), so I don't think indexing is the issue.

  • As a general rule of thumb: don't stuff a million files into a single folder. Break them up, as that will give most tools an easier time. Your Mac is likely trying to read the properties of those 500k files and maybe even generate previews, which takes quite a bit of time.
    – Seth
    Commented Jan 24, 2020 at 14:25
  • The HD is doing fine; otherwise you wouldn't see the files in Terminal. Any "bandwidth" issue here is with Finder, which may simply be overwhelmed by the number of files involved.
    – nohillside
    Commented Jan 24, 2020 at 15:39
  • Try putting them in 26 folders, one for each letter of the alphabet that the filenames start with. If they all start with the same letter, change the naming somehow to distribute the files better across the folders.
    – dandavis
    Commented Jan 24, 2020 at 17:32
  • All of these comments point to the same answer, which is correct: I have rewritten the code to split the files across multiple folders, and it runs perfectly. If one of you wants to take the honours...
    – GhostRider
    Commented Jan 25, 2020 at 11:46
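The fix the commenters describe, splitting the output across many subfolders, could look roughly like the Python sketch below. It is an illustration only, not GhostRider's actual rewrite: the constant `FILES_PER_FOLDER`, the zero-padded naming scheme, and the helpers `shard_by_index`, `shard_by_hash` and `ensure_parent` are all hypothetical. The hash variant is one way to act on dandavis's "change it somehow" remark when filenames share a first letter: the first two hex digits of an MD5 digest spread names over 256 buckets regardless of how similar the names are.

```python
import hashlib
from pathlib import Path

# Hypothetical cap on files per folder; choose any value that keeps
# each folder small enough for Finder to list comfortably.
FILES_PER_FOLDER = 10_000


def shard_by_index(root: Path, index: int, ext: str = ".jpg") -> Path:
    """Map a sequential image index to root/<bucket>/<index><ext>.

    Indices 0..9999 land in 00000/, 10000..19999 in 00001/, and so on.
    """
    bucket = f"{index // FILES_PER_FOLDER:05d}"
    return root / bucket / f"{index:08d}{ext}"


def shard_by_hash(root: Path, filename: str) -> Path:
    """Map an arbitrary filename to root/<2 hex chars>/<filename>.

    MD5 is used only to spread names across 256 buckets, not for security.
    """
    bucket = hashlib.md5(filename.encode("utf-8")).hexdigest()[:2]
    return root / bucket / filename


def ensure_parent(path: Path) -> Path:
    """Create the bucket folder if it doesn't exist and return the path."""
    path.parent.mkdir(parents=True, exist_ok=True)
    return path
```

Inside the generation loop, something like `img.save(ensure_parent(shard_by_index(root, i)))` would then write each image into its bucket, assuming `img` is a Pillow image and `root` is a `Path` on the external drive. With several million images at 10,000 per folder, that works out to a few hundred folders, each small enough for Finder to list.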
