7

I have 16 subdirectories, each containing somewhere between 1m and 1.5m files (roughly 18m files in total), but I need all the files to be in a single directory. Each file is tiny (35-100 bytes). The total combined size of the files is relatively small - around 600MB - but it appears to be the sheer number of them that's causing the issues.

So far I've tried:

Windows move: Didn't even get started. It said it would take 'about a day' to calculate the move. Gave up after 2 hours of calculating.

DOS move: This works great for the first 500-600k files (moving around 10k files per second), but starts to slow down noticeably as it drags towards the million mark, doing about 100 files every 2 seconds.

7Zip: I've read suggestions that zipping up the entire folder and then extracting it in the destination would be WAY quicker; however, using the GUI just crashed Explorer after a few minutes, and using the command line was incredibly slow (100 files every few seconds).

DOS robocopy: Having already moved ~1m files yesterday, I ran robocopy src_folder dest_folder *.log just to shift the last of what was in the first directory. It took 27 minutes to move ~12k files.
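
For reference, the command-line attempts looked roughly like this (the paths and archive name below are placeholders rather than the real ones):

# DOS move - fast at first, then crawls once the destination holds ~1m files
move C:\src\subdir01\* C:\dest\

# 7-Zip command line - store-only (no compression) archive, then extract at the destination
7z a -tzip -mx=0 C:\temp\batch.zip C:\src\subdir01\*
7z x C:\temp\batch.zip -oC:\dest

# robocopy, as per the last attempt above
robocopy C:\src\subdir01 C:\dest *.log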

No matter what method I choose, it seems that the number of files in the destination folder is what causes the issue. If there are more than a million files in the destination, the move/copy slows to an absolute crawl regardless of the method.

Any ideas on how to achieve this that won't take days/weeks? For reference, it's all on a single SSD on a single machine: 64-bit, 16GB RAM, 8 threads.

10
  • I'd put money on this being a combination of two factors: NTFS being dog-slow at anything & your processes trying to hold the entire move in RAM, hence themselves going into paging before you get very far. You might need more of an iterative process to combat the 2nd.
    – Tetsujin
    Commented Jul 6, 2021 at 9:55
Just out of idle curiosity I tried this on macOS with an APFS SSD. I could only be bothered waiting for it to generate 100,000 small files, so a much smaller test. That took about 15 mins using a looped mkfile. For the move itself I had to use Finder as bash was going to hit the maxfiles limit, so I had to drag & drop. It took about 5 minutes to enumerate the move before it started, but then completed it in about 1 minute.
    – Tetsujin
    Commented Jul 6, 2021 at 10:45
  • 1
    @Tetsujin Yeah I think it's the enumeration that's killing it. I just tried the command-line 7z - took around 50 minutes to zip up ~1m files. Moving that one file (which was only 97mb) took less than a second. Currently unpacking that in the destination folder to see how long it takes.
    – indextwo
    Commented Jul 6, 2021 at 11:12
  • 3
    @indextwo When moving files on the same partition, the files don't actually move AFAIK, their locations are simply updated in the MFT [Master File Table], so the slow down may be due to the temperature of the drive - have you checked if it's quite hot when progress begins to slow? If so, you may want to use a script and pause/sleep for a specified amount of time after doing 500K files. (FYI: moving/copying files is always faster via command line in Windows - leave the Windows Shell [explorer.exe] out of it)
    – JW0914
    Commented Jul 6, 2021 at 11:44
  • 1
    NTFS behaves terribly when the number of files in a folder reaches the hundreds of thousands or millions, so moving all of them into a single directory is even worse. You may want a script that pauses between batches (a rough sketch follows these comments).
    – phuclv
    Commented Jul 6, 2021 at 13:57
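
A rough sketch of the batch-and-pause approach suggested in the comments above (the batch size, pause length, and paths are arbitrary placeholders, not a tested recipe):

# Move files in batches, pausing between batches to let the drive cool down
# and the MFT updates settle. All numbers and paths here are illustrative only.
$src  = "C:\src\subdir01"
$dest = "C:\dest"
$batchSize = 500000

$moved = 0
Get-ChildItem -Path $src -File | ForEach-Object {
    Move-Item -LiteralPath $_.FullName -Destination $dest
    $moved++
    if ($moved % $batchSize -eq 0) {
        Write-Host "Moved $moved files; pausing for 10 minutes..."
        Start-Sleep -Seconds 600
    }
}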

2 Answers

2

This PowerShell script, which has attracted plenty of positive feedback, wraps Robocopy: it launches a background robocopy job for each subdirectory of the source, throttled to $max_jobs jobs at a time. Set the log path and job count to suit, answer the source/destination prompts when it runs, and you're good to go:

$max_jobs = 10          # maximum number of robocopy jobs to run in parallel
$tstart = get-date      # start time, used to report the total duration at the end
$log = "C:\Robo\Logs"   # folder where the per-job robocopy logs are written

$src = Read-Host -Prompt 'Source path'
  if(! ($src.EndsWith("\") )){$src=$src + "\"}

$dest = Read-Host -Prompt 'Destination path'
  if(! ($dest.EndsWith("\") )){$dest=$dest + "\"}

if((Test-Path -Path $src ))
{
  if(!(Test-Path -Path $log )){New-Item -ItemType directory -Path $log}
  if((Test-Path -Path $dest)){
    robocopy $src $dest      # copy any loose files in the root of the source (no /E, so subfolders are skipped)
    $files = ls $src         # enumerate the source's contents; each subdirectory becomes one parallel job

    $files | %{
      # Each item gets its own background robocopy job, writing to its own log file
      $ScriptBlock = {
        param($name, $src, $dest, $log)
        $log += "\$name-$(get-date -f yyyy-MM-dd-mm-ss).log"
        robocopy $src$name $dest$name /E /nfl /np /mt:16 /ndl > $log
        Write-Host $src$name " completed"
      }

      # Throttle: wait until fewer than $max_jobs background jobs are running
      $j = Get-Job -State "Running"
      while ($j.count -ge $max_jobs) 
      {
       Start-Sleep -Milliseconds 500
       $j = Get-Job -State "Running"
      }
      # Collect output from any finished jobs, clear them, then start a job for this item
      Get-Job -State "Completed" | Receive-Job
      Remove-Job -State "Completed"
      Start-Job $ScriptBlock -ArgumentList $_,$src,$dest,$log
    }

    # Wait for the remaining jobs to finish, then clean up
    While (Get-Job -State "Running") { Start-Sleep 2 }
    Remove-Job -State "Completed" 
    Get-Job | Write-host

    $tend = get-date

    Cls
    Echo 'Completed copy'
    Echo "From: $src"
    Echo "To: $dest"
    new-timespan -start $tstart -end $tend

  } else {echo 'invalid Destination'}
} else {echo 'invalid Source'}
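
To try it, save the script to a file - I'll call it parallel-robocopy.ps1 here, but any name will do - then run it from a PowerShell prompt, allowing script execution for that session:

# Allow locally saved scripts to run for this session only, then launch the script;
# it prompts for the source and destination paths and reports the total time when done.
Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass
.\parallel-robocopy.ps1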
2
  • Please quote the essential parts of the answer from the reference link(s), as the answer can become invalid if the linked page(s) change.
    – DavidPostill
    Commented Jul 6, 2021 at 10:10
  • 1
    I initiated this script a little over an hour ago, and it still hasn't actually started copying yet. It does look like it's just a recursive shell for robocopy, which I've already tried direct from the command line.
    – indextwo
    Commented Jul 6, 2021 at 11:21
0

Use DSynchronize, as it's free! The reason it's a good option for you is that it doesn't do what Windows Explorer does, counting the number and size of every file in the queue before copying. It just starts copying straight away.

However, you can tick a checkbox so that it counts the disk space first. You can also choose to store a backup of every file that is deleted or overwritten, and you can use preview mode to test how the synchronisation will run before you do it for real.

Also keep in mind that it doesn't always copy files in alphanumeric order, so if the copying or synchronising suddenly stops halfway through, you might have to start again from the beginning.

I find the old version, 2.30.1, easier to use and faster than the newer version (2.41.1 at the time of writing).

dsynchronize 2.30.1

dsynchronize 2.41.1

3
  • 2
    Please read How do I recommend software for some tips as to how you should go about recommending software. You should provide at least a link, some additional information about the software itself, and how it can be used to solve the problem in the question.
    – DavidPostill
    Commented Jul 6, 2021 at 10:11
  • Oooh… the 90's rang & want their web design back ;))
    – Tetsujin
    Commented Jul 6, 2021 at 10:13
  • @DavidPostill I've updated my answer.
    – desbest
    Commented Jul 14, 2021 at 23:47
