I need to copy some data that includes 900,000 tiny files totaling around 30 gigabytes to a Windows computer. However, it needs to be copied and set up in under an hour and a half, and it can take 5+ hours to copy off a USB hard drive, mostly due to the crazy number of files. Is there a better/faster way to deal with this, such as doing some sort of block copying? Thanks

    That's going to be a very aggressive target for any file-based copy out of a single, non-SSD disk. Just enumerating all the files and dealing with the filesystem entries and metadata for that many files takes a significant amount of time.
    Are you able to just remove the hard drive and place it in the destination machine? That would give the fastest speed, since the transfer would go over the motherboard's internal drive interface instead of USB.
With a large number of files, it is best to create a tar archive so that you have fewer files to deal with. If you are using a USB 2.0 external hard drive, you should look at using USB 3.0 or eSATA, or use a fast local network.

What is your source OS? If both of your operating systems were Linux, you could pipe the files through tar, gzip, and ssh to the target machine. You could install Cygwin on Windows to get this type of functionality there as well.
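The bundling step can be sketched in Python as well, which runs on both Windows and Linux. This is only an illustration of the "one archive instead of many files" idea: the function name `bundle_tree` and its arguments are made up for this example, and it writes an uncompressed tar, so it does not cover the gzip or ssh parts. The source disk still has to read every file once.

```python
import os
import tarfile


def bundle_tree(src_dir, archive_path):
    """Pack every file under src_dir into one uncompressed tar archive.

    Returns the number of regular files added.
    """
    count = 0
    # Mode "w" writes a plain tar with no compression.
    with tarfile.open(archive_path, "w") as tar:
        for root, _dirs, files in os.walk(src_dir):
            for name in sorted(files):
                full = os.path.join(root, name)
                # Store paths relative to src_dir so the archive is relocatable.
                tar.add(full, arcname=os.path.relpath(full, src_dir))
                count += 1
    return count
```

Keep `archive_path` outside `src_dir`, otherwise the walk may pick up the archive while it is being written.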

"Using Tar and SSH to improve SCP Speeds" post describes the commands needed to send tar contents over ssh.

  • tar will still read all the files one by one. Commented Dec 5, 2011 at 23:19
  • Yes, but you would do it only once, and other commands like scp would not be delayed by dealing with individual files. If the files can be kept in a tar archive on the target machine, then you also save time by writing one file instead of many.
  • I assume that the USB connection is the bottleneck. So taking a raw image of the drive as @arcyqwerty suggests seems like the best solution to me. I'd be curious to see some actual results though, maybe I'm completely wrong! Commented Dec 6, 2011 at 0:47
    @TomA - In my experience, when dealing with lots of tiny files, the hard drive is the bottleneck. Each file read requires a head seek to the allocation bitmap, and then another head seek to the actual file contents. As a result, the hard drive spends the majority of its time seeking, and very little time actually reading files.
    As an example, with 1K files, I get ~1-5 MBps over my gigabit LAN. With large files (gigabytes each), I get ~80-90 MBps. It's all about the sequential reads.
If the hard drive can be moved from the USB interface onto SATA/ATA, I would install it in the destination computer. You'll get much faster transfer speeds, as others have noted. For copying, assuming you're on Windows, I would do a simple ROBOCOPY. It's about as fast as you can really hope for, though there are other alternatives.

ROBOCOPY "source" "destination" /E /B /MT

I'd try to avoid compressing all these files, though; there is a good chance the time to compress and move would exceed the time to simply move the files.

*Added the /MT option for robocopy. It can greatly speed up transfers when you're in a multi-threaded environment.
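To illustrate the idea behind /MT (several file copies in flight at once), here is a rough Python sketch. It is not how Robocopy is implemented; `parallel_copy` and the worker count of 8 are assumptions made for this example. On a single spinning disk, more threads may just mean more seeking, so it is worth measuring on a sample before relying on it.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor


def parallel_copy(src_dir, dst_dir, workers=8):
    """Copy the tree under src_dir to dst_dir using a pool of worker threads.

    Returns the number of files copied.
    """
    jobs = []
    for root, _dirs, files in os.walk(src_dir):
        rel = os.path.relpath(root, src_dir)
        target_root = os.path.normpath(os.path.join(dst_dir, rel))
        # Create directories up front, in the main thread, to avoid races.
        os.makedirs(target_root, exist_ok=True)
        for name in files:
            jobs.append((os.path.join(root, name), os.path.join(target_root, name)))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() drains the iterator so any copy error is raised here.
        list(pool.map(lambda job: shutil.copy2(*job), jobs))
    return len(jobs)
```

Running it with `workers=1` and then with a higher count on a few thousand of the real files should show whether the drive and enclosure benefit from parallel requests.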

  • Will /MT help on local single disk transfers, or cause massive disk thrashing? It's common for copying large shares between servers, but they often have both added latency due to going over the network and improved IOPS from RAID arrays.
  • I can't really comment too heavily on this aspect. I use robocopy to back up all of my VS projects every night, though it's only around 10 GB worth. It only takes a matter of moments, since files that already exist aren't copied. I did find these Robocopy benchmarks a few months ago: demartek.com/Reports_Free/… Commented Dec 6, 2011 at 14:51
  • In my experience, if the USB-to-SATA adapter in your drive enclosure supports native command queuing, then I get a significant performance boost from the /MT switch.
You could try taking an image of the entire folder/drive.

On Linux systems you can use dd to get a raw copy of the filesystem and copy it as a single large file.

To extract the image onto Windows, you may need to install Cygwin or a program that is able to process dd images.
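For a sense of what a dd-style copy does, here is a minimal Python sketch that reads a source in large sequential blocks and writes them straight to an image file. The name `copy_image` and the 4 MiB block size are assumptions for illustration; this is not a replacement for dd.

```python
def copy_image(src_path, dst_path, block_size=4 * 1024 * 1024):
    """Copy src_path to dst_path in large sequential blocks, dd-style.

    Returns the number of bytes copied.
    """
    total = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break
            dst.write(block)
            total += len(block)
    return total
```

On Linux this could be pointed at a partition, for example `copy_image("/dev/sdX1", "usb_drive.img")`, where both the device name and the image name are placeholders and reading the device usually requires root. Because the filesystem is never walked, the read is sequential regardless of how many files it holds.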

  • That's the ticket. Commented Dec 5, 2011 at 23:20

Install the disk where the source files reside in the computer you're setting up and transfer disk to disk. Forget USB. Even transferring over the wire (network the machines together) would be faster than USB (assuming USB 2.0 and a 1 Gbps NIC). If this is a recurring event, look into replication instead.


Use rsync with the -z option.

-z,         --compress              compress file data during the transfer
            --compress-level=NUM    explicitly set compression level
            --skip-compress=LIST    skip compressing files with suffix in LIST

This will increase the speed of transfers over the network, so it is most probably not useful in your case.

After some more reading, I realized that we should not use the '-z' flag when copying data from one local hard drive to another, since it increases overhead. Thanks to the comment from @FakeName.

    rsync does not compress for local transfers (or over a LAN, I believe). Also, this would have no benefit, since the real issue is the time taken to read the files, not send them over the wire (as each file read will take two seek operations at minimum).
  • @FakeName +1 you are right, after some reading I got your point. I have updated my answer. Commented Dec 6, 2011 at 4:49

You are fast approaching the limitations of your hard drive. In fact, with current commodity drives, it is impossible to meet your transfer time with a per-file copy operation.

Assuming each file requires one hard-drive seek and the seek time is 7 ms (which is a bit idealized; realistically, each file will require two seeks unless the volume bitmap is cached in RAM), at best you will manage ~142 files/sec ($\frac{1000}{7} = 142.8\ldots$).

With the OP's specs (30 GB, 900,000 files), that is ~33 KB per file ($\frac{30{,}000{,}000\ \text{KB}}{900{,}000} \approx 33.3$). 33 KB × 142 files/sec ≈ 4.68 MBps.

The minimum time to transfer 30 GB at 5 MBps is ~1 hour, 40 minutes ($\frac{30{,}000}{5} = 6{,}000$ seconds; $\frac{6{,}000}{60} = 100$ minutes, or 1:40 hours).

Therefore, it is impossible to achieve a speed better than ~5 MBps, and that is with an ideal drive and only one seek per file (realistically, it would be two). You are limited entirely by disk performance.
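The same back-of-the-envelope estimate can be written as a small Python sketch, which makes it easy to try other seek times or a second seek per file. The function name and its parameters are made up for this example, and the model counts only seek time, not the time spent actually reading data.

```python
def estimate_copy_minutes(total_bytes, n_files, seek_ms=7.0, seeks_per_file=1):
    """Estimate a seek-bound per-file copy.

    Returns (files_per_sec, mb_per_sec, minutes), ignoring the time spent
    actually transferring data once the head is in position.
    """
    seconds_per_file = seeks_per_file * seek_ms / 1000.0
    files_per_sec = 1.0 / seconds_per_file
    avg_file_bytes = total_bytes / n_files
    mb_per_sec = files_per_sec * avg_file_bytes / 1e6
    minutes = n_files * seconds_per_file / 60.0
    return files_per_sec, mb_per_sec, minutes


# The figures from the question: 30 GB in 900,000 files.
one_seek = estimate_copy_minutes(30e9, 900_000)
two_seeks = estimate_copy_minutes(30e9, 900_000, seeks_per_file=2)
print("1 seek/file:  %.0f files/s, %.2f MB/s, %.0f min" % one_seek)
print("2 seeks/file: %.0f files/s, %.2f MB/s, %.0f min" % two_seeks)
```

Without rounding the throughput up to 5 MBps, one seek per file works out to about 105 minutes, and two seeks per file to about 210 minutes. Both are over the 90-minute target.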

The only way to get better performance than this is to copy the entire filesystem and partition sequentially. dd can do this on Linux.

What are you trying to do?

Try 7-Zip to archive the files into a single file. If possible, use WLAN with an ad hoc connection to a notebook.

    or better yet, a direct ethernet connection, preferably gig-e
Related to @arcyqwerty's answer, you might be able to compress it and simultaneously turn it into one large file. This will speed up the process somewhat.

It might also be worth using a program like TeraCopy, as it is usually faster than the default Windows copy. You should test under circumstances similar to yours to check, though.


I had a similar case. After I turned off the antivirus, the copy speed went from 3 MB/s to 12 MB/s.

