
This is not just for local drive transfers. I've even noticed that when pushing files to or pulling them from servers, or simply uploading to a service like Google Drive, it is usually significantly faster to zip many small files into a single archive (even using just the "store" ZIP option), transfer it, and then unzip it.

If this is the case, why don't most services (like rsync, Windows/macOS file transfer, etc.) do this automatically? For example, if the user is trying to transfer more than 1,000 very small files, automatically "zip up" the files to a temporary location, transfer the archive, and then unzip it on the other side? Or maybe do this "on the fly" so that it doesn't take up unnecessary disk space?
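
Purely as an illustration of what such "on the fly" packing could look like, here is a minimal Python sketch that streams the files as one tar archive over an already-open TCP connection, so no temporary archive ever touches the disk. The host, port and directory names are made up, and no existing tool is claimed to work this way.

    import socket
    import tarfile
    from pathlib import Path

    HOST, PORT = "receiver.example.com", 9000   # hypothetical receiving machine
    SOURCE = Path("many_small_files")           # hypothetical directory of small files

    # Sender: stream every file as a single tar archive directly over the socket.
    # mode="w|" writes a non-seekable tar stream, so nothing is staged on disk.
    with socket.create_connection((HOST, PORT)) as sock:
        with sock.makefile("wb") as stream, tarfile.open(fileobj=stream, mode="w|") as tar:
            for path in SOURCE.rglob("*"):
                if path.is_file():
                    tar.add(path, arcname=str(path.relative_to(SOURCE)))

    # Receiver (on the other machine): unpack the stream as it arrives.
    # with socket.create_server(("", PORT)) as server:
    #     conn, _ = server.accept()
    #     with conn.makefile("rb") as stream, tarfile.open(fileobj=stream, mode="r|") as tar:
    #         tar.extractall("restored_files")

The sender ends up doing one long sequential write instead of a round trip per file, which is the saving discussed in the answer and comments below.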

  • Can you clarify why you're surprised that a network transfer of 1 million small files is slow?
    – MonkeyZeus
    Commented Sep 15, 2022 at 18:49
  • Overheads and latency. Reading the files on one side and writing them on the other will take the same time in either case, but the network introduces latency for every file request, which goes "I'm sending you a file", "Cool brah, send it", "Here it is...". The time cost of zipping all the files into one single file is far lower than the collective network latency. squillman's answer is pretty much bang on. Both sides have to read all the files anyway, so if they can compress or absorb all the time spent talking to each other, then it will be faster.
    – Mokubai
    Commented Sep 15, 2022 at 18:55
  • Actually, transferring many small files is not necessarily always slower than transferring a single zipped file. It can be as fast (or possibly faster in some cases) if the transfers are parallelized. I am not aware of transfer protocols with built-in parallelization, but at least some front-end tools do that, for instance FileZilla, which can run up to 10 simultaneous transfer tasks. While one task is talking to the remote machine (overheads etc.), another task is effectively transferring data, so the whole bandwidth is used.
    – PierU
    Commented Sep 15, 2022 at 20:14
  • @Mokubai with reading/writing multiple small files, the file system introduces overhead too (it needs to read/write the directory for each file). If you just copy 1 million files of 1 KB each on disk, within the same computer, without any network transfer, it will be significantly slower than copying a single 1 GB file. This is particularly visible with slow media like USB drives.
    – raj
    Commented Sep 15, 2022 at 23:11
  • @raj yes, but the overhead there is orders of magnitude lower than the latency involved in network communications. The effect is similar, but when doing it locally is 10, 100, 1,000 or even 10,000 times faster, the difference shows up more and more in the final copy speed. Networking is slower than any modern CPU doing cache lookups, memory copies, compression and other work held entirely within a local system. A 1 millisecond network latency is far slower than the 100 nanoseconds it takes to compress a 1 KB file and add it to an archive buffer in memory. (A rough back-of-the-envelope calculation follows these comments.)
    – Mokubai
    Commented Sep 15, 2022 at 23:17
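
To put rough, purely illustrative numbers on the latency argument in the comments above (the per-file round-trip time, file size and link speed are assumptions, not measurements):

    files = 1_000_000               # number of small files
    per_file_round_trip_s = 0.001   # assumed 1 ms of protocol chatter per file
    file_size_kb = 1                # assumed size of each file
    bandwidth_mb_s = 100            # assumed link throughput in MB/s

    latency_cost_s = files * per_file_round_trip_s               # ~1000 s of pure waiting
    data_cost_s = files * file_size_kb / 1024 / bandwidth_mb_s   # ~10 s of actual data movement

    print(f"per-file overhead: {latency_cost_s / 60:.0f} minutes")
    print(f"raw data transfer: {data_cost_s:.0f} seconds")

Under those assumptions the per-file chatter alone costs roughly 17 minutes, while the data itself would move in about 10 seconds, which is why bundling the files first wins so decisively.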

1 Answer


When each file is sent individually, the overhead of a separate network connection is incurred for every file, which adds considerably to the overall transfer time. If the connections are encrypted, even more so.

With a single zip file, only one file is transferred, and only one file's worth of network overhead is incurred. The saving becomes more significant as the number of files increases.

Why it's not implemented isn't really something we can answer; that's a developer / vendor question. One reason could be that it's not guaranteed that the same zip technology is available on the remote side.
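
For illustration, the question's "store" ZIP approach can be reproduced with Python's standard zipfile module. This is a minimal sketch assuming a made-up directory and archive name, not the behaviour of any particular transfer tool; even without compression, packing collapses thousands of per-file transfers into a single one:

    import zipfile
    from pathlib import Path

    def bundle(directory: str, archive: str) -> None:
        """Pack every file under `directory` into `archive` without compression."""
        source = Path(directory)
        with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_STORED) as zf:
            for path in source.rglob("*"):
                if path.is_file():
                    zf.write(path, arcname=str(path.relative_to(source)))

    bundle("many_small_files", "bundle.zip")   # then transfer bundle.zip as one upload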

  • Packing the files could be implemented within the transfer protocol, even without the need for a temporary file.
    – PierU
    Commented Sep 15, 2022 at 19:37
  • Sure it could. With existing versions of the protocols, though, there is no option for that. It would be an excellent feature to add, IMHO. Updates to existing transfer protocols could implement this, doing the packing internally.
    – squillman
    Commented Sep 15, 2022 at 19:47
  • I'm not sure, but rsync may work (more or less) like that.
    – PierU
    Commented Sep 15, 2022 at 19:54
  • Could be, I'm not sure either
    – squillman
    Commented Sep 15, 2022 at 19:59
  • Another (obvious) reason why a zipped transfer can be faster is compression: the compressed data can be much smaller in some cases (a small comparison follows these comments).
    – PierU
    Commented Sep 15, 2022 at 20:00
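
A tiny, contrived comparison of the two ZIP methods, just to illustrate the compression point above (the payload is made up and deliberately repetitive, so the real-world ratio will vary a lot):

    import io
    import zipfile

    payload = b"the same line of text repeated\n" * 10_000   # highly compressible

    for method, label in [(zipfile.ZIP_STORED, "stored"), (zipfile.ZIP_DEFLATED, "deflated")]:
        buffer = io.BytesIO()
        with zipfile.ZipFile(buffer, "w", compression=method) as zf:
            zf.writestr("data.txt", payload)
        print(f"{label}: {buffer.getbuffer().nbytes} bytes")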

