
For recursively copying a directory, packing it up with tar and piping the output to another tar to unpack seems to be much faster than using cp -r (or cp -a).
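The pipeline I mean is roughly of this form (src and dst are placeholder paths; GNU tar syntax, other tar implementations may differ slightly):

    mkdir -p dst
    tar -C src -cf - . | tar -C dst -xf -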

Why is this? And why can't cp be made faster by doing it the same way under the hood?

Edit: I noticed this difference when trying to copy a huge directory structure containing tens of thousands of files and folders, deeply nested, but totalling only about 50MB. Not sure if that's relevant.


2 Answers


cp does open-read-close, then open-write-close, in a loop over all the files. So reading from one place and writing to another are fully interleaved. tar | tar does the reading and the writing in separate processes, and in addition tar uses multiple threads to read (and write) several files 'at once', effectively allowing the disk controller to fetch, buffer and store many blocks of data at a time. All in all, tar lets each component work efficiently, while cp breaks the problem down into disparate, inefficiently small chunks.
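As a rough illustration (a sketch of the per-file pattern, not cp's actual source; src and dst are placeholder paths):

    # Per-file loop: each file is fully read and written before the next one starts.
    find src -type f | while IFS= read -r f; do
        dest="dst/${f#src/}"
        mkdir -p "$(dirname "$dest")"
        cat "$f" > "$dest"
    done

    # Pipelined copy: one tar only reads, the other only writes,
    # so the two halves of the job can overlap.
    tar -C src -cf - . | tar -C dst -xf -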

  • Can we really say that's true of all cp implementations? How do we know that's true? And why would cp be written in such an inefficient way? Any textbook implementation of a file copy reads a buffer of n bytes at a time, and writes them to disk before reading another n bytes. But you're saying cp always reads the whole file before writing the whole copy?
    – LarsH
    Commented Mar 13, 2017 at 2:54
  • "uses multiple threads to read (and write) several files 'at once'" -> that makes no sense. One disk can only read or write one file at once. Plus, the fastest way to read / write is to do it in big, contiguous chunks to take advantage of readahead optimizations and minimize seeking (including solid state drives), so using multiple threads to try to parallelize things only makes the process more inefficient.
    – hmijail
    Commented Sep 22, 2020 at 7:00
  • @hmijailmournsresignees Your assumptions are correct for large files, but not for huge numbers of small ones or for heavily fragmented files. In that case it's optimal to group partial operations on multiple files into these contiguous chunks. Modern disks have huge buffers - up to 256 MB for HDDs - that can be employed to shuffle operations to improve efficiency. There's also NCQ, which can reorder operations to take advantage of the disk's mechanical properties. Disks do read/write concurrently, just not in parallel.
    – gronostaj
    Commented Sep 22, 2020 at 7:31
  • @gronostaj aren't you making my point? It's as simple as "sequential accesses good, random accesses bad". The buffers are there just to compensate for the bottleneck that is just afterwards: the actual storage. Data fragmentation is a problem, which you make even worse by fragmenting the accesses and therefore requiring random accesses. NCQ tries to coalesce those random accesses, but if you do big bulk reads (by avoiding multithreading!) there is nothing to coalesce.
    – hmijail
    Commented Sep 23, 2020 at 0:36
  • @hmijailmournsresignees Increased fragmentation due to concurrent writes is an aspect worth exploring, I think. I suppose a modern FS would try to reduce the impact, but I'm not sure if that's the case and how effective that would be. But for reads I think it's a different story: assuming you're dealing with plenty of small files, they are almost guaranteed to be non-contiguous. NCQ + long read queue should improve the performance. The less concurrent your reads are, the less room for NCQ to work you have.
    – gronostaj
    Commented Sep 23, 2020 at 6:42

Your edit points in the right direction: cp isn't necessarily slower than tar | tar. It depends, for example, on the number and size of the files. For big files a plain cp is best, since it's a simple job of pushing data around. For lots of small files the logistics are different and tar might do a better job. See for example this answer.
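A quick way to check on your own system (a rough sketch; the file count and paths are arbitrary):

    # Create a tree with lots of small files.
    mkdir -p src
    for i in $(seq 1 10000); do echo "data $i" > "src/file$i"; done

    # Time both approaches.
    time cp -a src dst-cp
    mkdir -p dst-tar
    time sh -c 'tar -C src -cf - . | tar -C dst-tar -xf -'

Results depend heavily on the filesystem, the disk and whether the files are already in the page cache, so take any single run with a grain of salt.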

