
I'm currently using the following command to copy a subset of a co-worker's logfiles over to another location for my own records and further analysis.

find . -name '*somestring*' -type f -exec cp -v --update -i {} '//anetworkdrive/logfiles/'  \;

Over time, as the number of files in each location grows, this has been getting slower (obviously), but seems to be slowing down more than I'd expect.

If I run time find . -name '*somestring*' -type f in the source and destination folders, it finds < 1,000 files in each location and that takes about 0.2s (real).

In a scenario where nothing has changed on either end since the last run, I would have thought the above copy command wouldn't take much longer than the find alone. The find returns a list of files in < 1s, and I thought cp --update would then check the modified date on both files (src, dest) very quickly and skip if they match.

However, my full copy command is now taking almost a full minute, making me wonder if it's doing a more detailed comparison than just the modification date, e.g. a full diff or something.

Can someone explain to me why the above command takes so long even when nothing has changed?

And is there a faster way to do this? Would it be faster to pipe the find results to cp?

Thanks.

  • cp can take multiple source definitions. You should absolutely take advantage of that. Also, rsync.
    – Daniel B
    Commented Mar 14, 2018 at 17:52

1 Answer


OK, so based on the comment from Daniel B above, I tested three methods.

I tested these on a local drive to local drive transfer in which find . -name '*somestring*' found 495 files, averaging 5.8MB, and totaling 2.82GB. The first timing result for each method is with the destination directory empty so all 495 files are copied. The second timing result is with the destination already matching the source so no files are copied.

1 - Using exec from the find command:

time find . -name '*somestring*' -type f -exec cp -v --update -i {} -t '../dst/'  \;
real    2m2.037s
real    0m35.043s
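
A likely contributor to the slow no-change run is process start-up: when -exec is terminated with \;, find launches the command once for every matching file (495 cp processes here), whereas terminating it with + passes many file names to a single invocation. The sketch below counts the launches for both forms. The temporary directory and file names are made up for the illustration, and it is not a benchmark.

```shell
#!/bin/sh
# Sketch: count how often find launches a command with `\;` versus `+`.
# The temp directory and the file names are hypothetical.
tmp=$(mktemp -d)
mkdir "$tmp/src"
for i in 1 2 3 4 5; do
    : > "$tmp/src/file_somestring_$i.log"
done

# `\;` runs the command once per matching file: one "run" line per file.
per_file=$(find "$tmp/src" -name '*somestring*' -type f \
    -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' ')

# `+` appends as many file names as fit onto a single command line.
batched=$(find "$tmp/src" -name '*somestring*' -type f \
    -exec sh -c 'echo run' sh {} + | wc -l | tr -d ' ')

echo "per-file invocations: $per_file"
echo "batched invocations:  $batched"
rm -rf "$tmp"
```

With five matching files, the \; form launches the command five times and the + form launches it once.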

2 - Piping the list of files to cp via xargs:

time find . -name '*somestring*' -type f -print0 | xargs -0 cp -v --update -t '../dst/'
real    1m42.354s
real    0m3.463s
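
The pipeline above can be tried safely in a scratch directory first. The sketch below assumes GNU cp (for --update and -t) and uses made-up directory and file names. It copies the matching files once, then runs the same pipeline again and checks that no destination file was rewritten. The -print0 / -0 pair keeps names containing spaces intact.

```shell
#!/bin/sh
# Sketch: batch-copy matching files with find | xargs, then re-run to check
# that --update leaves an up-to-date destination alone.
# Assumes GNU cp; all directory and file names are hypothetical.
tmp=$(mktemp -d)
mkdir -p "$tmp/src/sub" "$tmp/dst"
echo one > "$tmp/src/a_somestring.log"
echo two > "$tmp/src/sub/b somestring.log"   # space in the name, on purpose
echo no  > "$tmp/src/unrelated.log"

copy_matches() {
    # NUL-delimited names survive spaces and newlines in file names.
    find "$tmp/src" -name '*somestring*' -type f -print0 |
        xargs -0 cp --update -t "$tmp/dst"
}

copy_matches
copied=$(find "$tmp/dst" -type f | wc -l | tr -d ' ')

# Second pass: src is unchanged, so nothing in dst should be rewritten.
: > "$tmp/marker"
sleep 1
copy_matches
rewritten=$(find "$tmp/dst" -type f -newer "$tmp/marker" | wc -l | tr -d ' ')

echo "files copied on first run:     $copied"
echo "files rewritten on second run: $rewritten"
rm -rf "$tmp"
```

Note that cp -t flattens everything into one directory, so two matching files with the same name in different subdirectories would collide.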

3 - Using rsync

time rsync -vh --update *somestring* '../dst/'
real    1m53.605s
real    0m2.300s

So in this situation rsync basically tied with cp via xargs, and on the no-change run both were roughly 10 to 15 times faster than find -exec with \;. However, when I went back to my real application of copying from one network location to another, I found rsync took the lead. In my real scenario, piping find to cp took about 15 seconds when the dst directory already matched src, while rsync took about 7 seconds.

So rsync it is!
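
One caveat with the rsync command above: the shell expands *somestring* in the current directory only, so unlike find it does not descend into subdirectories. If the recursive match matters, one option is to let find build the list and pass it to rsync. The sketch below assumes rsync is installed and uses made-up directory and file names. --from0 pairs with -print0, and --no-relative flattens the results into the destination the way cp -t does.

```shell
#!/bin/sh
# Sketch: find selects files recursively, rsync copies them.
# Assumes rsync is available; all directory and file names are hypothetical.
tmp=$(mktemp -d)
mkdir -p "$tmp/src/sub" "$tmp/dst"
echo one > "$tmp/src/a_somestring.log"
echo two > "$tmp/src/sub/b_somestring.log"
echo no  > "$tmp/src/unrelated.log"

copied=0
if command -v rsync >/dev/null 2>&1; then
    # find prints paths relative to src, which is also rsync's source root (.).
    (cd "$tmp/src" &&
        find . -name '*somestring*' -type f -print0 |
        rsync --update --from0 --files-from=- --no-relative . "$tmp/dst/")
    copied=$(find "$tmp/dst" -type f | wc -l | tr -d ' ')
else
    echo "rsync not found; skipping" >&2
fi

echo "files in dst: $copied"
rm -rf "$tmp"
```

Without --no-relative, rsync would recreate the sub/ directory under the destination instead of flattening it.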

  • Did you try using -exec with + so it takes many arguments in a single command? I'd be interested in that time as well, e.g., find ... -exec cp ... -t ../dst/ {} +
    Commented Mar 14, 2018 at 22:04
