4

I'd like to copy files from one external drive to a network-connected drive, and I want to make sure there are no errors.

I know about the MD5 hash files that often accompany large single files downloaded from the internet. I also know that I can use 'diff' in the terminal to compare the two sets of files.

However, I'm looking for a piece of software that could confirm that all the files in the folder structure have been completely copied by the Finder, or a piece of software that would do the copying for me and confirm that all the bits are exactly the same.

The files are video-related and often total over a TB (not a single file, but complex, multi-level nested folders).

I'm also fine with a terminal command that does the copy and comparison for me, although being able to see the time remaining on a copy is a very nice feature. Additionally, I want to make sure I get ALL of each file, resource forks and all.

3 Answers

8

or a piece of software that would do the copying for me and confirm that all the bits are exactly the same.

This is what rsync does. It can copy a single file, an entire tree of files, just about anything really, and it verifies every file it transfers against a whole-file checksum generated during the transfer. At the end of an rsync run you can be confident the sink received exactly what the source sent.

There's a GUI for rsync called arRsync that can make working with it a little friendlier, though it doesn't work for rsync-over-ssh calls.

Another advantage of rsync is that it can resume an interrupted copy, which is particularly nice if you're copying lots of files, or very large ones, over networks that are less than reliable.

If the local drive and the network share are both mounted on the same Mac, you can do:

rsync -avz /Volumes/LocallyAttachedDrive/path/to/big-movie.mov /Volumes/RemoteShare/path/

And you're all done.

If it's a directory (which bundles like .app are), the same flags give you a complete, recursive copy, since -a implies -r:

rsync -avz /Volumes/LocallyAttachedDrive/path/to/my-bundle.app /Volumes/RemoteShare/path/

If you don't have Finder/Bonjour-level access to the sink, you can run rsync over ssh:

rsync -avz /Volumes/LocallyAttachedDrive/path/to/my-bundle.app remoteuser@remotehost:/Volumes/RemoteShare/path/

If you like, you can set up an SSH key pair so you don't have to enter a password each time.

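Edit: you can use rsync to verify whether two trees are the same by combining --checksum/-c with --dry-run/-n. The -c option makes rsync compare full-file checksums instead of only size and modification time, and -n reports what would be copied without copying anything. If I have the source tree /Volumes/dir and I've copied it to /Volumes/ConnectedDrive/dir, I can compare them with: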

rsync -acvn /Volumes/dir /Volumes/ConnectedDrive/

The output will tell me if any files need to be copied to the sink in order to make the sink equal the source.

For example, if I synced two trees:

> rsync -avz ./8779 ./a/
building file list ... done
8779/
8779/.DS_Store
8779/logs/
8779/logs/MasterLog.txt
8779/logs/StartLog.txt

sent 893980 bytes  received 98 bytes  596052.00 bytes/sec
total size is 10034671  speedup is 11.22

Comparing them for equality should yield no operations required:

> rsync -acvn ./8779 ./a/
building file list ... done

sent 213 bytes  received 20 bytes  155.33 bytes/sec
total size is 10034671  speedup is 43067.26

If the sink were slightly different, we'd see output like this:

> echo foo >> a/8779/logs/MasterLog.txt 

> rsync -acvn ./8779 ./a/              
building file list ... done
8779/logs/MasterLog.txt

sent 219 bytes  received 26 bytes  490.00 bytes/sec
total size is 10034671  speedup is 40957.84

Now we know the MasterLog.txt file isn't the same.

  • Thanks Ian. Am I understanding correctly that I could run rsync -vcn /Volumes/dir /Volumes/ConnectedServer/dir to get a report saying whether any of the files are 'corrupt' (if I'd already copied some with the Finder)? Commented Mar 28, 2012 at 19:01
  • Close. Trailing slashes matter in rsync. Try: rsync -avcn /Volumes/dir /Volumes/ConnectedServer/ That should do the trick. I'll expand my answer with more details.
    – Ian C.
    Commented Mar 28, 2012 at 19:21
  • Thanks for the extended explanation! That is very clear. Thanks again. Commented Mar 28, 2012 at 19:42
1
md5 -r file1 file2

will print the hashes of both files so you can compare them by eye. The -r flag prints each hash first (rather than the filename first), so the hashes line up with each other at the left margin, which makes them easier to compare visually.
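To avoid eyeballing the digests, the comparison can be scripted. This is a minimal sketch with a hypothetical helper, `same_md5`; it assumes macOS's `md5 -q` (digest only) and falls back to coreutils `md5sum` on systems without `md5`. It prints the digest on a match and returns non-zero on a mismatch.

```bash
#!/usr/bin/env bash
# same_md5 FILE1 FILE2 (hypothetical helper)
# Returns 0 and prints the digest if both files have the same MD5,
# returns 1 and prints both digests if they differ,
# returns 2 if either file can't be read.
same_md5() {
  local h1 h2
  if command -v md5 >/dev/null 2>&1; then
    # macOS: -q prints only the digest, with no filename.
    h1=$(md5 -q "$1") || return 2
    h2=$(md5 -q "$2") || return 2
  else
    # Elsewhere, assume coreutils md5sum; reading stdin prints "digest  -".
    h1=$(md5sum < "$1") || return 2
    h2=$(md5sum < "$2") || return 2
    # Keep only the digest, dropping the trailing "  -".
    h1=${h1%% *}
    h2=${h2%% *}
  fi
  if [ "$h1" = "$h2" ]; then
    echo "match: $h1"
  else
    echo "DIFFER: $1=$h1 $2=$h2"
    return 1
  fi
}
```

For example, `same_md5 /Volumes/LocallyAttachedDrive/path/to/big-movie.mov /Volumes/RemoteShare/path/big-movie.mov` can be used directly in an `if` statement or chained with `&&`.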

  • MD5 has been broken; I suggest replacing it with shasum -a 256 /path/to/file
    – Alexander
    Commented Nov 16, 2014 at 15:53
  • MD5 is only broken for cryptographic use. As long as there is no threat model where an attacker is trying to modify a file without it being detected, MD5 is fine to use here.
    – Mattie
    Commented Oct 23, 2016 at 12:21
0

I'd stick with MD5 if you need absolute verification that the files are the same. You could write a small bash script to recurse through the directories and generate and check the hash for each file.
