To make sure I don't lose my data, I pick and choose which files I want to keep, and once my large (in bytes, not in heft) external hard disk is attached (usually 2TB, and usually USB3 these days), I use the usual drag-and-drop method to move the files onto the backup drive.

Then, later on, I may go as far as copying the contents again to a third hard drive.

It's all very ad-hoc.

I know about all the ways to do it "better", but I care more about what is practical. For example, with a single external hard disk, I have one 2.5" external USB3 drive that goes in the bag along with my MacBook Pro, giving me a total of 2TB external + 256GB local storage.

Anything with more redundancy will necessarily add more bulk to the setup. A 2.5" drive has always been the sweet spot for transfer speed and capacity versus physical bulk (compared with USB flash). For proper redundancy I would now need to lug around two external drives, and I just know I will not end up doing it properly and plugging both in.

Other ways to do it "better" involve various programs that invariably cost exorbitant sums of real money, force a certain workflow on you, or back up entire drives. I usually find that I do not want any of this. However, I am still open to suggestions, in particular something that verifies files after a transfer/backup by hashing.

Back to the question at hand. In either Windows (7, 8, 8.1) or OS X (10.8, 10.9), when I tell Explorer (or Finder) to move an enormous directory from the local disk to an external mounted disk, does the OS hash the files after copying, before erasing the original contents?

In anticipation of some answers: yes, I know about rsync. Yes, I use Time Machine with the external hard disk I just mentioned (it is a 500GB partition on the 2TB disk). I know Time Machine uses rsync under the hood. The problem is that Time Machine will start to lose your stuff, and you have to allocate an enormously larger amount of space to reasonably expect everything to be retrievable. If you have a 4GB file and edit 2 bytes of it, Time Machine will consume an additional 4GB (and take the requisite amount of time copying all of it) at the next backup. These are just a few of the qualities I am aware of that make Time Machine fall well short of perfect. I am quite happy to let it play around with a 500GB partition, though.

The flow of important files (ignoring all the source code, which is already on Git servers) goes like this: first a file lives on the local disk only, and every few days it makes its way onto the Time Machine partition. Eventually it may disappear from the Time Machine partition as the partition fills and the granularity of past images decreases. Every few weeks I manually pick out files that take up lots of local space but that I don't see myself using, and I move them to a partition on the external drive. At that point each of them exists as a single copy that lives only on the external drive. If I think I might really need the data, I store it in one more location as well.

If it is indeed the case that telling the OS to move a file does not cause it to verify the contents upon delivery, then I have to completely change my protocol, because if the target media is faulty the data is certain to end up corrupted.

In the course of writing this question I am starting to think that I asked the wrong question. Perhaps it is simply much smarter to always use two backup targets and change the protocol from one move plus one copy to two copies plus one delete. That gives an arguably higher assurance of full data retention.

However, hashing prior to the delete/move is still quite important, because it is the only way to know that the target media is not faulty!

Perhaps what I want is an rsync GUI, or better yet, shell extensions/plugins that let me perform actions like "Copy and Hash" and "Move and Hash" as I do my usual copies/moves of directories in Explorer/Finder.
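
To make it concrete, here is a minimal sketch (in Python) of what I imagine such a "Copy and Hash" / "Move and Hash" action doing. The function names, the choice of SHA-256, and the chunk size are all just my own placeholders, not anything the OS or an existing tool provides:

    import hashlib
    import shutil
    from pathlib import Path

    CHUNK = 1024 * 1024  # read in 1 MiB chunks so huge files don't fill RAM

    def file_hash(path: Path) -> str:
        """Return the SHA-256 hex digest of a file, read in chunks."""
        h = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(CHUNK), b""):
                h.update(chunk)
        return h.hexdigest()

    def copy_and_hash(src: Path, dst: Path) -> str:
        """Copy src to dst, then re-read both sides and confirm the digests match."""
        shutil.copy2(src, dst)              # copies data and timestamps
        src_digest = file_hash(src)
        dst_digest = file_hash(dst)         # re-read the destination copy
        if src_digest != dst_digest:
            raise IOError(f"verification failed: {src} -> {dst}")
        return src_digest

    def move_and_hash(src: Path, dst: Path) -> str:
        """Like copy_and_hash, but delete the source only after verification."""
        digest = copy_and_hash(src, dst)
        src.unlink()                        # safe to remove: the copy verified
        return digest

The point of the "move" variant is that the source is only deleted after the destination has been re-read and matched, so a bad copy fails the check instead of silently eating the file.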

There are some variations on this as well: with two external disks plugged in, I would like to issue a "Copy to 2 destinations and Hash". That way the source data only has to be read and hashed once, rather than twice as with two consecutive copies.
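
As a sketch of that variant (same caveats: the names and hashing choices are mine, and the destination paths are only examples), the source would be read and hashed in a single pass while both copies are written, and each destination would then be re-read and verified:

    import hashlib
    from pathlib import Path

    CHUNK = 1024 * 1024

    def copy_to_two_and_hash(src: Path, dst1: Path, dst2: Path) -> str:
        """Write src to both destinations in one pass, then verify each copy."""
        h_src = hashlib.sha256()
        with src.open("rb") as fin, dst1.open("wb") as out1, dst2.open("wb") as out2:
            for chunk in iter(lambda: fin.read(CHUNK), b""):
                h_src.update(chunk)          # hash the source while copying it
                out1.write(chunk)
                out2.write(chunk)
        expected = h_src.hexdigest()
        for dst in (dst1, dst2):             # verify both copies against the source digest
            h = hashlib.sha256()
            with dst.open("rb") as f:
                for chunk in iter(lambda: f.read(CHUNK), b""):
                    h.update(chunk)
            if h.hexdigest() != expected:
                raise IOError(f"verification failed for {dst}")
        return expected

    # Example (hypothetical paths):
    # copy_to_two_and_hash(Path("big.mov"),
    #                      Path("/Volumes/BackupA/big.mov"),
    #                      Path("/Volumes/BackupB/big.mov"))

This way the source is read once and each target once, instead of reading the source twice for two separate verified copies.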

  • You could write a script that does an md5sum of the files before the move, does the move, does an md5sum of the files after the move, compares them to make sure they're the same, then copies the md5sums onto the hard drive too. I don't know how that could integrate into the OS X GUI as I'm a Linux user, but that's an option.
    – Lawrence
    Commented Jul 25, 2014 at 7:36
  • There's probably some AppleScript way to do this that's dead simple. Anyone know AppleScript?
    – Steven Lu
    Commented Jul 25, 2014 at 7:46
  • I found this, which is a start.
    – Steven Lu
    Commented Jul 25, 2014 at 7:51
  • Halfway there already!
    – Lawrence
    Commented Jul 25, 2014 at 7:56

1 Answer

I can confirm that Windows 7 does not verify the correctness of a copy/move operation, as I've just experienced a case of corruption while copying a file to an external drive.

In terms of workarounds, as far as Windows is concerned, you can use TeraCopy--I've just started using it and it seems to work well (just enable "Always check after copy" in the preferences first). It also does shell integration, and although I haven't found a way to specify multiple copy destinations, you can manually select another destination to copy to after the first copy finishes, in which case it remembers the hash of the source file. Perhaps you could contact the authors and request the feature if you find it important.
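
If you would rather script the check yourself than rely on a particular tool, a rough after-the-fact verification is to hash every file under both trees and compare. This is just a sketch of mine (it says nothing about how TeraCopy works internally, and the directory paths are example assumptions):

    import hashlib
    from pathlib import Path

    def tree_digests(root: Path) -> dict:
        """Map each file's path (relative to root) to its SHA-256 digest."""
        digests = {}
        for p in sorted(root.rglob("*")):
            if p.is_file():
                h = hashlib.sha256()
                with p.open("rb") as f:
                    for chunk in iter(lambda: f.read(1024 * 1024), b""):
                        h.update(chunk)
                digests[p.relative_to(root).as_posix()] = h.hexdigest()
        return digests

    src = tree_digests(Path(r"D:\photos"))           # example source directory
    dst = tree_digests(Path(r"E:\backup\photos"))    # example destination directory
    if src == dst:
        print("all files match")
    else:
        for rel in sorted(set(src) | set(dst)):
            if src.get(rel) != dst.get(rel):
                print("mismatched or missing:", rel)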
