1

I have to transfer more than 2G of folders containing 196K files from one external hard drive to another. The older drive has some problems, so I get errors during the transfer, and I skip the files that fail. Once the transfer has finished, is there an efficient way to find out which files were not transferred (or were only partially transferred)? I am looking for a way to do that on either Windows or Linux (CentOS/Ubuntu).

2
  • 1
    Can't you just "move" the files? The ones that succeed won't be on the old drive anymore, so the remaining files will be only those that have a problem.
    – m4573r
    Commented Mar 11, 2014 at 13:13
  • 2
    Do not move the files. This causes additional writes to the old drive, which will exacerbate the problem if a hardware failure is responsible for the read errors. Even if the problem is just the filesystem (software), writes could make the filesystem more unstable. I highly recommend that you do not write anything at all to the old drive. I wish comments could be downvoted, because m4573r's comment is incorrect and dangerous.
    Commented Mar 11, 2014 at 13:21

2 Answers

1

the older drive has some problems

What kind of problems? You mean it gives random read errors?

If the older drive is not working correctly, there is no way to reliably know whether the data was transferred correctly, because you can't even be sure whether you are reading the correct data from the physical media in the first place!

For example:

  • To calculate an MD5 or SHA1 sum of a file, you have to read the whole file from disk. What if the disk silently (without throwing an error) reads the wrong bits? You get a different hash. What if you then read it again and the second time it reads it correctly, and that time you are copying it to the other drive? Then you would have the "correct" data on the new drive, and the hash wouldn't match what you calculated originally.

  • To "delete" a file (if you were to move them), you have to write to the metadata of the filesystem on the old drive. If the drive is failing, I definitely wouldn't trust it with writes; reads are bad enough. Just having the drive powered on could contribute to its degradation at this point; it's hard to say.

  • Writing the hashes (MD5/SHA1 sums) to the old disk would mean both reading from it and writing to it, which is really not recommended.

At this point, if you value what you have on the old drive at all, I highly recommend that you unplug the old drive and have a professional data recovery expert recover as much data as possible from it.

If you don't do that, you are playing with fire. At any point you could lose all your data. And even while you are trying to copy the files over, you could be copying increasingly corrupt data and making things worse. I don't know how severe your disk failure is, but certain conditions could make this extremely time-sensitive. Again, if the data is at all valuable or irreplaceable, stop what you're doing and unplug the drive.

In the future, I highly recommend that you use one of the following filesystems on any drive that contains irreplaceable data. These filesystems are "failure-evident": because all the data is checksummed, if the disk reads the data incorrectly (aka "silent corruption"), the filesystem knows about it and can report the error. You are fairly lucky in that your OS is already notifying you that files are not copying correctly. Disks can also fail in silent, insidious ways that the disk controller can't detect, and on a filesystem without checksumming that leads to corruption that the OS never reports.

The following modern filesystems support checksumming:

  • Btrfs on Linux
  • ZFS on Linux, BSD, or Solaris
  • ReFS on Windows Server 2012 or Windows 8.1 (although you can't boot from ReFS, so you'll need an NTFS system partition)
2
  • Thanks for the prompt and informative answer. I also have more data to copy from another similar drive (which is hopefully working properly). Is there a utility that will calculate checksums over 2G of data and tell me which files (if any) were not properly transferred? This is important data and I need to make sure that the archive is not faulty. I have tools like Beyond Compare, but they don't work for around 1.5G worth of files. I can split the data and test it in parts, but that takes time. This seemed like a generic problem to me, so I was hoping for a popular tool that can do the job without intervention.
    – doon
    Commented Mar 11, 2014 at 17:53
  • Just use the sha1sum or md5sum tool in a find script... you said Linux (CentOS/Ubuntu), so the tools you need are there right in front of you. You just have to put them together to form a useful shell script. Commented Mar 11, 2014 at 17:55
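
A minimal sketch of that idea, written here in Python rather than as a shell script (the function names are made up for illustration). It hashes every file under two roots with SHA-1 and reports files that are missing from the destination, that differ, or that could not be read from the source. It assumes both drives are mounted and readable. Note that it reads every source file again in full, which the answer above warns against if the drive is actually failing.

```python
import hashlib
import os


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root):
    """Map each file's path relative to root to its SHA-1, or None if unreadable."""
    sums = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            try:
                sums[rel] = sha1_of_file(full)
            except OSError:
                sums[rel] = None  # read error: record it instead of aborting
    return sums


def compare_trees(source_root, dest_root):
    """Return (missing, mismatched, unreadable) lists of relative paths."""
    src = hash_tree(source_root)
    dst = hash_tree(dest_root)
    # Present on the source but absent from the destination.
    missing = sorted(p for p in src if p not in dst)
    # Could not be read from the source at all.
    unreadable = sorted(p for p in src if src[p] is None)
    # Present on both sides, readable on the source, but hashes differ.
    mismatched = sorted(
        p for p in src
        if p in dst and src[p] is not None and src[p] != dst[p]
    )
    return missing, mismatched, unreadable
```

Calling `compare_trees("/mnt/old", "/mnt/new")` (hypothetical mount points) would return the three lists, which can then be printed or saved to a file on the new drive.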
0

rsync copies only files that are not already at the destination and, by default, updates only files that have changed. With the -c option it can also compare checksums of the files at the source and destination to check whether they were copied correctly.

It has lots of options for configuring what and how it copies stuff. http://linux.die.net/man/1/rsync
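
To illustrate the cheaper kind of check, here is a rough Python sketch (not rsync itself, and the function name is hypothetical) that lists files which are missing at the destination or whose size differs. rsync's default quick check looks at size and modification time; this sketch compares size only, which is enough to spot skipped and truncated files but will not catch corruption that leaves the size unchanged. It reads only directory listings and file sizes, not file contents.

```python
import os


def list_incomplete(source_root, dest_root):
    """Return (missing, size_differs) using directory metadata only.

    No file contents are read, so the source drive is only asked for
    directory listings and file sizes.
    """
    missing, size_differs = [], []
    for dirpath, _dirnames, filenames in os.walk(source_root):
        for name in filenames:
            src_path = os.path.join(dirpath, name)
            rel = os.path.relpath(src_path, source_root)
            dst_path = os.path.join(dest_root, rel)
            if not os.path.isfile(dst_path):
                # Skipped during the copy: nothing at the destination.
                missing.append(rel)
            elif os.path.getsize(src_path) != os.path.getsize(dst_path):
                # Likely a partial transfer: sizes disagree.
                size_differs.append(rel)
    return sorted(missing), sorted(size_differs)
```

Files that pass this check can still differ in content; a checksum comparison such as rsync's -c is what would catch that, at the cost of reading every file on both sides.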

6
  • 1
    This will work perfectly fine when both the source and destination disks are working reliably, but the OP says he is experiencing disk failure of an old disk. This could be filesystem or hardware related, but since we don't know, I would not recommend that he attempt this. Still, if we assume that he doesn't have any problems of this nature, your answer would be fine. So I'm leaving it as it is (not upvoting or downvoting). Commented Mar 11, 2014 at 13:24
  • Hm, the OP did not really say what kind of disk problems the old disk has, or whether the old one is the destination or the origin of the data. I have had both cases: a failing disk from which I wanted to transfer files to a new disk, and transferring data from a good disk to a bad one for transport, where data integrity was not really important. Commented Mar 11, 2014 at 13:35
  • Right. The OP didn't say. But the problem he is having is centered around the fact that one of the disks is giving him errors when trying to copy. Since the OP did not provide enough information, we have to assume the worst. Commented Mar 11, 2014 at 13:37
  • So in the worst case he would want to get the data from the old disk to the new one with the least corruption possible. So he should mount it read-only and copy the data over. He could use Answer 1 afterwards or pipe the errors of the chosen copy program to a logfile. Commented Mar 11, 2014 at 13:40
  • In the worst case, the disk could actually fail completely as a result of continued usage. So trying to repeatedly read and copy files and compare them with hashes, etc. is just going to create even more activity for the dying disk, thus accelerating its demise and reducing the chances that he'll be able to recover the data at all. That is the worst case. That is why I recommended in my answer that he unplug his disk right away and go to a data recovery expert. Commented Mar 11, 2014 at 13:43
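
The "log the errors while copying" suggestion from the comments could look roughly like the following Python sketch (the function name and log format are assumptions made for illustration). It copies every file it can in a single pass, never writes to the source, and records each failure in a logfile, so the list of skipped files exists as soon as the copy finishes. A file that fails partway through may leave a partial copy at the destination, and directories that cannot be listed at all are silently skipped by `os.walk`.

```python
import os
import shutil


def copy_tree_logging_errors(source_root, dest_root, log_path):
    """Copy every file it can, logging failures; return the failed relative paths."""
    failed = []
    with open(log_path, "w", encoding="utf-8") as log:
        for dirpath, _dirnames, filenames in os.walk(source_root):
            rel_dir = os.path.relpath(dirpath, source_root)
            target_dir = os.path.normpath(os.path.join(dest_root, rel_dir))
            os.makedirs(target_dir, exist_ok=True)
            for name in filenames:
                src_path = os.path.join(dirpath, name)
                dst_path = os.path.join(target_dir, name)
                try:
                    shutil.copy2(src_path, dst_path)
                except OSError as exc:
                    # Record the failure and keep going with the next file.
                    rel = os.path.normpath(os.path.join(rel_dir, name))
                    failed.append(rel)
                    log.write(f"{rel}\t{exc}\n")
    return failed
```

The logfile should live on the new drive or elsewhere, never on the old one, so that nothing is written to the failing disk.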
