
Over the past years (since ~2000) I have been moving and copying data between disks every now and then. The data is a bit of everything (MP3s, applications, videos, backups, emails), and it has been moved or copied among several disks (and disk arrays!) and several file systems: FAT, NTFS, ReiserFS, Ext3, Ext4, JFS…

It has recently come to my attention that some RAR archives, some executables (which are actually compressed archives) and perhaps other files that I have not noticed yet are corrupt. I don’t know whether the pattern is limited to compressed files; I assume it is not.

To start with something, I checked the integrity of the RAR files on my Linux fileserver using find along with the command:

unrar t <rar files>

This is nice, but I cannot check the integrity of all my archives like this, let alone all my other files (docs, photographs, MP3s, WAVs, ZIPs; the list is of course endless).

I would like to check all files in a number of ways.

  1. A filesystem check is obviously good, but it can’t really help if the data was already corrupt before being copied to the disk it currently resides on, right? The current filesystem is JFS.

  2. Could the second level of checking be MD5 checksums? I have backups of all my data and I could try to match checksums, but would corrupt files give me different checksums? This still doesn’t solve the problem if a file was already corrupt before the last copy of my data.

  3. What else could I be checking to get some peace of mind?

  4. There is a huge catch when comparing my data against my backups: as with everything alive, my data has “changed” over time, while the backups are snapshots in time that were never changed afterwards. For one, the directory structure has changed, and files have been deleted or moved to other locations. It will be a mess using the find command to match a file between the backup tree and the current tree!

So has anyone dealt with something similar and may have scripts (using locate or otherwise) that can quickly find and use indexed entries of the files?

1 Answer


MD5/SHA checksums are the gold standard for checking file integrity these days. If you have the originals from which to create the checksums, or already have the checksums, that would be the most thorough way to verify the files' contents. This can however be tedious if you have as many files as you seem to suggest.
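As a rough sketch of how that comparison could be scripted in Python (the function names are hypothetical, and it assumes both the backup tree and the current tree are mounted locally): files are indexed by the SHA-256 digest of their content rather than by path, so a file that was moved or renamed since the backup was taken still matches.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_tree(root):
    """Map content digest -> list of paths (relative to root) with that content."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            index.setdefault(sha256_of(full), []).append(os.path.relpath(full, root))
    return index


def unmatched_in_backup(current_root, backup_root):
    """Return paths under current_root whose content appears nowhere in backup_root."""
    backup_index = index_tree(backup_root)
    missing = []
    for digest, paths in index_tree(current_root).items():
        if digest not in backup_index:
            missing.extend(paths)
    return sorted(missing)
```

Anything this reports is only a candidate: the file may be new, legitimately edited, or corrupted, so it still needs a closer look. A match, on the other hand, only shows that the two copies are identical, not that the backup copy was good to begin with.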

Additionally, the RAR, ZIP, and 7Z file formats should contain the CRC32 checksum of any files stored inside. This is weaker than MD5 or SHA (i.e., it's more likely that corrupted data will go undetected), but it can still reveal corruption when a file is extracted, which means the archive is damaged. These checksums are verified automatically any time you extract a file from an archive. unrar t <rar files> is basically just testing the CRC32 checksums for each file in the archive.
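For ZIP archives, the same kind of test can be sketched with Python's standard zipfile module (the helper name below is hypothetical). The standard library does not read RAR or 7Z, so this covers ZIP only.

```python
import zipfile
import zlib


def zip_is_intact(path):
    """Return (True, None) if every member passes its CRC32 check, else (False, reason)."""
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() reads every member and returns the name of the
            # first one that fails its check, or None if all are fine.
            bad_member = archive.testzip()
    except (zipfile.BadZipFile, zlib.error, OSError) as error:
        # The archive structure or compressed stream itself is unreadable.
        return False, str(error)
    if bad_member is not None:
        return False, "CRC mismatch in " + bad_member
    return True, None
```

Like unrar t, this only tells you that the archive is internally consistent today; it cannot tell you whether the files were already damaged when they were archived.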

Additionally, some archiving tools give you the option of generating a .sfv file when building an archive. This is a small text file that records an additional CRC32 for the archive file as a whole (or for each volume of a split archive). You can use it to further verify the integrity of an archive.
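Checking files against a .sfv list is simple enough to sketch in Python. This assumes the common SFV layout of one "filename CRC32-in-hex" pair per line with comment lines starting with ";", and that the listed files sit next to the .sfv file; the function names are hypothetical.

```python
import os
import zlib


def crc32_of(path, chunk_size=1 << 20):
    """Return the CRC32 of a file as an unsigned 32-bit integer."""
    crc = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def verify_sfv(sfv_path):
    """Return a list of (filename, status), status being 'ok', 'mismatch' or 'missing'."""
    base = os.path.dirname(os.path.abspath(sfv_path))
    results = []
    with open(sfv_path, "r", encoding="utf-8", errors="replace") as listing:
        for line in listing:
            line = line.strip()
            if not line or line.startswith(";"):
                continue  # blank line or comment
            parts = line.rsplit(None, 1)  # the checksum is the last field
            if len(parts) != 2:
                continue  # not a "name checksum" line
            name, expected = parts
            target = os.path.join(base, name)
            if not os.path.isfile(target):
                results.append((name, "missing"))
            elif crc32_of(target) == int(expected, 16):
                results.append((name, "ok"))
            else:
                results.append((name, "mismatch"))
    return results
```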

If you're copying files from one filesystem to another, you can use a specialized tool to verify that the copy was successful and correct. For Windows I use TeraCopy: just enable the 'verify' option before starting the copy, and TeraCopy will re-read the copied files to check that they were actually written to the disk correctly at their new location.
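The same copy-then-re-read idea can be sketched in a few lines of Python, which would also run on a Linux box. This is not how TeraCopy itself works internally, just an illustration of the principle, and copy_and_verify is a hypothetical name.

```python
import hashlib
import shutil


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source, destination):
    """Copy source to destination, then re-read both and compare their digests."""
    copied = shutil.copy2(source, destination)  # copies data and metadata
    if file_digest(source) != file_digest(copied):
        raise IOError("verification failed for " + str(copied))
    return copied
```

One caveat: a re-read straight after the write may be served from the operating system's cache rather than from the disk, so a check like this is more convincing when run again later, for example after remounting the destination.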

  • Hi! Thank you for clarifying Q2. I will look into verification options for copying on Linux (as this is what my fileserver is currently running).
    – nass
    Commented Jan 2, 2013 at 18:04
  • No problem! You might have better luck breaking your sub-questions out into separate questions on this site, where they can be focused upon instead of being lumped together. Q3 isn't a question which really fits here (it's more of a fishing expedition, whereas this site is designed for specific questions), but Q4 is something that could be asked on its own if it's not already been asked here.
    Commented Jan 2, 2013 at 18:08
