Over the past years (since ~2000) I have been moving and copying data between disks every now and then. The data is a bit of everything (MP3s, applications, videos, backups, emails), and it has been moved or copied across several disks (and disk arrays!) and several file systems: FAT, NTFS, ReiserFS, Ext3, Ext4, JFS…
It has recently come to my attention that some RAR archives, some executables (which are actually compressed archives) and perhaps other files that I have not noticed yet are corrupt. I don’t know whether only compressed files are affected; I assume the problem is not confined to them.
To start off, I checked the integrity of the RAR files on my Linux file server using find along with the command:
unrar t <rar files>
This is nice, but I cannot check the integrity of all my archives like this, let alone all my other files (docs, photographs, MP3s, WAVs, ZIPs; the list is of course endless).
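For reference, the archive check above amounts to roughly the following. This is only a sketch in Python: the function name find_bad_archives and its parameters are my own, and it assumes unrar is on the PATH and exits with a non-zero status when a test fails.

```python
import os
import subprocess


def find_bad_archives(root, test_cmd=("unrar", "t"), suffix=".rar"):
    """Walk root and return the archives for which the test command fails.

    test_cmd is run as `test_cmd... <path>`; a non-zero exit status is
    treated as "archive is damaged" (an assumption about the tool).
    """
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Match the extension case-insensitively (.rar, .RAR, ...).
            if not name.lower().endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            result = subprocess.run(
                [*test_cmd, path],
                stdin=subprocess.DEVNULL,   # never wait for a password prompt
                stdout=subprocess.DEVNULL,  # only the exit status matters here
                stderr=subprocess.DEVNULL,
            )
            if result.returncode != 0:
                bad.append(path)
    return sorted(bad)
```

Calling find_bad_archives("/some/tree") would list the archives that fail the test, but it only works for formats that carry their own integrity data, which is exactly the limitation.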
I would like to check all files in a number of ways.
A filesystem check is obviously good, but it can’t really help if the data was already corrupt before being copied to the disk it currently resides on, right? The current filesystem is JFS.
Could the second level of checking be MD5 checksums? I have backups of all my data and I could try to match checksums, but would corrupt files give me different checksums? This still doesn’t solve the problem if a file was already corrupt before the last copy of my data was made.
What else could I check to get some peace of mind?
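My understanding is that changing even a single byte should change the MD5 digest, so a corrupt copy would not match a good backup. A minimal sketch of the comparison I have in mind, in Python (the names file_md5 and same_content are my own):

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True when both files produce the same MD5 digest."""
    return file_md5(path_a) == file_md5(path_b)
```

A mismatch only tells me that the two copies differ, not which of them is the damaged one.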
There is a huge catch when comparing my data against my backups: as with everything alive, my data has “changed” over time, while the backups are snapshots in time that were never changed afterwards. For one, the directory structure has changed, and files have been deleted or moved to other locations. Obviously it will be a mess using the command
find
in order to match a file between the backup tree and the current tree!
So, has anyone dealt with something similar, and do you perhaps have scripts (using locate or otherwise) that can quickly find and use indexed entries of the files?
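To make the idea concrete, here is a rough sketch of the kind of script I mean, in Python. It does not use locate; instead it indexes the backup tree by content and by (file name, size), so a file that was moved or renamed still counts as verified when its content exists anywhere in the backup. All names (walk_files, find_suspects) are my own, and it assumes that silent corruption leaves the file name and size unchanged.

```python
import hashlib
import os
from collections import defaultdict


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def walk_files(root):
    """Yield the path of every regular file below root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                yield path


def find_suspects(backup_root, current_root):
    """Return (current_path, backup_path) pairs that look like corruption.

    A current file is a suspect when no backup file has the same content,
    yet some backup file has the same name and the same size.
    """
    backup_digests = set()
    by_name_size = defaultdict(list)
    for path in walk_files(backup_root):
        backup_digests.add(file_md5(path))
        key = (os.path.basename(path), os.path.getsize(path))
        by_name_size[key].append(path)

    suspects = []
    for path in walk_files(current_root):
        if file_md5(path) in backup_digests:
            continue  # identical content exists in the backup, wherever it lives
        key = (os.path.basename(path), os.path.getsize(path))
        for candidate in by_name_size.get(key, []):
            suspects.append((path, candidate))
    return sorted(suspects)
```

Files that are new, or that were legitimately edited to a different size, are simply ignored here, and for each reported pair I would still have to work out by hand which copy is the good one.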