Here's my situation. I have two cold-storage archive volumes that (should) contain identical sets of data. These volumes contain infrequently accessed backups. I am concerned that, eventually, bitrot will get to one or both of them and subtly corrupt the data contained within. I know I can diff -r the two volumes and find files that have changed or disappeared between the two, but I get no helpful indication about which volume has the "good" copy. These are USB disks, and converting them to something like ZFS seems... onerous.
What I'd like is a tool that will recursively walk the directory tree and write a manifest file containing the path and filename along with a hash of the file's contents. I'd run this tool immediately after writing the data to each volume, and store the resulting manifest file on warm storage, perhaps under revision control of some sort.
From this file I'd like to be able to run something that works exactly like diff -r: it would tell me if files were added, removed, or their contents changed. Only instead of comparing one volume to the other, it would compare one volume to the known-good manifest file. Using this method, I should be able to tell if the data I'm reading off the disk months/years in the future is identical to the data I originally put on it.
I would have to think something like this exists already. I can get something approximating a manifest file using:
find /mnt/my-volume -type f -exec md5sum {} + > manifest.txt
but so far I haven't come up with a good way to parse this file and check each hash recursively. Also, somewhat less importantly, this won't tell me if an empty directory appeared or disappeared. (I can't think of why it would matter, but it would be nice to know that it occurred.)
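To make the idea concrete, here is a rough sketch of the workflow I have in mind. It assumes GNU coreutils and findutils (md5sum -c with --quiet, comm, and sed understanding \t), and file names without newlines or backslashes. The function names write_manifest and check_volume are made up for illustration, and storing paths relative to the volume root is my own choice rather than anything the tools require.

```shell
#!/bin/sh
# Sketch only. Assumes GNU coreutils and findutils, and file names that
# contain no newlines or backslashes (md5sum escapes those in its output).
# write_manifest and check_volume are names made up for this example.

# write_manifest VOLUME MANIFEST
# Hash every regular file, storing paths relative to VOLUME so the manifest
# still applies if the disk is later mounted somewhere else.
write_manifest() {
    ( cd "$1" && find . -type f -exec md5sum {} + ) | LC_ALL=C sort -k 2 > "$2"
}

# check_volume VOLUME MANIFEST
# Print changed, removed, and added files, one per line.
check_volume() {
    case $2 in
        /*) manifest=$2 ;;
        *)  manifest=$PWD/$2 ;;
    esac
    listed=$(mktemp) || return 1
    # Paths start at column 35: 32 hex digits plus a two-character separator.
    cut -c 35- "$manifest" | LC_ALL=C sort > "$listed"
    (
        cd "$1" || exit 1
        # Content check; --quiet hides the "OK" lines. "|| :" keeps going
        # after a mismatch, since md5sum exits non-zero in that case.
        md5sum -c --quiet "$manifest" 2>/dev/null || :
        # Path check: comm -3 leaves manifest-only paths unindented and
        # disk-only paths behind a tab; sed turns that into labels.
        find . -type f | LC_ALL=C sort | LC_ALL=C comm -3 "$listed" - |
            sed -e 's/^\t/added: /' -e t -e 's/^/removed: /'
    )
    rm -f "$listed"
}
```

With these loaded, something like write_manifest /mnt/my-volume manifest.txt right after filling the disk, and check_volume /mnt/my-volume manifest.txt months later, would cover the three cases. A removed file shows up twice (once from md5sum as a failed read, once as "removed"), and the manifest has to live outside the volume being hashed. Empty directories could presumably be tracked the same way with a second listing from find . -type d -empty.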
Am I on the right track with this, or is there a more appropriate tool that can do this type of thing?