When copying photos from varying sources to my main archive, I'd like to copy only the files that are not already in the archive. rsync or file comparing applications like WinMerge and Meld can't seem to check if a file exists already with a different name or in a different subfolder. First copying all of the new source pictures to the archive, then deleting duplicates and then organizing the files seems to be extra work.

Is there a way to check which files in a source folder are not found anywhere in a destination folder? Subfolders should be checked too. The file can have a different name or a different location.

  • If your destination contains files in a different location with different names but same contents then it sounds like you've got consistency problems in the first place that you might want to solve. Solving the archiving is more like fixing the symptom, not the ailment. Just saying. Commented Aug 26, 2010 at 6:30

3 Answers


Use find with md5sum to get the checksums for all the files in the source and destination, then use comm to find the checksums missing from the destination.
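A minimal sketch of that pipeline, assuming GNU md5sum and file names that contain no newlines or backslashes. The function name list_missing is made up for illustration. It prints every file under the source whose checksum does not occur anywhere under the destination, regardless of name or subfolder.

```shell
# Hypothetical helper: list files under SRC whose content is not found under DEST.
# The body runs in a subshell, so its variables do not leak into the caller.
list_missing() (
  src=$1
  dest=$2
  tmp=$(mktemp -d) || exit 1

  # "checksum  path" for every source file, sorted by checksum.
  find "$src" -type f -exec md5sum {} + | sort > "$tmp/src.md5"

  # Unique checksums found in the source and in the destination.
  cut -d' ' -f1 "$tmp/src.md5" | sort -u > "$tmp/src.sums"
  find "$dest" -type f -exec md5sum {} + | cut -d' ' -f1 | sort -u > "$tmp/dest.sums"

  # comm -23 keeps lines unique to the first file: checksums missing from DEST.
  comm -23 "$tmp/src.sums" "$tmp/dest.sums" > "$tmp/missing.sums"

  # Map the missing checksums back to source paths. md5sum prints a
  # 32-character checksum and a 2-character separator, so the path
  # starts at column 35.
  awk 'NR == FNR { missing[$1]; next }
       $1 in missing { print substr($0, 35) }' "$tmp/missing.sums" "$tmp/src.md5"

  rm -rf "$tmp"
)
```

Called as list_missing /path/to/new-photos /path/to/archive, the output can be fed to cp or rsync --files-from once you have checked it by eye.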

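Or you could try the -y (--fuzzy) option of rsync, which makes rsync look for a similar file to use as the basis for the transfer when the destination file is missing. Note that it only looks in the same directory as the destination file.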


How to do this depends on whether your image files are bitwise identical, or only visually similar (e.g. because they might have different comments or have been recompressed, cropped...).

If the files are identical, and you can rename them in both the source and your archive, it's easy to rename them to always have the same name. You can keep the old name as a symbolic link. Untested:

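for orig in *.jpg; do
  # Name each file after the MD5 checksum of its content
  canon=$(<"$orig" md5sum | sed 's/ .*//').jpg
  mv -i "$orig" "$canon"
  # Keep the old name as a symbolic link to the renamed file
  ln -s "$canon" "$orig"
done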

This assumes your archive is a single directory. If there are subdirectories, you'll need to change *.jpg to **/*.jpg (requires zsh, or bash 4 with shopt -s globstar), and arrange to add the right amount of ../ to the ln command.
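Here is a rough sketch of the subdirectory case, equally untested on a real archive. It swaps the **/*.jpg glob for find so it doesn't need bash 4, and it assumes GNU md5sum and file names without newlines. The function name canonicalize_tree is made up. Files whose content already exists under a checksum name are reported and left alone rather than overwritten.

```shell
# Hypothetical helper: rename every .jpg under ROOT to ROOT/<md5>.jpg and
# leave a relative symbolic link behind under the old name.
# The body runs in a subshell, so the cd does not affect the caller.
canonicalize_tree() (
  cd "$1" || exit 1
  find . -type f -name '*.jpg' | while IFS= read -r orig; do
    rel=${orig#./}
    sum=$(md5sum < "$rel" | cut -d' ' -f1)
    canon=$sum.jpg

    # Already renamed on an earlier run (or earlier in this one).
    [ "$rel" = "$canon" ] && continue

    # Same content already stored: report it instead of overwriting.
    if [ -e "$canon" ]; then
      echo "duplicate content, left untouched: $rel" >&2
      continue
    fi

    # Build one "../" per directory level between the old name and ROOT.
    prefix=
    rest=$rel
    while [ "$rest" != "${rest#*/}" ]; do
      prefix="../$prefix"
      rest=${rest#*/}
    done

    mv -- "$rel" "$canon" && ln -s -- "$prefix$canon" "$rel"
  done
)
```

Try it on a copy of the archive first, e.g. canonicalize_tree /path/to/archive-copy, since it moves files around.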

After this, rsync will copy only the content and names of new photos, plus any new names for existing photos.

If you can rename the files only in your archive, you can still arrange something with clever use of symbolic links and probably rsync --copy-unsafe-links.

If the files are only visually similar, it's more complicated, and there can't be a fully automated answer (between a cropped version and a low-quality version, a human being has to make the choice). Some tools to compare visually similar images may help, e.g. gqview (interactive) or findimagedupes (command line).

Note that since you don't specify your operating system, I've made suggestions that work on mine. They'll work on any unix-like system, including OSX and Cygwin. The symbolic link idea will also work natively on Windows XP and newer (maybe even earlier) but requires installing additional tools.


The whole "different name or in a different subfolder" thing might be a bit screwy to figure out... Sure, you can do a hash compare, but as your destination directory grows, the time to do the merge will increase.

It isn't very fancy, but what about using something like Robocopy for Windows?

robocopy source destination /E
