
I use rsync to update the remote backup of my digital photos and videos. I have around 100 GB of data.

Now I have heavily reorganized my media folder into a more logical structure. This was done manually, not just with a script or a tool like exiftool. The problem is that, to rsync, the remote directory now looks completely different. If I run an update, it will delete files that actually exist (albeit in another location) and then upload the same files again.

With my slow upload speed it will take ages.

Since all filenames are unique, I suppose it should be possible to write a script that rearranges the remote folder to match the local one.

Does anyone know a solution for this problem?

1 Answer


Option 1: go back in time and read this: http://lincolnloop.com/blog/2012/jan/6/detecting-file-moves-renames-rsync/ This uses a trick with hardlinks to optimise the operation (note carefully the omission of trailing "/" in the example though, that might be different to what you are used to). If you have a local backup copy prior to the reorg, you can perhaps restore and use that (the complication is being able to create the hardlink copy as required).
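The linked post's exact commands are not reproduced here, but the property the hardlink trick relies on can be checked locally. This is a minimal sketch, assuming GNU cp (for -al) and a throwaway temp directory; the directory and file names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: shows why a hard-linked shadow copy survives a reorganization.
set -eu
work=$(mktemp -d)
mkdir -p "$work/photos/unsorted"
echo "pixels" > "$work/photos/unsorted/img_0001.jpg"

# Shadow copy made BEFORE reorganizing: -a keeps attributes, -l hard-links
# instead of copying, so it costs almost no extra space (GNU cp assumed).
cp -al "$work/photos" "$work/shadow"

# Reorganize the working tree.
mkdir -p "$work/photos/2012/holiday"
mv "$work/photos/unsorted/img_0001.jpg" "$work/photos/2012/holiday/img_0001.jpg"

# The old name (in the shadow) and the new name are still the same inode.
if [ "$work/shadow/unsorted/img_0001.jpg" -ef "$work/photos/2012/holiday/img_0001.jpg" ]; then
    same_inode=yes
else
    same_inode=no
fi
echo "same inode: $same_inode"
```

Because the old and new paths share an inode, an rsync run that preserves hard links (-H) over a tree containing both can link the new name on the destination instead of sending the data again.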

Option 2: if you have no spaces/quotes in your file or directory names, and do not intend keeping any duplicates, you can create a quick and dirty copy/rename script like this.

On source:

cd /wherever
find . -type f | xargs sha1sum | sort > /tmp/src.out

On destination:

cd /wherever
find . -type f | xargs sha1sum | sort > /tmp/dst.out

Copy the dst.out file over to source, and then on source do:

join -j 1 /tmp/src.out /tmp/dst.out | while read sum src dst; do
    if [ "$src" != "$dst" ]; then
        echo mkdir -p $(dirname "$src")
        echo cp -ipl "$dst" "$src"
    fi
done > fixup.sh

This will output a set of mkdir/cp commands (fixup.sh) that you can run in the top directory of your copy on the destination. Check that the generated script does what you require before running it. cp -ipl will not overwrite without prompting, and will copy by hard-linking. A subsequent rsync --delete ... will delete the old files, assuming that you want an identical copy. Use rsync --dry-run ... afterwards to confirm the extent of any remaining differences.

(It is possible to use "mv -i" instead of the non-destructive "cp -ipl", saving the duplication and cleanup.)
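To try the Option 2 steps end to end without touching real data, here is a small dry run in a temp directory. It is only a sketch: the src and dst directories stand in for the two machines, the file names are invented, and it keeps the same restriction as above (no spaces or quotes in names, no duplicates). GNU cp is assumed for -l.

```shell
#!/bin/sh
# Sketch only: both "machines" are simulated as local directories.
set -eu
demo=$(mktemp -d)
mkdir -p "$demo/src/2012/holiday" "$demo/dst/unsorted"
echo "same bytes" > "$demo/src/2012/holiday/img_0001.jpg"   # new layout
echo "same bytes" > "$demo/dst/unsorted/img_0001.jpg"       # old layout

# Checksum lists, sorted so that join can match on the first field.
(cd "$demo/src" && find . -type f | xargs sha1sum | sort > "$demo/src.out")
(cd "$demo/dst" && find . -type f | xargs sha1sum | sort > "$demo/dst.out")

# Each joined line is: checksum, path on source, path on destination.
join -j 1 "$demo/src.out" "$demo/dst.out" | while read -r sum src dst; do
    if [ "$src" != "$dst" ]; then
        echo mkdir -p "$(dirname "$src")"
        echo cp -ipl "$dst" "$src"
    fi
done > "$demo/fixup.sh"

cat "$demo/fixup.sh"                      # review the commands first
(cd "$demo/dst" && sh "$demo/fixup.sh")   # then apply them on the "destination"
```

After this, the file exists under both the old and the new path in dst as hard links to the same data, which is the state the later rsync --delete run cleans up.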

If you have problematic file/directory names, you'll need to do some intermediate processing of filenames, or try one of the solutions here: https://unix.stackexchange.com/questions/6411/any-way-to-sync-directory-structure-when-the-files-are-already-on-both-sides

Update: If you can tolerate the explosion-in-a-punctuation-factory that passes for a sed command line:

find . -type f -print0 | xargs -0 sha1sum | 
sed -re $'s/(^[0-9a-z]*)  /\\1__/; 
    s/ /\\\\x20/g;  s/\'/\\\\\'/g; 
    s/(^[0-9a-z]*)__/\\1  /; 
    s/  (.*)$/  $\'\\1\'/g;'

This will handle spaces and single/double quotes (though we're approaching perl territory for proper heavy lifting). It uses the bash $'' construction for quoting troublesome strings.
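If the sed line is too much, a different route is to skip join and let awk match the two lists by checksum, wrapping every path in single quotes as it goes. This is not the approach above, just a sketch of an alternative. gen_fixup is a made-up helper name; it assumes plain sha1sum text-mode lines (40 hex characters, two spaces, then the path) and does not handle names containing backslashes or newlines, which sha1sum escapes. If several destination files share a checksum, the last one listed wins.

```shell
#!/bin/sh
# gen_fixup SRC_LIST DST_LIST  (hypothetical helper, sketch only)
# Prints mkdir/cp commands with single-quoted paths for files whose
# checksum appears in both lists under different paths.
gen_fixup() {
    awk '
        # Wrap s in single quotes; an embedded quote is closed, escaped, reopened.
        function shq(s,    out, i, c) {
            out = "\047"
            for (i = 1; i <= length(s); i++) {
                c = substr(s, i, 1)
                if (c == "\047") out = out "\047\\\047\047"
                else out = out c
            }
            return out "\047"
        }
        # Strip the last path component (paths from "find ." always contain a slash).
        function parent(p) { sub("/[^/]*$", "", p); return p }

        # Fixed-width split: 40 hex digits, 2 separator characters, then the path.
        { sum = substr($0, 1, 40); path = substr($0, 43) }

        # First file given to awk is the destination list: remember its paths.
        FILENAME == ARGV[1] { dst[sum] = path; next }

        # Second file is the source list: emit commands where the paths differ.
        (sum in dst) && dst[sum] != path {
            print "mkdir -p " shq(parent(path))
            print "cp -ipl " shq(dst[sum]) " " shq(path)
        }
    ' "$2" "$1"
}
```

Usage would be something like gen_fixup /tmp/src.out /tmp/dst.out > fixup.sh, with both lists produced by the find . -type f -print0 | xargs -0 sha1sum form, and the result reviewed before running it in the top directory on the destination.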

  • There is something I don't understand in option 1. It says "Start by doing the usual synchronization of the tree". That makes no sense to me, because if I have completely changed the structure of my directory, that initial synchronization will try to upload just about everything, which is exactly what I want to avoid. Unless I misunderstand something.
    – marlar
    Commented Jan 28, 2013 at 19:57
  • Option 2 on the other hand seems to work! It is slow since it calculates the sha1sum of the big video files, but it is of course much faster than uploading the whole bunch again. I do have spaces in my file and dir names, but it works fine with this small modification: find . -type f -print0 | xargs -0 sha1sum | sort > /tmp/src.out
    – marlar
    Commented Jan 28, 2013 at 19:59
  • Ahh, after reading option 1 again, I think I understand. The preparation with hardlinking etc. is to be done before reorganizing the folder so that you end up with a new and old structure which, due to hardlinking, does not take up much extra space. In my case the reordering has already been done, so I will stick to option 2.
    – marlar
    Commented Jan 28, 2013 at 20:06
  • You'll need to modify the generated script because spaces will also break the "read sum src dst" code. One of the solutions in the link uses sed to quote everything after the checksum output, you should be able to easily adapt that. Commented Jan 28, 2013 at 20:18
  • I've found it possible to use the hardlink rearrangement despite not having done it beforehand. Simply cp -rlp the common files into a directory on both sides... Commented Mar 14, 2018 at 5:36
