3

I am currently looking for solutions that can optimize network bandwidth usage.

Scenario: Server has a file. Client downloads it through a REST API. Client makes some changes and uploads the changed file back to server through REST. Server will replace the original file with the uploaded file.

Possible approaches: I have two in mind.

1- Local Diff: Before making any changes, the client makes a copy of the original file. After making the changes, the client uses an algorithm like BSDiff or XDelta to extract the changes by comparing the original and changed files. These changes are then sent to the server, and the server applies the diff to the original file.

2- Use R-Sync: Make a REST call to the server and ask for the initial rolling checksum and MD5 hash. Then, based on the response, generate the diff and POST it to the server. The server will then merge the changes.
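
Approach 1 can be sketched roughly as follows. This is only an illustration of the copy/insert idea behind a local diff: it uses Python's standard `difflib.SequenceMatcher` as a stand-in for BSDiff or XDelta (it is not either of those algorithms and would be far too slow for large files), and the function names `make_delta` and `apply_delta` are hypothetical.

```python
from difflib import SequenceMatcher


def make_delta(original: bytes, changed: bytes) -> list:
    """Client side: describe `changed` as copies from `original` plus new bytes."""
    ops = []
    # autojunk=False so frequent byte values are not ignored during matching
    matcher = SequenceMatcher(None, original, changed, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # The server already has these bytes; send only the range.
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # Only genuinely new bytes travel over the network.
            ops.append(("data", changed[j1:j2]))
        # "delete" needs no operation: those bytes are simply never copied.
    return ops


def apply_delta(original: bytes, ops: list) -> bytes:
    """Server side: rebuild the changed file from the original and the delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += original[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)
```

The client would POST the result of `make_delta` (serialized in some agreed format) and the server would call `apply_delta` against its stored copy. Note that the client needs the untouched original on hand, which is the extra disk space cost of this approach.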

I did some rough testing and found that BSDiff is the most efficient solution in terms of diff size, which is the primary optimization target. It generated the smallest diffs of the tools I tried, BUT it uses a huge amount of memory, which makes it impractical on the client side for large files. On the other hand, the results of XDelta and the other binary diff tools I tried aren't that great in terms of generated diff size. Local diff also has the disadvantage of using extra disk space, because a copy of the original file has to be kept. This can be a problem for large files.

The memory issue with BSDiff makes R-Sync the most suitable choice (because the rest of the tools are not that efficient at finding the diff). So I have decided to go with R-Sync.

R-Sync works in two steps. First it gets the signatures based on the file, and then the data is sent back based on the signatures sent earlier. I am planning to further optimize R-Sync by keeping the signatures of the original file on the client side before making any changes to it. This removes the need for the client to ask the server to compute and send signatures at upload time. The client can just send data based on the already computed signatures whenever it wants to upload the file.
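
The cached-signature idea could look something like the sketch below. It is a simplified illustration, not rsync's actual implementation or wire format: the block size, the function names, and the operation tuples are all hypothetical, and the weak checksum is recomputed with `zlib.adler32` at every offset, whereas real rsync updates a rolling checksum in constant time as the window slides.

```python
import hashlib
import zlib

BLOCK = 4  # hypothetical, tiny block size so the example is easy to follow


def make_signatures(original: bytes, block: int = BLOCK) -> dict:
    """Run once at download time; the client stores the result locally."""
    sigs = {}
    for start in range(0, len(original) - block + 1, block):
        chunk = original[start:start + block]
        weak = zlib.adler32(chunk)            # cheap first-pass filter
        strong = hashlib.md5(chunk).digest()  # confirms a weak match
        sigs.setdefault(weak, {})[strong] = start // block
    return sigs


def make_delta(changed: bytes, sigs: dict, block: int = BLOCK) -> list:
    """Run at upload time using only the cached signatures (no original needed)."""
    ops, literal, pos = [], bytearray(), 0
    while pos < len(changed):
        chunk = changed[pos:pos + block]
        index = None
        if len(chunk) == block:
            candidates = sigs.get(zlib.adler32(chunk))
            if candidates is not None:
                index = candidates.get(hashlib.md5(chunk).digest())
        if index is None:
            literal.append(changed[pos])  # unmatched byte is sent literally
            pos += 1
        else:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("block", index))  # server already holds this block
            pos += block
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def apply_delta(original: bytes, ops: list, block: int = BLOCK) -> bytes:
    """Server side: rebuild the upload from its own copy plus the delta."""
    out = bytearray()
    for kind, value in ops:
        if kind == "block":
            out += original[value * block:(value + 1) * block]
        else:
            out += value
    return bytes(out)
```

Because the signatures are much smaller than the file, this avoids both the signature round trip at upload time and the full local copy that approach 1 needs. It does assume the server's file has not changed since the signatures were computed, so sending a whole-file hash along with the delta for the server to verify would be a sensible safeguard.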

Question: I know this is a bit of an odd question, which is why I asked a question before actually asking it here. I would just like to know whether there are any better alternatives for solving this kind of problem. I want to confirm my approach to make sure that I am on the right path and not missing anything major.

2
  • Are you only interested in file level traffic optimization?
    – John Siu
    Commented Jan 5, 2013 at 3:09
  • Yes, only file level.
    Commented Jan 5, 2013 at 7:23

2 Answers

1
+100

I think your decision to go with rsync is the best one. It is cost effective, accurate and well thought out. Please remember to use the --strict option for md5sum, as otherwise you may run into trouble. You may want to consider skipping some checks on large files, as they will just eat resources and yield the same result. Imagine you compare two 2GB files: it is much easier to just delete the old one, copy the new one and update the hash and the checksum than to create a new hash, compare it against the old one, and then merge the changes. For small files it makes no difference.

Another idea is to just run diff on hashes and then transfer files partially - rsync's --checksum, --update and --inplace are your friends.

To further optimize network bandwidth usage, you may consider the --compress and --bwlimit= options.

I don't know how often you need to transfer these files or how often the sync should occur. If it is very frequent, it may be better to set up Unison. More about it in Linux Journal.

Good luck!

0

This is mostly over my head - especially with binary files, but doesn't plain old rsync transfer only the changes to a non-local destination by default? (If not, it has a --no-whole-file option.)

If so, it will do all the work for you.

If you need more details, post to the rsync mailing list, where very helpful rsync experts hang out. You can subscribe at https://lists.samba.org/mailman/listinfo/rsync .

For questions like this, it would help a lot if you would specify what operating system and version you are using and the version of the programs in question. I have never heard of R-Sync, but I use rsync on Linux all the time.

I'm not sure, but I believe I've seen references to running rsync under Cygwin on Windows, if that's your environment.

This forum in particular handles questions about Windows and Linux, so it's even more important to specify your environment here.

See also https://en.wikipedia.org/wiki/Rsync . It covers rsync itself as well as other utilities, such as rdiff, that you may find useful.
