
I'm currently backing up some VirtualBox VMs with rather large HDD image files using RSYNC. Some of those images are 100 GiB in size, one is even 750 GiB. Letting RSYNC calculate differences for those files simply takes too much time, most likely because the backup target is an older Synology DS1512+ NAS. While the NAS isn't too slow in general, the difference between using --whole-file and not using it is significant: 3 hours vs. aborting after 12.

That's why I'm considering splitting those large images into smaller chunks of e.g. 2 GiB each. That should be possible with VirtualBox by simply cloning the existing images to VMDK using the split variant. The expectation is that, because VMs don't overwrite all of their available data every time but only some parts of it, RSYNC would recognize only some of the split files as changed and back up only those, lowering the overall backup time.
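For reference, a minimal sketch of the conversion step, assuming the VBoxManage command-line tool that ships with VirtualBox (older releases name the subcommand clonehd). The function name and file names are hypothetical. With --variant Split2G, the clone consists of a small descriptor .vmdk plus numbered extent files of roughly 2 GiB each. The VM should be powered off while cloning.

```shell
# Hypothetical helper: clone an existing VirtualBox disk image (e.g. a .vdi
# or a monolithic .vmdk) into a VMDK split into parts of roughly 2 GiB each.
# Assumes VBoxManage is on PATH and the VM using SRC is powered off.
clone_to_split_vmdk() {
  local src=$1 dst=$2
  VBoxManage clonemedium disk "$src" "$dst" --format VMDK --variant Split2G
}
```

Usage would be along the lines of `clone_to_split_vmdk win10.vdi split/win10.vmdk`. The clone gets a new UUID, so the VM has to be attached to the new descriptor file afterwards (in the VM's storage settings or with VBoxManage storageattach).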

The RSYNC options currently in use are the following:

--owner \
--numeric-ids \
--compress-level=0 \
--group \
--perms \
--rsh=rsh \
--devices \
--hard-links \
--inplace \
--whole-file \
--links \
--recursive \
--times \
--delete \
--delete-during \
--delete-excluded \
--rsync-path=[...] \
--specials

Is that correct, or does VirtualBox always write to ALL individual parts for some reason? Are there other things to take into account, or reasons why this approach might not work?

Thanks!

  • What does your rsync command look like? Commented Oct 24, 2020 at 9:44
  • @roaima Added the options, but the important part is --whole-file only anyway. Commented Oct 24, 2020 at 9:50
  • I'm not particularly interested in the options by themselves. It's the shape of the actual command I want to see. Commented Oct 24, 2020 at 9:51
  • What "shape"? The only things missing are where RSYNC is stored and the source/target host/directory. What does that have to do with how VirtualBox writes to split images and how RSYNC decides which files to transfer? Commented Oct 24, 2020 at 11:53
  • Exactly. The rsync command is what I need to see, please. Replace usernames, hostnames and paths with something innocuous if you want to, but I need to see the command Commented Oct 24, 2020 at 11:55

2 Answers


Is that correct or does VirtualBox write to ALL individual parts always for some reason?

I've converted one of my Windows 10 VMs to use split VMDK files and let it sit idle most of the time for the past week. While VirtualBox doesn't write to all of the individual files, it writes to far more of them than I would have expected in that scenario. I converted it on 25.10.2020, and the listing below shows what RSYNC had put onto the NAS as of today.

It's important to note that RSYNC DOES NOT decide what to transfer based on checksums, but only on its default quick check of file size and modification time. So there's some risk, depending on how VirtualBox writes to the individual files, when it flushes/closes them and things like that. In the end, I don't have another choice in my scenario: I simply can't afford reading all the data to calculate checksums and compare them between backup source and target.
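The quick check can be illustrated with a small sketch. This is not rsync's actual code, only an approximation assuming GNU coreutils stat: a file counts as changed when it is missing on the target or when its size or modification time (whole seconds) differs. The function name is hypothetical.

```shell
# Hypothetical approximation of rsync's default quick check.
# Returns success (0) if FILE_B would be considered out of date relative to
# FILE_A, i.e. it is missing or its size or mtime differs.
# Assumes GNU coreutils stat; BSD/macOS stat uses different flags.
quick_check_differs() {
  local a=$1 b=$2
  [ -e "$b" ] || return 0
  # %s = size in bytes, %Y = mtime in seconds since the epoch
  [ "$(stat -c '%s %Y' -- "$a")" != "$(stat -c '%s %Y' -- "$b")" ]
}
```

With --checksum (-c), rsync instead compares full-file checksums, which means reading every byte on both sides, exactly the cost described above.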

total 73548036
358       0 drwxr-xr-x 1 1000 1000       2876 Oct 25 13:38 .
357       0 drwx------ 1 1000 1000        270 Oct 25 13:43 ..
364 1949056 -rw------- 1 1000 1000 1995833344 Oct 31 03:28 win10-s001.vmdk
365 2090752 -rw------- 1 1000 1000 2140930048 Oct 31 03:29 win10-s002.vmdk
366 2077056 -rw------- 1 1000 1000 2126905344 Oct 31 03:29 win10-s003.vmdk
367 2077120 -rw------- 1 1000 1000 2126970880 Oct 31 03:21 win10-s004.vmdk
368 2065664 -rw------- 1 1000 1000 2115239936 Oct 31 02:41 win10-s005.vmdk
369 2064960 -rw------- 1 1000 1000 2114519040 Oct 31 03:22 win10-s006.vmdk
370 2095232 -rw------- 1 1000 1000 2145517568 Oct 30 23:32 win10-s007.vmdk
371 2095744 -rw------- 1 1000 1000 2146041856 Oct 31 01:57 win10-s008.vmdk
372 2089600 -rw------- 1 1000 1000 2139750400 Oct 31 03:21 win10-s009.vmdk
373 2064256 -rw------- 1 1000 1000 2113798144 Oct 31 03:28 win10-s010.vmdk
374 2066112 -rw------- 1 1000 1000 2115698688 Oct 31 03:28 win10-s011.vmdk
375 2092480 -rw------- 1 1000 1000 2142699520 Oct 31 03:28 win10-s012.vmdk
376 2065664 -rw------- 1 1000 1000 2115239936 Oct 31 03:12 win10-s013.vmdk
377 2096384 -rw------- 1 1000 1000 2146697216 Oct 31 03:21 win10-s014.vmdk
378 2093184 -rw------- 1 1000 1000 2143420416 Oct 30 14:42 win10-s015.vmdk
379 2082688 -rw------- 1 1000 1000 2132672512 Oct 31 03:23 win10-s016.vmdk
380 2095872 -rw------- 1 1000 1000 2146172928 Oct 31 03:23 win10-s017.vmdk
381 2083264 -rw------- 1 1000 1000 2133262336 Oct 31 03:23 win10-s018.vmdk
382 1996416 -rw------- 1 1000 1000 2044329984 Oct 30 22:37 win10-s019.vmdk
383 2096384 -rw------- 1 1000 1000 2146697216 Oct 30 17:41 win10-s020.vmdk
384 2092160 -rw------- 1 1000 1000 2142371840 Oct 31 03:28 win10-s021.vmdk
385 2096448 -rw------- 1 1000 1000 2146762752 Oct 25 13:39 win10-s022.vmdk
386 2093504 -rw------- 1 1000 1000 2143748096 Oct 30 13:51 win10-s023.vmdk
387 2095680 -rw------- 1 1000 1000 2145976320 Oct 31 03:03 win10-s024.vmdk
388 2001728 -rw------- 1 1000 1000 2049769472 Oct 28 11:50 win10-s025.vmdk
389 2091008 -rw------- 1 1000 1000 2141192192 Oct 28 11:50 win10-s026.vmdk
390 2091648 -rw------- 1 1000 1000 2141847552 Oct 25 13:39 win10-s027.vmdk
391 2096000 -rw------- 1 1000 1000 2146304000 Oct 30 13:51 win10-s028.vmdk
392 2094976 -rw------- 1 1000 1000 2145255424 Oct 31 03:03 win10-s029.vmdk
393 2096448 -rw------- 1 1000 1000 2146762752 Oct 25 13:39 win10-s030.vmdk
394 1856640 -rw------- 1 1000 1000 1901199360 Oct 25 13:39 win10-s031.vmdk
395 2089600 -rw------- 1 1000 1000 2139750400 Oct 30 13:51 win10-s032.vmdk
396 2094336 -rw------- 1 1000 1000 2144600064 Oct 25 13:39 win10-s033.vmdk
397 2096448 -rw------- 1 1000 1000 2146762752 Oct 25 13:39 win10-s034.vmdk
398 1759936 -rw------- 1 1000 1000 1802174464 Oct 25 13:39 win10-s035.vmdk
399     320 -rw------- 1 1000 1000     327680 Oct 25 13:39 win10-s036.vmdk
400  289024 -rw------- 1 1000 1000  295960576 Oct 26 01:43 win10-s037.vmdk
401 1074240 -rw------- 1 1000 1000 1100021760 Oct 26 01:43 win10-s038.vmdk
402       4 -rw------- 1 1000 1000       2828 Oct 31 03:29 win10.vmdk
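To get numbers like these without eyeballing the listing, a sketch along the following lines lists the split parts modified after a given time (e.g. the previous backup run or the conversion) and sums their sizes, which approximates what a --whole-file run has to send. It assumes GNU find (-newermt, -printf) and the *-sNNN.vmdk naming shown above; the function name is hypothetical. The small descriptor file (win10.vmdk) is excluded, although it changes as well.

```shell
# Hypothetical helper: print split parts in DIR modified after SINCE,
# followed by a summary line with their count and total size in bytes.
# Assumes GNU find and file names without whitespace.
changed_parts() {
  local dir=$1 since=$2
  find "$dir" -maxdepth 1 -name '*-s[0-9][0-9][0-9].vmdk' \
      -newermt "$since" -printf '%s %f\n' \
    | sort -k2 \
    | awk '{ n++; bytes += $1; print $2 }
           END { printf "%d parts, %.0f bytes\n", n, bytes }'
}
```

For example, `changed_parts /path/to/vm '2020-10-25 14:00'` would list every part touched after the conversion.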

So while there clearly is some benefit, it seems to be smaller than I, at least, had hoped. But that might simply be due to the OS in use: Windows in particular runs background services like Windows Search that keep writing data. I'm going to try the same with a Linux VM that currently has an image of ~100 GiB as well and see if it changes anything.


You've got a number of issues with your rsync command, which when addressed will help resolve the underlying speed issues. As a result, I'm going to address that instead of discussing how to split VirtualBox files as described in this question. Fix those and the rest will become largely irrelevant.

Looking at the most recent comment, you are using the client/server approach for rsync, which is good, as it means that block-based delta transfers can be used:

rsync …options… /path/to/source/ remoteHost:/path/to/destination/

Therefore you should almost certainly not be using --whole-file, as this requires a source file to be copied in its entirety even if only one byte has changed. For the file sizes you're considering (i.e. virtual disk images, whether split or entire), the relatively slow network speed is almost certainly going to dominate any direct disk reading on the two sides of the rsync conversation.

Now that we've removed --whole-file, we should also remove --inplace unless you are really tight on disk space on the destination, as it requires lots of extra disk activity (reading and writing) as blocks get shuffled around. I'd recommend including -z (compression) unless you have either a gigabit (or faster) network connection or an extremely slow CPU.

I'm going to restate this because it's so important: the net effect of --whole-file --inplace is disastrous for throughput in this scenario.

I'm not sure about your use of --numeric-ids either; I think you're probably trying to reinvent -M--fake-super, which could be a better option for a backup. (If you use -M--fake-super, you will need to use it every time the backup files are accessed, whether for writing or for reading.) Note that --fake-super requires a filesystem with extended attributes enabled. I don't know whether a Synology does this by default (QNAPs do, so I would assume Synology devices do, too), so you may need to replace -M--fake-super with --numeric-ids again.

Try this set of options instead of your current set, and see how it goes:

-azH --delete -M--fake-super
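A sketch wrapping the suggested options in a function, so that extra options such as --dry-run --itemize-changes (both standard rsync options) can be added for a preview before a real run. The function name is hypothetical, and the source and destination are the placeholder paths used above.

```shell
# Hypothetical wrapper around the suggested option set.
# $1 = source, $2 = destination (e.g. remoteHost:/path/to/destination/);
# any further arguments are passed to rsync as additional options.
backup_vms() {
  local src=$1 dest=$2
  shift 2
  # -a archive mode, -z compression, -H preserve hard links;
  # --delete removes files that no longer exist in the source;
  # -M--fake-super passes --fake-super to the remote (receiving) rsync,
  # which then stores ownership and permissions in extended attributes.
  rsync -azH --delete -M--fake-super "$@" "$src" "$dest"
}
```

A preview run could then look like `backup_vms /path/to/source/ remoteHost:/path/to/destination/ --dry-run --itemize-changes`, which lists the files rsync would transfer without sending anything.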
  • Downvoting, because this answer DOES NOT address my question in any way. I'm using --whole-file because the backup is SLOWER otherwise and --inplace because I do LACK storage in the backup target and want to use BTRFS-snapshots. All those aspects are mentioned/linked in the question already. Commented Oct 24, 2020 at 13:43
  • They are not mentioned in the question. This answer does address the underlying issues - you've got several things wrong with your rsync options. I use rsync for backing up disk images ~ 100+ GB across an international link with no issues; the delta algorithm works really well for me here Commented Oct 24, 2020 at 14:10
  • Your comment is absolutely pointless, of course, because it ignores the important detail that your environment is simply not mine. Simply reread my question more carefully; everything you are ignoring is already described in the first paragraph. Some trigger words: "takes too much time" (which is a link), NAS, hours etc. Commented Oct 25, 2020 at 11:31