
I have a folder that contains files from current and previous projects, which I plan to back up using versioned rsync. For a more robust backup strategy I also want to store a snapshot offsite (e.g. Amazon Glacier) at monthly intervals.

To save space and bandwidth I want to compress the backup before sending it offsite. However, since only a small fraction of the files change from month to month, re-sending the whole compressed archive with each backup would be a huge waste of bandwidth.

Ideally, I want to compress the backup into volumes of 500 MB (or some other fixed size) and upload them to my offsite storage. The next time I back up, most of these volumes should be identical to the previous backup, except for those containing files that have changed since the last backup. In that scenario I only need to upload the changed volumes, saving bandwidth (and file write requests).
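For reference, the volume scheme described above can be sketched with `tar`, `gzip`, and `split`. All paths and the 100k volume size below are made-up demo values (500m would be realistic for real backups):

```shell
#!/bin/sh
set -e

# Demo paths (assumptions, not real project locations).
SRC=/tmp/backup-demo/projects
DEST=/tmp/backup-demo/out
mkdir -p "$SRC" "$DEST" /tmp/backup-demo/restore

# Create some demo content to archive.
head -c 300000 /dev/urandom > "$SRC/instrument-data.bin"

# Archive, compress, and split into fixed-size volumes
# (backup.tar.gz.part-aa, part-ab, ...).
tar -cf - -C /tmp/backup-demo projects | gzip | \
    split -b 100k - "$DEST/backup.tar.gz.part-"

# To restore, concatenate the volumes in order and extract.
cat "$DEST"/backup.tar.gz.part-* | tar -xzf - -C /tmp/backup-demo/restore
```

Note this sketch does not solve the re-upload problem you raise below: any change early in the stream shifts the byte offsets of every following volume.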

Is it possible to do what I describe using a combination of tar and gzip (and maybe split)? Or some other command-line tools?

One issue I can foresee is that if a file contained in some volume changes, the content of all subsequent volumes may be offset, requiring a re-upload of the changed volume and every volume after it. Perhaps it's better to segment the volumes by folder somehow?
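The segment-by-folder idea could look like this: one archive per top-level folder, so a change in one folder only invalidates that folder's archive. The paths are demo placeholders; `gzip -n` omits the timestamp from the gzip header so that an unchanged folder produces a byte-identical archive whose checksum can be compared against last month's upload:

```shell
#!/bin/sh
set -e

# Demo paths (assumptions).
SRC=/tmp/perdir-demo/projects
OUT=/tmp/perdir-demo/out
mkdir -p "$SRC/projA" "$SRC/projB" "$OUT"
echo alpha > "$SRC/projA/a.txt"
echo beta  > "$SRC/projB/b.txt"

# One deterministic compressed archive per top-level folder.
for dir in "$SRC"/*/; do
    name=$(basename "$dir")
    tar -cf - -C "$SRC" "$name" | gzip -n > "$OUT/$name.tar.gz"
done

# Record checksums; only upload archives whose checksum changed.
sha256sum "$OUT"/*.tar.gz > "$OUT/checksums.txt"
```

The drawback is that volume sizes now follow folder sizes rather than a fixed 500 MB, so one huge folder still means one huge upload when anything inside it changes.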

I would love to hear any input or suggestions you have. Best regards, M

1 Answer


tar can do this with the --listed-incremental flag, so as described I would probably use that. You can use any compressor tar supports (or just pipe the output through an arbitrary compressor). See https://www.gnu.org/software/tar/manual/html_section/tar_39.html
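A minimal sketch of the flag in action, with made-up demo paths (requires GNU tar). The snapshot file records file metadata, so a second run against a copy of it archives only what changed since the full backup:

```shell
#!/bin/sh
set -e

# Demo paths (assumptions).
SRC=/tmp/inc-demo/projects
OUT=/tmp/inc-demo
mkdir -p "$SRC"
echo "v1" > "$SRC/a.txt"

# Level-0 (full) backup; this run creates the snapshot file.
tar -czf "$OUT/full.tar.gz" \
    --listed-incremental="$OUT/snapshot.snar" \
    -C /tmp/inc-demo projects

# Keep the level-0 snapshot pristine; run each incremental against a copy.
cp "$OUT/snapshot.snar" "$OUT/snapshot.level1.snar"

# Add a file, then take a level-1 (incremental) backup:
# it contains the new b.txt but not the unchanged a.txt.
echo "v2" > "$SRC/b.txt"
tar -czf "$OUT/incr.tar.gz" \
    --listed-incremental="$OUT/snapshot.level1.snar" \
    -C /tmp/inc-demo projects
```

Each month you would then upload only the (much smaller) incremental archive; restoring means extracting the full backup first, then each incremental in order.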

I'm not sure what sort of projects these are, but if it's code or some other text-based format I'd probably look into using git or some other source control system.

I should also point out that this is GNU tar. If you are on a BSD or another Unix, you may need to install GNU tar, because I don't think bsdtar supports this.

  • That looks very much like what I want. I'll give it a thorough read. A large number (probably more than 90% by count) of the files are small ASCII files produced by various scientific instruments I have worked with through my career. They don't change and are only very rarely used. Ideally they should have been organized differently, but I'm kind of locked into this structure by now.
    – Mesalas
    Commented Apr 26, 2018 at 15:07
  • I see, tar is probably perfect in that case then. If it were code or some other text format that you edit yourself, I'd suggest git or another source control system, simply because they're intended for that and have lots of features that would make your life easier.
    – BytePorter
    Commented Apr 26, 2018 at 15:13
