
I am somewhat familiar with using tar's --listed-incremental flag to take incremental backups. The end result is a backup-0 file with the first full backup, followed by backup-1, backup-2, ..., backup-x containing the changes from each successive backup.

In the past I have used rsync and hard links to make backups where backup-0 is the current state and each backup-x folder has the files that were specific to that backup. Basically what is outlined at http://www.mikerubel.org/computers/rsync_snapshots/ and http://www.admin-magazine.com/Articles/Using-rsync-for-Backups/(offset).

I want to mimic that functionality with tar. I cannot use hard links because the tar files will ultimately be uploaded to a cloud provider that doesn't maintain/understand links. I also want to use tar because I can then encrypt the archives before they are uploaded to the cloud.

So the idea is to have a growing list of files like so:

  • backup-0.tar.bz2 - this is the current backup and will be the biggest because it is a full backup
  • backup-1.tar.bz2 - this is yesterday's backup but it will only have the files that are different from what is in current (backup-0.tar.bz2)
  • backup-2.tar.bz2 - this is the backup from two days ago but it will only have the files that are different from yesterday (backup-1.tar.bz2)
  • backup-3.tar.bz2 - ...
  • backup-4.tar.bz2 - ...
  • backup-5.tar.bz2 - ...

If that doesn't make sense, hopefully this will.

First time:

  1. $ touch /tmp/file1
  2. $ touch /tmp/file2
  3. make backup-0.tar.bz2

At this point backup-0.tar.bz2 has /tmp/file1 and /tmp/file2.

Second time:

  1. $ touch /tmp/file3
  2. $ rm /tmp/file2
  3. ..do the magic

At this point:

  • backup-0.tar.bz2 has /tmp/file1 and /tmp/file3
  • backup-1.tar.bz2 has /tmp/file2; it doesn't have file1 because file1 didn't change, so it's still in backup-0.tar.bz2

Third time:

  1. $ touch /tmp/file1
  2. $ touch /tmp/file4
  3. ..do the magic

At this point:

  • backup-0.tar.bz2 has /tmp/file1, /tmp/file3, and /tmp/file4
  • backup-1.tar.bz2 has /tmp/file1 because it was changed
  • backup-2.tar.bz2 has /tmp/file2

Like so:

|       | first time | second time | third time              |
|-------|------------|-------------|-------------------------|
| file1 | backup-0   | backup-0    | backup-0 and backup-1   |
| file2 | backup-0   | backup-1    | backup-2                |
| file3 |            | backup-0    | backup-0                |
| file4 |            |             | backup-0                |

I figured out one way to approach this, but it seems horribly inefficient to me. Maybe there are features/flags I could use to make it more efficient.

  1. first time = take backup-0
  2. second time
    1. rename backup-0 to backup-1
    2. take backup-0
    3. remove everything from backup-1 that matches backup-0
  3. third time
    1. rename backup-1 to backup-2
    2. rename backup-0 to backup-1
    3. take backup-0
    4. remove everything from backup-1 that matches backup-0
  4. fourth time
    1. rename backup-2 to backup-3
    2. rename backup-1 to backup-2
    3. rename backup-0 to backup-1
    4. take backup-0
    5. remove everything from backup-1 that matches backup-0

I feel like it's that last step (remove everything from backup-1 that matches backup-0) that is inefficient.
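The rename cascade in those steps can be sketched as a small shell helper. This is only an illustration; the destination directory, the file names, and the maximum level are all made up for the example:

```shell
#!/bin/sh
# Sketch of the rename cascade described above. DEST and the maximum
# level passed to rotate() are assumptions for the example.
set -e
DEST=/tmp/rot-dest
mkdir -p "$DEST"

# Shift backup-N to backup-(N+1), highest level first, so that
# backup-0 is free for the next full archive.
rotate() {
    n=$1
    while [ "$n" -ge 0 ]; do
        if [ -f "$DEST/backup-$n.tar.bz2" ]; then
            mv "$DEST/backup-$n.tar.bz2" "$DEST/backup-$((n+1)).tar.bz2"
        fi
        n=$((n-1))
    done
}

# Demo: pretend two backups already exist, then rotate.
touch "$DEST/backup-0.tar.bz2" "$DEST/backup-1.tar.bz2"
rotate 5
# Afterwards backup-1 and backup-2 exist and backup-0 is gone.
```

Renaming is cheap; as noted below, it is the "remove everything from backup-1 that matches backup-0" step that does the real work.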

My question is: how can I do this? If I use tar's --listed-incremental, it'll do the reverse of what I am trying to do.


1 Answer


If I use tar's --listed-incremental it'll do the reverse of what I am trying.

It's good you realize this. I can see upsides and downsides of either direction (I won't discuss them here). Technically it's possible to reverse the process:

  1. Rename backup-N to backup-(N+1), looping N from Nmax down to 0.
  2. Restore full backup (now backup-1) to a temporary directory.
  3. Create backup-0 from the current data with a new snapshot file.
  4. Remove backup-1 (previous full backup).
  5. Treat the temporary directory as a "new" version. Create backup-1 as incremental backup, providing the snapshot file from the previous step. (Note you need to change your working directory from the one with current data to the temporary one, so relative paths stay the same).
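A minimal sketch of those five steps with GNU tar might look like the following. All paths and the Nmax level are invented for the example; also note that inodes and ctimes of the re-extracted tree differ from the original, so it is worth verifying that tar's change detection behaves as intended before trusting this:

```shell
#!/bin/sh
# Sketch of the five-step reversed-incremental procedure with GNU tar.
# DATA, DEST, TMP and Nmax are assumptions for the example.
set -e
DATA=/tmp/rev-data      # directory being backed up
DEST=/tmp/rev-dest      # backup-N.tar.bz2 files live here
TMP=/tmp/rev-old        # scratch copy of the previous full backup
Nmax=5
mkdir -p "$DATA" "$DEST"
echo demo > "$DATA/file1"

# 1. Shift backup-N to backup-(N+1), highest level first.
n=$Nmax
while [ "$n" -ge 0 ]; do
    if [ -f "$DEST/backup-$n.tar.bz2" ]; then
        mv "$DEST/backup-$n.tar.bz2" "$DEST/backup-$((n+1)).tar.bz2"
    fi
    n=$((n-1))
done

if [ -f "$DEST/backup-1.tar.bz2" ]; then
    # 2. Restore the previous full backup (now backup-1) to a temp dir.
    rm -rf "$TMP" && mkdir -p "$TMP"
    tar -xjf "$DEST/backup-1.tar.bz2" -C "$TMP" --listed-incremental=/dev/null
fi

# 3. Full (level 0) backup of the current data with a fresh snapshot file.
rm -f "$DEST/snapshot"
tar -cjf "$DEST/backup-0.tar.bz2" -C "$DATA" --listed-incremental="$DEST/snapshot" .

if [ -d "$TMP" ]; then
    # 4. Drop the previous full backup ...
    rm -f "$DEST/backup-1.tar.bz2"
    # 5. ... and recreate it as an incremental of the old state against a
    # copy of the new snapshot (-C keeps the relative paths identical).
    cp "$DEST/snapshot" "$DEST/snapshot.1"
    tar -cjf "$DEST/backup-1.tar.bz2" -C "$TMP" --listed-incremental="$DEST/snapshot.1" .
fi
```

On the first run only backup-0 and the snapshot file are produced; on later runs the extract-and-redump in steps 2 and 5 kicks in.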

You may wonder if this will keep the old (kept) backup-N files coherent with the new ones. A reasonable doubt, since the manual says:

-g, --listed-incremental=FILE
Handle new GNU-format incremental backups. FILE is the name of a snapshot file, where tar stores additional information which is used to decide which files changed since the previous incremental dump and, consequently, must be dumped again. If FILE does not exist when creating an archive, it will be created and all files will be added to the resulting archive (the level 0 dump). To create incremental archives of non-zero level N, create a copy of the snapshot file created during the level N-1, and use it as FILE.

So it suggests the snapshot file should be updated all the way from the full backup, as if you needed to rebuild the backup-N files every time you perform a full backup. But then:

When listing or extracting, the actual contents of FILE is not inspected, it is needed only due to syntactical requirements. It is therefore common practice to use /dev/null in its place.

This means that if you extract the backup-N files in increasing sequence to get a state from some time ago, any backup-M file (M>0) only expects a valid M-1 state to exist. It doesn't matter whether this state was obtained from a full or an incremental backup; the states should be identical either way. So it shouldn't matter whether you created the backup-M file based on a full backup (as you will: every backup-M starts life as backup-1, where backup-0 is a full backup) or based on a chain of incremental backups (as the manual suggests).
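A small end-to-end check of that claim is possible: build a full backup of the current state, dump the older state as a reverse increment against a copy of the snapshot, then extract both in sequence. Every path and file name below is invented for the demo, and GNU tar is assumed:

```shell
#!/bin/sh
# Demo of restoring "one backup ago" from a full backup plus one
# reverse increment. All paths are assumptions for the example.
set -e
OLD=/tmp/seq-old        # yesterday's state: file1, file2
DATA=/tmp/seq-data      # current state:     file1, file3
DEST=/tmp/seq-dest
RESTORE=/tmp/seq-restore
mkdir -p "$OLD" "$DATA" "$DEST" "$RESTORE"
echo one > "$OLD/file1";  echo two   > "$OLD/file2"
echo one > "$DATA/file1"; echo three > "$DATA/file3"

# Full backup of the current state, producing a snapshot file.
tar -cjf "$DEST/backup-0.tar.bz2" -C "$DATA" --listed-incremental="$DEST/snap" .

# Reverse increment: the old state dumped against a copy of that snapshot.
cp "$DEST/snap" "$DEST/snap.1"
tar -cjf "$DEST/backup-1.tar.bz2" -C "$OLD" --listed-incremental="$DEST/snap.1" .

# Restore: full archive first, then the increment on top of it.
# /dev/null satisfies the syntactical requirement the manual mentions.
cd "$RESTORE"
tar -xjf "$DEST/backup-0.tar.bz2" --listed-incremental=/dev/null
tar -xjf "$DEST/backup-1.tar.bz2" --listed-incremental=/dev/null
# RESTORE should now hold file1 and file2 but not file3, since the
# incremental extract also removes files absent from the old state.
```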


I understand your point is to keep backup-0 as an up-to-date full backup and to be able to "go back in time" with backup-0, backup-1, backup-2, … If you want to keep these files in a "dumb" cloud service, you'll need to carefully rename them according to the procedure, replace backup-1 and upload a full new backup-0 every time. If your data is huge then uploading a full backup every time will be a pain.

For this reason it's advisable to have a "smart" server that can build the current full backup every time you upload a "past-to-present" incremental backup. I have used rdiff-backup a few times:

rdiff-backup backs up one directory to another, possibly over a network. The target directory ends up a copy of the source directory, but extra reverse diffs are stored in a special subdirectory of that target directory, so you can still recover files lost some time ago. The idea is to combine the best features of a mirror and an incremental backup. rdiff-backup also preserves subdirectories, hard links, dev files, permissions, uid/gid ownership, modification times, extended attributes, acls, and resource forks. Also, rdiff-backup can operate in a bandwidth efficient manner over a pipe, like rsync.

Please note the software hasn't been updated since 2009. I don't know if it's a good recommendation nowadays.

  • Thanks. This could work, but it would require a lot of space to do the full extract to the temp directory. I have an idea for doing what I want and am working on a script: 1) dump an inventory of the files to back up, including modification time and size, 2) archive the files, including the inventory file; then later 1) extract the inventory file from the archive, 2) take a new inventory, 3) compare the two files, 4) extract the differing files and put them in a new archive. Commented Nov 23, 2018 at 22:12
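That inventory-and-diff idea could be sketched roughly like this. Every name here is invented, GNU stat is assumed, and filenames containing spaces would need a sturdier record format:

```shell
#!/bin/sh
# Rough sketch of the inventory-based change detection from the comment.
# Paths and file names are assumptions; GNU coreutils stat is assumed.
set -e
DATA=/tmp/inv-data
mkdir -p "$DATA"
echo hello > "$DATA/file1"

# One line per file: path, mtime (seconds since epoch), size in bytes.
inventory() {
    (cd "$1" && find . -type f -exec stat -c '%n %Y %s' {} +) | sort
}

inventory "$DATA" > /tmp/inv-new
: > /tmp/inv-old        # pretend this was extracted from the last archive

# Lines only in the new inventory = files that are new or changed;
# comm(1) requires both inputs to be sorted, which inventory() ensures.
comm -13 /tmp/inv-old /tmp/inv-new | cut -d' ' -f1 > /tmp/inv-changed
# /tmp/inv-changed now lists the paths to put into the next archive.
```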

