I am somewhat familiar with how to use `tar`'s `--listed-incremental` flag to take incremental backups. The end result is a `backup-0` file that has the first full backup, followed by `backup-1`, `backup-2`, ..., `backup-x` with the changes in the order the backups were taken.
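
For reference, here is a minimal sketch of that forward-incremental behaviour with GNU tar. The scratch directory, the `src` tree and the `snapshot.snar` name are made up for the demo, and the archives are left uncompressed for brevity (adding `--bzip2` would give the `.tar.bz2` files described here).

```shell
#!/bin/sh
# Sketch of forward incrementals with GNU tar; all paths are throwaway.
set -eu
work=$(mktemp -d)
cd "$work"
mkdir src
touch src/file1 src/file2

# Level 0: full backup. snapshot.snar records what tar has already seen.
tar --create --listed-incremental=snapshot.snar --file=backup-0.tar src

sleep 1   # keep the timestamps clearly apart for the demo
touch src/file3
rm src/file2

# Level 1: same snapshot file, so only changes since level 0 are stored.
tar --create --listed-incremental=snapshot.snar --file=backup-1.tar src

# The second archive lists the directory and the new file, not file1.
listing=$(tar --list --file=backup-1.tar)
printf '%s\n' "$listing"
```

Here the full data stays in the oldest archive and each later archive holds only the newer changes.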
In the past I have used `rsync` and hard links to make backups where `backup-0` is the current state and each `backup-x` folder has the files that were specific to that backup. Basically what is outlined at http://www.mikerubel.org/computers/rsync_snapshots/ and http://www.admin-magazine.com/Articles/Using-rsync-for-Backups/(offset).
I want to mimic that functionality with `tar`. I cannot use hard links because the tar files will ultimately be uploaded to a cloud provider that doesn't maintain or understand links. I also want to tar the backups so that I can encrypt them before they are uploaded to the cloud.
So the idea is to have a growing list of files like so:
- `backup-0.tar.bz2` - this is the current backup and will be the biggest because it is a full backup
- `backup-1.tar.bz2` - this is yesterday's backup but it will only have the files that are different from what is in current (`backup-0.tar.bz2`)
- `backup-2.tar.bz2` - this is the backup from two days ago but it will only have the files that are different from yesterday (`backup-1.tar.bz2`)
- `backup-3.tar.bz2` - ...
- `backup-4.tar.bz2` - ...
- `backup-5.tar.bz2` - ...
If that doesn't make sense, hopefully this example will.
First time:
$ touch /tmp/file1
$ touch /tmp/file2
- make `backup-0.tar.bz2`
At this point `backup-0.tar.bz2` has `/tmp/file1` and `/tmp/file2`.
Second time:
$ touch /tmp/file3
$ rm /tmp/file2
- ..do the magic
At this point:
- `backup-0.tar.bz2` has `/tmp/file1` and `/tmp/file3`
- `backup-1.tar.bz2` has `/tmp/file2`; it doesn't have `file1` because it didn't change, so it's in `backup-0.tar.bz2`
Third time:
$ touch /tmp/file1
$ touch /tmp/file4
- ..do the magic
At this point:
- `backup-0.tar.bz2` has `/tmp/file1`, `/tmp/file3`, and `/tmp/file4`
- `backup-1.tar.bz2` has `/tmp/file1` because it was changed
- `backup-2.tar.bz2` has `/tmp/file2`
Like so:
| | first time | second time | third time |
|-------|------------|-------------|-------------------------|
| file1 | backup-0 | backup-0 | backup-0 and backup-1 |
| file2 | backup-0 | backup-1 | backup-2 |
| file3 | | backup-0 | backup-0 |
| file4 | | | backup-0 |
I figured this is one way to approach it but it seems horribly inefficient to me. Maybe there are features/flags I can use that would make this more efficient.
- first time = take `backup-0`
- second time
  - rename `backup-0` to `backup-1`
  - take `backup-0`
  - remove everything from `backup-1` that matches `backup-0`
- third time
  - rename `backup-1` to `backup-2`
  - rename `backup-0` to `backup-1`
  - take `backup-0`
  - remove everything from `backup-1` that matches `backup-0`
- fourth time
  - rename `backup-2` to `backup-3`
  - rename `backup-1` to `backup-2`
  - rename `backup-0` to `backup-1`
  - take `backup-0`
  - remove everything from `backup-1` that matches `backup-0`
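
The renaming part of these steps could be sketched as a small shell function. The function name `rotate_backups`, the scratch directory and the uncompressed `.tar` suffix are all placeholders for illustration; it only shifts the files and does not take or prune any backup.

```shell
#!/bin/sh
# Sketch: shift backup-N.tar to backup-(N+1).tar, highest index first.
set -eu

rotate_backups() {
    dir=$1
    # Find the highest existing index.
    n=0
    while [ -e "$dir/backup-$((n + 1)).tar" ]; do
        n=$((n + 1))
    done
    # Rename from the top down so nothing is overwritten.
    while [ "$n" -ge 0 ]; do
        if [ -e "$dir/backup-$n.tar" ]; then
            mv "$dir/backup-$n.tar" "$dir/backup-$((n + 1)).tar"
        fi
        n=$((n - 1))
    done
}

# Demo in a scratch directory with small placeholder files.
demo=$(mktemp -d)
echo zero > "$demo/backup-0.tar"
echo one  > "$demo/backup-1.tar"
rotate_backups "$demo"
ls "$demo"
```

After the call, the slot for `backup-0` is free for the new full backup.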
I feel like it's that last step (remove everything from `backup-1` that matches `backup-0`) that is inefficient.
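
To make that step concrete, here is a rough sketch of one way it might be done without editing the archive in place: extract the old full backup into a scratch directory, keep only the files that are missing from or differ from the live tree, and re-create `backup-1` from those. Everything here is an assumption for the demo (the scratch paths, the `src` tree, uncompressed `.tar` files), and it compares file content with `cmp`, so a bare `touch` would not count as a change. It still has to unpack the old archive, so it is not obviously cheaper.

```shell
#!/bin/sh
# Sketch: rebuild backup-1 so it holds only what backup-0 no longer has.
set -eu
work=$(mktemp -d)
cd "$work"
mkdir src

# First run: full backup of file1 and file2.
echo one > src/file1
echo two > src/file2
tar --create --file=backup-0.tar src

# Second run: file3 appears, file2 is deleted.
echo three > src/file3
rm src/file2

mv backup-0.tar backup-1.tar           # rotate
tar --create --file=backup-0.tar src   # new full backup

# Unpack the old full backup and pick out what changed or vanished.
mkdir old
tar --extract --file=backup-1.tar --directory=old
: > keep.list
( cd old && find src -type f ) | while IFS= read -r f; do
    # Keep the old copy if the live file is gone or its content differs.
    if [ ! -e "$f" ] || ! cmp -s "old/$f" "$f"; then
        printf '%s\n' "$f" >> keep.list
    fi
done

# Re-create backup-1 from the kept files only, then clean up.
( cd old && tar --create --file=../backup-1.tar --files-from=../keep.list )
rm -r old keep.list

listing=$(tar --list --file=backup-1.tar)
printf '%s\n' "$listing"
```

In this run `backup-1.tar` ends up holding only `src/file2`, which matches the "second time" state described earlier.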
My question is, how can I do this? If I use `tar`'s `--listed-incremental`, it'll do the reverse of what I am trying to do.