
I want to back up my systems to split tar archives, with a script uploading the parts one by one: create one part of the split archive, then run a script that uploads that part and deletes it. This keeps the backups from using too much space on the system. I could create all the split archives first and then upload them, but that would require 50% free space, so I need to create them one at a time. I'm looking for advice on the best approach. I have a couple in mind; feel free to suggest a better one.

Approach one: split the archive with tar itself, using --new-volume-script. The problem is that I have to calculate in advance how big the backup is going to be: tar seems to require explicit directions for how many parts will exist and how big each has to be, so my script would have to calculate this and generate the parameters for tar.

tar -c -M -L 102400 --file=disk1.tar --file=disk2.tar --file=disk3.tar largefile.tgz

This creates up to three 100 MiB volumes. If there is a way to do this dynamically, with tar automatically naming the files and creating as many as it needs, I would like to know, because that would make this approach workable.

Approach two: write my own script that behaves like split. tar's output is piped to it on stdin; it writes each part, uploads it, deletes it, and makes tar wait in the meantime. This would be the easiest solution.
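A sketch of approach two that avoids writing the split logic by hand: GNU coreutils split has a --filter option that runs a command once per chunk, so each part can be uploaded and deleted before the next one is written. The upload.sh in the comment is a hypothetical placeholder; the runnable demo below stands in a local directory for the remote server.

```shell
#!/bin/sh
# Approach two using GNU split(1) --filter instead of a custom script.
# Real usage would look like (upload.sh is hypothetical):
#
#   tar -c /path/to/files | split -b 1G \
#       --filter='cat > "$FILE" && upload.sh "$FILE" && rm "$FILE"' \
#       - backup.tar.part-
#
# Runnable demo: "upload" each 32 KiB part by moving it into a local
# directory standing in for the remote server.
set -e
work=$(mktemp -d)
remote="$work/remote"; mkdir "$remote"
mkdir "$work/src"
head -c 100000 /dev/zero > "$work/src/data.bin"

# split sets $FILE to the current part's name and pipes the part's bytes
# to the filter command's stdin, one invocation per part.
tar -C "$work" -c src | split -b 32768 \
    --filter='cat > "$FILE" && mv "$FILE" '"$remote"'/' \
    - "$work/backup.tar.part-"

ls "$remote"
```

Because split only starts the next part after the filter command exits, tar is naturally throttled: at most one part exists locally at a time.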

2 Answers


This solution doesn't use tar, but you may be able to make it work with afio instead. All the logic to split the archive is built in, along with an option to run a script after each volume split:

find /path/to/files -print | \
   afio -oxv -s 1g -H rotate.sh backup-`date -Imin`-Vol%V.afio

Here rotate.sh is your script to upload and delete each archive file. This generates archives such as:

backup-2014-11-29T18:04-0500-Vol1.afio
backup-2014-11-29T18:04-0500-Vol2.afio
backup-2014-11-29T18:04-0500-Vol3.afio
...

And runs rotate.sh after each volume is complete.

Other options:

-o              # create an archive
-x              # preserve ownership and suid/sgid bits
-v              # verbose
-s 1g           # start a new volume after 1 GB
-H rotate.sh    # run this script after each 'tape change'
-Z -P xz        # compress, and use xz instead of gzip
 # %V in the file name below is replaced by the volume number
 backup-`date -Imin`-Vol%V.afio

Other afio aspects: it is similar to cpio, but specifically geared towards scripted backups. It is also safer for compressed archives, because it compresses each file individually instead of compressing the whole stream; corruption introduced after compression then affects only one file rather than the whole archive. In the same way, it can gpg-encrypt each file as it is stored, which is great for cloud storage.


This answer was first posted at https://unix.stackexchange.com/a/752289/320221

I didn't have real SSH access to the server because it was managed hosting; I used https://github.com/flozz/p0wny-shell to get something like a shell.

The initial answer I found to the problem was this one: https://unix.stackexchange.com/a/628242/320221
It created the parts and paused with a read. The problem is that p0wny-shell doesn't provide a stdin stream, so the read command didn't pause the script, and the parts were still created one after another without pausing.

I modified it so that it automatically moves the parts to the new server one by one:

  1. Create part
  2. Upload that part and delete it
  3. Repeat until all parts are created
  4. Upload the last part manually
  5. Unpack it on the remote server with the original myscript.sh (without the read, so it doesn't stop between the parts)

#!/bin/bash
# For this script it's advisable to use a shell, such as Bash,
# that supports a TAR_FD value greater than 9.

if [[ $TAR_SUBCOMMAND != '-c' ]]; then
  echo 'This script can only be used to compress with -c option'
  exit 1;
fi

# $TAR_ARCHIVE per run:
# 1. archive.tar
# 2. archive.tar-2
# 3. archive.tar-3
# ...

# $TAR_ARCHIVE_NAME per run
# 1. <empty>
# 2. archive.tar
# 3. archive.tar
# ...
TAR_ARCHIVE_NAME=`expr $TAR_ARCHIVE : '\(.*\)-.*'`

# $TAR_ARCHIVE_BASE_NAME per run
# 1. archive.tar
# 2. archive.tar
# 3. archive.tar
# ...
TAR_ARCHIVE_BASE_NAME=${TAR_ARCHIVE_NAME:-$TAR_ARCHIVE}

if (( $TAR_VOLUME == 2 )); then
  # On the first run $TAR_VOLUME will be '2'; we want to use the base name
  TAR_ARCHIVE_PREV_PART=$TAR_ARCHIVE_BASE_NAME
elif (( $TAR_VOLUME >= 3 )); then
  # On subsequent runs, build the name from the previous $TAR_VOLUME
  TAR_PREV_VOLUME=$(($TAR_VOLUME-1))
  TAR_ARCHIVE_PREV_PART=$TAR_ARCHIVE_BASE_NAME-$TAR_PREV_VOLUME
fi


echo "Copying $TAR_ARCHIVE_PREV_PART..."
# SSH key was previously created with `ssh-keygen -f ./id_rsa_user` and public key was added to remote
scp \
  -o StrictHostKeyChecking=no \
  -i '/usr/www/users/user/.ssh/id_rsa_user' \
  "$TAR_ARCHIVE_PREV_PART" \
  [email protected]:/home/user/path/to/target/


echo "Removing $TAR_ARCHIVE_PREV_PART..."
rm "$TAR_ARCHIVE_PREV_PART"


echo "Preparing volume $TAR_VOLUME of $TAR_ARCHIVE_BASE_NAME."
echo "$TAR_ARCHIVE_BASE_NAME-$TAR_VOLUME" >&$TAR_FD
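For reference, here is how a volume script like the one above is wired into GNU tar. This is a self-contained sketch, not the author's exact invocation: it uses a minimal stand-in script that only names the next volume via $TAR_FD, with illustrative sizes and file names.

```shell
#!/bin/bash
# Demo of GNU tar's multi-volume mode driving an info script.
# -M: multi-volume; -L 100: new volume every 100 KiB (100 x 1024 bytes);
# -F: run this script at each volume change. A real upload script (like
# the one above) would replace next-volume.sh.
set -e
work=$(mktemp -d)
cd "$work"
head -c 300000 /dev/zero > big.bin

cat > next-volume.sh <<'EOF'
#!/bin/bash
# Tell tar the name of the next volume by writing it to fd $TAR_FD.
echo "archive.tar-$TAR_VOLUME" >&$TAR_FD
EOF
chmod +x next-volume.sh

tar -c -M -L 100 -F ./next-volume.sh -f archive.tar big.bin
ls archive.tar*
```

tar exports TAR_VOLUME, TAR_ARCHIVE, TAR_SUBCOMMAND, and TAR_FD to the script, which is why the upload script above can derive the previous part's name without any extra parameters; this is also how tar names as many volumes as it needs without knowing the count up front.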

