I'd like to get a backup of some files from a server. The basic idea is to create an archive using tar, saving it to disk, and downloading it. The problem is the lack of remaining disk space (or RAM for a tmpfs), forcing me to split the archive and download it in chunks.

Is there an easy way (e.g. by adding another command between the pipe from tar to split) to make split pause, when the disk is too full for the next piece, and continue when the disk is free again? (By default, split just exits with an error message when writing failed due to a full disk.)
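For reference, the basic pipeline I mean (without the flow control I'm asking about) would look something like the sketch below; the paths, the 1M chunk size and the backup.tar.part- prefix are just placeholders for illustration:

```shell
# On the server: archive a directory and split the byte stream into
# 1 MiB chunks named backup.tar.part-aa, -ab, ...
# (a real backup would use a larger chunk size such as 500M).
mkdir -p data restore
head -c 3145728 /dev/urandom > data/file.bin   # throwaway test data
tar -cf - data | split -b 1M - backup.tar.part-

# On the client, after downloading all chunks: reassembly is plain
# concatenation, so tar sees one continuous stream again.
cat backup.tar.part-* | tar -xf - -C restore
```

Since split only cuts the byte stream, concatenating the chunks in their original order restores the exact archive; no per-chunk tar handling is needed.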

Alternatives that I'd like to avoid:

  • Piping the tar via SSH to save it directly on the destination -- the connection might break when the download runs too long, and the downloading client is running Windows.
  • Using dsplit (or something similar that creates multiple tar archives) -- I expect this would prevent me from concatenating the archives after downloading them.

1 Answer

Below is a quick and dirty sh script designed to work as a filter (between tar and split in your case). It was written on Ubuntu and may need some adjustments on other systems (e.g. I'm not sure that column -t | cut -d " " -f 7 is the right way to parse df output on every OS). It requires /proc.

Save it as ensuredf somewhere your $PATH points to, make it executable (chmod +x ensuredf) and use it like this:

… | ensuredf path requirement | …

where

  • path is the directory you want to monitor;
  • requirement is the desired free space (df -B must understand this);

Example:

… | ensuredf /mnt/foo/data/ 2G | …

The idea is to let a background cat pass data from the script's stdin to its stdout, but pause it immediately. The loop then invokes df for the given path, parses its output and checks whether there is more free space than the requirement: if so, cat is resumed, otherwise it stays paused. This repeats at a hardcoded interval of 1 second for as long as the /proc entry for this cat exists.
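To make the parsing step less magic: because tail -n 1 feeds column -t a single row, every column comes out separated by exactly two spaces (at least with util-linux column), so cutting on a single space puts the real columns in fields 1, 3, 5, 7 -- and field 7 is df's fourth column, the available space, in units of the requested block size. A quick check (the numbers will of course vary per system):

```shell
# df -P guarantees one data line per filesystem after the header;
# -B 1M scales all sizes to 1 MiB units.
df -P -B 1M / | tail -n 1
# e.g.: /dev/sda1 482822 291282 166958 64% /

# The same line through the script's parser: field 7 is "Available",
# i.e. the number of free 1 MiB units on /.
df -P -B 1M / | tail -n 1 | column -t | cut -d " " -f 7
```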

Other notes:

  • some filesystems (especially Btrfs) make df report free space less accurately than you'd like;
  • if your tar is very fast and your required space is very low, the interval of 1 second may be too long;
  • but even if the interval were zero, when the free space gets below the requirement there will be some delay before cat is paused;
  • if for some reason the foreground script gets delayed and the background cat works well, the disk may get full anyway.

This means you should set your requirement with a proper safety margin. Treat this code as an example and adjust it to your needs. I also wrote a safer script that kept invoking a foreground dd to pass a chunk of data if and only if there was enough disk space, but those multiple dd processes were a lot slower than a single cat.

#!/bin/sh
# ensuredf -- pause the data stream when free space at <path>
# drops below <requirement>, resume when space is available again.

[ $# -eq 2 ] || { printf '%s\n' "usage: $0 path requirement" >&2 ; exit 1; }

pth="$1"       # directory to monitor
rqrmnt="$2"    # required free space, in df -B syntax (e.g. 2G)
intrvl=1       # polling interval in seconds

# Run cat in the background on the script's own stdin/stdout
# (via /proc, because the shell redirects a background job's stdin),
# then stop it immediately.
</proc/$$/fd/0 cat >/proc/$$/fd/1 &
kill -s STOP $!

# Poll df while the background cat still exists.  With -B "$rqrmnt"
# the available space is reported in units of the requirement; since
# column -t separates a single row with two spaces, cut field 7 is
# df's fourth column ("Available").  >= 2 units guarantees at least
# one full requirement of free space even after rounding up.
while [ -d /proc/$! ] ; do
  if [ $(df -P -B "$rqrmnt" "$pth" | tail -n 1 | column -t | cut -d " " -f 7) -ge 2 ]
  then kill -s CONT $!    # enough space: let cat run
  else kill -s STOP $!    # too little space: pause cat
  fi
  sleep "$intrvl"
done
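A quick way to sanity-check the filter before wiring it into the real pipeline is a pass-through test. The snippet below recreates the script in a throwaway directory under /tmp (only so that it is self-contained -- normally you'd just call the ensuredf from your PATH) and verifies that the bytes come out unchanged:

```shell
demo=/tmp/ensuredf-demo          # throwaway work dir (assumed path)
mkdir -p "$demo"
cat > "$demo/ensuredf" <<'EOF'
#!/bin/sh
[ $# -eq 2 ] || { printf '%s\n' "usage: $0 path requirement" >&2 ; exit 1; }
pth="$1"
rqrmnt="$2"
intrvl=1
</proc/$$/fd/0 cat >/proc/$$/fd/1 &
kill -s STOP $!
while [ -d /proc/$! ] ; do
  if [ $(df -P -B "$rqrmnt" "$pth" | tail -n 1 | column -t | cut -d " " -f 7) -ge 2 ]
  then kill -s CONT $!
  else kill -s STOP $!
  fi
  sleep "$intrvl"
done
EOF
chmod +x "$demo/ensuredf"

# Push 1 MiB of random data through the filter; /tmp must have more
# than 1M free for the filter to let it pass.
head -c 1048576 /dev/urandom > "$demo/sample.bin"
"$demo/ensuredf" /tmp 1M < "$demo/sample.bin" > "$demo/copy.bin"
cmp "$demo/sample.bin" "$demo/copy.bin" && echo "pass-through OK"
```

The run takes a second or two because of the polling interval; that delay is the filter's 1-second loop, not the data transfer itself.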
  • Thanks -- I hoped for a ready-to-use packaged tool, but this will work for now. It looks as if, in the long term, I'll have to write something myself that counts and throttles/pauses the transferred data; I fear some timing issues. Or I might have to buy some new disks... ;)
    – phi1010
    Commented Nov 29, 2017 at 18:42
