When I use tar to archive a directory and then compress it separately using e.g. xz, there will be a point where I have three files on my system - dir, dir.tar and dir.tar.xz. As soon as the compression is completed, dir.tar is deleted, but it seems like I must still make sure I have enough free disk space to accommodate all three files in this setup.

When using the compression flag with tar directly, the compressed file is created without an observable .tar intermediate, and it appears I only need free space equal to the directory plus the compressed file.

I was initially hypothesizing that maybe the tar archive was created and deleted bit by bit as it was compressed, but at the same time, I remember reading somewhere that the entire tar archive needs to be created before compression. I can't observe any temporary tar file, hidden or not.

Does using tar with a compression flag actually need less free disk space than first using tar and then a separate compression utility? Why or why not (maybe with a step-by-step of what tar plus the compression flag does)?

1 Answer

Yes, using the compression flags in the tar command directly (e.g., tar czf) will reduce intermediate disk usage. It does not create any temporary uncompressed tar file; instead it uses pipes to pass the stdout of tar directly to the stdin of the compression utility.

Depending on how pipes are implemented on your particular system, tar might appear to be writing a file, but that file will actually be a FIFO queue with no appreciable space consumption.
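To make the pipe idea concrete, here is a minimal sketch that builds the same archive twice: once with the compression flag and once with an explicit pipeline. The directory and file names are made up for illustration, and whether your tar runs the compressor as a separate process or compresses in-process depends on the implementation; either way, no dir.tar appears on disk.

```shell
set -eu

# Hypothetical scratch area and sample data, for illustration only.
work=$(mktemp -d)
cd "$work"
mkdir dir
printf 'hello\n' > dir/a.txt
printf 'world\n' > dir/b.txt

# One step: tar handles the compression itself.
tar -czf one-step.tar.gz dir

# Explicit pipeline: tar writes the archive to stdout ("-f -"),
# and gzip compresses the stream as it arrives.
tar -cf - dir | gzip > piped.tar.gz

# Both archives should list the same members.
tar -tzf one-step.tar.gz | sort > one.list
tar -tzf piped.tar.gz | sort > two.list
cmp one.list two.list && echo "same contents"

# No uncompressed dir.tar was ever written.
ls
```

The two compressed files may differ by a few bytes (gzip headers can carry timestamps or names), which is why the sketch compares member listings rather than the raw files.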

Without the flag:
Files > tar: the disk holds the original files plus a .tar of about the same size.
.tar > gzip: the disk holds the original files, the .tar, and the .tgz.
Total disk usage just before the .tar is deleted is 2-3x the size of the original files, depending on the compression ratio.

With the flag:
Files > tar > gzip: the disk holds only the original files plus the .tgz.
Worst-case usage is about 2x the size of the original files.
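The accounting above can be checked on a small scale. This is only a sketch: the sample data (20 files of 10 KiB each, all the letter "a") is invented and compresses far better than typical data, so the exact numbers mean little; the point is which files have to exist at the same time.

```shell
set -eu

# Hypothetical scratch area with highly compressible sample data.
work=$(mktemp -d)
cd "$work"
mkdir dir
i=0
while [ "$i" -lt 20 ]; do
  head -c 10240 /dev/zero | tr '\0' 'a' > "dir/file$i.txt"
  i=$((i + 1))
done

# Two-step approach: dir.tar and dir.tar.gz coexist until the .tar is removed.
tar -cf dir.tar dir
gzip -c dir.tar > dir.tar.gz
tar_bytes=$(wc -c < dir.tar)
gz_bytes=$(wc -c < dir.tar.gz)
echo "two-step peak extra space: $((tar_bytes + gz_bytes)) bytes"
rm dir.tar

# One-step approach: only the compressed archive is ever written.
tar -czf direct.tar.gz dir
direct_bytes=$(wc -c < direct.tar.gz)
echo "one-step peak extra space: $direct_bytes bytes"
```

With incompressible input (already-compressed media, for example) the .tgz would be roughly as large as the .tar, which is where the 3x and 2x worst cases come from.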

  • Thank you! Does this mean that the tar archive is created and piped to the compression utility in bits of a few tarred files at a time, which are each deleted just after they are compressed? So rather than archiving the entire directory, gzip will compress a file as soon as tar is done putting that file into the archive format? Commented Feb 12, 2015 at 16:08
  • The tar-to-gzip pipeline is a stream: as soon as tar writes some bits, gzip begins compressing them, with no knowledge of what constitutes a file. More to your question, the --remove-files flag will cause tar to delete each file as soon as tar finishes handling it.
    – BowlesCR
    Commented Feb 12, 2015 at 18:10
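
A small sketch of the --remove-files behaviour mentioned in the comment above, assuming GNU tar (other tar implementations may not accept this flag). The directory and file names are hypothetical, and everything happens in a throwaway temporary directory because the source files really are deleted.

```shell
set -eu

# Hypothetical scratch area and sample data; safe to lose.
work=$(mktemp -d)
cd "$work"
mkdir dir
printf 'hello\n' > dir/a.txt
printf 'world\n' > dir/b.txt

# Assumes GNU tar: --remove-files deletes each source file
# after it has been added to the archive.
tar --remove-files -czf dir.tar.gz dir

# The data now lives only inside the compressed archive.
tar -tzf dir.tar.gz
```

Because the originals are gone once tar has handled them, a failure later in the run (a full disk, for instance) may leave you with neither the files nor a complete archive, so this is best tried on disposable data first.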
