The way you are doing this, by compressing the `.tar` file, the answer is for sure no. Whatever you use to compress the `.tar` file doesn't know about the contents of the file: it just sees a binary stream, and it has no way of knowing whether parts of that stream are incompressible or only minimally compressible. Don't be confused by the options for the `tar` command to do the compression; `tar --create --xz --file some.tar.xz file1` is just as "dumb" about the stream contents as `tar --create file1 | xz > some.tar.xz` is.
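As a quick illustration (a minimal Python sketch, not from the original answer): compressing random, i.e. effectively incompressible, bytes with `lzma` (the algorithm behind `xz`) does not shrink them, and compressing the compressed output a second time gains nothing either.

```python
import lzma
import os

data = os.urandom(1 << 20)   # 1 MiB of random, effectively incompressible bytes

once = lzma.compress(data)   # what xz does to the "binary stream"
twice = lzma.compress(once)  # compressing the compressed stream again

print(len(data), len(once), len(twice))
# All three sizes are roughly equal; the compressed versions are in fact
# slightly *larger* than the input, due to container overhead.
```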
You can do multiple things:
- You switch to a container format other than `.tar` that allows you to compress on an individual, per-file basis. This is unfavourable if you have lots of small files with similar patterns in one directory, as they get compressed individually and the redundancy between them cannot be exploited. The zip format is an example that would work (see the first sketch after this list).
- You compress the files, where appropriate, before putting them in the tar file. This can be done transparently with e.g. the Python `tarfile` and `bz2` modules (see the second sketch after this list). This also has the disadvantage of point 1. And there is no straight extraction from the tar file: some files will come out compressed and need decompressing, while others won't (as they were already compressed before the backup).
- You use `tar` as-is, live with the fact that this happens, and select a not-so-high compression level for `gzip`/`bzip2`/`xz`, so that they will not try too hard to compress the stream, thereby not wasting time on trying to get another 0.5% of compression that is not going to happen (see the third sketch after this list).
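For the first point, a minimal sketch using Python's `zipfile` module (the extension set and the paths are assumptions for illustration): files that look already compressed are stored as-is, everything else gets deflated.

```python
import zipfile
from pathlib import Path

# Extensions assumed (for this illustration) to be already-compressed formats.
ALREADY_COMPRESSED = {".gz", ".bz2", ".xz", ".zip", ".jpg", ".png", ".mp4"}

with zipfile.ZipFile("backup.zip", "w") as zf:
    for path in Path("some_dir").rglob("*"):
        if not path.is_file():
            continue
        if path.suffix.lower() in ALREADY_COMPRESSED:
            # Store as-is: no time wasted trying to recompress.
            zf.write(path, compress_type=zipfile.ZIP_STORED)
        else:
            zf.write(path, compress_type=zipfile.ZIP_DEFLATED)
```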
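For the second point, a rough sketch of pre-compressing with `bz2` before adding to an uncompressed tar (the extension set and the `.bz2` member suffix are again assumptions; this shows the extraction caveat mentioned above, as some members come out compressed):

```python
import bz2
import io
import tarfile
from pathlib import Path

ALREADY_COMPRESSED = {".gz", ".bz2", ".xz", ".zip", ".jpg", ".png", ".mp4"}

# Note: the tar file itself is written *without* compression.
with tarfile.open("backup.tar", "w") as tf:
    for path in Path("some_dir").rglob("*"):
        if not path.is_file():
            continue
        if path.suffix.lower() in ALREADY_COMPRESSED:
            tf.add(path)  # already compressed: add unchanged
        else:
            # Compress the file contents and add them as a ".bz2" member,
            # so it is visible on extraction which files need decompressing.
            compressed = bz2.compress(path.read_bytes())
            info = tarfile.TarInfo(name=str(path) + ".bz2")
            info.size = len(compressed)
            tf.addfile(info, io.BytesIO(compressed))
```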
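For the third point, the compression effort is just a parameter. From the shell that would be e.g. `xz -1` instead of `xz -9`; a sketch of the same with Python's `tarfile`:

```python
import tarfile

# Low compression levels: don't spend time hunting for gains that aren't there.
with tarfile.open("fast.tar.gz", "w:gz", compresslevel=1) as tf:
    tf.add("some_dir")

with tarfile.open("fast.tar.xz", "w:xz", preset=1) as tf:
    tf.add("some_dir")
```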
You might want to look at the results of parallelising `xz` compression (not specific to tar files), as published on my blog, to see the effect of trying to speed up `xz`.