2

It seems that whether a tar archive is seekable can make a large difference when listing or extracting just a few files. Unfortunately, the man page says very little about this. Compressed archives are reportedly not seekable [1], but that post provides no evidence. Is there a more reliable source of information on this issue?

[1] https://serverfault.com/questions/59795/is-there-a-smarter-tar-or-cpio-out-there-for-efficiently-retrieving-a-file-store

7
  • 1
    What is wrong with this answer?
    – DavidPostill
    Commented Jul 28, 2017 at 10:26
  • @DavidPostill: There's nothing wrong with that answer, it's just an answer to a different question.
    – Peltier
    Commented Jul 28, 2017 at 12:31
  • Really? So "GNU tar creates "seekable" archives by default." and "Compressed archives are not "seekable" because current (1.26) GNU tar offloads compression to external program" doesn't answer your question?
    – DavidPostill
    Commented Jul 28, 2017 at 12:33
  • That was not the original question, and it provides no supporting evidence. I agree it's a good start, though.
    – Peltier
    Commented Jul 28, 2017 at 12:57
  • The supporting evidence is the source code.
    – DavidPostill
    Commented Jul 28, 2017 at 12:58

1 Answer

1

The file header for each file includes its size in the archive. This allows the file content to be skipped if not needed. Tar just seeks to the next header that follows the file content. There is documentation on the header format.
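As a rough illustration of the header-skipping described above, here is a minimal Python sketch that walks an uncompressed tar by reading only the 512-byte headers and seeking past the content. The function names `list_members_by_seeking` and `build_demo_archive` are made up for this example, and the parser is deliberately simplified: it assumes a plain ustar archive and ignores the name prefix field, pax extended headers, and the binary size encoding used for very large files.

```python
import io
import tarfile

BLOCK = 512  # tar archives are organized in 512-byte blocks


def list_members_by_seeking(fileobj):
    """Walk a plain (uncompressed) tar, reading headers only and seeking past content."""
    members = []
    while True:
        header = fileobj.read(BLOCK)
        # An all-zero block marks the end of the archive.
        if len(header) < BLOCK or header == b"\0" * BLOCK:
            break
        # The member name occupies the first 100 bytes, NUL-padded.
        name = header[0:100].rstrip(b"\0").decode("utf-8")
        # The size is stored as octal ASCII in the 12 bytes at offset 124.
        size = int(header[124:136].rstrip(b"\0 ") or b"0", 8)
        members.append((name, size))
        # Content is padded to a whole number of blocks; skip it without reading.
        padded = (size + BLOCK - 1) // BLOCK * BLOCK
        fileobj.seek(padded, io.SEEK_CUR)
    return members


def build_demo_archive():
    """Build a small in-memory ustar archive to try the walker on (demo data only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, payload in [("a.txt", b"hello"), ("b.bin", b"x" * 1000)]:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf
```

Calling `list_members_by_seeking(build_demo_archive())` returns the name and size of each member without ever reading the file contents, which is only possible because the underlying file object supports `seek`.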

Compressed tar files are just that: a tar file that has been run through a compression program. You can freely switch between the uncompressed and compressed forms by using the appropriate decompression program (often gunzip) or compression program (gzip). With some tar programs this is the only option. The tar format itself remains seekable; compressing the file does not change the tar data inside it.

What is not seekable is the compressed format. Compression works by finding a relatively small number of bytes to represent the data being compressed. Blocks of data with relatively few byte values or repeated byte strings compress well. Blocks of data with lots of different byte values and few repeated byte strings compress poorly, if at all. For some data, compression can actually increase the size of the file. The compression ratio for blocks within the file varies. The variance can be extreme for a tar file, which may contain both very compressible files and relatively incompressible ones.
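To make the varying compression ratio concrete, here is a small Python sketch using `zlib` (the same DEFLATE algorithm gzip uses). The helper `compressed_size` and the two sample buffers are invented for this illustration; exact sizes depend on the zlib version, so none are quoted here.

```python
import os
import zlib


def compressed_size(data: bytes) -> int:
    """Return the length of `data` after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))


# 10,000 identical bytes: a single repeated value compresses extremely well.
repetitive = b"a" * 10_000

# 10,000 random bytes: no repeated strings to exploit, so the output
# is typically slightly larger than the input.
random_like = os.urandom(10_000)

if __name__ == "__main__":
    print("repetitive:", len(repetitive), "->", compressed_size(repetitive))
    print("random:    ", len(random_like), "->", compressed_size(random_like))
```

Because the ratio differs so much between the two inputs, a given offset in the uncompressed data cannot be mapped to an offset in the compressed stream by simple arithmetic.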

There is no mechanism within the compressed data to seek to some position in the uncompressed data. While some compression programs allow seeking to an individual file within a compressed archive, the only file such a tool would see in a compressed tar archive is the tar file itself. Tar files are rarely compressed with such tools, although compressed or uncompressed tar files may be included when archiving sets of files.
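The consequence can be sketched in Python: to read bytes at a given uncompressed offset, a reader has to decompress everything before it and throw it away. The function `read_at` below is a hypothetical helper written for this example, not part of any tar or gzip tool, and it works on a raw zlib stream for simplicity.

```python
import zlib


def read_at(compressed: bytes, offset: int, length: int, chunk: int = 4096) -> bytes:
    """Return `length` bytes starting at uncompressed `offset`.

    There is no way to jump ahead: the stream is decompressed from the
    beginning and everything before `offset` is discarded.
    """
    decomp = zlib.decompressobj()
    out = bytearray()
    produced = 0  # uncompressed bytes seen before the current piece
    for i in range(0, len(compressed), chunk):
        piece = decomp.decompress(compressed[i:i + chunk])
        # Index within this piece where the wanted range begins (0 if already inside it).
        start = max(offset - produced, 0)
        produced += len(piece)
        if start < len(piece):
            out += piece[start:]
        if len(out) >= length:
            break
    return bytes(out[:length])
```

For example, `read_at(zlib.compress(data), 1000, 10)` yields the same bytes as `data[1000:1010]`, but only after decompressing the first 1000 bytes. With an uncompressed file the same read is a single `seek` followed by a `read`.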
