replaced http://stackoverflow.com/ with https://stackoverflow.com/
refinement
Yuval


I realize that I need to re-phrase and refine my question, especially since, as Robin Hood pointed out, there are existing, rather easy solutions for creating compressed archives (namely, zip). So here it is:

Is there a way to use tar that allows true random access to the archive while still keeping it compressed? If not, is there another tar replacement for Linux (one built with the same rationale and, ideally, with support for the same command-line options) that does achieve this?

Right now I can replace tar in a general sense with zip, by changing:

tar c path/to/file1 path/to/file2 | gzip > arc.tar.gz
gunzip < arc.tar.gz | tar x

to:

zip -qr - path/to/file1 path/to/file2 > arc.zip
unzip -qoX arc.zip

However, this has the disadvantage that it does not support all the options that tar does for archiving, namely:

  1. piping each extracted file individually into a command (what tar's --to-command switch does)
  2. unzip does not accept an archive on standard input; funzip does, but it only outputs the first file in the archive

So it's rather limiting.
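
For the record, the closest I have gotten to emulating --to-command with plain zip tools is the sketch below. It assumes Info-ZIP's zipinfo and unzip are installed, and my_filter is just a hypothetical stand-in for whatever command should receive each file on its standard input:

# list the member names, then stream each one individually into a command
# (unzip treats the name as a pattern, so names containing wildcard
# characters would need escaping; directory entries are skipped)
zipinfo -1 arc.zip | while IFS= read -r entry; do
    case "$entry" in */) continue ;; esac
    unzip -p arc.zip "$entry" | my_filter "$entry"
done

This re-opens the archive once per member, but since zip keeps a central directory that allows seeking straight to each entry, that part is fairly cheap. It is still far clumsier than a single tar switch.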

added clarifications
Yuval

I would like to create a tar gzip archive, but in the reverse manner of what is most commonly done -- have the files in the archive compressed individually rather than compressing the entire archive; that way the archive retains the seekable property it should have. This makes much more sense to me, and I don't know why it has not been favored.

I have some ideas on how to do this.

However, ideally, I would like to continue using tar for this, as it is the familiar, de-facto archiving tool where I work. tar has the --to-command switch, which allows piping each extracted file to a program. If there were a symmetric switch such as --from-command, I could easily implement my wish with:

tar cf my_archive.tar file1 file2 --from-command=gzip
tar xf my_archive.tar --to-command=gunzip

My motivation comes from dealing with archives containing a large number of large files. I currently tar-gzip them, but then extracting any file from the archive takes a long time: the archive has to be decompressed before tar can reach the file, and that decompression is strictly serial!
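
To make the "hassle" concrete, the manual workaround available to me today looks roughly like the sketch below (the staging directory and plain file names are just placeholders):

# compress each file individually first, then archive the compressed copies;
# this is exactly the filesystem clutter I would like tar to hide from me
mkdir -p staging
for f in file1 file2; do
    gzip -c "$f" > "staging/$f.gz"
done
tar cf my_archive.tar -C staging .

# pulling out one member later only scans the uncompressed tar headers and
# decompresses that single member, nothing else
tar xOf my_archive.tar ./file1.gz | gunzip > file1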

So here are my questions:

  • Is there an evident way to achieve this that I am disregarding?
  • Has anyone already written a tool to do this, specifically with tar?
  • If one would call tar and gzip the standard methods of archiving and compressing in Linux, what would be the equivalent, popular method for archiving with compression in the manner I mentioned above (i.e. not tar.gz)?
  • Is there another way I am overlooking to circumvent the large amount of time it takes to extract a file from a large tar-gzipped archive?
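
To make that last point concrete, the operation that is painfully slow for me today is single-member extraction (the archive and member names here are just placeholders):

# gunzip has to decompress the stream serially to reach the member,
# even though only one file is actually wanted
tar xzf big_archive.tar.gz path/inside/archive/one_file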

Thanks!

EDIT

Perhaps I was too lengthy for my own good. Here is a re-phrasing of the question, in short:

  • Is there a method to use tar that makes it compress files before it adds them to the archive? This is as opposed to what is more commonly done, which is to compress the archive after all the files have been added to it. Please note that the question is whether this is already implemented in tar.
  • If the answer to the above is no, is there a common, portable, readable (in the sense that someone else will understand the commands I will be writing down) way of creating archives of compressed files, in Linux, without the hassle of first compressing them in the filesystem?

Thanks again!

