I have a directory containing many files. Together, these files take up several gigabytes of space. I'd like to compress this directory.

But compressing the directory into a single file would make that file difficult to move around, so I'd like to end up with several smaller files.

I could do this with:

tar cvzf - dir/ | split --bytes=200MB - sda1.backup.tar.gz.

but I'm worried that I would then need all of the backup files in order to restore any of the data. I'd much prefer that each file be its own independent unit containing a part of the source data.

One way I can think of to do this would be to build a script which calculates the size of each input file and greedily appends files to a list until a maximum size is reached. The list of files is then tar-ed and a new list is begun. This is repeated until all files are in tars. The tars can then be independently extracted.
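For concreteness, the greedy idea above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a tested backup tool: the function names (plan_batches, list_files, write_archives) and the output naming scheme are made up for this example, the limit is applied to the uncompressed input sizes (so the resulting .tar.gz files will usually be smaller than the limit), and symlinks and empty directories are simply skipped.

```python
import os
import tarfile


def plan_batches(sizes, max_bytes):
    """Greedily group (path, size) pairs into batches of at most max_bytes.

    A single file larger than max_bytes ends up in a batch of its own.
    """
    batches, current, current_size = [], [], 0
    for path, size in sizes:
        # Close the current batch when adding this file would exceed the limit.
        if current and current_size + size > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        batches.append(current)
    return batches


def list_files(root):
    """Return sorted (path, size) pairs for every regular file under root."""
    found = []
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            # Assumption for this sketch: symlinks are skipped entirely.
            if os.path.isfile(path) and not os.path.islink(path):
                found.append((path, os.path.getsize(path)))
    return sorted(found)


def write_archives(root, max_bytes, prefix):
    """Write prefix.000.tar.gz, prefix.001.tar.gz, ... and return their names."""
    outputs = []
    for index, batch in enumerate(plan_batches(list_files(root), max_bytes)):
        out = f"{prefix}.{index:03d}.tar.gz"
        # Each archive is opened and closed on its own, so it is a complete,
        # independently extractable gzip-compressed tarball.
        with tarfile.open(out, "w:gz") as tar:
            for path in batch:
                tar.add(path, recursive=False)
        outputs.append(out)
    return outputs
```

Something like write_archives("dir", 200 * 1000 * 1000, "sda1.backup") would then mirror the 200MB limit from the split command, except that every output file could be extracted without the others.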

This is not a duplicate of other questions because I am specifically wondering how to perform this operation in such a way that every part of the total archive is itself a valid archive and every file can be reconstructed without needing to union archives.

Is there a utility that does such a thing?

1 Answer

tar can cope with partial archives after splitting. When you try to restore from one part of such an archive, it will skip over whatever it can't use at the start, and tell you about any partial file at the end; everything in between will be restored properly. You can also instruct tar itself to split archives as it creates them, using the tape length options (--multi-volume together with --tape-length in GNU tar); see "Create a tar archive split into blocks of a maximum size" for details.

There are utilities which do better than that, though, and produce parts which stand alone as archives (as long as the size limit is large enough to hold the largest file in the archive); unfortunately, the ones I know of don't meet all your requirements. On most platforms, there's zipsplit, which can split zip files, but it only copes with archives up to 2GB in size. On Plan 9, there's tarsplit, which splits tarballs, but I'm not sure it can be easily ported to whatever system you're using (I suspect you're not using Plan 9...).

  • This is helpful, thank you. I don't like the idea of needing to union two archives to avoid issues with partial files, though. I'll look into the other utilities you cite.
    – Richard
    Commented May 7, 2016 at 21:26
