
I'm currently using tar to archive some files. The problem is that the archives are pretty big, contain a lot of data, and tar is very slow at listing and extracting.

I often need to extract single files or folders from the archive, but I don't currently have an external index of files.

So, is there an alternative for Linux that lets me build uncompressed archive files, preserves file attributes AND provides a fast-access file table?

I'm talking about archives of 10 to 100 GB, and it's pretty impractical to wait several minutes to access a single file.

Anyway, any trick to solve this problem is welcome (but a single archive file is a requirement, so no rsync or similar).

Thanks in advance!

EDIT: I'm not compressing the archives; even plain, uncompressed tar is too slow for me. To be precise about "slow", I'd like the following:

  • listing the archive contents should take time linear in the number of files in the archive, but with a very small constant (e.g. if a list of all the files is stored at the head of the archive, it could be very fast).
  • extracting a target file/directory should (filesystem permitting) take time linear in the target's size (e.g. if I'm extracting a 2 MB PDF file from a 40 GB archive, I'd really like it to take less than a few minutes... if not seconds).

Of course, this is just my idea and not a hard requirement. I guess such performance would be achievable if the archive contained an index of all the files with their respective offsets, and that index were well organized (e.g. a tree structure).
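
For illustration, here is a rough sketch (Python) of the kind of index I mean, built in one pass over an uncompressed tar archive. It relies on TarInfo's offset_data attribute, which is an implementation detail of CPython's tarfile rather than documented API, and the helper names are just made up for the example:

    import json
    import tarfile

    def build_index(archive_path, index_path):
        """One linear pass over an uncompressed tar: record where each file's data starts."""
        index = {}
        with tarfile.open(archive_path, "r:") as tar:  # "r:" = plain, uncompressed tar
            for member in tar:
                if member.isfile():
                    # offset_data = byte position of the member's payload inside the archive
                    # (an undocumented but long-standing attribute of tarfile.TarInfo).
                    index[member.name] = (member.offset_data, member.size)
        with open(index_path, "w") as fh:
            json.dump(index, fh)

    def extract_one(archive_path, index_path, name, out_path):
        """Seek straight to the payload, so the cost depends only on the target's size."""
        with open(index_path) as fh:
            offset, size = json.load(fh)[name]
        with open(archive_path, "rb") as archive, open(out_path, "wb") as out:
            archive.seek(offset)
            remaining = size
            while remaining:
                chunk = archive.read(min(1 << 20, remaining))
                if not chunk:
                    raise IOError("truncated archive")
                out.write(chunk)
                remaining -= len(chunk)

This only restores file contents (not attributes), but it shows why I think an index should make single-file extraction take time proportional to the target's size only.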

  • Is it a requirement that the files be stored in a single file like tar or can they all reside in a directory? Commented Oct 5, 2012 at 22:22
  • Yes, it is a requirement. EDIT: of course it's not a requirement to use tar... It could be anything.
    – AkiRoss
    Commented Oct 5, 2012 at 22:25
  • Are you compressing tar files using gzip or bzip2, or are uncompressed tar files too slow as well?
    – Daniel Beck
    Commented Oct 5, 2012 at 22:26
  • @Daniel, I'll add some details about this in the question.
    – AkiRoss
    Commented Oct 5, 2012 at 22:34
  • lzo seems to be faster imo
    – kobaltz
    Commented Oct 5, 2012 at 22:35

3 Answers


Check out pixz, or p7zip using the -ms=off option.

pixz is a bit faster, works well with tar files, preserves permissions, and has a much better Linux CLI.

7zip has better cross-platform support.

See here for more detail.
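
For example, here is a rough sketch of driving both tools from Python. It assumes GNU tar, pixz and 7z are on your PATH, the invocations are taken from the tools' documentation as I remember it (so double-check the flags on your system), and the helper names are just for illustration:

    import subprocess

    def create_tpxz(archive, directory):
        # GNU tar pipes the stream through pixz (-I / --use-compress-program),
        # which writes an indexed .tpxz so single members can be located quickly.
        subprocess.run(["tar", "-I", "pixz", "-cf", archive, directory], check=True)

    def extract_member_tpxz(archive, member):
        # Per the pixz README: pixz -x <path> < archive.tpxz | tar x
        with open(archive, "rb") as src:
            pixz = subprocess.Popen(["pixz", "-x", member],
                                    stdin=src, stdout=subprocess.PIPE)
            subprocess.run(["tar", "-x"], stdin=pixz.stdout, check=True)
            pixz.stdout.close()
            pixz.wait()

    def create_7z_nonsolid(archive, directory):
        # -ms=off turns off solid mode, so each file is compressed on its own
        # and can be pulled out without unpacking its neighbours.
        subprocess.run(["7z", "a", "-ms=off", archive, directory], check=True)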


I found a similar topic on serverfault.

https://serverfault.com/questions/59795/is-there-a-smarter-tar-or-cpio-out-there-for-efficiently-retrieving-a-file-store

I'm looking at DAR, which seems to be what I need, but I'll leave this question open for other suggestions.
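
In case it helps, this is roughly how I plan to drive dar (a sketch via Python's subprocess; the -c/-x/-l/-R/-g options are my reading of dar's man page, so verify them locally, and the helper names are invented for the example):

    import subprocess

    def dar_create(basename, source_dir):
        # Creates slices named <basename>.1.dar, <basename>.2.dar, ...;
        # -R sets the root of the directory tree to archive.
        subprocess.run(["dar", "-c", basename, "-R", source_dir], check=True)

    def dar_list(basename):
        # Listing reads the catalogue stored in the archive, not the whole data.
        subprocess.run(["dar", "-l", basename], check=True)

    def dar_extract(basename, relative_path):
        # -g restricts the operation to the given path, so only that
        # file/directory gets restored (into the current directory).
        subprocess.run(["dar", "-x", basename, "-g", relative_path], check=True)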

  • Ok, I tried both DAR and XAR. XAR just can't seem to handle archiving many files (10 GB test) and appears to use tons of RAM, while DAR works pretty well: I successfully created a 30 GB archive with compression and a very fast index.
    – AkiRoss
    Commented Oct 6, 2012 at 13:05
  • DAR was definitely what I was looking for. It's a very nice tool!
    – AkiRoss
    Commented Oct 26, 2012 at 17:43

If tar is not a requirement, a quick search says ar will allow for an indexed archive.
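
Something like this is enough to try it out (a minimal sketch calling the system ar; r/c/t/x are the classic ar operations, and the helper names are just for illustration):

    import subprocess

    def ar_create(archive, files):
        # r = insert the members (replacing existing ones), c = create the archive quietly.
        subprocess.run(["ar", "rc", archive, *files], check=True)

    def ar_list(archive):
        # t = print the table of contents.
        subprocess.run(["ar", "t", archive], check=True)

    def ar_extract(archive, member):
        # x = extract the named member into the current directory.
        subprocess.run(["ar", "x", archive, member], check=True)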

  • Yes, nice, but I also read that ar is a bit strict about file name length. Anyway, I could run a performance comparison and see if it fits.
    – AkiRoss
    Commented Oct 5, 2012 at 22:41
  • Beats me on restrictions like that. Commented Oct 6, 2012 at 0:48

