
I have a 77 GB partition backup image in a single file and I'd like to compress it quickly. It should compress very well even on the worst settings, but choosing GZIP with the "Fastest" setting takes about an hour. How can I speed up the process?

More details: The image file is a raw, binary copy of the partitions (the output of the Linux ddrescue tool). I store the file on a WD hard drive connected via USB 3.

  • Is the process CPU bound or IO bound?
    – Hennes
    Commented Jul 25, 2014 at 21:20
  • Does the partition backup already have some kind of compression in it, or is it raw data? If it already contains compressed data, it will likely not compress much further, so this might not be the best option.
    – LPChip
    Commented Jul 25, 2014 at 21:21
  • To explain that a bit more: if you have a fairly normal rotating disk which is doing nothing else but reading data for your compression program, you might get about 100 MB/s. At that speed, 77 GB should take about 13 minutes just to read. Writing is usually slower, especially when you are also reading from the same disk. This means that even without any compression you would need about half an hour just to copy the data, possibly more, and possibly most of the ~1 hour you see now. If that is the case, then you are IO bound (a quick way to check is sketched just after these comments). Also, LPChip has a good point: already compressed data will not compress again.
    – Hennes
    Commented Jul 25, 2014 at 21:23
  • @LPChip: it's completely raw data, without any compression. I used Linux's dd tool to copy the partitions.
    – stil
    Commented Jul 25, 2014 at 21:32
  • A note though. Although you state that it is raw data, it is a backup image of an apparently used partition. So the compression ratio and speed will depend on the contents: if they are already compressed (executables, pictures, videos), that will stifle the compression process.
    Commented Jul 25, 2014 at 22:02
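
To answer the CPU-bound vs. IO-bound question from the comments, one quick way to check on Linux is sketched below. This is only a sketch: it assumes the pv and sysstat (iostat) packages are installed, and backup.img is a placeholder file name.

    # Run the compression through pv to see the sustained throughput:
    pv backup.img | gzip -1 > backup.img.gz

    # In a second terminal, watch where the bottleneck is:
    iostat -xm 2   # source disk sitting near 100 %util -> IO bound
    top            # gzip pinned near 100% CPU          -> CPU bound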

1 Answer


To improve compression speed:

  1. Parallelize it so that all CPU cores are used (see the sketch after this list): http://blog.codinghorror.com/file-compression-in-the-multi-core-era/
  2. Use a different compression algorithm; some (like lzop) are very fast.
  3. Use an optimized zlib implementation (search for Intel's zlib, for instance).
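
For example, a minimal sketch of points 1 and 2, assuming the pigz (parallel gzip) and lzop packages are installed and using placeholder file names:

    pigz -1 -k -p "$(nproc)" backup.img   # parallel gzip: uses all cores, keeps the original (-k), writes backup.img.gz
    lzop -1 backup.img                    # single-threaded, but far lighter on the CPU than gzip; writes backup.img.lzo

pigz produces ordinary .gz files, so anything that can read gzip output can decompress them.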

To improve read speed:

  • You will only get close to your rated disk throughput if you are using an SSD or if you are reading from your disk in "block" mode. This has to do with the file seeks associated with reading small files.
  • On Linux you would use something like dd if=/dev/sda1 bs=1M | lzop > my_backup.dd.lzop
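
Putting both ideas together, here is a sketch of a full pipeline. It assumes pigz and pv are installed; /dev/sda1 and the output names are placeholders:

    # Read the partition in block mode and compress in parallel on the fly:
    dd if=/dev/sda1 bs=1M | pigz -1 -p "$(nproc)" > my_backup.dd.gz

    # Same idea with lzop, with pv in the middle to show the sustained read throughput:
    dd if=/dev/sda1 bs=1M | pv | lzop -1 > my_backup.dd.lzop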
