
I am running this command:

pg_dumpall | bzip2 > cluster-$(date --iso).sql.bz2

It takes too long. Looking at the processes with top, the bzip2 process uses about 95% of one core and postgres about 5%. The wa entry is low, which means the disk is not the bottleneck.

What can I do to increase the performance?

Maybe let bzip2 use more cores? The server has 16 cores.

Or use an alternative to bzip2?

  • Unless you need bzip2 for legacy reasons, it's been my personal experience that xz gives better compression/time than bzip2. It's also threaded if you get a new enough version, and it lets you scale time and memory usage from gzipish up to massive depending on what you want. – Perkins (Sep 8, 2017)
  • "pigz" is another option - it produces gzip output rather than bzip2 output. And basically everything understands gzip. – Criggie (Sep 8, 2017)
  • You haven't stated the requirements of your alternative algorithm. Bzip2 is splittable. Is that important to you? (Sep 9, 2017)
  • "What can I do to increase the performance?" - not compress it? You don't actually say that you need it compressed, and not-doing-work is always faster than doing-work. Make disk the bottleneck. (Sep 10, 2017)
  • pigz has been proposed, and I support the idea; I just wanted to add that the xz version, pixz, is about ten times slower, while offering negligible improvement to compression ratio. (Jun 23, 2020)

5 Answers

Answer (score 73, +50 bounty)

There are many compression algorithms around, and bzip2 is one of the slower ones. Plain gzip tends to be significantly faster, usually with not much worse compression. When speed matters most, lzop is my favourite. Poor compression, but oh so fast.

I decided to have some fun and compare a few algorithms, including their parallel implementations. The input file is the output of the pg_dumpall command on my workstation, a 1913 MB SQL file. The hardware is an older quad-core i5. The times are wall-clock times of just the compression. Parallel implementations are set to use all 4 cores. The table is sorted by compression speed.
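
For reference, a minimal sketch of how one row of the table below can be measured (my exact harness may have differed; lzop shown, the same pattern applies to the others):

time lzop -c dump.sql > dump.sql.lzo      # dump.sql is a hypothetical filename for the SQL dump
time lzop -dc dump.sql.lzo > /dev/null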

Algorithm      Size    Ratio   Comp. time  Comp. speed  Dec. time  Dec. speed

lzop           398MB    20.8%      4.2s    455.6MB/s     3.1s    617.3MB/s
lz4            416MB    21.7%      4.5s    424.2MB/s     1.6s   1181.3MB/s
brotli (q0)    307MB    16.1%      7.3s    262.1MB/s     4.9s    390.5MB/s
brotli (q1)    234MB    12.2%      8.7s    220.0MB/s     4.9s    390.5MB/s
zstd           266MB    13.9%     11.9s    161.1MB/s     3.5s    539.5MB/s
pigz (x4)      232MB    12.1%     13.1s    146.1MB/s     4.2s    455.6MB/s
gzip           232MB    12.1%     39.1s     48.9MB/s     9.2s    208.0MB/s
lbzip2 (x4)    188MB     9.9%     42.0s     45.6MB/s    13.2s    144.9MB/s
pbzip2 (x4)    189MB     9.9%    117.5s     16.3MB/s    20.1s     95.2MB/s
bzip2          189MB     9.9%    273.4s      7.0MB/s    42.8s     44.7MB/s
pixz (x4)      132MB     6.9%    456.3s      4.2MB/s     7.9s    242.2MB/s
xz             132MB     6.9%   1027.8s      1.9MB/s    17.3s    110.6MB/s
brotli (q11)   141MB     7.4%   4979.2s      0.4MB/s     3.6s    531.6MB/s

If the 16 cores of your server are idle enough that all of them can be used for compression, pbzip2 will probably give you a very significant speed-up. But if you need still more speed and can tolerate ~20% larger files, gzip is probably your best bet.
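
If gzip is the pick, its parallel implementation pigz slots into the original pipeline unchanged (a sketch, assuming pigz is installed; it uses all available cores by default):

pg_dumpall | pigz > cluster-$(date --iso).sql.gz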

Update: I added brotli results to the table (see TOOGAM's answer). brotli's compression quality setting has a very large impact on compression ratio and speed, so I added three settings (q0, q1, and q11). The default is q11, but it is extremely slow, and still worse than xz. q1 looks very good though; the same compression ratio as gzip, but 4-5 times as fast!

Update: I added lbzip2 (see gmatht's comment) and zstd (Johnny's comment) to the table, and sorted it by compression speed. lbzip2 puts the bzip2 family back in the running by compressing three times as fast as pbzip2, with a great compression ratio! zstd also looks reasonable, but is beaten by brotli (q1) in both ratio and speed.

My original conclusion that plain gzip is the best bet is starting to look almost silly. Although for ubiquity, it still can't be beat ;)
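
For example, the two update favourites drop straight into the pipeline from the question (a sketch, assuming the brotli and lbzip2 command-line tools are installed, with flags as in their man pages):

pg_dumpall | brotli -q 1 -c > cluster-$(date --iso).sql.br
pg_dumpall | lbzip2 -n 16 > cluster-$(date --iso).sql.bz2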

  • For a similar-ish table with far more algorithms, see mattmahoney.net/dc/text.html. – Danica (Sep 8, 2017)
  • @Dougal Fair enough. My test is on similar data as the OP though (pg_dumpall output), so it's probably a bit more representative :) – marcelm (Sep 9, 2017)
  • zstd is another one that's missing from the table; for compressing our log files, I found that a single-core zstd process outperforms 16 cores of pbzip2 with comparable compression ratios. – Johnny (Sep 10, 2017)
  • lz4 is slightly faster and more efficient than lzop, by the way. It uses more RAM though, which is relevant in embedded systems. – Daniel B (Sep 10, 2017)
  • If you are willing to test multi-threaded versions, you could try zstd -T4 too. For very fast settings, you can try zstd -T4 -1, as zstd defaults to -3, which is probably the setting you tested. – Cyan (Sep 11, 2017)
Answer (score 46)

Use pbzip2.

The manual says:

pbzip2 is a parallel implementation of the bzip2 block-sorting file compressor that uses pthreads and achieves near-linear speedup on SMP machines. The output of this version is fully compatible with bzip2 v1.0.2 or newer (ie: anything compressed with pbzip2 can be decompressed with bzip2).

It auto-detects the number of processors you have and creates threads accordingly.
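
That makes it essentially a drop-in for the command in the question (a sketch; -c sends output to stdout, and -p can cap the thread count if you don't want all 16 cores busy):

pg_dumpall | pbzip2 -c > cluster-$(date --iso).sql.bz2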

  • That is fine if you are compressing a file, but it works horribly through a pipe. – camelccc (Sep 8, 2017)
  • @camelccc Why do you say that? I don't find that to be the case at all. You need a fast producer or a large buffer on the pipe in front of it for optimal performance, but that's equally true of pixz and pigz on a pipe as well. (Sep 8, 2017)
  • It depends on how large the input is. With a large buffer it's fine, as you say; if you are piping something much larger than physical RAM, I have found things can get rather more interesting. That is probably true for any compression algorithm, though. – camelccc (Sep 9, 2017)
  • bzip2 can use a fair bit of RAM, so running 16 bzip2 workers at a time could consume non-trivial RAM, over 1 GB. BTW, lbzip2 seems to give better speed, memory usage, and marginally better compression than pbzip2. There are benchmarks here: vbtechsupport.com/1614 – gmatht (Sep 9, 2017)
  • @gmatht lbzip2 looks nice! I added it to my answer :) – marcelm (Sep 10, 2017)
Answer (score 8)

You didn't mention an operating system. If you are on Windows, 7-Zip with ZStandard (see its Releases page) is a version of 7-Zip that has been modified to provide support for all of these algorithms.

  • Interesting, I had heard of brotli before, but I forgot about it. I added it to the table of benchmarks in my answer! I was actually a little disappointed with its performance, except at quality setting 1, where it provided the same compression ratio as gzip at a much higher speed. – marcelm (Sep 9, 2017)
Answer (score 2)

Use zstd. If it's good enough for Facebook, it's probably good enough for you as well.

On a more serious note, it's actually pretty good. I use it for everything now because it just works, and it lets you trade speed for ratio on a large scale (most often, speed matters more than size anyway, since storage is cheap but speed is a bottleneck).
At compression levels that achieve overall compression comparable to bzip2, it's significantly faster, and if you are willing to pay some extra CPU time, you can almost achieve results similar to LZMA (although it will then be slower than bzip2). At slightly worse compression ratios, it is much, much faster than bzip2 or any other mainstream alternative.

Now, you are compressing a SQL dump, which is just about as embarrassingly trivial to compress as data can be. Even the poorest compressors score well on that kind of data.
So you can run zstd with a lower compression level, which will run dozens of times faster and still achieve 95-99% of the same compression on that data.
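
In the pipeline from the question, that could look like this (a sketch; -T0 uses all cores, -1 is the fastest standard level):

pg_dumpall | zstd -T0 -1 > cluster-$(date --iso).sql.zst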

As a bonus, if you will be doing this often and want to invest some extra time, you can "train" the zstd compressor ahead of time, which increases both compression ratio and speed. Note that for training to work well, you will need to feed it individual records, not the whole thing. The way the tool works, it expects many small and somewhat similar samples for training, not one huge blob.
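
A minimal sketch of the training workflow (the samples/ directory of small per-record dumps and the dictionary filename are hypothetical):

zstd --train samples/*.sql -o pgdump.dict    # samples/*.sql and pgdump.dict are hypothetical names
pg_dumpall | zstd -T0 -D pgdump.dict > cluster-$(date --iso).sql.zst

The same dictionary must then be passed with -D when decompressing.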

  • Better still, use pzstd (the parallel version) on multicore machines. – borowis (Sep 7, 2018)
Answer (score 1)

It looks like adjusting (lowering) the block size can have a significant impact on the compression time.

Here are some results of an experiment I did on my machine. I used the time command to measure the execution time. input.txt is a ~250 MB text file containing arbitrary JSON records.

Using the default (biggest) block size (--best merely selects the default behaviour):

# time cat input.txt | bzip2 --best > input-compressed-best.txt.bz

real    0m48.918s
user    0m48.397s
sys     0m0.767s

Using the smallest block size (--fast argument):

# time cat input.txt | bzip2 --fast > input-compressed-fast.txt.bz

real    0m33.859s
user    0m33.571s
sys     0m0.741s

This was a bit of a surprising discovery, considering that the documentation says:

Compression and decompression speed are virtually unaffected by block size
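
Applied to the pipeline from the question, that would be (a sketch; per the bzip2 man page, --fast is an alias for -1, a 100 kB block size, and --best for -9, 900 kB):

pg_dumpall | bzip2 --fast > cluster-$(date --iso).sql.bz2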

  • My current favorite is pbzip2. Have you tried this too? This question is about an environment where 16 cores are available. – guettli (Jul 25, 2018)
  • @guettli Unfortunately I have to stick with bzip2; I'm using it for Hadoop jobs, and bzip2 is one of the built-in compression codecs there. So in a way it's already parallelised. (Jul 26, 2018)
