
My input data comes in small chunks and the output is saved to a file on disk. I'm looking for the fastest strategy for both input and output.

  1. Does it make sense to create a bigger input buffer to accumulate more data before calling deflate(), or is it better to call deflate() for each small input chunk? What is the optimal buffer size for input, if any?

  2. What is faster: using deflate() and writing the output from memory to a file with fwrite(), or using a combo function like gzfwrite which writes directly to file?

  3. Is file mapping even faster than either of the two options above?

  4. Is there a way to parallelize the compression in multiple threads?

1 Answer

  1. deflate accumulates the input data internally, so there wouldn't be much savings in accumulating it yourself before feeding it to deflate. (The story for inflate is different, where there is a significant advantage to feeding large amounts of compressed data to each inflate call.)

  2. There is no difference, and you have more control over what's going on if you use deflate directly. You can use the low-level I/O functions to avoid an extra level of buffering.

  3. I doubt that there would be any advantage, and there may be disadvantages, to using mmap() for a simple sequential write of a file. That's not what mmap() is for.

  4. Yes. See pigz for an example.
