How does DEFLATE optimize this so much?

Question

I am trying to understand the deflate algorithm, and I have read up on Huffman codes as well as LZ77 compression. I was toying around with compression sizes of different strings, and I stumbled across something I could not explain. The string aaa when compressed, both through zlib and gzip, turns out to be the same size as aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa (36 as).

Before reading about this I would have assumed the compressor does something like storing 36*a instead of each character individually, but I could not find anywhere in the specifications where that is mentioned.

Using fixed Huffman code yielded the same result, so I assume the space-saving lies in LZ77, but that only uses distance-length pairs. How would that allow a 3 length string to expand by 12 times without increasing in size?

Interrupting the string of as with one or several bs in the middle drastically increases the size. If distance-length pairs is what's doing the job, why could it not just skip over the bs when searching backwards? Or is Huffman codes being utilized and I misunderstood what fixed Huffman codes implies?

Mark Adler · Accepted Answer · 2021-12-12 20:55:52Z

1

The 36 "a"s are effectively run-length encoded by LZ77 by giving the first "a" as a literal, and then a match with a distance of one, and a length of 35. The length can be as much as 258 for deflate.

Look online for tutorials on LZ77, Huffman coding, and deflate. You can disassemble the resulting compressed data with infgen to get more insight into how the data is being represented.

edited Dec 12, 2021 at 20:55

answered Dec 12, 2021 at 17:57

Mark Adler

109k15 gold badges127 silver badges168 bronze badges

Ah that makes sense, so its basically just looping the search-buffer. Thanks!
– Blupper
Commented Dec 12, 2021 at 18:35

Add a comment |

Collectives™ on Stack Overflow

How does DEFLATE optimize this so much?

1 Answer 1

Not the answer you're looking for? Browse other questions tagged
compression
gzip
zlib
deflate
lz77
or ask your own question.

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Not the answer you're looking for? Browse other questions tagged compressiongzipzlibdeflatelz77 or ask your own question.

Related

Not the answer you're looking for? Browse other questions tagged
compression
gzip
zlib
deflate
lz77
or ask your own question.