0

From what i understood Deflate is LZ77 and Huffman encoding but i have been looking at the structure of .zip file but i haven't seen where to extract information about Huffman tree that was used in compression. I suspect it might be stored in the compressed data but where exactly in it, its not specified anywhere.

My goal is to see if can do Deflate compression and decompression manually to better understand it before i start my python programming.

And frankly my knowledge is short on block sizes and Huffman tree, so that's where i want to try to understand.

1 Answer 1

0

Yes, the Huffman codes are stored at the start of every dynamic block in deflate compressed data. There may be many dynamic blocks, so many Huffman trees in one deflate compressed stream. Each dynamic block has three Huffman codes, one for literal/length codes, one for distance codes, and third that is used to compress the descriptions of the first two codes. Each Huffman code is represented not as a tree, but rather as the number of bits in the prefix code for each coded symbol.

It is most definitely "specified anywhere". See RFC 1951 for a detailed description, and infgen for a deflate stream disassembler which will allow you to view the detailed contents of a deflate stream.

4
  • this means if zip file is encrypted then Huffman codes are encrypted too? Commented Feb 16 at 17:56
  • Yes............
    – Mark Adler
    Commented Feb 16 at 19:54
  • to clarify, in dynamic blocks, the Huffman codes are indeed stored in the compressed data, but they are not at the start of every dynamic block; rather, they are part of the block header and are specific to the symbols being compressed in that particular block. Commented Feb 19 at 0:39
  • To clarify, no. A set of Huffman code descriptions are in fact at the start of every dynamic block. A dynamic block is a three-bit header identifying the block as dynamic and whether or not it is the last block, followed by the Huffman code descriptions, followed by a series of literal and length/distance codes, and lastly terminated by an end-of-block code.
    – Mark Adler
    Commented Feb 19 at 1:41

Not the answer you're looking for? Browse other questions tagged or ask your own question.