8

I was reading on Wikipedia about hash trees, and I don't understand the benefits or purposes of this structure - they seem to require more hashes than just one per leaf with no significant use of the extra hashes.

For example, the use case on wikipedia is that they are used to validate data received in a P2P system. But why is this better than having a one-to-one mapping of blocks numbers and their hashes, without having the tree structure?

Could someone please explain how and why hash trees are useful?

Thanks in advance,

Moshe

1 Answer 1

12
  1. Hash trees can be computed in parallel. If you have two blocks of data to hash, you can use two processors to compute the hash twice as fast. This only works if your hash speed is lower than your IO speed, which is unlikely.

  2. Hash trees can be computed from hashes of individual blocks, or from hashes of larger sections that are aligned correctly. This is important.

For example, if I want to send you a file, I can break it up into chunks of 1 MiB and send you each chunk with its SHA-256 hash. If the hash for any of the individual chunks is incorrect, then you can ask for that chunk again. At the end, I can sign the tree hash for the file and send you the signed hash. You can verify the hash just by hashing each of the block hashes (which you already verified), which is a lot faster than rehashing the entire file.

Why use a tree hash?

A tree hash is advantageous any time that you want to compute the hash of both a portion of a file and the entire file. Using a regular hash like SHA-256, you will have to hash the file chunk and the entire file separately. If the file is 8 GiB, this might take quite some time. With a tree hash, because the hash of the chunk is used to compute the hash of the file, it takes no extra work to compute both hashes.

How much extra work is a tree hash?

The "extra work" for computing a tree hash is actually minimal. Yes, it does require computing extra hashes -- but only O(1) extra work. If your block size is 1 MiB, then the extra work is approximately zero if your file is 1 MiB or smaller. As the data size increases, the amount of extra work will approach 1 extra hash of two hashes for every block of data -- for SHA-256, the core will only be evaluated two extra times per 1 MiB of data at most (once for the input hashes, once for the padding). That's not very much.

4
  • You can verify the tree hash just by hashing each of the block hashes (which you already verified), which is a lot faster than rehashing the entire file. Then why don't we just find the hash collectively from all the block hashes we have, instead of constructing a tree?? Like for example, if the hashes of block1...blockN are h1...hn, then why don't we just do hash(h1+h2+...+hn), instead of constructing a tree with unnecessary intermediate internal hashes (internal nodes)? Can you please explain the necessity to use a tree?
    – Gokul NC
    Commented Jul 9, 2017 at 11:50
  • @GokulNC: That would require storing and transmitting all the intermediate hashes. With a tree, you only need to store/transmit the parent node hashes. You certainly could do it the way you describe, but I don't see what the benefit would be. Commented Jul 10, 2017 at 0:24
  • I think I didn't convey my question with proper words. I meant to ask, why do we calculate the hashes level by level (in the tree, using intermediate levels) to get the parent node hash, when instead we can do that by combining all hashes (of blocks/leaves) in a single step and computing the hash of the combination? Why tree?
    – Gokul NC
    Commented Jul 10, 2017 at 11:22
  • @GokulNC: That's exactly what I thought you asked. Using only one intermediate level would mean that you would have to store / transmit all of the intermediate hashes in order to compute the hash of the entire file, i.e., you would need to keep around the h1...hn until you can roll them up. For a 1TB file, that's about 32MB of space you need to verify the hash. That's a lot of extra storage space, but what's the benefit? And then you are stuck with the 1 MiB block size, you can't e.g. change it to 2 or 4 like you can with the tree hash. Commented Jul 10, 2017 at 12:35

Not the answer you're looking for? Browse other questions tagged or ask your own question.