10

I've been reading about Merkel Patricia Trie and I think that I understand how it works, but it looks a bit overcomplicated to me, which suggests that I do not understand it fully.

Here is how Merkle Patricia Trie looks like:

https://i.sstatic.net/YZGxe.png

Merkel Patricia Trie is similar to a regular Trie data structure but has three types of nodes

  • Extension Node - contains a part of the key that is common for multiple nodes and contains a reference to a next node
  • Branch Node - contains pointers to different nodes that have the same key prefix
  • Leaf Node - contains a value and the remaining part of the key (called key-end). A key of a leaf node is a concatenation of a prefix from all its parents and the key-end in a leaf node.

Since an Extension Node has only a single pointer to another node, it seems that it will always point to a Branch Node. If it were pointing to a Leaf Node, these nodes could be combined into a single Leaf Node. And if it were pointing to another Extension Node, then these two Extension Nodes could be combined into a single Extension Node.

Why do we need to separate Extension Node and Branch Node? Can all branches from a Branch Node be embedded directly into an Extension Node? In this case, we would have just:

  • Extension Node - that would contain a common prefix for two or more other nodes and would have pointers to other nodes
  • Leaf Node - that would contain value and the remaining part of the key

If I understand it correctly this would:

  • remove one level of indirection
  • improve performance, since there would be fewer lookups in the LevelDB database

Am I missing something?

2 Answers 2

4

I think, I have an idea. An important property of the Modified Merkle Patricia Tree is that it’s versioned. You can take a tree version 1 with root hash H1, insert new key-value pairs into the tree, and you will get a new tree version 2 with root hash H2. However, H1 will still be in the database, will still be accessible, and will still denote the complete tree version 1. This stricture is also very space-efficient, because most of the nodes will stay the same and will be shared between tree versions. Only the nodes along the path of added key-value pairs will be new.

Now, branch nodes are much bigger (17 elements) than extension nodes (2 elements). (This is, as I understand, the only way to distinguish one from another.) Because of that, we would like to try to keep them unmodified as much as we can when updating the tree to a new version.

Imagine now that you have a combined extension-branch node N1 that has a prefix 12345. We are inserting a new value with the key 123ffaaee. We would have to:

  • Replace N1 with a new branch node N2 with prefix 123
  • Insert a new leaf node N3 for N2 branch f (with prefix faaee)
  • Copy N1 to N4, modify only its prefix from 12345 to 5 and attach it to N2 branch 4.

Now we have two big extension-branch nodes, N1 and N4, with the same branching part and only different prefixes. If we would have used separate extension and branch nodes, we wouldn’t have to duplicate the branch node, only its extension parent node would have been split into two.

An important architectural decision here is that we much-much-much less care about the performance of the tree than about its footprint (which only grows in size with every transaction, has to be kept on every full Ethereum node, and therefore pose a threat to the scalability of Ethereum).

However, I also see a flaw in my logic: if there is a branch node with a prefix, big chances are that it does not have a lot of branches. In which case it will not have a large footprint, because the RLP encoding will encode every null branch with a single byte.

0

Probably an anticlimactic answer, but the things is that the Yellow Paper kinda-seems to leave the workings of the trie as an implementation detail. It says in Appendix D:

The core of the trie, and its sole requirement in terms of the protocol specification is to provide a single value that identifies a given set of key-value pairs, which may be either a 32-byte sequence or the empty byte sequence. It is left as an implementation consideration to store and maintain the structure of the trie in a manner that allows effective and efficient realisation of the protocol.

... though just after saying that it goes on to talk about implementation optimizations like those 3 kinds of nodes.

So your idea could be another possible implementation. Which one would be best? Probably only some study would decide.

A number of design decisions in Ethereum could have been better, particularly with the benefit of hindsight. This might be one of them (or not).

Not the answer you're looking for? Browse other questions tagged or ask your own question.