7

So my understanding is that ethereum uses levelDB to store key-value pairs, where the values are serialized using RLP and the keys are the corresponding hashes. Why is there a need for a lookup system like the modified Merkle-Patricia Trie? Say, one needs to look up the account state for the account 0x12345678...abcdef. I could just store the value (i.e. account state with the nonce and storageRoot and what not) directly with the key (0x123...def) without the need to do a couple of db lookups (navigating down the trie) to eventually resolve the data at that account. My feeling is that it's not so much about the look-up as much more about the efficient Merkle-proofs that are possible with the Merkle-part of the Merkle-Patricia Trie. Or is there an actual argument for the radix part?

2 Answers 2

4
+50

Tries are not related to the lookups, but to how the blockchain operates.

Each block must hash the Ethereum virtual machine state to a single hash at the end of the block. Merkle tree makes updating this hash a few orders of magnitudes faster. Otherwise, all nodes would need to hash all ~200 GB of data in the EVM state for every block. With Merkle tree, you can do incremental updates.

Each block contains:

  • stateRoot: Hash of the root node of the world state trie (after all transactions are executed).
4
  • 2
    Thanks. Yes its huge. I synced ethereum once and it was like more than 500GB. So if the tries are not related to the lookups (thanks by the way for that insight, because in some places it says its about efficient look ups) and only for the hashing efficiency, then why doesnt one just create the new hash by: hash(oldhash, hash(all accounts that changed in the block)) It wouldnt require hashing all data again and doesnt need a trie or tree either. It would provide the security of temper resistance. But since the tree is still more efficient and if its not needed for lookups i understand it.
    – Marlo
    Commented Oct 27, 2023 at 21:13
  • I guess hash(oldhash, hash(all accounts that changed in the block)) would require that all nodes maintain all the historical blocks and state. You would need to walk through all the transactions in the block as opposite to download block headers? The opposite would be "the latest state as is". If you do no have a full state hash it would make writing state-verifying light clients impossible? Not an expert here. Commented Oct 28, 2023 at 10:34
  • Maybe this is helpful "By only having the block header with the Merkle root, the transaction it wants to verify, and the Merkle proof structure obtained from a trusted Bitcoin node, it can try to reconstruct the Merkle root hash and validate the transaction." medium.com/coinmonks/… Commented Oct 28, 2023 at 10:40
  • @MikkoOhtamaa Not sure it is correct that the trie is not for lookups. Considering key-value mappings in contract data, without some kind of trie or hash table or equivalent for the "storage slots", it would not be possible to locate the position of a key in the "merkle tree" without using some kind of index data in LevelDB, no?
    – BipedalJoe
    Commented Dec 17, 2023 at 23:18
3

If we had a simple table with key-value pairs, reading data was easy and quick with a complexity of 1. However, because we need to create a hash of the entire database, updating the data becomes more complicated as its complexity depends on the number of elements in the database. In the current data structure of Ethereum, the complexity of both read and write is determined by the height (per my recent observations for several random accounts, it is around 8) of the tree. The reason for hashing all the data is to show the state of the blockchain in every block header by stateRoot.

Additionally, there are other important parts of Ethereum based on this data structure, like the Snap protocol. Now, this protocol is crucial for nodes to synchronize their data. In this protocol, nodes request and share a portion of the tree. Nodes send both proof and data to each other. The proof is like a path from root of tree, and the receiver can verify the data simply by knowing the stateRoot due to the data structure's hashing property. Without this technique, syncing part of data would be almost impossible.

Not the answer you're looking for? Browse other questions tagged or ask your own question.