6

In the yellow paper(section 4.1, storageRoot) at it is mentioned

storageRoot: A 256-bit hash of the root node of a Merkle Patricia tree that encodes the storage contents of the account (a mapping between 256-bit integer values), encoded into the trie as a mapping from the Keccak 256-bit hash of the 256-bit integer keys to the RLP-encoded 256-bit integer values. The hash is formally denoted σ[a]_s.

What I can understand from the above is that for all key value pair of storage mapping (k,v), (Kecak(k), RLP(v)) will be used in the modified merkle trie of the storage where Keccak(k) will act as a key for the value RLP(v)

My questions are:

1.In the storage contents of the account what are these (k,v) i.e keys and values?

An explanation with an example would be really helpful.

2.Why does Keccak(k) is used as a key of the Merkle patricia tree instead of directly using the k?

Thanks in advance.

1 Answer 1

5

In the storage contents of the account what are these (k,v) i.e keys and values?

The contract code is free to use these as it wishes. For example, in this Solidity code the compiler will take index 0 for var1 and index 1 for var2:

pragma solidity 0.4.20;

contract Test {

  uint public var1;
  address public var2;

  function test() public {
    var1 = 5;
    var2 = address(6);
  }
}

So the (k,v) pairs after calling test():

(0,5)
(1,6)

You can find more details in this article: https://medium.com/@hayeah/diving-into-the-ethereum-vm-part-2-storage-layout-bc5349cb11b7

2.Why does Keccak(k) is used as a key of the Merkle patricia tree instead of directly using the k?

It's explained in the Design Rationale document in Ethereum's wiki:

Using sha3(k) as the key in the "secure tree" (used in the state and account storage tries): this makes it much more difficult to DoS the trie by setting up maximally unfavorable chains of diverge nodes 64 levels deep and repeatedly calling SLOAD and SSTORE on them. Note that this makes it more difficult to enumerate the tree; if you want to have enumeration capability in your client, the simplest approach is to maintain a database mapping sha3(k) -> k.


Previous answer

I'm not sure but my guess is that using hashes results in a more balanced trie. If you imagine a Patricia trie where entries are added by the index sequentially, one side of the trie will often be heavier than the other. When using hashes, on the other hand, new elements are added into a random position. Insertions/deletions/lookups give more consistent performance in a balanced trie.

It's also possible that using hashes will produce a smaller-sized trie because presumably there will be more leaf and extension nodes, while in index-based trie there will be more branch nodes.

To demonstrate how using contiguous indexes as keys results in an unbalanced trie let's imagine a situation where at some level of the trie there is a branch node where only the 1st and the 2nd nibbles are poiting to some other nodes:

<hash0> branch [<hash1>, <hash2>, NULL,..<13 NULLs>, value]

If this branch is at level n of the trie (there can be 64 levels because keys are 32 bytes, root is at level 0), then it will take 2 ^ ((64 - n) * 4 - 8) insertions into the trie until the 3rd nibble becomes non NULL.

For example if this branch is at the root you will need to insert 2 ^ 248 keys before the 3rd nibble becomes non NULL which is just impossible (it's impossible that such a branch will be at the root level in the first place though, because it will take 2 ^ 247 keys to reach to this state, but it's possible for higher levels of the trie). In general, it's much more likely that left-side nibbles will be non NULL than the right-side nibbles which means the trie is heavier on the left side.

3
  • Thank you for answering. For the first part is there any sources where I can read in more detail? For the 2nd part, for sure the branch nodes will be of larger size, but what I am getting is if we use contiguous indexes to store the value, we will get a more balanced trie in case of using directly the k values as keys for the Merkle tree.
    – sourav
    Commented Feb 18, 2018 at 12:30
  • @sourav I updated the answer with a link about the storage layout in Solidity. For the 2nd part I added my thoughts in the Update section below. Not sure my logic is correct though, you can point out if you find flaws in it. Commented Feb 18, 2018 at 13:52
  • @sourav found the answer why hashed keys are used github.com/ethereum/wiki/wiki/Design-Rationale. It's for DoS protection. Updated my answer. Commented Feb 25, 2018 at 15:59

Not the answer you're looking for? Browse other questions tagged or ask your own question.