3

I read that TrieMap in scala is based on has array mapped trie, whike Vector reads bit mapped vector trie.

Are both darastructures backed by the same idea of a hash trie or is there a difference between these?

1 Answer 1

6

There are some similarities, but fundamentally they are different data structures:

Vector

There is no hashing involved in Vector. The index directly describes the path into the tree. And of course, the occupied indices of a vector are consecutive.

Disregarding all the trickery with the display pointers in the production implementation of scala.collection.immutable.Vector, every branch node in a vector except for the last one at a level has the same number of children (32 in case of the scala Vector). That allows indexing using simple bit manipulation. The downside is that splicing elements in the middle of a vector is expensive.

enter image description here

HashMap

In a HashTrieMap, the hash code is the path into the tree. That means that the occupied indices are not consecutive, but evenly distributed. This requires a different encoding of the tree branch nodes.

In a HashTrieMap, a branch node has up to 32 children (But if you have a very bad hash code distribution it is entirely possible to have a branch node with only one child). There is an Int bitmap to encode which child corresponds to which position, which means that looking up values in a HashTrieMap requires frequent calls to Integer.bitCount, which fortunately is a CPU intrinsic on modern CPUs.

enter image description here

Here is a fun project that allows you to look at the internals of scala data structures such as Vector and HashMap: https://github.com/stanch/reftree

The images in this answer were generated using this project.

Not the answer you're looking for? Browse other questions tagged or ask your own question.