Scala has a TrieMap collection.
What is a TrieMap and what is its advantages/disadvantages compared to a HashMap?
Scala has a TrieMap collection.
What is a TrieMap and what is its advantages/disadvantages compared to a HashMap?
A Scala TrieMap
is a trie-based concurrent scalable map implementation. Unlike normal trie maps, a Scala TrieMap
has an efficient, non-blocking, O(1) time snapshot
operation (and a slightly optimized readOnlySnapshot
) operation.
Absolute performance of a TrieMap
is slightly below the JDK8 ConcurrentHashMap
, but the advantage is that it provides consistent iterators, something that concurrent data structures typically do not have. This means that you can capture all the elements in the trie at one point in time (performance numbers and analysis here). You should use the TrieMap
if you need to capture all the elements at once (e.g. to list all its elements in the UI, or to consistently analyze them).
O(1) time snapshot operation
interpreted.
TrieMap
will be created in a constant amount of time (i.e. in the same amount of time independent of how many keys you stored). Later, when you try to access some of the keys in that snapshot or in the original TrieMap
, the relevant parts of the TrieMap
will be lazily duplicated.
the relevant parts of the TrieMap will be lazily duplicated.
Does that mean when the key is accessed from the clone/snapshot, it is no longer O(1) get()
as in hashmap? What will be the run time complexity for this lazy duplication
? the height of the underlying tree like O(lgN)?
get
is never O(1)
in the standard Scala TrieMap - it's always expected O(log n)
, but with a low constant factor (because the nodes are very wide). This is the case both with and without lazy duplication. The similar data structure that has expected O(1)
operations is called cache-trie: dl.acm.org/doi/abs/10.1145/3178487.3178498 (but this is not implemented in the Scala stdlib, it's available on github as a standalone repo) If you added snapshots to cache-tries, the get
before duplication would be O(1)
, but after the duplication, it would briefly be O(log n)
.
TrieMaps are Maps using trie data structure which are essentially shallow trees. For example, If you have a 32 bit hash you break it up into sections for example 4 times 8 and at every level of the tree you branch to up to 256 sub trees. Obviously this gives O(1) performance due to the fixe size of hash(assuming few collisions).
A trie structure can be made immutable efficently, reusing the structure of a trie to create a new trie with an added or removed element. The relative performance in time/memory affect on GC depend greatly on implementation and load so rather then attempt a generic answer I would say run a benchmark. Though for a single thread with no immutability requirement a classic hashmap will normally produce better average performance and inferior worst case performance.
As a side note I will mention since the trieMap also uses a hash it is also a hashmap so I would recommend calling it a trie backed hashmap vs array backed hashmap.