0

Here are my knowing about immutable data structures, especially immutable map, correct me if I'm wrong:

  1. When updating, they don't mutate internal structure but they create a copy version with updated data.
  2. They are usually implemented by tree-like data structures because copying a tree branch is much more efficient than copying a linear list.
  3. When there are many threads using them, their states among those threads are not synchronized. Some threads might be using an old version of them but the reading thread doesn't have to wait for the updating thread to finish.
  4. There is at least a lock is required to synchronize the assignments to a shared variable referencing to the data structure (but in Java assignment is atomic so this is not a problem).

If there are two threads t1 and t2 inserting to an immutable map the same key. Right before t1 done its job and updating the shared reference, t2 starts its job and receives the old version of the map whose key is not inserted yet and continue to insert the same key. Now the final version of the map depends on which thread updating the shared reference last. In both cases, there is an abandoned key-value which is cleaned soon by the garbage collector.

My question is how can I solve the problem when two threads are inserting the same key to an immutable map whose reference is shared among thread? Or using shared reference is wrong in the first place?

1
  • That's not possible on immutable Map by nature
    – cchantep
    Commented Nov 5, 2018 at 20:00

2 Answers 2

3

On terminology: there are two kinds of collections that are called 'immutable'.

  1. Collections that throw an exception on an attempt to call a mutator operation (like Guava's ImmutableMap).
  2. Collections that, when a mutator is called, create and return a new version of itself instead of mutating shared state (something like copy-on-write).

You are probably asking about the second type of 'immutability'.

I suppose that you have something like this:

private CopyOnWriteTree tree;

void insertValue(Value value) {
    tree = tree.insert();
}

In such a case, the usage of a shared reference does not seem useful.

If insertValue() is called by more than one thread using the same shared reference to object containing tree field, then the best thing you could do is serialize updates to this field using usual techniques like synchronized, volatile or AtomicReference, but one of them will be lost. (Please also note that reference update atomicity is not an issue here; the issue is that without such a synchronization, one thread is not even guaranteed to ever see updates made by other threads).

Copy-on-write collections are good for the cases when each thread works on its own copy of data and throws it away after the work is finished.

If you need both updates to remain, don't use a copy-on-write collection; instead try some concurrent collection (ConcurrentHashMap or something like that).

PS. I've just realised that the question was on Scala collections and I was answering like it were about Java. Nevertheless, the logic remains the same as memory model is the same.

2

The question breaks down into two parts. The first question is

What happens if two threads write to the same shared reference?

The answer is that it will probably break, and you shouldn't do it. If you want to have two threads writing to the same reference you need to protect it with a synchronisation mechanism such as AtomicReference.

The second question you need to ask is

Is it always safe for two threads to read an immutable object at the same time?

The answer to this is no.

An immutable object may contain mutable data and may therefore not be thread safe. You need to look a the specification of the object in question to determine whether it is thread safe or not.

Not the answer you're looking for? Browse other questions tagged or ask your own question.