
I want to create a thread-safe container that uses a Scala Map as a backing store. Rather than expose the underlying Map to the user, I would like to expose only a subset of its methods.

An example might look something like the following...

import java.util.UUID
import scala.collection.immutable

class MyContainer[A] {

  def add(thing: A): Unit = {
    backingStore = backingStore + (thing.uuid -> thing)
  }

  def filter(p: A => Boolean): Option[Iterable[A]] = {
    val filteredThings = backingStore.values.filter(p)
    if (filteredThings.isEmpty) None else Some(filteredThings)
  }

  def remove(uuid: UUID): Option[A] = backingStore.get(uuid) match {
    case optionalThing @ Some(thing) =>
      backingStore = backingStore - uuid; optionalThing
    case None => None
  }

  @volatile private[this] var backingStore = immutable.HashMap.empty[UUID, A]

}

...I suspect that even though the underlying backing store is immutable and its reference is volatile, the container is not thread-safe.

Suppose that I have two separate threads with access to an instance of the above container. Thread 1 filters the underlying collection and gets some results; at the same time, Thread 2 removes an item. Couldn't the results that Thread 1 holds then contain a reference to the item that Thread 2 removed? There might be other problems as well.

Am I correct that the above implementation is not thread-safe? What would be the most idiomatic way to make the above thread-safe using Scala?

Edit: I would prefer to avoid blocking and synchronization if possible. If blocking/synchronization must be used, is the volatile reference still needed? And what would be the point of the immutable collection? Couldn't I just as well use a mutable collection?

2 Answers


You are using a copy-on-write approach, so a concurrent read and write are not strictly ordered, but that is not really a problem: it is simply a timing issue, in that if A is writing while B is reading, there is no guarantee about whether B will see A's edit.

Your real problem is when you have C and D writing simultaneously: they can both read the same starting map, update their own private copies, and then each write back only their own edits. Whichever writes first will have its changes overwritten.

Consider a starting map containing (A, B), and threads C and D adding entries 'C' and 'D' respectively, while threads E and F read the map, all of this happening concurrently. One possible result is:

C reads map (A, B)
D reads map (A, B)
C writes map (A, B, C)
E reads map (A, B, C)
D writes map (A, B, D)
F reads map (A, B, D)

The 'C' entry appeared transiently and was then lost forever.
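
A minimal sketch that can reproduce this lost update (the object name, thread count and map contents are made up for illustration); note that marking the reference volatile does not help, because the read-modify-write is still not atomic:

import java.util.concurrent.CountDownLatch

object LostUpdateDemo extends App {

  // Copy-on-write map behind a volatile reference, as in the question.
  @volatile var map = Map("A" -> 1, "B" -> 2)

  val start = new CountDownLatch(1)

  // Two writers each read the current map, add their own entry and
  // publish their private copy: a classic read-modify-write race.
  val writers = Seq("C", "D").map { key =>
    new Thread(new Runnable {
      def run(): Unit = {
        start.await()
        map = map + (key -> 0) // not atomic
      }
    })
  }

  writers.foreach(_.start())
  start.countDown()
  writers.foreach(_.join())

  // Run this enough times and it will occasionally print a map that is
  // missing either C or D.
  println(map.keys.toList.sorted)
}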

The only way to reliably sequence the writes is to ensure the write block is never entered concurrently: either use a synchronized lock to enforce single entry to the write block, or serialise the updates by routing them all through a single Akka actor.

You also need to synchronize reads if you care about the ordering of reads versus writes, but if you have multiple threads accessing this, that is unlikely to be a real concern.
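
A sketch of that synchronized-writes variant, keeping the immutable map and the @volatile reference from the question and borrowing the uuid bound from the other answer below:

import java.util.UUID
import scala.collection.immutable.HashMap
import scala.language.reflectiveCalls // for the structural uuid bound

class MyContainer[A <: { def uuid: UUID }] {

  // Immutable map behind a volatile reference: readers always see a
  // complete snapshot that was published by some write.
  @volatile private[this] var backingStore = HashMap.empty[UUID, A]

  // Writes are serialised by the lock, so no update can be lost.
  def add(thing: A): Unit = this.synchronized {
    backingStore = backingStore + (thing.uuid -> thing)
  }

  def remove(uuid: UUID): Option[A] = this.synchronized {
    val removed = backingStore.get(uuid)
    if (removed.isDefined) backingStore = backingStore - uuid
    removed
  }

  // Reads take no lock: they simply work on whatever snapshot is
  // current when they read the volatile reference.
  def filter(p: A => Boolean): Option[Iterable[A]] = {
    val filteredThings = backingStore.values.filter(p)
    if (filteredThings.isEmpty) None else Some(filteredThings)
  }
}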

  • @ladams If I understand you correctly, I need to synchronize the add/remove (i.e., write) methods; but for read methods like filter I do not need to synchronize (unless I really care about the ordering of reads and writes).
    – davidrpugh
    Commented Sep 18, 2016 at 10:17
  • @ladams If I only synchronize the add and remove methods, then I still need to annotate the reference to the backing store as @volatile, correct?
    – davidrpugh
    Commented Sep 19, 2016 at 1:56
  • Yes, you need to synchronize both the add and remove methods. Marking the reference as volatile helps the timeliness of update visibility: if you don't do this, a reader thread may retain a reference to an old version of the backing store for an indeterminate amount of time after other thread(s) have updated it, potentially multiple times. To ensure visibility to other threads as soon as possible you need two things: the writer must force a memory write from processor cache to main memory (which happens on exit from a synchronized block), and the reader must read it back into its cache, which volatile forces.
    – ladams
    Commented Sep 19, 2016 at 7:45

Am I correct that the above implementation is not thread-safe?

Yes, it is not thread-safe. But it does have the right memory visibility semantics.

For simplicity, you could make it thread-safe like this:

import java.util.UUID
import scala.collection.immutable.HashMap
import scala.language.reflectiveCalls // for the structural uuid bound on A

class MyContainer[A <: { def uuid: UUID }] {

  def add(thing: A): Unit = this.synchronized {
    backingStore = backingStore + (thing.uuid -> thing)
  }

  def filter(p: A => Boolean): Option[Iterable[A]] = this.synchronized {
    val filteredThings = backingStore.values.filter(p)
    if (filteredThings.isEmpty) None else Some(filteredThings)
  }

  def remove(uuid: UUID): Option[A] = this.synchronized {
    backingStore.get(uuid) match {
      case optionalThing @ Some(_) =>
        backingStore = backingStore - uuid
        optionalThing
      case None => None
    }
  }

  private[this] var backingStore = HashMap.empty[UUID, A]
}
  • What does this syntax mean: [A <: {def uuid: UUID}] ? Have never seen this.
    – Samar
    Commented Sep 16, 2016 at 10:05
  • @Samar It means that a structural type containing a method called uuid is an upper bound for A.
    Commented Sep 16, 2016 at 10:44
  • Enlightened :) Thanks Yuval!
    – Samar
    Commented Sep 16, 2016 at 10:48
  • @Jatin Am I correct that this.synchronized blocks? Is it possible to solve this problem without blocking? Note that I would be open to a redesign of MyContainer if it is possible to avoid blocking.
    – davidrpugh
    Commented Sep 16, 2016 at 17:27
  • @davidrpugh You could avoid some blocking by using docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/…. You could take a read lock in filter and a write lock in remove and add. You could instead use a spin-based lock rather than blocking, but in that case it would keep spinning until it acquires the lock.
    – Jatin
    Commented Sep 16, 2016 at 18:29
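
If blocking is to be avoided entirely, one lock-free alternative (a sketch, not taken from the answers above; the class name is illustrative) is to keep the immutable map behind a java.util.concurrent.atomic.AtomicReference and retry writes with compareAndSet:

import java.util.UUID
import java.util.concurrent.atomic.AtomicReference
import scala.annotation.tailrec
import scala.language.reflectiveCalls // for the structural uuid bound

class LockFreeContainer[A <: { def uuid: UUID }] {

  private[this] val store = new AtomicReference(Map.empty[UUID, A])

  // Retry the read-modify-write until compareAndSet succeeds, so no
  // writer can silently overwrite another writer's update. Returns the
  // map as it was just before this update was applied.
  @tailrec
  private def modify(f: Map[UUID, A] => Map[UUID, A]): Map[UUID, A] = {
    val before = store.get()
    val after  = f(before)
    if (store.compareAndSet(before, after)) before else modify(f)
  }

  def add(thing: A): Unit = modify(_ + (thing.uuid -> thing))

  def remove(uuid: UUID): Option[A] = modify(_ - uuid).get(uuid)

  // Reads never block; they see the most recently published snapshot.
  def filter(p: A => Boolean): Option[Iterable[A]] = {
    val filteredThings = store.get().values.filter(p)
    if (filteredThings.isEmpty) None else Some(filteredThings)
  }
}

Writers may retry under contention, but no thread is ever parked on a lock, and readers are completely wait-free.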