0

Why does adding a println statement inside the foreach function change the results?

import scala.collection.parallel.ForkJoinTaskSupport

var sum = 0
val list = (1 to 100).toList.par
list.tasksupport =
  new ForkJoinTaskSupport(new scala.concurrent.forkjoin.ForkJoinPool(4))
list.foreach((x: Int) => { println((x, sum)); sum += x })
// 5050
println(sum)
sum = 0
list.foreach((x: Int) => sum += x)
// results vary
println(sum)

2 Answers

2

That's a race condition: since list is a parallel collection, foreach runs in parallel and mutates the unsynchronized variable sum.

So why does the first foreach print the right result? Because of the println inside the block: remove it and you will run into the data race.

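println delegates to PrintStream.println, which has a synchronized block inside. The threads queue on that lock, which makes overlapping updates to sum far less likely, but sum += x itself is still unsynchronized, so the race is hidden rather than fixed.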

public void println(Object x) {
    String s = String.valueOf(x);
    synchronized (this) {
        print(s);
        newLine();
    }
}
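Rather than relying on the lock inside println, the update itself can be made atomic. The following is only a minimal sketch under some assumptions: the object name AtomicSum and the method total are made up for illustration, and four plain threads stand in for the parallel collection so that the example needs nothing beyond the standard library. The same AtomicInteger could presumably be used inside the foreach from the question.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical helper, not part of the original question.
object AtomicSum {
  // Adds 1..100 from four threads using an atomic counter.
  def total(): Int = {
    val sum = new AtomicInteger(0)
    // Four chunks of 25 numbers, one thread per chunk.
    val workers = (1 to 100).grouped(25).toList.map { chunk =>
      new Thread(new Runnable {
        // addAndGet performs the read-add-write as one indivisible step.
        def run(): Unit = chunk.foreach { x => sum.addAndGet(x) }
      })
    }
    workers.foreach(_.start())
    workers.foreach(_.join()) // wait until every worker has finished
    sum.get()
  }
}
```

Calling AtomicSum.total() should give the same total on every run, with or without any printing inside the loop.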

By the way, that's not a good way to parallelise a sum.
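A sketch of what a better way might look like: the built-in reductions sum, reduce and fold, which build the result from return values instead of mutating a shared var. The object name BuiltInReductions is made up for illustration, and the calls are shown on a plain List so the snippet runs without parallel collections. Parallel collections offer the same methods, so calling them on the list from the question should let the library split the work.

```scala
// Hypothetical example object, not part of the original answer.
object BuiltInReductions {
  val numbers: List[Int] = (1 to 100).toList

  // No shared variable is mutated by any of these calls,
  // so there is nothing for threads to race on.
  val viaSum: Int    = numbers.sum
  val viaReduce: Int = numbers.reduce(_ + _)
  val viaFold: Int   = numbers.fold(0)(_ + _)
}
```

Note that reduce and fold expect an associative operation such as addition, which is what allows the partial results to be combined in any grouping.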

1
  • Thanks for the reply. Is there a way to trace the mutation in the second foreach (tracking when the variable sum gets updated and in which thread)? That was my original goal. Commented May 2, 2016 at 23:35
0

Scala encourages immutability over mutability specifically because things like this happen. When you have var variables, which can be reassigned, you can create race conditions: a value in memory may change after another thread has already read it, and that thread does not notice the change.

Doing the sum in parallel like this can cause something like the following to happen. All threads begin to call the function:

* 3 threads read the value of sum as 0.
* 1 thread writes sum + x, which happens to be 34; because it's parallel, the additions happen in any order.
* 1 more thread writes sum + x, which it computes as 0 + 17 (assuming x was 17), because it read the value 0 before the 34 was written to memory.
* 2 more threads read 17.
* The last of the first three threads writes 0 + 9, because it had read 0.

TL;DR: the reads and writes to memory get out of sync because several threads may read while others are writing, and they overwrite each other's changes.
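The lost update described above can be replayed by hand in a single thread. This is only an illustrative sketch: the object name LostUpdate and the labels A and B are made up, and the values 34 and 17 are taken from the walkthrough. It splits sum += x into its separate read and write steps and lets both "threads" read before either writes.

```scala
// Hypothetical, single-threaded replay of one bad interleaving.
object LostUpdate {
  def replay(): Int = {
    var sum = 0
    val readByA = sum   // thread A reads 0
    val readByB = sum   // thread B also reads 0, before A has written
    sum = readByA + 34  // A writes 0 + 34
    sum = readByB + 17  // B writes 0 + 17, overwriting A's update
    sum
  }
}
```

Both additions ran, yet the final value is 17 rather than 51, because B computed its result from a stale read.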

The solution is to find a way to do this in sequence, or to leverage parallelization in a non-destructive way. Functions like sum should be done in sequence, or in ways that always generate new values, for example with foldLeft:

Seq(1, 2, 3, 4).foldLeft(0){case (sum, newVal) => sum + newVal}

Or you could write a function that splits the input into subsets, sums each subset in parallel, and then adds all of those partial sums together in sequence:

Seq(1, 2, 3, 4, 5, 6, 7, 8).grouped(2).toSeq.par.map { pair =>
  // each pair is summed independently, possibly on a different thread
  pair.foldLeft(0) { case (sum, newVal) => sum + newVal }
}.seq.foldLeft(0) { case (sum, newVal) => sum + newVal }
