Scala encourages immutability over mutability precisely because things like this happen. When you have var
variables, which can be changed, you can create race conditions: a value in memory changes after another thread may or may not have already read it, and that thread does not see the change.
Doing sum in parallel like this causes the following to happen:
All threads begin to call the function
* 3 threads read the value sum as 0,
* 1 thread writes sum + x, which happens to be 34 (because it's parallel, the additions happen in any order)
* 1 more thread writes sum + x, which it computes as 0 + 17 (assuming x was 17), because it read the value 0 before the 34 was written to memory
* 2 more threads read 17
* the last of the first three threads writes 0 + 9, because it had read 0.
TLDR, the reads and writes to memory get out of sync because several threads may read while others are writing, and they overwrite each other's changes.
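The interleaving described above can be reproduced with a minimal sketch. It does not use the code from the question; `LostUpdateDemo`, `racySum`, and the chunk size are hypothetical names and values chosen for illustration, and plain threads stand in for the parallel collection so the example needs nothing beyond the standard library. Because the outcome depends on thread scheduling, the racy result varies from run to run and may occasionally match the correct sum.

```scala
object LostUpdateDemo {
  // Hypothetical helper: one thread per chunk, all adding into the same var.
  def racySum(xs: Seq[Int], chunkSize: Int): Int = {
    var sum = 0 // shared, mutable, and deliberately unsynchronized
    val threads = xs.grouped(chunkSize).toVector.map { chunk =>
      new Thread(new Runnable {
        def run(): Unit = chunk.foreach { x =>
          // Read sum, add x, write sum: three separate steps, not one atomic one.
          // Another thread can write between the read and the write,
          // and that write is then overwritten (a lost update).
          sum = sum + x
        }
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    sum
  }

  def main(args: Array[String]): Unit = {
    val xs = 1 to 40000
    println(s"sequential: ${xs.sum}")
    println(s"racy:       ${racySum(xs, 10000)}") // typically too small; varies per run
  }
}
```

Since every element is positive and an update can only be lost, never duplicated, the racy result should never exceed the sequential one.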
The solution is to find a way to do this in sequence, or to leverage parallelization in a non-destructive way. Functions like sum should be done in sequence, or in ways that always generate new values, for example, foldLeft:
Seq(1, 2, 3, 4).foldLeft(0){case (sum, newVal) => sum + newVal}
Or you could write a function that splits the input into subsets, sums each subset in parallel, and then adds all of those partial sums together in sequence:
Seq(1, 2, 3, 4, 5, 6, 7, 8).grouped(2).toSeq.par.map { pair =>
  pair.foldLeft(0){case (sum, newVal) => sum + newVal}
}.seq.foldLeft(0){case (sum, newVal) => sum + newVal}
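One caveat on the snippet above: from Scala 2.13 onward, `.par` is no longer part of the standard library. It lives in the separate scala-parallel-collections module and needs `import scala.collection.parallel.CollectionConverters._`. As a dependency-free alternative, the same chunk-then-combine idea can be sketched with `Future`. The names `ChunkedSum` and `parallelSum`, the global execution context, and the 10 second timeout are assumptions made for this example, not something from the original code.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ChunkedSum {
  // Hypothetical helper: each chunk is summed in its own Future,
  // so no task ever touches another task's accumulator.
  def parallelSum(xs: Seq[Int], chunkSize: Int): Int = {
    val partials: Vector[Future[Int]] =
      xs.grouped(chunkSize).toVector.map { chunk =>
        Future(chunk.foldLeft(0) { case (sum, newVal) => sum + newVal })
      }
    // Collect the partial sums, then combine them sequentially on the caller's thread.
    val results: Vector[Int] = Await.result(Future.sequence(partials), 10.seconds)
    results.foldLeft(0) { case (sum, newVal) => sum + newVal }
  }

  def main(args: Array[String]): Unit =
    println(parallelSum(Seq(1, 2, 3, 4, 5, 6, 7, 8), 2))
}
```

Each `foldLeft` accumulator is local to its own task, so there is no shared variable to overwrite; the only place results meet is the final sequential fold.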