The breakOut mentioned in the other answer resolves to a builder factory for the collection of the expected type of map. The expected type of map is mutable.Map[Int, Boolean].
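To make the role of the expected type concrete, here is a minimal sequential sketch. It assumes Scala 2.9 to 2.12, where breakOut is still part of scala.collection; the object and method names are made up for illustration.

```scala
import scala.collection.breakOut
import scala.collection.mutable

// Hypothetical example object, not part of the original question.
object BreakOutSketch {
  private val numbers = List(1, 2, 3, 4)

  // The declared result type mutable.Map[Int, Boolean] is the expected type,
  // so breakOut picks the builder factory of mutable.Map.
  def asMutableMap: mutable.Map[Int, Boolean] =
    numbers.map(x => (x, x % 2 == 0))(breakOut)

  // Same expression, different expected type: now a Vector builder is used.
  def asVector: Vector[(Int, Boolean)] =
    numbers.map(x => (x, x % 2 == 0))(breakOut)
}
```

In both methods the builder comes from the target collection's companion, not from the collection that map is called on.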
Since the builder factory is provided by a sequential collection, the collect will not proceed in parallel:
scala> import scala.collection._
import scala.collection._
scala> val cond1: Int => Boolean = _ % 2 == 0
cond1: Int => Boolean = <function1>
scala> val dataList = 1 to 10
dataList: scala.collection.immutable.Range.Inclusive = Range(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
scala> val map: mutable.Map[Int,Boolean] = dataList.par.collect{case p if cond1(p) => println(Thread.currentThread); (p, true)}(breakOut)
Thread[Thread-8,5,main]
Thread[Thread-8,5,main]
Thread[Thread-8,5,main]
Thread[Thread-8,5,main]
Thread[Thread-8,5,main]
map: scala.collection.mutable.Map[Int,Boolean] = Map(10 -> true, 8 -> true, 4 -> true, 6 -> true, 2 -> true)
You can see this from the thread name: if the collect had run in parallel, the thread name would contain ForkJoin-something.
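If you would rather inspect the thread names programmatically than read println output, something like the following sketch could be used. It assumes Scala 2.9 to 2.12 (where .par is available without an extra module); the helper name collectWithThreadNames is hypothetical.

```scala
import scala.collection.mutable

// Hypothetical helper object for illustration only.
object ThreadNameSketch {
  // Runs the collect on a parallel range and returns the resulting pairs
  // together with the names of the threads that evaluated the partial function.
  def collectWithThreadNames(data: Range): (Map[Int, Boolean], Set[String]) = {
    val names = mutable.Set.empty[String]
    val result = data.par.collect {
      case p if p % 2 == 0 =>
        // Guard the shared set, since several threads may get here at once.
        names.synchronized { names += Thread.currentThread.getName }
        (p, true)
    }
    (result.seq.toMap, names.synchronized { names.toSet })
  }
}
```

Which names show up depends on the Scala version and the thread pool in use, and for a small collection part of the work may also run on the calling thread, so treat the set of names as a hint and not as proof.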
The Correct Way
The correct way to do it should be to first use the breakOut with the expected type being a parallel map, so that the collect proceeds in parallel:
scala> val map: parallel.mutable.ParMap[Int,Boolean] = dataList.par.collect{case p if cond1(p) => println(Thread.currentThread);(p, true)}(breakOut)
Thread[Thread-9,5,main]
Thread[Thread-9,5,main]
Thread[Thread-9,5,main]
Thread[Thread-9,5,main]
Thread[Thread-9,5,main]
map: scala.collection.parallel.mutable.ParMap[Int,Boolean] = ParHashMap(10 -> true, 8 -> true, 4 -> true, 6 -> true, 2 -> true)
and then call seq on the result of collect, since seq is always O(1).
UPDATE: just checked - this seems to work correctly with trunk, but not with 2.9.1.final.
The Patch
But, as the thread names above show, this doesn't work either. This is a bug, and it will be fixed in the next version of Scala. A workaround:
scala> val map: parallel.mutable.ParMap[Int, Boolean] = dataList.par.collect{case p if cond1(p) => println(Thread.currentThread);(p, true)}.map(x => x)(breakOut)
Thread[ForkJoinPool-1-worker-7,5,main]
Thread[ForkJoinPool-1-worker-3,5,main]
Thread[ForkJoinPool-1-worker-0,5,main]
Thread[ForkJoinPool-1-worker-8,5,main]
Thread[ForkJoinPool-1-worker-1,5,main]
map: scala.collection.parallel.mutable.ParMap[Int,Boolean] = ParHashMap(10 -> true, 8 -> true, 4 -> true, 6 -> true, 2 -> true)
scala> val sqmap = map.seq
sqmap: scala.collection.mutable.Map[Int,Boolean] = Map(10 -> true, 8 -> true, 4 -> true, 6 -> true, 2 -> true)
Note that the final map call will currently be done sequentially.
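Outside the REPL, the workaround could be wrapped in a small method along these lines. This is only a sketch, assuming Scala 2.9 to 2.12; the object and method names are invented, and the condition is inlined instead of using cond1.

```scala
import scala.collection.breakOut
import scala.collection.mutable
import scala.collection.parallel.mutable.ParMap

// Hypothetical wrapper around the workaround shown above.
object ParCollectSketch {
  def evenFlags(data: Range): mutable.Map[Int, Boolean] = {
    // collect uses the default parallel builder; the extra map(x => x)
    // with breakOut then converts the result into a parallel mutable map.
    val parMap: ParMap[Int, Boolean] =
      data.par.collect { case p if p % 2 == 0 => (p, true) }.map(x => x)(breakOut)
    // seq turns the parallel map into its sequential counterpart.
    parMap.seq
  }
}
```

The code is kept inside a method on purpose: running parallel operations directly in an object's initializer can hang on some Scala versions.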
Alternatively, if just a parallel.ParMap is OK with you, you can do:
scala> val map: Map[Int, Boolean] = dataList.par.collect{case p if cond1(p) => println(Thread.currentThread);(p, true)}.toMap.seq
Thread[ForkJoinPool-1-worker-2,5,main]
Thread[ForkJoinPool-1-worker-3,5,main]
Thread[ForkJoinPool-1-worker-7,5,main]
Thread[ForkJoinPool-1-worker-1,5,main]
Thread[ForkJoinPool-1-worker-8,5,main]
map: scala.collection.Map[Int,Boolean] = Map(10 -> true, 6 -> true, 2 -> true, 8 -> true, 4 -> true)