
Suppose I have an array of ~10K elements and I need to process every element. I would like to do this in such a way that at most K elements are being processed in parallel at any given time.

I use Scala 2.9. I tried parallel collections (see below) but I saw more than K elements processed in parallel.

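import collection.parallel.ForkJoinTasks.defaultForkJoinPool._
val old = getParallelism // remember the current setting
setParallelism(K) // ask the default pool for K workers
val result = myArray.par.map(...) // process the array in parallel
setParallelism(old) // restore the previous setting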

How would you suggest processing an array in Scala 2.9 in such a way that only K elements are processed in parallel?

1 Answer


The setParallelism method sets the recommended number of parallel workers that the fork/join pool behind the parallel collection is supposed to use. Those K workers may work on any part of the collection; it is up to the scheduler to decide which elements each worker is assigned.
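If what you need is a hard upper bound on concurrent work rather than a recommendation, one possible approach (not part of the setParallelism API, just a sketch built on the standard java.util.concurrent executors) is to submit one task per element to a fixed-size thread pool. The names BoundedParallel and mapBounded are made up for this illustration.

```scala
import java.util.concurrent.{Callable, Executors}

// Hypothetical helper, not a library API.
object BoundedParallel {
  // Applies f to every element on a pool of exactly k threads,
  // so at most k calls to f run at the same time.
  // Results are returned in the order of the input.
  def mapBounded[A, B](items: Iterable[A], k: Int)(f: A => B): List[B] = {
    val pool = Executors.newFixedThreadPool(k)
    try {
      // Submit one task per element; the pool queues whatever it cannot run yet.
      val futures = items.toList.map { item =>
        pool.submit(new Callable[B] { def call(): B = f(item) })
      }
      // Block until each task has finished and collect its result.
      futures.map(_.get())
    } finally {
      pool.shutdown()
    }
  }
}
```

Usage would look like BoundedParallel.mapBounded(myArray, K)(process), where process stands in for whatever function you would otherwise pass to map. One task per element is fine for ~10K elements; for much larger inputs you would probably want to submit chunks instead.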

If you would like to include only the first K elements in the parallel operation, you should use the take method, followed by a map:

myArray.par.take(K).map(...)

Alternatively, you can use view.take(K).map(...).force, which creates a parallel view before doing the mapping.

  • Thanks. Could you please elaborate a bit on using view?
    – Michael
    Commented Jun 26, 2013 at 15:11
  • The idea behind views is to postpone evaluating the intermediate collections that would otherwise be created in memory using take, map or filter until the collection is forced. Due to certain abstraction penalties and indirections, this may or may not increase performance, depending on what the transformations are. You can read more about it here: docs.scala-lang.org/overviews/collections/views.html
    – axel22
    Commented Jun 26, 2013 at 15:33
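
To make the point about postponed evaluation concrete, here is a small sketch using an ordinary sequential view (not the parallel one). ViewDemo, calls and firstKDoubled are names invented for this example, and toList is used as the forcing step in place of force.

```scala
// Hypothetical demo object, for illustration only.
object ViewDemo {
  var calls = 0 // counts how often the mapping function actually runs

  // Builds a lazy pipeline over the first k elements.
  // No element is mapped until the view is forced.
  def firstKDoubled(xs: Array[Int], k: Int) =
    xs.view.take(k).map { x => calls += 1; x * 2 }
}

object ViewDemoMain {
  def main(args: Array[String]): Unit = {
    val v = ViewDemo.firstKDoubled(Array(1, 2, 3, 4, 5), 2)
    println(ViewDemo.calls) // still 0: take and map have only been recorded
    println(v.toList)       // forcing the view runs the mapping function
    println(ViewDemo.calls) // now greater than 0
  }
}
```

Only the elements kept by take are ever passed to the mapping function, and no intermediate array is allocated for the result of take.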
