Scala Parallel Collections 
Aleksandar Prokopec 
EPFL
Scala Parallel Collections
Scala collections 
for { 
s <- surnames 
n <- names 
if s endsWith n 
} yield (n, s) 
McDonald
Scala collections 
for { 
s <- surnames 
n <- names 
if s endsWith n 
} yield (n, s) 
1040 ms
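The query above runs as-is on small data; the lists below are hypothetical stand-ins for the talk's much larger benchmark inputs:

```scala
// Hypothetical sample data; the 1040 ms benchmark used much larger lists.
val surnames = List("McDonald", "Prokopec", "Johnson")
val names    = List("Donald", "son")

// Cross every surname with every name, keeping suffix matches.
val matches = for {
  s <- surnames
  n <- names
  if s endsWith n
} yield (n, s)
```

matches is List((Donald,McDonald), (son,Johnson)); adding .par to surnames and names, as on the next slides, distributes the same computation across cores.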
Scala Parallel Collections
Scala parallel collections 
for { 
s <- surnames.par 
n <- names.par 
if s endsWith n 
} yield (n, s) 
cores: 2 4 
time: 575 ms 305 ms
for comprehensions nested parallelized bulk operations 
surnames.par.flatMap { s => 
names.par 
.filter(n => s endsWith n) 
.map(n => (n, s)) 
}
Nested parallelism
Nested parallelism parallel within parallel 
composition 
surnames.par.flatMap { s => 
surnameToCollection(s) 
// may invoke parallel ops 
}
Nested parallelism going recursive – recursive algorithms 
def vowel(c: Char): Boolean = ... 
def gen(n: Int, acc: Seq[String]): Seq[String] = 
if (n == 0) acc 
else for (s <- gen(n - 1, acc); c <- 'a' to 'z') yield 
if (s.length == 0) s + c 
else if (vowel(s.last) && !vowel(c)) s + c 
else if (!vowel(s.last) && vowel(c)) s + c 
else s 
gen(5, Array("")) 
1545 ms
Nested parallelism going recursive 
def vowel(c: Char): Boolean = ... 
def gen(n: Int, acc: ParSeq[String]): ParSeq[String] = 
if (n == 0) acc 
else for (s <- gen(n - 1, acc); c <- 'a' to 'z') yield 
if (s.length == 0) s + c 
else if (vowel(s.last) && !vowel(c)) s + c 
else if (!vowel(s.last) && vowel(c)) s + c 
else s 
gen(5, ParArray("")) 
cores: 1 2 4 
time: 1575 ms 809 ms 530 ms
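A runnable sketch of the sequential version, with a hypothetical vowel implementation (the slides leave its body elided). Swapping Seq/Array for ParSeq/ParArray parallelizes every level of the recursion, given the parallel collections (bundled with the standard library up to Scala 2.12, the separate scala-parallel-collections module since 2.13):

```scala
// Hypothetical vowel test; the slides elide its body.
def vowel(c: Char): Boolean = "aeiou".contains(c)

// Sequential version from the slides: each level extends every candidate
// string by one character, keeping only vowel/consonant alternations.
def gen(n: Int, acc: Seq[String]): Seq[String] =
  if (n == 0) acc
  else for (s <- gen(n - 1, acc); c <- 'a' to 'z') yield
    if (s.length == 0) s + c
    else if (vowel(s.last) && !vowel(c)) s + c
    else if (!vowel(s.last) && vowel(c)) s + c
    else s

// Each level multiplies the candidate count by 26.
val words = gen(2, Array(""))
```

Using a small depth here; the benchmark used gen(5, Array("")).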
So, I just use par and I’m home free?
How to think parallel
Character count use case for foldLeft 
val txt: String = ... 
txt.foldLeft(0) { 
case (a, ' ') => a 
case (a, c) => a + 1 
}
Character count use case for foldLeft 
txt.foldLeft(0) { 
case (a, ' ') => a 
case (a, c) => a + 1 
} 
going left to right - not parallelizable! 
[diagram: the accumulator is threaded 0 → 6 through the characters A B C D E F, one _ + 1 step at a time]
Character count use case for foldLeft 
txt.foldLeft(0) { 
case (a, ' ') => a 
case (a, c) => a + 1 
} 
going left to right – not really necessary 
[diagram: A B C and D E F are folded independently (each 0 → 3 via _ + 1), then the partial results are merged with _ + _ into 6]
Character count in parallel 
txt.fold(0) { 
case (a, ' ') => a 
case (a, c) => a + 1 
}
Character count in parallel 
txt.fold(0) { 
case (a, ' ') => a 
case (a, c) => a + 1 
} 
[diagram: each chunk A B C is folded with _ + 1, an operator of type (Int, Char) => Int]
Character count fold not applicable 
txt.fold(0) { 
case (a, ' ') => a 
case (a, c) => a + 1 
} 
[diagram: merging the partial results 3 and 3 needs an operator of type (Int, Int) => Int, but fold takes only a single operator – so fold is not applicable]
Character count use case for aggregate 
txt.aggregate(0)({ 
case (a, ' ') => a 
case (a, c) => a + 1 
}, _ + _)
Character count use case for aggregate 
txt.aggregate(0)({ 
case (a, ' ') => a 
case (a, c) => a + 1 
}, _ + _) 
[diagram: aggregate takes two operators – the first folds an element into an aggregation (_ + 1), the second merges two aggregations (_ + _)]
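What the parallel runtime does can be simulated by hand: split the input, fold each chunk with the element operator, then merge with the aggregation operator. A minimal sketch (the input string is a made-up example):

```scala
val txt = "ABC DEF" // hypothetical input

// Element operator: fold one character into a running count.
val seqop: (Int, Char) => Int = {
  case (a, ' ') => a
  case (a, c)   => a + 1
}

// Simulate two workers: fold each half, then merge with _ + _.
val (left, right) = txt.splitAt(txt.length / 2)
val total = left.foldLeft(0)(seqop) + right.foldLeft(0)(seqop)
```

total equals txt.foldLeft(0)(seqop): the _ + _ merge loses nothing because addition of partial counts is associative.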
Word count another use case for foldLeft 
txt.foldLeft((0, true)) { 
case ((wc, _), ' ') => (wc, true) // last seen character is a space 
case ((wc, true), x) => (wc + 1, false) // last seen character was a space – a new word 
case ((wc, false), x) => (wc, false) // last seen character wasn’t a space – no new word 
} 
initial accumulator (0, true): 0 words so far, last character was a space 
“Folding me softly.”
Word count in parallel 
P1 takes “Folding me “ and computes wc = 2; rs = 1 
P2 takes “softly.“ and computes wc = 1; ls = 0 
merging the two partial results gives wc = 3
Word count must assume arbitrary partitions 
P1 takes “Foldin“ and computes wc = 1; rs = 0 
P2 takes “g me softly.“ and computes wc = 3; ls = 0 
the word at the boundary was counted on both sides, so merging gives wc = 1 + 3 - 1 = 3
Word count initial aggregation 
txt.par.aggregate((0, 0, 0)) 
the accumulator tracks (# spaces on the left, # words, # spaces on the right) 
(0, 0, 0) describes the empty string ””
Word count merging two aggregations 
... 
}, { 
case ((0, 0, 0), res) => res // one side is empty: ““ + “Folding me“ 
case (res, (0, 0, 0)) => res // “softly.“ + ““ 
case ((lls, lwc, 0), (0, rwc, rrs)) => 
(lls, lwc + rwc - 1, rrs) // a word straddles the boundary: “Folding m“ + “e softly.“ 
case ((lls, lwc, _), (_, rwc, rrs)) => 
(lls, lwc + rwc, rrs) // the boundary falls on a space: “Folding me” + “ softly.“ 
})
Word count aggregating an element 
txt.par.aggregate((0, 0, 0))({ 
case ((ls, 0, _), ' ') => (ls + 1, 0, ls + 1) // ”_”: 0 words and a space – add one more space on each side 
case ((ls, 0, _), c) => (ls, 1, 0) // ” m”: 0 words and a non-space – one word, no spaces on the right side 
case ((ls, wc, rs), ' ') => (ls, wc, rs + 1) // ” me_”: nonzero words and a space – one more space on the right side 
case ((ls, wc, 0), c) => (ls, wc, 0) // ” me sof”: nonzero words, last non-space and current non-space – no change 
case ((ls, wc, rs), c) => (ls, wc + 1, 0) // ” me s”: nonzero words, last space and current non-space – one more word 
}, ...
Word count in parallel 
txt.par.aggregate((0, 0, 0))({ 
case ((ls, 0, _), ' ') => (ls + 1, 0, ls + 1) 
case ((ls, 0, _), c) => (ls, 1, 0) 
case ((ls, wc, rs), ' ') => (ls, wc, rs + 1) 
case ((ls, wc, 0), c) => (ls, wc, 0) 
case ((ls, wc, rs), c) => (ls, wc + 1, 0) 
}, { 
case ((0, 0, 0), res) => res 
case (res, (0, 0, 0)) => res 
case ((lls, lwc, 0), (0, rwc, rrs)) => 
(lls, lwc + rwc - 1, rrs) 
case ((lls, lwc, _), (_, rwc, rrs)) => 
(lls, lwc + rwc, rrs) 
})
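The operator pair can be checked without the parallel runtime by simulating an arbitrary partition by hand; wordCount below is a hypothetical test harness, not part of the slides:

```scala
// Element operator: state is (left spaces, words, right spaces).
val seqop: ((Int, Int, Int), Char) => (Int, Int, Int) = {
  case ((ls, 0, _), ' ')   => (ls + 1, 0, ls + 1)
  case ((ls, 0, _), c)     => (ls, 1, 0)
  case ((ls, wc, rs), ' ') => (ls, wc, rs + 1)
  case ((ls, wc, 0), c)    => (ls, wc, 0)
  case ((ls, wc, rs), c)   => (ls, wc + 1, 0)
}

// Aggregation operator: merge the results of two adjacent chunks.
val combop: ((Int, Int, Int), (Int, Int, Int)) => (Int, Int, Int) = {
  case ((0, 0, 0), res) => res
  case (res, (0, 0, 0)) => res
  case ((lls, lwc, 0), (0, rwc, rrs)) => (lls, lwc + rwc - 1, rrs)
  case ((lls, lwc, _), (_, rwc, rrs)) => (lls, lwc + rwc, rrs)
}

// Split txt at an arbitrary position, fold both halves, merge.
def wordCount(txt: String, cut: Int): Int = {
  val (l, r) = txt.splitAt(cut)
  combop(l.foldLeft((0, 0, 0))(seqop), r.foldLeft((0, 0, 0))(seqop))._2
}
```

wordCount("Folding me softly.", cut) is 3 for every cut position, which is exactly the partition-independence the parallel runtime relies on.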
Word count using parallel strings? 
txt.par.aggregate((0, 0, 0))({ ... }, { ... })
Word count string not really parallelizable 
scala> (txt: String).par 
collection.parallel.ParSeq[Char] = ParArray(…) 
different internal representation! 
ParArray – copy string contents into an array
Conversions going parallel 
// `par` is efficient for... 
mutable.{Array, ArrayBuffer, ArraySeq} 
mutable.{HashMap, HashSet} 
immutable.{Vector, Range} 
immutable.{HashMap, HashSet} 
most other collections construct a new parallel collection!
Conversions going parallel 
sequential → parallel 
Array, ArrayBuffer, ArraySeq → mutable.ParArray 
mutable.HashMap → mutable.ParHashMap 
mutable.HashSet → mutable.ParHashSet 
immutable.Vector → immutable.ParVector 
immutable.Range → immutable.ParRange 
immutable.HashMap → immutable.ParHashMap 
immutable.HashSet → immutable.ParHashSet
Conversions going parallel 
// `seq` is always efficient 
ParArray(1, 2, 3).seq 
List(1, 2, 3, 4).seq 
ParHashMap(1 -> 2, 3 -> 4).seq 
”abcd”.seq 
// `par` may not be... 
”abcd”.par
Custom collections
Custom collection 
class ParString(val str: String) 
extends parallel.immutable.ParSeq[Char] { 
def apply(i: Int) = str.charAt(i) 
def length = str.length 
def seq = new WrappedString(str) 
def splitter = 
new ParStringSplitter(0, str.length)
Custom collection splitter definition 
class ParStringSplitter(var i: Int, len: Int) 
extends Splitter[Char] {
Custom collection splitters are iterators 
class ParStringSplitter(var i: Int, len: Int) 
extends Splitter[Char] { 
def hasNext = i < len 
def next = { 
val r = str.charAt(i) 
i += 1 
r 
}
Custom collection splitters must be duplicated 
... 
def dup = new ParStringSplitter(i, len)
Custom collection splitters know how many elements remain 
... 
def dup = new ParStringSplitter(i, len) 
def remaining = len - i
Custom collection splitters can be split 
... 
def psplit(sizes: Int*): Seq[ParStringSplitter] = { 
val splitted = new ArrayBuffer[ParStringSplitter] 
for (sz <- sizes) { 
val next = (i + sz) min len 
splitted += new ParStringSplitter(i, next) 
i = next 
} 
splitted 
}
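The splitter contract can be sketched without the library traits; StringSplitter below is a hypothetical stand-alone version showing what a splitter adds to an iterator: it knows how many elements remain and can split itself into independent halves.

```scala
// A library-free sketch of the splitter idea: an iterator over a
// string range [i, end) that knows its size and can split itself.
class StringSplitter(val str: String, var i: Int, val end: Int) {
  def hasNext: Boolean = i < end
  def next(): Char = { val r = str.charAt(i); i += 1; r }
  def remaining: Int = end - i
  // Split the remaining range into two independent halves.
  def split: (StringSplitter, StringSplitter) = {
    val mid = i + remaining / 2
    (new StringSplitter(str, i, mid), new StringSplitter(str, mid, end))
  }
}
```

Because each half owns a disjoint index range over the shared string, two workers can consume them without any synchronization.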
Word count now with parallel strings 
new ParString(txt).aggregate((0, 0, 0))({ 
case ((ls, 0, _), ' ') => (ls + 1, 0, ls + 1) 
case ((ls, 0, _), c) => (ls, 1, 0) 
case ((ls, wc, rs), ' ') => (ls, wc, rs + 1) 
case ((ls, wc, 0), c) => (ls, wc, 0) 
case ((ls, wc, rs), c) => (ls, wc + 1, 0) 
}, { 
case ((0, 0, 0), res) => res 
case (res, (0, 0, 0)) => res 
case ((lls, lwc, 0), (0, rwc, rrs)) => 
(lls, lwc + rwc - 1, rrs) 
case ((lls, lwc, _), (_, rwc, rrs)) => 
(lls, lwc + rwc, rrs) 
})
Word count performance 
txt.foldLeft((0, true)) { 
case ((wc, _), ' ') => (wc, true) 
case ((wc, true), x) => (wc + 1, false) 
case ((wc, false), x) => (wc, false) 
} 
new ParString(txt).aggregate((0, 0, 0))({ 
case ((ls, 0, _), ' ') => (ls + 1, 0, ls + 1) 
case ((ls, 0, _), c) => (ls, 1, 0) 
case ((ls, wc, rs), ' ') => (ls, wc, rs + 1) 
case ((ls, wc, 0), c) => (ls, wc, 0) 
case ((ls, wc, rs), c) => (ls, wc + 1, 0) 
}, { 
case ((0, 0, 0), res) => res 
case (res, (0, 0, 0)) => res 
case ((lls, lwc, 0), (0, rwc, rrs)) => 
(lls, lwc + rwc - 1, rrs) 
case ((lls, lwc, _), (_, rwc, rrs)) => 
(lls, lwc + rwc, rrs) 
}) 
sequential: 100 ms 
cores: 1 2 4 
time: 137 ms 70 ms 35 ms
Hierarchy 
GenTraversable → GenIterable → GenSeq 
sequential: Traversable → Iterable → Seq 
parallel: ParIterable → ParSeq 
the Gen* traits abstract over both
Hierarchy 
def nonEmpty(sq: Seq[String]) = { 
val res = new mutable.ArrayBuffer[String]() 
for (s <- sq) { 
if (s.nonEmpty) res += s 
} 
res 
}
Hierarchy 
def nonEmpty(sq: ParSeq[String]) = { 
val res = new mutable.ArrayBuffer[String]() 
for (s <- sq) { 
if (s.nonEmpty) res += s 
} 
res 
} 
side-effects! 
ArrayBuffer is not synchronized!
Hierarchy 
def nonEmpty(sq: GenSeq[String]) = { 
val res = new mutable.ArrayBuffer[String]() 
for (s <- sq) { 
if (s.nonEmpty) res.synchronized { 
res += s 
} 
} 
res 
}
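The synchronized buffer works, but for this particular method the side effect is avoidable altogether; a side-effect-free sketch (written against Seq here, since GenSeq moved out of the standard library after 2.12):

```scala
// filter lets the collection build its own result, so the same code
// is safe for sequential and parallel inputs alike - no shared buffer.
def nonEmptyStrings(sq: Seq[String]): Seq[String] = sq.filter(_.nonEmpty)
```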
Accessors vs. transformers some methods need more than just splitters 
accessors: foreach, reduce, find, sameElements, indexOf, corresponds, forall, exists, max, min, sum, count, … 
transformers: map, flatMap, filter, partition, ++, take, drop, span, zip, patch, padTo, … 
these return collections! 
Sequential collections – builders 
Parallel collections – combiners
Builders building a sequential collection 
[diagram: elements 1–7 are fed to a ListBuilder with +=; result returns the assembled list ending in Nil]
How to build parallel?
Combiners building parallel collections 
trait Combiner[-Elem, +To] 
extends Builder[Elem, To] { 
def combine[N <: Elem, NewTo >: To] 
(other: Combiner[N, NewTo]): 
Combiner[N, NewTo] 
} 
combine merges two combiners into one 
Should be efficient – O(log n) worst case 
How to implement this combine?
Parallel arrays 
[diagram: workers filter chunks (1, 2, 3, 4), (5, 6, 7, 8), (3, 1, 8, 0), (2, 2, 1, 9) down to (2, 4), (6, 8), (8, 0), (2, 2); the combiners merge the chunk lists, allocate the final array, and copy the elements in]
Parallel hash tables 
e.g. calling filter on a ParHashMap with keys 0, 1, 2, 4, 5, 7, 8, 9 
[diagram: each worker traverses its part of the table and adds the surviving elements into its own ParHashCombiner – e.g. 0, 1, 4 in one and 5, 7, 9 in the other]
Parallel hash tables 
How to merge two ParHashCombiners? buckets! 
elements are pre-sorted into buckets by hashcode prefix, e.g. 0 = 0000₂, 1 = 0001₂, 4 = 0100₂
Parallel hash tables 
combine: the two ParHashCombiners simply exchange bucket lists – no copying! 
result: the buckets are written into the final ParHashMap table
Custom combiners for methods returning custom collections 
new ParString(txt).filter(_ != ' ') 
What is the return type here? It creates a ParVector! 
to get a ParString back, declare the representation types and override newCombiner: 
class ParString(val str: String) 
extends immutable.ParSeq[Char] 
with ParSeqLike[Char, ParString, WrappedString] 
{ 
def apply(i: Int) = str.charAt(i) 
... 
protected[this] override def newCombiner = 
new ParStringCombiner
Custom combiners for methods returning custom collections 
class ParStringCombiner 
extends Combiner[Char, ParString] { 
var size = 0 
val chunks = ArrayBuffer(new StringBuilder) 
var lastc = chunks.last 
def +=(elem: Char) = { 
lastc += elem 
size += 1 
this 
}
Custom combiners for methods returning custom collections 
... 
def combine[U <: Char, NewTo >: ParString] 
(other: Combiner[U, NewTo]) = other match { 
case psc: ParStringCombiner => 
size += psc.size 
chunks ++= psc.chunks 
lastc = chunks.last 
this 
}
Custom combiners for methods returning custom collections 
... 
def result = { 
val rsb = new StringBuilder 
for (sb <- chunks) rsb.append(sb) 
new ParString(rsb.toString) 
} 
...
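Putting the pieces together, here is a library-free sketch of the combiner (dropping the Combiner[Char, ParString] supertype so it runs without the parallel-collections module):

```scala
import scala.collection.mutable.ArrayBuffer

// Chunks of StringBuilders: += appends to the last chunk, combine
// merges chunk lists without copying characters, result concatenates once.
class StringCombiner {
  var size = 0
  val chunks = ArrayBuffer(new StringBuilder)
  var lastc = chunks.last

  def +=(elem: Char): this.type = {
    lastc += elem
    size += 1
    this
  }

  def combine(that: StringCombiner): this.type = {
    size += that.size      // merged size is the sum of both
    chunks ++= that.chunks // adopt the other combiner's chunks
    lastc = chunks.last
    this
  }

  def result: String = {
    val rsb = new StringBuilder
    for (sb <- chunks) rsb.append(sb)
    rsb.toString
  }
}
```

Because combine only moves chunk references, its cost is proportional to the number of chunks, not the number of characters; all character copying is deferred to result.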
Custom combiners for methods expecting implicit builder factories 
// only for big boys 
... 
with GenericParTemplate[T, ParColl] 
... 
object ParColl extends ParFactory[ParColl] { 
implicit def canCombineFrom[T] = 
new GenericCanCombineFrom[T] 
...
Custom combiners performance measurement 
txt.filter(_ != ' ') 
new ParString(txt).filter(_ != ' ') 
sequential: 106 ms 
cores: 1 2 4 
time: 125 ms 81 ms 56 ms
Custom combiners performance measurement 
[plot: time vs. processors – 125 ms on 1 core, 81 ms on 2, 56 ms on 4; the curve flattens because def result is not parallelized]
Custom combiners tricky! 
• two-step evaluation 
– parallelize the result method in combiners 
• efficient merge operation 
– binomial heaps, ropes, etc. 
• concurrent data structures 
– non-blocking scalable insertion operation 
– we’re working on this
Future work coming up 
• concurrent data structures 
• more efficient vectors 
• custom task pools 
• user defined scheduling 
• parallel bulk in-place modifications
Thank you! 
Examples at: 
git://github.com/axel22/sd.git

  • 57. Word count aggregation  element txt.par.aggregate((0, 0, 0))({ case ((ls, 0, _), ' ') => (ls + 1, 0, ls + 1) case ((ls, 0, _), c) => (ls, 1, 0) case ((ls, wc, rs), ' ') => (ls, wc, rs + 1) case ((ls, wc, 0), c) => (ls, wc, 0) case ((ls, wc, rs), c) => (ls, wc + 1, 0) ” me s” nonzero words, last space and current non-space – one more word
  • 58. Word count in parallel txt.par.aggregate((0, 0, 0))({ case ((ls, 0, _), ' ') => (ls + 1, 0, ls + 1) case ((ls, 0, _), c) => (ls, 1, 0) case ((ls, wc, rs), ' ') => (ls, wc, rs + 1) case ((ls, wc, 0), c) => (ls, wc, 0) case ((ls, wc, rs), c) => (ls, wc + 1, 0) }, { case ((0, 0, 0), res) => res case (res, (0, 0, 0)) => res case ((lls, lwc, 0), (0, rwc, rrs)) => (lls, lwc + rwc - 1, rrs) case ((lls, lwc, _), (_, rwc, rrs)) => (lls, lwc + rwc, rrs) })
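The whole scheme can be checked without a parallel runtime: split the string at an arbitrary point, fold each half with the element function, then merge with the combination function. A sketch under that assumption (the names `ParWordCount`, `seqop`, `combop` are ours):

```scala
object ParWordCount {
  type Acc = (Int, Int, Int) // (spaces on the left, word count, spaces on the right)
  val zero: Acc = (0, 0, 0)

  def seqop(acc: Acc, c: Char): Acc = (acc, c) match {
    case ((ls, 0, _), ' ')   => (ls + 1, 0, ls + 1) // only spaces seen so far
    case ((ls, 0, _), _)     => (ls, 1, 0)          // first word starts
    case ((ls, wc, rs), ' ') => (ls, wc, rs + 1)    // one more trailing space
    case ((ls, wc, 0), _)    => (ls, wc, 0)         // still inside a word
    case ((ls, wc, _), _)    => (ls, wc + 1, 0)     // new word after spaces
  }

  def combop(l: Acc, r: Acc): Acc = (l, r) match {
    case ((0, 0, 0), res) => res
    case (res, (0, 0, 0)) => res
    case ((lls, lwc, 0), (0, rwc, rrs)) => (lls, lwc + rwc - 1, rrs) // word spans the cut
    case ((lls, lwc, _), (_, rwc, rrs)) => (lls, lwc + rwc, rrs)
  }

  // simulate two workers on an arbitrary partition, then merge
  def count(txt: String, split: Int): Int = {
    val (a, b) = txt.splitAt(split)
    combop(a.foldLeft(zero)(seqop), b.foldLeft(zero)(seqop))._2
  }
}
```

Every split point must give the same answer — that is exactly the "arbitrary partitions" requirement from the slides.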
  • 59. Word count using parallel strings? txt.par.aggregate((0, 0, 0))({ case ((ls, 0, _), ' ') => (ls + 1, 0, ls + 1) case ((ls, 0, _), c) => (ls, 1, 0) case ((ls, wc, rs), ' ') => (ls, wc, rs + 1) case ((ls, wc, 0), c) => (ls, wc, 0) case ((ls, wc, rs), c) => (ls, wc + 1, 0) }, { case ((0, 0, 0), res) => res case (res, (0, 0, 0)) => res case ((lls, lwc, 0), (0, rwc, rrs)) => (lls, lwc + rwc - 1, rrs) case ((lls, lwc, _), (_, rwc, rrs)) => (lls, lwc + rwc, rrs) })
  • 60. Word count string not really parallelizable scala> (txt: String).par
  • 61. Word count string not really parallelizable scala> (txt: String).par collection.parallel.ParSeq[Char] = ParArray(…)
  • 62. Word count string not really parallelizable scala> (txt: String).par collection.parallel.ParSeq[Char] = ParArray(…) different internal representation!
  • 63. Word count string not really parallelizable scala> (txt: String).par collection.parallel.ParSeq[Char] = ParArray(…) different internal representation! ParArray
  • 64. Word count string not really parallelizable scala> (txt: String).par collection.parallel.ParSeq[Char] = ParArray(…) different internal representation! ParArray  copy string contents into an array
  • 65. Conversions going parallel // `par` is efficient for... mutable.{Array, ArrayBuffer, ArraySeq} mutable.{HashMap, HashSet} immutable.{Vector, Range} immutable.{HashMap, HashSet}
  • 66. Conversions going parallel // `par` is efficient for... mutable.{Array, ArrayBuffer, ArraySeq} mutable.{HashMap, HashSet} immutable.{Vector, Range} immutable.{HashMap, HashSet} most other collections construct a new parallel collection!
  • 67. Conversions going parallel sequential parallel Array, ArrayBuffer, ArraySeq mutable.ParArray mutable.HashMap mutable.ParHashMap mutable.HashSet mutable.ParHashSet immutable.Vector immutable.ParVector immutable.Range immutable.ParRange immutable.HashMap immutable.ParHashMap immutable.HashSet immutable.ParHashSet
  • 68. Conversions going parallel // `seq` is always efficient ParArray(1, 2, 3).seq List(1, 2, 3, 4).seq ParHashMap(1 -> 2, 3 -> 4).seq "abcd".seq // `par` may not be... "abcd".par
  • 70. Custom collection class ParString(val str: String)
  • 71. Custom collection class ParString(val str: String) extends parallel.immutable.ParSeq[Char] {
  • 72. Custom collection class ParString(val str: String) extends parallel.immutable.ParSeq[Char] { def apply(i: Int) = str.charAt(i) def length = str.length
  • 73. Custom collection class ParString(val str: String) extends parallel.immutable.ParSeq[Char] { def apply(i: Int) = str.charAt(i) def length = str.length def seq = new WrappedString(str)
  • 74. Custom collection class ParString(val str: String) extends parallel.immutable.ParSeq[Char] { def apply(i: Int) = str.charAt(i) def length = str.length def seq = new WrappedString(str) def splitter: Splitter[Char]
  • 75. Custom collection class ParString(val str: String) extends parallel.immutable.ParSeq[Char] { def apply(i: Int) = str.charAt(i) def length = str.length def seq = new WrappedString(str) def splitter = new ParStringSplitter(0, str.length)
  • 76. Custom collection splitter definition class ParStringSplitter(var i: Int, len: Int) extends Splitter[Char] {
  • 77. Custom collection splitters are iterators class ParStringSplitter(var i: Int, len: Int) extends Splitter[Char] { def hasNext = i < len def next = { val r = str.charAt(i) i += 1 r }
  • 78. Custom collection splitters must be duplicated ... def dup = new ParStringSplitter(i, len)
  • 79. Custom collection splitters know how many elements remain ... def dup = new ParStringSplitter(i, len) def remaining = len - i
  • 80. Custom collection splitters can be split ... def psplit(sizes: Int*): Seq[ParStringSplitter] = { val splitted = new ArrayBuffer[ParStringSplitter] for (sz <- sizes) { val next = (i + sz) min len splitted += new ParStringSplitter(i, next) i = next } splitted }
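A self-contained sketch of the same splitter, written against plain `Iterator` so it runs outside the `ParSeq` machinery — the class name `StringSplitter` is ours, and `str` is passed explicitly since the slides' version closes over the enclosing `ParString`:

```scala
import scala.collection.mutable.ArrayBuffer

// Minimal stand-alone version of the ParString splitter: it iterates the
// substring [i, len) of str and can carve itself into smaller splitters.
class StringSplitter(val str: String, var i: Int, val len: Int)
    extends Iterator[Char] {
  def hasNext = i < len
  def next() = { val r = str.charAt(i); i += 1; r }
  def dup = new StringSplitter(str, i, len)
  def remaining = len - i

  // carve off prefixes of the given sizes; the receiver is consumed
  def psplit(sizes: Int*): Seq[StringSplitter] = {
    val splitted = new ArrayBuffer[StringSplitter]
    for (sz <- sizes) {
      val next = (i + sz) min len
      splitted += new StringSplitter(str, i, next)
      i = next
    }
    splitted.toSeq
  }
}
```

Note that `psplit` never copies characters — each sub-splitter only holds a pair of indices into the shared string.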
  • 81. Word count now with parallel strings new ParString(txt).aggregate((0, 0, 0))({ case ((ls, 0, _), ' ') => (ls + 1, 0, ls + 1) case ((ls, 0, _), c) => (ls, 1, 0) case ((ls, wc, rs), ' ') => (ls, wc, rs + 1) case ((ls, wc, 0), c) => (ls, wc, 0) case ((ls, wc, rs), c) => (ls, wc + 1, 0) }, { case ((0, 0, 0), res) => res case (res, (0, 0, 0)) => res case ((lls, lwc, 0), (0, rwc, rrs)) => (lls, lwc + rwc - 1, rrs) case ((lls, lwc, _), (_, rwc, rrs)) => (lls, lwc + rwc, rrs) })
  • 82. Word count performance txt.foldLeft((0, true)) { case ((wc, _), ' ') => (wc, true) case ((wc, true), x) => (wc + 1, false) case ((wc, false), x) => (wc, false) } new ParString(txt).aggregate((0, 0, 0))({ case ((ls, 0, _), ' ') => (ls + 1, 0, ls + 1) case ((ls, 0, _), c) => (ls, 1, 0) case ((ls, wc, rs), ' ') => (ls, wc, rs + 1) case ((ls, wc, 0), c) => (ls, wc, 0) case ((ls, wc, rs), c) => (ls, wc + 1, 0) }, { case ((0, 0, 0), res) => res case (res, (0, 0, 0)) => res case ((lls, lwc, 0), (0, rwc, rrs)) => (lls, lwc + rwc - 1, rrs) case ((lls, lwc, _), (_, rwc, rrs)) => (lls, lwc + rwc, rrs) }) 100 ms cores: 1 2 4 time: 137 ms 70 ms 35 ms
  • 83. Hierarchy GenTraversable GenIterable GenSeq Traversable Iterable Seq ParIterable ParSeq
  • 84. Hierarchy def nonEmpty(sq: Seq[String]) = { val res = new mutable.ArrayBuffer[String]() for (s <- sq) { if (s.nonEmpty) res += s } res }
  • 85. Hierarchy def nonEmpty(sq: ParSeq[String]) = { val res = new mutable.ArrayBuffer[String]() for (s <- sq) { if (s.nonEmpty) res += s } res }
  • 86. Hierarchy def nonEmpty(sq: ParSeq[String]) = { val res = new mutable.ArrayBuffer[String]() for (s <- sq) { if (s.nonEmpty) res += s } res } side-effects! ArrayBuffer is not synchronized!
  • 87. Hierarchy def nonEmpty(sq: ParSeq[String]) = { val res = new mutable.ArrayBuffer[String]() for (s <- sq) { if (s.nonEmpty) res += s } res } side-effects! ArrayBuffer is not synchronized! ParSeq Seq
  • 88. Hierarchy def nonEmpty(sq: GenSeq[String]) = { val res = new mutable.ArrayBuffer[String]() for (s <- sq) { if (s.nonEmpty) res.synchronized { res += s } } res }
  • 89. Accessors vs. transformers some methods need more than just splitters foreach, reduce, find, sameElements, indexOf, corresponds, forall, exists, max, min, sum, count, … map, flatMap, filter, partition, ++, take, drop, span, zip, patch, padTo, …
  • 90. Accessors vs. transformers some methods need more than just splitters foreach, reduce, find, sameElements, indexOf, corresponds, forall, exists, max, min, sum, count, … map, flatMap, filter, partition, ++, take, drop, span, zip, patch, padTo, … These return collections!
  • 91. Accessors vs. transformers some methods need more than just splitters foreach, reduce, find, sameElements, indexOf, corresponds, forall, exists, max, min, sum, count, … map, flatMap, filter, partition, ++, take, drop, span, zip, patch, padTo, … Sequential collections – builders
  • 92. Accessors vs. transformers some methods need more than just splitters foreach, reduce, find, sameElements, indexOf, corresponds, forall, exists, max, min, sum, count, … map, flatMap, filter, partition, ++, take, drop, span, zip, patch, padTo, … Sequential collections – builders Parallel collections – combiners
  • 93. Builders building a sequential collection 1 2 3 4 5 6 7 Nil Nil ListBuilder += += += result
  • 94. How to build parallel?
  • 95. Combiners building parallel collections trait Combiner[-Elem, +To] extends Builder[Elem, To] { def combine[N <: Elem, NewTo >: To] (other: Combiner[N, NewTo]): Combiner[N, NewTo] }
  • 96. Combiners building parallel collections trait Combiner[-Elem, +To] extends Builder[Elem, To] { def combine[N <: Elem, NewTo >: To] (other: Combiner[N, NewTo]): Combiner[N, NewTo] } Combiner Combiner Combiner
  • 97. Combiners building parallel collections trait Combiner[-Elem, +To] extends Builder[Elem, To] { def combine[N <: Elem, NewTo >: To] (other: Combiner[N, NewTo]): Combiner[N, NewTo] } Should be efficient – O(log n) worst case
  • 98. Combiners building parallel collections trait Combiner[-Elem, +To] extends Builder[Elem, To] { def combine[N <: Elem, NewTo >: To] (other: Combiner[N, NewTo]): Combiner[N, NewTo] } How to implement this combine?
  • 99. Parallel arrays 1, 2, 3, 4 5, 6, 7, 8 4 6, 8 3, 1, 8, 0 2, 2, 1, 9 8, 0 2, 2 merge merge merge copy allocate 2 4 6 8 8 0 2 2
  • 100. Parallel hash tables ParHashMap
  • 101. Parallel hash tables ParHashMap 0 1 2 4 5 7 8 9 e.g. calling filter
  • 102. Parallel hash tables ParHashMap 0 1 2 4 5 7 8 9 ParHashCombiner ParHashCombiner e.g. calling filter
  • 103. Parallel hash tables ParHashMap 0 1 2 4 5 7 8 9 ParHashCombiner 0 1 4 ParHashCombiner 5 7 9
  • 104. Parallel hash tables ParHashMap 0 1 2 4 5 7 8 9 ParHashCombiner 0 1 4 ParHashCombiner 5 9 5 7 0 1 4 7 9
  • 105. Parallel hash tables ParHashMap ParHashCombiner ParHashCombiner How to merge? 5 7 0 1 4 9
  • 106. 5 7 8 9 1 4 0 Parallel hash tables buckets! ParHashCombiner ParHashCombiner ParHashMap 2 0 = 00002 1 = 00012 4 = 01002
  • 107. Parallel hash tables ParHashCombiner ParHashCombiner 0 1 4 9 7 5 combine
  • 108. Parallel hash tables ParHashCombiner ParHashCombiner 9 7 5 0 1 4 ParHashCombiner no copying!
  • 109. Parallel hash tables 9 7 5 0 1 4 ParHashCombiner
  • 110. Parallel hash tables 9 7 5 0 1 4 ParHashMap
  • 111. Custom combiners for methods returning custom collections new ParString(txt).filter(_ != ' ') What is the return type here?
  • 112. Custom combiners for methods returning custom collections new ParString(txt).filter(_ != ' ') creates a ParVector!
  • 113. Custom combiners for methods returning custom collections new ParString(txt).filter(_ != ' ') creates a ParVector! class ParString(val str: String) extends parallel.immutable.ParSeq[Char] { def apply(i: Int) = str.charAt(i) ...
  • 114. Custom combiners for methods returning custom collections class ParString(val str: String) extends immutable.ParSeq[Char] with ParSeqLike[Char, ParString, WrappedString] { def apply(i: Int) = str.charAt(i) ...
  • 115. Custom combiners for methods returning custom collections class ParString(val str: String) extends immutable.ParSeq[Char] with ParSeqLike[Char, ParString, WrappedString] { def apply(i: Int) = str.charAt(i) ... protected[this] override def newCombiner : Combiner[Char, ParString]
  • 116. Custom combiners for methods returning custom collections class ParString(val str: String) extends immutable.ParSeq[Char] with ParSeqLike[Char, ParString, WrappedString] { def apply(i: Int) = str.charAt(i) ... protected[this] override def newCombiner = new ParStringCombiner
  • 117. Custom combiners for methods returning custom collections class ParStringCombiner extends Combiner[Char, ParString] {
  • 118. Custom combiners for methods returning custom collections class ParStringCombiner extends Combiner[Char, ParString] { var size = 0
  • 119. Custom combiners for methods returning custom collections class ParStringCombiner extends Combiner[Char, ParString] { var size = 0 size
  • 120. Custom combiners for methods returning custom collections class ParStringCombiner extends Combiner[Char, ParString] { var size = 0 val chunks = ArrayBuffer(new StringBuilder) size
  • 121. Custom combiners for methods returning custom collections class ParStringCombiner extends Combiner[Char, ParString] { var size = 0 val chunks = ArrayBuffer(new StringBuilder) size chunks
  • 122. Custom combiners for methods returning custom collections class ParStringCombiner extends Combiner[Char, ParString] { var size = 0 val chunks = ArrayBuffer(new StringBuilder) var lastc = chunks.last size chunks
  • 123. Custom combiners for methods returning custom collections class ParStringCombiner extends Combiner[Char, ParString] { var size = 0 val chunks = ArrayBuffer(new StringBuilder) var lastc = chunks.last size lastc chunks
  • 124. Custom combiners for methods returning custom collections class ParStringCombiner extends Combiner[Char, ParString] { var size = 0 val chunks = ArrayBuffer(new StringBuilder) var lastc = chunks.last def +=(elem: Char) = { lastc += elem size += 1 this }
  • 125. Custom combiners for methods returning custom collections class ParStringCombiner extends Combiner[Char, ParString] { var size = 0 val chunks = ArrayBuffer(new StringBuilder) var lastc = chunks.last def +=(elem: Char) = { lastc += elem size += 1 this } size lastc chunks +1
  • 126. Custom combiners for methods returning custom collections ... def combine[U <: Char, NewTo >: ParString] (other: Combiner[U, NewTo]) = other match { case psc: ParStringCombiner => size += psc.size chunks ++= psc.chunks lastc = chunks.last this }
  • 127. Custom combiners for methods returning custom collections ... def combine[U <: Char, NewTo >: ParString] (other: Combiner[U, NewTo]) lastc chunks lastc chunks
  • 128. Custom combiners for methods returning custom collections ... def result = { val rsb = new StringBuilder for (sb <- chunks) rsb.append(sb) new ParString(rsb.toString) } ...
  • 129. Custom combiners for methods returning custom collections ... def result = ... lastc chunks StringBuilder
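The chunk-based idea can be sketched stand-alone, without the `Combiner` trait (the class name `StringCombiner` is ours): `+=` appends to the last chunk, `combine` concatenates the chunk lists without copying any characters, and only `result` performs the single final copy:

```scala
import scala.collection.mutable.ArrayBuffer

// Stand-alone sketch of the slides' ParStringCombiner: a list of
// StringBuilder chunks that can be merged in O(1) per combine.
class StringCombiner {
  var size = 0
  val chunks = ArrayBuffer(new StringBuilder)
  var lastc = chunks.last

  def +=(elem: Char): this.type = { lastc += elem; size += 1; this }

  // merge another combiner into this one: just splice its chunk list in
  def combine(that: StringCombiner): this.type = {
    size += that.size
    chunks ++= that.chunks
    lastc = chunks.last
    this
  }

  // the only copying step: concatenate all chunks into one string
  def result: String = {
    val rsb = new StringBuilder
    for (sb <- chunks) rsb.append(sb)
    rsb.toString
  }
}
```

This is what makes `combine` cheap while `result` stays O(n) — which is why the slides note that the `result` step is the one worth parallelizing.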
  • 130. Custom combiners for methods expecting implicit builder factories // only for big boys ... with GenericParTemplate[T, ParColl] ... object ParColl extends ParFactory[ParColl] { implicit def canCombineFrom[T] = new GenericCanCombineFrom[T] ...
  • 131. Custom combiners performance measurement txt.filter(_ != ' ') new ParString(txt).filter(_ != ' ')
  • 132. txt.filter(_ != ' ') new ParString(txt).filter(_ != ' ') 106 ms Custom combiners performance measurement
  • 133. txt.filter(_ != ' ') new ParString(txt).filter(_ != ' ') 106 ms 1 core 125 ms Custom combiners performance measurement
  • 134. txt.filter(_ != ' ') new ParString(txt).filter(_ != ' ') 106 ms 1 core 125 ms 2 cores 81 ms Custom combiners performance measurement
  • 135. txt.filter(_ != ' ') new ParString(txt).filter(_ != ' ') 106 ms 1 core 125 ms 2 cores 81 ms 4 cores 56 ms Custom combiners performance measurement
  • 136. 1 core 125 ms 2 cores 81 ms 4 cores 56 ms t/ms proc 125 ms 1 2 4 81 ms 56 ms Custom combiners performance measurement
  • 137. 1 core 125 ms 2 cores 81 ms 4 cores 56 ms t/ms proc 125 ms 1 2 4 81 ms 56 ms def result (not parallelized) Custom combiners performance measurement
  • 138. Custom combiners tricky! •two-step evaluation –parallelize the result method in combiners •efficient merge operation –binomial heaps, ropes, etc. •concurrent data structures –non-blocking scalable insertion operation –we’re working on this
  • 139. Future work coming up •concurrent data structures •more efficient vectors •custom task pools •user defined scheduling •parallel bulk in-place modifications
  • 140. Thank you! Examples at: git://github.com/axel22/sd.git