Why do we need for ParHashMap from Scala while there is ConcurrentHashMap from Java

Question

I considered two collections with a similar concept - ParHashMap from Scala and ConcurrentHashMap from Java. Both of them have the same time complexity and both of them are thread safe and lock-free, but they only are based on different concepts under the hood - trie and hash table accordingly. And this reasoning leads to question: why do we need for ParHashMap from Scala while there is ConcurrentHashMap from Java?

@talex You are wrong - en.wikipedia.org/wiki/Java_ConcurrentMap#Lock-free_atomicity — pacman, Commented Jan 9, 2017 at 23:08
@talex I think it used to be non-lock-free but that changed in Java 8. — Louis Wasserman, Commented Jan 9, 2017 at 23:11
@ Louis Wasserman ConcurrenHashMap always was lock-free, but in Java 7 it had an another structure with Node via leveraging a Lock stripping concurrent pattern, but in Java 8 it was changed - this approach was disabled and rb tree in case of collision would build. — pacman, Commented Jan 9, 2017 at 23:15
It provides lock-free queries, certain lock-free update operations and concurrent updates of different keys. But when performing atomic updates on the same key, synchronization is unavoidable. — Holger, Commented Jan 16, 2017 at 16:36

marios · Accepted Answer · 2017-11-09 03:23:43Z

5

ConcurrentHashMap is a thread safe Map<> implementation. If you have multiple threads accessing it at the same time they will be in sync.

ParHashMap is a parallel collection. If you execute operations here (like map(), filter(), aggregate()) Scala will parallelize it for you (similar to Spark but only within a single JVM).

To summarize, ConcurrentHashMap gives the primitive to synchronize threads for concurrency, ParHashMap takes care of both sync and execution.

Edit: Note that ParHashMap is not itself necessarily thread-safe. The idea is to call its methods from a single thread and let the parallelism be handled by the parallel data structure itself.

edited Nov 9, 2017 at 3:23

answered Jan 10, 2017 at 5:16

marios

8,9743 gold badges38 silver badges63 bronze badges

@ You said: "If you execute operations here (like map(), filter(), aggregate()) Scala will parallelize it for you" Doest it work similar to Spark parallelization mechanism?
– pacman
Commented Jan 10, 2017 at 9:44
1

Actually the inventor of Spark said that he drew inspiration from the parallel collection of Scala while inventing the RDD abstraction.
– marios
Commented Jan 10, 2017 at 17:30
2

I think your best bet for a thread safe mutable hashmap in scala is to use the implementation from here: stackoverflow.com/a/17542165/1553233. Using the ParHashMap for this is not the right tool for the job (even if you can force it to be).
– marios
Commented Jan 10, 2017 at 18:53
2

Starting with Java 8, ConcurrentHashMap also supports several parallel processing methods, see for the methods starting with forEach…, reduce… or search… (or generally all methods having a first parameter long parallelismThreshold).
– Holger
Commented Jan 16, 2017 at 16:45
1

I didn’t say it’s thread safe. Probably it’s not. I don’t think you would want to call this from multiple threads. It kind of beats the purpose.
– marios
Commented Nov 9, 2017 at 1:03

| Show 8 more comments

Collectives™ on Stack Overflow

Why do we need for ParHashMap from Scala while there is ConcurrentHashMap from Java

1 Answer 1

Not the answer you're looking for? Browse other questions tagged
java
scala
data-structures
trie
concurrenthashmap
or ask your own question.

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Not the answer you're looking for? Browse other questions tagged javascaladata-structurestrieconcurrenthashmap or ask your own question.

Linked

Related

Not the answer you're looking for? Browse other questions tagged
java
scala
data-structures
trie
concurrenthashmap
or ask your own question.