The code below runs a computation for every pair of users in a list; for illustration it just compares each pair of users and concatenates their attributes:
case class UserObj(id: String, nCoordinate: String)
val userList = List(UserObj("a1", "1234"), UserObj("a2", "1234"), UserObj("a3", "1234"))
val map1 = new java.util.concurrent.ConcurrentHashMap[String, Double]
userList.par.map(xUser => {
  userList.par.map(yUser => {
    if (!xUser.id.isEmpty() && !yUser.id.isEmpty()) {
      println("Total is " + xUser.id + yUser.id + "," + xUser.nCoordinate + yUser.nCoordinate)
      map1.put(xUser.id + "," + yUser.id, getJaccardDistance(xUser.nCoordinate, yUser.nCoordinate))
    }
  })
  println("")
}) //> Total is a1a1,12341234
//| Total is a3a1,12341234
//| Total is a2a1,12341234
//| Total is a3a2,12341234
//| Total is a1a2,12341234
//| Total is a3a3,12341234
//| Total is a2a2,12341234
//|
//| Total is a1a3,12341234
//| Total is a2a3,12341234
//|
//|
//| res0: scala.collection.parallel.immutable.ParSeq[Unit] = ParVector((), (), (
//| ))
def getJaccardDistance(str1: String, str2: String) = {
  val zipped = str1.zip(str2)
  // Doubled counts of the four pair types; the common factor of 2
  // cancels in the ratio below (s is computed but not used by it).
  val p = zipped.count(_ == ('1', '1')).toFloat * 2
  val q = zipped.count(_ == ('1', '0')).toFloat * 2
  val r = zipped.count(_ == ('0', '1')).toFloat * 2
  val s = zipped.count(_ == ('0', '0')).toFloat * 2
  (q + r) / (p + q + r)
}
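To make the counts concrete, here is a small worked example with made-up bit strings (not values from userList):

// "1100".zip("1010") gives ('1','1'), ('1','0'), ('0','1'), ('0','0'),
// so p = 2, q = 2, r = 2 and the result is (2 + 2) / (2 + 2 + 2).
getJaccardDistance("1100", "1010") // 0.6666667
getJaccardDistance("1100", "1100") // 0.0 for identical strings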
This was previously implemented as an imperative solution:
for (xUser <- userList) {
  for (yUser <- userList) {
    if (!xUser.id.isEmpty() && !yUser.id.isEmpty()) {
      println("Total is " + xUser.id + yUser.id + "," + xUser.nCoordinate + yUser.nCoordinate)
    }
  }
  println("")
}
But I want to make use of Scala's parallel collections, and I believe map is the recommended way to achieve this, since unlike the imperative code above it lets multiple threads run the same code. Note: the println("Total is " + xUser.id + yUser.id + "," + xUser.nCoordinate + yUser.nCoordinate) call executed above is just a simplified version of the algorithm actually being run.
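Since the ParSeq[Unit] that map builds is discarded (see res0 above), a foreach-based sketch of the same traversal would look like this, with behaviour otherwise identical:

// Sketch: same nested traversal, using foreach because only the
// side effects (println and map1.put) matter here.
userList.par.foreach(xUser => {
  userList.par.foreach(yUser => {
    if (!xUser.id.isEmpty() && !yUser.id.isEmpty()) {
      map1.put(xUser.id + "," + yUser.id, getJaccardDistance(xUser.nCoordinate, yUser.nCoordinate))
    }
  })
})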
The functional solution posted at the beginning of the question behaves as expected, but once the list contains more than 3000 elements it almost crawls to a halt. Why is this occurring? Is my implementation correct?
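One alternative I can see is to flatten the two loops into a single parallel collection of pairs, so that only one parallel region exists. A minimal sketch, assuming the nested par calls contribute to the slowdown:

// Sketch: build all (xUser, yUser) pairs first, then traverse them
// in one flat parallel collection instead of nesting two par regions.
val pairs = for {
  xUser <- userList
  yUser <- userList
} yield (xUser, yUser)

pairs.par.foreach { case (xUser, yUser) =>
  if (!xUser.id.isEmpty() && !yUser.id.isEmpty()) {
    map1.put(xUser.id + "," + yUser.id, getJaccardDistance(xUser.nCoordinate, yUser.nCoordinate))
  }
}

For 3000 users this materialises roughly nine million pairs up front, so it trades memory for a flat traversal.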