1

I have List of users and I'm comparing each user to every other user by totaling their values. This works fine but when I try to parallelise I get multiple race conditions as the variable totalVar is accessed by multiple threads at same time.

Can I have some pointers on how to parallelise the addition of the values for each list ?

import scala.collection.JavaConversions._
import collection.JavaConverters._

object Question {

  case class User(id: String, value : BigInt) 

  var userList : java.util.List[User] = new java.util.ArrayList[User] 

  def main(args: Array[String]) {

      userList.add(new User("1" , 1))
      userList.add(new User("2" , 1))
      userList.add(new User("3" , 1))
      userList.add(new User("4" , 1))

      //Compare each user to every other user by totaling their
      //values
          var totalValue : BigInt = 0

      for(u <- userList.par){
        for(u1 <- userList.par){
           totalValue = totalValue + u1.value
           println("Adding "+u1.id+","+u1.value+ ","+totalValue)
        }
        println("Total is "+totalValue)
        totalValue = 0
      }

    }

}

Update :

I know this may seem like a pointless example but I'm facing a much more complicated problem and here I'm trying to ask the simplest question which will guide me in the direction of actual problem I am trying to resolve.

4
  • What I don't understand is why you do the same thing for each element of userList. This will just print the same stuff 4 times.
    – Carsten
    Commented Aug 22, 2013 at 13:56
  • There is a chance that this will work in parallel if you simply put .par on the outer-most for loop. You probably have a quite limited number of cores anyway, so parallelizing on a single list should be enough. If you have a problem like this which runs in n^2, you can probably parallelize it to run in n.
    – Felix
    Commented Aug 22, 2013 at 14:09
  • @user470184 by the way, are you okay to count user with itself? Because your current code compares every with every not every with everyone else
    – om-nom-nom
    Commented Aug 22, 2013 at 15:13
  • 1
    I recently learned this mistake myself, you should never mutuate anything outside the scope of a par loop. (VARS) If you do, that item has to be atomic (ie. synchronized). Think about any VAR as like a single phone line: if everyone is calling it at once, some will just get a busy signal and fail. But a synchronized object is kind of like a queue, where java will make the next caller wait, and so on. Or the more scala way is to "yield" the whole loop which seems to be a lot safer it can be done! See answer by @om-nom-nom Commented Aug 22, 2013 at 18:45

2 Answers 2

1

It is likely that you would not gain much by having two parallel collections (summing two integers is quite lightweight task and scheduling it over and over is not a good idea), it is easier to do with just one:

val total = 
  for(u <- userList.par) yield {
    var partialTotal = 0
    for(u1 <- userList){
      partialTotal += u1.value
      println("Adding "+u1.id+","+u1.value+ ","+totalValue)
    }
    partialTotal
  }.sum
2
  • if I was to removed the 'yield' keyword and not return 'partialTotal' would this affect parallelism ?
    – blue-sky
    Commented Aug 22, 2013 at 14:54
  • @user470184 it would affect correctness of you code: instead of List of partial counts of every user versus others you will get List of Unit values
    – om-nom-nom
    Commented Aug 22, 2013 at 14:56
0

Try using an AtomicInt which is synchronized:

  val totalValue : new AtomicInt()

  for(u <- userList.par){
    for(u1 <- userList.par){
       totalValue.incrementAndGet(u1.value)
    }
  }
1
  • Isn't using synchronized word is kinda wrong, when we talk about atomic objects?
    – om-nom-nom
    Commented Aug 22, 2013 at 14:57

Not the answer you're looking for? Browse other questions tagged or ask your own question.