
I have an AWS instance. I would like to run a bunch of tasks, some of them memory- and CPU-intensive. Ideally, I would like to compute timing information for each task. If I run the tasks serially, the timing information is accurate, but the whole run is slow. If I run them in parallel, the whole run is faster, but the individual tasks are slower, as reported by both wall time and thread CPU time.

This slowdown increases as the number of threads increases, up to the number of CPUs.

Cursory examination with ghc-events-analyze and +RTS -s suggests that the source of the slowdown is (unsurprisingly) GC pauses. Playing with RTS options reveals that +RTS -qg -qb -qa -A256m (disabling parallel GC, disabling load balancing in the parallel GC, pinning OS threads to cores, and enlarging the allocation area) improves things, but does not completely eliminate the slowdown.

I am running threads using forkIO, but the threads are independent and pure apart from printing progress information. I'm using parallel-io to manage the number of running threads, but when I briefly tried a more conventional approach of having a fixed pool of threads and a task queue, I still had this problem.
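For reference, a minimal sketch of that setup, assuming parallel-io's Control.Concurrent.ParallelIO.Local interface (the pool size and task list are illustrative):

import Control.Concurrent.ParallelIO.Local (parallel, withPool)

-- Run the given tasks with at most n of them in flight at a time.
runAll :: Int -> [IO a] -> IO [a]
runAll n tasks = withPool n $ \pool -> parallel pool tasks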

Any suggestions for how to debug?

EDIT:

@jberryman asked for an example. Each of the tasks looks like the code below:

{-# LANGUAGE BangPatterns #-}
import Control.DeepSeq   (force)
import Control.Exception (evaluate)
import System.CPUTime    (getCPUTime)

computation params = do
  !x <- evaluate (force params)    -- fully evaluate the input before timing starts
  putStrLn $ "Starting computation on " ++ show params
  t1 <- getCPUTime
  !y <- evaluate . force =<< do
    ...some work with x ...
  t2 <- getCPUTime
  putStrLn $ "Finished computation on " ++ show params
  return (t2 - t1, y)
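For the timing itself, a hedged sketch of a wrapper that records both wall-clock and CPU time around a single task (the name timed is illustrative; note that getCPUTime measures CPU time for the whole process rather than the calling thread, which matters once several tasks run at once):

import Control.DeepSeq   (NFData, force)
import Control.Exception (evaluate)
import Data.Time.Clock   (diffUTCTime, getCurrentTime)
import System.CPUTime    (getCPUTime)

-- Run an action to normal form and report (wall seconds, CPU picoseconds, result).
timed :: NFData a => IO a -> IO (Double, Integer, a)
timed act = do
  w1 <- getCurrentTime
  c1 <- getCPUTime
  r  <- evaluate . force =<< act
  c2 <- getCPUTime
  w2 <- getCurrentTime
  return (realToFrac (diffUTCTime w2 w1), c2 - c1, r)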
  • Can you post some code that exhibits the issue? I'm having trouble even understanding what you mean in the first paragraph re. "compute timing information"
    – jberryman
    Commented Nov 2, 2016 at 18:27
  • And you compiled with -threaded and are running with -N? An actual executable program that exhibits the issue is what I was hoping for.
    – jberryman
    Commented Nov 2, 2016 at 19:49
  • Yes. In fact, the -N parameter is the only thing I'm changing. I can't provide the actual code. I'll see if I can build an MWE, but I'm not hopeful.
    – Alex R
    Commented Nov 2, 2016 at 20:08

1 Answer


Since the tasks are all independent, and you're on an AWS instance (presumably Linux), you'll likely get better results from forkProcess. That way each process has its own heap and GC, which is freed when the process exits, and the parent only needs to hold on to the child process IDs and wait for the children to exit.
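A minimal sketch of that approach, assuming the unix package's System.Posix.Process API (runTask and the parameter list are placeholders; results would need to come back via files, pipes, or similar, since the children don't share the parent's heap):

import Control.Monad        (forM, forM_)
import System.Posix.Process (forkProcess, getProcessStatus)

-- Placeholder for one independent task; a real one would write its
-- results somewhere (e.g. a file) before the child process exits.
runTask :: Int -> IO ()
runTask n = putStrLn ("running task " ++ show n)

main :: IO ()
main = do
  let params = [1 .. 8] :: [Int]
  -- One child process per task; each child gets its own RTS and GC.
  pids <- forM params $ \p -> forkProcess (runTask p)
  -- Block until every child has exited.
  forM_ pids $ \pid -> getProcessStatus True False pid

Note that this doesn't bound the number of simultaneous children, so you'd still want to chunk the parameter list if there are more tasks than cores. Also, forkProcess only carries the forked IO action into the child (other Haskell threads aren't copied), so it's safest to fork before starting anything else.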
