
I am running a long-lived Haskell program that holds on to a lot of memory. Running with +RTS -N5 -s -A25M (size of my L3 cache) I see:

715,584,711,208 bytes allocated in the heap
390,936,909,408 bytes copied during GC
  4,731,021,848 bytes maximum residency (745 sample(s))
     76,081,048 bytes maximum slop
           7146 MB total memory in use (0 MB lost due to fragmentation)

                                  Tot time (elapsed)  Avg pause  Max pause
Gen  0     24103 colls, 24103 par   240.99s   104.44s     0.0043s    0.0603s
Gen  1       745 colls,   744 par   2820.18s   619.27s     0.8312s    1.3200s

Parallel GC work balance: 50.36% (serial 0%, perfect 100%)

TASKS: 18 (1 bound, 17 peak workers (17 total), using -N5)

SPARKS: 1295 (1274 converted, 0 overflowed, 0 dud, 0 GC'd, 21 fizzled)

INIT    time    0.00s  (  0.00s elapsed)
MUT     time  475.11s  (454.19s elapsed)
GC      time  3061.18s  (723.71s elapsed)
EXIT    time    0.27s  (  0.50s elapsed)
Total   time  3536.57s  (1178.41s elapsed)

Alloc rate    1,506,148,218 bytes per MUT second

Productivity  13.4% of total user, 40.3% of total elapsed

The GC time is 87% of the total run time! I am running this on a system with a massive amount of RAM, but when I set a high -H value the performance was worse.

It seems that both -H and -A control the size of gen 0, but what I would really like to do is increase the size of gen 1. What is the best way to do that?

  • This generally happens when lots of things survive the nursery generation unnecessarily, making them way more expensive to collect. The first thing I'd check for is a space leak preventing short-lived values from being collected immediately.
    – Carl
    Commented Jan 18, 2015 at 16:39
  • Related (an answer there may or may not help you, depending on what's going on in your case): stackoverflow.com/questions/27630833/…
    – dfeuer
    Commented Jan 18, 2015 at 23:01
  • Start by getting a heap profile, e.g. stackoverflow.com/a/3276557/83805 (see the profiling sketch after these comments).
    Commented Jan 19, 2015 at 17:38
  • Is this still relevant? Do you have a program that reproduces this behaviour?
    – Zeta
    Commented Jul 30, 2015 at 16:30
  • Can you publish a program that reproduces this issue?
    – sinelaw
    Commented Oct 12, 2015 at 8:16
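
For completeness, a minimal heap-profiling recipe along the lines the comments suggest; the program and module names here are placeholders, not from the original post:

    ghc -O2 -threaded -prof -fprof-auto -rtsopts Main.hs -o myprog
    ./myprog +RTS -N5 -hc -RTS    # writes a heap profile to myprog.hp
    hp2ps -c myprog.hp            # renders it as myprog.ps

The -hc profile breaks residency down by cost centre, which is usually the quickest way to spot a space leak.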

1 Answer


As Carl suggested, you should check your code for space leaks. I'll assume that your program really does require a lot of memory for a good reason.

The program spent 2820.18s doing major GC. You can lower that by reducing either memory usage (not the case here, by assumption) or the number of major collections. You have a lot of free RAM, so you can try the -F⟨factor⟩ option:

 -F⟨factor⟩

    [Default: 2] This option controls the amount of memory reserved for
    the older generations (and in the case of a two space collector the
    size of the allocation area) as a factor of the amount of live data.
    For example, if there was 2M of live data in the oldest generation
    when we last collected it, then by default we'll wait until it grows
    to 4M before collecting it again.

In your case the stats show ~4.4G of live data (the maximum residency above). By default a major GC is triggered when the heap grows to twice that, around 9G. With -F3 it would not be triggered until around 13G, which should roughly halve the number of major collections and save you on the order of 1000s of CPU time.
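
On the command line that would look something like this (the program name is a placeholder):

    ./myprog +RTS -N5 -A25m -F3 -s -RTS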

If most of the live data is static (i.e. it never changes, or changes only slowly), then you will be interested in a stable heap: the idea is to exclude long-lived data from major GC entirely. This can be achieved with e.g. compact normal forms, though they have not been merged into GHC yet.
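
For the record, compact regions later landed in GHC 8.2 as the ghc-compact package; here is a minimal sketch assuming that API (the Map is just illustrative data, not from the original post):

    import GHC.Compact (compact, getCompact)
    import qualified Data.Map.Strict as Map

    main :: IO ()
    main = do
      -- Build the long-lived, mostly-static data once.
      let table = Map.fromList [(k, k * 2) | k <- [1 .. 1000000 :: Int]]
      -- Move it into a compact region: the GC treats the region as a
      -- single object and never traverses the structures inside it.
      region <- compact table
      let table' = getCompact region
      print (Map.lookup 500000 table')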
