
Mark-and-Sweep garbage collection needs to work over the graph of all objects in the process, and would stall threads to make the operation safe with regard to threads. I can think of no way to work around this and make it efficient.
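
To make the premise concrete, here is a toy sketch (not any real VM's code) of what I mean: the mark phase has to chase every reference reachable from the roots, which is why a naive implementation pauses the mutator threads while it runs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy mark-and-sweep sketch (not any real VM's code).
public class ToyMarkSweep {
    static final class Obj {
        final List<Obj> refs = new ArrayList<>();
        boolean marked;
    }

    // Mark: transitively visit everything reachable from the roots.
    static void mark(Iterable<Obj> roots) {
        Deque<Obj> worklist = new ArrayDeque<>();
        roots.forEach(worklist::push);
        while (!worklist.isEmpty()) {
            Obj o = worklist.pop();
            if (o.marked) continue;
            o.marked = true;          // reachable: keep it
            worklist.addAll(o.refs);  // follow every outgoing reference
        }
    }

    // Sweep: everything left unmarked is garbage.
    static Set<Obj> sweep(Set<Obj> heap) {
        Set<Obj> survivors = new HashSet<>();
        for (Obj o : heap) {
            if (o.marked) { o.marked = false; survivors.add(o); }
        }
        return survivors;
    }

    public static void main(String[] args) {
        Obj root = new Obj(), child = new Obj();
        Obj cycleA = new Obj(), cycleB = new Obj();
        root.refs.add(child);
        cycleA.refs.add(cycleB);
        cycleB.refs.add(cycleA);      // unreachable cycle: pure garbage

        Set<Obj> heap = new HashSet<>(List.of(root, child, cycleA, cycleB));
        mark(List.of(root));
        System.out.println("live objects: " + sweep(heap).size());  // 2
    }
}
```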

But how are some system/application-programming languages solving it? Specifically, how do the reference implementations (or, where details aren't public, the main open-source implementations) of Microsoft's C#, Sun/Oracle's Java, and Apple's Swift approach the problem?

  • There are various garbage collection algorithms (such as the train algorithm) that are more suited to minimizing stalls. – Chris Dodd (Feb 7 at 2:53)
  • This is a very broad question without a single answer. Probably the most important optimisation is generational garbage collection. Beyond that, the overhead of tracing may be less than you think, because it is amortised and may enable more efficient (time-efficient and cache-efficient) allocation strategies than piecemeal allocation. If you are interested in GC implementation, I recommend reading the GC handbook. – Moonchild (Feb 7 at 9:24)
  • "I want to know how the specific 3 languages I've enumerated approach the problem" – They don't. They leave it up to the implementation. None of the three specific languages you mentioned require garbage collection, and none of them even mention it in their specifications. It is perfectly legal to implement either Java or C# without GC. (Feb 7 at 13:00)
  • @JörgWMittag Reminds me of the original Lisp Machine garbage collector: reboot every day or so. The OS was probably unstable enough that it took care of that for you automatically. – Barmar (Feb 7 at 21:05)
  • @JörgWMittag I hate to be contradictory, but I think you'll find that every version of the C# specification mentions garbage collection literally on page one of the spec. The spec does not require any specific garbage collection regime in a conforming implementation. But C# was specifically designed for runtimes that have GCs. (Feb 8 at 1:05)

4 Answers


Technically speaking, garbage collection is a feature of the runtime environment, not the language. As for specifics:

  • The Microsoft Common Language Runtime (CLR) and the Java Virtual Machine both divide their heaps into "generations", based on the observation that most objects are small and short-lived, and can be allocated and collected independently of long-lived objects. This removes the need "to work over the graph of all objects in the process", since most of the time only the youngest generation is collected (see the sketch after this list).
  • Swift is different, basing its memory management on automatic reference counting (ARC) rather than tracing.
  • CPython uses a hybrid approach, relying mainly on reference counting, with an additional cycle-detecting collector to reclaim the cyclic garbage that reference counting cannot.
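
As a rough, hedged illustration of that generational behaviour on the JVM (the class name GenerationalDemo is made up, and the GC bean names printed depend on which collector is in use, e.g. "G1 Young Generation" / "G1 Old Generation" under G1):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Illustrative only: lots of short-lived allocations are absorbed by cheap
// young-generation collections; the old generation is rarely touched.
public class GenerationalDemo {
    public static void main(String[] args) {
        byte[][] longLived = new byte[64][];          // a little long-lived data
        for (int i = 0; i < longLived.length; i++) {
            longLived[i] = new byte[1 << 20];         // survives the whole run
        }

        long sink = 0;
        for (int i = 0; i < 5_000_000; i++) {
            byte[] temp = new byte[256];              // dies almost immediately
            sink += temp.length;                      // keep the allocation observable
        }

        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println("sink=" + sink + ", longLived[0].length=" + longLived[0].length);
    }
}
```

Running it typically shows many young-generation collections and few or no old-generation ones.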

"Mark-and-Sweep garbage collection ... would stall threads to make the operation safe with regard to threads."

This is a common problem. Some runtimes let you choose different GC algorithms, or fine-tune parameters like the heap size or number of GC threads, but ultimately, programmers of GC-ed languages have just learned to live with the occasional GC pauses.
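
For example, HotSpot exposes this kind of tuning through command-line flags; the flags below are real HotSpot options (availability varies by JDK version), while the tiny program (HeapInfo is a made-up name) just reports the resulting heap limits:

```java
// Example HotSpot tuning knobs:
//   -Xms4g -Xmx4g                          fix the heap size
//   -XX:+UseG1GC -XX:MaxGCPauseMillis=50   pick G1 and set a pause-time goal
//   -XX:+UseZGC                            pick a concurrent low-pause collector
//   -XX:ParallelGCThreads=4                number of GC worker threads
public class HeapInfo {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.printf("max heap: %d MB, used: %d MB%n",
                rt.maxMemory() >> 20,
                (rt.totalMemory() - rt.freeMemory()) >> 20);
    }
}
```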

One popular optimization (if you can call it that) is the null garbage collector. That is, if your program never actually allocates more memory than your computer has, then it's legal to simply never collect the garbage, and just leave it to the OS to free the process's memory when it exits.
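
The JVM actually ships such a collector: Epsilon (JEP 318) allocates but never reclaims. A minimal demo (the class name is made up; if the program outgrows the heap, the JVM simply dies with an OutOfMemoryError):

```java
// Run with: java -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -Xmx256m NoGcDemo
public class NoGcDemo {
    public static void main(String[] args) {
        long allocated = 0;
        while (allocated < 128L << 20) {   // stay well under the 256 MB heap
            byte[] chunk = new byte[1 << 20];
            allocated += chunk.length;     // nothing is ever collected
        }
        System.out.println("allocated ~" + (allocated >> 20) + " MB with zero GC work");
    }
}
```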

  • Thanks for mentioning CPython; that's good thinking outside the box. – DannyNiu (Feb 8 at 3:23)
  • The null collector is particularly useful in conjunction with process-based concurrency. A trick we used at FB in previous instances of the Hack compiler was to farm program analysis of some subset of files out to as many processes as the physical hardware had processors. Since each process (we hope) did only a relatively small amount of work, we simply turned the GC off for those processes and let the OS deal with it. (Feb 8 at 4:22)
  • (Not all analysis tasks were amenable to this partitioning scheme and some compiler processes did start running out of memory as the codebase grew. Preventing that looming disaster was a whole other story. :) ) (Feb 8 at 4:24)
  • One amusing use of the null garbage collector was in a guided missile, which was simply designed to explode before it ran out of memory. – dan04 (Feb 8 at 16:50)

"Mark-and-Sweep garbage collection needs to work over the graph of all objects in the process."

It does not! Generational GC works over the graph of all objects in a generation. Objects that survive one generation are promoted to the next, which is GC'd less often. You therefore get fast reclamation of short-lived temporaries, and infrequent scans of long-lived objects.
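
A toy model of that promotion idea (purely illustrative: real collectors copy survivors between memory regions and use card tables or remembered sets to find old-to-young pointers, rather than shuffling list entries):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy two-generation heap: frequent cheap minor collections, rare full ones.
final class TwoGenHeap<T> {
    private final List<T> young = new ArrayList<>();  // scanned often
    private final List<T> old = new ArrayList<>();    // scanned rarely

    void allocate(T obj) { young.add(obj); }

    // Minor collection: scan only the young generation; survivors get promoted.
    void minorCollect(Predicate<T> isLive) {
        for (T obj : young) {
            if (isLive.test(obj)) old.add(obj);       // promotion
        }
        young.clear();                                // the rest is reclaimed en masse
    }

    // Major (full) collection: the rare case that scans everything.
    void majorCollect(Predicate<T> isLive) {
        old.removeIf(obj -> !isLive.test(obj));
        minorCollect(isLive);
    }

    int liveCount() { return young.size() + old.size(); }
}
```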

"... and would stall threads to make the operation safe with regard to threads."

There are clever things you can do to avoid stop-the-world collections, but generally we just live with the tax.

"I can think of no way to work around this and make it efficient."

"Efficiency" is work done divided by cost, so before you spend a lot of time thinking about ways to improve efficiency, start by defining what your metrics are for "work" and "cost".

Mark-and-sweep GCs are efficient enough for the vast majority of business processes; in cases where collection stalls are frequent enough to exceed user-focused performance budgets, my usual advice is to start by reducing collection pressure via pooling or other strategies. We did a lot of that when building the Roslyn C# compiler.
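
A minimal sketch of the pooling idea (this is not Roslyn's actual ObjectPool, just the general shape: reuse an expensive temporary instead of allocating a fresh one per call):

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Supplier;

// Hand-rolled pool: rent an instance, reset it, return it.
final class SimplePool<T> {
    private final ConcurrentLinkedQueue<T> free = new ConcurrentLinkedQueue<>();
    private final Supplier<T> factory;

    SimplePool(Supplier<T> factory) { this.factory = factory; }

    T rent() {
        T item = free.poll();
        return item != null ? item : factory.get();
    }

    void giveBack(T item) { free.offer(item); }  // caller resets state first
}

class PoolDemo {
    static final SimplePool<StringBuilder> BUILDERS = new SimplePool<>(StringBuilder::new);

    static String render(String name) {
        StringBuilder sb = BUILDERS.rent();
        try {
            return sb.append("hello, ").append(name).toString();
        } finally {
            sb.setLength(0);          // reset before returning to the pool
            BUILDERS.giveBack(sb);
        }
    }
}
```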

"But how are some system/application-programming languages solving it?"

I once asked a gardener at Oxford how they got their lawn so nice. "Rake the dew off every morning, mow it once a week, run a lawn roller over it once a month, for 400 years". You too can have a great lawn; start today.

The .NET GC is one of the most heavily tuned pieces of software on the planet. Patrick Dussud did a great job on the initial performance tuning, and there have been literally two decades of a team of experts continuing to tweak it.

You solve this performance problem the same way you solve all foundational-subsystem performance problems: (1) think about the performance of foundational subsystems up front, during design; (2) hire industry-leading experts; (3) measure, measure, measure, measure, measure...; (4) make careful changes designed to achieve particular goals; (5) go back to step (3). Do that for 20 years and you too will have a great GC.

  • Reminds me of a meme where one guy asks another how he wrote such good code, and he replies, "it's not my code." Standing on the shoulders of giants. – Seggan (Feb 8 at 16:13)
  • @Seggan: That reminds me of the Peter Sellers movie where he asks the hotel manager "does your dog bite?" and the manager says "no", and then the dog bites him and he says "I thought you said your dog does not bite!" "That, sir, is not my dog." (Feb 9 at 1:33)

The basic answer is that solving this is hard and a balancing act between different priorities. The other part of the answer is that it's not just garbage collection that suffers in multithreaded environments; manual memory management gets a lot harder too. Some of the strategies to consider are:

  1. Keep as many values as possible off the heap (immutable data can generally be moved to the stack or registers by a sufficiently smart compiler).
  2. Use a multithreaded GC so the pauses are shorter.
  3. Use a concurrent GC, where most of the GC work is done outside the lock.
  4. Use atomic reference counting (a minimal sketch follows this list).
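
A minimal sketch of strategy 4, atomic reference counting, which is roughly the mechanism behind Swift's ARC or C++'s std::shared_ptr (illustrative only; note that plain reference counting still cannot reclaim cycles):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Thread-safe reference counting: the last release runs the destructor.
final class RefCounted<T> {
    private final T value;
    private final Runnable destructor;
    private final AtomicInteger count = new AtomicInteger(1);

    RefCounted(T value, Runnable destructor) {
        this.value = value;
        this.destructor = destructor;
    }

    T get() { return value; }

    RefCounted<T> retain() {            // a new owner: atomic increment
        count.incrementAndGet();
        return this;
    }

    void release() {                    // an owner is done: atomic decrement
        if (count.decrementAndGet() == 0) {
            destructor.run();           // deterministic cleanup, no GC pause
        }
    }
}
```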

You can have a fully concurrent GC that does not stop the mutator threads at all. Look at SGCL for C++.

It uses a variation of the tri-color mark-and-sweep algorithm. The GC engine works in a separate thread. Separate stacks are created for pointers, so the mutator threads do not have to be stopped while the GC scans them. Simple atomic write barriers allow the mutators to modify pointers while the GC marks live objects.
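
As a rough sketch of the general idea (this is not SGCL's code, and the names are made up): a Dijkstra-style insertion write barrier shades the newly stored target grey, so a collector thread can drain the grey worklist while mutator threads keep running:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Toy concurrent tri-colour marking: grey = found but not scanned, black = scanned.
final class ConcurrentMarkSketch {
    static final class Node {
        volatile Node ref;                                    // one mutable pointer field
    }

    static final ConcurrentLinkedQueue<Node> grey = new ConcurrentLinkedQueue<>();
    static final Set<Node> black = ConcurrentHashMap.newKeySet();

    // Mutators store pointers through this barrier so the marker never
    // loses an object that gets hidden behind an already-scanned one.
    static void writeBarrier(Node holder, Node newTarget) {
        if (newTarget != null && !black.contains(newTarget)) {
            grey.add(newTarget);                              // shade the target grey
        }
        holder.ref = newTarget;
    }

    // The collector thread drains the grey queue concurrently with the mutators.
    static void markerStep() {
        Node n;
        while ((n = grey.poll()) != null) {
            if (!black.add(n)) continue;                      // already scanned
            Node child = n.ref;
            if (child != null && !black.contains(child)) {
                grey.add(child);
            }
        }
    }
}
```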

  • Please disclose your affiliation to the linked library, as per the help center. (Feb 16 at 9:52)
  • I do not hide my affiliation with this library. I also don't understand what significance it has for the topic; it is just an example of an implementation of a non-blocking GC algorithm. – Pebal (Feb 16 at 10:44)
  • I don't mind a self-promoting answer, as long as it solves, or demonstrates an idea for solving, said problems. My questions for you here are: do you intend to, and are you capable of, maintaining your library long-term? Is your library in alpha, beta, golden-master, or production status? – DannyNiu (Feb 16 at 11:04)
  • This library was created to test whether it is possible to implement an efficient, fully non-blocking GC. Its status is "it seems to work"; I treat it as a testing ground. – Pebal (Feb 16 at 11:16)
  • Your code is a 2.6k-line monologue (well beyond my self-claimed ~300-line optimum); can you provide a sketch of how you view and approach the problem? – DannyNiu (Feb 16 at 11:58)
