Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: weak: new package providing weak pointers #67552

Open
mknyszek opened this issue May 21, 2024 · 18 comments
Open

proposal: weak: new package providing weak pointers #67552

mknyszek opened this issue May 21, 2024 · 18 comments
Labels
Milestone

Comments

@mknyszek
Copy link
Contributor

mknyszek commented May 21, 2024

What is a weak pointer?

Weak pointers (or weak references, as they are referred to in other languages) allow developers to reference memory without preventing the garbage collector from reclaiming it. To prevent visible dangling references, weak pointers become nil when the referenced memory is collected. Weak pointers can be converted to regular ("strong") pointers which do prevent the garbage collector from reclaiming memory, and allow typical use and access to that memory.

Weak pointers are typically a bit harder to work with than regular pointers because they can become nil at any time. Virtually every weak-to-strong conversion must be guarded by a nil check. Often, weak pointers will become nil at unexpected times.

Nonetheless, weak pointers exist in many languages because they are useful. The main use-cases for weak pointers are related to efficient memory management and reclamation. For example, efficiently managing memory for canonicalization maps, or for memory whose lifetime is bound to the lifetime of another object (akin to JavaScript's WeakMap). Another good use case for weak pointers is for hinting to the GC that it's OK to drop some resources because they're cheap to reconstruct later, especially if those resources have a large memory footprint.

Proposal

I propose adding a new weak package to the standard library with the following API.

// Pointer is a weak pointer to a value of type T.
//
// Two Pointer values compare equal if the pointers
// that they were created from compare equal. This property is retained even
// after the object referenced by the pointer used to create a weak reference
// is reclaimed.
//
// If multiple weak pointers are made to different offsets within same object
// (for example, pointers to different fields of the same struct), those pointers
// will not compare equal.
// If a weak pointer is created from an object that becomes unreachable, but is
// then resurrected due to a finalizer, that weak pointer will not compare equal
// with weak pointers created after resurrection.
//
// Calling Make with a nil pointer returns a weak pointer whose Value method
// always returns nil. The zero value of a Pointer behaves as if it was created
// by passing nil to Make and compares equal with such pointers.
type Pointer[T any] struct { ... }

// Make creates a weak pointer from a pointer to a value of type T.
func Make[T any](ptr *T) Pointer[T] { ... }

// Value returns the original pointer used to create the weak pointer.
// It returns nil if the value pointed to by the original pointer was reclaimed by
// the garbage collector.
// If a weak pointer points to an object with a finalizer, then Value will
// return nil as soon as the object's finalizer is queued for execution.
func (p Pointer[T]) Value() *T { ... }

Design discussion

Representation

The weak pointer we propose would be represented via an indirection object under the hood. This means that each pointer that has any weak pointers associated with it also has an associated 8 byte allocation that contains a single pointer. This pointer is the actual weak reference. This representation is very CPU efficient, since the GC can atomically set a single pointer to make all weak references to an object nil.

This 8 byte allocation is rarely a problem in practice since weak pointers are most often useful for memory efficiency anyway. For example, canonicalization maps are deduplicating memory already, so the extra 8 bytes per canonical copy is relatively cheap. However, it may be worth calling out this detail in the documentation as the C# language does.

This representation has other benefits. One benefit is that the API's equality semantics fall out very naturally. Weak pointers created from the same pointer remain equal even after that pointer no longer exists. This makes it possible to build maps keyed by weak pointers. Another benefit is its simplicity: very little of the GC has to change to support weak pointers.

Avoiding object resurrection

A subtle but important property of the weak pointer implementation is that it must not enable object resurrection. This would create many complications, including the fact that weakly-referenced objects would take 2 GC cycles to reclaim; and they could not be involved in cycles, or they would never be reclaimed (much like finalizers).

A simple way to implement weak references that maintains this property is to guard weak-to-strong conversions with ensuring the object is swept. A swept object cannot be resurrected, because it is always definitely reachable or unreachable, nothing in between. This means the weak-to-strong conversion path may need to sweep the object, but this happens at most once per GC cycle, and the Go runtime's sweeper is quite fast, so the cost shouldn't be noticeable in practice.

Why this specific interaction with finalizers?

The interaction between weak pointers and finalizers that resurrect objects is subtle. There are two choices: the weak pointer can go nil before the finalizer runs, or it can track the object's resurrection, and only go nil once the object is well and truly dead. In C# jargon, this is the difference between "short" weak reference and a "long" weak reference.

At first glance, the latter seems clearer since the weak pointer really only goes nil once nothing can reference the object. However, this semantics has a big problem, and it's subtle.

Today, the decision to resurrect an object in a finalizer is internal to an API’s implementation. If we allowed weak-to-strong conversions after finalization, then clients of an API could resurrect an object without the cooperation of its implementation, at which point the object's internals may be in an inconsistent state. This creates the potential for a race that is not immediately apparent to either the API authors or the client. Worse still, these races will be very rare and unlikely to be caught by testing, even with the race detector enabled. If the weak pointer goes nil before the finalizer runs, then such races are impossible.

It is telling that other garbage collected languages either make "short" weak references the default and do not recommend their "long" form (C#), or only offer the "short" form to begin with (Java).

Risks

Breaking the GC abstraction

The unique.Handle proposal explicitly rejected weak pointers as an alternative on the grounds that they are difficult to use. Fundamentally, this is because they break the GC abstraction by revealing the timing with which memory is actually reclaimed, and this is mainly a problem because that timing is unpredictable. The GC abstraction of "infinite memory" (only allocation, no apparent free) is important in general for composability.

While it is true that weak pointers are more difficult to use than regular pointers, we believe that we have picked an API (building on the experiences of other languages) that minimizes surprises, maintains composability, and has a simple implementation. This was discovered during the implementation of unique.Handle and it was compelling enough that it led to this proposal.

Although weak pointers will be misused in some cases, providing documentation, examples, and advice in the GC guide will go a long way to mitigating that.

@gopherbot gopherbot added this to the Proposal milestone May 21, 2024
@thediveo
Copy link

glad this is being resurrected *scnr*

@cespare
Copy link
Contributor

cespare commented May 21, 2024

How does this interact with the unique package (#62483)? Isn't it possible for a user to implement unique (or at least solve the same problems that unique solves) using weak pointers? If we add weak, will we regret having both weak and unique?

@mknyszek
Copy link
Contributor Author

@cespare I don't think we'll regret having both. The main reason to have unique in std is that canonicalization maps benefit from being as global as possible, both for performance (bigger potential for sharing data) and composability (handles are always comparable, regardless of their origin). That alone makes it worth it, I think.

A secondary reason is that it's not actually possible to safely implement the concurrent map underlying unique outside of the standard library today. #54670 (maphash.Comparable) would change that. While I think maphash.Comparable is still important for the ecosystem, it's not going to be easy to make that as efficient as the hashing the std maps use. (See the issue for a breakdown of the problem by Austin.)

Weak pointers are still useful for lots of other cases where globalness is less useful and maybe even not what you want at all. For example, implementing weak-keyed maps, which are generally used for attaching data to memory you don't own the lifecycle of. There's also plenty more reasonable ad-hoc use-cases that don't neatly fall into something std would support out of the box.

@Merovius
Copy link
Contributor

What does Make((*T)(nil)) do? Useless and the answer is probably "return a Pointer that always returns nil", but just to make sure.

@mknyszek
Copy link
Contributor Author

mknyszek commented May 22, 2024

@Merovius Yep, that's right. Thanks! Updated the proposal.

@apparentlymart
Copy link

Although a weak.Pointer that is immediately nil is a little strange, I've written programs with similar features in other languages where it's been helpful to represent an "optional weak pointer" without adding another level of indirection. I probably wouldn't complain too much if it weren't possible, but it has enough marginal use that I don't think it's worth overcomplicating things to forbid it.

(For example, a Rust analog would be std::rc::Weak::new(), which returns a Weak where upgrade always returns None, which is directly analogous to a weak.Pointer where Value always returns nil.)

Is the zero value of weak.Pointer[T] equivalent to what you'd get from passing nil to weak.Make?

@mknyszek
Copy link
Contributor Author

mknyszek commented May 23, 2024

Is the zero value of weak.Pointer[T] equivalent to what you'd get from passing nil to weak.Make?

From my perspective, I believe that's what the semantics would have to imply. In the implementation, yes, I'd probably just have it return the zero value of the weak pointer.

EDIT: Oh, it occurs to me that passing nil for a specific type could have a different value than the zero value. (Some per-type sentinel value.) I don't think we should do that. I think the zero value of a weak.Pointer[T] should compare equal with the result of weak.Make[T](nil). Updated the proposal.

@LukeXeon
Copy link

type Pointer[T any] struct {

I see "weak pointer" under the "internal" package. Is there any progress on this proposal currently?

@mknyszek
Copy link
Contributor Author

@LukeXeon This proposal is awaiting review. If accepted, the internal/weak package already contains an implementation that mostly implements the semantics laid out here. It could just be moved out of internal with a few minor tweaks. Note that, if the proposal is accepted, this would land in Go 1.24, not the upcoming Go 1.23 (it's already too late for Go 1.23).

@eihigh
Copy link

eihigh commented Jun 10, 2024

Unlike server programs that allocate and release memory for each request and persist it in the DB, a weak pointer is essential for client programs that hold multiple containers or partially release items. I feel that Go lacks support for such programs.

(As an aside, #64777 is another example of a feature needed for client programs)

@rsc
Copy link
Contributor

rsc commented Jun 27, 2024

In our discussion yesterday, @bradfitz asked if weak pointers made it possible to maintain a map of expensive derived information about something without preventing the GC of that something. Here is a worked example of how weak pointer enables that:

var cache struct {
    mu sync.Mutex
    m map[weak.Pointer[Foo]]*derived
}

func init() {
    cache.m = make(map[weak.Pointer[Foo]]*derived)
}

func cached(f *Foo) *derived {
    cache.mu.Lock()
    defer cache.mu.Unlock()

    p := weak.Make(f)
    d := cache.m[p]
    if d != nil {
        return d
    }
    d = expensiveComputation(f)
    cache.m[p] = d
    runtime.AddCleanup(f, deleteP, p)
}

func deleteP(p weak.Pointer[Foo]) {
    cache.mu.Lock()
    defer cache.mu.Unlock()

    delete(cache.m, p)
}

I am assuming runtime.AddCleanup from #67535, which is guaranteed to work with any pointer f.
(SetFinalizer might break things if f already had a finalizer, or if f was inside another object.)

@thediveo
Copy link

@rsc can this please find its way into the official documentation or alternatively the Tour of Go? Such things really need to be easily accessible, and not buried only in an issue comment.

@rsc
Copy link
Contributor

rsc commented Jun 27, 2024

I think it would make sense for this to be an example in the weak package, if the proposal is accepted. Putting it in the Go tour seems TMI.

@thediveo
Copy link

Yeah, it could otherwise become a "Tor-Tour of Go" (sorry, couldn't resist)

@rsc
Copy link
Contributor

rsc commented Jul 1, 2024

Following up on my comment from last week, here is a potential weak cache abstraction that avoids holding the lock during expensiveComputation and also wraps up the idioms nicely so you don't have to retype them. I am not proposing to add it to the package (others can argue for that if they want) but it's here as an example, and we can include it as an example in the package too.

type Cache[K any, V any] struct {
    f func(*K) V
    m atomic.Map[weak.Pointer[K], func() V]
}

func NewCache[K comparable, V any](f func(*K)V) *Cache[K, V] {
    return &Cache[K, V]{f: f}
}

func (c *Cache[K, V]) Get(k *K) V {
    kw := weak.Make(k)
    vf, ok := c.m.Load(kw)
    if ok {
        return vf()
    }
    vf = sync.OnceValue(func() V { return c.f(k) })
    vf, loaded := c.m.LoadOrStore(kw)
    if !loaded {
        // Stored kw→vf to c.m; add the cleanup.
        runtime.AddCleanup(k, c.cleanup, kw)
    }
    return vf()
}

func (c *Cache[K, V]) cleanup(kw weak.Pointer[K]) {
    c.m.Delete(kw)
}

var cached = NewCache(expensiveComputation)
@DmitriyMV
Copy link
Contributor

@rsc

Since the cache doesn't do weak -> strong pointer, and with current GC this should also work, no?

type Cache[K any, V any] struct {
    f func(*K) V
    m atomic.Map[uintptr, func() V]
}

func NewCache[K comparable, V any](f func(*K)V) *Cache[K, V] {
    return &Cache[K, V]{f: f}
}

func (c *Cache[K, V]) Get(k *K) V {
    kw := uintptr(unsafe.Pointer((k))
    vf, ok := c.m.Load(kw)
    if ok {
        return vf()
    }
    vf = sync.OnceValue(func() V { return c.f(k) })
    vf, loaded := c.m.LoadOrStore(kw)
    if !loaded {
        // Stored kw→vf to c.m; add the cleanup.
        runtime.AddCleanup(k, c.cleanup, kw)
    }
    return vf()
}

func (c *Cache[K, V]) cleanup(kw uintptr) {
    c.m.Delete(kw)
}

var cached = NewCache(expensiveComputation)
@rsc
Copy link
Contributor

rsc commented Jul 3, 2024

@DmitriyMV There is a race in that code. It could theoretically happen that the GC identifies k as garbage, puts it back on a free list for future allocation, queues the cleanup, and then the program allocates a new k, gets the recycled address, and reuses the stale value because the cleanup has not gotten a chance to run yet. (It's also not guaranteed to work in an implementation that moves objects, but objects aren't moving today except on the stack, and k escapes to the heap so it won't be a stack object in the current implementation.)

@rsc
Copy link
Contributor

rsc commented Jul 25, 2024

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment