15

The following code looks rather harmless on first sight. A user uses the function bar() to interact with some library functionality. (This may have even worked for a long time since bar() returned a reference to a non-temporary value or similar.) Now however it is simply returning a new instance of B. B again has a function a() that returns a reference to an object of the iterateable type A. The user wants to query this object which leads to a segfault since the temporary B object returned by bar() is destroyed before the iteration begins.

I am indecisive who (library or user) is to blame for this. All library provided classes look clean to me and certainly aren't doing anything different (returning references to members, returning stack instances, ...) than so much other code out there does. The user doesn't seem to do anything wrong as well, he's just iterating over some object without doing anything concerning that objects lifetime.

(A related question might be: Should one establish the general rule that code shouldn't "range-based-for-iterate" over something that is retrieved by more than one chained calls in the loop header since any of these calls might return a rvalue?)

#include <algorithm>
#include <iostream>

// "Library code"
struct A
{
    A():
        v{0,1,2}
    {
        std::cout << "A()" << std::endl;
    }

    ~A()
    {
        std::cout << "~A()" << std::endl;
    }

    int * begin()
    {
        return &v[0];
    }

    int * end()
    {
        return &v[3];
    }

    int v[3];
};

struct B
{
    A m_a;

    A & a()
    {
        return m_a;
    }
};

B bar()
{
    return B();
}

// User code
int main()
{
    for( auto i : bar().a() )
    {
        std::cout << i << std::endl;
    }
}
5
  • 6
    When you figured out who to blame, what will be the next step? Yelling at him/her?
    – JensG
    Commented Nov 9, 2014 at 15:02
  • 8
    No, why would I? I am actually more interested to know where the thought process of developing this "program" failed to avoid this problem in the future.
    – hllnll
    Commented Nov 9, 2014 at 15:34
  • This has nothing to do with rvalues, or range-based for loops, but with the user not understanding the object lifetime properly.
    – James
    Commented Nov 9, 2014 at 16:45
  • Site-remark: This is CWG 900 which was closed as Not A Defect. Maybe the minutes contain some discussion.
    – dyp
    Commented Nov 9, 2014 at 21:06
  • 8
    Who is to blame for this? Bjarne Stroustrup and Dennis Ritchie, first and foremost. Commented Nov 9, 2014 at 21:57

2 Answers 2

14

I think the fundamental problem is a combination of language features (or lack thereof) of C++. Both the library code and the client code are reasonable (as evidenced by the fact that the problem is far from obvious). If the lifetime of the temporary B was suitable extended (to the end of the loop) there would be no problem.

Making temporaries life just long enough, and no longer, is extremely hard. Not even a rather ad-hoc "all temporaries involved in the creation of the range for a range-based for live until the end of the loop" would be without side effects. Consider the case of B::a() returning a range that's independent of the B object by value. Then the temporary B can be discarded immediately. Even if one could precisely identify the cases where a lifetime extension is necessary, as these cases are not obvious to programmers, the effect (destructors called much later) would be surprising and perhaps an equally subtle source of bugs.

It would be more desirable to just detect and forbid such nonsense, forcing the programmer to explicitly elevate bar() to a local variable. This is not possible in C++11, and probably never will be possible because it requires annotations. Rust does this, where the signature of .a() would be:

fn a<'x>(bar: &'x B) -> &'x A { bar.a }
// If we make it as explicit as possible, or
fn a(&self) -> &A { self.a }
// if we make it a method and rely on lifetime elision.

Here 'x is a lifetime variable or region, which is a symbolic name for the period of time a resource is available. Frankly, lifetimes are hard to explain -- or we haven't yet figured out the best explanation -- so I will restrict myself to the minimum necessary for this example and refer the inclined reader to the official documentation.

The borrow checker would notice that the result of bar().a() needs to live as long as the loop runs. Phrased as a constraint on the lifetime 'x, we write: 'loop <= 'x. It would also notice that the receiver of the method call, bar(), is a temporary. The two pointers are associated with the same lifetime, hence 'x <= 'temp is another constraint.

These two constraints are contradictory! We need 'loop <= 'x <= 'temp but 'temp <= 'loop, which captures the problem pretty precisely. Because of the conflicting requirements, the buggy code is rejected. Note that this is a compile-time check and Rust code usually results to the same machine code as equivalent C++ code, so you need not pay a run-time cost for it.

Nevertheless this is a big feature to add to a language, and only works if all code uses it. the design of APIs is also affected (some designs that would be too dangerous in C++ become practical, others can't be made to play nice with lifetimes). Alas, that means it's not practical to add to C++ (or any language really) retroactively. In summary, the fault is on the inertia successful languages have and the fact that Bjarne in 1983 didn't have the crystal ball and foresight to incorporate the lessons of the last 30 years of research and C++ experience ;-)

Of course, that's not at all helpful in avoiding the problem in the future (unless you switch to Rust and never use C++ again). One could avoid longer expressions with multiple chained method calls (which is pretty limiting, and doesn't even remotely fix all lifetime troubles). Or one could try adopting a more disciplined ownership policy without compiler assistance: Document clearly that bar returns by value and that the result of B::a() must not outlive the B on which a() is invoked. When changing a function to return by value instead of a longer-lived reference, be conscious that this is a change of contract. Still error prone, but may speed up the process of identifying the cause when it does happen.

1
  • "Making temporaries life just long enough, and no longer, is extremely hard." I'm not seeing what needs to be hard. The least-surprising semantics, it seems to me, would be to make any temporaries created as subexpressions of range_expression live as long as the loop, period. This is simple and conservative, and is essentially the same behavior as when an expression is passed as an arg to a function, and it seems to work well. There's no need to raise the ambition to "precisely identify the cases where a lifetime extension is necessary" which turns it into an impossible problem.
    – Don Hatch
    Commented Mar 16, 2021 at 7:16
14

Can we solve this issue using C++ features?

C++11 has added member function ref-qualifiers, which allows restricting the value category of the class instance (expression) the member function can be called on. For example:

struct foo {
    void bar() & {} // lvalue-ref-qualified
};

foo& lvalue ();
foo  prvalue();

lvalue ().bar(); // OK
prvalue().bar(); // error

When calling the begin member function, we know that most likely we'll also need to call the end member function (or something like size, to get the size of the range). This requires that we operate on an lvalue, since we need to address it twice. You can therefore argue that these member functions should be lvalue-ref-qualified.

However, this might not solve the underlying issue: aliasing. The begin and end member function alias the object, or the resources managed by the object. If we replace begin and end by a single function range, we should provide one that can be called on rvalues:

struct foo {
    vector<int> arr;

    auto range() & // C++14 return type deduction for brevity
    { return std::make_pair(arr.begin(), arr.end()); }
};

for(auto const& e : foo().range()) // error

This might be a valid use case, but the above definition of range disallows it. Since we cannot address the temporary after the member function call, it might be more reasonable to return a container, i.e. an owning range:

struct foo {
    vector<int> arr;

    auto range() &
    { return std::make_pair(arr.begin(), arr.end()); }

    auto range() &&
    { return std::move(arr); }
};

for(auto const& e : foo().range()) // OK

Applying this to the OP's case, and slight code review

struct B {
    A m_a;
    A & a() { return m_a; }
};

This member function changes the value category of the expression: B() is a prvalue, but B().a() is an lvalue. On the other hand, B().m_a is an rvalue. So let's start by making this consistent. There are two ways to do this:

struct B {
    A m_a;
    A &  a() &  { return m_a; }

    A && a() && { return std::move(m_a); }
    // or
    A    a() && { return std::move(m_a); }
};

The second version, as said above, will fix the issue in the OP.

Additionally, we can restrict B's member functions:

struct A {
    // [...]

    int * begin() & { return &v[0]; }
    int * end  () & { return &v[3]; }

    int v[3];
};

This won't have any impact on the OP's code, since the result of the expression after the : in the range-based for loop is bound to a reference variable. And this variable (as an expression used to access its begin and end member functions) is an lvalue.

Of course, the question is whether or not the default rule should be "aliasing member functions on rvalues should return an object which owns all its resources, unless there's a good reason not to". The alias it returns can legally be used, but is dangerous in the way you're experiencing it: it cannot be used to extend the lifetime of its "parent" temporary:

// using the OP's definition of `struct B`,
// or version 1, `A && a() &&;`

A&&      a = B().a(); // bug: binds directly, dangling reference
A const& a = B().a(); // bug: same as above
A        a = B().a(); // OK

A&&      a = B().m_a; // OK: extends the lifetime of the temporary

In C++2a, I think you're supposed to work around this (or a similar) issue as follows:

for( B b = bar(); auto i : b.a() )

instead of the OP's

for( auto i : bar().a() )

The workaround manually specifies that the lifetime of b is the entire block of the for-loop.

Proposal which introduced this init-statement

Live Demo

1

Not the answer you're looking for? Browse other questions tagged or ask your own question.