8

Is the following undefined behaviour?

 union {
   int foo;
   float bar;
 } baz;

 baz.foo = 3.14 * baz.bar;

I remember that writing and reading from the same underlying memory between two sequence points is UB, but I am not certain.

5
  • Evaluation is unordered, but not the side effects, which are ordered in C++11.
    – Kerrek SB
    Commented Oct 22, 2015 at 21:34
  • 4
    Which language do you want an answer for? Commented Oct 22, 2015 at 21:38
  • @AlanStokes: C and C++, as tagged :D Commented Oct 22, 2015 at 21:40
  • 5
    Please be clear whether you want a C or C++ answer. Not both, for otherwise this question is too broad.
    – Walter
    Commented Oct 22, 2015 at 22:16
  • @Walter Why does SO allow both tags then?
    – curiousguy
    Commented Oct 28, 2015 at 12:48

4 Answers

6

I remember that writing and reading from the same underlying memory between two sequence points is UB, but I am not certain.

Reading and writing the same memory location in the same expression does not invoke undefined behavior unless that location is modified more than once between two sequence points, or the side effect is unsequenced relative to a value computation that uses the value at the same location.

C11: 6.5 Expressions:

If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. [...]

The expression

 baz.foo = 3.14 * baz.bar;  

has well-defined behaviour provided baz.bar has been initialized beforehand. The reason is that the side effect on baz.foo is sequenced after the value computations of both operands, baz.foo and baz.bar.

C11: 6.5.16/3 Assignment operators:

[...] The side effect of updating the stored value of the left operand is sequenced after the value computations of the left and right operands. The evaluations of the operands are unsequenced.
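
As a rough illustration of the sequencing rule quoted above, here is a minimal sketch in C. The type name int_or_float and the function scale_via_union are hypothetical and not part of the question; the sketch follows this answer's reading, under which the read of bar is sequenced before the store to foo.

```c
/* Hypothetical union type mirroring the one in the question. */
union int_or_float {
    int   foo;
    float bar;
};

/* Hypothetical helper: initializes bar, then performs the assignment
   from the question and returns the int that was stored. */
int scale_via_union(float initial)
{
    union int_or_float baz;

    baz.bar = initial;         /* bar gets a value before it is read */
    baz.foo = 3.14 * baz.bar;  /* value computation of the right operand
                                  is sequenced before the store to foo */
    return baz.foo;
}
```

Under this reading, scale_via_union(2.0f) returns 6, because 3.14 * 2.0 is 6.28 and the conversion to int truncates toward zero.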

20
  • 1
    An unsequenced read and write of the same memory location is UB just like two unsequenced writes.
    – Brian Bi
    Commented Oct 22, 2015 at 21:37
  • 3
    @melpomene, "unsequenced" is a defined term in the C2011 standard. A complete definition would probably be inappropriately large for this venue, but I encourage you to read what the standard itself says about it, and about the "sequenced before" relation. Commented Oct 22, 2015 at 21:50
  • 3
    @melpomene; Read this to know everything about sequenced before and unsequenced. Download n1570 pdf from here.
    – haccks
    Commented Oct 22, 2015 at 21:56
  • 2
    @haccks, you've omitted mention of the key provision of the standard relevant to this question: "The side effect of updating the stored value of the left operand [of an assignment operator] is sequenced after the value computations of the left and right operands" (from C11 6.5.16/3). Absent that, or some other provision having the same effect, the provision you quoted would hold that the behavior is undefined. Commented Oct 22, 2015 at 22:03
  • 1
    I think historically there was a requirement that an object may only be read and written in the same expression if both were accessed the same way. Certainly there are many machines where such a rule would enable useful optimizations (e.g. on an 8x51 clone with two data pointers, given "uint32_t *foo,*bar;" copying four bytes from "foo" to "bar" would be most efficiently implemented by copying the first byte of foo to the first byte of bar, then the second byte, third, and fourth, but that could malfunction if they overlap.)
    – supercat
    Commented Oct 30, 2015 at 20:48
4

Disclaimer: This answer addresses C++.

You're accessing an object whose lifetime hasn't begun yet - baz.bar - which induces UB by [basic.life]/(6.1).

Assuming bar has been brought to life (e.g. by initializing it), your code is fine; before the assignment, foo need not be alive as no operation is performed that depends on its value, and during it, the active member is changed by reusing the memory and effectively initializing it. The current rules aren't clear about the latter; see CWG #1116. However, the status quo is that such assignments are indeed setting the target member as active (=alive).

Note that the assignment is sequenced (i.e. guaranteed to happen) after the value computation of the operands - see [expr.ass]/1.

12
  • @Barry It means the latter. See CWG 556.
    – Columbo
    Commented Oct 22, 2015 at 21:57
  • I don't currently have a copy of the C++ standard to check the details, but I have some doubts about your explanation.
    – haccks
    Commented Oct 22, 2015 at 21:58
  • 1
    @Columbo But then that's weird right? u.a = u.b is undefined, but u.a = B(u.b) is fine?
    – Barry
    Commented Oct 22, 2015 at 21:59
    @Barry It is weird, but it's the intention of the wording AFAICS ("instead of being a general statement about aliasing, it's describing the situation in which the source of the value being assigned is storage that overlaps the storage of the target object"). The target object is a temporary of type float, but that temporary's storage certainly does not overlap baz.foo's. Then again, perhaps the committee was not precise enough in wording their note, and they actually did mean that u.a=f(u.b) is not defined. Either way, lifetime rules are a mess.
    – Columbo
    Commented Oct 22, 2015 at 22:06
  • 1
    The behavior is definitely defined in C, supposing baz.bar has been initialized and baz.foo has not subsequently been written to. Given the C / C++ reconciliation efforts in the 2011 versions of the standards, I would be very surprised to find that the same code has undefined behavior in C++. Commented Oct 22, 2015 at 22:09
2

Answering for C, not C++

I thought this was Defined Behavior, but then I read the following paragraph from ISO C2x (which I guess is also present in older C standards, but didn't check):

6.5.16.1/3 (Assignment operators::Simple Assignment::Semantics):

If the value being stored in an object is read from another object that overlaps in any way the storage of the first object, then the overlap shall be exact and the two objects shall have qualified or unqualified versions of a compatible type; otherwise, the behavior is undefined.

So, let's consider the following:

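#include <stdint.h>  /* uint32_t, int32_t */

/* Same type apart from a qualifier: compatible. */
union {
    int        a;
    const int  b;
} db;

/* int and float: not compatible. */
union {
    int    a;
    float  b;
} ub1;

/* Unsigned and signed 32-bit integers: not compatible. */
union {
    uint32_t  a;
    int32_t   b;
} ub2;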

Then, it is Defined Behavior to do:

db.a = db.b + 1;

But it is Undefined Behavior to do:

ub1.a = ub1.b + 1;

or

ub2.a = ub2.b + 1;

The definition of compatible types is in 6.2.7/1 (Compatible type and composite type). See also: __builtin_types_compatible_p().
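
If the overlap clause quoted above is a concern, one conservative workaround is to copy the value into a separate local object first, so that the value being stored is no longer read from an object overlapping the destination. This is only a sketch: the type name int_float and the function bump_via_temporary are hypothetical names introduced for illustration.

```c
/* Hypothetical union type with the same members as ub1 above. */
union int_float {
    int    a;
    float  b;
};

/* Hypothetical helper: computes "b + 1" and stores it in a,
   going through a local float that does not overlap the union. */
int bump_via_temporary(float initial)
{
    union int_float u;
    float tmp;

    u.b = initial;
    tmp = u.b;             /* copy out into a non-overlapping object */
    u.a = (int)(tmp + 1);  /* the stored value is read from tmp, not u.b */
    return u.a;
}
```

For example, bump_via_temporary(2.5f) returns 3, since 3.5 truncates toward zero when converted to int.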

16
  • Although, does + create a new object, which would make it defined again? I.e., ub1.a = ub1.b; is undefined behavior for sure, but is ub1.a = ub1.b + 1; undefined as well? I got a warning from that code, but I'm not convinced. Commented Jun 17, 2022 at 6:30
  • And that also triggers the question: Is ub1.a = ub1.b + 0; defined?! Commented Jun 17, 2022 at 7:04
  • 1
    So far as I am aware, nothing in the Standard implies that the built-in operators would produce copies of their operands, and on many 8-bit or 16-bit platforms, having long1 = long2 + 1; perform an intermediate copy would substantially increase code size and execution time.
    – supercat
    Commented Jun 17, 2022 at 15:00
  • 1
    Incidentally, on some 8-bit platforms, the fastest way to perform long1 = long2+long3; is to process the code as though it were long1 = long2; long1+=long3; or long1 = long3; long1+=long2;, but the first substitution is only valid if long1 and long3 are known to identify disjoint regions of storage, and the second is only valid if long1 and long2 are distinct.
    – supercat
    Commented Jun 17, 2022 at 17:57
  • 1
    I upvoted for the obvious effort that went into the answer - but I think it is wrong. ub1.a and ub1.b are not alive at the same time, so there are no two objects which overlap, which makes the clause not applicable. Together with 6.5.16/3 (side effect of updating after read) this seems to make all three cases legal. Commented Jun 20, 2022 at 10:17
0

The Standard uses the phrase "Undefined Behavior", among other things, as a catch-all for situations where many implementations would process a construct in at least somewhat predictable fashion (e.g. yielding a not-necessarily-predictable value without side effects), but where the authors of the Standard thought it impractical to try to anticipate everything that implementations might do. It wasn't intended as an invitation for implementations to behave in gratuitously nonsensical fashion, nor as an indication that code was erroneous (the phrase "non-portable or erroneous" was very much intended to include constructs that might fail on some machines, but would be correct in code that was not intended to be suitable for use with those machines).

On some platforms like the 8051, suppose a compiler were given a construct like someInt16 += *someUnsignedCharPtr << 4;. If it didn't have to accommodate the possibility that the pointer might point to the lower byte of someInt16, the most efficient way to process the construct would be to fetch *someUnsignedCharPtr, shift it left four bits, add it to the LSB of someInt16 (capturing the carry), reload *someUnsignedCharPtr, shift it right four bits, and add it along with the earlier carry to the MSB of someInt16. Loading the value from *someUnsignedCharPtr twice would be faster than loading it, storing its value to a temporary location before doing the shift, and then having to load its value from that temporary location. If, however, someUnsignedCharPtr were to point to the lower byte of someInt16, then the modification of that lower byte before the second load of *someUnsignedCharPtr would corrupt the upper bits of that byte, which would, after shifting, be added to the upper byte of someInt16.

The Standard would allow a compiler to generate such code, even though character pointers are exempt from aliasing rules, because it does not require that compilers handle all situations where unsequenced reads and writes affect regions of storage that partially overlap. If such accesses were performed using a union instead of a character pointer, a compiler might recognize that the character-type access would always overlap the least significant byte of the 16-bit value, but I don't think the authors of the Standard wanted to require that compilers invest the time and effort that might be necessary to handle such obscure cases meaningfully.
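
The two-load strategy described above can be simulated in portable C to show how it goes wrong when the pointer aliases the low byte. This is only a sketch: the 16-bit value is modelled as two bytes (v[0] is the LSB, v[1] is the MSB), and the function names are hypothetical, not taken from any real 8051 compiler.

```c
/* Reassemble the 16-bit value from its two bytes. */
unsigned value_of(const unsigned char v[2])
{
    return v[0] | ((unsigned)v[1] << 8);
}

/* Reference behaviour: read *p once, then add (*p << 4) to the value. */
void add_shifted_one_load(unsigned char v[2], const unsigned char *p)
{
    unsigned sum = (value_of(v) + ((unsigned)*p << 4)) & 0xFFFFu;

    v[0] = sum & 0xFFu;
    v[1] = sum >> 8;
}

/* Two-load strategy: only correct if p does not point into v. */
void add_shifted_two_loads(unsigned char v[2], const unsigned char *p)
{
    /* First load: add the low eight bits of (*p << 4) to the LSB. */
    unsigned lo = v[0] + (((unsigned)*p << 4) & 0xFFu);
    unsigned carry = lo >> 8;

    v[0] = lo & 0xFFu;  /* the LSB is stored before the second load */

    /* Second load: add the high four bits of *p, plus carry, to the MSB. */
    v[1] = (v[1] + (*p >> 4) + carry) & 0xFFu;
}
```

Starting from the bytes {0x34, 0x12} with p pointing at v[0], the one-load version yields 0x1574, while the two-load version yields 0x1974, because the second load sees the already-updated low byte 0x74 instead of the original 0x34. When p points at a separate byte, both versions agree.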
