Some hyper-modern C compilers will infer that if a program will invoke Undefined Behavior when given certain inputs, such inputs will never be received. Consequently, any code which would be irrelevant unless such inputs are received may be eliminated.
As a simple example, given:
#include <stdint.h>

void foo(uint32_t);

uint32_t rotateleft(uint32_t value, uint32_t amount)
{
    return (value << amount) | (value >> (32-amount));
}

uint32_t blah(uint32_t x, uint32_t y)
{
    if (y != 0) foo(y);
    return rotateleft(x,y);
}
a compiler may infer that because evaluation of value >> (32-amount) will yield Undefined Behavior when amount is zero, function blah will never be called with y equal to zero; the call to foo can thus be made unconditional.
From what I can tell, this philosophy seems to have caught hold sometime around 2010. The earliest evidence I've seen of its roots goes back to 2009, and it has been enshrined in the C11 standard, which explicitly states that if Undefined Behavior occurs at any point in a program's execution, the behavior of the entire program retroactively becomes undefined.
Was the notion that compilers should attempt to use Undefined Behavior to justify reverse-causal optimizations (i.e. the Undefined Behavior in the rotateleft function should cause the compiler to assume that blah must have been called with a non-zero y, whether or not anything would ever cause y to hold a non-zero value) seriously advocated prior to 2009? When was such a thing first seriously proposed as an optimization technique?
[Addendum]
Some compilers have, even in the 20th Century, included options to enable certain kinds of inferences about loops and the values computed therein. For example, given
int i; int total=0;
for (i=n; i>0; i--)
{
    doSomething();
    total += i*1000;
}
a compiler, even without the optional inferences, might rewrite it as:
int i; int total=0; int x1000;
for (i=n, x1000=n*1000; i>0; i--, x1000-=1000)
{
    doSomething();
    total += x1000;
}
since the behavior of that code would precisely match the original, even if the compiler specified that int values always wrap in mod-65536 two's-complement fashion. The additional-inference option would let the compiler recognize that since i and x1000 should cross zero at the same time, the former variable can be eliminated:
int total=0; int x1000;
for (x1000=n*1000; x1000 > 0; x1000-=1000)
{
    doSomething();
    total += x1000;
}
On a system where int values wrapped mod 65536, an attempt to run either of the first two loops with n equal to 33 would result in doSomething() being invoked 33 times. The last loop, by contrast, wouldn't invoke doSomething() at all, even though the first invocation of doSomething() would have preceded any arithmetic overflow. Such a behavior might be considered "non-causal", but its effects are reasonably well constrained, and there are many cases where it would be demonstrably harmless: where a function is required to yield some value when given any input, but the value may be arbitrary if the input is invalid, having the loop finish faster when given an invalid value of n would actually be beneficial. Further, compiler documentation tended to be apologetic about the fact that the option would change the behavior of any programs--even those that engaged in UB.
I'm interested in when compiler writers' attitudes changed. The older view was that platforms should, when practical, document some usable behavioral constraints even in cases not mandated by the Standard. The newer view is that any construct relying upon behavior not mandated by the Standard should be branded illegitimate, even when on most existing compilers it would work as well as or better than any strictly-conforming code meeting the same requirements (and would often allow optimizations that strictly-conforming code could not).