32

Please take a look at this post: Is there a performance difference between i++ and ++i in C?

There are two essential statements in the answer:

  1. Modern compilers produce the same machine code no matter whether i++ or ++i is used.
  2. On old compilers i++ might be slower because it has to remember the old value.

I personally verified statement 1 and can confirm it.

But I have a problem with statement 2, as I cannot find any example where the old value of i is used and yet i++ could be replaced by ++i. To make it clear, here is an example:

printf("the value of i is %i\n", i++);

I cannot simply replace i++ with ++i here, as the expression has a different value. I can only imagine examples where the value is not used. For example:

while (i < 10)
{
  i++;
}

Here I can replace i++ with ++i. But does this make it faster? Did any compiler ever save the old value of a variable when that old value is never used?

Can anyone provide a code example, and the disassembly of the compilation, where using ++i instead of i++ indeed makes it faster? Using an old compiler, of course.

I hope you understand this is a historical retro question, as probably all compilers have been optimizing this for decades.

17
  • 11
    Using post-increment on pointers is faster on old CPUs like the 68000, because the compiler can do move.b (a0)+,d0. With pre-increment it has to add 1 to the address register first. This is verified with Amiga SAS/C. Commented Jun 7, 2023 at 19:46
  • 14
    Re, "Modern compilers produce the same machine code..." But ++i and i++ are supposed to do different things!! Perhaps you meant to say that modern compilers produce the same code for an entire statement consisting of nothing but the expression ++i or i++. Commented Jun 7, 2023 at 21:37
  • 19
    Not retro, but TCC, the Tiny C Compiler does produce different code for plain isolated ++i vs i++, with an extra redundant mov for i++ to "remember the old value". Try on godbolt. Granted, TCC is what you might call an "aggressively non-optimizing" compiler, but it does show that it's something a primitive compiler might plausibly do. Commented Jun 7, 2023 at 23:44
  • 2
    -1 because “This question does not show any research effort; it is unclear or not useful”. OP did not look into godbolt or CPU ISAs (or doesn't show any hint of it), mixes compilers and languages, misquotes and truncates the original quote (which is said to be the basis of this question), and no, this is not a retro-computing question.
    – Olivier
    Commented Jun 8, 2023 at 7:16
  • 5
    Well the linked answer has a comment by Mark Hanson which answers your question, doesn’t it? “ The only compiler I found that didn't optimize the i++ case […] was the Software Toolworks C80 compiler by Walt Bilofsky. That compiler was for Intel 8080 CP/M systems.”
    – Mormegil
    Commented Jun 8, 2023 at 9:40

11 Answers

23

I would answer "yes" on the basis that C compilers are (or at least appear to be) simple enough that plenty of amateurs and students will have a go at writing one, and at least some of them will screw this up. Proving this requires naming a concrete example, but since that would be a rather tedious and unrewarding research project, involving seeking out terrible compilers, getting them up and running, and inspecting their output, I'm going to pass on it.

The quality of compilers improved substantially roughly 30 years ago due to a couple of factors. The first is fairly boring: computers just got faster and had more memory, so they could perform optimisations which were prohibitively expensive or impossible on earlier machines. The second is that there were major advances in computer science, such as SSA form, which enabled much more rigorous and exhaustive analysis of code flow than the somewhat ad hoc approaches used previously. In particular, it would be noticed that the value produced by i++/++i is never used, and so the code producing that value (but not the code performing the increment, unless it turned out that i was also unused) would be deleted. So that's why "Modern compilers produce the same machine code no matter whether i++ or ++i is used".

(If your compiler textbook is from the 1990s or earlier, it's time to retire it and get the current edition.)
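
As a quick check of that claim, here is a minimal test pair (function names made up for illustration) that can be fed to any modern compiler, e.g. on godbolt; since the value of the increment expression is discarded in both loops, the generated code comes out identical:

int count_pre(int n)
{
    int i = 0;
    while (i < n)
        ++i;        /* value of ++i is discarded */
    return i;
}

int count_post(int n)
{
    int i = 0;
    while (i < n)
        i++;        /* value of i++ is discarded */
    return i;
}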

Very early and primitive compilers would pretty much emit precanned assembler fragments while parsing, and rely on a peephole optimiser to eliminate the more obvious redundant instructions. i++ might for example result in an instruction to copy i into the register designated for holding the result of expression evaluation, and another which increments i. The peephole optimiser may well notice that the result register is immediately overwritten by the next bit of code, and is thus clearly unused, so the instruction can be eliminated. Or it might not. The only good thing to say about this design of compiler is that it can work in a single pass without loading the whole program into memory, and so run on appallingly small machines. Original ANSI C syntax is pretty much that way so that this sort of compiler can be used on it.
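
To make that concrete, here is a sketch of the kind of fragments such a compiler might emit for a bare i++ statement. The mnemonics are for an imaginary accumulator machine, not any particular compiler's actual output:

void tick(void)
{
    static int i;

    i++;
    /* A naive fragment-based compiler might emit something like:
         LOAD  acc, i     ; fetch i
         STORE acc, tmp   ; save the old value as the expression's result
         ADD   acc, 1
         STORE acc, i     ; write back the incremented i
       tmp is never read again. A peephole pass may notice that and drop
       the STORE to tmp; if it doesn't, the i++ form carries a dead
       instruction that ++i would not. */
}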

4
  • 23
    I feel that the phrase "screw this up" in your first paragraph is inappropriate, since it's about applying an optimization; choosing to leave it out is a defensible position for the reasons you go into, and since it does not affect the correctness of the compiler's output. It is strictly about runtime characteristics only. Commented Jun 8, 2023 at 10:03
  • 5
    A good compiler will only perform optimizing transforms if the author has, at minimum, made a reasonable effort to fully analyze all possible consequences, and confirmed that it behaves soundly in all of them. If a compiler's author hasn't put in such effort, declining to perform such a transform is not a defect. Unfortunately, some compiler writers view a failure to perform any transform that can't be proven unsafe as a defect.
    – supercat
    Commented Jun 8, 2023 at 14:45
  • 3
    Compilers written by students don't count. They were never public nor sold.
    – zomega
    Commented Jun 8, 2023 at 15:22
  • 3
    @zomega: Nate Eldredge commented on the question about TCC, the Tiny C Compiler, making different asm. It's designed to compile fast, in one pass, making terribly inefficient asm. But most code shouldn't care about being efficient with anti-optimizing compilers (TCC or gcc -O0), at least not if it makes the source any less easy to read. (How can I optimize these loops (with compiler optimization disabled)?) Commented Jun 8, 2023 at 18:15
14

A bit of tinkering turns up a counter-example to the first statement.

The code below generates the following assembly when compiled with current GCC for x64 (https://godbolt.org/z/9naWMrjf3) with optimization disabled:

_Atomic int v;
void pre(void) { ++v; }
void post(void) { v++; }


pre:
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-8], 1
        mov     eax, DWORD PTR [rbp-8]
        mov     edx, eax
        mov     eax, edx
        lock xadd       DWORD PTR v[rip], eax
        add     eax, edx
        mov     DWORD PTR [rbp-4], eax
        nop
        pop     rbp
        ret
post:
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-8], 1
        mov     eax, DWORD PTR [rbp-8]
        lock xadd       DWORD PTR v[rip], eax
        mov     DWORD PTR [rbp-4], eax
        nop
        pop     rbp
        ret

The postfix version uses the initial value yielded by the x86 atomic exchange-add instruction directly, without re-computing the result.

My best guess is that some version of some legacy compiler with optimization disabled would assume that the result of a post-increment may be used and generate different code. However, I have no concrete example to back this up.

4
  • 3
    i++ is faster than ++i in this example. Funny.
    – zomega
    Commented Jun 8, 2023 at 13:49
  • 6
    @zomega: Note that this is with optimization disabled, so performance is mostly irrelevant in that case. GCC knows that if the result isn't used, it can optimize std::atomic::fetch_add into just an atomic increment that doesn't return the old value, like lock inc dword ptr [rip + v]. godbolt.org/z/eqnG1crnW shows older MSVC misses the optimization from lock xadd to lock inc or lock add, but with optimization enabled it still doesn't spend extra instructions afterward to compute the unused value of the ++v expression. So both ways still compile the same. Commented Jun 8, 2023 at 18:28
  • 3
    That's odd, GCC is storing the ++v or v++ result to the stack (in the red-zone below RSP), with mov [rbp-4], eax. And also storing/reloading a 1. This seems to be an artifact of how it internally handles _Atomic, maybe mapping them to a call to an always_inline helper function with args? If we use GNU C's native __atomic_add_fetch and fetch_add builtins on a plain int, we just get lock add DWORD PTR w[rip], 1 for both, even at -O0 ( godbolt.org/z/x3zvvG486) Commented Jun 8, 2023 at 18:37
  • 1
    Related: Why is this C++ wrapper class not being inlined away? shows how GCC compiles a call to an __attribute__((always_inline)) wrapper function you define yourself, storing / reloading so the function-arg and return-value objects actually exist like they do in the abstract machine. The code-gen here for _Atomic looks like what you'd get from an unoptimized inlined call to a function doing either add_fetch or fetch_add. (x86's lock xadd implements fetch_add, leaving the register holding the old value from memory unmodified.) Commented Jun 8, 2023 at 18:40
10

I now can replace i++ with ++i. But does this make it faster?

No. In the form used, both reduce to a simple increment (sequence); there are no other operations done with either the old or the new value.

The situation may be different when used in an expression like a[i++] = j[i] vs. a[++i] = j[i], especially on CPUs that implement only one of those addressing modes natively.
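
For a well-defined variant of the same idea (the assignment above is unsequenced; see the UB remark in the comments), consider a plain summation loop. A sketch, assuming a machine with a post-increment addressing mode:

/* On a 68000-style (move.b (a0)+,d0) or PDP-11-style ((r0)+) machine, the
   loop body below can use a single load-and-advance instruction; a
   pre-increment access would have to bump the pointer first. */
long sum(const int *p, int n)
{
    long total = 0;
    const int *end = p + n;
    while (p < end)
        total += *p++;   /* read through the old p, then advance it */
    return total;
}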

Can anyone provide a code example and the disassembly of the compilation where indeed using ++i instead of i++ makes it faster?

This would still come down only to differences in CPU architecture, wouldn't it?

I hope you understand this is a historical retro question as probably all compilers optimize this for decades.

Not really, as this is a generic language question, not a historic one, unless you can tie it to a certain implementation or compiler strategy of a certain time. Just asking a question in 'has there ever been' form doesn't make it a historic one.

3
  • 5
    Yes, UB always changes the picture. Commented Jun 8, 2023 at 8:21
  • 4
    "Especially on CPUs that only implement one of those addressing modes native" This really is a huge factor in this discussion. The underlying CPU architecture and available assembly instructions play a big part here, perhaps more so than the compiler itself.
    – bta
    Commented Jun 8, 2023 at 23:42
  • 1
    I believe the intention was for semantically equivalent uses of i++ and ++i. For example: sum += a[i++] vs. sum += a[i]; ++i. And the answer is probably still that some historic compiler generated a post-increment access from the former but not the latter.
    – Thief
    Commented Jun 9, 2023 at 6:19
10

Yes, but these are not semantically equivalent, of course, so we're not comparing apples to apples as it were.

The HP/1000 is effectively an accumulator machine; its C compiler in the early 1980s would generate code sequences such as:

  • i++: load i, add 1, store i, subtract 1 (recovering the old value as the expression's result),

whereas

  • ++i: load i, add 1, store i, then use as is.

Delaying the increment until after the expression's use would certainly avoid the extra subtraction; however, that takes some compiler work, and a fresh load/add/store sequence would generally be needed later, so there is not necessarily any benefit: it is a wash unless the original i was still in one of the two accumulators.

3
  • 2
    Did that compiler still waste a sub if the value wasn't used, like in for(int i=0; i<n; i++)? I'd hope not. The OP said they were asking about cases where ++i is a drop-in replacement for i++, not creating an off-by-one error. Commented Jun 8, 2023 at 18:24
  • 1
    @PeterCordes, I don't recall, sorry. Would have been easy for the compiler to remove that in the case the value is unused.
    – Erik Eidt
    Commented Jun 8, 2023 at 19:16
  • @ErikEidt I think that is the point of the question. Yes, the compiler could have removed it, but also, compilers were more primitive the longer ago you look, so maybe they didn't. Commented Jun 9, 2023 at 12:32
8

Did any compiler ever save the old value of a variable which is then never used (the old value)?

For the C code

int i;
foo() { ++i; }
bar() { i++; }

the PDP-11 Unix V6 C compiler produces:

.globl  _i
.comm   _i,2
.globl  _foo
.text
_foo:
~~foo:
jsr     r5,csv
inc     _i
L1:jmp  cret
.globl  _bar
.text
_bar:
~~bar:
jsr     r5,csv
inc     _i
L2:jmp  cret
.globl
.data

The code is identical.

8
  • 19
    You can't prove a claim of non-existence with a single non-example. Commented Jun 8, 2023 at 6:22
  • 8
    @user3840170 I was not trying to; this is not math.SE. The implication is: if a very early compiler was able to avoid unnecessary instructions in this particular case, then writers of subsequent C compilers intended for public use would strive to be no worse than that one.
    – Leo B.
    Commented Jun 8, 2023 at 6:44
  • 6
    @LeoB. It does not prove anything. Take another compiler for another CPU architecture and it may work differently because the architecture and compiler are different.
    – Justme
    Commented Jun 8, 2023 at 7:40
  • 2
    Yeah, just see Nate's comment on the question, even one compiler of these days generates different code. Commented Jun 8, 2023 at 10:53
  • 6
    This isn't just any C compiler -- in many ways it's THE compiler, the first one to be widely used. In that respect this is the definitive answer from a retrocomputing point of view. Commented Jun 9, 2023 at 9:47
8

This was definitely the case for the Consulair C compiler for MacOS that I used for a few years in the mid 1980s.

The 68000 processor has post-increment addressing, allowing you to evaluate *p++ with a single instruction (a read through pointer p with post-increment), so reading *p++ should have been faster than reading *++p. However, the compiler wasn't smart enough for that. It translated *p++ literally into (p += 1, *(p-1)), producing one add instruction plus one memory access with a non-zero offset to reach p[-1], while *++p was an add instruction plus one memory access with zero offset, which was shorter and faster.

All made more annoying by the fact that *p++ should have been faster.
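
A sketch of the three sequences described above, with the 68000 code in comments (the register choices are my assumption, not actual Consulair output):

char read_post(char **pp)
{
    return *(*pp)++;
    /* ideal:                   move.b (a0)+,d0
       Consulair, as described: addq.l #1,a0
                                move.b -1(a0),d0   ; non-zero offset */
}

char read_pre(char **pp)
{
    return *++(*pp);
    /* as described:            addq.l #1,a0
                                move.b (a0),d0     ; zero offset, shorter */
}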

7

It depends on the way the optimizer works.

Basically, everything that uses SSA form internally will generate the same code: both ++i and i++ create a new register for the incremented value, and subsequent code will use that register to refer to i until it is overwritten again. The only difference is that, when the value of the expression is used, it refers to the "new" register in one case and to the "old" register in the other. So when the value is unused, there is no place where either register number is inserted, and hence no code that can differ.
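
A sketch of that renaming, with illustrative SSA value numbers in the comments:

void g(void)
{
    int i = 0;   /* i1 = 0                                     */
    i++;         /* i2 = i1 + 1 ; the expression's value is i1 */
    ++i;         /* i3 = i2 + 1 ; the expression's value is i3 */
    (void)i;     /* only i3 is referenced; since neither expression's
                    value is consumed, nothing in the generated code
                    can differ between the two forms           */
}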

The only compilers that could generate different code are those that do not use SSA as an intermediate form. Those will likely be bootstrapping compilers, where performance of generated code is largely irrelevant.

4
  • 1
    This isn't true at all. MSVC moved to SSA only quite recently, and in the past its optimizer was always able to optimize this.
    – phuclv
    Commented Jun 8, 2023 at 15:39
  • 4
    @phuclv, the answer doesn't seem to say that only SSA compilers would be able to optimize it, just that they'll do it for sure, while others might not.
    – ilkkachu
    Commented Jun 8, 2023 at 16:22
  • The question asks about any compiler ever. I find it hard to believe that your average C compiler back when C was new used SSA. More likely it translated things straightforwardly to assembly code. Remember that post-increment was added to the language because there was an assembly instruction for it. Commented Jun 9, 2023 at 12:33
  • @user253751: I think the latter point is a myth; while the PDP-11 featured a post-increment addressing mode, that could have been more readily supported using a special operator (perhaps notated *+) which would dereference and increment a pointer than by having a general-purpose post-increment operator that works on integers as well as pointers.
    – supercat
    Commented Jun 9, 2023 at 19:21
4

I used to be a compiler writer, and had the opportunity to test this code on many CPUs, operating systems, and compilers. The only compiler I found that didn't optimize the i++ case (in fact, the compiler that brought this to my attention professionally) was the Software Toolworks C80 compiler by Walt Bilofsky. That compiler was for Intel 8080 CP/M systems. It's safe to say that any compiler which does not include this optimization is not one meant for general use.

Is there a performance difference between i++ and ++i in C?


As commented, TCC is another example:

TCC performs only a few optimizations; the code is generated in a single pass and every C statement is compiled on its own. It seems that the compiler is almost as simple as possible, so it does not store any state during the compilation if it is not necessary.

Why is the ecx register used during an inc instruction

But since TCC is "relatively new", we can consider its dead ancestor OTCC to be retro.
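
A minimal pair to try with TCC on godbolt; the reported behavior, an extra redundant mov in the i++ case to "remember the old value", is taken from Nate Eldredge's comment on the question:

int i;

void pre(void)  { ++i; }
void post(void) { i++; }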

1
  • 2
    CC65 is a compiler where x++ is actually faster than ++x if x is an int. Curiously, at least with default compilation options, both ++x and x++ generate redundant loads if x is an unsigned char, and even generate redundant sign extension code if x is a signed char.
    – supercat
    Commented Jun 8, 2023 at 16:34
1

The knee-jerk habit of reaching for ++i might be there just to avoid confusion. For any given person, one of those is going to feel like it yields the correct value and the other like it doesn't. A mathematical operator that doesn't apply before assignment is exceptional in C, so I could see trying to avoid the i++ case. It's like intentionally putting the constant on the left side of ==: it reduces the chance of a difficult-to-see mistake. In other words, it may increase programmer performance.
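
The == analogy spelled out, as a small sketch (x is just an illustrative variable): with the constant on the left, the classic typo of = for == becomes a compile error instead of a silent bug:

void demo(void)
{
    int x = 0;
    if (x = 3) { }        /* typo for x == 3: compiles (most compilers
                             only warn), assigns 3, and is always true  */
    /* if (3 = x) { } */  /* the same typo with the constant on the left
                             is a hard error: a constant is not
                             assignable                                  */
}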

1
  • 1
    Converting "is x equal to 3" (how most people would describe a test) into "is 3 equal to x" (which no-one ever says) when writing code surely reduces the performance of programmers by making the code not reflect the thought.
    – dave
    Commented Jun 18, 2023 at 19:55
0

For the examples that you have shown, you are correct that there is no difference in speed if i is an integer, char, or other basic type, and the result is not being used. If you use the result in the same statement then the produced code will be different: int a = ++i produces different code than int a = i++.

However

A better question is: what if i is not an int, char, or other basic type, but has prefix/postfix increment operators defined? What does the compiler do then? It would be nice if the compiler were smart enough to swap the postfix operation for the prefix one when the use case is identical to what you showed... Sadly, the compiler makers are smart enough to know that it is possible for the postfix increment to do something different than the prefix increment for user-defined types. So it is not allowed to swap them, even though the i++ might include an unneeded, computationally wasteful copy.

My answer is: for basic types, the pre/postfix does not matter. But if you use custom classes with overloaded pre/postfix operators, then prefix will be faster than postfix, so it is a good habit to use ++i instead of i++. I think optimization levels might make a difference; I believe that -O4 would intelligently change i++ to ++i, but I am not sure at this moment.
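
To illustrate in C (the type and helper names are hypothetical, chosen only to mirror C++'s operator++() vs operator++(int)), the postfix form must materialize a copy of the old value, which is exactly the computationally wasteful copy mentioned above:

typedef struct { int digits[64]; } Big;

static void big_add1(Big *x) { x->digits[0] += 1; }   /* toy increment */

Big *big_preinc(Big *x)   /* ++x: increment, return the new value */
{
    big_add1(x);
    return x;             /* C++ operator++() returns a reference: no copy */
}

Big big_postinc(Big *x)   /* x++: copy, increment, return the old value */
{
    Big old = *x;         /* the extra copy (256 bytes with 4-byte int) */
    big_add1(x);
    return old;           /* C++ operator++(int) returns by value */
}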

0

I had a phone interview with Oracle/London. While we were discussing, I asked the interviewer: "what is the difference between ++i and i++?" A month later, he sent me an email. Yes, one of them needs to store the value (in a register, on the stack, etc.). So it depends on processor architecture, compiler implementation, and usage. If we have a tiny processor with few registers and resources, and want to use it in a loop that runs millions of times... better think about it. I do not see that kind of requirement now.

2
  • 1
    See the C version of the as-if rule stackoverflow.com/questions/15718262/…
    – Sebastian
    Commented Jun 15, 2023 at 22:28
  • 2
    This does not answer the asked question of whether there has ever been a compiler where ++i was faster than i++. Do you know of such a compiler?
    – Justme
    Commented Jun 16, 2023 at 8:33
