9

For readability, I think the first code block below is better. But is the second code block faster?

First Block:

for (int i = 0; i < 5000; i++){
    int number = rand() % 10000 + 1;
    string fizzBuzz = GetStringFromFizzBuzzLogic(number);
}

Second Block:

int number;
string fizzBuzz;
for (int i = 0; i < 5000; i++){
    number = rand() % 10000 + 1;
    fizzBuzz = GetStringFromFizzBuzzLogic(number);
}

Does redeclaring variables in C++ cost anything?

4
  • "Does redeclaring variables in C++ cost anything?" That isn't a redeclaration? Did you mean non immediate assignment? Commented Feb 28, 2015 at 15:52
  • 1
    There is only one thing which might be different, considering any adequate optimizing compiler: The first example creates (in the function) and destroys (at the end of the loop) a string, the second creates a string at the beginning and destroys it at the end of the block, in addition to a move, and the same work as the first block. Commented Feb 28, 2015 at 15:59
  • This program doesn't actually do anything. Both versions should take zero time to run. Try producing some output then we can talk! Commented Feb 28, 2015 at 18:32
  • Any decent compiler will notice a calculation like this that is invariant and hoist it out of the loop. Your particular example however likely disappears entirely as it has no visible effect. Commented Feb 2, 2017 at 3:38

5 Answers

11

Any modern compiler will notice this and do the optimization work. When in doubt, always go for readability. Declare variables in the innermost scope you can.

1
  • 2
    Any modern compiler will be able to optimize the second to the first form? Are you sure? Remember observable behavior must be preserved under the as-if-rule. Commented Feb 28, 2015 at 16:22
7

I benchmarked this particular code, and even WITHOUT optimisation, it came to almost the same runtime for both variants. And as soon as the lowest level of optimisation is turned on, the result is very close to identical (+/- a bit of noise in the time measurement).

Edit: the analysis of the generated assembler code below shows that it's hard to guess which form is faster. Most people would probably pick func2, but it turns out to be a tiny bit slower, at least when compiling with clang++ and -O2. That is good evidence that "write code, benchmark, change code, benchmark" is the correct way to deal with performance, rather than guessing from reading the code. And remember what someone told me: optimising is a bit like taking an onion apart in layers. Once you optimise one part, you end up looking at something very similar, just a little smaller... ;)

However, my initial analysis made func1 significantly slower. That turns out to be because the compiler, for some bizarre reason, doesn't optimise the rand() % 10000 + 1 in func1 but does in func2 when optimisation is turned off. However, once optimisation is enabled, both functions get a "fast" modulo.
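To make the "fast" modulo concrete, here is a rough sketch of what the optimised code appears to do for % 10000: multiply by a precomputed reciprocal and shift instead of dividing. The constant 1759218605 is the one in the -O2 listing in this answer and equals ceil(2^44 / 10000). The function name FastMod10000 is mine, not something the compiler or the original code defines.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: remainder of x / 10000 without a division instruction.
// Assumes >> on a negative std::int64_t is an arithmetic shift, which holds on
// mainstream compilers and is guaranteed from C++20 onwards.
std::int32_t FastMod10000(std::int32_t x) {
    // 1759218605 == ceil(2^44 / 10000)
    const std::int64_t product = static_cast<std::int64_t>(x) * 1759218605LL;
    // The sign bit is added so that negative quotients round toward zero,
    // matching the semantics of the built-in / and % operators.
    const std::int64_t sign_fix =
        static_cast<std::int64_t>(static_cast<std::uint64_t>(product) >> 63);
    const std::int64_t quotient = (product >> 44) + sign_fix;
    // remainder = x - quotient * 10000
    return static_cast<std::int32_t>(x - quotient * 10000);
}
```

The loop body would then compute FastMod10000(rand()) + 1. There is no reason to write this by hand; the point is only that an optimising compiler already does it for a constant divisor.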

The Linux performance tool perf shows the following for func1 with clang++ and -O2:

  15.76%  a.out    libc-2.20.so         free
  12.31%  a.out    libstdc++.so.6.0.20  std::string::_S_construct<char cons
  12.29%  a.out    libc-2.20.so         _int_malloc
  10.05%  a.out    a.out                func1
   7.26%  a.out    libc-2.20.so         __random
   6.36%  a.out    libc-2.20.so         malloc
   5.46%  a.out    libc-2.20.so         __random_r
   5.01%  a.out    libstdc++.so.6.0.20  std::basic_string<char, std::char_t
   4.83%  a.out    libstdc++.so.6.0.20  std::string::_Rep::_S_create
   4.01%  a.out    libc-2.20.so         strlen

and for func2:

  17.88%  a.out    libc-2.20.so         free
  10.73%  a.out    libc-2.20.so         _int_malloc                    
   9.77%  a.out    libc-2.20.so         malloc
   9.03%  a.out    a.out                func2                        
   7.63%  a.out    libstdc++.so.6.0.20  std::string::_S_construct<char con
   6.96%  a.out    libstdc++.so.6.0.20  std::string::_Rep::_S_create
   4.48%  a.out    libc-2.20.so         __random  
   4.39%  a.out    libc-2.20.so         __random_r
   4.10%  a.out    libc-2.20.so         strlen 

There are some subtle differences, but I would put those down to the relatively short runtime of the benchmark rather than to differences in the code the compiler generated.

This is with the following code:

#include <iostream>
#include <string>
#include <cstdlib>

#define N 500000

extern std::string GetStringFromFizzBuzzLogic(int number);

void func1()
{
    for (int i = 0; i < N; i++){
        int number = rand() % 10000 + 1;
        std::string fizzBuzz = GetStringFromFizzBuzzLogic(number);
    }
}

void func2()
{
    int number;
    std::string fizzBuzz;
    for (int i = 0; i < N; i++){
        number = rand() % 10000 + 1;
        fizzBuzz = GetStringFromFizzBuzzLogic(number);
    }
}

static __inline__ unsigned long long rdtsc(void)
{
    unsigned hi, lo;
    __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
    return ( (unsigned long long)lo)|( ((unsigned long long)hi)<<32 );
}

int main(int argc, char **argv)
{

    void (*f)();

    if (argc == 1)
        f = func1;
    else
        f = func2;

    for(int i = 0; i < 5; i++)
    {
        unsigned long long t1 = rdtsc();

        f();
        t1 = rdtsc() - t1;

        std::cout << "time=" << t1 << std::endl;
    }
}

and in a separate file:

#include <string>

std::string GetStringFromFizzBuzzLogic(int number)
{
    return "SomeString";
}

Running with func1:

./a.out
time=876016390
time=824149942
time=826812600
time=825266315
time=826151399

Running with func2:

./a.out
time=905721532
time=895393507
time=886537634
time=879836476
time=883887384

This is with another 0 added to N, so a 10 times longer runtime. func2 seems to be fairly consistently a little SLOWER, but only by a few percent, which is probably within the noise. In wall-clock time, the whole benchmark takes around 1.30-1.39 seconds.
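The rdtsc timer above is x86-specific and counts cycles rather than time. If you want to repeat the measurement elsewhere, a portable variant could look roughly like the sketch below, using std::chrono::steady_clock. TimeNanoseconds and InnerScopeWorkload are names I made up for illustration, and std::to_string only stands in for GetStringFromFizzBuzzLogic. The workload returns a checksum so the compiler cannot discard the loop as having no visible effect.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical helper: runs func once and returns the elapsed time in
// nanoseconds, measured with a monotonic clock.
template <typename Func>
std::int64_t TimeNanoseconds(Func&& func) {
    const auto start = std::chrono::steady_clock::now();
    func();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}

// Hypothetical workload in the style of the first code block: the string is
// declared inside the loop. Returns the total length of all strings built.
std::size_t InnerScopeWorkload(int iterations) {
    assert(iterations >= 0);
    std::size_t total_length = 0;
    for (int i = 0; i < iterations; ++i) {
        int number = std::rand() % 10000 + 1;
        std::string text = std::to_string(number);  // stand-in for GetStringFromFizzBuzzLogic
        total_length += text.size();
    }
    return total_length;
}
```

A call such as TimeNanoseconds([] { InnerScopeWorkload(5000000); }) times one run. As with the numbers above, run each variant several times and compare the spread, not a single sample.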

Edit: Looking at the assembly code of the actual loop (this is only a portion of the loop, but the rest is identical in terms of what the code actually does):

Func1:

.LBB0_1:                                # %for.body
    callq   rand
    movslq  %eax, %rcx
    imulq   $1759218605, %rcx, %rcx # imm = 0x68DB8BAD
    movq    %rcx, %rdx
    shrq    $63, %rdx
    sarq    $44, %rcx
    addl    %edx, %ecx
    imull   $10000, %ecx, %ecx      # imm = 0x2710
    negl    %ecx
    leal    1(%rax,%rcx), %esi
    movq    %r15, %rdi
    callq   _Z26GetStringFromFizzBuzzLogici
    movq    (%rsp), %rax
    leaq    -24(%rax), %rdi
    cmpq    %rbx, %rdi
    jne .LBB0_2
.LBB0_7:                                # %_ZNSsD2Ev.exit
    decl    %ebp
    jne .LBB0_1

Func2:

.LBB1_1:
    callq   rand
    movslq  %eax, %rcx
    imulq   $1759218605, %rcx, %rcx # imm = 0x68DB8BAD
    movq    %rcx, %rdx
    shrq    $63, %rdx
    sarq    $44, %rcx
    addl    %edx, %ecx
    imull   $10000, %ecx, %ecx      # imm = 0x2710
    negl    %ecx
    leal    1(%rax,%rcx), %esi
    movq    %rbx, %rdi
    callq   _Z26GetStringFromFizzBuzzLogici
    movq    %r14, %rdi
    movq    %rbx, %rsi
    callq   _ZNSs4swapERSs
    movq    (%rsp), %rax
    leaq    -24(%rax), %rdi
    cmpq    %r12, %rdi
    jne .LBB1_4
.LBB1_9:                                # %_ZNSsD2Ev.exit19
    incl    %ebp
    cmpl    $5000000, %ebp          # imm = 0x4C4B40

So, as can be seen, the func2 version contains an extra function call:

    callq   _ZNSs4swapERSs

which translates to std::basic_string<char, std::char_traits<char>, std::allocator<char> >::swap(std::basic_string<char, std::char_traits<char>, std::allocator<char> >&), or std::string::swap(std::string&). This is presumably how the std::string assignment operator is implemented when the right-hand side is the temporary returned by the function. It would explain why func2 is slightly slower than func1.
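If you want to see the extra work without reading assembly, one option is to count the special member function calls with an instrumented type. The sketch below is my own illustration, not part of the benchmark: Tracked, MakeTracked, InnerScope and OuterScope are hypothetical names, and Tracked only wraps a std::string.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical instrumented type: counts constructions, assignments and
// destructions so the two loop styles can be compared.
struct Tracked {
    static int constructed;
    static int assigned;
    static int destroyed;

    std::string value;

    Tracked() { ++constructed; }
    explicit Tracked(const char* text) : value(text) { ++constructed; }
    Tracked(const Tracked& other) : value(other.value) { ++constructed; }
    Tracked(Tracked&& other) noexcept : value(std::move(other.value)) { ++constructed; }
    Tracked& operator=(const Tracked& other) { value = other.value; ++assigned; return *this; }
    Tracked& operator=(Tracked&& other) noexcept { value = std::move(other.value); ++assigned; return *this; }
    ~Tracked() { ++destroyed; }

    static void Reset() { constructed = assigned = destroyed = 0; }
};

int Tracked::constructed = 0;
int Tracked::assigned = 0;
int Tracked::destroyed = 0;

// Stand-in for GetStringFromFizzBuzzLogic: returns a fresh object by value.
Tracked MakeTracked() { return Tracked("SomeString"); }

// Style of func1: the object is declared inside the loop.
void InnerScope(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        Tracked item = MakeTracked();
    }
}

// Style of func2: the object is declared once and assigned in the loop.
void OuterScope(int iterations) {
    Tracked item;
    for (int i = 0; i < iterations; ++i) {
        item = MakeTracked();
    }
}
```

After Tracked::Reset() and InnerScope(5), Tracked::assigned stays at 0. After OuterScope(5) it is 5, on top of one temporary per iteration that still has to be constructed and destroyed. With copy elision (guaranteed for this pattern since C++17) I would expect 5 constructions in the first case and 6 in the second, so the outer declaration saves nothing and adds an assignment per iteration.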

I'm sure it is possible to find cases where constructing/destroying an object in a loop takes a significant amount of time, but in general it will make little or no difference at all, and having clearer code will actually help the reader. It will also often help the compiler with "life-time analysis", since there is less code to "walk" to find out whether the variable is used later (in this case the code is short anyway, but that's obviously not always so in real-life examples).

3
  • In this particular case, func1 is slightly faster than func2 (at least sometimes - I ran it several times, and there is definitely overlap between the slowest and fastest of each choice). I didn't check the assembler code carefully to see if there is any notable difference somewhere. Commented Feb 28, 2015 at 17:34
  • Awesome! Thanks! I appreciate everyone elses answers as well, but this is definitely the one that best helped me understand the answer to my question. green checkmark
    – Evorlor
    Commented Feb 28, 2015 at 17:35
  • 1
    I did some more edits, including showing WHY func2 is actually slower than func1 and clarifying that the only way to know for sure is to benchmark. Commented Feb 28, 2015 at 18:15
4

The 1st code block should be considered faster, since it avoids the overhead of the one extra call to the std::string default constructor.

Actually you don't have a redeclaration of the variables in your 2nd code block. These are just plain assignment operations.

A redeclaration would actually mean you have something like this

int number;
string fizzBuzz;
for (int i = 0; i < 5000; i++){
    int number = rand() % 10000 + 1;
 // ^^^
    string fizzBuzz = GetStringFromFizzBuzzLogic(number);
 // ^^^^^^
}

In this case the overhead would be optimized out by the compiler, since the outer scope variables aren't used at all.

1
  • 1
    Are you sure it will be smart enough to limit the lifetime of the string to the first example under the as-if-rule? Even though the allocator might have observable behavior? Commented Feb 28, 2015 at 16:01
3

There is no such thing as a redeclaration in C++. In your second code snippet, number and fizzBuzz are only declared and initialised once. The = which follow later on are assignments.

As with all optimisation questions, you can only guess or preferably measure. And then of course it all entirely depends on your compiler and the settings you invoke it with. And of course, there can be a tradeoff between speed optimisation and space optimisation.

I know of no serious C++ programmer who would not prefer the first form, because it is easier to read and simply more concise.

Only if the program were considered too slow, and measurements showed which parts of the code caused the slowdown, and those measurements pointed to this loop, would they consider changing it.

However, as the others said, this is an unrealistic scenario. It is extremely unlikely that a modern compiler would treat the two snippets in a different way with regards to optimisation and that you would experience any measurable speed difference.

(edit: Sorry for the typo, had confused "first" and "second" there)

2
  • Who would not prefer the second block? Typo, or is the second block your suggestion?
    – Evorlor
    Commented Feb 28, 2015 at 16:04
  • @Evorlor: Typo. Fixed it. Thanks! Commented Feb 28, 2015 at 16:04
2

All declaring (value) variables does is increment the stack by the combined size of all the local variables in that function/method.

There may be a cost to calling constructors/destructors more than the optimal number of times with object types (your string).

In this case there is no difference. The optimizer will give you the best solution anyway if using a decent compiler.

You might want the code to read in the optimal way so your peers don't think you write bad code!
