
I know there is a similar question about this: constexpr performing worse at runtime.
But my case is a lot simpler than that one, and the answers there were not enough for me. I'm just learning about constexpr in C++11 and I wrote some code to compare its efficiency. For some reason, using constexpr makes my code run more than 4 times slower!
By the way, I'm using exactly the same example as on this site: https://www.embarcados.com.br/introducao-ao-cpp11/ (it's in Portuguese, but you can see the example code about constexpr). I have already tried other expressions and the results are similar.

#include <ctime>
#include <iostream>
using namespace std;

constexpr double divideC(double num){
    return (2.0 * num + 10.0) / 0.8;
}

#define SIZE 1000
int main(int argc, char const *argv[])
{
    // Get number of iterations from user
    unsigned long long count;
    cin >> count;
    
    double values[SIZE];

    // Testing normal expression
    // Note: i is an int but count is an unsigned long long, so for counts
    // above INT_MAX the increment overflows (undefined behavior).
    clock_t time1 = clock();
    for (int i = 0; i < count; i++)
    {
        values[i%SIZE] = (2.0 * 3.0 + 10.0) / 0.8;
    }
    time1 = clock() - time1;
    cout << "Time1: " << float(time1)/float(CLOCKS_PER_SEC) << " seconds" << endl;
    
    // Testing constexpr
    clock_t time2 = clock();
    for (int i = 0; i < count; i++)
    {
        values[i%SIZE] = divideC( 3.0 );
    }
    time2 = clock() - time2;
    cout << "Time2: " << float(time2)/float(CLOCKS_PER_SEC) << " seconds" << endl;

    return 0;
}

Input given: 9999999999

Output:

> Time1: 5.768 seconds
> Time2: 27.259 seconds

Can someone tell me the reason for this? Since constexpr calculations should run at compile time, this code should run faster, not slower.

I'm using msbuild version 16.6.0.22303 to compile the Visual Studio project generated by the following CMake code:

cmake_minimum_required(VERSION 3.1.3)
project(C++11Tests)

add_executable(Cpp11Tests main.cpp)

set_property(TARGET Cpp11Tests PROPERTY CXX_STANDARD_REQUIRED ON)
set_property(TARGET Cpp11Tests PROPERTY CXX_STANDARD 11)
  • Do you compile with optimizations enabled? Commented Jul 12, 2020 at 11:53
  • Yes, please identify the compiler, version, Standard option, and full compile/link command lines. Commented Jul 12, 2020 at 11:55
  • ...and no, just "Visual Studio compiler" is far from sufficient. Commented Jul 12, 2020 at 11:57
  • There's no evidence that you're compiling with optimisations enabled, and if you're not, there is no meaning to measurements of performance. Commented Jul 12, 2020 at 12:04
  • Without optimizations, the compiler will keep the divideC call, so it is slower. With optimizations on, the compiler knows that everything related to values can be optimized away without any side effects. So the shown code can never give a meaningful measurement of the difference between values[i%SIZE] = (2.0 * 3.0 + 10.0) / 0.8; and values[i%SIZE] = divideC( 3.0 );
    – t.niese
    Commented Jul 12, 2020 at 12:10

1 Answer


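Without optimizations, the compiler will keep the divideC call, so the second loop is slower. constexpr only guarantees compile-time evaluation where a constant expression is required (for example, when initializing a constexpr variable or inside a static_assert). A plain assignment such as values[i%SIZE] = divideC( 3.0 ); is not such a context, so an unoptimized build is free to call the function at run time on every iteration. The literal expression in the first loop, by contrast, is typically folded into a constant even in unoptimized builds.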

With optimizations on, any decent compiler sees that, for the given code, values is written but never read, so everything related to values can be optimized away without changing any observable behavior. The shown code can therefore never give a meaningful measurement of the difference between values[i%SIZE] = (2.0 * 3.0 + 10.0) / 0.8; and values[i%SIZE] = divideC( 3.0 );

With -O1, any decent compiler will generate something like the following. This loop:

    for (int i = 0; i < count; i++)
    {
        values[i%SIZE] = (2.0 * 3.0 + 10.0) / 0.8;
    }

results in:

        mov     rdx, QWORD PTR [rsp+8]
        test    rdx, rdx
        je      .L2
        mov     eax, 0
.L3:
        add     eax, 1
        cmp     edx, eax
        jne     .L3
.L2:

and

    for (int i = 0; i < count; i++)
    {
        values[i%SIZE] = divideC( 3.0 );
    }

results in:

        mov     rdx, QWORD PTR [rsp+8]
        test    rdx, rdx
        je      .L4
        mov     eax, 0
.L5:
        add     eax, 1
        cmp     edx, eax
        jne     .L5
.L4:

Both loops compile to identical machine code that contains only the loop counter and nothing else. So as soon as you turn on optimizations, you measure only the empty loop, and nothing related to constexpr.

With -O2, even the loop is optimized away, and you would only measure:

    clock_t time1 = clock();
    time1 = clock() - time1;
    cout << "Time1: " << float(time1)/float(CLOCKS_PER_SEC) << " seconds" << endl;