Using GCC 6.3, the following C++ code:

#include <cmath>
#include <iostream>

void norm(double r, double i)
{
    double n = std::sqrt(r * r + i * i);
    std::cout << "norm = " << n;
}

generates the following x86-64 assembly:

norm(double, double):
        mulsd   %xmm1, %xmm1
        subq    $24, %rsp
        mulsd   %xmm0, %xmm0
        addsd   %xmm1, %xmm0
        pxor    %xmm1, %xmm1
        ucomisd %xmm0, %xmm1
        sqrtsd  %xmm0, %xmm2
        movsd   %xmm2, 8(%rsp)
        jbe     .L2
        call    sqrt
.L2:
        movl    std::cout, %edi
        movl    $7, %edx
        movl    $.LC1, %esi
        call    std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)
        movsd   8(%rsp), %xmm0
        movl    std::cout, %edi
        addq    $24, %rsp
        jmp     std::basic_ostream<char, std::char_traits<char> >& std::basic_ostream<char, std::char_traits<char> >::_M_insert<double>(double)

For the call to std::sqrt, GCC first computes the result with sqrtsd and saves it onto the stack. If the argument compares below zero, it then calls the libc sqrt function. But it never saves xmm0 after that call; before the second call to operator<<, it restores the value from the stack (because xmm0 was clobbered by the first call to operator<<).

With a simpler std::cout << n;, it's even more obvious:

subq    $24, %rsp
movsd   %xmm1, 8(%rsp)
call    sqrt
movsd   8(%rsp), %xmm1
movl    std::cout, %edi
addq    $24, %rsp
movapd  %xmm1, %xmm0
jmp     std::basic_ostream<char, std::char_traits<char> >& std::basic_ostream<char, std::char_traits<char> >::_M_insert<double>(double)

Why is GCC not using the xmm0 value computed by libc sqrt?

  • 11
    This is actually a really cool trick they implemented: we finally get the performance of single assembly instructions for calculating transcendental functions in the common case, without having to use -fno-math-errno and similar. Commented Apr 9, 2017 at 13:13
  • 1
    sqrt is actually algebraic, not transcendental
    – jdh8
    Commented Oct 28, 2018 at 15:23

1 Answer

It doesn't need to call sqrt to compute the result; it's already been calculated by the SQRTSD instruction. It calls sqrt to generate the required behaviour according to the standard when a negative number is passed to sqrt (for example, set errno and/or raise a floating-point exception). The PXOR, UCOMISD, and JBE instructions test whether the argument is less than 0 and skip the call to sqrt if this isn't true.
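
To see the side effects the library call is there for, here is a minimal sketch that calls std::sqrt and reports what happened. The names SqrtEffects and observe_sqrt are made up for this illustration. Which mechanism is actually used (errno, floating-point exceptions, or both) depends on the implementation and is advertised through math_errhandling, so treat the result as platform-dependent.

```cpp
#include <cerrno>
#include <cfenv>
#include <cmath>

// Hypothetical helper type: records what a single sqrt call left behind.
struct SqrtEffects {
    bool errno_is_edom;      // errno was set to EDOM
    bool fe_invalid_raised;  // the FE_INVALID floating-point flag is set
    bool result_is_nan;      // the returned value is NaN
};

// Hypothetical helper: calls the library sqrt on x and reports the
// error-handling side effects that can be observed afterwards.
SqrtEffects observe_sqrt(double x)
{
    volatile double input = x;          // keep the compiler from folding the call
    errno = 0;                          // start from a clean error state
    std::feclearexcept(FE_ALL_EXCEPT);

    const double result = std::sqrt(input);

    SqrtEffects effects;
    effects.errno_is_edom = (errno == EDOM);
    effects.fe_invalid_raised = std::fetestexcept(FE_INVALID) != 0;
    effects.result_is_nan = std::isnan(result);
    return effects;
}
```

Calling observe_sqrt(-1.0) should report a NaN result plus whichever error indications the implementation provides, while observe_sqrt(4.0) should report none of them. That difference is why the call is only needed on the negative path.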

  • 12
    @Benoît Like I said, it doesn't need the result of the sqrt. It's not calling sqrt to obtain the result. It's calling sqrt purely for its side effects, its error handling when the argument to sqrt is less than 0.
    – Ross Ridge
    Commented Apr 9, 2017 at 5:27
  • 2
    Which side effects? The only one I can think of would be setting errno.
    – Benoît
    Commented Apr 9, 2017 at 5:30
  • 13
    @Benoît Isn't that enough? In C++11 it can also (or instead) generate an FE_INVALID floating-point exception. The compiler is simply leaving it up to the library implementation to handle this case.
    – Ross Ridge
    Commented Apr 9, 2017 at 5:35
  • 6
    @TheTechel: In the cases where the argument >= 0, std::sqrt won't be called because it gets jumped over and the sqrtsd assembly instruction does the work. In the other cases (argument < 0), the sqrtsd instruction will be jumped over and std::sqrt will be called (which might or might not try to calculate the root of a negative number). So there will be at most one calculation.
    – hoffmale
    Commented Apr 9, 2017 at 9:37
  • 2
    @hoffmale Eh? Where do you see in the question that the sqrtsd instruction gets jumped over? It looks to me like that one executes unconditionally; it's only the call to sqrt that gets skipped. Which is still enough when optimising for the common case.
    – user743382
    Commented Apr 9, 2017 at 10:44
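
The control flow discussed in these comments can be written out in C++ as a rough sketch. This is not what GCC does internally, only a model of the assembly in the question. inline_sqrt is a hypothetical stand-in for the sqrtsd instruction (emulated here by calling std::sqrt and restoring errno), and the library_calls counter is an assumption added purely so the branch can be observed.

```cpp
#include <cerrno>
#include <cmath>

// Hypothetical stand-in for sqrtsd: yields the value (NaN for a negative
// input) but leaves errno as it was, like the bare instruction would.
double inline_sqrt(double x)
{
    const int saved_errno = errno;
    const double r = std::sqrt(x);
    errno = saved_errno;
    return r;
}

// Models the generated code: the inline square root always runs, and the
// library function is called afterwards only when the argument is below zero.
double sqrt_as_compiled(double x, int& library_calls)
{
    const double result = inline_sqrt(x);  // sqrtsd: executed unconditionally

    if (0.0 > x) {                         // pxor + ucomisd + jbe
        ++library_calls;                   // illustration only
        (void)std::sqrt(x);                // call sqrt: return value is discarded
    }
    return result;                         // the value saved at 8(%rsp)
}
```

For a non-negative or NaN argument the comparison is false, so the library is never entered and the inline result is used directly. For a negative argument the same inline result (NaN) is still what gets returned; the extra call only performs the error handling.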
