6

Some software (often performance-oriented, e.g. Linux kernel, DPDK) has C helpers for influencing branch prediction.

I have an absolutely simple code snippet (suppose I know the percantage of a > b) to represent the problem of conditions nesting and applying likely/unlikely when some logic is nested:

bool foo()
{
    foo1(1);
    foo2(2);

    /* if (unlikely(a > b)) */
    /* if (a > b)*/
    {
        puts("Ohhh!!! Rare case");
        return true;
    }
    return false;
}

int main(void)
{
    /* if (unlikely(foo())) */
    /* if (foo()) */
    {
        puts("Azaza");
    }
}

So which 2 lines should be uncomented for more performance from a theoretical point of view?

Obviously there are 3 ways to assist compilier with branch prediction:

1. if (unlikely(a > b)) ... if (unlikely(foo()))

2. if (a > b) ... if (unlikely(foo()))

3. if (unlikely(a > b)) ... if (foo())

Which is theoretically the most efficient and why?

13
  • 1
    @TedLyngmo Tsyvarev is absolutely right! The question is about how to do it in case of nesting. Updated question
    – NK-cell
    Commented Jul 6, 2023 at 12:51
  • @Tsyvarev You're absolutely right
    – NK-cell
    Commented Jul 6, 2023 at 12:56
  • 1
    Perfect. My question is removed.
    – Ted Lyngmo
    Commented Jul 6, 2023 at 13:02
  • 2
    In this case, foo should be written as simply return a > b;, without any branches. If you have more code than just return in the if/else then it's fine, but in that case of course the likely should be in foo.
    – interjay
    Commented Jul 6, 2023 at 13:11
  • 2
    @interjay It is more logical to assume that first of all it should be in main(), IMHO it's better to cut off the wrong branch of execution earlier.
    – NK-cell
    Commented Jul 6, 2023 at 15:35

1 Answer 1

0

To my knowledge, the likely/unlikely statements show the best effect if the conditional variable is not in the cache. Let me explain in more detail.

In your code, the processor has to execute foo anyway. So the likely won't have any strong effect here, since no code path can be skipped during speculative execution. The function must be executed. Let's assume you save the result of foo before in a variable and the code looks like this:

int x = foo();
if (likely(x))
{
    puts("Azaza");
}

In this case the likely(x) will probably only influence the prefetcher/decoder, since the processor just computed x and the value is most likely cached in L1 (except it got interrupted at that point). However, for a precise answer one has to know the microarchitecture extremely well, since it might be possible that current CPUs are so advanced that they can fetch and decode both branches at once.

Now let's assume you have a global variable volatile int c = 15 and we change your code:

if (likely(b == 15))
{
    puts("Azaza");
} else {
    puts("Bzaza");
}

When we execute the code and b is accessed for the first time it won't be in the cache and the processor has to fetch it from memory. This costs multiple hundred CPU-cycles and instead of stalling the CPU starts executing code speculatively without knowing the value of b. In that case, the likely keyword indicates that we should execute the first branch. Note that the executed instructions are not visible to the outside world at that time. Modern x86 processors can execute up to 400 micro-ops speculatively and only commit the result once the prediction turned out to be true.

So to answer your question I would put the likely/unlikely keyword around the a > b.

9
  • ... or add the keyword around both branches (assuming no compiler optimization for the simple code) Commented Jul 27, 2023 at 11:49
  • 1
    The Q&A linked by the OP has a mostly obsolete top answer. See comments on it like mine and BeeOnRope's: How do the likely/unlikely macros in the Linux kernel work and what is their benefit? , and Ciro's answer: likely/unlikely influences branch layout, and the decision to make branchy vs. branchless asm if that's an option. But nothing can hint the actual CPU hardware branch prediction about which is more likely. (On modern x86 and AArch64). Commented Jul 27, 2023 at 17:15
  • 1
    This answer would make sense for PowerPC and some MIPS where machine code can contain branch hints for the CPU. And Pentium 4, the only x86 microarchitecture to support branch hints. (Is it possible to tell the branch predictor how likely it is to follow the branch?). Older x86 like P4 and P6 may also have static prediction influenced by branch layout, but since Sandybridge or Haswell (IT-TAGE predictors) no. Why did Intel change the static branch prediction mechanism over these years? Commented Jul 27, 2023 at 17:19
  • 1
    I just looked up the spec for the A-65 (developer.arm.com/documentation/100439/latest), p 78 indicates that static branch prediction is still used, so it might be possible that my assumption was correct in the case no dynamic information is available. Commented Jul 28, 2023 at 7:37
  • 1
    Yes, true, some non-x86 CPUs may use the classic BTNF (backward taken, forward not-taken) as a fallback when a dynamic prediction isn't available. I think IT-TAGE (in x86, found in Haswell / Zen 2 and later) will always make a direction prediction for conditional direct branches, though; it doesn't try to figure out whether the indexed entry is "for that branch" or not: branches can alias each other, and one branch's state can be distributed over many entries if it's reached with many different branch histories leading up to it. (Making a target address prediction is harder, but separate.) Commented Jul 28, 2023 at 7:49

Not the answer you're looking for? Browse other questions tagged or ask your own question.