The document discusses CPU pipelines and branch prediction in both hardware and software. It describes two main ways for a programmer to help the compiler give the hardware better branch prediction hints: 1) using explicit __builtin_expect hints to indicate the likely branch direction; and 2) using profile information collected from representative runs to determine branch likelihoods and optimize compilation accordingly. Profile-guided optimization (PGO) generalizes the second approach to optimizations beyond branch prediction.
3. The CPU pipeline
● Most CPUs today are built using pipelines
● This enables the CPU to clock at a higher rate
● This also enables the CPU to use more of its
infrastructure at the same time, and so utilize
the hardware better.
● It also speeds things up because computation is
done in parallel.
● This is even more so in multi-core CPUs.
4. The problems with pipelines
● The pipeline idea is heavily tied to the idea of branch
prediction
● If the CPU has not yet finished certain instructions
and sees a branch following them then it needs to be
able to guess where the branch will go
● Otherwise the pipeline idea itself becomes
problematic.
● That is because, without a guess, the CPU cannot start
the instructions that follow a branch until the branch is
resolved, so parallel execution stalls at every branch.
5. Hardware prediction
● The hardware itself does a rudimentary form of
prediction.
● The assumption of the hardware is that what
happened before will happen again (order)
● This works for most programs, and the hardware
does a good job even without assistance.
● It also means that a program that branches
randomly will make the hardware mispredict, and
execution speed will go down.
● Example: "if(random()<0.5) {"
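The "what happened before will happen again" assumption can be illustrated with a toy model. The sketch below is not how any real CPU is built (real predictors are far more elaborate); it only assumes a predictor that guesses the previous outcome of the branch. The function name count_mispredictions is made up for this illustration.

```c
#include <stddef.h>

/* Toy "last outcome" predictor (an illustrative assumption, not real
 * hardware): guess that the branch does whatever it did the previous time.
 * outcomes[i] must be 1 if the branch was taken and 0 if it was not.
 * Returns the number of wrong guesses. */
size_t count_mispredictions(const int *outcomes, size_t n)
{
    int last = 1;                 /* arbitrary initial guess: taken */
    size_t misses = 0;

    for (size_t i = 0; i < n; i++) {
        if (outcomes[i] != last)  /* the guess was wrong */
            misses++;
        last = outcomes[i];       /* remember what actually happened */
    }
    return misses;
}
```

Feeding this model a steady sequence (taken seven times, then not taken once) gives a single miss, while a sequence that alternates on every step misses almost every time. A branch driven by random data behaves much more like the second case.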
6. Software prediction
● The hardware manufacturers allow the software layer
to tell the hardware which way a branch is likely to go
using hints left inside the assembly.
● Special instructions were created for this by the
hardware manufacturers
● The compiler decides whether to use hinted branches
or non-hinted ones.
● It will use the hinted ones only when it can guess where
the branch will go.
● For instance: in a loop the branch will tend to go back
to the loop.
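Why the loop case is easy to guess can be seen by counting. The sketch below simulates the conditional branch at the bottom of a loop and tallies how often it jumps back versus how often it falls out. The names branch_stats and loop_branch_stats are made up for this illustration.

```c
#include <stddef.h>

struct branch_stats {
    size_t taken;      /* branch jumped back to the top of the loop */
    size_t not_taken;  /* branch fell through and left the loop */
};

/* Simulates the branch at the bottom of a loop that runs n times
 * and counts each outcome of that branch. */
struct branch_stats loop_branch_stats(size_t n)
{
    struct branch_stats s = {0, 0};
    size_t i = 0;

    if (n == 0)
        return s;                 /* the loop body never runs */

    do {
        i++;
        if (i < n)
            s.taken++;            /* go around again */
        else
            s.not_taken++;        /* last iteration: leave the loop */
    } while (i < n);

    return s;
}
```

For a loop of 100 iterations the branch goes back 99 times and exits once, so "back to the loop" is the right guess 99% of the time without knowing anything else about the program.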
7. The problem of branching
● When the compiler sees a branch that is not part
of a loop (if(condition)) it does not know the
chances that the condition will evaluate to
true.
● Therefore it will usually use a hint-less branch
instruction.
● Unless you tell it otherwise.
● There are two ways to tell the compiler which
way the branch will go.
8. First way - explicit hinting in the
software
● You can use the __builtin_expect construct to
hint at the right path.
● Instead of writing "if(x) {" you write:
"if(__builtin_expect((x),1)) {"
● You can wrap this in a nice macro like the Linux
kernel folk did.
● The compiler will plant a hint to the CPU telling
it that the branch is likely to succeed, or lay out
the code so that the likely path is the
straight-line one.
● See example
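A minimal sketch of the macro approach, modelled on the likely()/unlikely() macros in the Linux kernel. The function sum_checked and its error convention are made up for this illustration, and the fallback branch for compilers without __builtin_expect is an assumption about how one might keep the code portable.

```c
#include <stddef.h>

/* The !! turns any non-zero value into 1 so that it can be compared
 * against the expected value passed to __builtin_expect. */
#if defined(__GNUC__)
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
/* Assumed fallback: no hint, same behaviour. */
#define likely(x)   (x)
#define unlikely(x) (x)
#endif

/* Hypothetical example: add up the values, treating a negative value as a
 * rare error. Returns 0 and stores the sum in *out on success, or returns
 * -1 (leaving *out untouched) at the first negative value. */
int sum_checked(const int *values, size_t n, long *out)
{
    long total = 0;

    for (size_t i = 0; i < n; i++) {
        if (unlikely(values[i] < 0))
            return -1;            /* cold path: expected to be rare */
        total += values[i];       /* hot path */
    }
    *out = total;
    return 0;
}
```

The hint changes only how the code is laid out and predicted, never what it computes. A wrong hint still gives correct results, only slower ones.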
9. Second way - using profile
information
● You leave your code alone.
● You compile the code with -fprofile-arcs.
● Then you run it on a typical scenario creating files which
show which way branches went (auxname.gcda for each
file).
● Then you compile your code again with
-fbranch-probabilities, which uses the gathered data to
plant the right hints.
● In GCC you must compile both phases using the exact
same optimization and code generation flags, apart from
the profiling flags themselves (otherwise expect problems)
● See example
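The two-pass build can be sketched as follows. The file name classify.c, the function count_errors and the -O2 level are assumptions made for this illustration. The commands in the comment assume the file also contains a main() that feeds the function typical input.

```c
/*
 * Assumed build steps (shown as a comment, run from the shell):
 *
 *   gcc -O2 -fprofile-arcs classify.c -o classify
 *       pass 1: instrumented build
 *   ./classify
 *       training run on a typical scenario, writes the .gcda data
 *   gcc -O2 -fbranch-probabilities classify.c -o classify
 *       pass 2: same flags, now using the gathered data
 */
#include <stddef.h>

/* No hints in the source. Whether the `if` is usually taken is learned
 * from the training run rather than written down by the programmer. */
size_t count_errors(const int *status, size_t n)
{
    size_t errors = 0;

    for (size_t i = 0; i < n; i++) {
        if (status[i] != 0)       /* non-zero status counts as an error */
            errors++;
    }
    return errors;
}
```

The quality of the result depends on the training run. If the scenario used in the second step does not resemble real usage, the planted hints can point the wrong way.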
10. Second way - PGO
● This whole approach is a subset of a bigger
concept called PGO (Profile-Guided
Optimization)
● This includes the branch prediction we saw
before but also other types of optimization
(reordering of switch cases as an example).
● This is why you should use the more general
flags -fprofile-generate and -fprofile-use, which
imply the previous flags and add even more
profile-guided optimizations.
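A sketch of the same workflow with the general flags, using a switch statement as the kind of code that profile data can help with. The file name dispatch.c, the opcode enum and the apply function are made up for this illustration. As before, the commands assume a main() in the same file that runs a typical workload.

```c
/*
 * Assumed build steps (shown as a comment, run from the shell):
 *
 *   gcc -O2 -fprofile-generate dispatch.c -o dispatch   instrumented build
 *   ./dispatch                                          training run
 *   gcc -O2 -fprofile-use dispatch.c -o dispatch        optimized build
 */

/* Hypothetical opcodes for a tiny interpreter. */
enum opcode { OP_NOP, OP_ADD, OP_SUB, OP_HALT };

/* Applies one operation to the accumulator. With profile data the compiler
 * knows which cases were common in the training run and may order the
 * tests for them accordingly. */
int apply(enum opcode op, int acc, int operand)
{
    switch (op) {
    case OP_ADD:
        return acc + operand;
    case OP_SUB:
        return acc - operand;
    case OP_HALT:
        return 0;
    case OP_NOP:
    default:
        return acc;
    }
}
```

The source code is identical in both builds. Only the compiler's knowledge of how often each case occurs differs.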