The lesser known GCC optimizations
Mark Veltzer
mark@veltzer.net
Using profile information for optimization
The CPU pipeline
● Most CPUs today are built using pipelines
● This enables the CPU to clock at a higher rate
● This also enables the CPU to use more of its
infrastructure at the same time and so utilize
the hardware better.
● It also speeds things up because computation is
done in parallel.
● This is even more so in multi-core CPUs.
The problems with pipelines
● The pipeline idea is heavily tied to the idea of branch
prediction
● If the CPU has not yet finished certain instructions
and sees a branch following them then it needs to be
able to guess where the branch will go
● Otherwise the pipeline idea itself becomes
problematic.
● That is because, without a guess, the CPU would have to
stop executing instructions in parallel every time it
reaches a branch and wait until the branch is resolved.
Hardware prediction
● The hardware itself does a rudimentary form of
prediction.
● The hardware assumes that what happened before will
happen again, so it predicts a branch from its past behavior.
● This is OK and hardware does a good job even
without assistance.
● It also means that a program whose branches behave
randomly will make the hardware mispredict, and
execution speed will go down.
● Example: "if(random()%2) {"
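The difference between a predictable and an unpredictable branch can be sketched in C. The function name `count_below` and its parameters are illustrative assumptions, not part of the slides. The same `if` is easy to predict when the input is sorted (a long run of taken branches followed by a long run of not-taken ones) and hard to predict when the input is in random order. How large the difference is in practice depends on the CPU and on the optimization flags, since the compiler may replace the branch with branch-free code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts the elements of data[] smaller than limit.
 * The if() inside the loop is the branch the CPU has to predict. */
size_t count_below(const int *data, size_t n, int limit)
{
    assert(data != NULL || n == 0);  /* an empty range may be NULL */
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (data[i] < limit)  /* outcome depends entirely on the data */
            count++;
    }
    return count;
}
```

Calling this function on a large sorted array and then on the same values shuffled gives the same result, but the shuffled call presents the predictor with a branch history that has no pattern.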
Software prediction
● The hardware manufacturers allow the software layer
to tell the hardware which way a branch could go using
hints left inside the assembly.
● Special instructions or instruction prefixes were created
for this by some hardware manufacturers
● The compiler decides whether to use hinted branches
or non-hinted ones.
● It will use the hinted ones only when it can guess where
the branch will go.
● For instance: in a loop the branch will tend to go back
to the loop.
The problem of branching
● When the compiler sees a branch that is not part of
a loop (if(condition)) it does not know the
chances that the condition will evaluate to
true.
● Therefore it will usually use a hint-less branch
instruction.
● Unless you tell it otherwise.
● There are two ways to tell the compiler which
way the branch will go.
First way - explicit hinting in the
software
● You can use the __builtin_expect construct to
hint at the right path.
● Instead of writing "if(x) {" you write:
"if(__builtin_expect(!!(x),1)) {"
(the !! turns any non-zero x into 1, so it matches the expected value)
● You can wrap this in a nice macro like the Linux
kernel folk did.
● The compiler will treat the branch as likely to succeed:
it lays out the code to favor that path and, where the
CPU supports it, plants a hint.
● See example
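The example the slide refers to is not part of this transcript. A minimal sketch of the idea, assuming macros named `likely` and `unlikely` in the style of the Linux kernel and a hypothetical function `sum_buffer` made up for illustration:

```c
#include <stddef.h>

/* Wrapper macros in the style of the Linux kernel.
 * The !! normalizes any non-zero value to 1 so it can match the
 * expected value passed as the second argument. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical example: sums a buffer and treats a NULL pointer
 * as a rare error case that returns -1. */
long sum_buffer(const int *buf, size_t n)
{
    if (unlikely(buf == NULL))  /* tell GCC this path is rare */
        return -1;
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += buf[i];
    return total;
}
```

The hint does not change what the code computes. It only tells GCC which path to favor, so a wrong hint costs performance but not correctness.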
Second way - using profile
information
● You leave your code alone.
● You compile the code with -fprofile-arcs.
● Then you run it on a typical scenario creating files which
show which way branches went (auxname.gcda for each
file).
● Then you compile your code again with -fbranch-probabilities,
which uses the gathered data to plant the right hints.
● In GCC you must compile both phases with the same
optimization and code generation flags (otherwise expect problems)
● See example
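The example the slide refers to is not part of this transcript. The steps above can be sketched as follows, assuming a hypothetical source file `classify.c`, a hypothetical driver `main.c` that calls `count_errors` on typical input, and `-O2` as the shared optimization level. The commands appear as comments so that the block stays a single C file.

```c
/* classify.c (hypothetical file name)
 *
 * Step 1, instrumented build:
 *   gcc -O2 -fprofile-arcs classify.c main.c -o app
 * Step 2, run a typical scenario (writes the .gcda files on exit):
 *   ./app
 * Step 3, rebuild with the same flags plus the gathered profile:
 *   gcc -O2 -fbranch-probabilities classify.c main.c -o app
 */
#include <stddef.h>

/* Counts negative status codes. Nothing in the source says how often
 * the branch is taken; the profile run is what tells the compiler. */
size_t count_errors(const int *status, size_t n)
{
    size_t errors = 0;
    for (size_t i = 0; i < n; i++) {
        if (status[i] < 0)  /* bias is learned from the .gcda data */
            errors++;
    }
    return errors;
}
```

The code itself carries no hints. The quality of the result depends on how well the profiling run in step 2 represents real use.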
Second way - PGO
● This whole approach is a subset of a bigger
concept called PGO (Profile-Guided
Optimization)
● This includes the branch prediction we saw
before but also other types of optimization
(reordering of switch cases as an example).
● This is why you should use the more general
flags -fprofile-generate and -fprofile-use, which
imply the previous flags and add even more
profile-guided optimizations.
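A minimal sketch of the more general workflow, assuming a hypothetical file `dispatch.c` and a hypothetical driver `main.c` that calls `apply` with a realistic mix of operations. The `switch` is the kind of construct where a profile can show the compiler which cases are hot. The commands appear as comments so that the block stays a single C file.

```c
/* dispatch.c (hypothetical file name)
 *
 * Step 1, instrumented build:
 *   gcc -O2 -fprofile-generate dispatch.c main.c -o app
 * Step 2, run a typical workload:
 *   ./app
 * Step 3, optimized rebuild using the collected profile:
 *   gcc -O2 -fprofile-use dispatch.c main.c -o app
 */

/* Illustrative operation codes, not taken from the slides. */
enum op { OP_ADD, OP_SUB, OP_MUL, OP_NEG };

/* Applies one operation. Which case is the most frequent is not
 * visible in the source; the profile supplies that information. */
long apply(enum op op, long a, long b)
{
    switch (op) {
    case OP_ADD: return a + b;
    case OP_SUB: return a - b;
    case OP_MUL: return a * b;
    case OP_NEG: return -a;
    }
    return 0;  /* not reached for valid enum op values */
}
```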
