Big function (several thousand lines) to avoid function call overhead
This completely neglects the distinction between common-case and rare-case branches of code, and you won't get far in optimization if you ignore it. There are also factors working against one giant function, like the increased cost of more distant jumps and icache misses.
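As one way to respect that difference, you can keep the rare path out of line so the hot loop stays dense in the instruction cache. A minimal sketch (the names are illustrative, not from the post; `noinline` spelling shown for GCC/Clang):

```c++
#include <cstdio>

// Cold path: deliberately not inlined so its code doesn't share cache
// lines with the hot loop. (GCC/Clang spelling; MSVC uses __declspec(noinline).)
__attribute__((noinline)) static void handle_rare_error(int code)
{
    std::fprintf(stderr, "error: %d\n", code);
}

void process(const int* data, int n)
{
    for (int i = 0; i < n; ++i)
    {
        if (data[i] < 0)               // rare case: one cheap, predictable test
            handle_rare_error(data[i]);
        // common case: the real work stays here, inline and compact
    }
}
```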
There is an unusual and not-so-commonly-cited case where inlining a somewhat large function can improve performance, but it has to do with constant propagation and dead code elimination more than with call overhead or its interference with register allocation. Take an example like this:
```c++
// Returns true if the ray intersects the triangle. If 'cull' is true,
// backfaces are ignored.
bool ray_tri_intersect(ray r, tri t, bool cull)
{
    ...
    if (cull)
        ...
    else
        ...
    ...
    if (cull)
        ...
    else
        ...
    ...
    return result;
}
```
In those cases, I've found that calling the function like this:
```c++
if (ray_tri_intersect(r, t, false)) { ... }
```
... will often fail to eliminate the branching overhead of the `if` statements within the function. That's a bit unfortunate, and there are speed-ups to be gained if we just brute-force inline the function, but ideally we only need three versions of it: one with `cull` as a compile-time constant `true`, another with compile-time `false`, and another where `cull` can only be deduced at runtime (an l-value). Inlining every single call to the function is rather brutish and bloated in terms of code generation, but it may be the best practical option we have in C in response to a hotspot.
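In C++ specifically, one way to get those three versions without hand-inlining is to make `cull` a template parameter, so each instantiation folds its branches at compile time. A rough sketch with placeholder types (the intersection math is elided, as in the original):

```c++
struct ray { /* ... */ };
struct tri { /* ... */ };

template <bool Cull>
bool ray_tri_intersect(ray r, tri t)
{
    bool result = false;
    // ...
    if (Cull)   // compile-time constant: the untaken branch is dead code
    {
        // backface-culling variant
    }
    else
    {
        // two-sided variant
    }
    // ...
    return result;
}

// The runtime-'cull' version dispatches once at the boundary:
bool ray_tri_intersect(ray r, tri t, bool cull)
{
    return cull ? ray_tri_intersect<true>(r, t)
                : ray_tri_intersect<false>(r, t);
}
```

In C you'd get the same effect less elegantly with a macro that stamps out the two specialized function bodies.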
No SRP to avoid Object overhead
The only practical overhead to an object in a language like C++ is what the programmer introduces. Naturally there will be some if you introduce virtual functions, in the form of a `vptr` and virtual dispatch, but there is no overhead to an object that you don't add yourself, as there can be in Java or C#. That said, I have written many posts on here about trapping ourselves in performance-critical cases by designing our objects too granularly, like trying to represent a pixel of an image as an object, or a particle of a particle system as an object. That has nothing to do with compilers or optimizers; it's human design. If you have a system with a boatload of dependencies on a teeny `Pixel` object, there is no breathing room to change that to, say, loopy SIMD algorithms without extremely invasive changes to the codebase. For very loopy performance-critical stuff, it does help to avoid dependencies on teeny objects storing very little data, but not because of some "object overhead". It's a human overhead of trapping yourself in a design that leaves little room for bulky, meaty optimizations. You don't leave much room to optimize an `Image` operation if the majority of your code depends on working with a `Pixel` object. You'll work yourself toward needing to rewrite your entire engine if you have so many dependencies on teeny objects that they interfere with your ability to make broad, meaningful performance improvements.
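To make the trap concrete, here's a hedged sketch contrasting the two designs; `Pixel` and `Image` are illustrative names, not from any particular codebase:

```c++
#include <cstdint>
#include <vector>

// Granular design: the whole codebase talks to a teeny object...
struct Pixel
{
    std::uint8_t r, g, b, a;
};

// ...versus a bulky design: the operation belongs to the whole image,
// leaving room for SIMD, tiling, or threading later without touching callers.
struct Image
{
    std::vector<std::uint8_t> data;   // contiguous, interleaved channels
    int width = 0, height = 0;

    void brighten(std::uint8_t amount)
    {
        // One flat loop over contiguous bytes: easy for the optimizer to
        // auto-vectorize, and easy for us to swap for explicit SIMD.
        // (Sketch brightens every channel, alpha included, for brevity.)
        for (std::uint8_t& c : data)
            c = static_cast<std::uint8_t>(c + amount < 255 ? c + amount : 255);
    }
};
```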
Reuse variable as much as possible instead of having scoped variables
Variables don't take any memory or resources. I keep repeating myself here like a broken record, but variables don't require memory; operations require memory. If you compute `x+y+z`, the computer can only perform scalar additions (excluding SIMD) on two operands at a time, so it does `x+y` first and has to remember the sum somewhere to add `z` to it. That intermediate result is what takes memory, and I've found this to be widely misconceived. Variables don't take memory. Variables are just human-friendly handles to the memory that results from operations like these. Understanding this is key to really understanding how our compilers and optimizers work, so that we can better understand the results from our profilers.
For people interested in this topic, I recommend starting with Static Single Assignment (SSA) form and the optimizations compilers make based on it.
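As a concrete illustration, the two functions below should compile to identical machine code on any reasonable optimizing compiler (easy to verify yourself at -O2):

```c++
// Three named, scoped variables...
int sum_scoped(int x, int y, int z)
{
    int xy = x + y;     // a name for the intermediate result
    int xyz = xy + z;   // a name for the final result
    return xyz;
}

// ...versus one variable reused. Under SSA both become the same two
// additions on the same values; the names never reach the machine code.
int sum_reused(int x, int y, int z)
{
    int r = x + y;
    r += z;
    return r;
}
```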
That said, there are some cases where reusing an object can net a performance improvement. For example, in C++, if you use a `std::vector` to store small, temporary results in each iteration of a loop with a million iterations, you can see substantial performance improvements by hoisting the `vector` out of the loop, clearing it, and reusing it across iterations. That's because defining a `vector` is much more than a variable declaration: it involves constructing the vector, and subsequent `push_back`s, `resize`s, and the like will involve heap allocations and possibly a linear-time copy construction pass (or something akin to a `memcpy` with trivially copy-constructible types).
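A minimal sketch of that hoist-and-reuse pattern (the loop body is a placeholder):

```c++
#include <vector>

void process_all(int iterations)
{
    std::vector<int> temp;    // constructed once, capacity amortized
    for (int i = 0; i < iterations; ++i)
    {
        temp.clear();         // keeps the grown capacity: no reallocation
        // ... fill 'temp' with this iteration's small results ...
        // ... consume 'temp' ...
    }
}
// versus declaring the vector inside the loop body, which runs the
// destructor every iteration and forfeits the capacity it grew.
```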
This overhead shrinks toward zilch if you use a `tiny_vector` or `small_vector` implementation, which uses a small buffer optimization, in loops where you don't exceed the size of the small buffer. Unfortunately, the standard library doesn't offer an SBO for `std::vector` (although, oddly enough, implementations prioritized giving `std::string` one after C++11). But forget the idea that variables have overhead. They have none. It's the operations involved in constructing the vector and inserting elements into it which have an overhead in this case.
Operations require space to store results. Variables require none.
They are just programmer-friendly handles to the results.
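As an aside, if you want the small buffer optimization off the shelf, `boost::container::small_vector` is one real option (LLVM's `SmallVector` is another); a sketch assuming Boost is available:

```c++
#include <boost/container/small_vector.hpp>

void process_all(int iterations)
{
    // Up to 16 elements live in an internal buffer: iterations that stay
    // within it touch the heap zero times.
    boost::container::small_vector<int, 16> temp;
    for (int i = 0; i < iterations; ++i)
    {
        temp.clear();
        // ... fill and consume 'temp' as before ...
    }
}
```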
++i/i++
A quick glance at the disassembly should show no difference here unless you're incrementing complex objects, and by "complex" I mean something beyond a simple random-access iterator (maybe something which allocates memory per increment that the compiler failed to optimize away). 99.9% of the time, there should be no difference for most people. That said, I have never understood stubborn C++ programmers who refuse to use prefix notation, favoring postfix in places where it makes no difference, when the prefix notation is guaranteed to be as fast or faster. Still, the stubborn ones are very rarely causing inefficiencies with their stubbornness (at least from a runtime standpoint; maybe we could still save the compiler some extra work and shave build times a little by favoring `++i` across the board, especially where UDTs are involved).
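For the curious, the case where it can matter is a user-defined type, since canonical postfix must materialize a copy of the old value. A sketch:

```c++
struct Iter
{
    int* p;

    Iter& operator++()       // prefix: bump and return *this; no copy
    {
        ++p;
        return *this;
    }

    Iter operator++(int)     // postfix: must hand back the old value
    {
        Iter old = *this;    // the copy; trivial here, maybe not for a
        ++p;                 // heavyweight iterator with real state
        return old;
    }
};
```

For a trivial iterator like this the copy is optimized away; the guideline only bites when the copied state is expensive and the result is actually discarded.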
My question: are those practices myths or reality? In the context of an application with heavy performance criteria (like high-frequency trading software), could those practices supersede "good development" practices?
If you make a decent profiler your best friend, you'll see first-hand a lot of cases where the general rules of thumb are mostly right and where they're mostly misleading. I recommend that: make a profiler your best friend if you haven't already. It is definitely the case that a lot of things passed off as "general wisdom" are misleading, while some are genuinely good advice. The key to thinking critically and telling the difference comes from measuring with a good tool that breaks things down for you in detail.