70

Many questions and answers on the C/C++ pages specifically or indirectly discuss micro performance issues (such as the overhead of an indirect vs direct vs inline function call), or using an O(N²) vs O(N log N) algorithm on a 100-item list.

I always code with no concern about micro performance, and little concern about macro performance, focusing on easy to maintain, reliable code, unless or until I know I have a problem.

My question is: why do so many programmers care so much? Is it really an issue for most developers, have I just been lucky enough not to have to worry too much about it, or am I a bad programmer?

8
  • 5
    +1, good general question.
    – iammilind
    Commented May 11, 2011 at 3:21
  • +1 good question..I added 2 tags.. hope you don't mind about that.
    – 0verbose
    Commented May 11, 2011 at 3:36
  • 2
    I heard two great quotes: 1) "Premature optimisation is the root of all evil." 2) 80% of your time will be spent on 20% of your code (the 80/20 rule). Commented May 11, 2011 at 7:00
  • 2
    I notice a couple of answers talk about my O(n*n) example. I explicitly specified a list of 100 items, yet they still insist that the O(nlogn) is better, explicitly stating performance improvements if the list, in the future, grows to thousands or millions. Is this micro-optimisation obsession because programmers are programming to possible future requirements rather than actual current requirements? (Where have I heard that before... )
    – mattnz
    Commented May 11, 2011 at 8:16
  • 5
    @James the full quote from Donald Knuth is "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil". There'll be some good answers about the remaining 3% in this thread.
    – StuperUser
    Commented May 11, 2011 at 12:49

19 Answers

14

In practice, performance is seldom an issue that needs to be managed at that level of detail. It's worth keeping an eye on the situation if you know you're going to be storing and manipulating huge amounts of data, but otherwise, you're right, and better off, keeping things simple.

One of the easiest traps to fall into -- especially in C and C++ where you have such fine-grained control -- is optimizing too early, and at too fine a level. In general the rule is: A) don't optimize until you find out you have a problem, and B) don't optimize anything that you haven't proven to be a problem area by using a profiler.

A corollary to B) is: programmers are notoriously bad at predicting where their performance bottlenecks are, even though, to a one, they think they're good at it. Use a profiler, and optimize the parts that are slow, or change algorithms if one section of code is being called way too many times, so that it's causing a problem.
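
For illustration, a minimal sketch of "prove it first": time the suspected hotspot before touching it (process_records below is a hypothetical stand-in; a real profiler such as gprof or perf gives you this breakdown per function without hand instrumentation).

    #include <chrono>
    #include <iostream>
    #include <numeric>
    #include <vector>

    // Hypothetical stand-in for the code you suspect is slow.
    long long process_records(const std::vector<int>& v) {
        return std::accumulate(v.begin(), v.end(), 0LL);
    }

    int main() {
        std::vector<int> data(1000000, 1);
        auto start = std::chrono::steady_clock::now();
        long long result = process_records(data);
        auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                      std::chrono::steady_clock::now() - start).count();
        std::cout << "result=" << result << " took " << ms << " ms\n";
        // Only if this number dominates the total run time is it worth optimizing.
    }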

8
  • 6
    Another one: initialization code that executes once doesn't generally need optimization, so look elsewhere. Commented May 11, 2011 at 3:27
  • 3
    Depends on how often "once" is. When running ./configure, I would venture to say up to 75% of the run time might be spent on "initialization" code in the programs the script runs. 25-50% might even be spent on dynamic linking. Commented May 11, 2011 at 3:47
  • 12
    Rule A is a terrible rule. The architecture of a system plays a role in performance, and if you find out later your architecture simply can't support your performance requirements you're basically screwed. So while you may be able to pass over fine details, completely ignoring this at the beginning is just plain wrong. Commented May 11, 2011 at 5:40
  • 3
    @edA-qa : I used to think so, but over the years have experienced many more projects struggle or fail before any performance considerations became a concern. Every time I have had performance concerns, the fix has been a comparatively low cost, days or a few weeks; no more of a concern than any other "bug" detected and fixed in development. However, just like any other risk item, performance concerns need to be identified and mitigated early in the project.
    – mattnz
    Commented May 11, 2011 at 8:22
  • 5
    The OP asked why so many care, and I fail to see how this response actually answered the question, unless the OP was more interested in hearing someone say "don't worry about it!".
    – red-dirt
    Commented May 11, 2011 at 17:33
55

I think everything on your list is micro-optimization, which should not generally be looked at, except for

using an O(N²) vs O(N log N) algorithm on a 100-item list

which I think should be looked at. Sure, that list is 100 items right now, and everything is fast for small n, but I'd be willing to bet that soon that same code is going to be reused for a list of several million items, and the code will still have to work reasonably.

Choosing the right algorithm is never a micro-optimization. You never know what kinds of data that same code are going to be used for two months or two years later. Unlike the "micro-optimizations" which are easy to apply with the guidance of a profiler, algorithm changes often require significant redesign to make effective use of the new algorithms. (E.g. some algorithms require that the input data be sorted already, which might force you to modify significant portions of your applications to ensure the data stays sorted)
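
As a small aside on why this particular choice is usually free (a sketch, not the question's actual code): reaching for the standard library's O(n log n) sort is no more effort than hand-rolling a quadratic one, so picking the better algorithm costs nothing even while the list is tiny.

    #include <algorithm>
    #include <iostream>
    #include <vector>

    int main() {
        std::vector<int> items = {42, 7, 19, 3, 25};   // today it's 5 or 100 items...
        std::sort(items.begin(), items.end());          // ...and this stays O(n log n) at a million
        for (int x : items) std::cout << x << ' ';
        std::cout << '\n';
    }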

12
  • 37
    +1 for "Choosing the right algorithm is never a micro-optimization."
    – 0verbose
    Commented May 11, 2011 at 3:41
  • 9
    I +1'd too, but note that choosing the big-O-optimal algorithm when your data sizes are sure to be small can be detrimental to development time, program size, and perhaps even memory usage. If you're sorting poker hands, do you really want to write a quicksort, smoothsort, or mergesort? I would start with simple insertion sort or use a sorting network. Commented May 11, 2011 at 3:52
  • 8
    That's funny. In a thread about micro-optimization, a lot of commentators micro-optimize the answers. ;)
    – Secure
    Commented May 11, 2011 at 7:57
  • 5
    "I'd be willing to bet soon that same code is going to be reused for a several million line list": That completely depends on the problem domain. Examples: if you're writing a chess algorithm, you can be reasonably sure the board size won't change. If you program an autonomous vehicle, the number of wheels won't grow that fast, either.
    – nikie
    Commented May 11, 2011 at 10:49
  • 3
    I dislike "choosing the right algorithm is never a micro-optimization" because it's OBVIOUSLY true, given the nature of the word "right". However, I feel like your implication is really "the fastest or most efficient" algorithm, which I disagree with. Choosing the most efficient algorithm is the wrong choice if it takes loads of time to implement and the speed or space of that segment hardly matters anyways. Commented Sep 2, 2011 at 23:20
18

A looooooong time ago, in my first job, I wrote code for embedded systems. These systems used 8086 microprocessors, and had limited memory. We used the Intel C compiler. One system I built needed to access a 3-d array of structures. I built it just like the book told me: call malloc for the 3 dimensions, then allocate rows for the next dimension, then calloc for the end nodes.

It was pretty complicated (for me at the time), I had to do curve fitting, ANOVA process control and Chi-squared analysis. There were no libraries that did this for us; we had to write it all and fit it all onto the 8086.

The system ran like a dog. After a quick profiling, I discovered that one of the biggest problems was the allocator. To fix the problem I abandoned all the calls to malloc and did my own memory management of one large block of memory.
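
Roughly, the replacement looked like the sketch below (dimensions and names are invented, and it's written as present-day C++ rather than the original Intel C): one flat block plus a computed index instead of an allocation per row.

    #include <cstdlib>

    // Hypothetical dimensions.
    enum { NX = 8, NY = 16, NZ = 32 };

    struct Node { double value; };

    int main() {
        // One allocation instead of one per row of each dimension.
        Node* block = static_cast<Node*>(std::calloc(NX * NY * NZ, sizeof(Node)));
        if (!block) return 1;

        // Element (x, y, z) lives at a computed offset in the flat block.
        auto at = [&](int x, int y, int z) -> Node& {
            return block[(x * NY + y) * NZ + z];
        };
        at(1, 2, 3).value = 42.0;

        std::free(block);
    }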


In another case on the same job, the customer was complaining about response time on their statistical process control system. The team before me had designed a "software PLC" system where operators could use boolean logic for combining signals and tripping switches. They wrote it in a simplified language, what we'd call a "domain specific language" today. As I recall it looked like ((A1 + B1) > 4) AND (C1 > C2) and so on.

The original design parsed and interpreted that string every time it was evaluated. On our measly processor, this consumed lots of time, and it meant that the process controller couldn't update as fast as the process was running.

I took a new look at it and decided that I could translate that logic into assembly code, at runtime. I parsed it once and then each time it ran, the app called into a dynamically generated function. Kind of like some viruses do today, I guess (but I don't really know). The result was a 100-fold increase in performance, which made the customer and my boss really really happy.

The new code was not nearly as maintainable, being that I had built a custom compiler. But the performance advantage well outweighed the maintenance disadvantage.
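
A much tamer sketch of the same parse-once idea (no runtime assembly generation here, just a reusable compiled form built once and evaluated many times; the rule and signal names are invented):

    #include <functional>
    #include <iostream>
    #include <map>
    #include <string>

    using Signals = std::map<std::string, double>;
    using Expr = std::function<bool(const Signals&)>;

    // Imagine the parser builds this once for "((A1 + B1) > 4) AND (C1 > C2)".
    Expr compile_rule() {
        return [](const Signals& s) {
            return (s.at("A1") + s.at("B1")) > 4.0 && s.at("C1") > s.at("C2");
        };
    }

    int main() {
        Expr rule = compile_rule();                  // parse/compile once
        Signals s = {{"A1", 3}, {"B1", 2}, {"C1", 5}, {"C2", 1}};
        for (int cycle = 0; cycle < 3; ++cycle)      // evaluate every scan cycle, no re-parsing
            std::cout << "trip=" << rule(s) << '\n';
    }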


More recently I was working on a system that needed to parse an XML file on the fly. Larger files would take considerably more time. This was very performance sensitive; too slow a parse would cause the UI to become completely unusable.

These kinds of things come up all the time.


So.... sometimes you want maintainable, easy-to-write code. Sometimes you want code that runs quickly. The tradeoff is the engineering decision you need to make, on each project.

2
  • 9
    In all your examples the cost of optimizing it afterwards wasn't much higher than writing the fast code from the beginning. So writing slower simpler code first and then optimizing where necessary worked well in all of them. Commented May 11, 2011 at 11:24
  • 7
    @CodeInChaos: The answer doesn't claim otherwise. It speaks to the OP's question "Why I should care about micro performance and efficiency?" Pre-optimization issues were merely inferred by the other answerers.
    – webbiedave
    Commented May 11, 2011 at 15:42
12

If you are processing large images and iterating over every pixel, then performance tweaking can be critical.
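
Even something as simple as a brightness adjustment touches every pixel, so small per-pixel costs multiply by millions per frame. A minimal sketch, assuming a plain 8-bit grayscale buffer:

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // Add 'delta' to every pixel of an 8-bit grayscale image, clamping to [0, 255].
    void brighten(std::vector<std::uint8_t>& pixels, int delta) {
        for (auto& p : pixels)
            p = static_cast<std::uint8_t>(std::clamp(int(p) + delta, 0, 255));
    }

    int main() {
        std::vector<std::uint8_t> image(1920 * 1080, 100); // ~2 million pixels
        brighten(image, 20);   // this inner loop runs once per pixel, per frame
    }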

3
  • 2
    +1 -- also, high frequency finance, any kind of audio/video encoder/decoder, simulations and modeling (e.g. games), systemwide bits like CPU schedulers and memory managers, etc. Commented May 11, 2011 at 3:45
  • 3
    CAN be critical, but IS only critical after you've proved it to be so and you've profiled it to be where you think the problem is. (Hint: it probably isn't there.) Commented May 11, 2011 at 5:37
  • 2
    @JUST MY correct OPINION: actually, for image processing, data processing is usually the second largest time consumer (I/O is still the biggest). However, optimizing for I/O requires a lot of unusual/crazy designs and their acceptance by fellow programmers, and sometimes it is outright impossible to improve. The processing part, however, is usually embarrassingly parallelizable, hence the benefits are easily reaped. (One's tweaking might be seen by another as a straight textbook implementation... unless you reach the level of VirtualDub)
    – rwong
    Commented May 11, 2011 at 6:24
12

Let me tell you a bit about the why behind the culture.

If you're closer to 40 than to 20, and you've been programming for a living through your adult years, then you came of age when C++ was really the only game in town, desktop apps were the norm, and hardware was still greatly lagging software in terms of bandwidth/performance capabilities.

  • We used to have to do stupid programming tricks to be able to read large (>2G) files...
  • We used to worry about executable size...
  • We used to worry about how much memory our programs were consuming...
  • We regularly made algorithmic time vs. space trade-off decisions...
  • Even on the back-end, we had to write CGI programs in C or C++ for anything to handle a decent number of requests per second... It was several orders of magnitude faster.
  • We used to run tests on the merits of performance between Delphi, C++ and VB!

Very few people have to worry about these things today.

However, 10 years ago you still had to worry about your software being downloaded over a 56k modem, and being run on a 5-year-old PC... Do you remember how crappy PCs were in 1996? Think in terms of a 4 GB hard drive, a 200 MHz processor, and 128 MB of RAM...

And the servers of 10 years ago? Dell's "next generation" server cost $2000, and came with 2 (!) 1 GHz Pentium processors, 2 GB of RAM, and a 20 GB hard drive.

It was simply a different ballgame, and all of those "senior" engineers that have 10 years of experience (the guys likely to be answering your questions), cut their teeth in that environment.

3
  • 1
    The additional 20 years of experience also means we've got the burn marks from having been through the optimization process many, many times and avoid doing things that may need it later. Same reason I don't smack my thumb (much) while using a hammer.
    – Blrfl
    Commented May 11, 2011 at 13:40
  • 1
    loop unwinding <shudder>
    – red-dirt
    Commented May 11, 2011 at 16:58
  • 5
    and today all the kids who thought bandwidth, CPU and memory were unlimited are finding their mobile applications don't work very well.
    – gbjbaanb
    Commented Jun 8, 2016 at 9:35
9

there's already 10 answers here and some are really good, but because this is a personal pet peeve of mine...

Premature optimization which a) takes way more time to do than a simple solution, b) introduces more code where the simple solution would've been half the size and half the complexity, and c) makes things less readable should ABSOLUTELY be avoided. However, if a developer has a choice between using a std::map or a std::vector and he chooses the wrong collection out of pure ignorance of performance, that is as bad if not worse than premature optimization. What if you could slightly change your code today, maintain readability, keep the same complexity, but make it more efficient: would you do it? Or would you call it "premature optimization"? I find that a lot of people wouldn't even give that any thought one way or another.
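
To make the container point concrete, here's a minimal sketch (keys and sizes invented): for a small, mostly-read lookup table, a sorted std::vector is as readable as a std::map and usually faster, so choosing it up front costs nothing.

    #include <algorithm>
    #include <iostream>
    #include <utility>
    #include <vector>

    int main() {
        // Small, mostly-read lookup table: a sorted vector of pairs is contiguous
        // and cache-friendly, with the same O(log n) lookup as std::map.
        std::vector<std::pair<int, const char*>> table = {
            {1, "one"}, {2, "two"}, {3, "three"}};
        std::sort(table.begin(), table.end());

        int key = 2;
        auto it = std::lower_bound(table.begin(), table.end(), key,
            [](const auto& entry, int k) { return entry.first < k; });
        if (it != table.end() && it->first == key)
            std::cout << it->second << '\n';
    }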

Once I was the guy who advised "micro-optimization" that required very little change and I was given the same response that you just said, "you shouldn't optimize too early. Let's just get it to work and we'll change it later if there is a performance problem". It took several releases before we fixed it. And yes it was a performance problem.

While early optimization may not be good, I think it is very beneficial if people write code with an understanding of what that code is going to do and don't simply disregard any question that involves O(x) notation as being "optimization". There's plenty of code you can write now that, with a little thought about performance, avoids 80% of the issues down the road.

Also consider that a lot of performance problems are not going to happen in your environment and not right away. Sometimes you'll have a customer that pushes the limit, or another developer will decide to build on top of your framework and increase the number of objects 10-fold. With some thought about performance now, you could avoid a very costly redesign later. And if the problem is found after the software is officially released, even a simple fix becomes 20 times more expensive to apply.

So in conclusion, keeping performance in mind at all times helps develop good habits, which are just as important to have as writing clean, simple, well-organized code.

6

I suspect that a lot of what you're seeing is simple sampling error. When people are dealing with straightforward situations, they write code and that's the end of things. They ask questions when they're dealing with something relatively tricky, such as needing to optimize, especially in a situation where it's not necessarily obvious that optimization would be needed.

That said, there's undoubtedly some premature optimization involved as well. Correctly or otherwise, C and C++ have a reputation for performance, which tends to attract people who care about performance -- including those who may do optimization as much for enjoyment as because it's really needed.

4
  • 1
    +1 -- Note: Most SO questions with the "performance" tag are probably part of that sampling error :P Commented May 11, 2011 at 3:44
  • 3
    I sure see a damn lot of premature-optimization questions on here... I think it comes from the fact that a lot of hobbyist programmers get started with the idea of writing games, and there's a huge corpus of nonsense "optimization" books and websites related to game development that put bad ideas in beginners' heads. :-) Commented May 11, 2011 at 3:45
  • 4
    When you're dealing with something tricky, often it seems easier to take a break from the tricky problem and fritter away your time worrying about whether you should use i++ or ++i Commented May 11, 2011 at 4:02
  • @Carson63000: yes that could totally skew the samples. Or they spend time answering questions about why my operator ++ didn't compile.
    – rwong
    Commented May 11, 2011 at 6:31
4

A couple of the other answers mention embedded systems, and I'd like to expand on this.

There are plenty of devices containing low-end processors, for example: the boiler controller in your house, or a simple pocket calculator, or the dozens of chips inside a modern car.

To save money, these may have quantities of flash (to store code) and RAM which seem tiny to those who've only written code for PCs or smartphones. To save power, they may run at relatively low clock rates.

To take an example, the STM32 family of microcontrollers goes from 24 MHz, 16 KB flash and 4 KB of RAM, up to 120 MHz, 1 MB flash and 128 KB RAM.

When writing code for chips like these, it saves a lot of time if you aim to make your code as efficient as possible as a matter of course. Obviously, premature optimisation remains a bad idea; but with practice, you learn how common problems can be solved quickly and/or with minimal resources, and code accordingly.
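
As a small illustration of the habits this breeds (the sizes and names below are invented): on a part with 4 KB of RAM you reach for fixed-size, statically allocated buffers rather than the heap, and you size them consciously.

    #include <cstddef>
    #include <cstdint>

    // Hypothetical sample buffer on a small microcontroller: statically allocated,
    // sized up front, no heap, no surprises at runtime.
    constexpr std::size_t kMaxSamples = 64;   // 64 * 2 bytes = 128 bytes of the 4 KB RAM

    static std::uint16_t g_samples[kMaxSamples];
    static std::size_t g_count = 0;

    bool push_sample(std::uint16_t s) {
        if (g_count >= kMaxSamples)
            return false;              // the caller decides what to do when full
        g_samples[g_count++] = s;
        return true;
    }

    int main() {
        for (std::uint16_t i = 0; i < 100; ++i)
            push_sample(i);            // the last 36 pushes are rejected, by design
        return g_count == kMaxSamples ? 0 : 1;
    }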

1
  • 1
    good points to consider for embedded systems, a field I work in myself. Even with that in mind, my experience over the years is that misguided optimization is always a waste of time. Without tools to guide us, we rarely find the problem areas.
    – Jeff
    Commented May 11, 2011 at 12:51
2

These being essentially low-level languages, when one runs into a pathological performance case where one detail that wouldn't matter 99% of the time is causing the bottleneck, one actually has the opportunity to directly work around the issue (unlike with most other languages); but of course, often, how to do so most effectively is not immediately apparent. Hence half of the weird/interesting micro-optimization questions asked here.

The other half just comes from those curious about how close they can get to the metal. These being essentially low-level languages, after all...

1
  • +1: worth pointing out that the "pathological performance" could happen to anyone in the world, regardless of language or platform. The ability to re-implement in a lower-level language for testing and read disassembly may provide more insights, but doesn't always provide a workable solution. Example: "I know I can do it in assembly - but it needs to run in partial-trust environment!"
    – rwong
    Commented May 11, 2011 at 6:39
2

Performance is always a hot topic when you're dealing with C and C++. Regarding how far one should go, you can always go crazy to the point of inlining ASM, or using pointer arithmetic for faster iteration. However, there comes a point where one spends so much time optimizing that work on developing the overall program comes to a halt.

When dealing with these issues, there's programmer performance and code performance. Which of these to focus on will always bring up interesting questions. In the end the most important question is how noticeable it is to the user. Will the user be working with data that creates arrays with hundreds or thousands of elements? In this case coding for getting things done quickly might have your user complaining that the program's standard operations are slow.

Then there's the user who will be working with small amounts of data. A few files here and there, where doing things like sorting and file operations won't be as noticeable to the user if you're using higher level functions that make things easier for you to maintain at the cost of some performance.

This is just a small example of the issues you'll run into. Other matters include the target user's hardware. You're going to have to worry about performance a lot more if you deal with embedded systems, than if your users have, say, dual-core machines with gigs of RAM.

2
  • Hmm.. I don't use pointer arithmetic for faster iteration -- it's a multiply and add instruction per loop whether you're using index based or pointer based iteration. I do use it though, because it's usually more clear than index-based iteration. Commented May 11, 2011 at 3:28
  • pointer arithmetic is no faster than w/e.
    – Joel Falcou
    Commented May 11, 2011 at 3:37
2

Why do programmers care so much? There are silly ideas populating their heads, such as solving performance problems before they know they have them, and not understanding when they are guessing.

It's tricky because, in my experience, there are some performance issues one should think about ahead of time. It takes experience to know what they are.

That said, the method I use is similar to yours, but not the same:

  1. Start with the simplest possible design. In particular, the data structure should be as normalized and minimal as possible. To the extent it has unavoidable redundancy, one should be shy of notifications as a way to keep it consistent. It is better to tolerate temporary inconsistency, and repair it with a periodic process.

  2. When the program is under development, do performance tuning periodically, because performance problems have a way of quietly creeping in. The method I use is random-pausing, because I think it's the best.

Here's a blow-by-blow example of what I mean.


To be honest, it depends on what's your aim and whether you are programming professionally or as a hobby.

Nowadays, modern computers are really powerful machines. Regardless of what basic operations you decide to do, whether you are attempting to micro-optimize or not, they can do their job remarkably fast. But of course, if you are doing something more demanding (for example, supercomputing for fields like physics or chemistry), you may need to optimize as much as you can.

The early MIT programmers didn't start out making awesome stuff; they started by simplifying and speeding up existing algorithms. Their pride was to make 2 + 2 give four two seconds faster than the existing algorithm (that's just an example, you get the idea). They constantly tried to use fewer punch cards in their machines for the sake of performance.

Also, if you are programming for embedded systems, then you certainly have to keep an eye on micro performance. You don't want a sluggish digital clock that ticks each second 5 nanoseconds later than another digital clock.

Finally, if you are a hobbyist programmer there is certainly no harm in optimizing the smallest details even though your program is already fast. It's not needed, but it's certainly something you can work on to take the chance to learn more. If you are working professionally on a piece of software, you can't take that luxury unless it is extremely needed.

3
  • 1
    I don't think being a hobbyist programmer has anything to do with this. Just because you're not doing something professionally doesn't necessarily mean you have all the time in the world to spend on it. Moreover, most hobbyists are going to make bigger mistakes, such as choosing the wrong algorithms, than most true professionals will make. Moreover, the professional is probably working on a product which processes significantly more data than the hobbyist (which therefore must be faster), and has to keep customers happy with the app's performance. Hobbyists have no such constraints. Commented May 11, 2011 at 3:41
  • They don't, but they certainly have more time to work on them merely if they want to.
    – Sergio
    Commented May 11, 2011 at 3:45
  • 3
    I would argue the opposite. I've got 8 hours or more a day to work on something as a professional. I get 1, maybe 2 hours per day for my hobby projects. Commented May 11, 2011 at 3:47
1

using an O(N²) vs O(N log N) algorithm on a 100-item list.

I was in a similar situation recently. I had an array of items. In the expected case, there were two (!) items in the list, and even in the worst case I don’t expect more than four or maybe eight.

I needed to sort that list. Turns out, replacing std::sort with a sorting network (essentially a lot of nested ifs) shaved off a large percentage of the running time (I don’t remember the number but it was something like 10–20%). This is a huge benefit of a micro-optimisation, and the code is absolutely performance-critical.

Of course, I only did this after profiling. But the point is, if I use a language that is as inconvenient and convoluted as C++ (not to mention its infuriatingly complex rules for overload resolution), then I want to reap full benefits.
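
For anyone curious what a sorting network looks like, here is a sketch for exactly four elements (my real code and the measured numbers were of course different):

    #include <algorithm>
    #include <array>
    #include <iostream>

    // A 5-comparator sorting network for exactly 4 elements.
    void sort4(std::array<int, 4>& a) {
        auto cswap = [&](int i, int j) { if (a[j] < a[i]) std::swap(a[i], a[j]); };
        cswap(0, 1); cswap(2, 3);   // sort each pair
        cswap(0, 2); cswap(1, 3);   // merge across the pairs
        cswap(1, 2);                // fix the middle elements
    }

    int main() {
        std::array<int, 4> hand = {9, 2, 7, 4};
        sort4(hand);
        for (int x : hand) std::cout << x << ' ';   // prints: 2 4 7 9
        std::cout << '\n';
    }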


Cumulative energy use

There's one answer that I always think is missing from these discussion and which bothers me a bit - cumulative energy usage.

Sure, maybe it does not matter much if you write your program in a high level interpreted language, and let it run in a browser with a couple of layers of indirection, or if your loop takes 0.01 seconds instead of 0.001 seconds. No one will notice, that is, no individual user will notice.

But when tens of thousands, or even millions of users in some cases use your code, all that extra inefficiency adds up. If your tool prevents a CPU from entering the sleep state for just ten seconds per day, and a million users use it, your inefficient algorithm just used up an extra 140 kWh[1] per day.

I rarely see this discussed, and I think that's sad. I strongly suspect that the figures are far worse for popular frameworks, like Firefox, and fancy interactive web applications, and it would be interesting to research.


[1] I just made that up: 1,000,000 users × 10 s = 10 million CPU-seconds per day, times 50 Watts ≈ 5×10⁸ J ≈ 139 kWh. The exact figure depends on many things.

1
  • 1
    You should start right off by mentioning the magic word "mobile". When running on a desktop platform, an application which takes 1/100sec of CPU time to draw a frame 60 times a second will be "fast enough"; improving the performance tenfold would make zero difference to the user. On a mobile platform, however, an application which runs at 90% CPU utilization may guzzle batteries much faster than running at 10%.
    – supercat
    Commented Jun 9, 2016 at 22:38
1

Sometimes you just have algorithms that can't be better than linear time for which there's still a strong performance demand.

An example is video processing where you can't make an image/frame brighter as a basic example without looping through every pixel (well, I suppose you can with some kind of hierarchical structure indicating properties inherited by children which ultimately descend down into image tiles for leaf nodes, but then you'd defer a higher cost of looping through every pixel to the renderer and the code would probably be harder to maintain than even the most micro-optimized image filter).

There are lots of cases like that in my field. I tend to be doing more linear-complexity loops that have to touch everything or read everything than ones that benefit from any kind of sophisticated data structure or algorithm. There's no work that can be skipped when everything has to be touched. So at that point, if you're inevitably dealing with linear complexity, you have to make the work done per iteration cheaper and cheaper.

So in my case the most important and common optimizations are often data representations and memory layouts, multithreading, and SIMD (typically in this order with data representation being the most important, as it affects the ability to do the latter two). I'm not running into so many problems that get solved by trees, hash tables, sorting algorithms, and things of that sort. My daily code is more in the vein of, "for each thing, do something."
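
A tiny sketch of what "data representation" means in practice (the field names are invented): switching from an array of structs to a struct of arrays so the hot loop only touches the data it actually needs.

    #include <cstddef>
    #include <vector>

    // Array-of-structs: iterating positions also drags the cold fields through the cache.
    struct ParticleAoS { float x, y, z; float mass; int id; };

    // Struct-of-arrays: the hot loop reads contiguous x/y/z only.
    struct ParticlesSoA {
        std::vector<float> x, y, z;
        std::vector<float> mass;
        std::vector<int> id;
    };

    void advance(ParticlesSoA& p, float dx) {
        for (std::size_t i = 0; i < p.x.size(); ++i)
            p.x[i] += dx;   // contiguous reads/writes, trivially SIMD-friendly
    }

    int main() {
        ParticlesSoA p;
        p.x.assign(1000000, 0.f);
        p.y.assign(1000000, 0.f);
        p.z.assign(1000000, 0.f);
        advance(p, 0.5f);
    }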

Of course it's another case to talk about when optimizations are necessary (and more importantly, when they aren't), micro or algorithmic. But in my particular case, if a critical execution path needs optimization, the 10x+ speed gains are often achieved by micro-level optimizations like multithreading, SIMD, and rearranging memory layouts and access patterns for improved locality of reference. It's not so often that I get to, say, replace a bubble sort with an introsort or a radix sort or quadratic-complexity collision detection with a BVH so much as find hotspots that, say, benefit from hot/cold field splitting.

Now in my case my field is so performance-critical (raytracing, physics engines, etc.) that a slow but perfectly correct raytracer that takes 10 hours to render an image is often considered as useless as, or more useless than, a fast one which is completely interactive but outputs the ugliest images with rays leaking everywhere due to the lack of watertight ray/tri intersection. Speed is arguably the primary quality metric of such software, even more than correctness up to a point (since "correctness" is a fuzzy idea with raytracing, as everything is an approximation, so long as it's not crashing or anything like that). And when that's the case, if I don't think about efficiency upfront, I find I have to actually change the code at the most expensive design level to handle more efficient designs. So if I don't think sufficiently about efficiency upfront when designing something like a raytracer or physics engine, chances are that I might have to rewrite the entire damned thing before it can be considered useful enough in production and by the actual users, not by the engineers.

Gaming is another field similar to mine. Doesn't matter how correct your game logic is or how maintainable and brilliantly engineered your codebase is if your game runs at 1 frame per second like a slideshow. In certain fields the lack of speed could actually render the application useless to its users. Unlike games, there's no "good enough" metric in areas like raytracing. The users always want more speed, and the industrial competition is predominantly in seeking faster solutions. It'll never be good enough until it's real-time, at which point games will be using path tracers. And then it probably still won't be good enough for VFX, since then the artists might want to load billions of polygons and have particle simulations with self-collision among billions of particles at 30+ FPS.

Now if it's of any comfort, in spite of that I still write around 90% of the code in a scripting language (Lua) with no concerns about performance whatsoever. But I have an unusually large amount of code that does actually need to loop through millions to billions of things, and when you're looping through millions to billions of things, you do start to notice an epic difference between naive single-threaded code that invokes a cache miss with every iteration vs. say, vectorized code running in parallel accessing contiguous blocks where no irrelevant data is loaded into a cache line.

0

As you mentioned, caring about micro performance issues is worthless until you run into some problems that are really caused by those issues.

0

It is really impossible to answer this question in general. Most software being built today is internal web sites and LOB applications, and for that kind of programming your reasoning is quite correct. On the other hand, if you are writing something like a device driver or a game engine, no optimization is "premature"; chances are your software will run on very different systems with different hardware constraints. In that case you should design for performance and make sure you don't pick a sub-optimal algorithm.

1
  • Exactly what I wanted to say. Every piece of software has its application domain and should not be expected to behave optimally outside of it. In this sense, premature optimization is an example of misguided perfectionism.
    – K.Steff
    Commented Jun 29, 2012 at 1:36
0

I think the problem with the programmer who cares so much about performance is that at some point in his life he needed to write micro-performant code, maybe very urgently, and he learned and learned, and in the end he knew a lot of things and tricks.

And now it is hard to forget, and without a prior measurement showing that he doesn't need to worry, he stays on the safe side, using fast code.

It is always nice to show your deep knowledge, your skills and some tricks, and to reuse something you have learned. It makes you feel valuable, and makes the time spent on learning feel worth it.

At some point in my life, I learned that prefix increment is faster ...

for (int i = 0; i < MAX; ++i)

... than postfix increment:

for (int i = 0; i < MAX; i++)

Now if MAX is low, it will not matter, and if there is real work in the loop, it will not matter either. But there isn't a reason to use the postfix version, even if today's compilers optimize the code on their own.
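
Where the distinction can actually be measurable is with non-trivial iterator types, where the postfix form has to return a copy of its old value. A sketch (with optimizations on, most compilers remove the difference even here):

    #include <iostream>
    #include <list>

    int main() {
        std::list<int> values = {1, 2, 3};

        // Postfix it++ conceptually copies the iterator just to throw the copy away;
        // prefix ++it simply advances in place, so it can never be slower.
        int sum = 0;
        for (auto it = values.begin(); it != values.end(); ++it)
            sum += *it;

        std::cout << sum << '\n';
    }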

Maybe the performance seekers need an additional goal besides writing 'working code', like 'working and readable code', to have a guideline in the big sea of options.

0

have I just been lucky enough to not to have to worry too much about it, or am I a bad programmer?

Do you care about your requirements? If performance isn't a requirement then don't worry about it. Spending any significant time on it is a disservice to your employer.

To an extent performance is always a requirement. If you can hit it without thinking about it, you are justified in not thinking about it.

Personally, I'm most often driven by performance when my tests take too long to pass. I'm too impatient to wait 5 minutes while a set of tests pass. But that's usually solved by fiddling with the tests.

My question is why is it that a large number of programmers care so much? Is it really an issue for most developers,

There are large numbers of programmers who are justified in how much they care. There are large numbers who aren't. Let's talk about those who aren't.

One of the first things programmers learn in school, after how to make things actually work, is big O notation. Many of them learn the lesson properly and thus properly focus on things dramatically impacted by n. Others don't get the math and only take away the lesson that once it works it needs to be fast. Worse, some of these students never learn anything else about what's important to do with your code besides make it work and make it work fast. The missed lessons: make it readable, design it well, don't play around in it for no reason.

Knuth was right: premature optimization is the root of all evil. But once it works what is the next step? Fast right? NO! The next step is readable. Readable is the first, next, middle, and last step. Many of the people I find doing unneeded performance optimizations are throwing readability under the bus.

Some even get a perverse thrill from how unreadable their code is. They've had to suffer looking at hard to understand code created by others so now it's their turn at payback.

I know this because I used to do this. I once refactored a perfectly readable 5 line if structure down to an indecipherable one line boolean expression and proudly sent it to my professor expecting to impress since I could create something so compact and intimidating. I didn't get the praise I was hoping for.

If code stays readable making it fast later is easy. That's why Knuth emphasizes "premature" not "unneeded". Because sure, faster is better. But better is only better depending on what you sacrifice for it. So wait until you know what performance you really need before you make sacrifices for it. Sacrifice readability reluctantly because once it's gone, it's hard to get back.

Beyond readability is the whole world of software design. What this site is about. Some have no clue what to do as far as design. So since they can't impress with design they make an indecipherable mess so people can't tell they have no clue. Since no one ever fixes their code it must be good code right?

For some, performance is the catch all excuse to do whatever they want. Programmers have a lot of power and autonomy. Trust has been put in them. Don't abuse the trust.
