When evaluating code performance, the raw CPU instruction count is not the best metric, since the exact number of operations executed depends on the compiler, the CPU model, the architecture, and so on.
So we came up with a bunch of mathematical tools to describe the performance of an algorithm (the most popular being big-Oh notation), and in practice we simply report the running time to get a feel for how the implementation behaves (yes, big-Oh is not always that easy to apply).
I need to optimize a piece of code. This means a lot of reading and staring at the code looking for places to improve, but also a great deal of experimentation: I need to check whether idea A actually improves performance, and by how much. And I do this by measuring the time it takes to execute a piece of code.
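To be concrete, the measurement itself is nothing fancy. Here is a minimal sketch in C, where `work_under_test()` is a hypothetical stand-in for whatever code is being optimized:

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for the code being optimized. */
static void work_under_test(void) {
    volatile unsigned long sum = 0;
    for (unsigned long i = 0; i < 10000000UL; i++)
        sum += i;
}

int main(void) {
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    work_under_test();
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed = (end.tv_sec - start.tv_sec)
                   + (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("elapsed: %.6f s\n", elapsed);
    return 0;
}
```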
But this is quite imperfect: I am running on a multitasking system, so I cannot fully reproduce the conditions of the previous experiment. My browser might decide to spin up ten more threads that get in the way, or the CPU might throttle at a different point in time for various reasons.
So I usually need not only to wait for my program to run; I also tend to close my browser, make sure the IDE is not indexing or doing some other expensive operation, wait a few minutes for the CPU to cool down, and only then run my experiment. That is the only way I can get reliable timings.
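A standard trick that partially mitigates this (not part of my workflow above, just a common technique) is to repeat the measurement many times and report the minimum, on the theory that scheduler noise and throttling only ever add time. A sketch, reusing the hypothetical `work_under_test()`:

```c
#include <stdio.h>
#include <time.h>

static void work_under_test(void) {      /* hypothetical workload */
    volatile unsigned long sum = 0;
    for (unsigned long i = 0; i < 10000000UL; i++)
        sum += i;
}

static double time_once(void) {
    struct timespec a, b;
    clock_gettime(CLOCK_MONOTONIC, &a);
    work_under_test();
    clock_gettime(CLOCK_MONOTONIC, &b);
    return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}

int main(void) {
    double best = 1e30;
    /* Noise only adds time, so the minimum over many runs
       is the cleanest estimate of the true cost. */
    for (int i = 0; i < 30; i++) {
        double t = time_once();
        if (t < best) best = t;
    }
    printf("best of 30: %.6f s\n", best);
    return 0;
}
```

This reduces the noise but does not eliminate it, which is what leads to the question below.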
The question is: is there a way to count the number of operations executed on my CPU, for a specific process? This would (in theory) be an ideal way of comparing two versions of my code, unless there is some issue I am missing.
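For what it's worth, on Linux the hardware performance counters exposed through `perf_event_open(2)` do exactly this. Below is a minimal sketch counting retired instructions for a region of the current process, following the pattern from the man page; `work_under_test()` is again a hypothetical stand-in, and the call may require relaxing `/proc/sys/kernel/perf_event_paranoid` or elevated privileges:

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <string.h>
#include <stdint.h>
#include <stdio.h>

static void work_under_test(void) {      /* hypothetical workload */
    volatile unsigned long sum = 0;
    for (unsigned long i = 0; i < 10000000UL; i++)
        sum += i;
}

int main(void) {
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;  /* retired instructions */
    attr.disabled = 1;
    attr.exclude_kernel = 1;   /* count user-space work only */
    attr.exclude_hv = 1;

    /* pid = 0, cpu = -1: measure this process on any CPU. */
    int fd = syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
    if (fd == -1) { perror("perf_event_open"); return 1; }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    work_under_test();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    uint64_t count;
    read(fd, &count, sizeof(count));
    printf("instructions: %llu\n", (unsigned long long)count);

    close(fd);
    return 0;
}
```

The same counter is also available without writing any code, via `perf stat -e instructions ./my_program` (or `-p <pid>` for a running process), which is a quick way to check whether instruction counts are indeed more stable across runs than wall-clock timings.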