110

I recently interviewed at Amazon. During a coding session, the interviewer asked why I declared a variable in a method. I explained my process and he challenged me to solve the same problem with fewer variables. For example (this wasn't from the interview), I started with Method A then improved it to Method B, by removing int s. He was pleased and said this would reduce memory usage by this method.

I understand the logic behind it, but my question is:

When is it appropriate to use Method A vs. Method B, and vice versa?

You can see that Method A is going to have higher memory usage, since int s is declared, but it only has to perform one calculation, i.e. a + b. On the other hand, Method B has lower memory usage, but has to perform two calculations, i.e. a + b twice. When do I use one technique over the other? Or, is one of the techniques always preferred over the other? What are things to consider when evaluating the two methods?

Method A:

private bool IsSumInRange(int a, int b)
{
    int s = a + b;

    if (s > 1000 || s < -1000) return false;
    else return true;
}

Method B:

private bool IsSumInRange(int a, int b)
{
    if (a + b > 1000 || a + b < -1000) return false;
    else return true;
}
27
  • 232
    I'm willing to bet that a modern compiler will generate the same assembly for both of those cases.
    – 17 of 26
    Commented Sep 4, 2018 at 18:25
  • 13
    I rollbacked the question to the original state, since your edit invalidated my answer - please don't do that! If you ask a question how to improve your code, then don't change the question by improving the code in the shown way - this makes the answers look meaningless.
    – Doc Brown
    Commented Sep 4, 2018 at 19:40
  • 77
    Wait a second, they asked to get rid of int s while being totally fine with those magic numbers for upper and lower bounds?
    – null
    Commented Sep 4, 2018 at 20:52
  • 34
    Remember: profile before optimizing. With modern compilers, Method A and Method B may be optimized to the same code (using higher optimization levels). Also, with modern processors, they could have instructions that perform more than addition in a single operation. Commented Sep 4, 2018 at 22:07
  • 142
    Neither; optimize for readability.
    – Andy
    Commented Sep 4, 2018 at 23:19

15 Answers 15

146

Instead of speculating about what may or may not happen, let's just look, shall we? I'll have to use C++ since I don't have a C# compiler handy (though see the C# example from VisualMelon), but I'm sure the same principles apply regardless.

We'll include the two alternatives you encountered in the interview. We'll also include a version that uses abs as suggested by some of the answers.

#include <cstdlib>

bool IsSumInRangeWithVar(int a, int b)
{
    int s = a + b;

    if (s > 1000 || s < -1000) return false;
    else return true;
}

bool IsSumInRangeWithoutVar(int a, int b)
{
    if (a + b > 1000 || a + b < -1000) return false;
    else return true;
}

bool IsSumInRangeSuperOptimized(int a, int b) {
    return (abs(a + b) < 1000);
}

Now compile it with no optimization whatsoever: g++ -c -o test.o test.cpp

Now we can see precisely what this generates: objdump -d test.o

0000000000000000 <_Z19IsSumInRangeWithVarii>:
   0:   55                      push   %rbp              # begin a call frame
   1:   48 89 e5                mov    %rsp,%rbp
   4:   89 7d ec                mov    %edi,-0x14(%rbp)  # save first argument (a) on stack
   7:   89 75 e8                mov    %esi,-0x18(%rbp)  # save b on stack
   a:   8b 55 ec                mov    -0x14(%rbp),%edx  # load a and b into edx
   d:   8b 45 e8                mov    -0x18(%rbp),%eax  # load b into eax
  10:   01 d0                   add    %edx,%eax         # add a and b
  12:   89 45 fc                mov    %eax,-0x4(%rbp)   # save result as s on stack
  15:   81 7d fc e8 03 00 00    cmpl   $0x3e8,-0x4(%rbp) # compare s to 1000
  1c:   7f 09                   jg     27                # jump to 27 if it's greater
  1e:   81 7d fc 18 fc ff ff    cmpl   $0xfffffc18,-0x4(%rbp) # compare s to -1000
  25:   7d 07                   jge    2e                # jump to 2e if it's greater or equal
  27:   b8 00 00 00 00          mov    $0x0,%eax         # put 0 (false) in eax, which will be the return value
  2c:   eb 05                   jmp    33 <_Z19IsSumInRangeWithVarii+0x33>
  2e:   b8 01 00 00 00          mov    $0x1,%eax         # put 1 (true) in eax
  33:   5d                      pop    %rbp
  34:   c3                      retq

0000000000000035 <_Z22IsSumInRangeWithoutVarii>:
  35:   55                      push   %rbp
  36:   48 89 e5                mov    %rsp,%rbp
  39:   89 7d fc                mov    %edi,-0x4(%rbp)
  3c:   89 75 f8                mov    %esi,-0x8(%rbp)
  3f:   8b 55 fc                mov    -0x4(%rbp),%edx
  42:   8b 45 f8                mov    -0x8(%rbp),%eax  # same as before
  45:   01 d0                   add    %edx,%eax
  # note: unlike other implementation, result is not saved
  47:   3d e8 03 00 00          cmp    $0x3e8,%eax      # compare to 1000
  4c:   7f 0f                   jg     5d <_Z22IsSumInRangeWithoutVarii+0x28>
  4e:   8b 55 fc                mov    -0x4(%rbp),%edx  # since s wasn't saved, load a and b from the stack again
  51:   8b 45 f8                mov    -0x8(%rbp),%eax
  54:   01 d0                   add    %edx,%eax
  56:   3d 18 fc ff ff          cmp    $0xfffffc18,%eax # compare to -1000
  5b:   7d 07                   jge    64 <_Z22IsSumInRangeWithoutVarii+0x2f>
  5d:   b8 00 00 00 00          mov    $0x0,%eax
  62:   eb 05                   jmp    69 <_Z22IsSumInRangeWithoutVarii+0x34>
  64:   b8 01 00 00 00          mov    $0x1,%eax
  69:   5d                      pop    %rbp
  6a:   c3                      retq

000000000000006b <_Z26IsSumInRangeSuperOptimizedii>:
  6b:   55                      push   %rbp
  6c:   48 89 e5                mov    %rsp,%rbp
  6f:   89 7d fc                mov    %edi,-0x4(%rbp)
  72:   89 75 f8                mov    %esi,-0x8(%rbp)
  75:   8b 55 fc                mov    -0x4(%rbp),%edx
  78:   8b 45 f8                mov    -0x8(%rbp),%eax
  7b:   01 d0                   add    %edx,%eax
  7d:   3d 18 fc ff ff          cmp    $0xfffffc18,%eax
  82:   7c 16                   jl     9a <_Z26IsSumInRangeSuperOptimizedii+0x2f>
  84:   8b 55 fc                mov    -0x4(%rbp),%edx
  87:   8b 45 f8                mov    -0x8(%rbp),%eax
  8a:   01 d0                   add    %edx,%eax
  8c:   3d e8 03 00 00          cmp    $0x3e8,%eax
  91:   7f 07                   jg     9a <_Z26IsSumInRangeSuperOptimizedii+0x2f>
  93:   b8 01 00 00 00          mov    $0x1,%eax
  98:   eb 05                   jmp    9f <_Z26IsSumInRangeSuperOptimizedii+0x34>
  9a:   b8 00 00 00 00          mov    $0x0,%eax
  9f:   5d                      pop    %rbp
  a0:   c3                      retq

We can see from the stack addresses (for example, the -0x4 in mov %edi,-0x4(%rbp) versus the -0x14 in mov %edi,-0x14(%rbp)) that IsSumInRangeWithVar() uses 16 extra bytes on the stack.

Because IsSumInRangeWithoutVar() allocates no space on the stack to store the intermediate value s it has to recalculate it, resulting in this implementation being 2 instructions longer.

Funny, IsSumInRangeSuperOptimized() looks a lot like IsSumInRangeWithoutVar(), except it compares to -1000 first, and 1000 second.

Now let's compile with only the most basic optimizations: g++ -O1 -c -o test.o test.cpp. The result:

0000000000000000 <_Z19IsSumInRangeWithVarii>:
   0:   8d 84 37 e8 03 00 00    lea    0x3e8(%rdi,%rsi,1),%eax
   7:   3d d0 07 00 00          cmp    $0x7d0,%eax
   c:   0f 96 c0                setbe  %al
   f:   c3                      retq

0000000000000010 <_Z22IsSumInRangeWithoutVarii>:
  10:   8d 84 37 e8 03 00 00    lea    0x3e8(%rdi,%rsi,1),%eax
  17:   3d d0 07 00 00          cmp    $0x7d0,%eax
  1c:   0f 96 c0                setbe  %al
  1f:   c3                      retq

0000000000000020 <_Z26IsSumInRangeSuperOptimizedii>:
  20:   8d 84 37 e8 03 00 00    lea    0x3e8(%rdi,%rsi,1),%eax
  27:   3d d0 07 00 00          cmp    $0x7d0,%eax
  2c:   0f 96 c0                setbe  %al
  2f:   c3                      retq

Would you look at that: each variant is identical. The compiler is able to do something quite clever: abs(a + b) <= 1000 is equivalent to a + b + 1000 <= 2000 considering setbe does an unsigned comparison, so a negative number becomes a very large positive number. The lea instruction can actually perform all these additions in one instruction, and eliminate all the conditional branches.

To answer your question, almost always the thing to optimize for is not memory or speed, but readability. Reading code is a lot harder than writing it, and reading code that's been mangled to "optimize" it is a lot harder than reading code that's been written to be clear. More often than not, these "optimizations" have negligible, or as in this case exactly zero actual impact on performance.


Follow up question, what changes when this code is in an interpreted language instead of compiled? Then, does the optimization matter or does it have the same result?

Let's measure! I've transcribed the examples to Python:

def IsSumInRangeWithVar(a, b):
    s = a + b
    if s > 1000 or s < -1000:
        return False
    else:
        return True

def IsSumInRangeWithoutVar(a, b):
    if a + b > 1000 or a + b < -1000:
        return False
    else:
        return True

def IsSumInRangeSuperOptimized(a, b):
    return abs(a + b) <= 1000

from dis import dis
print('IsSumInRangeWithVar')
dis(IsSumInRangeWithVar)

print('\nIsSumInRangeWithoutVar')
dis(IsSumInRangeWithoutVar)

print('\nIsSumInRangeSuperOptimized')
dis(IsSumInRangeSuperOptimized)

print('\nBenchmarking')
import timeit
print('IsSumInRangeWithVar: %fs' % (min(timeit.repeat(lambda: IsSumInRangeWithVar(42, 42), repeat=50, number=100000)),))
print('IsSumInRangeWithoutVar: %fs' % (min(timeit.repeat(lambda: IsSumInRangeWithoutVar(42, 42), repeat=50, number=100000)),))
print('IsSumInRangeSuperOptimized: %fs' % (min(timeit.repeat(lambda: IsSumInRangeSuperOptimized(42, 42), repeat=50, number=100000)),))

Run with Python 3.5.2, this produces the output:

IsSumInRangeWithVar
  2           0 LOAD_FAST                0 (a)
              3 LOAD_FAST                1 (b)
              6 BINARY_ADD
              7 STORE_FAST               2 (s)

  3          10 LOAD_FAST                2 (s)
             13 LOAD_CONST               1 (1000)
             16 COMPARE_OP               4 (>)
             19 POP_JUMP_IF_TRUE        34
             22 LOAD_FAST                2 (s)
             25 LOAD_CONST               4 (-1000)
             28 COMPARE_OP               0 (<)
             31 POP_JUMP_IF_FALSE       38

  4     >>   34 LOAD_CONST               2 (False)
             37 RETURN_VALUE

  6     >>   38 LOAD_CONST               3 (True)
             41 RETURN_VALUE
             42 LOAD_CONST               0 (None)
             45 RETURN_VALUE

IsSumInRangeWithoutVar
  9           0 LOAD_FAST                0 (a)
              3 LOAD_FAST                1 (b)
              6 BINARY_ADD
              7 LOAD_CONST               1 (1000)
             10 COMPARE_OP               4 (>)
             13 POP_JUMP_IF_TRUE        32
             16 LOAD_FAST                0 (a)
             19 LOAD_FAST                1 (b)
             22 BINARY_ADD
             23 LOAD_CONST               4 (-1000)
             26 COMPARE_OP               0 (<)
             29 POP_JUMP_IF_FALSE       36

 10     >>   32 LOAD_CONST               2 (False)
             35 RETURN_VALUE

 12     >>   36 LOAD_CONST               3 (True)
             39 RETURN_VALUE
             40 LOAD_CONST               0 (None)
             43 RETURN_VALUE

IsSumInRangeSuperOptimized
 15           0 LOAD_GLOBAL              0 (abs)
              3 LOAD_FAST                0 (a)
              6 LOAD_FAST                1 (b)
              9 BINARY_ADD
             10 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             13 LOAD_CONST               1 (1000)
             16 COMPARE_OP               1 (<=)
             19 RETURN_VALUE

Benchmarking
IsSumInRangeWithVar: 0.019361s
IsSumInRangeWithoutVar: 0.020917s
IsSumInRangeSuperOptimized: 0.020171s

Disassembly in Python isn't terribly interesting, since the bytecode "compiler" doesn't do much in the way of optimization.

The performance of the three functions is nearly identical. We might be tempted to go with IsSumInRangeWithVar() due to it's marginal speed gain. Though I'll add as I was trying different parameters to timeit, sometimes IsSumInRangeSuperOptimized() came out fastest, so I suspect it may be external factors responsible for the difference, rather than any intrinsic advantage of any implementation.

If this is really performance critical code, an interpreted language is simply a very poor choice. Running the same program with pypy, I get:

IsSumInRangeWithVar: 0.000180s
IsSumInRangeWithoutVar: 0.001175s
IsSumInRangeSuperOptimized: 0.001306s

Just using pypy, which uses JIT compilation to eliminate a lot of the interpreter overhead, has yielded a performance improvement of 1 or 2 orders of magnitude. I was quite shocked to see IsSumInRangeWithVar() is an order of magnitude faster than the others. So I changed the order of the benchmarks and ran again:

IsSumInRangeSuperOptimized: 0.000191s
IsSumInRangeWithoutVar: 0.001174s
IsSumInRangeWithVar: 0.001265s

So it seems it's not actually anything about the implementation that makes it fast, but rather the order in which I do the benchmarking!

I'd love to dig in to this more deeply, because honestly I don't know why this happens. But I believe the point has been made: micro-optimizations like whether to declare an intermediate value as a variable or not are rarely relevant. With an interpreted language or highly optimized compiler, the first objective is still to write clear code.

If further optimization might be required, benchmark. Remember that the best optimizations come not from the little details but the bigger algorithmic picture: pypy is going to be an order of magnitude faster for repeated evaluation of the same function than cpython because it uses faster algorithms (JIT compiler vs interpretation) to evaluate the program. And there's the coded algorithm to consider as well: a search through a B-tree will be faster than a linked list.

After ensuring you're using the right tools and algorithms for the job, be prepared to dive deep into the details of the system. The results can be very surprising, even for experienced developers, and this is why you must have a benchmark to quantify the changes.

22
  • 6
    To provide an example in C#: SharpLab produces identical asm for both methods (Desktop CLR v4.7.3130.00 (clr.dll) on x86) Commented Sep 6, 2018 at 12:53
  • 2
    @VisualMelon funilly enough the positive check: "return (((a + b) >= -1000) && ((a+b) <= 1000)); " gives a different result. :sharplab.io/…
    – Pieter B
    Commented Sep 6, 2018 at 13:57
  • 12
    Readability can potentially make a program easier to optimize too. The compiler can easily rewrite to use equivalent logic like it is above, only if it can actually figure out what you're trying to do. If you use a lot of old-school bithacks, cast back and forth between ints and pointers, reuse mutable storage etc. it may be much harder for the compiler to prove that a transformation is equivalent, and it'll just leave what you wrote, which may be suboptimal. Commented Sep 6, 2018 at 14:12
  • 1
    @Corey see edit.
    – Phil Frost
    Commented Sep 6, 2018 at 14:56
  • 2
    @Corey: this answer is actually telling you exactly what I wrote in my answer: there is no difference when you use a decent compiler, and instead focus on readibilty. Of course, it looks better founded - maybe you believe me now.
    – Doc Brown
    Commented Sep 7, 2018 at 5:39
66

To answer the stated question:

When to optimize for memory vs performance speed for a method?

There are two things you have to establish:

  • What is limiting your application?
  • Where can I reclaim the most of that resource?

In order to answer the first question, you have to know what the performance requirements for your application are. If there are no performance requirements then there is no reason to optimize one way or the other. The performance requirements help you to get to the place of "good enough".

The method you provided on its own wouldn't cause any performance issues on its own, but perhaps within a loop and processing a large amount of data, you have to start thinking a little differently about how you are approaching the problem.

Detecting what is limiting the application

Start looking at the behavior of your application with a performance monitor. Keep an eye on CPU, disk, network, and memory usage while it's running. One or more items will be maxed out while everything else is moderately used--unless you hit the perfect balance, but that almost never happens).

When you need to look deeper, typically you would use a profiler. There are memory profilers and process profilers, and they measure different things. The act of profiling does have a significant performance impact, but you are instrumenting your code to find out what's wrong.

Let's say you see your CPU and disk usage peaked. You would first check for "hot spots" or code that either is called more often than the rest or takes a significantly longer percentage of the processing.

If you can't find any hot spots, you would then start looking at memory. Perhaps you are creating more objects than necessary and your garbage collection is working overtime.

Reclaiming performance

Think critically. The following list of changes is in order of how much return on investment you'll get:

  • Architecture: look for communication choke points
  • Algorithm: the way you process data might need to change
  • Hot spots: minimizing how often you call the hot spot can yield a big bonus
  • Micro optimizations: it's not common, but sometimes you really do need to think of minor tweaks (like the example you provided), particularly if it is a hot spot in your code.

In situations like this, you have to apply the scientific method. Come up with a hypothesis, make the changes, and test it. If you meet your performance goals, you're done. If not, go to the next thing in the list.


Answering the question in bold:

When is it appropriate to use Method A vs. Method B, and vice versa?

Honestly, this is the last step in trying to deal with performance or memory problems. The impact of Method A vs. Method B will be really different depending on the language and platform (in some cases).

Just about any compiled language with a halfway decent optimizer will generate similar code with either of those structures. However those assumptions don't necessarily remain true in proprietary and toy languages that don't have an optimizer.

Precisely which will have a better impact depends on whether sum is a stack variable or a heap variable. This is a language implementation choice. In C, C++ and Java for example, number primitives like an int are stack variables by default. Your code has no more memory impact by assigning to a stack variable than you would have with fully inlined code.

Other optimizations that you might find in C libraries (particularly older ones) where you can have to decide between copying a 2 dimensional array down first or across first is a platform dependent optimization. It requires some knowledge of how the chipset you are targeting best optimizes memory access. There are subtle differences between architectures.

Bottom line is that optimization is a combination of art and science. It requires some critical thinking, as well as a degree of flexibility in how you approach the problem. Look for big things before you blame small things.

8
  • 2
    This answer focuses on my question the most and doesn't get caught up on my coding examples, i.e. Method A and Method B.
    – Corey P
    Commented Sep 4, 2018 at 23:05
  • 18
    I feel like this is the generic answer to "How do you address performance bottlenecks" but you would be hard pressed to identify relative memory usage from a particular function based on whether it had 4 or 5 variables using this method. I also question how relevant this level of optimization is when the compiler (or interpreter) may or may not optimize this away.
    – Eric
    Commented Sep 5, 2018 at 0:24
  • @Eric, as I mentioned, the last category of performance improvement would be your micro-optimizations. The only way to have a good guess if it will have any impact is by measuring performance/memory in a profiler. It is rare that those types of improvements have payoff, but in timing sensitive performance problems you have in simulators a couple well placed changes like that can be the difference between hitting your timing target and not. I think I can count on one hand the number of times that paid off in over 20 years of working on software, but it's not zero. Commented Sep 5, 2018 at 0:41
  • @BerinLoritsch Again, in general I agree with you, but in this specific case I do not. I've provided my own answer, but I've not personally seen any tools that will flag or even give you ways to potentially identify performance issues related to stack memory size of a function.
    – Eric
    Commented Sep 5, 2018 at 1:09
  • @DocBrown, I have remedied that. Regarding the second question, I pretty much agree with you. Commented Sep 5, 2018 at 12:50
45

"this would reduce memory" - em, no. Even if this would be true (which, for any decent compiler is not), the difference would most probably be negligible for any real world situation.

However, I would recommend to use method A* (method A with a slight change):

private bool IsSumInRange(int a, int b)
{
    int sum = a + b;

    if (sum > 1000 || sum < -1000) return false;
    else return true;
    // (yes, the former statement could be cleaned up to
    // return abs(sum)<=1000;
    // but let's ignore this for a moment)
}

but for two completely different reasons:

  • by giving the variable s an explaining name, the code becomes clearer

  • it avoids to have the same summation logic twice in code, so the code becomes more DRY, which means less error prone to changes.

10
  • 36
    I would clean it up even further and go with "return sum > -1000 && sum < 1000;".
    – 17 of 26
    Commented Sep 4, 2018 at 18:28
  • 36
    @Corey any decent optimizer will use a CPU register for the sum variable, thus leading to zero memory usage. And even if not, this is only a single word of memory in a “leaf” method. Considering how incredibly memory-wasteful Java or C# can be otherwise due to their GC and object model, a local int variable literally does not use any noticeable memory. This is pointless micro-optimization.
    – amon
    Commented Sep 4, 2018 at 18:42
  • 10
    @Corey: if it is "a little more complex", it probably won't become "a noticeable memory usage". Maybe if you construct a really more complex example, but that makes it a different question. Note also, just because you don't create a specific variable for an expression, for complex intermediate results, the run time environment may still internally create temporary objects, so it completely depends on the details of the language, enviroment, optimization level, and whatever you call "noticeable".
    – Doc Brown
    Commented Sep 4, 2018 at 19:33
  • 8
    In addition to the points above, I'm pretty sure how C# / Java chooses to store sum would be an implementation detail and I doubt anyone could make a convincing case as to whether or not a silly trick like avoiding one local int would lead to this or that amount of memory usage in the long term. IMO readability is more important. Readability can be subjective, but FWIW, personally I'd rather you never do the same computation twice, not for CPU usage, but because I only have to inspect your addition once when I'm looking for a bug.
    – jrh
    Commented Sep 4, 2018 at 22:12
  • 2
    ... also note that garbage collected languages in general are an unpredictable, "churning sea of memory" that (for C# anyway) might only be cleaned up when needed, I remember making a program that allocated gigabytes of RAM and it only started "cleaning up" after itself when memory became scarce. If the GC doesn't need to run, it might take its sweet time and save your CPU for more pressing matters.
    – jrh
    Commented Sep 4, 2018 at 22:13
33

You can do better than both of those with

return (abs(a + b) > 1000);

Most processors (and hence compilers) can do abs() in a single operation. You not only have fewer sums, but also fewer comparisons, which are generally more computationally expensive. It also removes the branching, which is much worse on most processors because it stops pipelining being possible.

The interviewer, as other answers have said, is plant life and has no business conducting a technical interview.

That said, his question is valid. And the answer to when you optimise and how, is when you've proved it's necessary, and you've profiled it to prove exactly which parts need it. Knuth famously said that premature optimisation is the root of all evil, because it's too easy to try to gold-plate unimportant sections, or make changes (like your interviewer's) which have no effect, whilst missing the places which really do need it. Until you've got hard proof it's really necessary, clarity of code is the more important target.

Edit FabioTurati correctly points out that this is the opposite logic sense to the original, (my mistake!), and that this illustrates a further impact from Knuth's quote where we risk breaking the code while we're trying to optimise it.

10
  • 2
    @Corey, I am quite sure the Graham pins the request "he challenged me to solve the same problem with less variables" as expected. If I'd be the interviewer, I'd expect that answer, not moving a+b into if and doing it twice. You understand it wrong "He was pleased and said this would reduce memory usage by this method" - he was nice to you, hiding his disappointment with this meaningless explanation about memory. You shouldn't be taking it serious to ask question here. Did you get a job? My guess you didn't :-(
    – Sinatr
    Commented Sep 5, 2018 at 13:30
  • 1
    You are applying 2 transformations at the same time: you have turned the 2 conditions into 1, using abs(), and you also have a single return, instead of having one when the condition is true ("if branch") and another one when it's false ("else branch"). When you change code like this, be careful: there's the risk to inadvertently write a function that returns true when it should return false, and vice versa. Which is exactly what happened here. I know you were focusing on another thing, and you've done a nice job at it. Still, this could have easily cost you the job... Commented Sep 5, 2018 at 20:49
  • 2
    @FabioTurati Well spotted - thanks! I'll update the answer. And it's a good point about refactoring and optimisation, which makes Knuth's quote even more relevant. We should prove we need the optimisation before we take the risk.
    – Graham
    Commented Sep 6, 2018 at 1:24
  • 2
    Most processors (and hence compilers) can do abs() in a single operation. Unfortunately not the case for integers. ARM64 has a conditional negate it can use if flags are already set from an adds, and ARM has predicated reverse-sub (rsblt = reverse-sub if less-tha) but everything else requires multiple extra instructions to implement abs(a+b) or abs(a). godbolt.org/z/Ok_Con shows x86, ARM, AArch64, PowerPC, MIPS, and RISC-V asm output. It's only by transforming the comparison into a range-check (unsigned)(a+b+999) <= 1998U that gcc can optimize it like in Phil's answer. Commented Sep 6, 2018 at 21:15
  • 2
    The "improved" code in this answer is still wrong, since it produces a different answer for IsSumInRange(INT_MIN, 0). The original code returns false because INT_MIN+0 > 1000 || INT_MIN+0 < -1000; but the "new and improved" code returns true because abs(INT_MIN+0) < 1000. (Or, in some languages, it'll throw an exception or have undefined behavior. Check your local listings.) Commented Sep 7, 2018 at 3:36
16

When is it appropriate to use Method A vs. Method B, and vice versa?

Hardware is cheap; programmers are expensive. So the cost of the time you two wasted on this question is probably far worse than either answer.

Regardless, most modern compilers would find a way to optimize the local variable into a register (instead of allocating stack space), so the methods are probably identical in terms of executable code. For this reason, most developers would pick the option that communicates the intention most clearly (see Writing really obvious code (ROC)). In my opinion, that would be Method A.

On the other hand, if this is purely an academic exercise, you can have the best of both worlds with Method C:

private bool IsSumInRange(int a, int b)
{
    a += b;
    return (a >= -1000 && a <= 1000);
}
12
  • 17
    a+=b is a neat trick but I have to mention (just in case it isn't implied from the rest of the answer), from my experience methods that mess with parameters can be very hard to debug and maintain.
    – jrh
    Commented Sep 4, 2018 at 22:21
  • 1
    I agree @jrh. I am a strong advocate for ROC, and that sort of thing is anything but.
    – John Wu
    Commented Sep 4, 2018 at 22:40
  • 3
    "Hardware is cheap; programmers are expensive." In the world of consumer electronics, that statement is false. If you sell millions of units, then it is a very good investment to spend $500.000 in additional development cost to save $0,10 on the hardware costs per unit. Commented Sep 5, 2018 at 10:34
  • 2
    @JohnWu: You simplified out the if check, but forgot to reverse the result of the comparison; your function is now returning true when a + b is not in the range. Either add a ! to the outside of the condition (return !(a > 1000 || a < -1000)), or distribute the !, inverting tests, to get return a <= 1000 && a >= -1000; Or to make the range check flow nicely, return -1000 <= a && a <= 1000; Commented Sep 5, 2018 at 19:03
  • 1
    @JohnWu: Still slightly off at the edge cases, distributed logic requires <=/>=, not </> (with </>, 1000 and -1000 are treated as being out of range, original code treated them as in range). Commented Sep 5, 2018 at 19:30
11

I would optimize for readability. Method X:

private bool IsSumInRange(int number1, int number2)
{
    return IsValueInRange(number1+number2, -1000, 1000);
}

private bool IsValueInRange(int Value, int Lowerbound, int Upperbound)
{
    return  (Value >= Lowerbound && Value <= Upperbound);
}

Small methods that do just 1 thing but are easy to reason about.

(This is personal preference, I like positive testing instead of negative, your original code is actually testing whether the value is NOT outside the range.)

8
  • 6
    This. (Upvoted comments above that were similar re: readability). 30 years ago, when we were working with machines that had less than 1mb of RAM, squeezing performance was necessary - just like the y2k problem, get a few hundred thousand records that each have a few bytes of memory being wasted due to unused vars and references, etc and it adds up quick when you only have 256k of RAM. Now that we are dealing with machines that have multiple gigabytes of RAM, saving even a few MB of RAM use vs readability and maintainability of code isn't a good trade.
    – ivanivan
    Commented Sep 5, 2018 at 13:19
  • @ivanivan: I don't think the "y2k problem" was really about memory. From a data-entry standpoint, entering two digits is more efficient than entering four, and keeping things as entered is easier than converting them to some other form.
    – supercat
    Commented Sep 5, 2018 at 16:13
  • 10
    Now you have to trace through 2 functions to see what's happening. You can't take it at face value, because you can't tell from the name whether these are inclusive or exclusive bounds. And if you add that information, the name of the function is longer than the code to express it.
    – Peter
    Commented Sep 5, 2018 at 16:27
  • 1
    Optimise readability and make small, easy-to-reason functions – sure, agree. But I strongly disagree that renaming a and b to number1 and number2 aids readability in any way. Also your naming of the functions is inconsistent: why does IsSumInRange hard-code the range if IsValueInRange accept it as arguments? Commented Sep 5, 2018 at 19:30
  • The 1st function can overflow. (Like other answers' code.) Although the complexity of the overflow-safe code is an argument for putting it into a function.
    – philipxy
    Commented Sep 5, 2018 at 23:36
6

In short, I don't think the question has much relevance in current computing, but from a historical perspective it's an interesting thought exercise.

Your interviewer is likely a fan of the Mythical Man Month. In the book, Fred Brooks makes the case that programmers will generally need two versions of key functions in their toolbox: a memory-optimized version and a cpu-optimized version. Fred based this on his experience leading the development of the IBM System/360 operating system where machines may have as little as 8 kilobytes of RAM. In such machines, memory required for local variables in functions could potentially be important, especially if the compiler did not effectively optimize them away (or if code was written in assembly language directly).

In the current era, I think you would be hard pressed to find a system where the presence or absence of a local variable in a method would make noticeable difference. For a variable to matter, the method would need to be recursive with deep recursion expected. Even then, it's likely that the stack depth would be exceeded causing Stack Overflow exceptions before the variable itself caused an issue. The only real scenario where it may be an issue is with very large, arrays allocated on the stack in a recursive method. But that is also unlikely as I think most developers would think twice about unnecessary copies of large arrays.

4

After the assignment s = a + b; the variables a and b are not used anymore. Therefore, no memory is used for s if you are not using a completely brain-damaged compiler; memory that was used anyway for a and b is re-used.

But optimising this function is utter nonsense. If you could save space, it would be maybe 8 bytes while the function is running (which is recovered when the function returns), so absolutely pointless. If you could save time, it would be single numbers of nanoseconds. Optimising this is a total waste of time.

3

Local value type variables are allocated on the stack or (more likely for such small pieces of code) use registers in the processor and never get to see any RAM. Either way they are short lived and nothing to worry about. You start considering memory use when you need to buffer or queue data elements in collections that are both potentially large and long lived.

Then it depends what you care about most for your application. Processing speed? Response time? Memory footprint? Maintainability? Consistency in design? All up to you.

2
  • 4
    Nitpicking: .NET at least (language of the post is unspecified) doesn't make any guarantees about local variables being allocated "on the stack". See "the stack is an implementation detail" by Eric Lippert.
    – jrh
    Commented Sep 4, 2018 at 22:19
  • 1
    @jrh Local variables on stack or heap may be an implementation detail, but if someone really wanted a variable on the stack there's stackalloc and now Span<T>. Possibly useful in a hot spot, after profiling. Also, some of the docs around structs imply that value types may be on the stack while reference types will not be. Anyway, at best you might avoid a bit of GC.
    – Bob
    Commented Sep 6, 2018 at 2:27
2

As other answers have said, you need to think what you're optimising for.

In this example, I suspect that any decent compiler would generate equivalent code for both methods, so the decision would have no effect on the run time or memory!

What it does affect is the readability of the code.  (Code is for humans to read, not just computers.)  There's not too much difference between the two examples; when all other things are equal, I consider brevity to be a virtue, so I'd probably pick Method B.  But all other things are rarely equal, and in a more complex real-world case, it could have a big effect.

Things to consider:

  • Does the intermediate expression have any side-effects?  If it calls any impure functions or updates any variables, then of course duplicating it would be a matter of correctness, not just style.
  • How complex is the intermediate expression?  If it does lots of calculations and/or calls functions, then the compiler may not be able to optimise it, and so this would affect performance.  (Though, as Knuth said, “We should forget about small efficiencies, say about 97% of the time”.)
  • Does the intermediate variable have any meaning?  Could it be given a name that helps to explain what's going on?  A short but informative name could explain the code better, while a meaningless one is just visual noise.
  • How long is the intermediate expression?  If long, then duplicating it could make the code longer and harder to read (especially if it forces a line break); if not, the duplication could be shorter over all.
1

As many of the answers have pointed out, attempting to tune this function with modern compilers won't make any difference. An optimizer can most likely figure out the best solution (up-vote to the answer that showed the assembler code to prove it!). You stated that the code in the interview was not exactly the code you were asked to compare, so perhaps the actual example makes a bit more sense.

But let's take another look at this question: this is an interview question. So the real issue is, how should you answer it assuming that you want to try and get the job?

Let's also assume that the interviewer does know what they are talking about and they are just trying to see what you know.

I would mention that, ignoring the optimizer, the first may create a temporary variable on the stack whereas the second wouldn't, but would perform the calculation twice. Therefore, the first uses more memory but is faster.

You could mention that anyway, a calculation may require a temporary variable to store the result (so that it an be compared), so whether you name that variable or not might not make any difference.

I would then mention that in reality the code would be optimized and most likely equivalent machine code would be generated since all the variables are local. However, it does depend on what compiler you are using (it was not that long ago that I could get a useful performance improvement by declaring a local variable as "final" in Java).

You could mention that the stack in any case lives in its own memory page, so unless your extra variable caused the stack to overflow the page, it won't in reality allocate any more memory. If it does overflow it will want a whole new page though.

I would mention that a more realistic example might be the choice of whether to use a cache to hold the results of many computations or not and this would raise a question of cpu vs memory.

All this demonstrates that you know what you are talking about.

I would leave it to the end to say that it would be better to focus on readabilty instead. Although true in this case, in the interview context it may be interpretted as "I don't know about performance but my code reads like a Janet and John story".

What you should not do is trot out the usual bland statements about how code optimization is not necessary, don't optimize until you have profiled the code (this just indicates you can't see bad code for yourself), hardware costs less than programmers, and please, please, don't quote Knuth "premature blah blah ...".

Code performance is a genuine issue in a great many organisations and many organisations need programmers who understand it.

In particular, with organisations such as Amazon, some of the code has huge leverage. A code snippet may be deployed on thousand of servers or millions of devices and may be called billions of times a day every day of the year. There may be thousands of similar snippets. The difference between a bad algorithm and a good one can easily be a factor of a thousand. Do the numbers and multiple all this up: it makes a difference. The potential cost to the organisation of non-performing code can be very significant or even fatal if a system runs out of capacity.

Furthmore, many of these organisations work in a competetive environment. So you cannot just tell your customers to buy a bigger computer if your competitor's software already works ok on the hardware that they have or if the software runs on a mobile handset and it can't be upgraded. Some applications are particularly performance critical (games and mobile apps come to mind) and may live or die according to their responsiveness or speed.

I have personally over two decades worked on many projects where systems have failed or been unusable due to performance issues and I have been called in the optimize those systems and in all cases it has been due to bad code written by programmers who didn't understand the impact of what they were writing. Furthmore, it is never one piece of code, it is always everywhere. When I turn up, it is way to late to start thinking about performance: the damage has been done.

Understanding code performance is a good skill to have in the same way as understanding code correctness and code style. It comes out of practice. Performance failures can be as bad as functional failures. If the system doesn't work, it doesn't work. Doesn't matter why. Similarly, performance and features that are never used are both bad.

So, if the interviewer asks you about performance I would recommend to try and demonstrate as much knowledge as possible. If the question seems a bad one, politely point out why you think it would not be an issue in that case. Don't quote Knuth.

0

You should first optimize for correctness.

Your function fails for input values that are close to Int.MaxValue:

int a = int.MaxValue - 200;
int b = int.MaxValue - 200;
bool inRange = test.IsSumInRangeA(a, b);

This returns true because the sum overflows to -400. The function also doesn't work for a = int.MinValue + 200. (incorrectly adds up to "400")

We won't know what the interviewer was looking for unless he or she chimes in, but "overflow is real".

In an interview situation, ask questions to clarify the scope of the problem: What is are the allowed maximum and minimum input values? Once you have those, you can throw an exception if the caller submits values outside of the range. Or (in C#), you can use a checked {} section, which would throw an exception on overflow. Yes, it's more work and complicated, but sometimes that's what it takes.

3
  • The methods were only examples. They were not written to be correct, but to illustrate the actual question. Thanks for the input though!
    – Corey P
    Commented Sep 6, 2018 at 17:14
  • I think the interview question is directed at performance, so you need to answer the intent of the question. The interviewer is not asking about behaviour at the limits. But interesting side point anyway.
    – rghome
    Commented Sep 7, 2018 at 12:15
  • 1
    @Corey Good interviewers as questions to 1) assess the candidate ability concerning the issue, as suggested by rghome here yet also 2) as a opening into the larger issues (like the unspoken functional correctness) and depth of related knowledge - this is more so in later career interviews - good luck. Commented Sep 7, 2018 at 16:58
0

Your question should have been: "Do I need to optimize this at all?".

Version A and B differ in one important detail that makes A preferrable, but it is unrelated to optimization: You do not repeat code.

The actual "optimization" is called common subexpression elimination, which is what pretty much every compiler does. Some do this basic optimization even when optimizations are turned off. So that isn't truly an optimization (the generated code will almost certainly be exactly the same in every case).

But if it isn't an optimization, then why is it preferrable? Alright, you don't repeat code, who cares!

Well first of all, you do not have the risk of accidentially getting half of the conditional clause wrong. But more importantly, someone reading this code can grok immediately what you're trying to do, instead of a if((((wtf||is||this||longexpression)))) experience. What the reader gets to see is if(one || theother), which is a good thing. Not rarely, I happens that you are that other person reading your own code three years later and thinking "WTF does this mean?". In that case it's always helpful if your code immediately communicates what the intent was. With a common subexpression being named properly, that's the case.
Also, if at any time in the future, you decide that e.g. you need to change a+b to a-b, you have to change one location, not two. And there's no risk of (again) getting the second one wrong by accident.

About your actual question, what you should optimize for, first of all your code should be correct. This is the absolutely most important thing. Code that isn't correct is bad code, even moreso if despite being incorrect it "works fine", or at least it looks like it works fine. After that, code should be readable (readable by someone unfamiliar with it).
As for optimizing... one certainly shouldn't deliberately write anti-optimized code, and certainly I'm not saying you shouldn't spend a thought on the design before you start out (such as choosing the right algorithm for the problem, not the least efficient one).

But for most applications, most of the time, the performance that you get after running correct, readable code using a reasonable algorithm through an optimizing compiler is just fine, there's no real need to worry.

If that isn't the case, i.e. if the application's performance indeed doesn't meet the requirements, and only then, you should worry about doing such local optimizations as the one you attempted. Preferrably, though, you would reconsider the top-level algorithm. If you call a function 500 times instead of 50,000 times because of a better algorithm, this has larger impact than saving three clock cycles on a micro-optimization. If you don't stall for several hundred cycles on a random memory access all the time, this has a larger impact than doing a few cheap calculations extra, etc etc.

Optimization is a difficult matter (you can write entire books about that and get to no end), and spending time on blindly optimizting some particular spot (without even knowing whether that's the bottleneck at all!) is usually wasted time. Without profiling, optimization is very hard to get right.

But as a rule of thumb, when you're flying blind and just need/want to do something, or as a general default strategy, I would suggest to optimize for "memory".
Optimizing for "memory" (in particular spatial locality and access patterns) usually yields a benefit because unlike once upon a time when everything was "kinda the same", nowadays accessing RAM is among the most expensive things (short of reading from disk!) that you can in principle do. Whereas ALU, on the other hand, is cheap and getting faster every week. Memory bandwidth and latency doesn't improve nearly as fast. Good locality and good access patterns can easily make a 5x difference (20x in extreme, contrieved examples) in runtime compared to bad access patterns in data-heavy applications. Be nice to your caches, and you will be a happy person.

To put the previous paragraph into perspective, consider what the different things that you can do cost you. Executing something like a+b takes (if not optimized out) one or two cycles, but the CPU can usually start several instructions per cycle, and can pipeline non-dependent instructions so more realistically it only costs you something around half a cycle or less. Ideally, if the compiler is good at scheduling, and depending on the situation, it might cost zero.
Fetching data ("memory") costs you either 4-5 cycles if you're lucky and it's in L1, and around 15 cycles if you are not so lucky (L2 hit). If the data isn't in the cache at all, it takes several hundred cycles. If your haphazard access pattern exceeds the TLB's capabilities (easy to do with only ~50 entries), add another few hundred cycles. If your haphazard access pattern actually causes a page fault, it costs you a few ten thousand cycles in the best case, and several million in the worst.
Now think about it, what's the thing you want to avoid most urgently?

0

When to optimize for memory vs performance speed for a method?

After getting the functionality right first. Then selectivity concern oneself with micro optimizations.


As an interview question regarding optimizations, the code does provoke the usual discussion yet misses the higher level goal of Is the code functionally correct?

Both C++ and C and others regard int overflow as a problem from the a + b. It is not well defined and C calls it undefined behavior. It is not specified to "wrap" - even though that is the common behavior.

bool IsSumInRange(int a, int b) {
    int s = a + b;  // Overflow possible
    if (s > 1000 || s < -1000) return false;
    else return true;
}

Such a function called IsSumInRange() would be expected to be well defined and perform correctly for all int values of a,b. The raw a + b is not. A C solution could use:

#define N 1000
bool IsSumInRange_FullRange(int a, int b) {
  if (a >= 0) {
    if (b > INT_MAX - a) return false;
  } else {
    if (b < INT_MIN - a) return false;
  }
  int sum = a + b;
  if (sum > N || sum < -N) return false;
  else return true;
}

The above code could be optimized by using a wider integer type than int, if available, as below or distributing the sum > N, sum < -N tests within the if (a >= 0) logic. Yet such optimizations may not truly lead to "faster" emitted code given a smart compiler nor be worth the extra maintenance of being clever.

  long long sum a;
  sum += b;

Even using abs(sum) is prone to problems when sum == INT_MIN.

0

What kind of compilers are we talking about, and what sort of "memory"? Because in your example, assuming a reasonable optimizer, the expression a+b needs to generally be stored in a register (a form of memory) prior to doing such arithmetic.

So if we're talking about a dumb compiler that encounters a+b twice, it's going to allocate more registers (memory) in your second example, because your first example might just store that expression once in a single register mapped to the local variable, but we're talking about very silly compilers at this point... unless you're working with another type of silly compiler that stack spills every single variable all over the place, in which case maybe the first one would cause it more grief to optimize than the second*.

I still want to scratch that and think the second one is likely to use more memory with a dumb compiler even if it's prone to stack spills, because it might end up allocating three registers for a+b and spill a and b more. If we're talking most primitive optimizer then capturing a+b to s will probably "help" it use less registers/stack spills.

This is all extremely speculative in rather silly ways absent measurements/disassembly and even in the worst-case scenarios, this is not a "memory vs. performance" case (because even among the worst optimizers I can think of, we're not talking about anything but temporary memory like stack/register), it's purely a "performance" case at best, and among any reasonable optimizer the two are equivalent, and if one is not using a reasonable optimizer, why obsesses about optimization so microscopic in nature and especially absent measurements? That's like instruction selection/register allocation assembly-level focus which I would never expect anyone looking to stay productive to have when using, say, an interpreter that stack spills everything.

When to optimize for memory vs performance speed for a method?

As for this question if I can tackle it more broadly, often I don't find the two diametrically opposed. Especially if your access patterns are sequential, and given the speed of the CPU cache, often a reduction in the amount of bytes processed sequentially for non-trivial inputs translates (up to a point) to plowing through that data faster. Of course there are breaking points where if the data is much, much smaller in exchange for way, way more instructions, it might be faster to process sequentially in larger form in exchange for fewer instructions.

But I've found many devs tend to underestimate how much a reduction in memory use in these types of cases can translate to proportional reductions in time spent processing. It's very humanly intuitive to translate performance costs to instructions rather than memory access to the point of reaching for big LUTs in some vain attempt to speed up some small computations, only to find performance degraded with the additional memory access.

For sequential access cases through some huge array (not talking local scalar variables like in your example), I go by the rule that less memory to sequentially plow through translates to greater performance, especially when the resulting code is simpler than otherwise, until it doesn't, until my measurements and profiler tell me otherwise, and it matters, in the same way I assume sequentially reading a smaller binary file on disk would be faster to plow through than a bigger one (even if the smaller one requires some more instructions), until that assumption is shown to no longer apply in my measurements.

Not the answer you're looking for? Browse other questions tagged or ask your own question.