123

I know that in the architectures I'm personally familiar with (x86, 6502, etc), the stack typically grows downwards (i.e. every item pushed onto the stack results in a decremented SP, not an incremented one).

I'm wondering about the historical rationale for this. I know that in a unified address space, it's convenient to start the stack on the opposite end of the data segment (say) so there's only a problem if the two sides collide in the middle. But why does the stack traditionally get the top part? Especially given how this is the opposite of the "conceptual" model?

(And note that in the 6502 architecture, the stack also grows downwards, even though it is bounded to a single 256-byte page, and this direction choice seems arbitrary.)

11 Answers 11

65

As to the historic rationale, I can't say for certain (because I didn't design them). My thoughts on the matter are that early CPUs got their original program counter set to 0 and it was a natural desire to start the stack at the other end and grow downwards, since their code naturally grows upward.

As an aside, note that this setting of the program counter to 0 on reset is not the case for all early CPUs. For example, the Motorola 6809 would fetch the program counter from addresses 0xfffe/f so you could start running at an arbitrary location, depending on what was supplied at that address (usually, but by no means limited to, ROM).

One of the first things some historical systems would do would be to scan memory from the top until it found a location that would read back the same value written, so that it would know the actual RAM installed (e.g., a z80 with 64K address space didn't necessarily have 64K or RAM, in fact 64K would have been massive in my early days). Once it found the top actual address, it would set the stack pointer appropriately and could then start calling subroutines. This scanning would generally be done by the CPU running code in ROM as part of start-up.

With regard to the stacks growth, not all of them grow downwards, see this answer for details.

7
  • 1
    I like the Z80 RAM detection strategy story. Makes some sense that text segments are laid out growing upwards-- programmers of yore had somewhat more direct contact with dealing with the implications of that than the stack.Thanks paxdiablo. The pointer to the set of alternative forms of stack implementations is also super interesting.
    – Ben Zotto
    Commented Jan 10, 2010 at 19:26
  • Doesn't early-day memory have a way to notify its size and we have to calculate it manually?
    – phuclv
    Commented Sep 28, 2015 at 15:46
  • 1
    @LưuVĩnhPhúc, I have to assume you're a generation (or two) behind me. I still remember the TRS-80 model 3 method for getting the date and time being to ask the user for it at boot time. Having a memory scanner to set the upper limit of memory was considered state of the art back in the day :-) Can you imagine what would happen if Windows asked the time, or how much memory you had, every time you booted?
    – paxdiablo
    Commented Sep 29, 2015 at 2:49
  • 1
    Indeed, the Zilog Z80 documentation says the part starts up by setting the PC register to 0000h and executing. It sets the interrupt mode to 0, disables interrupts, and sets the I and R registers to 0, too. After that, it starts executing. At 0000h, it starts running code. THAT code has to initialize the stack pointer before it can call a subroutine or enable interrupts. What vendor sells a Z80 that behaves the way you describe?
    – MikeB
    Commented Nov 14, 2015 at 15:36
  • 1
    Mike, sorry, I should have been clearer. When I said the CPU scanned memory, I didn't mean that was a feature of the CPU itself. It was actually controlled from a program in ROM. I'll clarify.
    – paxdiablo
    Commented Nov 14, 2015 at 19:40
25

One good explanation I heard was that some machines in the past could only have unsigned offsets, so you'd want to the stack to grow downward so you could hit your locals without having to lose the extra instruction to fake a negative offset.

21

Stanley Mazor (4004 and 8080 architect) explains how stack growth direction was chosen for 8080 (and eventually for 8086) in "Intel Microprocessors: 8008 to 8086":

The stack pointer was chosen to run "downhill" (with the stack advancing toward lower memory) to simplify indexing into the stack from the user's program (positive indexing) and to simplify displaying the contents of the stack from a front panel.

2
9

One possible reason might be that it simplifies alignment. If you place a local variable on the stack which must be placed on a 4-byte boundary, you can simply subtract the size of the object from the stack pointer, and then zero out the two lower bits to get a properly aligned address. If the stack grows upwards, ensuring alignment becomes a bit trickier.

3
  • 2
    Computers don't subtract; they add in 2's compliment. Anything that is done by subtracting is really done by adding. Consider, computers have adders, not subtracters.
    – jww
    Commented Feb 12, 2018 at 8:01
  • 4
    @jww - that's a distinction without a difference. I might has well claim computer's don't add they only subtract! For the purposes of this answer, it doesn't really matter - but most ALUs will use a circuit that supports both addition and subtraction with the same performance. That is, while A - B could conceptually be implemented as A + (-B) (i.e., a separate negation step for B), it isn't in practice.
    – BeeOnRope
    Commented May 24, 2018 at 7:18
  • 2
    @jww Your nitpick is wrong for early computers - it took some time for two's complement to win, and until it did, there were computers using one's complement and sign-and-magnitude and maybe other things instead. With those implementations, there may well have been an advantage to adding versus subtracting. So in the absence of additional information, it is wrong to rule this out as a possible factor influencing addressing scheme choices like stack direction.
    – mtraceur
    Commented Jun 5, 2020 at 21:25
6

IIRC the stack grows downwards because the heap grows upwards. It could have been the other way around.

3
  • 9
    An upward-growing heap allows efficient realloc in some cases, but a downward-growing heap pretty much never does. Commented Dec 9, 2016 at 5:28
  • @PeterCordes why?
    – Yashas
    Commented May 24, 2018 at 2:30
  • 4
    @Yashas: because realloc(3) needs more space after an object to just extend the mapping without copying. Repeated realloc of the same object is possible when it's followed by an arbitrary amount of unused space. Commented May 24, 2018 at 3:41
6

Because then a POP uses the same addressing mode that is commonly used to scan through strings and arrays

An instruction that pops a value off of a stack needs to do two things: read the value out of memory, and adjust the stack pointer. There are four possible design choices for this operation:

  1. Preincrement (++*ptr) the stack pointer first, then read the value. This implies that the stack will grow "downwards" (towards lower memory addresses).

  2. Predecrement (--*ptr) the stack pointer first, then read the value. This implies that the stack will grow "upwards" (towards higher memory addresses).

  3. Read the value first, then postincrement (*ptr++) the stack pointer. This implies that the stack will grow downwards.

  4. Read the value first, then postdecrement (*ptr--) the stack pointer. This implies that the stack will grow upwards.


In many computer languages (particularly C), strings and arrays are passed to functions as pointers to their first element. A very common operation is to read the elements of the string or array in order, starting with the first element. Such an operation needs only the postincrement addressing mode described above.

Furthermore, reading the elements of a string or array is more common than writing the elements. Indeed, there are many standard library functions that perform no writing at all (e.g. strlen(), strchr(), strcmp())!


Therefore, if you have a limited number of addressing modes in your instruction set design, the most useful addressing mode would be a read that postincrements. This results in not only the most useful string and array operations, but also a POP instruction that grows the stack downward.

The second-most-useful addressing mode would then be a post-decrement write, which can be used for the matching PUSH instruction.

Indeed, the PDP-11 had postincrement and predecrement addressing modes, which produced a downward-growing stack. Even the VAX did not have preincrement or postdecrement.

3

I believe it's purely a design decision. Not all of them grow downward -- see this SO thread for some good discussion on the direction of stack growth on different architectures.

2

I'm not certain but I did some programming for the VAX/VMS back in the days. I seem to remember one part of memory (the heap??) going up and the stack going down. When the two met, then you were out of memory.

2
2

I believe the convention began with the IBM 704 and its infamous "decrement register". Modern speech would call it an offset field of the instruction, but the point is they went down, not up.

2

Just 2c more:

Beyond all the historic rationale mentioned, I'm quite certain there's no reason which is valid in modern processors. All processors can take signed offsets, and maximizing the heap/stack distance is rather moot ever since we started dealing with multiple threads.

I personally consider this a security design flaw. If, say, the designers of the x64 architecture would have reversed the stack growth direction, most stack buffer overflows would have been eliminated - which is kind of a big deal. (since strings grow upward).

0

One advantage of descending stack growth in a minimal embedded system is that a single chunk of RAM can be redundantly mapped into both page O and page 1, allowing zero page variables to be assigned starting at 0x000 and the stack growing downwards from 0x1FF, maximizing the amount it would have to grow before overwriting variables.

One of the original design goals of the 6502 was that it could be combined with, for example, a 6530, resulting in a two-chip microcontroller system with 1 KB of program ROM, timer, I/O, and 64 bytes of RAM shared between stack and page zero variables. By comparison, the minimal embedded system of that time based on an 8080 or 6800 would be four or five chips.

Not the answer you're looking for? Browse other questions tagged or ask your own question.