53

I'm studying the 65c816 assembly code of the 1994 game Super Metroid.

A hobbyist studied the game in-depth and created a RAM map. From it:

7E:0B56 - 7E:0B57    Moves Samus this distance horizontally, in subpixels.
7E:0B58 - 7E:0B59    Moves Samus this distance horizontally, in pixels.
7E:0B5A - 7E:0B5B    Moves Samus this distance vertically, in subpixels.
7E:0B5C - 7E:0B5D    Moves Samus this distance vertically, in pixels.

I'm new to assembly, but as I understand it, the game stores Samus's movement in RAM, to be picked up later by some code that performs the actual moving (the "function"). But why do things this way? Why not simply push the above "parameters" onto the stack and pop them from within the "function"? Is storing the parameters in static addresses somehow more efficient? Is it to save space on the stack? Is it for better code organization?

  • 7
    This feels like it would be a better fit on a programming SE. Even if this was a technique used on a game from the 90s, it probably isn't otherwise exclusive to such systems. That is, one could expect similar techniques to be used in any modern design around this chip, which is still being produced. It will probably get more love over on an exclusive programming site.
    – user12
    Commented May 17, 2016 at 15:51
  • 5
    There are so many SE's these days, I didn't even know there was a Game Development one. I was debating between Stack Overflow and Computer Science, but in the former, "Why?" / "subjective" questions tend to get downvoted to hell, and the latter seemed too academic, when Retro caught my eye and seemed perfect, with questions about Atari on the frontpage. I do agree with @jdv that it would have gotten more "love" and formal answers elsewhere, but I just felt like it belonged on retro :-) Commented May 17, 2016 at 19:24
  • 4
We are still in beta here and trying to figure out exactly what belongs and what doesn't, so if nothing else questions like these are useful to help us figure that out. As for the Game Development SE, I have no idea how receptive they are to retro programming questions. Maybe have a poke around there and see if any similar questions have gotten good answers?
    – mnem
    Commented May 17, 2016 at 19:30
  • 5
A question may be on-topic on more than one site. Or none at all. In the former case, it's a decision based on the target audience and the kinds of answers you are interested in. I encounter this a lot with Open Source software licensing questions, which are on-topic on (at least) Software Engineering, Open Source, and Law. These questions can be asked at any of those three, but the kinds of answers (and their quality!!!) will differ wildly. Commented May 17, 2016 at 21:32
  • 2
    Hm, as the asker though, how should I have known the answer beforehand to know where to ask? :-) Maybe as long as another curious visitor may conceivably and justifiably come here, and not another place, to ask this same question... maybe that would be a criterion for on-topicness. (Not at all upset or feeling persecuted! I mod another site here and I have had similar discussions about Beer SE vs Homebrewing SE.) Commented May 17, 2016 at 23:14

7 Answers

56

The 8-bit 6502 family doesn't have any stack-relative addressing modes that would make it easy to use the stack for variable storage. One can access values on the stack with a sequence such as TSX; LDA $0102,X, but that's slower, clobbers X, and uses more memory (both in code size and stack usage) than a global variable.

The 65C816 adds stack-relative instructions such as LDA 2, S, so it is now plausible to use local variables on the stack. But global variables were idiomatic on the 6502 and old habits die hard…
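To make the difference concrete, here is a rough sketch. The label speed_x is illustrative (not an actual Super Metroid variable), an 8-bit accumulator is assumed, and the cycle counts are for the basic cases:

```asm
; Global variable at a fixed address: one instruction, nothing clobbered.
        LDA speed_x     ; absolute load, 4 cycles

; 6502: reaching the value two bytes above the stack pointer (page $01).
        TSX             ; 2 cycles, clobbers X
        LDA $0102,X     ; 4 cycles (+1 on a page crossing)

; 65C816 only: stack-relative addressing does it in one instruction.
        LDA 2,S         ; 4 cycles, X untouched
```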

  • 3
    Yup, believe it or not, there were major architectures, as well as micros, without stack addressing modes. E.g., System 360.
    – davidbak
    Commented May 18, 2016 at 2:20
  • 8
    This is the correct answer.
    – Gaius
    Commented May 18, 2016 at 13:15
  • @davidbak: FORTRAN does not allow recursion, and hardware to automate "push current PC and set new PC" is expensive. If recursion isn't necessary, having a dedicated spot for each subroutine's return address may be easier and cheaper than pushing and popping return addresses to/from a stack.
    – supercat
    Commented Jan 26, 2017 at 15:32
  • @supercat, Yes, and more to the point, in the era when FORTRAN was invented, machine architectures did not have stack addressing modes (auto incr- or decr- registers with or without indirection). (The Burroughs stack machines came slightly later ... and were programmed in ALGOL.)
    – davidbak
    Commented Jan 26, 2017 at 15:47
  • 1
    "...and old habits die hard..." I recognized this was the answer of pure experience. Thank you. Commented May 15, 2017 at 17:54
49

Off the top of my head I can think of two reasons, there are probably more.

The first reason is that these variables may be set by a routine each frame, and then a lot of code uses them for the duration of that frame. Every interrupt routine that fires during the frame may want to read out the current direction.

The second reason is that, in a real-time embedded environment [1] like a console game, it's important to know that you have enough resources to handle the worst-case conditions. One resource is time; you want to make sure that everything you want to perform can be handled in one frame update. Another resource, and relevant in this case, is memory. Whatever happens, you can't run out of memory [2]. There's no swap space, you can't pop up a requester, or kill off something else. Add to this the fact that embedded platforms like this do not have any form of memory protection, so if you write too much on the stack — game over, man.

The combination of these two makes it reasonable to reserve RAM for variables like this.


[1] These platforms still exist today: think control systems for robots, your fridge, basically every "hidden" microcontroller.

[2] I've recently worked with an operating system designed for high reliability. It had every type of dynamic allocation removed. No malloc() for you.
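In assembly terms, reserving RAM up front is just a block of labelled addresses fixed at assembly time, not something pushed at run time. A sketch in ca65-style syntax (the labels are made up; only the commented addresses echo the RAM map in the question):

```asm
; Fixed, worst-case-sized storage, laid out before the game ever runs.
        .segment "BSS"
move_x_sub:  .res 2     ; cf. $7E:0B56 - horizontal subpixels
move_x_pix:  .res 2     ; cf. $7E:0B58 - horizontal pixels
move_y_sub:  .res 2     ; cf. $7E:0B5A - vertical subpixels
move_y_pix:  .res 2     ; cf. $7E:0B5C - vertical pixels
msg_queue:   .res 64    ; even "dynamic" structures get a fixed cap

; Nothing here can fail at run time: if it assembles, the memory exists.
```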

  • 2
    A vote for the first part. If these parameters are at fixed locations, an ISR can work on them without needing to first locate them. All the timing critical display updates would likely be interrupt driven. Commented May 17, 2016 at 16:01
  • 12
    @2: I work with a system that has high-reliability parts. They all have static arrays for storage; even stacks and queues are pre-allocated and tightly constrained - e.g. if there is stuff in the queue still but new must be added, first a cancellation mechanism for the old is triggered; the queue will never overflow as only a fixed number of elements can be stored. The non-critical parts can still allocate memory dynamically, but they MUST fail gracefully - they are not allowed to pull the whole system down.
    – SF.
    Commented May 17, 2016 at 18:14
  • "and then a lot of code uses them during the time of the whole frame." +1 In most games I've worked on (modern era) we store the velocity of an object in memory - especially when dealing with physics or some animations.
    – NPSF3000
    Commented May 19, 2016 at 3:07
  • 1
    I saw the new (to me) "RC" icon in the HNQ list and my curiosity was piqued. Opened it to see it stands for retrocomputing and thought: "I should let pipe know". But I see you're on top of things. :)
    – Oliphaunt
    Commented May 19, 2016 at 10:59
24

As mentioned previously, timing is the reason not to waste cycles pushing parameters, accessing them through cost-intensive addressing modes, and finally pulling them off the stack. That is too much work inside a tight, time-critical frame-building routine. In a game of a certain size, practically everything can be handled with global variables; some of the game's state has to be held in globals anyway, so constantly copying global state into local parameters is just wasted time. High-level-language constructs like parameter passing to functions or subroutines offer no advantage here. Local variable protection, opaque variables, a common calling convention and so on are simply not necessary in the closed world of a game. An important exception is the calling interface to a surrounding operating system, used to attach a game to the system environment, use its resources, or gain access to libraries that offer graphics or sound support (3D engines, sound players, ...).

Larger developments or projects tend to use a high-level language, which implies the use of such techniques. At the assembly level you only need to bother with the high-level calling interface if you want to embed assembly routines for (mostly) time-critical reasons; the interface can simply be seen as the connection between some assembly code and the rest of a high-level project.
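The cost difference described above can be sketched like this on the 65C816 (MoveSamus and move_x_pix are hypothetical names, and a 16-bit accumulator is assumed):

```asm
; High-level style: push the argument, call, clean up the stack.
        PEA $0010       ; push a 16-bit parameter
        JSR MoveSamus   ; callee digs it out with LDA 3,S
        PLA             ; caller discards the parameter again

; Global style: one store, one call - and the value is still there
; next frame for any other routine that needs it.
        LDA #$0010
        STA move_x_pix
        JSR MoveSamus
```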

14

It's also worth pointing out that the intricacies of maintaining variables on the stack can result in slower code. And of course there are limits to how big the stack can be; even with the more expansive stack on the 65816, you're still limited to a fraction of bank $00 (so 64K, minus the direct page(s), minus any other stacks you have around, minus any I/O space or ROM or the like). So preserving bank-zero space for code that actually has to use it is pretty critical.

As has been pointed out, the '816 does have stack-relative addressing. But it's a little awkward and very easy to screw up: if you make any changes to the configuration of the stack, all kinds of things can get misaligned, especially if you pass parameters on the stack.

So it's just so much easier to not use the stack that way unless you're either (a) really sure of what you're doing or (b) really sure it will make a huge difference.

12

Code written in high-level languages does a lot of stack-relative operations, because compilers are good at keeping track of which stack offset refers to which variable in the current context.

In hand-written assembly code it was often more common to store things as 'globals' in well-known memory locations, just because it's easier for humans to think about memory layout in that way. If recursion or concurrent instances are not needed in a particular function, then global storage is a possible solution.

I worked on Z80 and 8086 assembly code software systems back in the day, and even though both of these processors have index registers that would allow stack-relative storage of variables, it was common practice to store variables as globals.

  • 1
The Z80 allowed indexing off IX and IY, and the C compilers I've used employed IX as a frame pointer, but IX-based operations have such a huge speed penalty that it's generally more efficient to avoid automatic variables and simply use static variables instead. Too bad Z80 compilers don't support a static-overlay pseudo-stack the way compilers for the 8x51 and PIC often do.
    – supercat
    Commented May 19, 2016 at 3:39
  • 2
    This. And it's not just about writing code; it's also about debugging! Even if the instruction set supports stack-relative addressing, think about how much easier it is to debug a program when the memory dump contains distinctive values like 0B56.
    – Artelius
    Commented May 20, 2016 at 7:36
10

Another example in the tightly constrained world of games - especially those with a real-time loop (e.g., for updating the display on a fixed schedule): there was no space for a "task queue" and/or no time for a "rendezvous" mechanism that would enable data to be passed from one thread to another in one of the "structured" methods that would be more common today. Instead, the programmer passed data via global variables and was responsible for analyzing (and preventing or mitigating) data races and other concurrency hazards. These solutions are fast and cheap at runtime, slow and expensive at coding time ...
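A sketch of this "globals as the rendezvous" pattern: the fixed address comes from the RAM map in the question, while the nmi label and apply_move_x routine are invented, and the data bank is assumed to be set so $0B58 reaches $7E:0B58:

```asm
; Main loop: post this frame's movement by writing a fixed address.
        LDA #$0001
        STA $0B58       ; horizontal pixels (address from the RAM map)

; Frame interrupt (NMI): read the same fixed address directly -
; no need to locate a parameter block or walk another routine's stack.
nmi:    LDA $0B58
        JSR apply_move_x ; hypothetical routine that moves Samus
        STZ $0B58        ; clear it, so the value isn't applied twice
        RTI
```

The programmer, not a language runtime, guarantees that writer and reader never trample each other; here the convention is simply that the main loop writes and the handler reads and clears.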

BTW, when looking at older code for very resource-constrained systems you'll frequently find coding techniques that are generally shunned today - for good reason! The most egregious would be self-modifying code, which was much more common on earlier and/or smaller architectures with less choice of addressing modes, fewer registers, etc. Look at those mechanisms and marvel at how we were able to do so much with so little ... and in such a way that it nearly always worked!

  • "self-modifying code" – Well, today, that's called "Dependency Injection" and is all the rage :-D (More seriously: reflective metaprogramming is a powerful technique, and is used in some communities to great effect, e.g. Lisp and Smalltalk, but also Java bytecode rewriting is an example, e.g. Kilim.) Commented May 17, 2016 at 21:39
  • 1
    @JörgWMittag - I know you're kidding but of course by "self-modifying code" I'm referring to what has now been dressed up and formalized as "run time code generation" or "dynamic code generation". When it isn't so fancy: you just overwrite your own machine code right before you execute it. For example, I did this on a minicomputer that lacked a variable shift instruction (where the shift count comes out of a register). I just created a fixed shift instruction with the right count in its immediate field and wrote it into the code right before it executed ...
    – davidbak
    Commented May 17, 2016 at 22:18
  • 2
    Hey, when I wanted to save a register, I would store the value into a location that was the immediate value of a Load instruction later on where I needed the value again.
    – JDługosz
    Commented May 18, 2016 at 2:06
  • 1
Yes, it was meant as a joke. I know Linux does this on a few occasions. Two I know of: when a kernel that has been compiled for a multi-CPU system boots on a single-CPU system, it overwrites its spin lock implementation with NOPs; there's no point in protecting against concurrent access if there's no concurrency. And there are "hybrid" kernels that can run natively as well as accelerated-paravirtualized (e.g. on Xen or HyperV), and they patch in the correct implementations of certain interrupt routines and low-level hardware access functions at runtime (boot time, to be precise). Commented May 18, 2016 at 2:52
  • And of course kexec, the ability to replace the running kernel image with a newer one without rebooting the system is basically a giant multi-100-kByte runtime patch. Linux doesn't guarantee stable ABIs inside the kernel, so kexec has to patch and update all internal kernel data structures, e.g. process descriptors (including for currently executing processes), internal filesystem driver data (including for currently mounted active filesystems with open files) and so on. Commented May 18, 2016 at 2:57
4

From your description, it sounds like a normal thing to do even today — you just have the wrong model in your head.

The description sounds like there is a "physics" thread responsible for updating the state of the game world, and updates happen periodically (say... once per frame, just before rendering).

When some routine wishes to move Samus Aran, it doesn't do it directly; instead it would invoke a method to queue up the change, and then the next time the physics thread runs, it executes all of the changes at once.

This model makes it easier to ensure regular updates of the world state. Additionally, if multiple changes happen between frames, it is likely much more efficient to only have to calculate the effect of motion once per frame rather than doing so after each individual event.


The gameplay of Super Metroid appears to be such that the motion doesn't need to be broken into individual events, so it doesn't need an actual queue; instead the changes in each direction simply get added up, and only the total change in each direction is applied.

(naturally, the method to queue up the change would be inlined, so you wouldn't see a function call at all)
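The "add up the changes, apply once per frame" idea might look like this. The fixed address comes from the RAM map in the question; the routine names and samus_x are invented, a 16-bit accumulator is assumed, and the data bank is assumed to be set so $0B58 reaches $7E:0B58:

```asm
; Any gameplay code "queues" a move by adding into the pending total.
; On entry, A = horizontal pixels to move.
request_move_x:
        CLC
        ADC $0B58       ; accumulate into this frame's total
        STA $0B58
        RTS

; Once per frame, the movement code applies and clears the total,
; so several small pushes in one frame cost one position update.
apply_moves:
        LDA $0B58
        CLC
        ADC samus_x     ; hypothetical position variable
        STA samus_x
        STZ $0B58
        RTS
```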

