34
$\begingroup$

When a computer stores a variable, when a program needs to get the variable's value, how does the computer know where to look in memory for that variable's value?

$\endgroup$
7
  • 18
    $\begingroup$ It doesn't; "the computer" is completely oblivious. We have to hardcode all addresses. (Which is simplifying a bit, but not by too much.) $\endgroup$
    – Raphael
    Commented Oct 13, 2016 at 20:40
  • 1
    $\begingroup$ @Raphael: Let's generalize that to "we have to hardcode base addresses". $\endgroup$
    – phresnel
    Commented Oct 14, 2016 at 9:36
  • $\begingroup$ Every time you declare a variable, the program responsible for running your code stores the variable name with its address in a hash table (aka namespace). I'd suggest reading the book "Structure and Interpretation of Computer Programs" (SICP) to become well acquainted with such little details. $\endgroup$ Commented Oct 15, 2016 at 9:02
  • $\begingroup$ Your source programme uses a variable. The compiler or interpreter decides how to implement it: it generates instructions for the computer to execute and has to make sure that those instructions fetch values from the places in which previous instructions stored them. $\endgroup$
    – PJTraill
    Commented Oct 15, 2016 at 22:26
  • 1
    $\begingroup$ @AbhirathMahipal: a variable need not have an address at compile time or even run time; “namespace” is a language concept while a table (hashed or otherwise) is an implementation detail; the name need not persist in the programme when it is run. $\endgroup$
    – PJTraill
    Commented Oct 15, 2016 at 22:30

5 Answers

33
$\begingroup$

I'd suggest you look into the wonderful world of Compiler Construction! The answer is that it's a bit of a complicated process.

To try to give you an intuition, remember that variable names are purely there for the programmer's sake. The computer will ultimately turn everything into addresses at the end.

Local variables are (generally) stored on the stack: that is, they're part of the data structure that represents a function call. We can determine the complete list of variables that a function will (maybe) use by looking at that function, so the compiler can see how many variables it needs for this function and how much space each variable takes.

There's a little bit of magic called the stack pointer: a register that always holds the address where the current function's portion of the stack (its stack frame) starts.

Each variable is given a "stack offset", which is its position within that portion of the stack. Then, when the program needs to access a variable x, the compiler replaces x with STACK_POINTER + x_offset to get the actual address where it is stored in memory.
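As a rough illustration of how offsets like these could be handed out, here is a minimal C sketch. The names `local_var`, `assign_offsets` and `address_of` are made up for this example, and it assumes power-of-two variable sizes, natural alignment, and offsets that count upward from the start of the frame. Real compilers often use negative offsets from a frame pointer and keep some variables in registers, so treat this as a model only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record a compiler might keep for one local variable. */
struct local_var {
    const char *name;   /* only meaningful at compile time */
    size_t size;        /* bytes occupied; assumed to be a power of two */
    size_t offset;      /* filled in by assign_offsets() */
};

/* Round n up to the next multiple of align (a power of two). */
static size_t align_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Give every variable an offset inside the frame; return the frame size. */
size_t assign_offsets(struct local_var *vars, size_t count)
{
    size_t next = 0;
    for (size_t i = 0; i < count; i++) {
        next = align_up(next, vars[i].size);  /* natural alignment assumed */
        vars[i].offset = next;
        next += vars[i].size;
    }
    return next;
}

/* What a use of "x" boils down to: frame start plus x's offset. */
void *address_of(unsigned char *frame_start, const struct local_var *v)
{
    return frame_start + v->offset;
}
```

Given a 1-byte, a 4-byte and an 8-byte variable in that order, this sketch assigns offsets 0, 4 and 8 and reports a 16-byte frame. After that step the names are no longer needed.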

Note that this is why you get a pointer back when you use malloc in C or new in C++. The compiler can't determine in advance where exactly in memory a heap-allocated value will be, so you have to keep a pointer to it. That pointer will be on the stack, but it will point to the heap.
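A small C sketch of that split between the pointer and the value it points to. The helper names `make_boxed_int` and `take_boxed_int` are invented for illustration. The pointer is an ordinary local variable in the caller, while the int lives wherever the allocator decided to put it at run time.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate one int on the heap and return its address (NULL on failure).
 * The address is chosen by the allocator at run time. */
int *make_boxed_int(int value)
{
    int *box = malloc(sizeof *box);
    if (box != NULL) {
        *box = value;
    }
    return box;
}

/* Read the heap value through the pointer, then release the memory. */
int take_boxed_int(int *box)
{
    assert(box != NULL);
    int value = *box;
    free(box);
    return value;
}
```

A caller would hold the result of `make_boxed_int` in a local `int *`, which is the only way the program can find that int again.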

The details of updating stacks for function calls and returns are complicated, so I'd recommend The Dragon Book or The Tiger Book if you're interested.

$\endgroup$
0
24
$\begingroup$

When a computer stores a variable, when a program needs to get the variable's value, how does the computer know where to look in memory for that variable's value?

The program tells it. Computers do not natively have a concept of "variables" - that's entirely a high-level language thing!

Here's a C program:

int main(void)
{
    int a = 1;
    return a + 3;
}

and here's the assembly code it compiles to: (comments starting with ;)

main:
    ; {
    pushq   %rbp
    movq    %rsp, %rbp

    ; int a = 1
    movl    $1, -4(%rbp)

    ; return a + 3
    movl    -4(%rbp), %eax
    addl    $3, %eax

    ; }
    popq    %rbp
    ret

For "int a = 1;" the CPU sees the instruction "store the value 1 at the address (value of register rbp, minus 4)". It knows where to store the value 1 because the program tells it.

Likewise, the next instruction says "load the value at address (value of register rbp, minus 4) into register eax". The computer doesn't need to know about things like variables.
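To make the "the program tells it" point concrete, here is a toy C model of those three instructions. The `machine` struct, its 64-byte memory and the helpers `store32`/`load32` are assumptions made up for this sketch, not how a real CPU is organised. Nothing in the model knows about a variable called a; it only sees a base register and a displacement.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_SIZE 64

/* Toy machine: a small byte-addressed memory and two "registers". */
struct machine {
    uint8_t memory[MEM_SIZE];
    long rbp;       /* base of the current frame, as an index into memory */
    int32_t eax;
};

/* Models: movl $value, disp(%rbp) */
void store32(struct machine *m, long disp, int32_t value)
{
    long addr = m->rbp + disp;
    assert(addr >= 0 && addr + (long)sizeof value <= MEM_SIZE);
    memcpy(&m->memory[addr], &value, sizeof value);
}

/* Models: movl disp(%rbp), <register> */
int32_t load32(const struct machine *m, long disp)
{
    long addr = m->rbp + disp;
    assert(addr >= 0 && addr + (long)sizeof(int32_t) <= MEM_SIZE);
    int32_t value;
    memcpy(&value, &m->memory[addr], sizeof value);
    return value;
}

/* The body of main() from the listing, one call per instruction. */
int32_t run_main(struct machine *m)
{
    store32(m, -4, 1);        /* movl $1, -4(%rbp)   ; int a = 1    */
    m->eax = load32(m, -4);   /* movl -4(%rbp), %eax ; read a       */
    m->eax += 3;              /* addl $3, %eax       ; a + 3        */
    return m->eax;            /* the return value is left in eax    */
}
```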

$\endgroup$
1
  • 2
    $\begingroup$ To connect this to jmite's answer, %rsp is the CPU's stack pointer. %rbp is a register that refers to the bit of the stack used by the current function. Using two registers simplifies debugging. $\endgroup$
    – MSalters
    Commented Oct 14, 2016 at 15:44
2
$\begingroup$

When the compiler or interpreter encounters the declaration of a variable, it decides what address it will use to store that variable, and then records the address in a symbol table. When subsequent references to that variable are encountered, the address from the symbol table is substituted.

The address recorded in the symbol table may be an offset from a register (such as the stack pointer) but that's an implementation detail.
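A minimal sketch of such a symbol table in C. It assumes a fixed-capacity array with linear search instead of a hash table, and it records plain offsets. The names `symbol_table`, `declare` and `lookup` are illustrative, not taken from any particular compiler.

```c
#include <assert.h>
#include <string.h>

#define MAX_SYMBOLS 16

struct symbol {
    const char *name;
    int offset;          /* where the variable lives, relative to some base */
};

struct symbol_table {
    struct symbol entries[MAX_SYMBOLS];
    int count;
    int next_offset;     /* first offset not yet handed out */
};

/* Handle a declaration: pick the next free offset and record it.
 * Returns the chosen offset, or -1 if the table is full. */
int declare(struct symbol_table *t, const char *name, int size)
{
    assert(size > 0);
    if (t->count == MAX_SYMBOLS) {
        return -1;
    }
    int offset = t->next_offset;
    t->entries[t->count].name = name;
    t->entries[t->count].offset = offset;
    t->count++;
    t->next_offset += size;
    return offset;
}

/* Handle a later reference: substitute the recorded offset.
 * Returns -1 if the name was never declared. */
int lookup(const struct symbol_table *t, const char *name)
{
    for (int i = 0; i < t->count; i++) {
        if (strcmp(t->entries[i].name, name) == 0) {
            return t->entries[i].offset;
        }
    }
    return -1;
}
```

In a compiled language this table exists only while the compiler runs. The generated code contains just the offsets.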

$\endgroup$
0
$\begingroup$

The exact methods depend on what specifically you are talking about and how deep you want to go. For example, storing files on a hard drive is different from storing something in memory or in a database, although the concepts are similar. And how you do it at the programming level is a different explanation from how a computer does it at the I/O level.

Most systems use some sort of directory/index/registry mechanism to allow the computer to find and access the data. This index/directory will contain one or more keys, and the address the data is actually located in (whether that be hard drive, RAM, database, etc.).

Computer Program Example

A computer program can access memory in a variety of ways. Typically the operating system gives the program an address space, and the program can do what it wants with that address space. It can write directly to any address within its memory space, and it can keep track of that how it wants. This will sometimes vary by programming language and operating system, or even according to a programmer's preferred techniques.

As mentioned in some of the other answers, the exact coding or programming used differs, but typically behind the scenes it uses something like a stack. It has a register that stores the memory location where the current stack starts, and then a method of knowing where in that stack a function or variable is.

In many higher-level programming languages, the language takes care of all that for you. All you have to do is declare a variable and store something in it, and the language creates the necessary stacks and arrays behind the scenes.

But considering how versatile programming is, there isn't really one answer, since a programmer can choose to write directly to any address within the program's allocated space at any time (assuming the programming language allows that). The programmer could then store that location in an array, or even just hard-code it into the program (e.g. the variable "alpha" is always stored at the beginning of the stack or always stored in the first 32 bits of allocated memory).
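For instance, the hard-coded variant could look like the following C sketch. The 32-byte `arena` stands in for "allocated memory", and the constants `ALPHA_OFFSET` and `BETA_OFFSET` are assumptions chosen for this example: "alpha" is, by convention, always the first 32 bits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ARENA_SIZE   32
#define ALPHA_OFFSET 0   /* "alpha" always occupies the first 32 bits */
#define BETA_OFFSET  4   /* a second hard-coded slot right after it   */

/* Stand-in for the memory the program has been given. */
static unsigned char arena[ARENA_SIZE];

/* Write a 32-bit value at a fixed byte offset inside the arena. */
void write_i32(size_t offset, int32_t value)
{
    assert(offset + sizeof value <= ARENA_SIZE);
    memcpy(arena + offset, &value, sizeof value);
}

/* Read a 32-bit value back from a fixed byte offset. */
int32_t read_i32(size_t offset)
{
    assert(offset + sizeof(int32_t) <= ARENA_SIZE);
    int32_t value;
    memcpy(&value, arena + offset, sizeof value);
    return value;
}

/* The "variable" alpha is nothing more than an agreed-upon location. */
void set_alpha(int32_t value) { write_i32(ALPHA_OFFSET, value); }
int32_t get_alpha(void)       { return read_i32(ALPHA_OFFSET); }
```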

Summary

So basically, there has to be some mechanism behind the scenes that tells the computer where data is stored. One of the most popular ways is some sort of index/directory that contains key(s) and the memory address. This is implemented in all sorts of ways and is usually encapsulated from the user (and sometimes even encapsulated from the programmer).

Reference: How do computers remember where they store things?

$\endgroup$
0
$\begingroup$

It knows because of templates and formats.

The program/function/computer doesn't actually know where anything is. It just expects something to be in a certain place. Let's use an example.

class simpleClass{
    public:
        int varA=58;
        int varB=73;
        simpleClass* nextObject=NULL;
};

Our new class 'simpleClass' contains 3 important variables: two integers that can contain some data when we need them to, and a pointer to another 'simpleClass' object. Let's assume that we're on a 32-bit machine for the sake of simplicity. 'g++' or another C++ compiler would make a template for us to work with to allocate some data.

Simple Types

Firstly, when one uses a keyword for a simple type like 'int' to declare a global or static variable, the compiler makes a note in the executable file's '.data' or '.bss' section so that when the operating system loads the program, the storage is available to it. On a typical 32-bit machine, the 'int' keyword would allocate 4 bytes (32 bits), while a 'long long int' would allocate 8 bytes (64 bits).

Sometimes, in a cell-by-cell manner, a variable may come right after the instruction that's supposed to load it into memory, so it would look like this in pseudo-assembly:

...
clear register EAX
clear register EBX
load the immediate (next) value into EAX
5
copy the value in register EAX to register EBX
...

This would end with the value '5' in EAX as well as EBX.

While the program executes, the '5' is never executed as an instruction of its own: the immediate load consumes it as an operand, so the CPU carries on with the instruction that follows it.

The downside of this method is that it's only really practical for constants, since it would be impractical to keep arrays/buffers/strings in the middle of your code. So, generally, most variables are kept in the program's data sections instead.

If one needed to access one of these dynamic variables, then one could treat the immediate value as if it were a pointer:

...
clear register EAX
clear register EBX
load the immediate value into EAX
0x0AF2CE66 (Let's say this is the address of a cell containing '5')
load the value pointed to by EAX into EBX
...

This would end with the value '0x0AF2CE66' in register EAX and the value of '5' in register EBX. One can also add values in registers together, so we'd be able to find elements of an array or string using this method.
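In C, the "add a base address and an index" step is what pointer arithmetic expresses. A minimal sketch follows. The function names and the explicit `length` parameter are additions for this example, included so the bounds can be checked.

```c
#include <assert.h>
#include <stddef.h>

/* Load element `index` of an int array, given only its base address.
 * `base + index` advances the address by index * sizeof(int) bytes. */
int load_element(const int *base, size_t index, size_t length)
{
    assert(index < length);
    const int *element_address = base + index;
    return *element_address;
}

/* Store through a computed address in the same way. */
void store_element(int *base, size_t index, size_t length, int value)
{
    assert(index < length);
    *(base + index) = value;
}
```

Writing `base[index]` means exactly the same as `*(base + index)`, so ordinary array indexing is this address computation in disguise.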

Another important point is that one's able to store values when using addresses in a similar manner, so that one can reference the values at those cells later.

Complex Types

If we make two objects of this class:

simpleClass newObjA;
simpleClass newObjB;

then we can assign a pointer to the second object to the field available for it in the first object:

newObjA.nextObject=&newObjB;

Now the program can expect to find the address of the second object within the first object's pointer field. In memory, this would look something like:

newObjA:    58
            73
            &newObjB
            ...
newObjB:    58
            73
            NULL

One very important fact to note here is that 'newObjA' and 'newObjB' don't have names when they're compiled. They're just places where we expect some data to be. So, if we add 2 cells to &newObjA then we find the cell that acts as 'nextObject'. Therefore, if we know the address of 'newObjA' and where the 'nextObject' cell is relative to it, then we can know the address of 'newObjB':

...
load the immediate value into EAX
&newObjA
add the immediate value to EAX
2
load the value pointed to by EAX into EBX

This would end with '2+&newObjA' in 'EAX' and '&newObjB' in 'EBX'.
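The same computation can be written in C with `offsetof` from `<stddef.h>`, which yields a field's distance in bytes from the start of its struct. The sketch below uses a plain C struct, `simple_node`, as a stand-in for 'simpleClass', and a hypothetical helper `follow_next`. Note that the offset is measured in bytes, not cells, and that the compiler may insert padding, so the code asks for the offset instead of assuming it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* C stand-in for 'simpleClass': two ints and a pointer to the next object. */
struct simple_node {
    int var_a;
    int var_b;
    struct simple_node *next_object;
};

/* Find the next object knowing only the current object's address
 * and where the next_object field sits relative to it. */
struct simple_node *follow_next(const void *object_address)
{
    assert(object_address != NULL);
    const unsigned char *bytes = object_address;
    const unsigned char *field = bytes + offsetof(struct simple_node, next_object);

    struct simple_node *next;
    memcpy(&next, field, sizeof next);   /* read the pointer stored in that field */
    return next;
}
```

Writing `object->next_object` makes the compiler generate this base-plus-offset access for you.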

Templates/Formats

When the compiler compiles the class definition, it's really compiling a way to make a format, a way to write to a format, and a way to read from a format.

The example given above is a template for a singly-linked-list with two 'int' variables. These kinds of constructions are very important for dynamic memory allocation, along with binary and n-ary trees. Practical applications of n-ary trees would be filesystems composed of directories pointing to files, directories, or other instances recognized by drivers/the operating system.

In order to access all of the elements, think about an inchworm working its way up and down the structure. This way, the program/function/computer doesn't know anything, it just executes instructions to move data around.
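The inchworm walk could be sketched in C like this, again with a plain struct, `simple_node`, standing in for 'simpleClass'. The helper names `count_nodes` and `last_node` are made up for the example. At each step the code only knows the address of the current node and where its 'nextObject' field is.

```c
#include <assert.h>
#include <stddef.h>

/* C stand-in for 'simpleClass'. */
struct simple_node {
    int var_a;
    int var_b;
    struct simple_node *next_object;
};

/* Count the nodes by following next_object until it is NULL. */
size_t count_nodes(const struct simple_node *head)
{
    size_t count = 0;
    for (const struct simple_node *node = head; node != NULL; node = node->next_object) {
        count++;
    }
    return count;
}

/* Walk to the final node of a non-empty chain. */
const struct simple_node *last_node(const struct simple_node *head)
{
    assert(head != NULL);
    while (head->next_object != NULL) {
        head = head->next_object;   /* step to wherever the pointer says */
    }
    return head;
}
```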

$\endgroup$
2
  • $\begingroup$ The words 'template' and 'format' as used here do not appear in any compiler or compiler textbook I have ever seen, and there doesn't appear to be any reason to use both words for the same non-existent thing. Variables have addresses and/or offsets, that's all you need to know. $\endgroup$
    – user207421
    Commented Oct 16, 2016 at 2:29
  • $\begingroup$ I'm using the words since they're abstractions for data arrangement, just like numbers, files, arrays, and variables are abstractions. $\endgroup$ Commented Oct 19, 2016 at 13:19
