
I just did a rough calculation around the max size of an unsigned 64-bit integer, which is:

18,446,744,073,709,551,615
q5 q4  t   b   m   t   h

Looking at AWS's hardware specifications on their largest machines, it gets up to 3,904GB, which is:

3,904,000,000,000,000,000 bytes
5 q4  t   b   m   t   h

To me that means the pointers are stored as 64-bit integers. I am new to thinking about memory and pointers but just wanted to clarify that.

I'm a bit confused though still. A pointer is a "programming language construct". So technically, even on a 64-bit machine, if you are only using less than ~4 billion integers (32-bit max integer size), then I'm wondering why you can't just have the pointers be 32 bits. That way pointers are 32-bits until you run out of space, then you can start using 64-bit pointers. Then it would give you a bit more space to have more objects.

Still confused though. A pointer holds the location of an address in memory. It says the "address" is 64-bits. So if we were to have 32-bit pointers pointing to 32-bit chunks in the 64-bit memory, I'm not sure what that would look like or mean. It seems like it means you would have to do offsets (though I don't understand that too well).

Wondering if one could demonstrate in C, Assembly, or JavaScript, how it would look to store 32-bit pointers in a 64-bit address space. If C handles it for you automatically, then how Assembly does it.


I would like to know how I could use a large memory like above, but store 32-bit pointers, until the max is reached then use 64-bit pointers, and not sure what that would look like exactly. I will try to draw a diagram explaining how I'm thinking about it.

  | The bars and . are like a ruler and mark the bit positions.
  - Each row under a number (1, 2, 3, ...) means a memory address.
  ⬚ Means no data in memory address.
  ⬒ Means data of type 1 in memory address.
  ■ Means data of type 2 in memory address.
  ● Means a bit of integer pointer is plugged into memory address slot.
  ◌ Means no bit of integer pointer is plugged into memory address slot.
                                                                                                                                 |
                                                                 |                                                               |
                                 |                               |                               |                               |
                 |               |               |               |               |               |               |               |
         |       |       |       |       |       |       |       |       |       |       |       |       |       |       |       |
   . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |

1. Empty 64-bit memory.
   ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌
   ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚
   ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ...
   ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ⬚ ...
   ...
   ...

2. 64-bit memory filled with 32-bit pieces of data (type 1 ⬒, type 2 ■).
   ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌
   ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■
   ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ...
   ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ...
   ...
   ...

3. 64-bit memory filled with 64-bit pieces of data.
   ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌
   ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■
   ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ...
   ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ...
   ...
   ...

4. 64-bit memory filled with 4-bit pieces of data.
   ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌
   ■ ■ ■ ■ ⬒ ⬒ ⬒ ⬒ ■ ■ ■ ■ ⬒ ⬒ ⬒ ⬒ ■ ■ ■ ■ ⬒ ⬒ ⬒ ⬒ ■ ■ ■ ■ ⬒ ⬒ ⬒ ⬒ ■ ■ ■ ■ ⬒ ⬒ ⬒ ⬒ ■ ■ ■ ■ ⬒ ⬒ ⬒ ⬒ ■ ■ ■ ■ ⬒ ⬒ ⬒ ⬒ ■ ■ ■ ■ ⬒ ⬒ ⬒ ⬒
   ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ...
   ■ ■ ■ ■ ⬒ ⬒ ⬒ ⬒ ■ ■ ■ ■ ⬒ ⬒ ⬒ ⬒ ■ ■ ■ ■ ⬒ ⬒ ⬒ ⬒ ■ ■ ■ ■ ⬒ ⬒ ⬒ ⬒ ■ ■ ■ ■ ⬒ ⬒ ⬒ ⬒ ■ ■ ■ ■ ⬒ ⬒ ⬒ ⬒ ■ ■ ■ ■ ⬒ ⬒ ⬒ ⬒ ■ ■ ■ ■ ⬒ ⬒ ...
   ...
   ...

5. 64-bit memory filled with 32-bit pieces of data, with second 32-bits accessed by a 32-bit pointer.
   ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
   ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ 
   ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ...
   ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ... 
   ...
   ...

6. 64-bit memory filled with 64-bit pieces of data, with second 64-bits accessed by a 64-bit pointer.
   ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌
   ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■ ■
   ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ...
   ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ...
   ...
   ...

7. 64-bit memory filled with 4-bit pieces of data, with second piece of data accessed by a pointer.
   ◌ ◌ ◌ ◌ ● ● ● ● ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌
   ■ ■ ■ ■ ⬒ ⬒ ⬒ ⬒ ■ ■ ■ ■ ⬒ ⬒ ⬒ ⬒ ■ ■ ■ ■ ⬒ ⬒ ⬒ ⬒ ■ ■ ■ ■ ⬒ ⬒ ⬒ ⬒ ■ ■ ■ ■ ⬒ ⬒ ⬒ ⬒ ■ ■ ■ ■ ⬒ ⬒ ⬒ ⬒ ■ ■ ■ ■ ⬒ ⬒ ⬒ ⬒ ■ ■ ■ ■ ⬒ ⬒ ⬒ ⬒
   ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ...
   ■ ■ ■ ■ ⬒ ⬒ ⬒ ⬒ ■ ■ ■ ■ ⬒ ⬒ ⬒ ⬒ ■ ■ ■ ■ ⬒ ⬒ ⬒ ⬒ ■ ■ ■ ■ ⬒ ⬒ ⬒ ⬒ ■ ■ ■ ■ ⬒ ⬒ ⬒ ⬒ ■ ■ ■ ■ ⬒ ⬒ ⬒ ⬒ ■ ■ ■ ■ ⬒ ⬒ ⬒ ⬒ ■ ■ ■ ■ ⬒ ⬒ ...
   ...
   ...

8. 64-bit memory filled with 8-bit pieces of data, with second piece of data accessed by a pointer.
   ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ● ● ● ● ● ● ● ● ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌
   ■ ■ ■ ■ ■ ■ ■ ■ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ■ ■ ■ ■ ■ ■ ■ ■ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ■ ■ ■ ■ ■ ■ ■ ■ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ■ ■ ■ ■ ■ ■ ■ ■ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ 
   ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ◌ ...
   ■ ■ ■ ■ ■ ■ ■ ■ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ■ ■ ■ ■ ■ ■ ■ ■ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ■ ■ ■ ■ ■ ■ ■ ■ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ■ ■ ■ ■ ■ ■ ■ ■ ⬒ ⬒ ⬒ ⬒ ⬒ ⬒ ...
   ...
   ...

What I'm imagining is that the integers are like keys to a lock (which is the memory address). An empty key hole looks like the 64 ◌'s in a row in (1). A full key hole for a 64-bit address looks like the 64 ●'s in a row in (6). If I give the 64-bit memory address space a 32-bit key, it's like it would look like (5). So it wouldn't fully fill in the 64-bit long (64-◌ long) key hole, it would only fill (in this case) the second half of it. And so it seems like it wouldn't match the address. But I'm trying to point to the 32-bits of data right there in the second half! In order to match the address, it seems you'd have to fill in the key holes in the full 64-bit row, as in (6). I am wondering if my understanding is messed up here, please let me know where I'm off.

In case that wasn't clear, diagrams 1-4 show data that lies in memory (with 1 being empty memory). Diagrams 5-8 show us trying to access the data using a pointer (the row of black circles ● being the pointer/key to the memory address lock).

Finally, I have one last issue. I wonder if you can take it further and store data at even smaller chunks. Such as storing 4-bits of data, as in (7). This just goes to demonstrate how the pointer / address system works in a bit more detail. I don't know if you can have a 4-bit pointer point to a 4-bit chunk of memory. This seems like, because of alignment requirements, you would end up fetching at least 8-bits at a time. But that's okay. I just want to make sure it is or isn't possible to use an n-bit pointer to access n-bits of data in a 64-bit memory space.

And if so, how that would look, either in C or Assembly, or JavaScript would also work.

I would like to know this to know how you are supposed to store data in a 64-bit memory, and what you are allowed to do with the pointers given the "memory addresses are 64-bits". That is, if I can do memory.get(a32BitPointer) and have it return 32 bits of data, from a 32-bit aligned memory slot. (Or equivalently, a 4, 8, 16, etc. bit piece of data or sized pointer).

  • Um, you think that a thousand is 10⁶ (10^6) and a million is 10⁹ (10^9)? Commented Aug 28, 2018 at 23:17
  • Your number is way off. 3,904GB is 3 904 000 000 000 bytes (only 9 zeros), not 3,904,000,000,000,000,000. It's easier to think in terms of ISO prefixes (KMGTE...) than the htmbt... labels you wrote. It took me a while to understand what t and h mean when they're far from the thousand and unit columns.
    – phuclv
    Commented Nov 9, 2018 at 1:59

2 Answers

4

A pointer contains an absolute address.

If you need to add a value before you use the pointer, what you have is an offset, not a real pointer.

In C, a pointer can also hold the address of a function (a function pointer), so you can call a function through it. For that to work you need all 64 bits if the CPU is in 64-bit mode.

If your CPU supports 64 address lines (it may physically have fewer), then it has an address space of 2^64 addresses, which is 0x1 0000 0000 0000 0000, ranging from 0x0000 0000 0000 0000 to 0xFFFF FFFF FFFF FFFF.

If you want your pointer to be useable by CPU instructions without needing additional CPU instructions to find out what you really mean (native CPU code can deal with pointers directly), then it must be as wide as the CPU's address space.

Offsets are slower because the CPU must add to get the address you want, though CPUs have native instructions that do that too.

I'm not a super expert on the x86-64 ISA, but there are probably CPU instructions that treat 32-bit values as 64-bit values with the upper 32 bits assumed to be 0. The CPU still has to internally "extend" the real value to 64 bits.

In x86 and x86-64 you certainly can use 8-, 16-, 32-, and 64-bit offsets (no x86/x86-64 CPU instructions work with just 4-bit values).
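
Since no instruction addresses a 4-bit value directly, the usual workaround is to fetch the byte that contains it and mask out the half you want. Here is a minimal sketch in C, assuming two 4-bit values are packed per byte with the even-numbered one in the low half. That ordering and the names get_nibble/set_nibble are choices made for this illustration, not something the hardware dictates.

```c
#include <stddef.h>
#include <stdint.h>

/* Read 4-bit value number `index` from a packed byte array.
   Assumption: even indices live in the low 4 bits of a byte, odd ones in the high 4. */
uint8_t get_nibble(const uint8_t *mem, size_t index)
{
    uint8_t byte = mem[index / 2];  /* a byte is the smallest unit we can address */
    return (index % 2 == 0) ? (uint8_t)(byte & 0x0F) : (uint8_t)(byte >> 4);
}

/* Overwrite 4-bit value number `index`, leaving its neighbour in the same byte intact. */
void set_nibble(uint8_t *mem, size_t index, uint8_t value)
{
    uint8_t *byte = &mem[index / 2];
    value &= 0x0F;
    if (index % 2 == 0)
        *byte = (uint8_t)((*byte & 0xF0) | value);
    else
        *byte = (uint8_t)((*byte & 0x0F) | (value << 4));
}
```

The "pointer" here is just an index counted in 4-bit units; the real byte address is still computed with a shift before the memory access.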

  • A pointer doesn't "point to an absolute address". The value of the pointer IS an absolute address - a location in memory. (Well, OK, what a pointer points to, i.e. what that memory location contains, could be an absolute address, but it could equally well be the address of some data item, or the starting address of a data structure, etc.) Commented Aug 24, 2018 at 21:27
2

First, 3904GB of memory needs only 42 bits to address. It consists of only 3 904 000 000 000 bytes, not the number you calculated. This can be quickly checked with PowerShell:

PS C:\> [math]::Log(3904GB, 2) # GB base 2, or GiB
41.9307373375629
PS C:\> [math]::Log(3904e9, 2) # GB base 10
41.8280901915491
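
The same check can be sketched in C by counting how many bits the highest byte address needs. The function name address_bits_needed is made up for this illustration.

```c
#include <stdint.h>

/* Number of address bits needed to give every one of n_bytes bytes a unique address.
   The highest address is n_bytes - 1, so we count the bits in that value. */
unsigned address_bits_needed(uint64_t n_bytes)
{
    unsigned bits = 0;
    if (n_bytes == 0)
        return 0;
    uint64_t max_addr = n_bytes - 1;
    while (max_addr != 0) {
        bits++;
        max_addr >>= 1;
    }
    return bits;
}
```

Both readings of 3904GB (3904 * 10^9 bytes and 3904 * 2^30 bytes) fall between 2^41 and 2^42, so the function returns 42 for either.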

So technically, even on a 64-bit machine, if you are only using less than ~4 billion integers (32-bit max integer size), then I'm wondering why you can't just have the pointers be 32 bits. That way pointers are 32-bits until you run out of space, then you can start using 64-bit pointers. Then it would give you a bit more space to have more objects.

The x32 ABI is a 64-bit x86 ABI that uses 32-bit pointers. Processes have only a 32-bit address space, which means they can't use more than 4GB of memory (no problem for most user applications), but they can still take advantage of the larger number of wider registers. The global memory space is still 64-bit, since that's fixed in hardware, so the 32-bit pointer is used as an offset from the process's base address instead of a direct pointer. The implementation is simply like this:

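#include <stdint.h>  // uintptr_t, uint32_t

// Convert a 32-bit "pointer" (really an offset) into a full-width pointer.
void* getRealPointer(uintptr_t base_addr, uint32_t ptr)
{
    // value stored in pointer is the offset/distance from base
    return (void*)(base_addr + ptr);
}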
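
A slightly fuller sketch of the same base-plus-offset idea, assuming all objects live in one array (here a hypothetical fixed-size arena of list nodes): links are stored as 32-bit indices rather than 64-bit pointers and are only turned into real pointers at the moment of access. All names below are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define ARENA_NODES 16u
#define NIL_REF UINT32_MAX        /* plays the role of NULL for 32-bit references */

struct node {
    int32_t  value;
    uint32_t next;                /* 32-bit index into the arena, not a real pointer */
};

static struct node arena[ARENA_NODES];   /* the "base address" is &arena[0] */
static uint32_t arena_used = 0;

/* Allocate a node and return its 32-bit reference, or NIL_REF when the arena is full. */
uint32_t node_new(int32_t value, uint32_t next)
{
    if (arena_used == ARENA_NODES)
        return NIL_REF;
    uint32_t ref = arena_used++;
    arena[ref].value = value;
    arena[ref].next = next;
    return ref;
}

/* Turn a 32-bit reference into a full-width pointer: base + ref * sizeof(struct node). */
struct node *node_ptr(uint32_t ref)
{
    return (ref == NIL_REF) ? NULL : &arena[ref];
}

/* Walk the list using only 32-bit references. */
int64_t list_sum(uint32_t head)
{
    int64_t sum = 0;
    for (uint32_t ref = head; ref != NIL_REF; ref = arena[ref].next)
        sum += arena[ref].value;
    return sum;
}
```

Each node takes 8 bytes here; with a real 64-bit next pointer it would typically take 16 once padding is counted. The cost is the extra address calculation on every access.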

This technique is also common on many other 64-bit RISC architectures like Sparc (Why does Linux on sparc64 architecture use 32-bit pointers in user-space and 64-bit pointers in kernel-space?), MIPS or PowerPC. On the transition to 64-bit they didn't increase the number of registers the way x86 and ARM did, which means a 32-bit process is likely faster than a 64-bit one unless it needs a lot of 64-bit math or more than 2/3/4GB of RAM.

On 64-bit processors such as the G5, Debian PPC uses a 64-bit kernel with 32-bit user space. This is because the 64-bit PowerPC processors have no "32-bit mode" like the Intel 64/AMD64 architecture. Hence, 64-bit PowerPC programs that do not need 64-bit mathematical functions will run somewhat slower than their 32-bit counterparts because 64-bit pointers and long integers consume twice as much memory, fill the CPU cache faster, and thus need more frequent memory accesses.

Linux on PowerPC

Nevertheless, you can't just use 32-bit pointers until you run out of space and then start using 64-bit pointers. That doesn't make sense, because a type always has a fixed size. If you reserve space for only 32 bits of the pointer, what happens when you need to use 64-bit pointers? Where will you store the high part?


So if we were to have 32-bit pointers pointing to 32-bit chunks in the 64-bit memory, I'm not sure what that would look like or mean

That's called word-addressable memory. Instead of each address identifying a byte, each address value identifies a different word.

It's easier to imagine memory as a linear series of cells, each identified by a unique ID. Those IDs are what we normally call "addresses", and they are what is stored in pointers. Cell size is typically 1 byte on modern systems (i.e. byte-addressable memory). However, many older systems like the Unisys or PDP machines use word-addressable memory, in which each cell contains a word (36 bits long in the case of those architectures). On those systems char* is therefore larger than int*, since you need some extra bits to store the position of the byte within the word.

I don't quite understand your chart, but people rarely need to address each bit like that, since that obviously reduces the total memory we can address. To be fair, a few architectures with bit-addressable memory do exist, mainly embedded systems. It looks like you want the low 32 bits of a 64-bit value when you give the CPU a 32-bit address, but it doesn't work that way. To address each half of a cell you need one more address bit, not half as many bits. The principle is simple: with a bigger cell size, the same amount of memory needs fewer cells, which means fewer bits are needed for the ID, and vice versa. At the hardware level the cell size is typically fixed.

Below is an example for the first 16 bytes in memory

╔══════╤══════╤══════╤══════╤══════╤══════╤══════╤══════╤══════╤══════╤══════╤══════╤══════╤══════╤══════╤══════╗
║ 0000 │ 0001 │ 0010 │ 0011 │ 0100 │ 0101 │ 0110 │ 0111 │ 1000 │ 1001 │ 1010 │ 1011 │ 1100 │ 1101 │ 1110 │ 1111 ║
╠══════╪══════╪══════╪══════╪══════╪══════╪══════╪══════╪══════╪══════╪══════╪══════╪══════╪══════╪══════╪══════╣
║ b0   │ b1   │ b2   │ b3   │ b4   │ b5   │ b6   │ b7   │ b8   │ b9   │ b10  │ b11  │ b12  │ b13  │ b14  │ b15  ║
╟──────┴──────┼──────┴──────┼──────┴──────┼──────┴──────┼──────┴──────┼──────┴──────┼──────┴──────┼──────┴──────╢
║ w0 000      │ w1 001      │ w2 010      │ w3 011      │ w4 100      │ w5 101      │ w6 110      │ w7 111      ║
╟─────────────┴─────────────┼─────────────┴─────────────┼─────────────┴─────────────┼─────────────┴─────────────╢
║ dw0 00                    │ dw1 01                    │ dw2 10                    │ dw3 11                    ║
╟───────────────────────────┴───────────────────────────┼───────────────────────────┴───────────────────────────╢
║ o0                                                    │ o1                                                    ║
╚═══════════════════════════════════════════════════════╧═══════════════════════════════════════════════════════╝
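
The table can be reproduced with shifts: dropping the low address bits gives the index of the larger cell that contains a byte. A small sketch, where cell_index and cell_start are hypothetical helper names and the cell size is given as a power of two:

```c
#include <stdint.h>

/* Index of the cell containing byte address `byte_addr`, for cells of
   2^log2_cell_bytes bytes (1 -> w, 2 -> dw, 3 -> o in the table). */
uint64_t cell_index(uint64_t byte_addr, unsigned log2_cell_bytes)
{
    return byte_addr >> log2_cell_bytes;
}

/* Byte address at which cell number `index` starts. */
uint64_t cell_start(uint64_t index, unsigned log2_cell_bytes)
{
    return index << log2_cell_bytes;
}
```

For byte 13 (binary 1101) this gives w6 (110), dw3 (11) and o1 (1), matching the columns of the table. Each doubling of the cell size removes one bit from the ID.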

You can also look at the illustration in this answer

If we address each 2-byte word, then the Nth word has the byte address N*2. The same goes for any other chunk size: the real byte offset is index*sizeof(chunk). As a result, the 2 low bits of a 4-byte-aligned address and the 3 low bits of an 8-byte-aligned address are always zero. If you don't use word-addressable pointers, those low bits can be used to store data; this is called a tagged pointer.
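
A minimal sketch of a tagged pointer in C, assuming 8-byte-aligned objects so that the 3 low bits are free. The helper names are made up, and the code relies on the pointer-to-integer round trip through uintptr_t behaving as it does on mainstream flat-memory platforms, which the C standard leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

#define TAG_MASK ((uintptr_t)0x7)   /* 3 low bits, free when objects are 8-byte aligned */

/* Pack a small tag (0..7) into the always-zero low bits of an aligned pointer. */
uintptr_t tag_pointer(void *p, unsigned tag)
{
    uintptr_t bits = (uintptr_t)p;
    assert((bits & TAG_MASK) == 0);   /* pointer must be 8-byte aligned */
    assert(tag <= TAG_MASK);          /* tag must fit in 3 bits */
    return bits | tag;
}

/* Recover the original pointer by clearing the tag bits. */
void *untag_pointer(uintptr_t tagged)
{
    return (void *)(tagged & ~TAG_MASK);
}

/* Recover the tag. */
unsigned pointer_tag(uintptr_t tagged)
{
    return (unsigned)(tagged & TAG_MASK);
}
```

The tagged value must always go through untag_pointer before it is dereferenced.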

The 64-bit JVM uses this technique with compressed Oops. See Trick behind JVM's compressed Oops. Objects in Java are always aligned to 8 bytes, so a 32-bit address can cover 8*4 = 32GB of memory.

Managed pointers in the Java heap point to objects which are aligned on 8-byte address boundaries. Compressed oops represent managed pointers (in many but not all places in the JVM software) as 32-bit object offsets from the 64-bit Java heap base address. Because they're object offsets rather than byte offsets, they can be used to address up to four billion objects (not bytes), or a heap size of up to about 32 gigabytes. To use them, they must be scaled by a factor of 8 and added to the Java heap base address to find the object to which they refer. Object sizes using compressed oops are comparable to those in ILP32 mode.

https://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html#compressedOop
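
The scale-and-add step described in that quote can be sketched in C as follows. This is an illustration of the arithmetic only, not the JVM's actual code; the function names and the use of plain 64-bit integers for addresses are assumptions made here.

```c
#include <assert.h>
#include <stdint.h>

#define OOP_SHIFT 3u   /* objects are 8-byte aligned, so 3 low bits are always zero */

/* Compress a 64-bit object address into a 32-bit object offset from the heap base. */
uint32_t compress_ref(uint64_t heap_base, uint64_t addr)
{
    assert(addr >= heap_base);
    uint64_t offset = addr - heap_base;
    assert((offset & ((1u << OOP_SHIFT) - 1u)) == 0);  /* must be 8-byte aligned */
    assert((offset >> OOP_SHIFT) <= UINT32_MAX);       /* must fit within 32 GB */
    return (uint32_t)(offset >> OOP_SHIFT);
}

/* Expand a 32-bit reference: scale by 8 and add the heap base. */
uint64_t decompress_ref(uint64_t heap_base, uint32_t ref)
{
    return heap_base + ((uint64_t)ref << OOP_SHIFT);
}
```

With a 3-bit shift, the 2^32 possible references cover 2^32 * 8 bytes = 32 GB of heap.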

Recently, Google also applied the same technique to its V8 JavaScript engine, using 32-bit pointers in 64-bit V8 and reducing the memory footprint by ~35%.

Branch and (on some architectures) load/store instructions on most RISC architectures also store a word address in the immediate field, since there's no point wasting precious encoding space on those always-zero bits. Examples are the MIPS branch and jump instructions: JAL, J, BEQ, BLEZ, BGTZ...
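
As a rough sketch of how such a word address is turned back into a byte address, here is the target calculation for the classic MIPS32 encodings: a 26-bit word index for J/JAL and a signed 16-bit word offset for branches such as BEQ. The function names are invented for this example.

```c
#include <stdint.h>

/* J/JAL: the low 26 bits hold a word index. Shift left by 2 to get a byte address
   and take the top 4 bits from the address of the following instruction (pc + 4). */
uint32_t mips_jump_target(uint32_t pc, uint32_t instr)
{
    uint32_t index = instr & 0x03FFFFFFu;
    return ((pc + 4u) & 0xF0000000u) | (index << 2);
}

/* BEQ and friends: the low 16 bits hold a signed offset counted in words,
   relative to the following instruction (pc + 4). */
uint32_t mips_branch_target(uint32_t pc, uint32_t instr)
{
    int32_t offset = (int32_t)(instr & 0xFFFFu);
    if (offset & 0x8000)
        offset -= 0x10000;                    /* sign-extend from 16 bits */
    return pc + 4u + ((uint32_t)offset << 2); /* unsigned arithmetic wraps as intended */
}
```

Because the 2 low bits are never stored, the 26-bit field reaches 256 MB of code and the 16-bit field reaches roughly ±128 KB, four times what byte offsets of the same width could cover.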
