Most of the processors/CPUs widely used today have a bit count that is a power of 2 (usually 32 or 64, but also 16, 8, and 4 bits).

Even though the meaning of bit count isn't consistent (some say it's the word size, the size of the registers, the instruction width, the data or address bus width etc.), all of these are almost always powers of 2.

I know there are some exceptions to this, for example the Intel 8086 had a 20 bit address bus, but as I said it is usually a power of 2.

Why does this happen, what are some exceptions, and why?

    as a supplement to the answers provided you may want to review this: superuser.com/a/1563097/171793
    8086 is a 16-bit CPU that supports a way to address more memory. It's not rare for an N-bit CPU to support some features to allow N+m address bits, often with small m. Designs like these were common in the years before transistor budgets were ready to make the leap to 2N-bit CPUs. (e.g. 8086 came late-ish in the 16-bit era, when high-end CPUs were soon or already 32-bit in at least some ways, like register and address width, such as M68k). Or as an extension to squeeze more life out of a 32-bit design with existing software, like x86 PAE, or 32-bit PowerPC has some special regs I think.
    Historically there have been wide ranges of bitnesses, including registers and byte/word widths, not just address busses. Were there ever 12-, 24-, 48-, etc bit processors? and Have there been any instruction sets with an odd register width?. But as for why, What was the rationale behind 36 bit computer architectures? has some attempts. Once 8-bit bytes became standard, multiples of that are obvious, and power of 2 B means no mul/div by 3
    On electronics.SE For mainstream computing what are the practical advantages of 64-bit register size CPUs given the needs of today and the near future? was originally titled Why did chip designers choose to jump from 32-bit to 64-bit CPUs? and has several answers suggesting reasons why 64 is the next logical step after 32, instead of say 48.
    From the processor's standpoint, I assume it's so that an address is an integer number of bytes. The motherboard, on the other hand, may discard any of the bits of the address provided by the CPU to map an address to either physical memory or IO devices. e.g. You could use the most significant bits of the address to select between memory and various IO devices, and may only ever need n of the least significant bits of the address depending on how much memory/io space is mapped. But the processor doesn't care what the rest of the computer does with its address bits.
    – Wyck
    Commented Aug 4, 2022 at 14:22

8-bit bytes

Much of this grows out of the adoption of the 8-bit byte, which became popular with the introduction of the IBM 360 family of computers in 1964. That year, an article in the IBM Journal of Research and Development explained the choice:

Character size, 6 vs 4/8: In character size, the fundamental problem is that decimal digits require 4 bits, the alphanumeric characters require 6 bits. Three obvious alternatives were considered - 6 bits for all, with 2 bits wasted on numeric data; 4 bits for digits, 8 for alphanumeric, with 2 bits wasted on alphanumeric; and 4 bits for digits, 6 for alphanumeric, which would require adoption of a 12-bit module as the minimum addressable element. The 7-bit character, which incorporated a binary recoding of decimal digit pairs, was also briefly examined.

The 4/6 approach was rejected because (a) it was desired to have the versatility and power of manipulating character streams and addressing individual characters, even in models where decimal arithmetic is not used, (b) limiting the alphabetic character to 6 bits seemed short-sighted, and (c) the engineering complexities of this approach might well cost more than the wasted bits in the character.

The straight-6 approach, used in the IBM 702-7080 and 1401-7010 families, as well as in other manufacturers' systems, had the advantages of familiar usage, existing I/O equipment, simple specification field structure, and of commensurability with a 48-bit floating-point word and a 24-bit instruction field.

The 4/8 approach, used in the IBM 650-7074 family and elsewhere, had greater coding efficiency, spare bits in the alphabetic set (allowing the set to grow), and commensurability with a 32/64-bit floating-point word and a 16-bit instruction field. Most important of these factors was coding efficiency, which arises from the fact that the use of numeric data in business records is more than twice as frequent as alphanumeric. This efficiency implies, for a given hardware investment, better use of core storage, faster tapes, and more capacious disks.

Overall, an 8-bit byte allowed a reasonably large character set, by the standards of the time, and also allowed two BCD digits per byte.

The move to byte addressing

The priority in the earliest computer designs was to process numbers as rapidly as possible. A number was typically stored in a machine word, and the desired numerical range determined the size of the word. Instructions were normally a single word, and there was often a single address as part of each instruction. The size of the address field in instructions determined the memory size. The IBM 704/709 is an example; it had a maximum of 4096 words of 36 bits, with six characters per word, each of 6 bits. Addresses are 12 bits.

As the range of uses for computers expanded, handling text data became more and more important. Doing that in a word-addressed machine is cumbersome, at best. A byte-addressed machine allows you to access individual characters easily, but demands a larger address field. At the same time, magnetic core memory allowed building much larger memories than vacuum tubes, electrostatic storage or delay lines.

These developments essentially forced computers to have larger address spaces, and ended the practice of having an address in each instruction.

Larger Data Items

It obviously makes things simpler to have a whole number of bytes per data item. Simplicity at this level is extremely worthwhile, because it has always been important to make a computer run as fast as possible within a limited budget of electronic parts (tubes early on, transistors since then). So two bytes (16 bits) becomes an obvious size.

For larger sizes, there are two factors that show up in the electronics design:

Counting things

Implementing instructions often requires counting through the bytes (or bits) of data items. Using powers of two makes the electronics of those counters simpler. To count through 4 bytes, you need a two-bit counter, which can hold values from 0 to 3. Counting through three bytes still needs a two-bit counter, but one of its values is meaningless and has to be treated as a special case in hardware.

Sending data over a serial line requires counting through the bits of each item, which is another benefit of 8-bit bytes. A 3-bit counter will handle them, without any need for special cases.
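As a rough illustration of the counter argument, here is a minimal Python sketch. The helper names (`counter_bits`, `unused_states`, `next_index_pow2`, `next_index_general`) are made up for this example, and software arithmetic is only an analogy for what the counter hardware has to do.

```python
def counter_bits(n_items: int) -> int:
    """Smallest number of bits whose counter can hold 0 .. n_items - 1."""
    return max(1, (n_items - 1).bit_length())


def unused_states(n_items: int) -> int:
    """Counter values that never legitimately occur and need special handling."""
    return 2 ** counter_bits(n_items) - n_items


def next_index_pow2(i: int, n_items: int) -> int:
    """Power-of-two case: wrapping around is just dropping the carry bit."""
    return (i + 1) & (n_items - 1)


def next_index_general(i: int, n_items: int) -> int:
    """General case: an extra comparison is needed to reset the counter."""
    return 0 if i + 1 == n_items else i + 1


for n in (3, 4, 8):
    print(n, "items:", counter_bits(n), "bits,", unused_states(n), "unused states")
```

With 4 or 8 items every counter state is used; with 3 items one state is left over, which is the special case the hardware would have to detect.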

The IBM 360 picked 32-bit addresses (although it only allowed 24-bit memory addresses for its first decade), and once that was established, it was far easier to compete with IBM using 8-bit bytes and 32-bit addresses than if you wanted to do something different.

Memory fetches and data alignment

Fetching data from memory is simpler if data items are "aligned". This means that their addresses are a multiple of their size. So for a byte-addressed machine, like the IBM 360, a single byte can be at any address. A two-byte (16-bit) item is "aligned" if it is at an even-numbered address. A four-byte (32-bit) item is aligned if its address is a multiple of 4.

Many computer designs of the 1960s through 1990s had memories that could fetch 4 bytes in one operation, starting from an address that was a multiple of 4. If your data items are aligned, then you're guaranteed to be able to fetch any two- or four-byte item in a single read from memory. If they are not aligned, you sometimes need two fetches. That requires more complexity in the memory access system, to recognise that the operation is misaligned and generate the extra fetch. That complexity, and the extra fetch, slow things down.

Items bigger than four bytes will need two fetches, but life is simpler if your larger items are eight bytes, and aligned on 8-byte boundaries. Then you always need exactly two fetches. If 8-byte items can start at arbitrary addresses, some of them need three fetches.
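The fetch counts above can be checked with a small Python sketch. It assumes a memory that only reads aligned 4-byte blocks, as described; `fetches_needed` is a hypothetical helper, not any real hardware interface.

```python
def fetches_needed(address: int, size: int, fetch_width: int = 4) -> int:
    """Count the aligned fetch_width-byte reads covering [address, address + size)."""
    first_block = address // fetch_width
    last_block = (address + size - 1) // fetch_width
    return last_block - first_block + 1


# (address, size) pairs: aligned and misaligned 2-, 4- and 8-byte items
for address, size in [(2, 2), (3, 2), (4, 4), (5, 4), (8, 8), (9, 8)]:
    print(f"{size}-byte item at address {address}: "
          f"{fetches_needed(address, size)} fetch(es)")
```

Aligned 2- and 4-byte items always come back in one fetch and aligned 8-byte items in two; the misaligned cases cost one more.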

In modern fast systems, fetches are always of complete cache lines, usually 32 or 64 bytes. These are always aligned, and aligned data items that fit inside them always arrive complete.

Quite a few computer designs regard a misaligned fetch as a program bug, and kill programs that execute one. x86-based systems don't do that, but have to pay the complexity price. They do run faster with aligned data, so that is normally used even though it is not compulsory.

24-bit systems

I've used a 24-bit system, an ICL 1900 mainframe. It used 6-bit bytes, four per 24-bit word. Those 6-bit bytes limited it to UPPERCASE text, and 24-bit pointers limited it to 16MB of RAM, which is tiny by today's standards.

A more modern 24-bit system with 8-bit bytes would still be limited to 16MB of easily addressable memory, and would be paying the costs of counters with unwanted states, and memory items that were either misaligned, or wasted a byte of memory for every 24-bit integer. A 32-bit system would be more capable, and can be built very cheaply in today's technology.

Lessons of history

There have been a couple of influential computer systems that had 32-bit integers and pointers, but used 24-bit addressing. They're the Motorola 68000 and the IBM 360. In both cases, only the lowest 24 bits of an address were used, but addresses stored in memory occupied 32 bits.

As those systems were limited to 16MB of RAM, programmers stored other data in the spare 8 bits. And when 16MB of RAM clearly wasn't enough and the designs were expanded to 32-bit addressing, that data stored in spare bits became a serious problem, if it was treated as part of the address.

On the 68000 family, existing programs had to be changed to stop using those no-longer-spare bits. This was most noticeable in the wider computer industry for Macintosh software in the late 1980s, when updating for 68020 compatibility, but the same thing happened on Amiga, and presumably other 68000-based systems.

On the successors of the IBM 360, 24-bit address programs could still be run, as could programs using larger addresses. But only 31 of the potential 32 address bits could be used; an address bit had been sacrificed to let the hardware tell the difference between the two kinds of code.

Post-32-bit designs

Everyone who designed a general-purpose architecture with addressing larger than 32 bits knew of the 360 and the 68000, and how much pain 24-bit addressing had caused. Nobody who was serious tried to design a segmented architecture like real-mode x86 for going beyond 32-bit addressing. Everyone used flat address spaces. There are only a few vaguely sane choices for address size.

  • 40-bit addressing is complicated. The electronics have unused values in counting through bits and bytes. If memory fetches are 32 bits, then 40-bit pointers always require two fetches; if memory fetches are 40 bits, then some of your 16-bit and 32-bit fetches require two fetch operations, and some of your 64-bit fetches need three. You can reduce that by widening your 32-bit quantities to 40 bits, and 64-bit quantities to 80 bits, but that isn't a great idea - see below.

  • 40-bit addressing also won't last very long. It only allows addressing 1024GB, and as of 2023, that would already be a problem for some markets. Expanding 40-bit addressing to a bigger address space would cause another round of disruption, as software was updated to make use of it, and would likely destroy backwards compatibility to 40-bit. It also gives you another round of alignment complexity if you took the 40- and 80-bit option.

  • 48- or 56-bit addressing are about as complex as 40-bit, and while they probably would last rather longer, by the time you've gone this far, you might as well go all the way.

  • 64-bit is simpler to build than 40-, 48- or 56-bit. It will last longer. Its register size matches standard floating-point data sizes. It seems logical.
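To put numbers on how long each choice would last, here is a short Python sketch of the address-space sizes involved. It assumes flat byte addressing; `addressable_bytes` and `human_size` are names invented for this example.

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses for a flat address of this width."""
    return 1 << address_bits


def human_size(n_bytes: int) -> str:
    """Format a byte count using binary units (1 KiB = 1024 bytes)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    index = 0
    while n_bytes >= 1024 and index < len(units) - 1:
        n_bytes //= 1024
        index += 1
    return f"{n_bytes} {units[index]}"


for bits in (24, 32, 40, 48, 56, 64):
    print(f"{bits}-bit addresses: {human_size(addressable_bytes(bits))}")
```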

The first general-purpose post-32-bit microprocessors released were the MIPS R4000 in 1991, the Kendall Square Research KSR-1, also in 1991, and the DEC Alpha in 1992.

  1. I don't know many details of the MIPS project, but it was a 64-bit extension of their 32-bit R2000 and R3000 microprocessors. SGI bought MIPS when they got into financial difficulties in 1991-92 to ensure the supply of processors for their workstation products.

  2. The KSR-1 was a supercomputer with at least eight 64-bit microprocessors of their own design. It was not successful.

  3. The DEC project had the most effect, because DEC was a major computer company at the time. The effort had started in 1988, initially aiming to keep the 32-bit VAX architecture relevant in the long term. The designers rapidly realised that this was impractical, and designed a new architecture, intended to last at least 25 years. They therefore went for 64-bit addressing, to make sure that they didn't run out of address space.

Releasing a competitor to Alpha or MIPS which wasn't 64-bit would obviously have a marketing problem with "why isn't it 64-bit?" questions. So 64-bit became the consensus. The much newer RISC-V architecture makes some provision for 128-bit addressing, although this has not yet been designed.

An important detail: no current 64-bit processor can actually have 64 bits' worth of memory connected to it. None of them have enough address lines. This does not matter. Future implementations can be given more address lines. Programmers have to be discouraged from using the "spare" address bits, but that's practical to do, and operating systems can be designed to reject such usage.

64-bit ARMv9-A has optional features to improve security that use some of the "spare" high bits, but they are optional, intended for use in mobile devices which don't need peta- and exabyte memories at present.

    Why not 24-bit? 40-bit? etc.
  Abuse of the "spare" byte in the Motorola 68000's 32-bit pointers was a problem on the Amiga as well as the Macintosh. Particularly game and demo coders would do this to eke out a little more efficiency, and it all fell apart when later Amiga models came out with 68020+ processors.
    Commented Dec 31, 2023 at 4:51

Having 2ⁿ-bit registers allows bits in the registers to be addressed with an integer number of bits:

Addressing a bit in an 8 bit register will need 3 bits. Addressing a bit in a 16 bit register will need 4 bits. But, addressing a bit in a 12 bit register will need 3.58… bits. You would have to round it up to 4, thus wasting ≈0.4 of a bit.
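The arithmetic behind those figures, as a small Python sketch (`bit_index_cost` is a hypothetical helper written for this example):

```python
import math


def bit_index_cost(register_bits: int) -> tuple:
    """Return (exact bits needed, field width actually used, wasted fraction)."""
    exact = math.log2(register_bits)   # information needed to pick one bit
    field = math.ceil(exact)           # a field can only have whole bits
    return exact, field, field - exact


for width in (8, 12, 16, 24, 32, 64):
    exact, field, waste = bit_index_cost(width)
    print(f"{width:2d}-bit register: needs {exact:.3f} bits, "
          f"uses {field}, wastes {waste:.3f}")
```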


  • Immediate shift: the shift distance is stored in the instruction.
  • Immediate read bit: reads a bit from a register; the bit number is specified in the instruction.
  • Reading or writing a bit from a large memory field: take the upper bits of the bit address (shift right) and use these as the byte address, then mask (bitwise AND) to get the lower bits and use these as the bit offset.
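The last case can be sketched in a few lines of Python. This is only an illustration: it assumes 8-bit bytes and that bit 0 is the least significant bit of byte 0, and `read_bit` / `write_bit` are names made up for the example.

```python
BITS_PER_BYTE = 8          # a power of two, so shift/mask replaces divide/modulo
SHIFT = 3                  # log2(BITS_PER_BYTE)
MASK = BITS_PER_BYTE - 1   # 0b111


def read_bit(field: bytes, bit_index: int) -> int:
    """Read one bit from a packed bit field."""
    byte_address = bit_index >> SHIFT   # upper bits select the byte
    bit_offset = bit_index & MASK       # lower bits select the bit within it
    return (field[byte_address] >> bit_offset) & 1


def write_bit(field: bytearray, bit_index: int, value: int) -> None:
    """Set or clear one bit in a packed bit field."""
    byte_address = bit_index >> SHIFT
    bit_offset = bit_index & MASK
    if value:
        field[byte_address] |= 1 << bit_offset
    else:
        field[byte_address] &= ~(1 << bit_offset) & 0xFF


buf = bytearray(4)
write_bit(buf, 13, 1)      # bit 13 lives in byte 1, bit 5
print(buf.hex(), read_bit(buf, 13))
```

With a 12-bit unit, the two steps would need a real division and remainder by 12 instead of a shift and a mask.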

We don't always use registers to specify the bit. We often use immediate addressing (the address is in the instruction). Even ARM does this, and it does not have immediate addressing (at byte level). When we have the address in a register, then we are often dealing with more than 32 bits, so we have to mask the address to get the byte number and the bit number. This only works because of the power of 2.

    I'd like to hear what assembly developers or chip designers think about this. I don't think addressing bits is a common operation and most bits are wasted anyway: if you're addressing a 16-bit register with a 16-bit register, you're wasting 12 bits, and it gets worse as registers grow.
  I think we may be conflating a few things. I believe CAD is talking about the MAR which a CPU pushes a memory address to in order to retrieve the data at that address. en.wikipedia.org/wiki/Memory_address_register . when the MAR is filled, the MDR will then be filled with the data requested. web.archive.org/web/20170328171842/http://www.cs.umd.edu/class/… . the width of the MAR bus (the number of lines into it) determines the overall address size for that architecture. it is usually a Word, whatever the word size on that architecture is.
  You don't normally address individual bits of a register, except in an instruction like a shift by an immediate count, like sar eax, 31 or bts rdx, 55. Most insns don't need a shift-amount field. When a CPU reads / writes registers for an instruction like add eax, ecx, it reads all 32 bits in parallel, addressing the register file by register number. All 32 data lines use the same register-number to address an SRAM cell in parallel, so the bits-within-register "addressing" is just a matter of parallel wires, and is implicit by position in most cases.
    Not wasting address-space is a reason why it's normal to have 8, 16, or 32 registers, rather than for each one to be 8, 16, or 32 bits wide. The relevant address space is register numbers (which take space in a machine-code instruction, as in MIPS where each of dst, src1, and src2 take 5 bits: cs.kzoo.edu/cs230/Resources/MIPS/MachineXL/…), not bit-numbers. Although interestingly, MIPS does have enough room in its simple instruction format for an SHAMT field which most instructions don't use, only shifts using it for a shift-amount. Most ISAs aren't as wasteful
    Ok, yes, ARM machine code has room for a shift count in most or all instructions, but that's the exception, not the rule. When x86 machine code does have an immediate shift count, it's either 2-bit (in an addressing mode's SIB byte for scaled-index) or 8-bit (masked to 5 or 6) as an imm8 for some instruction like shr/shl/bt. Ease of indexing a large bit-array in memory by decomposing a bit-index into byte-address and bit-within-byte (or within word) is an advantage for power-of-2 register widths, as minor as that is.

The most common reason is because computers use the binary system, where a bit can be either a zero or one. If computers used ternary values for the bits, then we'd have everything in powers of 3.

As regarding RAM/memory:

A number N of bits in an address bus (used to select an address) can address 2^N bytes. Whenever the number of address bits increases to N+1, automatically the addressable space increases by a factor of 2.

The manufacturers will naturally use the maximum address capacity when including memory chips in the design, so memory size will naturally be in powers of two.

As regarding register sizes:

The same reasoning applies, since internally the hardware may address a bit in the register using its number, which again is in binary notation.

(All this is just a supposition and an enormous simplification of the real situation. I'm sure that an electrical engineer will be able to demonstrate why circuits based on the binary logic will naturally use a power-two. As the Intel 8086 has shown, other numbers are possible, but may be costlier to manufacture.)

    the 8086's address bus was exactly 20-bit: if you try to access an address like 0xFFFF:0x0100, you'll hit physical address 0x000F0, not 0x1000F0. And no, it wasn't 32-bit in any sense: all general-purpose registers were 16-bit, and it's (some of) these registers that were used to point to memory (in addition to 16-bit segment registers).
    In practice for N bit physical address space you'd have N separate wires anyhow, at least from the bits of VHDL and Verilog design I remember. Also most (I'd say all, but I'm sure there are some weird rarities out there) 64bit CPUs don't actually have a 64bit address space for many reasons, so the real answer is "they don't". For x64 CPUs for example the architectural limit is 52 bits of physical memory and in practice I think 40 bits are used (TLB caches and the additional indirections become problematic there).
    additional "indirections" (levels of the page tables) are a downside of wider virtual addresses, like PML5 in Ice Lake for 57-bit virtual, so an OS wouldn't enable it unless the 48-bit virtual address-space isn't wide enough. ( Why in x86-64 the virtual address are 4 bits shorter than physical (48 bits vs. 52 long)?). The physical address width a CPU supports costs extra bits in cache tags and TLB, so it's grown gradually. And I guess in internal wiring for passing data around; externally over DDR memory busses, addresses are broken into row/column.
    As I commented on another answer, the register width being a power of 2 is not really related to addressing bits within registers. That doesn't happen except for shift instructions, CPUs just feed all the bits to the ALU. Addressing is why we have power-of-2 numbers of registers, to not waste coding space of register numbers in machine-code. Register width is a multiple of the byte size (8), and to make address math multiply / modulo cheap in binary we want 2^n bytes

As RonJohn wrote, "computer industry, pushed by IBM, standardized on 8-bit bytes for General Purpose computers".

After that, there are internal advantages with using powers of 2 multiples of that - it allows for everything to be aligned when different size collections of bytes are used, and allows different bits of the address to be routed different places - e.g. which cache-line/block/page you want vs. where in the cache-line/block/page. That would require an ugly (and slow) division if anything other than a power of two was used.

For instance, you might have 32 or 64 bit registers, a 64 or 128 bit memory bus, 128 bit or 256 bit cache lines, 512 byte (=4096 bit) blocks on disk, 4096 byte (=32768 bit) pages in memory, etc. As long as these are all powers of two, the boundaries between them will as much as possible be in the same places, and the addresses all get split up bit-wise for addressing purposes, which leads to simpler hardware.
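A minimal Python sketch of that bit-wise splitting, assuming 4096-byte pages and 64-byte cache lines (the line size is an assumption picked for illustration, and `split_address` is a hypothetical helper):

```python
PAGE_BITS = 12   # 4096-byte pages
LINE_BITS = 6    # 64-byte cache lines (assumed size)


def split_address(addr: int) -> dict:
    """Split a byte address into fields using only shifts and masks."""
    return {
        "page": addr >> PAGE_BITS,
        "page_offset": addr & ((1 << PAGE_BITS) - 1),
        "line": addr >> LINE_BITS,
        "line_offset": addr & ((1 << LINE_BITS) - 1),
    }


fields = split_address(0x12345)
print({name: hex(value) for name, value in fields.items()})
```

In hardware each field is simply a group of address wires, so no arithmetic is performed at all. With a page or line size that is not a power of two, each field would need a division and a remainder.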

If for example you were to then throw a 25 bit or 48 bit structure in there and wanted to have an array of them, there would have to be either wasted space or you would end up having elements split across cache lines, memory pages etc. and it would take a division using all bits to determine which element an address was in.
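To see the straddling effect, this Python sketch counts how many elements of a tightly packed array cross a cache-line boundary. It assumes 64-byte lines and an array starting at address 0; `straddling_elements` is a name invented for the example.

```python
def straddling_elements(element_size: int, count: int, line_size: int = 64) -> int:
    """Count elements, packed back to back from address 0, that cross a line boundary."""
    crossing = 0
    for i in range(count):
        start = i * element_size
        end = start + element_size - 1          # address of the last byte
        if start // line_size != end // line_size:
            crossing += 1
    return crossing


for size in (4, 6, 8):   # 32-bit, 48-bit and 64-bit elements
    print(f"{size}-byte elements: {straddling_elements(size, 64)} of 64 straddle a line")
```

Elements whose size is a power of two never straddle a line here, while some of the 6-byte (48-bit) ones do unless each is padded to 8 bytes.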

The alignment part of this doesn't matter as much as it used to - for instance modern Intel/AMD chips don't have penalties for misaligned data at the byte level, but quite a bit of engineering has gone into that.

It also would work just fine if the lowest addressable unit was some number of bits as long as powers of 2 are used above that - for instance, if a byte was 7 bits, the other values would be 14, 28, 56, 112 ... . Indeed, many older architectures and some more specialized word addressable CPUs like DSPs use different word sizes, particularly if data and program memory is stored separately (Harvard Architecture).

    If a CPU had to divide a physical address by 3 or 5 or some non-power-of-2 to figure out which cache line to look in, or which memory controller to use for a dual-channel system, that would be a disaster, increasing latency in some important critical paths. (Like L1d cache hit time.) Instead, breaking up addresses into pieces by taking ranges of bits is something we can do with just wiring. Byte-addressable memory is I think a major motivator; odd word sizes aren't a problem in word-addressable machines. (Like 24-bit DSPs.)
    Commented Aug 5, 2022 at 8:42

I'd suggest that it is more a convention, less a technical issue.

Wikipedia's page on the word size choice in computer architecture has a nice table with a lot of older computer architectures with byte/word sizes which are not a power of two. On that page there is also a box in the upper right hand corner with links to individual pages for many unconventional word sizes (e.g., 12, 18, 24 but also 31, 36, 45 (!) etc.).

One area where the byte/word size has significant impact is in compilers; i.e. your C compiler needs to know how many bits are available in the addressing registers (i.e. 16/24/32/64 bit), and many aspects flow from that (i.e., how arrays of integers are laid out in memory and how the code to access that is generated by the compiler).

Code built for one word size is then of course also usually incompatible with code built for other word sizes, even if the CPU in question would otherwise use exactly the same coding for their assembly language.

Also, if your data storage is binary - in the old days it was not unheard of to just dump a page of memory into a file and read that back in "as is" later - this completely breaks down if the two machines use different word sizes (even if other aspects, like endianness, are the same).

All of this leads to the industry converging on a smaller and smaller number of word sizes, just like many other technological aspects have converged over the years. As user ctrl-alt-delor has mentioned in his answer, one (maybe minor) aspect here could be that if you have some integer within your CPU, maybe even only internally/in hardware, where you wish to store the location of a bit within a register (i.e., a dynamic value ranging from 0 to the word size of your register), then having a register width that is a power of 2 avoids waste and some error states (i.e., addressing a bit past the non-power-of-two word size).

    Bringing back memories. Back in the 80's I was involved in writing a quantum chemistry suite. Target systems included 24, 32, 36, and 64 bit architectures and both little- and big-endian. Even when using high-level languages this made the coding very interesting.

It's coincidence, or just everyone copying everyone else. There is no deep technical reason that the number of bits should be a power of two. (There is, however, a deep technical reason that the number of possible values is a power of two.)

A processor with 24, or 31, or 60 bit words would work just fine. Processors supporting extended precision have an 80 bit floating point type (that would be the x86 processors today and the 68k processors in the past).

When storing bit arrays, you need to divide integers by the number of bits in a word. There you have a slight advantage for powers of two, but an instruction dividing by one specific non-power of two integer is quite simple. When you want to pack small numbers into a bit array, it's nice if the number of bits in the small number divides the number of bits in a word, so a 60 bit word would allow you to store 60 bits, 30 x 2bits, 20x3 bits, 15x4 bits, 12x5 bits, or 10x6 bits in a word.
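A quick Python sketch of the packing argument: it lists the field widths that divide a word evenly and packs small values into one word. All three helper names are hypothetical, and Python integers stand in for the machine word.

```python
def even_field_widths(word_bits: int) -> list:
    """Field widths (in bits) that tile a word with no bits left over."""
    return [w for w in range(1, word_bits + 1) if word_bits % w == 0]


def pack_fields(values, field_bits: int) -> int:
    """Pack small unsigned values into one word, first value in the low bits."""
    word = 0
    for i, value in enumerate(values):
        if not 0 <= value < (1 << field_bits):
            raise ValueError("value does not fit in the field")
        word |= value << (i * field_bits)
    return word


def unpack_field(word: int, index: int, field_bits: int) -> int:
    """Extract the field at position index from a packed word."""
    return (word >> (index * field_bits)) & ((1 << field_bits) - 1)


print("60-bit word:", even_field_widths(60))
print("64-bit word:", even_field_widths(64))
```

A 60-bit word offers many more even splits (3-, 5- and 6-bit fields among them) than a 64-bit word, which only splits evenly into power-of-two widths.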

For character encodings, having 12-bit bytes instead of 8 would make it possible to store all Unicode code points in two bytes instead of 1, 2, 3 or 4, which would make lots of text processing code faster.

BTW: "RAM chips will have a size that is a (power of 2) bits". Until Apple ships computers with 12GB RAM chips... Again no technical reason for a power of 2. I actually expected 10GB per chip :-)

CDC (the company that Cray left to build Cray computers) had sixty bit words. Very nice to fit four 15-bit, two 15 and one 30-bit, or two 30-bit instructions into one word.

  Apple's 12GB RAM chips are almost certainly an internal combo of 8GB and 4GB chip sets.
    – RonJohn
    Commented Aug 4, 2022 at 17:12

Because the computer industry, pushed by IBM, standardized on 8-bit bytes for General Purpose computers.

The consequence of this is that it's more convenient for registers to have bit counts in multiples of 8:

  • 8 - 1 byte per register
  • 16 - 2 bytes per register
  • 24 - 3 bytes per register
  • 32 - 4 bytes per register
  • 40 - 5 bytes per register
  • 48 - 6 bytes per register
  • 56 - 7 bytes per register
  • 64 - 8 bytes per register

As you can see, 8- and 16-bit counts are obvious.

Why not 24 bits? Because Moore's Law meant that transistor counts were growing fast enough that designers could go straight from 16 to 32 bits.

After 32 bits, why not 48 bits?

Well, we did, but for the address bus. In the registers, they bit the bullet and went for 64 bits.

If computers ever need more than 256TB of RAM, then chip and motherboard makers can easily increase the address bus from 48 to 64 bits while leaving the registers untouched.

    That is not Moore's Law.
  ambiguous phrasing. I meant that Moore's Law meant that transistor counts were growing fast enough that designers could go straight from 16 to 32 bits.
    – RonJohn
    Commented Aug 5, 2022 at 6:37
    – Bergi
    note though that there actually are 24-bit chips: DSPs (digital signal processors). Because they are specialized, high volume and price sensitive, it makes sense to only build them as large as needed, and 24 bits is more than Good Enough for human ears, no matter what audiophiles say.

I think one argument that is missing is that it is convenient for a register to be divisible into two parts which are easy to work with, for instance as smaller registers. Hence, for the Intel architecture, you initially had 16-bit registers that were subdivided into two 8-bit subregisters, and then these registers were extended to 32 bits, keeping the least significant bits addressable as a 16-bit register, and then to 64 bits. If you had a 48-bit register, it would leave an awkward 16-bit most significant part, and you would have to play with 16-bit and 32-bit subregisters... What a pain!

Moreover, when you multiply the contents of two registers, you need a register of twice that size (or two registers of the initial size) to contain the result. Having registers that cannot be combined two by two would make things complicated for arithmetic.
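The size of the product can be checked with a short Python sketch (hypothetical helper names; Python's arbitrary-precision integers stand in for the hardware registers):

```python
def max_product_bits(register_bits: int) -> int:
    """Bits needed to hold the largest product of two unsigned register values."""
    largest = (1 << register_bits) - 1
    return (largest * largest).bit_length()


def split_product(a: int, b: int, register_bits: int) -> tuple:
    """Return the full product as a (high, low) pair of register-sized halves."""
    mask = (1 << register_bits) - 1
    product = a * b
    return product >> register_bits, product & mask


for width in (16, 32, 48, 64):
    print(f"{width}-bit operands: product needs up to {max_product_bits(width)} bits")
```

The result always fits in two registers of the original width, whatever that width is, so the high/low pair works for 48-bit registers as well as for 64-bit ones.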

I guess this view is supported by the fact that, for addressing (and not arithmetic), the processors internally use other numbers of bits.

  I don't think partial registers (for backwards compat and as a result of extensions) is a very strong argument. It really only applies to x86 (including x86-64), and the number of bits that go unused when doing 32-bit operations isn't particularly significant. It's fairly rare to do a 64-bit load of two 32-bit values and rol rax, 32 to swap them. Only AL and AH 8-bit halves are separately usable, not high-16 or high-32 partial registers. Why is there not a register that contains the higher bytes of EAX?
  And some more links in Why aren't the higher 16-bits in EAX accessible by name (like AX, AH and AL)?. If AMD64 had been AMD48, with a 48-bit RAX, we'd still have a 32-bit EAX. 6-byte push/pop creating misalignment unless they padded out to qwords would be obvious problems, and lack of an 8-byte store to conveniently set 2 dwords at once would be less good. But for actually working with data in GP registers, essentially irrelevant. If you want to keep multiple values in a register at once, that's what 128-bit XMM registers are for.
  With a 48-bit RAX, the only 16-bit partial register would still be AX, the low 16 bits. Also, x86 (like other ISAs) has always done widening multiply / narrowing division by using multiple registers, not a single wider register. e.g. mul rcx produces a 128-bit result from RAX * RCX in RDX:RAX. If that was 48x48 => 96-bit instead of 64x64 => 128-bit, not a showstopper. Other ISAs are the same way, like ARM umull or RISC-V instructions that produce either the low-half or high-half product. (Perhaps you're thinking of x86 mul cl doing AH:AL = al*cl, which happens to work as AX)

Generally, processors are referred to as x-bit, where x is the width of the data bus.

In rough terms, increasing the data bus size offers primarily speed as a reward...twice the bits, twice the data access speed. This is the ideal; some inefficiencies are involved (e.g., fetching 1 byte still takes 1 cycle on a 32-bit system, same as fetching 4 bytes).

As ctrl-alt-delor points out, there are some logistical advantages to sticking with a bus width of 2**N. The only restriction of following this pattern is that you have to double the width every time you introduce a wider architecture, giving a maximum access speed increase of 2x.

And in the end, it's hard to justify a new architecture if it's not at least twice as fast as the old one.

Note that low end microcontrollers offer many exceptions to this rule, and the properties of audio and human perception have made 12 bits a popular width for DSPs.

    then Pentium Pro is a 64-bit CPU. Or 36-bit, if we go by the address bus.
    – Silbee
    – Ruslan
    The point is that the claim that the "x" refers to the width of the data bus is simply not true for many, many CPUs. Sandy Bridge and onward had 128 bit wide data buses to main memory iirc (and hell even the P4 had data busses of 256 bit to its caches).
    – Silbee
CPU register width tends to grow faster than the ability to cheaply get RAM to fill it all.

You can save money on developing and manufacturing your CPU by not adding pins, such as address pins, that will never be used.

Furthermore, the 8086 was really weird - addresses were expressed using segments. 20 address lines made sense for that.

A segment was basically a value shifted left by 4 bits, and it was combined with (added to) a 16-bit offset to get the real 20-bit address. E.g. 0000h:1000h and 0100h:0000h referred to the same address, 01000h. This allowed some interesting tricks with making code relocatable, I think, but did make C-style pointers complicated.
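
A minimal sketch of that segment:offset arithmetic, in Python purely for illustration. The function name `physical_address` is made up for this example; the 20-bit mask models the 8086 having only 20 address lines, so sums past 1 MiB wrap around.

```python
def physical_address(segment: int, offset: int) -> int:
    """Real-mode 8086 address: (segment << 4) + offset, truncated to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF

# Two different segment:offset pairs that alias the same physical address.
print(hex(physical_address(0x0000, 0x1000)))  # 0x1000
print(hex(physical_address(0x0100, 0x0000)))  # 0x1000

# The highest segment plus a small offset overflows 20 bits and wraps to 0.
print(hex(physical_address(0xFFFF, 0x0010)))  # 0x0
```

Because many pairs map to one address, comparing two far pointers for equality means normalizing them first, which is part of what made C-style pointers awkward.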

386 and other CPU architectures with MMUs change the game up a bit.

Physical addresses only matter if

  • your MMU is "identity mapped" and your mapping is 1:1 from virtual->physical.
  • you're in the "ring 0" or kernel/supervisor mode that allows access to all memory.

When you are in the "ring 3" or user mode, your access is constrained by the MMU - accessing pages that are marked specially will cause a fault to kernel mode.

Now of course, this can be used to limit a user process to a specific amount of memory. But it can also be used to implement swap files and mmap - a UNIX system call that essentially leverages the MMU to make it look like a file is mapped to memory. This simplifies access to it for programs that need to deal with large amounts of data and random access, such as databases.
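
As a rough illustration of the mmap idea, here is a sketch using Python's standard `mmap` module. The helper name `read_via_mmap` and the temporary demo file are assumptions for the example only; the point is that the file is accessed with ordinary slicing, and the OS pages data in on demand.

```python
import mmap
import os
import tempfile

def read_via_mmap(path: str, start: int, length: int) -> bytes:
    """Map a whole file read-only and slice it as if it were in memory."""
    with open(path, "rb") as f:
        # Length 0 means "map the entire file".
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return mapped[start:start + length]

# Build a small demo file (hypothetical data, 10,000 bytes).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789" * 1000)
    demo_path = tmp.name

# Random access into the middle of the file without an explicit seek/read.
chunk = read_via_mmap(demo_path, 5000, 10)
print(chunk)  # b'0123456789'

os.remove(demo_path)
```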

So, 64 address bits is a lot of room to map files in and some random high address in the 64-bit space might be chosen - but it's never meant to reach actual RAM in the first place.

I can't really find good pinouts of any modern CPU but I doubt they physically bring out 64 address lines for each of the RAM channels. But there is still value in having all registers that refer to memory be 64 bit due to the above.

I'll try with an intuitive and non-technical answer, which I think addresses the core of the issue. Computers work on a binary number base, and powers of 2 in binary are "nice" numbers. Just like in base 10 the powers of 10 (100, 1000, 10000 and so on) are trivial to calculate, so it happens that the powers of 2 are trivial in binary:

2^0 = 1     0000 0000  0000 0001
2^1 = 2     0000 0000  0000 0010
2^2 = 4     0000 0000  0000 0100
2^3 = 8     0000 0000  0000 1000
2^4 = 16    0000 0000  0001 0000
2^5 = 32    0000 0000  0010 0000
2^6 = 64    0000 0000  0100 0000
2^7 = 128   0000 0000  1000 0000
2^8 = 256   0000 0001  0000 0000
2^9 = 512   0000 0010  0000 0000
2^10= 1024  0000 0100  0000 0000

As you can see, 2^n in binary is a 1 followed by n zeros, so it marks the limit of what n bits can hold: the largest unsigned n-bit number is 2^n - 1. For example, an unsigned 8-bit number can be any value between 0 and 255 (binary 1111 1111).
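
To make the relationship concrete, here is a small Python sketch. The helper name `max_unsigned` is made up for this example; it just computes 2^n - 1 for a few common widths.

```python
def max_unsigned(bits: int) -> int:
    """Largest value an unsigned integer of the given width can hold."""
    return (1 << bits) - 1

for bits in (4, 8, 16, 32, 64):
    value = max_unsigned(bits)
    # The maximum value is always 'bits' ones in binary.
    print(f"{bits:2d} bits: max = {value} ({bin(value).count('1')} one-bits)")
```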

Increasing the register size to the next power of 2 means that the processor can use addresses that are 1 order of magnitude larger.

Therefore, while it makes sense to have a 16, 32 or 64 bit processor, it would not make sense to have a 35 bit processor, as that wouldn't match a whole order of magnitude increase in addressing capabilities. And it seems to me that it would make things unnecessarily complicated.


If I remember the math theory correctly, the ideal numeral system has base e, and 3 is the closest integer to it. That means base 3 would be better than binary from a theoretical point of view, but hardware is easier to implement with two states than with three.

For a bit more detail, see this: https://math.stackexchange.com/questions/446664/what-is-the-most-efficient-numerical-base-system
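
A quick numeric check of that claim, assuming "efficiency" means the usual radix-economy measure: cost is the base times the number of digits needed. For a large number N the digit count is about ln(N)/ln(b), so the cost is proportional to b/ln(b). The function name `relative_cost` is made up for this sketch.

```python
import math

def relative_cost(base: float) -> float:
    """base / ln(base): proportional to (base x digits) for a large number."""
    return base / math.log(base)

# Lower is better under this measure; the minimum is at base e.
for base in (2, math.e, 3, 4, 10):
    print(f"base {base:6.3f}: cost factor {relative_cost(base):.4f}")
```

Under this measure base 3 comes out slightly ahead of base 2, and bases 2 and 4 tie, but the differences are small and the measure says nothing about how hard the hardware is to build.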

    As David Bandel commented on that math Q&A, this is "efficiency" in terms of number of digits to express an arbitrary real number, times the number of different symbols (digits), i.e. the base. This is basically irrelevant for computing; most numbers in programs are integers, not reals. And this is a fairly arbitrary definition of efficiency. On a ternary computer (3 states per trit), yes power-of-3 sizes would be natural, but not really for that math reason. Commented Aug 5, 2022 at 3:32
