8-bit bytes
Much of this grows out of the adoption of the 8-bit byte. That became popular with the introduction of the IBM 360 family of computers in 1964. In an issue that year of the IBM Journal of Research and Development, an explanation of the choice was offered:
Character size, 6 vs 4/8: In character size, the fundamental problem is that decimal digits require 4 bits, the alphanumeric characters require 6 bits. Three obvious alternatives were considered - 6 bits for all, with 2 bits wasted on numeric data; 4 bits for digits, 8 for alphanumeric, with 2 bits wasted on alphanumeric; and 4 bits for digits, 6 for alphanumeric, which would require adoption of a 12-bit module as the minimum addressable element. The 7-bit character, which incorporated a binary recoding of decimal digit pairs, was also briefly examined.
The 4/6 approach was rejected because (a) it was desired to have the versatility and power of manipulating character streams and addressing individual characters, even in models where decimal arithmetic is not used, (b) limiting the alphabetic character to 6 bits seemed short-sighted, and (c) the engineering complexities of this approach might well cost more than the wasted bits in the character.
The straight-6 approach, used in the IBM 702-7080 and 1401-7010 families, as well as in other manufacturers' systems, had the advantages of familiar usage, existing I/O equipment, simple specification field structure, and of commensurability with a 48-bit floating-point word and a 24-bit instruction field.
The 4/8 approach, used in the IBM 650-7074 family and elsewhere, had greater coding efficiency, spare bits in the alphabetic set (allowing the set to grow), and commensurability with a 32/64-bit floating-point word and a 16-bit instruction field. Most important of these factors was coding efficiency, which arises from the fact that the use of numeric data in business records is more than twice as frequent as alphanumeric. This efficiency implies, for a given hardware investment, better use of core storage, faster tapes, and more capacious disks.
Overall, an 8-bit byte allowed a reasonably large character set, by the standards of the time, and also allowed two BCD digits per byte.
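To make the "two BCD digits per byte" point concrete, here is a minimal C sketch (the function name is made up for this example) of packing a pair of decimal digits into one byte, one digit per 4-bit nibble:

    #include <stdio.h>
    #include <stdint.h>

    /* Pack two decimal digits (0-9 each) into one byte: the first digit
       goes in the high nibble, the second in the low nibble. */
    static uint8_t pack_bcd(unsigned high, unsigned low)
    {
        return (uint8_t)(((high & 0x0F) << 4) | (low & 0x0F));
    }

    int main(void)
    {
        uint8_t b = pack_bcd(4, 2);   /* the number 42, stored in one byte */
        printf("stored 0x%02X -> digits %u and %u\n", b, b >> 4, b & 0x0F);
        return 0;
    }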
The move to byte addressing
The priority in the earliest computer designs was to process numbers as rapidly as possible. A number was typically stored in a machine word, and the desired numerical range determined the size of the word. Instructions were normally a single word, and there was often a single address as part of each instruction. The size of the address field in instructions determined the memory size. The IBM 704/709 is an example; it had a maximum of 32,768 words of 36 bits, with six characters per word, each of 6 bits. Addresses were 15 bits.
As the range of uses for computers expanded, handling text data became more and more important. Doing that in a word-addressed machine is cumbersome, at best. A byte-addressed machine allows you to access individual characters easily, but demands a larger address field. At the same time, magnetic core memory allowed building much larger memories than vacuum tubes, electrostatic storage or delay lines.
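To see why character handling is cumbersome on a word-addressed machine, here is an illustrative C sketch (not code for any real machine) that digs the n-th 6-bit character out of a 36-bit word held in a 64-bit integer. On a byte-addressed machine, the equivalent operation is just an array index.

    #include <stdio.h>
    #include <stdint.h>

    /* Word-addressed style: extract the n-th 6-bit character (0 = leftmost)
       from a 36-bit word held in the low bits of a 64-bit integer. */
    static unsigned char_from_word(uint64_t word36, int n)
    {
        int shift = (5 - n) * 6;          /* six 6-bit characters per word */
        return (unsigned)((word36 >> shift) & 0x3F);
    }

    int main(void)
    {
        /* A made-up 36-bit word holding six 6-bit character codes. */
        uint64_t w = 0;
        unsigned codes[6] = {1, 2, 3, 4, 5, 6};
        for (int i = 0; i < 6; i++)
            w = (w << 6) | codes[i];

        /* Word-addressed access needs the shift-and-mask dance above;
           on a byte-addressed machine the same thing is just text[n]. */
        printf("character 2 has code %u\n", char_from_word(w, 2));
        return 0;
    }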
These developments essentially forced computers to have larger address spaces, and ended the practice of having an address in each instruction.
Larger Data Items
It obviously makes things simpler to have a whole number of bytes per data item. Simplicity at this level is extremely worthwhile, because it's always been important to make a computer run as fast as possible within a limited budget of electronics parts (tubes early on, transistors since then). So two bytes (16 bits) becomes an obvious size.
For larger sizes, there are two factors that show up in the electronics design:
Counting things
Implementing instructions often requires counting through the bytes (or bits) of data items. Using powers of two makes the electronics of those counters simpler. To count through 4 bytes, you need a two-bit counter, which can hold values from 0 to 3. Counting through three bytes still needs a two-bit counter, but one of its values is meaningless and has to be treated as a special case in hardware.
Sending data over a serial line requires counting through the bits of each item, which is another benefit of 8-bit bytes. A 3-bit counter will handle them, without any need for special cases.
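A small C sketch of the counting point: wrapping at a power of two falls out of the counter for free, while wrapping at three needs an explicit compare-and-reset, which corresponds to extra logic in hardware.

    #include <stdio.h>

    int main(void)
    {
        /* A two-bit counter wraps from 3 back to 0 by itself: just mask. */
        unsigned c = 0;
        for (int i = 0; i < 8; i++) {
            printf("%u ", c);
            c = (c + 1) & 0x3;        /* free wrap-around at a power of two */
        }
        printf("\n");

        /* Counting through three items with the same two-bit counter needs
           an explicit compare-and-reset: the state value 3 is unused and has
           to be treated as a special case. */
        c = 0;
        for (int i = 0; i < 8; i++) {
            printf("%u ", c);
            c = (c == 2) ? 0 : c + 1;
        }
        printf("\n");
        return 0;
    }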
The IBM 360 picked 32-bit addresses (although it only allowed 24-bit memory addresses for its first decade), and once that was established, it was far easier to compete with IBM using 8-bit bytes and 32-bit addresses than if you wanted to do something different.
Memory fetches and data alignment
Fetching data from memory is simpler if data items are "aligned". This means that their addresses are a multiple of their size. So for a byte-addressed machine, like the IBM 360, a single byte can be at any address. A two-byte (16-bit) item is "aligned" if it is at an even-numbered address. A four-byte (32-bit) item is aligned if its address is a multiple of 4.
Many computer designs of the 1960s through 1990s had memories that could fetch 4 bytes in one operation, starting from an address that was a multiple of 4. If your data items are aligned, then you're guaranteed to be able to fetch any two- or four-byte item in a single read from memory. If they are not aligned, you sometimes need two fetches. That requires more complexity in the memory access system, to recognise that the operation is misaligned and generate the extra fetch. That complexity, and the extra fetch, slow things down.
Items bigger than four bytes will need two fetches, but life is simpler if your larger items are eight bytes, and aligned on 8-byte boundaries. Then you always need exactly two fetches. If you have 8-byte items that are not aligned, then you need three fetches.
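Here is a small C sketch of that arithmetic, assuming a memory that delivers aligned 4-byte units; it simply counts how many of those units an access touches:

    #include <stdio.h>
    #include <stdint.h>

    /* How many 4-byte-wide memory fetches does an access of `size` bytes
       at `addr` need, if memory delivers aligned 4-byte units? */
    static unsigned fetches_needed(uint64_t addr, unsigned size)
    {
        uint64_t first = addr / 4;                 /* first 4-byte unit touched */
        uint64_t last  = (addr + size - 1) / 4;    /* last 4-byte unit touched  */
        return (unsigned)(last - first + 1);
    }

    int main(void)
    {
        printf("aligned 4-byte load at 0x1000:    %u fetch(es)\n",
               fetches_needed(0x1000, 4));         /* 1 */
        printf("misaligned 4-byte load at 0x1002: %u fetch(es)\n",
               fetches_needed(0x1002, 4));         /* 2 */
        printf("aligned 8-byte load at 0x1008:    %u fetch(es)\n",
               fetches_needed(0x1008, 8));         /* 2 */
        printf("misaligned 8-byte load at 0x100A: %u fetch(es)\n",
               fetches_needed(0x100A, 8));         /* 3 */
        return 0;
    }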
In modern fast systems, fetches are always of complete cache lines, usually 32 or 64 bytes. These are always aligned, and aligned data items that fit inside them always arrive complete.
Quite a few computer designs regard a misaligned fetch as a program bug, and kill programs that execute one. x86-based systems don't do that, but have to pay the complexity price. They do run faster with aligned data, so that is normally used even though it is not compulsory.
24-bit systems
I've used a 24-bit system, an ICL 1900 mainframe. It used 6-bit bytes, four per 24-bit item. Those 6-bit bytes limited it to UPPERCASE text, and 24-bit pointers limited it to 16MB of RAM, which is tiny by today's standards.
A more modern 24-bit system with 8-bit bytes would still be limited to 16MB of easily addressable memory, and would be paying the costs of counters with unwanted states, and memory items that were either misaligned, or wasted a byte of memory for every 24-bit integer. A 32-bit system would be more capable, and can be built very cheaply in today's technology.
Lessons of history
There have been a couple of influential computer systems that had 32-bit integers and pointers, but used 24-bit addressing. They're the Motorola 68000 and the IBM 360. In both cases, only the lowest 24 bits of an address were used, but addresses were stored in memory in 32 bits.
As those systems were limited to 16MB of RAM, programmers stored other data in the spare 8 bits. And when 16MB of RAM clearly wasn't enough and the designs were expanded to 32-bit addressing, the data stored in those spare bits became a serious problem, because it was now treated as part of the address.
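The trick, and the trap, look something like this C sketch (illustrative only, not actual Macintosh or Amiga code): a flag hidden in the top byte of a 32-bit pointer is harmless while only 24 address bits are decoded, and becomes part of the address the moment all 32 bits are.

    #include <stdio.h>
    #include <stdint.h>

    /* On a machine that stores addresses in 32 bits but only decodes the
       low 24, the top byte is "spare", and it is tempting to hide a flag
       or type code there. */
    #define ADDR_MASK 0x00FFFFFFu

    static uint32_t tag_pointer(uint32_t addr, uint8_t tag)
    {
        return ((uint32_t)tag << 24) | (addr & ADDR_MASK);
    }

    int main(void)
    {
        uint32_t p = tag_pointer(0x00123456u, 0x80);   /* flag in the top byte */

        /* Fine while the hardware ignores bits 24-31... */
        printf("24-bit machine uses address 0x%06X\n", p & ADDR_MASK);

        /* ...but on a successor that decodes all 32 bits, the tagged value
           points somewhere else entirely, unless every such pointer is found
           and masked before use. */
        printf("32-bit machine would use address 0x%08X\n", p);
        return 0;
    }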
On the 68000 family, existing programs had to be changed to stop using those no-longer-spare bits. This was most noticeable in the wider computer industry for Macintosh software in the late 1980s, when it was being updated for 68020 compatibility, but the same thing happened on the Amiga, and presumably on other 68000-based systems.
On the successors of the IBM 360, 24-bit address programs could still be run, as could programs using larger addresses. But only 31 of the potential 32 address bits could be used; an address bit had been sacrificed to let the hardware tell the difference between the two kinds of code.
Everyone who designed a general-purpose architecture with addressing larger than 32 bits knew of those examples, and how much pain they'd caused. So let's look at the choices of address size:
40-bit addressing involves electronic and alignment complexity, and clearly wasn't going to last very long. It only allows addressing 1024GB, and as of 2022, that would already have become a problem for some markets.
48- or 56-bit addressing is about as complex as 40-bit, and while it would probably last rather longer, by the time you've gone this far, you might as well go all the way.
64-bit is simpler to build than 40-, 48- or 56-bit. It will last longer. Its register size matches standard floating-point data sizes. It seems logical.
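A quick way to check the limits behind those choices is to work out how much memory each address width can reach; this short C program prints the figures in gigabytes:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        /* Addressable memory for each candidate address width. */
        int widths[] = {32, 40, 48, 56, 64};
        for (int i = 0; i < 5; i++) {
            int w = widths[i];
            /* 2^w bytes, expressed in GB (2^30 bytes). 2^64 would overflow
               a 64-bit integer, so work in GB from the start. */
            uint64_t gb = (uint64_t)1 << (w - 30);
            printf("%2d-bit addresses: %llu GB\n", w, (unsigned long long)gb);
        }
        return 0;
    }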
One of the first general-purpose microprocessors to go beyond 32-bit addressing was the DEC Alpha, released in 1992. The project had started in 1988, initially aiming to keep the 32-bit VAX architecture relevant in the long term. The designers rapidly realised that this was impractical, and designed a new architecture, intended to last at least 25 years. They therefore went for 64-bit addressing, to make sure that they didn't run out of address space.
Any competitor to Alpha that wasn't 64-bit would obviously have faced a marketing problem, with "why isn't it 64-bit?" questions. So 64-bit became the consensus. The much newer RISC-V architecture makes some provision for 128-bit addressing, although that variant has not yet been fully specified.
An important detail: no current 64-bit processor can actually have a full 64 bits' worth of memory connected to it. None of them have enough address lines. This does not matter: future implementations can be given more address lines. Programmers have to be discouraged from using the "spare" address bits, but that's practical to do, and operating systems can be designed to reject such usage.
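As one concrete example of how that discouragement works, current x86-64 parts with 48 meaningful virtual-address bits (an assumption about the particular width; newer parts implement more) require the top 16 bits of an address to be copies of bit 47, and reject anything else as "non-canonical". The check amounts to this C sketch:

    #include <stdio.h>
    #include <stdint.h>

    /* On an implementation with 48 meaningful virtual-address bits, the
       hardware insists that bits 48-63 are copies of bit 47; anything else
       is rejected as non-canonical. That is one concrete way spare-bit
       abuse gets policed. */
    static int is_canonical_48(uint64_t addr)
    {
        int64_t s = (int64_t)(addr << 16) >> 16;   /* sign-extend from bit 47 */
        return (uint64_t)s == addr;
    }

    int main(void)
    {
        uint64_t p   = 0x00007FFF12345678u;        /* plausible user address  */
        uint64_t bad = p | 0xFF00000000000000u;    /* a "tag" in the top byte */

        printf("plain pointer canonical?  %d\n", is_canonical_48(p));    /* 1 */
        printf("tagged pointer canonical? %d\n", is_canonical_48(bad));  /* 0 */
        return 0;
    }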