32

I am going to teach a small group of people about the numbering systems in computing, and was wondering how many bits per digit there are in the decimal system. For instance:

  • Hex (base 16) - 4 bits
  • Octal (base 8) - 3 bits
  • Binary (base 2) - 1 bit
  • Decimal (base 10) - ?
Konrad Rudolph
user92592
  • Intuition: Let's say what you seek is `d`; it covers one decimal digit, the range `0..9`. Then `3*d` bits mean three decimal digits and let you represent integers in the range `0..999`. Ten whole bits (think binary now) give a range of `0..1023`. 999 is quite close to 1023, yet a little less, so you may expect `d` to be a little less than 10/3. – Kamil Maciorowski Nov 28 '17 at 11:31
  • This post seems like it would fit better on Stack Overflow than on Super User. – gmarmstrong Nov 28 '17 at 11:39
  • @gmarmstrong: I'd argue Mathematics.SE (or possibly SoftwareEngineering.SE). This is not directly related to a programming problem. – Flater Nov 28 '17 at 14:28
  • @Flater: [math.se] is definitely the right place, as this is basically information theory 101. – MechMK1 Nov 28 '17 at 15:55
  • While we're linking other SEs, OP might be interested in [Computer Science Educators](https://cseducators.stackexchange.com/) given his context. (It wouldn't be a good place to post this question, but it might be useful in the future.) – Aaron Nov 28 '17 at 17:52
  • There’s no shame in not knowing this, but one who doesn’t might not be the best person for teaching number systems. – WGroleau Nov 28 '17 at 17:59
  • At least in the area of floating point math, the question is really meaningless, because numbers are represented in a binary form of scientific notation (IEEE 754). So the numbers 1.0 and (approximately) 100000000000.0 require the same 8 bytes/64 bits (in double precision): 52 mantissa bits, 11 exponent bits, and 1 sign bit. – jamesqf Nov 28 '17 at 23:22
  • I would say the question is ill-formed. Base-2 and base-10 are incommensurable. You shouldn't even be thinking about 'bits per digit'. The question does make sense in hex, or base-64, but not decimal. – user207421 Nov 29 '17 at 00:28
  • No more ill-formed than ‘What is 4 minus 7?’ or ‘What are the square roots of 2?’. Those questions don’t have answers in the natural numbers, but they do have consistent, useful answers if you go beyond them, and so does this one: a decimal digit takes up slightly under 3⅓ bits, so three of them together will fit into 3⅓ × 3 = 10 bits, six into 20 bits, etc. – deltab Nov 29 '17 at 04:28
  • @jamesqf Floating points are a way of mapping the abstract concept of a number to a fixed-length bit string. They're not specific to decimal, numbers are not inherently decimal. So if we follow your reasoning, you can't even say that a binary digit is 1 bit and a hexadecimal digit is 4 bits, because a 1 digit binary number, when converted to a floating point, would require 64 bits as well. You can definitely talk about the number of bits per digit in the different bases, and because of the reason you gave, you should not look at their representation as a floating point to find out. – FrederikVds Nov 29 '17 at 09:02
  • @jamesqf The number 100000000000.0 is actually representable precisely in binary64, so there's no need to say "(approximately)". – Mr Lister Nov 29 '17 at 11:38
  • @Mr Lister: OK, but how about 100000000000.123? My point was that "bits per digit" only makes sense in certain contexts. You can represent any integer up to 2^n - 1 in n bits, considering the n bits as a digit (though imagine the fun of 2^64 unique glyphs: better than Unicode :-)). Or you can represent decimal in ASCII with 8 bits per digit, with some extra. Or use Binary Coded Decimal, the hardware for which might still be in your latest Pentium processor. – jamesqf Nov 29 '17 at 19:09
  • The two most common computer representations of decimal, historically, have been the straight-forward 4-bit encoding (with six combinations left unused) and *centesimal*, a 7-bit encoding of the values 0-99 (with 28 combos left unused). – Daniel R Hicks Nov 30 '17 at 02:43
  • @WGroleau I disagree. The usefulness of this number (log base 2 of 10) is mostly a point of interest, not of terrible usefulness, when talking about the representation of integers, fixed point or floating point numbers. Other issues, like the expressibility of 0.1 in bases without 5 as a prime factor (mentioned below), are far more useful. While I can certainly come up with this number, and would guess that a lot of people familiar with number systems could probably extend concepts to come up with it, I have **NEVER** used this number in any sort of conversion between bases or thinking – CrazyCasta Nov 30 '17 at 08:13
  • My point is not whether or not the number is useful but whether someone who needs all this discussion is ready to teach the subject. – WGroleau Nov 30 '17 at 11:28

10 Answers

102

What you are looking for is the base-2 logarithm of 10, which is an irrational number, approximately 3.32192809489….

The fact that you can't use an integer number of bits per decimal digit is the root cause of why many fractions that are easy to express in the decimal system (e.g. 1/5, or 0.2) are impossible (not just hard: really impossible) to express in binary. This is important when evaluating rounding errors in floating point arithmetic.
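
A minimal sketch in Python (illustrative only, not part of the original answer) checks both claims:

```python
import math

# Bits of information per decimal digit: the base-2 logarithm of 10
print(math.log2(10))            # 3.321928094887362

# 1/5 has no finite binary expansion, so the stored double is only close:
print(f"{0.2:.20f}")            # 0.20000000000000001110
print(0.1 + 0.1 + 0.1 == 0.3)   # False: binary rounding error accumulates
```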

Eugen Rieck
  • Comments are not for extended discussion; this conversation has been [moved to chat](http://chat.stackexchange.com/rooms/69519/discussion-on-answer-by-eugen-rieck-how-many-bits-per-digit-in-the-decimal-syste). – DavidPostill Nov 30 '17 at 19:04
21

In other words: how much information is contained in a single digit in each of these systems?

For base 2, base 4, base 8, base 16 and other base-2^N systems the answer is obvious, because in base 2^N each digit can be expressed with exactly N bits.

How do you get N given 2^N? Well, you use the base-2 logarithm, which is the inverse of exponentiation.

  • log₂ 2 = 1 (1 bit per digit in base 2)
  • log₂ 4 = 2 (2 bits per digit in base 4)
  • log₂ 8 = 3 (3 bits per digit in base 8)
  • log₂ 16 = 4 (4 bits per digit in base 16)

Base-K logarithms of numbers that are not powers of K aren't integers. In particular:

  • log₂ 10 = 3.321928094887362347870319429489390175864831393024580612054…

This number may look confusing, but it actually has some uses. For example, it's the entropy, in bits, of a single (uniformly random) decimal digit.
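
For instance, plugging a uniform distribution over the ten digits into Shannon's entropy formula reproduces exactly this value; a quick Python check (illustrative, assuming each digit is equally likely):

```python
import math

# Shannon entropy H = -sum(p * log2(p)), with p = 1/10 for each digit
H = -sum((1 / 10) * math.log2(1 / 10) for _ in range(10))
print(H)              # 3.321928094887362
print(math.log2(10))  # the same value, as expected
```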

For your case, though, I don't think this value is of any use. @Christian's answer does a good job of explaining why.

gronostaj
8

On the subject of bits:

I'm sorry to say the question is misguided. You wouldn't use bits in that manner. A bit is a binary digit. You can convert the decimal number 10 to binary 1010 (8 + 2), so you'd need 4 bits to express the decimal value 10.


Powers of 2

You've fallen into a bit of a trap by using binary (2), octal (8) and hexadecimal (16) as examples: these are all powers of 2, and thus you can think of them in terms of bits, whereas 10 isn't a power of 2, so it just doesn't work very well like that.
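
A small Python sketch (hypothetical values, added for illustration) shows why the powers of 2 are special: each hex digit maps to a fixed group of 4 bits, while decimal digits have no such grouping.

```python
# Each hex digit corresponds to exactly 4 bits, so conversion is digit-by-digit:
print(f"{0xA3:08b}")  # 10100011 ('A' -> 1010, '3' -> 0011)

# No per-digit mapping exists for decimal:
print(f"{93:08b}")    # 01011101 (no bit group corresponds to '9' or '3')
```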

Christian
  • The question is not misguided. In the subject of information theory it is perfectly normal to talk about bits in this way. And then Eugen Rieck's answer is a good answer. –  Nov 28 '17 at 12:12
  • True, you could do what Eugen Rieck suggested and use a float rather than an int to describe it, and get an actual answer out of this. I'm not sure what you'd use that answer _for_ exactly, but that's neither here nor there. – Christian Nov 28 '17 at 12:30
  • I suggest that you mention BCD (binary-coded decimal), which is commonly represented by 4 bits in electronics. In practical terms, the number of bits used to represent a decimal digit is typically 4, but it depends upon the implementation. – davidmneedham Nov 28 '17 at 14:42
  • @davidmneedham The reason they were encoded with 4 bits is because - as Eugen Rieck pointed out - decimal digits have 3 – MechMK1 Nov 28 '17 at 15:45
  • @DavidStockinger Right, it depends on whether it is a theoretical question or an implementation question. – davidmneedham Nov 28 '17 at 15:49
  • @davidmneedham You can't have one without the other. If *log2(X)* is *n*, then you need at least *n* bits to store a digit in base *X*. – MechMK1 Nov 28 '17 at 15:53
  • ln(10)/ln(2) is the theoretical answer. 4 bits is the likely implementation answer. – davidmneedham Nov 28 '17 at 15:59
  • @davidmneedham No, most numbers are stored in binary. BCD is used for rare specialised purposes, but most encodings are either integer or floating point decimal. In these systems the log answer is the correct one: it gives the minimum number of bits to store all numbers of a given decimal length (rounded up) and explains why a given number of bits does not store a fixed number of decimal digits. – Jack Aidley Nov 29 '17 at 12:49
7

BCD (binary-coded decimal) uses 4 bits per digit, the same as hexadecimal.

https://en.wikipedia.org/wiki/Binary-coded_decimal
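
A minimal sketch of the idea in Python (`to_bcd` is a hypothetical helper, not from the answer):

```python
def to_bcd(n: int) -> str:
    """Encode a non-negative integer as BCD, one 4-bit group per decimal digit."""
    return " ".join(f"{int(d):04b}" for d in str(n))

print(to_bcd(1234))  # 0001 0010 0011 0100
```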

CWS Matt
  • Except that "BCD" is often used to refer to the 6-bit character encoding. – Daniel R Hicks Nov 30 '17 at 02:46
  • @MrLister - https://en.wikipedia.org/wiki/BCD_(character_encoding) – Daniel R Hicks Dec 06 '17 at 13:20
  • @DanielRHicks Ah, OK. Wikipedia says that it was used in the late 1950s and early 1960s (i.e. before EBCDIC was invented), so I'm not ashamed I had never heard of it. Even though I now realise that the name EBCDIC was derived from it! Anyway, the term BCD is not still "often used" to refer to the encoding as you're saying. – Mr Lister Dec 06 '17 at 14:21
3

Using bits implies a power of 2; thus, as others have said, you can't easily shoehorn 10 states into bytes without wastage. A common solution is to use 4 bits, as per hexadecimal, and waste the 6 states represented as A-F. The interesting bit is doing decimal math with this: it's neither neat nor simple.

A useful teaching idea might be to compare how Mickey Mouse might have developed a counting system, as he only has 4 fingers per hand, which leads naturally to an octal-based system.

davidgo
  • I believe you meant to refer to Hex in your answer, as it's Hex that has the A-F values. – user92592 Nov 28 '17 at 11:42
  • @user92582 yes, ta. Corrected. – davidgo Nov 28 '17 at 11:44
  • And you can use those "waste" 6 states to encode a decimal point, negative, sequence terminator, etc. As for decimal math... it's not neat but simple? Just write some code to do what we teach little children :p – Kaithar Nov 28 '17 at 18:22
  • @kaithar - I don't believe that what you are proposing is valid, as any one of those operations would require a full bit or more - which you don't have available. – davidgo Nov 28 '17 at 18:25
  • Perhaps you misinterpreted my meaning, but the proposal is perfectly valid... it's standard symbol coding. Let's say 0000-1001 is normal BCD, 1010 is the decimal separator, 1110 is a negative sign and 1111 is a terminator. Sure, you need a math library that understands that, but you're already needing something odd when you're encoding numbers as a sequence of nibbles. – Kaithar Nov 28 '17 at 22:16
  • No idea where the "10 bits" are coming from. 10 bits = 1024 values. A decimal digit only has 10 possible values. – MSalters Nov 29 '17 at 19:57
  • @MSalters A typo of 10 states. – wizzwizz4 Nov 29 '17 at 21:51
3

In base 1024, each symbol is 10 bits. Three decimal digits have the same amount of information as one digit in base 1000, which is slightly less than 1024. Therefore, a decimal digit has slightly less than 10/3 bits. This approximation gives 3.333333..., while the exact number is 3.321928...
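
The same comparison in a couple of lines of Python (illustrative only):

```python
import math

# Three decimal digits carry log2(1000) bits; one base-1024 digit carries 10.
print(math.log2(1000) / 3)  # 3.3219... (the exact bits per decimal digit)
print(10 / 3)               # 3.3333... (the base-1024 approximation)
```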

Acccumulation
3

This might be an oversimplification, but it depends on which question you are asking (and the answer is basically octal or hex).

I also don't consider fractional bits to be bits, because in practical usage bits don't come in fractions.

Q1: How many bits can you represent in a decimal digit?

A1: You can represent 3 bits of information in a single decimal digit:

The most common scheme would be straight binary with wrapping, where 0=8=000 and 1=9=001. But you could use any scheme; there is nothing that says this is the only way to encode bits into decimal digits.

  • 0: 000
  • 1: 001
  • 2: 010
  • 3: 011
  • 4: 100
  • 5: 101
  • 6: 110
  • 7: 111
  • 8: 000 <- wrapping (or unused)
  • 9: 001 <- wrapping (or unused)

or

Q2: How many bits does it take to represent a decimal digit?

A2: You need at least 4 bits to represent all ten decimal digits, with some waste or wrapping.

Again, the most common scheme would be straight binary with wrapping, but you could use any other scheme. (A short sketch after the table below generates both mappings.)

  • 0: 0000
  • 1: 0001
  • 2: 0010
  • 3: 0011
  • 4: 0100
  • 5: 0101
  • 6: 0110
  • 7: 0111
  • 8: 1000
  • 9: 1001
  • 0: 1010 <- wrapping (or unused)
  • 1: 1011 <- wrapping (or unused)
  • 2: 1100 <- wrapping (or unused)
  • 3: 1101 <- wrapping (or unused)
  • 4: 1110 <- wrapping (or unused)
  • 5: 1111 <- wrapping (or unused)
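
Both tables can be generated with a short Python sketch (hypothetical, mirroring the straight-binary-with-wrapping scheme above):

```python
# Q1: 3 bits per decimal digit, wrapping digits 8 and 9 (8 -> 000, 9 -> 001)
for d in range(10):
    print(d, f"{d % 8:03b}")

# Q2: 4 bits per decimal digit covers all ten digits; six codes go unused
for d in range(10):
    print(d, f"{d:04b}")
```
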
Justin Ohms
2
  • Hex (base 16) - 4 bits
  • Octal (base 8) - 3 bits
  • Binary (base 2) - 1 bit
  • Decimal (base 10) - 3 1/3 bits.
    2^10 = 1,024
    10^3 = 1,000
    2^20 = 1,048,576
    10^6 = 1,000,000
    3 digits in base 10 up to 999 can be held in 10 bits in base 2.
    6 digits in base 10 up to 999,999 can be held in 20 bits in base 2.
    This is where the idea of kilobytes, megabytes, and gigabytes originated.
Russell Hankins
  • It's actually slightly less than 3 1/3... Your answer is a bit ambiguous, and the suggestion that numbers up to 999 can be stored instead of numbers between 0-1023 is a bit misleading. – wizzwizz4 Nov 29 '17 at 21:53
0

Disclaimer - I'm not an information theorist, just a code monkey who works primarily in C and C++ (and thus, with fixed-width types), and my answer is going to be from that particular perspective.

It takes on average 3.2 bits¹ to represent a single decimal digit: 0 through 7 can be represented in 3 bits, while 8 and 9 require 4, and (8×3 + 2×4)/10 = 3.2.

This is less useful than it sounds. For one thing, you obviously don't have fractions of a bit. For another, if you're using native integer types (i.e., not BCD or BigInt), you're not storing values as a sequence of decimal digits (or their binary equivalents). An 8-bit type can store some values that take up to 3 decimal digits, but you can't represent all 3-decimal-digit values in 8 bits: the range is [0..255], so you cannot represent the values [256..999] in only 8 bits.
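
A quick Python check of that range mismatch (illustrative only):

```python
# An unsigned 8-bit type holds 2**8 = 256 distinct values, i.e. 0..255:
print(2 ** 8 - 1)               # 255
# but covering every 3-decimal-digit value (0..999) takes 10 bits:
print((1000 - 1).bit_length())  # 10
```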

When we're talking about values, we'll use decimal if the application expects it (e.g., a digital banking application). When we're talking about bits, we'll usually use hex or binary (I almost never use octal since I work on systems that use 8-bit bytes and 32-bit words, which aren't divisible by 3).

Values expressed in decimal don't map cleanly on to binary sequences. Take the decimal value 255. The binary equivalents of each digit would be 010, 101, 101. Yet, the binary representation of the value 255 is 11111111. There's simply no correspondence between any of the decimal digits in the value to the binary sequence. But there is a direct correspondence with hex digits - F == 1111, so that value can be represented as FF in hex.

If you're on a system where 9-bit bytes and 36-bit words are the norm, then octal makes more sense since bits group naturally into threes.


  1. Actually, the average per digit is smaller since 0 and 1 only require a single bit, while 2 and 3 only require 2 bits. But, in practice, we consider 0 through 7 to take 3 bits. Just makes life easier in a lot of ways.

John Bode
    It's not quite that simple; for example, that 3-or-4 bit encoding isn't sufficient to tell whether `1001001` should be `91` or `49`. –  Nov 28 '17 at 21:00
  • @Hurkyl: again, my perspective is using fixed-width integer types - `1001001` maps to `73` (`64 + 8 + 1`). I do not interpret it as a sequence of binary coded decimal digits. If it's *supposed* to be BCD, which must use 4 bits per digit, then we must assume a leading `0` bit, so it must be `49`. – John Bode Nov 28 '17 at 21:16
  • I was just trying to point out that variable-length encodings aren't as simple as you make them out to be; you need to tell where one symbol ends and another begins, so you can't just say that you can represent 8 and 9 with four bits, 4-7 with three, 2-3 with two and 0-1 with one. And you can see that the `3.2` figure you get actually violates the information theory bound of `log(10)/log(2)`. –  Nov 28 '17 at 21:24
  • @Hurkyl: I wasn't trying to make anything simple, nor was I talking about any sort of encoding. The largest value that can be represented in a 32-bit integer is 10 decimal digits wide (3.2 bits per digit), but there's no correspondence between the binary encoding of any of the digits and the binary encoding of the value. If you're using some form of binary coding for decimal digits, then either the width must be fixed *a la* BCD, or you must use some kind of Huffman coding, which I am not advocating. – John Bode Nov 28 '17 at 22:05
  • You can encode a 16 digit number with 32 bits if you declare that one is represented by 01. But that's not what people mean when they talk about how many digits it takes to encode decimal digits, and neither is what you're talking about. – Acccumulation Nov 28 '17 at 23:31
  • The problem with this scheme is that you forgot the one extra bit you need to indicate whether 3 or 4 bits follow. And with an average length of 4.2 bits per decimal digit, this is even worse than BCD. – MSalters Nov 29 '17 at 20:01
  • You can *represent* values >255 just fine in 8 bits. You just cannot represent *more than 256 discrete values* in 8 bits. We commonly choose to map an 8-bit space to the range \[+0..+255\] (unsigned) or \[-128..+127\] (signed, two's-complement notation), but there's no reason why we have to choose that particular mapping if some other mapping makes more sense for the particular application. For a real-world example, this is how many image file formats represent mappings from 8-bit byte values (one byte per pixel, for storage) to 24-bit RGB color values (for display) via a look-up table. – user Nov 30 '17 at 14:36
  • @MSalters Variable-length encodings are also notoriously easy to mess up, especially parsers for them. I applaud some of the reasoning that went into UTF-8, including the fact that US-ASCII is automatically valid UTF-8 on the byte level, but between it and multiple Unicode codepoints combining into a single character as it appears on screen, let's just say I'm glad I haven't been tasked with writing a UTF-8 Unicode parser even for something simple such as, say, finding the number of characters in a string, or extracting a substring. Schlemiel the painter was doing it right all along! – user Nov 30 '17 at 14:43
0

If I were teaching this, I would first explain what a number expressed as a series of digits means; i.e., from right to left, assuming base n: a * n^0 + b * n^1 + c * n^2 + … + z * n^y.

Then explain that 10^3 is approximately equal to 2^10. It is not exact, and is the reason why, in computers, we often do not know what 2K really means (is it 2,000 or 2,048?). It serves fairly well for quick approximations: 2^16 is about 2^(16 - 10) * 1,000, or 2^6 (64) * 1,000, or 64,000. In reality it is 65,536, but if you don't mind being off by around a percent, it works fairly well.
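
The rule of thumb is easy to play with in Python (`estimate` is a hypothetical helper, not from the answer):

```python
# Mental-math estimate: 2**n is roughly 2**(n - 10) * 1000 for n >= 10,
# because 2**10 = 1,024 is approximately 10**3 = 1,000.
def estimate(n: int) -> int:
    return 2 ** (n - 10) * 1000

for n in (16, 20, 32):
    print(n, estimate(n), 2 ** n)  # e.g. 16 -> 64000 vs 65536 (about 2.3% low)
```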

PeterH