This character is 254 in the extended ASCII table and U+25A0 in Unicode. If I run putchar(254), the terminal does not recognize the character, I think because it does not use extended ASCII.
-
Which character set is your terminal using? [UTF-8](https://en.wikipedia.org/wiki/UTF-8)? [windows-1252](https://en.wikipedia.org/wiki/Windows-1252)? [CP437](https://en.wikipedia.org/wiki/Code_page_437)? [ISO 8859-15](https://en.wikipedia.org/wiki/ISO/IEC_8859-15)? [JIS X 0201](https://en.wikipedia.org/wiki/JIS_X_0201)? There is no single "extended ASCII Table", so without knowing what character set your system uses, "254" has no meaning I'm afraid. – marcelm May 15 '22 at 11:15
3 Answers
printf("■\n"); works for me. putchar('■'); gives me a warning about multi-character character constants, and putchar(254); gives me a 'þ' character.
Also, make sure the terminal emulator you are using supports Unicode.
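As a rough sketch of why the string version works: in UTF-8, ■ is the three bytes E2 96 A0, so it fits in a string but not in a single char constant. The helper below (print_black_square is a made-up name for illustration) spells the bytes out with hex escapes, so it does not depend on how the source file itself is encoded. It assumes the terminal expects UTF-8.

```c
#include <stdio.h>

/* Writes BLACK SQUARE (U+25A0) followed by a newline to `out`,
 * as the three UTF-8 bytes E2 96 A0.
 * Returns a non-negative value on success, EOF on error.
 * Assumption: whatever reads `out` interprets it as UTF-8. */
int print_black_square(FILE *out)
{
    return fputs("\xE2\x96\xA0\n", out);
}
```

Calling print_black_square(stdout) from main should show the square on a UTF-8 terminal.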
There is no such thing as “the extended ASCII”. There are tens, if not hundreds, of 8-bit encodings based on ASCII. Many of them are available on Ubuntu, but they are not commonly used. The modern computing world, Ubuntu or not, uses the Unicode character set, generally encoded as UTF-8. Unicode code points range from U+0000 to U+10FFFF (which fits in 21 bits), and UTF-8 encodes each code point as one to four 8-bit bytes.
If character 254 is ■, you probably want one of the 8-bit encodings used in the text mode of PCs, sometimes known as DOS code pages. I can't find a simple way to run a program that wants a DOS code page. (Some terminals, including xterm but not Gnome-terminal, support 8-bit code pages via luit. But Ubuntu does not come with locale definitions for DOS code pages.) You can use iconv or recode to convert between a DOS code page and UTF-8, e.g.
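To make the "one to four bytes" part concrete, here is a minimal sketch of a UTF-8 encoder. The name utf8_encode is hypothetical (it is not a standard library function), and real programs would normally rely on the C library or an existing Unicode library instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes one Unicode code point as UTF-8 into `out` (room for 4 bytes).
 * Returns the number of bytes written, or 0 if `cp` is a surrogate
 * or lies above U+10FFFF. Hypothetical helper for illustration only. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp <= 0x7F) {                       /* ASCII range: 1 byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp <= 0xFFFF) {                     /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                       /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* 4 bytes: 11110xxx + 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                               /* beyond the Unicode range */
}
```

With this sketch, utf8_encode(0x25A0, buf) writes the three bytes E2 96 A0 for ■, while utf8_encode(0xFE, buf) writes the two bytes C3 BE for þ.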
my_program_that_outputs_cp437 | iconv -f CP437
If you're writing a program today, unless you need compatibility with 20th-century software, use Unicode and UTF-8. So if you want ■ (BLACK SQUARE), it's code point U+25A0. Since UTF-8 encodes this character as more than one byte, you need puts(), not putchar(): a “char” in C is a single byte.
puts("■");
putchar(254) prints nothing because the byte 0xFE on its own is not valid UTF-8 (in fact, 0xFE never occurs anywhere in UTF-8).
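One way to see what a UTF-8 terminal makes of a lone byte is to classify it by its value. The function below is only a sketch (utf8_sequence_length is a made-up name), and it inspects the first byte only; it does not validate the bytes that follow.

```c
#include <stddef.h>

/* Returns how many bytes a UTF-8 sequence starting with `lead` occupies,
 * or 0 if `lead` cannot start a sequence. Hypothetical helper; it does
 * not check the continuation bytes. */
size_t utf8_sequence_length(unsigned char lead)
{
    if (lead <= 0x7F)
        return 1;   /* 0xxxxxxx: plain ASCII */
    if (lead >= 0xC2 && lead <= 0xDF)
        return 2;   /* 110xxxxx */
    if (lead >= 0xE0 && lead <= 0xEF)
        return 3;   /* 1110xxxx, e.g. 0xE2 for U+25A0 */
    if (lead >= 0xF0 && lead <= 0xF4)
        return 4;   /* 11110xxx, up to U+10FFFF */
    /* 0x80..0xBF are continuation bytes; 0xC0, 0xC1 and 0xF5..0xFF
     * never appear in valid UTF-8. */
    return 0;
}
```

Here utf8_sequence_length(254) returns 0, whereas utf8_sequence_length(0xE2), the first byte of ■, returns 3.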
C (also C++, etc) is quite annoying -- a single Unicode character isn't one "character". As Unicode uses more than one byte to store a single character, you cannot use functions that only support one "character" like putchar. Consider other printing functions, like printf as Alex said.
Of course, your terminal has to support Unicode, lol.
-
_"As Unicode uses more than one byte to store a single character"_ - Not necessarily. `a` in UTF-8 is a single byte. – marcelm May 15 '22 at 11:16
-
@marcelm Sorry, I meant those not in ASCII. Unicode is ASCII-compatible, so ASCII characters can be stored in a single byte. – Emoji May 16 '22 at 10:03
-
@Emoji I think people (incl. me) object to you not distinguishing character sets and encodings. Unicode is a character set, nothing else than a table of all possible characters with associated IDs. As such, it does not specify any bit representation at all of those characters. It is, for example, UTF-8 -- an *encoding* -- that specifies how characters are encoded. UTF-8 is ASCII-compatible. – ComFreek May 24 '22 at 11:02