
This is somewhat related to the question:

On Windows 7, dir or tree can't show unicode characters, even starting cmd with cmd /U

Even on Windows 7, the only way I have found to get Unicode into a file is:

> cmd /U
> dir /B > files.txt

The file is then in "Unicode" encoding, as Notepad shows when I open it and choose "Save As". If I instead run dir /B > files.html and open the HTML file in Firefox, it displays correctly with the encoding set to UTF-16 (or UTF-16 LE).
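To check what such a redirected file contains without going through Notepad, a minimal Python sketch like the one below could be used. It assumes the data is UTF-16 LE, with or without a byte order mark; the helper name `decode_cmd_u_output` and the sample file names are made up for illustration.

```python
import codecs


def decode_cmd_u_output(data: bytes) -> list[str]:
    """Decode bytes assumed to come from `cmd /U` plus `dir /B > files.txt`."""
    # Assumption: the data is UTF-16 LE. Drop a byte order mark if present.
    if data.startswith(codecs.BOM_UTF16_LE):
        data = data[len(codecs.BOM_UTF16_LE):]
    return data.decode("utf-16-le").splitlines()


# Hypothetical sample standing in for the contents of files.txt.
sample = "notes.txt\r\n\u4e2d\u6587.txt\r\n\u03a9mega.txt\r\n".encode("utf-16-le")
names = decode_cmd_u_output(sample)

for name in names:
    # ascii() keeps the output printable even on a console that cannot
    # display these characters.
    print(ascii(name))
```

On a real system the bytes would come from `open("files.txt", "rb").read()` instead of the sample.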

But if I want to see it on the screen instead of sending it to a file, it is still impossible. Is there a way to make it happen? Possibly by somehow telling cmd not to show nonprintable characters as "?".

Update: I tried cmd.exe, Cygwin's bash on Windows, and PowerShell. They all behave the same. However, if I change "Properties -> Font" to Consolas or Lucida Console, there is some improvement: instead of a question mark, I now get either an empty square or a square with a question mark in it.

The more expensive Mac computers with Mac OS X can do it. The free Ubuntu can do it too.

nonopolarity

5 Answers


This is a very old question, but all of the answers given here are wrong.

You will never see Unicode output on the Windows command line (CMD.exe). The reason is that CMD cannot display Unicode. It can, however, display DBCS (Double-Byte Character Set).

If you want to see Japanese output, for example, you have to change your System Locale to Japanese and reboot. Then, you'll be able to see Japanese DBCS (i.e. Shift-JIS) characters on the command line. Windows supports Japanese Shift-JIS, Simplified Chinese, Korean, and Traditional Chinese "Big5" DBCS code pages.
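As a rough illustration of what a DBCS code page can and cannot represent, the Python sketch below uses Python's `cp932` codec (the Windows Japanese Shift-JIS code page). It is only an analogy for the conversion step, not a description of how CMD is implemented, and the sample strings are arbitrary.

```python
# cp932 is Python's name for the Windows Japanese (Shift-JIS) code page.
japanese = "\u65e5\u672c\u8a9e"  # the word "Nihongo" (Japanese language)
encoded = japanese.encode("cp932")

# Each of these three kanji takes two bytes in this code page.
print(len(encoded))

# The bytes decode back to the original text.
print(encoded.decode("cp932") == japanese)

# A character outside the code page (here a Korean syllable) cannot be
# represented; with errors="replace" it is turned into "?".
korean = "\ud55c"
print(korean.encode("cp932", errors="replace"))
```

The last line shows the same kind of "?" substitution that appears when text is converted to a code page that lacks the character.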

Incidentally, you can pipe UTF-16 (inaccurately used interchangeably with "Unicode" by Microsoft) to a file, then open that file in, say, Notepad, and view the Unicode characters. You can also mark and copy the gibberish text from CMD.exe and paste it into Notepad and see the Unicode characters. In other words, CMD supports Unicode, but it doesn't display Unicode.

You can find more information in this blog post.

Jeff

Based on your username I suspect you mainly work with Asian languages.

Windows tools normally operate in unicode mode (as you saw by piping the output of dir into a file and opening that file with an editor):

  1. the tool does its stuff
  2. it outputs unicode characters
  3. another program takes this output and has to display it.

to display any character on the screen, the program from step 3 has to look up the glyph appropriate for the given byte sequence. example:

  • 0x61 'a' maps to a different glyph in each font (so the 'a' looks different from font to font)

  • 0x3A9 'Ω' (greek 'omega', decimal 937) maps to a different glyph in each font as well

this mapping only works IF the font has a glyph for the given byte sequence. otherwise the visual result differs, sometimes you see '?', sometimes diamonds etc.

again: dir produces byte sequences, which sometimes are purely in the ASCII range and sometimes in the unicode range (depending on what filenames it finds). it sends these sequences to another program, which is responsible for actually rendering them. to be able to display these sequences, this program has to map each sequence to a glyph. to do that, it has to search a font for the glyph. if the font does not have a glyph for the given sequence, then the program cannot display the byte sequence produced by, for example, dir.
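The lookup described above can be sketched as a toy model in Python. This is a simplification under stated assumptions: a "font" is just a set of code points it has glyphs for, and the names `render`, `ascii_font` and `greek_font` are hypothetical, not part of any Windows API.

```python
def render(text: str, font_glyphs: set[int], missing: str = "?") -> str:
    """Return text roughly as a renderer with a single font might show it."""
    # A character is shown only if the font has a glyph for its code point;
    # otherwise a placeholder is shown instead.
    return "".join(ch if ord(ch) in font_glyphs else missing for ch in text)


# Hypothetical font covering printable ASCII only.
ascii_font = set(range(0x20, 0x7F))
# Hypothetical font that additionally covers the Greek block.
greek_font = ascii_font | set(range(0x370, 0x400))

text = "a\u03a9"  # 'a' followed by Greek capital omega
print(hex(ord("a")), hex(ord("\u03a9")))
print(ascii(render(text, ascii_font)))
print(ascii(render(text, greek_font)))
```

With the ASCII-only font the omega is replaced by the placeholder, while the font that covers Greek passes it through unchanged.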

so, the solution to your problem (seeing any unicode character in the 'console / terminal' of windows) is: give the program a font which has a glyph for (almost) every unicode byte sequence.

akira
  • hm, but cmd, Cygwin bash, and PowerShell are all limited to 3 fonts: Raster Fonts, Lucida Console, and Consolas... actually Windows usually falls back to a unicode font when it can't display something with the current font... also, if I redirect the output, like `dir > file.txt`, it is still a question mark in the file, even though it is a "square box" on the screen. – nonopolarity Jun 27 '10 at 06:48
  • @Jian Lin: yes, but that is essentially YOUR problem to provide a font which contains these glyphs. and even if windows falls back to "some" font which holds "some" unicode glyphs in it ... that is not enough to display some of your asian glyphs (you have problems with the asian glyphs, right?). – akira Jun 27 '10 at 06:53
  • according to some websites, "Ascender Uni Duo" seems to be the best font (even for "fixed") http://www.ascendercorp.de/fonts/multilingual/ascender-uni/ but maybe you find something better / cheaper http://en.wikipedia.org/wiki/Unicode_typefaces – akira Jun 27 '10 at 06:58
  • @akira there are many fonts on Windows 7 that can display the whole Unicode glyph set. But (1) Cmd window won't let you choose any of them. (2) When windows or the app falls back to the font that can display unicode, such as Lucida Sans Unicode, it can display most any chinese characters. – nonopolarity Jun 27 '10 at 07:34
  • Lucida Sans Unicode used to be much larger... now it is about 300kb on Windows 7. But anyway, even if you set any web browser to use this font or any other font such as Times New Roman, when you go to http://news.google.com/news?edchanged=1&ned=tw you can still see the chinese characters if you are using Vista or Win 7. Either the app or, more likely, Windows, when it cannot find the glyph in the current font, will go find it in a font that has it. – nonopolarity Jun 27 '10 at 07:50
  • besides, when I redirect the output using CMD /U and then DIR /B > file.txt, I can see the correct glyph in Notepad automatically, even using a default English font. So unless Microsoft is saying, oh we just won't show unicode char in Command Prompt, even people still use it and it is part of Win 7, we will make it behave less well than Notepad. PowerShell too. Unicode? out of the question. – nonopolarity Jun 27 '10 at 07:54
  • as you said: cmd.exe only accepts fixed-size fonts. it does not matter if you can see all the glyphs in your webbrowser, or in notepad, or in xyz. if the glyph is not in the font used by cmd.exe you cannot see it, period. even if windows falls back to other (fixed size) fonts: if the glyph is not in there either, it cannot be displayed. and that's why i said: find a fixed size font for cmd.exe which contains almost all glyphs (such as "ascender uni duo", so i was told) – akira Jun 27 '10 at 09:14
  • and no, you are not using an "english only" font in notepad. you are lucky that either the font itself or the fallback provides the glyphs the bytesequences require. anyhow, in notepad the default font is not the fixed size font. – akira Jun 27 '10 at 09:17
  • I think Mac OS X solved it by making the glyph 2 characters wide, and then, no character is overlapping. It works pretty well and at least people can see the unicode filenames. It is not trying to build a rocket here. – nonopolarity Jun 27 '10 at 22:32
  • It really has nothing to do with the operating system or encodings. The Windows console display simply uses just one font and doesn't look for alternatives if a glyph is missing. OTOH, the Windows text box (which Notepad uses) does look for alternative fonts. – Philipp Jun 28 '10 at 08:58
  • @akira: Good answer, I'd just replace “byte sequences” by “16-bit strings” or “UTF-16 strings” since that is what Windows internally uses. – Philipp Jun 28 '10 at 09:00
  • @Philipp: i wanted to keep it more generic since the underlying mechanism is the same on every OS: byte sequence -> look up the glyphs in the font -> rendering. – akira Jun 28 '10 at 09:15
  • There is no font I can change to and use, even though there are about 12 chinese fonts on Windows 7. The only font I can change to and use is Courier New, which is pure English. The font "Ascender Uni Duo" costs $149 and is almost as expensive as Windows 7 itself. And who knows whether it will work or not... – nonopolarity Jun 28 '10 at 10:07
  • i understand your pain. the last time i contacted ascender they were very friendly, just ask them if you can test the font. – akira Jun 28 '10 at 11:33

https://stackoverflow.com/questions/388490/unicode-characters-in-windows-command-line-how

Use chcp 65001 to change the codepage to UTF8 and use Lucida Console.
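For reference, code page 65001 is UTF-8, whereas `cmd /U` output is UTF-16 LE. A small Python sketch of how the same characters differ between the two encodings follows; the helper name `encodings` and the sample characters are arbitrary choices for illustration.

```python
def encodings(ch: str) -> tuple[str, str]:
    """Return the hex bytes of ch in UTF-8 and in UTF-16 LE."""
    return ch.encode("utf-8").hex(), ch.encode("utf-16-le").hex()


# 'a', Greek capital omega, and a common CJK character.
for ch in ("a", "\u03a9", "\u4e2d"):
    utf8_hex, utf16_hex = encodings(ch)
    print(f"U+{ord(ch):04X}  utf-8={utf8_hex}  utf-16-le={utf16_hex}")
```

A program that writes one encoding while the reader expects the other will show garbage regardless of which font is selected.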

ta.speot.is
  • hm... still won't work... cmd /U, chcp 65001, dir, and dir /B with the font already set to Lucida Console, still the same. – nonopolarity Jun 27 '10 at 08:11
  • You may want to try adding more fonts to the console: http://support.microsoft.com/kb/247815 and http://blogs.msdn.com/b/oldnewthing/archive/2007/05/16/2659903.aspx (the latter for some discussion on the issue). – ta.speot.is Jun 27 '10 at 08:19
  • it _all_ depends on the fonts you are giving the program to render the text. read the support article, good info in it. – akira Jun 27 '10 at 09:20
  • @taspeotis: The Windows console always uses Unicode internally, regardless of the codepage setting (which is obsolete anyway and only included for backwards compatibility). It is really just a font problem. – Philipp Jun 28 '10 at 08:59
  • Can any of the included font on Win 7 be used? such as MingLiU, DFKai-SB – nonopolarity Jun 28 '10 at 09:39
  • There is no font I can change to and use, even though there are about 12 chinese fonts on Windows 7. The only font I can change to and use is Courier New, which is pure English. The font "Ascender Uni Duo" costs $149 and is almost as expensive as Windows 7 itself. And who knows whether it will work or not... – nonopolarity Jun 28 '10 at 10:05
  • What's wrong with adding fonts http://support.microsoft.com/kb/247815 – ta.speot.is Jul 01 '10 at 03:49
  • It is fine adding font. But (1) I already have 7 or 8 default Chinese fonts on the system that should also have other unicode characters that I don't care as much, but, if you say, add one more, sure, I can do that. (2) which one to add -- is there any free one. Somebody suggested adding one that is $149. – nonopolarity Jul 02 '10 at 06:42
  • @Philipp it's not just a font problem. The CMD window is an old-school DBCS program. The command line processor itself supports Unicode, but not the display portion. The only way to show Japanese, Chinese, Korean, and Trad. Chinese in the CMD window (or any old-school DBCS UI) is to change the System Locale. – Jeff Dec 01 '14 at 04:39

It has nothing to do with encodings, since the Windows console always uses Unicode internally. The characters are simply not available in the fonts you use, which are designed for programming and European languages. I don't have access to Windows at the moment, but I remember that I could print Greek characters after switching to the Lucida Console font. Using a font like DejaVu Sans Mono might work.

Philipp
  • i've created a russian filename and cmd.exe displayed the glyphs correctly after switching to lucida. for asian fonts i think OP has to pick a "better" or more "unicode complete" fixed font (even if he does not like that answer :)). – akira Jun 28 '10 at 09:16

Ok, this is a solution using PowerShell:

1) Click the Start button on Windows 7
2) In the search box, type PowerShell
3) Choose PowerShell ISE <-- note it is ISE

Now, if you do ls, you will be able to see unicode characters...

4) If you also run chcp 65001, then any UTF-8 characters your program prints will be displayed correctly as well.

You can also run ls > list.txt and then type list.txt, and the content shows up in Unicode characters as well.

tree will still not show unicode characters.

also, inside the PowerShell ISE, cmd /U /C dir /B will not work either.

ls -R will.

nonopolarity
  • `ls` in PowerShell is actually just an alias for "Get-ChildItem" – KdgDev Jun 28 '10 at 12:07
  • and then? don't tell me you use `Get-ChildItem` on the command line every day instead of `ls`. For example, we usually drink water instead of dihydrogen monoxide. – nonopolarity Aug 29 '10 at 21:12