4

I open Notepad, hold down the Alt key, type +2295 on the numpad, then release the Alt key. I save the file with Unicode encoding. However, the output is not ⊕ (http://www.fileformat.info/info/unicode/char/2295/index.htm) as expected, but ≈ (http://www.fileformat.info/info/unicode/char/2248/index.htm) instead. What am I doing wrong? Looking for some pointers.

For anyone else stumbling over this: please note that EnableHexNumpad needs to be created as a new string value (REG_SZ), not as a key (see the Wikipedia page linked in the answer).

user1720897

2 Answers

3

The Wikipedia entry on Unicode input methods lists a necessary prerequisite for this to work:

A prerequisite for this input method is that the registry key HKEY_CURRENT_USER\Control Panel\Input Method contains a string type (REG_SZ) value called EnableHexNumpad, which has the value data 1. Users need to log off/in on Windows 8.1/8.0, Windows 7, and Vista or reboot on earlier systems after editing the registry for this input method to start working.

After I added this registry key on my machine and rebooted, the input works just as advertised.
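For those who would rather script the change than click through regedit, here is a minimal sketch using Python's standard `winreg` module. The function name `enable_hex_numpad` and the constants are my own naming, not anything Windows defines; the key path, value name, type and data are taken from the Wikipedia quote above. It only writes the value, so you still need to log off and on again (or reboot) afterwards.

```python
import sys

# Taken from the quoted prerequisite: a REG_SZ value, not a subkey.
KEY_PATH = r"Control Panel\Input Method"
VALUE_NAME = "EnableHexNumpad"
VALUE_DATA = "1"


def enable_hex_numpad() -> None:
    """Create HKCU\\Control Panel\\Input Method\\EnableHexNumpad = "1" (REG_SZ)."""
    if sys.platform != "win32":
        raise OSError("the registry only exists on Windows")
    import winreg  # Windows-only standard library module

    # Open the key (creating it if missing) with permission to set values.
    with winreg.CreateKeyEx(
        winreg.HKEY_CURRENT_USER, KEY_PATH, 0, winreg.KEY_SET_VALUE
    ) as key:
        # REG_SZ is the "String Value" type in regedit's New menu.
        winreg.SetValueEx(key, VALUE_NAME, 0, winreg.REG_SZ, VALUE_DATA)


if __name__ == "__main__":
    enable_hex_numpad()
```

The important detail is `winreg.REG_SZ`: creating `EnableHexNumpad` as a key instead of a string value leaves the hex input method disabled.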

Boldewyn
  • The answer in it says use Notepad++. But I see the same behavior with Notepad++ as well. – user1720897 Dec 10 '15 at 09:32
  • OK, that's interesting. Np++ is definitively Unicode-aware. What locale do you have set? – Boldewyn Dec 10 '15 at 09:49
  • The System and Input Locale is set to `en-us` – user1720897 Dec 10 '15 at 10:11
  • OK... I'm curious what actually gets saved. May I ask you to get a hex dump of a file where this appears? Basically, open Notepad, type `Alt+2295` once, then save file and in cmd.exe run it once through a bin2hex tool like [the one from fileformat.info](http://www.fileformat.info/tool/byte2hex/index.htm). Then we can see, if it's really `≈` (in [any of its representations](https://codepoints.net/U+2248)) or something else. – Boldewyn Dec 10 '15 at 10:20
  • I had already done this. Used [HxD](http://mh-nexus.de/en/hxd/) to see what the binary representation looks like. When I type `Alt+2295` it gets saved as `2248`. (Note: I save the file with Unicode Big Endian encoding). However, when I type `Alt+2248`, the binary looks like `255A` – user1720897 Dec 10 '15 at 10:25
  • Thanks. That's indeed a strange problem. I'll head over to Twitter, maybe someone there has an idea. brb – Boldewyn Dec 10 '15 at 10:58
  • Neither Twitter nor a night's worth of sleep helped. But I'm still at it... – Boldewyn Dec 11 '15 at 08:56
  • By the way, `Alt+2295` gives me a Cedilla, U+00B8. And apparently @Burgi has a similar problem. – Boldewyn Dec 11 '15 at 08:58
  • Found it! Wikipedia to the rescue! :-) (So, at least, works for me.) I'm still curious as of what Windows actually does, when that registry key is not set... – Boldewyn Dec 11 '15 at 09:11
  • I'd say it's not related to encoding or problems in Notepad, but related to decimal vs hexadecimal. – Arjan Dec 11 '15 at 09:19
  • @Arjan nope, that doesn't check out. 2295 has nothing to do with 2248, no matter how often you drag it through hexdec or dechex. – Boldewyn Dec 11 '15 at 09:50
  • @Boldewyn Thanks. It worked. I was actually aware of the registry entry. But the guides I was following (linked [here](http://www.fileformat.info/tip/microsoft/enter_unicode.htm) and [here](http://www.johndcook.com/blog/2008/08/17/three-ways-to-enter-unicode-characters-in-windows/)) lacked an important detail which you found in the Wiki article: `EnableHexNumpad` needs to be a new **String Type**. I was adding it as a new `Key`! – user1720897 Dec 11 '15 at 10:44
  • But, Boldewyn, I think that *"2295 has nothing to do with 2248"* is related to Windows not expecting Unicode codes at all when using Alt+numpad (when the registry hack is not applied). I guess it's then expecting some Windows codes, and I'm **sure** it's expecting decimal, without the registry hack. I really do not see how Notepad would be the culprit. That second part of your answer is just a wild guess, which I feel is wrong. – Arjan Dec 11 '15 at 11:28
  • @Arjan the second part of my answer is below a fine line, where the first part reads, loud and clear, "**Edit:**" :-P If you read the comments, it became clear quite quickly that the first version of the answer is not correct. Anyway, I'll remove the old part. – Boldewyn Dec 11 '15 at 11:55
  • Thanks for removing the old parts; see also [When is “EDIT”/“UPDATE” appropriate in a post?](http://meta.stackexchange.com/questions/127639/when-is-edit-update-appropriate-in-a-post) Cheers. – Arjan Dec 11 '15 at 16:04
3

To answer the question of why this specific value is present:

With the standard input method (that is, without the registry setting), decimal numbers are taken mod 256 and the resulting byte is then interpreted in the OEM code page* if there is no leading zero, or in the ANSI code page if there is a leading zero. So, the steps are:

  • 2295 mod 256 = 247
  • 247 [0xF7] is U+2248 in the OEM code page

Character sets that have U+2248 at this position are Codepages 437, 737, 770, 772, 774, 860, 861, 862, 863, 864, 865, CWI, and MIK.

(The fact that "2295" and "2248" both start with 22 is an interesting coincidence, nothing more)
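The steps above can be checked with a few lines of Python. This is only a rough model of the behaviour described in this answer, not what Windows actually runs: the function name `alt_numpad_char` and its parameters are made up for illustration, the default code pages (437 for OEM, 1252 for ANSI) are assumptions that fit an `en-us` system, and it assumes the `+` is simply ignored when `EnableHexNumpad` is not set. It also does not model the special glyphs the OEM path produces for values below 32.

```python
def alt_numpad_char(keys: str, oem: str = "cp437", ansi: str = "cp1252",
                    hex_enabled: bool = False) -> str:
    """Approximate the character produced by Alt+<keys> on the numpad."""
    if keys.startswith("+"):
        if hex_enabled:
            # With EnableHexNumpad set: the digits are a hex code point.
            return chr(int(keys[1:], 16))
        # Assumption: without the registry value the "+" is ignored.
        keys = keys[1:]
    # A leading zero selects the ANSI code page, otherwise the OEM one.
    codepage = ansi if keys.startswith("0") else oem
    byte = int(keys) % 256  # only the low 8 bits survive
    return bytes([byte]).decode(codepage, errors="replace")


print(alt_numpad_char("+2295"))                    # ≈ (U+2248): 2295 % 256 = 247 = 0xF7
print(alt_numpad_char("2248"))                     # ╚ (U+255A): 2248 % 256 = 200 = 0xC8
print(alt_numpad_char("2295", oem="cp850"))        # ¸ (U+00B8)
print(alt_numpad_char("+2295", hex_enabled=True))  # ⊕ (U+2295)
```

This also accounts for the other observations in the comments: `Alt+2248` saved as `255A` is byte 0xC8 in code page 437, and the cedilla (U+00B8) is what byte 0xF7 maps to in code page 850.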

* Note: "ANSI Code Page" has little to do with ANSI, except that code page 1252 was based on a draft of what later became ISO 8859-1 [and some of the others had similar origins]. It is the 8-bit character set associated with the current locale, and "OEM Code Page" is another character set associated with the locale, typically the one that was used in MS-DOS in that country.

Random832