54

When I save the text "Bush hid the facts" in Notepad under Windows XP and then reopen the file, why does it show squares instead of the text?

I saw it in this video if you need an example

http://www.youtube.com/watch?v=9bK9-sc_uus&feature=related

John T
  • 163,373
  • 27
  • 341
  • 348

1 Answer

93

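This is due to a problem with the Win32 API function IsTextUnicode, which dates back to Windows NT 3.5. Notepad relies on it to guess a file's encoding, and for certain short ANSI-encoded files such as this one the function wrongly reports the text as UTF-16LE. The bytes are then decoded as the wrong characters, which show up as squares.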
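
To make the misreading concrete, here is a minimal Python sketch of what happens to the bytes. It does not call IsTextUnicode or reproduce its heuristic; it only assumes the file holds the 18 plain ASCII bytes of the sentence and shows what they turn into when read as UTF-16LE. The variable names are just for illustration.

```python
# The 18 single-byte characters written to disk for the sentence (ANSI/ASCII).
raw = "Bush hid the facts".encode("ascii")

# Reinterpret the very same bytes as UTF-16LE: each pair of bytes
# becomes one 16-bit code unit, so 18 bytes turn into 9 characters.
misread = raw.decode("utf-16-le")

print(len(raw), len(misread))
print([f"U+{ord(ch):04X}" for ch in misread])
```

Every resulting code point lands in the CJK Unified Ideographs range (U+4E00 to U+9FFF), which is consistent with the squares: a system with no font covering those characters draws placeholder boxes.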

This fascinated me too when I first discovered it. I was kind of young and naive back then, so I thought it was an actual conspiracy :)

There is actually a Wikipedia article on this you can find here.

John T
  • 17
    Interesting. +1 for the Wiki article that taught me the word "mojibake" and its particularly meta warning that "without proper rendering support, you may see question marks, boxes, or other symbols..." :-) – jtb Jul 31 '09 at 01:33
  • 3
    +1 because, despite using Windows for as long as I can remember, I somehow **never** came across this! – Jared Harley Jul 31 '09 at 02:56
  • It's not actually a bug, as argued in Raymond Chen's article, which you can reach by following the external link in the Wikipedia article. The documentation of IsTextUnicode clearly states that its tests are statistical and "are not foolproof". Given a short string such as the one here, it is not surprising that something is detected incorrectly. – KTC Jul 31 '09 at 03:44
  • 7
    Well, it's clearly a bug, because the software behaves incorrectly. The best you can argue is that bugs like this are impossible to eliminate without losing other functionality. And, heck, Microsoft fixed it in Vista [according to Wikipedia], so someone there obviously thought it was a bug too. – John Fouhy Jul 31 '09 at 04:30
  • 11
    It's not a bug if it does exactly what it advertises (i.e. is documented) to do. It's specified precisely that it's a statistical test and not foolproof, and the shorter the input, the higher the error rate. It just so happens that in this case the misdetection occurs with a sentence that makes sense to humans. This particular sentence doesn't trigger it on Vista & 7 because the implementation of IsTextUnicode has been changed, and presumably improved, so it now reports correctly for this sentence. What we have is a better or worse false positive / negative rate, not bugs. – KTC Jul 31 '09 at 06:00
  • 1
    -1 for an eighteen-year-old saying "back when ... I was kind of young". (kidding, I didn't really downvote you) – Graeme Perrow Sep 02 '09 at 17:48
  • 5
    "It's not a bug if it does what it's supposed to." Yeah maybe the technical term is 'design flaw' or something, but I think most people would still say it's ok to call it a bug. – davr Nov 18 '09 at 00:37
  • @John It was obviously a feature. – Mateen Ulhaq Jan 28 '11 at 06:06
  • Somehow I got here, two years later. There is the obligatory Old New Thing blog post and the time machine reference. http://blogs.msdn.com/b/oldnewthing/archive/2007/04/17/2158334.aspx – surfasb Dec 20 '11 at 10:26