The latest Notepad.exe offers both "UTF-8" and "UTF-8 with BOM" as Save As encodings.
Is UTF-8 with BOM the old UTF-8? What is plain UTF-8 now?
Notepad offers both “UTF-8” and “UTF-8 with BOM” to give you flexibility in the cases where a BOM (byte order mark) is needed. In general, though, saving the file without a BOM, meaning plain UTF-8, is the best way to handle text files with UTF-8 content.
As explained on the Wikipedia page for byte order mark:
“BOM use is optional. Its presence interferes with the use of UTF-8 by software that does not expect non-ASCII bytes at the start of a file but that could otherwise handle the text stream.”
The article goes into more detail:
“The UTF-8 representation of the BOM is the (hexadecimal) byte sequence 0xEF,0xBB,0xBF. The Unicode Standard permits the BOM in UTF-8, but does not require or recommend its use. Byte order has no meaning in UTF-8, so its only use in UTF-8 is to signal at the start that the text stream is encoded in UTF-8, or that it was converted to UTF-8 from a stream that contained an optional BOM. The standard also does not recommend removing a BOM when it is there, so that round-tripping between encodings does not lose information, and so that code that relies on it continues to work. The IETF recommends that if a protocol either (a) always uses UTF-8, or (b) has some other way to indicate what encoding is being used, then it "SHOULD forbid use of U+FEFF as a signature."
Not using a BOM allows text to be backwards-compatible with some software that is not Unicode-aware. Examples include programming languages that permit non-ASCII bytes in string literals but not at the start of the file.”
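To make the byte sequence from the quote concrete, here is a minimal Python sketch that checks a byte string for the UTF-8 BOM and strips it. The helper name strip_utf8_bom and the sample text are made up for illustration; codecs.BOM_UTF8 is the standard library constant for those three bytes.

```python
import codecs
from typing import Tuple

# The three bytes 0xEF, 0xBB, 0xBF: U+FEFF encoded as UTF-8.
UTF8_BOM = codecs.BOM_UTF8


def strip_utf8_bom(data: bytes) -> Tuple[bytes, bool]:
    """Return (payload, had_bom) for a raw byte string (hypothetical helper)."""
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):], True
    return data, False


# Illustrative sample data, not taken from any real file.
without_bom = "héllo".encode("utf-8")
with_bom = UTF8_BOM + without_bom

payload, had_bom = strip_utf8_bom(with_bom)
```

Software that is not expecting those three leading bytes sees them as ordinary non-ASCII content, which is the interference the quote describes.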
As for why Microsoft cares about saving UTF-8 with a BOM in Notepad, the following explains it well. It seems to be a specific requirement of Microsoft programming tools rather than of non-Microsoft tools:
“Microsoft compilers and interpreters, and many pieces of software on Microsoft Windows such as Notepad treat the BOM as a required magic number rather than use heuristics. These tools add a BOM when saving text as UTF-8, and cannot interpret UTF-8 unless the BOM is present or the file contains only ASCII. Google Docs also adds a BOM when converting a document to a plain text file for download.”
So unless you explicitly need a BOM in a UTF-8 file, just don’t worry about that save option.
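If you want to see the difference between the two save options outside Notepad, Python's "utf-8" and "utf-8-sig" codecs behave in a roughly analogous way. This is only a sketch of the two encodings, not a description of how Notepad itself is implemented, and the sample text is made up.

```python
text = "naïve café"  # illustrative sample text

plain = text.encode("utf-8")       # like "UTF-8": no BOM
signed = text.encode("utf-8-sig")  # like "UTF-8 with BOM": BOM prepended

# The only difference is the three leading bytes.
same_payload = signed[3:] == plain

# "utf-8-sig" strips a BOM when decoding and also accepts input without one.
from_signed = signed.decode("utf-8-sig")
from_plain = plain.decode("utf-8-sig")

# Plain "utf-8" does not strip the BOM; it shows up as a U+FEFF character.
leaked = signed.decode("utf-8")
```

The last line shows why an unexpected BOM causes trouble: a reader that decodes as plain UTF-8 ends up with an invisible extra character at the start of the text.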
The other answer is wrong. It is some political thing. ANSI is the default text format in Windows and has been for 36 years.
In Windows, files are assumed to be ANSI. Therefore you always use a BOM for Unicode files. Unix programs that can't handle BOMs are not Unicode compliant.
I write text editors. If the user doesn't specify an encoding, it is ANSI. ALWAYS.
Assuming you will get BOM-less Unicode means you have to call IsTextUnicode (https://docs.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-istextunicode) to guess the format. Hardly proper programming.
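For readers wondering what "guessing" looks like in practice, here is a rough Python sketch of one common fallback order: trust a BOM if present, otherwise try a strict UTF-8 decode, otherwise assume ANSI. The function name guess_encoding is hypothetical, and using cp1252 as the ANSI code page is an assumption (the real one depends on the system locale). This is not what IsTextUnicode does internally.

```python
import codecs


def guess_encoding(data: bytes, ansi_codepage: str = "cp1252") -> str:
    """Hypothetical helper: pick a codec name for raw file bytes."""
    # 1. A UTF-8 BOM is an explicit signature, so no guessing is needed.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # 2. No BOM: the bytes are only plausibly UTF-8 if they decode strictly.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # 3. Not valid UTF-8: fall back to the assumed ANSI code page.
        return ansi_codepage
    return "utf-8"


# Illustrative inputs, not taken from any real file.
with_bom = guess_encoding(codecs.BOM_UTF8 + b"abc")
bomless_utf8 = guess_encoding("é".encode("utf-8"))
ansi_text = guess_encoding("é".encode("cp1252"))
```

Step 2 is the heuristic part: some short ANSI byte sequences also happen to be valid UTF-8, so the result can be wrong for BOM-less input, which is the ambiguity a BOM avoids.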