I would like to use Microsoft Word (on a PC specifically) to open, edit, and then save again a plaintext file in UTF-8 format, but without adding the BOM character sequence to the beginning.

Let's just go ahead and assume that I'm asking in regard to any version of Word after, say, Word 2010.

I see no option in the Save As dialog to do this, nor anywhere else that I can see.

I can see this question asked any number of times about other programs, but I don't see anything specific to Word.

psoft
  • What version of Word? Please EDIT this question to add that information. – music2myear Apr 05 '19 at 21:50
  • http://knowledgebase.abercap.com/index.php?/article/AA-00279/0/Saving-a-Word-Doc-in-Plain-Text-with-UTF-8.html – music2myear Apr 05 '19 at 21:55
  • That pretty webpage doesn't mention the BOM now does it? – psoft Apr 05 '19 at 21:59
  • No, but it goes through the options that may be in the Save menu that could assist. It is a suggestion of possible information that might be beneficial to you. It was provided in good faith in an effort to help you. Replying in scorn will not tend to attract others with suggestions or information and is not particularly professional. – music2myear Apr 05 '19 at 23:31
  • Also, am I correct to assume that by "PC" you mean a Windows computer? Looking through your other questions I noticed a lot of Apple content and wanted to be sure. – music2myear Apr 05 '19 at 23:33
  • What's wrong with using "PC"? It's no better or worse than "Wintel". Why don't you like me? – psoft Apr 12 '19 at 18:49
  • Because an Apple computer is a "PC" too. You are the eyes, ears, and fingers of this solution. We only know what you've observed and told us. Knowing the OS you're working with helps us better understand the problem and get you a better answer. – music2myear Apr 12 '19 at 21:44
  • https://www.youtube.com/watch?v=0eEG5LVXdKo – psoft Apr 13 '19 at 18:50

1 Answer

You can't do that directly in Word, because without the BOM there's no way to be sure that the file is encoded in UTF-8. Remember: There Ain’t No Such Thing As Plain Text.

Despite the name, the BOM is not used for byte-order marking in UTF-8 but rather as a signature. Without the signature, Word will ask you to confirm the encoding every time you open the file, because the file might instead be in an ANSI code page (which is still the default in Windows). Word has very good heuristics and guesses correctly most of the time, though, especially with encodings that are easy to detect, like UTF-8. In my experience it works well even for tricky encodings in various languages.
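To make the "signature" idea concrete, here is a minimal Python sketch (Python is my choice for illustration, not something Word uses). It shows that the UTF-8 BOM is just the three bytes EF BB BF placed in front of otherwise identical content. The helper name `has_utf8_signature` and the sample text are hypothetical.

```python
import codecs


def has_utf8_signature(data: bytes) -> bool:
    """Return True if the bytes start with the UTF-8 signature (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


text = "naïve café"
with_bom = text.encode("utf-8-sig")  # signature followed by the UTF-8 bytes
without_bom = text.encode("utf-8")   # the UTF-8 bytes only

print(codecs.BOM_UTF8.hex())            # the three signature bytes in hex
print(has_utf8_signature(with_bom))
print(has_utf8_signature(without_bom))
```

Apart from those three leading bytes the two encodings are byte-for-byte the same, which is why a reader that sees no signature has to fall back on guessing.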

That said, you can write a macro to do the saving part instead of using Word's save feature. See

Alternatively, just remove the BOM after saving with Word using another tool, such as PowerShell, iconv, Notepad++, or a third-party editor. Here's a PowerShell script that does the conversion:

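# Read the file as UTF-8 ($MyPath should hold the full path to the file)
$MyFile = Get-Content -Encoding UTF8 $MyPath
# $False = do not emit a BOM when writing
$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
[System.IO.File]::WriteAllLines($MyPath, $MyFile, $Utf8NoBomEncoding)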
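For anyone without PowerShell at hand, the same BOM removal can be sketched in Python. This is only an illustration under my own assumptions: the function name `strip_utf8_bom` is hypothetical, and unlike the script above it works on raw bytes, so line endings and the rest of the content are left exactly as Word wrote them.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path: Path) -> bool:
    """Remove a leading UTF-8 BOM from the file in place.

    Returns True if a BOM was found and removed, False if the file
    had no BOM and was left untouched.
    """
    data = path.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    # Rewrite the file without the three signature bytes.
    path.write_bytes(data[len(codecs.BOM_UTF8):])
    return True
```

Calling `strip_utf8_bom(Path(...))` with the path of the file saved from Word would be run once after each save; running it again on an already clean file changes nothing.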
phuclv
  • My own research also led me to the conclusion that Word and other Microsoft tools automatically tack the BOM onto these types of documents as a sort of "magic number" header by which to quickly ascertain what's in a file. Makes sense, but I don't want it in my data. My research also brought me to your point to proceed with some other editor. Your list is helpful, though I would add Atom (atom.io), a newer entry in the field. So thank you for your all-around thorough, apropos, and helpful answer, a fine (if small) example of the best of the StackOverflow community. – psoft Apr 07 '19 at 18:16