
I have a Perl script that produces UTF-8 output. I tried using Set-Content to write a UTF-8 file, as suggested in the question "Powershell overruling Perl binmode?".

perl -S testbinmode.pl | Set-Content "binmode.txt" -Encoding Byte

produces the error

"Set-Content : Cannot proceed with byte encoding. When using byte encoding the content must be of type byte."

perl -S testbinmode.pl | Set-Content "binmode.txt" -Encoding UTF8

doesn't produce an error message, but it doesn't write a correct utf8 file either.

The output of the Perl script is displayed correctly in the PowerShell window. What is the correct way to write that output to a UTF-8-encoded file?

Thanks.

Update: I have seen many responses to this and similar problems, both at the link referenced above and at https://stackoverflow.com/questions/40098771/changing-powershells-default-output-encoding-to-utf-8. None of them appear to work, which leads me to believe that none has actually been tested. I am looking for a tested method for redirecting UTF-8 text output from a CLI program to a file. Thanks.

Here is the perl test script:

use strict;
use warnings;
use utf8;
binmode(STDOUT, ":utf8");
print("The Crüxshadows");
Freon Sandoz
  • Have the Perl script output to a text file, then run Set-Content on that. While similar to what you're doing, it isn't exactly the same. – Ramhound Sep 04 '17 at 06:06
  • perl -S testbinmode.pl >binmode.txt...Set-Content "binmode.txt" -Encoding UTF8...produces cmdlet Set-Content at command pipeline position 1 Supply values for the following parameters: Value[0]:...how do I proceed?...Why does this box close and post when I attempt to separate my response into multiple lines? – Freon Sandoz Sep 04 '17 at 06:36
  • Update your question; your unformatted comment can't be read. I don't answer questions asked in a comment or consider any information contained in a comment when submitting an answer. – Ramhound Sep 04 '17 at 07:44
  • try this: `$utf8 = New-Object System.Text.utf8encoding` and then use it as your encoding: `perl -S testbinmode.pl | Set-Content "binmode.txt" -Encoding $utf8` – SimonS Sep 04 '17 at 10:41
  • also: I guess the error message says that there is no content to write, or it doesn't know where to write it to – SimonS Sep 04 '17 at 10:50
  • Nope. Error message: "Set-Content : Cannot bind parameter 'Encoding'. Cannot convert the "System.Text.UTF8Encoding" value of type "System.Text.UTF8Encoding" to type "Microsoft.PowerShell.Commands.FileSystemCmdletProviderEncoding". If you want a formatted response, you need to tell me how. This site doesn't permit me to use the return key to break my response into multiple lines. Can someone please provide a *tested* method for redirecting UTF8 text output from a CLI program to a file under PowerShell? – Freon Sandoz Sep 05 '17 at 01:10

1 Answer


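Make sure PowerShell uses UTF-8 when communicating with external programs. PowerShell decodes the output of an external program using [Console]::OutputEncoding, so if that is not UTF-8 the text is already garbled before Set-Content sees it. (In PowerShell 6 and later the built-in cmdlets default to UTF-8; in Windows PowerShell 5.1 the default depends on the cmdlet.) This requires setting [Console]::InputEncoding and [Console]::OutputEncoding to UTF-8.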

On my Windows 10 system, PowerShell uses Code Page 437 by default:

PS C:\Users\Me> [Console]::OutputEncoding

IsSingleByte      : True
EncodingName      : OEM United States
WebName           : ibm437
HeaderName        : ibm437
BodyName          : ibm437
Preamble          :
WindowsCodePage   :
IsBrowserDisplay  :
IsBrowserSave     :
IsMailNewsDisplay :
IsMailNewsSave    :
EncoderFallback   : System.Text.InternalEncoderBestFitFallback
DecoderFallback   : System.Text.InternalDecoderBestFitFallback
IsReadOnly        : False
CodePage          : 437

We fix this for the current PowerShell session with this command:

$OutputEncoding = [console]::InputEncoding = [console]::OutputEncoding = New-Object System.Text.UTF8Encoding

(See the above-linked github.com issue for ways to persist this change.)

PS C:\Users\Me> [Console]::OutputEncoding

Preamble          :
BodyName          : utf-8
EncodingName      : Unicode (UTF-8)
HeaderName        : utf-8
WebName           : utf-8
WindowsCodePage   : 1200
IsBrowserDisplay  : True
IsBrowserSave     : True
IsMailNewsDisplay : True
IsMailNewsSave    : True
IsSingleByte      : False
EncoderFallback   : System.Text.EncoderReplacementFallback
DecoderFallback   : System.Text.DecoderReplacementFallback
IsReadOnly        : False
CodePage          : 65001

Windows 7 and later, i.e. all supported Windows versions, have codepage 65001, as a synonym for UTF-8
-- https://en.wikipedia.org/wiki/UTF-8

Now your script works as expected.

perl .\testbinmode.pl | Set-Content "binmode.txt" -Encoding UTF8

Successfully tested on PowerShell 5.1 and 7.1.
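One way to confirm the result is to inspect the bytes that were written. The following is a minimal sketch, not part of the tested commands above; it assumes binmode.txt is in the current directory of a file-system location, and the variable names are only illustrative. It reports whether the file starts with a UTF-8 byte order mark and whether decoding it as UTF-8 gives back the string the Perl script prints.

```powershell
# Build an absolute path; .NET file APIs do not follow PowerShell's current location.
$path  = Join-Path (Get-Location).ProviderPath "binmode.txt"
$bytes = [System.IO.File]::ReadAllBytes($path)

# A UTF-8 byte order mark is the three bytes EF BB BF.
$hasBom = $bytes.Length -ge 3 -and
          $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF

# Decode as UTF-8, then strip a leading BOM character and the trailing newline.
$text = [System.Text.Encoding]::UTF8.GetString($bytes)
$text = $text.TrimStart([char]0xFEFF).TrimEnd([char[]]"`r`n")

# U+00FC is the "u with diaeresis"; building it from its code point keeps this
# check independent of the encoding of the script file itself.
$expected = "The Cr" + [char]0x00FC + "xshadows"

"BOM present: $hasBom"
"Decodes to expected text: $($text -eq $expected)"
```

In a correctly written file the ü appears as the two bytes C3 BC; a single FC byte or a 3F ("?") in that position means the text was converted through a non-UTF-8 code page on the way.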

If you prefer BOM-less:

perl .\testbinmode.pl | Set-Content "binmode.txt" -Encoding UTF8NoBOM

Successfully tested on PowerShell 7.1. (The UTF8NoBOM encoding was introduced in PowerShell 6.)
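On Windows PowerShell 5.1, where UTF8NoBOM is not available, a BOM-less file can be written by calling .NET directly. This is a sketch under the assumption that the console encodings have already been set to UTF-8 as described above; the variable names are only illustrative.

```powershell
# Capture the program's output as an array of strings, one per line.
$lines = @(perl .\testbinmode.pl)

# UTF8Encoding($false) does not emit a byte order mark.
$utf8NoBom = New-Object System.Text.UTF8Encoding $false

# .NET resolves relative paths against the process working directory, which can
# differ from PowerShell's current location, so build an absolute path.
$path = Join-Path (Get-Location).ProviderPath "binmode.txt"

# Write each line followed by a newline, encoded as UTF-8 without a BOM.
[System.IO.File]::WriteAllLines($path, [string[]]$lines, $utf8NoBom)
```

This bypasses Set-Content entirely, so the -Encoding parameter and its version-specific values do not come into play.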