
I'm trying to write a bash script to convert all special characters inside a file (é, ü, ã, etc.) into LaTeX format (\'e, \"u, \~a, etc.). Usually, this stuff is really easy to do with sed, but I'm having trouble getting sed to recognize the special characters. How can I tell the command to read the file using ISO or UTF-8 encoding?

If that's not possible, is there a way to get sed to understand special characters?
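
For concreteness, the kind of substitution I'm after looks something like this (the sample text, file name, and script name are just placeholders):

$ cat chapter.tex
La crème brûlée est déjà prête.
$ ./convert-accents chapter.tex
La cr\`eme br\^ul\'ee est d\'ej\`a pr\^ete.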

Malabarba
    LaTeX can read non-ASCII characters: put `\usepackage[latin1]{inputenc}` at the beginning of your document, substituting latin1 with whatever encoding you need. – enzotib Apr 18 '11 at 06:59
  • And if you use biblatex and biber instead of bibtex, you can use non-ASCII characters in your bibliography as well. – Robin Green Apr 18 '11 at 08:08
  • @enzotib @Robin: Thanks guys, but I do know that. =) I'm just looking for a quick way to convert two files of different encoding into something neutral. I sometimes have to put together the LaTeX files of different people who use different encodings, and LaTeX won't work with two different encodings in the same file. There are a few ways to work around this, but I've reached the conclusion that a one-line script to make all files encoding-neutral is the best. – Malabarba Apr 18 '11 at 15:41

1 Answer


It can be as simple as

iconv --from-code "$enc" input-file |
    sed 's/é/\\'\''e/;s/ü/\\"u/;s/ã/\\~a/' |
    iconv --to-code "$enc" >converted-input-file

where the variable enc contains the encoding of the input file, one of the names listed by `iconv -l`.
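
For reference, here is a slightly fuller sketch along the same lines, written as a small script. The script name, the output file name, and the particular character list are placeholders; extend the sed expression with one rule per accented character that actually occurs in your files. Once every non-ASCII character is covered, the output is plain ASCII and therefore safe to combine with files of any encoding.

#!/bin/bash
# delatexify.sh (hypothetical name): rewrite accented characters as LaTeX
# escapes so the file no longer depends on its original encoding.
# Usage: ./delatexify.sh <encoding> <file>   (encoding as listed by `iconv -l`)
enc="$1"
in="$2"

iconv --from-code "$enc" --to-code UTF-8 "$in" |
    sed -e 's/é/\\'\''e/g' \
        -e 's/è/\\`e/g' \
        -e 's/ü/\\"u/g' \
        -e 's/ã/\\~a/g' \
        -e 's/ç/\\c{c}/g' \
        >"${in}.converted"

Note that the sed patterns themselves must be saved in the same encoding the first iconv converts to (UTF-8 here), which is why the stream is normalized before the substitutions are applied.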

enzotib