
I'm trying to write a bash script to convert all special characters inside a file (é, ü, ã, etc.) into LaTeX format (\'e, \"u, \~a, etc.). Usually, this stuff is really easy to do with sed, but I'm having trouble getting sed to recognize the special characters. How can I tell the command to read the file using ISO or UTF-8 encoding?

If that's not possible, is there a way to get sed to understand special characters?
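
For concreteness, the kind of substitution I'm after looks something like this (the sample text, file name, and script name are just placeholders):

$ cat chapter.tex
La crème brûlée est déjà prête.
$ ./convert-accents chapter.tex
La cr\`eme br\^ul\'ee est d\'ej\`a pr\^ete.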

Malabarba
    LaTeX can read non-ASCII characters: put `\usepackage[latin1]{inputenc}` at the beginning of your document, substituting latin1 with whatever encoding you need. – enzotib Apr 18 '11 at 06:59
  • And if you use biblatex and biber instead of bibtex, you can use non-ASCII characters in your bibliography as well. – Robin Green Apr 18 '11 at 08:08
  • @enzotib @Robin: Thanks guys, but I do know that. =) I'm just looking for a quick way to convert two files of different encoding into something neutral. I sometimes have to put together the LaTeX files of different people who use different encodings, and LaTeX won't work with two different encodings in the same file. There are a few ways to work around this, but I've reached the conclusion that a one-line script to make all files encoding-neutral is the best. – Malabarba Apr 18 '11 at 15:41

1 Answer


It can be as simple as

iconv --from-code "$enc" input-file |
    sed 's/é/\\'\''e/;s/ü/\\"u/;s/ã/\\~a/' |
    iconv --to-code "$enc" >converted-input-file

where the variable enc contains the encoding of the input file, one of the names listed by `iconv -l`.
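
For reference, here is a slightly fuller sketch along the same lines, written as a small script. The script name, the output file name, and the particular character list are placeholders; extend the sed expression with one rule per accented character that actually occurs in your files. Once every non-ASCII character is covered, the output is plain ASCII and therefore safe to combine with files of any encoding.

#!/bin/bash
# delatexify.sh (hypothetical name): rewrite accented characters as LaTeX
# escapes so the file no longer depends on its original encoding.
# Usage: ./delatexify.sh <encoding> <file>   (encoding as listed by `iconv -l`)
enc="$1"
in="$2"

iconv --from-code "$enc" --to-code UTF-8 "$in" |
    sed -e 's/é/\\'\''e/g' \
        -e 's/è/\\`e/g' \
        -e 's/ü/\\"u/g' \
        -e 's/ã/\\~a/g' \
        -e 's/ç/\\c{c}/g' \
        >"${in}.converted"

Note that the sed patterns themselves must be saved in the same encoding the first iconv converts to (UTF-8 here), which is why the stream is normalized before the substitutions are applied.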

enzotib