
Possible Duplicate:
Batch-convert files for encoding or line ending

I have a bunch of text files that I'd like to convert from any given charset to UTF-8 encoding.

Are there any command-line tools or Perl (or language of your choice) one-liners I can use to do this en masse?

jason

1 Answer


iconv converts between many character encodings. With a little bash magic, we can write:

for file in *.txt; do
    iconv -f ascii -t utf-8 "$file" -o "${file%.txt}.utf8.txt"
done

This runs iconv -f ascii -t utf-8 on every file ending in .txt, writing the recoded output to a file with the same name but ending in .utf8.txt instead of .txt.

This particular conversion won't actually change your files (because ASCII is a subset of UTF-8), but it answers your question about how to convert between encodings: substitute the real source encoding for ascii in the -f option.
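
For a source encoding other than ASCII, the same loop can be wrapped in a small function. This is only a sketch: the name convert_dir and its two arguments are made up for illustration, ISO-8859-1 is just an example encoding, and it uses plain > redirection instead of -o so that it should also work with iconv builds that lack that option.

```shell
# Hypothetical helper (not part of iconv): convert every .txt file in a
# directory from the given source encoding to UTF-8, writing NAME.utf8.txt
# next to each input file.
convert_dir() {
    local from="$1" dir="$2" file out
    for file in "$dir"/*.txt; do
        [ -e "$file" ] || continue                     # glob matched nothing
        case "$file" in *.utf8.txt) continue ;; esac   # skip output of earlier runs
        out="${file%.txt}.utf8.txt"
        # '>' creates (or truncates) the output file; iconv writes to stdout.
        if ! iconv -f "$from" -t UTF-8 "$file" > "$out"; then
            echo "conversion failed: $file" >&2
            rm -f "$out"                               # drop the partial output
        fi
    done
}
```

Calling convert_dir ISO-8859-1 . would then recode every .txt file in the current directory and leave the originals untouched.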

u1686_grawity
Vinko Vrsalovic
  • You should quote the var $i, in order to handle filenames with spaces. – Richard Hoskins Aug 01 '09 at 01:47
  • It will do things, it'll add a BOM for one... – jason Aug 01 '09 at 01:58
  • Are you sure iconv will add a BOM? I was under the impression that it wouldn't with UTF-8. – Richard Hoskins Aug 01 '09 at 02:08
  • I just tested this with iconv (GNU libiconv 1.11), and it did not add a BOM. It is my understanding that iconv will only add a BOM if one is present in the input, which it would not be in ASCII. BOMs are problematic, and not necessary with UTF-8. – Richard Hoskins Aug 01 '09 at 02:31
  • FYI, Windows has a tendency to drop BOMs in all Unicode files, even UTF-8. This can be seen with Notepad by choosing the encoding in the Save As dialog. It lists "Unicode", "Unicode big endian" and "UTF-8" in addition to the classic "ANSI" encoding. All but ANSI include a BOM. – RBerteig Aug 01 '09 at 08:37
  • iconv follows the principle of least surprise, no BOM on input, no BOM on output. – Vinko Vrsalovic Aug 01 '09 at 09:24
  • If your version of iconv does not support the -o parameter, you can directly replace it with >> to use shell redirection. – rob Oct 09 '15 at 08:45
  • What if we don't know the encoding of the input file (and it's possible it might already be UTF-8)? – Aaron Franke Mar 18 '20 at 10:18
  • @rob I came here cause I couldn't get it to work with normal redirection, but I'm having trouble getting my head around why the append would be required for redirecting `iconv`. – Hashim Aziz Dec 02 '20 at 21:10
  • @Prometheus the -o parameter is the "output" setting. If it is not used or not available, the result goes to STDOUT. For my use case I needed a new file, so I used the >> redirection to create it. – rob Dec 07 '20 at 12:29