I initially had this problem: How to unzip a Japanese ZIP file, and avoid mojibake/garbled characters

Running "unzip -O shift-jis [filename.zip]" did the job for the file names, which now show proper Japanese characters, but it doesn't seem to have fixed the file metadata.

I found this: Why does my VLC window show weird fonts?, but its solution seems to be for subtitles only. My issue doesn't seem to be a VLC thing either, since the audio file's Audio properties also show the title as mojibake. It appears as blocks on my screen, but when copy-pasted here the blocks turn into characters that take up no space: "Ôç©d¸UE - C[h"

Also, my Neptunia Re;Birth1 music matches what everyone else reports: tracks 1 and 18 show Japanese, and the rest appear as mojibake.

I guess if I just wanted to figure out the names, I could do something like the answers to How to turn mojibake text to readable form?
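A minimal sketch of that kind of repair, assuming the tag holds Shift-JIS bytes that some program decoded as ISO-8859-1 (Latin-1). Both encodings are assumptions; the wrong intermediate encoding could just as well be Windows-1252 or something else, and the sample title is only an illustration. It also assumes an iconv that knows the name SHIFT_JIS, as glibc's does:

```shell
#!/bin/sh
# Illustrative sample title (an assumption, not taken from the affected files).
original='グラスホッパー'

# Simulate the damage: encode as Shift-JIS, then misread those bytes as ISO-8859-1.
garbled=$(printf '%s' "$original" | iconv -f UTF-8 -t SHIFT_JIS | iconv -f ISO-8859-1 -t UTF-8)

# Undo it: turn the characters back into raw bytes, then decode them as Shift-JIS.
repaired=$(printf '%s' "$garbled" | iconv -f UTF-8 -t ISO-8859-1 | iconv -f SHIFT_JIS -t UTF-8)

printf 'garbled : %s\n' "$garbled"
printf 'repaired: %s\n' "$repaired"
```

ISO-8859-1 is used for the round trip because it maps every byte value to a character, so no bytes are lost on the way back. With Windows-1252 a few byte values are undefined and the reverse step may fail.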

Malady

1 Answer

First step: determine which encoding the metadata is written in.

Install ExifTool, a metadata reader:

sudo apt install libimage-exiftool-perl

Show the metadata of the file you want to play in VLC:

exiftool filename

Sample output:

ExifTool Version Number         : 12.49
File Name                       : 10 - グラスホッパー.flac
--cut--
File Type                       : FLAC
File Type Extension             : flac
MIME Type                       : audio/flac
--cut--
Track Number                    : 10
Discnumber                      : 1
Title                           : グラスホッパー
Artist                          : スピッツ
Album                           : ハチミツ
Genre                           : Unknown
Date                            : 1995-09-20
--cut--
Artistsort                      : Spitz
Discid                          : 9c0a320b
Musicbrainz Discid              : KcCfHpYnqpWm4siIth0whkxTBEU-
Tracktotal                      : 11
Duration                        : 0:03:31

If the metadata displays correctly in your terminal, then it is written in Unicode (check your locale with echo $LANG). In that case, also check the VLC font settings.

Otherwise, it is written in another character encoding. For Japanese, it is probably Shift-JIS or EUC-JP.

Now save the exiftool output to a text file:

exiftool filename > textfile.txt

Convert from Shift-JIS (or EUC-JP, 'eucjp') to Unicode UTF-8:

iconv -f sjis -t utf8 textfile.txt > converted.txt

cat converted.txt

If the converted text shows no tofu characters (blank boxes), you can use it to rewrite the original metadata.

For example:

exiftool -Title="グラスホッパー" -Artist="スピッツ" -Album="ハチミツ" filename

Then play the song or video in VLC and see what has changed.
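When it is unclear which encoding the saved text uses, the conversion step can be tried against several candidates in a loop. The following is only a sketch: the sample file, the temporary directory, and the list of candidate names are assumptions, and which names are accepted depends on the local iconv. It relies on iconv exiting with a non-zero status when it hits an illegal input sequence.

```shell
#!/bin/sh
# Build a small sample file with Shift-JIS bytes (stands in for textfile.txt).
tmpdir=$(mktemp -d)
printf 'Title : グラスホッパー\n' | iconv -f UTF-8 -t SHIFT_JIS > "$tmpdir/textfile.txt"

# Try each candidate encoding and keep the result of every attempt in its own file.
for enc in UTF-8 SHIFT_JIS CP932 EUC-JP ISO-2022-JP; do
    if iconv -f "$enc" -t UTF-8 "$tmpdir/textfile.txt" > "$tmpdir/out.$enc.txt" 2>/dev/null; then
        echo "$enc: converted without errors"
    else
        echo "$enc: failed"
    fi
done
```

A clean conversion does not prove the guess is right, so the output file still needs to be read by eye. Running iconv -l lists the encoding names the installed iconv supports.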

Sadaharu Wakisaka
  • Almost! I use from sjis to utf-8 and I get "窶ーテ板催ァ窶堋ゥ窶播ツ青クUE - ニ陳iconv: illegal input sequence at position 105" – Malady Nov 11 '22 at 22:26
  • Sorry there were two typos, #1 no `iconv` package to install #2 `iconv -f from -t to`. Now edited. – Sadaharu Wakisaka Nov 12 '22 at 01:05
  • Thanks! But sjis or eucjp, they both can't handle all of the info before spitting out a "iconv: illegal input sequence at position 75". I've seen this list of formats, but not sure which ones are Japanese or how to get iconv to use them: https://stackoverflow.com/a/8039467/4592583 – Malady Nov 12 '22 at 02:21
  • I don't know. These two character encodings were common in the 1990s and 2000s. Unicode took over in the mid-2000s, and since then no one has intentionally used them. Personally, I have seen a lot of unusual character encodings from outside Japan. – Sadaharu Wakisaka Nov 12 '22 at 03:40