
I need to decompress zip files that were created on Japanese-language Windows. I'm using unzip.

If I run unzip files.zip, the file names come out garbled. So, following this question, I ran unzip with -O cp932, which gives me the correct file names.

However, some of these zip files require passwords. I know the correct passwords, but unzip always tells me the passwords are wrong.

After some investigation, I found that I can successfully decompress zip files whose passwords are plain ASCII. That is, a zip file with a password like "Hello" works, but a password like "こんにちは" leads to "wrong password". So I guess it has something to do with character encoding.

Actually, I tried both of these:

  1. unzip -O cp932 compressed.zip, pasting "こんにちは" when it asked for the password.
  2. unzip -O cp932 -P 'こんにちは' compressed.zip.

Neither of them works.

I found a similar question here, which has no answer. That question seems to ask for a way to pass an arbitrary byte sequence to unzip as the password. An answer to it would also solve my problem, since I could manually convert the password into the correct character encoding and give the converted bytes to unzip.

RabidBear
  • I would try other zip programs or a different terminal emulator – golimar Sep 15 '21 at 07:01
  • @golimar You can recommend other zip programs of course! As long as it solves this problem. – RabidBear Sep 15 '21 at 08:45
  • 7z or bsdtar also read zip files – golimar Sep 15 '21 at 08:53
  • @golimar I know. But do they support decompressing zip files with "cp932" encoded file names and passwords? – RabidBear Sep 15 '21 at 09:02
  • @RabidBear Not if you're talking about p7zip, the Linux port of 7z; it doesn't. See [this](https://unix.stackexchange.com/a/252000/421531). But I've found a solution for this, see my answer – oeter Apr 01 '23 at 11:05

1 Answer

If a zip file was created with a non-Unicode codec and is also encrypted with a password, the password you pass to unzip must be encoded as bytes in that same codec. On Linux, the shell hands the argument to unzip as bytes in the terminal's encoding, which is normally UTF-8; this is why unzip -O cp932 -P 'こんにちは' compressed.zip doesn't work.

To sum up, you need a way to give unzip the password as cp932-encoded bytes. There's no simple way to do this with the unzip command, but it can be done with a Python script:
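To make the mismatch concrete, here is a small sketch that encodes the same password both ways. The variable names are only illustrative; the point is that the two byte sequences differ, so a key derived from the UTF-8 bytes cannot match one derived from the cp932 bytes.

```python
password = "こんにちは"

# Bytes a UTF-8 terminal would send to unzip (3 bytes per hiragana).
utf8_bytes = password.encode("utf-8")

# Bytes a cp932 (Japanese Windows) tool would have used (2 bytes per hiragana).
cp932_bytes = password.encode("cp932")

print(len(utf8_bytes), utf8_bytes.hex(" "))
print(len(cp932_bytes), cp932_bytes.hex(" "))
print("same bytes?", utf8_bytes == cp932_bytes)
```

An ASCII-only password such as "Hello" encodes to identical bytes in both codecs, which matches the observation in the question that English passwords work.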

from zipfile import ZipFile


def extract_zip(archive_name, out_path, pwd, codec):
    # The password must be encoded with the same codec as the archive
    password = pwd.encode(codec) if pwd else None
    # The metadata_encoding argument requires Python 3.11 or later
    with ZipFile(archive_name, "r", metadata_encoding=codec) as myzip:
        myzip.extractall(out_path, pwd=password)


extract_zip("compressed.zip", "output_dir", "こんにちは", "cp932")
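
On interpreters older than Python 3.11, metadata_encoding is not available. A possible workaround, sketched here under the assumption that zipfile decodes non-UTF-8 names as cp437 (its documented default), is to reverse that decoding by hand. `recover_name` and `list_names` are hypothetical helpers, not part of the script above.

```python
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: name is stored as UTF-8


def recover_name(name, is_utf8, codec="cp932"):
    """Undo zipfile's cp437 decoding for a name stored in a legacy codec."""
    if is_utf8:
        return name  # already decoded correctly
    return name.encode("cp437").decode(codec)


def list_names(archive, codec="cp932"):
    """Return the member names of an archive, re-decoded with codec."""
    with zipfile.ZipFile(archive) as zf:
        return [
            recover_name(info.filename, bool(info.flag_bits & UTF8_FLAG), codec)
            for info in zf.infolist()
        ]
```

This only repairs the names; the password still has to be passed as `pwd.encode(codec)` when reading each member.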
oeter
  • The problem with Python's `zipfile` module is that it supports only some compression methods, not all of them. For the others it just raises an exception, so it is unlikely to be a general solution for all zip files. `zipfile` only supports `zipfile.ZIP_STORED`, `zipfile.ZIP_DEFLATED`, `zipfile.ZIP_BZIP2` and `zipfile.ZIP_LZMA` – RabidBear May 09 '23 at 18:25
  • Oh, another problem with `zipfile` is that, even for supported methods, it is very slow to extract anything. Maybe it is not designed for extracting large zip files ... – RabidBear May 09 '23 at 18:34
  • @RabidBear The `zipfile` module supports DEFLATED and several other compression methods, which are the most commonly used ones and what most operating systems support natively, so it can handle most use cases. As for the low speed, that's because the decryption algorithm is written in pure Python. There's a C-based implementation if you want to check it out: https://github.com/V-E-O/czipfile. – oeter May 18 '23 at 14:26
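
Whether a given archive falls outside what `zipfile` can handle can be checked before extracting. The sketch below reads each member's compress_type from the central directory and reports any method that is not one of the four listed in the comments; `unsupported_members` is a hypothetical helper name, and it assumes the interpreter was built with zlib, bz2 and lzma.

```python
import zipfile

# Compression methods the standard library can decompress.
SUPPORTED_METHODS = {
    zipfile.ZIP_STORED,
    zipfile.ZIP_DEFLATED,
    zipfile.ZIP_BZIP2,
    zipfile.ZIP_LZMA,
}


def unsupported_members(archive):
    """Return (file name, method id) pairs that zipfile cannot decompress."""
    with zipfile.ZipFile(archive) as zf:
        return [
            (info.filename, info.compress_type)
            for info in zf.infolist()
            if info.compress_type not in SUPPORTED_METHODS
        ]
```

An empty result suggests the Python script should be able to read every member; a non-empty one names the members that would need another tool.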