117

I often have to gather log files and upload them to a central server (owned by another company). The central server has a file-size limit, so I am trying to create the smallest file possible that is still in the zip format.

What are the best settings to use when compressing a text file to a zip format when my only need is a small file size?

7zip Options

I've done the obvious and chosen ultra compression, and I have noticed that LZMA does a better job than deflate, but there are far too many other permutations of options for me to test them all.

jjnguy
  • 1
    Is splitting the zip into multiple files an option? – JaredMcAteer May 10 '11 at 14:21
  • @Original, I don't think so. (Is that what the 'split to volumes' option is for?) I'd rather keep it simple and have just 1 file. If I really need, I can split the original file (which I have done in the past), but my goal is to keep it in one file. – jjnguy May 10 '11 at 14:30
  • Oh, and I saw this question http://superuser.com/questions/178111/what-settings-to-use-when-making-7zip-files-in-order-to-get-maximum-compression-w But it really doesn't answer my question at all. – jjnguy May 10 '11 at 14:32
  • I think the exact question you asked isn't answerable. Some text files compress better with different algorithms. Sometimes zip is better, sometimes gzip; sometimes, compression level makes a difference, and sometimes not. It all depends on the file. Therefore, instead of answering the precise question, I've addressed the motivating example, which deals with maximum allowed sizes. Even if you have the best possible algorithm, you're still limited by size, and a particularly large log might not be able to be compressed below that threshold, so you'll need splitting anyway. – Rob Kennedy May 10 '11 at 14:42
  • @Rob, ok. Makes sense. I know that the input data is very important in determining the size of a resulting zip file. I wasn't sure if there was a canonical set of settings that usually work best. – jjnguy May 10 '11 at 14:44
  • 9
    As soon as you pick anything but the `Deflate` format, it's not a "normal" .zip file anymore, but an "extended" zip file, pioneered by WinZip. They originally kept the extension as .zip, to much consternation (since most normal zip-handling tools can't deal with them), but most archivers use .zipx now to distinguish them from traditional .zip files. If you can use LZMA, switch to .7z and pick PPMd -- it should compress better (and faster!) for text files. – afrazier May 20 '11 at 16:04
  • @afra, hmmmm. Thanks for the info. I need to keep it in a format that most normal zip tools can unzip. Otherwise I'd be using the 7z format already. – jjnguy May 20 '11 at 16:52
  • @Justin: That sucks. Can you use a self-extracting archive? – afrazier May 20 '11 at 18:55
  • @afrazier, I'm sending these files to a 3rd party vendor, and they expect to get 'regular' zip files. (Or files they can unzip using the 'standard' method.) – jjnguy May 20 '11 at 19:48
  • 1
    @afrazier: "The .ZIP File Format Specification documents the following compression methods: stored (no compression), Shrunk, Reduced (methods 1-4), Imploded, Tokenizing, Deflated, Deflate64, bzip2, LZMA (EFS), WavPack, PPMd." https://en.wikipedia.org/wiki/Zip_%28file_format%29#Compression_methods – endolith Dec 13 '13 at 22:26
  • 4
    @endolith: bzip2, lzma, wv, and ppmd are all very recent additions to the file format. It's not even safe to assume that your recipient can handle deflate64, much less anything newer. – afrazier Dec 13 '13 at 22:33
  • 1
    define "normal zip tools". Most "normal zip tools" nowadays like 7z and winrar can extract 7z files. – phuclv May 29 '16 at 11:04

6 Answers

106

To create the smallest standard ZIP file that 7-Zip can create, try:

7z a -mm=Deflate -mfb=258 -mpass=15 -r foo.zip C:\Path\To\Files\*

Source: How can I achieve the best, standard ZIP compression?
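
For comparison, where 7-Zip is not available, a plain Deflate .zip can also be produced with Python's standard library. This is only a sketch: `make_standard_zip` and the file names are made up for illustration, and zlib's level 9 should not be expected to match what 7-Zip achieves with -mfb=258 -mpass=15. The point is just that the result uses the Deflate method, which ordinary unzip tools accept.

```python
import os
import tempfile
import zipfile


def make_standard_zip(zip_path, file_paths):
    """Write file_paths into a Deflate-compressed .zip; return its size in bytes.

    Hypothetical helper for illustration, not part of 7-Zip.
    """
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED,
                         compresslevel=9) as zf:
        for path in file_paths:
            # Store only the base name so the archive has no directory prefix.
            zf.write(path, arcname=os.path.basename(path))
    return os.path.getsize(zip_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Generate a small, repetitive stand-in for a real log file.
        log_path = os.path.join(tmp, "service.log")
        with open(log_path, "w") as f:
            for i in range(20000):
                f.write(f"2011-05-10 14:21:{i % 60:02d} INFO request {i} handled\n")
        zip_path = os.path.join(tmp, "logs.zip")
        size = make_standard_zip(zip_path, [log_path])
        print(os.path.getsize(log_path), "->", size, "bytes")
```

Comparing the printed size against the 7-Zip result for the same input shows how much the extra Deflate passes are worth for a given log.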

Otherwise if you don't care about the ZIP standard, use the following ultra settings:

7z a -t7z -m0=lzma -mx=9 -mfb=64 -md=32m -ms=on archive.7z dir1

Which are:

-t7z      7z archive
-m0=lzma  lzma method
-mx=9     level of compression = 9 (Ultra)
-mfb=64   number of fast bytes for LZMA = 64
-md=32m   dictionary size = 32 megabytes
-ms=on    solid archive = on
Tanja
kenorb
  • 5
    @Tek: Why? It's not a good one. The question was about using the "standard ZIP format", so the answer shouldn't be specifying LZMA. -ms=on is for .7z, not standard zip files. -md is related to BZip2, so I don't expect it to affect ZIP (or even LZMA). -mfb=64 is an unoptimized value: -mfb=258 makes smaller zip files. And this answer doesn't even mention -mpass=15 which can affect zip files. This is a nicely formatted answer which is, unfortunately, wrong in multiple ways. – TOOGAM Nov 08 '15 at 12:32
  • 17
    I would use lzma2 – Lance Badger Jul 07 '16 at 15:22
  • If you look at the 7-zip FAQ, it states that newer versions of 7z may have worse performance than older versions in some circumstances. Read the FAQ for more detail, but in short use the 'qs' in Parameters field in the GUI or use -mqs in the command line version to use the old sort by file extension method. https://www.7-zip.org/faq.html. – drojf May 15 '19 at 14:21
  • LZMA2 `-mx9` supports 64 MB dictionary https://sevenzip.osdn.jp/chm/cmdline/switches/method.htm – qwr Aug 28 '22 at 19:37
44

After much experimentation, digging into the detailed 7-Zip documentation, and reading some of the 7z source code regarding the advanced LZMA2 parameters, I arrived at the method below. On some 1 GB real-world test files it compressed 2 to 4 times better than the previously accepted solutions posted here, or even those in the 7z manpage.

7z a -t7z -mx=9 -mfb=273 -ms -md=31 -myx=9 -mtm=- -mmt -mmtf -md=1536m -mmf=bt3 -mmc=10000 -mpb=0 -mlc=0 archive.7z inputfileordir

LZMA2 compression is assumed here, but you might get even better results in 7-Zip by passing advanced LZMA2 options such as -m0=LZMA2:27 or -m0=LZMA2:d25, or a chain of parameters like

-m0=BCJ2 -m1=LZMA:d25 -m2=LZMA:d19 -m3=LZMA:d19 -mb0:1

Such parameters didn't seem to be respected by the 7z versions I tested, but you may want to explore further or patch the 7z code to properly parse them. Or maybe it is supposed to work and is just broken in the builds that were tested.

zx485
91735472
  • 4
    wow, this made a really big difference. For my archive, I experimented with a lot of other suggestions, including other answers here, and the best result I got was 99MB, vs 85MB using these settings. – user9399 Aug 25 '19 at 23:42
  • How would you call this on Windows 10 in command line? I get "The parameter is incorrect" on version 19.00 2019-02-21 – user1306322 Dec 04 '19 at 09:11
  • 1
    To run it on Windows, you must add the 7-Zip installation path to your system environment variable. Then you can use 7z inside Command Prompt. –  Jun 08 '20 at 16:17
  • [This link](https://help.goodsync.com/hc/en-us/articles/360007773451-Automated-Backup-with-Compression-and-Encryption) gives more info on adding system environment variables for 7zip. – Jonas Jun 11 '20 at 15:26
  • 5
    your command uses the incredible amount of 45 GB virtual memory. on my PC this caused the OOM-Killer to just kill it. So, this does not seem to be a solution for people with 16 GB of RAM or less. – JPT Jul 17 '20 at 15:46
  • You are specifying the `-md` option twice, which seems like an error. The first time is `-md=31`, which, because it omits a magnitude suffix character, would set the dictionary size to `2³¹ = 2,147,483,648` bytes (2GB), according to the doc. That's far beyond the maximum limits mentioned in the docs, so it might explain the problem @JPT was seeing. Later on you specify `-md=1536m`, which seems more legitimate. – Glenn Slayden Jul 27 '20 at 11:59
  • 1
    in windows powershell you could type `& 'C:\Program Files\7-Zip\7z.exe'` to invoke 7z, depending on whether that's the correct path to the exe file for you. – Blaisem Aug 25 '20 at 12:36
  • Fantastic, I get 118 MB with default 7z compression options, and 77 MB with your options! – Dmitry Mikushin Jan 10 '22 at 19:10
  • my 33mb zip file became 66mb.... this made it worse. – Mark Oct 18 '22 at 20:54
  • @Mark Care to elaborate? Did you try to compress the zip file as is, or the files inside it? This method seems to be optimized for extremely large files / lots of them. – user2962533 Mar 28 '23 at 12:30
  • Could you explain what the parameters such as myx md ms etc all do? Or link to some documentation? – rollsch Jun 30 '23 at 03:22
19

If you can use .7z format rather than just .zip, I would simply use PPMD with the following options and leave everything else as set by the Compression Level:

  • Archive Format: 7z
  • Compression Method: PPMD
  • Compression Level: Ultra

I regularly compress server/text logs (60MB+) using these options and they usually come out at 1-2% of the original size.

Umber Ferrule
15

I have decided to do some experiments for empirically finding the optimal compression parameters.

The tool I have used was 7-ZIP finetuner. This tool hunts for the optimal parameters by simply repeating the compression with varying parameters looking for the optimal combination. A run for one file may sometimes take more than an hour even on a fast computer.

The parameters that it tries are:

LC : number of Literal Context bits
LP : number of Literal Pos bits
PB : number of Pos Bits
YX : level of file analysis
FB : number of Fast Bytes

I have left the default parameters of dictionary size as 512 MB and solid block size On. The tool uses the LZMA method.
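
The same brute-force idea can be sketched with Python's standard `lzma` module. This is not the 7-ZIP finetuner and not 7-Zip itself: it uses liblzma, whose allowed ranges differ from 7-Zip's (it requires lc + lp <= 4), it has no equivalent of YX, and the 1 MiB dictionary is an assumption chosen only to keep the demo light. `sweep_lzma_params` is a hypothetical name.

```python
import itertools
import lzma


def sweep_lzma_params(data):
    """Compress data with every valid lc/lp/pb combination.

    Returns (best_size_in_bytes, {"lc": ..., "lp": ..., "pb": ...}).
    """
    best = None
    for lc, lp, pb in itertools.product(range(5), range(5), range(5)):
        if lc + lp > 4:  # liblzma rejects combinations with lc + lp > 4
            continue
        filters = [{
            "id": lzma.FILTER_LZMA2,
            "preset": 9,
            "dict_size": 1 << 20,  # 1 MiB, an assumption to limit memory use
            "lc": lc,
            "lp": lp,
            "pb": pb,
            "nice_len": 273,       # liblzma's counterpart of "fast bytes"
        }]
        size = len(lzma.compress(data, format=lzma.FORMAT_RAW, filters=filters))
        if best is None or size < best[0]:
            best = (size, {"lc": lc, "lp": lp, "pb": pb})
    return best


if __name__ == "__main__":
    sample = b"".join(
        b"2011-05-10 14:21:%02d INFO request %d handled\n" % (i % 60, i)
        for i in range(5000)
    )
    print(sweep_lzma_params(sample))
```

Running it on different inputs tends to return different winners, which is consistent with the observation that no single combination is best for every file.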

The best combinations of parameters on several types of files were as follows:

[Image: table of the best parameter combinations by file type]

I note that the best values were not constant even for files of the same type.

Conclusion: There are no best options, as each file may have its own unique best combination. One may drive all parameters up to their limits, but an improvement is not at all guaranteed.

The most common combination seems to be:

LC : 8
LP : 0
PB : 1
YX : 5
FB : 273


harrymc
8

I compared compression of db.fdb, 1.2 GB (1236598784 B), on Ubuntu Server 14.04.03 with p7zip [64] 9.20 in a VM:

1. 7z a -mx=9 1.7z db.fdb
2. 7z a -t7z -m0=lzma -mx=9 -mfb=64 -md=32m -ms=on 2.7z db.fdb
3. 7z a -t7z -m0=lzma -mx=9 -mfb=258 -md=32m -ms=on 3.7z db.fdb
4. 7z a -t7z -m0=lzma -mx=9 -mfb=258 -md=32m -ms=on -pass=15 4.7z db.fdb
5. 7z a -mx=9 -mmt=on 5.7z db.fdb
6. 7z a -t7z -m0=lzma -mx=9 -mfb=258 -md=32m -ms=on -mmt=on 6.7z db.fdb

and got these results:

1.7z 96 MB (100108731 B) with 6' 25"
2.7z 95 MB ( 99520375 B) with 5' 18"
3.7z 93 MB ( 97512311 B) with 9' 19"
4.7z 93 MB ( 97512345 B) with 9' 40"
5.7z 96 MB (100108731 B) with 5' 26"
6.7z 93 MB ( 97512311 B) with 9' 09"

I think the second method works well: (almost) the best compression in the best time. The first method is the cleanest and easiest to remember, and is fine for small files or when maximum compression is not the point. Going from method 2 to method 3 makes the archive only about 2% smaller but takes almost twice as long. Everyone can decide for themselves.
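
To make the trade-off easier to see, the figures above can be turned into compression ratios and seconds. The sizes and times are copied from the results listed in this answer; `summarize` is just a hypothetical helper name.

```python
ORIGINAL_BYTES = 1236598784  # size of db.fdb as reported above

# (archive, size in bytes, minutes, seconds), copied from the results above
RESULTS = [
    ("1.7z", 100108731, 6, 25),
    ("2.7z", 99520375, 5, 18),
    ("3.7z", 97512311, 9, 19),
    ("4.7z", 97512345, 9, 40),
    ("5.7z", 100108731, 5, 26),
    ("6.7z", 97512311, 9, 9),
]


def summarize(results, original=ORIGINAL_BYTES):
    """Return (archive, percent of original size, elapsed seconds) per run."""
    rows = []
    for label, size, minutes, seconds in results:
        ratio_pct = round(100.0 * size / original, 2)
        rows.append((label, ratio_pct, minutes * 60 + seconds))
    return rows


if __name__ == "__main__":
    for label, ratio_pct, elapsed in summarize(RESULTS):
        print(f"{label}: {ratio_pct}% of original, {elapsed} s")
```

All six runs land at roughly 8% of the original size, so for this file the choice is mostly about time.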

SULIMa
0

Set the "split to volume, bytes" field to the server's maximum allowed file size (in bytes, I think, although it looks like it accepts common abbreviations like "KB" and "MB"). If the zip file exceeds that size, 7-zip will split it into multiple files automatically, such as integration_serviceLog.zip.001, integration_serviceLog.zip.002, etc. (Way back when, PK Zip used this to span zip files across multiple floppy disks.) You'll need all the files to be present to unzip them. Use that instead of worrying about the absolute best compression settings to use for any particular set of files, because what's best for one file may be different for another file, and you don't want to have to go through this every time you need to copy logs.
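
The idea of fixed-size volumes can be sketched in a few lines of Python. This assumes the volumes are plain byte-wise slices of the archive that can be rejoined by simple concatenation, which is my understanding of the .001/.002 files but is not confirmed here; `split_file` and `join_files` are hypothetical helpers, not 7-Zip functions.

```python
def split_file(path, max_bytes):
    """Split path into path.001, path.002, ... of at most max_bytes each.

    Each chunk is read into memory in one go, so this is only a sketch
    for modest size limits.
    """
    parts = []
    with open(path, "rb") as src:
        index = 1
        while True:
            chunk = src.read(max_bytes)
            if not chunk:
                break
            part_path = f"{path}.{index:03d}"
            with open(part_path, "wb") as dst:
                dst.write(chunk)
            parts.append(part_path)
            index += 1
    return parts


def join_files(parts, out_path):
    """Concatenate the parts, in the given order, back into one file."""
    with open(out_path, "wb") as dst:
        for part in parts:
            with open(part, "rb") as src:
                dst.write(src.read())
```

The recipient needs every part, in order, before the rejoined file can be unzipped.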

Rob Kennedy
  • 1
    I'm worried about how the people on the other side will uncompress the files. I need it to be as simple as possible for them. Do you know if you can unzip the split volumes using the built-in windows zip, or gzip? – jjnguy May 10 '11 at 14:40
  • Apparently, no, the built-in Windows zip-folder feature doesn't do spanned zip files. That's too bad, since it's been a standard feature of the format since before Windows 3. I'd be very surprised if gzip couldn't do it, though. WinZip definitely can. – Rob Kennedy May 10 '11 at 14:47