I've known gzip for years, and recently I saw bzip being used at work. Are they basically equivalent, or are there significant pros and cons to one of them over the other?
-
While this is an old question with a valid and correct answer, I would like to point people to this Google result: http://tukaani.org/lzma/benchmarks.html as it does break it down further – Angry 84 Jan 07 '16 at 09:12
-
Isn't bzip for compression and gzip for archival? – Joseph Dec 29 '16 at 20:09
-
@juniorRubyist source? – ripper234 Dec 30 '16 at 16:55
-
I just heard that. I forgot where. – Joseph Dec 30 '16 at 17:41
-
No mention of random access? https://stackoverflow.com/questions/14225751/random-access-to-gzipped-files – neverMind9 Feb 02 '19 at 03:40
7 Answers
Gzip and bzip2, as well as xz and lzop, are functionally equivalent. (There once was a bzip, but it seems to have completely vanished off the face of the world.) Other common compression formats are zip, rar and 7z; these three do both compression and archiving (packing multiple files into one). Here are some typical ratings in terms of speed, availability and typical compression ratio (note that these ratings are somewhat subjective; don't take them as gospel):
decompression speed (fast > slow): lzop > gzip, zip > xz > 7z > rar > bzip2
compression speed (fast > slow): lzop > gzip, zip > xz > bzip2 > 7z > rar
compression ratio (better > worse): xz > 7z > rar, bzip2 > gzip > zip > lzop
availability (unix): gzip > bzip2 > xz > lzop > zip > 7z > rar
availability (windows): zip > rar > 7z > gzip > bzip2, lzop, xz
As you can see, there isn't a clear winner. If you want to rely on programs that are likely to be installed already, use zip on Windows (or if possible, self-extracting archives, as Windows doesn't ship with any of these) and gzip on unix. If you want maximum compression, use 7z or xz.
Non-Unix native formats (zip, rar, 7z) don't preserve all Unix metadata (ownership, permissions). If you need that, use compressed tar.
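As a rough illustration of the compressed-tar approach, here is a minimal Python sketch using the standard `tarfile` module. The file name `notes.txt`, the helper name `roundtrip_mode` and the chosen permission bits are assumptions made up for the example, and it assumes a Unix-like system where `os.chmod` is honoured. It packs one file into a `.tar.gz` (or `.tar.bz2`) and reads back the permission bits stored in the archive.

```python
import os
import tarfile
import tempfile


def roundtrip_mode(mode: int = 0o640, compression: str = "gz") -> int:
    """Pack one file into a compressed tar and return the permission bits stored for it."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical sample file with specific permissions.
        src = os.path.join(tmp, "notes.txt")
        with open(src, "w") as fh:
            fh.write("hello\n")
        os.chmod(src, mode)

        archive = os.path.join(tmp, "notes.tar." + compression)
        # "w:gz" writes a gzip-compressed tar; "w:bz2" and "w:xz" are also accepted.
        with tarfile.open(archive, "w:" + compression) as tar:
            tar.add(src, arcname="notes.txt")

        # "r:*" lets tarfile detect the compression when reading.
        with tarfile.open(archive, "r:*") as tar:
            member = tar.getmember("notes.txt")
            # Keep only the rwx permission bits of the stored mode.
            return member.mode & 0o777


print(oct(roundtrip_mode()))
```

The same call with `compression="bz2"` produces a bzip2-compressed tar; the tar layer carries the metadata in both cases, and the compressor only affects size and speed.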
Rar also has the downside that, as far as I know, there is no open source software that creates rar archives or that can unpack all rar archives. The other formats have free implementations and no (serious) patent claims.
-
As far as I can tell, all versions of Windows since XP can open zip files natively using the file explorer. – Lie Ryan Nov 02 '10 at 15:00
-
`bzip2` is less available than `gzip`? What UNIX systems *don't* come with `bzip2`? – new123456 Jul 03 '11 at 14:19
-
@new123456 On OpenBSD, gzip is in the base system but bzip2 has to be installed from a package. Many *WRT routers include gzip but not bzip2. – Gilles 'SO- stop being evil' Jul 03 '11 at 17:53
-
@Gilles I can confirm that my DD-WRT Release: 08/12/10 (SVN revision: 14929) does not have bzip2, but does have gzip. – Urda Mar 31 '12 at 16:10
-
@shgnInc Less commonly available than `bzip2`. As for speed, it depends how many processors you have. Hmm, I should add xz. – Gilles 'SO- stop being evil' Jan 20 '15 at 08:31
-
@JopV. Last I looked, there were some options of the rar format that the open-source unrar didn't support. I don't remember what options these are but I have had rar archives in my hand that only worked with the closed-source version. – Gilles 'SO- stop being evil' Sep 13 '16 at 17:37
-
_it seems to have completely vanished_ - Plain old `bzip` vanished because it used arithmetic coding, which was patented. Because of the patent, it was redesigned to use Huffman coding instead. During this redesign, new features and improvements were added. The fundamental thing that makes it a unique compression algorithm, though, the Burrows–Wheeler transform, stayed the same in both versions. – forest Jan 01 '19 at 03:23
-
This is a major difference between gzip and bzip2 for those working with data processing tools like Apache Spark: bzip2 is splittable and gzip is not. This means that Spark can read a single bzip2 file using [multiple concurrent tasks](https://stackoverflow.com/a/27631722/877069), whereas a gzipped file can only be read with a single task. – Nick Chammas Sep 16 '19 at 19:06
-
Unless I'm mistaken, 7z is an archive format, and LZMA is the compression algorithm commonly used to create it. – BallpointBen Mar 12 '20 at 20:55
As far as I can tell, gzip is faster overall, while bzip2 generally produces better (smaller) compressed output.
-
Also, gzip seems to be slightly better supported, especially on Windows. – Dentrasi Oct 30 '10 at 17:32
-
@whitequark: being widely supported is mostly important for unix since users may not have root access and must work with what is already installed. Also applies to Windows environments where the user does not have admin access (schools/libraries/etc). – Matthew Nov 26 '12 at 19:23
-
@Matthew, you don't need admin rights to use a lot of ported free software, including 7zip. – Catherine Nov 28 '12 at 00:26
-
@IQAndreas: some benchmarks: [1](http://tukaani.org/lzma/benchmarks.html), [2](http://bashitout.com/2009/08/30/Linux-Compression-Comparison-GZIP-vs-BZIP2-vs-LZMA-vs-ZIP-vs-Compress.html), [3](https://www.rootusers.com/gzip-vs-bzip2-vs-xz-performance-comparison/) – Lie Ryan Feb 09 '16 at 12:45
-
Although bzip2 is often better, gzip usually pulls ahead for text compression. – forest Jan 01 '19 at 03:25
The algorithms make different trade-offs between time, memory and space. Bear in mind these algorithms were written quite a while back, and your smartphone has many times more CPU than desktops of those days.
Your pick is between universality (.gz) and a bit more compression (.bz2). Only you can say which you care about more.
One advantage of .gz is that it can compress a stream, a sequence where you can't look behind. This makes it the official compressor of HTTP streams. I needed to use gzip once because of that, but it's unlikely you'll need to think about it.
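To make the streaming point concrete, here is a minimal Python sketch, assuming only the standard `zlib` and `gzip` modules; the helper name `gzip_stream` and the sample chunks are made up for illustration. It compresses data chunk by chunk in a single forward pass, and also checks a related property of the gzip format: two gzip outputs placed back to back decompress to the two inputs placed back to back.

```python
import gzip
import zlib


def gzip_stream(chunks):
    """Yield gzip-compressed bytes for an iterable of byte chunks, in one forward pass."""
    # wbits=31 asks zlib to emit a gzip header and trailer around the deflate data.
    comp = zlib.compressobj(wbits=31)
    for chunk in chunks:
        out = comp.compress(chunk)
        if out:  # the compressor may buffer and return nothing for a given chunk
            yield out
    # flush() emits whatever is still buffered, plus the gzip trailer.
    yield comp.flush()


# Hypothetical input arriving piece by piece, e.g. from a socket or a pipe.
chunks = [b"first chunk, ", b"second chunk, ", b"third chunk"]
streamed = b"".join(gzip_stream(chunks))
print(gzip.decompress(streamed) == b"".join(chunks))

# Concatenated gzip members decompress to the concatenated inputs.
joined = gzip.compress(b"hello, ") + gzip.compress(b"world")
print(gzip.decompress(joined) == b"hello, world")
```

This is only a sketch of the idea, not a claim about how any particular HTTP server implements its compression.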
-
Another way to phrase "gz can compress a stream" is that gz is homomorphic under concatenation: gz(concat(x, y)) == concat(gz(x), gz(y)). IMO this is one of gz's most useful features. – BallpointBen Mar 04 '20 at 16:57
-
@BallpointBen you hit the nail on its head. Couldn't have explained it any better. – Peter Chaula May 10 '22 at 14:28
Here is a list of sites that test compression algorithms. To find just bzip and gzip you will have to do some digging, but most sites list the characteristics of each algorithm. This way you can compare what is important to you: size (compression ratio), time, memory or CPU.
http://www.maximumcompression.com/benchmarks/benchmarks.php
https://web.archive.org/web/20210126053224/https://maximumcompression.com/benchmarks/benchmarks.php
Per http://tukaani.org/lzma/benchmarks.html , gzip compresses twice as fast as bzip2, and decompresses ten times as fast.
E.g. for use with S3 caching on Travis and similar services, where you want fast compression and decompression rather than just small sizes, gzip might be a good trade-off.
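If you would rather measure this on your own data than rely on published figures, a rough Python sketch along these lines may help. It assumes the standard `gzip`, `bz2` and `lzma` modules at their default settings; the helper name `compare` and the synthetic sample text are made up for illustration, and the timings will vary with the machine and the input.

```python
import bz2
import gzip
import lzma
import time


def compare(data: bytes) -> dict:
    """Compress and decompress `data` with each codec; report sizes and wall-clock times."""
    codecs = {
        "gzip": (gzip.compress, gzip.decompress),
        "bzip2": (bz2.compress, bz2.decompress),
        "xz": (lzma.compress, lzma.decompress),
    }
    results = {}
    for name, (compress, decompress) in codecs.items():
        t0 = time.perf_counter()
        packed = compress(data)
        t1 = time.perf_counter()
        unpacked = decompress(packed)
        t2 = time.perf_counter()
        if unpacked != data:
            raise ValueError(f"{name} round trip did not reproduce the input")
        results[name] = {
            "size": len(packed),
            "compress_s": t1 - t0,
            "decompress_s": t2 - t1,
        }
    return results


# Synthetic, highly repetitive sample; real files will behave differently.
sample = b"The quick brown fox jumps over the lazy dog.\n" * 20000
for name, r in compare(sample).items():
    print(f"{name:6s} size={r['size']:8d} "
          f"compress={r['compress_s']:.4f}s decompress={r['decompress_s']:.4f}s")
```

Replace `sample` with the bytes of a file that is representative of your workload before drawing conclusions.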
gzip is way faster; bzip2 makes way smaller archives.
Since memory is cheap, gzip is usually better for general usage, whereas bzip2 may be better for preserving many old files.
There's also the newer zstandard format, which has compression ratios similar to gzip's but performs even faster.
In my experience, bzip has offered consistently better compression ratios than gzip. Plus, with 7zip as the archive manager and the bzip algorithm, 7zip can make use of multi-core processors.