41

Bzip2 and gzip only use one core, although many computers have more than one core. But there are programs like lbzip2, pbzip2 and pigz, which use all available cores and promise to be compatible with bzip2 and gzip.

So what's the best way to use these programs by default, so that tar cfa file.tar.bz2 directory uses lbzip2/pbzip2 instead of bzip2? Of course I don't want to break anything.

elmicha
  • 2
    Out of curiosity to all: Is parallel gzip/bzip2 really faster than serial? I would imagine that the HDD write speed and other constraints are more of a problem. – con-f-use Sep 22 '11 at 21:13
  • @con-f-use Not unless you have SSDs. Theoretically, it could be faster as the total size of the archive increases. – Marco Ceppi Sep 22 '11 at 21:15
  • 2
    On a system with 16 CPUs, switching from gzip to pigz cut the time to tar 1.2 TB, transfer it over the network, and test the result from 18 hours of backup and 14 hours of test to 4 hours of backup and 2 hours of test. There are many potential bottlenecks (disk speed, network speed, processing power), but in this case the job was definitely CPU-bound rather than I/O-bound. This is a high-end system, so your results may vary. Not that it matters, but this was on RHEL6. – kzarns Sep 27 '15 at 14:19

5 Answers

36

You can symlink bzip2, bunzip2 and bzcat to lbzip2, and gzip, gunzip, gzcat and zcat to pigz:

sudo apt-get install lbzip2 pigz
cd /usr/local/bin
sudo ln -s /usr/bin/lbzip2 bzip2
sudo ln -s /usr/bin/lbzip2 bunzip2
sudo ln -s /usr/bin/lbzip2 bzcat
sudo ln -s /usr/bin/pigz gzip
sudo ln -s /usr/bin/pigz gunzip
sudo ln -s /usr/bin/pigz gzcat
sudo ln -s /usr/bin/pigz zcat

I chose lbzip2 instead of pbzip2 because the /usr/share/doc/lbzip2/README.gz looks "nicer" than /usr/share/doc/pbzip2/README.gz. Also, the tar manual talks about lbzip2.

Edit:

pigz-2.1.6, which is included in Precise Pangolin, refuses to decompress files with unknown suffixes (e.g. initramfs-*.img). This is fixed in pigz-2.2.4, which ships with Quantal. So you might want to wait until Quantal, install the Quantal package manually, or not link gunzip/gzcat/zcat yet.
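
The symlinks only take effect for programs that find /usr/local/bin earlier in $PATH than the directory holding the original binaries. A minimal sketch for checking this, assuming a shell that supports `local` (bash or dash); the helper name path_precedes is made up for illustration:

```shell
# Succeed if directory $1 comes before directory $2 in a colon-separated list.
# $3 is the list to inspect; it defaults to the current $PATH.
# path_precedes is a hypothetical helper name, not part of any package.
path_precedes() {
    local first="$1" second="$2" list="${3-$PATH}" dir
    local IFS=:
    for dir in $list; do
        [ "$dir" = "$first" ] && return 0
        [ "$dir" = "$second" ] && return 1
    done
    return 1    # $1 does not appear in the list at all
}

# Report whether the symlink directory wins over the usual binary locations.
for orig_dir in /usr/bin /bin; do
    if path_precedes /usr/local/bin "$orig_dir"; then
        echo "/usr/local/bin is searched before $orig_dir"
    else
        echo "/usr/local/bin is NOT searched before $orig_dir"
    fi
done
```

Afterwards, `command -v gzip` (or `type -a gzip` in bash) shows which binary the shell will actually run.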

elmicha
  • 7
    This works well because /usr/local/bin/ comes before /bin/ in most people's $PATH. If something calls /bin/gunzip directly, or someone has /bin first in their $PATH, they won't use pigz. To make this work for them as well, you could use [dpkg-divert](http://www.debian-administration.org/articles/118) and do something like this for all the binaries: `sudo dpkg-divert --divert /bin/gunzip.orig --rename /bin/gunzip; sudo ln -s /usr/bin/pigz /bin/gunzip`. However, pigz may not be 100% compatible with all the gzip flags, so be careful. – Mark McKinstry May 04 '12 at 15:31
35

The symlink idea works well.
Another working solution is to alias tar:

alias tar='tar --use-compress-program=pbzip2'

or, for gzip:

alias tar='tar --use-compress-program=pigz'

This makes the chosen compressor the default for every tar command typed in a shell where the alias is defined.
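
A single alias applies one compressor to every archive, including uncompressed ones. A rough sketch of a wrapper function that chooses by file suffix instead, assuming bash or dash and GNU tar. The names pick_compressor and partar are illustrative (partar is borrowed from a comment on this answer), and the wrapper assumes the archive name is the second argument, as in tar cf NAME ...:

```shell
# Print the compressor to use for an archive name, preferring parallel tools.
# Prints nothing for names without a recognised compressed suffix.
# pick_compressor and partar are hypothetical names used for illustration.
pick_compressor() {
    local candidates prog
    case "$1" in
        *.tar.gz|*.tgz)          candidates="pigz gzip" ;;
        *.tar.bz2|*.tbz2|*.tbz)  candidates="lbzip2 pbzip2 bzip2" ;;
        *)                       return 0 ;;
    esac
    for prog in $candidates; do
        if command -v "$prog" >/dev/null 2>&1; then
            printf '%s\n' "$prog"
            return 0
        fi
    done
    return 1    # no suitable compressor is installed
}

# Usage: partar cf file.tar.bz2 directory
# Assumption: the archive name is the second argument.
partar() {
    local prog
    prog=$(pick_compressor "$2") || return 1
    if [ -n "$prog" ]; then
        # The option goes last because GNU tar expects an old-style option
        # cluster such as "cf" to be the first argument.
        tar "$@" --use-compress-program="$prog"
    else
        tar "$@"
    fi
}
```

Plain tar stays untouched, so scripts and other users see no change; only calls to partar pick up the parallel compressor.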

Bastian Ebeling
  • 1
    Added benefit: you can name the alias something like 'partar' if you want to preserve tar's original behavior (for some reason). Sadly, 'ptar' is already taken by the Perl implementation. – jena Mar 10 '17 at 11:25
  • It would be worth testing whether pigz is installed on the system first, like this (zsh): `(( $+commands[pigz] )) && alias tar='tar --use-compress-program=pigz'`. – SergioAraujo Apr 15 '20 at 11:52
14

The symlink answer is really incorrect. It replaces the default gzip (or bzip2) with pigz (or pbzip2) for the entire system. While the parallel implementations are remarkably similar to the single-process versions, subtle differences in command-line options could break core system processes that depend on those differences.

The --use-compress-program option is a much better choice.

A second option (much like the alias) would be to set the TAR_OPTIONS environment variable supported by GNU tar:

export TAR_OPTIONS="--use-compress-program=pbzip2"
# Note: no -z here, which would ask for gzip rather than the compressor set above
tar cf myfile.tar.bz2 mysubdir/
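
An exported TAR_OPTIONS stays in effect for every later tar call in the session, whatever the archive type. A minimal sketch that scopes it to a single command, so bzip2 and gzip archives can each get their own compressor; with_tar_compressor is a hypothetical helper name, and bash or dash is assumed:

```shell
# Run a command with TAR_OPTIONS set for that command only.
# Any TAR_OPTIONS already in the environment is overridden for this one call.
# with_tar_compressor is a hypothetical helper name.
with_tar_compressor() {
    local prog="$1"
    shift
    TAR_OPTIONS="--use-compress-program=$prog" "$@"
}

# Example calls (commented out so that sourcing this file has no side effects):
# with_tar_compressor pbzip2 tar cf myfile.tar.bz2 mysubdir/
# with_tar_compressor pigz   tar cf myfile.tar.gz  mysubdir/
```

Because the variable is set only in front of the command, the rest of the shell session keeps using tar's built-in defaults.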
muru
user154053
    I have been using the symlinks since 2011 now and didn't see any breakage (apart from the case mentioned in the edit). And if such subtle differences are not found and reported, we will be stuck with non-parallel versions forever. If you use TAR_OPTIONS="--use-compress-program=pbzip2" it doesn't seem like you can differentiate between bzip2 and gzip. – elmicha May 01 '13 at 16:45
  • This didn't work for me. – Derek Perkins Sep 26 '18 at 05:02
5

One fascinating option is to recompile tar so that it uses the multithreaded compressors by default. Copied from this Stack Overflow answer:

Recompiling with replacement

If you build tar from sources, then you can recompile with parameters

--with-gzip=pigz
--with-bzip2=lbzip2
--with-lzip=plzip

After recompiling tar with these options you can check the output of tar's help:

$ tar --help | grep "lbzip2\|plzip\|pigz"
  -j, --bzip2                filter the archive through lbzip2
      --lzip                 filter the archive through plzip
  -z, --gzip, --gunzip, --ungzip   filter the archive through pigz
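
The same check can be scripted. A small sketch, assuming the help text keeps the "filter the archive through PROGRAM" wording shown in the output above; tar_filter_program is a made-up helper that reads help text on standard input:

```shell
# Print the program that a "tar --help" listing names for a long option.
# Reads the help text on stdin; $1 is the long option, for example --gzip.
# tar_filter_program is a hypothetical helper name.
tar_filter_program() {
    awk -v opt="$1" '
        # Pick the first line that mentions the option and a filter program;
        # the program name is the last word on that line.
        index($0, opt) && /filter the archive through/ { print $NF; exit }
    '
}

# Example call (commented out so that sourcing this file prints nothing):
# tar --help | tar_filter_program --gzip
```

On a tar built as described, the example call would be expected to print pigz; on a stock build it would name the serial program instead.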
Tom Koch
-2

Add these to your ~/.bash_aliases:

alias gzip="pigz"
alias gunzip="unpigz"
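
A guarded variant, sketched under the assumption that pigz and unpigz are both on $PATH when the package is installed: the aliases are defined only if the programs are found, so a machine without pigz keeps the stock gzip.

```shell
# Define the aliases only when both parallel tools are actually installed.
if command -v pigz >/dev/null 2>&1 && command -v unpigz >/dev/null 2>&1; then
    alias gzip='pigz'
    alias gunzip='unpigz'
fi
```

As with the unguarded aliases, this only changes what an interactive shell runs when gzip or gunzip is typed directly.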
Eric Carvalho
  • 2
    This will only work when calling the `gzip` (or `gunzip`) program directly on the shell's command-line. Other programs (like `tar`) won't be impacted by that. – Christian Hudon Oct 28 '15 at 20:52