44

Is there a way to print the decompressed size of a .bz2 file without actually decompressing the entire thing?

endolith

2 Answers

46

As noted by others, bzip2 doesn't provide much information about the uncompressed size. But this technique works: you will still have to decompress the whole file, but you won't have to write the decompressed data to disk, which may be a "good enough" solution for you:

$ ls -l foo.bz2
-rw-r--r-- 1 quack quack 2364418 Jul  4 11:15 foo.bz2

$ bzcat foo.bz2 | wc -c         # bzcat decompresses to stdout, wc -c counts bytes
2928640                         # number of bytes of decompressed data

You can pipe the byte count into something else to get a human-readable figure (here, mebibytes with two decimal places):

$ ls -lh foo.bz2
-rw-r--r-- 1 quack quack 2.3M Jul  4 11:15 foo.bz2

$ bzcat foo.bz2 | wc -c | perl -lne 'printf("%.2fM\n", $_/1024/1024)'
2.79M
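
If you do this often, the two pipelines above can be wrapped in small shell functions. This is only a sketch: the names `bz2_size` and `bz2_size_mib` are made up for illustration, and it assumes `bzcat`, `wc`, `tr` and `awk` are on your PATH. It still decompresses the whole file, so it is no faster than the commands above.

```shell
# Hypothetical helper: print the decompressed size of a .bz2 file in bytes.
# The data is streamed through a pipe, so nothing is written to disk.
bz2_size() {
    # bzcat decompresses to stdout; wc -c counts the bytes.
    # tr strips the padding that some wc implementations add.
    bzcat < "$1" | wc -c | tr -d ' '
}

# Hypothetical helper: the same size in mebibytes, two decimal places.
bz2_size_mib() {
    # LC_ALL=C keeps the decimal separator a dot regardless of locale.
    bz2_size "$1" | LC_ALL=C awk '{ printf "%.2fM\n", $1 / 1024 / 1024 }'
}
```

Usage would be `bz2_size foo.bz2` for the raw byte count, or `bz2_size_mib foo.bz2` for the rounded figure.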
quack quixote
  • 11
    Well, that only took five minutes of 100% CPU to calculate. – endolith Oct 11 '09 at 22:39
  • 2
    only? AND it would fill up a disk? i've got a compressed tarball of an old linux install that's only 407meg yet took my poor ancient server 30-45 minutes to extract. that included writing to disk, tho, i'll have to run that script to time it. get back to ya in half an hour... :) – quack quixote Oct 11 '09 at 23:12
  • I picked the smallest file for the first test, of course. 140 MB compressed --> 3 GB uncompressed. The larger files are 5 GB compressed... – endolith Oct 12 '09 at 04:50
  • heh .. lemme know how big the 5GBs turn out to be... and how long it takes to figure it out via this XD – quack quixote Oct 12 '09 at 05:25
  • FWIW, it took about 30 minutes (7’35 sys) to find a 5.6 GB archive was 71 GB uncompressed on a 2×2.4 GHz, 8 GB RAM virtual machine. – Skippy le Grand Gourou May 28 '20 at 13:37
  • @SkippyleGrandGourou The performance of the CPU almost doesn't matter – it's the read speed of the disk which is going to dominate the performance equation, here. – Christopher Schultz May 27 '22 at 20:31
  • @ChristopherSchultz I may be wrong but I seem to recall (b)zip(2) compression and decompression are CPU-intensive operations. – Skippy le Grand Gourou May 27 '22 at 21:21
-3

To read a .bz2 text file without decompressing it to disk:

bzcat dbtax_ext_en.ttl.bz2 |zless
  • 1
    bzcat and zless don't work together like this. Use "bzcat file.bz2 | less" or "bzless file.bz2", or if you have a gzipped file, "zcat file.gz | less" or "zless file.gz". In fact, the man page for zless notes that "Zless does not work with compressed data that is piped to it via standard input; it requires that input files be specified as arguments." – Nick Russo Apr 21 '18 at 23:28
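
For a quick non-interactive look at the start of a file, the `bzcat` pipeline from the comment above can feed `head` instead of `less`. A minimal sketch, assuming `bzcat` and `head` are available; the function name `bz2_head` is hypothetical. When `head` exits, `bzcat` normally stops on the closed pipe, so usually only the beginning of the file gets decompressed.

```shell
# Hypothetical helper: print the first N lines (default 10) of a .bz2 text file.
bz2_head() {
    # $1 = path to the .bz2 file, $2 = optional number of lines
    bzcat < "$1" | head -n "${2:-10}"
}
```

For example, `bz2_head dbtax_ext_en.ttl.bz2 20` would print the first 20 lines of the file from the answer above.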