35

So I have this Wikipedia dump, about 10 GB in size, named "enwiki-latest-pages-articles.xml.bz2". I have been trying the following commands in the terminal to extract the dump:

tar jxf enwiki-latest-pages-articles.xml.bz2

And

tar xvf enwiki-latest-pages-articles.xml.bz2

But both of them return the following error:

tar: This does not look like a tar archive
tar: Skipping to next header
Arun Mohan
  • @arun run the following command and paste the output of **`file enwiki-latest-pages-articles.xml.bz2`** – PKumar Mar 24 '15 at 06:19
  • [This Q/A](http://askubuntu.com/questions/552188/is-there-a-utility-to-extract-compress-files-using-any-type-of-archiving-algorit) can help, use `7z` util for extracting everything – c0rp Mar 24 '15 at 11:43
  • 2
    I notice you say this is a huge file - so another things you might want to do is pipe it into something, bzcat enwiki-latest-pages-articles.xml.bz2 | someotherprogram – nwaltham Mar 24 '15 at 15:34
  • @nwaltham: You'd have my upvote if you made that an answer. – Ilmari Karonen Mar 24 '15 at 19:49
  • Because tar extracts tar files, and it's not a tar file? – user253751 Mar 25 '15 at 03:22
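As nwaltham's comment suggests, you can stream the decompressed XML into another program instead of writing the whole uncompressed file to disk first. A minimal sketch using standard tools (the grep pattern is only an illustration):

bzcat enwiki-latest-pages-articles.xml.bz2 | head -n 40
bzcat enwiki-latest-pages-articles.xml.bz2 | grep -c '<page>'

The first command previews the start of the dump; the second counts lines containing `<page>` without ever creating the uncompressed file.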

2 Answers

77

You can't use the tar command because the archive isn't a .tar.* file. To uncompress a bzip2 file, use the following command (this won't preserve the original .bz2 file):

bzip2 -d enwiki-latest-pages-articles.xml.bz2

If you want to extract it and keep the original, run this command:

bzip2 -dk enwiki-latest-pages-articles.xml.bz2

Source: https://superuser.com/questions/480950/how-to-decompress-a-bz2-file
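Since the uncompressed XML will be considerably larger than the 10 GB archive, it can also be useful to decompress to standard output and redirect it to a disk with enough free space. A minimal sketch (the destination path below is just a placeholder):

bzip2 -dc enwiki-latest-pages-articles.xml.bz2 > /path/to/big-disk/enwiki-latest-pages-articles.xml

With -c the original .bz2 file is left untouched, just as with -k.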

Terrance8D
21

Just use bunzip2:

bunzip2 enwiki-latest-pages-articles.xml.bz2

And if it's a gzip-compressed file:

gunzip enwiki-latest-pages-articles.xml.gz
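If you are not sure which compressor produced the file (as PKumar's comment suggests), check it with file first; the exact wording of the output varies between versions:

file enwiki-latest-pages-articles.xml.bz2

For a bzip2 archive this should report something like "bzip2 compressed data, block size = 900k".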
chaos