3

It has already been established in this question that `tar` cannot read input from stdin.

How else can a dd output be archived directly, possibly without using any compression? The purpose of doing everything in a single task is to avoid writing the dd output to the target disk twice (once as a raw file and once as an archive), and to avoid performing two different tasks, which is a waste of time (since the input file must be read and written, and the output read, processed and written again) and can be impossible if the target drive is almost full.
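
To illustrate, the kind of single-pass pipeline I have in mind looks something like this, where `some-archiver` is just a placeholder for whatever utility could take `tar`'s place:

dd if=/dev/sdX | some-archiver > /path/to/backup   # some-archiver is hypothetical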

I'm planning to do multiple backups of drives, partitions and folders, and I'd like to benefit both from the ease of having everything stored in a single file and from the speed of each backup / potential restore task.

kos
  • I assume you're aiming for: `dd if=/dev/sdXY | tar ...`, but why not `tar` directly: `tar cf foo.tar /dev/sdXY`? – muru Feb 18 '15 at 20:29
  • @muru From your last answer I understood that, since the file at that point of the pipeline is just a raw binary in `stdout` and doesn't have a proper entry in the filesystem, it cannot be archived directly by simply piping it to `tar`. Anyway my bad, I wrote partitions but I actually want this to work for whole drives, and that's the reason why I need to use `dd` first – kos Feb 18 '15 at 21:07
  • What difference does that make? `tar cf foo.tar /dev/sdX` instead. – muru Feb 18 '15 at 21:07
  • @muru The difference is that `tar` just packs all the files together, while I need to keep the partition scheme as well – kos Feb 18 '15 at 21:27
  • What? `tar cf foo.tar /dev/sdX` makes an archive of the disk image. It does not pack files. It retains whatever structure the disk had, because that is all `tar` sees. – muru Feb 18 '15 at 21:30
  • @muru http://serverfault.com/questions/478674/linux-backup-using-tar Look at the top answer. `tar` does not back up the `MBR`/`GPT`, since those sectors are outside the filesystem's "scope", so no partition table is backed up, and you'll need to recreate the partition scheme manually once you've made a backup with `tar`, while with `dd` there's no such need – kos Feb 19 '15 at 08:32
  • That post is backing up a filesystem, *not* a block device. – muru Feb 19 '15 at 08:33
  • @muru Ok, so you are implicitly stating that `tar` *does* back up the `MBR`/`GPT`, right? So how would you restore from a backup? Let's say I `tar cf sda.tar /dev/sda` and move the tarball to `sdb`, then I switch the original `sda` disk for a newer one of the same capacity. This disk comes, of course, with just unallocated space. How would you restore starting from this point? – kos Feb 19 '15 at 08:46
  • `tar xf sda.tar -O >/dev/sdb`. – muru Feb 19 '15 at 08:52
  • @muru I tested this, and issuing `tar cf sdb.tar /dev/sdb` just generates a 10.2 kB archive with a `dev` folder inside and a zero-length `sdb` binary file inside `dev`, both with the drive mounted and unmounted, and both with and without `sudo` – kos Feb 19 '15 at 09:08
  • @muru For the record, the same applies even to `tar cf sdb1.tar /dev/sdb1` – kos Feb 19 '15 at 09:10
  • Hmm. I was certain `tar` could read from block devices. In that case, you'll probably have to forgo tar and use `gzip` or some other compression utility that doesn't care about file structure, like so: `dd if=/dev/sda | gzip -c >sda.gz` – muru Feb 19 '15 at 09:22
  • @muru Probably yes, I was just curious how to accomplish this with `tar`, in case for example I wanted to back up multiple drives into the same tarball, or to add some files afterwards before the compression – kos Feb 19 '15 at 09:41
  • If you're willing to take some trouble, http://unix.stackexchange.com/questions/151009/how-to-convince-tar-etc-to-archive-block-device-contents might be of interest. – muru Feb 19 '15 at 10:29
  • @muru Thanks for the link, but to be honest the top answer digs in a little too much at this point, considering I'm not an expert in shell scripting at all. If I don't find any other solution I'll consider doing something like that, anyway. The other answer would work, but the only reason why I'd like to use `tar` is that this way I can avoid the long on-the-fly compression time of `gzip` and, even more so, of `7z` – kos Feb 19 '15 at 12:25
  • @muru Probably at this point I need another archiving utility which can do the same as `tar` – kos Feb 19 '15 at 12:27
  • Try using the fastest compression levels (`--fast` for `gzip`). Maybe the penalty won't be too high. – muru Feb 19 '15 at 12:31
  • @muru Thanks for the advice, if I don't find anything I'll give it a shot, even though what I really wanted to do was to archive everything first and only then compress everything with `7z` or some other high-compression utility – kos Feb 19 '15 at 12:41
  • If you do it that way, you'd still require 2x the space while 7z is doing compression - and if you have that space, you can make a dd image and tar + compress it – muru Feb 19 '15 at 12:44
  • @muru True, but personally I can deal with compressing once, if and when I reach about 1TB (half my unit), and then maybe figure out something different, but not with compressing every time with crazy algorithms like LZMA2. It took more than an hour to compress a 16GB raw image with `7z`, and I really don't want to deal with that every time. Anyway I'm open to other ways to perform the same thing without having to write the file twice, which is a waste of time, or compress while archiving – kos Feb 19 '15 at 13:43
  • Have a look at `lzop`. It's much faster than `gzip` and should operate near the speed of I/O throughput on modern CPUs. – David Foerster Mar 02 '15 at 12:41
  • @DavidFoerster I tested `lzop`; I didn't test it in the appropriate environment, but so far it's the fastest method I've tried, using compression of course. I'd rather find a solution which doesn't involve compression, but it seems like there's no other way to accomplish this, so most likely I'll be forced to use something involving compression. I feel like I should award the bounty to you since it's the fastest method proposed so far, so do you want to write an answer for that? – kos Mar 08 '15 at 20:15
  • I posted this as a comment because it doesn't answer your original question. Maybe you should rephrase it along the lines of “Archive dd output fast and preferably without compression” so as to include solutions that come close to your ideal. Drop me another comment to notify me of the change and I'll convert my comment to a suitable answer. – David Foerster Mar 08 '15 at 21:34
  • @DavidFoerster I changed the question so that the answer can fit – kos Mar 08 '15 at 21:41

1 Answer

4

If you want to dump a whole block device to a file, `tar` won't be of any use, because it doesn't work with block devices. Instead you'll need to use `dd` or similar:

dd if=/dev/sdX of=/path/to/backup bs=16M
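
Restoring is then just the same invocation in reverse, assuming the replacement disk is at least as large as the original:

dd if=/path/to/backup of=/dev/sdX bs=16M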

Even so, it would be better to use at least a little compression, as long as it doesn't slow down the transfer too much. In short, you need a compression algorithm whose throughput is not much lower than that of your slowest storage medium. There are several such compression algorithms. The best known are Lempel–Ziv–Oberhumer (LZO), LZ4 and Snappy. There's a comparison of various compression algorithms, including those three, on the LZ4 project page:

Name             Ratio  C.speed  D.speed
                         (MB/s)   (MB/s)
LZ4 (r101)       2.084     422     1820
LZO 2.06         2.106     414      600
QuickLZ 1.5.1b6  2.237     373      420
Snappy 1.1.0     2.091     323     1070
LZF              2.077     270      570
zlib 1.2.8 -1    2.730      65      280
LZ4 HC (r101)    2.720      25     2080
zlib 1.2.8 -6    3.099      21      300

For the sake of this answer, I'll choose an example with LZO, because it's readily available in Canonical's repositories in the form of `lzop`; ultimately, all of those stream compressors have front-ends that read from standard input and write to standard output.

dd if=/dev/sdX bs=16M | lzop > /path/to/backup.lzo
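
Restoring works the same way in reverse: `lzop` can decompress to standard output with `-dc`, so the image never needs to be unpacked to an intermediate file first:

lzop -dc /path/to/backup.lzo | dd of=/dev/sdX bs=16M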

If you want to work on the same machine during the backup, you may want to use `ionice` and/or `nice`/`schedtool`:

ionice -c 3 dd if=/dev/sdX bs=16M | ionice -c 3 schedtool -B -n 10 -e lzop > /path/to/backup.lzo
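
You can also verify the archive's integrity afterwards without writing anything out, using `lzop`'s built-in test mode:

lzop -t /path/to/backup.lzo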
David Foerster