69

What's the effect of adding conv=sync,noerror when backing up an entire hard disk onto an image file?

Is conv=sync,noerror a requirement when doing forensic stuff? If so, why is it the case with reference to Linux Fedora?

Edit:

OK, if I run dd without conv=sync,noerror and dd encounters a read error while reading a block (let's say 100M), does dd just skip that 100M block and read the next block without writing anything? (dd conv=sync,noerror writes zeros for the 100M of output; what happens in this case?)

And are the hashes of the original hard disk and the output file different if done without conv=sync,noerror? Or is this only so when a read error occurred?

Matthias Braun
dding

2 Answers

69

conv=sync tells dd to pad every input block with trailing NULs up to the input block size, so that if a full block cannot be read because of an error, the full length of the original data is preserved, even though not all of the data itself can be included in the image. That way you at least know how damaged the data is, which might provide you with forensic clues. If you can't take an image at all due to bad blocks or whatever, you can't analyze any of the data; some is better than none.

If there are read errors while reading the source, conv=sync,noerror is necessary to keep dd from stopping at the first error and abandoning the dump. conv=sync is largely meaningless without noerror.
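
The padding behaviour can be seen without a failing disk. This is only a minimal sketch: a 3-byte input stands in for a short read, and the 8-byte block size is chosen purely for illustration. With conv=sync the short block should come out as 8 bytes, the original 3 followed by NULs.

```shell
# A 3-byte input read with an 8-byte block size gives dd one short block.
# conv=sync pads that block with trailing NULs up to bs.
padded_size=$(printf 'abc' | dd bs=8 conv=sync 2>/dev/null | wc -c)
echo "bytes written with conv=sync: $padded_size"

# Without conv=sync the short block is written as-is.
plain_size=$(printf 'abc' | dd bs=8 2>/dev/null | wc -c)
echo "bytes written without conv=sync: $plain_size"

# Show where the padding lands (after the data, not before it).
printf 'abc' | dd bs=8 conv=sync 2>/dev/null | od -c
```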

https://www.man7.org/linux/man-pages/man1/dd.1.html

http://vlinux-freak.blogspot.com/2011/01/how-to-use-dd-command.html

Frank Thomas
  • 1
    Question: if one does dd without conv=sync,noerror does hash of hard disk and image file become different? – dding Jul 22 '13 at 03:32
  • 1
    Also if dd encounters read error, does it stop at that moment then? – dding Jul 22 '13 at 03:33
  • 3
    dd itself doesn't hash, so are you thinking about tools like dcfldd http://www.forensicswiki.org/wiki/Dcfldd ? In theory, the hash of the disk and the hash of the image should be the same, as long as the tools calculating the hashes encounter the errors in the same way. – Frank Thomas Jul 22 '13 at 04:16
  • 2
    Upvoted for being the only answer on this question that answers the question clearly, but what do you think of the other answer's conclusion that it actually corrupts the backup? Your two answers seem to contradict each other, but maybe I'm misunderstanding. – Hashim Aziz Nov 28 '19 at 00:41
  • 1
    @Hashim, agreed, this answer is slightly misleading. It almost sounds like one should *always* use `conv=sync,noerror` with dd. But in fact that should be done only after a normal `dd` fails, and one is willing to spend time trying to recover data. A failed dump might produce only the first 10% of the disk, but using those options it will produce a dump of *all* the usable data, though in a corrupted form. – Ray Butterworth Feb 21 '20 at 14:27
  • @Hashim, I am not responsible for the second answer, but I would point out that 'conv=sync,noerror' will not do anything if dd doesn't encounter any read errors. The configuration is there for handling those errors (i.e. it's only important if the data is already corrupt). That's why using 'conv=sync' is useless without 'noerror'. – Frank Thomas Mar 29 '20 at 02:51
  • @RayButterworth, I touched it up and tried to make it clearer that it is `conv=sync` that needs `noerror`, not dd itself. Hope that helps. – Frank Thomas Mar 29 '20 at 02:54
  • @FrankThomas So, reading you it seems that Ray Butterworth is right, i.e. that the implied conclusion is to always use `conv=sync, noerror` ? – Atralb Oct 25 '20 at 16:24
  • This explanation of `conv=sync` contradicts the manpage, which doesn't talk about padding with zeroes, but rather synchronous I/O: "dsync: use synchronized I/O for data; sync: likewise, but also for metadata" – Luke Hutchison Dec 03 '20 at 18:44
  • @Atralb, yes, you should always use `noerror` if you use `conv=sync`. if you are not using `conv=sync`, then it is up to you and your specific invocation to determine if noerror is useful to you. – Frank Thomas Jan 20 '21 at 23:41
  • 3
    @LukeHutchison, it looks like you are looking at operands for the flags options (`iflag=`, `oflag=`), which include `sync` and `dsync`, rather than the `conv=` option. I'll admit the text formatting for that manpage leaves much to be desired, with no clear separation between topics. Hard to read for no good reason. Look at the block above it, and you'll see the conv= operands. Note that conv=sync pads with NUL or space, depending on the other options (`block`, `unblock`), not the character '0'. – Frank Thomas Jan 20 '21 at 23:52
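
The hashing question raised in the comments above can be checked on a healthy source. This is a sketch under stated assumptions: a 1 MiB file of random data stands in for the disk, the file names are hypothetical, and the size is an exact multiple of the block size so that conv=sync has nothing to pad. With no read errors, both copies should hash the same as the source.

```shell
workdir=$(mktemp -d)

# 1 MiB of random data stands in for an error-free disk (hypothetical names).
head -c 1048576 /dev/urandom > "$workdir/source.img"

# Copy once without conv options and once with conv=sync,noerror.
dd if="$workdir/source.img" of="$workdir/plain.img" bs=4096 2>/dev/null
dd if="$workdir/source.img" of="$workdir/sync.img" bs=4096 conv=sync,noerror 2>/dev/null

src_hash=$(md5sum < "$workdir/source.img" | cut -d' ' -f1)
plain_hash=$(md5sum < "$workdir/plain.img" | cut -d' ' -f1)
sync_hash=$(md5sum < "$workdir/sync.img" | cut -d' ' -f1)

echo "source: $src_hash"
echo "plain:  $plain_hash"
echo "sync:   $sync_hash"

rm -rf "$workdir"
```

If the source size were not a multiple of bs, conv=sync would pad the final partial block and the hashes would differ even without any read error.
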
65

dd conv=sync,noerror (or conv=noerror,sync) corrupts your data.

Depending on the I/O error encountered, and blocksize used (larger than physical sector size?), the input and output addresses do not actually stay in sync but end up at the wrong offsets, which makes the copy useless for filesystem images and other things where offsets matter.

A lot of places recommend using conv=noerror,sync when dealing with bad disks. I used to make the same recommendation, myself. It did work for me, when I had to recover a bad disk some time ago.

However, testing suggests that this is not actually reliable in any way at all.

Use losetup and dmsetup to create an A error B device:

truncate -s 1M a.img b.img
A=$(losetup --find --show a.img)
B=$(losetup --find --show b.img)
i=0 ; while printf "A%06d\n" $i ; do i=$((i+1)) ; done > $A
i=0 ; while printf "B%06d\n" $i ; do i=$((i+1)) ; done > $B

The A, B loop devices look like this:

# head -n 3 $A $B
==> /dev/loop0 <==
A000000
A000001
A000002

==> /dev/loop1 <==
B000000
B000001
B000002

So it's A, B with incrementing numbers which will help us to verify offsets later.

Now to put a read error in between the two, courtesy of Linux device mapper:

# dmsetup create AerrorB << EOF
0 2000 linear $A 0
2000 96 error
2096 2000 linear $B 48
EOF

This example creates AerrorB as 2000 sectors of A, followed by 2*48 = 96 sectors that return read errors, followed by 2000 sectors of B (starting at sector 48 of B).

Just to verify:

# blockdev --getsz /dev/mapper/AerrorB
4096
# hexdump -C /dev/mapper/AerrorB
00000000  41 30 30 30 30 30 30 0a  41 30 30 30 30 30 31 0a  |A000000.A000001.|
00000010  41 30 30 30 30 30 32 0a  41 30 30 30 30 30 33 0a  |A000002.A000003.|
[...]
000f9fe0  41 31 32 37 39 39 36 0a  41 31 32 37 39 39 37 0a  |A127996.A127997.|
000f9ff0  41 31 32 37 39 39 38 0a  41 31 32 37 39 39 39 0a  |A127998.A127999.|
000fa000
hexdump: /dev/mapper/AerrorB: Input/output error

So it reads until A127999\n. Each line is 8 bytes, so that totals 1024000 bytes, which is our 2000 sectors of 512 bytes. Everything seems to be in order...
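
The layout arithmetic can be double-checked in the shell. This is just a sketch using the numbers from the dmsetup table above (512-byte sectors, 8-byte lines); the variable names are made up for illustration.

```shell
sector=512       # bytes per sector, as used by the device mapper table
line_bytes=8     # each "A%06d\n" line is 8 bytes

a_bytes=$((2000 * sector))                     # readable A region
a_lines=$((a_bytes / line_bytes))              # lines A000000 .. A127999
error_start=$a_bytes                           # first byte of the error zone
error_end=$(((2000 + 96) * sector))            # first byte of the B region
total_bytes=$(((2000 + 96 + 2000) * sector))   # size of the whole device

echo "A region:    $a_bytes bytes ($a_lines lines)"
echo "error zone:  $error_start .. $error_end"
echo "device size: $total_bytes bytes"
printf 'error zone starts at hex offset %x\n' "$error_start"
```

The hex offset matches the 000fa000 where hexdump stopped, and the device size matches the 2097152-byte ddrescue image.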

Will it blend?

for bs in 1M 64K 16K 4K 512 42
do
    dd bs=$bs conv=noerror,sync if=/dev/mapper/AerrorB of=AerrorB.$bs.gnu-dd
    busybox dd bs=$bs conv=noerror,sync if=/dev/mapper/AerrorB of=AerrorB.$bs.bb-dd
done

ddrescue /dev/mapper/AerrorB AerrorB.ddrescue

Results:

# ls -l
-rw-r--r-- 1 root root 2113536 May 11 23:54 AerrorB.16K.bb-dd
-rw-r--r-- 1 root root 2064384 May 11 23:54 AerrorB.16K.gnu-dd
-rw-r--r-- 1 root root 3145728 May 11 23:54 AerrorB.1M.bb-dd
-rw-r--r-- 1 root root 2097152 May 11 23:54 AerrorB.1M.gnu-dd
-rw-r--r-- 1 root root 2097186 May 11 23:54 AerrorB.42.bb-dd
-rw-r--r-- 1 root root 2048004 May 11 23:54 AerrorB.42.gnu-dd
-rw-r--r-- 1 root root 2097152 May 11 23:54 AerrorB.4K.bb-dd
-rw-r--r-- 1 root root 2097152 May 11 23:54 AerrorB.4K.gnu-dd
-rw-r--r-- 1 root root 2097152 May 11 23:54 AerrorB.512.bb-dd
-rw-r--r-- 1 root root 2097152 May 11 23:54 AerrorB.512.gnu-dd
-rw-r--r-- 1 root root 2162688 May 11 23:54 AerrorB.64K.bb-dd
-rw-r--r-- 1 root root 2097152 May 11 23:54 AerrorB.64K.gnu-dd
-rw-r--r-- 1 root root 2097152 May 11 23:54 AerrorB.ddrescue

From the file sizes alone you can tell things are wrong for some blocksizes.

Checksums:

# md5sum *
8972776e4bd29eb5a55aa4d1eb3b8a43  AerrorB.16K.bb-dd
4ee0b656ff9be862a7e96d37a2ebdeb0  AerrorB.16K.gnu-dd
7874ef3fe3426436f19ffa0635a53f63  AerrorB.1M.bb-dd
6f60e9d5ec06eb7721dbfddaaa625473  AerrorB.1M.gnu-dd
94abec9a526553c5aa063b0c917d8e8f  AerrorB.42.bb-dd
1413c824cd090cba5c33b2d7de330339  AerrorB.42.gnu-dd
b381efd87f17354cfb121dae49e3487a  AerrorB.4K.bb-dd
b381efd87f17354cfb121dae49e3487a  AerrorB.4K.gnu-dd
b381efd87f17354cfb121dae49e3487a  AerrorB.512.bb-dd
b381efd87f17354cfb121dae49e3487a  AerrorB.512.gnu-dd
3c101af5623fe8c6f3d764631582a18e  AerrorB.64K.bb-dd
6f60e9d5ec06eb7721dbfddaaa625473  AerrorB.64K.gnu-dd
b381efd87f17354cfb121dae49e3487a  AerrorB.ddrescue

dd agrees with ddrescue only for block sizes that happen to be aligned to our error zone (512, 4K).
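
Which block sizes are aligned to the error zone can be worked out from the table. A small sketch, assuming the error zone spans sectors 2000 to 2096 as defined above, with the block sizes from the test loop written out in bytes:

```shell
error_start=$((2000 * 512))   # first byte of the error zone
error_end=$((2096 * 512))     # first byte after the error zone

aligned=""
for bs in 1048576 65536 16384 4096 512 42
do
    # A block size is "aligned" here if both zone boundaries fall on a block boundary.
    if [ $((error_start % bs)) -eq 0 ] && [ $((error_end % bs)) -eq 0 ]
    then
        aligned="$aligned $bs"
        echo "bs=$bs: aligned"
    else
        echo "bs=$bs: not aligned"
    fi
done
```

Only 4096 and 512 come out as aligned, which are the two block sizes whose checksums match ddrescue.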

Let's check raw data.

# grep -a -b --only-matching B130000 *
AerrorB.16K.bb-dd:  2096768:B130000
AerrorB.16K.gnu-dd: 2047616:B130000
AerrorB.1M.bb-dd:   2113152:B130000
AerrorB.1M.gnu-dd:  2064000:B130000
AerrorB.42.bb-dd:   2088578:B130000
AerrorB.42.gnu-dd:  2039426:B130000
AerrorB.4K.bb-dd:   2088576:B130000
AerrorB.4K.gnu-dd:  2088576:B130000
AerrorB.512.bb-dd:  2088576:B130000
AerrorB.512.gnu-dd: 2088576:B130000
AerrorB.64K.bb-dd:  2113152:B130000
AerrorB.64K.gnu-dd: 2064000:B130000
AerrorB.ddrescue:   2088576:B130000

While the data itself seems to be present, it is obviously not in sync; the offsets are completely out of whack for bs=16K,1M,42,64K... only those with offset 2088576 are correct, as can be verified against the original device.

# dd bs=1 skip=2088576 count=8 if=/dev/mapper/AerrorB 
B130000
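
The correct offset can also be derived from the dmsetup table rather than read back from the device. A sketch, using only numbers that appear in the table (B is mapped at sector 2096 and starts from sector 48 of the B loop device):

```shell
sector=512
line_bytes=8

b_mapped_at=$((2096 * sector))          # where the B region begins in AerrorB
b_skipped=$((48 * sector))              # bytes of B skipped by the "linear $B 48" mapping
offset_in_b=$((130000 * line_bytes))    # position of line B130000 inside b.img

expected=$((b_mapped_at + offset_in_b - b_skipped))
echo "B130000 should be at byte offset $expected"
```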

Is this expected behaviour of dd conv=noerror,sync? I do not know, and the two implementations of dd I had available don't even agree with each other. The result is very much useless if you ran dd with a large block size chosen for performance.

The above was produced using dd (coreutils) 8.25, BusyBox v1.24.2, GNU ddrescue 1.21.

frostschutz
  • 4
    Very interesting and detailed, but still confusing. Do you see this as a bug? Has it been reported? Or is it simply that the user needs to be sure to use a bs= argument that corresponds to the actual blocksize of the device? – nealmcb Jun 28 '16 at 03:13
  • 1
    @frostschutz would you recommend using `ddrescue` instead of `dd` when working with drives with bad sectors? – ljk Jan 31 '17 at 00:26
  • 2
    No; the `sync` argument tells it to keep the output the correct length. It doesn't work if you use the wrong block size, so just don't do that. – psusi Mar 15 '17 at 12:46
  • @psusi was that a reply to my question? for a mac drive with bad sectors is it still the best to use `dd`? that won't overwrite existing data on the current drive, correct? – ljk Mar 18 '17 at 08:48
  • 42
    Hey, `iflag=fullblock` seems to save it. Although `md5sum`s of images made with `iflag=fullblock` still differ (of course! because numbers of bytes that were skipped due to the read error differ — i.e. amounts of `\0`s in the images differ), but alignment is saved with `iflag=fullblock`: `grep -a -b --only-matching B130000` returns `2088576` for all the images. – Sasha Sep 30 '17 at 08:33
  • 7
    @Sasha is right and needs more upvotes! *fullblock* is mentioned in the docs https://www.gnu.org/software/coreutils/manual/html_node/dd-invocation.html – mlt Nov 23 '18 at 19:56
  • Nice demo, but I don't understand what the goal of this is. `dd` works well if you know what you're doing, and reading this, you seem to know what you're doing. So why ask your OS to do such a strange operation? – F. Hauri - Give Up GitHub Jan 23 '19 at 16:04
  • 4
    @Sasha `iflag=fullblock` seems to work for GNU dd, although not BusyBox dd. – Boann Mar 19 '21 at 19:04
  • The recommendation from the gnu dd manpage is: `dd conv=noerror,sync iflag=fullblock </dev/sda1 > /mnt/rescue.img` – Bill McGonigle Jul 25 '22 at 20:54
  • @BillMcGonigle On Debian the recommendation to use `dd conv=noerror,sync iflag=fullblock` doesn't appear on gnu dd's man page. It's in the gnu info documentation, which I normally don't consult. It contains a lot of other information about dd "gotchas", which makes it required reading if you are using dd for HDD rescue. – Russell Stuart Apr 07 '23 at 07:09
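
The effect of `iflag=fullblock` mentioned in the comments above can be illustrated without a failing disk. This is a minimal sketch, assuming GNU dd (per the comments, BusyBox dd may not accept the flag). It does not reproduce a read error; `slow_input` is a hypothetical helper that delivers 8 bytes in two chunks so that dd sees a short read, which is the situation where conv=sync inserts padding in the middle of the data.

```shell
# Hypothetical helper: emits 8 bytes in two chunks, one second apart,
# so the first read() on the pipe returns only 3 bytes.
slow_input() { printf 'abc'; sleep 1; printf 'defgh'; }

# Without fullblock each short read is padded to bs on its own.
without=$(slow_input | dd bs=8 conv=sync 2>/dev/null | wc -c)

# With fullblock dd keeps reading until the 8-byte block is full.
with=$(slow_input | dd bs=8 conv=sync iflag=fullblock 2>/dev/null | wc -c)

echo "without fullblock: $without bytes"
echo "with fullblock:    $with bytes"
```

Without the flag the output is typically twice as long as the input, because each partial read is padded separately; with the flag the 8 input bytes stay 8 bytes and later data keeps its offset.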