
I'm considering backing up some data to DVDs/BDs. (I know about DVDisaster.) I've read somewhere that the outer sectors on a DVD are more likely to wear down than the inner ones. This suggests that over time, the blocks in the second half of an image are more likely to be corrupted than those in the first half. So my question is: is there a way to write the same data to 2 DVDs, such that

  1. one can mount either DVD, possibly in 2 steps (e.g. mount an iso file stored on the DVD), but without copying everything to the HD first and manipulating it
  2. the main data is written to the 2 DVDs in significantly different orders

If I'm not being clear, here is what a possible solution could be.

Suppose for a second there existed a Linux driver that could mount an iso image backwards. So, I give it an iso file, and when it wants the 1st sector, it reads the last 2048 bytes of the file backwards, instead of the first 2048 bytes in normal order. I don't know if such a driver exists, but if it did, it would be one solution to my problem, because I could do this: put my data in an iso image; compute a second image as the first one reversed; encapsulate each image into a simple UDF filesystem with only one file; write the two UDF filesystems to different DVDs. Now, when I plug in either DVD, I could just mount the single iso file to get back my data. Furthermore, the data would be written to the 2 DVDs in different orders, so even if both DVDs lose the last 1/3 of their sectors, I could still retrieve it all (by hand, but it would be possible).
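
For concreteness, here is a rough sketch of that workflow in shell. It assumes the payload already sits in a plain image file; the names data.iso, data.rev.iso, /dev/sr0, /mnt/dvd and /mnt/payload are placeholders, the reversal is done per 2048-byte block rather than per byte (which is enough for the purpose), and the step of wrapping each image into a single-file UDF filesystem is only indicated, not spelled out:

# write a block-reversed copy of data.iso (2048-byte granularity) to data.rev.iso
# (one dd per block: slow, but enough to show the idea)
n=$(( $(stat -c %s data.iso) / 2048 ))
for ((i = 0; i < n; i++)); do
  dd if=data.iso of=data.rev.iso bs=2048 skip=$((n - 1 - i)) seek=$i count=1 \
    conv=notrunc 2>/dev/null
done
# ...wrap data.iso and data.rev.iso each in a single-file UDF filesystem and burn them...
# reading back the "forward" DVD is then just two mounts, nothing copied to the HD:
sudo mount -o ro /dev/sr0 /mnt/dvd
sudo mount -o ro,loop /mnt/dvd/data.iso /mnt/payload

Reading the reversed DVD would still require either the hypothetical backwards driver or re-reversing whichever of its sectors are still readable, which is exactly the "by hand, but possible" case above.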

Other possible solutions would be a general driver to permute/rotate/reorder arbitrary-size blocks of an arbitrary block device. Or perhaps there is a way to store a file in a UDF file system using a specific order of sectors? Given that UDF is a full-fledged file system, that is certainly possible in principle, but is there a tool to do it?
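
On the "reorder blocks of an arbitrary block device" idea: the stock device-mapper linear target can already present a reordered view of an existing device without copying anything, so a coarse version of the permutation (e.g. the two halves of the image swapped) needs no new driver. A minimal sketch, assuming the second DVD carries the image with its halves swapped and that the sector count is even so the swap is its own inverse (file, device and mount point names are made up):

# swapped.img: the half-swapped image as read from the second DVD (or use the DVD device directly)
loopdev=$(sudo losetup --show -f -r swapped.img)   # read-only loop device backing the image
size=$(sudo blockdev --getsz $loopdev)             # device size in 512-byte sectors
half=$((size / 2))
# virtual device: second half of the loop device first, then the first half
sudo dmsetup create unswapped <<EOF
0 $half linear $loopdev $half
$half $half linear $loopdev 0
EOF
sudo mount -o ro /dev/mapper/unswapped /mnt/payload

With more table lines the same mechanism can express finer permutations, at the cost of one line per contiguous extent.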

Thanks!!

Edit: As I explained under the first reply, I don't mean to replace DVDisaster, but to complement it. Consider 2 strategies for backing up 4G of data. Strategy A: use 2 identical DVDs, each with 15% ECC from DVDisaster. Strategy B: use 2 DVDs, each with 15% ECC, but with the permutation scheme described above applied to 1 of the 2 DVDs. I claim that, because of the wear patterns of DVDs (specifically, the correlation of errors between the two discs), after a certain time the probability of full recovery from B is significantly larger than from A (all other things being equal).

Edit 2: To substantiate my claim that DVDisaster isn't a cure for everything, here is a script demonstrating how DVDisaster with 33% ECC data can suffer data loss with only 1.3% corruption. The apparent contradiction is resolved by noting that the 33% figure applies to the best-case corruption pattern, not to arbitrary corruption. FYI, I'm creating a file spanning the entire filesystem in test.1.udf, changing only the last sector of the file to zero in test.2.udf, computing ECC data for both, and comparing the sectors, including the ECC data. The point is that if test.1.udf is your data and you lose exactly the sectors that differ, and only those, you cannot possibly recover test.1.udf, because test.2.udf is another valid possibility.

# build a UDF image with 2048-byte blocks and fill it with a single file of random data
n_blocks=8192
tdir=$(mktemp -d)
mkudffs -b 2048 test.1.udf $n_blocks
sudo mount test.1.udf $tdir -o bs=2048
sudo chown $USER:$USER $tdir
n=$(df -B 2K $tdir | tail -n 1 | awk '{print $4}')   # free 2K blocks in the filesystem
let n-=1
dd if=/dev/urandom of=$tdir/file bs=2K count=$n 2>/dev/null
# remember the last 16 bytes of the file so its last block can be located in the image
last=$(od <$tdir/file -Ad -t x1 | tail -n 2 | head -n 1 | cut -d ' ' -f 2-)
sudo umount $tdir
# locate where the file's data ends inside the image, and derive the block to zero out
start_of_last_block=$(od <test.1.udf -Ad -t x1 | grep -A 1 "$last" | tail -n 1 | awk '{print $1}')
last_block=$(($start_of_last_block / 2048))
# test.2.udf = test.1.udf with one block of the file (its last one) replaced by zeroes
dd if=test.1.udf bs=2K count=$(($last_block - 1)) >test.2.udf 2>/dev/null
dd if=/dev/zero bs=2K count=1 >>test.2.udf 2>/dev/null
dd if=test.1.udf bs=2K skip=$last_block count=$(($n_blocks - $last_block)) >>test.2.udf 2>/dev/null
n_blocks_with_ecc=$(echo "$n_blocks * 133 / 100" | bc)   # 33% ecc added to each image
echo "add dvdisaster ecc data, using in total $n_blocks_with_ecc"
# run dvdisaster on the 2 files, then...
# count the 2K blocks (payload + ecc) that differ between the two augmented images
n_blocks_different=$(for i in $(seq 0 $(($n_blocks_with_ecc - 1))); do
  if [ $(($i % 100)) -eq 0 ]; then
    echo "$i..." >&2   # progress indicator
  fi
  diff -q <(dd if=test.1.udf bs=2K skip=$i count=1 2>/dev/null) \
      <(dd if=test.2.udf bs=2K skip=$i count=1 2>/dev/null) >/dev/null || echo $i
done | wc -l)
echo "number of blocks different: $n_blocks_different / $n_blocks_with_ecc ($(echo "scale=6; $n_blocks_different / $n_blocks_with_ecc * 100" | bc)%)"

Output:

number of blocks different: 145 / 10895 (1.330800%)
  • Another possible solution would be to store the payload data on a linear virtual md array split across a few files (e.g., 4). Then I could write those files in different orders on the 2 DVDs. The problem with this is that I need a loopback device for each chunk in order to mount the array, and those are limited. Also, I could only mount this if my system runs md (as opposed to an artificially fragmented file on a UDF, which every system should understand). (A sketch of this idea follows after these comments.) – Matei David Feb 23 '13 at 21:57
  • I don't understand the command: "dd if=test.1.udf bs=2K skip=$last_block count=$(($n_blocks - $last_block)) >>test.2.udf 2>/dev/null". What is it for? The previous two commands first created test.2.udf like test.1.udf with just one less block, and then you added a block of zeroes. – FarO Feb 07 '15 at 11:46
  • Also, it seems to me (if I understood correctly) that you should 1) generate test.1.udf 2) add ECC to that one 3) corrupt it 4) recover it with dvdisaster 5) check the amount of uncorrectable sectors. Or did I misunderstand the goal of your test? – FarO Feb 07 '15 at 11:50
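
A rough sketch of the md-array variant from the comment above, assuming the payload has been split into four chunk files (all names below are made up); mdadm's --build mode assembles a legacy array without writing a superblock to the chunks:

# attach each chunk file to a read-only loop device and collect the device names
devs=""
for f in chunk.0 chunk.1 chunk.2 chunk.3; do
  devs="$devs $(sudo losetup --show -f -r $f)"
done
# concatenate the chunks into one linear array and mount the filesystem it contains
sudo mdadm --build /dev/md0 --level=linear --raid-devices=4 $devs
sudo mount -o ro /dev/md0 /mnt/payload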

1 Answer


The problem you are describing already has a more elegant and efficient solution: Reed-Solomon error correction. This works by appending error-correcting code to the end of the disk, such that you can lose a certain amount of data from arbitrary locations on a single disk and still recover the whole file.

This is possible because the RS decoder does not differentiate between user data and error correction data. From the RS decoder's point of view, each block is a sequence of, say, 100 bytes, of which an arbitrary subset of 20 bytes can be lost and still recovered.

While some parts of the disk may be more likely to lose data, data loss can still occur anywhere. In the two-disk method you describe, you will lose data whenever a failed sector on one disk covers the same data as a failed sector on the other. At higher levels of data loss, that would be fairly common. In comparison, Reed-Solomon error correction allows you to recover from losing 14.3% (normal mode) or 33.5% (high mode) of the disk without issues.

DVDisaster is designed to do just this and is perfectly capable of meeting your needs here. If you are feeling particularly paranoid, you can set the redundancy to high (or a custom value) and still use less space, and get higher reliability, than you would with two disks.
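
For reference, a sketch of the command-line workflow with a separate error-correction file (RS01 method); the option spellings here are from memory and may differ between dvdisaster versions, so check dvdisaster --help, and the redundancy level (normal/high/custom) is configured separately:

dvdisaster -i backup.iso -e backup.ecc -mRS01 -c   # create the error correction file
dvdisaster -i backup.iso -e backup.ecc -t          # verify a (possibly damaged) image later
dvdisaster -i backup.iso -e backup.ecc -f          # repair the image using the ecc file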

Happy backups!


  • Also mention `par2` ;) – 0xC0000022L Feb 22 '13 at 21:10
  • I thought about it, but I wasn't sure how well it would work if you have files that go across multiple CDs. Also, I wanted to keep the length down. – David Feb 22 '13 at 21:14
  • Thanks for the reply, but with my scheme I didn't mean to replace DVDisaster, only to complement it. – Matei David Feb 23 '13 at 21:19
  • Some issues with what you said: 1. DVDisaster doesn't insert parity at "random" spots, but only at the end; I tested it. If you `diff -qb <(dd if=orig.iso bs=2K count=$x) <(dd if=parity.iso bs=2K count=$x)` with `x` equal to the number of sectors in `orig.iso`, they will be equal. – Matei David Feb 23 '13 at 21:27
  • 2. I am 90% sure that the redundancy guarantee you mention (e.g., 15%) means: "under the best circumstances, if 14.99% errors occur, you can recover them all; also, if 15.01% errors occur, you can certainly not recover them all." That's different than what you seem to suggest: "for every possible 14.99% errors, you can recover them all." Maybe `par2` does that, but I bet (like I said, it's an informed guess) that to be efficient, DVDisaster splits the input into clusters and protects clusters independently of each other. To lose data, you only need to lose 15.01% from 1 cluster & its parity. – Matei David Feb 23 '13 at 21:34
  • So my take on DVDisaster is that it's very useful, and quite sufficient if the errors on every DVD are "random". But IRL they aren't. From my experience with 5+ year old DVDs that I tried to backup, of those that had any error at all, 90% had those errors only in the second half (outer layers). – Matei David Feb 23 '13 at 21:42
  • You are right about the location of the data, but it does not affect the recovery rate, as the Reed-Solomon decoder does not differentiate between user data and error correction data; as such, you can lose up to 15% of the data from anywhere without any data loss (but if it's 15.01% you lose it all). I'll update the answer to reflect this. http://dvdisaster.net/en/qa.html#eccpos – David Feb 25 '13 at 14:28
  • I added an example demonstrating that the claim about recovering from "any 15% corruption" is wrong. – Matei David Feb 26 '13 at 06:55
  • I was referring to the type of corruption that occurs in normal use, not what happens when you deliberately place all the failures into the same ECC block. This test has little resemblance to the way disks degrade in real life. Try putting some data on a disk and then scratching it if you want a more realistic test. http://dvdisaster.net/en/qa31.html – David Feb 26 '13 at 19:18
  • My question was not about DVDisaster. Regardless of how well it protects each individual DVD, 2 DVDs using my scheme (permutation+DVDisaster) offer better protection than 2 identical DVDs with DVDisaster only. The reason: error patterns on 2 DVDs are not independent of each other. If DVD1 develops an error pattern that leads to data loss, DVD2 has a high chance to develop the same error pattern. If DVD2 is identical to DVD1, you lose both. If DVD2 employs a permutation, then you might lose some sectors on DVD1 and others on DVD2, but still be able to reconstruct all data by combining them. – Matei David Feb 26 '13 at 23:16
  • If you think that a lack of redundancy will cause data loss, the solution is to increase the redundancy setting, not to invent a whole new method of storing data. – David Feb 27 '13 at 16:33
  • As explained in my "Regardless..." argument above, I can increase the redundancy in my scheme just as much, and again beat DVDisaster alone. Furthermore, my original question wasn't "How to back up data?" or "How does DVDisaster work?". These seem to be the only questions answered so far. – Matei David Feb 27 '13 at 19:55
  • Reed-Solomon ECC is more efficient per byte at providing redundancy than naive data duplication. You wanted a way to make sure your data is secure, and I told you. If you want to avoid using a well-tested solution in favor of a system that no one uses (and may not exist) because it's not as good, you are on your own. – David Feb 27 '13 at 21:41
  • You really seem to not get it... For the 3rd time now: I want to complement RS ECC, not just duplicate data. As stated in the 2nd sentence of my original question, I'm not asking details about DVDisaster. – Matei David Feb 27 '13 at 22:58
  • "I am 90% sure that the redundancy guarantee you mention (eg, 15%) means: "under the best circumstances, if 14.99% errors occur, you can recover them all; also, if 15.01% errors occur, you can certainly not recover them all" -> read here, the footnote: http://dvdisaster.net/en/qa31.html it states exactly the same thing: each block is independent and if you have too many errors in one you lose it all. But they are spread uniformly on the disk and this makes me wonder how your example (see first post), with only artificial damages at the end of the image, can produce a loss of data. – FarO Feb 07 '15 at 11:49
  • Also, I think par2 can actually (given uncorrupted par2 ECC files) recover ANY damage in the whole archive, provided the total number of damaged sectors is less than the parity data. – FarO Feb 07 '15 at 11:53